[QST] Does splitK cause computation errors on shapes with very large k?  #876

@umiswing

Hello! I'm using splitK in kGemmSplitKParallel mode with examples/36. Some of my shapes have a very large k (e.g. 5x1e4). I set split_k_slices to max(min(256, k/128), 1). With this setting, the splitK kernels produce incorrect output. I have some questions:

  1. Should I use splitK on shapes with very large k?
  2. Do I need to call cudaDeviceSynchronize() after the splitK kernel?
  3. How can I use streamK in examples/36?

Hardware: A100
CUDA version: 11.7
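
For concreteness, the slice-count rule described above can be written out as the sketch below. The helper names `choose_split_k_slices` and `max_k_per_slice` are hypothetical and not part of CUTLASS, and integer division is assumed for k/128. CUTLASS may round the per-slice k extent to a tile multiple, so the actual partitioning can differ from this estimate.

```cpp
#include <algorithm>

// Hypothetical helper (not a CUTLASS API): slice count following the rule
// max(min(256, k / 128), 1), assuming integer division.
constexpr int choose_split_k_slices(int k) {
    return std::max(std::min(256, k / 128), 1);
}

// Hypothetical helper: upper bound on the k extent handled by one slice,
// computed with ceiling division. This is only an estimate of the split.
constexpr int max_k_per_slice(int k, int slices) {
    return (k + slices - 1) / slices;
}

// Compile-time check for the large-k case from the question (k = 5 x 1e4):
// 50000 / 128 = 390, which is clamped to the 256-slice cap.
static_assert(choose_split_k_slices(50000) == 256, "clamped to 256 slices");
static_assert(max_k_per_slice(50000, 256) == 196, "about 196 k-elements per slice");
```

Under this rule, k = 50000 yields 256 slices of at most 196 k-elements each, so every partial result covers only a short reduction before the final parallel reduction combines the 256 partial products.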
