[QST] Using tile sizes in generator.py can't pass compilation? #617

@umiswing

Hi! I'm trying to change the tile size in examples/36_gather_scatter_fusion to find the best performance. A, B, C, and D in my code are all fp32. Following #612, I read generator.py and found GenerateSM80_TensorOp_1688(manifest, cuda_version). I used a tile size from its TileDescription list, but I get a compilation error like this:

/home/me/cutlass/include/cutlass/transform/pitch_linear_thread_map.h(295): error: static assertion failed with "Number of iterations must be non-zero"

The tile settings in my source code:

// This code section describes the tile size a thread block will compute
using ShapeMMAThreadBlock =
    cutlass::gemm::GemmShape<256, 128, 16>;  
// This code section describes tile size a warp will compute
using ShapeMMAWarp = cutlass::gemm::GemmShape<4, 2, 1>;  
// This code section describes the size of MMA op
using ShapeMMAOp = cutlass::gemm::GemmShape<16, 8, 8>;  
...
// Number of pipelines you want to use
constexpr int NumStages = 3;
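
One thing that may be worth checking, sketched below: as far as I understand generator.py, the third argument of a TileDescription entry such as `TileDescription([256, 128, 16], 3, [4, 2, 1], ...)` is the number of warps along each dimension, not the warp tile shape. If that reading is right, the warp tile would be the threadblock tile divided by the warp count. The `Shape` struct and the `WarpShapeFromCount` helper are hypothetical stand-ins written for this illustration so that it compiles without CUTLASS; they are not part of the library.

```cpp
// Stand-in for cutlass::gemm::GemmShape so this sketch compiles without CUTLASS.
template <int M, int N, int K>
struct Shape {
  static constexpr int kM = M;
  static constexpr int kN = N;
  static constexpr int kK = K;
};

// Hypothetical helper (not a CUTLASS API): divides a threadblock tile by the
// per-dimension warp count to obtain the tile each warp computes.
template <typename ThreadblockShape, typename WarpCount>
struct WarpShapeFromCount {
  static_assert(ThreadblockShape::kM % WarpCount::kM == 0,
                "M extent must be divisible by the warp count in M");
  static_assert(ThreadblockShape::kN % WarpCount::kN == 0,
                "N extent must be divisible by the warp count in N");
  static_assert(ThreadblockShape::kK % WarpCount::kK == 0,
                "K extent must be divisible by the warp count in K");

  using type = Shape<ThreadblockShape::kM / WarpCount::kM,
                     ThreadblockShape::kN / WarpCount::kN,
                     ThreadblockShape::kK / WarpCount::kK>;
};

// Values taken from the tile settings above.
using ThreadblockTile = Shape<256, 128, 16>;
// Assumption: [4, 2, 1] in generator.py is a warp count, not a warp shape.
using WarpCount = Shape<4, 2, 1>;

// Resulting per-warp tile: 256/4 x 128/2 x 16/1.
using WarpTile = WarpShapeFromCount<ThreadblockTile, WarpCount>::type;
```

Under that assumption the per-warp tile comes out as 64x64x16, so `cutlass::gemm::GemmShape<64, 64, 16>` would be the candidate value for `ShapeMMAWarp` rather than `GemmShape<4, 2, 1>`.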
