[QST] Using tile sizes in generator.py can't pass compilation? #617

Hi! I'm trying to change the tile size in [examples/36_gather_scatter_fusion](https://github.com/NVIDIA/cutlass/tree/master/examples/36_gather_scatter_fusion) to find the best-performing configuration. ```A```, ```B```, ```C``` and ```D``` in my code are all fp32. Following #612, I read [generator.py](https://github.com/NVIDIA/cutlass/blob/master/tools/library/scripts/generator.py) and found [```GenerateSM80_TensorOp_1688(manifest, cuda_version)```](https://github.com/NVIDIA/cutlass/blob/master/tools/library/scripts/generator.py#L2314-L2385). I used a tile size from its ```TileDescription``` list, but got a compilation error like this:
```shell
/home/me/cutlass/include/cutlass/transform/pitch_linear_thread_map.h(295): error: static assertion failed with "Number of iterations must be non-zero"
```

The tile settings in my source code:
```c++
// Tile size computed by each thread block
using ShapeMMAThreadBlock =
    cutlass::gemm::GemmShape<256, 128, 16>;
// Tile size computed by each warp
using ShapeMMAWarp = cutlass::gemm::GemmShape<4, 2, 1>;
// Shape of a single MMA instruction
using ShapeMMAOp = cutlass::gemm::GemmShape<16, 8, 8>;
...
// Number of pipeline stages
constexpr int NumStages = 3;
```
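
For reference, here is a sketch of how the values from generator.py might map onto these template parameters. It assumes the third argument of ```TileDescription``` (the ```[4, 2, 1]``` above) is the number of warps along M, N and K rather than a per-warp tile shape, so the warp tile would be the thread block tile divided by that count. The ```GemmShape``` struct below is a stand-in so the snippet compiles without CUTLASS, and ```WarpShapeFromCount``` and ```WarpCountPerBlock``` are hypothetical names, not CUTLASS APIs.

```cpp
// Stand-in for cutlass::gemm::GemmShape so this sketch compiles on its own.
template <int M, int N, int K>
struct GemmShape {
  static constexpr int kM = M;
  static constexpr int kN = N;
  static constexpr int kK = K;
};

// Hypothetical helper (not part of CUTLASS): derives the per-warp tile by
// dividing the thread block tile by the number of warps in each dimension.
template <typename ThreadblockShape, typename WarpCount>
struct WarpShapeFromCount {
  static_assert(ThreadblockShape::kM % WarpCount::kM == 0 &&
                ThreadblockShape::kN % WarpCount::kN == 0 &&
                ThreadblockShape::kK % WarpCount::kK == 0,
                "thread block tile must divide evenly among the warps");
  using type = GemmShape<ThreadblockShape::kM / WarpCount::kM,
                         ThreadblockShape::kN / WarpCount::kN,
                         ThreadblockShape::kK / WarpCount::kK>;
};

// Values from the question; treating [4, 2, 1] as a warp count is an assumption.
using ShapeMMAThreadBlock = GemmShape<256, 128, 16>;
using WarpCountPerBlock = GemmShape<4, 2, 1>;
using ShapeMMAWarp =
    WarpShapeFromCount<ShapeMMAThreadBlock, WarpCountPerBlock>::type;
using ShapeMMAOp = GemmShape<16, 8, 8>;

// Under that assumption the warp tile is 64x64x16, not 4x2x1.
static_assert(ShapeMMAWarp::kM == 64 && ShapeMMAWarp::kN == 64 &&
              ShapeMMAWarp::kK == 16,
              "expected a 64x64x16 warp tile");

// A warp tile should also be a whole multiple of the MMA instruction shape.
static_assert(ShapeMMAWarp::kM % ShapeMMAOp::kM == 0 &&
              ShapeMMAWarp::kN % ShapeMMAOp::kN == 0 &&
              ShapeMMAWarp::kK % ShapeMMAOp::kK == 0,
              "warp tile must be a multiple of the MMA instruction shape");
```

If this reading is right, a 4x2x1 warp tile is smaller than the 16x8x8 instruction shape, which would be consistent with the "Number of iterations must be non-zero" assertion.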
