[QST] How to set tile size for fp64 in examples/36_gather_scatter_fusion?

Hello! I'm trying to make [examples/36_gather_scatter_fusion](https://github.com/NVIDIA/cutlass/tree/master/examples/36_gather_scatter_fusion) work for fp64. ```A```, ```B```, ```C``` and ```D``` are all fp64 and row-major, with a problem size of ```(m,n,k)=(4,4,4)``` and ```index_size=2```.
I found [examples/18_ampere_fp64_tensorop_affine2_gemm](https://github.com/NVIDIA/cutlass/tree/12f4108ac2233022b92f0e3533c23bed399fcf45/examples/18_ampere_fp64_tensorop_affine2_gemm) and used the same tile sizes as that example:
```c++
// This code section describes the tile size a thread block will compute
using ShapeMMAThreadBlock = cutlass::gemm::GemmShape<128, 128, 16>; 
// This code section describes tile size a warp will compute
using ShapeMMAWarp = cutlass::gemm::GemmShape<32, 64, 16>; 
// This code section describes the size of MMA op
using ShapeMMAOp = cutlass::gemm::GemmShape<8, 8, 4>; 
```
But a compilation error occurs:
```shell
/home/me/cutlass/include/cutlass/transform/threadblock/predicated_tile_access_iterator.h(353): error: static assertion failed with "Vectors implied by the thread map must be divisible by the access type."
          detected during:
            instantiation of class "cutlass::transform::threadblock::PredicatedTileAccessIterator<Shape_, Element_, cutlass::layout::PitchLinear, AdvanceRank, ThreadMap_, AccessType_, Gather> [with Shape_=cutlass::layout::PitchLinearShape<16, 128>, Element_=double, AdvanceRank=0, ThreadMap_=cutlass::transform::PitchLinearWarpRakedThreadMap<cutlass::layout::PitchLinearShape<16, 128>, 256, cutlass::layout::PitchLinearShape<16, 2>, 1>, AccessType_=cutlass::Array<double, 2, true>, Gather=true]" 
```
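As far as I can tell from the message, the check compares the number of elements per access in the thread map with the number of elements in the access type. The sketch below only models that arithmetic with the values printed in the error. `vectors_divisible` is a hypothetical helper, not CUTLASS API, and I'm assuming the last template argument of `PitchLinearWarpRakedThreadMap` (here `1`) is its elements per access.

```c++
// Hypothetical stand-in for the divisibility condition named in the
// assertion text; this is NOT the CUTLASS source.
constexpr bool vectors_divisible(int thread_map_elements_per_access,
                                 int access_type_elements) {
  return thread_map_elements_per_access % access_type_elements == 0;
}

// Values read off the error message:
//   PitchLinearWarpRakedThreadMap<..., 1>         -> 1 element per access (assumed)
//   AccessType_ = cutlass::Array<double, 2, true> -> 2 elements
static_assert(!vectors_divisible(1, 2),
              "1 is not a multiple of 2, so the modeled check fails");

// If both sides used the same vector width, the modeled check would pass.
static_assert(vectors_divisible(2, 2), "2 is a multiple of 2");
static_assert(vectors_divisible(1, 1), "1 is a multiple of 1");
```

If this reading is right, the mismatch is between a 1-element thread-map access and a 2-element (128-bit) access type for `double`, not in the tile dimensions themselves.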
Based on #566, I guessed that I should halve the tile size used for fp32, so I set the tile sizes to:
```c++
using ShapeMMAThreadBlock =
    cutlass::gemm::GemmShape<128, 128, 8>; 
// This code section describes tile size a warp will compute
using ShapeMMAWarp = cutlass::gemm::GemmShape<64, 64, 8>; 
// This code section describes the size of MMA op
using ShapeMMAOp = cutlass::gemm::GemmShape<16, 8, 4>;
``` 
But a similar compilation error occurs. :(
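To rule out a simple nesting mistake, here is a small standalone check I would use. It assumes the threadblock tile must be a whole multiple of the warp tile, and the warp tile a whole multiple of the MMA op shape. `Tile`, `divides` and `tiles_nest` are hypothetical names for illustration, not CUTLASS types.

```c++
// Hypothetical tile descriptor mirroring GemmShape<M, N, K>.
struct Tile {
  int m, n, k;
};

// True when every dimension of `outer` is a whole multiple of `inner`.
constexpr bool divides(Tile outer, Tile inner) {
  return outer.m % inner.m == 0 && outer.n % inner.n == 0 &&
         outer.k % inner.k == 0;
}

// Assumed rule: threadblock tile -> warp tile -> MMA op shape must nest.
constexpr bool tiles_nest(Tile threadblock, Tile warp, Tile op) {
  return divides(threadblock, warp) && divides(warp, op);
}

// First attempt (tile sizes copied from example 18).
static_assert(tiles_nest(Tile{128, 128, 16}, Tile{32, 64, 16}, Tile{8, 8, 4}),
              "first configuration nests");

// Second attempt (halved K).
static_assert(tiles_nest(Tile{128, 128, 8}, Tile{64, 64, 8}, Tile{16, 8, 4}),
              "second configuration nests");
```

Both configurations pass this check, so under that assumption the failure does not come from the tiles failing to divide evenly.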

What tile sizes work for fp64? Are there any rules I can refer to when choosing a valid tile size? Also, can changing the tile size improve performance?

