Allow cluster sizes across m,n,k to be reported in cutlass profiler #2078

mandroid6 · 2025-02-04T23:46:00Z

Currently the cutlass profiler reports all the arguments passed to the benchmark, but it leaves the per-kernel values for cluster_m, cluster_n and cluster_k empty.

This change updates the profiler report generation to include these arguments.

Before:

As shown below, the values for cluster_m, cluster_n and cluster_k are missing from the kernel result.

Problem,Provider,OperationKind,Operation,Disposition,Status,gemm_kind,m,n,k,A,B,C,D,alpha,beta,split_k_mode,split_k_slices,batch_count,raster_order,use_pdl,swizzle_size,op_class,accum,cta_m,cta_n,cta_k,cluster_m,cluster_n,cluster_k,stages,warps_m,warps_n,warps_k,inst_m,inst_n,inst_k,min_cc,max_cc,Bytes,Flops,Flops/Byte,Runtime,GB/s,GFLOPs
1,CUTLASS,gemm,cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_bf16_bf16_128x128x64_1x2x1_0_tnn_align8,incorrect,success,universal,4352,4096,4096,bf16:row,bf16:column,bf16:column,bf16:column,1,0,serial,1,1,heuristic,false,1,tensorop,f32,128,128,64,,,,7,4,2,1,64,128,16,90,90,104857600,146064539648,1392,0.235348,414.944,620633

After:

Problem,Provider,OperationKind,Operation,Disposition,Status,gemm_kind,m,n,k,A,B,C,D,alpha,beta,split_k_mode,split_k_slices,batch_count,raster_order,use_pdl,swizzle_size,op_class,accum,cta_m,cta_n,cta_k,cluster_m,cluster_n,cluster_k,stages,warps_m,warps_n,warps_k,inst_m,inst_n,inst_k,min_cc,max_cc,Bytes,Flops,Flops/Byte,Runtime,GB/s,GFLOPs
1,CUTLASS,gemm,cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_bf16_bf16_128x128x64_1x2x1_0_tnn_align8,incorrect,success,universal,4352,4096,4096,bf16:row,bf16:column,bf16:column,bf16:column,1,0,serial,1,1,heuristic,false,1,tensorop,f32,128,128,64,1,2,1,7,4,2,1,64,128,16,90,90,104857600,146064539648,1392,0.235348,414.944,620633
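One way to check a report for this problem is to read the CSV and look for empty cluster columns. The following is a minimal sketch, assuming the report is a plain comma-separated file with a header row as in the output above. The helper names (`cluster_values`, `missing_cluster_rows`) and the abbreviated `HEADER`, `BEFORE` and `AFTER` strings are hypothetical stand-ins for illustration, not part of the profiler.

```python
import csv
import io

# Column names taken from the profiler report header shown above.
CLUSTER_COLUMNS = ("cluster_m", "cluster_n", "cluster_k")


def cluster_values(report_text):
    """Return one {column: value} dict per kernel row of a CSV report."""
    reader = csv.DictReader(io.StringIO(report_text))
    return [{col: row[col] for col in CLUSTER_COLUMNS} for row in reader]


def missing_cluster_rows(report_text):
    """Return the 0-based indices of rows where any cluster column is empty."""
    return [
        index
        for index, values in enumerate(cluster_values(report_text))
        if any(value == "" for value in values.values())
    ]


# Hypothetical, abbreviated reports: only a subset of the real columns,
# with a placeholder kernel name.
HEADER = "Problem,Operation,cta_m,cta_n,cta_k,cluster_m,cluster_n,cluster_k,stages\n"
BEFORE = HEADER + "1,example_kernel,128,128,64,,,,7\n"
AFTER = HEADER + "1,example_kernel,128,128,64,1,2,1,7\n"

if __name__ == "__main__":
    print(missing_cluster_rows(BEFORE))  # [0]
    print(missing_cluster_rows(AFTER))   # []
    print(cluster_values(AFTER))
```

The same functions should work on a full report read from disk, since `csv.DictReader` keys each row by the header and ignores the columns it is not asked about.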

Repro commands:

Build cutlass

git clone https://github.com/NVIDIA/cutlass
cd cutlass
mkdir build
cd build
cmake .. -DCUTLASS_NVCC_ARCHS=90a -DCUTLASS_LIBRARY_KERNELS="cutlass3x_sm90_tensorop_s*16gemm_bf16_bf16_f32_bf16_bf16_*tnn*" -DCUTLASS_ENABLE_TESTS=OFF -GNinja -DCUTLASS_LIBRARY_INSTANTIATION_LEVEL=9992 -DCUTLASS_LIBRARY_OPERATIONS=Gemm
cmake --build . --target cutlass_profiler

Run profiler

 ./tools/profiler/cutlass_profiler --operation=Gemm --output=data --dist=gaussian,mean:0.0,stddev:1.0,scale:-1 --m=4352 --n=4096 --k=4096 --A=bf16:row --B=bf16:column --C=bf16:column --D=bf16:column


mandroid6 · 2025-02-05T00:03:44Z

@hwu36 @kerrmudgeon

hwu36 · 2025-02-05T12:04:35Z

@itramble, could you please review first?

mandroid6 · 2025-02-11T18:46:41Z

@itramble could you help take a look? (cc @hwu36)

itramble · 2025-02-12T19:29:24Z

Hi @mandroid6, thanks for raising this. I think this was changed recently. As of today, I see:

Problem,Provider,OperationKind,Operation,Disposition,Status,gemm_kind,m,n,k,A,B,C,D,alpha,beta,split_k_mode,split_k_slices,batch_count,raster_order,runtime_input_datatype_a,runtime_input_datatype_b,use_pdl,enable_sm90_mixed_dtype_shuffle_test,swizzle_size,op_class,accum,cta_m,cta_n,cta_k,cluster_m,cluster_n,cluster_k,cluster_m_fallback,cluster_n_fallback,cluster_k_fallback,stages,warps_m,warps_n,warps_k,inst_m,inst_n,inst_k,min_cc,max_cc,Bytes,Flops,Flops/Byte,Runtime,GB/s,GFLOPs
1,CUTLASS,gemm,cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_bf16_bf16_128x128x64_1x2x1_0_tnn_align8,incorrect,success,universal,4352,4096,4096,bf16:row,bf16:column,bf16:column,bf16:column,1,0,serial,1,1,heuristic,invalid,invalid,false,false,1,tensorop,f32,128,128,64,1,1,1,0,0,0,7,4,2,1,64,128,16,90,90,104857600,146064539648,1392,0.359317,271.783,406506

Unfortunately, this is not entirely correct either. We currently report the "cluster*" arguments that were passed to the profiler (or defaults, see here). We do this because there is a new Blackwell feature for using runtime cluster shapes (described here) in addition to static compile-time cluster shapes that were supported for Hopper. Runtime cluster shapes are indicated when one of operation_desc.tile_description.cluster_shape.m/n/k() is 0. When none of the cluster_shapes are 0 (true for Hopper CUTLASS kernels), then your change is correct.
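The distinction above can be sketched as a small selection rule. This is only an illustration of the comment, written in Python rather than the profiler's C++, and it is not the profiler's actual implementation. `ClusterShape`, `uses_runtime_cluster_shape` and `reported_cluster_shape` are hypothetical names, and the sketch assumes that a kernel with a runtime cluster shape should report the cluster arguments passed to the profiler, while a kernel with a fully static shape should report its compile-time shape. The `cluster_*_fallback` columns are not modeled.

```python
from typing import NamedTuple


class ClusterShape(NamedTuple):
    """Hypothetical stand-in for a cluster shape with m, n and k extents."""
    m: int
    n: int
    k: int


def uses_runtime_cluster_shape(static_shape):
    """A kernel uses a runtime cluster shape when any static extent is 0."""
    return any(extent == 0 for extent in static_shape)


def reported_cluster_shape(static_shape, profiler_args):
    """Pick the cluster shape to print in the report (assumed rule).

    static_shape:  compile-time shape from the kernel description.
    profiler_args: cluster_m/n/k passed to the profiler, or their defaults.
    """
    if uses_runtime_cluster_shape(static_shape):
        # Runtime cluster shape: the static description carries no real
        # value, so the arguments given to the profiler are reported.
        return profiler_args
    # Static cluster shape (the Hopper case): report what the kernel
    # was compiled with.
    return static_shape


if __name__ == "__main__":
    defaults = ClusterShape(1, 1, 1)
    # Static kernel compiled with a 1x2x1 cluster, as in the kernel name above.
    print(reported_cluster_shape(ClusterShape(1, 2, 1), defaults))
    # Kernel whose description marks the cluster shape as runtime-selected.
    print(reported_cluster_shape(ClusterShape(0, 0, 1), ClusterShape(2, 1, 1)))
```

Under this rule the kernel in the output above, whose name carries 1x2x1, would report 1,2,1 instead of the default 1,1,1.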

github-actions · 2025-03-14T20:06:06Z

This PR has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this PR if it is no longer required. Otherwise, please respond with a comment indicating any updates. This PR will be labeled inactive-90d if there is no activity in the next 60 days.

github-actions · 2025-06-12T20:06:38Z

This PR has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this PR if it is no longer required. Otherwise, please respond with a comment indicating any updates.

github-actions bot added the inactive-30d label Mar 14, 2025

github-actions bot added the inactive-90d label Jun 12, 2025
