[BUG] Sm90 & Sm100 Array gemm kernels read ahead of `wait_on_dependent_grids()` #2962

### Which component has the problem?

CUTLASS C++

### Bug Report

**Describe the bug**

First, the constructor of the group tile scheduler accesses the first element of the problem-shape array:
https://github.com/NVIDIA/cutlass/blob/8debf77437753beca676eb3c6bf97b56a5f9fd68/include/cutlass/gemm/kernel/sm90_tile_scheduler_group.hpp#L267

https://github.com/NVIDIA/cutlass/blob/8debf77437753beca676eb3c6bf97b56a5f9fd68/include/cutlass/gemm/group_array_problem_shape.hpp#L68

`TileScheduler` is constructed before `wait_on_dependent_grids()` is called, so the kernel may read the pointer arrays before a preceding kernel has flushed the data they depend on to global memory.

https://github.com/NVIDIA/cutlass/blob/8debf77437753beca676eb3c6bf97b56a5f9fd68/include/cutlass/gemm/kernel/sm90_gemm_array_tma_warpspecialized_pingpong.hpp#L466-L475

https://github.com/NVIDIA/cutlass/blob/8debf77437753beca676eb3c6bf97b56a5f9fd68/include/cutlass/gemm/kernel/sm100_gemm_array_tma_warpspecialized.hpp#L836-L842


This can cause a race condition when PDL (Programmatic Dependent Launch) is enabled.
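To make the required ordering concrete, here is a host-side C++ analogy (an illustrative assumption, not CUTLASS code): `std::thread` stands in for the overlapping producer and consumer kernels, and an acquire/release flag stands in for the dependency that `wait_on_dependent_grids()` resolves. The names `producer`, `wait_on_dependent` and `consumer_reads_after_wait` are hypothetical. The point is that every read of producer-written memory, including the problem-shape read done in the tile scheduler constructor, has to come after the wait; only work that touches independent data may run before it.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <thread>

// Host-side analogy only: threads model two overlapping PDL kernels.
struct ProblemShape { int m, n, k; };

constexpr int kNumGroups = 4;
std::array<ProblemShape, kNumGroups> problem_shapes{};  // "global memory" written by the producer
std::atomic<bool> producer_done{false};                 // models the producer's writes becoming visible

// Hypothetical producer: fills the problem shapes, then publishes them.
void producer() {
  for (int i = 0; i < kNumGroups; ++i) problem_shapes[i] = {128 * (i + 1), 256, 64};
  producer_done.store(true, std::memory_order_release);
}

// Hypothetical stand-in for wait_on_dependent_grids(): block until the
// producer's writes are guaranteed to be visible to this thread.
void wait_on_dependent() {
  while (!producer_done.load(std::memory_order_acquire)) std::this_thread::yield();
}

// Correct ordering: independent setup may run early, dependent reads may not.
ProblemShape consumer_reads_after_wait() {
  // ... setup that does not touch producer-written memory could go here ...
  wait_on_dependent();
  // Only now is it safe to read the first problem shape, which is what the
  // group tile scheduler constructor does in the kernels discussed here.
  return problem_shapes[0];
}
```

Moving the `problem_shapes[0]` read above `wait_on_dependent()` would turn this model into a data race, which mirrors constructing the scheduler before the wait.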

**Steps/Code to reproduce bug**

Hard to reproduce as it's a race condition.


I wish there were some compiler support for spotting PDL-related bugs, e.g., a warning when code in a kernel accesses global memory before `wait_on_dependent_grids()`. Tracking down this kind of issue was frustrating.
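Short of compiler support, one low-tech stopgap is to route reads of producer-written memory through a debug wrapper that remembers whether the dependency wait has run and reports any read issued before it. The sketch below is host C++ for illustration only; `PdlReadGuard`, `mark_waited()` and `load()` are hypothetical names, not CUTLASS or CUDA APIs. In device code the same idea would need a per-thread flag and device-side `printf`, and it only catches reads that actually go through the wrapper, so it is no substitute for a real compiler or sanitizer check.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical debug helper: tracks whether the dependency wait has happened
// and reports any dependent read issued before it.
class PdlReadGuard {
 public:
  // Call wherever the kernel calls wait_on_dependent_grids().
  void mark_waited() { waited_ = true; }

  // Route reads of producer-written memory through this helper.
  template <class T>
  T load(const T* ptr, const char* what) {
    if (!waited_) {
      ++early_reads_;
      std::fprintf(stderr, "PDL warning: '%s' read before wait_on_dependent_grids()\n", what);
    }
    return *ptr;
  }

  // Number of reads that happened before mark_waited().
  int early_reads() const { return early_reads_; }

 private:
  bool waited_ = false;
  int early_reads_ = 0;
};
```

In the kernels from this report, the problem-shape read inside the scheduler constructor is the kind of access such a wrapper would flag.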

The scheduler construction at the two linked call sites:

```cpp
// sm90_gemm_array_tma_warpspecialized_pingpong.hpp
auto scheduler = [&] () {
  // Group scheduler requires a different constructor that takes a response ptr
  if constexpr (cute::is_same_v<SchedulerTag, GroupScheduler>) {
    return TileScheduler{params.scheduler, shared_storage.scheduler_response};
  }
  else {
    return TileScheduler{params.scheduler};
  }
} ();
```

```cpp
// sm100_gemm_array_tma_warpspecialized.hpp
TileScheduler scheduler(
  (!IsTensorMapUpdateAsync || is_participant.sched || is_participant.tensor_map_updater)
    ? &shared_storage.clc_response[0][0]
    : &shared_storage.clc_response[1][0],
  params.scheduler,
  block_id_in_cluster
);
```
