[QST] `wgmma_sm90.cu` tutorial pipelining

**What is your question?**
In the `wgmma_sm90.cu` tutorial, why does the pipelined [inner loop](https://github.com/NVIDIA/cutlass/blob/6c6b78550e0752309c1f85dc9c1f6e636c8d751c/examples/cute/tutorial/wgmma_sm90.cu#L221) run while `k_tile_count > -K_PIPE_MAX`?

The producers load `K_PIPE_MAX` tiles [here](https://github.com/NVIDIA/cutlass/blob/6c6b78550e0752309c1f85dc9c1f6e636c8d751c/examples/cute/tutorial/wgmma_sm90.cu#L169-L180) before entering the main loop, so wouldn't there be `k_tile_count - K_PIPE_MAX` tiles remaining?
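To make the counting concrete, here is a minimal host-side sketch of how I read the counter arithmetic. It assumes that the prefetch loop decrements `k_tile_count` once per tile it loads and that the main loop decrements it once per iteration. I have not confirmed either assumption against the tutorial beyond the linked lines, and `count_iterations` and `LoopCounts` are hypothetical names, not CUTLASS APIs.

```cpp
#include <cassert>

// Hypothetical model of the loop counters only; no TMA, barriers, or MMA.
struct LoopCounts {
  int prefetch_loads;   // tiles loaded before the main loop starts
  int main_iterations;  // iterations of the main loop
};

inline LoopCounts count_iterations(int total_k_tiles, int k_pipe_max) {
  assert(k_pipe_max > 0 && total_k_tiles >= k_pipe_max);

  int k_tile_count = total_k_tiles;
  LoopCounts counts{0, 0};

  // Prefetch: ASSUMED to decrement k_tile_count once per issued load.
  for (int pipe = 0; pipe < k_pipe_max; ++pipe) {
    ++counts.prefetch_loads;
    --k_tile_count;
  }
  // Here k_tile_count == total_k_tiles - k_pipe_max.

  // Main loop: same condition as in the question, one decrement per pass.
  while (k_tile_count > -k_pipe_max) {
    ++counts.main_iterations;
    --k_tile_count;
  }
  return counts;
}
```

Under those assumptions, `count_iterations(8, 3)` gives 3 prefetch loads and 8 main-loop iterations, so the loop would run once per K tile rather than `k_tile_count - K_PIPE_MAX` times. That would make sense if the prefetch only issues loads and every tile still has to be consumed by an MMA in the main loop, but I'd like confirmation that this is the intent.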

[QST] `wgmma_sm90.cu` tutorial pipelining #2173
