[QST] Why use blockIdx.z as the indexing method for batch #1050

Closed
piDack opened this issue Aug 16, 2023 · 2 comments
piDack commented Aug 16, 2023

What is your question?

In CUDA, blockIdx.z ranges over [0, 65534] (gridDim.z is at most 65535), while blockIdx.x has a much larger range (gridDim.x can be up to 2^31 - 1). Why does CUTLASS use blockIdx.z as the batch index when gemm_universal runs with mode=batch?

      ptr_A += threadblock_tile_offset.k() * params.batch_stride_A;
      ptr_B += threadblock_tile_offset.k() * params.batch_stride_B;

If gridDim.x and gridDim.y are both 1 in some special case (so blockIdx.x and blockIdx.y are always 0), is it reasonable to change the code to use blockIdx.x as the batch index?

thanks.
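
For reference, the limits behind this question can be written down as a small host-side check. This is a minimal sketch in plain C++, not CUTLASS code: the helper names `batch_fits_in_z`, `batch_fits_in_x`, and `batch_element_offset` are hypothetical, and the limits assumed are the CUDA maximum grid dimensions for compute capability 3.0 and later.

```cpp
#include <cassert>
#include <cstdint>

// Maximum grid dimensions, assuming compute capability 3.0 or later.
constexpr std::int64_t kMaxGridDimX = 2147483647;  // 2^31 - 1
constexpr std::int64_t kMaxGridDimY = 65535;
constexpr std::int64_t kMaxGridDimZ = 65535;

// Hypothetical helper: can batch_count be launched as gridDim.z?
constexpr bool batch_fits_in_z(std::int64_t batch_count) {
  return batch_count >= 1 && batch_count <= kMaxGridDimZ;
}

// Hypothetical helper: can batch_count be launched as gridDim.x?
constexpr bool batch_fits_in_x(std::int64_t batch_count) {
  return batch_count >= 1 && batch_count <= kMaxGridDimX;
}

// Element offset applied to a batch's operand pointer, mirroring
// `ptr_A += batch_idx * batch_stride_A` from the snippet above.
constexpr std::int64_t batch_element_offset(std::int64_t batch_idx,
                                            std::int64_t batch_stride) {
  return batch_idx * batch_stride;
}
```

With these assumptions, a batch count of 65536 no longer fits in gridDim.z but still fits in gridDim.x, which is the case the question is about.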

piDack (Author) commented Aug 18, 2023

That change causes a performance slowdown.

piDack closed this as completed Aug 18, 2023

hwu36 (Collaborator) commented Aug 18, 2023

You need to change the threadblock layout algorithms accordingly if you change how the grid dimensions are used.
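
To illustrate the point, here is a minimal host-side sketch of what "changing the layout accordingly" could mean. It is not CUTLASS's implementation: `Dim3`, `TileCoord`, `BatchInZ`, and `BatchInX` are hypothetical names, and the identity mapping is an assumption chosen for simplicity. The idea is that the function that builds the launch grid and the function that decodes a block index into a tile coordinate must agree on which dimension carries the batch.

```cpp
#include <cassert>

struct Dim3 { unsigned x, y, z; };      // stand-in for CUDA's dim3 / blockIdx
struct TileCoord { int m, n, batch; };  // hypothetical decoded tile coordinate

// Layout with the batch index in the z dimension.
struct BatchInZ {
  static Dim3 grid_shape(int tiles_m, int tiles_n, int batch_count) {
    return {static_cast<unsigned>(tiles_m),
            static_cast<unsigned>(tiles_n),
            static_cast<unsigned>(batch_count)};
  }
  static TileCoord tile_offset(Dim3 block_idx) {
    return {static_cast<int>(block_idx.x),
            static_cast<int>(block_idx.y),
            static_cast<int>(block_idx.z)};
  }
};

// Alternative layout with the batch index in the x dimension.
// Both functions change together; changing only one breaks the mapping.
struct BatchInX {
  static Dim3 grid_shape(int tiles_m, int tiles_n, int batch_count) {
    return {static_cast<unsigned>(batch_count),
            static_cast<unsigned>(tiles_n),
            static_cast<unsigned>(tiles_m)};
  }
  static TileCoord tile_offset(Dim3 block_idx) {
    return {static_cast<int>(block_idx.z),
            static_cast<int>(block_idx.y),
            static_cast<int>(block_idx.x)};
  }
};
```

In this sketch, `BatchInX` moves the M tile index into z, so the number of M tiles becomes the dimension limited to 65535. Any effect on performance depends on the order in which blocks are scheduled and is not modeled here.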
