[CUDA] Swap block x and z dimension for conv2d NHWC schedule #9087

Merged
merged 1 commit into from Sep 23, 2021

@masahi (Member) commented Sep 23, 2021

In the CUDA conv2d NHWC schedule, the number of blocks launched in the z dimension is roughly H * W / 4 (modulo a constant divisor). According to deviceQuery, the maximum grid size in the z dimension is 65535:

  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)

When H and W are large, this is very likely to produce an invalid schedule, because we try to launch too many blocks in the z dimension. For example, here is the error I hit with an input size of (800, 750). Even auto-tuning cannot avoid this error, since the grid z size stays fixed during tuning. Without the change in this PR, I cannot run this model at all unless I use the auto scheduler.


  Check failed: ret == 0 (-1 vs. 0) : TVMError: CUDALaunch Error: CUDA_ERROR_INVALID_VALUE
 grid=(8,1,150000),  block=(4,4,1)
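The arithmetic behind this failure can be checked with a minimal Python sketch (not TVM code). It assumes the z extent is exactly H * W / 4, the rough estimate given above (the real schedule may use a different constant divisor), and copies the x and y extents from the error log. The helper name `oversized_dims` is hypothetical.

```python
# Hypothetical helper, not part of TVM: checks a CUDA launch grid
# against the per-dimension limits that deviceQuery reported above.
MAX_GRID = (2147483647, 65535, 65535)  # max grid size (x, y, z)


def oversized_dims(grid, limits=MAX_GRID):
    """Return the names of the grid dimensions that exceed their limit."""
    return [axis for axis, size, cap in zip("xyz", grid, limits) if size > cap]


h, w = 800, 750
# Rough estimate from the description; the real schedule may divide
# by a different constant.
z_blocks = h * w // 4
grid = (8, 1, z_blocks)  # x and y extents copied from the error log

print(grid)                  # (8, 1, 150000)
print(oversized_dims(grid))  # ['z']
```

Under this estimate, 800 * 750 / 4 = 150000 matches the z extent in the log and is more than twice the 65535 limit.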

My solution is simply to swap the roles of the block x and z dimensions, since, as the deviceQuery output above shows, far more blocks can be launched in the x dimension.
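To see why the swap helps, here is a minimal sketch that models the fix purely at the grid level, assuming the swap simply exchanges the x and z extents of the failing launch. The function names `swap_x_z` and `fits` are hypothetical and do not come from TVM.

```python
# Hypothetical helpers, not TVM code: model the proposed fix as a
# swap of the x and z grid extents and re-check the deviceQuery limits.
MAX_GRID = (2147483647, 65535, 65535)  # max grid size (x, y, z)


def swap_x_z(grid):
    """Return the grid with the x and z extents exchanged."""
    x, y, z = grid
    return (z, y, x)


def fits(grid, limits=MAX_GRID):
    """True if every grid dimension is within its launch limit."""
    return all(size <= cap for size, cap in zip(grid, limits))


failing = (8, 1, 150000)  # grid from the error log above
swapped = swap_x_z(failing)

print(fits(failing))           # False
print(swapped, fits(swapped))  # (150000, 1, 8) True
```

The large spatial extent moves to x, the one axis whose limit is 2147483647 rather than 65535, while the small extent (8 here) fits easily in z.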

cc @vinx13 @junrushao1994 @Hzfengsy

@Hzfengsy (Member) left a comment

LGTM. Thanks @masahi

@vinx13 merged commit 1b595c0 into apache:main Sep 23, 2021