Add launch bounds to TransposeBatch kernel to avoid cudaErrorLaunchOutOfResources #2971

Merged
merged 2 commits into from
May 19, 2021

Conversation

JanuszL
Contributor

@JanuszL JanuszL commented May 18, 2021

  • Adds launch bounds to the TransposeBatch kernel in fft_postprocess.cuh to make
    sure that cudaErrorLaunchOutOfResources does not occur on some GPUs

Signed-off-by: Janusz Lisiecki jlisiecki@nvidia.com

Why do we need this PR?

Pick one, remove the rest

  • It adds launch bounds to the TransposeBatch kernel in fft_postprocess.cuh to make
    sure that cudaErrorLaunchOutOfResources does not occur on some GPUs

What happened in this PR?

Fill relevant points, put NA otherwise. Replace anything inside []

  • What solution was applied:
    Adds launch bounds to the TransposeBatch kernel in fft_postprocess.cuh to make sure that cudaErrorLaunchOutOfResources does not occur on some GPUs
  • Affected modules and functionalities:
    fft_postprocess.cuh
  • Key points relevant for the review:
    NA
  • Validation and testing:
    CI
  • Documentation (including examples):
    NA

JIRA TASK: [NA]
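
For context, cudaErrorLaunchOutOfResources is typically reported when a launch asks for more threads per block than the kernel's register usage allows on a given GPU. The sketch below is one way to check this at runtime with `cudaFuncGetAttributes`. `ScaleKernel` and `BlockSizeFits` are hypothetical names used only for illustration and are not part of DALI; the bound of 256 is likewise an assumption.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel used only for illustration; it is not part of DALI.
// The launch bound tells the compiler that blocks never exceed 256 threads,
// so it must keep register usage low enough for such a block to launch.
__global__ void
__launch_bounds__(256)
ScaleKernel(float *data, float factor, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n)
    data[i] *= factor;
}

// Hypothetical helper: returns true when the runtime reports that a block of
// `threads_per_block` threads can be launched for `kernel` on the current device.
template <typename... Args>
bool BlockSizeFits(void (*kernel)(Args...), int threads_per_block) {
  cudaFuncAttributes attr{};
  if (cudaFuncGetAttributes(&attr, kernel) != cudaSuccess)
    return false;
  // maxThreadsPerBlock depends on both the kernel and the device.
  return threads_per_block <= attr.maxThreadsPerBlock;
}
```

A call such as `BlockSizeFits(ScaleKernel, 256)` could be used as a debug-time check before a launch. Without a launch bound, the reported limit can fall below the intended block size on some GPUs, which is the situation this PR guards against.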

…tOfResources

- adds launch bounds to the TransposeBatch kernel in fft_postprocess.cuh to make
  sure that cudaErrorLaunchOutOfResources does not occur on some GPUs

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
@JanuszL
Contributor Author

JanuszL commented May 18, 2021

!build

@dali-automaton
Collaborator

CI MESSAGE: [2382032]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [2382036]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [2382037]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [2382037]: BUILD PASSED

@dali-automaton
Collaborator

CI MESSAGE: [2382036]: BUILD PASSED

@dali-automaton
Collaborator

CI MESSAGE: [2382032]: BUILD FAILED

@dali-automaton
Collaborator

CI MESSAGE: [2382032]: BUILD PASSED

@@ -143,7 +143,9 @@ __global__ void ConvertTimeMajorSpectrogram(
}

template <typename Out, typename In, typename Convert = identity>
-__global__ void TransposeBatch(
+__global__ void
+__launch_bounds__(32*kBlock)
Contributor

@mzient mzient May 19, 2021

Suggested change
-__launch_bounds__(32*kBlock)
+__launch_bounds__(kBlock*kBlock)

That's how this kernel is invoked in L345 - but now that I look at it, it would make sense to modify it a tiny bit (not in this PR, I guess).
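
To illustrate the point about matching the bound to the launch configuration, here is a minimal sketch of a tiled transpose launched with `dim3(kBlock, kBlock)` threads and annotated with `__launch_bounds__(kBlock * kBlock)`. It is not DALI's TransposeBatch: `TransposeTile`, `LaunchTranspose`, and the value `kBlock = 32` are assumptions made for the example.

```cuda
#include <cuda_runtime.h>

// Assumed tile size; the PR only shows that a constant named kBlock exists.
constexpr int kBlock = 32;

// Illustrative kernel (hypothetical, not DALI's TransposeBatch).
// The bound equals the number of threads the host code below launches per block.
__global__ void
__launch_bounds__(kBlock * kBlock)
TransposeTile(float *out, const float *in, int rows, int cols) {
  __shared__ float tile[kBlock][kBlock + 1];  // +1 column avoids bank conflicts

  // Read one element of the rows x cols input into the shared tile.
  int x = blockIdx.x * kBlock + threadIdx.x;  // input column
  int y = blockIdx.y * kBlock + threadIdx.y;  // input row
  if (x < cols && y < rows)
    tile[threadIdx.y][threadIdx.x] = in[y * cols + x];
  __syncthreads();

  // Write the transposed element into the cols x rows output.
  x = blockIdx.y * kBlock + threadIdx.x;  // output column (an input row)
  y = blockIdx.x * kBlock + threadIdx.y;  // output row (an input column)
  if (x < rows && y < cols)
    out[y * rows + x] = tile[threadIdx.x][threadIdx.y];
}

// Hypothetical host wrapper: launches kBlock x kBlock threads per block,
// exactly the count promised to the compiler by __launch_bounds__.
cudaError_t LaunchTranspose(float *out, const float *in, int rows, int cols,
                            cudaStream_t stream = 0) {
  dim3 block(kBlock, kBlock);
  dim3 grid((cols + kBlock - 1) / kBlock, (rows + kBlock - 1) / kBlock);
  TransposeTile<<<grid, block, 0, stream>>>(out, in, rows, cols);
  return cudaGetLastError();  // reports launch-time failures
}
```

Keeping the bound and the block dimensions derived from the same constant means they cannot drift apart. A launch with more threads per block than the declared bound fails, and a bound larger than needed can make the compiler limit registers more than necessary.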

Contributor Author

Done

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
@JanuszL
Contributor Author

JanuszL commented May 19, 2021

!build

@dali-automaton
Collaborator

CI MESSAGE: [2386250]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [2386250]: BUILD PASSED

@JanuszL JanuszL changed the base branch from master to main May 19, 2021 12:22
@JanuszL JanuszL merged commit bcbd0ff into NVIDIA:main May 19, 2021
@JanuszL JanuszL deleted the launch_bounds branch May 19, 2021 13:08

4 participants