Add launch bounds to TransposeBatch kernel to avoid cudaErrorLaunchOutOfResources #2971

JanuszL · 2021-05-18T12:59:15Z

Adds launch bounds to the TransposeBatch kernel in fft_postprocess.cuh to make
sure that cudaErrorLaunchOutOfResources does not occur on some GPUs.

Signed-off-by: Janusz Lisiecki jlisiecki@nvidia.com

Why we need this PR?

It adds launch bounds to the TransposeBatch kernel in fft_postprocess.cuh to make
sure that cudaErrorLaunchOutOfResources does not occur on some GPUs.

What happened in this PR?

What solution was applied:
Adds launch bounds to the TransposeBatch kernel in fft_postprocess.cuh to make sure that cudaErrorLaunchOutOfResources does not occur on some GPUs.
Affected modules and functionalities:
fft_postprocess.cuh
Key points relevant for the review:
NA
Validation and testing:
CI
Documentation (including examples):
NA

JIRA TASK: [NA]
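
For context, a minimal sketch of what a launch bound changes. cudaErrorLaunchOutOfResources is usually reported when a launch asks for more threads per block than the kernel's register count allows. `__launch_bounds__(N)` tells the compiler to keep register usage low enough that a block of N threads can be launched. The kernels `ScaleUnbounded` and `ScaleBounded` and the bound of 256 are hypothetical and are not taken from fft_postprocess.cuh; the reported numbers depend on the GPU and compiler, and a kernel this small may show no difference in register count.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel without launch bounds: the compiler is free to pick
// any register count, which may limit the largest launchable block.
__global__ void ScaleUnbounded(float *data, float factor, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) data[i] *= factor;
}

// Same kernel with a launch bound: the compiler must keep register usage
// low enough for a 256-thread block (256 is an assumed value).
__global__ void __launch_bounds__(256)
ScaleBounded(float *data, float factor, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) data[i] *= factor;
}

int main() {
  cudaFuncAttributes unbounded{}, bounded{};
  cudaError_t err = cudaFuncGetAttributes(&unbounded, ScaleUnbounded);
  if (err == cudaSuccess) err = cudaFuncGetAttributes(&bounded, ScaleBounded);
  if (err != cudaSuccess) {
    std::printf("cudaFuncGetAttributes failed: %s\n", cudaGetErrorString(err));
    return 1;
  }
  // numRegs is registers per thread; maxThreadsPerBlock is the largest
  // block size with which this particular kernel can be launched.
  std::printf("unbounded: regs=%d maxThreadsPerBlock=%d\n",
              unbounded.numRegs, unbounded.maxThreadsPerBlock);
  std::printf("bounded:   regs=%d maxThreadsPerBlock=%d\n",
              bounded.numRegs, bounded.maxThreadsPerBlock);
  return 0;
}
```

Comparing `maxThreadsPerBlock` with the block size actually used at the launch site is one way to check whether a kernel is at risk on a given device.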

JanuszL · 2021-05-18T12:59:27Z

!build

dali-automaton · 2021-05-18T13:01:32Z

CI MESSAGE: [2382032]: BUILD STARTED

dali-automaton · 2021-05-18T13:02:01Z

CI MESSAGE: [2382036]: BUILD STARTED

dali-automaton · 2021-05-18T13:02:24Z

CI MESSAGE: [2382037]: BUILD STARTED

dali-automaton · 2021-05-18T15:26:40Z

CI MESSAGE: [2382037]: BUILD PASSED

dali-automaton · 2021-05-18T16:26:51Z

CI MESSAGE: [2382036]: BUILD PASSED

dali-automaton · 2021-05-18T17:37:13Z

CI MESSAGE: [2382032]: BUILD FAILED

dali-automaton · 2021-05-18T17:49:42Z

CI MESSAGE: [2382032]: BUILD PASSED

mzient · 2021-05-19T09:51:53Z

dali/kernels/signal/fft/fft_postprocess.cuh

@@ -143,7 +143,9 @@ __global__ void ConvertTimeMajorSpectrogram(
 }

 template <typename Out, typename In, typename Convert = identity>
-__global__ void TransposeBatch(
+__global__ void
+__launch_bounds__(32*kBlock)


Suggested change

-__launch_bounds__(32*kBlock)
+__launch_bounds__(kBlock*kBlock)

That's how this kernel is invoked at L345. Now that I look at it, though, it would make sense to modify it a tiny bit (not in this PR, I guess).
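
To illustrate the point about matching the bound to the invocation, here is a minimal sketch of a tiled transpose launched with a `kBlock` x `kBlock` thread block and bounded by `kBlock*kBlock`. It is not the DALI `TransposeBatch` kernel: `TransposeTile`, the value `kBlock = 32`, and the matrix sizes are assumptions made for illustration only.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int kBlock = 32;  // assumed tile edge; the real value may differ

// Hypothetical tiled transpose. The bound mirrors the launch configuration
// below: blockDim = (kBlock, kBlock), i.e. kBlock*kBlock threads per block.
template <typename T>
__global__ void __launch_bounds__(kBlock * kBlock)
TransposeTile(T *out, const T *in, int rows, int cols) {
  __shared__ T tile[kBlock][kBlock + 1];  // +1 column avoids bank conflicts
  int x = blockIdx.x * kBlock + threadIdx.x;  // input column
  int y = blockIdx.y * kBlock + threadIdx.y;  // input row
  if (x < cols && y < rows) tile[threadIdx.y][threadIdx.x] = in[y * cols + x];
  __syncthreads();
  int tx = blockIdx.y * kBlock + threadIdx.x;  // output column (input row)
  int ty = blockIdx.x * kBlock + threadIdx.y;  // output row (input column)
  if (tx < rows && ty < cols) out[ty * rows + tx] = tile[threadIdx.x][threadIdx.y];
}

int main() {
  const int rows = 70, cols = 45;  // deliberately not multiples of kBlock
  float *in = nullptr, *out = nullptr;
  cudaMallocManaged(&in, rows * cols * sizeof(float));
  cudaMallocManaged(&out, rows * cols * sizeof(float));
  for (int i = 0; i < rows * cols; i++) in[i] = static_cast<float>(i);

  dim3 block(kBlock, kBlock);  // must not exceed the launch bound
  dim3 grid((cols + kBlock - 1) / kBlock, (rows + kBlock - 1) / kBlock);
  TransposeTile<<<grid, block>>>(out, in, rows, cols);

  cudaError_t err = cudaGetLastError();  // catches launch-time failures
  if (err == cudaSuccess) err = cudaDeviceSynchronize();
  if (err != cudaSuccess) {
    std::printf("launch failed: %s\n", cudaGetErrorString(err));
    return 1;
  }

  // Verify out[c][r] == in[r][c] for every element.
  int mismatches = 0;
  for (int r = 0; r < rows; r++)
    for (int c = 0; c < cols; c++)
      if (out[c * rows + r] != in[r * cols + c]) mismatches++;
  std::printf("mismatches: %d\n", mismatches);

  cudaFree(in);
  cudaFree(out);
  return mismatches == 0 ? 0 : 1;
}
```

Writing the bound with the same expression as the block size at the launch site keeps the two from drifting apart if `kBlock` changes later.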

JanuszL · 2021-05-19T10:32:56Z

!build

dali-automaton · 2021-05-19T10:36:22Z

CI MESSAGE: [2386250]: BUILD STARTED

dali-automaton · 2021-05-19T11:51:43Z

CI MESSAGE: [2386250]: BUILD PASSED

Add launch bounds to TransposeBatch kernel to avoid cudaErrorLaunchOu…

401d160

…tOfResources - adds launch bounds to TransposeBatch kernel inside fft_postprocess.cuh to make sure that cudaErrorLaunchOutOfResources on some GPUs Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>

jantonguirao assigned klecki and mzient May 19, 2021

mzient reviewed May 19, 2021

Review fix

163bc7f

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>

klecki approved these changes May 19, 2021

JanuszL changed the base branch from master to main May 19, 2021 12:22

mzient approved these changes May 19, 2021

JanuszL merged commit bcbd0ff into NVIDIA:main May 19, 2021

JanuszL deleted the launch_bounds branch May 19, 2021 13:08
