Optimize compilation time for the common case #400

gevtushenko · 2021-10-29T13:40:14Z

This PR improves compilation time for the common case where agents fit into the default shared memory size (48 KB).

gevtushenko · 2021-10-29T13:55:05Z

gpuCI: NVIDIA/thrust#1557
DVS: 30594457

alliepiper

Thanks @senior-zero, this is a huge improvement -- I'm seeing compilation time improve on both nvcc and nvc++. It isn't quite back to before, but it's a significant reduction!

Compile times for the thrust::sort test program (percentages are relative to the old merge sort):

Compiler	Old merge sort	Current merge sort	This PR
nvcc	18.79s	29.10s (+55%)	22.44s (+19%)
nvc++	61.81s	75.61s (+22%)	65.80s (+6%)

LGTM -- Let's get this merged once the tests are passing and see how this impacts the total build time.

cub/device/dispatch/dispatch_merge_sort.cuh

alliepiper · 2021-10-29T19:05:23Z

Related to NVBugs 3418930 and 3419768.

dongxiao92 · 2021-10-30T03:43:32Z

Could you help to explain why these changes can reduce compilation time?
From the code changes, I understand that 1) instantiation for DeviceMergeSortBlockSortKernel with use_vshmem=true may not be needed and 2) compilation for code in this if branch may not be needed.
Does the compilation time reduction come from these two changes?

gevtushenko · 2021-10-30T09:44:47Z

> Could you help to explain why these changes can reduce compilation time? From the code changes, I understand that 1) instantiation for DeviceMergeSortBlockSortKernel with use_vshmem=true may not be needed and 2) compilation for code in this if branch may not be needed. Does the compilation time reduction come from these two changes?

Hello, @dongxiao92!

Two specializations of merge sort kernels exist:

Kernel	use_vshmem=false	use_vshmem=true
DeviceMergeSortBlockSortKernel	+	+
DeviceMergeSortMergeKernel	+	+

Since the check for available shared memory is performed at runtime, we had to compile both specializations in the generic case. This patch relies on the Thrust approach, which compares the kernel's shared memory requirements with the default available shared memory size (48 KB). This check can be done at compile time. If we know at compile time that virtual shared memory is not required, there's no need to compile the merge sort kernels twice.

Optimize compilation time for the common case

da950dc

gevtushenko requested a review from alliepiper October 29, 2021 13:40

gevtushenko added the "testing: gpuCI in progress" label Oct 29, 2021

gevtushenko added the "testing: internal ci in progress" label Oct 29, 2021

alliepiper approved these changes Oct 29, 2021

alliepiper added this to the 1.16.0 milestone Oct 29, 2021

alliepiper added the "helps: nvc++", "nvbug", and "P0: must have" labels Oct 29, 2021

gevtushenko added the "testing: gpuCI passed" label and removed the "testing: gpuCI in progress" label Oct 29, 2021

gevtushenko added the "testing: internal ci passed" label and removed the "testing: internal ci in progress" label Oct 30, 2021

Document merge sort vshmem specializations

ea92d95

gevtushenko merged commit 8c32c79 into NVIDIA:main Oct 30, 2021
