Closed
Description
Is this a duplicate?
- I confirmed there appear to be no duplicate issues for this bug and that I agree to the Code of Conduct
Type of Bug
Performance
Component
CUB
Describe the bug
#3138 introduced a 120% performance regression in cub::DeviceScan::Sum on Hopper.
The PR description states:
No changes in SASS for cub.test.device_scan.lid_0.types_0 except kernel symbol names
I assume we checked SASS only on the default set of architectures from our preset, "60;70;80":
diff old.sass new.sass | wc -l
88
The refactoring also touches the SM90 policies, which results in the following diff:
diff old.90.sass new.90.sass | wc -l
83618
We should also check for SM90 SASS differences in the other algorithms we refactored recently.
How to Reproduce
git checkout 52b4b671c111fc289b2f2247a16eff21d5a5a1a8
./ci/build_cub.sh -arch 90
cuobjdump --dump-sass build/cuda12.6-gcc13/cub-cpp17/bin/cub.bench.scan.exclusive.sum.base > new.90.sass
git checkout HEAD~
./ci/build_cub.sh -arch 90
cuobjdump --dump-sass build/cuda12.6-gcc13/cub-cpp17/bin/cub.bench.scan.exclusive.sum.base > old.90.sass
diff old.90.sass new.90.sass | wc -l
83618
Expected behavior
No SASS difference on any architecture after the tuning refactoring
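A minimal sketch of how this expectation could be checked automatically, built on the same `diff ... | wc -l` comparison used above. The function names `sass_diff_lines` and `check_sass_unchanged` are hypothetical helpers, not part of the CCCL CI scripts, and the check is intentionally strict: it would also flag the symbol-name-only differences mentioned in the PR description.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of the CCCL repository) for comparing
# two SASS dumps produced by `cuobjdump --dump-sass`.

# sass_diff_lines OLD NEW
# Prints the number of lines that `diff OLD NEW` emits (0 means identical).
sass_diff_lines() {
  # diff exits with status 1 when the files differ; `|| true` keeps this
  # usable in scripts that run under `set -e` / `set -o pipefail`.
  { diff "$1" "$2" || true; } | wc -l | tr -d ' '
}

# check_sass_unchanged OLD NEW
# Succeeds only when the two dumps are identical; otherwise reports the
# diff size on stderr and returns a non-zero status.
check_sass_unchanged() {
  local n
  n=$(sass_diff_lines "$1" "$2")
  if [ "$n" -ne 0 ]; then
    echo "SASS differs: $n diff lines between $1 and $2" >&2
    return 1
  fi
}
```

With the dumps from the reproduction steps, `check_sass_unchanged old.90.sass new.90.sass` would fail and report the diff size. A less strict variant would first need to filter out kernel symbol names, which this sketch does not attempt.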
Reproduction link
No response
Operating System
No response
nvidia-smi output
No response
NVCC version
No response