
Port thrust::generate[_n] to cub::DeviceTransform::Fill #5807

Merged
bernhardmgruber merged 4 commits into NVIDIA:main from bernhardmgruber:generate on Sep 9, 2025



@bernhardmgruber bernhardmgruber commented Sep 8, 2025

Fixes: #5806

Benchmark on RTX 5090 (INCLUDING FIX FROM #5889, re-benchmarked retroactively)

## [0] NVIDIA GeForce RTX 5090

|  T{ct}  |  Elements  |   Ref Time |   Ref Noise |   Cmp Time |   Cmp Noise |       Diff |   %Diff |  Status  |
|---------|------------|------------|-------------|------------|-------------|------------|---------|----------|
|   I8    |    2^16    |   8.222 us |      92.21% |   7.530 us |      28.97% |  -0.693 us |  -8.42% |   SAME   |
|   I8    |    2^20    |   9.227 us |      19.86% |   7.724 us |      10.34% |  -1.503 us | -16.29% |   FAST   |
|   I8    |    2^24    |  22.833 us |       3.24% |  13.549 us |      12.73% |  -9.284 us | -40.66% |   FAST   |
|   I8    |    2^28    | 250.505 us |       0.50% | 164.240 us |       2.17% | -86.266 us | -34.44% |   FAST   |
|   I16   |    2^16    |   9.542 us |      89.42% |   7.863 us |       6.10% |  -1.678 us | -17.59% |   FAST   |
|   I16   |    2^20    |   8.604 us |       9.87% |   8.255 us |       9.86% |  -0.349 us |  -4.06% |   SAME   |
|   I16   |    2^24    |  23.350 us |       4.48% |  22.695 us |       4.84% |  -0.655 us |  -2.81% |   SAME   |
|   I16   |    2^28    | 321.748 us |       1.46% | 323.162 us |       0.54% |   1.414 us |   0.44% |   SAME   |
|   I32   |    2^16    |   7.879 us |      10.17% |   7.915 us |      12.44% |   0.036 us |   0.46% |   SAME   |
|   I32   |    2^20    |   8.761 us |       5.70% |   8.866 us |       7.96% |   0.106 us |   1.21% |   SAME   |
|   I32   |    2^24    |  42.617 us |       1.92% |  43.178 us |       2.70% |   0.562 us |   1.32% |   SAME   |
|   I32   |    2^28    | 642.610 us |       0.93% | 642.545 us |       0.38% |  -0.065 us |  -0.01% |   SAME   |
|   I64   |    2^16    |   8.302 us |      15.11% |   7.623 us |      12.04% |  -0.679 us |  -8.17% |   SAME   |
|   I64   |    2^20    |  13.073 us |     132.22% |  10.098 us |       3.95% |  -2.975 us | -22.76% |   FAST   |
|   I64   |    2^24    |  83.907 us |       6.08% |  82.589 us |       1.30% |  -1.318 us |  -1.57% |   FAST   |
|   I64   |    2^28    |   1.286 ms |       0.86% |   1.284 ms |       0.44% |  -2.369 us |  -0.18% |   SAME   |
|  I128   |    2^16    |   7.989 us |       9.10% |   7.982 us |       4.76% |  -0.008 us |  -0.09% |   SAME   |
|  I128   |    2^20    |  13.036 us |       7.70% |  12.991 us |       4.13% |  -0.045 us |  -0.34% |   SAME   |
|  I128   |    2^24    | 163.006 us |      12.37% | 161.599 us |       0.78% |  -1.407 us |  -0.86% |   FAST   |
|  I128   |    2^28    |   2.555 ms |       0.56% |   2.554 ms |       0.42% |  -1.182 us |  -0.05% |   SAME   |
|   F32   |    2^16    |   7.976 us |      11.23% |   7.917 us |      11.11% |  -0.059 us |  -0.74% |   SAME   |
|   F32   |    2^20    |   8.822 us |      14.11% |   8.730 us |      12.24% |  -0.092 us |  -1.04% |   SAME   |
|   F32   |    2^24    |  42.173 us |       2.01% |  42.499 us |       2.34% |   0.326 us |   0.77% |   SAME   |
|   F32   |    2^28    | 642.122 us |       1.10% | 641.546 us |       0.44% |  -0.577 us |  -0.09% |   SAME   |
|   F64   |    2^16    |   7.836 us |      18.04% |   7.631 us |      17.08% |  -0.205 us |  -2.61% |   SAME   |
|   F64   |    2^20    |  10.180 us |      17.71% |  10.027 us |      11.09% |  -0.153 us |  -1.50% |   SAME   |
|   F64   |    2^24    |  83.014 us |       2.47% |  82.756 us |       1.43% |  -0.259 us |  -0.31% |   SAME   |
|   F64   |    2^28    |   1.286 ms |       1.28% |   1.284 ms |       0.35% |  -1.206 us |  -0.09% |   SAME   |
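The columns above can be cross-checked with a little arithmetic. The sketch below recomputes `%Diff` as the compared time relative to the reference time, and estimates an effective write bandwidth for a row. The helper names `percent_diff` and `write_bandwidth_gbs` are hypothetical, and the bandwidth figure assumes the benchmark writes every element exactly once and reads nothing, which the PR does not state.

```cpp
#include <cassert>
#include <cstdint>

// Relative change of the compared time against the reference time, in percent.
// Negative values mean the compared run is faster. (Hypothetical helper.)
inline double percent_diff(double ref_time, double cmp_time)
{
  assert(ref_time > 0.0);
  return (cmp_time - ref_time) / ref_time * 100.0;
}

// Effective write bandwidth in GB/s (10^9 bytes per second).
// Assumption: each element is written exactly once and nothing is read.
inline double write_bandwidth_gbs(std::uint64_t num_elements,
                                  std::uint64_t bytes_per_element,
                                  double time_us)
{
  assert(time_us > 0.0);
  const double bytes =
    static_cast<double>(num_elements) * static_cast<double>(bytes_per_element);
  return bytes / (time_us * 1e-6) / 1e9;
}
```

For the I8, 2^24 row, `percent_diff(22.833, 13.549)` gives about -40.66, matching the table. Under the write-only assumption, the I32, 2^28 row (642.545 us for 2^28 four-byte elements) works out to roughly 1.67 TB/s.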

@bernhardmgruber bernhardmgruber requested review from a team as code owners September 8, 2025 12:19
@github-project-automation github-project-automation bot moved this to Todo in CCCL Sep 8, 2025
@bernhardmgruber bernhardmgruber changed the title from "Generate" to "Port thrust::generate[_n] to cub::DeviceTransform::Fill" Sep 8, 2025
@cccl-authenticator-app cccl-authenticator-app bot moved this from Todo to In Review in CCCL Sep 8, 2025

github-actions bot commented Sep 8, 2025

🥳 CI Workflow Results

🟩 Finished in 3h 22m: Pass: 100%/154 | Total: 5d 18h | Max: 3h 20m | Hits: 76%/177577

See results here.

Comment on lines +1 to +2
// SPDX-FileCopyrightText: Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
// SPDX-License-Identifier: BSD-3-Clause
Contributor

Question: should this be Apache-2.0 WITH LLVM-exception like the other file? Or did we use BSD here because this is an existing file?

Contributor Author

Hmm. I think I modified the file heavily enough to justify updating the copyright year, but changing the license feels like too much. I know new files should be Apache. Let's stay with the current license.

@bernhardmgruber bernhardmgruber merged commit e93c9c3 into NVIDIA:main Sep 9, 2025
167 checks passed
@github-project-automation github-project-automation bot moved this from In Review to Done in CCCL Sep 9, 2025
@bernhardmgruber bernhardmgruber deleted the generate branch September 9, 2025 19:13
Comment on lines +46 to +47
throw_on_error(status, "generate_n: failed inside CUB");
return result + count;),
Contributor Author

Critical: A stream synchronization is missing here. Fixed in #5889
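To illustrate why the missing synchronization matters, here is a host-only analogy, not the actual Thrust or CUB code. It assumes a synchronous `generate_n` must guarantee that the output is written before it returns. `toy_stream` and `toy_generate_n` are hypothetical names: the toy stream only runs queued work when `synchronize()` is called, standing in for work that has been launched on a CUDA stream but has not yet completed.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>

// Hypothetical stand-in for a CUDA stream: work is queued and only runs
// when synchronize() is called. This is not a CUDA or CUB API.
struct toy_stream
{
  std::queue<std::function<void()>> pending;

  void enqueue(std::function<void()> task) { pending.push(std::move(task)); }

  void synchronize()
  {
    while (!pending.empty())
    {
      pending.front()();
      pending.pop();
    }
  }
};

// Hypothetical synchronous generate_n: queue the fill, then wait for it
// before returning, so the caller can read the output right away.
template <typename T, typename Generator>
T* toy_generate_n(toy_stream& stream, T* result, std::size_t count, Generator gen)
{
  stream.enqueue([=]() mutable {
    for (std::size_t i = 0; i < count; ++i)
    {
      result[i] = gen();
    }
  });
  stream.synchronize(); // analogous to the step the comment flags as missing
  assert(stream.pending.empty());
  return result + count;
}
```

In this toy model, dropping the `stream.synchronize()` call would let `toy_generate_n` return while the output is still unwritten, which is the kind of hazard the comment above describes.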


Linked issue: Use cub::DeviceTransform::Fill in thrust::generate[_n]