Fix tiny problem sizes for warpspeed scan by bernhardmgruber · Pull Request #7921 · NVIDIA/cccl

bernhardmgruber · 2026-03-06T19:19:40Z

bernhardmgruber · 2026-03-06T19:20:36Z

cub/cub/detail/warpspeed/squad/load_store.cuh

    constexpr ::cuda::std::uint16_t byteMask  = 0xFFFF;
    const ::cuda::std::uint16_t byteMaskStart = byteMask << cpAsyncOobInfo.smemStartSkipBytes;
-    const ::cuda::std::uint16_t byteMaskEnd   = byteMask >> (16 - cpAsyncOobInfo.smemEndBytesAfter16BBoundary);
+    const ::cuda::std::uint16_t byteMaskEnd   = byteMask >> (16 - cpAsyncOobInfo.smemEndBytesAfter16BBoundary) % 16;
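
To see what the one-line change does, here is a minimal host-side sketch of both expressions. It is not the CCCL code: `byte_mask`, `end_mask_old`, `end_mask_new` and the `int` parameter are illustrative assumptions, and `end_bytes` (standing in for `cpAsyncOobInfo.smemEndBytesAfter16BBoundary`) is assumed to lie in [0, 16]. Because `%` binds tighter than `>>`, the new line parses as `byteMask >> ((16 - x) % 16)`.

```cpp
#include <cstdint>

// Stand-alone sketch of the two expressions from the diff above.
// end_bytes stands in for cpAsyncOobInfo.smemEndBytesAfter16BBoundary and is
// assumed to lie in [0, 16]; the real field type is not shown in the diff.
constexpr std::uint16_t byte_mask = 0xFFFF;

// Expression before the change: shifts by the full 16 bits when end_bytes == 0.
// The uint16_t operand is promoted to int, so the shift is well defined and
// clears every bit of the mask.
constexpr std::uint16_t end_mask_old(int end_bytes)
{
  return static_cast<std::uint16_t>(byte_mask >> (16 - end_bytes));
}

// Expression after the change: a shift amount of 16 wraps back to 0,
// so the mask is left untouched in that case.
constexpr std::uint16_t end_mask_new(int end_bytes)
{
  return static_cast<std::uint16_t>(byte_mask >> ((16 - end_bytes) % 16));
}
```

Under these assumptions the two versions differ only for `end_bytes == 0`: the old expression yields an empty mask (`0x0000`), the new one a full mask (`0xFFFF`). For every value from 1 to 16 the results are identical.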


@ahendriksen is there any smarter way to shift byteMask by smemEndBytesAfter16BBoundary, but leave it when it's 16? Would a predicated shift be faster?

Can't think of a quicker way off the top of my head. I always hoped that the compiler would figure out the best way to compute all these values.

A modulo 16 operation is just an AND with 0xF, which should be quicker than computing the predicate.

Alright, thx!
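
The alternatives discussed in this thread can be compared side by side. The sketch below is an illustration only; the function names are hypothetical and `end_bytes` is again assumed to lie in [0, 16]. For unsigned operands, `x % 16` and `x & 0xF` give the same result, and compilers typically lower the former to the latter, so no division is involved.

```cpp
#include <cstdint>

// Three ways to map end_bytes in [0, 16] to a shift amount in [0, 15].
// The names are illustrative and not taken from the CCCL sources.

// Modulo form, as written in the merged change.
constexpr unsigned shift_mod(unsigned end_bytes) { return (16u - end_bytes) % 16u; }

// Bitwise form: for unsigned values, x % 16 equals x & 0xF.
constexpr unsigned shift_and(unsigned end_bytes) { return (16u - end_bytes) & 0xFu; }

// Select form: the "predicated" alternative asked about in the thread.
constexpr unsigned shift_select(unsigned end_bytes)
{
  return end_bytes == 0u ? 0u : 16u - end_bytes;
}

// Applies a shift amount to the 16-bit byte mask.
constexpr std::uint16_t end_mask(unsigned shift)
{
  return static_cast<std::uint16_t>(0xFFFFu >> shift);
}
```

All three forms agree on the assumed input range, so the choice between them is purely about generated code. Which one is fastest on a given GPU is not something this sketch can show.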

github-actions · 2026-03-06T23:14:05Z

🥳 CI Workflow Results

🟩 Finished in 3h 52m: Pass: 100%/249 | Total: 9d 03h | Max: 3h 51m | Hits: 71%/155156

bernhardmgruber added 3 commits March 5, 2026 18:17

Handle 0 items in warpspeed scan

ea03e19

Test 0 num_Items

44de595

Fix

66da4ca

bernhardmgruber requested a review from a team as a code owner March 6, 2026 19:19

bernhardmgruber requested a review from srinivasyadav18 March 6, 2026 19:19

github-project-automation bot added this to CCCL Mar 6, 2026

github-project-automation bot moved this to Todo in CCCL Mar 6, 2026

cccl-authenticator-app bot moved this from Todo to In Review in CCCL Mar 6, 2026

bernhardmgruber commented Mar 6, 2026

miscco approved these changes Mar 7, 2026

bernhardmgruber merged commit 578d64b into NVIDIA:main Mar 9, 2026
271 checks passed

github-project-automation bot moved this from In Review to Done in CCCL Mar 9, 2026

bernhardmgruber deleted the scan_fixxes branch March 9, 2026 08:15

bernhardmgruber merged 3 commits into NVIDIA:main from bernhardmgruber:scan_fixxes
