
Improve DeviceScan env test coverage and clean up stale code #8280

Merged
gonidelis merged 5 commits into NVIDIA:main from gonidelis:scan_test_impr on Apr 8, 2026
@gonidelis gonidelis commented Apr 2, 2026

Following a recent review on one of my env PRs, I took the opportunity to clean up the scan tests. The main goals are items 1 and 2; the rest are related fixes:

  1. Use non-identity init values to make sure they are factored into the result properly.
  2. Apply the scan/sum ops to more than one element. With a single element, the exclusive* variants just return the init value, so multi-element inputs verify, without loss of generality, that the result is correct for the full input range and not just the edge case.
  3. Clean up some dead leftover code.
  4. Add block_size_recording_iterator_t for the ExclusiveSum tuning test. Since ExclusiveSum doesn't accept a custom op, we use a custom input iterator that records blockDim.x as a side effect, which lets us verify that the tuning policy is applied.
  5. Guard the (previously unguarded) blockDim.x writes in block_size_check_t with if (threadIdx.x == 0).
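
To illustrate the recording-iterator idea from item 4, here is a minimal host-only C++ sketch. It is an analogy, not the code from this PR: `g_block_size` stands in for blockDim.x, and `recording_iterator`, `exclusive_sum`, and `recorded_block_size_for` are hypothetical names invented for the example. It does not model multiple threads, so the `threadIdx.x == 0` guard from item 5 has no counterpart here.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical host-side stand-in for blockDim.x: the "block size" that the
// algorithm below is currently running with.
static int g_block_size = 0;

// Input iterator that reads from `ptr` and, as a side effect of every
// dereference, stores the current block size into `*recorded`.
struct recording_iterator
{
  using iterator_category = std::input_iterator_tag;
  using value_type        = int;
  using difference_type   = std::ptrdiff_t;
  using pointer           = const int*;
  using reference         = int;

  const int* ptr;
  int* recorded;

  int operator*() const
  {
    *recorded = g_block_size; // the side effect the test later inspects
    return *ptr;
  }
  recording_iterator& operator++()
  {
    ++ptr;
    return *this;
  }
  bool operator!=(const recording_iterator& other) const
  {
    return ptr != other.ptr;
  }
};

// Hypothetical fixed-op exclusive sum: the caller cannot pass an op, only
// iterators, so the input iterator is the only hook for observing the
// block size.
template <class InIt, class OutIt>
void exclusive_sum(InIt first, InIt last, OutIt out, int block_size)
{
  g_block_size = block_size;
  int running  = 0;
  for (; first != last; ++first, ++out)
  {
    const int value = *first; // dereference triggers the recording
    *out            = running;
    running += value;
  }
}

// Runs the sum over {1, 2, 3} and returns the block size the iterator saw.
int recorded_block_size_for(int block_size)
{
  const std::vector<int> in{1, 2, 3};
  std::vector<int> out(in.size());
  int recorded = -1;
  exclusive_sum(recording_iterator{in.data(), &recorded},
                recording_iterator{in.data() + in.size(), &recorded},
                out.begin(),
                block_size);
  assert((out == std::vector<int>{0, 1, 3})); // the scan result is still correct
  return recorded;
}
```

Calling `recorded_block_size_for(128)` returns the value the iterator observed, which a test can compare against the block size it requested.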

* Use non-identity init values so the init actually affects scan results
* Use multi-element inputs so the scan op is exercised, not just init
* Remove unused cudaGetDevice/PtxVersion calls, use ptx_arch_id instead
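
As a rough host-side illustration of why the first two bullets matter, the sketch below uses `std::exclusive_scan` (C++17) as a reference and compares it against two deliberately broken scans. The broken variants and the helper names (`scan_ignoring_init`, `scan_ignoring_op`, `detects_bug`, `run_checks`) are hypothetical and exist only for this example; none of this is the CUB test code.

```cpp
#include <cassert>
#include <functional>
#include <numeric>
#include <vector>

using scan_fn = std::vector<int> (*)(const std::vector<int>&, int);

// Reference: out[0] = init, out[i] = init + in[0] + ... + in[i-1].
std::vector<int> reference_scan(const std::vector<int>& in, int init)
{
  std::vector<int> out(in.size());
  std::exclusive_scan(in.begin(), in.end(), out.begin(), init, std::plus<>{});
  return out;
}

// Hypothetical bug 1: init is dropped and 0 is used instead.
std::vector<int> scan_ignoring_init(const std::vector<int>& in, int /*init*/)
{
  return reference_scan(in, 0);
}

// Hypothetical bug 2: the op is never applied; every output is just init.
std::vector<int> scan_ignoring_op(const std::vector<int>& in, int init)
{
  return std::vector<int>(in.size(), init);
}

// True when the given input/init pair exposes the bug in `candidate`.
bool detects_bug(scan_fn candidate, const std::vector<int>& in, int init)
{
  return candidate(in, init) != reference_scan(in, init);
}

void run_checks()
{
  assert(!detects_bug(scan_ignoring_init, {1, 2, 3}, 0)); // identity init hides bug 1
  assert(detects_bug(scan_ignoring_init, {1, 2, 3}, 3));  // non-identity init exposes it
  assert(!detects_bug(scan_ignoring_op, {7}, 3));         // one element hides bug 2
  assert(detects_bug(scan_ignoring_op, {1, 2, 3}, 3));    // several elements expose it
}
```

Calling `run_checks()` from `main` exercises all four cases: a test that uses an identity init or a single-element input passes even against the broken scans, while a non-identity init on a multi-element input catches both.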
@github-project-automation github-project-automation bot moved this to Todo in CCCL Apr 2, 2026
@gonidelis gonidelis requested a review from a team as a code owner April 2, 2026 17:00
@cccl-authenticator-app cccl-authenticator-app bot moved this from Todo to In Review in CCCL Apr 2, 2026
@gonidelis gonidelis enabled auto-merge (squash) April 7, 2026 14:35
github-actions bot commented Apr 8, 2026

🥳 CI Workflow Results

🟩 Finished in 46m 12s: Pass: 100%/213 | Total: 2d 10h | Max: 31m 11s | Hits: 96%/124738

See results here.

@gonidelis gonidelis merged commit 228ea4c into NVIDIA:main Apr 8, 2026
230 of 232 checks passed
gonidelis added a commit to gonidelis/cccl that referenced this pull request Apr 8, 2026
…8280)

* Use non-identity init values so the init actually affects scan results
* Use multi-element inputs so the scan op is exercised, not just init
* Remove unused cudaGetDevice/PtxVersion calls, use ptx_arch_id instead