[cudax] Initial cudax::coop::reduce prototype#9154
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 Walkthroughimportant: WalkthroughAdds ChangesCooperative Group Reduction
Possibly related PRs
Suggested reviewers
Comment |
There was a problem hiding this comment.
Actionable comments posted: 5
🧹 Nitpick comments (8)
cudax/include/cuda/experimental/__coop/reduce.cuh (1)
1-99: ⚡ Quick winsuggestion: Add a brief “unstable API / subject to change without notice” note in this header to satisfy CUDA Experimental API documentation requirements.
As per coding guidelines, "
cudax/include/**/*.{h,hpp,cuh}: All CUDA Experimental APIs must be documented as unstable and subject to change without notice."cudax/include/cuda/experimental/coop.cuh (1)
1-26: ⚡ Quick winsuggestion: Add an unstable-API note in this public CUDA Experimental header so consumers are explicitly warned that the surface may change without notice.
As per coding guidelines, "
cudax/include/**/*.{h,hpp,cuh}: All CUDA Experimental APIs must be documented as unstable and subject to change without notice."cudax/test/coop/reduce/this_thread.cu (1)
138-138: ⚡ Quick winsuggestion: swap the
REQUIRE_THATactual/expected sides for relative-match correctness and clearer failures.Use
REQUIRE_THAT(test_results, Catch::Matchers::WithinRel(expected_data, T{0.05}))so tolerance is anchored to the reference value and diagnostics read as actual vs expected.cudax/test/coop/reduce/this_warp.cu (2)
31-60: ⚡ Quick winimportant: remove the duplicated license and include block.
The file repeats the full prologue and headers verbatim, which adds maintenance noise and can hide future include drift.
As per coding guidelines, remove unused/duplicate code artifacts.
174-174: ⚡ Quick winsuggestion: flip
REQUIRE_THATargument order in the floating-point check.Use actual result as the checked value and reference as matcher target to preserve expected relative-tolerance semantics and error output.
cudax/test/coop/reduce/this_block.cu (3)
31-60: ⚡ Quick winimportant: remove the duplicated file prologue/includes.
The second copy is redundant and should be dropped to keep this test source maintainable.
As per coding guidelines, remove unused/duplicate code artifacts.
197-207: ⚡ Quick winimportant: extend block-scope test coverage to include
num_items2 and 3.Current block tests only validate item counts 1 and 4, leaving intermediate per-thread item paths untested.
Also applies to: 236-237, 263-264
175-175: ⚡ Quick winsuggestion: use actual-result-first ordering in
REQUIRE_THATfor floating checks.This keeps relative matching anchored to the reference and improves failure readability.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 1921fa4b-a5d1-45b0-b57f-eaaa367e3367
📒 Files selected for processing (7)
cub/cub/warp/warp_reduce.cuhcudax/include/cuda/experimental/__coop/reduce.cuhcudax/include/cuda/experimental/coop.cuhcudax/test/CMakeLists.txtcudax/test/coop/reduce/this_block.cucudax/test/coop/reduce/this_thread.cucudax/test/coop/reduce/this_warp.cu
This comment has been minimized.
This comment has been minimized.
0efc515 to
4e8ecd1
Compare
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (4)
cudax/test/coop/reduce/this_thread.cu (2)
11-17: ⚡ Quick winsuggestion: add a direct include for
cuda::std::numeric_limitsinstead of relying on transitive includes.As per coding guidelines, "Files must include all headers related to symbols they use" and "avoid transitive includes and only include the most precise headers needed."
Also applies to: 225-225, 245-245
59-105: ⚡ Quick winsuggestion:
operator_to_std/operator_to_std_tare unused in this test and should be removed to keep the test surface minimal.As per coding guidelines, "Remove unused code, variables, functions, types, template parameters, and headers".
cudax/test/coop/reduce/this_warp.cu (1)
11-17: ⚡ Quick winsuggestion: add a direct include for
cuda::std::numeric_limitsto avoid transitive include dependency.As per coding guidelines, "Files must include all headers related to symbols they use" and "avoid transitive includes and only include the most precise headers needed."
Also applies to: 144-144, 165-165
cudax/test/coop/reduce/this_block.cu (1)
11-17: ⚡ Quick winsuggestion: add a direct include for
cuda::std::numeric_limitsinstead of relying on transitive includes.As per coding guidelines, "Files must include all headers related to symbols they use" and "avoid transitive includes and only include the most precise headers needed."
Also applies to: 148-148, 174-174
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: f8e9175d-5759-4110-af71-613709b62b09
📒 Files selected for processing (8)
cub/cub/warp/warp_reduce.cuhcudax/cmake/cudaxBuildCompilerTargets.cmakecudax/include/cuda/experimental/__coop/reduce.cuhcudax/include/cuda/experimental/coop.cuhcudax/test/CMakeLists.txtcudax/test/coop/reduce/this_block.cucudax/test/coop/reduce/this_thread.cucudax/test/coop/reduce/this_warp.cu
🚧 Files skipped from review as they are similar to previous changes (5)
- cudax/include/cuda/experimental/coop.cuh
- cub/cub/warp/warp_reduce.cuh
- cudax/test/CMakeLists.txt
- cudax/include/cuda/experimental/__coop/reduce.cuh
- cudax/cmake/cudaxBuildCompilerTargets.cmake
This comment has been minimized.
This comment has been minimized.
| const auto __result = _WarpReduce{__scratch}.Reduce(__thread_data, __red_fn); | ||
| return (gpu_thread.is_root_rank(__group)) ? ::cuda::std::optional{__result} : ::cuda::std::nullopt; |
There was a problem hiding this comment.
I am confused by this line. Is this because the value is only valid in the leader thread? Should we broadcast it rather than diverging further?
Otherwise, why do we even compute it if its not desired
There was a problem hiding this comment.
This is because only the root rank has the correct value. CUB does the same thing, but returns garbage for non-root ranks. I think returning optional is the better way.
We've already discussed this before and we agreed that we would start like this and add an API that would also broadcast the result in the future.
4e8ecd1 to
885df76
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 7934d393-277b-45fe-93b6-0348f06eef46
📒 Files selected for processing (6)
cudax/include/cuda/experimental/__coop/reduce.cuhcudax/include/cuda/experimental/coop.cuhcudax/test/CMakeLists.txtcudax/test/coop/reduce/this_block.cucudax/test/coop/reduce/this_thread.cucudax/test/coop/reduce/this_warp.cu
✅ Files skipped from review due to trivial changes (1)
- cudax/include/cuda/experimental/coop.cuh
🚧 Files skipped from review as they are similar to previous changes (3)
- cudax/include/cuda/experimental/__coop/reduce.cuh
- cudax/test/coop/reduce/this_block.cu
- cudax/test/coop/reduce/this_warp.cu
This comment has been minimized.
This comment has been minimized.
d774f1f to
37dac27
Compare
🥳 CI Workflow Results🟩 Finished in 33m 24s: Pass: 100%/55 | Total: 4h 20m | Max: 33m 17s | Hits: 97%/34333See results here. |
This PR implements
cudax::coop::reducefunction prototype forcudax::this_(thread|warp|block). The function dispatches tocub::equivalents for now and allocates all of the shared memory itself.