[CUB] Adds benchmarks for batched indexed top-k (aka batched arg top-k)#9288
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (1)
SummaryThis PR adds nvbench benchmarks for indexed (arg) batched top-k that output both keys and per-segment indices, closing issue ChangesNew File
New Benchmark Translation Unit
Modified Benchmark Translation Unit
Implementation notes
Walkthroughimportant: Adds a shared header with pattern enums and a device key generator, a new indexed segmented top-k nvbench benchmark that emits per-segment indices, and refactors the keys-only benchmark to use the shared utilities and record segment-size memory reads. ChangesIndexed segmented top-k benchmark
Assessment against linked issues
Suggested reviewers
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (4)
cub/benchmarks/bench/segmented_topk/variable/common.cuh (3)
152-153: ⚡ Quick winsuggestion: Mark
valid_patternsas constThis global is never modified and should be
constfor clarity and potential compiler optimization.Proposed fix
-const std::vector<std::string> valid_patterns = { +const std::vector<std::string> const valid_patterns = { "random", "quantized_random", "relu_quantized", "tie_heavy", "pivot_tie"};Or better, use
constexprwithstd::array:-const std::vector<std::string> valid_patterns = { - "random", "quantized_random", "relu_quantized", "tie_heavy", "pivot_tie"}; +constexpr std::array<const char*, 5> valid_patterns = { + "random", "quantized_random", "relu_quantized", "tie_heavy", "pivot_tie"};
33-56: 💤 Low valuesuggestion: Add function annotations and noexcept specification
Per guidelines, functions should be marked with
_CCCL_HOST_APIand usenoexcept(false)if they can throw. While benchmark utilities may have relaxed requirements, annotations improve clarity.Proposed fix
-[[nodiscard]] pattern_kind string_to_pattern(const std::string& pattern) +[[nodiscard]] inline _CCCL_HOST_API pattern_kind string_to_pattern(const std::string& pattern) noexcept(false)Source: Coding guidelines
58-149: 💤 Low valuesuggestion: Add host API annotation to
gen_dataThis template is host-only (returns
thrust::device_vector). Per guidelines it should be annotated, though benchmark code may have relaxed rules.Proposed fix
template <int MaxSegmentSize, int K> -[[nodiscard]] thrust::device_vector<float> +[[nodiscard]] inline _CCCL_HOST_API thrust::device_vector<float> gen_data(int num_segments, pattern_kind pattern, const cuda::std::int64_t* d_seg_sizes)Source: Coding guidelines
cub/benchmarks/bench/segmented_topk/variable/indexed.cu (1)
59-63: suggestion: The constant_iterator + counting_iterator pattern at line 60 correctly produces segment-local indices [0, segment_size) for each segment. The agent dereferencesd_value_segments_it[segment_id]to extract the segment-specific counting_iterator (line 251 in agent_batched_topk.cuh), then BlockLoad reads from it, generating the expected per-segment index sequence. Pattern is sound but worth documenting more explicitly in comments since the iterator-of-iterators chain is unconventional.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 2061071e-4ea9-4569-b0e7-0b56cd3365f4
📒 Files selected for processing (3)
cub/benchmarks/bench/segmented_topk/variable/common.cuhcub/benchmarks/bench/segmented_topk/variable/indexed.cucub/benchmarks/bench/segmented_topk/variable/keys.cu
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Actionable comments posted: 1
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 617ce461-cf61-45a1-b55c-6f5c437e4d50
📒 Files selected for processing (2)
cub/benchmarks/bench/segmented_topk/variable/indexed.cucub/benchmarks/bench/segmented_topk/variable/keys.cu
🚧 Files skipped from review as they are similar to previous changes (1)
- cub/benchmarks/bench/segmented_topk/variable/indexed.cu
34276b2 to
f639e25
Compare
🥳 CI Workflow Results🟩 Finished in 36m 15s: Pass: 100%/242 | Total: 1d 09h | Max: 33m 56s | Hits: 99%/147326See results here. |
Closes #9287