Fixed issue where row reductions added an extra unnecessary wrapper#1148
cliffburdick merged 2 commits into main
/build
Greptile Summary
This PR corrects the CUB version guard from the broken […]
Confidence Score: 5/5
Safe to merge: the version guard fix is correct and the new fast paths mirror the already-reviewed ExecReduce/ExecSum pattern.
All previously flagged P1 issues (incorrect CUB version guard in ExecMin and ExecMax) are now resolved with the correct […]
No files require special attention.
| Filename | Overview |
|---|---|
| include/matx/transforms/cub.h | Fixes the CUB version guard from the incorrect `>= 3 && >= 2` to the correct `> 3 […]` |
| test/00_operators/ReductionTests.cu | Adds SegmentedMin and SegmentedMax typed tests covering 2D→1D, 3D→2D, 3D→1D, and full-reduction-to-scalar cases with correct expected values. |
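The version-guard problem described in the table can be illustrated with a small compile-time sketch. The function names `broken_guard` and `fixed_guard` are hypothetical and are not taken from MatX, which checks CUB's version macros in the preprocessor. The corrected expression is truncated in the summary above, so `fixed_guard` is only an assumption about one common way to express "CUB 3.2 or newer".

```cpp
// Hypothetical helpers, not MatX code: they model a version check on
// (major, minor) so both forms of the guard can be compared directly.

// Broken form: requires minor >= 2 even after the major version moves on,
// so a future 4.0 release would be rejected.
constexpr bool broken_guard(int major, int minor) {
  return major >= 3 && minor >= 2;
}

// Assumed corrected form: any major above 3 passes, and major 3 needs minor >= 2.
constexpr bool fixed_guard(int major, int minor) {
  return major > 3 || (major == 3 && minor >= 2);
}

// Both forms agree on the 3.x releases.
static_assert(broken_guard(3, 2) && fixed_guard(3, 2), "3.2 passes both");
static_assert(!broken_guard(3, 1) && !fixed_guard(3, 1), "3.1 fails both");

// They disagree once the major version is bumped.
static_assert(!broken_guard(4, 0), "broken form rejects 4.0");
static_assert(fixed_guard(4, 0), "assumed fixed form accepts 4.0");
```

The same comparison written against preprocessor macros would select the fast path at compile time; the constexpr form is used here only because it is easy to check in isolation.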
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A["ExecMin / ExecMax called"] --> B{"OutputTensor::Rank() > 0?"}
B -- No --> C["cub::DeviceReduce::Min/Max\n(scalar output)"]
B -- Yes --> D{"CUB >= 3.2?"}
D -- No --> E["ReduceInput with iterator offsets\n(legacy path)"]
D -- Yes --> F{"is_tensor_view AND contiguous?"}
F -- Yes --> G["cub::DeviceSegmentedReduce::Min/Max\nwith fixed seg_size\n(fast path — NEW)"]
F -- No --> H["ReduceInput with iterator offsets\n(iterator path)"]
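The branches of the flowchart can be sketched as a plain decision function, together with a CPU reference for what a fixed-size segmented min computes. This is a minimal illustration under stated assumptions, not the MatX implementation: `Path`, `choose_path`, and `segmented_min_fixed` are hypothetical names, the CUB version check is modeled as a runtime `bool` although the real code would use a compile-time guard, and no CUB calls are made.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical names for the four leaves of the flowchart.
enum class Path { ScalarReduce, LegacyIterator, FastSegmented, IteratorOffsets };

// Runtime stand-in for the dispatch. The CUB version check is a plain bool
// here for illustration; a real build would decide it at compile time.
inline Path choose_path(int out_rank, bool cub_at_least_3_2,
                        bool is_tensor_view, bool contiguous) {
  if (out_rank == 0) return Path::ScalarReduce;        // full reduction to a scalar
  if (!cub_at_least_3_2) return Path::LegacyIterator;  // older CUB: offsets via iterators
  if (is_tensor_view && contiguous) return Path::FastSegmented;
  return Path::IteratorOffsets;
}

// CPU reference for a fixed-size segmented min over contiguous data:
// segment i covers elements [i * seg_size, (i + 1) * seg_size).
inline std::vector<int> segmented_min_fixed(const std::vector<int>& in,
                                            std::size_t seg_size) {
  assert(seg_size > 0 && in.size() % seg_size == 0);
  std::vector<int> out;
  out.reserve(in.size() / seg_size);
  for (std::size_t start = 0; start < in.size(); start += seg_size) {
    auto first = in.begin() + static_cast<std::ptrdiff_t>(start);
    auto last = first + static_cast<std::ptrdiff_t>(seg_size);
    out.push_back(*std::min_element(first, last));
  }
  return out;
}
```

For a row-major 2x3 input `{3, 1, 2, 9, 7, 8}` with `seg_size` 3, the reference returns one minimum per row, `{1, 7}`. A fixed segment size is what makes the contiguous case simpler than the iterator paths, since no per-segment offsets need to be generated.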
Reviews (2): Last reviewed commit: "Fixing CUB macro for when CUB 4.0 comes ..."
/build
No description provided.