Skip to content

Fixed issue where row reductions added an extra unnecessary wrapper#1148

Merged
cliffburdick merged 2 commits intomainfrom
cburdick/940
Apr 9, 2026
Merged

Fixed issue where row reductions added an extra unnecessary wrapper#1148
cliffburdick merged 2 commits intomainfrom
cburdick/940

Conversation

@cliffburdick
Copy link
Copy Markdown
Collaborator

No description provided.

@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot bot commented Apr 8, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@cliffburdick
Copy link
Copy Markdown
Collaborator Author

/build

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Apr 8, 2026

Greptile Summary

This PR corrects the CUB version guard from the broken CUB_MAJOR_VERSION >= 3 && CUB_MINOR_VERSION >= 2 (already flagged in a prior review thread) to the semantically correct CUB_MAJOR_VERSION > 3 || (CUB_MAJOR_VERSION == 3 && CUB_MINOR_VERSION >= 2) across all four affected reduction functions. It also adds the contiguous-tensor fast path (direct pointer, fixed segment size) to ExecMin and ExecMax, matching the pattern already present in ExecReduce and ExecSum, and backs the new paths with comprehensive segmented reduction tests.

Confidence Score: 5/5

Safe to merge — the version guard fix is correct and the new fast paths mirror the already-reviewed ExecReduce/ExecSum pattern.

All previously flagged P1 issues (incorrect CUB version guard in ExecMin and ExecMax) are now resolved with the correct > 3 || (== 3 && >= 2) expression. No new logic or correctness issues were found. The new tests cover the relevant shapes and validate expected values correctly.

No files require special attention.

Vulnerabilities

No security concerns identified.

Important Files Changed

Filename Overview
include/matx/transforms/cub.h Fixes the CUB version guard from the incorrect >= 3 && >= 2 to the correct `> 3
test/00_operators/ReductionTests.cu Adds SegmentedMin and SegmentedMax typed tests covering 2D→1D, 3D→2D, 3D→1D, and full-reduction-to-scalar cases with correct expected values.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["ExecMin / ExecMax called"] --> B{"OutputTensor::Rank() > 0?"}
    B -- No --> C["cub::DeviceReduce::Min/Max\n(scalar output)"]
    B -- Yes --> D{"CUB >= 3.2?"}
    D -- No --> E["ReduceInput with iterator offsets\n(legacy path)"]
    D -- Yes --> F{"is_tensor_view AND contiguous?"}
    F -- Yes --> G["cub::DeviceSegmentedReduce::Min/Max\nwith fixed seg_size\n(fast path — NEW)"]
    F -- No --> H["ReduceInput with iterator offsets\n(iterator path)"]
Loading

Reviews (2): Last reviewed commit: "Fixing CUB macro for when CUB 4.0 comes ..." | Re-trigger Greptile

@cliffburdick
Copy link
Copy Markdown
Collaborator Author

/build

@coveralls
Copy link
Copy Markdown

Coverage Status

Coverage is 91.745%cburdick/940 into main. No base build found for main.

@cliffburdick cliffburdick merged commit cd207a7 into main Apr 9, 2026
1 check passed
@cliffburdick cliffburdick deleted the cburdick/940 branch April 9, 2026 15:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants