Use Thrust for multithreaded host algorithms #1181

Merged
cliffburdick merged 4 commits into main from cburdick/host-thrust-parallel-algorithms
May 26, 2026
@cliffburdick
Collaborator

The CPU sorting, reduction, and prefix-sum paths previously used only single-threaded implementations. This PR adds parallel versions via Thrust algorithms. Speedups on a 20-core ARM machine are shown below:

image

@copy-pr-bot

copy-pr-bot Bot commented May 12, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@cliffburdick
Collaborator Author

/build

@greptile-apps
Contributor

greptile-apps Bot commented May 12, 2026

Greptile Summary

This PR replaces single-threaded std:: algorithm calls in the host-executor paths for sort, reduction, prefix-sum, and related operations with a new host_algorithms.h dispatch layer that routes to thrust::omp when MATX_EN_OMP is defined and the input exceeds configurable element thresholds, falling back to the original std:: call otherwise.

  • include/matx/transforms/host_algorithms.h (new): RAII ScopedOmpNumThreads guard, per-call threshold checks (HostThrustMinReductionElements = 16K, HostThrustMinScanElements = 16K, HostThrustMinSortElements = 1K), and wrappers for reduce, sort, sort-copy, inclusive-scan, min/max-element, any/all-of, and unique.
  • include/matx/core/iterator.h: Adds operator-, operator--, and full relational operators to RandomOperatorIterator so Thrust algorithms can consume it as a random-access input iterator; also makes RandomOperatorOutputIterator::operator* const (with const_cast for the write path) to satisfy Thrust's const-iterator requirements.
  • examples/simple_radar_pipeline.h and test/01_radar/MultiChannelRadarPipeline.cu: Lazy CFAR-norm computation and suite-level test setup are incidental refactors bundled in the PR.
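To illustrate the operator set the iterator.h bullet describes, here is a minimal sketch of a read-only random-access iterator that returns elements by value. `EvenIterator` and `sum_first_evens` are hypothetical names invented for this example; this is not MatX's `RandomOperatorIterator`, only a small stand-in showing the difference, decrement, and relational operators that generic algorithms typically rely on.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <numeric>

// Hypothetical iterator over the sequence 0, 2, 4, ... (illustration only).
class EvenIterator {
public:
  using iterator_category = std::random_access_iterator_tag;
  using value_type = long long;
  using difference_type = std::ptrdiff_t;
  using pointer = const value_type*;
  using reference = value_type;  // dereference yields a value, not a reference

  explicit EvenIterator(difference_type i = 0) : i_(i) {}

  reference operator*() const { return 2 * static_cast<value_type>(i_); }
  reference operator[](difference_type n) const { return *(*this + n); }

  EvenIterator& operator++() { ++i_; return *this; }
  EvenIterator operator++(int) { EvenIterator t = *this; ++i_; return t; }
  EvenIterator& operator--() { --i_; return *this; }
  EvenIterator operator--(int) { EvenIterator t = *this; --i_; return t; }
  EvenIterator& operator+=(difference_type n) { i_ += n; return *this; }
  EvenIterator& operator-=(difference_type n) { i_ -= n; return *this; }

  friend EvenIterator operator+(EvenIterator it, difference_type n) { return it += n; }
  friend EvenIterator operator+(difference_type n, EvenIterator it) { return it += n; }
  friend EvenIterator operator-(EvenIterator it, difference_type n) { return it -= n; }

  // Iterator difference: lets std::distance run in constant time and lets
  // a parallel algorithm split the range into chunks.
  friend difference_type operator-(const EvenIterator& a, const EvenIterator& b) {
    return a.i_ - b.i_;
  }

  // Full set of relational operators.
  friend bool operator==(const EvenIterator& a, const EvenIterator& b) { return a.i_ == b.i_; }
  friend bool operator!=(const EvenIterator& a, const EvenIterator& b) { return a.i_ != b.i_; }
  friend bool operator<(const EvenIterator& a, const EvenIterator& b) { return a.i_ < b.i_; }
  friend bool operator>(const EvenIterator& a, const EvenIterator& b) { return a.i_ > b.i_; }
  friend bool operator<=(const EvenIterator& a, const EvenIterator& b) { return a.i_ <= b.i_; }
  friend bool operator>=(const EvenIterator& a, const EvenIterator& b) { return a.i_ >= b.i_; }

private:
  difference_type i_;
};

// Sums the first n even numbers by feeding the iterator to a standard algorithm.
inline long long sum_first_evens(std::ptrdiff_t n) {
  assert(n >= 0);
  return std::accumulate(EvenIterator(0), EvenIterator(n), 0LL);
}
```

Because `reference` is a plain value type here, the iterator works for reading but cannot be written through, which matches the by-value change the summary mentions for the input iterator.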

Confidence Score: 5/5

Safe to merge; all changes are well-guarded behind OMP availability and element-count thresholds, with correct sequential fallbacks throughout.

The dispatch layer is defensive — it checks for OMP availability, nested-parallelism, thread count, and minimum data sizes before ever calling Thrust. All existing std:: call sites are preserved as fallbacks. The host_sort_copy semantics are correct because every current call site passes equal-sized input and output ranges. The iterator changes were required for Thrust compatibility and mirror operators already present on RandomOperatorOutputIterator.
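The point about equal-sized ranges can be checked with standard algorithms alone. The sketch below assumes that the parallel sort-copy path amounts to a copy followed by a full sort; the PR text does not show the implementation, so that is an assumption, and `sort_copy_equal` and `sort_copy_partial` are hypothetical helper names. It shows that `std::partial_sort_copy` and copy-then-sort agree when input and output have the same length, and that only `std::partial_sort_copy` supports a shorter output.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Copy-then-sort: only valid when the output holds exactly as many
// elements as the input (assumed shape of the parallel path).
template <typename T>
void sort_copy_equal(const std::vector<T>& in, std::vector<T>& out) {
  assert(in.size() == out.size());
  std::copy(in.begin(), in.end(), out.begin());
  std::sort(out.begin(), out.end());
}

// Reference behavior: writes the min(in.size(), out.size()) smallest
// elements of `in`, in sorted order, to `out`.
template <typename T>
void sort_copy_partial(const std::vector<T>& in, std::vector<T>& out) {
  std::partial_sort_copy(in.begin(), in.end(), out.begin(), out.end());
}
```

If a future call site passed an output range shorter than the input, the two helpers would no longer be interchangeable, which is why the equal-size condition matters.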

No files require special attention; the two findings are non-blocking performance observations on host_algorithms.h.

Important Files Changed

| Filename | Overview |
| --- | --- |
| include/matx/transforms/host_algorithms.h | New header providing parallel Thrust-OMP dispatch wrappers with threshold-based fallbacks to std:: algorithms; host_any_of/host_all_of lose short-circuit behavior vs std:: counterparts. |
| include/matx/core/iterator.h | Adds operator-, operator--, and relational operators to RandomOperatorIterator for Thrust compatibility; changes reference alias to value_type (by-value); makes RandomOperatorOutputIterator::operator* const with const_cast. |
| include/matx/transforms/reduce.h | All std:: reduce/scan/min/max/any/all calls replaced with parallel-aware detail::host_* wrappers; also fixes a latent bug where sum_impl rank-0 used lin.Size(0) instead of TotalSize(in) for multi-dimensional inputs. |
| include/matx/transforms/cub.h | Replaces direct std::sort, std::partial_sort_copy, std::partial_sum, and std::unique calls with detail::host_* wrappers; semantically equivalent for all call sites where input and output sizes match. |
| examples/simple_radar_pipeline.h | Defers conv2d CFAR norm computation to first use via a lazy-init flag; ComputeCFARNorm is called in both CFARDetections() and GetnormT(). |
| test/01_radar/MultiChannelRadarPipeline.cu | Converts per-test SetUp/TearDown to suite-level SetUpTestSuite/TearDownTestSuite with inline static members, so the Python backend is initialized once per type parameterization instead of once per test case. |
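The short-circuit remark for host_any_of/host_all_of can be made concrete with the standard library. The sketch below is not the PR's code: it contrasts `std::any_of` with a reduction-style evaluation built from `std::transform_reduce`, on the assumption that a parallel any-of behaves like a reduction over all elements. `PredicateCalls` and `count_any_of_calls` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

struct PredicateCalls {
  std::size_t short_circuit;  // predicate calls made by std::any_of
  std::size_t reduction;      // predicate calls made by the reduction form
};

// Runs both forms over `data` with the predicate "v != 0" and reports
// how many times each form invoked the predicate.
inline PredicateCalls count_any_of_calls(const std::vector<int>& data) {
  std::size_t sc_calls = 0;
  std::size_t red_calls = 0;

  // std::any_of can stop as soon as one element satisfies the predicate.
  const bool sc = std::any_of(data.begin(), data.end(),
                              [&](int v) { ++sc_calls; return v != 0; });

  // A reduction has no early exit: every element is transformed, then OR-ed.
  const bool red = std::transform_reduce(
      data.begin(), data.end(), false, std::logical_or<>{},
      [&](int v) { ++red_calls; return v != 0; });

  assert(sc == red);  // both forms must agree on the answer
  (void)sc;
  (void)red;
  return {sc_calls, red_calls};
}
```

For an input whose first element already matches, the reduction form still visits every element, so the cost difference grows with the input size; for inputs with no match, both forms visit everything.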

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["host_* wrapper called"] --> B{MATX_EN_OMP enabled?}
    B -- No --> F["std:: algorithm - sequential fallback"]
    B -- Yes --> C{MODE != SINGLE?}
    C -- No --> F
    C -- Yes --> D{threads gt 1 and not omp_in_parallel?}
    D -- No --> F
    D -- Yes --> E{distance >= min_elements threshold?}
    E -- No --> F
    E -- Yes --> G["ScopedOmpNumThreads guard - set N threads"]
    G --> H["thrust:: algorithm with omp::par"]
    H --> I["Restore previous thread count"]
    I --> Z[Return]
    F --> Z
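The decision flow in the flowchart can be sketched as a single predicate plus an RAII guard. Everything named below (`ThreadMode`, `DispatchQuery`, `use_parallel_path`, `ScopedThreadCount`, `current_threads`, `set_threads`) is hypothetical and chosen for this illustration; it is not the contents of host_algorithms.h. The guard uses the real OpenMP calls `omp_get_max_threads` and `omp_set_num_threads` when the file is built with OpenMP, and a plain stand-in variable otherwise.

```cpp
#include <cassert>
#include <cstddef>

#ifdef _OPENMP
#include <omp.h>
inline int current_threads() { return omp_get_max_threads(); }
inline void set_threads(int n) { omp_set_num_threads(n); }
#else
// Stand-in used when the file is built without OpenMP.
inline int& thread_slot() { static int n = 1; return n; }
inline int current_threads() { return thread_slot(); }
inline void set_threads(int n) { thread_slot() = n; }
#endif

enum class ThreadMode { Single, Multi };  // hypothetical mode names

// One field per diamond in the flowchart.
struct DispatchQuery {
  bool omp_enabled;         // is OpenMP support compiled in?
  ThreadMode mode;          // executor threading mode
  int threads;              // requested thread count
  bool in_parallel_region;  // e.g. the result of omp_in_parallel()
  std::size_t distance;     // number of elements in the range
  std::size_t min_elements; // per-algorithm threshold
};

// True only if every check passes; any failure means the sequential fallback.
constexpr bool use_parallel_path(const DispatchQuery& q) {
  return q.omp_enabled && q.mode != ThreadMode::Single && q.threads > 1 &&
         !q.in_parallel_region && q.distance >= q.min_elements;
}

// RAII guard: sets the thread count on entry and restores the previous
// value on scope exit, including when an exception propagates.
class ScopedThreadCount {
public:
  explicit ScopedThreadCount(int n) : prev_(current_threads()) {
    assert(n >= 1);
    set_threads(n);
  }
  ~ScopedThreadCount() { set_threads(prev_); }
  ScopedThreadCount(const ScopedThreadCount&) = delete;
  ScopedThreadCount& operator=(const ScopedThreadCount&) = delete;

private:
  int prev_;
};
```

Keeping the predicate free of side effects makes each branch of the flowchart easy to test in isolation, while the guard confines the thread-count change to the scope of the parallel call.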

Reviews (5): Last reviewed commit: "Avoid Thrust overhead for small host ran..."

Comment thread include/matx/transforms/host_algorithms.h Outdated
Comment thread include/matx/transforms/host_algorithms.h
@cliffburdick
Collaborator Author

/build

@cliffburdick
Collaborator Author

/build

@cliffburdick
Collaborator Author

/build

@cliffburdick
Collaborator Author

/build

@cliffburdick
Collaborator Author

/build

@cliffburdick
Collaborator Author

/build

@cliffburdick
Collaborator Author

The build passed, but CI timed out while generating the coverage files.

@cliffburdick cliffburdick merged commit 580b8cc into main May 26, 2026
1 check failed
@cliffburdick cliffburdick deleted the cburdick/host-thrust-parallel-algorithms branch May 26, 2026 16:22