Use Thrust for multithreaded host algorithms #1181
/build
Greptile Summary
This PR replaces single-threaded …
Confidence Score: 5/5
Safe to merge; all changes are well-guarded behind OMP availability and element-count thresholds, with correct sequential fallbacks throughout. The dispatch layer is defensive: it checks for OMP availability, nested parallelism, thread count, and minimum data sizes before ever calling Thrust. All existing …
No files require special attention; the two findings are non-blocking performance observations on …
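Taken together, those checks amount to one boolean predicate evaluated before any Thrust call. The C++ sketch below is illustrative only and is not the MatX code: `HostDispatchState` and `use_parallel_path` are hypothetical names, and `min_elements` is supplied by the caller rather than taken from any actual MatX threshold.

```cpp
#include <cstddef>

// Hypothetical snapshot of the state a host dispatch layer inspects;
// the field names are illustrative, not taken from MatX.
struct HostDispatchState {
  bool omp_enabled;  // built with OpenMP support (e.g. MATX_EN_OMP defined)
  bool single_mode;  // executor configured for SINGLE (one-thread) mode
  int threads;       // thread count requested for this call
  bool in_parallel;  // already inside an OpenMP parallel region
};

// Every guard must pass before the multithreaded Thrust path is taken;
// otherwise the caller uses the sequential std:: algorithm.
constexpr bool use_parallel_path(const HostDispatchState &s,
                                 std::size_t distance,
                                 std::size_t min_elements) {
  return s.omp_enabled && !s.single_mode && s.threads > 1 &&
         !s.in_parallel && distance >= min_elements;
}
```

Skipping the parallel path inside an existing OpenMP region avoids either oversubscribing the cores (if nesting is enabled) or paying Thrust's setup cost for a region that OpenMP would run on a single thread anyway.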
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A["host_* wrapper called"] --> B{MATX_EN_OMP enabled?}
B -- No --> F["std:: algorithm - sequential fallback"]
B -- Yes --> C{MODE != SINGLE?}
C -- No --> F
C -- Yes --> D{threads gt 1 and not omp_in_parallel?}
D -- No --> F
D -- Yes --> E{distance >= min_elements threshold?}
E -- No --> F
E -- Yes --> G["ScopedOmpNumThreads guard - set N threads"]
G --> H["thrust:: algorithm with omp::par"]
H --> I["Restore previous thread count"]
I --> Z[Return]
F --> Z
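The parallel branch of the flowchart (set the thread count, run the Thrust algorithm with `thrust::omp::par`, restore the previous count) can be sketched as follows for the three algorithms this PR covers. `ScopedThreads`, `use_parallel`, `host_sort`, `host_reduce` and `host_inclusive_scan` are hypothetical stand-ins for the `ScopedOmpNumThreads` guard and the `host_*` wrappers, not their real signatures; the SINGLE-mode check is omitted for brevity, and the sketch assumes Thrust's OpenMP backend headers are available whenever `MATX_EN_OMP` is defined. Without that macro it compiles to the sequential `std::` path only.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <numeric>

#if defined(MATX_EN_OMP)
#include <omp.h>
#include <thrust/reduce.h>
#include <thrust/scan.h>
#include <thrust/sort.h>
#include <thrust/system/omp/execution_policy.h>
#endif

// Hypothetical RAII guard modeled on the ScopedOmpNumThreads step: set the
// OpenMP thread count for this scope and restore the old value on exit.
class ScopedThreads {
 public:
  explicit ScopedThreads([[maybe_unused]] int n) {
#if defined(MATX_EN_OMP)
    prev_ = omp_get_max_threads();  // remember the caller's setting
    omp_set_num_threads(n);
#endif
  }
  ~ScopedThreads() {
#if defined(MATX_EN_OMP)
    omp_set_num_threads(prev_);  // restore previous thread count
#endif
  }
  ScopedThreads(const ScopedThreads &) = delete;
  ScopedThreads &operator=(const ScopedThreads &) = delete;

 private:
  [[maybe_unused]] int prev_ = 1;
};

// True only when more than one thread is requested, we are not already in
// an OpenMP parallel region, and the range meets the size threshold.
template <typename It>
bool use_parallel([[maybe_unused]] It first, [[maybe_unused]] It last,
                  [[maybe_unused]] int threads,
                  [[maybe_unused]] std::ptrdiff_t min_elements) {
#if defined(MATX_EN_OMP)
  return threads > 1 && !omp_in_parallel() &&
         std::distance(first, last) >= min_elements;
#else
  return false;  // no OpenMP: always take the sequential fallback
#endif
}

template <typename It>
void host_sort(It first, It last, int threads, std::ptrdiff_t min_elements) {
  if (use_parallel(first, last, threads, min_elements)) {
#if defined(MATX_EN_OMP)
    ScopedThreads guard(threads);
    thrust::sort(thrust::omp::par, first, last);
    return;
#endif
  }
  std::sort(first, last);  // sequential fallback
}

template <typename It, typename T>
T host_reduce(It first, It last, T init, int threads,
              std::ptrdiff_t min_elements) {
  if (use_parallel(first, last, threads, min_elements)) {
#if defined(MATX_EN_OMP)
    ScopedThreads guard(threads);
    return thrust::reduce(thrust::omp::par, first, last, init);
#endif
  }
  return std::accumulate(first, last, init);  // sequential fallback
}

template <typename InIt, typename OutIt>
OutIt host_inclusive_scan(InIt first, InIt last, OutIt out, int threads,
                          std::ptrdiff_t min_elements) {
  if (use_parallel(first, last, threads, min_elements)) {
#if defined(MATX_EN_OMP)
    ScopedThreads guard(threads);
    return thrust::inclusive_scan(thrust::omp::par, first, last, out);
#endif
  }
  return std::partial_sum(first, last, out);  // sequential prefix sum
}
```

Keeping the thread-count change inside an RAII object means the previous setting is restored even if the Thrust call throws. Note that a parallel reduction reassociates the additions, so floating-point sums can differ slightly from the sequential `std::accumulate` result.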
Reviews (5): Last reviewed commit: "Avoid Thrust overhead for small host ran..."
This passed, but CI timed out while generating coverage files.
The CPU sort, reduction, and prefix-sum implementations previously ran single-threaded. This PR adds parallel versions built on Thrust algorithms. Speedups measured on a 20-core ARM system are shown below: