Implement CPU polyphase resampler #1190
Greptile Summary
This PR adds CPU polyphase resampler support to the previously CUDA-only
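As background for the review below, here is a minimal serial sketch of what a polyphase resampler computes: upsample by `up` (insert zeros), filter with `h`, then keep every `down`-th sample. The polyphase trick is that each output sample only needs the taps `k` congruent to a fixed phase modulo `up`, so the inserted zeros are never touched. The function name `resample_poly_ref` is hypothetical and this is not the PR's code. It assumes a centered filter (delay `(L-1)/2`), SciPy-style output length `ceil(N*up/down)`, and, as an assumption, that an even-length filter is zero-padded by prepending one tap. Any gain scaling is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical serial reference: y = downsample(filter(upsample(x, up), h), down).
// Assumptions: h is centered (delay (L-1)/2); even-length h gets one leading zero tap;
// no gain scaling is applied.
std::vector<double> resample_poly_ref(const std::vector<double>& x,
                                      std::vector<double> h,
                                      std::size_t up, std::size_t down) {
  assert(up > 0 && down > 0);
  const std::size_t g = std::gcd(up, down);  // reduce the rate ratio
  up /= g;
  down /= g;
  if (h.size() % 2 == 0) h.insert(h.begin(), 0.0);  // assumed even-filter padding

  const std::size_t N = x.size();
  const std::size_t L = h.size();
  const std::size_t offset = (L - 1) / 2;                 // filter delay
  const std::size_t out_len = (N * up + down - 1) / down; // ceil(N*up/down)
  std::vector<double> y(out_len, 0.0);

  for (std::size_t n = 0; n < out_len; ++n) {
    const std::size_t t = n * down + offset;  // position in the upsampled stream
    // Only taps with k % up == t % up land on a nonzero upsampled sample.
    for (std::size_t k = t % up; k < L && k <= t; k += up) {
      const std::size_t i = (t - k) / up;     // corresponding input index
      if (i < N) y[n] += h[k] * x[i];
    }
  }
  return y;
}
```

With `h = {0.5, 1.0, 0.5}` and `up = 2, down = 1`, the sketch performs linear interpolation between input samples, which makes the phase selection easy to check by hand.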
Confidence Score: 4/5
Mostly safe to merge. The CPU resampling algorithm and executor dispatch are correct, but reusing the same ResamplePolyOp instance across multiple run() calls will access freed memory because PostRun does not reset prerun_done_.

The CPU accumulation loop, filter-index bounds, batch dispatch, and even-filter zero-padding all match the independent reference implementation in the tests. The one concrete defect is that PostRun frees ptr without resetting prerun_done_, so a second run() skips the allocation in PreRun and reuses the freed buffer (a use-after-free). This path is now reachable for both CUDA and host executors.

Files needing attention: include/matx/operators/resample_poly.h, specifically the PostRun / prerun_done_ lifecycle.
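The lifecycle bug is a common pattern with guarded allocation hooks: if the "already prepared" flag outlives the buffer, the next PreRun trusts a dangling pointer. The class below is a hypothetical stand-in (`TempBufferOp`, `ready()`, `buffer()` are illustrative names, not MatX's `ResamplePolyOp`) that shows the shape of the fix the review asks for: PostRun frees the buffer and resets the flag together, so a later PreRun allocates again.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical operator with PreRun/PostRun hooks; NOT MatX's ResamplePolyOp.
class TempBufferOp {
 public:
  explicit TempBufferOp(std::size_t bytes) : bytes_(bytes) {}
  ~TempBufferOp() { PostRun(); }
  TempBufferOp(const TempBufferOp&) = delete;
  TempBufferOp& operator=(const TempBufferOp&) = delete;

  void PreRun() {
    if (prerun_done_) return;  // skip allocation only while the buffer is live
    ptr_ = std::malloc(bytes_);
    prerun_done_ = (ptr_ != nullptr);
  }

  void PostRun() {
    std::free(ptr_);           // free(nullptr) is a no-op
    ptr_ = nullptr;
    prerun_done_ = false;      // the reset flagged as missing in the review
  }

  bool ready() const { return prerun_done_ && ptr_ != nullptr; }
  void* buffer() const { return ptr_; }

 private:
  std::size_t bytes_;
  void* ptr_ = nullptr;
  bool prerun_done_ = false;
};
```

Without the `prerun_done_ = false` line, the second PreRun would return early and `buffer()` would be null here (or dangling, if PostRun did not clear `ptr_`).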
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A["resample_poly(in, f, up, down)"] --> B["ResamplePolyOp::PreRun"]
B --> C{is_host_executor?}
C -- Yes --> D["AllocateTempTensor\n(MATX_HOST_MALLOC_MEMORY)"]
C -- No --> E["AllocateTempTensor\n(MATX_DEVICE_MEMORY)"]
D --> F["Exec → resample_poly_impl(exec)"]
E --> F
F --> G{executor type}
G -- cudaExecutor --> H["matxResamplePoly1DInternal\n(cudaStream_t)"]
G -- HostExecutor --> I["matxResamplePoly1DInternal\n(HostExecutor)"]
G -- other --> J["static_assert fails"]
H --> K["GPU kernel dispatch\n(ElemBlock/WarpCentric/PhaseBlock)"]
I --> L{MATX_EN_OMP\n& threads>1?}
L -- Yes --> M["#pragma omp parallel for\nbatch_idx parallelism"]
L -- No --> N["Serial batch loop"]
M --> O["run_batch(batch_idx)\nfor each 1D signal"]
N --> O
Force-pushed from 1e8205e to 129bb2a.
@greptile review
/build
Force-pushed from 129bb2a to 87656b5.
Closes #767