Skip to content

Implement CPU polyphase resampler#1190

Merged
cliffburdick merged 1 commit into
mainfrom
cburdick/issue-767-cpu-resample-poly
May 27, 2026
Merged

Implement CPU polyphase resampler#1190
cliffburdick merged 1 commit into
mainfrom
cburdick/issue-767-cpu-resample-poly

Conversation

@cliffburdick
Copy link
Copy Markdown
Collaborator

@cliffburdick cliffburdick commented May 26, 2026

Closes #767

@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot Bot commented May 26, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@cliffburdick cliffburdick requested a review from tbensonatl May 26, 2026 20:51
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 26, 2026

Greptile Summary

This PR adds CPU polyphase resampler support to the previously CUDA-only resample_poly transform by introducing a new HostExecutor overload of matxResamplePoly1DInternal and updating the operator/impl dispatch chain to route by executor type via if constexpr.

  • New CPU kernel iterates over batches (optionally in parallel via OpenMP) and replicates the polyphase accumulation logic including even-filter zero-padding semantics; the new ApplyOpWithIdx helper enables generic-operator indexing via cuda::std::apply.
  • AllocateTempTensor gains an optional host_memory_space parameter (default: pinned MATX_HOST_MEMORY) so the operator can request pageable malloc memory for host-only execution paths.
  • ResamplePolyOp::PostRun frees ptr but does not reset prerun_done_ — a second .run() on the same operator instance would skip PreRun, leaving tmp_out_ pointing at freed memory before calling Exec. Other operators in the codebase (e.g., sum, sort) explicitly reset this flag in PostRun.

Confidence Score: 4/5

Mostly safe to merge; the CPU resampling algorithm and executor dispatch are correct, but reusing the same ResamplePolyOp instance across multiple run() calls will access freed memory due to the missing prerun_done_ reset in PostRun.

The CPU accumulation loop, filter-index bounds, batch dispatch, and even-filter zero-padding all check out against the independent reference implementation in the tests. The one concrete defect is PostRun freeing ptr without resetting prerun_done_, which causes a use-after-free on any second run() — this path is now reachable for both CUDA and host executors.

include/matx/operators/resample_poly.h — specifically the PostRun / prerun_done_ lifecycle

Important Files Changed

Filename Overview
include/matx/transforms/resample_poly.h Adds new HostExecutor overload of matxResamplePoly1DInternal, updates resample_poly_impl to dispatch by executor type via constexpr if, and adds ApplyOpWithIdx helper. Core CPU algorithm logic appears correct; minor naming clarity issue noted.
include/matx/operators/resample_poly.h Extends Exec() to accept HostExecutor and dispatches MATX_HOST_MALLOC_MEMORY for temp tensor allocation; static_assert updated for both executor types. Follows established operator pattern.
include/matx/core/operator_utils.h Adds optional host_memory_space parameter to AllocateTempTensor (defaulting to MATX_HOST_MEMORY) to allow callers to request pageable memory for CPU-only paths.
test/00_transform/ResamplePoly.cu Adds SmallSignalsMatchReference and BatchedSignalsMatchReference host-executor tests covering float, double, and complex types, with an independent reference implementation for cross-validation.
docs_input/executor_compatibility.rst Updates the executor compatibility table to reflect that resample_poly now supports host executor in addition to CUDA.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["resample_poly(in, f, up, down)"] --> B["ResamplePolyOp::PreRun"]
    B --> C{is_host_executor?}
    C -- Yes --> D["AllocateTempTensor\n(MATX_HOST_MALLOC_MEMORY)"]
    C -- No --> E["AllocateTempTensor\n(MATX_DEVICE_MEMORY)"]
    D --> F["Exec → resample_poly_impl(exec)"]
    E --> F
    F --> G{executor type}
    G -- cudaExecutor --> H["matxResamplePoly1DInternal\n(cudaStream_t)"]
    G -- HostExecutor --> I["matxResamplePoly1DInternal\n(HostExecutor)"]
    G -- other --> J["static_assert fails"]
    H --> K["GPU kernel dispatch\n(ElemBlock/WarpCentric/PhaseBlock)"]
    I --> L{MATX_EN_OMP\n& threads>1?}
    L -- Yes --> M["#pragma omp parallel for\nbatch_idx parallelism"]
    L -- No --> N["Serial batch loop"]
    M --> O["run_batch(batch_idx)\nfor each 1D signal"]
    N --> O
Loading

Comments Outside Diff (1)

  1. include/matx/operators/resample_poly.h, line 154-166 (link)

    P1 prerun_done_ not reset in PostRun

    PostRun frees ptr but never resets prerun_done_ to false. If the same ResamplePolyOp instance is passed to a second .run() call, PreRun short-circuits at the early-return guard, ptr / tmp_out_ are stale (the memory was freed), and Exec writes into freed memory. This pattern is pre-existing for the CUDA path, but the new CPU path inherits the same exposure. Several other operators in the codebase (e.g., sum.h, sort.h) reset prerun_done_ = false in their PostRun; the same fix should be applied here.

Reviews (4): Last reviewed commit: "Implement CPU polyphase resampler" | Re-trigger Greptile

Comment thread include/matx/core/operator_utils.h
Comment thread include/matx/transforms/resample_poly.h Outdated
@cliffburdick cliffburdick force-pushed the cburdick/issue-767-cpu-resample-poly branch from 1e8205e to 129bb2a Compare May 26, 2026 21:04
@cliffburdick
Copy link
Copy Markdown
Collaborator Author

@greptile review

@cliffburdick
Copy link
Copy Markdown
Collaborator Author

/build

@coveralls
Copy link
Copy Markdown

Coverage Status

Coverage is 93.449%cburdick/issue-767-cpu-resample-poly into main. No base build found for main.

@cliffburdick cliffburdick force-pushed the cburdick/issue-767-cpu-resample-poly branch from 129bb2a to 87656b5 Compare May 27, 2026 16:41
@cliffburdick cliffburdick merged commit 7ef1425 into main May 27, 2026
@cliffburdick cliffburdick deleted the cburdick/issue-767-cpu-resample-poly branch May 27, 2026 16:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEA] Implement CPU version of polyphase resampler

2 participants