perf: per-block chain processing + batched NAM process_buffer (2.8×)#249
Merged
Conversation
Both engine paths looped chain.process(sample) one sample at a time, even though AmplifierChain::process_block and the Stage::process_block trait method already existed. Call process_block instead so per-stage work (bypass branch, stage-list walk) happens once per block rather than once per sample, and so stages with batched process_block overrides are actually exercised. Measured on the analog chain: ~31% faster at 1x oversampling, converging to parity as oversampling rises and real DSP work dominates loop overhead.
NamStage inherited the default Stage::process_block (a per-sample process_sample loop), so the engine's per-block path never reached nam-rs's batched Model::process_buffer. Override process_block to apply input gain, run process_buffer over the block, then apply output gain and the dry/wet mix, using a preallocated scratch buffer for the dry signal so steady-state processing never allocates on the RT thread. On the standard WaveNet reference model this cuts the NAM chain block from ~824us to ~293us per 128-sample block (2.8x; ~64% less CPU), matching the raw process_buffer ceiling. A parity test asserts the block path matches the per-sample path within 1e-5. Vendor reference_standard.nam (MIT, from nam-rs) into tests/fixtures so the parity test and NAM benchmark groups run deterministically in CI rather than depending on a user's gitignored nam/ models. Add is_active() to NamStage and NAM benchmark groups (chain sample-vs-block + raw process_buffer ceiling).
Contributor
There was a problem hiding this comment.
Pull request overview
This PR wires AmplifierChain::process_block into both engine processing paths (previously the engine looped per-sample even though process_block existed) and adds a NamStage::process_block override that uses nam-rs's batched Model::process_buffer with a preallocated scratch buffer for the dry signal. A vendored MIT reference NAM model is committed under tests/fixtures/ to back a new parity test and dedicated NAM benchmark groups.
Changes:
- Replace per-sample loops in
process_without_upsampling/ oversampled path withchain.process_block(). - Add
NamStage::process_block(batched, allocation-free in steady state) andis_active()accessor; add parity test against the per-sample path. - Add
bench_nam_sample_vs_blockandbench_nam_buffer_vs_samplebenchmark groups; vendorreference_standard.namwith attribution README.
Reviewed changes
Copilot reviewed 4 out of 5 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| rustortion-core/src/audio/engine.rs | Switch both process paths to block processing |
| rustortion-core/src/amp/stages/nam.rs | Add batched process_block, dry-scratch buffer, is_active(), and parity test |
| rustortion-core/benches/chain.rs | New NAM chain + raw model bench groups using vendored fixture |
| rustortion-core/tests/fixtures/README.md | Attribution/license notes for vendored reference_standard.nam |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
The model loads from tests/fixtures/, not the workspace nam/ directory.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
process_blockexisted on theStagetrait andAmplifierChainbut the engine never called it — both process paths loopedchain.process(sample)one sample at a time. And even once wired in, the NAM stage left the big win on the table becauseNamStageinherited the default per-sample trait loop instead of calling nam-rs's batchedModel::process_buffer.This PR fixes both.
Changes
engine.rs— both process paths now callchain.process_block()instead of a per-sample loop. Per-stage work (bypass branch, stage-list walk) happens once per block, and stages with batchedprocess_blockoverrides are actually exercised.NamStage::process_blockoverride — applies input gain → batchedprocess_buffer→ output gain + dry/wet mix, using a preallocated scratch buffer for the dry signal so steady-state processing never allocates on the RT thread.is_active()accessor onNamStage.reference_standard.nam(MIT, from nam-rstests/fixtures/) intorustortion-core/tests/fixtures/with attribution, so the parity test and NAM benchmark groups run deterministically in CI rather than depending on a user's gitignorednam/models.process_buffervsprocess_sampleceiling group tochain.rs.Results
For NAM users this is roughly a 64% CPU cut on the dominant stage — landing on nam-rs's raw
process_bufferceiling (288 µs) — and it's live in the real engine path, not just the benchmark. The analog-chain win is loop-order/cache-locality, largest at low oversampling and converging to parity as real DSP work dominates.Verification
make lint— cleanmake test— all pass, includingblock_matches_per_sample_with_real_model