perf: per-block chain processing + batched NAM process_buffer (2.8×) by OpenSauce · Pull Request #249 · OpenSauce/rustortion

OpenSauce · 2026-05-31T14:44:52Z

Summary

process_block already existed on the Stage trait and on AmplifierChain, but the engine never called it: both process paths looped chain.process(sample) one sample at a time. Even once wired in, the NAM stage would have left the biggest win on the table, because NamStage inherited the default per-sample trait loop instead of calling nam-rs's batched Model::process_buffer.

This PR fixes both.
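
As a rough illustration of the difference between the two paths, here is a minimal sketch. The trait and struct definitions below are hypothetical stand-ins written for this example (only the method names process, process_sample, and process_block come from the PR text); the real rustortion types may look different.

```rust
/// Hypothetical stand-in for the Stage trait described in this PR.
pub trait Stage {
    fn process_sample(&mut self, sample: f32) -> f32;

    /// Default implementation: a per-sample loop. A stage with a cheaper
    /// batched path would override this method.
    fn process_block(&mut self, block: &mut [f32]) {
        for s in block.iter_mut() {
            *s = self.process_sample(*s);
        }
    }
}

/// Two toy stages, invented for illustration only.
pub struct Gain(pub f32);
impl Stage for Gain {
    fn process_sample(&mut self, sample: f32) -> f32 {
        sample * self.0
    }
}

pub struct SoftClip;
impl Stage for SoftClip {
    fn process_sample(&mut self, sample: f32) -> f32 {
        sample.tanh()
    }
}

/// Hypothetical stand-in for AmplifierChain.
pub struct Chain {
    pub stages: Vec<Box<dyn Stage>>,
    pub bypassed: bool,
}

impl Chain {
    /// Per-sample path: the bypass branch and the stage-list walk
    /// run once for every sample.
    pub fn process(&mut self, sample: f32) -> f32 {
        if self.bypassed {
            return sample;
        }
        let mut s = sample;
        for stage in self.stages.iter_mut() {
            s = stage.process_sample(s);
        }
        s
    }

    /// Per-block path: the bypass branch and the stage-list walk
    /// run once per block, and each stage sees the whole block.
    pub fn process_block(&mut self, block: &mut [f32]) {
        if self.bypassed {
            return;
        }
        for stage in self.stages.iter_mut() {
            stage.process_block(block);
        }
    }
}
```

Both paths feed each stage the same sequence of samples in the same order, so the outputs should agree; only the loop nesting changes.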

Changes

engine.rs — both process paths now call chain.process_block() instead of a per-sample loop. Per-stage work (bypass branch, stage-list walk) happens once per block, and stages with batched process_block overrides are actually exercised.
NamStage::process_block override — applies input gain → batched process_buffer → output gain + dry/wet mix, using a preallocated scratch buffer for the dry signal so steady-state processing never allocates on the RT thread.
Parity test — asserts the block path matches the per-sample path within 1e-5.
is_active() accessor on NamStage.
Vendored reference_standard.nam (MIT, from nam-rs tests/fixtures/) into rustortion-core/tests/fixtures/ with attribution, so the parity test and NAM benchmark groups run deterministically in CI rather than depending on a user's gitignored nam/ models.
Benchmarks — added a NAM chain sample-vs-block group and a raw process_buffer vs process_sample ceiling group to chain.rs.
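
The process_block override described in the list above could be sketched roughly as follows. This is not the code from nam.rs: ToyModel is a made-up stand-in for the nam-rs model (its real signatures are not shown on this page), NamLikeStage and its fields are hypothetical, and taking the dry signal before the input gain is an assumption.

```rust
/// Hypothetical stand-in for a model with per-sample and batched entry points.
/// A real model would amortise work across the block; this toy just loops.
pub struct ToyModel {
    state: f32,
}

impl ToyModel {
    pub fn new() -> Self {
        Self { state: 0.0 }
    }

    pub fn process_sample(&mut self, x: f32) -> f32 {
        // One-pole smoothing followed by a soft clipper.
        self.state = 0.5 * self.state + 0.5 * x;
        self.state.tanh()
    }

    pub fn process_buffer(&mut self, buf: &mut [f32]) {
        for s in buf.iter_mut() {
            *s = self.process_sample(*s);
        }
    }
}

/// Hypothetical stage wrapping the toy model.
pub struct NamLikeStage {
    model: ToyModel,
    input_gain: f32,
    output_gain: f32,
    mix: f32, // 0.0 = fully dry, 1.0 = fully wet
    dry_scratch: Vec<f32>, // allocated once, reused for every block
}

impl NamLikeStage {
    pub fn new(input_gain: f32, output_gain: f32, mix: f32, max_block: usize) -> Self {
        assert!(max_block > 0, "scratch buffer must hold at least one sample");
        Self {
            model: ToyModel::new(),
            input_gain,
            output_gain,
            mix,
            dry_scratch: vec![0.0; max_block],
        }
    }

    /// Per-sample reference path.
    pub fn process_sample(&mut self, x: f32) -> f32 {
        let wet = self.model.process_sample(x * self.input_gain) * self.output_gain;
        x * (1.0 - self.mix) + wet * self.mix
    }

    /// Block path: input gain, batched model call, output gain, dry/wet mix.
    /// Oversized blocks are handled in scratch-sized chunks, so nothing here
    /// allocates after construction.
    pub fn process_block(&mut self, block: &mut [f32]) {
        let cap = self.dry_scratch.len();
        for chunk in block.chunks_mut(cap) {
            let dry = &mut self.dry_scratch[..chunk.len()];
            dry.copy_from_slice(chunk); // assumption: dry is taken pre-gain
            for s in chunk.iter_mut() {
                *s *= self.input_gain;
            }
            self.model.process_buffer(chunk);
            for (s, &d) in chunk.iter_mut().zip(dry.iter()) {
                *s = d * (1.0 - self.mix) + (*s * self.output_gain) * self.mix;
            }
        }
    }
}
```

Chunking by the scratch length is one possible way to stay allocation-free when a host delivers a larger block than expected; the actual stage may handle that case differently.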

Results

Path	Per-sample	Block	Speedup
Analog chain, 1x	10.5 µs	7.0 µs	1.5×
Analog chain, 16x	114 µs	112 µs	~1.0×
NAM chain, 1x	824 µs	293 µs	2.8×

For NAM users this is roughly a 64% CPU cut on the dominant stage, landing close to nam-rs's raw process_buffer ceiling (288 µs), and it is live in the real engine path, not just the benchmark. The analog-chain win comes from loop order and cache locality; it is largest at low oversampling and converges to parity as real DSP work dominates.
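
For readers who want to check the headline figures against the table, the arithmetic can be reproduced with two small helpers. The function names are invented for this sketch; the inputs are the timings from the Results table.

```rust
/// How many times faster the new path is (before / after).
pub fn speedup(before_us: f64, after_us: f64) -> f64 {
    before_us / after_us
}

/// Fraction of the original time that is no longer spent (1 - after / before).
pub fn time_saved(before_us: f64, after_us: f64) -> f64 {
    1.0 - after_us / before_us
}

/// Rows from the Results table: (label, per-sample µs, block µs).
pub const RESULTS: [(&str, f64, f64); 3] = [
    ("Analog chain, 1x", 10.5, 7.0),
    ("Analog chain, 16x", 114.0, 112.0),
    ("NAM chain, 1x", 824.0, 293.0),
];
```

With the NAM row, speedup(824.0, 293.0) is about 2.81 and time_saved(824.0, 293.0) is about 0.644, which matches the quoted 2.8× and roughly 64%.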

Verification

make lint — clean
make test — all pass, including block_matches_per_sample_with_real_model
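
The idea behind a parity test such as block_matches_per_sample_with_real_model can be sketched with a small generic helper. This is an illustrative assumption about the shape of such a check, not the test from the PR; the helper names and the closure-based interface are hypothetical.

```rust
/// Largest absolute difference between two equally sized buffers.
/// Returns NaN if any element difference is NaN, so that a NaN output
/// can never slip through a `<= tolerance` comparison.
pub fn max_abs_diff(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "buffers must have the same length");
    let mut worst = 0.0f32;
    for (x, y) in a.iter().zip(b) {
        let d = (x - y).abs();
        if d.is_nan() {
            return f32::NAN;
        }
        worst = worst.max(d);
    }
    worst
}

/// Runs the same input through a per-sample closure and a per-block closure
/// and reports whether the outputs agree within `tol`.
pub fn block_matches_per_sample<S, B>(
    input: &[f32],
    mut per_sample: S,
    mut per_block: B,
    tol: f32,
) -> bool
where
    S: FnMut(f32) -> f32,
    B: FnMut(&mut [f32]),
{
    let expected: Vec<f32> = input.iter().map(|&x| per_sample(x)).collect();
    let mut actual = input.to_vec();
    per_block(&mut actual);
    max_abs_diff(&expected, &actual) <= tol
}
```

In a real test the two closures would wrap two freshly constructed instances of the same stage, so that internal state starts identical on both paths, and tol would be the 1e-5 quoted in this PR.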

Both engine paths looped chain.process(sample) one sample at a time, even though AmplifierChain::process_block and the Stage::process_block trait method already existed. Call process_block instead so per-stage work (bypass branch, stage-list walk) happens once per block rather than once per sample, and so stages with batched process_block overrides are actually exercised. Measured on the analog chain: ~31% faster at 1x oversampling, converging to parity as oversampling rises and real DSP work dominates loop overhead.

NamStage inherited the default Stage::process_block (a per-sample process_sample loop), so the engine's per-block path never reached nam-rs's batched Model::process_buffer.

Override process_block to apply input gain, run process_buffer over the block, then apply output gain and the dry/wet mix, using a preallocated scratch buffer for the dry signal so steady-state processing never allocates on the RT thread.

On the standard WaveNet reference model this cuts the NAM chain block from ~824 µs to ~293 µs per 128-sample block (2.8×; ~64% less CPU), matching the raw process_buffer ceiling. A parity test asserts the block path matches the per-sample path within 1e-5.

Vendor reference_standard.nam (MIT, from nam-rs) into tests/fixtures so the parity test and NAM benchmark groups run deterministically in CI rather than depending on a user's gitignored nam/ models.

Add is_active() to NamStage and NAM benchmark groups (chain sample-vs-block + raw process_buffer ceiling).

Copilot

Pull request overview

This PR wires AmplifierChain::process_block into both engine processing paths (previously the engine looped per-sample even though process_block existed) and adds a NamStage::process_block override that uses nam-rs's batched Model::process_buffer with a preallocated scratch buffer for the dry signal. A vendored MIT reference NAM model is committed under tests/fixtures/ to back a new parity test and dedicated NAM benchmark groups.

Changes:

Replace per-sample loops in process_without_upsampling / oversampled path with chain.process_block().
Add NamStage::process_block (batched, allocation-free in steady state) and is_active() accessor; add parity test against the per-sample path.
Add bench_nam_sample_vs_block and bench_nam_buffer_vs_sample benchmark groups; vendor reference_standard.nam with attribution README.

Reviewed changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated 1 comment.

File	Description
rustortion-core/src/audio/engine.rs	Switch both process paths to block processing
rustortion-core/src/amp/stages/nam.rs	Add batched `process_block`, dry-scratch buffer, `is_active()`, and parity test
rustortion-core/benches/chain.rs	New NAM chain + raw model bench groups using vendored fixture
rustortion-core/tests/fixtures/README.md	Attribution/license notes for vendored `reference_standard.nam`

OpenSauce added 2 commits May 31, 2026 15:40

Copilot AI review requested due to automatic review settings May 31, 2026 14:44

Copilot started reviewing on behalf of OpenSauce May 31, 2026 14:45

Copilot AI reviewed May 31, 2026

Comment thread rustortion-core/benches/chain.rs Outdated

fix(bench): correct NAM bench skip message to reference tests/fixtures

621242d

The model loads from tests/fixtures/, not the workspace nam/ directory.

OpenSauce merged commit 3e9ea24 into main May 31, 2026
7 checks passed

OpenSauce deleted the perf/nam-block-processing branch May 31, 2026 14:51

OpenSauce merged 3 commits into main from perf/nam-block-processing