perf(rt): eliminate permit_alloc sites on the RT audio path (#239) #254

Merged
OpenSauce merged 4 commits into main from fix/239-rt-alloc-elimination on Jun 5, 2026
Owner

@OpenSauce OpenSauce commented Jun 3, 2026

Closes #239.

Removes all six permit_alloc escape hatches from audio/engine.rs and routes the remaining drop-on-RT cases through rt_drop.retire, so the real-time process() path and the engine message-handler arms allocate and deallocate nothing. Verified by an expanded allocation audit.

Allocations removed from Engine::process()

| Site | Fix |
| --- | --- |
| peak_meter.rs Arc::new per block | Three atomics (AtomicU32×2 + AtomicBool), no refcounting |
| recorder.rs Vec::with_capacity per block | Pre-allocated recycling buffer pool (new max_block_samples arg) |
| pitch_shifter construction on RT | Built off-thread in set_pitch_shift, shipped as SetPitchShift(Option<Box<PitchShifter>>) |
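
As an illustration of the peak-meter row, here is a minimal sketch of publishing meter readings through plain atomics instead of a freshly allocated `Arc` per block. The struct layout, the field meanings (current peak, held peak, clip flag) and the method names are assumptions made for the example; the PR text only states that the meter uses two `AtomicU32` values and one `AtomicBool`.

```rust
use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};

/// Hypothetical meter: two f32 readings stored as raw bits plus a clip flag.
#[derive(Default)]
pub struct PeakMeter {
    peak_bits: AtomicU32, // peak of the most recent block (assumed meaning)
    held_bits: AtomicU32, // highest peak seen so far (assumed meaning)
    clipping: AtomicBool, // set once any block reaches full scale
}

impl PeakMeter {
    /// Audio-thread side: plain atomic stores, no heap traffic.
    pub fn publish(&self, block: &[f32]) {
        let peak = block.iter().fold(0.0_f32, |max, s| max.max(s.abs()));
        let held = f32::from_bits(self.held_bits.load(Ordering::Relaxed));
        self.peak_bits.store(peak.to_bits(), Ordering::Relaxed);
        self.held_bits.store(held.max(peak).to_bits(), Ordering::Relaxed);
        if peak >= 1.0 {
            self.clipping.store(true, Ordering::Relaxed);
        }
    }

    /// UI side: three independent loads.
    pub fn read(&self) -> (f32, f32, bool) {
        (
            f32::from_bits(self.peak_bits.load(Ordering::Relaxed)),
            f32::from_bits(self.held_bits.load(Ordering::Relaxed)),
            self.clipping.load(Ordering::Relaxed),
        )
    }
}
```

The three values are loaded independently, so a reader may occasionally combine readings from two adjacent blocks. For a level meter that trade-off is usually acceptable.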

Dealloc-on-RT routed through rt_drop.retire

  • SetSamplers: samplers held as Box<Samplers>; swap pointers, retire the box directly (no Box::new type-erasure on RT).
  • SwapIrConvolver: cabinet holds Box<Convolver>; swap in place, retire the whole PreparedIr (old convolver + name String) off-RT.
  • SetInputFilters: mem::replace + retire instead of dropping the previous filters on assignment.
  • AddStage: AmplifierChain reserves DEFAULT_CHAIN_CAPACITY up front so the insert doesn't reallocate.
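
The "no Box::new type-erasure" point can be sketched as follows. If the engine already owns a `Box<Samplers>`, the old box can be coerced to a type-erased box without touching the allocator, because the coercion reuses the same heap pointer. `Samplers`, `Engine`, `Retired` and the use of a std bounded channel as the retire queue are all stand-ins invented for this example, not the project's actual types.

```rust
use std::any::Any;
use std::mem;
use std::sync::mpsc::{sync_channel, Receiver, SyncSender};

/// Placeholder payload; the real `Samplers` type is not shown in the PR text.
pub struct Samplers {
    pub gain: f32,
}

/// Type-erased garbage handed to a non-RT thread for dropping (assumed shape).
pub type Retired = Box<dyn Any + Send>;

pub struct Engine {
    samplers: Box<Samplers>,
    retire_tx: SyncSender<Retired>,
}

impl Engine {
    pub fn new(initial: Box<Samplers>, capacity: usize) -> (Self, Receiver<Retired>) {
        let (retire_tx, retire_rx) = sync_channel(capacity);
        (Self { samplers: initial, retire_tx }, retire_rx)
    }

    /// Heap address of the current samplers, exposed only for the demonstration.
    pub fn samplers_addr(&self) -> usize {
        &*self.samplers as *const Samplers as usize
    }

    /// RT side: swap two pointers and pass the old one along.
    pub fn set_samplers(&mut self, new_samplers: Box<Samplers>) {
        let old: Box<Samplers> = mem::replace(&mut self.samplers, new_samplers);
        // Unsizing coercion: same allocation, now viewed as `dyn Any + Send`.
        let erased: Retired = old;
        if let Err(err) = self.retire_tx.try_send(erased) {
            // Assumed overflow policy for this sketch: leak rather than free here.
            mem::forget(err);
        }
    }
}
```

Receiving the retired box on another thread and downcasting it yields the original allocation, which is one way to check that no new box was created on the swap path.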

Recorder: now fully RT-safe (non-blocking)

The recorder also dropped its blocking send (a parking call that can stall the audio thread on disk I/O — as much an RT hazard as allocating). record_block now try_sends a pooled buffer and, if the writer falls behind, drops the block and bumps an overruns() counter rather than blocking. The buffer pool is sized by audio rate (BUFFER_SECONDS of slack) — bounded (no RT alloc) but effectively unbounded in practice (~1.5 MB). A disk stall degrades the recording (a gap) instead of the live output (an xrun), matching DAW behavior.
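
A rough sketch of the non-blocking handoff described above, under several assumptions: std's bounded `sync_channel` stands in for whichever channel the recorder really uses, the pool size is passed in directly rather than derived from `BUFFER_SECONDS`, and `WriterEnd` is a hypothetical name for the writer thread's half.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::mpsc::{sync_channel, Receiver, SyncSender};

/// Audio-thread half of a hypothetical pooled recorder.
pub struct Recorder {
    free_rx: Receiver<Vec<f32>>,   // empty buffers returned by the writer
    full_tx: SyncSender<Vec<f32>>, // filled buffers going to the writer
    overruns: AtomicU64,
}

/// Writer-thread half: drains filled buffers and hands them back.
pub struct WriterEnd {
    pub full_rx: Receiver<Vec<f32>>,
    pub free_tx: SyncSender<Vec<f32>>,
}

impl Recorder {
    /// All allocation happens here, off the audio thread.
    pub fn new(pool_size: usize, max_block_samples: usize) -> (Self, WriterEnd) {
        let (free_tx, free_rx) = sync_channel(pool_size);
        let (full_tx, full_rx) = sync_channel(pool_size);
        for _ in 0..pool_size {
            free_tx
                .try_send(Vec::with_capacity(max_block_samples))
                .expect("pool channel was sized to hold every buffer");
        }
        let recorder = Self { free_rx, full_tx, overruns: AtomicU64::new(0) };
        (recorder, WriterEnd { full_rx, free_tx })
    }

    /// Never blocks: if no pooled buffer is free, the block is dropped and counted.
    pub fn record_block(&self, block: &[f32]) {
        let mut buf = match self.free_rx.try_recv() {
            Ok(buf) => buf,
            Err(_) => {
                self.overruns.fetch_add(1, Ordering::Relaxed);
                return;
            }
        };
        buf.clear();
        // Copy at most `capacity` samples so the Vec never grows on this thread.
        let n = block.len().min(buf.capacity());
        buf.extend_from_slice(&block[..n]);
        if let Err(err) = self.full_tx.try_send(buf) {
            // Writer is gone: leak the buffer instead of freeing it here.
            std::mem::forget(err);
            self.overruns.fetch_add(1, Ordering::Relaxed);
        }
    }

    pub fn overruns(&self) -> u64 {
        self.overruns.load(Ordering::Relaxed)
    }
}
```

In this sketch the writer thread would loop on `full_rx.recv()`, write the samples to disk, and push each buffer back through `free_tx`. When the disk stalls, the free queue runs dry and blocks are dropped rather than the audio callback waiting.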

Tests

  • New message_arms module: queues each message off-thread and drains it via handle_messages() inside the alloc audit — defeats the "warm-up blindness" the issue called out.
  • New metronome-processing test; wrapped the standalone tuner and samplers tests.
  • assert_no_alloc moved to dev-dependencies (test-only now).

Result: no permit_alloc calls remain in any src/ directory; make lint is clean (incl. -D clippy::pedantic -D clippy::nursery); all tests pass: 28 no_alloc audit tests, 153 core lib, 7 standalone engine, plus the rest.

Out of scope (per the issue — separate PRs)

  • Plugin RustortionPlugin::process — flip nih-plug's assert_process_allocs feature.
  • Standalone JACK callback — wrap in assert_no_alloc for dev/debug builds.
  • Tuner readout (Arc::new + String when pitch detection fires) — not on the audio-output path; not in the issue's permit_alloc list.

OpenSauce added 2 commits June 3, 2026 23:03
#239)

Remove all six `permit_alloc` escape hatches from `audio/engine.rs` and route
the remaining drop-on-RT cases through `rt_drop.retire`, so the real-time
process path and message-handler arms allocate and deallocate nothing.

- peak_meter: replace `Arc<ArcSwap<PeakMeterInfo>>` with three atomics
  (`AtomicU32` x2 + `AtomicBool`) — no per-block `Arc::new`.
- recorder: pre-allocate a recycling buffer pool in `Recorder::new` (sized via
  a new `max_block_samples` arg) instead of `Vec::with_capacity` per block.
- pitch shifter: construct `PitchShifter` off the RT thread in
  `EngineHandle::set_pitch_shift` and ship it as
  `SetPitchShift(Option<Box<PitchShifter>>)`; the RT side just swaps + retires.
- samplers: hold as `Box<Samplers>` so `SetSamplers` swaps pointers and retires
  the old box directly (no `Box::new` type-erasure on RT).
- IR cabinet: hold convolver as `Box<Convolver>`; `SwapIrConvolver` swaps in
  place and retires the whole `PreparedIr` (old convolver + name) off-RT.
- input filters: `SetInputFilters` now `mem::replace` + retire instead of
  dropping the previous filters on assignment.
- chain: reserve `DEFAULT_CHAIN_CAPACITY` up front so `AddStage` doesn't realloc.

Tests: add a `message_arms` module that queues each message off-thread and
drains it via `handle_messages()` inside the alloc audit (defeats warm-up
blindness); add a metronome processing test; wrap the standalone tuner and
samplers tests. Move `assert_no_alloc` to dev-dependencies (test-only now).

Follow-up on the #239 recorder fix. The previous step removed the per-block
`Vec` allocation but kept a blocking `send` for backpressure — which can park
the audio thread on disk I/O (and crossbeam's parking allocates). Blocking is
as much an RT hazard as allocating.

Make the handoff fully non-blocking: `record_block` `try_send`s a pooled buffer
and, if the writer has fallen behind, drops the block and records an overrun
instead of blocking. The audio output thread is never held hostage by the disk
— a disk stall now degrades the recording (a gap) rather than the live output
(an xrun), matching how DAWs behave.

- Size the buffer pool / handoff channel by audio rate (`BUFFER_SECONDS` of
  slack) so the writer can lag several seconds before anything is dropped —
  bounded (no RT alloc) but effectively unbounded in practice (~1.5 MB).
- Add an `overruns()` counter so dropped audio is observable rather than silent.
- test_recorder verifies the buffered path stays lossless under load.
Contributor

Copilot AI left a comment

Pull request overview

This PR removes real-time (RT) thread allocation escape hatches and restructures several engine components so that the audio process() path and message-handler drains are allocation-free under normal operation, with expanded no_alloc audits to prevent regressions.

Changes:

  • Reworked RT hot-path components to avoid per-block heap traffic (peak meter atomics, recorder buffer pool, pitch shifter constructed off-thread and swapped in via messages).
  • Routed RT-thread deallocation cases through rt_drop.retire by swapping boxed payloads (samplers, IR convolver, input filters, pitch shifter).
  • Expanded allocation-audit test coverage, including message-arm draining inside the asserted scope and a metronome processing audit.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| rustortion-standalone/tests/engine.rs | Wraps steady-state sampler/tuner-enabled paths in allocation audits. |
| rustortion-standalone/src/gui/app.rs | Passes buffer size into start_recording for recorder pool sizing. |
| rustortion-plugin/src/ir_helper.rs | Boxes convolver in PreparedIr to enable RT-safe swapping/retire. |
| rustortion-core/tests/no_alloc.rs | Adds message-arm drain audits and metronome processing audit; updates tests for new APIs. |
| rustortion-core/src/ir/load_service.rs | Boxes convolver in PreparedIr before sending to engine. |
| rustortion-core/src/ir/cabinet.rs | Stores convolver as Box<Convolver>; adds RT-safe pointer swap API. |
| rustortion-core/src/audio/recorder.rs | Implements preallocated buffer pool + non-blocking try_send recorder path with overrun counting. |
| rustortion-core/src/audio/peak_meter.rs | Replaces ArcSwap updates with atomics to avoid per-block Arc::new. |
| rustortion-core/src/audio/engine.rs | Removes permit_alloc usage; swaps boxed samplers/pitch shifter; retires old objects via rt_drop. |
| rustortion-core/src/amp/chain.rs | Reserves default stage capacity up front to avoid RT reallocations on AddStage. |
| rustortion-core/Cargo.toml | Moves assert_no_alloc to dev-dependencies (test-only). |
| rustortion-core/benches/common/mod.rs | Updates IR cabinet setup to use set_convolver. |

Comment on lines 342 to 345
      EngineMessage::SetSamplers(new_samplers) => {
-         let old = std::mem::replace(&mut self.samplers, *new_samplers);
-         permit_alloc(|| self.rt_drop.retire(Box::new(old)));
+         let old = std::mem::replace(&mut self.samplers, new_samplers);
+         self.rt_drop.retire(old);
          debug!("Samplers swapped");
Owner Author

Good catch, fixed in 83d0ff7 (and the related dealloc gaps in dafb090).

retire() now leaks the box via mem::forget on a full/disconnected channel instead of dropping it on the RT thread, so no free() runs on the audio thread under channel pressure. The event is surfaced through a new leaked() counter, and the channel capacity was raised 16→64. A unit test (retire_leaks_instead_of_dropping_when_full) asserts the overflow path leaks rather than deallocates.
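
The leak-on-overflow behaviour can be sketched roughly like this. The `RtDrop` layout, the std bounded channel and the `DropProbe` helper are assumptions for illustration; only the `retire()` and `leaked()` names and the `mem::forget` strategy come from the description above.

```rust
use std::any::Any;
use std::mem;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::mpsc::{sync_channel, Receiver, SyncSender};
use std::sync::Arc;

type Garbage = Box<dyn Any + Send>;

/// Hypothetical retire queue: RT thread pushes, a collector thread drops.
pub struct RtDrop {
    tx: SyncSender<Garbage>,
    leaked: AtomicUsize,
}

impl RtDrop {
    pub fn new(capacity: usize) -> (Self, Receiver<Garbage>) {
        let (tx, rx) = sync_channel(capacity);
        (Self { tx, leaked: AtomicUsize::new(0) }, rx)
    }

    /// RT side: hand the box away, or leak it if the queue cannot take it.
    pub fn retire(&self, garbage: Garbage) {
        if let Err(err) = self.tx.try_send(garbage) {
            // `err` still owns the box; dropping it would free memory on this
            // thread, so forget it and record the event instead.
            mem::forget(err);
            self.leaked.fetch_add(1, Ordering::Relaxed);
        }
    }

    pub fn leaked(&self) -> usize {
        self.leaked.load(Ordering::Relaxed)
    }
}

/// Test helper (hypothetical): counts how many times it has been dropped.
pub struct DropProbe(pub Arc<AtomicUsize>);

impl Drop for DropProbe {
    fn drop(&mut self) {
        self.0.fetch_add(1, Ordering::SeqCst);
    }
}
```

With a capacity of one, retiring two probes leaves the drop count at zero and `leaked()` at one. The first probe is dropped only when the receiving side takes it, and the second is never dropped at all.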

While here, two more alloc-on-RT paths in the same spirit were closed: the recorder no longer constructs an anyhow error on RT when its writer thread has exited, and the amp chain's AddStage path no longer reallocates its stage Vec past the reserved capacity (it rejects and retires off-RT, with the UI capping chain length).
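
A minimal sketch of the capacity-capped insert, assuming a `Stage` trait and an `AmplifierChain` shaped roughly as below. The trait, the `Gain` stage and the exact signature are invented for the example; the constant's value of 64 follows the commit message for this change.

```rust
/// Hard cap on chain length; reserved once at construction.
pub const DEFAULT_CHAIN_CAPACITY: usize = 64;

/// Hypothetical stage interface.
pub trait Stage: Send {
    fn process(&mut self, sample: f32) -> f32;
}

/// Trivial example stage.
pub struct Gain(pub f32);

impl Stage for Gain {
    fn process(&mut self, sample: f32) -> f32 {
        sample * self.0
    }
}

pub struct AmplifierChain {
    stages: Vec<Box<dyn Stage>>,
}

impl AmplifierChain {
    /// Allocates the full reserve up front, off the audio thread.
    pub fn new() -> Self {
        Self { stages: Vec::with_capacity(DEFAULT_CHAIN_CAPACITY) }
    }

    /// RT side: inserts only while below the cap, so the Vec never regrows.
    /// A rejected stage is returned to the caller, which can retire it
    /// off-thread instead of dropping it here.
    pub fn insert_stage(
        &mut self,
        index: usize,
        stage: Box<dyn Stage>,
    ) -> Result<(), Box<dyn Stage>> {
        if self.stages.len() >= DEFAULT_CHAIN_CAPACITY {
            return Err(stage);
        }
        let index = index.min(self.stages.len());
        self.stages.insert(index, stage);
        Ok(())
    }

    pub fn len(&self) -> usize {
        self.stages.len()
    }

    pub fn capacity(&self) -> usize {
        self.stages.capacity()
    }

    pub fn process(&mut self, sample: f32) -> f32 {
        self.stages.iter_mut().fold(sample, |s, stage| stage.process(s))
    }
}
```

Returning the rejected box keeps ownership explicit: the engine decides where the deallocation happens, and the chain itself never frees or grows anything on the audio thread.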

OpenSauce added 2 commits June 5, 2026 20:34
- rt_drop: retire() dropped the Box on the RT thread when the bounded
  channel was full or disconnected (free() on RT). Leak it via
  mem::forget instead and surface the event through a leaked() counter;
  bump channel capacity 16 -> 64.
- recorder: record_block constructed an anyhow error (alloc) on RT when
  the writer thread had exited. Count it as an overrun instead; return
  () rather than Result.
- recorder pool: size for max(period, MAX_BUFFER_FRAMES) so a
  mid-recording JACK buffer-size increase doesn't drop every block.

insert_stage (the RT AddStage path) reallocated the stages Vec once a
chain grew past the reserved capacity, reintroducing alloc (and dealloc
of the old buffer) on the audio thread. A standalone user could trigger
it by adding stages live past the reserve.

- Reserve raised 16 -> 64 (~1.5 KB) and made the hard cap.
- insert_stage now refuses to grow past reserved capacity, returning the
  rejected stage so the engine retires it via rt_drop instead of freeing
  it on RT.
- UI caps the chain at DEFAULT_CHAIN_CAPACITY and disables Add Stage at
  the cap.
- no_alloc: add add_stage_at_capacity_does_not_allocate covering the
  capacity-crossing drain.
@OpenSauce OpenSauce marked this pull request as ready for review June 5, 2026 19:57
@OpenSauce OpenSauce merged commit e355682 into main Jun 5, 2026
7 checks passed
@OpenSauce OpenSauce deleted the fix/239-rt-alloc-elimination branch June 5, 2026 19:57
Development

Successfully merging this pull request may close these issues.

Eliminate permit_alloc sites on the RT audio path

2 participants