
Conversation

@richiemcilroy (Member) commented Nov 10, 2025

Introduces channel downmixing support for microphone audio sources, ensuring that audio frames are converted to mono when required. The main changes add logic to downmix multi-channel audio to mono and update how audio frames are processed.

Summary by CodeRabbit

  • New Features

    • Microphone source now handles multi-channel audio downmixing, automatically converting stereo to mono when needed.
  • Bug Fixes / Improvements

    • Audio mixer delivery improved: retries when outbound buffer is full, preserves timestamps, and reuses frames to ensure consistent output timing and accurate sample accounting.
  • Tests

    • Added tests validating microphone channel downmixing behavior.


coderabbitai bot commented Nov 10, 2025

Walkthrough

Adds per-frame downmix-to-mono logic in the microphone source (supporting multiple SampleFormat variants) and introduces a private feed-lock field on Microphone. Also changes the audio mixer to build the outgoing frame once and reuse it across send attempts, retrying with a brief sleep while the channel is full.
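To make the downmix step concrete, here is a minimal sketch of equal-weight averaging of interleaved stereo f32 samples into mono. It is illustrative only; the actual source operates on raw bytes and supports several SampleFormat variants, not just f32 slices.

```rust
/// Minimal sketch: average interleaved stereo f32 samples into one mono sample per frame.
/// Illustrative only; the real code works on raw bytes across U8/I16/I32/I64/F32/F64.
fn downmix_stereo_f32_to_mono(interleaved: &[f32]) -> Vec<f32> {
    interleaved
        .chunks_exact(2)
        .map(|frame| (frame[0] + frame[1]) / 2.0)
        .collect()
}

fn main() {
    let stereo = [0.5f32, -0.5, 1.0, 1.0];
    assert_eq!(downmix_stereo_f32_to_mono(&stereo), vec![0.0f32, 1.0]);
}
```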

Changes

  • Microphone: downmix, sample handling, struct update (crates/recording/src/sources/microphone.rs)
    Adds maybe_downmix_channels, downmix_to_mono, and helpers (average_frame_sample, sample_to_f64, write_sample_from_f64) to conditionally downmix multi-channel frames to mono across SampleFormat variants (U8, I16, I32, I64, F32, F64). Reworks frame wrapping to use the possibly-downmixed packed data, computes AudioInfo with the MAX_CHANNELS target, adds a private _lock: Arc<MicrophoneFeedLock> field on Microphone, and adds tests for downmix behavior.
  • Audio mixer: send retry loop and frame reuse (crates/recording/src/sources/audio_mixer.rs)
    Changes the amix duration mode to longest, constructs the AudioFrame once and reuses it across send attempts, replaces the single try_send with a retry loop that sleeps briefly when the channel is full (retrying until sent or errored), updates samples_out from the actual frame length, and preserves timestamp alignment via the frame origin.

Sequence Diagram(s)

```mermaid
sequenceDiagram
  autonumber
  participant Mic as Microphone
  participant Down as Downmixer
  participant Mixer as AudioMixer
  participant Ch as OutputChannel

  Mic->>Down: captured multi-channel frame
  Down-->>Mic: maybe_downmix_channels -> borrowed or owned mono/packed frame
  Mic->>Mixer: push AudioFrame (with packed data & AudioInfo)
  Note right of Mixer: Build frame once, reuse across retries
  Mixer->>Ch: try_send(frame)
  alt channel full
    Ch--xMixer: Full
    Mixer->>Mixer: sleep(short)
    Mixer->>Ch: retry try_send(frame) (loop)
  else other error
    Ch--xMixer: Error -> Mixer returns Err
  end
  Ch-->>Mixer: Ack (sent)
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Areas to focus:
    • crates/recording/src/sources/microphone.rs: correctness of sample conversions, clamping/rounding behavior, memory ownership when returning borrowed vs owned buffers, and test coverage for edge sample values and formats.
    • crates/recording/src/sources/audio_mixer.rs: correctness of retry/backpressure logic, sleep duration impacts, frame timestamp/origin preservation, and possible performance/latency implications.
    • Integration: ensure public struct change (_lock field) doesn't break external construction/usage or serialization.

Possibly related PRs

Poem

🐇
I nibble frames by moonlit code,
Stereo threads into a single road,
Each sample averaged, rounded, neat,
A tiny lock keeps rhythm sweet,
Hoppity hop — the stream’s complete! 🎧

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 25.00%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Description Check ✅ Passed: Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: The title accurately describes the main change: adding channel downmixing for microphone audio frames, which is the primary focus of the modifications in microphone.rs.


coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (4)
crates/recording/src/sources/microphone.rs (4)

72-91: Consider adding doc comments.

The function logic is sound with proper validation and fallback handling. However, adding doc comments would improve maintainability by documenting the parameters, return value, and behavior.

Example:

```rust
/// Downmixes audio data from multiple channels to fewer channels.
/// Returns borrowed data if no downmixing is needed, or owned data if downmixed.
///
/// # Arguments
/// * `data` - Raw audio sample bytes
/// * `format` - Sample format (U8, I16, F32, etc.)
/// * `source_channels` - Number of channels in source data
/// * `target_channels` - Desired number of output channels
fn maybe_downmix_channels<'a>(
```

93-111: LGTM! Consider adding doc comments.

The implementation is robust with proper overflow protection (checked_mul) and validation. The frame alignment check ensures data integrity. Adding doc comments would document the algorithm and edge cases.


113-200: LGTM! Consider adding doc comments.

The helper functions correctly handle sample format conversions with appropriate clamping for integer formats and rounding. The use of from_ne_bytes / to_ne_bytes is appropriate for in-memory audio processing. Adding doc comments would clarify the purpose and expected inputs for each function.
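As an illustration of that clamping-and-rounding behaviour (names here are hypothetical, not the crate's actual helpers), writing an f64 sample back out as i16 might look like:

```rust
/// Illustrative only: round an f64 sample and clamp it into i16 range before
/// writing native-endian bytes, mirroring the behavior described above.
fn write_i16_sample_from_f64(value: f64, out: &mut Vec<u8>) {
    let clamped = value.round().clamp(i16::MIN as f64, i16::MAX as f64) as i16;
    out.extend_from_slice(&clamped.to_ne_bytes());
}

fn main() {
    let mut buf = Vec::new();
    write_i16_sample_from_f64(40_000.0, &mut buf); // out of range, clamps to i16::MAX
    assert_eq!(i16::from_ne_bytes([buf[0], buf[1]]), i16::MAX);
}
```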


202-234: LGTM! Consider expanding test coverage.

The tests validate core functionality: stereo-to-mono downmixing with correct averaging and zero-copy optimization for mono input. Consider adding tests for:

  • Other sample formats (I16, I32, I64, U8)
  • Edge cases (empty data, unsupported format)
  • Multiple source channels (3+); one possible shape for such a test is sketched below
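For the multi-channel case, assuming the maybe_downmix_channels signature and equal-weight averaging shown elsewhere in this review, a test sketch could look like:

```rust
#[test]
fn downmixes_three_channel_f32_to_mono() {
    // Two frames of three interleaved f32 channels each.
    let frames = [(0.3f32, 0.6f32, 0.9f32), (1.0f32, 1.0f32, 1.0f32)];
    let mut data = Vec::new();

    for (a, b, c) in frames {
        data.extend_from_slice(&a.to_ne_bytes());
        data.extend_from_slice(&b.to_ne_bytes());
        data.extend_from_slice(&c.to_ne_bytes());
    }

    let downmixed = maybe_downmix_channels(&data, SampleFormat::F32, 3, 1);
    let owned = downmixed.into_owned();
    assert_eq!(owned.len(), frames.len() * std::mem::size_of::<f32>());

    let first = f32::from_ne_bytes(owned[0..4].try_into().unwrap());
    let second = f32::from_ne_bytes(owned[4..8].try_into().unwrap());

    assert!((first - 0.6).abs() < 1e-6);  // (0.3 + 0.6 + 0.9) / 3
    assert!((second - 1.0).abs() < 1e-6); // (1.0 + 1.0 + 1.0) / 3
}
```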
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 25f0bfb and e4cca52.

📒 Files selected for processing (1)
  • crates/recording/src/sources/microphone.rs (4 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

**/*.rs: Format Rust code using rustfmt and ensure all Rust code passes workspace-level clippy lints.
Rust modules should be named with snake_case, and crate directories should be in kebab-case.

Files:

  • crates/recording/src/sources/microphone.rs
crates/*/src/**/*

📄 CodeRabbit inference engine (AGENTS.md)

Rust crates should place tests within the src/ and/or a sibling tests/ directory for each crate inside crates/*.

Files:

  • crates/recording/src/sources/microphone.rs
🧠 Learnings (1)
📚 Learning: 2025-10-28T08:39:42.230Z
Learnt from: Brendonovich
Repo: CapSoftware/Cap PR: 1305
File: crates/recording/src/output_pipeline/macos.rs:80-90
Timestamp: 2025-10-28T08:39:42.230Z
Learning: In `crates/recording/src/output_pipeline/macos.rs`, the `AVFoundationMp4Muxer` intentionally holds the `Mutex<MP4Encoder>` lock during retry attempts in `send_video_frame()` and `send_audio_frame()`. This blocking behavior is correct because frame processing must happen sequentially to prevent audio and video frames from being interleaved incorrectly in the encoder.

Applied to files:

  • crates/recording/src/sources/microphone.rs
🧬 Code graph analysis (1)
crates/recording/src/sources/microphone.rs (4)
crates/recording/src/sources/screen_capture/windows.rs (2)
  • audio_info (44-54)
  • audio_info (369-371)
crates/recording/src/output_pipeline/core.rs (7)
  • audio_info (740-742)
  • audio_info (826-826)
  • new (307-325)
  • new (675-677)
  • new (718-720)
  • new (751-753)
  • new (795-812)
crates/recording/src/feeds/microphone.rs (1)
  • audio_info (243-245)
crates/media-info/src/lib.rs (2)
  • new (29-45)
  • sample_size (121-123)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Clippy (aarch64-apple-darwin, macos-latest)
  • GitHub Check: Analyze (rust)
🔇 Additional comments (2)
crates/recording/src/sources/microphone.rs (2)

7-11: LGTM!

The constant declaration is clear and the additional imports are appropriate for the downmixing functionality.


31-62: LGTM!

The setup logic correctly computes source and target channel counts and applies downmixing in the audio processing pipeline. The use of Cow for packed data ensures zero-copy when no downmixing is needed.
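For readers less familiar with the pattern, the zero-copy property comes from Cow: the borrowed slice is passed through untouched and an owned Vec is only allocated when a conversion actually happens. A small, self-contained illustration (not the crate's code):

```rust
use std::borrow::Cow;

/// Returns borrowed bytes when no conversion is needed, owned bytes otherwise.
fn passthrough_or_convert(data: &[u8], convert: bool) -> Cow<'_, [u8]> {
    if convert {
        Cow::Owned(data.iter().map(|&b| b / 2).collect())
    } else {
        Cow::Borrowed(data) // zero-copy path
    }
}

fn main() {
    let raw = [10u8, 20, 30];
    // Either way, the caller just sees a byte slice via as_ref().
    assert_eq!(passthrough_or_convert(&raw, false).as_ref(), raw.as_slice());
    assert_eq!(passthrough_or_convert(&raw, true).as_ref(), [5u8, 10, 15].as_slice());
}
```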

@richiemcilroy merged commit 3bbf641 into main Nov 10, 2025
15 of 16 checks passed

coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
crates/recording/src/sources/microphone.rs (2)

44-56: Consider logging failures in the downmix-and-send loop.

The spawned task silently ignores send errors (lines 51-56). If the channel is full or closed, frames are dropped without any indication. Consider logging failed sends for observability, especially since this could indicate backpressure or pipeline issues.

```diff
                 let _ = audio_tx
                     .send(AudioFrame::new(
                         audio_info.wrap_frame(packed.as_ref()),
                         frame.timestamp,
                     ))
-                    .await;
+                    .await
+                    .inspect_err(|e| {
+                        eprintln!("Failed to send microphone audio frame: {e}");
+                    });
```

202-234: Consider expanding test coverage to other sample formats.

The current tests only cover SampleFormat::F32. Adding test cases for integer formats (e.g., I16, I32) would increase confidence in the downmix logic across different sample types.

Example test for I16:

```rust
#[test]
fn downmixes_stereo_i16_to_mono() {
    let frames = [(1000i16, -500i16), (2000i16, 2000i16)];
    let mut data = Vec::new();

    for (left, right) in frames {
        data.extend_from_slice(&left.to_ne_bytes());
        data.extend_from_slice(&right.to_ne_bytes());
    }

    let downmixed = maybe_downmix_channels(&data, SampleFormat::I16, 2, 1);
    let owned = downmixed.into_owned();
    assert_eq!(owned.len(), frames.len() * std::mem::size_of::<i16>());

    let first = i16::from_ne_bytes(owned[0..2].try_into().unwrap());
    let second = i16::from_ne_bytes(owned[2..4].try_into().unwrap());

    assert_eq!(first, 250);  // (1000 + -500) / 2
    assert_eq!(second, 2000); // (2000 + 2000) / 2
}
```
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4d6a3c8 and 9fe8c3d.

📒 Files selected for processing (1)
  • crates/recording/src/sources/microphone.rs (4 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

**/*.rs: Format Rust code using rustfmt and ensure all Rust code passes workspace-level clippy lints.
Rust modules should be named with snake_case, and crate directories should be in kebab-case.

Files:

  • crates/recording/src/sources/microphone.rs
crates/*/src/**/*

📄 CodeRabbit inference engine (AGENTS.md)

Rust crates should place tests within the src/ and/or a sibling tests/ directory for each crate inside crates/*.

Files:

  • crates/recording/src/sources/microphone.rs
🧠 Learnings (2)
📓 Common learnings
Learnt from: Brendonovich
Repo: CapSoftware/Cap PR: 1305
File: crates/recording/src/output_pipeline/macos.rs:80-90
Timestamp: 2025-10-28T08:39:42.230Z
Learning: In `crates/recording/src/output_pipeline/macos.rs`, the `AVFoundationMp4Muxer` intentionally holds the `Mutex<MP4Encoder>` lock during retry attempts in `send_video_frame()` and `send_audio_frame()`. This blocking behavior is correct because frame processing must happen sequentially to prevent audio and video frames from being interleaved incorrectly in the encoder.
📚 Learning: 2025-10-28T08:39:42.230Z
Learnt from: Brendonovich
Repo: CapSoftware/Cap PR: 1305
File: crates/recording/src/output_pipeline/macos.rs:80-90
Timestamp: 2025-10-28T08:39:42.230Z
Learning: In `crates/recording/src/output_pipeline/macos.rs`, the `AVFoundationMp4Muxer` intentionally holds the `Mutex<MP4Encoder>` lock during retry attempts in `send_video_frame()` and `send_audio_frame()`. This blocking behavior is correct because frame processing must happen sequentially to prevent audio and video frames from being interleaved incorrectly in the encoder.

Applied to files:

  • crates/recording/src/sources/microphone.rs
🧬 Code graph analysis (1)
crates/recording/src/sources/microphone.rs (3)
crates/recording/src/sources/screen_capture/windows.rs (2)
  • audio_info (44-54)
  • audio_info (369-371)
crates/recording/src/feeds/microphone.rs (1)
  • audio_info (243-245)
crates/media-info/src/lib.rs (2)
  • new (29-45)
  • sample_size (121-123)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Analyze (rust)
🔇 Additional comments (7)
crates/recording/src/sources/microphone.rs (7)

11-11: LGTM: Clear constant for target channel count.

The constant is well-named and correctly set to 1 for mono output.


31-34: LGTM: Proper channel computation.

The code correctly extracts source channel info and computes target channel info for downmixing.


93-111: LGTM: Solid downmix implementation with proper validation.

The function correctly validates frame alignment, pre-allocates the output buffer, and iterates through frames to compute the averaged mono samples.


113-123: LGTM: Correct sample size mapping.

All sample format sizes are accurate, and unsupported formats gracefully return None.


125-139: LGTM: Simple averaging for channel downmix.

The equal-weight averaging approach is appropriate for microphone audio. More sophisticated downmixing (e.g., weighted center/LFE channels) is typically only needed for surround sound sources.


141-171: LGTM: Comprehensive sample format reading.

The function correctly handles all supported sample formats using native endian byte order, which is appropriate for local audio processing.


173-200: LGTM: Safe sample writing with clamping.

The function properly rounds and clamps integer samples to prevent overflow, and consistently uses native endian byte order.

Comment on lines +82 to +91
```rust
    if target_channels == 1 {
        if let Some(samples) = downmix_to_mono(data, format, source_channels) {
            Cow::Owned(samples)
        } else {
            Cow::Borrowed(data)
        }
    } else {
        Cow::Borrowed(data)
    }
}
```

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Potential data/metadata mismatch when downmix fails.

If downmix_to_mono returns None (line 83), the code falls back to the original multi-channel data (line 86). However, the AudioInfo at line 53 indicates mono channels. This mismatch could cause encoder errors or corrupted audio downstream.

Consider either:

  1. Logging the failure and using a zeroed mono frame, or
  2. Propagating the error to halt the stream

Run the following to check how AudioInfo mismatches are handled downstream:


🏁 Script executed:

```shell
#!/bin/bash
# Search for how wrapped frames are processed in the encoder
rg -nP --type=rust -C3 'AudioFrame.*wrap_frame|fn.*wrap_frame'
```

Length of output: 1273

---

🏁 Script executed:

```shell
#!/bin/bash
# Get full implementation of wrap_frame_with_max_channels
sed -n '137,190p' crates/media-info/src/lib.rs
```

Length of output: 2253

---

🏁 Script executed:

```shell
#!/bin/bash
# Check how maybe_downmix_channels is called and if AudioInfo is updated
rg -nP --type=rust -C5 'maybe_downmix_channels|downmix_to_mono'
```

Length of output: 5057

---

🏁 Script executed:

```shell
#!/bin/bash
# Search for AudioInfo modification after downmix attempt
rg -nP --type=rust -C3 'AudioInfo.*target_channels|channels.*=' crates/recording/src/
```

Length of output: 2191

---

🏁 Script executed:

```shell
#!/bin/bash
# Check how frames are actually processed after wrap_frame
rg -nP --type=rust -C3 'wrap_frame.*packed|packed.*wrap_frame' crates/recording/src/sources/microphone.rs
```

Length of output: 277

---

🏁 Script executed:

```shell
#!/bin/bash
# Check what with_max_channels does
rg -nP --type=rust -B2 -A8 'fn with_max_channels'
```

Length of output: 580


Fix critical audio data/metadata mismatch when downmix fails.

When downmix_to_mono returns None (line 83), the code returns the original multi-channel data unchanged. However, audio_info at line 32 already has channels set to the target (mono via with_max_channels). This causes wrap_frame at line 53 to interpret multi-channel data with mono metadata:

  • wrap_frame_with_max_channels line 147 calculates packed_sample_size = sample_size * self.channels (1)
  • But the actual data has stereo stride (sample_size * 2)
  • This misalignment corrupts frame count and chunk reading, causing silent audio data corruption

Required fix (lines 82–91):

Either propagate the downmix failure to halt the stream, or create a valid mono frame from the source data (e.g., downsampled or silence). Do not return misaligned data.

🤖 Prompt for AI Agents
In crates/recording/src/sources/microphone.rs around lines 82 to 91, when
downmix_to_mono(data, format, source_channels) returns None the code currently
returns the original multi‑channel data while audio_info (line 32) already
reports mono, causing a data/metadata mismatch and corrupted frames; fix by not
returning misaligned data — either propagate the failure (return an Err or None
so the stream terminates upstream) or synthesize a valid mono buffer (for
example generate a mono buffer by averaging/combining the source channels into a
newly owned Vec<u8> or, if combining is impossible, fill a zeroed mono buffer of
the correct length and sample size) and return Cow::Owned of that buffer so
audio_info channels and the data layout always match.
