
Conversation

@richiemcilroy (Member) commented Nov 10, 2025

Introduces channel downmixing support for microphone audio sources, ensuring that audio frames are converted to mono when required. The main changes add logic to downmix multi-channel audio to mono and update how audio frames are processed.

Summary by CodeRabbit

  • New Features

    • Microphone source now handles multi-channel audio downmixing, automatically converting stereo to mono when needed.
  • Bug Fixes / Improvements

    • Audio mixer delivery improved: retries when outbound buffer is full, preserves timestamps, and reuses frames to ensure consistent output timing and accurate sample accounting.
  • Tests

    • Added tests validating microphone channel downmixing behavior.


coderabbitai bot commented Nov 10, 2025

Walkthrough

Adds per-frame downmix-to-mono logic in the microphone source (supporting multiple SampleFormat variants) and introduces a private feed-lock field on Microphone. Also changes the audio mixer to build the outgoing frame once and reuse it across send attempts, retrying with a brief sleep while the channel is full.
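To make the downmix step concrete, here is a minimal sketch of equal-weight averaging of interleaved stereo f32 samples into mono. It is illustrative only; the actual source operates on raw bytes and supports several SampleFormat variants, not just f32 slices.

```rust
/// Minimal sketch: average interleaved stereo f32 samples into one mono sample per frame.
/// Illustrative only; the real code works on raw bytes across U8/I16/I32/I64/F32/F64.
fn downmix_stereo_f32_to_mono(interleaved: &[f32]) -> Vec<f32> {
    interleaved
        .chunks_exact(2)
        .map(|frame| (frame[0] + frame[1]) / 2.0)
        .collect()
}

fn main() {
    let stereo = [0.5f32, -0.5, 1.0, 1.0];
    assert_eq!(downmix_stereo_f32_to_mono(&stereo), vec![0.0f32, 1.0]);
}
```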

Changes

  • Microphone: downmix, sample handling, struct update (crates/recording/src/sources/microphone.rs)
    Adds maybe_downmix_channels, downmix_to_mono, and helpers (average_frame_sample, sample_to_f64, write_sample_from_f64) to conditionally downmix multi-channel frames to mono across SampleFormat variants (U8, I16, I32, I64, F32, F64). Reworks frame wrapping to use the possibly-downmixed packed data, computes AudioInfo with the MAX_CHANNELS target, adds a private _lock: Arc<MicrophoneFeedLock> field on Microphone, and adds tests for downmix behavior.
  • Audio mixer: send retry loop and frame reuse (crates/recording/src/sources/audio_mixer.rs)
    Changes the amix duration mode to longest, constructs the AudioFrame once and reuses it across send attempts, replaces the single try_send with a retry loop that sleeps briefly when the channel is full (retrying until sent or errored), updates samples_out from the actual frame length, and preserves timestamp alignment via the frame origin.

Sequence Diagram(s)

```mermaid
sequenceDiagram
  autonumber
  participant Mic as Microphone
  participant Down as Downmixer
  participant Mixer as AudioMixer
  participant Ch as OutputChannel

  Mic->>Down: captured multi-channel frame
  Down-->>Mic: maybe_downmix_channels -> borrowed or owned mono/packed frame
  Mic->>Mixer: push AudioFrame (with packed data & AudioInfo)
  Note right of Mixer: Build frame once, reuse across retries
  Mixer->>Ch: try_send(frame)
  alt channel full
    Ch--xMixer: Full
    Mixer->>Mixer: sleep(short)
    Mixer->>Ch: retry try_send(frame) (loop)
  else other error
    Ch--xMixer: Error -> Mixer returns Err
  end
  Ch-->>Mixer: Ack (sent)
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Areas to focus:
    • crates/recording/src/sources/microphone.rs: correctness of sample conversions, clamping/rounding behavior, memory ownership when returning borrowed vs owned buffers, and test coverage for edge sample values and formats.
    • crates/recording/src/sources/audio_mixer.rs: correctness of retry/backpressure logic, sleep duration impacts, frame timestamp/origin preservation, and possible performance/latency implications.
    • Integration: ensure public struct change (_lock field) doesn't break external construction/usage or serialization.

Possibly related PRs

Poem

🐇
I nibble frames by moonlit code,
Stereo threads into a single road,
Each sample averaged, rounded, neat,
A tiny lock keeps rhythm sweet,
Hoppity hop — the stream’s complete! 🎧

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 25.00%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Description Check ✅ Passed: Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: The title accurately describes the main change: adding channel downmixing for microphone audio frames, which is the primary focus of the modifications in microphone.rs.


coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (4)
crates/recording/src/sources/microphone.rs (4)

72-91: Consider adding doc comments.

The function logic is sound with proper validation and fallback handling. However, adding doc comments would improve maintainability by documenting the parameters, return value, and behavior.

Example:

```rust
/// Downmixes audio data from multiple channels to fewer channels.
/// Returns borrowed data if no downmixing is needed, or owned data if downmixed.
///
/// # Arguments
/// * `data` - Raw audio sample bytes
/// * `format` - Sample format (U8, I16, F32, etc.)
/// * `source_channels` - Number of channels in source data
/// * `target_channels` - Desired number of output channels
fn maybe_downmix_channels<'a>(
```

93-111: LGTM! Consider adding doc comments.

The implementation is robust with proper overflow protection (checked_mul) and validation. The frame alignment check ensures data integrity. Adding doc comments would document the algorithm and edge cases.


113-200: LGTM! Consider adding doc comments.

The helper functions correctly handle sample format conversions with appropriate clamping for integer formats and rounding. The use of from_ne_bytes / to_ne_bytes is appropriate for in-memory audio processing. Adding doc comments would clarify the purpose and expected inputs for each function.
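As an illustration of that clamping-and-rounding behaviour (names here are hypothetical, not the crate's actual helpers), writing an f64 sample back out as i16 might look like:

```rust
/// Illustrative only: round an f64 sample and clamp it into i16 range before
/// writing native-endian bytes, mirroring the behavior described above.
fn write_i16_sample_from_f64(value: f64, out: &mut Vec<u8>) {
    let clamped = value.round().clamp(i16::MIN as f64, i16::MAX as f64) as i16;
    out.extend_from_slice(&clamped.to_ne_bytes());
}

fn main() {
    let mut buf = Vec::new();
    write_i16_sample_from_f64(40_000.0, &mut buf); // out of range, clamps to i16::MAX
    assert_eq!(i16::from_ne_bytes([buf[0], buf[1]]), i16::MAX);
}
```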


202-234: LGTM! Consider expanding test coverage.

The tests validate core functionality: stereo-to-mono downmixing with correct averaging and zero-copy optimization for mono input. Consider adding tests for:

  • Other sample formats (I16, I32, I64, U8)
  • Edge cases (empty data, unsupported format)
  • Multiple source channels (3+); one possible shape for such a test is sketched below
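For the multi-channel case, assuming the maybe_downmix_channels signature and equal-weight averaging shown elsewhere in this review, a test sketch could look like:

```rust
#[test]
fn downmixes_three_channel_f32_to_mono() {
    // Two frames of three interleaved f32 channels each.
    let frames = [(0.3f32, 0.6f32, 0.9f32), (1.0f32, 1.0f32, 1.0f32)];
    let mut data = Vec::new();

    for (a, b, c) in frames {
        data.extend_from_slice(&a.to_ne_bytes());
        data.extend_from_slice(&b.to_ne_bytes());
        data.extend_from_slice(&c.to_ne_bytes());
    }

    let downmixed = maybe_downmix_channels(&data, SampleFormat::F32, 3, 1);
    let owned = downmixed.into_owned();
    assert_eq!(owned.len(), frames.len() * std::mem::size_of::<f32>());

    let first = f32::from_ne_bytes(owned[0..4].try_into().unwrap());
    let second = f32::from_ne_bytes(owned[4..8].try_into().unwrap());

    assert!((first - 0.6).abs() < 1e-6);  // (0.3 + 0.6 + 0.9) / 3
    assert!((second - 1.0).abs() < 1e-6); // (1.0 + 1.0 + 1.0) / 3
}
```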
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 25f0bfb and e4cca52.

📒 Files selected for processing (1)
  • crates/recording/src/sources/microphone.rs (4 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

**/*.rs: Format Rust code using rustfmt and ensure all Rust code passes workspace-level clippy lints.
Rust modules should be named with snake_case, and crate directories should be in kebab-case.

Files:

  • crates/recording/src/sources/microphone.rs
crates/*/src/**/*

📄 CodeRabbit inference engine (AGENTS.md)

Rust crates should place tests within the src/ and/or a sibling tests/ directory for each crate inside crates/*.

Files:

  • crates/recording/src/sources/microphone.rs
🧠 Learnings (1)
📚 Learning: 2025-10-28T08:39:42.230Z
Learnt from: Brendonovich
Repo: CapSoftware/Cap PR: 1305
File: crates/recording/src/output_pipeline/macos.rs:80-90
Timestamp: 2025-10-28T08:39:42.230Z
Learning: In `crates/recording/src/output_pipeline/macos.rs`, the `AVFoundationMp4Muxer` intentionally holds the `Mutex<MP4Encoder>` lock during retry attempts in `send_video_frame()` and `send_audio_frame()`. This blocking behavior is correct because frame processing must happen sequentially to prevent audio and video frames from being interleaved incorrectly in the encoder.

Applied to files:

  • crates/recording/src/sources/microphone.rs
🧬 Code graph analysis (1)
crates/recording/src/sources/microphone.rs (4)
crates/recording/src/sources/screen_capture/windows.rs (2)
  • audio_info (44-54)
  • audio_info (369-371)
crates/recording/src/output_pipeline/core.rs (7)
  • audio_info (740-742)
  • audio_info (826-826)
  • new (307-325)
  • new (675-677)
  • new (718-720)
  • new (751-753)
  • new (795-812)
crates/recording/src/feeds/microphone.rs (1)
  • audio_info (243-245)
crates/media-info/src/lib.rs (2)
  • new (29-45)
  • sample_size (121-123)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Clippy (aarch64-apple-darwin, macos-latest)
  • GitHub Check: Analyze (rust)
🔇 Additional comments (2)
crates/recording/src/sources/microphone.rs (2)

7-11: LGTM!

The constant declaration is clear and the additional imports are appropriate for the downmixing functionality.


31-62: LGTM!

The setup logic correctly computes source and target channel counts and applies downmixing in the audio processing pipeline. The use of Cow for packed data ensures zero-copy when no downmixing is needed.
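For readers less familiar with the pattern, the zero-copy property comes from Cow: the borrowed slice is passed through untouched and an owned Vec is only allocated when a conversion actually happens. A small, self-contained illustration (not the crate's code):

```rust
use std::borrow::Cow;

/// Returns borrowed bytes when no conversion is needed, owned bytes otherwise.
fn passthrough_or_convert(data: &[u8], convert: bool) -> Cow<'_, [u8]> {
    if convert {
        Cow::Owned(data.iter().map(|&b| b / 2).collect())
    } else {
        Cow::Borrowed(data) // zero-copy path
    }
}

fn main() {
    let raw = [10u8, 20, 30];
    // Either way, the caller just sees a byte slice via as_ref().
    assert_eq!(passthrough_or_convert(&raw, false).as_ref(), raw.as_slice());
    assert_eq!(passthrough_or_convert(&raw, true).as_ref(), [5u8, 10, 15].as_slice());
}
```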

@richiemcilroy merged commit 3bbf641 into main Nov 10, 2025
15 of 16 checks passed

coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
crates/recording/src/sources/microphone.rs (2)

44-56: Consider logging failures in the downmix-and-send loop.

The spawned task silently ignores send errors (lines 51-56). If the channel is full or closed, frames are dropped without any indication. Consider logging failed sends for observability, especially since this could indicate backpressure or pipeline issues.

```diff
                 let _ = audio_tx
                     .send(AudioFrame::new(
                         audio_info.wrap_frame(packed.as_ref()),
                         frame.timestamp,
                     ))
-                    .await;
+                    .await
+                    .inspect_err(|e| {
+                        eprintln!("Failed to send microphone audio frame: {e}");
+                    });
```

202-234: Consider expanding test coverage to other sample formats.

The current tests only cover SampleFormat::F32. Adding test cases for integer formats (e.g., I16, I32) would increase confidence in the downmix logic across different sample types.

Example test for I16:

```rust
#[test]
fn downmixes_stereo_i16_to_mono() {
    let frames = [(1000i16, -500i16), (2000i16, 2000i16)];
    let mut data = Vec::new();

    for (left, right) in frames {
        data.extend_from_slice(&left.to_ne_bytes());
        data.extend_from_slice(&right.to_ne_bytes());
    }

    let downmixed = maybe_downmix_channels(&data, SampleFormat::I16, 2, 1);
    let owned = downmixed.into_owned();
    assert_eq!(owned.len(), frames.len() * std::mem::size_of::<i16>());

    let first = i16::from_ne_bytes(owned[0..2].try_into().unwrap());
    let second = i16::from_ne_bytes(owned[2..4].try_into().unwrap());

    assert_eq!(first, 250);  // (1000 + -500) / 2
    assert_eq!(second, 2000); // (2000 + 2000) / 2
}
```
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4d6a3c8 and 9fe8c3d.

📒 Files selected for processing (1)
  • crates/recording/src/sources/microphone.rs (4 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

**/*.rs: Format Rust code using rustfmt and ensure all Rust code passes workspace-level clippy lints.
Rust modules should be named with snake_case, and crate directories should be in kebab-case.

Files:

  • crates/recording/src/sources/microphone.rs
crates/*/src/**/*

📄 CodeRabbit inference engine (AGENTS.md)

Rust crates should place tests within the src/ and/or a sibling tests/ directory for each crate inside crates/*.

Files:

  • crates/recording/src/sources/microphone.rs
🧠 Learnings (2)
📓 Common learnings
Learnt from: Brendonovich
Repo: CapSoftware/Cap PR: 1305
File: crates/recording/src/output_pipeline/macos.rs:80-90
Timestamp: 2025-10-28T08:39:42.230Z
Learning: In `crates/recording/src/output_pipeline/macos.rs`, the `AVFoundationMp4Muxer` intentionally holds the `Mutex<MP4Encoder>` lock during retry attempts in `send_video_frame()` and `send_audio_frame()`. This blocking behavior is correct because frame processing must happen sequentially to prevent audio and video frames from being interleaved incorrectly in the encoder.
📚 Learning: 2025-10-28T08:39:42.230Z
Learnt from: Brendonovich
Repo: CapSoftware/Cap PR: 1305
File: crates/recording/src/output_pipeline/macos.rs:80-90
Timestamp: 2025-10-28T08:39:42.230Z
Learning: In `crates/recording/src/output_pipeline/macos.rs`, the `AVFoundationMp4Muxer` intentionally holds the `Mutex<MP4Encoder>` lock during retry attempts in `send_video_frame()` and `send_audio_frame()`. This blocking behavior is correct because frame processing must happen sequentially to prevent audio and video frames from being interleaved incorrectly in the encoder.

Applied to files:

  • crates/recording/src/sources/microphone.rs
🧬 Code graph analysis (1)
crates/recording/src/sources/microphone.rs (3)
crates/recording/src/sources/screen_capture/windows.rs (2)
  • audio_info (44-54)
  • audio_info (369-371)
crates/recording/src/feeds/microphone.rs (1)
  • audio_info (243-245)
crates/media-info/src/lib.rs (2)
  • new (29-45)
  • sample_size (121-123)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Analyze (rust)
🔇 Additional comments (7)
crates/recording/src/sources/microphone.rs (7)

11-11: LGTM: Clear constant for target channel count.

The constant is well-named and correctly set to 1 for mono output.


31-34: LGTM: Proper channel computation.

The code correctly extracts source channel info and computes target channel info for downmixing.


93-111: LGTM: Solid downmix implementation with proper validation.

The function correctly validates frame alignment, pre-allocates the output buffer, and iterates through frames to compute the averaged mono samples.


113-123: LGTM: Correct sample size mapping.

All sample format sizes are accurate, and unsupported formats gracefully return None.


125-139: LGTM: Simple averaging for channel downmix.

The equal-weight averaging approach is appropriate for microphone audio. More sophisticated downmixing (e.g., weighted center/LFE channels) is typically only needed for surround sound sources.


141-171: LGTM: Comprehensive sample format reading.

The function correctly handles all supported sample formats using native endian byte order, which is appropriate for local audio processing.


173-200: LGTM: Safe sample writing with clamping.

The function properly rounds and clamps integer samples to prevent overflow, and consistently uses native endian byte order.

Comment on lines +82 to +91
```rust
    if target_channels == 1 {
        if let Some(samples) = downmix_to_mono(data, format, source_channels) {
            Cow::Owned(samples)
        } else {
            Cow::Borrowed(data)
        }
    } else {
        Cow::Borrowed(data)
    }
}
```

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Potential data/metadata mismatch when downmix fails.

If downmix_to_mono returns None (line 83), the code falls back to the original multi-channel data (line 86). However, the AudioInfo at line 53 indicates mono channels. This mismatch could cause encoder errors or corrupted audio downstream.

Consider either:

  1. Logging the failure and using a zeroed mono frame, or
  2. Propagating the error to halt the stream

Run the following to check how AudioInfo mismatches are handled downstream:


🏁 Script executed:

```shell
#!/bin/bash
# Search for how wrapped frames are processed in the encoder
rg -nP --type=rust -C3 'AudioFrame.*wrap_frame|fn.*wrap_frame'
```

Length of output: 1273

---

🏁 Script executed:

```shell
#!/bin/bash
# Get full implementation of wrap_frame_with_max_channels
sed -n '137,190p' crates/media-info/src/lib.rs
```

Length of output: 2253

---

🏁 Script executed:

```shell
#!/bin/bash
# Check how maybe_downmix_channels is called and if AudioInfo is updated
rg -nP --type=rust -C5 'maybe_downmix_channels|downmix_to_mono'
```

Length of output: 5057

---

🏁 Script executed:

```shell
#!/bin/bash
# Search for AudioInfo modification after downmix attempt
rg -nP --type=rust -C3 'AudioInfo.*target_channels|channels.*=' crates/recording/src/
```

Length of output: 2191

---

🏁 Script executed:

```shell
#!/bin/bash
# Check how frames are actually processed after wrap_frame
rg -nP --type=rust -C3 'wrap_frame.*packed|packed.*wrap_frame' crates/recording/src/sources/microphone.rs
```

Length of output: 277

---

🏁 Script executed:

```shell
#!/bin/bash
# Check what with_max_channels does
rg -nP --type=rust -B2 -A8 'fn with_max_channels'
```

Length of output: 580


Fix critical audio data/metadata mismatch when downmix fails.

When downmix_to_mono returns None (line 83), the code returns the original multi-channel data unchanged. However, audio_info at line 32 already has channels set to the target (mono via with_max_channels). This causes wrap_frame at line 53 to interpret multi-channel data with mono metadata:

  • wrap_frame_with_max_channels line 147 calculates packed_sample_size = sample_size * self.channels (1)
  • But the actual data has stereo stride (sample_size * 2)
  • This misalignment corrupts frame count and chunk reading, causing silent audio data corruption

Required fix (lines 82–91):

Either propagate the downmix failure to halt the stream, or create a valid mono frame from the source data (e.g., downsampled or silence). Do not return misaligned data.

🤖 Prompt for AI Agents
In crates/recording/src/sources/microphone.rs around lines 82 to 91, when
downmix_to_mono(data, format, source_channels) returns None the code currently
returns the original multi‑channel data while audio_info (line 32) already
reports mono, causing a data/metadata mismatch and corrupted frames; fix by not
returning misaligned data — either propagate the failure (return an Err or None
so the stream terminates upstream) or synthesize a valid mono buffer (for
example generate a mono buffer by averaging/combining the source channels into a
newly owned Vec<u8> or, if combining is impossible, fill a zeroed mono buffer of
the correct length and sample size) and return Cow::Owned of that buffer so
audio_info channels and the data layout always match.
