feat: Add channel downmixing for microphone audio frames #1373
Conversation
Walkthrough
Adds per-frame downmix-to-mono logic in the microphone source (supports multiple `SampleFormat` variants) and introduces a private feed lock field on …
Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant Mic as Microphone
    participant Down as Downmixer
    participant Mixer as AudioMixer
    participant Ch as OutputChannel

    Mic->>Down: captured multi-channel frame
    Down-->>Mic: maybe_downmix_channels -> borrowed or owned mono/packed frame
    Mic->>Mixer: push AudioFrame (with packed data & AudioInfo)
    Note right of Mixer: Build frame once<br/>reuse across retries
    Mixer->>Ch: try_send(frame)
    alt channel full
        Ch--xMixer: Full
        Mixer->>Mixer: sleep(short)
        Mixer->>Ch: retry try_send(frame) (loop)
    else other error
        Ch--xMixer: Error -> Mixer returns Err
    end
    Ch-->>Mixer: Ack (sent)
```
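The retry behaviour in the diagram corresponds to a try-send/back-off pattern. A minimal synchronous sketch using std's bounded channel is shown below; the real pipeline is async and uses its own channel and error types, so the names here are illustrative only.

```rust
use std::sync::mpsc::{sync_channel, SyncSender, TrySendError};
use std::thread;
use std::time::Duration;

/// Push one value into a bounded channel, retrying briefly while it is full.
/// Returns Err(()) once the receiver has gone away.
fn send_with_retry<T>(tx: &SyncSender<T>, mut frame: T) -> Result<(), ()> {
    loop {
        match tx.try_send(frame) {
            Ok(()) => return Ok(()),
            // Channel full: back off briefly and retry with the same frame.
            Err(TrySendError::Full(returned)) => {
                frame = returned;
                thread::sleep(Duration::from_millis(1));
            }
            // Receiver dropped: give up.
            Err(TrySendError::Disconnected(_)) => return Err(()),
        }
    }
}

fn main() {
    let (tx, rx) = sync_channel::<u32>(1);
    send_with_retry(&tx, 42).unwrap();
    assert_eq!(rx.recv().unwrap(), 42);
}
```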
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks and finishing touches: ❌ Failed checks (1 warning) · ✅ Passed checks (2 passed)
Actionable comments posted: 0
🧹 Nitpick comments (4)
crates/recording/src/sources/microphone.rs (4)
72-91: Consider adding doc comments. The function logic is sound, with proper validation and fallback handling. However, adding doc comments would improve maintainability by documenting the parameters, return value, and behavior.
Example:
```rust
/// Downmixes audio data from multiple channels to fewer channels.
/// Returns borrowed data if no downmixing is needed, or owned data if downmixed.
///
/// # Arguments
/// * `data` - Raw audio sample bytes
/// * `format` - Sample format (U8, I16, F32, etc.)
/// * `source_channels` - Number of channels in source data
/// * `target_channels` - Desired number of output channels
fn maybe_downmix_channels<'a>(
```
93-111: LGTM! Consider adding doc comments. The implementation is robust, with proper overflow protection (`checked_mul`) and validation. The frame alignment check ensures data integrity. Adding doc comments would document the algorithm and edge cases.
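To illustrate the shape being reviewed, here is a hedged sketch of such a routine for a single format. It assumes interleaved f32 samples only, whereas the PR dispatches on `SampleFormat`, so treat it as a pattern rather than the actual implementation.

```rust
/// Averages interleaved f32 frames down to a single mono channel.
/// Returns None when the channel count is zero, the stride computation would
/// overflow, or the byte length is not a whole number of frames.
fn downmix_f32_to_mono(data: &[u8], source_channels: usize) -> Option<Vec<u8>> {
    const SAMPLE_SIZE: usize = std::mem::size_of::<f32>();
    if source_channels == 0 {
        return None;
    }
    // Overflow-checked frame stride, mirroring the checked_mul guard described above.
    let frame_stride = SAMPLE_SIZE.checked_mul(source_channels)?;
    // Frame-alignment check: reject buffers that are not whole frames.
    if data.len() % frame_stride != 0 {
        return None;
    }
    let mut out = Vec::with_capacity((data.len() / frame_stride) * SAMPLE_SIZE);
    for frame in data.chunks_exact(frame_stride) {
        // Average every channel's sample within this frame.
        let sum: f32 = frame
            .chunks_exact(SAMPLE_SIZE)
            .map(|bytes| f32::from_ne_bytes(bytes.try_into().unwrap()))
            .sum();
        out.extend_from_slice(&(sum / source_channels as f32).to_ne_bytes());
    }
    Some(out)
}
```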
113-200: LGTM! Consider adding doc comments. The helper functions correctly handle sample format conversions, with appropriate rounding and clamping for the integer formats. The use of `from_ne_bytes`/`to_ne_bytes` is appropriate for in-memory audio processing. Adding doc comments would clarify the purpose and expected inputs for each function.
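For context, the read/write pattern under discussion looks roughly like the pair of helpers below. This is a sketch for `i16` only, with made-up helper names; the PR's versions cover every supported `SampleFormat`.

```rust
/// Reads one i16 sample from native-endian bytes, widening to f64 for mixing.
fn read_i16_sample(bytes: &[u8]) -> Option<f64> {
    let raw: [u8; 2] = bytes.get(..2)?.try_into().ok()?;
    Some(i16::from_ne_bytes(raw) as f64)
}

/// Writes a mixed f64 value back as an i16 sample, rounding and clamping so an
/// out-of-range average can never overflow the integer type.
fn write_i16_sample(value: f64, out: &mut Vec<u8>) {
    let clamped = value.round().clamp(i16::MIN as f64, i16::MAX as f64) as i16;
    out.extend_from_slice(&clamped.to_ne_bytes());
}
```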
202-234: LGTM! Consider expanding test coverage. The tests validate core functionality: stereo-to-mono downmixing with correct averaging, and zero-copy optimization for mono input. Consider adding tests for the following (a hedged sketch of the multi-channel case follows this list):
- Other sample formats (I16, I32, I64, U8)
- Edge cases (empty data, unsupported format)
- Multiple source channels (3+)
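For instance, the multi-channel case could be covered with something like the sketch below, which assumes the `maybe_downmix_channels` signature shown elsewhere in this review:

```rust
#[test]
fn downmixes_three_channel_f32_to_mono() {
    // One frame of three channels whose average is 2.0.
    let samples = [1.0f32, 2.0, 3.0];
    let mut data = Vec::new();
    for sample in samples {
        data.extend_from_slice(&sample.to_ne_bytes());
    }

    let downmixed = maybe_downmix_channels(&data, SampleFormat::F32, 3, 1);
    let owned = downmixed.into_owned();

    assert_eq!(owned.len(), std::mem::size_of::<f32>());
    let mono = f32::from_ne_bytes(owned[0..4].try_into().unwrap());
    assert!((mono - 2.0).abs() < f32::EPSILON);
}
```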
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
`crates/recording/src/sources/microphone.rs` (4 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.rs
📄 CodeRabbit inference engine (AGENTS.md)
**/*.rs: Format Rust code using `rustfmt` and ensure all Rust code passes workspace-level clippy lints.
Rust modules should be named with snake_case, and crate directories should be in kebab-case.
Files:
crates/recording/src/sources/microphone.rs
crates/*/src/**/*
📄 CodeRabbit inference engine (AGENTS.md)
Rust crates should place tests within the `src/` and/or a sibling `tests/` directory for each crate inside `crates/*`.
Files:
crates/recording/src/sources/microphone.rs
🧠 Learnings (1)
📚 Learning: 2025-10-28T08:39:42.230Z
Learnt from: Brendonovich
Repo: CapSoftware/Cap PR: 1305
File: crates/recording/src/output_pipeline/macos.rs:80-90
Timestamp: 2025-10-28T08:39:42.230Z
Learning: In `crates/recording/src/output_pipeline/macos.rs`, the `AVFoundationMp4Muxer` intentionally holds the `Mutex<MP4Encoder>` lock during retry attempts in `send_video_frame()` and `send_audio_frame()`. This blocking behavior is correct because frame processing must happen sequentially to prevent audio and video frames from being interleaved incorrectly in the encoder.
Applied to files:
crates/recording/src/sources/microphone.rs
🧬 Code graph analysis (1)
crates/recording/src/sources/microphone.rs (4)
crates/recording/src/sources/screen_capture/windows.rs (2)
audio_info (44-54), audio_info (369-371)
crates/recording/src/output_pipeline/core.rs (7)
audio_info (740-742), audio_info (826-826), new (307-325), new (675-677), new (718-720), new (751-753), new (795-812)
crates/recording/src/feeds/microphone.rs (1)
audio_info (243-245)
crates/media-info/src/lib.rs (2)
new (29-45), sample_size (121-123)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Clippy (aarch64-apple-darwin, macos-latest)
- GitHub Check: Analyze (rust)
🔇 Additional comments (2)
crates/recording/src/sources/microphone.rs (2)
7-11: LGTM! The constant declaration is clear, and the additional imports are appropriate for the downmixing functionality.
31-62: LGTM! The setup logic correctly computes source and target channel counts and applies downmixing in the audio processing pipeline. The use of `Cow` for the `packed` data ensures zero-copy behavior when no downmixing is needed.
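As a reminder of why that matters, the `Cow` pattern keeps the common mono path allocation-free. A tiny self-contained sketch of the borrow-vs-own split (not the PR's code):

```rust
use std::borrow::Cow;

// Borrow when no work is needed; allocate only when processing actually happens.
fn maybe_process(data: &[u8], needs_downmix: bool) -> Cow<'_, [u8]> {
    if needs_downmix {
        Cow::Owned(data.iter().map(|byte| byte / 2).collect())
    } else {
        Cow::Borrowed(data)
    }
}

fn main() {
    let data = [4u8, 8, 12];
    // Fast path: the slice is passed through without copying.
    assert!(matches!(maybe_process(&data, false), Cow::Borrowed(_)));
    // Slow path: a new Vec is produced only when required.
    assert!(matches!(maybe_process(&data, true), Cow::Owned(_)));
}
```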
---
Actionable comments posted: 1
🧹 Nitpick comments (2)
crates/recording/src/sources/microphone.rs (2)
44-56: Consider logging failures in the downmix-and-send loop. The spawned task silently ignores send errors (lines 51-56). If the channel is full or closed, frames are dropped without any indication. Consider logging failed sends for observability, especially since this could indicate backpressure or pipeline issues.
```diff
 let _ = audio_tx
     .send(AudioFrame::new(
         audio_info.wrap_frame(packed.as_ref()),
         frame.timestamp,
     ))
-    .await;
+    .await
+    .inspect_err(|e| {
+        eprintln!("Failed to send microphone audio frame: {e}");
+    });
```
202-234: Consider expanding test coverage to other sample formats. The current tests only cover `SampleFormat::F32`. Adding test cases for integer formats (e.g., `I16`, `I32`) would increase confidence in the downmix logic across different sample types. Example test for I16:
```rust
#[test]
fn downmixes_stereo_i16_to_mono() {
    let frames = [(1000i16, -500i16), (2000i16, 2000i16)];
    let mut data = Vec::new();
    for (left, right) in frames {
        data.extend_from_slice(&left.to_ne_bytes());
        data.extend_from_slice(&right.to_ne_bytes());
    }

    let downmixed = maybe_downmix_channels(&data, SampleFormat::I16, 2, 1);
    let owned = downmixed.into_owned();

    assert_eq!(owned.len(), frames.len() * std::mem::size_of::<i16>());
    let first = i16::from_ne_bytes(owned[0..2].try_into().unwrap());
    let second = i16::from_ne_bytes(owned[2..4].try_into().unwrap());
    assert_eq!(first, 250); // (1000 + -500) / 2
    assert_eq!(second, 2000); // (2000 + 2000) / 2
}
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
`crates/recording/src/sources/microphone.rs` (4 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.rs
📄 CodeRabbit inference engine (AGENTS.md)
**/*.rs: Format Rust code using `rustfmt` and ensure all Rust code passes workspace-level clippy lints.
Rust modules should be named with snake_case, and crate directories should be in kebab-case.
Files:
crates/recording/src/sources/microphone.rs
crates/*/src/**/*
📄 CodeRabbit inference engine (AGENTS.md)
Rust crates should place tests within the `src/` and/or a sibling `tests/` directory for each crate inside `crates/*`.
Files:
crates/recording/src/sources/microphone.rs
🧠 Learnings (2)
📓 Common learnings
Learnt from: Brendonovich
Repo: CapSoftware/Cap PR: 1305
File: crates/recording/src/output_pipeline/macos.rs:80-90
Timestamp: 2025-10-28T08:39:42.230Z
Learning: In `crates/recording/src/output_pipeline/macos.rs`, the `AVFoundationMp4Muxer` intentionally holds the `Mutex<MP4Encoder>` lock during retry attempts in `send_video_frame()` and `send_audio_frame()`. This blocking behavior is correct because frame processing must happen sequentially to prevent audio and video frames from being interleaved incorrectly in the encoder.
📚 Learning: 2025-10-28T08:39:42.230Z
Learnt from: Brendonovich
Repo: CapSoftware/Cap PR: 1305
File: crates/recording/src/output_pipeline/macos.rs:80-90
Timestamp: 2025-10-28T08:39:42.230Z
Learning: In `crates/recording/src/output_pipeline/macos.rs`, the `AVFoundationMp4Muxer` intentionally holds the `Mutex<MP4Encoder>` lock during retry attempts in `send_video_frame()` and `send_audio_frame()`. This blocking behavior is correct because frame processing must happen sequentially to prevent audio and video frames from being interleaved incorrectly in the encoder.
Applied to files:
crates/recording/src/sources/microphone.rs
🧬 Code graph analysis (1)
crates/recording/src/sources/microphone.rs (3)
crates/recording/src/sources/screen_capture/windows.rs (2)
audio_info (44-54), audio_info (369-371)
crates/recording/src/feeds/microphone.rs (1)
audio_info (243-245)
crates/media-info/src/lib.rs (2)
new (29-45), sample_size (121-123)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Analyze (rust)
🔇 Additional comments (7)
crates/recording/src/sources/microphone.rs (7)
11-11: LGTM: Clear constant for target channel count. The constant is well-named and correctly set to 1 for mono output.
31-34: LGTM: Proper channel computation. The code correctly extracts source channel info and computes target channel info for downmixing.
93-111: LGTM: Solid downmix implementation with proper validation. The function correctly validates frame alignment, pre-allocates the output buffer, and iterates through frames to compute the averaged mono samples.
113-123: LGTM: Correct sample size mapping. All sample format sizes are accurate, and unsupported formats gracefully return `None`.
125-139: LGTM: Simple averaging for channel downmix. The equal-weight averaging approach is appropriate for microphone audio. More sophisticated downmixing (e.g., weighted center/LFE channels) is typically only needed for surround sound sources.
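For comparison, equal-weight averaging is just the per-frame arithmetic mean, whereas a surround-style downmix would scale channels before summing. A small sketch of the two (illustrative only, not from the PR):

```rust
/// Equal-weight downmix: every channel contributes the same amount.
fn mix_equal(frame: &[f64]) -> f64 {
    frame.iter().sum::<f64>() / frame.len().max(1) as f64
}

/// Weighted downmix, e.g. attenuating surround or LFE channels before summing;
/// only worth the complexity for multichannel layouts, not a stereo microphone.
fn mix_weighted(frame: &[f64], weights: &[f64]) -> f64 {
    let weighted: f64 = frame.iter().zip(weights).map(|(s, w)| s * w).sum();
    weighted / weights.iter().sum::<f64>().max(f64::EPSILON)
}
```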
141-171: LGTM: Comprehensive sample format reading. The function correctly handles all supported sample formats using native-endian byte order, which is appropriate for local audio processing.
173-200: LGTM: Safe sample writing with clamping. The function properly rounds and clamps integer samples to prevent overflow, and consistently uses native-endian byte order.
```rust
    if target_channels == 1 {
        if let Some(samples) = downmix_to_mono(data, format, source_channels) {
            Cow::Owned(samples)
        } else {
            Cow::Borrowed(data)
        }
    } else {
        Cow::Borrowed(data)
    }
}
```
🧩 Analysis chain
Potential data/metadata mismatch when downmix fails.
If `downmix_to_mono` returns `None` (line 83), the code falls back to the original multi-channel data (line 86). However, the `AudioInfo` at line 53 indicates mono channels. This mismatch could cause encoder errors or corrupted audio downstream.
Consider either:
- Logging the failure and using a zeroed mono frame, or
- Propagating the error to halt the stream
Run the following to check how AudioInfo mismatches are handled downstream:
🏁 Script executed:
```shell
#!/bin/bash
# Search for how wrapped frames are processed in the encoder
rg -nP --type=rust -C3 'AudioFrame.*wrap_frame|fn.*wrap_frame'
```

Length of output: 1273
---
🏁 Script executed:
```shell
#!/bin/bash
# Get full implementation of wrap_frame_with_max_channels
sed -n '137,190p' crates/media-info/src/lib.rs
```

Length of output: 2253
🏁 Script executed:
```shell
#!/bin/bash
# Check how maybe_downmix_channels is called and if AudioInfo is updated
rg -nP --type=rust -C5 'maybe_downmix_channels|downmix_to_mono'
```

Length of output: 5057
🏁 Script executed:
```shell
#!/bin/bash
# Search for AudioInfo modification after downmix attempt
rg -nP --type=rust -C3 'AudioInfo.*target_channels|channels.*=' crates/recording/src/
```

Length of output: 2191
---
🏁 Script executed:
```shell
#!/bin/bash
# Check how frames are actually processed after wrap_frame
rg -nP --type=rust -C3 'wrap_frame.*packed|packed.*wrap_frame' crates/recording/src/sources/microphone.rs
```

Length of output: 277
🏁 Script executed:
```shell
#!/bin/bash
# Check what with_max_channels does
rg -nP --type=rust -B2 -A8 'fn with_max_channels'
```

Length of output: 580
Fix critical audio data/metadata mismatch when downmix fails.
When `downmix_to_mono` returns `None` (line 83), the code returns the original multi-channel data unchanged. However, `audio_info` at line 32 already has channels set to the target (mono via `with_max_channels`). This causes `wrap_frame` at line 53 to interpret multi-channel data with mono metadata:
- `wrap_frame_with_max_channels` (line 147) calculates `packed_sample_size = sample_size * self.channels()`, where `channels()` is 1
- But the actual data has a stereo stride (`sample_size * 2`)
- This misalignment corrupts the frame count and chunk reading, causing silent audio data corruption
Required fix (lines 82–91):
Either propagate the downmix failure to halt the stream, or create a valid mono frame from the source data (e.g., downsampled or silence). Do not return misaligned data.
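A hedged sketch of the second option (synthesizing a valid mono frame) follows; the helper name and the silence policy are assumptions rather than the PR's code, and an unsigned format would need a midpoint value instead of zero bytes.

```rust
use std::borrow::Cow;

/// If the real downmix fails, emit a silent mono buffer of the correct length
/// rather than multi-channel bytes that no longer match the mono AudioInfo.
fn mono_or_silence(
    data: &[u8],
    sample_size: usize,
    source_channels: usize,
    downmixed: Option<Vec<u8>>,
) -> Cow<'static, [u8]> {
    match downmixed {
        Some(mono) => Cow::Owned(mono),
        None => {
            let frame_stride = sample_size.max(1) * source_channels.max(1);
            let frame_count = data.len() / frame_stride;
            // Zero bytes decode to silence for signed integer and float formats;
            // an unsigned format (e.g. U8) would need its midpoint instead.
            Cow::Owned(vec![0u8; frame_count * sample_size])
        }
    }
}
```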
🤖 Prompt for AI Agents
In crates/recording/src/sources/microphone.rs around lines 82 to 91, when
downmix_to_mono(data, format, source_channels) returns None the code currently
returns the original multi‑channel data while audio_info (line 32) already
reports mono, causing a data/metadata mismatch and corrupted frames; fix by not
returning misaligned data — either propagate the failure (return an Err or None
so the stream terminates upstream) or synthesize a valid mono buffer (for
example generate a mono buffer by averaging/combining the source channels into a
newly owned Vec<u8> or, if combining is impossible, fill a zeroed mono buffer of
the correct length and sample size) and return Cow::Owned of that buffer so
audio_info channels and the data layout always match.
Introduces channel downmixing support for microphone audio sources, ensuring that audio frames are converted to mono when required. The main changes include adding logic to downmix multi-channel audio to mono and updating how audio frames are processed.
Summary by CodeRabbit
New Features
Bug Fixes / Improvements
Tests