More audio utils #173
Walkthrough
Refactors the RTC audio pipeline to a PCM-first design: adds PcmData, AudioFormat enum, and Resampler; AudioStreamTrack now accepts PcmData, performs normalization/resampling and 20ms framing; updates package exports; adds two documentation files; removes legacy framerate-based audio tests and updates PCM/resampler tests.
Sequence Diagram(s)
sequenceDiagram
participant Caller
participant AudioStreamTrack
participant PCM_Buffer as PCM_Buffer
participant Resampler
participant AV as AVFrame
Note over AudioStreamTrack: New PCM-first write/recv flow
Caller->>AudioStreamTrack: write(pcm: PcmData)
activate AudioStreamTrack
AudioStreamTrack->>PCM_Buffer: enqueue PcmData
deactivate AudioStreamTrack
Caller->>AudioStreamTrack: recv()
activate AudioStreamTrack
AudioStreamTrack->>PCM_Buffer: request 20ms PCM
activate PCM_Buffer
alt PCM missing or format mismatch
PCM_Buffer->>Resampler: _normalize_pcm(pcm)
activate Resampler
Resampler->>Resampler: resample / adjust channels / convert format
Resampler-->>PCM_Buffer: normalized PcmData
deactivate Resampler
PCM_Buffer->>PCM_Buffer: pad with silence if needed
else sufficient PCM available
PCM_Buffer->>PCM_Buffer: slice 20ms PCM segment
end
PCM_Buffer-->>AudioStreamTrack: 20ms PCM segment
deactivate PCM_Buffer
AudioStreamTrack->>AV: build AudioFrame from PCM
AudioStreamTrack-->>Caller: AudioFrame (20ms)
deactivate AudioStreamTrack
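The framing step in the diagram (slice 20ms of PCM, pad with silence when the buffer runs short) can be sketched as follows. This is a minimal illustration rather than the code in this PR: `take_frame` and its parameters are hypothetical names, and it assumes mono samples held in a 1D NumPy array.

```python
import numpy as np


def take_frame(buffer: np.ndarray, sample_rate: int, frame_ms: int = 20):
    """Return (frame, remainder), where frame holds exactly frame_ms of audio.

    Hypothetical helper: assumes mono samples in a 1D array.
    """
    frame_len = sample_rate * frame_ms // 1000  # 960 samples at 48 kHz
    frame = buffer[:frame_len]
    remainder = buffer[frame_len:]
    if len(frame) < frame_len:
        # Not enough audio buffered: pad the tail with silence (zeros).
        padding = np.zeros(frame_len - len(frame), dtype=buffer.dtype)
        frame = np.concatenate([frame, padding])
    return frame, remainder


# 1500 buffered samples give one full frame and one padded frame.
buffered = np.ones(1500, dtype=np.int16)
first, rest = take_frame(buffered, sample_rate=48000)
second, rest = take_frame(rest, sample_rate=48000)
```

Every frame has the same length, so a consumer can rely on a fixed 20ms cadence even when the producer falls behind.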
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (7)
- AGENTS.md (1 hunks)
- CLAUDE.md (1 hunks)
- getstream/video/rtc/__init__.py (2 hunks)
- getstream/video/rtc/audio_track.py (2 hunks)
- getstream/video/rtc/track_util.py (7 hunks)
- tests/rtc/test_audio_track.py (0 hunks)
- tests/rtc/test_pcm_data.py (6 hunks)
💤 Files with no reviewable changes (1)
- tests/rtc/test_audio_track.py
🧰 Additional context used
🧬 Code graph analysis (4)
getstream/video/rtc/audio_track.py (1)
- getstream/video/rtc/track_util.py (8): PcmData (88-1398), to_bytes (532-567), append (718-848), head (1292-1398), tail (1184-1290), duration (176-232), to_int16 (655-716), to_float32 (590-653)
getstream/video/rtc/__init__.py (2)
- getstream/video/rtc/audio_track.py (1): AudioStreamTrack (15-317)
- getstream/video/rtc/track_util.py (3): PcmData (88-1398), Resampler (1401-1589), AudioFormat (30-80)
getstream/video/rtc/track_util.py (1)
- tests/rtc/test_track_util.py (2): to_ndarray (41-46), to_ndarray (322-323)
tests/rtc/test_pcm_data.py (1)
- getstream/video/rtc/track_util.py (10): PcmData (88-1398), AudioFormat (30-80), Resampler (1401-1589), clear (878-903), to_int16 (655-716), from_data (329-399), resample (508-530), resample (1433-1488), append (718-848), from_av_frame (402-506)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Tests (3.13)
- GitHub Check: Tests (3.12)
- GitHub Check: Tests (3.10)
- GitHub Check: Tests (3.11)
🔇 Additional comments (8)
getstream/video/rtc/track_util.py (8)
401-506: LGTM with observations on shape handling.
The implementation correctly maps PyAV formats, handles empty frames, and preserves timestamps. The deinterleaving logic for packed stereo formats (lines 459-465) correctly reshapes [L0,R0,L1,R1,...] to [[L0,L1,...], [R0,R1,...]].
One observation: The shape normalization (lines 466-482) ensures mono is always 1D and stereo is always 2D (channels, samples), which is consistent with the codebase conventions.
508-530: LGTM! Clean delegation pattern.
The refactored resample method correctly delegates to the new Resampler class, simplifying the API and improving separation of concerns.
569-588: LGTM! Consistent WAV format handling.
The method correctly uses to_int16() to ensure 16-bit PCM output, which is the standard for WAV files.
655-716: LGTM! Correct clipping and scaling for int16 conversion.
The implementation correctly:
- Returns self when already in s16 format (optimization)
- Clips float32 values to [-1.0, 1.0] before conversion (line 703)
- Scales by 32767.0 which is appropriate for symmetric conversion
Note: Using 32767.0 (rather than 32768.0) ensures that both 1.0 and -1.0 map to valid int16 values, avoiding asymmetric clipping. This is a standard practice for audio processing.
764-765: LGTM! Consistent format conversion in append.
The update correctly uses to_int16() for s16 format conversion, maintaining consistency with the f32 conversion on line 763.
1401-1431: LGTM! Well-structured stateless resampler.
The Resampler class design is clean:
- Stateless design is appropriate for chunk-based streaming
- Three-step conversion process (rate, channels, format) is logical
- Timestamp preservation is correctly implemented
- Format validation in constructor prevents invalid configurations
Also applies to: 1433-1488
1897-1923: LGTM! Consistent use of conversion methods.
The helper function correctly uses to_int16() for s16 format conversion, maintaining consistency with the to_float32() conversion for f32 format.
1553-1583: No issues found. Code correctly follows existing conversion patterns.
The scaling asymmetry in the code under review is intentional and consistent with the existing conversion methods in the same file:
- Lines 640 and 1574: both use / 32768.0 for s16 → f32
- Lines 703 and 1579: both use * 32767.0 for f32 → s16
This pattern correctly handles the asymmetric int16 range [-32768, 32767] and is a standard approach in audio processing. Round-trip conversion fidelity is acceptable given the inherent constraints of the formats.
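The two scaling factors named above can be illustrated with a short sketch. The function names are hypothetical, and truncation via `astype` is an assumption here; the project's `to_int16()` may round differently.

```python
import numpy as np


def f32_to_s16(samples: np.ndarray) -> np.ndarray:
    """Clip to [-1.0, 1.0], then scale by 32767 (sketch; truncates on cast)."""
    clipped = np.clip(samples, -1.0, 1.0)
    return (clipped * 32767.0).astype(np.int16)


def s16_to_f32(samples: np.ndarray) -> np.ndarray:
    """Scale int16 into [-1.0, 1.0) by dividing by 32768."""
    return samples.astype(np.float32) / 32768.0


loud = np.array([1.0, -1.0, 2.0, 0.0], dtype=np.float32)
as_int = f32_to_s16(loud)   # the out-of-range 2.0 is clipped to full scale
back = s16_to_f32(as_int)   # slightly below 1.0 in magnitude after the round trip
```

Because one direction multiplies by 32767 and the other divides by 32768, a full-scale sample comes back marginally quieter, which is the small round-trip loss the comment calls acceptable.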
        self,
        sample_rate: int = 48000,
        channels: int = 1,
        format: str = "s16",
        max_queue_size: int = 100,
    ):
        """
        Initialize an AudioStreamTrack that reads data from a queue.
        Initialize an AudioStreamTrack that accepts PcmData objects.
        Args:
            framerate: Sample rate in Hz (default: 8000)
            stereo: Whether to use stereo output (default: False)
            format: Audio format (default: "s16")
            max_queue_size: Maximum number of frames to keep in queue (default: 100)
            sample_rate: Target sample rate in Hz (default: 48000)
            channels: Number of channels - 1=mono, 2=stereo (default: 1)
            format: Audio format - "s16" or "f32" (default: "s16")
            max_queue_size: Maximum number of PcmData objects in queue (default: 100)
        """
        super().__init__()
        self.framerate = framerate
        self.stereo = stereo
        self.sample_rate = sample_rate
        self.channels = channels
        self.format = format
        self.layout = "stereo" if stereo else "mono"
        self.max_queue_size = max_queue_size
Canonicalize the track format before building frames
AudioStreamTrack only treats the exact strings "s16" and "f32" as special. If a caller follows the PCM helpers and instantiates the track with format="float32" (or "int16"), _normalize_pcm skips conversion, yet the av_format fallback still defaults to "s16". That sends float32 bytes into an int16 AudioFrame, resulting in heavily corrupted audio on playback. Please normalize/validate the format once in __init__ (re-using AudioFormat) and reuse those canonical values in _normalize_pcm and the AV format selection so aliases do not break the stream.
-from getstream.video.rtc.track_util import PcmData
+from getstream.video.rtc.track_util import AudioFormat, PcmData
@@
-        format: str = "s16",
+        format: str = AudioFormat.S16,
@@
-        self.sample_rate = sample_rate
-        self.channels = channels
-        self.format = format
+        self.sample_rate = sample_rate
+        self.channels = channels
+        fmt = format.value if isinstance(format, AudioFormat) else str(format).lower()
+        if fmt == "float32":
+            fmt = AudioFormat.F32
+        elif fmt == "int16":
+            fmt = AudioFormat.S16
+        self.format = AudioFormat.validate(fmt)
@@
-        if self.format == "s16":
+        if self.format == AudioFormat.S16:
             av_format = "s16"  # Packed int16
-        elif self.format == "f32":
+        elif self.format == AudioFormat.F32:
             av_format = "flt"  # Packed float32
         else:
             av_format = "s16"  # Default to s16
@@
-        if self.format == "s16" and pcm.format != "s16":
+        if self.format == AudioFormat.S16 and pcm.format != AudioFormat.S16:
             pcm = pcm.to_int16()
-        elif self.format == "f32" and pcm.format != "f32":
+        elif self.format == AudioFormat.F32 and pcm.format != AudioFormat.F32:
             pcm = pcm.to_float32()
Also applies to: 162-168, 312-315
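As a standalone illustration of the idea in this comment, the sketch below resolves aliases once and derives the PyAV format name from the canonical value. The `AudioFormat` class here is a hypothetical stand-in for the project's enum, and `canonical_format` and `AV_FORMAT` are illustrative names, not part of the PR.

```python
from enum import Enum


class AudioFormat(str, Enum):
    """Hypothetical stand-in for the project's AudioFormat enum."""

    S16 = "s16"
    F32 = "f32"


# Aliases mentioned in the review comment, mapped to canonical members.
_ALIASES = {"int16": AudioFormat.S16, "float32": AudioFormat.F32}

# PyAV packed format names used in the diff above.
AV_FORMAT = {AudioFormat.S16: "s16", AudioFormat.F32: "flt"}


def canonical_format(value) -> AudioFormat:
    """Resolve an enum member, canonical string, or alias to one member."""
    if isinstance(value, AudioFormat):
        return value
    name = str(value).lower()
    if name in _ALIASES:
        return _ALIASES[name]
    return AudioFormat(name)  # raises ValueError for unknown names


track_format = canonical_format("float32")
av_format = AV_FORMAT[track_format]
```

Resolving the value once in the constructor means later comparisons never see an alias, so there is no silent fall-through to the s16 default.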
Actionable comments posted: 2
♻️ Duplicate comments (1)
getstream/video/rtc/track_util.py (1)
1519-1527: Fix interpolation index calculation to prevent out-of-bounds access.
This issue was flagged in a previous review: the index calculation at line 1522 can produce out-of-bounds indices when upsampling.
    in_indices = out_indices * (num_samples / out_length)
When upsampling (e.g., 16kHz→48kHz), the last output index maps to a position beyond the last input sample, causing np.interp to extrapolate incorrectly. Examples: 16kHz→48kHz exceeds bounds by 0.67 samples, 44.1kHz→48kHz by 0.08 samples.
Apply this fix to ensure the last output sample maps to the last input sample:
-        # Create interpolation indices
-        # Map output sample positions back to input sample positions
-        out_indices = np.arange(out_length)
-        in_indices = out_indices * (num_samples / out_length)
+        # Create interpolation indices
+        # Map output sample positions back to input sample positions
+        # Ensure last output maps to last input to prevent extrapolation
+        out_indices = np.arange(out_length)
+        if out_length == 1:
+            # Edge case: single output sample maps to middle of input
+            in_indices = np.array([num_samples / 2.0])
+        else:
+            in_indices = out_indices * ((num_samples - 1) / (out_length - 1))
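The endpoint-aligned mapping can be tried in isolation with the sketch below. `resample_linear` is a hypothetical standalone function, not the PR's `_resample_1d`. One detail worth knowing: `np.interp` does not raise for query points past the last sample position; by default it returns the final sample value, so the naive mapping shows up as a flattened tail rather than an exception.

```python
import numpy as np


def resample_linear(samples: np.ndarray, from_rate: int, to_rate: int) -> np.ndarray:
    """Linear-interpolation resampling with first and last samples aligned."""
    num_samples = len(samples)
    out_length = int(round(num_samples * to_rate / from_rate))
    if out_length == 0:
        return np.array([], dtype=samples.dtype)
    if out_length == 1:
        return samples[:1].copy()
    # Output index i maps to input position i * (N - 1) / (M - 1), so the
    # last output position lands on the last input sample.
    positions = np.arange(out_length) * ((num_samples - 1) / (out_length - 1))
    return np.interp(positions, np.arange(num_samples), samples).astype(samples.dtype)


ramp = np.arange(4, dtype=np.float32)            # 4 samples at 16 kHz
upsampled = resample_linear(ramp, 16000, 48000)  # 12 samples at 48 kHz

# The naive mapping from the original code overshoots the last valid index (3).
naive_last = (len(upsampled) - 1) * (len(ramp) / len(upsampled))
```

With the naive factor the final query position is about 3.67, which matches the 0.67-sample overshoot quoted in the comment.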
📜 Review details
📒 Files selected for processing (1)
getstream/video/rtc/track_util.py(7 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
getstream/video/rtc/track_util.py (1)
- tests/rtc/test_track_util.py (2): to_ndarray (41-46), to_ndarray (322-323)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Tests (3.10)
- GitHub Check: Tests (3.12)
- GitHub Check: Tests (3.11)
- GitHub Check: Tests (3.13)
🔇 Additional comments (4)
getstream/video/rtc/track_util.py (4)
508-530: LGTM: Clean delegation to Resampler.
The refactoring to use the new Resampler class simplifies the public API while maintaining the same functionality.
578-580: LGTM: Cleaner int16 conversion.
Using to_int16() centralizes the conversion logic and improves maintainability.
765-765: LGTM: Consistent format conversion.
Using to_int16() ensures consistent conversion behavior across the codebase.
1925-1925: LGTM: Centralized format conversion.
Using to_int16() maintains consistency with the refactoring pattern throughout this PR.
Actionable comments posted: 0
♻️ Duplicate comments (1)
getstream/video/rtc/track_util.py (1)
432-454: Critical: Incomplete format mapping breaks packed format detection.
The fallback at line 439 sets pcm_format = AudioFormat.S16 for unsupported PyAV formats (u8, s32, s64, dbl and their planar variants), but frame_format retains the original format name (e.g., "s32", "dbl").
At line 454, the is_packed check (frame_format in ("s16", "flt")) returns False for unsupported formats, even if they are packed. This causes incorrect shape normalization at lines 459-482, leading to silent audio data corruption.
Impact: If a PyAV frame uses any format other than s16/s16p/flt/fltp (e.g., s32, dbl, u8), the audio data will be silently corrupted due to incorrect deinterleaving.
Required fix (choose one):
- Handle all 12 PyAV formats explicitly with correct dtype mappings
- Update fallback to also set frame_format = "s16" when converting to S16
- Update is_packed check to include all packed format names
Apply this diff to fix option 2:
         else:
             pcm_format = AudioFormat.S16
             dtype = np.int16
+            frame_format = "s16"  # Update frame_format for correct is_packed detection
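The third option (recognizing every packed format name) can be sketched without PyAV. FFmpeg names its planar sample formats with a trailing "p" (u8p, s16p, s32p, fltp, dblp, s64p), so a name-based check covers all twelve formats. `is_packed` and `split_channels` below are hypothetical helpers operating on a flat NumPy array, not the PR's code.

```python
import numpy as np


def is_packed(format_name: str) -> bool:
    """Packed (interleaved) formats are the ones without the planar 'p' suffix."""
    return not format_name.endswith("p")


def split_channels(flat: np.ndarray, channels: int, format_name: str) -> np.ndarray:
    """Return samples shaped (channels, samples) from a flat buffer."""
    if is_packed(format_name):
        # [L0, R0, L1, R1, ...] -> [[L0, L1, ...], [R0, R1, ...]]
        return flat.reshape(-1, channels).T
    # Planar data already stores each channel contiguously.
    return flat.reshape(channels, -1)


packed = np.array([1, -1, 2, -2, 3, -3], dtype=np.int32)
planar = np.array([1, 2, 3, -1, -2, -3], dtype=np.int32)
from_packed = split_channels(packed, 2, "s32")
from_planar = split_channels(planar, 2, "s32p")
```

Both calls produce the same channel-major array, which is the property the `is_packed` check in `from_av_frame` needs to preserve for formats outside s16 and flt.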
🧹 Nitpick comments (1)
getstream/video/rtc/track_util.py (1)
1556-1558: Consider adding explicit clipping for stereo-to-mono conversion.
Line 1558 averages two int16 channels: np.mean(samples, axis=0).astype(samples.dtype). While mathematically safe (the mean of two int16 values stays within int16 range), the astype cast could be more robust with explicit clipping.
Apply this diff for additional safety:
         elif from_channels == 2 and to_channels == 1:
             # Stereo to mono: average the two channels
-            return np.mean(samples, axis=0).astype(samples.dtype)
+            mono = np.mean(samples, axis=0)
+            if samples.dtype == np.int16:
+                mono = np.clip(mono, -32768, 32767)
+            return mono.astype(samples.dtype)
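A small experiment shows why the `np.mean` form is already safe and where overflow would actually bite. The arrays are made-up test data; the integer-sum variant is included only as a contrast and does not appear in the PR.

```python
import numpy as np

# Full-scale stereo samples: both channels at the int16 extremes.
stereo = np.array(
    [[32767, -32768, 100],
     [32767, -32768, 101]],
    dtype=np.int16,
)

# np.mean accumulates integer input in float64, so the sum cannot overflow.
mean = np.mean(stereo, axis=0)
mono = mean.astype(np.int16)

# Contrast: summing in int16 first wraps around before the division.
wrapped = (stereo[0] + stereo[1]) // 2
```

The averaged values stay inside the int16 range, so the suggested clip is a defensive no-op for two-channel input, while the in-dtype sum corrupts the full-scale samples.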
📜 Review details
📒 Files selected for processing (2)
- getstream/video/rtc/track_util.py (7 hunks)
- tests/rtc/test_pcm_data.py (6 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
tests/rtc/test_pcm_data.py (1)
- getstream/video/rtc/track_util.py (10): PcmData (88-1398), AudioFormat (30-80), Resampler (1401-1600), clear (878-903), to_int16 (655-716), from_data (329-399), resample (508-530), resample (1437-1492), append (718-848), from_av_frame (402-506)
getstream/video/rtc/track_util.py (1)
- tests/rtc/test_track_util.py (2): to_ndarray (41-46), to_ndarray (322-323)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Tests (3.10)
- GitHub Check: Tests (3.12)
🔇 Additional comments (8)
tests/rtc/test_pcm_data.py (4)
3-6: LGTM! Imports are well-organized.
The new imports (av, Fraction) and updated public API imports (AudioFormat, Resampler) are necessary for testing the new from_av_frame() functionality and the Resampler class.
978-1082: Excellent test coverage for to_int16().
The test suite covers:
- Basic float32→int16 conversion with correct scaling
- Idempotency (returns self when already s16)
- Metadata preservation (pts, dts, time_base)
- Stereo handling
- Clipping of out-of-range values
- Wrong dtype handling via from_data()
All test assertions are correct and the expected values are properly calculated.
1084-1353: Outstanding Resampler test coverage.
The test suite thoroughly validates:
- Upsampling and downsampling (16kHz↔48kHz)
- Channel conversions (mono↔stereo) with correct duplication/averaging
- Format conversions (s16↔f32) with proper scaling
- Real-time streaming scenarios (20ms chunks, consistent across multiple chunks)
- Timestamp preservation
- Edge cases (empty audio, single sample)
- Linear interpolation quality (monotonic increases for ramps)
- Statelessness verification (identical inputs produce identical outputs)
The streaming test at lines 1318-1352 is particularly valuable—it verifies that chunked processing is stateless, which is critical for the PR's goal of simplifying streaming audio use-cases.
1355-1543: Comprehensive from_av_frame() test coverage.
The test suite validates:
- Multiple formats: s16p (planar int16), fltp (planar float32), s16 (packed/interleaved)
- Mono and stereo audio
- Timestamp extraction (pts, dts) and time_base conversion (Fraction→float)
- Proper deinterleaving of packed formats (lines 1415-1450)
- Integration with Resampler for real-world workflows
- Edge cases: empty frames, single frames
- Standard WebRTC rates (48kHz) with 20ms chunks (960 samples)
The packed stereo test (lines 1415-1450) is particularly valuable as it verifies correct deinterleaving of [L0,R0,L1,R1,...] to [[L0,L1,...],[R0,R1,...]] format.
getstream/video/rtc/track_util.py (4)
655-716: LGTM! Correct float32→int16 conversion.
The implementation correctly:
- Returns self when already int16 (idempotent)
- Clips float values to [-1.0, 1.0] before scaling (line 703)
- Scales by 32767.0 (correct for float→int16, not 32768)
- Preserves all metadata (pts, dts, time_base, channels)
- Handles non-ndarray samples via from_bytes/from_data normalization
The conversion formula matches audio standards and the comprehensive test coverage validates correctness.
1494-1534: Interpolation fix correctly applied.
The index calculation at line 1529 now correctly uses:
    in_indices = out_indices * ((num_samples - 1) / (out_length - 1))
This ensures the last output sample maps exactly to the last input sample, preventing the out-of-bounds issue identified in previous reviews. The edge case for out_length == 1 (lines 1520-1522) is also properly handled.
Verified: The fix prevents out-of-bounds errors in common resampling scenarios (16kHz→48kHz, 44.1kHz→48kHz, etc.) that were flagged in previous review.
578-579: Correct integration of to_int16() method.
The new to_int16() method is properly integrated at all conversion points:
- Line 578: to_wav_bytes() correctly converts to int16 (WAV requires 16-bit PCM)
- Line 765: append() uses it when target format is s16/int16
- Line 1932: _normalize_audio_format() uses it for s16 normalization
All integration points maintain correct format conversion behavior.
Also applies to: 765-765, 1932-1932
508-530: Clean delegation to new Resampler class.
The refactored resample() method maintains backward compatibility by delegating to the new Resampler class (lines 527-530). The early return for no-op cases (line 523) preserves efficiency.
This design supports the PR's goal of providing a simpler stateless resampler while keeping the existing API intact.
Actionable comments posted: 1
🧹 Nitpick comments (1)
tests/rtc/test_pcm_data.py (1)
803-810: Keep the 'float32' string constructor test intact.
The docstring still claims we are exercising the 'float32' string path, but the call now passes AudioFormat.F32. This drops coverage of the plain-string input, which we still support elsewhere. Please keep the argument as "float32" (or add a second assertion) so we continue validating that code path.
📜 Review details
📒 Files selected for processing (2)
- getstream/video/rtc/track_util.py (7 hunks)
- tests/rtc/test_pcm_data.py (6 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
tests/rtc/test_pcm_data.py (1)
- getstream/video/rtc/track_util.py (10): PcmData (88-1400), AudioFormat (30-80), Resampler (1403-1602), clear (880-905), to_int16 (657-718), from_data (329-399), resample (510-532), resample (1439-1494), append (720-850), from_av_frame (402-508)
getstream/video/rtc/track_util.py (1)
- tests/rtc/test_track_util.py (2): to_ndarray (41-46), to_ndarray (322-323)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Tests (3.13)
- GitHub Check: Tests (3.12)
- GitHub Check: Tests (3.10)
- GitHub Check: Tests (3.11)
        samples = pcm.samples
        current_rate = pcm.sample_rate
        current_channels = pcm.channels
        current_format = pcm.format

        # Step 1: Adjust sample rate if needed
        if current_rate != self.sample_rate:
            if current_channels == 1:
                samples = self._resample_1d(samples, current_rate, self.sample_rate)
            else:
                # Resample each channel independently
                resampled_channels = []
                for ch in range(current_channels):
                    resampled_ch = self._resample_1d(
                        samples[ch], current_rate, self.sample_rate
                    )
                    resampled_channels.append(resampled_ch)
                samples = np.array(resampled_channels)
            current_rate = self.sample_rate

        # Step 2: Adjust channels if needed
        if current_channels != self.channels:
            samples = self._adjust_channels(samples, current_channels, self.channels)
            current_channels = self.channels

        # Step 3: Adjust format if needed
        if current_format != self.format:
            samples = self._adjust_format(samples, current_format, self.format)
            current_format = self.format

        # Create new PcmData with resampled audio, preserving timestamps
        return PcmData(
            samples=samples,
            sample_rate=self.sample_rate,
            format=self.format,
            channels=self.channels,
            pts=pcm.pts,
            dts=pcm.dts,
            time_base=pcm.time_base,
        )

    def _resample_1d(
        self, samples: np.ndarray, from_rate: int, to_rate: int
    ) -> np.ndarray:
        """
        Resample a 1D array using linear interpolation.
        Args:
            samples: 1D input samples
            from_rate: Input sample rate
            to_rate: Output sample rate
        Returns:
            Resampled 1D array
        """
        if from_rate == to_rate:
            return samples

        # Calculate output length
        num_samples = len(samples)
        duration = num_samples / from_rate
        out_length = int(np.round(duration * to_rate))

        if out_length == 0:
            return np.array([], dtype=samples.dtype)

        # Handle edge case: single output sample
        if out_length == 1:
            # Return the first sample
            return np.array([samples[0]], dtype=samples.dtype)

        # Create interpolation indices
        # Map output sample positions back to input sample positions
        # Use (num_samples - 1) / (out_length - 1) to ensure the last output
        # sample maps exactly to the last input sample, preventing out-of-bounds
        out_indices = np.arange(out_length)
        in_indices = out_indices * ((num_samples - 1) / (out_length - 1))

        # Linear interpolation
        resampled = np.interp(in_indices, np.arange(num_samples), samples)

        return resampled.astype(samples.dtype)

    def _adjust_channels(
        self, samples: np.ndarray, from_channels: int, to_channels: int
    ) -> np.ndarray:
        """
        Adjust number of channels (mono <-> stereo conversion).
        Args:
            samples: Input samples
            from_channels: Input channel count
            to_channels: Output channel count
        Returns:
            Samples with adjusted channel count
        """
        if from_channels == to_channels:
            return samples

        if from_channels == 1 and to_channels == 2:
            # Mono to stereo: duplicate the mono channel
            return np.array([samples, samples])
        elif from_channels == 2 and to_channels == 1:
            # Stereo to mono: average the two channels
            return np.mean(samples, axis=0).astype(samples.dtype)
        else:
            raise ValueError(
                f"Unsupported channel conversion: {from_channels} -> {to_channels}"
            )

    def _adjust_format(
        self, samples: np.ndarray, from_format: str, to_format: str
    ) -> np.ndarray:
        """
        Convert between s16 and f32 formats.
        Args:
            samples: Input samples
            from_format: Input format ("s16" or "f32")
            to_format: Output format ("s16" or "f32")
        Returns:
            Samples in the target format
        """
        if from_format == to_format:
            return samples

        if from_format == "s16" and to_format == "f32":
Handle time-major PCM before resampling.
Resampler.resample() now assumes pcm.samples is channel-major. However, PcmData explicitly supports (samples, channels) layouts (see test_to_bytes_interleaves_from_time_major). With such time-major input, samples[ch] in the channel loop grabs entire rows of the time axis instead of per-channel data, so resampling/upmixing produces truncated or nonsensical audio; _adjust_channels likewise averages across the wrong axis. Example:
    time_major = np.array(
        [[1, -1],
         [2, -2],
         [3, -3],
         [4, -4]], dtype=np.int16)
    pcm = PcmData(samples=time_major, sample_rate=16000, format="s16", channels=2)
    Resampler(format="s16", sample_rate=48000, channels=2).resample(pcm)
Currently yields two short “channels” built from the first two rows instead of the full stereo signal. Please canonicalize to channel-major (and flatten mono) before resampling so both layouts work. One possible fix:
         samples = pcm.samples
         current_rate = pcm.sample_rate
         current_channels = pcm.channels
         current_format = pcm.format
+
+        if isinstance(samples, np.ndarray):
+            if current_channels == 1 and samples.ndim == 2:
+                # Accept (1, N) or (N, 1) inputs for mono
+                if samples.shape[0] == 1:
+                    samples = samples.reshape(-1)
+                elif samples.shape[1] == 1:
+                    samples = samples.reshape(-1)
+            elif current_channels > 1 and samples.ndim == 2:
+                if samples.shape[0] == current_channels:
+                    pass  # already channel-major
+                elif samples.shape[1] == current_channels:
+                    samples = samples.T
+                else:
+                    raise ValueError(
+                        f"Unexpected multi-channel shape {samples.shape} for channels={current_channels}"
+                    )
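The same canonicalization can be exercised as a standalone function, using the time-major array from the example above. `to_channel_major` is a hypothetical helper that mirrors the suggested logic; it is not part of the PR. Note the ambiguity for square inputs such as shape (2, 2), where this sketch assumes channel-major.

```python
import numpy as np


def to_channel_major(samples: np.ndarray, channels: int) -> np.ndarray:
    """Return mono as 1D and multi-channel audio as (channels, samples)."""
    if samples.ndim == 1:
        return samples  # 1D input is left untouched
    if samples.ndim != 2:
        raise ValueError(f"Unexpected sample shape {samples.shape}")
    if channels == 1:
        if 1 in samples.shape:
            return samples.reshape(-1)  # accepts (1, N) or (N, 1)
        raise ValueError(f"Unexpected mono shape {samples.shape}")
    if samples.shape[0] == channels:
        return samples  # already channel-major (also chosen for square input)
    if samples.shape[1] == channels:
        return samples.T  # time-major: (samples, channels)
    raise ValueError(
        f"Unexpected multi-channel shape {samples.shape} for channels={channels}"
    )


time_major = np.array(
    [[1, -1],
     [2, -2],
     [3, -3],
     [4, -4]], dtype=np.int16)
channel_major = to_channel_major(time_major, channels=2)
mono = to_channel_major(np.array([[5], [6], [7]], dtype=np.int16), channels=1)
```

After this step, indexing `channel_major[ch]` yields a full per-channel signal, which is what the per-channel resampling loop expects.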
- AudioStreamTrack.write method to accept PcmData so callers do not have to pass bytes
- from_av_frame constructor for PcmData that accepts an av.AudioFrame
- PcmData.to_int16 method
Summary by CodeRabbit
New Features
Documentation
Tests