@tbarbugli
Member

@tbarbugli tbarbugli commented Nov 3, 2025

  • Change AudioStreamTrack.write method to accept PcmData so callers do not have to pass bytes
  • Added new from_av_frame constructor for PcmData that accepts an av.AudioFrame
  • Added PcmData.to_int16 method
  • Added a new audio resampler based on NumPy. It is not as fancy as av's audio resampler, but it is stateless and simpler to use for streaming audio use cases

Summary by CodeRabbit

  • New Features

    • Enhanced audio API with a PCM-first workflow: resampling, channel/format conversion, buffering, and public audio utilities for more robust audio handling.
  • Documentation

    • Added two new docs detailing project setup and Python testing guidelines.
  • Tests

    • Test suite changes: removal of legacy audio-track tests and updated coverage for PCM conversion, resampling, format handling, and metadata preservation.

@coderabbitai

coderabbitai bot commented Nov 3, 2025

Walkthrough

Refactors the RTC audio pipeline to a PCM-first design: adds PcmData, AudioFormat enum, and Resampler; AudioStreamTrack now accepts PcmData, performs normalization/resampling and 20ms framing; updates package exports; adds two documentation files; removes legacy framerate-based audio tests and updates PCM/resampler tests.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Documentation: AGENTS.md, CLAUDE.md | Add new docs covering project setup, dependency rules, generate.sh, and pytest testing conventions, fixtures, assets, and run guidance. |
| Package exports: getstream/video/rtc/__init__.py | Export AudioStreamTrack, PcmData, Resampler, and AudioFormat from the rtc package. |
| Audio track refactor: getstream/video/rtc/audio_track.py | Replace framerate/stereo constructor with sample_rate/channels/format; write() now accepts PcmData; internal buffering is PCM-centric with normalization/resampling to target sample_rate/channels/format, 20ms framing, silence padding, and revised queue semantics. |
| PCM utilities & resampling: getstream/video/rtc/track_util.py | Add AudioFormat enum, PcmData.from_av_frame(), PcmData.to_int16(), and a Resampler class; refactor resample logic to delegate to Resampler; update format/channel conversion and WAV output helpers. |
| Tests removed: tests/rtc/test_audio_track.py | Remove legacy tests tied to framerate/stereo-based AudioStreamTrack behavior. |
| Tests updated: tests/rtc/test_pcm_data.py | Update tests to use AudioFormat/Resampler; adjust expectations for format conversions, resampling, from_av_frame(), metadata preservation, and to_int16 behavior. |

Sequence Diagram(s)

sequenceDiagram
    participant Caller
    participant AudioStreamTrack
    participant PCM_Buffer as PCM_Buffer
    participant Resampler
    participant AV as AVFrame

    Note over AudioStreamTrack: New PCM-first write/recv flow
    Caller->>AudioStreamTrack: write(pcm: PcmData)
    activate AudioStreamTrack
    AudioStreamTrack->>PCM_Buffer: enqueue PcmData
    deactivate AudioStreamTrack

    Caller->>AudioStreamTrack: recv()
    activate AudioStreamTrack
    AudioStreamTrack->>PCM_Buffer: request 20ms PCM
    activate PCM_Buffer
    alt PCM missing or format mismatch
        PCM_Buffer->>Resampler: _normalize_pcm(pcm)
        activate Resampler
        Resampler->>Resampler: resample / adjust channels / convert format
        Resampler-->>PCM_Buffer: normalized PcmData
        deactivate Resampler
        PCM_Buffer->>PCM_Buffer: pad with silence if needed
    else sufficient PCM available
        PCM_Buffer->>PCM_Buffer: slice 20ms PCM segment
    end
    PCM_Buffer-->>AudioStreamTrack: 20ms PCM segment
    deactivate PCM_Buffer
    AudioStreamTrack->>AV: build AudioFrame from PCM
    AudioStreamTrack-->>Caller: AudioFrame (20ms)
    deactivate AudioStreamTrack

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Pay close attention to getstream/video/rtc/audio_track.py for buffering, 20ms boundary logic, silence padding, queue overflow handling, and pts/time_base calculations.
  • Review getstream/video/rtc/track_util.py Resampler algorithms, planar vs packed frame handling in PcmData.from_av_frame(), and numeric correctness in to_int16() and format conversions.
  • Verify updated tests in tests/rtc/test_pcm_data.py cover edge cases previously in the removed tests/rtc/test_audio_track.py.
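
To make the 20ms boundary and silence-padding points concrete, here is a minimal sketch of how a buffer might be cut into fixed 20 ms frames. It is not the code in audio_track.py; `samples_per_frame` and `take_frame` are hypothetical helpers, and the sketch assumes mono int16 samples held in a 1D NumPy array.

```python
import numpy as np

FRAME_MS = 20  # frame duration used by the track, per the walkthrough


def samples_per_frame(sample_rate: int) -> int:
    """Number of samples in one 20 ms frame (960 at 48 kHz)."""
    return sample_rate * FRAME_MS // 1000


def take_frame(buffer: np.ndarray, sample_rate: int):
    """Return (frame, remaining); frame always holds exactly 20 ms of audio."""
    n = samples_per_frame(sample_rate)
    frame, rest = buffer[:n], buffer[n:]
    if len(frame) < n:
        # Not enough buffered audio: pad the tail with silence (zeros).
        padding = np.zeros(n - len(frame), dtype=buffer.dtype)
        frame = np.concatenate([frame, padding])
    return frame, rest


buf = np.ones(1000, dtype=np.int16)
first, buf = take_frame(buf, 48000)   # 960 buffered samples
second, buf = take_frame(buf, 48000)  # 40 buffered samples plus 920 zeros
```

The second frame shows the padding case: the output length stays fixed while the missing samples become silence.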

Possibly related PRs

  • Audio utils #170 — Overlapping changes to PcmData/Resampler/AudioFormat and audio track resampling APIs; likely closely related or dependent.

Poem

🐰 I munched the bytes and hopped the rate,

PcmData hummed at twenty-millisecond gait,
Resampler stitched channels neat and true,
Frames now padded, timed, and new,
A rabbit cheers — fresh audio for you.

Pre-merge checks and finishing touches

❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Title check | ❓ Inconclusive | The title 'More audio utils' is vague and generic. While it relates to audio functionality additions, it uses non-descriptive language that doesn't convey the specific changes (PcmData enhancements, AudioStreamTrack API update, new Resampler class). | Provide a more specific title summarizing the main changes, such as 'Enhance AudioStreamTrack to accept PcmData and add audio utilities' or 'Add audio resampler and expand PcmData API'. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 4b68c09 and 57cd3e3.

📒 Files selected for processing (7)
  • AGENTS.md (1 hunks)
  • CLAUDE.md (1 hunks)
  • getstream/video/rtc/__init__.py (2 hunks)
  • getstream/video/rtc/audio_track.py (2 hunks)
  • getstream/video/rtc/track_util.py (7 hunks)
  • tests/rtc/test_audio_track.py (0 hunks)
  • tests/rtc/test_pcm_data.py (6 hunks)
💤 Files with no reviewable changes (1)
  • tests/rtc/test_audio_track.py
🧰 Additional context used
🧬 Code graph analysis (4)
getstream/video/rtc/audio_track.py (1)
getstream/video/rtc/track_util.py (8)
  • PcmData (88-1398)
  • to_bytes (532-567)
  • append (718-848)
  • head (1292-1398)
  • tail (1184-1290)
  • duration (176-232)
  • to_int16 (655-716)
  • to_float32 (590-653)
getstream/video/rtc/__init__.py (2)
getstream/video/rtc/audio_track.py (1)
  • AudioStreamTrack (15-317)
getstream/video/rtc/track_util.py (3)
  • PcmData (88-1398)
  • Resampler (1401-1589)
  • AudioFormat (30-80)
getstream/video/rtc/track_util.py (1)
tests/rtc/test_track_util.py (2)
  • to_ndarray (41-46)
  • to_ndarray (322-323)
tests/rtc/test_pcm_data.py (1)
getstream/video/rtc/track_util.py (10)
  • PcmData (88-1398)
  • AudioFormat (30-80)
  • Resampler (1401-1589)
  • clear (878-903)
  • to_int16 (655-716)
  • from_data (329-399)
  • resample (508-530)
  • resample (1433-1488)
  • append (718-848)
  • from_av_frame (402-506)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Tests (3.13)
  • GitHub Check: Tests (3.12)
  • GitHub Check: Tests (3.10)
  • GitHub Check: Tests (3.11)
🔇 Additional comments (8)
getstream/video/rtc/track_util.py (8)

401-506: LGTM with observations on shape handling.

The implementation correctly maps PyAV formats, handles empty frames, and preserves timestamps. The deinterleaving logic for packed stereo formats (lines 459-465) correctly reshapes [L0,R0,L1,R1,...] to [[L0,L1,...], [R0,R1,...]].

One observation: The shape normalization (lines 466-482) ensures mono is always 1D and stereo is always 2D (channels, samples), which is consistent with the codebase conventions.
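
The deinterleaving described here can be illustrated with plain NumPy. This is a sketch of the reshape-and-transpose idea only, not the from_av_frame implementation; `deinterleave` is a hypothetical helper name.

```python
import numpy as np


def deinterleave(packed: np.ndarray, channels: int) -> np.ndarray:
    """Turn packed [L0, R0, L1, R1, ...] samples into channel-major layout."""
    flat = np.asarray(packed).reshape(-1)
    if channels == 1:
        return flat  # mono stays 1D
    if flat.size % channels != 0:
        raise ValueError("sample count is not a multiple of the channel count")
    # Each row of the (frames, channels) view is one time step;
    # transposing gives (channels, frames).
    return flat.reshape(-1, channels).T


packed = np.array([10, -10, 20, -20, 30, -30], dtype=np.int16)
stereo = deinterleave(packed, channels=2)
print(stereo.tolist())  # [[10, 20, 30], [-10, -20, -30]]
```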


508-530: LGTM! Clean delegation pattern.

The refactored resample method correctly delegates to the new Resampler class, simplifying the API and improving separation of concerns.


569-588: LGTM! Consistent WAV format handling.

The method correctly uses to_int16() to ensure 16-bit PCM output, which is the standard for WAV files.


655-716: LGTM! Correct clipping and scaling for int16 conversion.

The implementation correctly:

  • Returns self when already in s16 format (optimization)
  • Clips float32 values to [-1.0, 1.0] before conversion (line 703)
  • Scales by 32767.0 which is appropriate for symmetric conversion

Note: Using 32767.0 (rather than 32768.0) ensures that both 1.0 and -1.0 map to valid int16 values, avoiding asymmetric clipping. This is a standard practice for audio processing.
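
A small NumPy sketch of the clip-then-scale conversion described above. `float32_to_int16` is a hypothetical stand-in for to_int16(); whether the real method truncates or rounds the scaled value is not shown in this review, and this sketch truncates via astype.

```python
import numpy as np


def float32_to_int16(samples: np.ndarray) -> np.ndarray:
    """Clip to [-1.0, 1.0], scale by 32767, and cast to int16 (truncating)."""
    clipped = np.clip(samples.astype(np.float32), -1.0, 1.0)
    return (clipped * 32767.0).astype(np.int16)


x = np.array([-1.5, -1.0, 0.0, 0.5, 1.0, 2.0], dtype=np.float32)
y = float32_to_int16(x)
print(y.tolist())  # [-32767, -32767, 0, 16383, 32767, 32767]
```

Out-of-range inputs land on the same values as ±1.0, and no input can produce a value outside the int16 range.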


764-765: LGTM! Consistent format conversion in append.

The update correctly uses to_int16() for s16 format conversion, maintaining consistency with the f32 conversion on line 763.


1401-1431: LGTM! Well-structured stateless resampler.

The Resampler class design is clean:

  • Stateless design is appropriate for chunk-based streaming
  • Three-step conversion process (rate, channels, format) is logical
  • Timestamp preservation is correctly implemented
  • Format validation in constructor prevents invalid configurations

Also applies to: 1433-1488


1897-1923: LGTM! Consistent use of conversion methods.

The helper function correctly uses to_int16() for s16 format conversion, maintaining consistency with the to_float32() conversion for f32 format.


1553-1583: No issues found. Code correctly follows existing conversion patterns.

The scaling asymmetry in the code under review is intentional and consistent with the existing conversion methods in the same file:

  • Lines 640 and 1574: both use / 32768.0 for s16 → f32
  • Lines 703 and 1579: both use * 32767.0 for f32 → s16

This pattern correctly handles the asymmetric int16 range [-32768, 32767] and is a standard approach in audio processing. Round-trip conversion fidelity is acceptable given the inherent constraints of the formats.
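
To put a number on "acceptable" round-trip fidelity, the following sketch runs every int16 value through the two scalings quoted above (divide by 32768.0, then multiply by 32767.0) and measures the worst-case error. It assumes a truncating cast on the way back, which this review does not confirm for the real code.

```python
import numpy as np

# Every representable int16 value, from -32768 to 32767.
all_s16 = np.arange(-32768, 32768, dtype=np.int32).astype(np.int16)

# s16 -> f32 uses 32768 so that -32768 maps exactly to -1.0.
as_f32 = all_s16.astype(np.float32) / 32768.0

# f32 -> s16 uses 32767 so that +1.0 cannot overflow.
back = (np.clip(as_f32, -1.0, 1.0) * 32767.0).astype(np.int16)

max_err = int(np.max(np.abs(back.astype(np.int32) - all_s16.astype(np.int32))))
print(max_err)  # 1
```

Under these assumptions the round trip is off by at most one least-significant bit.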

Comment on lines +33 to 52
         self,
+        sample_rate: int = 48000,
+        channels: int = 1,
         format: str = "s16",
         max_queue_size: int = 100,
     ):
         """
-        Initialize an AudioStreamTrack that reads data from a queue.
+        Initialize an AudioStreamTrack that accepts PcmData objects.
         Args:
-            framerate: Sample rate in Hz (default: 8000)
-            stereo: Whether to use stereo output (default: False)
-            format: Audio format (default: "s16")
-            max_queue_size: Maximum number of frames to keep in queue (default: 100)
+            sample_rate: Target sample rate in Hz (default: 48000)
+            channels: Number of channels - 1=mono, 2=stereo (default: 1)
+            format: Audio format - "s16" or "f32" (default: "s16")
+            max_queue_size: Maximum number of PcmData objects in queue (default: 100)
         """
         super().__init__()
-        self.framerate = framerate
-        self.stereo = stereo
+        self.sample_rate = sample_rate
+        self.channels = channels
         self.format = format
-        self.layout = "stereo" if stereo else "mono"
         self.max_queue_size = max_queue_size

⚠️ Potential issue | 🔴 Critical

Canonicalize the track format before building frames

AudioStreamTrack only treats the exact strings "s16" and "f32" as special. If a caller follows the PCM helpers and instantiates the track with format="float32" (or "int16"), _normalize_pcm skips conversion, yet the av_format fallback still defaults to "s16". That sends float32 bytes into an int16 AudioFrame, resulting in heavily corrupted audio on playback. Please normalize/validate the format once in __init__ (re-using AudioFormat) and reuse those canonical values in _normalize_pcm and the AV format selection so aliases do not break the stream.

-from getstream.video.rtc.track_util import PcmData
+from getstream.video.rtc.track_util import AudioFormat, PcmData
@@
-        format: str = "s16",
+        format: str = AudioFormat.S16,
@@
-        self.sample_rate = sample_rate
-        self.channels = channels
-        self.format = format
+        self.sample_rate = sample_rate
+        self.channels = channels
+        fmt = format.value if isinstance(format, AudioFormat) else str(format).lower()
+        if fmt == "float32":
+            fmt = AudioFormat.F32
+        elif fmt == "int16":
+            fmt = AudioFormat.S16
+        self.format = AudioFormat.validate(fmt)
@@
-        if self.format == "s16":
+        if self.format == AudioFormat.S16:
             av_format = "s16"  # Packed int16
-        elif self.format == "f32":
+        elif self.format == AudioFormat.F32:
             av_format = "flt"  # Packed float32
         else:
             av_format = "s16"  # Default to s16
@@
-        if self.format == "s16" and pcm.format != "s16":
+        if self.format == AudioFormat.S16 and pcm.format != AudioFormat.S16:
             pcm = pcm.to_int16()
-        elif self.format == "f32" and pcm.format != "f32":
+        elif self.format == AudioFormat.F32 and pcm.format != AudioFormat.F32:
             pcm = pcm.to_float32()

Also applies to: 162-168, 312-315
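
As a standalone illustration of the canonicalization idea, the sketch below maps aliases to one canonical name and fails loudly on anything else, so an unknown format can never fall through to a silent "s16" default. `canonical_format` and `av_format_for` are hypothetical helpers, not part of the getstream API; the alias set and the "flt" mapping are taken from this comment.

```python
# Accepted spellings mapped to the canonical name.
_ALIASES = {"s16": "s16", "int16": "s16", "f32": "f32", "float32": "f32"}

# Canonical name mapped to the packed PyAV sample-format name.
_AV_FORMAT = {"s16": "s16", "f32": "flt"}


def canonical_format(fmt) -> str:
    """Normalize a format string (or enum-like object with .value)."""
    name = str(getattr(fmt, "value", fmt)).lower()
    try:
        return _ALIASES[name]
    except KeyError:
        raise ValueError(f"unsupported audio format: {fmt!r}") from None


def av_format_for(fmt) -> str:
    """Pick the AV sample format from the canonical name only."""
    return _AV_FORMAT[canonical_format(fmt)]


print(canonical_format("float32"), av_format_for("float32"))  # f32 flt
```

Normalizing once at construction time means every later comparison can use the canonical value.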


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (1)
getstream/video/rtc/track_util.py (1)

1519-1527: Fix interpolation index calculation to prevent out-of-bounds access.

This issue was flagged in a previous review: the index calculation at line 1522 can produce out-of-bounds indices when upsampling.

in_indices = out_indices * (num_samples / out_length)

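When upsampling (e.g., 16kHz→48kHz), the last output index maps to a position beyond the last input sample. np.interp does not raise or extrapolate in that case: by default it clamps to the final input value, so the tail of the output is held flat instead of being interpolated. Examples: 16kHz→48kHz exceeds bounds by 0.67 samples, 44.1kHz→48kHz by 0.08 samples.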

Apply this fix to ensure the last output sample maps to the last input sample:

-        # Create interpolation indices
-        # Map output sample positions back to input sample positions
-        out_indices = np.arange(out_length)
-        in_indices = out_indices * (num_samples / out_length)
+        # Create interpolation indices
+        # Map output sample positions back to input sample positions
+        # Ensure last output maps to last input to prevent extrapolation
+        out_indices = np.arange(out_length)
+        if out_length == 1:
+            # Edge case: single output sample maps to middle of input
+            in_indices = np.array([num_samples / 2.0])
+        else:
+            in_indices = out_indices * ((num_samples - 1) / (out_length - 1))
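
The two index mappings can be compared directly on a tiny ramp. This sketch uses 4 input samples and 12 output samples (the 16kHz→48kHz ratio); `index_maps` is a hypothetical helper written for this illustration.

```python
import numpy as np


def index_maps(num_samples: int, out_length: int):
    """Return (ratio-based, endpoint-aligned) input positions per output sample."""
    out_idx = np.arange(out_length)
    ratio_based = out_idx * (num_samples / out_length)
    endpoint_aligned = out_idx * ((num_samples - 1) / (out_length - 1))
    return ratio_based, endpoint_aligned


ramp = np.arange(4, dtype=np.float64)  # input samples 0, 1, 2, 3
ratio_based, endpoint_aligned = index_maps(num_samples=4, out_length=12)

# The ratio-based mapping ends about 0.67 samples past the last input index.
overshoot = ratio_based[-1] - (len(ramp) - 1)

# np.interp clamps positions beyond the last input to the last input value.
held = np.interp(ratio_based, np.arange(4), ramp)
aligned = np.interp(endpoint_aligned, np.arange(4), ramp)
```

With the ratio-based mapping the final outputs repeat the last input value, while the endpoint-aligned mapping keeps the ramp rising until its last sample.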

📥 Commits

Reviewing files that changed from the base of the PR and between 57cd3e3 and 2564809.

📒 Files selected for processing (1)
  • getstream/video/rtc/track_util.py (7 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
getstream/video/rtc/track_util.py (1)
tests/rtc/test_track_util.py (2)
  • to_ndarray (41-46)
  • to_ndarray (322-323)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Tests (3.10)
  • GitHub Check: Tests (3.12)
  • GitHub Check: Tests (3.11)
  • GitHub Check: Tests (3.13)
🔇 Additional comments (4)
getstream/video/rtc/track_util.py (4)

508-530: LGTM: Clean delegation to Resampler.

The refactoring to use the new Resampler class simplifies the public API while maintaining the same functionality.


578-580: LGTM: Cleaner int16 conversion.

Using to_int16() centralizes the conversion logic and improves maintainability.


765-765: LGTM: Consistent format conversion.

Using to_int16() ensures consistent conversion behavior across the codebase.


1925-1925: LGTM: Centralized format conversion.

Using to_int16() maintains consistency with the refactoring pattern throughout this PR.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (1)
getstream/video/rtc/track_util.py (1)

432-454: Critical: Incomplete format mapping breaks packed format detection.

The fallback at line 439 sets pcm_format = AudioFormat.S16 for unsupported PyAV formats (u8, s32, s64, dbl and their planar variants), but frame_format retains the original format name (e.g., "s32", "dbl").

At line 454, the is_packed check (frame_format in ("s16", "flt")) returns False for unsupported formats, even if they are packed. This causes incorrect shape normalization at lines 459-482, leading to silent audio data corruption.

Impact: If a PyAV frame uses any format other than s16/s16p/flt/fltp (e.g., s32, dbl, u8), the audio data will be silently corrupted due to incorrect deinterleaving.

Required fix (choose one):

  1. Handle all 12 PyAV formats explicitly with correct dtype mappings
  2. Update fallback to also set frame_format = "s16" when converting to S16
  3. Update is_packed check to include all packed format names

Apply this diff to fix option 2:

         else:
             pcm_format = AudioFormat.S16
             dtype = np.int16
+            frame_format = "s16"  # Update frame_format for correct is_packed detection
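
For comparison, option 1 (explicit handling of every format) could look roughly like the sketch below. It relies only on the FFmpeg/PyAV sample-format names, where a trailing "p" marks the planar variant; `describe_format` is a hypothetical helper, and nothing here reflects how from_av_frame is actually written.

```python
import numpy as np

# Base sample-format name -> NumPy dtype (names as used by FFmpeg/PyAV).
_DTYPES = {
    "u8": np.uint8,
    "s16": np.int16,
    "s32": np.int32,
    "s64": np.int64,
    "flt": np.float32,
    "dbl": np.float64,
}


def describe_format(name: str):
    """Return (dtype, is_planar) for a sample-format name, or raise."""
    is_planar = name.endswith("p")
    base = name[:-1] if is_planar else name
    if base not in _DTYPES:
        raise ValueError(f"unsupported sample format: {name!r}")
    return np.dtype(_DTYPES[base]), is_planar


print(describe_format("s32"))   # packed 32-bit integers
print(describe_format("fltp"))  # planar 32-bit floats
```

Deriving the packed/planar flag from the same lookup as the dtype keeps the two from drifting apart, which is the failure mode described above.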
🧹 Nitpick comments (1)
getstream/video/rtc/track_util.py (1)

1556-1558: Consider adding explicit clipping for stereo-to-mono conversion.

Line 1558 averages two int16 channels: np.mean(samples, axis=0).astype(samples.dtype). While mathematically safe (the mean of two int16 values stays within int16 range), the astype cast could be more robust with explicit clipping.

Apply this diff for additional safety:

         elif from_channels == 2 and to_channels == 1:
             # Stereo to mono: average the two channels
-            return np.mean(samples, axis=0).astype(samples.dtype)
+            mono = np.mean(samples, axis=0)
+            if samples.dtype == np.int16:
+                mono = np.clip(mono, -32768, 32767)
+            return mono.astype(samples.dtype)
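
A quick check of why the averaging is safe as written: np.mean accumulates integer input in float64, so the sum of two int16 channels cannot wrap. The sketch below contrasts that with adding in int16 first, which does wrap. The sample values are made up for illustration.

```python
import numpy as np

stereo = np.array(
    [[32767, -32768, 100],
     [32767, -32768, 101]],
    dtype=np.int16,
)

# Mean over the channel axis is computed in float64, then cast (truncating).
mono = np.mean(stereo, axis=0).astype(np.int16)

# Adding the int16 arrays first wraps around before the division happens.
wrapped = ((stereo[0] + stereo[1]) // 2).astype(np.int16)

print(mono.tolist())     # [32767, -32768, 100]
print(wrapped.tolist())  # [-1, 0, 100]
```

The cast truncates 100.5 to 100, so a rounding step may be worth considering separately from clipping.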

📥 Commits

Reviewing files that changed from the base of the PR and between 2564809 and eabaf4f.

📒 Files selected for processing (2)
  • getstream/video/rtc/track_util.py (7 hunks)
  • tests/rtc/test_pcm_data.py (6 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
tests/rtc/test_pcm_data.py (1)
getstream/video/rtc/track_util.py (10)
  • PcmData (88-1398)
  • AudioFormat (30-80)
  • Resampler (1401-1600)
  • clear (878-903)
  • to_int16 (655-716)
  • from_data (329-399)
  • resample (508-530)
  • resample (1437-1492)
  • append (718-848)
  • from_av_frame (402-506)
getstream/video/rtc/track_util.py (1)
tests/rtc/test_track_util.py (2)
  • to_ndarray (41-46)
  • to_ndarray (322-323)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Tests (3.10)
  • GitHub Check: Tests (3.12)
🔇 Additional comments (8)
tests/rtc/test_pcm_data.py (4)

3-6: LGTM! Imports are well-organized.

The new imports (av, Fraction) and updated public API imports (AudioFormat, Resampler) are necessary for testing the new from_av_frame() functionality and the Resampler class.


978-1082: Excellent test coverage for to_int16().

The test suite covers:

  • Basic float32→int16 conversion with correct scaling
  • Idempotency (returns self when already s16)
  • Metadata preservation (pts, dts, time_base)
  • Stereo handling
  • Clipping of out-of-range values
  • Wrong dtype handling via from_data()

All test assertions are correct and the expected values are properly calculated.


1084-1353: Outstanding Resampler test coverage.

The test suite thoroughly validates:

  • Upsampling and downsampling (16kHz↔48kHz)
  • Channel conversions (mono↔stereo) with correct duplication/averaging
  • Format conversions (s16↔f32) with proper scaling
  • Real-time streaming scenarios (20ms chunks, consistent across multiple chunks)
  • Timestamp preservation
  • Edge cases (empty audio, single sample)
  • Linear interpolation quality (monotonic increases for ramps)
  • Statelessness verification (identical inputs produce identical outputs)

The streaming test at lines 1318-1352 is particularly valuable—it verifies that chunked processing is stateless, which is critical for the PR's goal of simplifying streaming audio use-cases.
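
The statelessness property can be sketched independently of the library. `resample_linear` below is a hypothetical, simplified stand-in for the linear-interpolation resampler under review (mono, 1D input only); it shows a 20 ms chunk at 16 kHz becoming 960 samples at 48 kHz, with identical output on repeated calls.

```python
import numpy as np


def resample_linear(chunk: np.ndarray, from_rate: int, to_rate: int) -> np.ndarray:
    """Stateless linear-interpolation resampling of a 1D chunk."""
    if from_rate == to_rate or len(chunk) == 0:
        return chunk
    out_length = int(round(len(chunk) * to_rate / from_rate))
    if out_length <= 1:
        return chunk[:out_length].copy()
    # Endpoint-aligned mapping: first maps to first, last maps to last.
    positions = np.arange(out_length) * ((len(chunk) - 1) / (out_length - 1))
    return np.interp(positions, np.arange(len(chunk)), chunk).astype(chunk.dtype)


chunk_16k = np.linspace(-0.5, 0.5, 320, dtype=np.float32)  # 20 ms at 16 kHz
out_a = resample_linear(chunk_16k, 16000, 48000)
out_b = resample_linear(chunk_16k, 16000, 48000)
print(len(out_a))  # 960
```

Because no state is carried between calls, each chunk is resampled in isolation; that keeps the API simple, at the cost of ignoring samples across chunk boundaries.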


1355-1543: Comprehensive from_av_frame() test coverage.

The test suite validates:

  • Multiple formats: s16p (planar int16), fltp (planar float32), s16 (packed/interleaved)
  • Mono and stereo audio
  • Timestamp extraction (pts, dts) and time_base conversion (Fraction→float)
  • Proper deinterleaving of packed formats (lines 1415-1450)
  • Integration with Resampler for real-world workflows
  • Edge cases: empty frames, single frames
  • Standard WebRTC rates (48kHz) with 20ms chunks (960 samples)

The packed stereo test (lines 1415-1450) is particularly valuable as it verifies correct deinterleaving of [L0,R0,L1,R1,...] to [[L0,L1,...],[R0,R1,...]] format.
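
The timestamp arithmetic these tests cover (pts multiplied by a Fraction time_base, reported as float seconds) can be shown with the standard library alone. `pts_to_seconds` is a hypothetical helper for illustration, not a PcmData method.

```python
from fractions import Fraction
from typing import Optional, Union


def pts_to_seconds(
    pts: Optional[int], time_base: Union[Fraction, float, None]
) -> Optional[float]:
    """Convert a presentation timestamp to seconds, or None if unknown."""
    if pts is None or time_base is None:
        return None
    return float(pts * time_base)


# One 20 ms frame at 48 kHz spans 960 samples with a 1/48000 time base.
print(pts_to_seconds(960, Fraction(1, 48000)))  # 0.02
```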

getstream/video/rtc/track_util.py (4)

655-716: LGTM! Correct float32→int16 conversion.

The implementation correctly:

  • Returns self when already int16 (idempotent)
  • Clips float values to [-1.0, 1.0] before scaling (line 703)
  • Scales by 32767.0 (correct for float→int16, not 32768)
  • Preserves all metadata (pts, dts, time_base, channels)
  • Handles non-ndarray samples via from_bytes/from_data normalization

The conversion formula matches audio standards and the comprehensive test coverage validates correctness.


1494-1534: Interpolation fix correctly applied.

The index calculation at line 1529 now correctly uses:

in_indices = out_indices * ((num_samples - 1) / (out_length - 1))

This ensures the last output sample maps exactly to the last input sample, preventing the out-of-bounds issue identified in previous reviews. The edge case for out_length == 1 (lines 1520-1522) is also properly handled.

Verified: The fix prevents out-of-bounds errors in common resampling scenarios (16kHz→48kHz, 44.1kHz→48kHz, etc.) that were flagged in previous review.


578-579: Correct integration of to_int16() method.

The new to_int16() method is properly integrated at all conversion points:

  • Line 578: to_wav_bytes() correctly converts to int16 (WAV requires 16-bit PCM)
  • Line 765: append() uses it when target format is s16/int16
  • Line 1932: _normalize_audio_format() uses it for s16 normalization

All integration points maintain correct format conversion behavior.

Also applies to: 765-765, 1932-1932


508-530: Clean delegation to new Resampler class.

The refactored resample() method maintains backward compatibility by delegating to the new Resampler class (lines 527-530). The early return for no-op cases (line 523) preserves efficiency.

This design supports the PR's goal of providing a simpler stateless resampler while keeping the existing API intact.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
tests/rtc/test_pcm_data.py (1)

803-810: Keep the 'float32' string constructor test intact.

The docstring still claims we are exercising the 'float32' string path, but the call now passes AudioFormat.F32. This drops coverage of the plain-string input, which we still support elsewhere. Please keep the argument as "float32" (or add a second assertion) so we continue validating that code path.


📥 Commits

Reviewing files that changed from the base of the PR and between eabaf4f and abcc8c7.

📒 Files selected for processing (2)
  • getstream/video/rtc/track_util.py (7 hunks)
  • tests/rtc/test_pcm_data.py (6 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
tests/rtc/test_pcm_data.py (1)
getstream/video/rtc/track_util.py (10)
  • PcmData (88-1400)
  • AudioFormat (30-80)
  • Resampler (1403-1602)
  • clear (880-905)
  • to_int16 (657-718)
  • from_data (329-399)
  • resample (510-532)
  • resample (1439-1494)
  • append (720-850)
  • from_av_frame (402-508)
getstream/video/rtc/track_util.py (1)
tests/rtc/test_track_util.py (2)
  • to_ndarray (41-46)
  • to_ndarray (322-323)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Tests (3.13)
  • GitHub Check: Tests (3.12)
  • GitHub Check: Tests (3.10)
  • GitHub Check: Tests (3.11)

Comment on lines +1455 to +1583
        samples = pcm.samples
        current_rate = pcm.sample_rate
        current_channels = pcm.channels
        current_format = pcm.format

        # Step 1: Adjust sample rate if needed
        if current_rate != self.sample_rate:
            if current_channels == 1:
                samples = self._resample_1d(samples, current_rate, self.sample_rate)
            else:
                # Resample each channel independently
                resampled_channels = []
                for ch in range(current_channels):
                    resampled_ch = self._resample_1d(
                        samples[ch], current_rate, self.sample_rate
                    )
                    resampled_channels.append(resampled_ch)
                samples = np.array(resampled_channels)
            current_rate = self.sample_rate

        # Step 2: Adjust channels if needed
        if current_channels != self.channels:
            samples = self._adjust_channels(samples, current_channels, self.channels)
            current_channels = self.channels

        # Step 3: Adjust format if needed
        if current_format != self.format:
            samples = self._adjust_format(samples, current_format, self.format)
            current_format = self.format

        # Create new PcmData with resampled audio, preserving timestamps
        return PcmData(
            samples=samples,
            sample_rate=self.sample_rate,
            format=self.format,
            channels=self.channels,
            pts=pcm.pts,
            dts=pcm.dts,
            time_base=pcm.time_base,
        )

    def _resample_1d(
        self, samples: np.ndarray, from_rate: int, to_rate: int
    ) -> np.ndarray:
        """
        Resample a 1D array using linear interpolation.
        Args:
            samples: 1D input samples
            from_rate: Input sample rate
            to_rate: Output sample rate
        Returns:
            Resampled 1D array
        """
        if from_rate == to_rate:
            return samples

        # Calculate output length
        num_samples = len(samples)
        duration = num_samples / from_rate
        out_length = int(np.round(duration * to_rate))

        if out_length == 0:
            return np.array([], dtype=samples.dtype)

        # Handle edge case: single output sample
        if out_length == 1:
            # Return the first sample
            return np.array([samples[0]], dtype=samples.dtype)

        # Create interpolation indices
        # Map output sample positions back to input sample positions
        # Use (num_samples - 1) / (out_length - 1) to ensure the last output
        # sample maps exactly to the last input sample, preventing out-of-bounds
        out_indices = np.arange(out_length)
        in_indices = out_indices * ((num_samples - 1) / (out_length - 1))

        # Linear interpolation
        resampled = np.interp(in_indices, np.arange(num_samples), samples)

        return resampled.astype(samples.dtype)

    def _adjust_channels(
        self, samples: np.ndarray, from_channels: int, to_channels: int
    ) -> np.ndarray:
        """
        Adjust number of channels (mono <-> stereo conversion).
        Args:
            samples: Input samples
            from_channels: Input channel count
            to_channels: Output channel count
        Returns:
            Samples with adjusted channel count
        """
        if from_channels == to_channels:
            return samples

        if from_channels == 1 and to_channels == 2:
            # Mono to stereo: duplicate the mono channel
            return np.array([samples, samples])
        elif from_channels == 2 and to_channels == 1:
            # Stereo to mono: average the two channels
            return np.mean(samples, axis=0).astype(samples.dtype)
        else:
            raise ValueError(
                f"Unsupported channel conversion: {from_channels} -> {to_channels}"
            )

    def _adjust_format(
        self, samples: np.ndarray, from_format: str, to_format: str
    ) -> np.ndarray:
        """
        Convert between s16 and f32 formats.
        Args:
            samples: Input samples
            from_format: Input format ("s16" or "f32")
            to_format: Output format ("s16" or "f32")
        Returns:
            Samples in the target format
        """
        if from_format == to_format:
            return samples

        if from_format == "s16" and to_format == "f32":

⚠️ Potential issue | 🔴 Critical

Handle time-major PCM before resampling.

Resampler.resample() now assumes pcm.samples is channel-major. However, PcmData explicitly supports (samples, channels) layouts (see test_to_bytes_interleaves_from_time_major). With such time-major input, samples[ch] in the channel loop grabs entire rows of the time axis instead of per-channel data, so resampling/upmixing produces truncated or nonsensical audio; _adjust_channels likewise averages across the wrong axis. Example:

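import numpy as np
from getstream.video.rtc import PcmData, Resampler

# Time-major layout: one row per time step, one column per channel
time_major = np.array(
    [[1, -1],
     [2, -2],
     [3, -3],
     [4, -4]], dtype=np.int16)
pcm = PcmData(samples=time_major, sample_rate=16000, format="s16", channels=2)
Resampler(format="s16", sample_rate=48000, channels=2).resample(pcm)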

Currently yields two short “channels” built from the first two rows instead of the full stereo signal. Please canonicalize to channel-major (and flatten mono) before resampling so both layouts work. One possible fix:

         samples = pcm.samples
         current_rate = pcm.sample_rate
         current_channels = pcm.channels
         current_format = pcm.format
+
+        if isinstance(samples, np.ndarray):
+            if current_channels == 1 and samples.ndim == 2:
+                # Accept (1, N) or (N, 1) inputs for mono
+                if samples.shape[0] == 1:
+                    samples = samples.reshape(-1)
+                elif samples.shape[1] == 1:
+                    samples = samples.reshape(-1)
+            elif current_channels > 1 and samples.ndim == 2:
+                if samples.shape[0] == current_channels:
+                    pass  # already channel-major
+                elif samples.shape[1] == current_channels:
+                    samples = samples.T
+                else:
+                    raise ValueError(
+                        f"Unexpected multi-channel shape {samples.shape} for channels={current_channels}"
+                    )
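
The same canonicalization can be tried in isolation with NumPy. `to_channel_major` is a hypothetical standalone version of the check suggested above; note that a square array such as (2, 2) with channels=2 is ambiguous, and this sketch treats it as already channel-major.

```python
import numpy as np


def to_channel_major(samples: np.ndarray, channels: int) -> np.ndarray:
    """Return samples as 1D (mono) or (channels, frames) for multi-channel."""
    if samples.ndim == 1:
        return samples
    if samples.ndim != 2:
        raise ValueError(f"expected 1D or 2D samples, got shape {samples.shape}")
    if channels == 1:
        if 1 not in samples.shape:
            raise ValueError(f"shape {samples.shape} is not mono")
        return samples.reshape(-1)  # (1, N) or (N, 1) -> (N,)
    if samples.shape[0] == channels:
        return samples  # already channel-major (also the ambiguous square case)
    if samples.shape[1] == channels:
        return samples.T  # time-major -> channel-major
    raise ValueError(
        f"unexpected shape {samples.shape} for channels={channels}"
    )


time_major = np.array([[1, -1], [2, -2], [3, -3], [4, -4]], dtype=np.int16)
channel_major = to_channel_major(time_major, channels=2)
print(channel_major.tolist())  # [[1, 2, 3, 4], [-1, -2, -3, -4]]
```

After this step, indexing by channel (samples[ch]) yields a full per-channel signal regardless of the caller's layout.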

@tbarbugli tbarbugli merged commit 2c3bc45 into main Nov 4, 2025
5 checks passed