More audio utils #173

tbarbugli · 2025-11-03T19:41:23Z

Changed the AudioStreamTrack.write method to accept PcmData, so callers no longer have to pass raw bytes
Added a new from_av_frame constructor for PcmData that accepts an av.AudioFrame
Added a PcmData.to_int16 method
Added a new NumPy-based audio resampler. It is not as fancy as av's audio resampler, but it is stateless and simpler to use for streaming audio use cases

Summary by CodeRabbit

New Features
- Enhanced audio API with a PCM-first workflow: resampling, channel/format conversion, buffering, and public audio utilities for more robust audio handling.
Documentation
- Added two new docs detailing project setup and Python testing guidelines.
Tests
- Test suite changes: removal of legacy audio-track tests and updated coverage for PCM conversion, resampling, format handling, and metadata preservation.

coderabbitai · 2025-11-03T19:41:32Z

Walkthrough

Refactors the RTC audio pipeline to a PCM-first design: adds PcmData, AudioFormat enum, and Resampler; AudioStreamTrack now accepts PcmData, performs normalization/resampling and 20ms framing; updates package exports; adds two documentation files; removes legacy framerate-based audio tests and updates PCM/resampler tests.

Changes

Cohort / File(s)	Summary
Documentation `AGENTS.md`, `CLAUDE.md`	Add new docs covering project setup, dependency rules, generate.sh, and pytest testing conventions, fixtures, assets, and run guidance.
Package exports `getstream/video/rtc/__init__.py`	Export `AudioStreamTrack`, `PcmData`, `Resampler`, and `AudioFormat` from the rtc package.
Audio track refactor `getstream/video/rtc/audio_track.py`	Replace framerate/stereo constructor with `sample_rate`/`channels`/`format`; `write()` now accepts `PcmData`; internal buffering is PCM-centric with normalization/resampling to target sample_rate/channels/format, 20ms framing, silence padding, and revised queue semantics.
PCM utilities & resampling `getstream/video/rtc/track_util.py`	Add `AudioFormat` enum, `PcmData.from_av_frame()`, `PcmData.to_int16()`, and a `Resampler` class; refactor resample logic to delegate to `Resampler`; update format/channel conversion and WAV output helpers.
Tests removed `tests/rtc/test_audio_track.py`	Remove legacy tests tied to framerate/stereo-based AudioStreamTrack behavior.
Tests updated `tests/rtc/test_pcm_data.py`	Update tests to use `AudioFormat`/`Resampler`; adjust expectations for format conversions, resampling, from_av_frame(), metadata preservation, and to_int16 behavior.

Sequence Diagram(s)

sequenceDiagram
    participant Caller
    participant AudioStreamTrack
    participant PCM_Buffer as PCM_Buffer
    participant Resampler
    participant AV as AVFrame

    Note over AudioStreamTrack: New PCM-first write/recv flow
    Caller->>AudioStreamTrack: write(pcm: PcmData)
    activate AudioStreamTrack
    AudioStreamTrack->>PCM_Buffer: enqueue PcmData
    deactivate AudioStreamTrack

    Caller->>AudioStreamTrack: recv()
    activate AudioStreamTrack
    AudioStreamTrack->>PCM_Buffer: request 20ms PCM
    activate PCM_Buffer
    alt PCM missing or format mismatch
        PCM_Buffer->>Resampler: _normalize_pcm(pcm)
        activate Resampler
        Resampler->>Resampler: resample / adjust channels / convert format
        Resampler-->>PCM_Buffer: normalized PcmData
        deactivate Resampler
        PCM_Buffer->>PCM_Buffer: pad with silence if needed
    else sufficient PCM available
        PCM_Buffer->>PCM_Buffer: slice 20ms PCM segment
    end
    PCM_Buffer-->>AudioStreamTrack: 20ms PCM segment
    deactivate PCM_Buffer
    AudioStreamTrack->>AV: build AudioFrame from PCM
    AudioStreamTrack-->>Caller: AudioFrame (20ms)
    deactivate AudioStreamTrack
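
The framing step in the diagram (slice 20 ms, or pad with silence when the buffer runs short) can be sketched roughly as follows. This is an illustration only, not the code in audio_track.py: the helper names `samples_per_frame` and `take_frame` and the plain `bytearray` buffer are assumptions made for the example.

```python
def samples_per_frame(sample_rate: int, frame_ms: int = 20) -> int:
    """Samples per channel in one frame of `frame_ms` milliseconds."""
    return sample_rate * frame_ms // 1000


def take_frame(
    buffer: bytearray,
    sample_rate: int,
    channels: int,
    bytes_per_sample: int = 2,
    frame_ms: int = 20,
) -> bytes:
    """Pop one frame of packed PCM from `buffer`, zero-padding a short tail."""
    frame_bytes = samples_per_frame(sample_rate, frame_ms) * channels * bytes_per_sample
    chunk = bytes(buffer[:frame_bytes])
    del buffer[:frame_bytes]
    # All-zero bytes decode as silence for both s16 and f32 samples.
    return chunk + b"\x00" * (frame_bytes - len(chunk))


buffer = bytearray(b"\x01" * 3000)    # 3000 bytes of queued mono s16 audio
first = take_frame(buffer, 48000, 1)  # full frame: 960 samples * 2 bytes
second = take_frame(buffer, 48000, 1) # 1080 bytes of audio plus 840 bytes of silence
```

At 48 kHz mono s16, one 20 ms frame is 960 samples, or 1920 bytes, which is why the second call has to pad.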

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Pay close attention to getstream/video/rtc/audio_track.py for buffering, 20ms boundary logic, silence padding, queue overflow handling, and pts/time_base calculations.
Review getstream/video/rtc/track_util.py Resampler algorithms, planar vs packed frame handling in PcmData.from_av_frame(), and numeric correctness in to_int16() and format conversions.
Verify updated tests in tests/rtc/test_pcm_data.py cover edge cases previously in the removed tests/rtc/test_audio_track.py.

Possibly related PRs

Audio utils #170 — Overlapping changes to PcmData/Resampler/AudioFormat and audio track resampling APIs; likely closely related or dependent.

Poem

🐰 I munched the bytes and hopped the rate,

PcmData hummed at twenty-millisecond gait,
Resampler stitched channels neat and true,
Frames now padded, timed, and new,
A rabbit cheers — fresh audio for you.

Pre-merge checks and finishing touches

❌ Failed checks (1 inconclusive)

Check name	Status	Explanation	Resolution
Title check	❓ Inconclusive	The title 'More audio utils' is vague and generic. While it relates to audio functionality additions, it uses non-descriptive language that doesn't convey the specific changes (PcmData enhancements, AudioStreamTrack API update, new Resampler class).	Provide a more specific title summarizing the main changes, such as 'Enhance AudioStreamTrack to accept PcmData and add audio utilities' or 'Add audio resampler and expand PcmData API'.

✅ Passed checks (1 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

coderabbitai

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 4b68c09 and 57cd3e3.

📒 Files selected for processing (7)

AGENTS.md (1 hunks)
CLAUDE.md (1 hunks)
getstream/video/rtc/__init__.py (2 hunks)
getstream/video/rtc/audio_track.py (2 hunks)
getstream/video/rtc/track_util.py (7 hunks)
tests/rtc/test_audio_track.py (0 hunks)
tests/rtc/test_pcm_data.py (6 hunks)

💤 Files with no reviewable changes (1)

tests/rtc/test_audio_track.py

🧰 Additional context used

🧬 Code graph analysis (4)

getstream/video/rtc/audio_track.py (1)

getstream/video/rtc/track_util.py (8)

PcmData (88-1398)

to_bytes (532-567)

append (718-848)

head (1292-1398)

tail (1184-1290)

duration (176-232)

to_int16 (655-716)

to_float32 (590-653)

getstream/video/rtc/__init__.py (2)

getstream/video/rtc/audio_track.py (1)

AudioStreamTrack (15-317)

getstream/video/rtc/track_util.py (3)

PcmData (88-1398)

Resampler (1401-1589)

AudioFormat (30-80)

getstream/video/rtc/track_util.py (1)

tests/rtc/test_track_util.py (2)

to_ndarray (41-46)

to_ndarray (322-323)

tests/rtc/test_pcm_data.py (1)

getstream/video/rtc/track_util.py (10)

PcmData (88-1398)

AudioFormat (30-80)

Resampler (1401-1589)

clear (878-903)

to_int16 (655-716)

from_data (329-399)

resample (508-530)

resample (1433-1488)

append (718-848)

from_av_frame (402-506)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)

GitHub Check: Tests (3.13)
GitHub Check: Tests (3.12)
GitHub Check: Tests (3.10)
GitHub Check: Tests (3.11)

🔇 Additional comments (8)

getstream/video/rtc/track_util.py (8)

401-506: LGTM with observations on shape handling.

The implementation correctly maps PyAV formats, handles empty frames, and preserves timestamps. The deinterleaving logic for packed stereo formats (lines 459-465) correctly reshapes [L0,R0,L1,R1,...] to [[L0,L1,...], [R0,R1,...]].

One observation: The shape normalization (lines 466-482) ensures mono is always 1D and stereo is always 2D (channels, samples), which is consistent with the codebase conventions.
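
A minimal sketch of the deinterleaving and shape convention described above, assuming the packed samples arrive as a flat (or `(1, N)`) NumPy array. The function name `deinterleave` is hypothetical and is not part of track_util.py.

```python
import numpy as np


def deinterleave(packed: np.ndarray, channels: int) -> np.ndarray:
    """Convert packed [L0, R0, L1, R1, ...] samples to channel-major layout.

    Returns a 1D array for mono and a (channels, samples) array otherwise.
    """
    flat = np.asarray(packed).reshape(-1)
    if channels == 1:
        return flat
    if flat.size % channels != 0:
        raise ValueError("sample count is not a multiple of the channel count")
    # Each row of the (samples, channels) view is one time step; transposing
    # turns each row into one channel.
    return flat.reshape(-1, channels).T


packed = np.array([[1, -1, 2, -2, 3, -3]], dtype=np.int16)  # shape (1, 6)
stereo = deinterleave(packed, channels=2)
mono = deinterleave(np.array([5, 6, 7], dtype=np.int16), channels=1)
```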

508-530: LGTM! Clean delegation pattern.

The refactored resample method correctly delegates to the new Resampler class, simplifying the API and improving separation of concerns.

569-588: LGTM! Consistent WAV format handling.

The method correctly uses to_int16() to ensure 16-bit PCM output, which is the standard for WAV files.

655-716: LGTM! Correct clipping and scaling for int16 conversion.

The implementation correctly:

Returns self when already in s16 format (optimization)

Clips float32 values to [-1.0, 1.0] before conversion (line 703)

Scales by 32767.0 which is appropriate for symmetric conversion

Note: Using 32767.0 (rather than 32768.0) ensures that both 1.0 and -1.0 map to valid int16 values, avoiding asymmetric clipping. This is a standard practice for audio processing.
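
The clip-then-scale conversion can be illustrated with a standalone function. This is a sketch of the technique the comment describes, not the `PcmData.to_int16()` implementation; `float32_to_int16` is a name chosen for the example.

```python
import numpy as np


def float32_to_int16(samples: np.ndarray) -> np.ndarray:
    """Clip float samples to [-1.0, 1.0], then scale to the int16 range."""
    clipped = np.clip(samples.astype(np.float32), -1.0, 1.0)
    # Scaling by 32767 keeps +1.0 inside int16; 32768 would not fit.
    return (clipped * 32767.0).astype(np.int16)


converted = float32_to_int16(np.array([1.0, -1.0, 0.0, 2.0, -3.0], dtype=np.float32))
```

Out-of-range inputs such as 2.0 and -3.0 are clipped first, so they land on the same values as 1.0 and -1.0.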

764-765: LGTM! Consistent format conversion in append.

The update correctly uses to_int16() for s16 format conversion, maintaining consistency with the f32 conversion on line 763.

1401-1431: LGTM! Well-structured stateless resampler.

The Resampler class design is clean:

Stateless design is appropriate for chunk-based streaming

Three-step conversion process (rate, channels, format) is logical

Timestamp preservation is correctly implemented

Format validation in constructor prevents invalid configurations

Also applies to: 1433-1488

1897-1923: LGTM! Consistent use of conversion methods.

The helper function correctly uses to_int16() for s16 format conversion, maintaining consistency with the to_float32() conversion for f32 format.

1553-1583: No issues found. Code correctly follows existing conversion patterns.

The scaling asymmetry in the code under review is intentional and consistent with the existing conversion methods in the same file:

Lines 640 and 1574: both use / 32768.0 for s16 → f32

Lines 703 and 1579: both use * 32767.0 for f32 → s16

This pattern correctly handles the asymmetric int16 range [-32768, 32767] and is a standard approach in audio processing. Round-trip conversion fidelity is acceptable given the inherent constraints of the formats.
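
To make the round-trip claim concrete, the sketch below applies the two scaling factors quoted above to every int16 value and measures the largest error. The helper names are hypothetical; only the `/ 32768.0` and `* 32767.0` factors come from the review.

```python
import numpy as np


def s16_to_f32(samples: np.ndarray) -> np.ndarray:
    """int16 -> float32 using the / 32768.0 convention."""
    return samples.astype(np.float32) / 32768.0


def f32_to_s16(samples: np.ndarray) -> np.ndarray:
    """float32 -> int16 using the * 32767.0 convention, with clipping."""
    return (np.clip(samples, -1.0, 1.0) * 32767.0).astype(np.int16)


original = np.arange(-32768, 32768).astype(np.int16)  # every int16 value
round_trip = f32_to_s16(s16_to_f32(original))
max_error = int(np.max(np.abs(round_trip.astype(np.int32) - original.astype(np.int32))))
```

Under these assumptions the round trip is not bit-exact, but the error stays within one quantization step.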

coderabbitai · 2025-11-03T19:55:04Z

getstream/video/rtc/audio_track.py

+        self,
+        sample_rate: int = 48000,
+        channels: int = 1,
+        format: str = "s16",
+        max_queue_size: int = 100,
    ):
        """
-        Initialize an AudioStreamTrack that reads data from a queue.
+        Initialize an AudioStreamTrack that accepts PcmData objects.

        Args:
-            framerate: Sample rate in Hz (default: 8000)
-            stereo: Whether to use stereo output (default: False)
-            format: Audio format (default: "s16")
-            max_queue_size: Maximum number of frames to keep in queue (default: 100)
+            sample_rate: Target sample rate in Hz (default: 48000)
+            channels: Number of channels - 1=mono, 2=stereo (default: 1)
+            format: Audio format - "s16" or "f32" (default: "s16")
+            max_queue_size: Maximum number of PcmData objects in queue (default: 100)
        """
        super().__init__()
-        self.framerate = framerate
-        self.stereo = stereo
+        self.sample_rate = sample_rate
+        self.channels = channels
        self.format = format
-        self.layout = "stereo" if stereo else "mono"
        self.max_queue_size = max_queue_size


⚠️ Potential issue | 🔴 Critical

Canonicalize the track format before building frames

AudioStreamTrack only treats the exact strings "s16" and "f32" as special. If a caller follows the PCM helpers and instantiates the track with format="float32" (or "int16"), _normalize_pcm skips conversion, yet the av_format fallback still defaults to "s16". That sends float32 bytes into an int16 AudioFrame, resulting in heavily corrupted audio on playback. Please normalize/validate the format once in __init__ (re-using AudioFormat) and reuse those canonical values in _normalize_pcm and the AV format selection so aliases do not break the stream.

-from getstream.video.rtc.track_util import PcmData
+from getstream.video.rtc.track_util import AudioFormat, PcmData
@@
-        format: str = "s16",
+        format: str = AudioFormat.S16,
@@
-        self.sample_rate = sample_rate
-        self.channels = channels
-        self.format = format
+        self.sample_rate = sample_rate
+        self.channels = channels
+        fmt = format.value if isinstance(format, AudioFormat) else str(format).lower()
+        if fmt == "float32":
+            fmt = AudioFormat.F32
+        elif fmt == "int16":
+            fmt = AudioFormat.S16
+        self.format = AudioFormat.validate(fmt)
@@
-        if self.format == "s16":
+        if self.format == AudioFormat.S16:
             av_format = "s16"  # Packed int16
-        elif self.format == "f32":
+        elif self.format == AudioFormat.F32:
             av_format = "flt"  # Packed float32
         else:
             av_format = "s16"  # Default to s16
@@
-        if self.format == "s16" and pcm.format != "s16":
+        if self.format == AudioFormat.S16 and pcm.format != AudioFormat.S16:
             pcm = pcm.to_int16()
-        elif self.format == "f32" and pcm.format != "f32":
+        elif self.format == AudioFormat.F32 and pcm.format != AudioFormat.F32:
             pcm = pcm.to_float32()

Also applies to: 162-168, 312-315
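
The idea of canonicalizing once and reusing the result can be sketched without the library. In the example below `SampleFormat` is a hypothetical stand-in for `AudioFormat`, and `canonical_format` and `AV_FORMAT` are names invented for illustration; only the alias strings and the "s16"/"flt" AV format names are taken from the comment above.

```python
from enum import Enum


class SampleFormat(Enum):
    """Hypothetical stand-in for the library's AudioFormat enum."""
    S16 = "s16"
    F32 = "f32"


_ALIASES = {
    "s16": SampleFormat.S16,
    "int16": SampleFormat.S16,
    "f32": SampleFormat.F32,
    "float32": SampleFormat.F32,
}

# Packed AV format name for each canonical format.
AV_FORMAT = {SampleFormat.S16: "s16", SampleFormat.F32: "flt"}


def canonical_format(fmt) -> SampleFormat:
    """Resolve an enum member or string alias to one canonical format."""
    if isinstance(fmt, SampleFormat):
        return fmt
    key = str(fmt).strip().lower()
    if key not in _ALIASES:
        raise ValueError(f"Unsupported audio format: {fmt!r}")
    return _ALIASES[key]


track_format = canonical_format("float32")
av_format = AV_FORMAT[track_format]  # no silent fallback to "s16"
```

Because unknown strings raise instead of falling through to a default, a mismatch between the sample bytes and the frame format cannot go unnoticed.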

getstream/video/rtc/track_util.py

coderabbitai

Actionable comments posted: 2

♻️ Duplicate comments (1)

getstream/video/rtc/track_util.py (1)

1519-1527: Fix interpolation index calculation to prevent out-of-bounds access.

This issue was flagged in a previous review: the index calculation at line 1522 can produce out-of-bounds indices when upsampling.
in_indices = out_indices * (num_samples / out_length)
When upsampling (e.g., 16kHz→48kHz), the last output index maps to a position beyond the last input sample. np.interp does not extrapolate there; it clamps to the final input value, so the tail of the output is flattened. Examples: 16kHz→48kHz exceeds bounds by 0.67 samples, 44.1kHz→48kHz by 0.08 samples.

Apply this fix to ensure the last output sample maps to the last input sample:
-        # Create interpolation indices
-        # Map output sample positions back to input sample positions
-        out_indices = np.arange(out_length)
-        in_indices = out_indices * (num_samples / out_length)
+        # Create interpolation indices
+        # Map output sample positions back to input sample positions
+        # Ensure last output maps to last input to prevent extrapolation
+        out_indices = np.arange(out_length)
+        if out_length == 1:
+            # Edge case: single output sample maps to middle of input
+            in_indices = np.array([num_samples / 2.0])
+        else:
+            in_indices = out_indices * ((num_samples - 1) / (out_length - 1))
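
The two index mappings can be compared directly. The sketch below is illustrative only; `input_positions` is a hypothetical helper, and the 320→960 sample case corresponds to a 20 ms chunk going from 16 kHz to 48 kHz.

```python
import numpy as np


def input_positions(num_samples: int, out_length: int, align_endpoints: bool) -> np.ndarray:
    """Map each output index to a fractional input position (out_length > 1)."""
    out_indices = np.arange(out_length)
    if align_endpoints:
        return out_indices * ((num_samples - 1) / (out_length - 1))
    return out_indices * (num_samples / out_length)


num_samples, out_length = 320, 960  # 20 ms at 16 kHz -> 20 ms at 48 kHz
ratio_last = input_positions(num_samples, out_length, align_endpoints=False)[-1]
aligned_last = input_positions(num_samples, out_length, align_endpoints=True)[-1]
overshoot = ratio_last - (num_samples - 1)  # distance past the last input sample

# np.interp holds the last value for positions beyond the final input index.
ramp = np.arange(num_samples, dtype=np.float64)
clamped = float(np.interp(ratio_last, np.arange(num_samples), ramp))
```

With the ratio mapping the last position overshoots by about two thirds of a sample, matching the 0.67 figure above, while the endpoint-aligned mapping lands on the last input index.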

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 57cd3e3 and 2564809.

📒 Files selected for processing (1)

getstream/video/rtc/track_util.py (7 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

getstream/video/rtc/track_util.py (1)

tests/rtc/test_track_util.py (2)

to_ndarray (41-46)

to_ndarray (322-323)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)

GitHub Check: Tests (3.10)
GitHub Check: Tests (3.12)
GitHub Check: Tests (3.11)
GitHub Check: Tests (3.13)

🔇 Additional comments (4)

getstream/video/rtc/track_util.py (4)

508-530: LGTM: Clean delegation to Resampler.

The refactoring to use the new Resampler class simplifies the public API while maintaining the same functionality.

578-580: LGTM: Cleaner int16 conversion.

Using to_int16() centralizes the conversion logic and improves maintainability.

765-765: LGTM: Consistent format conversion.

Using to_int16() ensures consistent conversion behavior across the codebase.

1925-1925: LGTM: Centralized format conversion.

Using to_int16() maintains consistency with the refactoring pattern throughout this PR.

getstream/video/rtc/track_util.py

coderabbitai

Actionable comments posted: 0

♻️ Duplicate comments (1)

getstream/video/rtc/track_util.py (1)

432-454: Critical: Incomplete format mapping breaks packed format detection.

The fallback at line 439 sets pcm_format = AudioFormat.S16 for unsupported PyAV formats (u8, s32, s64, dbl and their planar variants), but frame_format retains the original format name (e.g., "s32", "dbl").

At line 454, the is_packed check (frame_format in ("s16", "flt")) returns False for unsupported formats, even if they are packed. This causes incorrect shape normalization at lines 459-482, leading to silent audio data corruption.

Impact: If a PyAV frame uses any format other than s16/s16p/flt/fltp (e.g., s32, dbl, u8), the audio data will be silently corrupted due to incorrect deinterleaving.

Required fix (choose one):

Handle all 12 PyAV formats explicitly with correct dtype mappings

Update fallback to also set frame_format = "s16" when converting to S16

Update is_packed check to include all packed format names

Apply this diff to fix option 2:
         else:
             pcm_format = AudioFormat.S16
             dtype = np.int16
+            frame_format = "s16"  # Update frame_format for correct is_packed detection
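
One way to avoid the mismatch between `pcm_format` and `frame_format` is to derive dtype and layout from a single lookup and reject anything not listed. The sketch below is an assumption-laden illustration, not the merged code: the table and the `describe_av_format` helper are hypothetical, and only the four format names the review lists as supported are included.

```python
import numpy as np

# Sample format name -> (NumPy dtype, is_planar). Names ending in "p" are planar.
_SUPPORTED_FORMATS = {
    "s16": (np.int16, False),
    "s16p": (np.int16, True),
    "flt": (np.float32, False),
    "fltp": (np.float32, True),
}


def describe_av_format(name: str):
    """Return (dtype, is_planar) for a supported format, or raise ValueError."""
    if name not in _SUPPORTED_FORMATS:
        supported = ", ".join(sorted(_SUPPORTED_FORMATS))
        raise ValueError(f"Unsupported audio frame format {name!r}; expected one of: {supported}")
    return _SUPPORTED_FORMATS[name]


dtype, is_planar = describe_av_format("fltp")
```

Raising on unknown names trades silent corruption for an explicit error, which callers can handle by converting the frame upstream.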

🧹 Nitpick comments (1)

getstream/video/rtc/track_util.py (1)

1556-1558: Consider adding explicit clipping for stereo-to-mono conversion.

Line 1558 averages two int16 channels: np.mean(samples, axis=0).astype(samples.dtype). While mathematically safe (the mean of two int16 values stays within int16 range), the astype cast could be more robust with explicit clipping.

Apply this diff for additional safety:
         elif from_channels == 2 and to_channels == 1:
             # Stereo to mono: average the two channels
-            return np.mean(samples, axis=0).astype(samples.dtype)
+            mono = np.mean(samples, axis=0)
+            if samples.dtype == np.int16:
+                mono = np.clip(mono, -32768, 32767)
+            return mono.astype(samples.dtype)
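
A small check of the "mathematically safe" claim, using the extreme int16 values. This sketch mirrors the one-line averaging in the diff; the function name `stereo_to_mono` is chosen for the example.

```python
import numpy as np


def stereo_to_mono(samples: np.ndarray) -> np.ndarray:
    """Average a channel-major (2, N) array into N mono samples."""
    # For integer input np.mean accumulates in float64, so adding two int16
    # values cannot wrap around before the division.
    return np.mean(samples, axis=0).astype(samples.dtype)


stereo = np.array(
    [[32767, -32768, 100],
     [32767, -32768, -100]],
    dtype=np.int16,
)
mono = stereo_to_mono(stereo)
```

The mean of two in-range values is itself in range, so the cast back to int16 cannot overflow even at the extremes.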

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 2564809 and eabaf4f.

📒 Files selected for processing (2)

getstream/video/rtc/track_util.py (7 hunks)
tests/rtc/test_pcm_data.py (6 hunks)

🧰 Additional context used

🧬 Code graph analysis (2)

tests/rtc/test_pcm_data.py (1)

getstream/video/rtc/track_util.py (10)

PcmData (88-1398)

AudioFormat (30-80)

Resampler (1401-1600)

clear (878-903)

to_int16 (655-716)

from_data (329-399)

resample (508-530)

resample (1437-1492)

append (718-848)

from_av_frame (402-506)

getstream/video/rtc/track_util.py (1)

tests/rtc/test_track_util.py (2)

to_ndarray (41-46)

to_ndarray (322-323)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: Tests (3.10)
GitHub Check: Tests (3.12)

🔇 Additional comments (8)

tests/rtc/test_pcm_data.py (4)

3-6: LGTM! Imports are well-organized.

The new imports (av, Fraction) and updated public API imports (AudioFormat, Resampler) are necessary for testing the new from_av_frame() functionality and the Resampler class.

978-1082: Excellent test coverage for to_int16().

The test suite covers:

Basic float32→int16 conversion with correct scaling

Idempotency (returns self when already s16)

Metadata preservation (pts, dts, time_base)

Stereo handling

Clipping of out-of-range values

Wrong dtype handling via from_data()

All test assertions are correct and the expected values are properly calculated.

1084-1353: Outstanding Resampler test coverage.

The test suite thoroughly validates:

Upsampling and downsampling (16kHz↔48kHz)

Channel conversions (mono↔stereo) with correct duplication/averaging

Format conversions (s16↔f32) with proper scaling

Real-time streaming scenarios (20ms chunks, consistent across multiple chunks)

Timestamp preservation

Edge cases (empty audio, single sample)

Linear interpolation quality (monotonic increases for ramps)

Statelessness verification (identical inputs produce identical outputs)

The streaming test at lines 1318-1352 is particularly valuable—it verifies that chunked processing is stateless, which is critical for the PR's goal of simplifying streaming audio use-cases.
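
What "stateless" means for chunked processing can be shown with a standalone linear resampler. The function below follows the endpoint-aligned interpolation quoted elsewhere in this review, but it is a sketch written for this example (`resample_linear` is a hypothetical name), not the `Resampler` class itself.

```python
import numpy as np


def resample_linear(samples: np.ndarray, from_rate: int, to_rate: int) -> np.ndarray:
    """Resample a 1D array by linear interpolation, keeping no state between calls."""
    if from_rate == to_rate:
        return samples
    num_samples = len(samples)
    out_length = int(np.round(num_samples / from_rate * to_rate))
    if out_length == 0:
        return np.array([], dtype=samples.dtype)
    if out_length == 1:
        return np.array([samples[0]], dtype=samples.dtype)
    # Map the last output sample onto the last input sample.
    in_indices = np.arange(out_length) * ((num_samples - 1) / (out_length - 1))
    return np.interp(in_indices, np.arange(num_samples), samples).astype(samples.dtype)


chunk = np.arange(320, dtype=np.float32)             # one 20 ms chunk at 16 kHz
first = resample_linear(chunk, 16000, 48000)         # 960 samples at 48 kHz
second = resample_linear(chunk.copy(), 16000, 48000) # same input, separate call
```

Because nothing is carried over between calls, the two outputs are identical, and each 20 ms input chunk yields exactly one 20 ms output chunk.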

1355-1543: Comprehensive from_av_frame() test coverage.

The test suite validates:

Multiple formats: s16p (planar int16), fltp (planar float32), s16 (packed/interleaved)

Mono and stereo audio

Timestamp extraction (pts, dts) and time_base conversion (Fraction→float)

Proper deinterleaving of packed formats (lines 1415-1450)

Integration with Resampler for real-world workflows

Edge cases: empty frames, single frames

Standard WebRTC rates (48kHz) with 20ms chunks (960 samples)

The packed stereo test (lines 1415-1450) is particularly valuable as it verifies correct deinterleaving of [L0,R0,L1,R1,...] to [[L0,L1,...],[R0,R1,...]] format.

getstream/video/rtc/track_util.py (4)

655-716: LGTM! Correct float32→int16 conversion.

The implementation correctly:

Returns self when already int16 (idempotent)

Clips float values to [-1.0, 1.0] before scaling (line 703)

Scales by 32767.0 (correct for float→int16, not 32768)

Preserves all metadata (pts, dts, time_base, channels)

Handles non-ndarray samples via from_bytes/from_data normalization

The conversion formula matches audio standards and the comprehensive test coverage validates correctness.

1494-1534: Interpolation fix correctly applied.

The index calculation at line 1529 now correctly uses:
in_indices = out_indices * ((num_samples - 1) / (out_length - 1))
This ensures the last output sample maps exactly to the last input sample, preventing the out-of-bounds issue identified in previous reviews. The edge case for out_length == 1 (lines 1520-1522) is also properly handled.

Verified: The fix prevents out-of-bounds errors in common resampling scenarios (16kHz→48kHz, 44.1kHz→48kHz, etc.) that were flagged in previous review.

578-579: Correct integration of to_int16() method.

The new to_int16() method is properly integrated at all conversion points:

Line 578: to_wav_bytes() correctly converts to int16 (WAV requires 16-bit PCM)

Line 765: append() uses it when target format is s16/int16

Line 1932: _normalize_audio_format() uses it for s16 normalization

All integration points maintain correct format conversion behavior.

Also applies to: 765-765, 1932-1932

508-530: Clean delegation to new Resampler class.

The refactored resample() method maintains backward compatibility by delegating to the new Resampler class (lines 527-530). The early return for no-op cases (line 523) preserves efficiency.

This design supports the PR's goal of providing a simpler stateless resampler while keeping the existing API intact.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

tests/rtc/test_pcm_data.py (1)

803-810: Keep the 'float32' string constructor test intact.

The docstring still claims we are exercising the 'float32' string path, but the call now passes AudioFormat.F32. This drops coverage of the plain-string input, which we still support elsewhere. Please keep the argument as "float32" (or add a second assertion) so we continue validating that code path.

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between eabaf4f and abcc8c7.

📒 Files selected for processing (2)

getstream/video/rtc/track_util.py (7 hunks)
tests/rtc/test_pcm_data.py (6 hunks)

🧰 Additional context used

🧬 Code graph analysis (2)

tests/rtc/test_pcm_data.py (1)

getstream/video/rtc/track_util.py (10)

PcmData (88-1400)

AudioFormat (30-80)

Resampler (1403-1602)

clear (880-905)

to_int16 (657-718)

from_data (329-399)

resample (510-532)

resample (1439-1494)

append (720-850)

from_av_frame (402-508)

getstream/video/rtc/track_util.py (1)

tests/rtc/test_track_util.py (2)

to_ndarray (41-46)

to_ndarray (322-323)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)

GitHub Check: Tests (3.13)
GitHub Check: Tests (3.12)
GitHub Check: Tests (3.10)
GitHub Check: Tests (3.11)

coderabbitai · 2025-11-04T14:35:07Z

getstream/video/rtc/track_util.py

+        samples = pcm.samples
+        current_rate = pcm.sample_rate
+        current_channels = pcm.channels
+        current_format = pcm.format
+
+        # Step 1: Adjust sample rate if needed
+        if current_rate != self.sample_rate:
+            if current_channels == 1:
+                samples = self._resample_1d(samples, current_rate, self.sample_rate)
+            else:
+                # Resample each channel independently
+                resampled_channels = []
+                for ch in range(current_channels):
+                    resampled_ch = self._resample_1d(
+                        samples[ch], current_rate, self.sample_rate
+                    )
+                    resampled_channels.append(resampled_ch)
+                samples = np.array(resampled_channels)
+            current_rate = self.sample_rate
+
+        # Step 2: Adjust channels if needed
+        if current_channels != self.channels:
+            samples = self._adjust_channels(samples, current_channels, self.channels)
+            current_channels = self.channels
+
+        # Step 3: Adjust format if needed
+        if current_format != self.format:
+            samples = self._adjust_format(samples, current_format, self.format)
+            current_format = self.format
+
+        # Create new PcmData with resampled audio, preserving timestamps
+        return PcmData(
+            samples=samples,
+            sample_rate=self.sample_rate,
+            format=self.format,
+            channels=self.channels,
+            pts=pcm.pts,
+            dts=pcm.dts,
+            time_base=pcm.time_base,
+        )
+
+    def _resample_1d(
+        self, samples: np.ndarray, from_rate: int, to_rate: int
+    ) -> np.ndarray:
+        """
+        Resample a 1D array using linear interpolation.
+
+        Args:
+            samples: 1D input samples
+            from_rate: Input sample rate
+            to_rate: Output sample rate
+
+        Returns:
+            Resampled 1D array
+        """
+        if from_rate == to_rate:
+            return samples
+
+        # Calculate output length
+        num_samples = len(samples)
+        duration = num_samples / from_rate
+        out_length = int(np.round(duration * to_rate))
+
+        if out_length == 0:
+            return np.array([], dtype=samples.dtype)
+
+        # Handle edge case: single output sample
+        if out_length == 1:
+            # Return the first sample
+            return np.array([samples[0]], dtype=samples.dtype)
+
+        # Create interpolation indices
+        # Map output sample positions back to input sample positions
+        # Use (num_samples - 1) / (out_length - 1) to ensure the last output
+        # sample maps exactly to the last input sample, preventing out-of-bounds
+        out_indices = np.arange(out_length)
+        in_indices = out_indices * ((num_samples - 1) / (out_length - 1))
+
+        # Linear interpolation
+        resampled = np.interp(in_indices, np.arange(num_samples), samples)
+
+        return resampled.astype(samples.dtype)
+
+    def _adjust_channels(
+        self, samples: np.ndarray, from_channels: int, to_channels: int
+    ) -> np.ndarray:
+        """
+        Adjust number of channels (mono <-> stereo conversion).
+
+        Args:
+            samples: Input samples
+            from_channels: Input channel count
+            to_channels: Output channel count
+
+        Returns:
+            Samples with adjusted channel count
+        """
+        if from_channels == to_channels:
+            return samples
+
+        if from_channels == 1 and to_channels == 2:
+            # Mono to stereo: duplicate the mono channel
+            return np.array([samples, samples])
+        elif from_channels == 2 and to_channels == 1:
+            # Stereo to mono: average the two channels
+            return np.mean(samples, axis=0).astype(samples.dtype)
+        else:
+            raise ValueError(
+                f"Unsupported channel conversion: {from_channels} -> {to_channels}"
+            )
+
+    def _adjust_format(
+        self, samples: np.ndarray, from_format: str, to_format: str
+    ) -> np.ndarray:
+        """
+        Convert between s16 and f32 formats.
+
+        Args:
+            samples: Input samples
+            from_format: Input format ("s16" or "f32")
+            to_format: Output format ("s16" or "f32")
+
+        Returns:
+            Samples in the target format
+        """
+        if from_format == to_format:
+            return samples
+
+        if from_format == "s16" and to_format == "f32":


⚠️ Potential issue | 🔴 Critical

Handle time-major PCM before resampling.

Resampler.resample() now assumes pcm.samples is channel-major. However, PcmData explicitly supports (samples, channels) layouts (see test_to_bytes_interleaves_from_time_major). With such time-major input, samples[ch] in the channel loop grabs entire rows of the time axis instead of per-channel data, so resampling/upmixing produces truncated or nonsensical audio; _adjust_channels likewise averages across the wrong axis. Example:

time_major = np.array(
    [[1, -1], [2, -2], [3, -3], [4, -4]], dtype=np.int16)
pcm = PcmData(samples=time_major, sample_rate=16000, format="s16", channels=2)
Resampler(format="s16", sample_rate=48000, channels=2).resample(pcm)

Currently yields two short “channels” built from the first two rows instead of the full stereo signal. Please canonicalize to channel-major (and flatten mono) before resampling so both layouts work. One possible fix:

         samples = pcm.samples
         current_rate = pcm.sample_rate
         current_channels = pcm.channels
         current_format = pcm.format
+
+        if isinstance(samples, np.ndarray):
+            if current_channels == 1 and samples.ndim == 2:
+                # Accept (1, N) or (N, 1) inputs for mono
+                if samples.shape[0] == 1:
+                    samples = samples.reshape(-1)
+                elif samples.shape[1] == 1:
+                    samples = samples.reshape(-1)
+            elif current_channels > 1 and samples.ndim == 2:
+                if samples.shape[0] == current_channels:
+                    pass  # already channel-major
+                elif samples.shape[1] == current_channels:
+                    samples = samples.T
+                else:
+                    raise ValueError(
+                        f"Unexpected multi-channel shape {samples.shape} for channels={current_channels}"
+                    )
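
The same canonicalization can be tried in isolation. The sketch below is a standalone variant of the suggestion above, using the time-major example from this comment; `to_channel_major` is a hypothetical name, and the rule that a square `(channels, channels)` array is treated as channel-major is an assumption of the sketch.

```python
import numpy as np


def to_channel_major(samples: np.ndarray, channels: int) -> np.ndarray:
    """Return 1D samples for mono, or a (channels, N) array for multi-channel."""
    arr = np.asarray(samples)
    if channels == 1:
        # Flatten (1, N) or (N, 1) mono input; leave 1D input untouched.
        if arr.ndim == 2 and 1 in arr.shape:
            return arr.reshape(-1)
        return arr
    if arr.ndim != 2:
        raise ValueError(f"Expected a 2D array for channels={channels}, got shape {arr.shape}")
    if arr.shape[0] == channels:
        return arr    # already channel-major (also chosen for square input)
    if arr.shape[1] == channels:
        return arr.T  # time-major (N, channels) -> (channels, N)
    raise ValueError(f"Unexpected shape {arr.shape} for channels={channels}")


time_major = np.array([[1, -1], [2, -2], [3, -3], [4, -4]], dtype=np.int16)
channel_major = to_channel_major(time_major, channels=2)
mono = to_channel_major(np.array([[7, 8, 9]], dtype=np.int16), channels=1)
```

After this step `channel_major[ch]` is a full per-channel signal, which is what the per-channel resampling loop expects.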

tbarbugli added 2 commits November 3, 2025 17:10

new audio track and methods for pcmdata

c0f887f

new audio track cleanup

57cd3e3

tbarbugli had a problem deploying to feeds-v3-ci November 3, 2025 19:41 — with GitHub Actions Failure

coderabbitai bot reviewed Nov 3, 2025

fix doctest

2564809

tbarbugli temporarily deployed to feeds-v3-ci November 4, 2025 14:04 — with GitHub Actions Inactive

coderabbitai bot reviewed Nov 4, 2025

fix small off-by-one issue with upsampling

eabaf4f

tbarbugli temporarily deployed to feeds-v3-ci November 4, 2025 14:20 — with GitHub Actions Inactive

coderabbitai bot reviewed Nov 4, 2025

raise err if av frame format is unsupported

abcc8c7

tbarbugli temporarily deployed to feeds-v3-ci November 4, 2025 14:27 — with GitHub Actions Inactive

coderabbitai bot reviewed Nov 4, 2025

tbarbugli merged commit 2c3bc45 into main Nov 4, 2025
5 checks passed

This was referenced Nov 7, 2025

add participant to pcm #176

Merged

Non blocking json enc/dec #178

Merged

Cleanup PcmData constructors and populate participant in events #192

Merged
