Conversation

@tbarbugli
Member

@tbarbugli tbarbugli commented Nov 8, 2025

  • Move json enc/dec and pydantic from the event loop to avoid blocking
  • Add __repr__ and __str__ methods for PcmData
  • PcmData samples must be an ndarray, removed compat code

@coderabbitai

coderabbitai bot commented Nov 8, 2025

Walkthrough

This PR offloads CPU-bound operations in AsyncBaseClient's request handling (JSON serialization and response parsing) to thread pools, and adds async utility functions that delegate to their synchronous counterparts. Additionally, PcmData gains __repr__ and __str__ methods, with supporting tests for the audio data formatting.

Changes

Cohort / File(s) Summary
Async request threading
getstream/base.py
Added asyncio import; AsyncBaseClient._request_async now serializes JSON payloads and parses responses in separate threads via asyncio.to_thread to avoid blocking the event loop.
Async utility functions
getstream/utils/__init__.py
Added two public async helpers: build_query_param_async and build_body_dict_async. Both delegate to their synchronous counterparts via asyncio.to_thread, allowing CPU-bound parameter and body construction to run in a thread pool. Exported in __all__.
PcmData string representations
getstream/video/rtc/track_util.py
Added __repr__ and __str__ methods to the PcmData class for human-readable audio data formatting (channel description, sample rate, format, sample count, duration). Refactored internal logic to simplify shape/dtype handling and remove redundant isinstance checks for samples.
PcmData representation tests
tests/rtc/test_pcm_data.py
Added six new unit tests exercising PcmData string representations: mono/stereo/multichannel variants with different durations and formats, plus repr identity validation.
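The offloading pattern these changes rely on is asyncio.to_thread; a minimal runnable sketch of the JSON-serialization case (the function name here is illustrative, not part of the SDK):

```python
import asyncio
import json

async def encode_json_off_loop(payload: dict) -> str:
    # Run json.dumps in a worker thread so serializing a large payload
    # does not block the event loop while it runs.
    return await asyncio.to_thread(json.dumps, payload)

encoded = asyncio.run(encode_json_off_loop({"a": 1, "b": [1, 2, 3]}))
```

Under CPython's GIL this does not add parallelism for pure-Python work, but it keeps the event loop responsive instead of letting one large dumps call monopolize the loop's thread.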

Sequence Diagram(s)

sequenceDiagram
    participant Caller
    participant AsyncBaseClient
    participant ThreadPool
    participant RequestLib
    
    Caller->>AsyncBaseClient: _request_async(json=payload)
    
    rect rgb(220, 240, 255)
        Note over AsyncBaseClient,ThreadPool: JSON Serialization (new)
        AsyncBaseClient->>ThreadPool: asyncio.to_thread(json.dumps, payload)
        ThreadPool-->>AsyncBaseClient: JSON string
    end
    
    AsyncBaseClient->>RequestLib: httpx.request(content=json_str, headers)
    RequestLib-->>AsyncBaseClient: response
    
    rect rgb(220, 240, 255)
        Note over AsyncBaseClient,ThreadPool: Response Parsing (new)
        AsyncBaseClient->>ThreadPool: asyncio.to_thread(self._parse_response, response)
        ThreadPool-->>AsyncBaseClient: parsed result
    end
    
    AsyncBaseClient-->>Caller: result

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20–25 minutes

Areas requiring attention:

  • getstream/base.py: Verify asyncio.to_thread usage is correct for JSON serialization and response parsing; ensure error handling and exception propagation from threaded operations work as intended.
  • getstream/video/rtc/track_util.py: Carefully review the simplified shape and dtype handling logic in PcmData methods (to_float32, to_int16, copy, chunks, tail, head); confirm that removal of isinstance branching preserves behavior for all edge cases (mono, stereo, multichannel, empty samples).
  • getstream/utils/__init__.py: Confirm the asyncio.to_thread delegation pattern correctly preserves behavior of build_query_param and build_body_dict in async contexts.

Possibly related PRs

  • Audio utils #170: Performs a broader PcmData refactor and introduces AudioFormat; overlaps on PcmData modifications.
  • More audio utils #173: Adds from_av_frame, to_int16, and Resampler to PcmData; directly related to PcmData class changes in this PR.

Poem

🐰 Threads and strings, they dance so fine,
JSON serialized, responses align,
PCM data now speaks its name,
Async flows and repr's claim to fame! 🎵

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The title 'Non blocking json enc/dec' is partially related to the changeset. While JSON handling improvements are present in base.py, the PR also includes significant async utility functions and audio representation enhancements that are not reflected in the title.
Docstring Coverage | ✅ Passed | Docstring coverage is 82.76%, which meets the required threshold of 80.00%.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
getstream/video/rtc/track_util.py (1)

237-260: Review fallback logic for ambiguous array shapes.

The duration property attempts to infer the samples dimension from 2D arrays by comparing shape to channel count. The fallback on line 255 uses max(self.samples.shape) for ambiguous cases.

Potential issue: For ambiguous shapes like (2, 2) with 2 channels, the fallback assumes the larger dimension is samples, but both dimensions are equal. While this might work in practice, the logic could be clearer.

However, the existing test suite (including test_bug_duration_with_different_array_shapes) validates this behavior, so the current implementation appears to handle the supported use cases correctly.

Consider adding a comment explaining the ambiguous case handling:

 else:
     # Ambiguous or unknown - assume (channels, samples) and pick larger dimension
-    # This handles edge cases like (2, 2) arrays
+    # This handles edge cases like (2, 2) arrays; assumes standard (channels, samples) convention
     num_samples = max(self.samples.shape[0], self.samples.shape[1])
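The dimension inference discussed above can be sketched as follows; this is an illustrative reconstruction for the review's discussion, not the actual PcmData.duration implementation:

```python
import numpy as np

def infer_num_samples(samples: np.ndarray, channels: int) -> int:
    # 1D arrays are unambiguous: every element is a sample.
    if samples.ndim == 1:
        return samples.shape[0]
    # 2D: whichever axis matches the channel count is the channel axis,
    # so the other axis is the samples axis.
    if samples.shape[0] == channels and samples.shape[1] != channels:
        return samples.shape[1]
    if samples.shape[1] == channels and samples.shape[0] != channels:
        return samples.shape[0]
    # Ambiguous, e.g. a (2, 2) array with 2 channels: assume the standard
    # (channels, samples) convention and take the larger dimension.
    return max(samples.shape[0], samples.shape[1])

frames = infer_num_samples(np.zeros((2, 480), dtype=np.int16), 2)
```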
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between b9966c3 and 3aab82d.

📒 Files selected for processing (4)
  • getstream/base.py (3 hunks)
  • getstream/utils/__init__.py (3 hunks)
  • getstream/video/rtc/track_util.py (11 hunks)
  • tests/rtc/test_pcm_data.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/rtc/test_pcm_data.py (1)
getstream/video/rtc/track_util.py (1)
  • PcmData (87-1344)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Tests (3.12)
  • GitHub Check: Tests (3.11)
  • GitHub Check: Tests (3.10)
  • GitHub Check: Tests (3.13)
🔇 Additional comments (13)
getstream/base.py (2)

4-4: LGTM - asyncio import added for thread offloading.

The import is used for asyncio.to_thread() calls later in the file.


389-391: LGTM - Response parsing offloaded to thread pool.

Offloading _parse_response() to a thread via asyncio.to_thread() is appropriate since JSON parsing with json.loads() is CPU-bound and could block the event loop for large responses.

The inherited _parse_response() method from ResponseParserMixin (lines 38-61) performs:

  • JSON parsing with json.loads(response.text)
  • Deserialization to data models via from_dict()

Both operations are synchronous and CPU-intensive, making thread offloading beneficial.
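The response-side offload follows the same shape; parse_response_sketch below is a simplified stand-in for the SDK's _parse_response, which additionally deserializes into data models via from_dict():

```python
import asyncio
import json

def parse_response_sketch(text: str) -> dict:
    # Synchronous, CPU-bound parsing that would otherwise run on the
    # event loop's thread.
    return json.loads(text)

async def handle(text: str) -> dict:
    # Exceptions raised inside the worker thread (e.g. for invalid JSON)
    # propagate to the awaiting coroutine unchanged.
    return await asyncio.to_thread(parse_response_sketch, text)

parsed = asyncio.run(handle('{"duration": "12ms"}'))
```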

tests/rtc/test_pcm_data.py (1)

1565-1669: LGTM - Comprehensive test coverage for string representations.

The new tests effectively validate the __str__() and __repr__() methods for PcmData across multiple scenarios:

  • Mono vs. stereo vs. multichannel: Verifies correct channel descriptions ("Mono", "Stereo", "4-channel")
  • Format descriptions: Validates "16-bit PCM" for s16 and "32-bit float" for f32
  • Duration formatting: Checks both milliseconds (< 1.0s) and seconds (>= 1.0s) representations
  • Sample count and rate: Ensures these are included in the output

The test coverage aligns well with the implementation in getstream/video/rtc/track_util.py lines 179-223.

getstream/utils/__init__.py (3)

129-148: LGTM - Async wrapper offloads query parameter construction to thread pool.

The build_query_param_async() function correctly delegates to the synchronous build_query_param() via asyncio.to_thread(), preventing blocking of the event loop during:

  • .to_json() method calls on objects
  • json.dumps() for dictionaries and complex objects
  • URL encoding with quote()

The docstring clearly indicates this offloads CPU-bound work, which sets appropriate expectations.
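The delegation shape described here can be sketched as follows; build_query_param_sketch is a simplified stand-in for the real build_query_param, not its actual implementation:

```python
import asyncio
import json
from urllib.parse import quote

def build_query_param_sketch(**params) -> dict:
    # Simplified: stringify scalars, JSON-encode and URL-quote everything else.
    out = {}
    for key, value in params.items():
        if isinstance(value, (str, int, float, bool)):
            out[key] = str(value)
        else:
            out[key] = quote(json.dumps(value))
    return out

async def build_query_param_async_sketch(**params) -> dict:
    # The PR's pattern: the async helper just delegates the CPU-bound
    # construction to a worker thread.
    return await asyncio.to_thread(build_query_param_sketch, **params)

qp = asyncio.run(build_query_param_async_sketch(limit=10, filter={"role": "admin"}))
```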


178-193: LGTM - Async wrapper offloads body dictionary construction to thread pool.

The build_body_dict_async() function correctly delegates to build_body_dict() via asyncio.to_thread(), offloading:

  • Recursive .to_dict() calls on nested objects
  • Traversal of nested dictionaries and lists

This prevents blocking the event loop when constructing complex request bodies with deep object hierarchies.


275-277: LGTM - Export list updated for new async utilities.

The __all__ list correctly includes the new async functions, making them part of the public API.

getstream/video/rtc/track_util.py (7)

170-177: LGTM - Standard repr implementation.

The __repr__() method delegates to __str__(), which is a common and acceptable pattern for providing both programmer and user-friendly representations.


179-223: LGTM - Clear and informative string representation.

The __str__() method provides a well-formatted, human-readable description of the audio data including:

  • Channel configuration (Mono/Stereo/n-channel)
  • Sample rate with units
  • Format description (16-bit PCM, 32-bit float)
  • Sample count
  • Duration with appropriate units (ms for < 1s, seconds for >= 1s)

The implementation correctly handles 2D array shapes by checking which dimension matches the channel count (lines 187-194).
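A hypothetical formatter matching the description above (the actual implementation is PcmData.__str__ at lines 179-223 of track_util.py; the exact output wording here is assumed, not quoted from the source):

```python
def describe_pcm(channels: int, sample_rate: int, fmt: str, num_samples: int) -> str:
    # Channel description: Mono / Stereo / n-channel.
    channel_desc = {1: "Mono", 2: "Stereo"}.get(channels, f"{channels}-channel")
    # Format description for the two formats mentioned in the review.
    fmt_desc = {"s16": "16-bit PCM", "f32": "32-bit float"}.get(fmt, fmt)
    # Duration in ms below one second, in seconds otherwise.
    duration_s = num_samples / sample_rate
    if duration_s < 1.0:
        dur = f"{duration_s * 1000:.1f}ms"
    else:
        dur = f"{duration_s:.2f}s"
    return f"PcmData({channel_desc}, {sample_rate} Hz, {fmt_desc}, {num_samples} samples, {dur})"

desc = describe_pcm(1, 16000, "s16", 480)
```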


638-638: LGTM - Simplified type checks rely on ndarray semantics.

The code now directly checks self.samples.dtype instead of first verifying isinstance(self.samples, np.ndarray). This simplification is safe because:

  1. The constructor (lines 140-162) validates that samples have the correct dtype for the declared format
  2. The constructor creates an empty ndarray with the correct dtype when samples is None (lines 130-137)
  3. All methods that modify samples ensure they remain ndarrays

This reduces redundant type checking while maintaining correctness.

Also applies to: 679-679


725-726: LGTM - Simplified helper assumes ndarray.

The _ensure_ndarray() helper now simply returns pcm.samples directly, removing the previous type-checking branch. This is safe because the PcmData class invariants ensure samples is always an ndarray (enforced by the constructor and all mutation methods).


847-847: LGTM - Simplified copy using ndarray method.

The copy() method now directly calls self.samples.copy() without type checking. This is safe given the class invariants and simplifies the code.


1032-1040: LGTM - Streamlined shape normalization in chunks().

The updated logic for normalizing sample array shape is clearer:

  • Flattens 2D mono arrays to 1D (lines 1033-1034)
  • Preserves 2D arrays for multi-channel (lines 1035-1037)
  • Handles other cases (line 1039)

This aligns with the standard (channels, samples) convention for multi-channel audio.
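The normalization described above can be sketched as (a stand-alone illustration, not the chunks() source):

```python
import numpy as np

def normalize_shape(samples: np.ndarray, channels: int) -> np.ndarray:
    # Flatten 2D mono arrays such as (1, N) to 1D; keep multi-channel
    # arrays in the (channels, samples) 2D layout.
    if samples.ndim == 2 and channels == 1:
        return samples.reshape(-1)
    return samples

mono = normalize_shape(np.zeros((1, 960), dtype=np.int16), 1)
stereo = normalize_shape(np.zeros((2, 960), dtype=np.int16), 2)
```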


1183-1184: LGTM - Consistent ndarray handling in tail() and head().

Both methods now include comments confirming that samples is always an ndarray (lines 1183, 1283), which documents the class invariant. The logic for handling multi-channel vs. mono cases is consistent with other methods in the class.

Also applies to: 1283-1284

Comment on lines +360 to +365
if call_kwargs.get("json") is not None:
json_body = call_kwargs.pop("json")
json_str = await asyncio.to_thread(json.dumps, json_body)
call_kwargs["content"] = json_str
call_kwargs["headers"] = call_kwargs.get("headers", {})
call_kwargs["headers"]["Content-Type"] = "application/json"
⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify Content-Type header handling and encoding behavior.

The manual JSON serialization approach replaces httpx's built-in json parameter handling. While this offloads CPU-bound work to a thread pool, consider these points:

  1. Encoding: json.dumps() returns a str, which httpx will encode using the charset from Content-Type. Since no charset is specified, it defaults to UTF-8, which is correct for JSON.

  2. Header conflict: If call_kwargs["headers"] already contains a Content-Type header, line 365 will overwrite it. While this is likely intentional, ensure this doesn't break any callers that explicitly set Content-Type.

  3. Error handling: If json_body contains non-serializable objects or circular references, json.dumps() will raise an exception in the thread pool, which will propagate as expected.

  4. Behavior parity: Verify that this manual approach preserves all behaviors of httpx's native json parameter (e.g., setting Content-Length, handling empty bodies).

Consider adding a test to verify that the manual JSON encoding produces identical requests to httpx's native json parameter:


🏁 Script executed:

#!/bin/bash
# Description: Check if there are existing tests for JSON request handling in async client

rg -n "AsyncBaseClient.*json" --type=py -A5 -B5

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Get the actual implementation context around lines 360-365
cat getstream/base.py | sed -n '350,375p'

Length of output: 1186


🏁 Script executed:

#!/bin/bash
# Search for any JSON-related tests more broadly
rg -n "json" --type=py -i tests/ 2>/dev/null | head -30

Length of output: 2502


🏁 Script executed:

#!/bin/bash
# Look for Content-Type header handling in the codebase
rg -n "Content-Type" --type=py -B2 -A2

Length of output: 811


🏁 Script executed:

#!/bin/bash
# Search for any test files that might test AsyncBaseClient or HTTP requests
fd -e py tests/ | head -20

Length of output: 329



---

🏁 Script executed:

#!/bin/bash
# Find what self.client is and where it's initialized
rg -n "self.client" getstream/base.py -B3 -A3 | head -50

Length of output: 1690


🏁 Script executed:

#!/bin/bash
# Check imports in base.py to confirm httpx usage
head -50 getstream/base.py | grep -E "^import|^from"

Length of output: 440


🏁 Script executed:

#!/bin/bash
# Search for any tests that verify request content/headers
rg -n "call_kwargs|content.*json|headers.*Content-Type" --type=py tests/ -B2 -A2

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Look for test setup/mocking of httpx requests
rg -n "httpx\.|AsyncClient|mock.*client" --type=py tests/ -B2 -A2 | head -40

Length of output: 2701


🌐 Web query:

httpx python library native json parameter behavior Content-Type header

💡 Result:

Short answer: httpx's json= parameter JSON-encodes the value and sets Content-Type: application/json for you (unless you override the header). [1][2]

Example: httpx.post(url, json={"a":1}) -> body is JSON and http header Content-Type: application/json, but if you pass headers={"Content-Type": "..."} that explicit value is used instead. [1][2]

Sources:
[1] httpx API (json parameter docs).
[2] httpx Quickstart (Sending JSON encoded data).


Fix header overwrite to respect explicit Content-Type if provided.

The manual JSON serialization unconditionally overwrites any caller-provided Content-Type header (line 365), which differs from httpx's native json parameter. When you pass json= to httpx, it sets Content-Type only if not explicitly provided by the caller; this code always overwrites.

Behavioral differences confirmed:

  • httpx native: Respects explicit headers={"Content-Type": ...} passed by caller
  • This implementation: Unconditionally sets to "application/json", breaking caller intent

Update line 365 to only set Content-Type if not already present:

if "Content-Type" not in call_kwargs["headers"]:
    call_kwargs["headers"]["Content-Type"] = "application/json"

This preserves the ability for callers to override Content-Type while maintaining the default JSON behavior.

🤖 Prompt for AI Agents
In getstream/base.py around lines 360 to 365, the manual JSON serialization
always overwrites any caller-provided Content-Type header; change the logic so
that after preparing call_kwargs["headers"] you only set the Content-Type to
"application/json" if the caller has not already provided a Content-Type header
(i.e., check for existing "Content-Type" key in call_kwargs["headers"] before
assigning it) so callers can override the header while preserving the default
behavior.

@tbarbugli tbarbugli merged commit 41e5fa8 into main Nov 9, 2025
5 checks passed