Conversation

@tbarbugli
Member

@tbarbugli tbarbugli commented Nov 8, 2025

  • Move json enc/dec and pydantic from the event loop to avoid blocking
  • Add __repr__ and __str__ methods for PcmData
  • PcmData samples must be an ndarray, removed compat code

@coderabbitai

coderabbitai bot commented Nov 8, 2025

Walkthrough

This PR offloads CPU-bound operations in AsyncBaseClient's request handling (JSON serialization and response parsing) to thread pools, and adds async utility functions that delegate to their synchronous counterparts. Additionally, PcmData gains __repr__ and __str__ methods, with supporting tests for the audio data formatting.

Changes

Cohort / File(s) Summary
Async request threading
getstream/base.py
Added asyncio import; AsyncBaseClient._request_async now serializes JSON payloads and parses responses in separate threads via asyncio.to_thread to avoid blocking the event loop.
Async utility functions
getstream/utils/__init__.py
Added two public async helpers: build_query_param_async and build_body_dict_async. Both delegate to their synchronous counterparts via asyncio.to_thread, allowing CPU-bound parameter and body construction to run in a thread pool. Exported in __all__.
PcmData string representations
getstream/video/rtc/track_util.py
Added __repr__ and __str__ methods to the PcmData class for human-readable audio data formatting (channel description, sample rate, format, sample count, duration). Refactored internal logic to simplify shape/dtype handling and remove redundant isinstance checks for samples.
PcmData representation tests
tests/rtc/test_pcm_data.py
Added six new unit tests exercising PcmData string representations: mono/stereo/multichannel variants with different durations and formats, plus repr identity validation.
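The offloading pattern these changes rely on is asyncio.to_thread; a minimal runnable sketch of the JSON-serialization case (the function name here is illustrative, not part of the SDK):

```python
import asyncio
import json

async def encode_json_off_loop(payload: dict) -> str:
    # Run json.dumps in a worker thread so serializing a large payload
    # does not block the event loop while it runs.
    return await asyncio.to_thread(json.dumps, payload)

encoded = asyncio.run(encode_json_off_loop({"a": 1, "b": [1, 2, 3]}))
```

Under CPython's GIL this does not add parallelism for pure-Python work, but it keeps the event loop responsive instead of letting one large dumps call monopolize the loop's thread.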

Sequence Diagram(s)

sequenceDiagram
    participant Caller
    participant AsyncBaseClient
    participant ThreadPool
    participant RequestLib
    
    Caller->>AsyncBaseClient: _request_async(json=payload)
    
    rect rgb(220, 240, 255)
        Note over AsyncBaseClient,ThreadPool: JSON Serialization (new)
        AsyncBaseClient->>ThreadPool: asyncio.to_thread(json.dumps, payload)
        ThreadPool-->>AsyncBaseClient: JSON string
    end
    
    AsyncBaseClient->>RequestLib: httpx.request(content=json_str, headers)
    RequestLib-->>AsyncBaseClient: response
    
    rect rgb(220, 240, 255)
        Note over AsyncBaseClient,ThreadPool: Response Parsing (new)
        AsyncBaseClient->>ThreadPool: asyncio.to_thread(self._parse_response, response)
        ThreadPool-->>AsyncBaseClient: parsed result
    end
    
    AsyncBaseClient-->>Caller: result

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20–25 minutes

Areas requiring attention:

  • getstream/base.py: Verify asyncio.to_thread usage is correct for JSON serialization and response parsing; ensure error handling and exception propagation from threaded operations work as intended.
  • getstream/video/rtc/track_util.py: Carefully review the simplified shape and dtype handling logic in PcmData methods (to_float32, to_int16, copy, chunks, tail, head); confirm that removal of isinstance branching preserves behavior for all edge cases (mono, stereo, multichannel, empty samples).
  • getstream/utils/__init__.py: Confirm the asyncio.to_thread delegation pattern correctly preserves behavior of build_query_param and build_body_dict in async contexts.

Possibly related PRs

  • Audio utils #170: Performs a broader PcmData refactor and introduces AudioFormat; overlaps on PcmData modifications.
  • More audio utils #173: Adds from_av_frame, to_int16, and Resampler to PcmData; directly related to PcmData class changes in this PR.

Poem

🐰 Threads and strings, they dance so fine,
JSON serialized, responses align,
PCM data now speaks its name,
Async flows and repr's claim to fame! 🎵

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The title 'Non blocking json enc/dec' is partially related to the changeset. While JSON handling improvements are present in base.py, the PR also includes significant async utility functions and audio representation enhancements that are not reflected in the title.
Docstring Coverage | ✅ Passed | Docstring coverage is 82.76%, which meets the required threshold of 80.00%.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
getstream/video/rtc/track_util.py (1)

237-260: Review fallback logic for ambiguous array shapes.

The duration property attempts to infer the samples dimension from 2D arrays by comparing shape to channel count. The fallback on line 255 uses max(self.samples.shape) for ambiguous cases.

Potential issue: For ambiguous shapes like (2, 2) with 2 channels, the fallback assumes the larger dimension is samples, but both dimensions are equal. While this might work in practice, the logic could be clearer.

However, the existing test suite (including test_bug_duration_with_different_array_shapes) validates this behavior, so the current implementation appears to handle the supported use cases correctly.

Consider adding a comment explaining the ambiguous case handling:

 else:
     # Ambiguous or unknown - assume (channels, samples) and pick larger dimension
-    # This handles edge cases like (2, 2) arrays
+    # This handles edge cases like (2, 2) arrays; assumes standard (channels, samples) convention
     num_samples = max(self.samples.shape[0], self.samples.shape[1])
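The dimension inference discussed above can be sketched as follows; this is an illustrative reconstruction for the review's discussion, not the actual PcmData.duration implementation:

```python
import numpy as np

def infer_num_samples(samples: np.ndarray, channels: int) -> int:
    # 1D arrays are unambiguous: every element is a sample.
    if samples.ndim == 1:
        return samples.shape[0]
    # 2D: whichever axis matches the channel count is the channel axis,
    # so the other axis is the samples axis.
    if samples.shape[0] == channels and samples.shape[1] != channels:
        return samples.shape[1]
    if samples.shape[1] == channels and samples.shape[0] != channels:
        return samples.shape[0]
    # Ambiguous, e.g. a (2, 2) array with 2 channels: assume the standard
    # (channels, samples) convention and take the larger dimension.
    return max(samples.shape[0], samples.shape[1])

frames = infer_num_samples(np.zeros((2, 480), dtype=np.int16), 2)
```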
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between b9966c3 and 3aab82d.

📒 Files selected for processing (4)
  • getstream/base.py (3 hunks)
  • getstream/utils/__init__.py (3 hunks)
  • getstream/video/rtc/track_util.py (11 hunks)
  • tests/rtc/test_pcm_data.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/rtc/test_pcm_data.py (1)
getstream/video/rtc/track_util.py (1)
  • PcmData (87-1344)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Tests (3.12)
  • GitHub Check: Tests (3.11)
  • GitHub Check: Tests (3.10)
  • GitHub Check: Tests (3.13)
🔇 Additional comments (13)
getstream/base.py (2)

4-4: LGTM - asyncio import added for thread offloading.

The import is used for asyncio.to_thread() calls later in the file.


389-391: LGTM - Response parsing offloaded to thread pool.

Offloading _parse_response() to a thread via asyncio.to_thread() is appropriate since JSON parsing with json.loads() is CPU-bound and could block the event loop for large responses.

The inherited _parse_response() method from ResponseParserMixin (lines 38-61) performs:

  • JSON parsing with json.loads(response.text)
  • Deserialization to data models via from_dict()

Both operations are synchronous and CPU-intensive, making thread offloading beneficial.
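The response-side offload follows the same shape; parse_response_sketch below is a simplified stand-in for the SDK's _parse_response, which additionally deserializes into data models via from_dict():

```python
import asyncio
import json

def parse_response_sketch(text: str) -> dict:
    # Synchronous, CPU-bound parsing that would otherwise run on the
    # event loop's thread.
    return json.loads(text)

async def handle(text: str) -> dict:
    # Exceptions raised inside the worker thread (e.g. for invalid JSON)
    # propagate to the awaiting coroutine unchanged.
    return await asyncio.to_thread(parse_response_sketch, text)

parsed = asyncio.run(handle('{"duration": "12ms"}'))
```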

tests/rtc/test_pcm_data.py (1)

1565-1669: LGTM - Comprehensive test coverage for string representations.

The new tests effectively validate the __str__() and __repr__() methods for PcmData across multiple scenarios:

  • Mono vs. stereo vs. multichannel: Verifies correct channel descriptions ("Mono", "Stereo", "4-channel")
  • Format descriptions: Validates "16-bit PCM" for s16 and "32-bit float" for f32
  • Duration formatting: Checks both milliseconds (< 1.0s) and seconds (>= 1.0s) representations
  • Sample count and rate: Ensures these are included in the output

The test coverage aligns well with the implementation in getstream/video/rtc/track_util.py lines 179-223.

getstream/utils/__init__.py (3)

129-148: LGTM - Async wrapper offloads query parameter construction to thread pool.

The build_query_param_async() function correctly delegates to the synchronous build_query_param() via asyncio.to_thread(), preventing blocking of the event loop during:

  • .to_json() method calls on objects
  • json.dumps() for dictionaries and complex objects
  • URL encoding with quote()

The docstring clearly indicates this offloads CPU-bound work, which sets appropriate expectations.
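The delegation shape described here can be sketched as follows; build_query_param_sketch is a simplified stand-in for the real build_query_param, not its actual implementation:

```python
import asyncio
import json
from urllib.parse import quote

def build_query_param_sketch(**params) -> dict:
    # Simplified: stringify scalars, JSON-encode and URL-quote everything else.
    out = {}
    for key, value in params.items():
        if isinstance(value, (str, int, float, bool)):
            out[key] = str(value)
        else:
            out[key] = quote(json.dumps(value))
    return out

async def build_query_param_async_sketch(**params) -> dict:
    # The PR's pattern: the async helper just delegates the CPU-bound
    # construction to a worker thread.
    return await asyncio.to_thread(build_query_param_sketch, **params)

qp = asyncio.run(build_query_param_async_sketch(limit=10, filter={"role": "admin"}))
```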


178-193: LGTM - Async wrapper offloads body dictionary construction to thread pool.

The build_body_dict_async() function correctly delegates to build_body_dict() via asyncio.to_thread(), offloading:

  • Recursive .to_dict() calls on nested objects
  • Traversal of nested dictionaries and lists

This prevents blocking the event loop when constructing complex request bodies with deep object hierarchies.


275-277: LGTM - Export list updated for new async utilities.

The __all__ list correctly includes the new async functions, making them part of the public API.

getstream/video/rtc/track_util.py (7)

170-177: LGTM - Standard repr implementation.

The __repr__() method delegates to __str__(), which is a common and acceptable pattern for providing both programmer and user-friendly representations.


179-223: LGTM - Clear and informative string representation.

The __str__() method provides a well-formatted, human-readable description of the audio data including:

  • Channel configuration (Mono/Stereo/n-channel)
  • Sample rate with units
  • Format description (16-bit PCM, 32-bit float)
  • Sample count
  • Duration with appropriate units (ms for < 1s, seconds for >= 1s)

The implementation correctly handles 2D array shapes by checking which dimension matches the channel count (lines 187-194).
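A hypothetical formatter matching the description above (the actual implementation is PcmData.__str__ at lines 179-223 of track_util.py; the exact output wording here is assumed, not quoted from the source):

```python
def describe_pcm(channels: int, sample_rate: int, fmt: str, num_samples: int) -> str:
    # Channel description: Mono / Stereo / n-channel.
    channel_desc = {1: "Mono", 2: "Stereo"}.get(channels, f"{channels}-channel")
    # Format description for the two formats mentioned in the review.
    fmt_desc = {"s16": "16-bit PCM", "f32": "32-bit float"}.get(fmt, fmt)
    # Duration in ms below one second, in seconds otherwise.
    duration_s = num_samples / sample_rate
    if duration_s < 1.0:
        dur = f"{duration_s * 1000:.1f}ms"
    else:
        dur = f"{duration_s:.2f}s"
    return f"PcmData({channel_desc}, {sample_rate} Hz, {fmt_desc}, {num_samples} samples, {dur})"

desc = describe_pcm(1, 16000, "s16", 480)
```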


638-638: LGTM - Simplified type checks rely on ndarray semantics.

The code now directly checks self.samples.dtype instead of first verifying isinstance(self.samples, np.ndarray). This simplification is safe because:

  1. The constructor (lines 140-162) validates that samples have the correct dtype for the declared format
  2. The constructor creates an empty ndarray with the correct dtype when samples is None (lines 130-137)
  3. All methods that modify samples ensure they remain ndarrays

This reduces redundant type checking while maintaining correctness.

Also applies to: 679-679


725-726: LGTM - Simplified helper assumes ndarray.

The _ensure_ndarray() helper now simply returns pcm.samples directly, removing the previous type-checking branch. This is safe because the PcmData class invariants ensure samples is always an ndarray (enforced by the constructor and all mutation methods).


847-847: LGTM - Simplified copy using ndarray method.

The copy() method now directly calls self.samples.copy() without type checking. This is safe given the class invariants and simplifies the code.


1032-1040: LGTM - Streamlined shape normalization in chunks().

The updated logic for normalizing sample array shape is clearer:

  • Flattens 2D mono arrays to 1D (lines 1033-1034)
  • Preserves 2D arrays for multi-channel (lines 1035-1037)
  • Handles other cases (line 1039)

This aligns with the standard (channels, samples) convention for multi-channel audio.
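The normalization described above can be sketched as (a stand-alone illustration, not the chunks() source):

```python
import numpy as np

def normalize_shape(samples: np.ndarray, channels: int) -> np.ndarray:
    # Flatten 2D mono arrays such as (1, N) to 1D; keep multi-channel
    # arrays in the (channels, samples) 2D layout.
    if samples.ndim == 2 and channels == 1:
        return samples.reshape(-1)
    return samples

mono = normalize_shape(np.zeros((1, 960), dtype=np.int16), 1)
stereo = normalize_shape(np.zeros((2, 960), dtype=np.int16), 2)
```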


1183-1184: LGTM - Consistent ndarray handling in tail() and head().

Both methods now include comments confirming that samples is always an ndarray (lines 1183, 1283), which documents the class invariant. The logic for handling multi-channel vs. mono cases is consistent with other methods in the class.

Also applies to: 1283-1284

Comment on lines +360 to +365
if call_kwargs.get("json") is not None:
json_body = call_kwargs.pop("json")
json_str = await asyncio.to_thread(json.dumps, json_body)
call_kwargs["content"] = json_str
call_kwargs["headers"] = call_kwargs.get("headers", {})
call_kwargs["headers"]["Content-Type"] = "application/json"
⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify Content-Type header handling and encoding behavior.

The manual JSON serialization approach replaces httpx's built-in json parameter handling. While this offloads CPU-bound work to a thread pool, consider these points:

  1. Encoding: json.dumps() returns a str, which httpx will encode using the charset from Content-Type. Since no charset is specified, it defaults to UTF-8, which is correct for JSON.

  2. Header conflict: If call_kwargs["headers"] already contains a Content-Type header, line 365 will overwrite it. While this is likely intentional, ensure this doesn't break any callers that explicitly set Content-Type.

  3. Error handling: If json_body contains non-serializable objects or circular references, json.dumps() will raise an exception in the thread pool, which will propagate as expected.

  4. Behavior parity: Verify that this manual approach preserves all behaviors of httpx's native json parameter (e.g., setting Content-Length, handling empty bodies).

Consider adding a test to verify that the manual JSON encoding produces identical requests to httpx's native json parameter:


🏁 Script executed:

#!/bin/bash
# Description: Check if there are existing tests for JSON request handling in async client

rg -n "AsyncBaseClient.*json" --type=py -A5 -B5

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Get the actual implementation context around lines 360-365
cat getstream/base.py | sed -n '350,375p'

Length of output: 1186


🏁 Script executed:

#!/bin/bash
# Search for any JSON-related tests more broadly
rg -n "json" --type=py -i tests/ 2>/dev/null | head -30

Length of output: 2502


🏁 Script executed:

#!/bin/bash
# Look for Content-Type header handling in the codebase
rg -n "Content-Type" --type=py -B2 -A2

Length of output: 811


🏁 Script executed:

#!/bin/bash
# Search for any test files that might test AsyncBaseClient or HTTP requests
fd -e py tests/ | head -20

Length of output: 329



---

🏁 Script executed:

#!/bin/bash
# Find what self.client is and where it's initialized
rg -n "self.client" getstream/base.py -B3 -A3 | head -50

Length of output: 1690


🏁 Script executed:

#!/bin/bash
# Check imports in base.py to confirm httpx usage
head -50 getstream/base.py | grep -E "^import|^from"

Length of output: 440


🏁 Script executed:

#!/bin/bash
# Search for any tests that verify request content/headers
rg -n "call_kwargs|content.*json|headers.*Content-Type" --type=py tests/ -B2 -A2

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Look for test setup/mocking of httpx requests
rg -n "httpx\.|AsyncClient|mock.*client" --type=py tests/ -B2 -A2 | head -40

Length of output: 2701


🌐 Web query:

httpx python library native json parameter behavior Content-Type header

💡 Result:

Short answer: httpx's json= parameter JSON-encodes the value and sets Content-Type: application/json for you (unless you override the header). [1][2]

Example: httpx.post(url, json={"a":1}) -> body is JSON and http header Content-Type: application/json, but if you pass headers={"Content-Type": "..."} that explicit value is used instead. [1][2]

Sources:
[1] httpx API (json parameter docs).
[2] httpx Quickstart (Sending JSON encoded data).


Fix header overwrite to respect explicit Content-Type if provided.

The manual JSON serialization unconditionally overwrites any caller-provided Content-Type header (line 365), which differs from httpx's native json parameter. When you pass json= to httpx, it sets Content-Type only if not explicitly provided by the caller; this code always overwrites.

Behavioral differences confirmed:

  • httpx native: Respects explicit headers={"Content-Type": ...} passed by caller
  • This implementation: Unconditionally sets to "application/json", breaking caller intent

Update line 365 to only set Content-Type if not already present:

if "Content-Type" not in call_kwargs["headers"]:
    call_kwargs["headers"]["Content-Type"] = "application/json"

This preserves the ability for callers to override Content-Type while maintaining the default JSON behavior.

🤖 Prompt for AI Agents
In getstream/base.py around lines 360 to 365, the manual JSON serialization
always overwrites any caller-provided Content-Type header; change the logic so
that after preparing call_kwargs["headers"] you only set the Content-Type to
"application/json" if the caller has not already provided a Content-Type header
(i.e., check for existing "Content-Type" key in call_kwargs["headers"] before
assigning it) so callers can override the header while preserving the default
behavior.

@tbarbugli tbarbugli merged commit 41e5fa8 into main Nov 9, 2025
5 checks passed