Non blocking json enc/dec #178
Walkthrough

This PR offloads CPU-bound operations to thread pools in AsyncBaseClient's request handling (JSON serialization and response parsing) and adds async utility functions that delegate synchronously to their counterparts. Additionally, PcmData gains `__repr__` and `__str__` methods with supporting tests for audio data formatting.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Caller
    participant AsyncBaseClient
    participant ThreadPool
    participant RequestLib
    Caller->>AsyncBaseClient: _request_async(json=payload)
    rect rgb(220, 240, 255)
        Note over AsyncBaseClient,ThreadPool: JSON Serialization (new)
        AsyncBaseClient->>ThreadPool: asyncio.to_thread(json.dumps, payload)
        ThreadPool-->>AsyncBaseClient: JSON string
    end
    AsyncBaseClient->>RequestLib: httpx.request(content=json_str, headers)
    RequestLib-->>AsyncBaseClient: response
    rect rgb(220, 240, 255)
        Note over AsyncBaseClient,ThreadPool: Response Parsing (new)
        AsyncBaseClient->>ThreadPool: asyncio.to_thread(self._parse_response, response)
        ThreadPool-->>AsyncBaseClient: parsed result
    end
    AsyncBaseClient-->>Caller: result
```
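The serialization step in the diagram can be sketched as follows. This is a minimal, self-contained illustration of the `asyncio.to_thread` pattern; `serialize_payload` is a hypothetical stand-in for the client's internal request path, not the SDK's actual API:

```python
import asyncio
import json

async def serialize_payload(payload: dict) -> str:
    # Offload CPU-bound json.dumps to the default thread pool so the
    # event loop stays responsive while large payloads are encoded.
    return await asyncio.to_thread(json.dumps, payload)

async def main() -> str:
    return await serialize_payload({"user": "alice", "id": 42})

result = asyncio.run(main())
print(result)  # {"user": "alice", "id": 42}
```

For small payloads the thread hop adds overhead, so this trade-off pays off mainly for large request bodies.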
Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20–25 minutes

Areas requiring attention:
Possibly related PRs
Poem
Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 1
🧹 Nitpick comments (1)
getstream/video/rtc/track_util.py (1)
237-260: Review fallback logic for ambiguous array shapes.

The `duration` property attempts to infer the samples dimension from 2D arrays by comparing the shape to the channel count. The fallback on line 255 uses `max(self.samples.shape)` for ambiguous cases.

Potential issue: for ambiguous shapes like `(2, 2)` with 2 channels, the fallback assumes the larger dimension is samples, but both dimensions are equal. While this might work in practice, the logic could be clearer.

However, the existing test suite (including `test_bug_duration_with_different_array_shapes`) validates this behavior, so the current implementation appears to handle the supported use cases correctly.

Consider adding a comment explaining the ambiguous-case handling:

```diff
 else:
     # Ambiguous or unknown - assume (channels, samples) and pick larger dimension
-    # This handles edge cases like (2, 2) arrays
+    # This handles edge cases like (2, 2) arrays; assumes standard (channels, samples) convention
     num_samples = max(self.samples.shape[0], self.samples.shape[1])
```
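The inference the comment describes can be restated as a small stand-alone function. This is a hypothetical restatement for illustration, not the SDK's actual implementation, and it operates on shape tuples rather than arrays:

```python
def infer_num_samples(shape: tuple, channels: int) -> int:
    # Hypothetical restatement of the duration property's dimension
    # inference described above.
    if len(shape) == 1:
        return shape[0]
    rows, cols = shape
    if rows == channels and cols != channels:
        return cols  # (channels, samples) layout
    if cols == channels and rows != channels:
        return rows  # (samples, channels) layout
    # Ambiguous, e.g. a (2, 2) array with 2 channels: assume the larger
    # dimension holds the samples.
    return max(rows, cols)

print(infer_num_samples((2, 480), 2))  # 480
print(infer_num_samples((480, 2), 2))  # 480
print(infer_num_samples((2, 2), 2))    # 2
```

Seen this way, the ambiguous branch only fires when both dimensions equal the channel count, which makes the `(2, 2)` edge case explicit.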
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (4)
- getstream/base.py (3 hunks)
- getstream/utils/__init__.py (3 hunks)
- getstream/video/rtc/track_util.py (11 hunks)
- tests/rtc/test_pcm_data.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/rtc/test_pcm_data.py (1)
getstream/video/rtc/track_util.py (1)
PcmData (87-1344)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Tests (3.12)
- GitHub Check: Tests (3.11)
- GitHub Check: Tests (3.10)
- GitHub Check: Tests (3.13)
🔇 Additional comments (13)
getstream/base.py (2)
4-4: LGTM - asyncio import added for thread offloading.

The import is used for `asyncio.to_thread()` calls later in the file.
389-391: LGTM - Response parsing offloaded to thread pool.

Offloading `_parse_response()` to a thread via `asyncio.to_thread()` is appropriate since JSON parsing with `json.loads()` is CPU-bound and could block the event loop for large responses.

The inherited `_parse_response()` method from `ResponseParserMixin` (lines 38-61) performs:

- JSON parsing with `json.loads(response.text)`
- Deserialization to data models via `from_dict()`

Both operations are synchronous and CPU-intensive, making thread offloading beneficial.
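The parse-side pattern can be sketched in isolation. Here `parse_response` and `CallResponse` are hypothetical stand-ins for the mixin's `_parse_response()` and the SDK's generated models:

```python
import asyncio
import json
from dataclasses import dataclass

@dataclass
class CallResponse:
    # Hypothetical data model standing in for the SDK's generated models.
    call_id: str
    duration: int

def parse_response(body: str) -> CallResponse:
    # CPU-bound work: JSON parsing plus model construction.
    data = json.loads(body)
    return CallResponse(call_id=data["call_id"], duration=data["duration"])

async def handle(body: str) -> CallResponse:
    # Run the whole parse step off the event loop, mirroring the reviewed
    # asyncio.to_thread(self._parse_response, response) call.
    return await asyncio.to_thread(parse_response, body)

resp = asyncio.run(handle('{"call_id": "abc", "duration": 30}'))
print(resp)  # CallResponse(call_id='abc', duration=30)
```

Bundling parsing and model construction into one offloaded call keeps both CPU-bound steps off the loop with a single thread hop.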
tests/rtc/test_pcm_data.py (1)
1565-1669: LGTM - Comprehensive test coverage for string representations.

The new tests effectively validate the `__str__()` and `__repr__()` methods for PcmData across multiple scenarios:
- Mono vs. stereo vs. multichannel: Verifies correct channel descriptions ("Mono", "Stereo", "4-channel")
- Format descriptions: Validates "16-bit PCM" for s16 and "32-bit float" for f32
- Duration formatting: Checks both milliseconds (< 1.0s) and seconds (>= 1.0s) representations
- Sample count and rate: Ensures these are included in the output
The test coverage aligns well with the implementation in getstream/video/rtc/track_util.py lines 179-223.

getstream/utils/__init__.py (3)
129-148: LGTM - Async wrapper offloads query parameter construction to thread pool.

The `build_query_param_async()` function correctly delegates to the synchronous `build_query_param()` via `asyncio.to_thread()`, preventing blocking of the event loop during:

- `.to_json()` method calls on objects
- `json.dumps()` for dictionaries and complex objects
- URL encoding with `quote()`

The docstring clearly indicates this offloads CPU-bound work, which sets appropriate expectations.
178-193: LGTM - Async wrapper offloads body dictionary construction to thread pool.

The `build_body_dict_async()` function correctly delegates to `build_body_dict()` via `asyncio.to_thread()`, offloading:

- Recursive `.to_dict()` calls on nested objects
- Traversal of nested dictionaries and lists
This prevents blocking the event loop when constructing complex request bodies with deep object hierarchies.
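The delegate-to-thread wrapper pattern both utilities follow can be sketched as below. The builder here is a simplified, hypothetical version for illustration; it is not the SDK's actual `build_query_param` logic:

```python
import asyncio
import json
from urllib.parse import quote

def build_query_param_sketch(**kwargs) -> dict:
    # Hypothetical sync builder: dicts/lists are JSON-encoded, everything
    # else stringified, and all values URL-encoded.
    out = {}
    for key, value in kwargs.items():
        if isinstance(value, (dict, list)):
            out[key] = quote(json.dumps(value))
        else:
            out[key] = quote(str(value))
    return out

async def build_query_param_sketch_async(**kwargs) -> dict:
    # The async wrapper adds no logic of its own; it just runs the sync
    # builder in the default thread pool.
    return await asyncio.to_thread(build_query_param_sketch, **kwargs)

params = asyncio.run(build_query_param_sketch_async(limit=10, filter={"a": 1}))
print(params)
```

Keeping the sync function as the single source of truth means the async wrapper can never drift out of sync with it.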
275-277: LGTM - Export list updated for new async utilities.

The `__all__` list correctly includes the new async functions, making them part of the public API.

getstream/video/rtc/track_util.py (7)
170-177: LGTM - Standard repr implementation.

The `__repr__()` method delegates to `__str__()`, which is a common and acceptable pattern for providing both programmer and user-friendly representations.
179-223: LGTM - Clear and informative string representation.

The `__str__()` method provides a well-formatted, human-readable description of the audio data including:
- Channel configuration (Mono/Stereo/n-channel)
- Sample rate with units
- Format description (16-bit PCM, 32-bit float)
- Sample count
- Duration with appropriate units (ms for < 1s, seconds for >= 1s)
The implementation correctly handles 2D array shapes by checking which dimension matches the channel count (lines 187-194).
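The duration convention those tests check (milliseconds below one second, seconds otherwise) can be illustrated with a tiny formatter. The exact format strings here are assumptions for illustration, not necessarily what `__str__` emits:

```python
def describe_duration(seconds: float) -> str:
    # Hypothetical formatter following the convention described above:
    # milliseconds for sub-second durations, seconds otherwise.
    if seconds < 1.0:
        return f"{seconds * 1000:.1f}ms"
    return f"{seconds:.2f}s"

print(describe_duration(0.02))  # 20.0ms
print(describe_duration(1.5))   # 1.50s
```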
638-638: LGTM - Simplified type checks rely on ndarray semantics.

The code now directly checks `self.samples.dtype` instead of first verifying `isinstance(self.samples, np.ndarray)`. This simplification is safe because:
- The constructor (lines 140-162) validates that samples have the correct dtype for the declared format
- The constructor creates an empty ndarray with the correct dtype when samples is None (lines 130-137)
- All methods that modify samples ensure they remain ndarrays
This reduces redundant type checking while maintaining correctness.
Also applies to: 679-679
725-726: LGTM - Simplified helper assumes ndarray.

The `_ensure_ndarray()` helper now simply returns `pcm.samples` directly, removing the previous type-checking branch. This is safe because the PcmData class invariants ensure `samples` is always an ndarray (enforced by the constructor and all mutation methods).
847-847: LGTM - Simplified copy using ndarray method.

The `copy()` method now directly calls `self.samples.copy()` without type checking. This is safe given the class invariants and simplifies the code.
1032-1040: LGTM - Streamlined shape normalization in chunks().

The updated logic for normalizing the sample array shape is clearer:
- Flattens 2D mono arrays to 1D (lines 1033-1034)
- Preserves 2D arrays for multi-channel (lines 1035-1037)
- Handles other cases (line 1039)
This aligns with the standard (channels, samples) convention for multi-channel audio.
1183-1184: LGTM - Consistent ndarray handling in tail() and head().

Both methods now include comments confirming that `samples` is always an ndarray (lines 1183, 1283), which documents the class invariant. The logic for handling multi-channel vs. mono cases is consistent with other methods in the class.

Also applies to: 1283-1284
```python
if call_kwargs.get("json") is not None:
    json_body = call_kwargs.pop("json")
    json_str = await asyncio.to_thread(json.dumps, json_body)
    call_kwargs["content"] = json_str
    call_kwargs["headers"] = call_kwargs.get("headers", {})
    call_kwargs["headers"]["Content-Type"] = "application/json"
```
🧩 Analysis chain
Verify Content-Type header handling and encoding behavior.
The manual JSON serialization approach replaces httpx's built-in json parameter handling. While this offloads CPU-bound work to a thread pool, consider these points:
- Encoding: `json.dumps()` returns a `str`, which httpx will encode using the charset from Content-Type. Since no charset is specified, it defaults to UTF-8, which is correct for JSON.
- Header conflict: If `call_kwargs["headers"]` already contains a `Content-Type` header, line 365 will overwrite it. While this is likely intentional, ensure this doesn't break any callers that explicitly set Content-Type.
- Error handling: If `json_body` contains non-serializable objects or circular references, `json.dumps()` will raise an exception in the thread pool, which will propagate as expected.
- Behavior parity: Verify that this manual approach preserves all behaviors of httpx's native `json` parameter (e.g., setting `Content-Length`, handling empty bodies).
Consider adding a test to verify that the manual JSON encoding produces identical requests to httpx's native json parameter:
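As a stdlib-only illustration of the encoding point above, with no claim about httpx internals, the following shows what the manual path produces:

```python
import json

payload = {"name": "café", "count": 3}

# json.dumps returns a str; sending it as content means the HTTP client
# encodes it using the charset from Content-Type, defaulting to UTF-8.
manual_body = json.dumps(payload).encode("utf-8")

# With the default ensure_ascii=True, non-ASCII characters are escaped,
# so the encoded body is plain ASCII and round-trips cleanly.
assert manual_body.isascii()
assert json.loads(manual_body) == payload
print(manual_body)
```

A fuller parity test would compare the request bytes and headers produced by both paths against a mocked transport.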
🏁 Script executed:

```shell
#!/bin/bash
# Description: Check if there are existing tests for JSON request handling in async client
rg -n "AsyncBaseClient.*json" --type=py -A5 -B5
```

Length of output: 0
🏁 Script executed:

```shell
#!/bin/bash
# Get the actual implementation context around lines 360-365
cat getstream/base.py | sed -n '350,375p'
```

Length of output: 1186
🏁 Script executed:

```shell
#!/bin/bash
# Search for any JSON-related tests more broadly
rg -n "json" --type=py -i tests/ 2>/dev/null | head -30
```

Length of output: 2502
🏁 Script executed:

```shell
#!/bin/bash
# Look for Content-Type header handling in the codebase
rg -n "Content-Type" --type=py -B2 -A2
```

Length of output: 811
🏁 Script executed:

```shell
#!/bin/bash
# Search for any test files that might test AsyncBaseClient or HTTP requests
fd -e py tests/ | head -20
```

Length of output: 329
---

🏁 Script executed:

```shell
#!/bin/bash
# Find what self.client is and where it's initialized
rg -n "self.client" getstream/base.py -B3 -A3 | head -50
```

Length of output: 1690
🏁 Script executed:

```shell
#!/bin/bash
# Check imports in base.py to confirm httpx usage
head -50 getstream/base.py | grep -E "^import|^from"
```

Length of output: 440
🏁 Script executed:

```shell
#!/bin/bash
# Search for any tests that verify request content/headers
rg -n "call_kwargs|content.*json|headers.*Content-Type" --type=py tests/ -B2 -A2
```

Length of output: 0
🏁 Script executed:

```shell
#!/bin/bash
# Look for test setup/mocking of httpx requests
rg -n "httpx\.|AsyncClient|mock.*client" --type=py tests/ -B2 -A2 | head -40
```

Length of output: 2701
🌐 Web query:
httpx python library native json parameter behavior Content-Type header
💡 Result:
Short answer: httpx's json= parameter JSON-encodes the value and sets Content-Type: application/json for you (unless you override the header). [1][2]
Example: httpx.post(url, json={"a":1}) -> body is JSON and http header Content-Type: application/json, but if you pass headers={"Content-Type": "..."} that explicit value is used instead. [1][2]
Sources:
[1] httpx API (json parameter docs).
[2] httpx Quickstart (Sending JSON encoded data).
Fix header overwrite to respect explicit Content-Type if provided.
The manual JSON serialization unconditionally overwrites any caller-provided Content-Type header (line 365), which differs from httpx's native json parameter. When you pass json= to httpx, it sets Content-Type only if not explicitly provided by the caller; this code always overwrites.
Behavioral differences confirmed:
- httpx native: respects an explicit `headers={"Content-Type": ...}` passed by the caller
- This implementation: unconditionally sets it to "application/json", breaking caller intent
Update line 365 to only set Content-Type if not already present:
```python
if "Content-Type" not in call_kwargs["headers"]:
    call_kwargs["headers"]["Content-Type"] = "application/json"
```

This preserves the ability for callers to override Content-Type while maintaining the default JSON behavior.
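A self-contained sketch of that suggestion, using a hypothetical helper name:

```python
def apply_json_content_type(headers=None):
    # Default to application/json, but keep any Content-Type the caller
    # set explicitly, matching httpx's behavior for its native json=.
    headers = dict(headers or {})
    if "Content-Type" not in headers:
        headers["Content-Type"] = "application/json"
    return headers

print(apply_json_content_type(None))
print(apply_json_content_type({"Content-Type": "application/json; charset=utf-8"}))
```

One caveat: a plain dict lookup like this (and the one-line fix above) is case-sensitive, so a caller spelling the header `content-type` in lower case would still be overridden, whereas httpx's `Headers` type compares names case-insensitively.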
🤖 Prompt for AI Agents
In getstream/base.py around lines 360 to 365, the manual JSON serialization
always overwrites any caller-provided Content-Type header; change the logic so
that after preparing call_kwargs["headers"] you only set the Content-Type to
"application/json" if the caller has not already provided a Content-Type header
(i.e., check for existing "Content-Type" key in call_kwargs["headers"] before
assigning it) so callers can override the header while preserving the default
behavior.