⚡️ Speed up method WS.read by 13%
#4
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
📄 13% (0.13x) speedup for
WS.readindistributed/comm/ws.py⏱️ Runtime :
28.8 microseconds→25.6 microseconds(best of14runs)📝 Explanation and details
The optimization achieves a 12% runtime improvement through two key changes to hot code paths identified by the profiler:
1. Frame Collection Optimization in
WS.read():frames = [(await self.sock.read_message()) for _ in range(n_frames)]- list comprehension with embedded async callsframes.append(frame)after eachawait2. Size Calculation Optimization in
from_frames():size = sum(map(nbytes, frames))- functional approach creating iterator objectsfor frame in frames: size += nbytes(frame)map()object creation andsum()function call overhead. The manual loop runs at C speed for the iteration and addition, avoiding Python function call overhead per frame.Performance Impact:
Both optimizations target the most time-consuming operations shown in the profiler - frame reading (1.4% of total time) and size calculation (0.4% of total time). While these percentages seem small, they represent the only non-dominated operations outside the main
_from_frames()call that consumes 99%+ of the time.The optimizations are particularly effective for workloads with multiple frames per message (as shown in the test cases), where the cumulative effect of reduced per-frame overhead becomes significant.
✅ Correctness verification report:
🌀 Generated Regression Tests and Runtime
To edit these changes
git checkout codeflash/optimize-WS.read-mgbs5ulqand push.