Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor WSMessage to use tagged unions #7319

Merged
merged 24 commits into from
Oct 26, 2024
Merged

Refactor WSMessage to use tagged unions #7319

merged 24 commits into from
Oct 26, 2024

Conversation

Dreamsorcerer
Copy link
Member

This reworks WSMessage to be a union of dataclasses which can provide much better type safety using tagged unions (on the type attribute).

I need to go over a couple of details (like what should be exported and from where, imports are a little messy currently). But, my main question before I finish it off, is whether it makes sense to have .json() only on the TEXT/BINARY messages? If we do this, it becomes easy for mypy to validate the code, requiring a msg.type is WSMsgType.TEXT check before calling .json().

Fixes #7313

@Dreamsorcerer Dreamsorcerer added the backport:skip Skip backport bot label Jun 10, 2023
Copy link

codecov bot commented Oct 22, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.59%. Comparing base (7982b59) to head (e701955).
Report is 622 commits behind head on master.

✅ All tests successful. No failed tests found.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7319      +/-   ##
==========================================
- Coverage   98.60%   98.59%   -0.01%     
==========================================
  Files         108      108              
  Lines       35160    35207      +47     
  Branches     4184     4184              
==========================================
+ Hits        34668    34714      +46     
- Misses        329      330       +1     
  Partials      163      163              
Flag Coverage Δ
CI-GHA 98.48% <100.00%> (-0.01%) ⬇️
OS-Linux 98.15% <100.00%> (-0.01%) ⬇️
OS-Windows 96.54% <100.00%> (+<0.01%) ⬆️
OS-macOS 97.84% <98.97%> (+<0.01%) ⬆️
Py-3.10.11 97.71% <98.97%> (+<0.01%) ⬆️
Py-3.10.15 97.63% <95.28%> (-0.01%) ⬇️
Py-3.11.10 97.71% <95.28%> (+<0.01%) ⬆️
Py-3.11.9 97.79% <98.97%> (+0.01%) ⬆️
Py-3.12.7 98.19% <95.28%> (-0.01%) ⬇️
Py-3.13.0 98.18% <95.28%> (-0.01%) ⬇️
Py-3.9.13 97.61% <98.97%> (+<0.01%) ⬆️
Py-3.9.20 97.53% <94.33%> (-0.01%) ⬇️
Py-pypy7.3.16 97.17% <99.05%> (+<0.01%) ⬆️
VM-macos 97.84% <98.97%> (+<0.01%) ⬆️
VM-ubuntu 98.15% <100.00%> (-0.01%) ⬇️
VM-windows 96.54% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@bdraco
Copy link
Member

bdraco commented Oct 22, 2024

Creating the dataclasses will be significantly slower than the NamedTuple, but the property accesses will be faster...probably need to benchmark it to see what the tradeoff will be

@Dreamsorcerer
Copy link
Member Author

Creating the dataclasses will be significantly slower than the NamedTuple, but the property accesses will be faster...probably need to benchmark it to see what the tradeoff will be

A quick test locally suggests that NamedTuple will also work with unions. Let me know which you want to go with and I'll update (and then try to revert the micro-optimisations that got caught in conflict resolutions).

@Dreamsorcerer
Copy link
Member Author

Also worth noting that with dataclass we get errors because DataQueue expects a Sized thing. This suggests to me that DataQueue is probably not the best solution here as there seems to be no meaning to the size when processing WS messages. Maybe an unsized queue could be used here to be a little more optimised.

@bdraco
Copy link
Member

bdraco commented Oct 23, 2024

I'll write a benchmark for all the options later today

@bdraco
Copy link
Member

bdraco commented Oct 23, 2024

Options compared

from dataclasses import dataclass
from typing import NamedTuple
from aiohttp.http_websocket import WSMsgType
from typing import Optional, Literal
import timeit

BINARY_TYPE = WSMsgType.BINARY


@dataclass
class _WSMessageDataClass:
    data: object
    type: WSMsgType
    extra: Optional[str] = None


@dataclass
class WSMessageBinaryDataClass(_WSMessageDataClass):
    data: bytes
    type: Literal[WSMsgType.BINARY] = BINARY_TYPE


@dataclass(frozen=True)
class _WSMessageFrozenDataClass:
    data: object
    type: WSMsgType
    extra: Optional[str] = None


@dataclass(frozen=True)
class WSMessageBinaryFrozenDataClass(_WSMessageFrozenDataClass):
    data: bytes
    type: Literal[WSMsgType.BINARY] = BINARY_TYPE


class _WSMessageNamedTuple(NamedTuple):
    data: object
    type: WSMsgType
    extra: Optional[str] = None


class WSMessageBinaryNamedTuple(_WSMessageNamedTuple):
    data: bytes
    type: Literal[WSMsgType.BINARY]


data = b"Hello, world!"

named_tuple_direct = timeit.timeit(
    "x=WSMessageBinaryNamedTuple(data, BINARY_TYPE);x.data;x.type",
    globals=locals(),
    number=1000000,
)
named_tuple_tuple_new = timeit.timeit(
    "x=tuple.__new__(WSMessageBinaryNamedTuple, (data, BINARY_TYPE));x.data;x.type",
    globals=locals(),
    number=1000000,
)
dataclass_direct = timeit.timeit(
    "x=WSMessageBinaryDataClass(data);x.data;x.type", globals=locals(), number=1000000
)
frozen_dataclass_direct = timeit.timeit(
    "x=WSMessageBinaryFrozenDataClass(data);x.data;x.type",
    globals=locals(),
    number=1000000,
)

print(f"named_tuple_direct: {named_tuple_direct}")
print(f"named_tuple_tuple_new: {named_tuple_tuple_new}")
print(f"dataclass_direct: {dataclass_direct}")
print(f"frozen_dataclass_direct: {frozen_dataclass_direct}")

named_tuple_direct: 0.13957041699904948
named_tuple_tuple_new: 0.09394508300465532
dataclass_direct: 0.11717991600744426
frozen_dataclass_direct: 0.26404679199913517

WSMessageBinaryNamedTuple(data, BINARY_TYPE): 0.13957041699904948
tuple.__new__(WSMessageBinaryNamedTuple, (data, BINARY_TYPE)): 0.09394508300465532
WSMessageBinaryDataClass(data): 0.11717991600744426
WSMessageBinaryFrozenDataClass(data): 0.26404679199913517

WSMessageBinaryNamedTuple(data, BINARY_TYPE)

Advantages: Remains Immutable, not a breaking change and backwards compatible if someone is doing tuple unpacking
Disadvantage: Slow to construct as it uses a lambda under the hood, type still has to be passed

tuple.__new__(WSMessageBinaryNamedTuple, (data, BINARY_TYPE))

Advantages: Remains Immutable, not a breaking change and backwards compatible if someone is doing tuple unpacking
Disadvantage: construction syntax is a bit clunky to avoid the lambda (~33% faster), type still has to be passed

WSMessageBinaryDataClass(data)

Advantages: Better syntax to construct, type value does not have to be passed, still faster than direct NamedTuple construction which uses a lambda under the hood.
Disadvantages: not immutable, breaking change since tuple unpack no longer works (could probably be worked around)

WSMessageBinaryFrozenDataClass(data)

Advantages: Remains immutable, Better syntax to construct, type value does not have to be passed, painfully slow to construct (will be quite noticeable)
Disadvantages: breaking change since tuple unpack no longer works (could probably be worked around)

aiohttp/http_websocket.py Fixed Show fixed Hide fixed
@psf-chronographer psf-chronographer bot added the bot:chronographer:provided There is a change note present in this PR label Oct 25, 2024
Copy link

codspeed-hq bot commented Oct 25, 2024

CodSpeed Performance Report

Merging #7319 will not alter performance

Comparing ws_message (e701955) with master (7982b59)

Summary

✅ 4 untouched benchmarks

@bdraco
Copy link
Member

bdraco commented Oct 25, 2024

If you rebase, you should get a better idea of the performance impact, however from looking at the https://codspeed.io/aio-libs/aiohttp/branches/websocket_benchmarks it looks like changes here may be shadowed by the performance of parse_frame
Screenshot 2024-10-25 at 1 04 29 PM

@Dreamsorcerer Dreamsorcerer marked this pull request as ready for review October 26, 2024 00:10
@Dreamsorcerer
Copy link
Member Author

Think we're all good now.

@Dreamsorcerer Dreamsorcerer merged commit 5f4c052 into master Oct 26, 2024
40 of 41 checks passed
@Dreamsorcerer Dreamsorcerer deleted the ws_message branch October 26, 2024 00:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport:skip Skip backport bot bot:chronographer:provided There is a change note present in this PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

websocket: message.json() fails with a TypeError when the message exceeds the size limit
2 participants