Dataclass wrapper for ipv8 Payload #1011
Comments
Thanks for the suggestion. This is definitely something that fits the IPv8 core library. I spent some time minimizing your code to make it smaller (combining your Wrapper

    from dataclasses import dataclass
    from functools import partial
    from typing import Type, get_args, get_type_hints

    from ipv8.messaging.serialization import Serializable


    def type_map(t: Type) -> str:
        if t == int:
            return "Q"
        elif t == bytes:
            return "varlenH"
        elif "Tuple" in str(t) or "List" in str(t) or "Set" in str(t):
            return (
                "varlenH-list"
                if "int" in str(t) or "bytes" in str(t)
                else [get_args(t)[0]]
            )
        elif hasattr(t, "format_list"):
            return t
        else:
            raise NotImplementedError(t, " unknown")


    def dataclass_payload(cls=None, /, *, init=True, repr=True, eq=True, order=False, unsafe_hash=False, frozen=False):
        """
        Equivalent to ``@dataclass``, but also makes the wrapped class a ``Serializable``.
        See ``dataclasses.dataclass`` for argument descriptions.
        """
        if cls is None:
            return partial(dataclass_payload, init=init, repr=repr, eq=eq, order=order, unsafe_hash=unsafe_hash,
                           frozen=frozen)
        origin = dataclass(cls, init=init, repr=repr, eq=eq, order=order, unsafe_hash=unsafe_hash, frozen=frozen)

        class DataClassPayload(origin, Serializable):
            names = list(get_type_hints(origin).keys())
            format_list = list(map(type_map, get_type_hints(origin).values()))

            @classmethod
            def from_unpack_list(cls, *args):
                return DataClassPayload(*args)

            def to_pack_list(self):
                return [(self.format_list[i], getattr(self, self.names[i])) for i in range(len(self.names))]

        DataClassPayload.__name__ = origin.__name__
        DataClassPayload.__qualname__ = origin.__qualname__
        return DataClassPayload

Benchmark code

    from dataclasses import asdict, dataclass, is_dataclass
    from time import time

    from testdc_bulat import payload
    from testdc_quinten import dataclass_payload

    from ipv8.messaging.serialization import Serializable, Payload, default_serializer


    @payload
    @dataclass
    class BaseDataclass:
        other: bytes


    @dataclass
    class BaseDataclass2:
        other: bytes


    t_load_1_start = time()
    @payload
    @dataclass(unsafe_hash=True)
    class CellState(BaseDataclass):
        cell_additive: int
        cells: bytes
    t_load_1_end = time()

    t_init_1_start = time()
    c1 = CellState(b"def", 1, b"abc")
    t_init_1_end = time()

    t_pack_1_start = time()
    raw = default_serializer.pack_serializable(c1)
    t_pack_1_end = time()

    t_unpack_1_start = time()
    default_serializer.unpack_serializable(CellState, raw)
    t_unpack_1_end = time()

    print(raw)
    print(default_serializer.unpack_serializable(CellState, raw)[0])
    print(f"{is_dataclass(c1)=}, {isinstance(c1, Serializable)=}, {isinstance(c1, Payload)=}")

    t_load_2_start = time()
    @dataclass_payload(unsafe_hash=True)
    class CellState2(BaseDataclass2):
        cell_additive: int
        cells: bytes
    t_load_2_end = time()

    t_init_2_start = time()
    c2 = CellState2(b"def", 1, b"abc")
    t_init_2_end = time()

    t_pack_2_start = time()
    raw = default_serializer.pack_serializable(c2)
    t_pack_2_end = time()

    t_unpack_2_start = time()
    default_serializer.unpack_serializable(CellState2, raw)
    t_unpack_2_end = time()

    print(raw)
    print(default_serializer.unpack_serializable(CellState2, raw)[0])
    print(f"{is_dataclass(c2)=}, {isinstance(c2, Serializable)=}, {isinstance(c2, Payload)=}")

    assert asdict(c1) == asdict(c2)

    print("=== RESULTS 1 ===")
    print(f"Time load: {t_load_1_end-t_load_1_start} seconds")
    print(f"Time init: {t_init_1_end-t_init_1_start} seconds")
    print(f"Time pack: {t_pack_1_end-t_pack_1_start} seconds")
    print(f"Time total: {(t_load_1_end-t_load_1_start+t_init_1_end-t_init_1_start+t_pack_1_end-t_pack_1_start+t_unpack_1_end-t_unpack_1_start)} seconds")
    print(" - Important -")
    print(f"Time unpack: {t_unpack_1_end-t_unpack_1_start} seconds")
    print(f"Time init + pack: {(t_init_1_end-t_init_1_start+t_pack_1_end-t_pack_1_start)} seconds")

    print("=== RESULTS 2 ===")
    print(f"Time load: {t_load_2_end-t_load_2_start} seconds")
    print(f"Time init: {t_init_2_end-t_init_2_start} seconds")
    print(f"Time pack: {t_pack_2_end-t_pack_2_start} seconds")
    print(f"Time total: {(t_load_2_end-t_load_2_start+t_init_2_end-t_init_2_start+t_pack_2_end-t_pack_2_start+t_unpack_2_end-t_unpack_2_start)} seconds")
    print(" - Important -")
    print(f"Time unpack: {t_unpack_2_end-t_unpack_2_start} seconds")
    print(f"Time init + pack: {(t_init_2_end-t_init_2_start+t_pack_2_end-t_pack_2_start)} seconds")

If anyone thinks they can make this even smaller and faster, be my guest. This is probably where I'll leave it.
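A side note on the benchmark above: a single wall-clock measurement taken with `time()` can be noisy for operations this short. As a minimal sketch of repeated measurement, the snippet below times only the init step with the standard library's `timeit.repeat`. The `CellState` class here is a plain stdlib dataclass standing in for the payload classes, and `best_time` is a hypothetical helper name, so the numbers it prints say nothing about either wrapper.

```python
from dataclasses import dataclass
from timeit import repeat


@dataclass(unsafe_hash=True)
class CellState:
    # Plain dataclass stand-in; the real benchmark wraps this as a payload.
    other: bytes
    cell_additive: int
    cells: bytes


def best_time(stmt, number=10_000, rounds=5):
    """Return the fastest per-call time in seconds over several rounds."""
    timings = repeat(stmt, number=number, repeat=rounds)
    # The minimum is the round that was least disturbed by other processes.
    return min(timings) / number


init_time = best_time(lambda: CellState(b"def", 1, b"abc"))
print(f"Time init: {init_time:.3e} seconds per call")
```

The same helper could wrap the pack and unpack calls once the IPv8 serializer is importable.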
By the way, I think this is the most controversial choice in this suggestion:

    if t == int:
        return "Q"

All, I'd like to hear your feedback on this.
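For context on why this mapping is debatable: in Python's `struct` module, `Q` is an unsigned 64-bit integer, so a Python `int` that happens to be negative cannot be packed with it, while the signed `q` gives up the upper half of the positive range. The sketch below only demonstrates those ranges with the standard library. The big-endian prefix is an assumption about how the format would be used, and `fits_format` is a hypothetical helper, not part of IPv8.

```python
import struct


def fits_format(value: int, fmt: str) -> bool:
    """Return True if `value` can be packed with the big-endian struct format `fmt`."""
    try:
        struct.pack(">" + fmt, value)
    except (struct.error, OverflowError):
        return False
    return True


# "Q" is an unsigned 64-bit integer, "q" is its signed counterpart.
print(fits_format(2**64 - 1, "Q"))  # largest unsigned 64-bit value
print(fits_format(-1, "Q"))         # negative numbers have no unsigned encoding
print(fits_format(-1, "q"))         # the signed format accepts them
print(fits_format(2**63, "q"))      # one past the largest signed 64-bit value
```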
Summary of offline discussion:
I'm not an expert in Python black magic, but I can help with any other work on this feature.
@drew2a thanks for offering to help, I'll fire up the latest prototype to see what still needs to be done in order to finish this. I'll follow up with the findings. [Admin notice: unassigning @grimadas due to scheduling constraints.]
I double-checked the black magic and corrected it a bit to allow for all of the possible strange stuff you can do with inheritance and nesting. The WIP is available here: https://github.com/qstokkink/py-ipv8/tree/add_dcpayload

Example usage from the test:

    varlenH = TypeVar('varlenH')  # Can be any format string that the Serializer can handle

    @dataclass_payload
    class A:
        @dataclass_payload
        class Item:
            a: bool

        a: int
        b: bytes
        c: varlenH
        d: Item
        e: [Item]
        f: str
        g: List[Item]

In order to get this to production, the following still needs to be done:
@drew2a If you have time, would you like to perform a sanity check using the prototype branch?
Sure!
Thanks, I'll leave that to you and focus on the documentation myself.
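The `TypeVar` trick in the example usage above (the name of the `TypeVar` doubles as a serializer format string) can be sketched as follows. This is a guess at the mechanism based only on the example, not the code from the prototype branch. `hint_to_format` is a hypothetical helper, and mapping `int` to `"q"` and `bytes` to `"varlenH"` is assumed here purely for illustration.

```python
from typing import TypeVar, get_type_hints

varlenH = TypeVar("varlenH")  # the TypeVar's name is read back as the format
Q = TypeVar("Q")


def hint_to_format(hint):
    """Hypothetical helper: translate one type hint into a format string."""
    if isinstance(hint, TypeVar):
        # An explicit format: use the name the TypeVar was created with.
        return hint.__name__
    if hint is int:
        return "q"  # assumed default for plain ints
    if hint is bytes:
        return "varlenH"  # assumed default for plain bytes
    raise NotImplementedError(f"no format known for {hint!r}")


class Example:
    a: int
    b: bytes
    c: varlenH
    d: Q


formats = [hint_to_format(hint) for hint in get_type_hints(Example).values()]
print(formats)
```

Because the hints are ordinary annotations, the class stays readable, while a field that needs a specific wire format can still opt in by naming it.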
First of all: thank you for your work!

Community checks

I've tried to test this branch on

Here is the code: https://github.com/Tribler/tribler/pull/6427/files

Conclusion: it works in local tests, but doesn't work on the wild network.

Stacktrace:

    [PID:40930] 2021-10-06 21:30:56,289 - ERROR <community:435> PopularityCommunity.on_packet(): Exception occurred while handling packet!
    Traceback (most recent call last):
      File "/Users/<user>/Projects/github.com/Tribler/tribler/src/pyipv8/ipv8/messaging/serialization.py", line 392, in unpack_serializable
        offset = self._packers[fmt].unpack(data, offset, unpack_list)
    TypeError: unhashable type: 'list'

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/Users/<user>/Projects/github.com/Tribler/tribler/src/pyipv8/ipv8/messaging/serialization.py", line 392, in unpack_serializable
        offset = self._packers[fmt].unpack(data, offset, unpack_list)
      File "/Users/<user>/Projects/github.com/Tribler/tribler/src/pyipv8/ipv8/messaging/serialization.py", line 255, in unpack
        result = unpack_from(self.format_str, data, offset)
    struct.error: unpack_from requires a buffer of at least 35108 bytes for unpacking 8 bytes at offset 35100 (actual buffer size is 395)

    The above exception was the direct cause of the following exception:

    Traceback (most recent call last):
      File "/Users/<user>/Projects/github.com/Tribler/tribler/src/pyipv8/ipv8/community.py", line 431, in on_packet
        result = handler(source_address, data)
      File "/Users/<user>/Projects/github.com/Tribler/tribler/src/pyipv8/ipv8/lazy_community.py", line 80, in wrapper
        unpacked = self.serializer.unpack_serializable_list(payloads, remainder, offset=23)
      File "/Users/<user>/Projects/github.com/Tribler/tribler/src/pyipv8/ipv8/messaging/serialization.py", line 420, in unpack_serializable_list
        payload, offset = self.unpack_serializable(serializable, data, offset)
      File "/Users/<user>/Projects/github.com/Tribler/tribler/src/pyipv8/ipv8/messaging/serialization.py", line 400, in unpack_serializable
        offset = self._packers['payload-list'].unpack(data, offset, unpack_list, fmt[0])
      File "/Users/<user>/Projects/github.com/Tribler/tribler/src/pyipv8/ipv8/messaging/serialization.py", line 241, in unpack
        offset = self.packer.unpack(data, offset, result, *args)
      File "/Users/<user>/Projects/github.com/Tribler/tribler/src/pyipv8/ipv8/messaging/serialization.py", line 86, in unpack
        unpacked, offset = self.serializer.unpack_serializable(serializable_class, data, offset=offset + 2)
      File "/Users/<user>/Projects/github.com/Tribler/tribler/src/pyipv8/ipv8/messaging/serialization.py", line 402, in unpack_serializable
        raise PackError("Could not unpack item: %s\n%s: %s" % (fmt, type(e).__name__, str(e))) from e
    ipv8.messaging.serialization.PackError: Could not unpack item: q

Notes (not related to the wild network):

msg_id

If I specify

    timestamp_type = TypeVar('Q')

    @dataclass_payload
    class TorrentsHealthPayload:
        @dataclass_payload
        class Torrent:
            infohash: bytes
            seeders: int
            leechers: int
            timestamp: timestamp_type

        random_torrents_length: int
        torrents_checked_length: int
        random_torrents: List[Torrent]
        torrents_checked: List[Torrent]
        msg_id:int = 1

Error occurred:

    >       return TorrentsHealthPayload(
                random_torrents_length=len(random_torrents_checked),
                torrents_checked_length=len(popular_torrents_checked),
                random_torrents=to_list(random_torrents_checked),
                torrents_checked=to_list(popular_torrents_checked)
            )
    E       TypeError: __init__() missing 1 required positional argument: 'msg_id'

Hints don't work

Here is the dataclass_payload example:

Here is just an ordinary dataclass:

    @dataclass
    class DataClass:
        any: int
        data: str
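A note on reading the stack trace in this comment: judging only from the traceback, the leading `TypeError: unhashable type: 'list'` looks like part of the normal lookup path rather than the actual failure. A format entry such as `[Torrent]` is a list, a list cannot be a dictionary key, and the handler then falls back to the `'payload-list'` packer. The failure that matters would then be the later `struct.error`, which may indicate that the received bytes did not match the expected layout. The snippet below is a minimal sketch of that fallback pattern with a stand-in dictionary. `resolve_packer` and the packer values are hypothetical and do not reproduce the IPv8 serializer.

```python
def resolve_packer(packers, fmt):
    """Hypothetical lookup: return (packer, extra_args) for a format entry."""
    try:
        return packers[fmt], ()
    except TypeError:
        # `fmt` is a list such as [ItemClass]; lists are unhashable, so the
        # dictionary lookup itself raises TypeError before any unpacking.
        return packers["payload-list"], (fmt[0],)


# Stand-in packers: plain strings instead of real packer objects.
packers = {"q": "int-packer", "payload-list": "list-packer"}

print(resolve_packer(packers, "q"))
print(resolve_packer(packers, [int]))
```

With exception chaining, an error raised inside the fallback branch is reported after the swallowed `TypeError`, which is why both appear in the log.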
@drew2a that was fast, thanks for the quick feedback 👍 I see the following going "wrong" (i.e., a mismatch between your expectations and the implementation - so this should change):
Please let me know what you think of these possible changes to address these issues:
I don't have a solution for the
@drew2a I pushed two new commits to the repo that implement the possible changes from my previous post.

[5b88423]

    @dataclass_payload(msg_id=53)
    class MyMessage:
        a: int
        b: bytes

[b92a8e8]

    from dataclasses import dataclass
    from ipv8.messaging.payload_dataclass import dataclass

Please let me know if you think these are improvements, or if we should throw these changes out.
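To illustrate the `msg_id` keyword from the first commit: one way a decorator can accept such an argument is to store it as a plain class attribute after the dataclass has been built, so it never becomes a field or an `__init__` parameter. The snippet below is a minimal sketch of that idea using only the standard library. It is not the code from commit 5b88423, and `payload_dataclass` is a hypothetical name that leaves out all serialization logic.

```python
from dataclasses import dataclass as std_dataclass, fields, is_dataclass
from functools import partial


def payload_dataclass(cls=None, *, msg_id=None, **dataclass_kwargs):
    """Hypothetical decorator: a dataclass that also records a message id."""
    if cls is None:
        # Called with arguments, e.g. @payload_dataclass(msg_id=53).
        return partial(payload_dataclass, msg_id=msg_id, **dataclass_kwargs)
    wrapped = std_dataclass(cls, **dataclass_kwargs)
    if msg_id is not None:
        # Set after field collection, so msg_id is not an __init__ argument.
        wrapped.msg_id = msg_id
    return wrapped


@payload_dataclass(msg_id=53)
class MyMessage:
    a: int
    b: bytes


message = MyMessage(1, b"abc")
print(message, message.msg_id, [f.name for f in fields(MyMessage)])
```

Keeping the id out of the field list avoids the "missing 1 required positional argument: 'msg_id'" situation reported earlier in the thread, since callers only pass the real message fields.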
I'll check it soon :)
@qstokkink
msg_id

This improvement works and looks perfect to me:

    @dataclass(msg_id=1)
    class TorrentsHealthPayload:
        @dataclass
        class Torrent:
            infohash: bytes
            seeders: int
            leechers: int
            timestamp: timestamp_type

        random_torrents_length: int
        torrents_checked_length: int
        random_torrents: List[Torrent]
        torrents_checked: List[Torrent]

dataclass

This improvement also looks cool, but it doesn't solve the hint problem (at least for me):

It is probably not related to the improvements, but I see the following warning:

All my changes are located here: https://github.com/Tribler/tribler/pull/6427/files#diff-ada92791a877316681cd06afcbf338be4bee5b61357522ea75c700206f3e098f
Thanks for checking, I'll keep the

After restarting my IDE my type hinting also no longer worked: PyCharm seems to somehow cache the type hints if you've first run it with a normal
Request for dataclass-based message payloads.
Since message payloads are used very frequently I suggest we simplify the creation and usage of Payloads.
Python data classes might be a promising direction.
Here is an example:
The payload is a wrapper:
From the developer's point of view, the dataclass would act as a VariablePayload.
Downside: we have to write a map from Python types to IPv8 struct types.