Fix serialize() usages in custom JSON encoders #28742
-_iterables = (list, set, tuple)
+_collections = (list, set, tuple)  # dict is treated specially.
Renamed to better reflect these classes. Iterable has special meaning in Python (collection also does, but its meaning is more in-line with what we mean here).
2d1b83e to 26d8e86
Our custom JSON encoders call serialize() in their default() hooks. However, serialize() is not guaranteed to return a JSON-serializable type, and when it does not (e.g. it returns a set object), the JSON encoder recurses infinitely into serialize() while trying to coerce the object into something JSON-serializable.

This patch adds an 'ensure_json_compatible' parameter to serialize(), so JSON encoders can use it to require serialize() to always return JSON-compatible values.

Some minor refactoring is also applied to serialize() to ensure the flag is passed down correctly through recursive visits, and to hide the 'depth' tracking parameter (which is internal to the function) from outside access.

The _reverse_cache global variable is also deleted since it was not used anywhere, even before this patch.
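The recursion described above can be reproduced outside the project with a small sketch. Everything here is a hypothetical stand-in: `naive_serialize` only mimics a serializer that rebuilds collections while keeping their type, and `NaiveEncoder` mimics an encoder that forwards to it from `default()`. On CPython I would expect the failure to surface as a `RecursionError`, but the helper catches a few exception types rather than assume that.

```python
import json
from typing import Any


def naive_serialize(o: Any) -> Any:
    """Hypothetical stand-in for serialize(): rebuilds collections, keeps their type."""
    if isinstance(o, (list, set, tuple)):
        return type(o)(naive_serialize(i) for i in o)
    return o


class NaiveEncoder(json.JSONEncoder):
    """Hypothetical encoder that hands every unknown object to naive_serialize()."""

    def default(self, o: Any) -> Any:
        # For a set this returns a *new* set, which json cannot encode either,
        # so the encoder calls default() again, and again, and again.
        return naive_serialize(o)


def fails_to_encode(obj: Any, encoder_cls: type) -> bool:
    """Return True if json.dumps() raises instead of producing a string."""
    try:
        json.dumps(obj, cls=encoder_cls)
    except (RecursionError, ValueError, TypeError):
        return True
    return False
```

Calling `fails_to_encode({"a": {1}}, NaiveEncoder)` exercises the problem case: the encoder never receives a type it can write, so it keeps calling `default()` until the interpreter gives up.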
61c034e to 0446dba
Running it with full test suite
I do not think it should be solved here, but rather with a custom encoder for json that makes

Explanation: this code was intentionally written to remove all JSONisms, to make it possible to encode into another format, and to have cleaner code; hence the separation between serializer and encoder (as it is within the standard library!). JSON does not guarantee order or uniqueness in its lists, hence the issue.

Three ways to solve this:

1) Implement a JSON encoder that takes a collection and always converts it into a list, with the (minor) downside that the deserialization might fail.
2) Adjust the serializer to serialize in such a way that the deserializer deserializes into the right type, e.g. how we do it now with custom objects.
3) Accept the loss of type information and convert into a list in the serializer.

2 is the most robust, 1 is better for just JSON, 3 is a hack with potential deserialization issues (it most likely requires inspection, slowing down the deserializer) but less of a hack than the current implementation.

Finally, and importantly, do not put any serialization code in a class. Classes are an order of magnitude slower. This is due to the amount of stack walks the interpreter needs to do.

P.S. Please add a test that covers this both ways (deserialization / serialization) and takes care of uniqueness and ordering.

P.P.S. I think the best way is to use option 2 and serialize tuples, lists, sets into something like mytuple = {"classname": "builtins.tuple", "data": [a, b, c, d], "version": "1"}, thus wrapping it so we can properly deserialize it.
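Option 2 can be sketched roughly as follows. This is only an illustration of the wrapping idea from the P.P.S., not the project's implementation: the key names, the `wrap`/`unwrap` helpers, the integer version, and the choice to wrap only tuples and sets (lists already survive a JSON round trip) are all assumptions made for the example.

```python
import json
from typing import Any

# Hypothetical key names, loosely following the comment above.
CLASSNAME = "classname"
VERSION = "version"
DATA = "data"

# Collection types that JSON cannot represent directly.
_WRAPPED_TYPES = {"builtins.tuple": tuple, "builtins.set": set}


def wrap(o: Any) -> Any:
    """Serialize into JSON-compatible values, tagging tuples and sets."""
    for name, cls in _WRAPPED_TYPES.items():
        if isinstance(o, cls):
            return {CLASSNAME: name, VERSION: 1, DATA: [wrap(i) for i in o]}
    if isinstance(o, list):
        return [wrap(i) for i in o]
    if isinstance(o, dict):
        return {k: wrap(v) for k, v in o.items()}
    return o


def unwrap(o: Any) -> Any:
    """Inverse of wrap(): rebuild the tagged collection types."""
    if isinstance(o, list):
        return [unwrap(i) for i in o]
    if isinstance(o, dict):
        name = o.get(CLASSNAME)
        cls = _WRAPPED_TYPES.get(name) if isinstance(name, str) else None
        if cls is not None and DATA in o:
            return cls(unwrap(i) for i in o[DATA])
        return {k: unwrap(v) for k, v in o.items()}
    return o
```

A round trip such as `unwrap(json.loads(json.dumps(wrap(value))))` then returns tuples as tuples and sets as sets. One caveat of this sketch: a user dict that happens to contain both `classname` and `data` keys with a matching class name would be misread as a wrapper, which is the kind of edge case a both-ways test should cover.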
 return [deserialize(d) for d in o]

 if not isinstance(o, dict):
-    raise TypeError()
+    raise TypeError("not deserializable")
It is probably better to explain what type we are getting if we are improving the error message.
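One possible shape for such a message, as a sketch only: `require_dict` is a hypothetical helper that mirrors the `isinstance` check in the diff above, and the wording of the message is an assumption.

```python
from typing import Any


def require_dict(o: Any) -> dict:
    """Hypothetical guard: reject non-dict input and name the offending type."""
    if not isinstance(o, dict):
        raise TypeError(
            f"cannot deserialize object of type {type(o).__qualname__}; expected dict"
        )
    return o
```

Naming the received type makes it much easier to tell from a log line whether the caller passed a string, a set, or something else entirely.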
 return True

-return False
+return any(p.match(classname) is not None for p in _patterns)
nice!
 elif isinstance(value, dict):
     for k, v in value.items():
-        s += f"{k}={str(serialize(v, False))},"
+        s += f"{k}={str(serialize(v))},"
good catch, i think? :-)
@@ -56,12 +56,12 @@ def default(self, o: Any) -> Any:
     return o.strftime("%Y-%m-%d")

 if isinstance(o, Decimal):
-    data = serialize(o)
+    data = serialize(o, ensure_json_compatible=True)
As mentioned, this should not be done this way, but rather be solved here.
@@ -72,89 +70,104 @@ def decode(d: dict[str, str | int | T]) -> tuple:
     return d[CLASSNAME], d[VERSION], d.get(DATA, None)


-def serialize(o: object, depth: int = 0) -> U | None:
+@dataclasses.dataclass()
+class _Serializer:
Do not put this in a class. Classes add overhead on the stack and significantly slow down serialization/deserialization. The code was implemented on purpose to be as high up in the chain as possible.
@@ -72,89 +70,104 @@ def decode(d: dict[str, str | int | T]) -> tuple:
     return d[CLASSNAME], d[VERSION], d.get(DATA, None)


-def serialize(o: object, depth: int = 0) -> U | None:
+@dataclasses.dataclass()
This adds calling overhead, which we don't want in this code.
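The overhead concern is easy to measure rather than argue about. The sketch below is a rough micro-benchmark under stated assumptions: `serialize_fn` and `SerializerCls` are toy stand-ins, not the code in this PR, and the payload shape is arbitrary. It makes no claim about how large the difference is; that depends on the interpreter version and the workload.

```python
import dataclasses
import timeit
from typing import Any


def serialize_fn(o: Any, depth: int = 0) -> Any:
    """Toy module-level serializer: collections become lists."""
    if isinstance(o, (list, set, tuple)):
        return [serialize_fn(i, depth + 1) for i in o]
    return o


@dataclasses.dataclass()
class SerializerCls:
    """Toy class-based serializer doing the same work through a bound method."""

    ensure_json_compatible: bool = False

    def visit(self, o: Any, depth: int = 0) -> Any:
        if isinstance(o, (list, set, tuple)):
            return [self.visit(i, depth + 1) for i in o]
        return o


def measure(number: int = 1000) -> tuple[float, float]:
    """Return (function_seconds, class_seconds) for `number` runs each."""
    payload = [list(range(10)) for _ in range(100)]
    fn_time = timeit.timeit(lambda: serialize_fn(payload), number=number)
    cls_time = timeit.timeit(lambda: SerializerCls().visit(payload), number=number)
    return fn_time, cls_time
```

Running `measure()` on the target interpreter gives both timings side by side, which should settle whether the instance creation and bound-method calls matter for this code path.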
Rethinking this, I think this can simply be achieved by a better implemented
Fix #28741.
Our custom JSON encoders call serialize() in their default() hooks. However, serialize() does not guarantee to return a JSON-serializable type, and when it does not (e.g. a set object), the JSON encoder would infinitely recurse into serialize() trying to coerce the object into JSON-serializable.

This patch adds an additional ensure_json_compatible parameter to serialize(), so JSON encoders can use it to require serialize() to always return JSON-compatible values.

Some minor refactoring is also applied to serialize() to ensure the flag is correctly passed down through recursive visits, and also hide the depth tracking parameter (which is internal to the function) from outside access.

The _reverse_cache global variable is also deleted since it is not used anywhere, even before this patch.
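A class is not the only way to get a keyword-only flag and a hidden depth counter. The sketch below keeps everything as plain module-level functions: a private helper carries `depth` and the flag through the recursion, and the public `serialize()` exposes only the flag. The names `_serialize` and `_MAX_DEPTH`, the depth limit of 10, and the simplified type handling are assumptions for illustration, not the code in this patch.

```python
from typing import Any

_collections = (list, set, tuple)  # dict is treated specially.
_MAX_DEPTH = 10  # hypothetical recursion limit


def _serialize(o: Any, depth: int, ensure_json_compatible: bool) -> Any:
    """Private worker: `depth` is tracked here and never exposed to callers."""
    if depth > _MAX_DEPTH:
        raise RecursionError("maximum serialization depth exceeded")
    if isinstance(o, dict):
        return {
            k: _serialize(v, depth + 1, ensure_json_compatible) for k, v in o.items()
        }
    if isinstance(o, _collections):
        items = [_serialize(v, depth + 1, ensure_json_compatible) for v in o]
        if ensure_json_compatible or isinstance(o, list):
            return items  # lists are the only JSON-compatible collection
        return tuple(items) if isinstance(o, tuple) else set(items)
    return o


def serialize(o: Any, *, ensure_json_compatible: bool = False) -> Any:
    """Public entry point: callers can set the flag but cannot pass a depth."""
    return _serialize(o, 0, ensure_json_compatible)
```

With this shape, `serialize({"a": {1}}, ensure_json_compatible=True)` yields only dicts and lists, while the default call preserves sets and tuples, and passing `depth=` from outside raises a `TypeError`.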