Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Structs not deserializable with bincode #91

Open
rrichardson opened this issue Apr 21, 2021 · 18 comments
Open

Structs not deserializable with bincode #91

rrichardson opened this issue Apr 21, 2021 · 18 comments

Comments

@rrichardson
Copy link

rrichardson commented Apr 21, 2021

I have implemented an Ingress Controller using your very helpful API. For maintaining state, I store a handful of different objects in Sled, and, to save space/cpus, I serialize into Bincode in the Sled DB.

I have noticed that several Endpoints objects that I get back from my k8s cluster, can not be correctly deserialized from bincode-serialized bytes. (maybe they were incorrectly serialized?).
I have not yet made a minimal reproduction of this, but I can, if it helps.

If I change my serde lib to serde_json from bincode, then serialization and deserialization works just fine. I'm guessing that some facet of the data, such as an empty vector or something, in some of the live k8s objects are causing problems with bincode, since it does not produce self-describing encoded data.

I'm just checking to see if this is a known issue, if not, I'll try to make a repro of it.

@Arnavion
Copy link
Owner

It's not a known issue, but it's possible. The serde library is format-independent but an individual (De)Serialize impl isn't necessarily format-independent. bincode isn't a self-describing format so it's possible the #[serde(skip_serializing_if("Option::is_none"))]-type things or something else breaks it.

Please do provide a repro.

@rrichardson
Copy link
Author

Here is a basic repro/test-project : When I get a sec, I can try to instrument the Deserialization functions for Endpoints figure out where it's failing.

https://github.com/rrichardson/bincode-k8s-repro/blob/main/src/main.rs

The error is Custom("invalid type: sequence, expected Endpoints")

@rrichardson
Copy link
Author

I thought it was just a specific Endpoint example, but now I think it's everything in bincode.

@rrichardson rrichardson changed the title Some structs not deserializable with bincode (sometimes?) Structs not deserializable with bincode Apr 21, 2021
@Arnavion
Copy link
Owner

Ah, the "invalid type: sequence, expected Ingress" error made me remember. It's because bincode serializes structs as sequences of fields, and why #[derive(serde::Deserialize)] accepts both maps and sequences of fields in its generated impls ( serde-rs/serde@dbe2bea ).

I can see if adding sequence stuff to the codegen's Deserialize impls is sufficient, or if the skip_serializing_if things will still break.

@rrichardson
Copy link
Author

Looking at the implementation of Deserialize, and looking at the bincode docs. I am willing to say that the two implementations are at odds with one another. serialize/deserialize_struct uses a sequence however, the implemented serde routines use a map.

This quote seems appropos: The implementation supports two possible ways that a struct may be represented by a data format: as a seq like in Bincode, and as a map like in JSON.

It is hard to get a clear picture, but it looks like deserialize_identifier and visit_map aren't supported in bincode. If that's the case, the ser/de routines in k8s-openapi might require a serious rework to support bincode.

@Arnavion
Copy link
Owner

Yes, that's the "sequence stuff" I referred to.

@rrichardson
Copy link
Author

Hmm.. in looking for a compact alternative to bincode, I tried MessagePack, and got exactly the same result :( MessagePack is self-describing, so I figured it'd support map and deserialize_identifier.

@Arnavion
Copy link
Owner

I can see if adding sequence stuff to the codegen's Deserialize impls is sufficient, or if the skip_serializing_if things will still break.

Indeed, I realized while adding the sequence stuff to the Deserialize impls that the calls to serialize will not match up with the calls to SeqAccess::next_element, so it's not possible to use a non-self-describing format while still keeping the ability to not emit None as null with serde_json.

@Arnavion
Copy link
Owner

Arnavion commented Apr 21, 2021

Hmm.. in looking for a compact alternative to bincode, I tried MessagePack, and got exactly the same result :( MessagePack is self-describing, so I figured it'd support map and deserialize_identifier.

It's self-describing in that it encodes the type, but it doesn't encode the field name. 1 2 Without the field name it's not possible to know when a field was skipped while serializing and should thus not be deserialized either.

@rrichardson
Copy link
Author

That makes sense. It is my understanding that MessagePack does persist field names. So maybe the rmp implementation should be updated to support the map approach.
In either case, it seems like there should be two, distinct ser/de implementations for each k8s-openapi struct. One that persists using map instead of struct. and one that persists as struct/sequence, and persists all fields, even if they are None. I'm not sure if it's possible to branch based on the capabilities of the serializer, but I'm not sure how else to approach such a problem.

@Arnavion
Copy link
Owner

I'm not sure if it's possible to branch based on the capabilities of the serializer, but I'm not sure how else to approach such a problem.

It's not. Like I said in the first post, the trait impl doesn't know which format it's being used with.

We could have two impls for each type gated by a feature (#[cfg(feature = "serde_bincode")] vs not), but using the types with the API server requires the JSON-compatible serialization anyway, so your use case would require both of them active at the same time which is impossible.

So the only option I can think of is making new traits similar to serde's, and having their impls not skip optional fields. Just with this you won't be able to serialize them with serde serializers (unless you use with / remote), but that can be solved by also providing something like struct AltSerialize<T>(T); that impls the serde traits in terms of T's not-serde traits.

@rrichardson
Copy link
Author

It looks like serde-rs/serde#2012 would give a path forward, that is actually pretty elegant. skip_field and skip_serializing* just becomes a suggestion, and it lets the encoding implementation decide whether or not to actually skip.

@Arnavion
Copy link
Owner

Actually, I forgot that SerializeStruct has skip_field in the first place. Currently the impls just ignore None fields, but they could be changed to call skip_field for them.

However even that won't help you with either bincode or rmp currently, because both of those don't override the default skip_field and thus ignore it. Perhaps you can ask them if it would be possible for them to implement skip_field in some way into the bytestream so that deserialization can pick it out as a None.

@rrichardson
Copy link
Author

However even that won't help you with either bincode or rmp currently, because both of those don't override the default skip_field and thus ignore it. Perhaps you can ask them if it would be possible for them to implement skip_field in some way into the bytestream so that deserialization can pick it out as a None.

Right, bincode would need to be updated to process the field even when told to skip. Presently it cannot, though, because skip_field is not passed enough data to call serialize_field instead.
With the proposed change, an encoding author would have the data needed to either blindly forward to serialize_field, or use the data to make some other decision.
It is a breaking change, though, so it might have to wait until serde 2.0?
¯_(ツ)_/¯

@Arnavion
Copy link
Owner

Might still work for rmp though?

@Arnavion
Copy link
Owner

It might work for bincode too, actually.

To be clear, I'm saying their respective impls of skip_field can act like the caller called serialize_none. I can change k8s-openapi to always call serialize_some or skip_field at serializing time. At deserializing time k8s-openapi would deserialize as an Option, so that would work already.

This scheme would work with both serde_json and for bincode and rmp, as long as bincode and rmp override skip_field to act the same as serialize_none.

@Arnavion
Copy link
Owner

Here's what I'm proposing from k8s-openapi's side. See Pod's impls for an example.

@rrichardson
Copy link
Author

I think that makes sense. I think that this is a fairly pervasive problem in the serde ecosystem, I can test it with a modified bincode, and, if it works like we hope, propose an additional config option that would swap serialize_none for skip_field.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants