-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make proto map serialization canoncalized in our codegen #12816
Comments
The new codegen, cosmos-proto (aka pulsar), does this already. Now that go has generics we should consider adopting a sorted tree map type (instead of the built in hash map) and maybe integrate that with the codegen. |
Ah sweet! Whats timeline for pulsar, and migration plan? Is it a sub 6 month situation? Agreed we should be using a generics based Tree map. Jack and Sunny pointed out that https://github.com/tidwall/btree supports generics now! |
The Map type in both Protobuf and Go are defined, by their respective specifications, to be un-ordered. Even if you carefully encode a Protobuf Map If you have code which needs key-val pairs in a deterministic order, then you must use something other than a Protobuf Map type to encode it on the wire, and/or something other than a stdlib Go map to represent it in memory. If Protobuf representations of types are meant to be deterministic, then Protobuf Maps are necessarily off-limits in .proto definitions. If users expect deterministic iteration order of key-val sets, then Go maps are necessarily off-limits in Go code. |
We care primarily about deterministic serialization of maps in consensus and that is a solvable problem. We can of course define a spec that requires that of clients, but in general we don't care whether clients use deterministic bytes for signing because we have other solutions for that. @ValarDragon The SDK team doesn't have a timeline for picking up cosmos proto again yet afaik. Maybe @marbar3778 can comment |
The goal is to make it part of our q4 OKRs. Would love to see the work being used before the end of the year, I think this is realistic with the work scopes |
As long as transactions, and all other user-submitted and signable data, cannot contain maps, sounds good. |
Something similar to this is supported in gogoproto as well
|
Summary
Right now we have made the decision to not use protobuf maps in queries, state machine entries, etc. This has caused massive amounts of cognitive overhead and extra work for us to rebuild bad maps out of multiple arrays, all over the codebase, massively increasing our complexity surface. (And integration pains)
This was done out of concern over there being multiple valid encodings in two areas:
I currently don't think blanket not supporting maps makes sense, unless theres details of protobuf encoding canonicalization that I'm missing.
For protobuf input (sdk.Msg, query requests) I agree that we must not allow maps, because clients in other languages could serialize them in an unordered non canonical fashion. However for data fields that are internal to the state machine, and responses from the state machine (msg responses, query responses, state values, genesis proto), I think this was an ill founded decision. We could instead make them serialized in a canonical ordering.
For the state machine, such use cases are likely better served by ORM. But for query responses (even with ORM), and genesis values, a direct map should be useful.
I suggest we take the fact that we codegen our protobuf serialization and deserialization, to just sort golang maps by key before serializing into protobuf map representations, and also ensure all maps are sorted by key in their wire format when deserializing. Thus we can get the UX of using maps in our golang code, and in client code working with the responses, without non-detrerminism concerns.
Proposal
The text was updated successfully, but these errors were encountered: