Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make proto map serialization canoncalized in our codegen #12816

Open
3 tasks
ValarDragon opened this issue Aug 4, 2022 · 7 comments
Open
3 tasks

Make proto map serialization canoncalized in our codegen #12816

ValarDragon opened this issue Aug 4, 2022 · 7 comments
Labels
C: Proto Proto definition and proto release T:feature-request

Comments

@ValarDragon
Copy link
Contributor

ValarDragon commented Aug 4, 2022

Summary

Right now we have made the decision to not use protobuf maps in queries, state machine entries, etc. This has caused massive amounts of cognitive overhead and extra work for us to rebuild bad maps out of multiple arrays, all over the codebase, massively increasing our complexity surface. (And integration pains)

This was done out of concern over there being multiple valid encodings in two areas:

  • The in-memory representation of a golang hashmap is volatile - it will be quite different between systems.
  • The protobuf wire format of a serialization makes no guarantee on the ordering of the keys.

I currently don't think blanket not supporting maps makes sense, unless theres details of protobuf encoding canonicalization that I'm missing.

For protobuf input (sdk.Msg, query requests) I agree that we must not allow maps, because clients in other languages could serialize them in an unordered non canonical fashion. However for data fields that are internal to the state machine, and responses from the state machine (msg responses, query responses, state values, genesis proto), I think this was an ill founded decision. We could instead make them serialized in a canonical ordering.

For the state machine, such use cases are likely better served by ORM. But for query responses (even with ORM), and genesis values, a direct map should be useful.

I suggest we take the fact that we codegen our protobuf serialization and deserialization, to just sort golang maps by key before serializing into protobuf map representations, and also ensure all maps are sorted by key in their wire format when deserializing. Thus we can get the UX of using maps in our golang code, and in client code working with the responses, without non-detrerminism concerns.

Proposal

  • In our protobuf codegen, change the logic for unmarshalling a map, to get also gather the keys in sequence into a slice, and ensure the slice is sorted.
  • In our protobuf codegen, change the logic for marshalling into a protobuf map to ensure that keys are ordered in the serialization
  • Start using protobuf maps in places, likely in genesis structs and query responses
@aaronc
Copy link
Member

aaronc commented Aug 4, 2022

The new codegen, cosmos-proto (aka pulsar), does this already. Now that go has generics we should consider adopting a sorted tree map type (instead of the built in hash map) and maybe integrate that with the codegen.

@ValarDragon
Copy link
Contributor Author

Ah sweet! Whats timeline for pulsar, and migration plan? Is it a sub 6 month situation?

Agreed we should be using a generics based Tree map. Jack and Sunny pointed out that https://github.com/tidwall/btree supports generics now!

@08d2
Copy link
Contributor

08d2 commented Aug 9, 2022

The Map type in both Protobuf and Go are defined, by their respective specifications, to be un-ordered. Even if you carefully encode a Protobuf Map {a: 1, b: 2} as map a 1 b 2 on the wire, you cannot assert that the receiver will see the same representation. Client libraries, middle-boxes, anything at all, can transform that on-the-wire representation to map b 2 a 1 while remaining spec-compliant. This behavior can easily change between minor or patch versions of any of these intermediating actors.

If you have code which needs key-val pairs in a deterministic order, then you must use something other than a Protobuf Map type to encode it on the wire, and/or something other than a stdlib Go map to represent it in memory. If Protobuf representations of types are meant to be deterministic, then Protobuf Maps are necessarily off-limits in .proto definitions. If users expect deterministic iteration order of key-val sets, then Go maps are necessarily off-limits in Go code.

@aaronc
Copy link
Member

aaronc commented Aug 10, 2022

We care primarily about deterministic serialization of maps in consensus and that is a solvable problem. We can of course define a spec that requires that of clients, but in general we don't care whether clients use deterministic bytes for signing because we have other solutions for that.

@ValarDragon The SDK team doesn't have a timeline for picking up cosmos proto again yet afaik. Maybe @marbar3778 can comment

@tac0turtle
Copy link
Member

The SDK team doesn't have a timeline for picking up cosmos proto again yet afaik. Maybe @marbar3778 can comment

The goal is to make it part of our q4 OKRs. Would love to see the work being used before the end of the year, I think this is realistic with the work scopes

@08d2
Copy link
Contributor

08d2 commented Aug 11, 2022

We care primarily about deterministic serialization of maps in consensus and that is a solvable problem.

As long as transactions, and all other user-submitted and signable data, cannot contain maps, sounds good.

@tac0turtle
Copy link
Member

Something similar to this is supported in gogoproto as well

An additional message-level option `stable_marshaler` (and the file-level option `stable_marshaler_all`) exists which causes the generated marshalling code to behave deterministically. Today, this only changes the serialization of maps; they are serialized in sort order.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C: Proto Proto definition and proto release T:feature-request
Projects
None yet
Development

No branches or pull requests

4 participants