
Points and scalars are serialized as 40 bytes (instead of 32) #265

Closed
trevp opened this issue Jul 12, 2019 · 11 comments

trevp commented Jul 12, 2019

RistrettoPoint, CompressedRistretto, and Scalar are serialized into 40 bytes (instead of 32) when using serde with efficient encodings like bincode.

This is because they are handled as variable-length "bytes" in serde terminology, instead of fixed-length "tuples", so they are given an 8-byte length field.

This is fairly easy to fix using serialize_tuple / deserialize_tuple (I've done so in a local branch; could submit a PR), but I don't know what fixes are possible given stability guarantees.
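
As a rough sketch of the difference (hypothetical wrapper types; bincode 1.x with its default u64 length prefix):

```rust
use serde::ser::{Serialize, SerializeTuple, Serializer};

// Hypothetical wrappers, purely to contrast the two serde data-model choices.
struct AsBytes([u8; 32]);
struct AsTuple([u8; 32]);

impl Serialize for AsBytes {
    fn serialize<S: Serializer>(&self, serializer: S) -> Result<S::Ok, S::Error> {
        // "bytes" is variable-length in the serde data model, so bincode
        // prepends a u64 length: 8 + 32 = 40 bytes on the wire.
        serializer.serialize_bytes(&self.0)
    }
}

impl Serialize for AsTuple {
    fn serialize<S: Serializer>(&self, serializer: S) -> Result<S::Ok, S::Error> {
        // A tuple's arity is fixed by the type, so no length field is
        // written: exactly 32 bytes on the wire.
        let mut tup = serializer.serialize_tuple(32)?;
        for byte in self.0.iter() {
            tup.serialize_element(byte)?;
        }
        tup.end()
    }
}

fn main() {
    let b = [0u8; 32];
    assert_eq!(bincode::serialize(&AsBytes(b)).unwrap().len(), 40);
    assert_eq!(bincode::serialize(&AsTuple(b)).unwrap().len(), 32);
}
```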

@hdevalence (Contributor)

I'm not sure that we can fix this without breaking API stability, but I recall that the overhead is reduced when using other encodings (like CBOR, which I think uses as little as 1 byte).

I think that some amount of overhead is inevitable when using Serde, because Serde operates on self-describing data structures, not ones specified in advance. If the data format is specified elsewhere, it should be possible to use the to/from byte conversion methods to achieve a precise and compact encoding for a more complex object like a proof or a signature (we do this in Bulletproofs). This aggregate can then implement the Serde traits, using its own encoding internally, instead of using a derived encoding of all of its components that may not be as efficient.
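
One way to picture that aggregate approach (a hypothetical Sig type, not the actual Bulletproofs code):

```rust
use curve25519_dalek::ristretto::RistrettoPoint;
use curve25519_dalek::scalar::Scalar;
use serde::ser::{Serialize, SerializeTuple, Serializer};

// Hypothetical Schnorr-style signature, for illustration only.
pub struct Sig {
    r: RistrettoPoint,
    s: Scalar,
}

impl Serialize for Sig {
    fn serialize<S: Serializer>(&self, serializer: S) -> Result<S::Ok, S::Error> {
        // The aggregate defines its own 64-byte wire format (R || s) via the
        // byte conversion methods, then hands serde a fixed-arity tuple
        // rather than a derived encoding of its components.
        let mut bytes = [0u8; 64];
        bytes[..32].copy_from_slice(self.r.compress().as_bytes());
        bytes[32..].copy_from_slice(self.s.as_bytes());
        let mut tup = serializer.serialize_tuple(64)?;
        for b in bytes.iter() {
            tup.serialize_element(b)?;
        }
        tup.end()
    }
}
```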

trevp commented Jul 16, 2019

> I think that some amount of overhead is inevitable when using Serde, because Serde operates on self-describing data structures, not ones specified in advance.

I don't think that's true; Serde works great with non-self-describing formats like bincode:

https://serde.rs/impl-deserialize.html

Indeed, if points and scalars serialize to 32 bytes, it's easy to write structs consisting of points and scalars and use bincode with #[derive] to get efficient, packed representations of signatures, proofs, etc.

To me, it seems better to serialize crypto objects (keys, signatures, proofs) into fixed-length, non-self-describing, efficiently-packed blobs.

These blobs can then be embedded into JSON, Protobufs, XML, etc., keeping a separation between your crypto format and code on the one hand and your application protocol on the other. Serializing crypto directly into self-describing formats is usually a worse idea (think ASN.1-encoded DSA sigs, or XML-DSig).

> If the data format is specified elsewhere, it should be possible to use the to/from byte conversion methods to achieve a precise and compact encoding for a more complex object like a proof or a signature (we do this in Bulletproofs).

Yes, one could write serialize/deserialize code by hand, but that defeats the point of serde. If points and scalars serialize to 32 bytes, then you just #[derive] these implementations instead of writing them.
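
For example (a hypothetical Signature type, assuming points and scalars serialize as fixed 32-byte tuples, as in the eventual 2.0 fix):

```rust
use curve25519_dalek::ristretto::RistrettoPoint;
use curve25519_dalek::scalar::Scalar;
use serde::{Deserialize, Serialize};

// Hypothetical type: once RistrettoPoint and Scalar encode as fixed
// 32-byte tuples, the derived impls already give the packed wire format.
#[derive(Serialize, Deserialize)]
pub struct Signature {
    r: RistrettoPoint,
    s: Scalar,
}

fn roundtrip(sig: &Signature) -> Signature {
    let blob = bincode::serialize(sig).unwrap();
    assert_eq!(blob.len(), 64); // no per-field length prefixes
    bincode::deserialize(&blob).unwrap()
}
```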

@tarcieri (Contributor)

I think Serde is agnostic in the self-describing vs. "schema" debate, although using it with the latter has generally been somewhat cumbersome (and squaring it with a format like Protobuf is still an unsolved challenge).

It seems like the problem here is that the serialization treats fixed-width fields as variable-width. I think it'd be better for the serialization to be more specific in this case (i.e., what @trevp wants). For self-describing formats, which lack a schema language to describe the precise length in advance, this shouldn't be a breaking change; it would be a breaking change for formats like bincode.

My suggestion would be: if you release a curve25519-dalek 2.x for other reasons (e.g., the RNG APIs), that would be a good time to address this issue too.

@hdevalence (Contributor)

This would be good to fix, but I'm not sure there's an easy way to do so without breaking compatibility. One option would be a Cargo feature that changes the serialization format, but because Cargo features are additive, that seems like a recipe for future pain (e.g., one crate enables the feature, another doesn't, and now the code breaks).

tarcieri commented Jul 31, 2019

I guess it depends on what sort of compatibility you want. If you don't mind changing the on-wire serialization, it may still be compatible with the existing deserializers with no other changes.

That's all going to depend on the specific formats, I'm afraid, but it's entirely possible that, for all practical intents and purposes, it would be backwards compatible.

Seems like it'd need testing with all serialization formats you intend to support.
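
A sketch of the kind of per-format check that could pin this down (assuming the serde feature is enabled and the tuple encoding has landed):

```rust
use curve25519_dalek::scalar::Scalar;

// Round-trip a Scalar through each format the crate intends to support,
// and pin the encoded size so a data-model change shows up in tests.
#[test]
fn scalar_roundtrip_and_size() {
    let s = Scalar::from(42u64);

    // bincode: fixed-arity tuple, so no length prefix.
    let bin = bincode::serialize(&s).unwrap();
    assert_eq!(bin.len(), 32); // would be 40 under the old "bytes" encoding
    assert_eq!(s, bincode::deserialize::<Scalar>(&bin).unwrap());

    // JSON (self-describing): encodes as an array of 32 integers; the size
    // varies, but the round trip must still hold.
    let json = serde_json::to_vec(&s).unwrap();
    assert_eq!(s, serde_json::from_slice::<Scalar>(&json).unwrap());
}
```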

@hdevalence (Contributor)

Hi all, I think the best path forward for this is:

  1. Check that using Serde tuples produces acceptable results on the backends we care about (bincode and others, i.e., do a survey of a few commonly used ones);

  2. Release a 2.0.0 version with the fixed serialization, the updated rand_core API, and "Remove build.rs?" (#217), and advise in the changelog that it is a compatible upgrade except for users of the Serde traits, due to the change in data model.

I will do this next week.

@hdevalence (Contributor)

Missed my window to do maintenance work last Friday due to other work, but I'm hoping to get to this tomorrow.

goldenMetteyya commented Oct 10, 2019

> Hi all, I think the best path forward for this is:
>
>   1. Check that using Serde tuples produces acceptable results on the backends we care about (bincode and others, i.e., do a survey of a few commonly used ones);
>   2. Release a 2.0.0 version with the fixed serialization, the updated rand_core API, and "Remove build.rs?" (#217), and advise in the changelog that it is a compatible upgrade except for users of the Serde traits, due to the change in data model.
>
> I will do this next week.

@hdevalence Can we add the Rust 2018 upgrade as well?
And potentially @tarcieri can finish zeroize v1.

If 2.0.0 is being cut, we might as well make these additions. :)

@hdevalence (Contributor)

I think neither of those is a breaking change, so they can be done independently.

@hdevalence (Contributor)

I got stuck on a large ZF project and kept missing my weekly window to get this done, but now that that project is complete I've fixed this in #297.

@hdevalence (Contributor)

The fix can be tested in 2.0.0-alpha.0; closing this issue for now.

pinkforest pushed a commit to pinkforest/curve25519-dalek that referenced this issue on Jun 27, 2023:
* Added and cleaned up some verification docs

Co-authored-by: Michael Rosenberg <michael@mrosenberg.pub>