introduce base64 encoding and decoding for all nodes#299
Merged
Conversation
...and use the dependency exposed by rust-bitcoin, rather than defining our own, so that we don't have two versions that we need to keep in sync.
…ing node In NamedNode<Construct> and NamedNode<Commit> we had set the sharing ID to be the name of the node. The idea here was that if two nodes had different names in the human-readable encoding, even if they were identical, that they would not be shared since they were "conceptually different". This idea was at best confusing, and at worst dangerous. "Sharing" is a property of the bit encoding, not the human-readable encoding, and pretending otherwise is just wrong. I'm not actually sure what I intended by making the sharing ID be the name. Notably, in the bit encoding we didn't actually use the sharing ID! We override it in NamedCommitNode::encode, where we first convert the named commit node to a normal commit node before encoding it. So AFAICT this confused notion of sharing ID was never actually used. Some evidence for this is that this commit doesn't break any unit tests.. However, in the next commit I will make encoding generic, and I won't be able to use the "convert to a commitnode" hack anymore. Anyway this is all moot since the human-readable encoding has been almost completely superceded by SimplicityHL.
Right now we have encoding methods on RedeemNode and CommitNode. They are not generic because RedeemNode has witness data while CommitNode does not. This is a problem because it means that we cannot use the bit encoding to implement generic fmt::Display impls. Instead we iterate and implement fmt::Display in terms of a dump of each of the nodes in the program, with newlines and everything, which is not really useful as a Display impl. It's also an API annoyance; it means that we have to re-implement the encoding method on every single node type individualy; there is no way to encode "just the program" for redeem nodes, there are no encoding functions for ConstructNode or NamedConstructNode even though it's totally possible to encode these by calling encode_program directly, etc. (I think it was a semi-deliberate design choice not to foreground the encoding of ConstructNodes. But this is dumb. It's no less confusing to encode ConstructNodes than to encode CommitNodes; in both cases it is impossible to share "correctly" but there is still a natural way to do so.) While we are at it, stop taking a BitWriter and just take a generic Write. There is no circumstance where the user should be bit-encoding a program into the middle of another bitstream, since we split the witness data off from the program data. (In fact, arguably there is no circumstance where a user needs to work with bitwriters *at all*.) We weren't even using it nicely as a BitWriter, since we'd always call `flush` after writing. Just acknowledge that we will always write a whole number of bytes by taking an io::Write. Also while we're at it, return the count of program and witness bits separately when encoding to a generic writer. We were returning the sum, which since the split between witness and program, is a basically useless quantity. (Not that we were using the return value of this function anywhere..) f deal with the "sharing id" of namedcommitnode being the name
Collaborator
Author
|
cc @canndrew this is the last PR before we can start cutting releases |
Generally speaking users don't want the "full decode" representation of arbitrary nodes except for debugging, and in that case they probably want a more structured representation. It's certainly not what you'd expect for the string representation. Simplicity programs are canonically encoded in base 64, although there is not a canonical encoding for witness data. I decided to use hex since this makes programs and witnesses visually distinct and also it makes the witness data convenient for pasting into json blobs for witness scripts. The next commit will add some unit tests. Then we will add string parsing and some round-trip fuzz tests.
We had repeated some bit-decoding logic.
These tests have been an albatross around the neck of this library for too long. They are the reason we have the `decode_natural` function even though it is called exactly once, from its thin wrapper `read_natural` that has a more sensible API.
5823a0e to
eb362c2
Compare
Several changes here: * Split out the natural-number parsing errors into `DecodeNaturalError` * Elaborate `BadIndex` to give the actual and maximum number read * Make `EarlyEndOfStreamError` non-exhaustive to prevent direct construction, and store it directly in DecodeNaturalError rather than replacing it with a dummy variant. * Drop the `TooManyNodes` error variant. This only existed to prevent allocating huge vecs. Just cap the value passed to `Vec::with_capacity` instead. * Drop the `EmptyProgram` error variant which was impossible to hit. * Inline `decode_natural` into `read_natural`; make this generic over the integer types to eliminate casts to/from u32; expand the existing unit test to cover excess values.
Essentially, DecodeError is used when constructing a program from the bit encoding and FinalizeError is used when constructing a program using the library. Along the way, tighten up some error returns for functions that return e.g. crate::Error when they actually only ever yield type errors. We drop the Error::Policy variant which was never used. The old Error enum now has a single variant, which annoyingly is used by the Haskell-generated code so we can't remove it without roundtripping through upstream. Will try to do this before the next release but it's not a huge priority. In the next commit we'll add a ParseError which wraps DecodeError and adds some string-encoding-related variants.
It bugs me to have two versions of hex-conservative. Our lockfile retains hex-conservative 0.1, because our fuzztests depend on an old released version of this crate.
We cannot use the std from_str trait because these all construct an Arc<Self> rather than a Self. And we cannot implement this generically on Node because the rules for decoding are different for each node type (we have progressively tighter correctness requirements, and with RedeemNode we need extra witness data).
eb362c2 to
96839b7
Compare
Collaborator
|
ACK 96839b7 |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This also refactors our bit encoding code to no longer be specific to
CommitNodeand RedeemNode; it introduces some string display methods andfrom_str` methods which are able to parse hex witness data alongside base64 programs.