introduce base64 encoding and decoding for all nodes by apoelstra · Pull Request #299 · BlockstreamResearch/rust-simplicity

apoelstra · 2025-07-29T03:11:12Z

This also refactors our bit encoding code to no longer be specific to CommitNode and RedeemNode; it introduces some string display methods and from_str` methods which are able to parse hex witness data alongside base64 programs.

...and use the dependency exposed by rust-bitcoin, rather than defining our own, so that we don't have two versions that we need to keep in sync.

…ing node In NamedNode<Construct> and NamedNode<Commit> we had set the sharing ID to be the name of the node. The idea here was that if two nodes had different names in the human-readable encoding, even if they were identical, that they would not be shared since they were "conceptually different". This idea was at best confusing, and at worst dangerous. "Sharing" is a property of the bit encoding, not the human-readable encoding, and pretending otherwise is just wrong. I'm not actually sure what I intended by making the sharing ID be the name. Notably, in the bit encoding we didn't actually use the sharing ID! We override it in NamedCommitNode::encode, where we first convert the named commit node to a normal commit node before encoding it. So AFAICT this confused notion of sharing ID was never actually used. Some evidence for this is that this commit doesn't break any unit tests.. However, in the next commit I will make encoding generic, and I won't be able to use the "convert to a commitnode" hack anymore. Anyway this is all moot since the human-readable encoding has been almost completely superceded by SimplicityHL.

Right now we have encoding methods on RedeemNode and CommitNode. They are not generic because RedeemNode has witness data while CommitNode does not. This is a problem because it means that we cannot use the bit encoding to implement generic fmt::Display impls. Instead we iterate and implement fmt::Display in terms of a dump of each of the nodes in the program, with newlines and everything, which is not really useful as a Display impl. It's also an API annoyance; it means that we have to re-implement the encoding method on every single node type individualy; there is no way to encode "just the program" for redeem nodes, there are no encoding functions for ConstructNode or NamedConstructNode even though it's totally possible to encode these by calling encode_program directly, etc. (I think it was a semi-deliberate design choice not to foreground the encoding of ConstructNodes. But this is dumb. It's no less confusing to encode ConstructNodes than to encode CommitNodes; in both cases it is impossible to share "correctly" but there is still a natural way to do so.) While we are at it, stop taking a BitWriter and just take a generic Write. There is no circumstance where the user should be bit-encoding a program into the middle of another bitstream, since we split the witness data off from the program data. (In fact, arguably there is no circumstance where a user needs to work with bitwriters *at all*.) We weren't even using it nicely as a BitWriter, since we'd always call `flush` after writing. Just acknowledge that we will always write a whole number of bytes by taking an io::Write. Also while we're at it, return the count of program and witness bits separately when encoding to a generic writer. We were returning the sum, which since the split between witness and program, is a basically useless quantity. (Not that we were using the return value of this function anywhere..) f deal with the "sharing id" of namedcommitnode being the name

apoelstra · 2025-07-29T03:11:32Z

cc @canndrew this is the last PR before we can start cutting releases

Generally speaking users don't want the "full decode" representation of arbitrary nodes except for debugging, and in that case they probably want a more structured representation. It's certainly not what you'd expect for the string representation. Simplicity programs are canonically encoded in base 64, although there is not a canonical encoding for witness data. I decided to use hex since this makes programs and witnesses visually distinct and also it makes the witness data convenient for pasting into json blobs for witness scripts. The next commit will add some unit tests. Then we will add string parsing and some round-trip fuzz tests.

We had repeated some bit-decoding logic.

These tests have been an albatross around the neck of this library for too long. They are the reason we have the `decode_natural` function even though it is called exactly once, from its thin wrapper `read_natural` that has a more sensible API.

Several changes here: * Split out the natural-number parsing errors into `DecodeNaturalError` * Elaborate `BadIndex` to give the actual and maximum number read * Make `EarlyEndOfStreamError` non-exhaustive to prevent direct construction, and store it directly in DecodeNaturalError rather than replacing it with a dummy variant. * Drop the `TooManyNodes` error variant. This only existed to prevent allocating huge vecs. Just cap the value passed to `Vec::with_capacity` instead. * Drop the `EmptyProgram` error variant which was impossible to hit. * Inline `decode_natural` into `read_natural`; make this generic over the integer types to eliminate casts to/from u32; expand the existing unit test to cover excess values.

Essentially, DecodeError is used when constructing a program from the bit encoding and FinalizeError is used when constructing a program using the library. Along the way, tighten up some error returns for functions that return e.g. crate::Error when they actually only ever yield type errors. We drop the Error::Policy variant which was never used. The old Error enum now has a single variant, which annoyingly is used by the Haskell-generated code so we can't remove it without roundtripping through upstream. Will try to do this before the next release but it's not a huge priority. In the next commit we'll add a ParseError which wraps DecodeError and adds some string-encoding-related variants.

It bugs me to have two versions of hex-conservative. Our lockfile retains hex-conservative 0.1, because our fuzztests depend on an old released version of this crate.

We cannot use the std from_str trait because these all construct an Arc<Self> rather than a Self. And we cannot implement this generically on Node because the rules for decoding are different for each node type (we have progressively tighter correctness requirements, and with RedeemNode we need extra witness data).

apoelstra

On 96839b7 successfully ran local tests

canndrew · 2025-07-29T12:21:47Z

ACK 96839b7

apoelstra added 3 commits July 29, 2025 03:10

base64: move dep from simpcli to main crate

555ad7c

...and use the dependency exposed by rust-bitcoin, rather than defining our own, so that we don't have two versions that we need to keep in sync.

apoelstra added 4 commits July 29, 2025 03:16

node: add unit tests for commit/redeem string format

07c59cc

node: reuse ConstructNode decode code in other node's decode impls

6a959ba

We had repeated some bit-decoding logic.

decode: rewrite decode_natural unit tests

960cd4b

These tests have been an albatross around the neck of this library for too long. They are the reason we have the `decode_natural` function even though it is called exactly once, from its thin wrapper `read_natural` that has a more sensible API.

apoelstra force-pushed the 2025-07/base64 branch from 5823a0e to eb362c2 Compare July 29, 2025 03:17

apoelstra added 4 commits July 29, 2025 03:34

bump hex-conservative dep from 0.1 to 0.2

64553b7

It bugs me to have two versions of hex-conservative. Our lockfile retains hex-conservative 0.1, because our fuzztests depend on an old released version of this crate.

apoelstra force-pushed the 2025-07/base64 branch from eb362c2 to 96839b7 Compare July 29, 2025 03:35

apoelstra commented Jul 29, 2025

View reviewed changes

apoelstra merged commit 082e104 into master Jul 29, 2025
39 checks passed

apoelstra deleted the 2025-07/base64 branch July 29, 2025 13:29

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

introduce base64 encoding and decoding for all nodes#299

introduce base64 encoding and decoding for all nodes#299
apoelstra merged 11 commits intomasterfrom
2025-07/base64

apoelstra commented Jul 29, 2025

Uh oh!

apoelstra commented Jul 29, 2025

Uh oh!

apoelstra left a comment

Uh oh!

canndrew commented Jul 29, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

apoelstra commented Jul 29, 2025

Uh oh!

apoelstra commented Jul 29, 2025

Uh oh!

apoelstra left a comment

Choose a reason for hiding this comment

Uh oh!

canndrew commented Jul 29, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants