Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Human-readable encoding 1: serialization and README #152

Merged
merged 8 commits into from
Jul 13, 2023

Conversation

apoelstra
Copy link
Collaborator

This PR introduces the human-readable serialization of Simplicity programs. Right now it will serialize disconnect nodes as having only one child (and the next PR will parse such programs). Later, when we introduce typed holes, we will correct this so that disconnect nodes always have two children (the rightmost one being a named hole). But for now disconnect is more-or-less unusable. To avoid too much complexity at once I think this is a reasonable order of operations.

It also slightly changes the syntax from the last PR to allow ' in symbol names. This is so I can do stuff like x' := pair x unit or whatever, where I use the prime symbol to indicate a slightly tweaked version of a different expression.

@apoelstra
Copy link
Collaborator Author

I have a crappy CLI tool that I've been mostly interacting with this code through. If you'd like I can clean that up a little and add it to this PR, so you can feed it hex-encoded Simplicity programs and get the serialization.

Could also finally switch to base64..

@sanket1729
Copy link
Member

I started reviewing this PR. Only read the README so far. Will play around with this over the next couple days.

Any CLI tool would be helpful. it doesn't have to be cleaned up. I can navigate through the CLI code to figure out stuff

@apoelstra
Copy link
Collaborator Author

@sanket1729 you may also be interested in the full branch (which I keep reasonably up to date) https://github.com/BlockstreamResearch/rust-simplicity/tree/2023-06--simplang which includes the CLI tool and also parsing logic.

@apoelstra apoelstra force-pushed the 2023-07--human-1 branch 2 times, most recently from 9871f13 to 2459f56 Compare July 9, 2023 14:56
@apoelstra
Copy link
Collaborator Author

Updated README and also removed the name_map from Program because I don't use it yet and it was complicating some later code. Will restore it in a later PR.

Comment on lines +119 to +140
The name `main` is special for several reasons:
* An expression named `name` implicitly has the type ascription `1 -> 1`. That is, it must always be a program.
* To generate a commitment-time program from an expression, it must be called `main`.
* Type ascriptions for `main` and its children are enforced **after** type inference has completed, so they act as sanity checks but cannot change the output of type inference. For other expressions, type ascriptions are enforced **before** and may guide inference.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aed42d0: We should define what a Simplicity program is, and also what a (semantic) Simplicity expression is.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but probably not in the README for the human-readable encoding.

ignored by the parser.

A HOLE is the literal `?` followed by a NAME. It indicates an expression that has
yet to be defined. Holes have a different namespace than other names.
Copy link
Collaborator

@uncomputable uncomputable Jul 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
yet to be defined. Holes have a different namespace than other names.
yet to be defined. Holes have a different namespace than other expression names.

@uncomputable
Copy link
Collaborator

uncomputable commented Jul 10, 2023

Resolved. This error message confused me because bound 1 looks like bound one.

diff --git a/src/types/mod.rs b/src/types/mod.rs
index cdf106b..1ddb37d 100644
--- a/src/types/mod.rs
+++ b/src/types/mod.rs
@@ -128,7 +128,7 @@ impl fmt::Display for Error {
             } => {
                 write!(
                     f,
-                    "failed to apply bound {} to existing bound {}: {}",
+                    "failed to apply bound `{}` to existing bound `{}`: {}",
                     new_bound, existing_bound, hint,
                 )
             }

Copy link
Collaborator

@uncomputable uncomputable left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2459f56: Looks good. There are some comments. The biggest question is what "expression" should mean.

@apoelstra apoelstra force-pushed the 2023-07--human-1 branch 2 times, most recently from 6450fbb to 049ba0c Compare July 10, 2023 19:26
@apoelstra
Copy link
Collaborator Author

Force pushed to

  • fix FIXME in the serialization logic for constant words
  • update README in a couple places, most notably to change "definition" to "named expression"

@apoelstra
Copy link
Collaborator Author

  • Got rid of the confusingly-named Program type in favor of Forest.
  • Added a CI job which just runs cargo test on the simpcli crate.

No other changes.

Copy link
Collaborator

@uncomputable uncomputable left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

851928f: Looks good. I think we should split "named expressions" into "definitions" and "type bounds" / "declarations". We should also emphasize that there are three namespaces: one for expressions, one for holes and one for types.

apoelstra and others added 8 commits July 12, 2023 15:50
This function isn't actually used anywhere, but it will be, and this
will result in incorrect CMR computations.
Makes it clear that `1` and `2` refer to types rather than being indices
or something.
Eventually we should make a bunch of test vectors and run integration
tests on them, but for now this is fine.
I had to rearrange the parsing logic because help doesn't expect any
arguments. The lines after let first_arg = ... are for commands with
at least one argument.
@apoelstra
Copy link
Collaborator Author

Changed :: to : and updated README again.

We have another TODO to change # to #{...} but I think we can do that in a followup PR. Up to you.

Copy link
Collaborator

@uncomputable uncomputable left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 0150e7c

@uncomputable uncomputable merged commit 68d726a into master Jul 13, 2023
@uncomputable uncomputable deleted the 2023-07--human-1 branch July 13, 2023 12:27
apoelstra added a commit that referenced this pull request Jul 18, 2023
b608ec1 Cargo.toml: patch for rust-lang/rust#110475 (Andrew Poelstra)
ab184db human_encoding: use ErrorSet when finalizing NamedConstructNode (Andrew Poelstra)
70b989a human_encoding: introduce Error and ErrorSet (Andrew Poelstra)
9d379ce human_encoding: introduce NamedConstructNode (Andrew Poelstra)
64c6ea4 human_encoding: change assertl/assertr syntax (Andrew Poelstra)
2252354 types: replace 1 + A with A? in display (Andrew Poelstra)
8193fd2 types: don't put parentheses around display of outermost type (Andrew Poelstra)
2adb0a8 node: some more utility methods on Inner (Andrew Poelstra)
a516651 clippy: fix some new lints (Andrew Poelstra)
07ac274 analysis: feature-gate an import (Andrew Poelstra)

Pull request description:

  After/during review of #152 we decided to change the syntax in a couple ways:

  * We introduced the `?` syntax in types where `A?` is shorthand for `1 + A`
  * We changed the CMR syntax from `#expr` to `#{expr}`
  * We added a new `#<64 hex chars>` format for directly specifying CMRs without giving an expression

  This PR also includes some cleanups and new types and utility methods that will be used by the parser. Let me know if you want me to move any to the next PR, or if you'd like me to pull the parser into this one.

  Fixes #157

ACKs for top commit:
  uncomputable:
    ACK b608ec1 Feel free to address the nit.

Tree-SHA512: 91488059ab119d724cda84065256fd7941f09f39ee91b6f7b7453bc65a8b201164059a0cfc456a091916d920f2a1757caf9e2d613ec11cf5c607ebb6a620cac7
uncomputable added a commit that referenced this pull request Aug 1, 2023
cb59702 ci: bump fuzzing rustc to 1.64.0 (Andrew Poelstra)
9f5eaa3 simpcli: add "assemble" option (Andrew Poelstra)
53ee1ec human_encoding: use first (not last) name when parsing chains of references (Andrew Poelstra)
d70f5c2 human_encoding: add circular reference unit test (Andrew Poelstra)
5d0544f human_encoding: forbid multiply-connected witness or disconnect nodes (Andrew Poelstra)
f4ea62f commit: use text encoding in unit tests (Andrew Poelstra)
b63935f human_encoding: correctly share during encoding (Andrew Poelstra)
2816c2d human_encoding: add parsing support (Andrew Poelstra)
c3b86f5 bump MSRV to 1.58 (temporarily?); add santiago dependency (Andrew Poelstra)

Pull request description:

  This uses the `santiago` parser generator, which requires rustc 1.58. Bump the MSRV to 1.58, though we will revert this later when we replace the generated parser with a more efficient ad-hoc one.

  Since #152 added serialization, parsing gives us the ability to round-trip. So update many unit tests and add a fuzz test to test this.

ACKs for top commit:
  uncomputable:
    ACK cb59702

Tree-SHA512: 5acb1e22797c70b4f565343d849e7f6703dc125760baf3a3426dbba8f4e2c2f7cefa4af6a2db359398d8b1a7c018dd3f13ae3c5dca736efa81e24cbaf7a5b866
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants