Rewrite wasmparser's module parser and validator (#40)
* Rewrite wasmparser's module parser and validator

This commit is a major refactoring (bordering on rewrite) of
wasmparser's top-level wasm module parsing and validation. Lots of code
from the previous validator has been reused but it's moved around so
much that it looks more like a rewrite. At a high level this commit
removes the old `Parser`, `ModuleReader`, and `ValidatingParser` types.
These are replaced with a new `Parser`, `Validator`, and
`FuncValidator`. There are a number of points motivating this rewrite:

* Parsing a module currently requires the entire module to be resident
  in-memory. There is no way to incrementally parse a module (for
  example as it arrives from the network).

* Validating a module is an all-or-nothing operation. This, like parsing
  above, means it's not friendly for incrementally acquired wasm
  binaries.

* Validation does not expose its results, and there is no way to ask for
  them. This means that you can validate a wasm blob in its
  entirety but you can't retroactively ask what the type of function 3
  was. More concretely, if you're implementing a code translator you
  have to track a lot of state the validator was already keeping for you.

* Validation does not easily provide the ability to parse, validate, and
  possibly compile wasm functions in parallel. The single monolithic
  `Validator` would be difficult to interleave application-specific
  details into, such as parallelism.

These are all deep architectural issues in how the code is organized
today, so the approach taken in this commit is to rewrite the affected
types as opposed to adding the missing capabilities on as features. Much
of this work was motivated by recent refactorings for the module linking
proposal. The amount of bookkeeping needed to keep track of type aliases
and such was a big enough job for validation that I didn't want to have
to redo it all again in wasmtime later on!

The new `Parser` and `Validator` types are designed to be used both in
high-level and low-level contexts. Handling a WebAssembly module
efficiently can often involve a lot of moving pieces at runtime which
are very application-specific, and it seems like overkill at best, or
impossible at worst, to try to encapsulate all these patterns in
wasmparser. Instead
the intention here is that the lowest level bits are able to be reused
regardless of how you're parsing wasm, and the higher level bits are as
straightforward to reimplement and use as possible. This ideally means
that if you can't use some high-level conveniences in wasmparser it
should be obvious how you can rewrite them locally to work in your own
application.

The detailed design of the new APIs added here is best learned by
reading the rustdoc documentation, the examples, or the tests. At a high
level, though, these new types operate as follows:

* `Parser` is fed chunks of data, and it will return one chunk of parsed
  data which is a view into the input buffer. If it can't parse a chunk
  then it will tell the application it needs to wait for more data to be
  available.

* Most sections are parsed as-a-whole, meaning that they need to be
  entirely resident in memory before being parsed. For example this
  rewrite does not support incrementally parsing the type section. This
  is done for simplicity, with the expectation that most sections are
  reasonably small and have no reason to be incrementally
  processed beyond the section-at-a-time level.

* `Parser`, however, will allow incremental downloads of the code and
  module code sections. This means that it supports parsing a singular
  function or a singular module at a time. This allows functions to be
  validated/processed immediately as they're received, without having to
  wait for the next function to be available.

* The `Validator` type receives as input the payloads returned by
  `Parser`. The `Validator` type is intended to be persistently living
  adjacent to a `Parser` which it receives input from.

* Validation is intended to eventually expose information about the
  module and operators as necessary. For example methods can be added to
  `Validator` to query what types are and the results of operators in
  functions. It's envisioned that you'd use a `Parser` to parse a module
  and then `Validator` would be used before you execute
  application-specific code per-section/item.
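
To make the feeding pattern above concrete, here is a minimal,
std-only sketch of an incremental parser loop. It is not wasmparser's
implementation or API: `ToyParser`, `parse_in_pieces`, and the
simplified `Chunk` and `Payload` enums are hypothetical stand-ins, and
the toy only splits a binary into its header and raw sections (id byte,
LEB128 size, body) without looking inside them.

```rust
/// One parsed item. It borrows from the caller's buffer instead of copying.
/// (Hypothetical, simplified stand-in; not wasmparser's own type.)
#[derive(Debug, PartialEq)]
enum Payload<'a> {
    Header { version: u32 },
    Section { id: u8, data: &'a [u8] },
}

/// Result of one parse attempt. (Hypothetical, simplified stand-in.)
#[derive(Debug, PartialEq)]
enum Chunk<'a> {
    /// Not enough bytes buffered yet; the value is a hint of how many more.
    NeedMoreData(usize),
    /// One item was parsed out of the first `consumed` bytes of the buffer.
    Parsed { consumed: usize, payload: Payload<'a> },
}

struct ToyParser {
    seen_header: bool,
}

/// Decode an unsigned LEB128 `u32`. Returns `Ok(None)` if the buffer ends
/// before the encoding does. The toy does not check for overflowing bits.
fn read_leb_u32(buf: &[u8]) -> Result<Option<(u32, usize)>, String> {
    let mut result: u32 = 0;
    for (i, &byte) in buf.iter().enumerate().take(5) {
        result |= u32::from(byte & 0x7f) << (7 * i);
        if byte & 0x80 == 0 {
            return Ok(Some((result, i + 1)));
        }
    }
    if buf.len() >= 5 {
        Err("LEB128 value too long".to_string())
    } else {
        Ok(None)
    }
}

impl ToyParser {
    fn new() -> Self {
        ToyParser { seen_header: false }
    }

    /// Try to parse exactly one item from the front of `buf`.
    fn parse<'a>(&mut self, buf: &'a [u8]) -> Result<Chunk<'a>, String> {
        if !self.seen_header {
            if buf.len() < 8 {
                return Ok(Chunk::NeedMoreData(8 - buf.len()));
            }
            if &buf[..4] != b"\0asm" {
                return Err("bad magic number".to_string());
            }
            let version = u32::from_le_bytes([buf[4], buf[5], buf[6], buf[7]]);
            self.seen_header = true;
            return Ok(Chunk::Parsed { consumed: 8, payload: Payload::Header { version } });
        }
        if buf.is_empty() {
            return Ok(Chunk::NeedMoreData(1));
        }
        let id = buf[0];
        let (size, leb_len) = match read_leb_u32(&buf[1..])? {
            Some(decoded) => decoded,
            None => return Ok(Chunk::NeedMoreData(1)),
        };
        let start = 1 + leb_len;
        let end = start + size as usize;
        if buf.len() < end {
            // Sections are handled as a whole, so wait for the full body.
            return Ok(Chunk::NeedMoreData(end - buf.len()));
        }
        Ok(Chunk::Parsed { consumed: end, payload: Payload::Section { id, data: &buf[start..end] } })
    }
}

/// Feed `input` to the parser `piece` bytes at a time, as if it were
/// arriving from the network, and record each payload that comes out.
fn parse_in_pieces(input: &[u8], piece: usize) -> Result<Vec<String>, String> {
    let mut parser = ToyParser::new();
    let mut buf: Vec<u8> = Vec::new();
    let mut events = Vec::new();
    let mut pieces = input.chunks(piece);
    loop {
        // The payload is a view into `buf`, so use it before touching `buf`.
        let step = match parser.parse(&buf)? {
            Chunk::NeedMoreData(_) => None,
            Chunk::Parsed { consumed, payload } => {
                events.push(format!("{:?}", payload));
                Some(consumed)
            }
        };
        match step {
            Some(consumed) => {
                buf.drain(..consumed);
            }
            None => match pieces.next() {
                Some(more) => buf.extend_from_slice(more),
                None if buf.is_empty() => return Ok(events),
                None => return Err("unexpected end of input".to_string()),
            },
        }
    }
}
```

In this sketch a validator would sit inside the `Chunk::Parsed` arm and
be handed each payload before any application-specific processing,
which mirrors the "living adjacent to a `Parser`" arrangement described
above.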

Operator/function validation is essentially unchanged at this time. The
operator validator is only very lightly touched, and a larger
refactoring is expected in the future. I would like to effectively add a
method-per-opcode to `FuncValidator` so engines can, for example,
validate a `call` instruction, get back the type of the call, and then
use that type to iterate over the types on the stack and return values.
None of this is supported yet, but I'm hoping to make, for example,
cranelift-wasm lean much more heavily on the wasmparser `Validator`.
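
As a rough illustration of the method-per-opcode idea, the following
std-only sketch shows what validating a `call` and getting its type
back could look like. Everything here is hypothetical:
`ToyFuncValidator`, its methods, and the two-variant `ValType` are
invented for this example and do not reflect the current or planned
`FuncValidator` API.

```rust
/// Tiny stand-in for wasm value types (hypothetical, for illustration).
#[derive(Debug, Clone, Copy, PartialEq)]
enum ValType {
    I32,
    I64,
}

#[derive(Debug, Clone, PartialEq)]
struct FuncType {
    params: Vec<ValType>,
    results: Vec<ValType>,
}

/// Hypothetical validator with one method per opcode.
struct ToyFuncValidator<'m> {
    /// Types of the module's functions, indexed by function index.
    funcs: &'m [FuncType],
    /// Types currently on the operand stack.
    stack: Vec<ValType>,
}

impl<'m> ToyFuncValidator<'m> {
    fn new(funcs: &'m [FuncType]) -> Self {
        ToyFuncValidator { funcs, stack: Vec::new() }
    }

    /// Pop one operand and check that it has the expected type.
    fn pop_expected(&mut self, expected: ValType) -> Result<(), String> {
        match self.stack.pop() {
            Some(actual) if actual == expected => Ok(()),
            Some(actual) => Err(format!("expected {:?}, found {:?}", expected, actual)),
            None => Err(format!("expected {:?}, found empty stack", expected)),
        }
    }

    fn i32_const(&mut self) -> Result<(), String> {
        self.stack.push(ValType::I32);
        Ok(())
    }

    fn i32_add(&mut self) -> Result<(), String> {
        self.pop_expected(ValType::I32)?;
        self.pop_expected(ValType::I32)?;
        self.stack.push(ValType::I32);
        Ok(())
    }

    /// Validate a `call` and hand the callee's type back to the engine.
    fn call(&mut self, func_index: u32) -> Result<&'m FuncType, String> {
        // Copy the shared reference so the returned type outlives `&mut self`.
        let funcs: &'m [FuncType] = self.funcs;
        let ty = funcs
            .get(func_index as usize)
            .ok_or_else(|| format!("unknown function {}", func_index))?;
        // Arguments are popped last-to-first, then results are pushed.
        for &param in ty.params.iter().rev() {
            self.pop_expected(param)?;
        }
        self.stack.extend(ty.results.iter().copied());
        Ok(ty)
    }
}
```

An engine driving this sketch would call `i32_const` twice, then
`call(0)`, and use the returned `FuncType` to decide how many stack
values to treat as arguments and which result types to expect, rather
than tracking that state itself.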
alexcrichton committed Jul 13, 2020
1 parent e087fc8 commit db2ef19
Showing 68 changed files with 4,258 additions and 4,661 deletions.
3 changes: 2 additions & 1 deletion Cargo.toml
@@ -11,6 +11,7 @@ members = ['fuzz']
[dependencies]
anyhow = "1.0"
getopts = "0.2"
rayon = "1.0"
wasmparser = { path = "crates/wasmparser" }
wasmprinter = { path = "crates/wasmprinter" }
wast = { path = "crates/wast" }
@@ -19,10 +20,10 @@ wat = { path = "crates/wat" }
[dev-dependencies]
anyhow = "1.0"
getopts = "0.2"
rayon = "1.0"
serde_json = "1.0"
tempfile = "3.1"
diff = "0.1"
wasmparser-dump = { path = 'crates/dump' }

[[test]]
name = "dump"
10 changes: 10 additions & 0 deletions crates/dump/Cargo.toml
@@ -0,0 +1,10 @@
[package]
name = "wasmparser-dump"
version = "0.1.0"
authors = ["The Wasmtime Project Developers"]
edition = "2018"
publish = false

[dependencies]
anyhow = "1"
wasmparser = { path = "../wasmparser" }
