Commit
Rewrite wasmparser's module parser and validator (#40)
* Rewrite wasmparser's module parser and validator

This commit is a major refactoring (bordering on rewrite) of wasmparser's top-level wasm module parsing and validation. Lots of code from the previous validator has been reused, but it has moved around so much that it looks more like a rewrite.

At a high level this commit removes the old `Parser`, `ModuleReader`, and `ValidatingParser` types. These are replaced with a new `Parser`, `Validator`, and `FuncValidator`. There are a number of points motivating this rewrite:

* Parsing a module currently requires the entire module to be resident in memory. There is no way to incrementally parse a module (for example as it arrives from the network).

* Validating a module is an all-or-nothing operation. This, like parsing above, means it's not friendly to incrementally acquired wasm binaries.

* Validation does not expose its results, nor does it provide a way to do so. This means that you can validate a wasm blob in its entirety but you can't retroactively ask what the type of function 3 was. More concretely, if you're implementing a code translator you have to track a lot of state the validator was already keeping for you.

* Validation did not easily provide the ability to parse, validate, and possibly compile wasm functions in parallel. It would be difficult to interleave application-specific details, such as parallelism, into the single monolithic `Validator`.

These are all deep architectural issues in how the code is organized today, so the approach taken in this commit is to rewrite these pieces as opposed to adding the missing capabilities on as features. Much of this work was motivated by recent refactorings for the module linking proposal. The amount of bookkeeping needed to keep track of type aliases and such was a big enough job for validation that I didn't want to have to redo it all again in wasmtime later on!

The new `Parser` and `Validator` types are designed to be used in both high-level and low-level contexts. Handling a WebAssembly module efficiently can often involve a lot of moving pieces at runtime which are very application-specific, and it seems like overkill at best, or impossible at worst, to try to encapsulate all these patterns in wasmparser. Instead the intention here is that the lowest-level bits can be reused regardless of how you're parsing wasm, and the higher-level bits are as straightforward to reimplement and use as possible. This ideally means that if you can't use some high-level conveniences in wasmparser it should be obvious how you can rewrite them locally to work in your own application.

Detailed design of the new APIs added here is best learned by reading the rustdoc documentation, the examples, or the tests. At a high level, though, the way these new types operate is:

* `Parser` is fed chunks of data, and it will return one chunk of parsed data which is a view into the input buffer. If it can't parse a chunk then it will tell the application it needs to wait for more data to be available.

* Most sections are parsed as a whole, meaning that they need to be entirely resident in memory before being parsed. For example, this rewrite does not support incrementally parsing the type section. This is done for simplicity, with the expectation that most sections are reasonably small and have no reason to be incrementally processed beyond the section-at-a-time level.

* `Parser`, however, will allow incremental downloads of the code and module code sections. This means that it supports parsing a single function or a single module at a time. This allows functions to be validated/processed immediately as they're received, without having to wait for the next function to be available.

* The `Validator` type receives as input the payloads returned by `Parser`. The `Validator` type is intended to live persistently adjacent to the `Parser` from which it receives input.

* Validation is intended to eventually expose information about the module and operators as necessary. For example, methods can be added to `Validator` to query what types are and what the results of operators in functions are. It's envisioned that you'd use a `Parser` to parse a module, and then `Validator` would be used before you execute application-specific code per section/item.

At this time operator/function validation is not changed. The operator validator is only very lightly touched, and it's expected that reworking it will be a future refactoring. I would like to effectively add a method-per-opcode to `FuncValidator` so engines can, for example, validate a `call` instruction, get back the type of the call, and then use that type to iterate over the types on the stack and the return values. None of this is supported yet, but I'm hoping to make, for example, cranelift-wasm lean much more heavily on the wasmparser `Validator`.
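
The chunk-driven protocol described for `Parser` above (feed bytes, get back either one parsed item that borrows from the input buffer or a request for more data) can be illustrated with a small standard-library-only sketch. This is not wasmparser's API: `ToyParser`, `Step`, `Item`, and `parse_streaming` are hypothetical names invented for illustration, and the sketch only understands the module header and whole sections (a one-byte id followed by a LEB128 size). It does not model per-function parsing of the code section.

```rust
/// One parsed item. `Section` borrows its body from the caller's buffer.
/// (Hypothetical type for illustration, not part of wasmparser.)
#[derive(Debug, PartialEq)]
enum Item<'a> {
    Header { version: u32 },
    Section { id: u8, body: &'a [u8] },
}

/// Outcome of a single parse step.
#[derive(Debug, PartialEq)]
enum Step<'a> {
    /// Not enough bytes buffered; the value is a hint for how many more to wait for.
    NeedMoreData(usize),
    /// One item was parsed from the first `consumed` bytes of the input.
    Parsed { consumed: usize, item: Item<'a> },
}

struct ToyParser {
    seen_header: bool,
}

impl ToyParser {
    fn new() -> Self {
        ToyParser { seen_header: false }
    }

    /// Try to parse exactly one item from the front of `data`.
    fn parse<'a>(&mut self, data: &'a [u8]) -> Result<Step<'a>, String> {
        if !self.seen_header {
            // The header is 4 magic bytes plus a little-endian u32 version.
            if data.len() < 8 {
                return Ok(Step::NeedMoreData(8 - data.len()));
            }
            if &data[..4] != b"\0asm" {
                return Err("bad magic number".to_string());
            }
            let version = u32::from_le_bytes([data[4], data[5], data[6], data[7]]);
            self.seen_header = true;
            return Ok(Step::Parsed { consumed: 8, item: Item::Header { version } });
        }
        // A section is a one-byte id, a LEB128-encoded u32 size, then the body.
        if data.is_empty() {
            return Ok(Step::NeedMoreData(1));
        }
        let id = data[0];
        let (mut size, mut shift, mut pos) = (0u32, 0u32, 1usize);
        loop {
            let byte = match data.get(pos) {
                Some(&b) => b,
                None => return Ok(Step::NeedMoreData(1)),
            };
            pos += 1;
            if shift == 28 && byte > 0x0f {
                return Err("section size does not fit in u32".to_string());
            }
            size |= u32::from(byte & 0x7f) << shift;
            if byte & 0x80 == 0 {
                break;
            }
            shift += 7;
        }
        let end = pos
            .checked_add(size as usize)
            .ok_or_else(|| "section too large".to_string())?;
        // Sections are parsed as a whole: wait until the full body is resident.
        if data.len() < end {
            return Ok(Step::NeedMoreData(end - data.len()));
        }
        Ok(Step::Parsed { consumed: end, item: Item::Section { id, body: &data[pos..end] } })
    }
}

/// Simulate bytes arriving `chunk_size` at a time (must be non-zero) and
/// return the header version plus each section's id and body.
fn parse_streaming(input: &[u8], chunk_size: usize) -> Result<(u32, Vec<(u8, Vec<u8>)>), String> {
    let mut parser = ToyParser::new();
    let mut buf: Vec<u8> = Vec::new();
    let mut version = None;
    let mut sections = Vec::new();
    for chunk in input.chunks(chunk_size) {
        buf.extend_from_slice(chunk);
        loop {
            let consumed = match parser.parse(&buf)? {
                Step::NeedMoreData(_) => break, // wait for the next chunk
                Step::Parsed { consumed, item } => {
                    match item {
                        Item::Header { version: v } => version = Some(v),
                        Item::Section { id, body } => sections.push((id, body.to_vec())),
                    }
                    consumed
                }
            };
            buf.drain(..consumed); // drop the bytes the parser has accepted
        }
    }
    match version {
        Some(v) if buf.is_empty() => Ok((v, sections)),
        _ => Err("unexpected end of input".to_string()),
    }
}
```

In this sketch the caller owns the buffer and decides how to react to `NeedMoreData`, so the same parser works whether the bytes arrive one at a time or all at once. A validator in the style described above would sit next to the loop in `parse_streaming` and be handed each item before the application-specific code runs.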