
Relax the block check in OperatorsReader<'_> #2202


Closed

Conversation

alexcrichton
Member

This commit relaxes a check added in #2134 which maintains a stack of frame kinds in the operators reader, in addition to the validator. The goal of #2134 was to ensure that modules which are syntactically invalid per the spec are caught in the parser without the need for the validator, but investigation in #2180 has shown that this is a source of at least some of a performance regression. The change here is to relax the check so that spec tests still pass while making this infrastructure cheaper.

The reader now maintains just a small `depth: u32` counter instead of a stack of kinds. This means that the reader can still catch invalid modules such as instructions-after-`end`, but the validator is required to handle situations such as `else` outside of an `if` block.

This required some adjustments to tests as well as some workarounds for the upstream spec tests that assert legacy exception-handling instructions are malformed, not invalid.
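
For context, here is a minimal sketch of what depth-only bookkeeping looks like; the names and error handling are illustrative only and are not wasmparser's actual internals:

```rust
// Minimal sketch of tracking only block nesting depth; illustrative, not
// wasmparser's actual implementation.
struct BlockDepth {
    depth: u32,
}

impl BlockDepth {
    fn new() -> Self {
        // A function body implicitly opens one frame that its final `end` closes.
        BlockDepth { depth: 1 }
    }

    /// Rejects only what the reader can still catch on its own, e.g. any
    /// instruction appearing after the function's final `end`.
    fn visit(&mut self, opcode: &str) -> Result<(), String> {
        if self.depth == 0 {
            return Err("operators remaining after final `end`".to_string());
        }
        match opcode {
            "block" | "loop" | "if" | "try_table" => self.depth += 1,
            "end" => self.depth -= 1,
            // `else` is deliberately not tracked here: whether it actually sits
            // inside an `if` frame is left to the validator.
            _ => {}
        }
        Ok(())
    }
}
```

With this shape, trailing garbage after the final `end` is still a parse error, while a misplaced `else` is only caught during validation.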

@alexcrichton
Member Author

cc @keithw, I'm curious about your thoughts on this. I'm not wedded to this approach and would be happy to close without merging, but if you don't have an objection, I think reducing the cost of parsing would be good.

cc @Robbepop in relation to #2180

@Robbepop
Collaborator

@alexcrichton thank you for trying to come up with a solution for this! :)

@keithw
Contributor

keithw commented May 22, 2025

Hmm... Honestly, I've really tried to replicate a slowdown in parsing individual large modules, and to date my measurements (#2180 (comment)) were that parsing alone doesn't seem to have gotten slower, when we're talking about parsing a single huge module, e.g. https://github.com/turbolent/w2c2/blob/main/examples/clang/clang.wasm, vs. something like the testsuite which is about parsing thousands of tiny modules.

So my preference would be to find and be able to replicate the problem first because I think that will help inform how to fix it, and e.g. where it's worth complicating the story and spec adherence. My view is that the spec defines a notion of binary well-formedness in chapter 5, and validity in chapter 3, and it would be nice if the parser enforced chapter 5 by itself, and the validator enforced chapter 3, rather than just saying that the parser+validator is going to jointly enforce the requirements of chapters 5+3. This is particularly relevant to me because we're building a development environment that enforces the syntactic well-formedness on entry (and shows validation errors but doesn't prevent them), so the distinction matters to us for that reason.

All things being equal, chapter 5 does seem pretty clear that the binary module corresponding to `(func else)` is malformed, and it would be nice if wasmparser could continue to detect that by failing the parse...

@keithw
Contributor

keithw commented May 22, 2025

To be a little more concrete, as far as I can tell so far, the slowdowns seem to be in "parsing and validating thousands of tiny modules" but not in "parsing one gigantic module." But I'm not sure if we're all on the same page about this? That's certainly what I was trying to figure out in #2180. :-)

If we are on the same page about where the slowdown is, then I guess my thinking would be that it's worth it to reduce the startup overhead of the Parser (maybe by being able to share its heap allocations across runs like the Validator can do?), but I'd be less eager to use startup overhead as a reason to reclassify a module like `(func else)` as well-formed if chapter 5 says otherwise.

@Robbepop
Collaborator

Hmm... Honestly, I've really tried to replicate a slowdown in parsing individual large modules

@keithw did you also use `cargo bench -p wasmparser` instead of the wasm-tools CLI tool, as @alexcrichton suggested earlier?

I ran the benchmarks on an Apple M2 Pro; you ran them on an AMD Ryzen 7 PRO 4750U. What did you use, @alexcrichton? Both Alex and I could reproduce the regressions via `cargo bench -p wasmparser`.

[..] parsing alone doesn't seem to have gotten slower, when we're talking about parsing a single huge module [..]
[..] To be a little more concrete, as far as I can tell so far, the slowdowns seem to be in "parsing and validating thousands of tiny modules" but not in "parsing one gigantic module." [..]

I also wondered about this. It has been a hypothesis so far, since I could not prove that it really behaves like this. If true, then indeed we should see nice speed-ups by introducing `ParserAllocations` similar to `ValidatorAllocations`. Though, the parser's state looks fairly small in contrast to the validator's state.
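
A rough sketch of the allocation-reuse pattern being alluded to; `ParserAllocations`, `ReusableParser`, and the buffer fields here are hypothetical placeholders, not wasmparser's actual API (wasmparser only exposes this pattern for the validator today):

```rust
// Hypothetical sketch of reusing heap allocations across parses.
#[derive(Default)]
pub struct ParserAllocations {
    // Scratch buffers that would otherwise be reallocated for every module.
    frame_kinds: Vec<u8>,
    section_offsets: Vec<usize>,
}

pub struct ReusableParser {
    allocs: ParserAllocations,
}

impl ReusableParser {
    pub fn from_allocations(allocs: ParserAllocations) -> Self {
        Self { allocs }
    }

    pub fn parse(&mut self, wasm: &[u8]) -> Result<(), String> {
        // Clearing keeps capacity, so parsing thousands of tiny modules no
        // longer pays for fresh heap allocations each time.
        self.allocs.frame_kinds.clear();
        self.allocs.section_offsets.clear();
        // ... actual parsing would go here ...
        let _ = wasm;
        Ok(())
    }

    // Hand the buffers back so a caller can thread them into the next parser.
    pub fn into_allocations(self) -> ParserAllocations {
        self.allocs
    }
}
```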

My view is that the spec defines a notion of binary well-formedness in chapter 5, and validity in chapter 3, and it would be nice if the parser enforced chapter 5 by itself, and the validator enforced chapter 3 [..]

I fully agree with you. It would certainly be better and we should ideally find a set of changes that allow for this while keeping good performance.

If we are on the same page about where the slowdown is, then I guess my thinking would be that it's worth it to reduce the startup overhead of the Parser [..]

@alexcrichton has shown in a prior post which parts of the parser cause which slowdowns: #2180 (comment)

I am not saying that the mentioned fixes should be implemented, but they provide a clue as to what is actually causing the regressions, and at least from that comment it doesn't look like the regressions stem entirely from the startup of the parser.

@alexcrichton
Member Author

FWIW I do agree with @keithw that I'm not really certain what the regression/benchmark being optimized for here is. I don't dispute that criterion is printing regressions in wasmparser/wasmi, but the numbers currently don't make much sense. The main number I personally care about is the time it takes to parse + validate a module. In wasmparser, using a variety of modules, the regression is in the 5% range, while wasmi shows it in the 30% range. I don't think any of us know at this time where such a difference comes from.

I'm in the same boat as @keithw: I've been unable to show anything more than a 5% regression (or within that ballpark at least) when simply validating a module. I would also agree that the benchmark of just iterating over all tests in this repository isn't necessarily representative, so while it can be useful to track, I don't think it's worthwhile to live and die by that number.


@keithw after thinking about this for a while I'm now at the point where I think I have a crisper articulation of my thinking. Right now we've effectively got two competing goals:

  1. The wasm specification has strict definitions of what is a syntactically valid module and what is a semantically valid module.
  2. Wasmparser, for speed/maintenance reasons, does not perfectly match the specification about whether an invalid module is syntactically invalid or semantically invalid.

My understanding, though, is that this distinction of whether an invalid module is syntactically invalid or semantically invalid is basically an artifact of the way the spec is authored and/or the spec interpreter. I'm also under the impression that web engines (and maybe no other engine either?) do not worry about distinguishing between these two kinds of modules. To me this creates a tension because it means there's not much recourse for applying pressure on the specification itself to handle a concern such as this.

This puts wasmparser in a bit of an unfortunate spot where ideally I'd like it to follow the spec exactly, but doing so is pretty unmaintainable from a 100% compliance perspective (insofar as exactly matching whether invalid modules are semantically invalid or syntactically invalid). We already have some affordances for where wasmparser differs from the spec, and this PR could arguably be classified as such a change too.

Ok that's all a bunch of background which leads me to my articulation: I'm not sure where to draw the line of where wasmparser can/should be different from the spec. Is the current state of "just a little different" ok? Is this PR's state of "ok a bit more significantly different" ok? Personally I don't really feel equipped to answer these questions myself as it feels like we're stuck between a rock and a hard place with the way the spec is developed and tested.

Do you think that it would be worthwhile to bring these concerns to the CG perhaps? For example, here are two possible (and radical) ways the development of the spec could hypothetically shift:

  • Remove the distinction between `assert_{malformed,invalid}` and instead have one single predicate of "a module is valid or not". Precisely how it's valid/invalid is irrelevant for the tests, engines, etc. This would amount to dropping the `assert_malformed` assertion and possibly updating the wording of the specification in various places (maybe? unsure.)
  • Massage the specification to take into account performance/maintainability concerns on behalf of engines in exactly how the AST of a module is constructed. For example, I'd like block-like instructions to look more like normal instructions here instead of having stricter syntactic rules. I'm not sure if this is even possible though.

In any case, those are at least some more thoughts (sorry for so many words on this somewhat niche topic). I'll clarify again that I personally have no predetermined outcome I'd like to reach at this time, so if everyone's willing, this is definitely a topic I'd like to dig into and explore rather than rush to any particular solution. Basically, I'm not trying to land this PR ASAP or anything like that.

@Robbepop
Collaborator

Robbepop commented Jun 23, 2025

In wasmparser, using a variety of modules, the regression is in the 5% range, while wasmi shows it in the 30% range. I don't think any of us know at this time where such a difference comes from.

Indeed, that is very confusing to me as well. I tried to keep the refactorings for the wasmparser update in Wasmi minimal, but maybe some things went sideways; or (as seen before) some old invariants are no longer upheld, and code that once worked efficiently no longer does.

One area where Wasmi uses wasmparser very differently from wasmparser's own benchmarks is its lazy translation, and that is exactly where we see the most regressions in Wasmi.

I will re-benchmark Wasmi and take a deeper dive to see what is going on as soon as time and capacity permit. I will inform you about my findings (if any) at #2180.

@keithw
Contributor

keithw commented Jun 23, 2025

Hi @alexcrichton,

I don't want to derail the practical questions of dealing with the performance issues in the code, but I feel like I owe you a response on these more policy-oriented questions -- I'm sorry I missed this the first time.

One high-level response would be to ask if you think it would be a good idea to start a #wasm-tools and #git-wasm-tools topic on the Bytecode Alliance Zulip. That might be a useful place to have longer-horizon discussions vs. GitHub PR/issue threads, especially among stakeholders from different projects.

Ok that's all a bunch of background which leads me to my articulation: I'm not sure where to draw the line of where wasmparser can/should be different from the spec. Is the current state of "just a little different" ok? Is this PR's state of "ok a bit more significantly different" ok? Personally I don't really feel equipped to answer these questions myself as it feels like we're stuck between a rock and a hard place with the way the spec is developed and tested.

I think if we're looking for a semi-principled position, let me shop a proposal for an intermediate place to "draw the line," which happens to match exactly where wasm-tools is right now.

Namely: I think it's fine if wasmparser (without validation) over-accepts some malformed binary modules, as long as these modules still have a plausible representation in the abstract syntax defined by Chapter 2. That means these binaries can be translated to well-formed text and back to binary and back to the same text as before. In other words, the exceptions are only relevant to the binary format.

So the current affordances (e.g. missing DataCount section, malformed mutability flags, integer too large) are all fine under that rubric, because those modules can still be represented in the abstract syntax, and they don't cause wasmprinter to output malformed text.
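
A sketch of the roundtrip check this rubric implies, assuming the `wasmprinter::print_bytes` and `wat::parse_str` helpers from this repository's crates plus `anyhow` for error plumbing; the function itself is illustrative, not an existing test:

```rust
// Illustrative only: checks that a binary the parser accepts can be printed to
// text, re-assembled, and printed again to the identical text.
fn roundtrips_through_text(wasm: &[u8]) -> anyhow::Result<bool> {
    let text = wasmprinter::print_bytes(wasm)?;            // binary -> text
    let rebuilt = wat::parse_str(&text)?;                  // text -> binary
    let text_again = wasmprinter::print_bytes(&rebuilt)?;  // binary -> text
    Ok(text == text_again)
}
```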

By contrast, if wasmparser allowed `(func else)`, that would be on the other side of this particular line. That module can't be represented in the abstract syntax of chapter 2 (and it's malformed even in the text format). So I think that's why I was personally fine with the existing affordances but reluctant about relaxing this kind of check -- to me this is where I would draw the line.

(For context, my group is writing a structured editor to teach freshman CS where the editor enforces that the input buffer is always syntactically well-formed WAT, so when I picture myself in front of a classroom of 300 angry freshmen trying to defend these distinctions to people who have never programmed before, I'm hoping for a certain moral clarity. I realize this isn't a typical use-case... I think you're totally right that a browser doesn't have to worry about these distinctions, but the browser engines aren't trying to be general Wasm toolkits; they don't have text-mode parsers and I don't think they worry about roundtripping, or about performing manipulations on well-formed Wasm modules that might take them through an invalid state.)

Do you think that it would be worthwhile to bring these concerns to the CG perhaps? For example, here are two possible (and radical) ways the development of the spec could hypothetically shift:

  • Remove the distinction between `assert_{malformed,invalid}` and instead have one single predicate of "a module is valid or not". Precisely how it's valid/invalid is irrelevant for the tests, engines, etc. This would amount to dropping the `assert_malformed` assertion and possibly updating the wording of the specification in various places (maybe? unsure.)
  • Massage the specification to take into account performance/maintainability concerns on behalf of engines in exactly how the AST of a module is constructed. For example, I'd like block-like instructions to look more like normal instructions here instead of having stricter syntactic rules. I'm not sure if this is even possible though.

I think a concern these would have to overcome is that most of the spec (including the validation and execution rules) is specified in terms of the abstract syntax of chapter 2. So if we wanted to allow stuff that isn't currently representable in the abstract syntax, that syntax would have to be widened to allow it, and then the validation rules would have to be adapted to account for it.

E.g. looking at 2.4.8 (https://webassembly.github.io/spec/core/syntax/instructions.html#control-instructions), the abstract syntax for `instr` currently includes `if blocktype instr* else instr* end`. The "consequent" and "alternative" instruction sequences are components of the production, and the validation and execution rules later just refer to them as intrinsically present. Changing this to make `else` a plain instruction that can syntactically appear anywhere would, I think, be a substantial change.
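
To make the contrast concrete, here is a hypothetical, heavily simplified sketch (not wasmparser's or the spec interpreter's actual types) of the two shapes being compared: a structured AST where the alternative is a component of the `If` production, versus a flat stream where `Else` is just another opcode:

```rust
// Chapter-2 style: the consequent and alternative are components of the `If`
// production itself, so "`else` outside of an `if`" is simply unrepresentable.
enum StructuredInstr {
    Nop,
    If {
        consequent: Vec<StructuredInstr>,
        alternative: Vec<StructuredInstr>,
    },
}

// Flat style: `Else` and `End` are ordinary opcodes in a linear stream, and it
// becomes the validator's job to reject an `Else` that doesn't follow an `if`.
enum FlatInstr {
    Nop,
    If,
    Else,
    End,
}
```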

@alexcrichton
Member Author

One high-level response would be to ask if you think it would be a good idea to start a #wasm-tools and #git-wasm-tools topic on the Bytecode Alliance Zulip.

I think this would be an excellent idea (working on making that web-public).

let me shop a proposal for an intermediate place to "draw the line," which happens to match exactly where wasm-tools is right now.

Ok, that all sounds pretty reasonable to me. I agree with you that changing `else` or control flow would be a pretty big spec change and is likely not on the table at this time. I'm still slightly uncomfortable about the data count bits, but I can see it going either way myself really.


Overall I'm personally warming up to #2228, so my current plan is to close this when that PR merges, but I'll leave this open in the meantime.

(thanks again for all the discussion on this!)

@alexcrichton
Member Author

Closing in favor of #2228

alexcrichton deleted the just-check-depth branch on June 23, 2025 at 19:53.