wasmparser: improve visit_operator performance #2228

Merged
merged 8 commits into bytecodealliance:main from shared-frame-stack on Jun 23, 2025

Conversation

keithw
Contributor

@keithw keithw commented Jun 9, 2025

This PR addresses some of the performance issues discussed in #2180 and #2202.

  • When validating, it eliminates the need for the reader to have its own stack of frames or a depth count. The BinaryReader has a visit_operator function again, where the visitor implements the new FrameStack trait so that visit_operator can get the current frame's kind. Validation uses this to avoid duplicating the FrameStack logic between the reader and validator.

  • Consumers who don't want to implement the FrameStack trait can use an OperatorsReader, which keeps an internal FrameStack; it can now be created from its allocations (and turned back into them), avoiding the need to allocate a fresh stack of frames each time an OperatorsReader is constructed.

For validating lots of tiny modules and one big module, I'm seeing these absolute timings on an Intel Core i7-9700T @ 4.3 GHz, running this command: `cargo bench -p wasmparser --bench benchmark -- --baseline pre-change 'validate/(tests|spidermonkey)'`. Percentages are relative to the first row.

| commit | validate/tests | validate/spidermonkey |
|--------|----------------|-----------------------|
| 90c156f (just before #2134) | 5.1770 ms | 39.707 ms |
| 0354dde (PR #2134) | 5.2136 ms (+0.7%) | 39.929 ms (+0.6%) |
| 351f153 (v1.229.0, #2149) | 5.1867 ms (+0.2%) | 39.474 ms (-0.6%) |
| 2dbe8b9 (v1.230.0, #2171) | 5.2155 ms (+0.7%) | 40.088 ms (+1.0%) |
| 315252f (v1.231.0, #2206) | 5.2113 ms (+0.7%) | 41.558 ms (+4.7%) |
| 93610b8 (v1.232.0, #2209) | 5.4440 ms (+5.2%) | 42.099 ms (+6.0%) |
| 14096f0 (v1.233.0, #2218) | 5.8129 ms (+12.3%) | 41.191 ms (+3.7%) |
| a05eaa5 (current main branch) | 5.7693 ms (+11.4%) | 41.529 ms (+4.6%) |
| e3c880c (this PR) | 5.7307 ms (+10.7%) | 40.691 ms (+2.4%) |

These measurements seem to suggest that #2134 (and the broader issue of the BinaryReader or OperatorsReader keeping a stack of frames) may not be the major culprit in the slowdown discussed in #2180, but if this is a point where we want to improve things (e.g. in #2202), this PR is about as minimal as I've been able to get it. It occurs to me that the validate/tests benchmark might not be a reliable indicator of regressions if the testsuite submodule is updated in between (e.g. to add more tests)...?

@keithw keithw requested a review from a team as a code owner June 9, 2025 07:09
@keithw keithw requested review from pchickey and removed request for a team June 9, 2025 07:09
@pchickey pchickey requested review from alexcrichton and removed request for pchickey June 9, 2025 16:25
@pchickey
Contributor

pchickey commented Jun 9, 2025

Alex has the context here so I'm swapping reviewer to him.

@alexcrichton
Member

Wanted to drop in and say I haven't forgotten about this; sorry, it's been a very busy week! I hope to be able to get to this tomorrow or Monday at the latest.

@alexcrichton
Member

Ok I've now gotten a chance to sit down with this and review it more closely, thanks for the patience. Also cc @Robbepop in case you hadn't already seen this.

I'd agree though that tracking the validate/tests benchmark over time isn't too useful, since that is the whole test suite, which includes new tests written, new spec tests, etc., so the set of what's being tested can fluctuate. Perhaps it should just be removed entirely from the benchmark suite...

Otherwise I'm curious to dig a bit more into the spidermonkey numbers. My overall stance is that I'm not a fan of making things less ergonomic in the name of performance, but if it needs to be done it needs to be done. So if this is a clear and unambiguous win then I think something like it should land, but I'm effectively double-checking that. I'm also wary of the absolute numbers here: Criterion is a wall-time-based benchmarking system, which is notoriously unreliable when it comes to small timings, and 40ms is IMO a pretty small amount of time (making it difficult for Criterion to measure changes).

For example, the numbers you collected show that the change in this PR is a 2.2% improvement over the current main branch. However, the numbers also show that the 1.233.0 release was a 2.3% improvement over 1.232.0, even though practically nothing changed in that release related to core wasm parsing/validation (some component bits, but those are unrelated). This aligns with what I'd roughly expect: in the millisecond range, Criterion can show +/- 5% swings (ish) on the same code.

Interestingly, the release with the biggest regression you've measured, 3.7% going from v1.230.0 to v1.231.0, actually landed a few optimizations such as #2197. I'm not sure how best to explain that...

Basically I'm overall a bit skeptical of the numbers collected here, but at the same time I can also naively see how this would probably improve performance given that there's fewer data structures in play. I'd prefer to not have to land this in the sense that it regresses ergonomics and makes things a bit more difficult to use, but that's also not the end of the world and something that can be figured out regardless. I'm otherwise basically lamenting how the benchmarking infrastructure for wasmparser is not great and I at least personally feel I still don't have a great handle on what the regressions are...

@Robbepop
Collaborator

Robbepop commented Jun 16, 2025

@alexcrichton thanks for the cc!

This might not really have anything to do with performance, but while trying to use the new wasmparser version in Wasmi, I noticed that the newly introduced TypedSelectMulti operator is weird in two ways:

  1. It is the only Operator that is not actually a valid Wasm operator.
  2. It is the only Operator that has a non-Copy field. (This gave me some minor headache.)

Do we really want to have such a "weird" operator? To me this indicates that wasmparser's parser and validator have an unclear border or grey area in between them since recent changes.

I'm otherwise basically lamenting how the benchmarking infrastructure for wasmparser is not great and I at least personally feel I still don't have a great handle on what the regressions are...

This sounds as if we should not attempt to optimize anything until we have better data (or benchmarks).

I'd prefer to not have to land this in the sense that it regresses ergonomics and makes things a bit more difficult to use [..]

I generally agree. The wasmparser crate is quite complex already. It would be great if we could find a solution that rather simplifies wasmparser instead of adding more complexity. Then again, stripping complexity away is among the hardest work in programming.

@keithw
Contributor Author

keithw commented Jun 17, 2025

@alexcrichton Thanks, I appreciate these thoughtful comments. I'm not particularly attached to this PR; if there is a concern about a performance hit from the state introduced in #2134, this is my best shot to eliminate that state, without affecting conformance to chapter 5. If there turns out not to be a significant performance concern, I'm fine sticking with the status quo.

From looking at a perf report, one contributor to performance seems to be that if we move the visit_operator implementation back into binary_reader.rs (as this PR does), calls to things like read_var_u32 are more likely to get inlined than when visit_operator is implemented in crates/wasmparser/src/readers/core/operators.rs. This is maybe a <2% effect, but I think it is real. (These calls do get inlined if we mark read_var_u32 #[inline(always)].)

The rest might just be the play of chance with regard to any small change going through LLVM on a non-PGO build. But obviously you should run the numbers yourself and see what you think.

One idea that you might like would be adding some more "intensive" benchmark workloads that might be more statistically robust, e.g. a validation workload that alternates validating spidermonkey.wasm and bz2.wasm 100 times in a row. If you want this, happy to send a PR but you probably know what you want here.

@keithw
Contributor Author

keithw commented Jun 17, 2025

I noticed that the newly introduced TypedSelectMulti operator is weird in two ways:

1. It is the only `Operator` that is not actually a valid Wasm operator.

Multi-value select is similar to a multi-value expression before the multi-value feature. E.g. in Wasm 1.0, this module was well-formed, and roundtrippable between binary and text, but invalid:

```wat
(func (result i32 i32) unreachable)
```

The multi-value feature made it valid.

Similarly, this multi-value select is well-formed and roundtrippable between binary and text, but currently invalid: https://github.com/WebAssembly/spec/blob/wasm-3.0/test/core/select.wast#L372-L378 . A future feature extension could also make it valid. wasmparser has a representation for it so it can parse testcases like that one and so wasmprinter can print it back to text.

To me this indicates that wasmparser's parser and validator have an unclear border or grey area in between them since recent changes.

I think the border is basically what's in the spec -- the readers/Parser parse the syntax defined in chapter 5, and the validator performs the "Validation Algorithm" described in the appendix (enforcing the rules of chapter 3).

@keithw keithw force-pushed the shared-frame-stack branch from 94d7dc0 to 7f05e07 on June 17, 2025 at 23:59
- The BinaryReader can visit_operator by itself again,
  with no need for OperatorsReader or extra state, when
  the visitor implements the FrameStack trait. Validation
  uses this to avoid duplicating the frame stack.

- OperatorsReader keeps an internal FrameStack and can be created
  from its allocations (and turned back into them), avoiding
  the need to allocate a stack of frames.
@alexcrichton
Member

Do we really want to have such a "weird" operator? To me this indicates that wasmparser's parser and validator have an unclear border or grey area in between them since recent changes.

Personally I agree with @keithw here in that there's no gray area spec-wise. The spec is quite clear that this operator is syntactically valid; it's just semantically always invalid. The parser answers "is this the spec-defined AST?" (more or less) and the validator answers "is this AST valid or not?".

Now, to counter myself, I do still think there's gray area, but I think it's different than what others are thinking. I wrote up my thoughts on it here, with a tl;dr of: I think there's gray area right now in precisely how implementations are expected to follow the spec. My impression is that implementations like browsers only care about "is this module invalid" vs. "is this malformed or validation-invalid". That leaves a vacuum (in my opinion) around shaping the spec to prioritize implementations which do want to distinguish between malformed and invalid.

I'm otherwise basically lamenting how the benchmarking infrastructure for wasmparser is not great and I at least personally feel I still don't have a great handle on what the regressions are...

This sounds as if we should not attempt to optimize anything until we have better data (or benchmarks).

While I do think better benchmarks would be useful I don't want to be so drastic as to say nothing can happen until they're improved. @keithw's idea might be good to have where a module is just validated N times to make the benchmark less noisy (in theory).

To me, though, the biggest discrepancy is that you're measuring 30%+ regressions in wasmi, but neither Keith nor I have been able to reproduce this, and I at least haven't been able to understand it for sure. What would be most helpful, ideally, is if you're able to dig in and understand where this 30% is going. It's easy enough to run a command and see "30%" pop out, but understanding the structure of what led to that conclusion is crucial as well.


In any case I'm going to give a more close review to this now as this seems like an "obvious win" perf-wise (why have two stacks when you can have one?) and the numbers show a clear improvement no matter what.

Member

@alexcrichton alexcrichton left a comment


Ok this all looks reasonable at a high level to me code-wise. Thank you again @keithw for putting work into this!

My main thinking on this is that I feel it puts the *Allocations a bit too front-and-center for the relatively niche case of optimizing this. I'd prefer to instead make it possible to reuse allocations without necessarily making that the default path. For example, I think get_operators_reader() is a reasonable signature to keep in a few locations, but additionally having a get_binary_reader_for_operators (or similar), with a constructor on OperatorsReader that takes a BinaryReader plus allocations, would be sufficient. I think that'd reduce the impact in-repo and additionally reduce the impact for external consumers too.

Is that something you'd like me to push up changes here for though? I'm happy to make the edits myself, but I'm also happy to review if you'd prefer to make the changes yourself too.

The rest might just be the play of chance with regard to any small change going through LLVM on a non-PGO build.

I'll say this is something I've been worried about as well. I find giant loops/matches (as is the case with validation/reading here) to be pretty finicky, and unrelated changes can have outsized effects when they basically shouldn't (e.g. code motion). I'm not sure how best to thread that needle, but moving everything back to binary_reader.rs seems fine either way.

@keithw
Contributor Author

keithw commented Jun 23, 2025

Thank you! It's fine with me if you want to push changes directly, and I agree it sounds good to keep the get_operators_reader() path available for most consumers.

@alexcrichton
Member

Ok I've pushed up some changes (sorry for the noise on a few of them...)

I'm going to double-check this can be integrated well into Wasmtime and then I plan to merge.

@alexcrichton
Member

I'm going to double-check this can be integrated well into Wasmtime and then I plan to merge.

Ok, well, that was easy: with 0 changes it passes all tests, yay!

@alexcrichton alexcrichton added this pull request to the merge queue Jun 23, 2025
Merged via the queue into bytecodealliance:main with commit 40fe970 Jun 23, 2025
32 checks passed
@keithw keithw deleted the shared-frame-stack branch June 23, 2025 21:19