Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[wasm] Implement partial backward branch support in the Jiterpreter #82756

Merged
merged 6 commits into from Feb 28, 2023

Conversation

kg
Copy link
Contributor

@kg kg commented Feb 28, 2023

This PR adds partial support for backward branches to the jiterpreter.

Due to WASM's quirky approach to constrained control flow and loops, the code it generates is very suboptimal, but it still produces a measurable speed-up (~22sec/iter -> ~20sec/iter for one of the benchmarks that regressed, for example) and is a decent starting point to improve on.

This PR also removes a write barrier from ldelem_ref that didn't need to be there and was very expensive. That takes the regressed benchmark down further from ~20sec/iter to ~3sec/iter.

Additional statistics and a runtime option are added to go with the new backward branch support.

More detail on how it works:

  1. For any methods that contain a trace entry point, the interpreter maintains a table listing all the bblocks that are targeted by a backwards branch.
  2. If a method has one or more backward branch targets in its table, when the jiterpreter generates traces in that method it wraps them in a wasm loop instruction so that we can transfer control back to the top.
  3. The jiterpreter scans the backward branch table when emitting instructions, and when it hits something it knows is a backwards branch target, it ensures that a branch target block starts there. Each encountered backwards branch target is added to a list.
  4. Any time the jiterpreter encounters a backwards branch, if the target is in the list of branch targets we've already encountered, eip is updated and control is sent back to the top of the trace by branching to the loop. After that, execution will skip over blocks until it reaches the branch target.

Attentive readers will note that this algorithm is inefficient because we have to scan over blocks until we find the branch target - there's only one loop for the entire trace. I tried creating a separate loop for each backwards branch target (which would allow us to jump directly to it) but its interaction with forward branches (which require the ability to jump forward to the end of a control region) made it too hard to get that working, at least initially.

A better implementation of control flow would probably allow direct branching for both forwards and backwards control flow, or at least reduce the cost of the branches from what it is currently. But that implementation would likely require a second pass in the trace compiler and building some sort of CFG on the fly so I haven't started on it yet.

… looping

Remove unnecessary slow write barrier from ldelem_ref
Update heuristic for backward branches
Add back branch success rate statistic
@ghost
Copy link

ghost commented Feb 28, 2023

Tagging subscribers to 'arch-wasm': @lewing
See info in area-owners.md if you want to be subscribed.

Issue Details

This PR adds partial support for backward branches to the jiterpreter.

Due to WASM's quirky approach to constrained control flow and loops, the code it generates is very suboptimal, but it still produces a measurable speed-up (~22sec/iter -> ~20sec/iter for one of the benchmarks that regressed, for example) and is a decent starting point to improve on.

This PR also removes a write barrier from ldelem_ref that didn't need to be there and was very expensive. That takes the regressed benchmark down further from ~20sec/iter to ~3sec/iter.

Additional statistics and a runtime option are added to go with the new backward branch support.

More detail on how it works:

  1. For any methods that contain a trace entry point, the interpreter maintains a table listing all the bblocks that are targeted by a backwards branch.
  2. If a method has one or more backward branch targets in its table, when the jiterpreter generates traces in that method it wraps them in a wasm loop instruction so that we can transfer control back to the top.
  3. The jiterpreter scans the backward branch table when emitting instructions, and when it hits something it knows is a backwards branch target, it ensures that a branch target block starts there. Each encountered backwards branch target is added to a list.
  4. Any time the jiterpreter encounters a backwards branch, if the target is in the list of branch targets we've already encountered, eip is updated and control is sent back to the top of the trace by branching to the loop. After that, execution will skip over blocks until it reaches the branch target.

Attentive readers will note that this algorithm is inefficient because we have to scan over blocks until we find the branch target - there's only one loop for the entire trace. I tried creating a separate loop for each backwards branch target (which would allow us to jump directly to it) but its interaction with forward branches (which require the ability to jump forward to the end of a control region) made it too hard to get that working, at least initially.

A better implementation of control flow would probably allow direct branching for both forwards and backwards control flow, or at least reduce the cost of the branches from what it is currently. But that implementation would likely require a second pass in the trace compiler and building some sort of CFG on the fly so I haven't started on it yet.

Author: kg
Assignees: -
Labels:

arch-wasm, area-Codegen-Jiterpreter-mono

Milestone: -

Don't generate a loop for trace if all the back branch offsets are before its start offset
@kg kg merged commit ab5e28c into dotnet:main Feb 28, 2023
@dotnet dotnet locked as resolved and limited conversation to collaborators Mar 31, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
arch-wasm WebAssembly architecture
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants