[wasm] Implement partial backward branch support in the Jiterpreter #82756

kg · 2023-02-28T04:57:29Z

This PR adds partial support for backward branches to the jiterpreter.

Due to WASM's quirky approach to constrained control flow and loops, the code this generates is far from optimal, but it still produces a measurable speed-up (~22 sec/iter -> ~20 sec/iter for one of the benchmarks that regressed, for example) and is a decent starting point to improve on.

This PR also removes a write barrier from ldelem_ref that didn't need to be there and was very expensive. That takes the regressed benchmark down further from ~20sec/iter to ~3sec/iter.

Additional statistics and a runtime option are added to go with the new backward branch support.

More detail on how it works:

For any methods that contain a trace entry point, the interpreter maintains a table listing all the bblocks that are targeted by a backwards branch.
If a method has one or more backward branch targets in its table, when the jiterpreter generates traces in that method it wraps them in a wasm loop instruction so that we can transfer control back to the top.
The jiterpreter scans the backward branch table when emitting instructions, and when it hits something it knows is a backwards branch target, it ensures that a branch target block starts there. Each encountered backwards branch target is added to a list.
Any time the jiterpreter encounters a backwards branch, if the target is in the list of branch targets we've already encountered, eip is updated and control is sent back to the top of the trace by branching to the loop. After that, execution will skip over blocks until it reaches the branch target.
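
As a rough illustration of the steps above, here is a minimal C model of a single-loop trace. It is a sketch under stated assumptions, not the jiterpreter's actual code: the `Block` and `TraceStats` types, the `run_trace` function, and the `taken_budget` field are all hypothetical names invented for this example, and the real implementation emits wasm rather than interpreting a table. The model also assumes every backward branch target is a block start inside the trace.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one block in a trace. */
typedef struct {
    int start_offset;       /* offset of the block's first instruction */
    int back_branch_target; /* offset to branch back to, or -1 for none */
    int taken_budget;       /* how many more times the back branch is taken */
} Block;

typedef struct {
    int blocks_executed; /* blocks whose body actually ran */
    int blocks_skipped;  /* blocks stepped over while seeking the target */
} TraceStats;

/* Runs the trace with one outer loop standing in for the wasm `loop`.
 * A backward branch updates eip and restarts from the top of the trace;
 * blocks before eip are then skipped until the target is reached. */
TraceStats run_trace(Block *blocks, size_t count) {
    assert(blocks != NULL && count > 0);
    TraceStats stats = {0, 0};
    int eip = blocks[0].start_offset;
    int branch_to_top = 1;

    while (branch_to_top) {
        branch_to_top = 0;
        for (size_t i = 0; i < count; i++) {
            if (blocks[i].start_offset < eip) {
                stats.blocks_skipped++; /* not at the target yet */
                continue;
            }
            stats.blocks_executed++;
            if (blocks[i].back_branch_target >= 0 && blocks[i].taken_budget > 0) {
                blocks[i].taken_budget--;
                eip = blocks[i].back_branch_target; /* update eip */
                branch_to_top = 1;                  /* branch to the loop */
                break;
            }
        }
    }
    return stats;
}
```

In this model, a trace of four blocks whose last block branches back to the third block twice executes eight block bodies and skips four, which is the scanning overhead that a single shared loop introduces.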

Attentive readers will note that this algorithm is inefficient because we have to scan over blocks until we find the branch target - there's only one loop for the entire trace. I tried creating a separate loop for each backwards branch target (which would allow us to jump directly to it) but its interaction with forward branches (which require the ability to jump forward to the end of a control region) made it too hard to get that working, at least initially.

A better implementation of control flow would probably allow direct branching for both forwards and backwards control flow, or at least reduce the cost of the branches from what it is currently. But that implementation would likely require a second pass in the trace compiler and building some sort of CFG on the fly so I haven't started on it yet.

ghost · 2023-02-28T04:57:46Z

Tagging subscribers to 'arch-wasm': @lewing
See info in area-owners.md if you want to be subscribed.

Author:	kg
Assignees:	-
Labels:	`arch-wasm`, `area-Codegen-Jiterpreter-mono`
Milestone:	-

kg added 5 commits February 27, 2023 14:51

Checkpoint backward branch support

559675e

Checkpoint

35798ed

Checkpoint

0776893

Checkpoint

d5a06a8

Turn early trace abort off when back branches are on since it impairs looping
Remove unnecessary slow write barrier from ldelem_ref
Update heuristic for backward branches
Add back branch success rate statistic

ebbcf59

kg added the arch-wasm and area-Codegen-Jiterpreter-mono labels Feb 28, 2023

kg requested review from lewing, pavelsavara, vargaz, lambdageek, BrzVlad and kotlarmilos as code owners February 28, 2023 04:57

ghost assigned kg Feb 28, 2023

kg mentioned this pull request Feb 28, 2023

[jiterp] Remove write barrier from ldelem_ref #82757

Merged

vargaz approved these changes Feb 28, 2023

vargaz reviewed Feb 28, 2023

src/mono/mono/mini/interp/transform.c (outdated, resolved)

Improve back branch offset table allocation

c748658

Don't generate a loop for trace if all the back branch offsets are before its start offset
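
The check described by this commit title could look roughly like the following C sketch. The function name `trace_needs_loop` and its parameters are hypothetical, chosen only for illustration. The sketch assumes the back branch table is a flat array of offsets and that a target exactly at the trace's start offset still counts as reachable within the trace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if at least one backward branch target lies at or after the
 * trace's start offset, meaning the trace could branch back to it and so
 * needs an enclosing loop. If every target is before the start offset, no
 * target is inside the trace and the loop can be omitted. */
bool trace_needs_loop(const int *back_branch_offsets, size_t count,
                      int trace_start_offset) {
    assert(count == 0 || back_branch_offsets != NULL);
    for (size_t i = 0; i < count; i++) {
        if (back_branch_offsets[i] >= trace_start_offset)
            return true;
    }
    return false;
}
```

For example, with targets at offsets 4 and 12, a trace starting at offset 8 would still get a loop (offset 12 is inside it), while a trace starting at offset 16 would not.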

This was referenced Feb 28, 2023

Infra improvements for Helix #68176

Closed

Methodical_others test JIT/Methodical/Coverage/copy_prop_byref_to_native_int crashing #69832

Open

lewing approved these changes Feb 28, 2023

kg merged commit ab5e28c into dotnet:main Feb 28, 2023

radekdoulik mentioned this pull request Mar 8, 2023

[Perf] Linux/x64: 504 Improvements on 2/28/2023 10:21:25 PM dotnet/perf-autofiling-issues#13799

Open

dotnet locked as resolved and limited conversation to collaborators Mar 31, 2023
