Handle conflict-related liverange splits arising from stack constraints without falling back to spill bundle. #49
Merged
cfallin merged 2 commits into bytecodealliance:main from cfallin:better-stack-constraint-splits on May 17, 2022.
No observed compile-time or execution-time impact with Sightglass. (Though there was a slight compile-time regression before the …
Amanieu approved these changes on May 17, 2022.
cfallin added a commit to cfallin/wasmtime that referenced this pull request on May 17, 2022:
This pulls in bytecodealliance/regalloc2#49, which slightly improves codegen in some cases where a safepoint (for reference-typed values) occurs in the same liverange as a register-constrained use. For example, in bytecodealliance#3785, an extra move instruction appeared and a callee-save register was used (necessitating a more expensive prologue) because of suboptimal splitting heuristics, which this PR fixes. The updated RA2 heuristics appear to have no measured downsides in existing benchmarks and improve the manually-observed codegen issue.
cfallin added a commit to bytecodealliance/wasmtime that referenced this pull request on May 18, 2022:
* Upgrade to regalloc2 0.1.3.
  This pulls in bytecodealliance/regalloc2#49, which slightly improves codegen in some cases where a safepoint (for reference-typed values) occurs in the same liverange as a register-constrained use. For example, in #3785, an extra move instruction appeared and a callee-save register was used (necessitating a more expensive prologue) because of suboptimal splitting heuristics, which this PR fixes. The updated RA2 heuristics appear to have no measured downsides in existing benchmarks and improve the manually-observed codegen issue.
* Update filetests where regalloc2 improvement altered behavior with reftypes.
Currently, we unconditionally trim the ends of liveranges around every
split, including splits due to conflicts in a liverange/bundle's
requirements (e.g., a liverange with both a register and a stack use).
These trimmed ends, if they exist, go to the spill bundle, which may
receive a register during second-chance allocation and otherwise
receives a stack slot.
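To make the trimming concrete, here is a toy sketch. It assumes a liverange is a half-open interval `[from, to)` of plain `u32` instruction positions plus a list of use positions. `Range` and `split_with_trim` are hypothetical names for illustration only. They are not regalloc2's actual data structures or functions.

```rust
// Toy model only: hypothetical types, not regalloc2's real ones.
// A piece of a liverange is the half-open interval [from, to).
#[derive(Debug, Clone, PartialEq)]
struct Range {
    from: u32,
    to: u32,
}

/// Split the liverange [from, to) at `split_at`, trimming the ends that
/// touch the split point. Returns (first, connecting, second); the
/// connecting piece, if non-empty, is what would go to the spill bundle.
fn split_with_trim(from: u32, to: u32, uses: &[u32], split_at: u32) -> (Range, Option<Range>, Range) {
    // Last use before the split and first use at or after it.
    let last_before = uses
        .iter()
        .copied()
        .filter(|&u| u < split_at)
        .max()
        .expect("toy model assumes a use before the split");
    let first_after = uses
        .iter()
        .copied()
        .filter(|&u| u >= split_at)
        .min()
        .expect("toy model assumes a use after the split");

    // Each half keeps only what it needs to cover its own uses.
    let first = Range { from, to: last_before + 1 };
    let second = Range { from: first_after, to };
    // Whatever lies between the two trimmed halves is the connecting piece.
    let connecting = if last_before + 1 < first_after {
        Some(Range { from: last_before + 1, to: first_after })
    } else {
        None
    };
    (first, connecting, second)
}

fn main() {
    // Register use at 2, stack use at 18; the conflict forces a split at 10.
    let (first, connecting, second) = split_with_trim(0, 20, &[2, 18], 10);
    println!("{:?} {:?} {:?}", first, connecting, second);
}
```

In this example the pieces that take part in first-chance allocation shrink to `[0, 3)` and `[18, 20)`. The `[3, 18)` connecting piece is deferred to the spill bundle.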
This was previously measured to reduce contention significantly, because
it reduces the sizes of liveranges that participate in the first-chance
competition for allocations. When a split has to occur, we might as well
relegate the "connecting pieces" to a process that comes later, with a
hint to try to get the right register if possible but no hard connection
to either end.
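The next sketch gives a rough picture of why this reduces contention, using the same toy interval model. The names are hypothetical, and the overlap count is only a crude stand-in for real interference checking. It counts how many first-chance pieces overlap another liverange `B` that competes for the same register.

```rust
// Toy model only: hypothetical interval type, not regalloc2's.
#[derive(Debug, Clone, Copy)]
struct Range {
    from: u32,
    to: u32,
}

/// Two half-open intervals interfere if they overlap.
fn overlaps(a: Range, b: Range) -> bool {
    a.from < b.to && b.from < a.to
}

/// Number of `pieces` that overlap `other`: a crude proxy for how often
/// `other` must compete with them during first-chance allocation.
fn conflicts(pieces: &[Range], other: Range) -> usize {
    pieces.iter().filter(|p| overlaps(**p, other)).count()
}

fn main() {
    // Another liverange B, live across the middle of the split range.
    let b = Range { from: 5, to: 15 };
    // Split at 10 without trimming: the halves cover [0, 10) and [10, 20).
    let untrimmed = [Range { from: 0, to: 10 }, Range { from: 10, to: 20 }];
    // Split with trimming (uses at 2 and 18): only [0, 3) and [18, 20)
    // compete first; [3, 18) is deferred to the spill bundle.
    let trimmed = [Range { from: 0, to: 3 }, Range { from: 18, to: 20 }];
    println!("untrimmed: {}", conflicts(&untrimmed, b));
    println!("trimmed: {}", conflicts(&trimmed, b));
}
```

`B` overlaps both untrimmed halves but neither trimmed piece. In this toy model, that means the trimmed pieces never compete with `B` for a register during the first-chance pass.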
However, in the case of a split arising from a reg-to-stack /
stack-to-reg conflict, as happens when references are used or def'd as
registers and then cross safepoints, this extra step in the connectivity
(normal LR with register use, then spill bundle, then normal LR with
stack use) can lead to extra moves. Additionally, when one of the LRs
has a stack constraint, contention matters far less, so it doesn't hurt
to skip the trimming step. In fact, it's likely much better to put the
"connecting piece" together with the stack side of the conflict.
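The policy change and its effect on moves can be sketched in the same toy model. Everything here is hypothetical: `split_without_trim`, `Alloc`, and `count_moves` are illustrative names, not regalloc2 APIs. The "old" chain assumes the spill bundle happened to receive a different register during second-chance allocation, which is one way the extra move can arise.

```rust
// Toy model only: hypothetical types and functions, not regalloc2's.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Range {
    from: u32,
    to: u32,
}

/// Where a piece of a liverange ends up after allocation.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Alloc {
    Reg(u8),
    Stack(u32),
}

/// Split [from, to) for a register/stack requirement conflict without
/// trimming: the connecting piece stays with the stack-constrained half.
fn split_without_trim(from: u32, to: u32, last_before: u32, first_after: u32, stack_half_is_second: bool) -> (Range, Range) {
    let cut = if stack_half_is_second {
        // Register half ends right after its last use; stack half absorbs the gap.
        last_before + 1
    } else {
        // Stack half runs right up to the register half's first use.
        first_after
    };
    (Range { from, to: cut }, Range { from: cut, to })
}

/// One move is needed wherever consecutive pieces of the chain end up in
/// different locations.
fn count_moves(chain: &[Alloc]) -> usize {
    chain.windows(2).filter(|w| w[0] != w[1]).count()
}

fn main() {
    // Register use at 2, stack use at 18 (e.g. the value is live across a safepoint).
    let (reg_half, stack_half) = split_without_trim(0, 20, 2, 18, true);
    println!("{:?} {:?}", reg_half, stack_half);

    // Old behavior (assumed allocation): reg LR -> spill bundle in another register -> stack LR.
    let trimmed = [Alloc::Reg(0), Alloc::Reg(1), Alloc::Stack(0)];
    // New behavior: reg LR -> stack LR, with the connecting piece on the stack side.
    let untrimmed = [Alloc::Reg(0), Alloc::Stack(0)];
    println!("moves: {} vs {}", count_moves(&trimmed), count_moves(&untrimmed));
}
```

Under these assumptions, the three-piece chain needs two moves and the two-piece chain needs one. The gap piece costs little on the stack side because, as noted above, contention matters less there.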
Ideally we would handle this with the same move-cost logic we use for
conflicts detected during backtracking, but the requirements-related
splitting happens separately and that logic would need to be generalized
further. For now, this change is sufficient to eliminate redundant moves
such as those seen in bytecodealliance/wasmtime#3785.
Fixes #48.