Allow async interruptions on safepoints #95565

Merged
merged 44 commits into dotnet:main from spInterrupt on May 6, 2024
Conversation

VSadov
Member

@VSadov VSadov commented Dec 4, 2023

TL;DR

== What is going on here:

Currently we have roughly two kinds of methods - fully and partially interruptible. We can initiate stackwalks in fully interruptible methods, as well as stackwalk through them. Partially interruptible methods allow only stackwalks through them, since only call-return sites have info about the GC content of the stack and nonvolatile registers (and volatile ones are conveniently dead at return sites).
The advantage of partially interruptible methods is that they carry less info, and thus everything that can be partially interruptible is emitted as such. The ratio of partially/fully interruptible varies, but partially interruptible methods are generally in the vast majority.

A good question to ask is: can we initiate a stack walk when stopped at a call-return site in partially interruptible code?
The current answer is "almost". We already store the info about nonvolatile registers and stack locations for these sites, so in theory we have enough information to start a stackwalk. Almost enough.
The only piece that is missing is the content of the return registers. Unlike the case of walking through virtually unwound callers, where return registers are not yet live, in the case of an actual return they contain return values, and if those are object refs, they must be reported to the GC. Once the GC info can tell whether return registers are live after a call and interesting to the GC, the call-return sites become interruptible.
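
To make this concrete, here is a minimal sketch (all types and helpers here are hypothetical illustrations, not the runtime's actual API) of the one extra query a stackwalk needs when it is initiated at a call-return site:

    #include <cstdint>

    // Hypothetical stand-ins for illustration only.
    enum class ReturnRegKind { Scalar, Object, Byref };

    struct SafePointInfo
    {
        bool          isCallReturnSite; // stopped right after a call?
        ReturnRegKind returnReg0;       // GC-refness of the primary return register
    };

    // Once GC info can answer "is the return register live and a GC ref here?",
    // a stackwalk can be initiated at the site, not just walked through it.
    bool CanInitiateStackwalkHere(const SafePointInfo& info,
                                  uintptr_t retVal0,
                                  void (*reportToGC)(uintptr_t value, bool isByref))
    {
        if (!info.isCallReturnSite)
            return false; // would need fully interruptible info instead

        // Nonvolatile registers and stack locations are already described at
        // this site; the only extra work is reporting a live GC-ref return.
        if (info.returnReg0 == ReturnRegKind::Object)
            reportToGC(retVal0, /* isByref */ false);
        else if (info.returnReg0 == ReturnRegKind::Byref)
            reportToGC(retVal0, /* isByref */ true);

        return true;
    }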

== What is gained:

  1. If we find a thread at an interruptible point, we do not need to hijack/release and wait for the thread to self-suspend when it eventually returns from the call - we are done! We've already caught the thread where we want it to be.
  2. This would add a lot of additional interruptibility points to a program, which would improve the robustness of suspension.

#2 is actually more interesting. There are known cases that present a challenge to suspension. One of them is a "short call in a loop". This happens when we have a relatively tight loop that performs call(s) to short methods. At compile time we only know that there are calls, so we do not put GC polls in the loop. As a result we end up with a loop that can be suspended only when we manage to catch the thread within the few instructions of the callee.
Superscalar multiple-issue CPUs can process multiple instructions at a time, which makes this situation worse, as it effectively reduces the number of possible hijackable points in the callee. We may see the IP of a stopped thread at the call-return site way more often than inside the callee's body.
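
A minimal sketch of the problematic shape, in plain C++ standing in for JIT-compiled managed code (names and the NOINLINE macro are illustrative only):

    #include <cstdint>

    #if defined(_MSC_VER)
    #define NOINLINE __declspec(noinline)
    #else
    #define NOINLINE __attribute__((noinline))
    #endif

    // A short callee: only a handful of instructions, so very few IPs at which
    // a stopped thread would be observed inside its body.
    NOINLINE static int64_t ShortCallee(int64_t x)
    {
        return x + 1;
    }

    int64_t TightLoop(int64_t n)
    {
        int64_t sum = 0;
        // No GC poll is emitted in a loop like this, because the calls are
        // assumed to give suspension a chance. A sampled IP lands at the
        // call-return site far more often than inside ShortCallee, so making
        // the return site itself interruptible is what rescues suspension here.
        for (int64_t i = 0; i < n; i++)
            sum = ShortCallee(sum);
        return sum;
    }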

== Possible further improvements:

Once we have this, hijacking could be simplified as well.
Currently, while installing a hijack, we need to consult the GC info about the GC-refness of the containing method's returns and stash that information in a transition frame, so that, if the hijack is hit, the stack crawler knows if/how to report the return registers.

All this special casing would be unnecessary, as we could put the synchronous interrupt (hijacked) case and the asynchronous interrupt case (signals or OS suspension) on the same plan with respect to GC reporting: just ask the decoder to report volatile registers for the leaf frame, of which only the return registers would be live at a return site - the same as in the asynchronous case. The only difference would be in how the interruption happened and how the register-set display was formed (i.e. pushed by the hijack probe or by the OS).
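
A hedged sketch of that plan (the types and names below are hypothetical, not the runtime's actual entry points): both interruption flavors converge on one reporting path, and only the construction of the register display differs.

    #include <cstdint>

    // Hypothetical register display; in reality it is filled in either by the
    // hijack probe (synchronous case) or by the OS context (asynchronous case).
    struct RegDisplay
    {
        uintptr_t ip;               // where the thread was stopped
        uintptr_t volatileRegs[8];  // volatile registers, incl. return registers
    };

    // One reporting path for both cases: the GC-info decoder supplies a liveness
    // mask for volatile registers at regs.ip; at a call-return site only the
    // return registers can be marked live. No stashed return-kind flags needed.
    void ReportLeafFrame(const RegDisplay& regs,
                         uint8_t liveVolatileMask,
                         void (*reportToGC)(uintptr_t value))
    {
        for (int i = 0; i < 8; i++)
            if (liveVolatileMask & (1u << i))
                reportToGC(regs.volatileRegs[i]);
    }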

A sizable chunk of architecture- and platform-specific C++/asm code that deals with return flags could go away, like a good part of this file:

// All values but GCRK_Unknown must correspond to MethodReturnKind enumeration in gcinfo.h
enum GCRefKind : unsigned char
{
    GCRK_Scalar = 0x00,
    GCRK_Object = 0x01,
    GCRK_Byref  = 0x02,
#ifdef TARGET_64BIT
    // Composite return kinds for value types returned in two registers (encoded with two bits per register)
    GCRK_Scalar_Obj   = (GCRK_Object << 2) | GCRK_Scalar,
    GCRK_Obj_Obj      = (GCRK_Object << 2) | GCRK_Object,
    GCRK_Byref_Obj    = (GCRK_Object << 2) | GCRK_Byref,
    GCRK_Scalar_Byref = (GCRK_Byref << 2)  | GCRK_Scalar,
    GCRK_Obj_Byref    = (GCRK_Byref << 2)  | GCRK_Object,
    GCRK_Byref_Byref  = (GCRK_Byref << 2)  | GCRK_Byref,
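
For reference, a minimal decode sketch assuming the two-bits-per-register encoding shown above (helper names are illustrative, not from the runtime) - logic of this kind could disappear along with the flags:

    // Low two bits describe the first return register, the next two bits the
    // second, matching the composite kinds above.
    enum GCRefKindSketch : unsigned char
    {
        GCRKS_Scalar = 0x00,
        GCRKS_Object = 0x01,
        GCRKS_Byref  = 0x02,
    };

    inline GCRefKindSketch ExtractReturnKind0(unsigned char composite)
    {
        return static_cast<GCRefKindSketch>(composite & 0x3);
    }

    inline GCRefKindSketch ExtractReturnKind1(unsigned char composite)
    {
        return static_cast<GCRefKindSketch>((composite >> 2) & 0x3);
    }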

Also, we may be able to drop the return kind bits from the GC info entirely; I think there are no other uses.

@ghost

ghost commented Dec 4, 2023

Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas
See info in area-owners.md if you want to be subscribed.

Issue Details
Author: VSadov
Assignees: VSadov
Labels:

area-NativeAOT-coreclr

Milestone: -

@dotnet dotnet deleted a comment from azure-pipelines bot Dec 5, 2023
@dotnet dotnet deleted a comment from azure-pipelines bot Dec 5, 2023
@VSadov VSadov force-pushed the spInterrupt branch 2 times, most recently from c10522c to d12c536 Compare December 8, 2023 19:33
@dotnet dotnet deleted a comment from azure-pipelines bot Dec 8, 2023
@dotnet dotnet deleted a comment from azure-pipelines bot Dec 9, 2023
@VSadov VSadov force-pushed the spInterrupt branch 3 times, most recently from 2c94646 to d569006 Compare December 28, 2023 22:17
@ryujit-bot

Diff results for #95565

Throughput diffs

Throughput diffs for linux/arm64 ran on windows/x64

Overall (+0.12% to +0.38%)
Collection PDIFF
benchmarks.run.linux.arm64.checked.mch +0.14%
benchmarks.run_pgo.linux.arm64.checked.mch +0.12%
benchmarks.run_tiered.linux.arm64.checked.mch +0.27%
coreclr_tests.run.linux.arm64.checked.mch +0.28%
libraries.crossgen2.linux.arm64.checked.mch +0.38%
libraries.pmi.linux.arm64.checked.mch +0.22%
libraries_tests.run.linux.arm64.Release.mch +0.28%
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch +0.30%
realworld.run.linux.arm64.checked.mch +0.22%
smoke_tests.nativeaot.linux.arm64.checked.mch +0.20%
MinOpts (+0.02% to +1.26%)
Collection PDIFF
benchmarks.run.linux.arm64.checked.mch +0.80%
benchmarks.run_pgo.linux.arm64.checked.mch +0.44%
benchmarks.run_tiered.linux.arm64.checked.mch +0.41%
coreclr_tests.run.linux.arm64.checked.mch +0.37%
libraries.crossgen2.linux.arm64.checked.mch +0.12%
libraries.pmi.linux.arm64.checked.mch +0.02%
libraries_tests.run.linux.arm64.Release.mch +0.78%
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch +1.26%
realworld.run.linux.arm64.checked.mch +0.57%
smoke_tests.nativeaot.linux.arm64.checked.mch +1.22%
FullOpts (+0.08% to +0.38%)
Collection PDIFF
benchmarks.run.linux.arm64.checked.mch +0.14%
benchmarks.run_pgo.linux.arm64.checked.mch +0.08%
benchmarks.run_tiered.linux.arm64.checked.mch +0.09%
coreclr_tests.run.linux.arm64.checked.mch +0.22%
libraries.crossgen2.linux.arm64.checked.mch +0.38%
libraries.pmi.linux.arm64.checked.mch +0.22%
libraries_tests.run.linux.arm64.Release.mch +0.10%
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch +0.27%
realworld.run.linux.arm64.checked.mch +0.22%
smoke_tests.nativeaot.linux.arm64.checked.mch +0.20%

Throughput diffs for linux/x64 ran on windows/x64

Overall (+0.12% to +0.35%)
Collection PDIFF
benchmarks.run.linux.x64.checked.mch +0.14%
benchmarks.run_pgo.linux.x64.checked.mch +0.12%
benchmarks.run_tiered.linux.x64.checked.mch +0.35%
coreclr_tests.run.linux.x64.checked.mch +0.28%
libraries.crossgen2.linux.x64.checked.mch +0.35%
libraries.pmi.linux.x64.checked.mch +0.21%
libraries_tests.run.linux.x64.Release.mch +0.28%
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch +0.29%
realworld.run.linux.x64.checked.mch +0.22%
smoke_tests.nativeaot.linux.x64.checked.mch +0.16%
MinOpts (+0.02% to +1.36%)
Collection PDIFF
benchmarks.run.linux.x64.checked.mch +1.36%
benchmarks.run_pgo.linux.x64.checked.mch +0.57%
benchmarks.run_tiered.linux.x64.checked.mch +0.68%
coreclr_tests.run.linux.x64.checked.mch +0.41%
libraries.crossgen2.linux.x64.checked.mch +0.15%
libraries.pmi.linux.x64.checked.mch +0.02%
libraries_tests.run.linux.x64.Release.mch +0.94%
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch +0.44%
realworld.run.linux.x64.checked.mch +0.72%
smoke_tests.nativeaot.linux.x64.checked.mch +1.25%
FullOpts (+0.07% to +0.35%)
Collection PDIFF
benchmarks.run.linux.x64.checked.mch +0.13%
benchmarks.run_pgo.linux.x64.checked.mch +0.07%
benchmarks.run_tiered.linux.x64.checked.mch +0.08%
coreclr_tests.run.linux.x64.checked.mch +0.20%
libraries.crossgen2.linux.x64.checked.mch +0.35%
libraries.pmi.linux.x64.checked.mch +0.21%
libraries_tests.run.linux.x64.Release.mch +0.10%
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch +0.28%
realworld.run.linux.x64.checked.mch +0.21%
smoke_tests.nativeaot.linux.x64.checked.mch +0.16%

Throughput diffs for osx/arm64 ran on windows/x64

Overall (+0.15% to +0.38%)
Collection PDIFF
benchmarks.run_pgo.osx.arm64.checked.mch +0.15%
benchmarks.run_tiered.osx.arm64.checked.mch +0.29%
coreclr_tests.run.osx.arm64.checked.mch +0.29%
libraries.crossgen2.osx.arm64.checked.mch +0.38%
libraries.pmi.osx.arm64.checked.mch +0.22%
libraries_tests.run.osx.arm64.Release.mch +0.33%
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch +0.30%
realworld.run.osx.arm64.checked.mch +0.22%
MinOpts (+0.01% to +1.29%)
Collection PDIFF
benchmarks.run_pgo.osx.arm64.checked.mch +0.51%
benchmarks.run_tiered.osx.arm64.checked.mch +0.55%
coreclr_tests.run.osx.arm64.checked.mch +0.36%
libraries.crossgen2.osx.arm64.checked.mch +0.12%
libraries.pmi.osx.arm64.checked.mch +0.01%
libraries_tests.run.osx.arm64.Release.mch +0.77%
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch +1.29%
realworld.run.osx.arm64.checked.mch +0.57%
FullOpts (+0.05% to +0.38%)
Collection PDIFF
benchmarks.run_pgo.osx.arm64.checked.mch +0.05%
benchmarks.run_tiered.osx.arm64.checked.mch +0.08%
coreclr_tests.run.osx.arm64.checked.mch +0.23%
libraries.crossgen2.osx.arm64.checked.mch +0.38%
libraries.pmi.osx.arm64.checked.mch +0.22%
libraries_tests.run.osx.arm64.Release.mch +0.11%
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch +0.28%
realworld.run.osx.arm64.checked.mch +0.22%

Throughput diffs for windows/arm64 ran on windows/x64

Overall (+0.15% to +0.38%)
Collection PDIFF
benchmarks.run.windows.arm64.checked.mch +0.15%
benchmarks.run_pgo.windows.arm64.checked.mch +0.15%
benchmarks.run_tiered.windows.arm64.checked.mch +0.29%
coreclr_tests.run.windows.arm64.checked.mch +0.28%
libraries.crossgen2.windows.arm64.checked.mch +0.38%
libraries.pmi.windows.arm64.checked.mch +0.22%
libraries_tests.run.windows.arm64.Release.mch +0.34%
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch +0.30%
realworld.run.windows.arm64.checked.mch +0.22%
smoke_tests.nativeaot.windows.arm64.checked.mch +0.19%
MinOpts (+0.02% to +1.29%)
Collection PDIFF
benchmarks.run.windows.arm64.checked.mch +0.66%
benchmarks.run_pgo.windows.arm64.checked.mch +0.51%
benchmarks.run_tiered.windows.arm64.checked.mch +0.55%
coreclr_tests.run.windows.arm64.checked.mch +0.36%
libraries.crossgen2.windows.arm64.checked.mch +0.12%
libraries.pmi.windows.arm64.checked.mch +0.02%
libraries_tests.run.windows.arm64.Release.mch +0.78%
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch +1.29%
realworld.run.windows.arm64.checked.mch +0.57%
smoke_tests.nativeaot.windows.arm64.checked.mch +1.17%
FullOpts (+0.08% to +0.38%)
Collection PDIFF
benchmarks.run.windows.arm64.checked.mch +0.15%
benchmarks.run_pgo.windows.arm64.checked.mch +0.09%
benchmarks.run_tiered.windows.arm64.checked.mch +0.08%
coreclr_tests.run.windows.arm64.checked.mch +0.22%
libraries.crossgen2.windows.arm64.checked.mch +0.38%
libraries.pmi.windows.arm64.checked.mch +0.22%
libraries_tests.run.windows.arm64.Release.mch +0.11%
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch +0.27%
realworld.run.windows.arm64.checked.mch +0.22%
smoke_tests.nativeaot.windows.arm64.checked.mch +0.19%

Throughput diffs for windows/x64 ran on windows/x64

Overall (+0.13% to +0.36%)
Collection PDIFF
aspnet.run.windows.x64.checked.mch +0.13%
benchmarks.run.windows.x64.checked.mch +0.15%
benchmarks.run_pgo.windows.x64.checked.mch +0.14%
benchmarks.run_tiered.windows.x64.checked.mch +0.36%
coreclr_tests.run.windows.x64.checked.mch +0.28%
libraries.crossgen2.windows.x64.checked.mch +0.34%
libraries.pmi.windows.x64.checked.mch +0.19%
libraries_tests.run.windows.x64.Release.mch +0.33%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch +0.26%
realworld.run.windows.x64.checked.mch +0.20%
smoke_tests.nativeaot.windows.x64.checked.mch +0.14%
MinOpts (+0.02% to +1.24%)
Collection PDIFF
aspnet.run.windows.x64.checked.mch +0.71%
benchmarks.run.windows.x64.checked.mch +1.09%
benchmarks.run_pgo.windows.x64.checked.mch +0.65%
benchmarks.run_tiered.windows.x64.checked.mch +0.73%
coreclr_tests.run.windows.x64.checked.mch +0.41%
libraries.crossgen2.windows.x64.checked.mch +0.15%
libraries.pmi.windows.x64.checked.mch +0.02%
libraries_tests.run.windows.x64.Release.mch +0.94%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch +0.45%
realworld.run.windows.x64.checked.mch +0.70%
smoke_tests.nativeaot.windows.x64.checked.mch +1.24%
FullOpts (+0.06% to +0.34%)
Collection PDIFF
aspnet.run.windows.x64.checked.mch +0.07%
benchmarks.run.windows.x64.checked.mch +0.15%
benchmarks.run_pgo.windows.x64.checked.mch +0.06%
benchmarks.run_tiered.windows.x64.checked.mch +0.12%
coreclr_tests.run.windows.x64.checked.mch +0.20%
libraries.crossgen2.windows.x64.checked.mch +0.34%
libraries.pmi.windows.x64.checked.mch +0.19%
libraries_tests.run.windows.x64.Release.mch +0.10%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch +0.26%
realworld.run.windows.x64.checked.mch +0.20%
smoke_tests.nativeaot.windows.x64.checked.mch +0.14%



Throughput diffs for linux/arm ran on windows/x86

Overall (+0.14% to +0.42%)
Collection PDIFF
benchmarks.run.linux.arm.checked.mch +0.14%
benchmarks.run_pgo.linux.arm.checked.mch +0.17%
benchmarks.run_tiered.linux.arm.checked.mch +0.23%
coreclr_tests.run.linux.arm.checked.mch +0.31%
libraries.crossgen2.linux.arm.checked.mch +0.42%
libraries.pmi.linux.arm.checked.mch +0.22%
libraries_tests.run.linux.arm.Release.mch +0.32%
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch +0.31%
realworld.run.linux.arm.checked.mch +0.18%
MinOpts (+0.02% to +1.66%)
Collection PDIFF
benchmarks.run.linux.arm.checked.mch +0.90%
benchmarks.run_pgo.linux.arm.checked.mch +0.73%
benchmarks.run_tiered.linux.arm.checked.mch +0.76%
coreclr_tests.run.linux.arm.checked.mch +0.45%
libraries.crossgen2.linux.arm.checked.mch +0.14%
libraries.pmi.linux.arm.checked.mch +0.02%
libraries_tests.run.linux.arm.Release.mch +1.00%
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch +1.66%
realworld.run.linux.arm.checked.mch +0.71%
FullOpts (+0.10% to +0.42%)
Collection PDIFF
benchmarks.run.linux.arm.checked.mch +0.14%
benchmarks.run_pgo.linux.arm.checked.mch +0.13%
benchmarks.run_tiered.linux.arm.checked.mch +0.10%
coreclr_tests.run.linux.arm.checked.mch +0.21%
libraries.crossgen2.linux.arm.checked.mch +0.42%
libraries.pmi.linux.arm.checked.mch +0.22%
libraries_tests.run.linux.arm.Release.mch +0.12%
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch +0.26%
realworld.run.linux.arm.checked.mch +0.18%



Throughput diffs for linux/arm64 ran on linux/x64

Overall (+0.11% to +0.33%)
Collection PDIFF
libraries.pmi.linux.arm64.checked.mch +0.19%
realworld.run.linux.arm64.checked.mch +0.20%
benchmarks.run.linux.arm64.checked.mch +0.13%
libraries.crossgen2.linux.arm64.checked.mch +0.33%
libraries_tests.run.linux.arm64.Release.mch +0.22%
smoke_tests.nativeaot.linux.arm64.checked.mch +0.17%
benchmarks.run_tiered.linux.arm64.checked.mch +0.22%
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch +0.26%
coreclr_tests.run.linux.arm64.checked.mch +0.23%
benchmarks.run_pgo.linux.arm64.checked.mch +0.11%
MinOpts (+0.05% to +0.90%)
Collection PDIFF
libraries.pmi.linux.arm64.checked.mch +0.05%
realworld.run.linux.arm64.checked.mch +0.42%
benchmarks.run.linux.arm64.checked.mch +0.63%
libraries.crossgen2.linux.arm64.checked.mch +0.09%
libraries_tests.run.linux.arm64.Release.mch +0.58%
smoke_tests.nativeaot.linux.arm64.checked.mch +0.90%
benchmarks.run_tiered.linux.arm64.checked.mch +0.33%
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch +0.81%
coreclr_tests.run.linux.arm64.checked.mch +0.26%
benchmarks.run_pgo.linux.arm64.checked.mch +0.35%
FullOpts (+0.07% to +0.33%)
Collection PDIFF
libraries.pmi.linux.arm64.checked.mch +0.19%
realworld.run.linux.arm64.checked.mch +0.19%
benchmarks.run.linux.arm64.checked.mch +0.12%
libraries.crossgen2.linux.arm64.checked.mch +0.33%
libraries_tests.run.linux.arm64.Release.mch +0.10%
smoke_tests.nativeaot.linux.arm64.checked.mch +0.17%
benchmarks.run_tiered.linux.arm64.checked.mch +0.08%
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch +0.24%
coreclr_tests.run.linux.arm64.checked.mch +0.20%
benchmarks.run_pgo.linux.arm64.checked.mch +0.07%

Throughput diffs for linux/x64 ran on linux/x64

Overall (+0.10% to +0.31%)
Collection PDIFF
benchmarks.run.linux.x64.checked.mch +0.13%
realworld.run.linux.x64.checked.mch +0.19%
libraries.crossgen2.linux.x64.checked.mch +0.31%
benchmarks.run_tiered.linux.x64.checked.mch +0.27%
libraries_tests.run.linux.x64.Release.mch +0.23%
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch +0.26%
benchmarks.run_pgo.linux.x64.checked.mch +0.10%
libraries.pmi.linux.x64.checked.mch +0.19%
coreclr_tests.run.linux.x64.checked.mch +0.23%
smoke_tests.nativeaot.linux.x64.checked.mch +0.13%
MinOpts (+0.06% to +1.02%)
Collection PDIFF
benchmarks.run.linux.x64.checked.mch +1.02%
realworld.run.linux.x64.checked.mch +0.54%
libraries.crossgen2.linux.x64.checked.mch +0.11%
benchmarks.run_tiered.linux.x64.checked.mch +0.53%
libraries_tests.run.linux.x64.Release.mch +0.71%
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch +0.33%
benchmarks.run_pgo.linux.x64.checked.mch +0.45%
libraries.pmi.linux.x64.checked.mch +0.06%
coreclr_tests.run.linux.x64.checked.mch +0.30%
smoke_tests.nativeaot.linux.x64.checked.mch +0.92%
FullOpts (+0.06% to +0.31%)
Collection PDIFF
benchmarks.run.linux.x64.checked.mch +0.12%
realworld.run.linux.x64.checked.mch +0.19%
libraries.crossgen2.linux.x64.checked.mch +0.31%
benchmarks.run_tiered.linux.x64.checked.mch +0.07%
libraries_tests.run.linux.x64.Release.mch +0.09%
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch +0.25%
benchmarks.run_pgo.linux.x64.checked.mch +0.06%
libraries.pmi.linux.x64.checked.mch +0.19%
coreclr_tests.run.linux.x64.checked.mch +0.18%
smoke_tests.nativeaot.linux.x64.checked.mch +0.13%



@VSadov VSadov marked this pull request as ready for review January 27, 2024 01:05
@VSadov
Member Author

VSadov commented Jan 27, 2024

I think this is ready for a discussion.

@jkotas
Member

jkotas commented Jan 27, 2024

  • Why is this specific to native AOT? It should be equally applicable to CoreCLR too.
  • Is this a breaking change in the GCInfo format or semantics?

@jkotas jkotas added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jan 27, 2024
@ghost

ghost commented Jan 27, 2024

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Author: VSadov
Assignees: VSadov
Labels:

area-CodeGen-coreclr, area-NativeAOT-coreclr

Milestone: -

@VSadov
Member Author

VSadov commented Jan 27, 2024

The reason this is NativeAOT-only is just that NativeAOT is more straightforward in this area and as such more approachable for experimenting.
This will work for CoreCLR as well, but I figured I should do one step at a time.

I did experiment with CoreCLR locally. It worked and passed the libraries tests, but I was not confident enough to include those changes in the same PR.
It is not a lot of additional change, but it is more scattered than I'd like, so I was unsure about the completeness of the change.

@VSadov
Member Author

VSadov commented May 5, 2024

rebased onto recent main, resolved conflicts

@AndyAyersMS
Member

Seems like Bruce is out for a while longer still.

@kunalspathak are you going to review?

@kunalspathak
Member

> Seems like Bruce is out for a while longer still.
>
> @kunalspathak are you going to review?

I have asked @jakobbotsch last week to take a look.

@VSadov
Member Author

VSadov commented May 6, 2024

> I have asked @jakobbotsch last week to take a look.

@jakobbotsch looked through the changes after that and signed off. (Thanks, Jakob!!)
So I wonder if we should wait for someone else or if we are good to go.

@AndyAyersMS
Member

I think we are good.

@VSadov
Member Author

VSadov commented May 6, 2024

I will update the JIT GUID and proceed to merging this, if nothing comes up.

@VSadov
Member Author

VSadov commented May 6, 2024

Loader/binding/tracing/BinderTracingTest.ResolutionFlow/BinderTracingTest.ResolutionFlow.cmd has failed; that is #97735.

The rest is green.

@VSadov VSadov merged commit 8608181 into dotnet:main May 6, 2024
104 of 106 checks passed
@VSadov VSadov deleted the spInterrupt branch May 6, 2024 23:12
@VSadov
Member Author

VSadov commented May 6, 2024

Thanks everybody for reviewing and helping with this!!!

michaelgsharp pushed a commit to michaelgsharp/runtime that referenced this pull request May 9, 2024
* allow async interruptions on safepoints

* ARM64 TODO

* report GC ref/byref returns at partially interruptible callsites

* enable on all platforms

* tweak

* fix after rebasing

* do not record tailcalls

* IsInterruptibleSafePoint

* update gccover

* turn on new behavior on a gcinfo version

* tailcalls tweak

* do not report unused returns

* CORINFO_HELP_FAIL_FAST should not be a safepoint

* treat tailcalls as emitNoGChelper

* versioning tweak

* enable in CoreCLR (not just for GC stress scenarios)

* fix x86 build

* other architectures

* added a knob DOTNET_InterruptibleCallSites

* moved DOTNET_InterruptibleCallSites check to the code manager

* JIT_StackProbe should not be a safepoint (stack is not cleaned yet)

* Hooked up GCInfo version to R2R file version

* formatting

* GCStress support for RISC architectures

* Update src/coreclr/inc/gcinfo.h

Co-authored-by: Jan Kotas <jkotas@microsoft.com>

* make InterruptibleSafePointsEnabled static

* fix linux-x86 build.

* ARM32 actually can't return 2 GC references, so can filter out R1 early

* revert unnecessary change

* Update src/coreclr/tools/aot/ILCompiler.Reflection.ReadyToRun/Amd64/GcInfo.cs

Co-authored-by: Filip Navara <filip.navara@gmail.com>

* removed GCINFO_VERSION cons from GcInfo.cs

* Use RBM_INTRET/RBM_INTRET_1

* Update src/coreclr/tools/aot/ILCompiler.Reflection.ReadyToRun/Amd64/GcInfo.cs

Co-authored-by: Jan Kotas <jkotas@microsoft.com>

* do not skip safe points twice (stress failure)

* revert unnecessary change in gccover.

* fix after rebase

* make sure to check `idIsNoGC` on all codepaths in `emitOutputInstr`

* make CORINFO_HELP_CHECK_OBJ a no-gc helper (because we can)

* mark a test that tests WaitForPendingFinalizers as GCStressIncompatible

* NOP

* these helpers should not form GC safe points

* require that the new block has BBF_HAS_LABEL

* Apply suggestions from code review

Co-authored-by: Jakob Botsch Nielsen <Jakob.botsch.nielsen@gmail.com>

* updated JITEEVersionIdentifier GUID

---------

Co-authored-by: Jan Kotas <jkotas@microsoft.com>
Co-authored-by: Filip Navara <filip.navara@gmail.com>
Co-authored-by: Jakob Botsch Nielsen <Jakob.botsch.nielsen@gmail.com>
@VSadov
Member Author

VSadov commented May 11, 2024

For the final effects on suspension robustness and outliers in CoreCLR from this change plus the ported NativeAOT thread suspension algorithm (re: #101782):

I have tried the current bits from the main branch with the same repro as used above (#95565 (comment)).

The benchmark prints out GC pauses in milliseconds. Smaller is better.

CoreCLR on Windows 10
(AMD Ryzen 9 7950X)

Before this PR and #101782 we saw multi-minute pauses as measured in #95565 (comment)

0
26
15
31
47
46
47
0
30
96160 (I guess the code got tiered up here.)
1175
2556
18854
351623
96504
623567
50948
295441
9274
174658

With bits from current main I see:

0
0
0
0
0
0
0
0
0
0
0
0
0
0

The suspension happens in sub-millisecond range and thus below the sensitivity of the benchmark.

Ruihan-Yin pushed a commit to Ruihan-Yin/runtime that referenced this pull request May 30, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Jun 11, 2024
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI