Skip to content

Conversation

janvorli
Copy link
Member

Backport of #119863 to release/10.0

Customer Impact

  • Customer reported
  • Found internally

There were some GC holes discovered caused by the fact that GC can be triggered during 2nd pass of EH in-between calls to finally handlers and catch handler. After considering options, moving the 2nd pass to native code seems the most reasonable solution.
The issue was reported by the RavenDB folks in #110769. It causes access violation crash on their system on Windows once in one or two days on average.

Regression

  • Yes
  • No

Introduced in .NET 8 by #88034 - the new exception handling, but got reported in .NET 9 since that's where the new EH was enabled by default.
It took almost 9 months to figure out the culprit due to the very narrow window in which it can occur and due to the fact that GC holes in general don't necessarily result in an observed bad behavior. That's why it was fixed this late in the release cycle.

Testing

  • Repro test created based on the customer code structure with instrumented coreclr where GC is injected at a specific problematic spot. It fails before the fix with a GC hole and passes with the fix.
  • Diagnostic tests
  • coreclr / libraries tests

Risk

Low, the change is 1:1 port of the 2nd pass of the EH from C# to C++ and then removal of a number of extra handling in the stack frame iterator and GC reference enumeration that was only necessary to support the cases when GC could have been happening in-between calls to finallys. That's no longer possible with the fix.

@janvorli janvorli requested a review from jkotas September 30, 2025 17:14
@janvorli janvorli self-assigned this Sep 30, 2025
@Copilot Copilot AI review requested due to automatic review settings September 30, 2025 17:14
@janvorli janvorli added Servicing-consider Issue for next servicing release review area-ExceptionHandling-coreclr labels Sep 30, 2025
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR moves the CoreCLR exception handling second pass from managed C# code to native C++ code to fix GC holes that occurred when the garbage collector was triggered between finally handler and catch handler calls during the second pass. This change addresses a customer-reported access violation crash in RavenDB that occurred intermittently due to these GC holes.

Key changes:

  • Remove managed second pass dispatch code and move logic to native implementation
  • Eliminate complex GC reference tracking infrastructure that was needed to handle GC during managed EH code
  • Simplify stack walking by removing special handling for managed exception handling frames

Reviewed Changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
src/coreclr/vm/stackwalk.h Remove GC reference tracking fields and methods for funclet handling
src/coreclr/vm/stackwalk.cpp Remove complex GC reference tracking logic during stack walking
src/coreclr/vm/qcallentrypoints.cpp Remove QCall entry points for managed funclet calls
src/coreclr/vm/metasig.h Remove method signature for managed unwind and intercept
src/coreclr/vm/gcenv.ee.common.cpp Remove saved funclet info GC reference reporting
src/coreclr/vm/exinfo.h Add new fields for catch handler info, remove saved funclet tracking
src/coreclr/vm/exinfo.cpp Remove saved funclet initialization
src/coreclr/vm/exceptionhandlingqcalls.h Remove QCall declarations for managed funclet calls
src/coreclr/vm/exceptionhandling.h Remove second pass funclet caller marker
src/coreclr/vm/exceptionhandling.cpp Add native second pass implementation, convert funclet callers to native
src/coreclr/vm/excep.cpp Add second pass calls and remove old intercept unwind code
src/coreclr/vm/corelib.h Remove managed unwind and intercept method definition
src/coreclr/nativeaot/Runtime.Base/src/System/Runtime/ExceptionHandling.cs Add conditional compilation for CoreCLR vs NativeAOT paths
src/coreclr/System.Private.CoreLib/src/System/Runtime/ExceptionServices/InternalCalls.cs Remove managed funclet call declarations
src/coreclr/System.Private.CoreLib/src/System/Runtime/ExceptionServices/AsmOffsets.cs Update structure sizes and add new ExInfo field offsets

@janvorli janvorli force-pushed the backport-move-eh-2nd-pass-to-native branch 3 times, most recently from b8ea280 to bf4fa55 Compare October 1, 2025 16:22
There were some GC holes discovered caused by the fact that GC can be
triggered during 2nd pass of EH in-between calls to finally handlers and
catch handler. After considering options, moving the 2nd pass to native
code seems the most reasonable solution.
@janvorli janvorli force-pushed the backport-move-eh-2nd-pass-to-native branch from bf4fa55 to 7fc9c86 Compare October 1, 2025 17:50
@janvorli
Copy link
Member Author

janvorli commented Oct 1, 2025

The interpreter test is failing because it would need another PR from main to work correctly with this changed EH. The reason is that resume after catch in the .NET 10 branch expects that the RhThrowEx and the like are on the stack. I've simplified / fixed that in main.
@agocke, @jkotas I am going to disable the interpreter test in .NET 10 branch so that I don't need to pull in extra changes that are not needed for .NET 10. The change was not trivial.

@janvorli janvorli force-pushed the backport-move-eh-2nd-pass-to-native branch from c666bc0 to 288916d Compare October 1, 2025 21:46
@rbhanda rbhanda added Servicing-approved Approved for servicing release and removed Servicing-consider Issue for next servicing release review labels Oct 2, 2025
@rbhanda rbhanda added this to the 10.0.0 milestone Oct 2, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants