Interpreter EH support in the runtime #114649

janvorli · 2025-04-14T18:59:43Z

This change adds support for exception handling for cases when interpreter frames either throw an exception or have an exception propagated through them. It doesn't add all the exception handling support to the interpreter itself; there is a follow-up PR that contains that code.

I had to reorder the code around handling thread abort when resuming after catch for the case when the resume occurs in interpreted code. The resume goes to the native context of InterpExecMethod in that case, so we need to update the current context in the REGDISPLAY to point there before we extract dwResumePC. The InterpreterFrame stores copies of the registers that we reuse for stack walking of interpreted frames and restores them before resuming there. The interpreter PC and SP to resume at are stored in the InterpreterFrame as well.

Copilot

Copilot reviewed 11 out of 12 changed files in this pull request and generated 1 comment.

Files not reviewed (1)

src/coreclr/interpreter/intops.def: Language not supported

Copilot · 2025-04-14T19:00:12Z

src/coreclr/vm/arm64/cgencpu.h

+inline void SetFirstArgReg(CONTEXT *context, TADDR value)
+{
+    LIMITED_METHOD_DAC_CONTRACT;
+    SetReg(context, 0, reg);


Undefined variable 'reg' is used in SetFirstArgReg; it should use the parameter 'value' instead.

Suggested change

-    SetReg(context, 0, reg);
+    SetReg(context, 0, value);

src/coreclr/vm/interpexec.cpp

src/coreclr/vm/exceptionhandling.cpp

BrzVlad · 2025-04-17T10:08:51Z

src/coreclr/vm/interpexec.cpp

+
+                TADDR resumeSP;
+                TADDR resumeIP;
+                pInterpreterFrame->GetAndClearResumeContext(&resumeSP, &resumeIP);


Could we have all this throwing logic extracted into a separate method (like mono's interp_throw)? There will be tons of places where we will throw exceptions as part of different opcodes, like null checks, overflow checks, etc.

I would expect that exceptions from things like null checks are going to be thrown using COMPlusThrow, e.g. COMPlusThrow(kNullReferenceException), just like exceptions are thrown everywhere else in the VM. The C++ exception is then going to be caught by the try/catch that wraps the interpreter execution loop and sent for managed exception processing from there. @janvorli Does this sound right?

Those scenarios are logically identical to INTOP_THROW, so I would say they should be handled the same way. You could implement CEE_LDIND, for example, either as a single INTOP_LDIND opcode that does the null check as part of the opcode and throws the exception, or as multiple granular opcodes: ldnull + bne; ldnull + throw; ldind_unsafe;. Also, I'm not sure why we would want to throw an additional C++ exception when we could do the managed exception dispatch directly.

Also resuming the context from the c++ catch handler back inside the try (since execution might resume in those interpreter frames) seems dubious. It also seems unclear whether wasm would work in this configuration, given longjmp is probably implemented via an exception throw.
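To make the "single throwing method" idea from this exchange concrete, here is a minimal sketch in which every opcode that can fault funnels into one helper. It is a toy model, not the CoreCLR or mono interpreter: `ToyFrame`, `Clause`, `InterpThrow`, `Run` and the opcode names are all hypothetical, and handler lookup is reduced to a linear scan over the current frame's clauses.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Toy model with hypothetical names; this is not the CoreCLR interpreter.
enum class Op { LdInd, AddOvf, Throw, Ret };
enum class ExKind { None, NullReference, Overflow, User };

// Code in [tryStart, tryEnd) is protected by the handler at index `handler`.
struct Clause { std::size_t tryStart, tryEnd, handler; };

struct ToyFrame {
    std::vector<Op> code;
    std::vector<Clause> clauses;
    std::size_t ip = 0;
    ExKind lastCaught = ExKind::None;
};

// The single throwing path shared by every opcode. Returns true when this
// frame has a handler covering the faulting ip, after moving ip to it.
bool InterpThrow(ToyFrame& f, ExKind kind) {
    for (const Clause& c : f.clauses) {
        if (c.tryStart <= f.ip && f.ip < c.tryEnd) {
            f.lastCaught = kind;
            f.ip = c.handler;
            return true;
        }
    }
    return false; // unhandled here; a real runtime would keep unwinding
}

bool AddOverflows(int a, int b) {
    const int maxInt = std::numeric_limits<int>::max();
    const int minInt = std::numeric_limits<int>::min();
    return (b > 0 && a > maxInt - b) || (b < 0 && a < minInt - b);
}

// Runs the frame. Returns false if an exception escaped it.
bool Run(ToyFrame& f, const int* ptr, int a, int b) {
    while (f.ip < f.code.size()) {
        ExKind fault = ExKind::None;
        switch (f.code[f.ip]) {
        case Op::LdInd:  if (ptr == nullptr) fault = ExKind::NullReference; break;
        case Op::AddOvf: if (AddOverflows(a, b)) fault = ExKind::Overflow; break;
        case Op::Throw:  fault = ExKind::User; break;
        case Op::Ret:    return true;
        }
        if (fault == ExKind::None) { ++f.ip; continue; }
        if (!InterpThrow(f, fault)) return false; // escaped this frame
    }
    return true;
}
```

In this sketch the explicit throw opcode, the null check and the overflow check differ only in the `ExKind` they pass, which is the property the suggestion is after.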

The interpreter needs to be able to deal with C++ exceptions thrown in the rest of the VM. For example, https://github.com/dotnet/runtime/pull/114529/files#diff-13492d6b1897666f5322f30e5f2ae4a9b35f084ce84d7f78827541070cc57329R990 can throw a C++ exception that needs to be converted to a managed exception and handled as a managed exception. How is it going to be done? Is there a viable alternative to resuming from the C++ catch handler?

I think that exceptions like NullReferenceException thrown by the interpreter are more like C++ exceptions thrown in the rest of the VM, so we may want to use the same mechanism for them. It follows what we do with the regular JIT. For example, calling an interface method on a null pointer can end up here:

runtime/src/coreclr/vm/virtualcallstub.cpp

Line 1715 in b951b3d

COMPlusThrow(kNullReferenceException);

We throw a C++ exception that gets converted to a managed NullReferenceException like any other C++ exception thrown by the VM. In theory, we can avoid the C++ exception and throw the managed NullReferenceException directly, but it would be more complicated for no good reason.

The way I imagined solving this problem would be to wrap calls into the runtime that might throw with a try/catch. For example:

try {
    pMD = pMD->GetMethodDescOfVirtualizedCode(pThisArg, pMD->GetMethodTable());
} catch (ex) {
    ex_managed = convert_to_managed (ex);
}
if (ex_managed) {
    ThrowManagedEx(ex_managed);
    // resume to the right interp frame here
}

And for all exceptions produced by interpreter code (that we know exactly), we would directly use ThrowManagedEx. Seems like this approach would be the simplest, I don't expect a ton of callsites where we would need to do this.
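The wrap-and-convert pattern described in this comment can be sketched as compilable C++. This is only an illustration under stated assumptions: `ManagedException`, `InterpState`, `ConvertToManaged`, `CallRuntime` and `ResolveVirtualTarget` are hypothetical stand-ins, and `ThrowManagedEx` here merely records the exception instead of dispatching it.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins; none of these are CoreCLR APIs.
struct ManagedException { std::string typeName; };

struct InterpState {
    bool hasPending = false;
    ManagedException pending;
};

// Stand-in for a VM call that reports failure with a C++ exception.
int ResolveVirtualTarget(const int* thisArg) {
    if (thisArg == nullptr) throw std::runtime_error("NullReferenceException");
    return *thisArg;
}

ManagedException ConvertToManaged(const std::exception& e) {
    return ManagedException{ e.what() };
}

// Stand-in for ThrowManagedEx: a real interpreter would start managed
// dispatch here; this sketch only records the exception.
void ThrowManagedEx(InterpState& state, const ManagedException& ex) {
    state.pending = ex;
    state.hasPending = true;
}

// Runs `call`. If it throws, the exception is converted inside the catch
// block, and the managed throw happens only after the catch block has exited.
template <typename Fn>
bool CallRuntime(InterpState& state, Fn call, int& result) {
    bool failed = false;
    ManagedException converted;
    try {
        result = call();
    } catch (const std::exception& e) {
        converted = ConvertToManaged(e);
        failed = true;
    }
    if (failed) {
        ThrowManagedEx(state, converted);
        return false;
    }
    return true;
}
```

Keeping the managed throw outside the `catch` block mirrors the shape of the pseudocode above, where the conversion and the throw are separate steps.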

With respect to the current implementation (i.e. not the discussion with C++ try/catch), I have found a bug with that.

By the time we resume into the interpreter to execute the logic after the catch block, the right context to restore into is not the same as the context captured during the throw.

In this example, things will fail.

try {
    try {
        throw null;
    } catch {
        throw null;
    }
} catch {
}
// During the second resume, I am about to resume to here ...

During the second throw, the caller has the following stack (callee on top, caller at the bottom convention)

DispatchException
InterpExecMethod calling funclet    // The second throw captured this context
DispatchException
InterpExecMethod calling function

When the second catch resumes, we really shouldn't have the CallFunclet on the native stack.

Logically, the context to restore should be the one captured on the native frame that hosts the catch, not the one that hosts the throw.

jkotas · 2025-04-22T06:16:39Z

The way I imagined solving this problem would be to wrap calls into the runtime that might throw with a try/catch.

it would be useful to demonstrate that this works on one callsite as part of this PR.

cshung · 2025-04-22T17:32:05Z

it would be useful to demonstrate that this works on one callsite as part of this PR.

I have been working with @janvorli on exception handling for the interpreter. On my side, building on top of this change, we know that resuming from InterpreterCodeManager::CallFunclet works in various cases, at least on Windows, where I routinely work.

main...cshung:runtime:public/interpreter-exception

Note that it is still a work in progress. There are known scenarios that don't work yet.

janvorli · 2025-04-22T18:17:13Z

it would be useful to demonstrate that this works on one callsite as part of this PR.

As Andrew said, without his changes, there is not much we can demonstrate here. With his changes, we can do that.

This change adds support for exception handling for cases when interpreter frames participate in the process. That means when an exception is thrown from an interpreter frame or when it is propagated over interpreter frames. This doesn't add all the exception handling support to the interpreter itself; there is a follow-up PR that contains that code.

ifdef-out setting m_SSP in the InterpreterFrame

We cannot try to get the code manager from a crawl frame when it is not on a frameless frame.

This commit moves the resuming after catch to using native exception handling instead of fragile context capturing, which was not correct anyway. It also adds handling of exceptions coming out of native runtime methods called from the interpreter.

janvorli · 2025-04-28T23:38:44Z

I've just added a commit that moves resuming after catch to a scheme based on propagating a helper native exception. When resuming in an interpreted frame, we virtually unwind from the current context up to the first native frame that is in the same block of native frames as the InterpExecMethod of the target frame and then throw the special exception from there.
I've also added cleanup of the localloc allocations during EH.
Besides that, I've added handling of exceptions stemming from native runtime calls from the interpreter.
I had to update the GCFrame so that when it is popped from the chain, its destructor doesn't do anything. To do that, I've changed the "next" pointer value that indicates the end of the list to follow the same scheme that the Frame-derived frames use. That means that NULL indicates the frame is not on the list at all.
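As a rough illustration of resuming after catch by throwing a special native exception, the sketch below lets ordinary C++ unwinding carry a resume request to the interpreter invocation that owns the catch handler. It is an assumption-laden toy: `ResumeAfterCatch`, `ResumeRecord`, `ThrowResume` and this `InterpExec` are hypothetical, and the virtual unwind that precedes the throw in the real change is not modeled.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model; these are not the runtime's types.
struct ResumeAfterCatch { int targetFrameId; int resumeIp; };
struct ResumeRecord { int frameId; int ip; };

// Stand-in for the end of exception dispatch: the handler was found in
// `targetFrameId`, so a special exception is thrown to get back there.
void ThrowResume(int targetFrameId, int resumeIp) {
    throw ResumeAfterCatch{ targetFrameId, resumeIp };
}

// Toy interpreter entry point. Frame N calls frame N + 1 until maxDepth,
// where the innermost frame triggers the resume request.
void InterpExec(int frameId, int maxDepth, int catchFrameId, int catchIp,
                std::vector<ResumeRecord>& resumed) {
    assert(frameId <= maxDepth);
    try {
        if (frameId < maxDepth)
            InterpExec(frameId + 1, maxDepth, catchFrameId, catchIp, resumed);
        else
            ThrowResume(catchFrameId, catchIp);
    } catch (const ResumeAfterCatch& r) {
        if (r.targetFrameId != frameId)
            throw; // not ours: keep unwinding through native frames
        // The target frame continues interpreting at the resume ip.
        resumed.push_back(ResumeRecord{ frameId, r.resumeIp });
    }
}
```

Because the intermediate native frames are unwound by the C++ runtime, their destructors run in the usual order, which is the main difference from restoring a previously captured context.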

src/coreclr/vm/interpexec.cpp

jkotas · 2025-04-29T00:08:38Z

I had to update the GCFrame so that when it is popped from the chain, its destructor doesn't do anything.

Why do we need that? I would expect that the core interpreter won't need to call GC protect, and any (throwing) calls to the VM would need to be behind an unwind-and-continue wrapper (like Vlad suggested in #114649 (comment)) that will do a natural C++ unwind.

src/coreclr/vm/stackwalk.cpp

src/coreclr/vm/interpexec.cpp

janvorli · 2025-04-29T12:46:37Z

Why do we need that? I would expect that the core interpreter won't need to call GC protect, and any (throwing) calls to the VM would need to be behind an unwind-and-continue wrapper (like Vlad suggested in #114649 (comment)) that will do a natural C++ unwind.

@jkotas this is related to the lengthy comment above about your question on INSTALL_RESUME_AFTER_CATCH_HANDLER. This state of the change doesn't need it; it was another thing that I accidentally moved to this change, as it was needed for the case when we do the "full" propagation over all intermediate native frames. However, it will be needed for WASM, because in that case the exception will go through DispatchManagedException, and that one has GC_PROTECT code in it around the call to the managed EH code.

jkotas · 2025-04-29T16:00:55Z

However, it will be needed for WASM, because in that case, the exception will go through the DispatchManagedException and that one has GC_PROTECT stuff in it around the call to the managed EH code.

I would expect WASM exception handling to call the C++ destructor naturally whenever it unwinds a C++ frame. Is it not going to be the case?

jkotas · 2025-04-29T16:09:26Z

src/coreclr/vm/interpexec.cpp

-
-        switch (*ip)
+        INSTALL_MANAGED_EXCEPTION_DISPATCHER;
+        INSTALL_UNWIND_AND_CONTINUE_HANDLER;


Is INSTALL_UNWIND_AND_CONTINUE_HANDLER going to break exception filter semantics when we have a mix of interpreted and JIT/AOT frames on the stack?

I guess it works fine with filters. I do not understand why we need INSTALL_UNWIND_AND_CONTINUE_HANDLER here. What is going to break if INSTALL_UNWIND_AND_CONTINUE_HANDLER is deleted here?

That's for handling exceptions coming out of calls to the native runtime. INSTALL_MANAGED_EXCEPTION_DISPATCHER does nothing on Windows. Without INSTALL_UNWIND_AND_CONTINUE_HANDLER, those exceptions would just flow through.

Makes sense. This means that the interpreter should be able to use regular COMPlusThrow to throw exceptions, like I originally assumed in #114649 (comment).

janvorli · 2025-04-29T17:25:47Z

I would expect WASM exception handling to call the C++ destructor naturally whenever it unwinds a C++ frame. Is it not going to be the case?

Yes, and that's why that change was added. Without it, the destructor of the GCFrame would try to pop it out of the frame list and it would assert, because it would find that the frame is not the first on the list. The other way to solve that would be not to pop the GCFrames upfront when resuming after catch and to let the destructor pop them. However, that would make the point at which those frames are popped differ between WASM and non-WASM, and I'd prefer this to be uniform.
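The interaction between explicit popping and the destructor can be sketched as follows. This is a simplified model rather than the runtime's GCFrame: `ToyGCFrame`, `ToyThread`, `kChainEnd` and `PopFramesDownTo` are hypothetical names, and the end-of-chain sentinel value of -1 is an assumption made for the sketch. What it keeps from the discussion is that a null `next` means "not on the list", so a destructor that runs after EH has already popped the frame has nothing to do.

```cpp
#include <cassert>
#include <cstdint>

struct ToyGCFrame;

// End-of-chain marker, distinct from nullptr (value chosen for this sketch).
static ToyGCFrame* const kChainEnd =
    reinterpret_cast<ToyGCFrame*>(static_cast<std::intptr_t>(-1));

struct ToyThread { ToyGCFrame* top = kChainEnd; };

struct ToyGCFrame {
    ToyThread& thread;
    ToyGCFrame* next = nullptr; // nullptr: this frame is not on the chain

    explicit ToyGCFrame(ToyThread& t) : thread(t) {
        next = t.top;   // push onto the per-thread chain
        t.top = this;
    }

    // Removes this frame from the chain; it must be the current top.
    void Pop() {
        assert(thread.top == this);
        thread.top = next;
        next = nullptr;
    }

    ~ToyGCFrame() {
        if (next != nullptr) // still linked: normal scope exit
            Pop();
        // otherwise EH already popped it, so there is nothing to do
    }
};

// Stand-in for explicit popping during resume after catch: pops every
// frame above `keep`.
void PopFramesDownTo(ToyThread& t, ToyGCFrame* keep) {
    while (t.top != kChainEnd && t.top != keep)
        t.top->Pop();
}
```

With a null end-of-list marker instead, the destructor could not tell "last frame on the chain" apart from "already popped", which is why the sketch uses a separate sentinel.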

jkotas · 2025-04-29T18:00:11Z

We introduced a dedicated list for GC protect a while ago, since popping them out-of-line as part of the Frame list was a mess with a lot of problems. It sounds like we are recreating the problem again. Is it possible to make this work without touching how GC protect frames work?

janvorli · 2025-04-29T18:05:58Z

I am not sure I understand. This change ensures that it works the same way with and without the interpreter. The frames with the GC protects are virtually dead.

jkotas · 2025-04-29T21:04:10Z

I am not sure I understand. This change ensures that it works the same way with and without the interpreter. The frames with the GC protects are virtually dead.

Ok, I have missed that we have special handling for GCFrames in PopExplicitFrames. My mental model for GCFrames was that they are ordinary C++ holders with destructors called by C++ runtime, but it is not actually the case.

Could you please add a comment to PopExplicitFrames about the GCFrames that require us to pop them explicitly? I assume that there are very few of those and that most of them are on the ordinary C++ holder plan.

src/coreclr/vm/frames.cpp

src/coreclr/vm/interpexec.cpp

jkotas · 2025-04-29T21:22:45Z

src/coreclr/vm/interpexec.cpp

-
-        switch (*ip)
+        INSTALL_MANAGED_EXCEPTION_DISPATCHER;
+        INSTALL_UNWIND_AND_CONTINUE_HANDLER;


Makes sense. This means that the interpreter should be able to use regular COMPlusThrow to throw exceptions, like I originally assumed in #114649 (comment).

src/coreclr/vm/exceptionhandling.cpp

src/coreclr/vm/stackwalk.cpp

janvorli · 2025-04-29T23:31:43Z

Makes sense. This means that the interpreter should be able to use regular COMPlusThrow to throw exceptions, like I originally assumed in #114649 (comment).

Yes, I've verified that it works already.

janvorli · 2025-04-30T00:12:13Z

Would it make sense to add some asserts to guarantee that nested or parallel stackwalks are not trying to store conflicting values into m_interpExec... fields?

I am actually going to do that differently, storing those in the stack walker state instead. The storage in the frame is a remnant of the first version, when I was using it while restoring the context after catch, and I think that keeping it in the stack walker state didn't work for some reason.

* Move the saved registers to the StackFrameIterator
* Fix MUSL build
* Add {Get/Set}{First/Second}ArgumentRegister to all architectures and use those in ExecuteFunctionBelowContext.

janvorli · 2025-04-30T13:47:22Z

@jkotas I believe I have addressed all of your feedback.

jkotas · 2025-04-30T23:01:43Z

src/coreclr/vm/codeman.cpp

+        EE_ILEXCEPTION_CLAUSE EHClause;
+        pJitMan->GetNextEHClause(&EnumState, &EHClause);
+
+        if (EHClause.HandlerStartPC <= relOffset && relOffset < EHClause.HandlerEndPC)


Does this work for filters?

I am not sure whether you wanted to push this into this PR since there is no matching logic in the JIT, so it is hard to tell whether it is correct.

Yes, it works for filters too. The interpreter compiler sets the HandlerEndPC even for filters.

I am not sure whether you wanted to push this into this PR since there is no matching logic in the JIT, so it is hard to tell whether it is correct.

Makes sense, let's make it part of Andrew's part of the change instead.

Hmm, actually, even the JIT sets the HandlerEndPC for filters.

The problem that I see is that the filter funclet is a separate funclet that is not included in the (HandlerStartPC, HandlerEndPC) range, with the JIT at least. I assume that you want to use the same encoding rules for the interpreter as well.

I have done a quick test.

try {
    Console.WriteLine("Try");
} catch (Exception e) when (args.Length > 10) {
    Console.WriteLine("Catch");
}

Produces

Notice that FilterOffset is not included in (HandlerStartPC, HandlerEndPC).

Ah, I am sorry, you are right, the ranges I am scanning contain the catch handler for the filter, not the filter itself. That's the same thing for the interpreter then.

We will need to figure out some other mechanism for getting the end address of filter funclets. The JIT uses unwind info stuff, which we don't have.

You can add a rule that the filter funclet must immediately precede the handler. (This rule would apply to interpreter JIT only.)

Thank you, that sounds like a reasonable solution.
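Under the rule proposed above, that a filter funclet immediately precedes its handler, the end of the filter can be taken to be the start of the handler. The sketch below shows a lookup that relies on that assumption. `ToyEHClause`, `FuncletRange` and `FindFunclet` are hypothetical; only the field names echo the ones mentioned in this thread, and this is not the code that was eventually merged.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical mirror of the clause fields named in the thread.
struct ToyEHClause {
    bool hasFilter;
    std::uint32_t filterOffset;
    std::uint32_t handlerStartPC;
    std::uint32_t handlerEndPC;
};

enum class FuncletKind { None, Filter, Handler };

struct FuncletRange {
    FuncletKind kind;
    std::uint32_t start;
    std::uint32_t end; // exclusive
};

// Finds the funclet containing relOffset, assuming each filter funclet
// ends exactly where its handler begins.
FuncletRange FindFunclet(const std::vector<ToyEHClause>& clauses,
                         std::uint32_t relOffset) {
    for (const ToyEHClause& c : clauses) {
        // The layout rule itself: a filter never starts after its handler.
        assert(!c.hasFilter || c.filterOffset <= c.handlerStartPC);
        if (c.handlerStartPC <= relOffset && relOffset < c.handlerEndPC)
            return FuncletRange{ FuncletKind::Handler, c.handlerStartPC, c.handlerEndPC };
        if (c.hasFilter && c.filterOffset <= relOffset && relOffset < c.handlerStartPC)
            return FuncletRange{ FuncletKind::Filter, c.filterOffset, c.handlerStartPC };
    }
    return FuncletRange{ FuncletKind::None, 0, 0 };
}
```

For example, with a filter at 0x20 and a handler spanning 0x30 to 0x50, offset 0x25 resolves to the filter funclet with range 0x20 to 0x30, with no unwind info needed to find where the filter ends.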

jkotas · 2025-05-01T00:14:28Z

The failures look related to the change.

janvorli · 2025-05-02T14:42:27Z

The failures look related to the change.

These were interesting. I was missing FEATURE_INTERPRETER definition for the managed code. The offset verification was passing, because that's done in native build, but the managed code was using wrong offsets. I'll push a fix in a minute.

This reverts commit d967861.

janvorli · 2025-05-02T15:00:20Z

I've fixed the issue causing the test failures and reverted the GetFuncletStartAddress change.

janvorli added the area-ExceptionHandling-coreclr label Apr 14, 2025

janvorli requested a review from cshung April 14, 2025 18:59

janvorli self-assigned this Apr 14, 2025

Copilot AI review requested due to automatic review settings April 14, 2025 18:59

janvorli requested review from BrzVlad and kg as code owners April 14, 2025 18:59

janvorli requested a review from jkotas April 14, 2025 19:00

Copilot AI reviewed Apr 14, 2025

janvorli force-pushed the add-interpreter-eh-support branch from 2a3de0d to 9e28998 Compare April 14, 2025 19:10

BrzVlad reviewed Apr 15, 2025

src/coreclr/vm/interpexec.cpp (outdated, resolved)

This was referenced Apr 15, 2025

slow macOS - "##[error]The job running on agent Azure Pipelines 9 ran longer than the maximum time of 60 minutes." dotnet/dnceng#1883

Open

The Operation will be canceled. The next steps may not contain expected logs. dotnet/dnceng#3008

Open

jkotas reviewed Apr 16, 2025

src/coreclr/vm/exceptionhandling.cpp (outdated, resolved)

BrzVlad reviewed Apr 17, 2025

janvorli added 7 commits April 28, 2025 21:17

Fix arm64 build

308b731

Fix few issues

048b32f

Fix non-windows amd64 builds

fe14067

ifdef-out setting m_SSP in the InterpreterFrame

Fix x86, arm and arm64 build break

86f31f3

Fix incorrect assert

be6fae0

We cannot try to get the code manager from a crawl frame when it is not on a frameless frame.

janvorli force-pushed the add-interpreter-eh-support branch from c5e9b79 to d0a342d Compare April 28, 2025 23:31

jkotas reviewed Apr 28, 2025

src/coreclr/vm/interpexec.cpp (outdated, resolved)

jkotas reviewed Apr 28, 2025

src/coreclr/vm/interpexec.cpp (outdated, resolved)

jkotas reviewed Apr 28, 2025

src/coreclr/vm/interpexec.cpp (outdated, resolved)

jkotas reviewed Apr 29, 2025

src/coreclr/vm/stackwalk.cpp (resolved)

BrzVlad reviewed Apr 29, 2025

src/coreclr/vm/interpexec.cpp (outdated, resolved)

Remove some unneeded stuff and fix Unix build

14e36a8

jkotas reviewed Apr 29, 2025

Move the saved InterpExecMethod context regs

d78f1df

* Move the saved registers to the StackFrameIterator
* Fix MUSL build
* Add {Get/Set}{First/Second}ArgumentRegister to all architectures and use those in ExecuteFunctionBelowContext.

janvorli added 4 commits April 30, 2025 22:37

Few fixes

e226678

Fix builds with disabled interpreter

145a52e

One more MUSL build fix

c560757

Implement proper GetFuncletStartAddress for interpreter

d967861

jkotas reviewed Apr 30, 2025

janvorli added 2 commits May 2, 2025 16:57

Fix problem with missing managed FEATURE_INTERPRETER define

6e5dbff

Revert "Implement proper GetFuncletStartAddress for interpreter"

9e1603a

This reverts commit d967861.

jkotas approved these changes May 2, 2025

janvorli merged commit cec44d6 into dotnet:main May 2, 2025
95 checks passed

github-actions bot locked and limited conversation to collaborators Jun 2, 2025
