Skip to content

Interpreter EH support in the runtime #114649

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
May 2, 2025

Conversation

janvorli
Copy link
Member

This change adds support for exception handling for cases when interpreter frames either throw an exception or the exception is propagated through them. It doesn't add all the exception handling support to the interpreter itself, there is a follow up PR that contains that code.

I had to reorder the code around handling thread abort when resuming after catch for the case when the resume occurs in the interpreted code. The resuming goes to the native context of the InterpExecMethod in that case, so we need to update the current context in the REGDISPLAY to point there before we extract the dwResumePC. The InterpreterFrame stores copies of the registers that we reuse for interpreted frames stack walking and restores them before resuming there and at the same time stores the interpreter PC and SP for the resuming is stored in the InterpreterFrame

@janvorli janvorli requested a review from cshung April 14, 2025 18:59
@janvorli janvorli self-assigned this Apr 14, 2025
@Copilot Copilot AI review requested due to automatic review settings April 14, 2025 18:59
@janvorli janvorli requested review from BrzVlad and kg as code owners April 14, 2025 18:59
@janvorli janvorli requested a review from jkotas April 14, 2025 19:00
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 11 out of 12 changed files in this pull request and generated 1 comment.

Files not reviewed (1)
  • src/coreclr/interpreter/intops.def: Language not supported

inline void SetFirstArgReg(CONTEXT *context, TADDR value)
{
LIMITED_METHOD_DAC_CONTRACT;
SetReg(context, 0, reg);
Copy link
Preview

Copilot AI Apr 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Undefined variable 'reg' is used in SetFirstArgReg; it should use the parameter 'value' instead.

Suggested change
SetReg(context, 0, reg);
SetReg(context, 0, value);

Copilot uses AI. Check for mistakes.

@janvorli janvorli force-pushed the add-interpreter-eh-support branch from 2a3de0d to 9e28998 Compare April 14, 2025 19:10

TADDR resumeSP;
TADDR resumeIP;
pInterpreterFrame->GetAndClearResumeContext(&resumeSP, &resumeIP);
Copy link
Member

@BrzVlad BrzVlad Apr 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we have all this throwing logic extracted in a separate method (like mono's interp_throw) ? There will be tons of places where we will throw exceptions as part of different opcodes, like null checks, ovf checks etc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect that exceptions from things like null checks are going to be thrown using COMPlusThrow, e.g. COMPlusThrow(kNullReferenceException), just like exceptions are thrown everywhere else in the VM. C++ exception is then going to caught by try/catch that's wraps the interpreter execution loop and sent for managed exception processing from there. @janvorli Does this sound right?

Copy link
Member

@BrzVlad BrzVlad Apr 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those scenarios are logically identical to INTOP_THROW so I would say they should be handled the same way. You could implement CEE_LDIND for example either as a single INTOP_LDIND opcode that does the null check as part of the opcode and throws the exception or as multiple granular opcodes: ldnull + bne; ldnull + throw; ldind_unsafe;. Also I'm not sure why we would want to throw an additional c++ exception when we could do the managed exception dispatch directly.

Also resuming the context from the c++ catch handler back inside the try (since execution might resume in those interpreter frames) seems dubious. It also seems unclear whether wasm would work in this configuration, given longjmp is probably implemented via an exception throw.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The interpret needs to able to deal with C++ exceptions thrown in the rest of the VM.. For example, https://github.com/dotnet/runtime/pull/114529/files#diff-13492d6b1897666f5322f30e5f2ae4a9b35f084ce84d7f78827541070cc57329R990 can throw C++ exception that needs to be converted to managed exception and handled as managed exception. How is it going to be done? Is there a viable alternative to resuming from the C++ catch handler?

I think that the exceptions like NullReferenceException thrown by the interpret are more like C++ exceptions thrown in the rest of the VM, so we may want to use the same mechanism for them. It follows what we do with regular JIT. For example, calling an interface method on null pointer can end up here:

COMPlusThrow(kNullReferenceException);
. We throw C++ exception that gets converted to managed NullReferenceException like any other C++ exception thrown by the VM. In theory, we can avoid the C++ exception and throw managed NullReferenceException directly, but it would be more complicated for no good reason.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way I imagined solving this problem would be to wrap calls into the runtime that might throw with a try/catch. For example:

try {
    pMD = pMD->GetMethodDescOfVirtualizedCode(pThisArg, pMD->GetMethodTable());
} catch (ex) {
    ex_managed = convert_to_managed (ex);
}
if (ex_managed) {
    ThrowManagedEx(ex_managed);
     // resume to the right interp frame here
}

And for all exceptions produced by interpreter code (that we know exactly), we would directly use ThrowManagedEx. Seems like this approach would be the simplest, I don't expect a ton of callsites where we would need to do this.

Copy link
Contributor

@cshung cshung Apr 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With respect to the current implementation (i.e. not the discussion with C++ try/catch), I have found a bug with that.

By the time we resume into the interpreter to execute the logic after the catch block, the right context to restore into is not the same as the context capture during throw.

In this example, things will fail.

try
{
  try
  {
    throw null;
  }
  catch
  {
    throw null;
  }
}
catch
{
}
// During the second resume, I am about to resume to here ...

During the second throw, the caller has the following stack (callee on top, caller at the bottom convention)

DispatchException
InterpExecMethod calling funclet    // The second throw captured this context
DispatchException
InterpExecMethod calling function

When the second catch resumes, we really shouldn't have the CallFunclet on the native stack.

Logically, the context to restore should be the context captured on the native frame that hosted with the catch, not with the throw.

@jkotas
Copy link
Member

jkotas commented Apr 22, 2025

The way I imagined solving this problem would be to wrap calls into the runtime that might throw with a try/catch.

it would be useful to demonstrate that this works on one callsite as part of this PR.

@cshung
Copy link
Contributor

cshung commented Apr 22, 2025

it would be useful to demonstrate that this works on one callsite as part of this PR.

I have been working with @janvorli on exception handling for the interpreter. On my side, building on top of this change, we know that resuming from InterpreterCodeManager::CallFunclet works in various cases, at least on Windows where I routinely work with.

main...cshung:runtime:public/interpreter-exception

Note, it is still a work in progress. There are known scenario that doesn't work yet.

@janvorli
Copy link
Member Author

it would be useful to demonstrate that this works on one callsite as part of this PR.

As Andrew said, without his changes, there is not much we can demonstrate here. With his changes, we can do that.

This change adds support for exception handling for cases when
interpreter frames participate in the process. That means when an
exception is thrown from an interpreter frame or when it is propagated
over interpreter frames. This doesn't add all the exception handling
support to the interpreter itself, there is a follow up PR that contains
that code.
ifdef-out setting m_SSP in the InterpreterFrame
We cannot try to get the code manager from a crawl frame when it is not
on a frameless frame.
This commit moves the resuming after catch to using native exception
handling instead of fragile context capturing, which was not correct
anyways.

It also adds handling of exceptions comming out of native runtime
methods called from the interpreter.
@janvorli janvorli force-pushed the add-interpreter-eh-support branch from c5e9b79 to d0a342d Compare April 28, 2025 23:31
@janvorli
Copy link
Member Author

I've just added a commit that moves the resuming after catch to a helper native exception propagation plan. When resuming in an interpreted frame, we virtually unwind from the current context upto the first native frame that is in the same block of native frames as the InterpExecMethod of the target frame and then throw the special exception from there.
I've also added cleanup of the localloc stuff during EH.
Besides that, I've added handling of exceptions stemming from native runtime calls from the interpreter.
I had to update the GCFrame so that when it is popped from the chain, it destructor doesn't do anything. To do that, I've changed its "next" pointer value indicating the end of list to the same plan as the Frame derived frames have. That means that NULL indicates the frame is not on the list at all.

@jkotas
Copy link
Member

jkotas commented Apr 29, 2025

I had to update the GCFrame so that when it is popped from the chain, it destructor doesn't do anything.

Why do we need that? I would expect that the core interpreter won't need to call GC protect, and any (throwing) calls to the VM would need to be behind unwind-and-continue wrapper (like Vlad suggested in #114649 (comment)) that that will do natural C++ uwnwind.

@janvorli
Copy link
Member Author

Why do we need that? I would expect that the core interpreter won't need to call GC protect, and any (throwing) calls to the VM would need to be behind unwind-and-continue wrapper (like Vlad suggested in #114649 (comment)) that that will do natural C++ uwnwind.

@jkotas this is related to the lengthy comment above related to your question on the INSTALL_RESUME_AFTER_CATCH_HANDLER . This state of the change doesn't need it, it was another thing that I've moved accidentally to this change, as it was needed for the case when we do the "full" propagation over all intermediate native frames. However, it will be needed for WASM, because in that case, the exception will go through the DispatchManagedException and that one has GC_PROTECT stuff in it around the call to the managed EH code.

@jkotas
Copy link
Member

jkotas commented Apr 29, 2025

However, it will be needed for WASM, because in that case, the exception will go through the DispatchManagedException and that one has GC_PROTECT stuff in it around the call to the managed EH code.

I would expect WASM exception handling to call the C++ destructor naturally whenever it unwinds a C++ frame. Is it not going to be the case?


switch (*ip)
INSTALL_MANAGED_EXCEPTION_DISPATCHER;
INSTALL_UNWIND_AND_CONTINUE_HANDLER;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is INSTALL_UNWIND_AND_CONTINUE_HANDLER going to break exception filter semantics when we have mix of interpreted and JIIT/AOT frames on the stack?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it works fine with filter. I do not understand why we need INSTALL_UNWIND_AND_CONTINUE_HANDLER here. What is going to break if INSTALL_UNWIND_AND_CONTINUE_HANDLER is deleted here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's for handling exceptions coming out of calls to native runtime. The INSTALL_MANAGED_EXCEPTION_DISPATCHER does nothing for Windows. Without it, those exceptions would just flow through.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense. This means that the interpreter should be able to use regular COMPlusThrow to throw exceptions like I originally assumed in #114649 (comment)

@janvorli
Copy link
Member Author

I would expect WASM exception handling to call the C++ destructor naturally whenever it unwinds a C++ frame. Is it not going to be the case?

Yes and that's why that change was added. Without it, the destructor of the GCFrame would try to pop it out of the frame list and it would assert, because it would find the frame is not the first on the list. The other way to solve that would be to not to pop the GCFrames upfront when resuming after catch and let the destructor pop them. However, that would make it different from when those frames are popped on WASM and non-WASM and I'd prefer this to be uniform.

@jkotas
Copy link
Member

jkotas commented Apr 29, 2025

We have introduced a dedicated list for GC protect a while ago since popping them out-of-line as part Frame list was a mess with a lot of problems. It sounds like we are recreating the problem again. Is it possible to make this work without touching how GC protect frames work?

@janvorli
Copy link
Member Author

I am not sure I understand. This change ensures that it works the same way with and without the interpreter. The frames with the GC protects are virtually dead.

@jkotas
Copy link
Member

jkotas commented Apr 29, 2025

I am not sure I understand. This change ensures that it works the same way with and without the interpreter. The frames with the GC protects are virtually dead.

Ok, I have missed that we have special handling for GCFrames in PopExplicitFrames. My mental model for GCFrames was that they are ordinary C++ holders with destructors called by C++ runtime, but it is not actually the case.

Could you please add a comment to PopExplicitFrames about the GCFrames that require us to do the pop them explicitly? I assume that there are a very few of those, most of them are on the ordinary C++ holder plan.


switch (*ip)
INSTALL_MANAGED_EXCEPTION_DISPATCHER;
INSTALL_UNWIND_AND_CONTINUE_HANDLER;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense. This means that the interpreter should be able to use regular COMPlusThrow to throw exceptions like I originally assumed in #114649 (comment)

@janvorli
Copy link
Member Author

Make sense. This means that the interpreter should be able to use regular COMPlusThrow to throw exceptions like I originally assumed in #114649 (comment)

Yes, I've verified that it works already.

@janvorli
Copy link
Member Author

Would it make sense to add some asserts to guarantee that nested or parallel stackwalks are not trying to store conflicting values into m_interpExec... fields?

I am actually going to do that differently, storing those in the stack walker state instead. The storage in the frame is a remainder of the first version when I was using that during restoring context after catch and I think that keeping it in that state didn't work for some reason.

* Move the saved registers to the StackFrameIterator
* Fix MUSL build
* Add {Get/Set}{First/Second}ArgumentRegister to all architectures and use
  those in ExecuteFunctionBelowContext.
@janvorli
Copy link
Member Author

@jkotas I believe I have addressed all of your feedback.

EE_ILEXCEPTION_CLAUSE EHClause;
pJitMan->GetNextEHClause(&EnumState, &EHClause);

if (EHClause.HandlerStartPC <= relOffset && relOffset < EHClause.HandlerEndPC)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this work for filters?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure whether you wanted to push this into this PR since there is no matching logic in the JIT, so it is hard to tell whether it is correct.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it works for filters too. The interpreter compiler sets the HandlerEndPC even for filters.

I am not sure whether you wanted to push this into this PR since there is no matching logic in the JIT, so it is hard to tell whether it is correct.

Makes sense, let's make it part of Andrew's part of the change instead.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, actually, even JIT sets the HandlerEndPC for filters

Copy link
Member

@jkotas jkotas May 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem that I see is that the filter funclet is a separate funclet that is not included in the (HandlerStartPC, HandledEndPC) range, with the JIT at least. I assume that you want to use the same encoding rules for interpreter as well.

I have done a quick test.

try
{
    Console.WriteLine("Try");
}
catch (Exception e) when (args.Length > 10)
{
    Console.WriteLine("Catch");
}

Produces
image
Notice that FilterOffset is not included in (HandlerStartPC, HandledEndPC).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I am sorry, you are right, the ranges I am scanning contain the catch handler for the filter, not the filter itself. That's the same thing for the interpreter then.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will need to figure out some other mechanism for getting the end address of filter funclets. The JIT uses unwind info stuff, which we don't have.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can add a rule that the filter funclet must immediately precede the handler. (This rule would apply to interpreter JIT only.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, that sounds like a reasonable solution.

@jkotas
Copy link
Member

jkotas commented May 1, 2025

The failures look related to the change.

@janvorli
Copy link
Member Author

janvorli commented May 2, 2025

The failures look related to the change.

These were interesting. I was missing FEATURE_INTERPRETER definition for the managed code. The offset verification was passing, because that's done in native build, but the managed code was using wrong offsets. I'll push a fix in a minute.

@janvorli
Copy link
Member Author

janvorli commented May 2, 2025

I've fixed the issue causing the tests failures and reverted the GetFuncletStartAddress change

@janvorli janvorli merged commit cec44d6 into dotnet:main May 2, 2025
95 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants