Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow ignoring unhandled exceptions in UnhandledException event handler #42275

Closed
jkotas opened this issue Sep 15, 2020 · 89 comments
Closed

Allow ignoring unhandled exceptions in UnhandledException event handler #42275

jkotas opened this issue Sep 15, 2020 · 89 comments
Assignees
Labels
api-needs-work API needs work before it is approved, it is NOT ready for implementation area-Host User Story A single user-facing feature. Can be grouped under an epic.
Milestone

Comments

@jkotas
Copy link
Member

jkotas commented Sep 15, 2020

Background and Motivation

Scenarios like designers or REPLs that host user provided code are not able to handle unhandled exceptions thrown by the user provided code. Unhandled exceptions on finalizer thread, threadpool threads or user created threads will take down the whole process. This is not desirable experience for these type of scenarios.

The discussion that lead to this proposal is in #39587

Proposed API

Allow ignoring unhandled exceptions on threads created by the runtime from managed UnhandledException handler:

 namespace System
 {
     public class UnhandledExceptionEventArgs
     {
         // Existing property. Always true in .NET Core today. The behavior will be changed to:
+        // - `false` for exceptions that can be ignored (ie thread was created by the runtime)
+        // - `true` for exceptions that cannot be ignored (ie foreign thread or other situations when it is not reasonably possible to continue execution)
         // This behavior is close to .NET Framework 1.x behavior of this property.
         public bool IsTerminating { get; }

+        // The default value is false. The event handlers can set it to true to make
+        // runtime ignore the exception. It has effect only when IsTerminating is false.
+        // The documentation will come with usual disclaimer for bad consequences of ignoring exceptions
+        public bool Ignore { get; set; }
     }
 }

Usage Examples

AppDomain.CurrentDomain.UnhandledException += (sender, e) =>
{
    if (DesignMode && !e.IsTerminating)
    {
        DisplayException(e.ExceptionObject);
        e.Ignore = true;
    }
};

Alternative Designs

Unmanaged hosting API that enables this behavior. (CoreCLR has poorly documented and poorly tested configuration option for this today.)

Similar prior art:

Risks

This API can be abused to ignore unhandled exceptions in scenarios where it is not warranted.

@jkotas jkotas added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Sep 15, 2020
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the untriaged New issue has not been triaged by the area owner label Sep 15, 2020
@Dotnet-GitSync-Bot
Copy link
Collaborator

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@jkotas jkotas added area-ExceptionHandling-coreclr area-System.Runtime api-ready-for-review API is ready for review, it is NOT ready for implementation and removed area-ExceptionHandling-coreclr api-ready-for-review API is ready for review, it is NOT ready for implementation api-suggestion Early API idea and discussion, it is NOT ready for implementation labels Sep 15, 2020
@danmoseley
Copy link
Member

Is “Ignorable” a potential alternative to “ IsTerminating” name? I find it a little confusing

@jkotas
Copy link
Member Author

jkotas commented Sep 16, 2020

IsTerminating is an existing property. We can certainly leave the existing property alone and introduce a new one with a better name.

@danmoseley
Copy link
Member

Ah of course

@terrajobst terrajobst removed the untriaged New issue has not been triaged by the area owner label Sep 17, 2020
@terrajobst
Copy link
Member

terrajobst commented Sep 22, 2020

Video

  • Looks good as proposed
namespace System
{
    public partial class UnhandledExceptionEventArgs : EventArgs
    {
        // Existing property.
        // public bool IsTerminating { get; }
        
        public bool Ignore { get; set; }
    }
}

@terrajobst terrajobst added api-approved API was approved in API review, it can be implemented and removed api-ready-for-review API is ready for review, it is NOT ready for implementation labels Sep 22, 2020
@cheverdyukv
Copy link

Is it possible to know when it will be implemented? .NET 5.0 or perhaps in one of the updates. Or perhaps .NET 6.0?

@jkotas
Copy link
Member Author

jkotas commented Oct 16, 2020

.NET 5.0 is done. We only ship critical bug fixes in servicing updates, no new features.

I am not sure whether the core .NET team will get to work on this in .NET 6.

Would you be interested in contributing the implementation yourself?

@cheverdyukv
Copy link

@jkotas After checking source code for quite some time, I feel that it should be done by someone really familiar with exception handling for different platforms. Let me explain why.

I found that C++ code function InternalUnhandledExceptionFilter_Worker is core of exception handling. Looks like this SEH code and as result it is Windows specific. I have no idea where is code for other platforms and I have no idea how exceptions handled there. I also found that Mono has different way to handle this. I don't know about Blazor and how it works there.

But even for Windows how is it possible to know what should I set for IsTerminating for different type of exceptions? I checked that Stack Overload does not call UnhandledException but Out Of Memory does. I feel like IsTerminating should be set to false for all types of exceptions because to me looks like they all could be suppressed using that obsolete flag. Perhaps it will be better to invite here somebody from team who knows more about exceptions to help us decide?

From what I read in comments this problem is definitely not easy to solve and has a lot of tricks and caveats . Perhaps it will be easier to revert to original proposition with passing some flag during host creation. That code is already there and it is already able to suppress exceptions.

Next problem that certain code rely on IsTerminating. For example:

src\libraries\System.Data.OleDb\src\System\Data\ProviderBase\DbConnectionPoolCounters.cs
src\libraries\System.Runtime.Caching\src\System\Runtime\Caching\MemoryCache.cs

And perhaps there could be some 3rd party code that could rely on this as well.

As you can see there are calls to Dispose when process is terminating.
But now IsTerminating will be false for most cases but application will still be terminating if nobody set Ignore to true. As result certain code could break. Probability is low, but it could happen.

@jkotas
Copy link
Member Author

jkotas commented Oct 19, 2020

I feel like IsTerminating should be set to false for all types of exceptions because to me looks like they all could be suppressed using that obsolete flag.

One specific case that cannot be possible suppressed are the unhandled exception on a foreign threads on Unix.

The existing obsolete hosting flag has zero testing. It is likely that it does not behave correctly in some cases.

Perhaps it will be better to invite here somebody from team who knows more about exceptions to help us decide?

cc @janvorli

original proposition with passing some flag during host creation

It would not make the problem with getting this right any easier.

Next problem that certain code rely on IsTerminating.

Good point. This will need careful thought.

@cheverdyukv
Copy link

cheverdyukv commented Oct 22, 2020

One specific case that cannot be possible suppressed are the unhandled exception on a foreign threads on Unix.

Do you know best way to test it? I have feeling that these will not call unhandled exception at all including the same on Windows. For example in .NET Framework when some thread called .NET and there is exception that is unhandled, then it was passed to that environment as exception and that could be treated normally using standard SEH.

The existing obsolete hosting flag has zero testing. It is likely that it does not behave correctly in some cases.

I did check code and it looks like same (or quite similar) flag existed in .NET Framework. But I never used that flag and not sure how will it behave.

It would not make the problem with getting this right any easier.

Well I still working thru that code, but a have feeling that everything is done already :) except setting this flag of course. I just don't have time to do proper testing.

@darkguy2008
Copy link

It seems this issue is also blocking an important plugin for Unreal Engine ( nxrighthere/UnrealCLR#33 ) while this was moved to the Future milestone, I'd like to push this to be taken into consideration for .NET 6 or even .NET 7. It's kinda discouraging to see it stuck in that milestone that only god-knows-when will be done.

Considering the positive impact that project is making in the UE community as a whole, I think it's important to look at.

Excluding the UE topic, this issue is pretty valid to me in some apps I've developed in the past, too. Haven't needed it, but it would've been a good addition to some workarounds I've had to implement.

@danmoseley
Copy link
Member

@jkotas should this be labeled up for grabs?

@jkotas jkotas added the help wanted [up-for-grabs] Good issue for external contributors label Mar 8, 2022
@jkotas
Copy link
Member Author

jkotas commented Apr 22, 2024

int32_t error_code

What kind of error_code is this?

Cast to PEXCEPTION_POINTERS on Windows or ucontext_t* on non-Windows.

EXCEPTION_POINTERS is bundle of EXCEPTION_RECORD and CONTEXT. It corresponds to siginfo_t* and ucontext_t*. We may want to split it into two fields like this:

struct ExceptionData
{
...
    void* info; // Cast to PEXCEPTION_RECORD on Windows or siginfo_t* on non-Windows.
    void* ucontext; // Cast to PCONTEXT on Windows or ucontext_t* on non-Windows.
...
};

ExceptionData

We should call it FatalErrorData (or FatalErrorInfo) to match SetFatalErrorHandler.

@VSadov
Copy link
Member

VSadov commented Apr 23, 2024

int32_t error_code

What kind of error_code is this?

The error codes are from:
C:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\Include\um\CorError.h

or, in PAL:
https://github.com/dotnet/runtime/blob/2e75aac8dbbae5420a93f6fbe9568812dbeb3472/src/coreclr/pal/prebuilt/inc/corerror.h

Ex: COR_E_FAILFAST, COR_E_STACKOVERFLOW

@jkotas
Copy link
Member Author

jkotas commented Apr 23, 2024

I would call it int32_t hresult then (to match Exception.HResult that is the same kind of error code).

@VSadov
Copy link
Member

VSadov commented Apr 23, 2024

What do we do with strings, exception strings in particular?

if hostfxr as an example, the strings should be const pal::char_t*

         const pal::char_t* argExceptionString=NULL,         // whatever exception.ToString() produced.
         const pal::char_t* message=NULL,                    // Ex: "SSE and SSE2 processor support required."
         const pal::char_t* detailMessage=NULL               // detailMessage that is passed to Assert(condition, message, detailMessage)
         const pal::char_t* errorSource=NULL                 // Ex: "Assertion Failed"

The most contentious is argExceptionString:

  • is it worth including?
  • do we want stack trace in addition?
  • for the stack trace - should it be one string (whatever Exception.StackTrace returns) or array of strings, or some kind of iterator?

I think we can only provide the stack when this was an unhandled managed exception and only as much as we can get from new StackTrace (Exception e).

At one extreme we could capture a snapshot of the stack and build entire infrastructure similar to StackTrace/StackFrame on top of that (modulo System.Reflection.MethodBase GetMethod()).
On the other hand we can say that what StackTrace.ToString produces is parseable enough and give that out as one string.

@jkotas
Copy link
Member Author

jkotas commented Apr 23, 2024

if hostfxr as an example, the strings should be const pal::char_t*

Yeah, char16_t vs char_t is pain to deal with. char_t is better for portability, but it may require conversions. Do you have an opinion?

The most contentious is argExceptionString:
is it worth including?

I think it depends on the handler. Some handlers may want to get more detailed information (even if it means that secondary crashes can occur while gathering this extra information), some may want to get minimal information, some may want to get more detailed information only for certain types of crashes. It makes me think that we may want to have some sort of callback-based pattern to optionally gather the more detailed information. If we do that, we can start with something very minimal and evolve it based on feedback.

@VSadov
Copy link
Member

VSadov commented Apr 23, 2024

Yeah, char16_t vs char_t is pain to deal with. char_t is better for portability, but it may require conversions. Do you have an opinion?

I'd prefer char_t. For portability reasons and similarity with hosting APIs.

Unlike old APIs influenced by COM and Win32, hostfxr is relatively new and some thinking about portability must have been done. I am not aware of const char_t* in its APIs causing any friction.
Crash handler is not exactly a hosting component, but in a way it is related to app's life cycle. - Another reason I looked at hosting for API shape examples.

@AaronRobinsonMSFT
Copy link
Member

I'd prefer char_t. For portability reasons and similarity with hosting APIs.

I don't think this is a good idea. The hosting APIs have very different UX and rules. The hosting APIs are expecting strings that are created by the user and provided. This API is entirely in the domain of runtime provided strings, so we should not introduce any additional munging in that space and provide what we have. char16_t is far more preferable to yet another set of PAL definitions that users will need to manage.

@VSadov
Copy link
Member

VSadov commented Apr 23, 2024

I don't think this is a good idea. The hosting APIs have very different UX and rules. The hosting APIs are expecting strings that are created by the user and provided. This API is entirely in the domain of runtime provided strings, so we should not introduce any additional munging in that space and provide what we have. char16_t is far more preferable to yet another set of PAL definitions that users will need to manage.

Makes sense. I did notice that unlike hosting, the strings go in a different direction (from runtime to the handler).

By char16_t - do you mean on both Windows and Unix? Any thoughts about encoding?

@AaronRobinsonMSFT
Copy link
Member

By char16_t - do you mean on both Windows and Unix? Any thoughts about encoding?

My preference would be keep them in UTF-16. If we wanted to marshal them to some canonical form on a per OS basis, then I think we should go with our previous conversation and use void* and document what users should cast the pointer to. My main quibble here would be defining any PAL sounding types/APIs for such a narrow and low-level feature.

@VSadov
Copy link
Member

VSadov commented Apr 23, 2024

The most contentious is argExceptionString:
is it worth including?

I think it depends on the handler. Some handlers may want to get more detailed information (even if it means that secondary crashes can occur while gathering this extra information), some may want to get minimal information, some may want to get more detailed information only for certain types of crashes. It makes me think that we may want to have some sort of callback-based pattern to optionally gather the more detailed information. If we do that, we can start with something very minimal and evolve it based on feedback.

I think the managed stack trace is important and hard to get in other ways, we should add it, and should add it separately from argExceptionString even if it would often overlap, because argExceptionString comes from ToString() and could be truncated or overriden to not have stack in it.

I would produce the stacktrace in just one string though. Any kind of iterator or object model on top of that could be overthinking. It would still be in some textual form, just chunked up.
So maybe more structural stack info could come in the future improvements, if there is a need.

@jkotas
Copy link
Member Author

jkotas commented Apr 23, 2024

I think the managed stacktrace needs to be optional and only computed if the handler wants it given that computing it is a very complex operation. I do not have a strong opinion about how it should be formatted.

hard to get in other ways

This may change as we make progress on cDAC.

@VSadov
Copy link
Member

VSadov commented Apr 23, 2024

I think the managed stacktrace needs to be optional and only computed if the handler wants it given that computing it is a very complex operation. I do not have a strong opinion about how it should be formatted.

Would that apply to ex.ToString() as well?

That is what we currently do in QCALLTYPE Environment_FailFast where we still can run managed code. After that it comes to the EEPolicy::HandleFatalError in a form of a string for which we've already paid.

@jkotas
Copy link
Member Author

jkotas commented Apr 23, 2024

If the fail-fast is originating as Environment.FailFast, I do not have concerns about collecting the stacktrace upfront like we do that today.

I have concerns about computing the stacktrace automatically for failfasts that are e.g. crashes in the middle of GC. We do print the managed stacktrace to console for these today, but it is not clear whether it is a good idea. We have seen situations where it caused secondary crashes and made the original crash undiagnosable.

@VSadov
Copy link
Member

VSadov commented Apr 23, 2024

Right. A managed stack is safe to get if it was a managed unhandled exception, not a random corruption. And even that may need to exclude cases like OOM.
Since Environment_FailFast covers the "safe" unhandled managed case anyways, if we get argExceptionString from it, we will pass it to the user, but otherwise can avoid any additional complex calls.

@VSadov
Copy link
Member

VSadov commented Apr 24, 2024

The latest iteration that should capture all the suggestions so far. Let me know if I missed/misunderstood something or there are some additional thoughts.

The API definition:

Managed API to set up the handler

public static class ExceptionHandling
{
    /// <summary>
    /// .NET runtime is going to call `fatalErrorHandler` set by this method before its own
    /// fatal error handling (creating .NET runtime-specific crash dump, etc.).
    /// </summary>
    /// <exception cref="ArgumentNullException">If fatalErrorHandler is null</exception>
    /// <exception cref="InvalidOperationException">If a handler is already set</exception>
    public static void SetFatalErrorHandler(delegate* unmanaged<uint, void*, int> fatalErrorHandler);
}

The shape of the FatalErrorHandler, if implemented in c++
(the default calling convention for the given platform is used)

// expected signature of the handler
FatalErrorHandlerResult FatalErrorHandler(int32_t hresult, struct FatalErrorInfo* data);

With FatalErrorHandlerResult and FatalErrorInfo defined in "TBD.h" as:

enum FatalErrorHandlerResult : int32_t
{
    RunDefaultHandler = 0,
    SkipDefaultHandler = 1,
};

struct FatalErrorInfo
{
    size_t size;    // size of the FatalErrorInfo instance
    void*  address; // code location correlated with the failure (i.e. location where FailFast was called)

    // exception/signal information, if available
    void* info;     // Cast to PEXCEPTION_RECORD on Windows or siginfo_t* on non-Windows.
    void* context;  // Cast to PCONTEXT on Windows or ucontext_t* on non-Windows.

    // UTF-16 strings with additional information as appropriate
    char16_t* errorSource;     // example: "Assertion Failed"
    char16_t* message;         // error message. For example the one passed to "FailFast(message)"
    char16_t* detailMessage;   // detailMessage that was passed to "Assert(condition, message, detailMessage)"
    char16_t* exceptionString; // for unhandled managed exceptions the result of "ex.ToString()"
};

Typical use:

Setting up the handler for the app (C# code in the actual app):

internal unsafe class Program
{
    [DllImport("myCustomCrashHandler.dll")]
    public static extern delegate* unmanaged<uint, void*, int> GetFatalErrorHandler();

    static void Main(string[] args)
    {
        ExceptionHandling.SetFatalErrorHandler(GetFatalErrorHandler());

        RunMyProgram();
    }
}

The handler. (c++ in myCustomCrashHandler.dll)

#include "TBD.h"

static FatalErrorHandlerResult FatalErrorHandler(int32_t hresult, struct FatalErrorInfo* data)
{
    // this is a special handler that analyzes OOM crashes
    if (hresult != COR_E_OUTOFMEMORY)
    {
        return FatalErrorHandlerResult::RunDefaultHandler;
    }

    DoSomeCustomProcessingOfOOM(data);

    // no need for a huge crash dump after an OOM.
    return FatalErrorHandlerResult::SkipDefaultHandler;
}

extern "C" DLL_EXPORT void* GetFatalErrorHandler()
{
    return &FatalErrorHandler;
}

@jkotas
Copy link
Member Author

jkotas commented Apr 24, 2024

    [UnmanagedCallersOnly]
    [DllImport("myCustomCrashHandler.dll")]

I do not think we allow a single method to be both UnmanagedCallersOnly and DllImport. I should be more like this:

    [DllImport("myCustomCrashHandler.dll")]
    public static extern delegate* unmanaged<...> GetFatalErrorHandler();

// returns false if default handler is not desired.

I think this can use clarification. One possible behavior:

  • If this returns true, we will proceed with the default processing - print exception to console and invoke Watson/CrashDump.
  • If this returns false, we will just call TeminateProcess/abort without doing anything else.

Is it what you meant?

We may want to make it an enum to allow extending the possible actions in future. It would also avoid the need for x-plat definition of BOOL.

__cdecl

The explicit __cdecl is not in the managed signature. I would avoid the explicit calling convention, and just say that it is the platform default.

struct ExceptionData

Nit: This should be FatalErrorInfo.

@VSadov
Copy link
Member

VSadov commented Apr 24, 2024

Yes to all the above.
I will update the prototype in place.

@VSadov
Copy link
Member

VSadov commented Apr 24, 2024

I do not think we allow a single method to be both UnmanagedCallersOnly and DllImport.

When I try that without UnmanagedCallersOnly, I get a compile error when taking an address. (&)

Error	CS8786	Calling convention of 'Program.FatalErrorHandler(uint, void*)' is not compatible with 'Unmanaged'.	

With the attribute it compiles.
At least C# thinks it is needed for this scenario. (I did not check what it emits)

@jkotas
Copy link
Member Author

jkotas commented Apr 24, 2024

Try to run:

using System.Runtime.InteropServices;

unsafe
{
    delegate* unmanaged<uint, void*, int> p = &FatalErrorHandler;
    p(0, null);

    [UnmanagedCallersOnly]
    [DllImport("myCustomCrashHandler.dll")]
    static extern int FatalErrorHandler(uint exitCode, void* fatalErrorHandler);
}

You will get: "Unhandled exception. System.NotSupportedException: Method's type signature is not PInvoke compatible."

@VSadov
Copy link
Member

VSadov commented Apr 24, 2024

You will get: "Unhandled exception. System.NotSupportedException: Method's type signature is not PInvoke compatible."

I saw that, but thought it is at the call site failure - since the caller is not unmanaged.
That is just an example of use though. We can come up with something else. There must be a way to fetch a pointer to a native function from managed code. Perhaps more than one.

Although I kind of wonder if we need to encode the native method signature in the API, or whether it is a function at all.

public static void SetFatalErrorHandler(delegate* unmanaged<uint, void*, int> fatalErrorHandler);

vs.

public static void SetFatalErrorHandler(void* fatalErrorHandler);

I am thinking back and forth about this.

We can assign a specific delegate type to the handler, but not sure if it brings a lot of benefits. There is void* in the parameters anyways, so it is already weakly typed.

@jkotas
Copy link
Member Author

jkotas commented Apr 24, 2024

There must be a way to fetch a pointer to a native function from managed code. Perhaps more than one.

  • Create a helper method in the native library that returns the function pointer, DllImport the helper method, and call it to get the function pointer. I think this is the best option.
  • NativeLibrary.Load + NativeLibrary.GetExport. It does not work with static linking

Although I kind of wonder if we need to encode the native method signature in the API, or whether it is a function at all.

https://learn.microsoft.com/en-us/dotnet/api/system.runtime.interopservices.objectivec.objectivecmarshal.initialize?view=net-8.0 is recently added API with similar shape. It has the function pointers in the signature.

@VSadov
Copy link
Member

VSadov commented Apr 24, 2024

I have applied all the suggestions.
I think this is ready to be put in one proposal (together with managed handler part) and send to API review.

@jkotas
Copy link
Member Author

jkotas commented Apr 24, 2024

LGTM

Nit: The example has GetFatalErrorHandler on the managed side, but it does not show it on the unmanaged side.

@VSadov
Copy link
Member

VSadov commented Apr 25, 2024

Nit: The example has GetFatalErrorHandler on the managed side, but it does not show it on the unmanaged side.

And with GetFatalErrorHandler pattern the handler itself does not need to be DLL_EXPORT

@VSadov
Copy link
Member

VSadov commented Apr 25, 2024

I think we want to put FatalErrorHandlerResult and FatalErrorInfo in a header file. Any existing files this that can go into?

@jkotas
Copy link
Member Author

jkotas commented Apr 25, 2024

I would create a new small header file for this. It can live in https://github.com/dotnet/runtime/tree/main/src/native/public. I would not worry about shipping it - just tell people to copy it over from the repo if they need it.

(We have header files for debugger, profiler or hosting. Each of these are handled differently, there is not much consistency.)

@VSadov
Copy link
Member

VSadov commented Apr 25, 2024

The combined proposal is at #101560

@jkotas @AaronRobinsonMSFT - huge thanks for helping with this!!

@jkotas
Copy link
Member Author

jkotas commented Apr 25, 2024

Superseded by #101560

@jkotas jkotas closed this as completed Apr 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-needs-work API needs work before it is approved, it is NOT ready for implementation area-Host User Story A single user-facing feature. Can be grouped under an epic.
Projects
Status: No status
Development

No branches or pull requests