Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FatalExecutionEngineError debugging and FailFast #1137

Closed
johnsimons opened this issue Apr 29, 2013 · 13 comments
Closed

FatalExecutionEngineError debugging and FailFast #1137

johnsimons opened this issue Apr 29, 2013 · 13 comments
Milestone

Comments

@johnsimons
Copy link
Member

This issue was raised on the mailing list, see http://tech.groups.yahoo.com/group/nservicebus/message/18755

The problem is that if there is a configuration/startup exception in the NServiceBus host, the error/stacktrace of the root exception is not shown.

The code that is problematic is this (copied here for convenience):

try
{
    PerformConfiguration();

    bus = Configure.Instance.CreateBus();
    if (bus != null && !SettingsHolder.Get<bool>("Endpoint.SendOnly"))
    {
        bus.Start();
    }

    configManager.Startup();
    wcfManager.Startup();
}
catch (Exception ex)
{
    //we log the error here in order to avoid issues with non serializable exceptions
    //going across the appdomain back to topshelf
     LogManager.GetLogger(typeof(GenericHost)).Fatal("Exception when starting endpoint.", ex);
     Environment.FailFast("Exception when starting endpoint.", ex);
}

The problem is we assume that LogManager has already been configured but that may not be correct because for example:

public class LoggingConfiguration : IConfigureLoggingForProfile<Production>
{
    public void Configure(IConfigureThisEndpoint specifier)
    {
        SetLoggingLibrary.NLog();

        throw new Exception("Foo boo aaa...");

        NLog.Config.SimpleConfigurator.ConfigureForTargetLogging(new NLog.Targets.ColoredConsoleTarget());
    }
}

What would be the state of the logger?

To address this issue I recommend we do a Console.Out as well.
So the code would end up like:

catch (Exception ex)
{
    //we log the error here in order to avoid issues with non serializable exceptions
    //going across the appdomain back to topshelf
    LogManager.GetLogger(typeof(GenericHost)).Fatal("Exception when starting endpoint.", ex);

    Console.Out.WriteLine("Exception when starting endpoint: {0}", ex);

    if (Environment.UserInteractive)
    {
        Thread.Sleep(10000); // so that user can see on their screen the problem
    }

    Environment.FailFast("Exception when starting endpoint.", ex);
}

The Environment.FailFast is still required because it logs to the Event Viewer as a backup in case neither of the above worked.

@SimonCropp
Copy link
Contributor

can we just clear up one point to avoid confusion

The problem is that if there is a configuration/startup exception in the NServiceBus host, the error/stacktrace of the root exception is not shown.

the error+stacktrace of the root exception is not shown in the configured log because of an error in config.
However the error+stacktrace does show up in the event log

@jerrosenberg
Copy link
Contributor

@SimonCropp Apparently not when debugging in the IDE. I get no entry at all, let alone a stack trace.

When running without debugging, I do get 3 entries in the log, but the original stack is lost. Perhaps it's in the crash dump, but not going there :) Here are the entries:

  1. Error
    Application: NServiceBus.Host.exe
    Framework Version: v4.0.30319
    Description: The application requested process termination through System.Environment.FailFast(string message).
    Message: Exception when starting endpoint.
    Stack:
    at System.Environment.FailFast(System.String, System.Exception)
    at NServiceBus.Hosting.GenericHost.Start()
    at Magnum.StateMachine.EventActionList1[[System.__Canon, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]].Execute(System.__Canon, Magnum.StateMachine.Event, System.Object) at Magnum.StateMachine.State1[[System.__Canon, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]].RaiseEvent(System.__Canon, Magnum.StateMachine.BasicEvent1<System.__Canon>, System.Object) at Magnum.StateMachine.StateMachine1[[System.__Canon, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]].RaiseEvent(Magnum.StateMachine.Event)
    at Topshelf.Internal.ServiceControllerProxy.Start()
    at Topshelf.Internal.ServiceCoordinator.Start()
    at Topshelf.Internal.Hosts.ConsoleHost.Run()
    at NServiceBus.Hosting.Windows.Program.Main(System.String[])
  2. Error
    Faulting application name: NServiceBus.Host.exe, version: 4.0.0.2, time stamp: 0x5177f8a4
    Faulting module name: unknown, version: 0.0.0.0, time stamp: 0x00000000
    Exception code: 0x80131623
    Fault offset: 0x000007f842e1aa78
    Faulting process id: 0x73c0
    Faulting application start time: 0x01ce44f308f4be38
    Faulting application path: C:\projects.net\Alerts\Mailer\bin\Debug\NServiceBus.Host.exe
    Faulting module path: unknown
    Report Id: 477ab846-b0e6-11e2-be80-6805ca08681d
    Faulting package full name:
    Faulting package-relative application ID:
  3. Information
    Fault bucket , type 0
    Event Name: CLR20r3
    Response: Not available
    Cab Id: 0
        
    Problem signature:
    P1: nservicebus.host.exe
    P2: 4.0.0.2
    P3: 5177f8a4
    P4: ema.Alerts.Tracing
    P5: 1.0.0.0
    P6: 517e98db
    P7: 1
    P8: b
    P9: System.NullReferenceException
    P10:
        
    Attached files:
        
    These files may be available here:
    C:\ProgramData\Microsoft\Windows\WER\ReportArchive\AppCrash_nservicebus.host_72a4b9949c8a957786a2a501e8ce07654f3dd98_672b10db
        
    Analysis symbol:
    Rechecking for solution: 0
    Report Id: 477ab846-b0e6-11e2-be80-6805ca08681d
    Report Status: 2048
    Hashed bucket:

@johnsimons
Copy link
Member Author

@jerrosenberg you not alone there!
I have exactly the same problem, no stacktrace in event log!
And I'm 100% sure this worked before! I wonder if is another v4.5 bug (I mean feature!)

@SimonCropp
Copy link
Contributor

@jerrosenberg i stand corrected. With the debugger i have noticed some qwerks in what gets loged in the eventlog

@jerrosenberg
Copy link
Contributor

I think there's 2 issues here.

  1. Failures while initializing logging are confuzzled by the debugger.
  2. FailFast seems pretty evil :)

I think your solution to 1 is a good one.

Here's my opinion on why 2:

  • You are not allowed to examine most state in the debugger to determine the cause, as the program has been effectively dismembered.
  • It skips finally blocks, finalizers, etc.
  • It doesn't (at least anymore) log consistently to the event log.
  • According to the docs, it creates a crash dump each time, unconfigurably?
  • It just LOOKS evil. FatalExecutionEngineError from the NServiceBus stack. How does it make you feel first time?

If you really want this just for the event log write, then just write to the event log, and shut down gracefully, or at least not SO hard. :)

If you're worried about the event log write failing, you could always try catch that and failfast if you can't write to the log! Then I think you would definitely meet the criteria in the msdn doc:

Use the FailFast method instead of the Exit method to terminate your application if the state of your application is damaged beyond repair...

I think this methodology should be applied across the board wherever FailFast is currently used.

But I've been wrong before..

My cents.

Btw, you may enjoy this comment found here:

However, in CLR v4, Environment.FailFast when a debugger is attached gives you an MDA saying you've hit a bug in the runtime or unsafe managed code, and this is most likely caused by heap corruption or a stack imbalance from COM Interop or P/Invoke. That extremely misleading error isn't right, and we can temporarily work around this by using Environment.Exit if a debugger is attached. The right fix is to plumb FailFast correctly through our native Watson code, adding in a TypeOfReportedError for fatal managed errors.

@ghost ghost assigned yvesgoeleven and SimonCropp May 17, 2013
@SimonCropp
Copy link
Contributor

note: perhaps try catch from the code that calls topshelf and re-throw a new aggregate exception to preserve stack trace

@SimonCropp
Copy link
Contributor

@johnsimons @andreasohlund proposed fix here d1c2203

the Idea is to

  • catch the topshelf exception
  • extract the inner exception
  • patch the stack trace of the inner exception
  • rethrow the inner exception
  • let the runtime handle it as an unhandled exception

thoughts?

@johnsimons
Copy link
Member Author

How do u find such obscure Apis?
It feels like voodoo!

The only thing I don't like is the public extension method on the Exception
class that we do not own :(

On Tuesday, June 18, 2013, Simon Cropp wrote:

@johnsimons https://github.com/johnsimons @andreasohlundhttps://github.com/andreasohlundproposed fix here
d1c2203d1c2203

the Idea is to

  • catch the topshelf exception
  • extract the inner exception
  • patch the stack trace of the inner exception
  • rethrow the inner exception
  • let the runtime handle it as an unhandled exception

thoughts?


Reply to this email directly or view it on GitHubhttps://github.com//issues/1137#issuecomment-19605989
.

Regards
John Simons
NServiceBus

@SimonCropp
Copy link
Contributor

Fair point. I will make it internal.

As for the voodoo... There are other ways of doing it but they involve reflection.

Also we can remove the voodoo when we move to 4.5 since there is a simplified public api in 4.5

@johnsimons
Copy link
Member Author

So what does the exception look like?
Can u show a before and after?

On Tuesday, June 18, 2013, Simon Cropp wrote:

Fair point. I will make it internal.

As for the voodoo... There are other ways of doing it but they involve
reflection.

Also we can remove the voodoo when we move to 4.5 since there is a
simplified public api in 4.5


Reply to this email directly or view it on GitHubhttps://github.com//issues/1137#issuecomment-19607530
.

Regards
John Simons
NServiceBus

@SimonCropp
Copy link
Contributor

Have a look at the unit test I added

@andreasohlund
Copy link
Member

Looks good, add a comment /YT to get rid of it when we are 4.5 only

On Tue, Jun 18, 2013 at 2:32 PM, Simon Cropp notifications@github.comwrote:

Have a look at the unit test I added


Reply to this email directly or view it on GitHubhttps://github.com//issues/1137#issuecomment-19607984
.

http://andreasohlund.net
http://twitter.com/andreasohlund

@johnsimons
Copy link
Member Author

Agree, I'll merge it in today

On Wednesday, June 19, 2013, Andreas Öhlund wrote:

Looks good, add a comment /YT to get rid of it when we are 4.5 only

On Tue, Jun 18, 2013 at 2:32 PM, Simon Cropp <notifications@github.com<javascript:_e({}, 'cvml', 'notifications@github.com');>>wrote:

Have a look at the unit test I added


Reply to this email directly or view it on GitHub<
https://github.com/NServiceBus/NServiceBus/issues/1137#issuecomment-19607984>

.

http://andreasohlund.net
http://twitter.com/andreasohlund


Reply to this email directly or view it on GitHubhttps://github.com//issues/1137#issuecomment-19638195
.

Regards
John Simons
NServiceBus

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants