Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generic Host is not stopped when BackgroundService crashes #43637

Closed
marcselis opened this issue Oct 20, 2020 · 42 comments · Fixed by #50569
Closed

Generic Host is not stopped when BackgroundService crashes #43637

marcselis opened this issue Oct 20, 2020 · 42 comments · Fixed by #50569
Assignees
Labels
api-approved API was approved in API review, it can be implemented area-Extensions-Hosting help wanted [up-for-grabs] Good issue for external contributors
Milestone

Comments

@marcselis
Copy link

marcselis commented Oct 20, 2020

Updated by @maryamariyan:

Description:

We want to add a new opt-in option on HostOptions that reacts to failures on a background service.

It generally might not be always desirable to allow for a failure/crash on a background service to stop the application altogether and therefore logging the failure is a default behavior.

API Proposal:

API below would add the option to stop the host on background service exception.

namespace Microsoft.Extensions.Hosting
{
    public class HostOptions
    {
        public bool StopOnBackgroundServiceException { get; set; }
    }
}
Original Description (click to view)

Description

According to the documentation of the BackgroundService.ExecuteAsync method that method should return a Task that represents the lifetime of the long running operation(s) being performed.

For me that implies that this Task is monitored by the generic host, so I would expect that when the BackgroundService.ExecuteAsync's Task instance is completed (successfully or due to an exception) the host itself would also stop. This is not the case. As a result, a Windows service where the BackgroundService.ExecuteAsync method has crashed or stopped will continue to appear running in the Services Management Console, but will no longer be performing any work., causing all monitoring systems to keep reporting that everything is ok.

Configuration

  • .NET Core 3.1.402
  • OS Windows 10.0.18363
  • Architecture X64

Sample application

The following sample demonstrates the problem. You can either run it on the console or install it as windows service and start it. The program will continue to run after the background service has crashed.

using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;
using Microsoft.Extensions.Logging;

namespace WorkerTest
{
    public static class Program
    {
        public static void Main(string[] args)
        {
            CreateHostBuilder(args).Build().Run();
        }

        private static IHostBuilder CreateHostBuilder(string[] args) =>
            Host.CreateDefaultBuilder(args)
                .UseWindowsService()
                .ConfigureServices((hostContext, services) =>
                {
                    services.AddHostedService<Worker>();
                });
    }

    public class Worker : BackgroundService
    {
        private readonly ILogger<Worker> _logger;
        private int _counter;

        public Worker(ILogger<Worker> logger)
        {
            _logger = logger;
        }

        protected override async Task ExecuteAsync(CancellationToken stoppingToken)
        {
            try
            {
                while (!stoppingToken.IsCancellationRequested)
                {
                    _logger.LogInformation("Worker running at: {time} ({counter})", DateTimeOffset.Now, _counter++);
                    if (_counter > 10)
                        throw new Exception("Something happened!");
                    await Task.Delay(1000, stoppingToken).ConfigureAwait(false);
                }
            }
            finally
            {
                if (!stoppingToken.IsCancellationRequested)
                    _logger.LogError("Worker stopped unexpectedly");
            }
        }
    }

}
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added area-Extensions-Hosting untriaged New issue has not been triaged by the area owner labels Oct 20, 2020
@ghost
Copy link

ghost commented Oct 20, 2020

Tagging subscribers to this area: @eerhardt, @maryamariyan
See info in area-owners.md if you want to be subscribed.

@maryamariyan
Copy link
Member

I don't think the lifetime of Generic Host depends on what its background services do and therefore what you see is an expected outcome.

@cwe1ss
Copy link
Contributor

cwe1ss commented Oct 20, 2020

We are using the following base class for background services that should terminate the application on error:

public abstract class TerminatingBackgroundService : BackgroundService
{
    protected IServiceProvider ApplicationServices { get; }
    protected IHostApplicationLifetime ApplicationLifetime { get; }
    protected ILogger Logger { get; }

    protected TerminatingBackgroundService(IServiceProvider applicationServices)
    {
        ApplicationServices = applicationServices;
        ApplicationLifetime = applicationServices.GetRequiredService<IHostApplicationLifetime>();
        Logger = ApplicationServices.GetRequiredService<ILoggerFactory>().CreateLogger(GetType());
    }

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        try
        {
            await ExecuteCoreAsync(stoppingToken);

            Logger.LogInformation("Execute has finished. IsCancellationRequested: {IsCancellationRequested}", stoppingToken.I
        }
        catch (OperationCanceledException) when (stoppingToken.IsCancellationRequested)
        {
            // We're shutting down already, so we don't have to do anything else.
            Logger.LogInformation("OperationCanceledException on shutdown");
        }
        catch (Exception exception)
        {
            Logger.LogCritical(exception, exception.Message);

            // We need StopApplication() so that it properly flushes and disposes the logging system.
            // However, it is a graceful shutdown that exits with code 0.
            Environment.ExitCode = 1;

            if (!stoppingToken.IsCancellationRequested)
            {
                ApplicationLifetime.StopApplication();
            }
        }
    }

    protected abstract Task ExecuteCoreAsync(CancellationToken stoppingToken);
}

@marcselis
Copy link
Author

How can it be an "exepcted outcome" that the host process continues to run when the background service doing the actual work has crashed without the host process even knowing? This is a very dangerous situation where all monitoring systems that monitor running Windows services report that everything is OK while no work is being performed.

As the host is unaware that the worker service has stopped it also cannot take any corrective actions like restarting the worker and/or reporting the error...

@maryamariyan
Copy link
Member

maryamariyan commented Oct 20, 2020

Related to #36388 (comment) and #36388 (comment)

@eerhardt
Copy link
Member

In .NET 6 we have fixed #36017. Now when an exception is thrown from a BackgroundService, it will get logged as an error.

@marcselis
Copy link
Author

Just logging the exception is not enough. This will still have the same effect: the process keeps running but doesn't perform any work anymore, causing all monitoring systems to think that everything is ok, unless you have some alerting system on your logs.

I agree with @GeeWee in #36388 (comment) that you should be able to specify (per BackgroundService) what the host should do when that BackgroundService stops, as some background services are more important than others. My options would be: ignore, restart and stop the host with a (specifiable) error code.

@maryamariyan
Copy link
Member

maryamariyan commented Oct 20, 2020

As per @davidfowl 's comment in #36388 (comment) there could be a new opt-in option on HostOptions that reacts to failures on a background service.

It generally might not be always desirable to allow for a failure/crash on a background service to stop the application altogether and therefore logging the failure is a default behavior.

@davidfowl
Copy link
Member

davidfowl commented Jan 18, 2021

Lets do this for 6.0. Add the option to stop the host on background service exception.

namespace Microsoft.Extensions.Hosting
{
    public class HostOptions
    {
        public bool StopOnBackgroundServiceException { get; set; }
    }
}

cc @maryamariyan @eerhardt @ericstj

@maryamariyan maryamariyan removed the untriaged New issue has not been triaged by the area owner label Jan 18, 2021
@maryamariyan maryamariyan added this to the 6.0.0 milestone Jan 18, 2021
@maryamariyan maryamariyan added the api-ready-for-review API is ready for review, it is NOT ready for implementation label Jan 18, 2021
@GeeWee
Copy link

GeeWee commented Jan 19, 2021

I think this is a pretty good API suggestion. I would like to suggest that it either

  • Defaults to true (which is a breaking change, but will match the mental model of most programmers that haven't hit this behaviour before)
  • Is required, so you have to set it (not a breaking change, little more verbose, but does force the user to know what's going on.

@Joe4evr
Copy link
Contributor

Joe4evr commented Jan 19, 2021

  • Is required, so you have to set it (not a breaking change, little more verbose, but does force the user to know what's going on.

That is most definitely a breaking change, since if a previously compiled app is running against a new framework version, there will be no matching signature for the emitted call.

@GeeWee
Copy link

GeeWee commented Jan 19, 2021

Good point - I hadn't thought about that.

@davidfowl
Copy link
Member

We can argue about defaults but currently as proposed the default is false and there's no breaking change

@terrajobst terrajobst added api-approved API was approved in API review, it can be implemented and removed api-ready-for-review API is ready for review, it is NOT ready for implementation labels Jan 21, 2021
@terrajobst
Copy link
Member

terrajobst commented Jan 21, 2021

Video

  • It seems the new behavior is more desirable than the current behavior but it's hard to know how many apps would be broken by the new behavior.
  • The API isn't very discoverable, so if it's purely opt-in, then most customers aren't likely to benefit from the new behavior.
  • A compromise would be:
    • Change the templates for .NET 6 to set this property to BackgroundServiceExceptionBehavior.StopHost
    • Based on customer feedback, we can change the default for .NET 7 and simplify the templates
  • On the other hand, this is early for .NET 6
    • Let's ship the API
    • Let's change the default now, not update the templates, see what feedback we get for the previews, and revert back to opt-in if it breaks many people.
namespace Microsoft.Extensions.Hosting
{
    public enum BackgroundServiceExceptionBehavior
    {
        Ignore,
        StopHost
    }
    public class HostOptions
    {
        public BackgroundServiceExceptionBehavior BackgroundServiceExceptionBehavior { get; set; }
    }
}

@davidfowl
Copy link
Member

I like it

@PureKrome
Copy link
Contributor

I like it

@maryamariyan maryamariyan added the help wanted [up-for-grabs] Good issue for external contributors label Jan 23, 2021
@fredrikhr
Copy link
Contributor

There are some different behaviours to consider here:

  • The API to run something on the background of the host is not BackgroundService, but IHostedService (which BackgroundService implements. What does that mean for the naming of the new BackgroundServiceExceptionBehavior enum? If I implement my own IHostedService does the behaviour in the enum apply or not?
    Note: A background service is added to the host in ConfigureServices via the AddHostedService<T> API.
  • About defaulting the behaviour. Maybe the setting should be nullable. The reasoning being that it is common to create smaller watchdog-like background processing services in a web application that are not mission critical (since the main important thing in a Web app is serving HTTP requests).
    However, in a console application, or a deamon-service app there is probably only one background service (the processor) which likely is mission-critical, so it failing should tear down the entire app.
    So when decing the defaults, maybe the ConfigureWebHostDefaults extension should set the Ignore behaviour if not already set to something else? An only use StopHost when dealing with the pure generic host (default)?

@davidfowl
Copy link
Member

If I implement my own IHostedService does the behaviour in the enum apply or not?

No, we can't track work done by IHostedService.

So when decing the defaults, maybe the ConfigureWebHostDefaults extension should set the Ignore behaviour if not already set to something else? An only use StopHost when dealing with the pure generic host (default)?

Yep, I think this makes sense though it might be confusing as it will be overridden when you set it.

@fredrikhr
Copy link
Contributor

Yep, I think this makes sense though it might be confusing as it will be overridden when you set it.

That's why I propose to make the setting nullable. Then, ConfigureWebHostDefaults would have the possibility to avoid overwriting a previously set value if the setting is non-null when ConfigureWebHostDefaults sees the host builder.

@fredrikhr
Copy link
Contributor

And then Build would set the setting to StopHost when the setting is null

@marcselis
Copy link
Author

I hope that when setting BackgroundServiceExceptionBehavior to StopHost, the host still has a way to catch the exception, figure out what background service has failed and decide whether to stop or not. Or even has the option to restart the failed service?

@GeeWee
Copy link

GeeWee commented Jan 28, 2021

I'd imagine if you'd wanted to do that, wouldn't the "natural" course of action to do a try/catch block inside the RunAsync ?

@marhja
Copy link

marhja commented Feb 2, 2021

I just experienced this problem and I vote for changing the default to stop the application on unhandled exceptions. As a .NET developer I don't expect exceptions to be swallowed by the framework. I can always do that myself if that is the desired behavior.

Would it be an alternative that instead of the HostOptions.BackgroundServiceExceptionBehavior property, which I assume would affect all background services in an application, add a BackgroundService.ExceptionBehavior property, so that the behavior can be controlled per background service as requested by @marcselis above?

Since either solution won't change the different behavior of unhandled exceptions in ExecuteAsync thrown before the first await (which crashes the application with an unhandled exception) and during or after the first await (which will now stop the application gracefully), could that at least be documented for the ExecuteAsync method?

@davidfowl
Copy link
Member

I think one of the problems here is the fact that arbitrary libraries can add background services that run as part of your process. If those misbehave should it crash the application?

It's one thing to crash if it's code you control but it's not clear to me if I'd want arbitrary code to crash my app. It's the same reason ASP.NET catches unhandled exceptions that happen on the request. Otherwise it could result in a process crash.

If you have an orchestrator or watchdog monitoring your process and restarting it, crashing might be better (you can "alert" based on number of restarts). But if not it might be annoying figuring out why something failed (especially if the logs aren't written anywhere persistent)

@marhja
Copy link

marhja commented Feb 3, 2021

@davidfowl You certainly got an important concern there. But the unhandled exception on a request in ASP.NET isn't just swallowed, it shows up in the developer exception page or as http status code 500.

The main problem I have is that (some) unhandled exceptions are swallowed. The logging added in .NET 6 certainly will help, but as others have pointed out that might not be enough in a production environment.

As I mentioned, the fact that the sync part of ExecuteAsync works as I expect (i.e. exceptions bubbling up) while the async part is fire-and-forget is very confusing for me when implementing ExecuteAsync. I usually don't pay much attention to where my first await in an async method occurs and refactoring might easily change that. Ideally I think ExecuteAsync should have been two methods - InitAsync awaited by BackgroundService.StartAsync and RunAsync which isn't. Since I guess it's too late for that change I suggest better documentation of the behavior of ExecuteAsync.

This confusing behavior of ExecuteAsync makes me wonder if anyone is actually (correctly) relying on the current behavior. If they explicitly try/catch all exceptions in ExecuteAsync the changed default will not affect them, otherwise "sync" exceptions will crash the application while "async" exceptions won't.

@fredrikhr
Copy link
Contributor

@marhja

I usually don't pay much attention to where my first await in an async method occurs

Well when doing async you always have to, no way around it. That's not just for this case. When you call an async method you always can get exception both on the method call invocation (i.e. before the first await in the method) or while you're waiting on the await of the returned Task.

Yes, when your code is just await MethodCallAsync() you get the exception on the same line for both cases. But as soon as your code stores the Task you have to think about that exceptions can occur in both places:

var task = MethodCallAsync(); // Might get a synchronous exception here
// Do some work while async does its thing
var result = await task; // Might get an asynchronous exception here

Also note that you might get both normal Exception (or derivatives) and AggregateException, so you'll also have to handle two different exception types.

@marhja
Copy link

marhja commented Feb 3, 2021

@fredrikhr

Well when doing async you always have to, no way around it. That's not just for this case. When you call an async method you always can get exception both on the method call invocation (i.e. before the first await in the method) or while you're waiting on the await of the returned Task.

Yes, of course. But when implementing ExecuteAsync I don't know about it's unexpected behavior unless I read the source code of BackgroundService.StartAsync.

@marhja
Copy link

marhja commented Feb 3, 2021

@davidfowl

I think one of the problems here is the fact that arbitrary libraries can add background services that run as part of your process. If those misbehave should it crash the application?

Is this concern specific to background services? Don't every developer using any library in any type of application deal with this risk every day? To me, an unhandled exception in production indicates a bug, either in my code or in some library code. I'd like to know about that as soon as possible so I can fix it (or report it to the library author) and possible catch a specific exception from a library while waiting for a fix.

@davidfowl
Copy link
Member

@davidfowl You certainly got an important concern there. But the unhandled exception on a request in ASP.NET isn't just swallowed, it shows up in the developer exception page or as http status code 500.

The error page only shows up of that middleware is enabled. If it isn't the exception is caught and logged by the server (much like it is in .NET 6 with background services).

As I mentioned, the fact that the sync part of ExecuteAsync works as I expect (i.e. exceptions bubbling up) while the async part is fire-and-forget is very confusing for me when implementing ExecuteAsync. I usually don't pay much attention to where my first await in an async method occurs and refactoring might easily change that. Ideally I think ExecuteAsync should have been two methods - InitAsync awaited by BackgroundService.StartAsync and RunAsync which isn't. Since I guess it's too late for that change I suggest better documentation of the behavior of ExecuteAsync.

I agree with this, the behavior is confusing and I wish we had done a Task.Run from the start. There doesn't need to be an InitAsync because there's already a StartAsync. We can change it now but it'll be a breaking change.

Is this concern specific to background services? Don't every developer using any library in any type of application deal with this risk every day? To me, an unhandled exception in production indicates a bug, either in my code or in some library code. I'd like to know about that as soon as possible so I can fix it (or report it to the library author) and possible catch a specific exception from a library while waiting for a fix

I think that's a false equivalence. You don't usually worry about libraries throwing exceptions on arbitrary background threads crashing the process. Throwing exceptions is one thing but this specific abstraction is made for running background services and without everyone explicitly handling errors anyone can crash the entire application.

In think having logs changes the equation and makes things easier to debug. That said, we agreed to change then default so we'll do that in .NET 6 and see if it's a net positive for everyone

@marhja
Copy link

marhja commented Feb 3, 2021

@davidfowl

The error page only shows up of that middleware is enabled.

When I commented out app.UseDeveloperExceptionPage(); in Startup.cs the response have http status code 500 and the browser shows a generic error page.

Throwing exceptions is one thing but this specific abstraction is made for running background services and without everyone explicitly handling errors anyone can crash the entire application.

I don't see the problem with this, it's usually a good thing to fail fast on unknown errors instead of keep running and hope that whatever error occurred won't for example corrupt any data. The BCL is taking on a big responsibility here that the rest of the application won't be affected.

When your code cannot recover from an exception, don't catch that exception.

(From Best practices for exceptions)

That said, we agreed to change then default so we'll do that in .NET 6 and see if it's a net positive for everyone

👍😃

@davidfowl
Copy link
Member

When I commented out app.UseDeveloperExceptionPage(); in Startup.cs the response have http status code 500 and the browser shows a generic error page.

That's because the server is doing you a solid and preventing your application from crashing.

I don't see the problem with this, it's usually a good thing to fail fast on unknown errors instead of keep running and hope that whatever error occurred won't for example corrupt any data. The BCL is taking on a big responsibility here that the rest of the application won't be affected.

Easy to say that when logging is configured. I'm sure people would be annoyed if their application died without a trace. No logs, no crash dump.

(From Best practices for exceptions)

This doesn't apply.

@marhja
Copy link

marhja commented Feb 3, 2021

Easy to say that when logging is configured. I'm sure people would be annoyed if their application died without a trace. No logs, no crash dump.

I'm certainly annoyed when my background service die without a trace now in .NET 5. 😅 I think it would be preferable that an unhandled exception in the background service would be propagated to the application, but I guess that's not possible. Then logging the exception and shutting down the application as suggested is acceptable.

@davidfowl
Copy link
Member

Right, it was impossible to be because nothing was tracking the background task itself and that's why it had it be done by the instance and not the host.

Raising it to the application would mean some sort of OnBackgroundSericeUnhandledException event?

@marhja
Copy link

marhja commented Feb 3, 2021

Such an event would not help much since it requires that the developer adds a handler.

@davidfowl
Copy link
Member

Unfortunately there's no practical way to make the application aware other than to crash

@marhja
Copy link

marhja commented Feb 3, 2021

What do you mean by "crash"? Process.Kill()? Environment.FailFast()? I thought the idea was a graceful application shutdown using IHostApplicationLifetime.StopApplication().

@davidfowl
Copy link
Member

The current plan is to do that yes, which will in most cases unblock main and will quit the program.

@pinkfloydx33
Copy link

I'm a bit hard pressed to think of a specific case at the moment, but one of the comments above got me thinking...

Would it be possible to have a default value on the HostOptions and then some optional way to specify it on a per-service level? I'm sure there could be services that you may be OK ignoring as they might not be mission critical while others should tear down the app. I don't know how you'd track/register such a thing though. You wouldn't want it as a property on the service class/interface since lib authors could potentially control your app's lifetime but there's also no way to specify the option in an AddHostedService call since it's just adding to the service collection.

I'm sure the current API is fine for now and it doesn't seem like it would preclude such an enhancement in the future. Just throwing it out there....

@IEvangelist
Copy link
Member

I see this is up-for-grabs - I'm interested in working on this one, are you okay with me assigning it to myself @maryamariyan and @eerhardt?

@eerhardt
Copy link
Member

@IEvangelist - I assigned it to you. Thanks for taking this. I'm looking forward to seeing it being completed.

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Apr 1, 2021
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Apr 16, 2021
@ghost ghost locked as resolved and limited conversation to collaborators May 16, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-approved API was approved in API review, it can be implemented area-Extensions-Hosting help wanted [up-for-grabs] Good issue for external contributors
Projects
None yet