[API Proposal]: When stopping an IHost instance, IHostedService instances StopAsync method is called sequentially instead of asynchronously #68036

jerryk414 · 2022-04-14T18:15:16Z

Background and motivation

As it is right now, when an IHost is asked to StopAsync for graceful shutdown, each IHostedService is requested to StopAsync in sequential order. There's no reason why each IHostedService.StopAsync task can't be executed asynchronously and awaited altogether.

Why is this a problem? Well technically everything is functioning to spec, but in practice, it's problematic.

So normally the idea of being able to set the ShutdownTimeout is great.. we can set it to say, 30 seconds, and say we expect all IHostedService instances to take no longer than 30 seconds to shutdown gracefully. The problem is that, this is an additive timeout. That is to say, if you have 10 IHostedService instances, and each one of them takes 20 seconds to shutdown, your graceful shutdown now takes 200 seconds. Why do that when you can just have everything shutdown asynchronously in 20 seconds?

One example of the problem with this is that, when deploying apps to ECS in AWS, when ECS triggers an instance of an app to shutdown (which occurs under normal circumstances such as when scaling down), the maximum amount of time it gives for the app to gracefully shutdown is 120 seconds (see stopTimeout), after that it just kills the process.

There's no reason that i can see that the IHostedService instances should be shutdown sequentially, they are async for a reason, and if shutdown order is important, than it should be managed by the consumer themselves, or at the very least, there should be an option for asynchronous IHostedService shutdown.

Regression?

This is not a regression

Analysis

Link to code causing the issue:
https://github.com/dotnet/runtime/blob/main/src/libraries/Microsoft.Extensions.Hosting/src/Internal/Host.cs#L126

This code could easily be modified to something similar to:

HashSet<Task> tasks = new HashSet<Task>();

foreach (IHostedService hostedService in _hostedServices.Reverse()) /* Why does this need to be reversed anyways? */
{
   tasks.Add(hostedService.StopAsync(token));
}

// WhenAll would be great here, but it will swallow up exceptions.. so this is the next best thing
foreach (Task task in tasks)
{
   try
   {
      await task.ConfigureAwait(false);
   }
   catch (Exception ex)
   {
      exceptions.Add(ex);
   }
}

API Proposal

    public class HostOptions
    {
+        public HostedServiceStopBehavior HostedServiceStopBehavior { get; set; } =
+            HostedServiceStopBehavior.Asynchronous;

        public BackgroundServiceExceptionBehavior BackgroundServiceExceptionBehavior { get; set; } =
            BackgroundServiceExceptionBehavior.StopHost;
    }
    
+    /// <summary>
+    /// Specifies a behavior that the <see cref="IHost"/> will honor when stopping <see cref="IHostedService"/> instances.
+    /// </summary>
+    public enum HostedServiceStopBehavior
+    {
+        /// <summary>
+        /// Indicates that the host will stop the <see cref="IHostedService"/> instances asynchronously.
+        /// </summary>
+        Asynchronous = 0,
+
+        /// <summary>
+        /// Indicates that the host will wait for the previous <see cref="IHostedService"/> instance to stop before stopping the next instance.
+        /// </summary>
+        Sequential = 1
+    }

API Usage

   public static IHostBuilder CreateWebHostBuilder(string[] args)
   {
      return Host.CreateDefaultBuilder(args)
         .ConfigureServices((context, services) =>
         {
            services.Configure<HostOptions>(options => 
            {
               options.BackgroundServiceStopBehavior = BackgroundServiceStopBehavior.Sequential; // Defaults to asynchronous
            }
         });
   }

Risks

The default behavior will technically be changing, which would be a breaking change. However, it may be a "safe" breaking change in the sense that the likelihood of someone building an app which depends on this previously internal behavior is highly unlikely and IF they did happen to, there is a way to revert back to the old behavior.

The text was updated successfully, but these errors were encountered:

ghost · 2022-04-14T18:15:22Z

Tagging subscribers to this area: @dotnet/area-extensions-hosting
See info in area-owners.md if you want to be subscribed.

Issue Details

Description

As it is right now, when an IHost is asked to StopAsync for graceful shutdown, each IHostedService is requested to StopAsync in sequential order. There's no reason why each IHostedService.StopAsync task can't be executed asynchronously and awaited altogether.

Why is this a problem? Well technically everything is functioning to spec, but in practice, it's problematic.

So normally the idea of being able to set the ShutdownTimeout is great.. we can set it to say, 30 seconds, and say we expect all IHostedService instances to take no longer than 30 seconds to shutdown gracefully. The problem is that, this is an additive timeout. That is to say, if you have 10 IHostedService instances, and each one of them takes 20 seconds to shutdown, your graceful shutdown now takes 200 seconds. Why do that when you can just have everything shutdown asynchronously in 20 seconds?

The real problem with this is that, when deploying apps to ECS in AWS, for example, when ECS triggers an instance of an app to shutdown, the maximum amount of time it gives for the app to gracefully shutdown is 120 seconds (see stopTimeout)e, after that it just kills the process.

There's no reason that i can see that the IHostedService instances should be shutdown sequentially, they are async for a reason, and if shutdown order is important, than it should be managed by the consumer themselves, or at the very least, there should be an option for asynchronous IHostedService shutdown.

Configuration

Regression?

I don't think it is

Data

Analysis

Link to code causing the issue:
https://github.com/dotnet/runtime/blob/main/src/libraries/Microsoft.Extensions.Hosting/src/Internal/Host.cs#L126

HashSet<Task> tasks = new HashSet<Task>();

foreach (IHostedService hostedService in _hostedServices.Reverse()) /* Why does this need to be reversed anyways? */
{
   tasks.Add(hostedService.StopAsync(token));
}

// WhenAll would be great here, but it will swallow up exceptions.. so this is the next best thing
foreach (Task task in tasks)
{
   try
   {
      await task.ConfigureAwait(false);
   }
   catch (Exception ex)
   {
      exceptions.Add(ex);
   }
}

Author:	jerryk414
Assignees:	-
Labels:	`tenet-performance`, `untriaged`, `area-Extensions-Hosting`
Milestone:	-

eerhardt · 2022-04-14T19:01:28Z

cc @davidfowl - we were just discussing this a couple weeks ago.

davidfowl · 2022-04-15T06:18:02Z

Agree we should do this and list it as a breaking change (as stop async will run concurrently now). I don't know if we need a flag to opt out of this behavior though.

davidfowl · 2022-04-15T06:18:50Z

@jerryk414 would you be willing to send a PR for this change?

quixoticaxis · 2022-04-17T22:20:41Z

Agree we should do this and list it as a breaking change (as stop async will run concurrently now). I don't know if we need a flag to opt out of this behavior though.

Any knob to keep the old "started first — stopped last" would be appreciated. We use it to stop stateful in-process services which have cross dependencies.

jerryk414 · 2022-04-17T23:38:06Z

I can do that. I'll get an initial implementation done this week to see what thoughts are once the changes are visible.

jerryk414 · 2022-04-19T14:45:01Z

@davidfowl @quixoticaxis
I've added an initial PR for this to make the default behavior asynchronous, with an option for specifying the behavior to be sequential. It would still be a breaking change, but would add the ability to change the behavior back to the way it was.

There are no tests yet - I want to get input if this is the route you would prefer prior to spending time on that.

quixoticaxis · 2022-04-19T18:45:08Z

@davidfowl @quixoticaxis
I've added an initial PR for this to make the default behavior asynchronous, with an option for specifying the behavior to be sequential. It would still be a breaking change, but would add the ability to change the behavior back to the way it was.

There are no tests yet - I want to get input if this is the route you would prefer prior to spending time on that.

For our product it would be easy to switch back to, looks great.

jerryk414 · 2022-05-06T17:29:38Z

@davidfowl @quixoticaxis I've added an initial PR for this to make the default behavior asynchronous, with an option for specifying the behavior to be sequential. It would still be a breaking change, but would add the ability to change the behavior back to the way it was.

There are no tests yet - I want to get input if this is the route you would prefer prior to spending time on that.

@davidfowl @quixoticaxis
The PR for this is ready for review.

eerhardt · 2022-05-19T17:31:29Z

I've updated the API proposal above and marked this proposal as ready for review. We will need to get the API proposal approved before we can make this change.

jerryk414 · 2022-05-19T19:06:00Z

I've updated the API proposal above and marked this proposal as ready for review. We will need to get the API proposal approved before we can make this change.

Thank you @eerhardt!

terrajobst · 2022-06-23T17:57:29Z

Video

We'd prefer Concurrent over Asynchronous.
It seems start is currently also sequential
- Would we want to configure this as well?
- Does this imply that this can breaking because people have interdependencies between services?

namespace Microsoft.Extensions.Hosting;

public partial class HostOptions
{
    public bool ServicesStartConcurrently { get; set; } // = false; Maybe change in 8?
    public bool ServicesStopConcurrently { get; set; } // = false; Maybe change in 8?
}

quixoticaxis · 2022-06-23T19:16:01Z

- Does this imply that this can breaking because people have interdependencies between services?

It would probably not only break dependencies, but also mess up the exception handling around StartAsync.

jerryk414 · 2022-06-24T18:38:47Z

@terrajobst
I'm presuming that based on the comment, you would want the default to be false for the time being.

I like the idea of just making it a bool since there really should only ever be two behaviors. I can work on getting this change together and updating the initial PR changes to account for this.

Just to answer this question more specifically... when hosting apps in AWS ECS, you have an absolute maximum shutdown time of 120 seconds - even if you set that higher in the host itself via ShutdownTimeout . We were running into it in a real-life production scenario where shutdown was taking longer than that. We fixed it by basically just announcing to all services that a shutdown is incoming and have them all start prepping to shutdown concurrently (if one starts shutting down, we can presume they all will be soon).

From the description in the issue:
One example of the problem with this is that, when deploying apps to ECS in AWS, when ECS triggers an instance of an app to shutdown (which occurs under normal circumstances such as when scaling down), the maximum amount of time it gives for the app to gracefully shutdown is 120 seconds (see [stopTimeout](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task_definition_parameters.html)), after that it just kills the process.

@quixoticaxis
I do see what you're saying - as it is right now StartAsync doesn't do anything to manage multiple exceptions - it will just stop on the first failure and return that.

I can update it to behave the same as StopAsync though where it will aggregate all exceptions and just throw an AggregateException when there is more than one failure with some generic message One or more hosted services failed to start.

Basically copy what is being done here:
https://github.com/dotnet/runtime/blob/main/src/libraries/Microsoft.Extensions.Hosting/src/Internal/Host.cs#L156

quixoticaxis · 2022-06-25T01:20:02Z

@jerryk414 I personally am not certain AggregateException would be a desired scenario, because in case some services failing and others starting you'd have to wait for all of the StartAsync either completing or faulting. Maybe throwing on the first exception and cancelling all other services (either currently starting or already started) is a better way.

Starting seems to be fundamentally different from stopping, IMHO.

jerryk414 · 2022-06-25T01:44:16Z

@quixoticaxis

I would disagree with that for two reasons.

In order to guarantee stopping on the first exception, you need to run things synchronously, which defeats the purpose. In addition, trying to cancel tasks or ignoring them and immediately returning on an exception sounds very misleading.
The aggregate exception would only occur on new behavior - you'd have to opt into it. I would also feel secure saying throwing an aggregate exception in a process that spawns many other tasks is not necessarily unexpected behavior.

Another fun point, assuming StartAsync is awaited using await, a bit of fancy magic happens that will cause the first exception in the AggregateException to be what is actually thrown when the task returns. So you as a consumer, you will effectively get the first exception... Unless you do something like a ContinueWith and inspect the actual exception.

quixoticaxis · 2022-06-25T01:48:18Z

@jerryk414

In order to guarantee stopping on the first exception, you need to run things synchronously, which defeats the purpose. In addition, trying to cancel tasks or ignoring them and immediately returning on an exception sounds very misleading.

No, I believe I don't, but, sorry, I'll post the code snippet tomorrow.

Your seconds point seems valid to me though.

jerryk414 · 2022-06-25T01:58:51Z

@jerryk414

In order to guarantee stopping on the first exception, you need to run things synchronously, which defeats the purpose. In addition, trying to cancel tasks or ignoring them and immediately returning on an exception sounds very misleading.

No, I believe I don't, but, sorry, I'll post the code snippet tomorrow.

Your seconds point seems valid to me though.

Yeah maybe I'm just unaware, but I'm not aware of any way to guarantee stopping on a first error without running each startup one by one. I don't have anything in front of me right now, but is there even a cancelation token that's passed into each start async? Even if there is, I doubt many people check it.

And assuming they don't check it, you could end up in a situation where either you're killing threads with these tasks, which could lead to corrupted states, or you could end up just swallowing other real exceptions, which is also bad.

quixoticaxis · 2022-06-25T17:58:16Z

@jerryk414 , if someone starts the services concurrently, I suppose it's only fair to assume that their starting logic may be running for prolonged period of time, so there's a fair chance that they respect cancellation requests (at the very least, we do in our stateful services that pre-load data in StartAsync). If starting phase is fast, than what's the point of doing it in parallel?

Considering your point about AggregateException being an expected behavior I suppose the start may look like something along the lines (throws AggregateException and handles cancellation in case of long-running starting logic):

using var stopper = CancellationTokenSource.CreateLinkedTokenSource(originalToken);

await Task.WhenAll(
    services
        .Select(
            service => service.StartAsync(stopper.Token))
        .Select(
            task => task.ContinueWith(original =>
            {
                if (original.IsFaulted)
                {
                    try
                    {
                        stopper.Cancel();
                    }
                    catch (AggregateException)
                    {
                        // probably report cancellation callback exceptions
                    }

                    var dispatcher = ExceptionDispatchInfo.Capture(
                        original.Exception!);
                    dispatcher.Throw();
                }
            })));

Anyway, the longer I think about it, the more I'm inclined to assume that concurrent start is much more tricky than concurrent stop.

jerryk414 · 2022-06-25T19:43:16Z

@jerryk414 , if someone starts the services concurrently, I suppose it's only fair to assume that their starting logic may be running for prolonged period of time, so there's a fair chance that they respect cancellation requests (at the very least, we do in our stateful services that pre-load data in StartAsync). If starting phase is fast, than what's the point of doing it in parallel?

Considering your point about AggregateException being an expected behavior I suppose the start may look like something along the lines (throws AggregateException and handles cancellation in case of long-running starting logic):
using var stopper = CancellationTokenSource.CreateLinkedTokenSource(originalToken);

await Task.WhenAll(
    services
        .Select(
            service => service.StartAsync(stopper.Token))
        .Select(
            task => task.ContinueWith(original =>
            {
                if (original.IsFaulted)
                {
                    try
                    {
                        stopper.Cancel();
                    }
                    catch (AggregateException)
                    {
                        // probably report cancellation callback exceptions
                    }

                    var dispatcher = ExceptionDispatchInfo.Capture(
                        original.Exception!);
                    dispatcher.Throw();
                }
            })));
Anyway, the longer I think about it, the more I'm inclined to assume that concurrent start is much more tricky than concurrent stop.

Yeah, I'm just thinking it doesn't have to be that complicated. With it being a new feature, it could just be accepted as being the way it is. If you have a problem with it and have interdependencies that could cause you to need to stop a start async method if one fails first or require order, you could revert back to the old way, or add your own inter-service communication.

I personally would argue the ROI isn't there for the extra complexity of ensuring an immediate return on first exceptions.

quixoticaxis · 2022-06-26T00:34:13Z

@jerryk414 my point was that I doubt that concurrent start should be implemented in the first place: too many different usage scenarios.

buyaa-n · 2022-12-07T18:15:56Z

@jerryk414 are you still planning to work on it?

jerryk414 · 2022-12-30T14:53:51Z

@jerryk414 are you still planning to work on it?

I am - I had a PR up but honestly, it's funny. I'm doing this on my own time - i've got kids, family, and a fulltime job, and it's the holidays, so i just didn't get around to addressing all comments in the PR, and my personal computer took a crap on me, so i've got to get this all setup on another machine. It certainly would be nice if someone else who is paid to do this took ownership and addressed the very minor issues with the PR below.

#75894

If nobody does, I do have plans to get back around to it, but not until the new year.

buyaa-n · 2022-12-30T18:25:37Z

I am - I had a PR up but honestly, it's funny. I'm doing this on my own time - i've got kids, family, and a fulltime job, and it's the holidays, so i just didn't get around to addressing all comments in the PR, and my personal computer took a crap on me, so i've got to get this all setup on another machine. It certainly would be nice if someone else who is paid to do this took ownership and addressed the very minor issues with the PR below.

Sure understand, no rush or any pressure, this is open source and it's up to you if you work on it or when you work. Though we would like to assign the issue if somebody already started working on it, so that other contributes would not start working on it simultaneously, if nobody working on it, we would add help wanted label so that other contributors could find it easily. I just saw you put up a PR and checking if you still planning to finish that PR.

If nobody does, I do have plans to get back around to it, but not until the new year.

Thank you, for now, nobody has planned to work on it, sounds like we can add help wanted label

pedrobsaila · 2023-03-27T19:38:19Z

Hey @buyaa-n I want to give it a try, you can assign it to me

jerryk414 added the tenet-performance Performance related issue label Apr 14, 2022

dotnet-issue-labeler bot added area-Extensions-Hosting untriaged New issue has not been triaged by the area owner labels Apr 14, 2022

jerryk414 changed the title ~~When stopping an IHost instance, IHostedService instances shutdown sequentially instead of asynchronously~~ When stopping an IHost instance, IHostedService instances StopAsync method is called sequentially instead of asynchronously Apr 14, 2022

jerryk414 mentioned this issue Apr 19, 2022

Stop IHostedService instances asynchronously by default #68216

Closed

ghost added the in-pr There is an active PR which will close this issue when it is merged label Apr 19, 2022

eerhardt added the api-ready-for-review API is ready for review, it is NOT ready for implementation label May 19, 2022

eerhardt added this to the 7.0.0 milestone May 19, 2022

ghost removed the untriaged New issue has not been triaged by the area owner label May 19, 2022

ghost removed the in-pr There is an active PR which will close this issue when it is merged label May 19, 2022

terrajobst added api-approved API was approved in API review, it can be implemented and removed api-ready-for-review API is ready for review, it is NOT ready for implementation labels Jun 23, 2022

eerhardt modified the milestones: 7.0.0, Future Jul 19, 2022

jerryk414 mentioned this issue Sep 20, 2022

Add the ability to start and stop IHostedService instances concurrently #75894

Closed

maryamariyan modified the milestones: Future, 8.0.0 Nov 7, 2022

maryamariyan mentioned this issue Nov 7, 2022

Microsoft.Extensions work planned for .NET 8 #77390

Closed

36 tasks

buyaa-n added the help wanted [up-for-grabs] Good issue for external contributors label Dec 30, 2022

buyaa-n assigned pedrobsaila Mar 28, 2023

buyaa-n removed the help wanted [up-for-grabs] Good issue for external contributors label Mar 28, 2023

pedrobsaila mentioned this issue Mar 28, 2023

offers to start and stop IHostedService concurrently #84048

Merged

ghost added the in-pr There is an active PR which will close this issue when it is merged label Mar 28, 2023

buyaa-n closed this as completed in #84048 Apr 11, 2023

ghost removed the in-pr There is an active PR which will close this issue when it is merged label Apr 11, 2023

pedrobsaila mentioned this issue Apr 22, 2023

offers the synchronous part of start and stop in IHostedService to run concurrently #85191

Merged

ghost locked as resolved and limited conversation to collaborators May 12, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[API Proposal]: When stopping an IHost instance, IHostedService instances StopAsync method is called sequentially instead of asynchronously #68036

[API Proposal]: When stopping an IHost instance, IHostedService instances StopAsync method is called sequentially instead of asynchronously #68036

jerryk414 commented Apr 14, 2022 •

edited by eerhardt

ghost commented Apr 14, 2022

Description

Configuration

Regression?

Data

Analysis

eerhardt commented Apr 14, 2022

davidfowl commented Apr 15, 2022

davidfowl commented Apr 15, 2022

quixoticaxis commented Apr 17, 2022

jerryk414 commented Apr 17, 2022

jerryk414 commented Apr 19, 2022

quixoticaxis commented Apr 19, 2022

jerryk414 commented May 6, 2022

eerhardt commented May 19, 2022

jerryk414 commented May 19, 2022

terrajobst commented Jun 23, 2022 •

edited by dotnet-api-review bot

quixoticaxis commented Jun 23, 2022

jerryk414 commented Jun 24, 2022 •

edited

quixoticaxis commented Jun 25, 2022 •

edited

jerryk414 commented Jun 25, 2022

quixoticaxis commented Jun 25, 2022 •

edited

jerryk414 commented Jun 25, 2022

quixoticaxis commented Jun 25, 2022

jerryk414 commented Jun 25, 2022

quixoticaxis commented Jun 26, 2022

buyaa-n commented Dec 7, 2022

jerryk414 commented Dec 30, 2022 •

edited

buyaa-n commented Dec 30, 2022 •

edited

pedrobsaila commented Mar 27, 2023 •

edited

[API Proposal]: When stopping an IHost instance, IHostedService instances StopAsync method is called sequentially instead of asynchronously #68036

[API Proposal]: When stopping an IHost instance, IHostedService instances StopAsync method is called sequentially instead of asynchronously #68036

Comments

jerryk414 commented Apr 14, 2022 • edited by eerhardt

Background and motivation

Regression?

Analysis

API Proposal

API Usage

Risks

ghost commented Apr 14, 2022

Description

Configuration

Regression?

Data

Analysis

eerhardt commented Apr 14, 2022

davidfowl commented Apr 15, 2022

davidfowl commented Apr 15, 2022

quixoticaxis commented Apr 17, 2022

jerryk414 commented Apr 17, 2022

jerryk414 commented Apr 19, 2022

quixoticaxis commented Apr 19, 2022

jerryk414 commented May 6, 2022

eerhardt commented May 19, 2022

jerryk414 commented May 19, 2022

terrajobst commented Jun 23, 2022 • edited by dotnet-api-review bot

quixoticaxis commented Jun 23, 2022

jerryk414 commented Jun 24, 2022 • edited

quixoticaxis commented Jun 25, 2022 • edited

jerryk414 commented Jun 25, 2022

quixoticaxis commented Jun 25, 2022 • edited

jerryk414 commented Jun 25, 2022

quixoticaxis commented Jun 25, 2022

jerryk414 commented Jun 25, 2022

quixoticaxis commented Jun 26, 2022

buyaa-n commented Dec 7, 2022

jerryk414 commented Dec 30, 2022 • edited

buyaa-n commented Dec 30, 2022 • edited

pedrobsaila commented Mar 27, 2023 • edited

jerryk414 commented Apr 14, 2022 •

edited by eerhardt

terrajobst commented Jun 23, 2022 •

edited by dotnet-api-review bot

jerryk414 commented Jun 24, 2022 •

edited

quixoticaxis commented Jun 25, 2022 •

edited

quixoticaxis commented Jun 25, 2022 •

edited

jerryk414 commented Dec 30, 2022 •

edited

buyaa-n commented Dec 30, 2022 •

edited

pedrobsaila commented Mar 27, 2023 •

edited