Make aspnetcore shutdown work by default on Kubernetes #30387
Comments
I don't think we should detect if you're in k8s, it should just be a setting that we recommend people set in k8s. That would be the least surprising thing to do. Likely something akin to this ShutdownTimeout or maybe we can repurpose it? Should we throw if this setting is < the shutdown timeout? Are those 2 things configured in the same place? I think this should be implemented in the GenericWebHostedService.StopAsync. |
Thanks for contacting us. |
If we're going to require a specific action from developers to enable this, why not just recommend doing the following as suggested in https://learnk8s.io/graceful-shutdown?
lifecycle:
  preStop:
    exec:
      command: ["sleep", "15"]
From reading that article, this seems like a common problem. Do we know if any other frameworks continue processing new requests and accepting new connections after receiving a SIGTERM? Or if they even have the option? |
We do right now.
Not sure, but I think we should build it in because it's simple and solves a common problem in a core scenario. |
We've moved this issue to the Backlog milestone. This means that it is not going to be worked on for the coming release. We will reassess the backlog following the current release and consider this item at that time. To learn more about our issue management process and to have better expectation regarding different types of issues you can read our Triage Process. |
As far as I know, when a Go web server receives SIGTERM it stops accepting new connections, and I think it would be helpful for ASP.NET customers if it provided a similar capability. Currently, in our own application, we plan to add custom middleware that counts active HTTP connections (count++ for each incoming request, count-- when the request is done) and holds the ApplicationStopping handler until the count reaches 0. |
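The counting approach described in the comment above could be sketched roughly as follows. This is only an illustration: `InFlightRequestCounter`, `Enter`, and `WaitForDrainAsync` are hypothetical names that do not come from this thread or from ASP.NET Core, and the polling wait is an assumption chosen for simplicity.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: counts in-flight requests and lets shutdown code wait for zero.
public sealed class InFlightRequestCounter
{
    private int _count;

    // Number of requests currently being processed.
    public int Current => Volatile.Read(ref _count);

    // Call when a request starts; dispose the returned handle when it ends.
    public IDisposable Enter()
    {
        Interlocked.Increment(ref _count);
        return new Releaser(this);
    }

    // Polls until no requests are in flight or the timeout elapses.
    // Returns true if the count reached zero in time.
    public async Task<bool> WaitForDrainAsync(TimeSpan timeout, TimeSpan pollInterval)
    {
        var elapsed = Stopwatch.StartNew();
        while (Current > 0)
        {
            if (elapsed.Elapsed >= timeout)
            {
                return false;
            }
            await Task.Delay(pollInterval);
        }
        return true;
    }

    private sealed class Releaser : IDisposable
    {
        private InFlightRequestCounter? _owner;

        public Releaser(InFlightRequestCounter owner) => _owner = owner;

        public void Dispose()
        {
            // Exchange ensures a double Dispose decrements only once.
            var owner = Interlocked.Exchange(ref _owner, null);
            if (owner is not null)
            {
                Interlocked.Decrement(ref owner._count);
            }
        }
    }
}
```

In an ASP.NET Core app, a middleware such as `app.Use(async (context, next) => { using (counter.Enter()) { await next(context); } });` could wrap each request, and an `ApplicationStopping` callback could wait on `WaitForDrainAsync`. Note that this only tracks requests already in flight; it does not address requests that arrive after SIGTERM.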
ASP.NET Core does this already. You can verify this by having a connection that does a Task.Delay for 10 seconds and watch the behavior on shutdown. The above describes that we should introduce a small delay before we start the shutdown process. Which you can do today in 2 ways:
|
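One way to try the verification suggested above is a small app with a deliberately slow endpoint. This is a minimal sketch under stated assumptions: the `SlowApp` class, the `/slow` route, and the 30-second `HostOptions.ShutdownTimeout` value are illustrative choices, not something prescribed in this thread.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Builder;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;

// Hypothetical test app for observing shutdown behavior with a request in flight.
public static class SlowApp
{
    public static WebApplication Build(string[] args)
    {
        var builder = WebApplication.CreateBuilder(args);

        // ShutdownTimeout bounds how long the host waits for hosted services,
        // including the web server, to stop. 30 seconds is an example value
        // chosen to exceed the 10-second request below.
        builder.Services.Configure<HostOptions>(options =>
        {
            options.ShutdownTimeout = TimeSpan.FromSeconds(30);
        });

        var app = builder.Build();

        // Holds the request open for 10 seconds so that a SIGTERM or Ctrl+C
        // sent in the meantime hits the app while a request is in flight.
        app.MapGet("/slow", async () =>
        {
            await Task.Delay(TimeSpan.FromSeconds(10));
            return "done";
        });

        return app;
    }
}
```

Running `SlowApp.Build(args).Run();`, requesting `/slow`, and then signaling the process lets you watch whether the in-flight request completes; the outcome depends on the configured shutdown timeout.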
Moving this into .NET 8 planning. We can make some small changes here to make this work a bit better than it does today in k8s (it can even be opt in). |
Graceful shutdown here is a bit of a misnomer because we shut down gracefully today already. This is about dealing with the lack of coordination between signaling that the pod should be shut down and spinning up a new pod. In a different reality, the new pod would be spun up before sending SIGTERM but it happens in parallel leading to this mess 😄 |
Indeed, the mentioned delay is needed to bring some order to a process where two actions are executed in parallel: (1) k8s removes the pod from the service Endpoints (from the load balancer), and (2) the pod is signaled. |
Well, not quite. When
Either way you have some work to do to be really graceful
No, this is about the inevitable (as in "by design") delay between K8s signaling the pod to shut down via Handling this problem via I'd argue a better option is to introduce a mechanism for a delay between |
Based on this issue and dotnet/dotnet-docker#4502. Chronologically, this seems to be the desired behavior: After SIGTERM:
Maybe: during 1 and 2 new requests should be answered immediately taking into account the SIGTERM, for example, reply with Then there are two timeouts: *: this time should be chosen appropriately for when Kubernetes is expected to no longer send requests to this pod after SIGTERM. |
I believe the application should continue to accept new requests as long as Kubernetes continues to send them. so, I think the behaviour should be:
|
We've run into this recently on some high-volume applications, and after moving to k8s, we'd noticed a small amount of traffic lost on deployments. We've implemented a fix based on recommendations from https://github.com/dotnet/dotnet-docker/blob/main/samples/kubernetes/graceful-shutdown/graceful-shutdown.md, which has us following the sequence of events that @tmds described. My totally biased opinion would be that this would be a good candidate for .NET 8, as this hit us rather unexpectedly, I think we had a bit of a preconception that this would "just work". If this was something that was to be considered for some time soon, I guess two options for implementations could be to either:
|
I still like the shutdown delay idea. |
FWIW, essentially just reused
|
I also like the nullable |
I guess one question worth asking is where in the process of shutting down should a delay be added. The Host Lifetime feels like the most "natural" place, as there the shutdown of all services can be delayed uniformly, but that also has the most wide-reaching implications and would add a soft requirement to implement this delay to derivers of Another could be to have the delay in |
@hwoodiwiss I love it. @karolz-ms You wrote a doc about this recently right? |
Yes, https://github.com/dotnet/dotnet-docker/blob/main/samples/kubernetes/graceful-shutdown/graceful-shutdown.md is the guidance that @richlander and myself came up with. |
@karolz-ms My PoC implementation and the one we've used internally is heavily inspired by this guidance, thank you! I'm happy to raise a PR based on my PoC, at least as a talking point, and I can flesh it out with tests, and an implementation in |
Let's get it into .NET 9! |
I believe we are seeing this issue on our services but would like a way to reproduce it consistently. Any suggestions on how to do it? EDIT: Found this blog post that helps with reproducing this behaviour https://blog.markvincze.com/graceful-termination-in-kubernetes-with-asp-net-core/ |
Hi there @karolz-ms, Does this method work better than this approach?
public class ApplicationLifetimeHostedService(IHostApplicationLifetime hostApplicationLifetime, ILogger<ApplicationLifetimeHostedService> logger) : BackgroundService
{
    protected override Task ExecuteAsync(CancellationToken stoppingToken)
    {
        hostApplicationLifetime.ApplicationStopping.Register(() =>
        {
            logger.LogError("SIGTERM received, waiting for 90 seconds");
            Thread.Sleep(TimeSpan.FromSeconds(90));
            logger.LogError("Termination delay complete, continuing stopping process");
        });
        return Task.CompletedTask;
    }
} |
@ArminShoeibi setting the ShutdownTimeout or having the ApplicationStopping handler block for a certain amount of time are essentially equivalent. Both will give requests in-flight extra time to finish. But they do not help with new requests that will come AFTER Kubernetes sent the replica a SIGTERM, but BEFORE the ingress has updated its routing tables. To allow these “late” requests to be processed and avoid 502 errors, you need to apply a custom IHostLifetime implementation. This is all described in more detail in the guide you referenced, specifically https://github.com/dotnet/dotnet-docker/blob/main/samples/kubernetes/graceful-shutdown/graceful-shutdown.md#adding-a-shutdown-delay part. |
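For readers who want a feel for what such a custom IHostLifetime might look like, here is a rough sketch in the spirit of the linked guide. It is not a copy of that guide's code: the class name and the way the delay is passed in are assumptions, and it relies on `PosixSignalRegistration`, which is available in .NET 6 and later.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Hosting;

// Sketch of a host lifetime that keeps serving requests for a while after
// SIGTERM before starting the normal shutdown sequence. Names are illustrative.
public sealed class DelayedShutdownHostLifetime : IHostLifetime, IDisposable
{
    private readonly IHostApplicationLifetime _applicationLifetime;
    private readonly TimeSpan _delay;
    private PosixSignalRegistration? _sigTerm;
    private PosixSignalRegistration? _sigInt;

    public DelayedShutdownHostLifetime(IHostApplicationLifetime applicationLifetime, TimeSpan delay)
    {
        _applicationLifetime = applicationLifetime;
        _delay = delay;
    }

    public Task WaitForStartAsync(CancellationToken cancellationToken)
    {
        // Handle the signals ourselves instead of shutting down immediately.
        _sigTerm = PosixSignalRegistration.Create(PosixSignal.SIGTERM, HandleSignal);
        _sigInt = PosixSignalRegistration.Create(PosixSignal.SIGINT, HandleSignal);
        return Task.CompletedTask;
    }

    public Task StopAsync(CancellationToken cancellationToken) => Task.CompletedTask;

    public void HandleSignal(PosixSignalContext context)
    {
        // Suppress the default reaction to the signal (process termination).
        context.Cancel = true;

        // Begin the regular shutdown only after the delay, so "late" requests
        // that arrive before routing is updated can still be served.
        _ = Task.Delay(_delay).ContinueWith(
            t => _applicationLifetime.StopApplication(),
            TaskScheduler.Default);
    }

    public void Dispose()
    {
        _sigTerm?.Dispose();
        _sigInt?.Dispose();
    }
}
```

A registration along the lines of `builder.Services.AddSingleton<IHostLifetime>(sp => new DelayedShutdownHostLifetime(sp.GetRequiredService<IHostApplicationLifetime>(), TimeSpan.FromSeconds(15)));` would be needed for the host to use it; the 15-second value mirrors the wait time suggested in this issue. The delay plus the time needed to drain requests has to fit within the pod's termination grace period.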
Right now, as soon as an aspnetcore app sees CTRL + C, it immediately starts shutting down the server, rejecting and draining connections. This works well when running on the command line, but when you move into k8s, requests may be dropped.
This article gives a general overview of the problem. Kubernetes, the ingress controller, and CoreDNS need time to remove the IP address from their internal state.
There are some solutions where you could have a dotnet app listen for endpoint changes within the k8s cluster. The recommended approach is to actually just wait after receiving a SIGTERM signal, rather than immediately shutting down.
The modification would be to enable shutdown on k8s to:
A recommended time to wait is 15 seconds.
I think the most difficult part of this will be detecting if you are running inside of k8s. Besides that, a Task.Delay seems like an appropriate solution.