Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kestrel execution timeout support #10079

Closed
effyteva opened this issue May 8, 2019 · 8 comments
Closed

Kestrel execution timeout support #10079

effyteva opened this issue May 8, 2019 · 8 comments
Labels
area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions

Comments

@effyteva
Copy link

effyteva commented May 8, 2019

Hi,

We've been facing some high CPU issues on our ASP.NET core 2.2 production environment for the past few weeks. The only way to "revive" our instances, was to restart Kestrel (or the server instance of course).
Following some hard time, we ended up finding some very rare scenario, in which our application was going through an infinite loop, which Kestrel never stopped processing.
This means any single request hitting this bug, created a 100% CPU usage task on our server instance, which never goes away...

We tried using the "RequestTimeout" setting on web.config (after switching to OutOfProcess hosting model), which returned an error to the client ("The specified CGI application encountered an error and the server terminated the process"), however, the server kept processing the request infinitely.

I believe the ExecutionTimeout which never made it to Kestrel on ASP.NET core might be a good solution for error handling on the server level.
For additional regarding ExecutionTimeout information, please see @Tratcher and @benaadams review and remarks here: aspnet/KestrelHttpServer#485, aspnet/KestrelHttpServer#611.

To re-produce the issue, a simple "while (true)" should be called during the request processing. We placed it on the view for simplicity.

Obviously all of our code is using the RequestAborted cancellation token when possible, yet in the case we mentioned, we had a complex code which shouldn't have been a long running CPU task, but ended up acting as an infinite loop.

We also tried creating a Middleware for limited the execution time, it didn't much good either, as we couldn't a way to "kill" the running task. Perhaps it could be done?

Tagging for future documentation purposes: dotnet/AspNetCore.Docs#12330

Thank you,
Effy

@Tratcher
Copy link
Member

Tratcher commented May 8, 2019

ASP.NET 4.x implemented this using Thread Abort which only worked because it assumed the request was executed by a single thread (or tracked by the execution context?).

There is no plan to bring back Thread Aborts nor would they be likely to work very well now that Core is fundamentally asynchronous.

@effyteva
Copy link
Author

effyteva commented May 9, 2019

ASP.NET 4.x implemented this using Thread Abort which only worked because it assumed the request was executed by a single thread (or tracked by the execution context?).

There is no plan to bring back Thread Aborts nor would they be likely to work very well now that Core is fundamentally asynchronous.

Thanks @Tratcher, I understand that,
What do you think could be done to resolve this? Perhaps on the application side?
Eventually there's software bugs, and on rare cases, bugs will cause infinite loops. I don't think the web server should crash due to a single Task being looped.

@Tratcher
Copy link
Member

Tratcher commented May 9, 2019

This isn't a web server specific problem, it could happen in any application. It's quite difficult for the application to self-diagnose and distinguish between an expensive operation and a broken one, let alone recover from it in such a way that the application is stable afterwards.

For this class of error I think you're better off using external monitoring and health checks. If the application ceases to respond in an appropriate manner then it can be terminated and restarted by an external agent and return to a stable state.

@guardrex
Copy link
Contributor

guardrex commented May 9, 2019

@Tratcher Would you like me to make such a remark (your 2nd para) in the vicinity of ANCM request timeout coverage for in-proc in the topic? If not, I'll close the doc issue.

@Tratcher
Copy link
Member

Tratcher commented May 9, 2019

No, ANCM doesn't really handle this.

@glennc does this make sense in the context of health checks?

@guardrex
Copy link
Contributor

guardrex commented May 9, 2019

btw - IIRC, it's the only place this type of timeout is covered, so that's why I thought it might be the best spot for a passing remark. Alternatively, we could surface it in a very short section of the server's topic. https://docs.microsoft.com/aspnet/core/fundamentals/servers/

The prob with Health Checks is that devs probably won't look there for this ... and search may leave them hanging.

@analogrelay
Copy link
Contributor

we couldn't a way to "kill" the running task

There's no way to forcibly terminate .NET Tasks. This is why we can't stop run-away CPU processes ourselves. Since we don't dedicate a thread to each request, there's no way for us to forcibly terminate the execution of a request. The best we can do is set a CancellationToken and expect the application to consider it.

Closing this as we have no plans to introduce a system to forcibly terminate request processes.

@effyteva
Copy link
Author

effyteva commented May 12, 2019

Thank you everyone,
As an alternative, we've ended up creating a Middleware for logging slow requests, this should allow at least tracing those issues on our production environment (in case they ever return).

using (var TokenSource = System.Threading.CancellationTokenSource.CreateLinkedTokenSource(Context.RequestAborted))
{
Context.RequestAborted = TokenSource.Token;
var DelayTask = Task.Delay(5000, TokenSource.Token);
var NextTask = _next.Invoke(Context);
if (await Task.WhenAny(DelayTask, NextTask).ConfigureAwait(false) == DelayTask)
{
//LogStuff
}
TokenSource.Cancel();
}

@ghost ghost locked as resolved and limited conversation to collaborators Dec 3, 2019
@amcasey amcasey added area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions and removed area-runtime labels Aug 24, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions
Projects
None yet
Development

No branches or pull requests

6 participants