-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make Akka internals immune against thread starvation issues #23576
Comments
I think I brought up a similar thing while we were discussing the [in]famous SO answer "Akka HTTP: Blocking in a future blocks the server", where blocking on a route kills everything since it blocks the default dispatcher... There I argued for giving that one a special dispatcher to protect the routing layer specifically, but yes - the same issue is elsewhere too. So... I'd be in favour of such move if we can afford it. |
re 3rd point - It is true that "don't block!" will remain a valid and super important advice, but at least it will shield the system messages more from bad stuff so it may help delay issues somewhat for users until they're "in a really bad spot".. |
I agree that we might have to revisit this old decision. I'm not sure, but I think the arguments were that we by default should have small footprint (be able to run on Android and such), but it's probably not that important any longer. We could make it opt-in to use same dispatcher. The performance penalty of the extra async boundaries in akka-http might not be great for benchmarks, but we can once again have it opt-in to use same dispatcher (without boundaries). Realistically, we can't change these defaults in 2.5.x, but we can make it a recommended setting to turn on, and then change defaults in 2.6. |
The default settings with
Yep, it should be an option with the default changing in 2.6, I agree. |
yes, but note that FJP doesn't create threads eagerly and it stops unused threads, so it's not that bad (but I agree) |
In our internal RPC framework, the default configuration which been used at very large scale is true as what @jrudolph just suggested. We have a separated And every service stub methods could configure their own dedicated thread pool too. And user could also config sharing the I like this idea, as this is what we are currently using too and helps. If the user received an SMS notification warning about that |
As part of this we should also change parallelism-factor for default dispatcher to something closer to 1 |
It could be a good idea to split the default-internal dispatcher work from the stream isolation part, they are fairly independent and the stream one seems more risky in terms of performance (putting the Tcp stream stages on a separate dispatcher for example) |
* Another dispatcher can be selected through akka.actor.internal-dispatcher * Factor of default dispatcher lowered to 1
There are uncountable numbers of issues caused by thread starvation issues. In many cases not only user code is affected (bad performance, bad CPU utilization, etc) but often Akka internals are affected as well. Examples are e.g. all kinds of remoting issues due to late messages that get no attention because threads are blocked. In streams it's quite easy to block or even deadlock
ActorGraphInterpreters
by user code blocking inside of (sub)streams. In Akka-Http basic infrastructure (like accepting new requests, handling timeout, delivering entity streams) can stop working when user code blocks.The basic issue is that Akka's internals are by default not shielded against user code. We always tell people to use an IO or blocking dispatcher for "dangerous" tasks. That's definitely a good strategy for the user to "contain the damage". However, I wonder if we could more actively prevent user code from affecting Akka internals code by introducing containment ourselves by running internals on a dedicated dispatcher by default consciously trading off the latest bits of performance to keep the engine running.
Concretely that would mean:
The trade-off is a few extra threads and asynchronous boundaries. The exact number of internal threads would have to be decided. In the worst case of full utilization, it might mean that there are a few more threads battling for CPU resources (but note how our current defaults are also already overprovisioning threads) but preemptive OS scheduling of threads would still be fairer than user code blocking Akka internals from executing at all.
The expected result would be:
The text was updated successfully, but these errors were encountered: