Make Akka internals immune against thread starvation issues #23576

jrudolph · 2017-08-29T11:37:30Z

There are uncountable numbers of issues caused by thread starvation issues. In many cases not only user code is affected (bad performance, bad CPU utilization, etc) but often Akka internals are affected as well. Examples are e.g. all kinds of remoting issues due to late messages that get no attention because threads are blocked. In streams it's quite easy to block or even deadlock ActorGraphInterpreters by user code blocking inside of (sub)streams. In Akka-Http basic infrastructure (like accepting new requests, handling timeout, delivering entity streams) can stop working when user code blocks.

The basic issue is that Akka's internals are by default not shielded against user code. We always tell people to use an IO or blocking dispatcher for "dangerous" tasks. That's definitely a good strategy for the user to "contain the damage". However, I wonder if we could more actively prevent user code from affecting Akka internals code by introducing containment ourselves by running internals on a dedicated dispatcher by default consciously trading off the latest bits of performance to keep the engine running.

Concretely that would mean:

we define an internal dispatcher used for all internal use cases, especially remoting and clustering
for framework-provided streams in general and for http in particular, we enforce (configurable) asynchronous boundaries between framework stream stages and user stream parts

The trade-off is a few extra threads and asynchronous boundaries. The exact number of internal threads would have to be decided. In the worst case of full utilization, it might mean that there are a few more threads battling for CPU resources (but note how our current defaults are also already overprovisioning threads) but preemptive OS scheduling of threads would still be fairer than user code blocking Akka internals from executing at all.

The expected result would be:

No more spurious reports of broken remoting connections due to thread starvation. Remaining issues be more likely due to excessive GC or real connection problems.
Some deadlock issues in Akka-Http would be automatically prevented because framework code and (potentially blocking) user code can still make progress.
Blocking user code will still affect user code and might lead to bad utilization.

The text was updated successfully, but these errors were encountered:

ktoso · 2017-08-29T12:21:38Z

I think I brought up a similar thing while we were discussing the [in]famous SO answer "Akka HTTP: Blocking in a future blocks the server", where blocking on a route kills everything since it blocks the default dispatcher... There I argued for giving that one a special dispatcher to protect the routing layer specifically, but yes - the same issue is elsewhere too.

So... I'd be in favour of such move if we can afford it.

ktoso · 2017-08-29T12:22:53Z

re 3rd point - It is true that "don't block!" will remain a valid and super important advice, but at least it will shield the system messages more from bad stuff so it may help delay issues somewhat for users until they're "in a really bad spot"..

patriknw · 2017-08-29T12:53:23Z

I agree that we might have to revisit this old decision. I'm not sure, but I think the arguments were that we by default should have small footprint (be able to run on Android and such), but it's probably not that important any longer. We could make it opt-in to use same dispatcher.

The performance penalty of the extra async boundaries in akka-http might not be great for benchmarks, but we can once again have it opt-in to use same dispatcher (without boundaries).

Realistically, we can't change these defaults in 2.5.x, but we can make it a recommended setting to turn on, and then change defaults in 2.6.

jrudolph · 2017-08-29T12:57:17Z

I'm not sure, but I think the arguments were that we by default should have small footprint (be able to run on Android and such), but it's probably not that important any longer.

The default settings with parallelism-factor = 3 actually contradict this intention ;)

The performance penalty of the extra async boundaries in akka-http might not be great for benchmarks, but we can once again have it opt-in to use same dispatcher (without boundaries).

Yep, it should be an option with the default changing in 2.6, I agree.

patriknw · 2017-08-29T12:58:37Z

The default settings with parallelism-factor = 3 actually contradict this intention ;)

yes, but note that FJP doesn't create threads eagerly and it stops unused threads, so it's not that bad (but I agree)

He-Pin · 2019-01-03T18:39:54Z

In our internal RPC framework, the default configuration which been used at very large scale is true as what @jrudolph just suggested.

We have a separated networking pool with 8 threads and a separated callbacks pool which with maximum 720 threads by default, and using a SynchronousQueue for communication. when there is no thread is free out there, the RPC framework will throw ThreadPool is full exception.

And every service stub methods could configure their own dedicated thread pool too. And user could also config sharing the networking pool with the user code and eliminating the callbacks pool too if the user's code is fully async.

I like this idea, as this is what we are currently using too and helps. If the user received an SMS notification warning about that Thread pool is full and he can do an online thread dumping to see what's happening and fix it up in the next release.

patriknw · 2019-03-08T13:14:46Z

As part of this we should also change parallelism-factor for default dispatcher to something closer to 1

johanandren · 2019-04-11T15:32:13Z

It could be a good idea to split the default-internal dispatcher work from the stream isolation part, they are fairly independent and the stream one seems more risky in terms of performance (putting the Tcp stream stages on a separate dispatcher for example)

* Another dispatcher can be selected through akka.actor.internal-dispatcher * Factor of default dispatcher lowered to 1

jrudolph added discuss Tickets that need some discussion before proceeding. Not decided if it's a good idea. t:core t:stream labels Aug 29, 2017

patriknw added 1 - triaged Tickets that are safe to pick up for contributing in terms of likeliness of being accepted and removed discuss Tickets that need some discussion before proceeding. Not decided if it's a good idea. labels Oct 30, 2017

patriknw added this to the 2.6 milestone Jan 3, 2019

patriknw added the medium label Mar 8, 2019

johanandren self-assigned this Apr 17, 2019

johanandren added 3 - in progress Someone is working on this ticket and removed 1 - triaged Tickets that are safe to pick up for contributing in terms of likeliness of being accepted labels Apr 17, 2019

johanandren added a commit to johanandren/akka that referenced this issue Apr 18, 2019

Internal dispatcher for everything internal in akka-actor akka#23576

3e5a0cc

* Another dispatcher can be selected through akka.actor.internal-dispatcher * Factor of default dispatcher lowered to 1

johanandren mentioned this issue Apr 18, 2019

Make internals starvation safe #26775

Closed

johanandren mentioned this issue Apr 30, 2019

Internal dispatcher to protect against starvation #26816

Merged

patriknw modified the milestones: 2.6, 2.6.0-M1 May 2, 2019

patriknw removed the 3 - in progress Someone is working on this ticket label May 2, 2019

patriknw closed this as completed May 2, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make Akka internals immune against thread starvation issues #23576

Make Akka internals immune against thread starvation issues #23576

jrudolph commented Aug 29, 2017 •

edited by patriknw

Loading

ktoso commented Aug 29, 2017

ktoso commented Aug 29, 2017

patriknw commented Aug 29, 2017

jrudolph commented Aug 29, 2017

patriknw commented Aug 29, 2017

He-Pin commented Jan 3, 2019

patriknw commented Mar 8, 2019

johanandren commented Apr 11, 2019

Make Akka internals immune against thread starvation issues #23576

Make Akka internals immune against thread starvation issues #23576

Comments

jrudolph commented Aug 29, 2017 • edited by patriknw Loading

ktoso commented Aug 29, 2017

ktoso commented Aug 29, 2017

patriknw commented Aug 29, 2017

jrudolph commented Aug 29, 2017

patriknw commented Aug 29, 2017

He-Pin commented Jan 3, 2019

patriknw commented Mar 8, 2019

johanandren commented Apr 11, 2019

jrudolph commented Aug 29, 2017 •

edited by patriknw

Loading