
Improve performance of numConnectedSlots for connection pool #3904

Closed
brharrington opened this issue Sep 15, 2021 · 4 comments
Labels: 1 - triaged · bug · t:client · t:client-new-pool · t:core
@brharrington (Contributor)

While testing a use case that involves making requests to many different hosts, I found that it would occasionally get bogged down at 100% CPU, with most of the threads showing the following stack trace:

    prio=5 group=main state=RUNNABLE
        at akka.http.impl.engine.client.pool.NewHostConnectionPool$HostConnectionPoolStage$$anon$1$Slot.isConnected(NewHostConnectionPool.scala:213)
        at akka.http.impl.engine.client.pool.NewHostConnectionPool$HostConnectionPoolStage$$anon$1.$anonfun$numConnectedSlots$1(NewHostConnectionPool.scala:143)
        at akka.http.impl.engine.client.pool.NewHostConnectionPool$HostConnectionPoolStage$$anon$1.$anonfun$numConnectedSlots$1$adapted(NewHostConnectionPool.scala:143)
        at akka.http.impl.engine.client.pool.NewHostConnectionPool$HostConnectionPoolStage$$anon$1$$Lambda$1641/0x00007f5879546040.apply(Native Method)
        at scala.collection.IterableOnceOps.count(IterableOnce.scala:605)
        at scala.collection.IterableOnceOps.count$(IterableOnce.scala:602)
        at scala.collection.AbstractIterable.count(Iterable.scala:919)
        at akka.http.impl.engine.client.pool.NewHostConnectionPool$HostConnectionPoolStage$$anon$1.akka$http$impl$engine$client$pool$NewHostConnectionPool$HostConnectionPoolStage$$anon$$numConnectedSlots(NewHostConnectionPool.scala:143)
        at akka.http.impl.engine.client.pool.NewHostConnectionPool$HostConnectionPoolStage$$anon$1$Slot.runOneTransition$1(NewHostConnectionPool.scala:331)
        at akka.http.impl.engine.client.pool.NewHostConnectionPool$HostConnectionPoolStage$$anon$1$Slot.loop$1(NewHostConnectionPool.scala:363)
        at akka.http.impl.engine.client.pool.NewHostConnectionPool$HostConnectionPoolStage$$anon$1$Slot.updateState(NewHostConnectionPool.scala:372)
        at akka.http.impl.engine.client.pool.NewHostConnectionPool$HostConnectionPoolStage$$anon$1$Slot.updateState(NewHostConnectionPool.scala:267)
        at akka.http.impl.engine.client.pool.NewHostConnectionPool$HostConnectionPoolStage$$anon$1$Slot.$anonfun$updateState$2(NewHostConnectionPool.scala:289)
        at akka.http.impl.engine.client.pool.NewHostConnectionPool$HostConnectionPoolStage$$anon$1$Slot$$Lambda$1606/0x00007f5961061040.apply$mcV$sp(Native Method)
        at akka.http.impl.engine.client.pool.NewHostConnectionPool$HostConnectionPoolStage$$anon$1.$anonfun$safeCallback$1(NewHostConnectionPool.scala:608)
        at akka.http.impl.engine.client.pool.NewHostConnectionPool$HostConnectionPoolStage$$anon$1.$anonfun$safeCallback$1$adapted(NewHostConnectionPool.scala:608)
        at akka.http.impl.engine.client.pool.NewHostConnectionPool$HostConnectionPoolStage$$anon$1$$Lambda$1464/0x00007f59632cc040.apply(Native Method)
        at akka.stream.impl.fusing.GraphInterpreter.runAsyncInput(GraphInterpreter.scala:467)
        at akka.stream.impl.fusing.GraphInterpreterShell$AsyncInput.execute(ActorGraphInterpreter.scala:517)
        at akka.stream.impl.fusing.GraphInterpreterShell.processEvent(ActorGraphInterpreter.scala:625)
        at akka.stream.impl.fusing.ActorGraphInterpreter.akka$stream$impl$fusing$ActorGraphInterpreter$$processEvent(ActorGraphInterpreter.scala:800)
        at akka.stream.impl.fusing.ActorGraphInterpreter$$anonfun$receive$1.applyOrElse(ActorGraphInterpreter.scala:818)
        at akka.actor.Actor.aroundReceive(Actor.scala:537)
        at akka.actor.Actor.aroundReceive$(Actor.scala:535)
        at akka.stream.impl.fusing.ActorGraphInterpreter.aroundReceive(ActorGraphInterpreter.scala:716)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:580)
        at akka.actor.ActorCell.invoke(ActorCell.scala:548)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:270)
        at akka.dispatch.Mailbox.run(Mailbox.scala:231)
        at akka.dispatch.Mailbox.exec(Mailbox.scala:243)
        at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
        at java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
        at java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656)
        at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594)
        at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)

I'm not entirely sure why it occasionally happens, but it seems like once it does it becomes a bit of a vicious cycle and it doesn't recover. I was using superPool with a fairly large max-connections setting, because superPool uses that value for its parallelism. However, numConnectedSlots is also linear to compute over the slots table, which is sized based on max-connections. I have worked around the issue by not using superPool, but I think the pool implementation could also be improved to reduce the overhead. In particular, most of the time was spent in the check for pre-connect:

case s if !s.isConnected && s.isIdle && numConnectedSlots < settings.minConnections =>
  debug(s"Preconnecting because number of connected slots fell down to $numConnectedSlots")
  OptionVal.Some(Event.onPreConnect)

One simple option might be to make numConnectedSlots work more like Iterable.sizeCompare(Int) so it can short-circuit quickly. In my case settings.minConnections was 0, so it should be possible to avoid traversing the slots at all.
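A minimal sketch of that idea (hypothetical names, not the actual akka-http internals): a short-circuiting variant of the connected-slot count in the spirit of Iterable.sizeCompare, which stops scanning as soon as the threshold is reached and, for a threshold of 0, never touches the slots table at all.

```scala
// Hypothetical sketch: `Slot` stands in for the pool's slot type.
final case class Slot(isConnected: Boolean)

// Returns true once `threshold` connected slots have been seen; with
// threshold = 0 (i.e. min-connections = 0) it skips the scan entirely.
def hasConnectedSlotsAtLeast(slots: IndexedSeq[Slot], threshold: Int): Boolean =
  if (threshold <= 0) true
  else {
    var seen = 0
    var i = 0
    while (i < slots.length && seen < threshold) {
      if (slots(i).isConnected) seen += 1
      i += 1
    }
    seen >= threshold
  }
```

With max-connections = 16384, a full count touches every slot on each check, whereas this version does at most the work needed to answer the comparison.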

@jrudolph added the labels 1 - triaged, bug, t:client, t:client-new-pool, t:core on Sep 16, 2021
@jrudolph (Member)

Thanks for the report and the thorough analysis, @brharrington. I think we could make this call faster, but I wonder if it's only a symptom of something else leading to a loop in the pool. The particular stack trace you see might just be the slowest part of that loop.

The particular stack trace you showed handles a timeout event in line 289. Could you give your full configuration and some indication about how many pools (target hosts) are involved? Maybe some timeouts are set to some very low values so they trigger (too) frequently?

@brharrington (Contributor, Author)

I had max-connections set to 16k and there were roughly 10k hosts for the test. Assuming it hit that case for every host, it would be iterating over ~160M slots. There were some expected timeouts from 10 of the hosts (blocked via iptables), as I was testing how it would handle those issues. Once it got into the bad state there were timeouts all over the place, but from what I could tell most of those were just symptoms of the dispatcher threads being bogged down with that stack trace.

The client config was mostly the default, with the max connections and open requests increased:

akka.http.host-connection-pool {
  max-open-requests = 65536
  max-connections = 16384
}

@jrudolph (Member)

jrudolph commented Sep 16, 2021

Thanks for that explanation. It sounds like you can get into a feedback loop there where, over a certain threshold, the timeouts just cause more timeouts.

If it's like that, increasing the timeout (and thereby decreasing the pressure on timeout handling) should help, as would indeed improving the processing time in that region.

Adding another condition to check settings.minConnections > 0 first seems easy enough; could you create a PR for that?
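The suggested guard could look roughly like this (a hedged sketch with made-up parameter names; the real check is a guard on a pattern match in NewHostConnectionPool). Because `&&` short-circuits, putting the cheap `minConnections > 0` test first means the O(max-connections) slot scan is skipped entirely when preconnecting is disabled:

```scala
// Hypothetical sketch of the reordered pre-connect condition.
def shouldPreconnect(
    minConnections: Int,
    isConnected: Boolean,
    isIdle: Boolean,
    numConnectedSlots: => Int // by-name: only evaluated when actually needed
): Boolean =
  minConnections > 0 && !isConnected && isIdle &&
    numConnectedSlots < minConnections
```

With min-connections = 0 (the default), `numConnectedSlots` is never evaluated, which is exactly the case the reporter hit.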

@brharrington (Contributor, Author)

Sure, will create a PR when I get a chance. Thanks.

brharrington added a commit to brharrington/akka-http that referenced this issue on Sep 23, 2021:

> Computing the number of connected slots is linear over the max number of connections for a host. This change avoids computing the number of connected slots for the preconnect step when the minimum number of connections is zero.
@jrudolph jrudolph added this to the 10.2.7 milestone Oct 21, 2021