Akka Http Client pool connections are not reestablished after DNS positive-ttl #1226

wojda opened this issue Jun 23, 2017 · 25 comments

@wojda

wojda commented Jun 23, 2017

We have found that under some circumstances, Akka's Http client does not honor the positive-ttl expiry value and does not pick up new DNS entries.

It looks as if, under load and using the default http connection pool, the client will never re-resolve the DNS entry as long as the connection is not closed, regardless of the positive-ttl value.

Steps to reproduce:

  0. Using akka-http 10.0.6 and akka 2.5.2.
  1. DNS resolves test.com to server_A with ip_A.
  2. Run an akka-http application with the following settings:

       dns.inet-address {
         positive-ttl = 30s
         negative-ttl = 30s
       }

  3. Run load continuously against this akka-http app, which makes requests to test.com (see the sketch after this list).
  4. Change the DNS entry (in /etc/hosts, for instance) to point to ip_B. NOTE: server_A with ip_A is still running.
  5. Wait for the positive-ttl DNS cache expiry (30 seconds, in this example).
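
For reference, a minimal sketch of the load loop from step 3 (assuming akka-http 10.0.x; the URI and request rate are illustrative):

    import akka.actor.ActorSystem
    import akka.http.scaladsl.Http
    import akka.http.scaladsl.model.HttpRequest
    import akka.stream.ActorMaterializer

    import scala.concurrent.duration._

    object LoadLoop extends App {
      implicit val system = ActorSystem("client-system")
      implicit val materializer = ActorMaterializer()
      import system.dispatcher

      // Fire requests continuously so the default host connection pool
      // for test.com never has idle connections.
      system.scheduler.schedule(0.millis, 10.millis) {
        Http().singleRequest(HttpRequest(uri = "http://test.com/"))
          .foreach(_.discardEntityBytes())
      }
    }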

Expected behaviour:

  • Once the DNS cache expires, every new request should be sent to ip_B.

Current behaviour:

  • New requests after the DNS cache expiry are still going to ip_A.
  • The only way to get the akka-http application to pick up the new DNS entry is to restart it.
@jrudolph
Member

That seems to be the case because akka's DNS resolver is based on the JVM's InetAddress.getAllByName, which introduces another layer of caching.

You can observe the behavior by just calling java.net.InetAddress.getAllByName("...") and changing /etc/hosts entries in between.

It seems the JVM DNS caching layer is configured using java.security.Security properties, which are defined in a security file if a SecurityManager is installed; otherwise it can be overridden (or, in this case, turned off) by setting the JVM property -Dsun.net.inetaddr.ttl=0. Can you see if that works for you?
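
For illustration, the same can also be set programmatically before the first lookup (a sketch; networkaddress.cache.ttl is the standard java.security property behind the sun.net.inetaddr.ttl shortcut):

    import java.net.InetAddress
    import java.security.Security

    // Disable the JVM-level positive DNS cache (0 = no caching).
    // Must run before the first lookup, since the value is read once.
    Security.setProperty("networkaddress.cache.ttl", "0")

    // Repeated lookups should now reflect changes to /etc/hosts:
    InetAddress.getAllByName("test.com").foreach(println)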

@jrudolph jrudolph added the 0 - new Ticket is unclear on its purpose or if it is valid or not label Jun 26, 2017
@wojda
Author

wojda commented Jul 2, 2017

Thank you @jrudolph for the quick response. Unfortunately, the problem is not related to the JVM's DNS resolver; that would have been easy to fix.
I've written a failing test that demonstrates the issue: https://github.com/wojda/AkkaHttpDnsInvestigation. To make sure it's not a problem with the JVM or other system config, the test starts three docker containers: one with the akka-http client, and two with the same server (but different IPs). You can build and run the test with one command; please check readme.md. I hope the test will be useful.

I've done a quick investigation too. According to Akka's logs, a hostname is only resolved when a new connection is created. Example:

[DEBUG] [akka://client-system/system/IO-TCP/selectors/$a/7] Resolving server.com before connecting
[DEBUG] [akka://client-system/system/IO-TCP/selectors/$a/7] Attempting connection to [server.com/172.17.0.3:8080]
[DEBUG] [akka://client-system/system/IO-TCP/selectors/$a/7] Connection established to [server.com:8080]

In my case, because of high TPS (no idle connections) and the fact that the deprecated server_A is still running and responding, a new connection is never created. After the DNS entry changes, the Akka Http client keeps using the existing connection pool. Please correct me if I'm wrong, but in that case 'positive-ttl' has no effect, because the Akka Http client does not create new connections and therefore never re-resolves the hostname.

@jrudolph
Member

jrudolph commented Jul 5, 2017

the Akka Http client does not create new connections and therefore never re-resolves the hostname.

Yes, that's correct. I guess if you need connections to be refreshed, we would need to add another feature to restrict the lifetime of persistent connections (which could be a reasonable thing to do).

@mdedetrich
Contributor

I suspect that this is causing issues on our end, where the underlying host isn't getting updated because the DNS timeout is not being honoured. Is there a workaround for this?

@jrudolph jrudolph added discuss Tickets that need some discussion before proceeding t:client Issues related to the HTTP Client 1 - triaged Tickets that are safe to pick up for contributing in terms of likeliness of being accepted and removed 0 - new Ticket is unclear on it's purpose or if it is valid or not labels Jul 19, 2017
@jrudolph jrudolph changed the title DNS positive-ttl is not honored by Akka Http Client Akka Http Client pool connections are not reestablished after DNS positive-ttl Jul 19, 2017
@jrudolph
Member

@mdedetrich I think so far it isn't clear what could or should be done on the Akka HTTP level.

So far, the only confirmed "issue" in Akka Http is that it keeps active persistent connections open for as long as possible. I'd say that's pretty reasonable behavior. Why make a new connection (potentially to a new IP address) when the old one is still alive and serving requests? Or are you seeing something different? Can the server be changed to close connections after a while?

Are there any other HTTP clients that actually couple DNS lifetimes with lifetimes of pool connections?

@jrudolph
Member

That said, we might want to add an API to give users more control over the pools. This could e.g. be a method that closes all connections to a given host without shutting down that pool completely.

@jrudolph
Member

jrudolph commented Jul 19, 2017

square/okhttp#3374 also suggested solving this on the server side / load balancer.

@randomstatistic

Why make a new connection (potentially to a new IP address) when the old one is still alive and serving requests?

This is the reason I got interested in this thread. Regardless of the mechanism you use to convert a "host reference" into a pool of servers (DNS, LB), you end up with a pool of persistent connections.

So let's say you have two servers, A and B. Your client establishes a connection pool with roughly the same number of connections to each of the two, because balancing load is what your "host reference" is for.
Now B goes down; maybe you just need to restart it. All the connections to B are broken, and the client establishes replacement connections to A to get the pool back up to the desired size.
Now B comes back up, but there's no way (unless I'm missing something) to instruct the client to rebalance the persistent connections. All the traffic now goes to A until A closes its connections.

A connection lifetime (either a duration or a request count) would help solve this by gradually rebalancing the connection pool.

@jrudolph
Member

A connection lifetime (either a duration or a request count) would help solve this by gradually rebalancing the connection pool.

I agree that this would probably help. But also note that you are pushing a backend issue to the client here. I think this issue can be seen as evidence that this is a brittle solution that requires full control over all sides of the connections.

@mdedetrich
Contributor

@jrudolph My issue was actually unrelated, so you can ignore my earlier comment

@sergeykolbasov

sergeykolbasov commented Jul 20, 2017

Hi there

I guess it would be nice and meaningful to have behaviour similar to what Finagle does for its client.

  • A watermark connection pool with low and high marks
  • Graceful rotation of connections by TTL: say, every few minutes (or any other configurable value) a new connection is pushed into the pool while an old one is popped, respecting the low mark.

@jrudolph
Member

A watermark connection pool with low and high marks

@imliar could you explain how this is related to this ticket? I tried to understand the documentation, but at a glance I couldn't tell what it is about. Maybe that's because Finagle is about services in general while akka-http is only concerned with HTTP?

Graceful rotation of connections by TTL: say, every few minutes (or any other configurable value) a new connection is pushed into the pool while an old one is popped, respecting the low mark.

I guess you mean .withSession.maxLifeTime(20.seconds), which seems to be similar to the suggestion above.
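
For comparison, a rough sketch of the Finagle configuration being discussed (API names from Finagle's stack-client configuration; import paths vary between Finagle versions):

    import com.twitter.conversions.DurationOps._
    import com.twitter.finagle.Http

    // Watermark-style pool bounds plus a capped session (connection)
    // lifetime, so connections rotate while the pool stays warm.
    val client = Http.client
      .withSessionPool.minSize(5)
      .withSessionPool.maxSize(20)
      .withSession.maxLifeTime(2.minutes)
      .newService("test.com:80")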

@sergeykolbasov

sergeykolbasov commented Jul 20, 2017

@jrudolph Yes, but no.

A watermark connection pool is just one pooling mechanism: you have a minimum and a maximum number of connections in the pool, and as long as there is more load than the minimum number of connections can serve, it grows them up to the high mark.

Connection shutdown could be achieved with any pooling, but with watermarks the number of connections never goes down to zero (unless specifically configured to), which avoids a cold connection pool. That could be a different topic, of course.

@jrudolph
Member

Sounds like our min-connections / max-connections settings.
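
For reference, a sketch of those settings in application.conf (the values are illustrative):

    akka.http.host-connection-pool {
      # floor of warm connections kept open, similar to a low watermark
      min-connections = 2
      # upper bound the pool may grow to under load, similar to a high watermark
      max-connections = 32
    }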

@avietrov

avietrov commented Jul 29, 2017

But also note that you are pushing a backend issue to the client here. I think this issue can be seen as evidence that this is a brittle solution that requires full control over all sides of the connections.

@jrudolph an example that doesn't involve any backends failing is a gradual traffic switch. If that is achieved by having two load balancers and weighted DNS resolution (e.g. how AWS Route 53 does it), then the issue cannot be solved at the load balancer level (as suggested in the okhttp reference), since the traffic is actually being diverted from one LB to the other.

In this case the only solution I'm aware of is to forcefully kill the "old" LB, thus throwing 5xxs, which kills connections on the client's side and forces akka-http to establish new connections, which in turn re-resolves DNS. Doing so at high load results in a significant number of errors and most likely opens a circuit breaker. And neither client nor server is happy about that.

@jrudolph
Member

jrudolph commented Jul 29, 2017 via email

If it's gradual the load balancer can start to close idle persistent connections

@alivanni

alivanni commented Oct 5, 2017

If it's gradual the load balancer can start to close idle persistent connections

@jrudolph I feel like there is a chicken-and-egg problem here. Connections will never become idle, because the client will never move traffic away from the old LB / stack. That is exactly what we are trying to achieve: force the client to start sending requests to the new stack.

@agorski

agorski commented Apr 20, 2018

@jrudolph are you considering any solution for this issue?
Caching DNS entries forever is not the best idea in the cloud: servers can be added or removed dynamically, so such caching will not work.

Do you have any idea at least for a workaround?

@MikhailGolubtsov

MikhailGolubtsov commented Jun 21, 2018

@jrudolph I agree with @agorski and @alivanni and don't see a way to work around this outside of akka-http. I cannot add further arguments, but please consider it a real issue: it's critical to my team, causing trouble in production, and if there is no solution we unfortunately have to migrate away from the akka-http client (I know of another team at Zalando that already did, also because of this).

@johanandren
Member

There is a PR in progress which adds max-connection-keep-alive-time.

@raboof
Member

raboof commented Jun 26, 2018

Related to #1768. We are aware of this and agree it's an area we plan to improve. We're currently working on improving our DNS infrastructure, and one of the next steps is to also take the TTL into account.

@AnanyaDeb

AnanyaDeb commented Sep 27, 2021

My team is facing a related problem and we need some guidance. We are using the scredis library to connect to AWS Redis. When the IP of the Redis node changes, re-resolution of the host is not attempted.

The relevant section of our application conf is:

            "negative-ttl" : "never",
            "positive-ttl" : "never",
            "provider-object" : "akka.io.dns.internal.AsyncDnsProvider",
            "resolve-timeout" : "5s",
            "search-domains" : "default"
        },
        "dispatcher" : "akka.actor.internal-dispatcher",
        "resolver" : "async-dns"
    }

The relevant versions we are using are:

    akkaHttp = "10.2.6"
    scredis = "2.4.3"
    akkaActor = "2.6.16"

From the logs we see that when the old IP address ceases to respond, the reconnection attempt goes to the old IP rather than re-resolving the hostname:

[INFO] [akka://scredis/user/<$hostname>-6379-listener-actor] Connection has been shutdown abruptly
[INFO] [scredis-scredis.io.akka.io-dispatcher-20] [akka://scredis/user/<$hostname>-6379-listener-actor/<$hostname>-6379-io-actor-2] Connecting to <$hostname>/<old_ip>:6379
[ERROR][scredis-scredis.io.akka.io-dispatcher-20] [akka://scredis/user/<$hostname>-6379-listener-actor/<$hostname>-6379-io-actor-2] Could not connect to <$hostname>/<old_ip>:6379: Command failed

So in spite of the connection being shut down, the new connection does not re-resolve the hostname. Could you point us to what the reason might be?

@jrudolph
Member

jrudolph commented Nov 8, 2021

You can enable debug logging to see the AsyncDnsResolver in action. I think by now this issue is largely resolved by using the AsyncDnsResolver with appropriate TTL settings for DNS, plus akka.http.host-connection-pool.max-connection-lifetime for the pool. If you need more control than that, you can also change ClientConnectionSettings and set a custom ClientTransport to implement whatever resolution logic is right.
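
Putting those pieces together, an application.conf along these lines (a sketch; the values are illustrative, not recommendations):

    akka.io.dns {
      resolver = async-dns
      async-dns {
        # cap how long positive results may be cached
        positive-ttl = 30s
        negative-ttl = 10s
      }
    }

    # close and re-establish pooled connections periodically, so that
    # new connections pick up fresh DNS entries
    akka.http.host-connection-pool.max-connection-lifetime = 60s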

@jrudolph jrudolph added the help wanted Identifies issues that the core team will likely not have time to work on label Nov 8, 2021
@jrudolph
Member

jrudolph commented Nov 8, 2021

A more automatic solution (implementing what the title of this issue says) could be to add logic that does DNS resolution directly in the pool using the AsyncDnsResolver, and to use the returned TTLs to automatically apply a max-connection-lifetime per connection.
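
A rough sketch of the building block that idea would use, querying the resolver through the Dns extension and inspecting per-record TTLs (assumes the async-dns resolver is configured):

    import akka.actor.ActorSystem
    import akka.io.{Dns, IO}
    import akka.io.dns.{ARecord, DnsProtocol}
    import akka.pattern.ask
    import akka.util.Timeout

    import scala.concurrent.duration._

    implicit val system: ActorSystem = ActorSystem()
    implicit val timeout: Timeout = Timeout(5.seconds)
    import system.dispatcher

    // Resolve the host and look at the TTL of each A record; a pool could
    // derive a per-connection max-connection-lifetime from these values.
    (IO(Dns) ? DnsProtocol.Resolve("test.com"))
      .mapTo[DnsProtocol.Resolved]
      .foreach { resolved =>
        resolved.records.collect { case a: ARecord =>
          println(s"${a.ip} ttl=${a.ttl}")
        }
      }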
