KAFKA-12193: Re-resolve IPs after a client disconnects #9902
Conversation
This patch changes the NetworkClient behavior to resolve the target node's hostname after disconnecting from an established connection, rather than waiting until the previously-resolved addresses are exhausted. This is to handle the scenario when the node's IP addresses have changed during the lifetime of the connection, and means that the client does not have to try to connect to invalid IP addresses until it has tried each address.
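The core idea can be sketched as follows. This is a hypothetical, simplified model (class and method names like `ConnectionState`, `onDisconnect`, and `currentAddress` are illustrative, not the actual Kafka internals): cached addresses are cleared when an established connection drops, so the next connection attempt re-resolves DNS instead of cycling through possibly-stale IPs.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Simplified sketch of the re-resolve-on-disconnect idea.
class ConnectionState {
    private final String host;
    private List<InetAddress> addresses = Collections.emptyList();
    private int index = 0;

    ConnectionState(String host) {
        this.host = host;
    }

    // Called when an established connection drops: forget the cached
    // addresses so the next connection attempt re-resolves DNS.
    void onDisconnect() {
        addresses = Collections.emptyList();
        index = 0;
    }

    // Resolve lazily: only hit DNS when no cached addresses remain.
    // (Advancing the index on connection failure is omitted for brevity.)
    InetAddress currentAddress() throws UnknownHostException {
        if (addresses.isEmpty()) {
            addresses = Arrays.asList(InetAddress.getAllByName(host));
            index = 0;
        }
        return addresses.get(index % addresses.size());
    }
}
```

Before this patch, the cleared-addresses path was only reached after every previously-resolved address had been tried; the change moves the clearing to the disconnect event itself.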
Thanks for the PR. It looks good overall; I left a few comments on minor issues.
                     ApiVersions apiVersions,
                     Sensor throttleTimeSensor,
                     LogContext logContext) {
public NetworkClient(MetadataUpdater metadataUpdater,
Do we really need this constructor? As far as I can tell, it's only called by the other 3 above. These could directly call the real one below instead of going through this new one. WDYT?
Good call, fixed. These are also weirdly inconsistent (selector comes before metadata and metadataUpdater in some constructors and after them in others), which I guess we could clean up, but I didn't want to remove any existing public constructors.
@@ -183,6 +186,10 @@ public void disconnected(String id, long now) {
        connectingNodes.remove(id);
    } else {
        resetConnectionSetupTimeout(nodeState);
        if (nodeState.state.isConnected()) {
            // If a connection had previously been established, re-resolve DNS because the IPs may have changed
            nodeState.addresses = Collections.emptyList();
Slightly weird that we're updating a "private" field here.
Also the comment is a bit misleading. We're not re-resolving DNS here but instead clearing state so if we reconnect later, the client will be forced to re-resolve then.
@mimaison Java allows outer classes to access nested classes' private members. This is described in detail in JLS §6.6.1:
A member (class, interface, field, or method) of a reference (class, interface, or array) type or a constructor of a class type is accessible only if the type is accessible and the member or constructor is declared to permit access:
- if the member or constructor is declared private, then access is permitted if and only if it occurs within the body of the top level class (§7.6) that encloses the declaration of the member or constructor.
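For illustration, here is a minimal (hypothetical) example of the rule quoted above: code anywhere inside the body of the top-level class may access a nested class's private members directly.

```java
// Demonstrates JLS 6.6.1: access to a private member is permitted
// anywhere within the body of the enclosing top-level class.
// Names here are illustrative only.
public class Outer {
    static class Inner {
        private int secret = 42;
    }

    static int readSecret() {
        return new Inner().secret; // legal: same enclosing top-level class
    }
}
```

The same access from a class outside `Outer` would be a compile error.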
Agree with @mimaison on the comment, good to make it clear by mentioning that the addresses are cleared to re-resolve later when it reconnects.
Accessing the field is allowed, but I think @mimaison meant that it makes the code a little confusing and inconsistent. NodeConnectionState has some private fields and some package-private ones, and addresses was previously only modified by methods on NodeConnectionState, not accessed directly by the top-level class. I've moved the empty-list assignment into a NodeConnectionState.clearAddresses method.

Agree about the comment, I've updated it.
thanks @bob-barrett for addressing the review comments. LGTM
@bob-barrett Thanks for the PR. For my understanding, have we considered changing the way backoff is applied instead of re-resolving when an established connection is closed? For instance, we could try all the resolved IPs before applying the backoff. However, we would still have to wait for the connection timeout on each IP. Also, I suppose we assume that closing a connection is rather infrequent in normal circumstances, so the cost of the DNS resolution is low. On the other hand, we rely on the DNS cache if resolution happens really frequently. Am I getting this right? That seems reasonable. I am trying to convince myself that this is the correct approach to address the issue.

@dajac That's right, and that's my concern with exhausting the currently-resolved IPs. The default connection setup timeout is 10 seconds. Even without applying the backoff, 10 seconds of delay per additional IP address is pretty noticeable.

I think this is the right way to think about it, yes. There's already a DNS cache, which can be configured as needed. We're effectively adding our own additional cache (with a very long TTL) on top of that by preserving the resolved IPs. If a client has been connected to a server for hours or days and then disconnects, it makes complete sense to me to re-resolve DNS.

@bob-barrett Thanks for your answers. That makes sense and I do agree with the approach.
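For reference, the JVM-level DNS cache mentioned in this exchange is controlled by security properties (values in seconds; exact defaults depend on the JRE and whether a security manager is installed). A small sketch of how it can be tuned:

```java
import java.security.Security;

// Tuning the JVM's built-in DNS cache referred to above.
// "-1" means cache forever, "0" means never cache.
public class DnsCacheConfig {
    public static void main(String[] args) {
        // Cache successful lookups for 30 seconds, failed lookups for 10.
        Security.setProperty("networkaddress.cache.ttl", "30");
        Security.setProperty("networkaddress.cache.negative.ttl", "10");
        System.out.println(Security.getProperty("networkaddress.cache.ttl")); // prints 30
    }
}
```

Because this cache sits below the client, frequent re-resolution after disconnects stays cheap as long as the TTL has not expired.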
@bob-barrett Thanks for the PR. I have left a meta-comment. Please, let me know what you think. Besides this, the rest looks good.
static List<InetAddress> resolve(String host, ClientDnsLookup clientDnsLookup,
                                 HostResolver hostResolver) throws UnknownHostException {
This method and its usages look a bit weird to me now. It resolves a host based on clientDnsLookup and a hostResolver, which in turn also resolves a host. We also have to pass the HostResolver wherever it is called now. I wonder if we should avoid this level of indirection and directly encapsulate the entire logic in the HostResolver.

HostResolver would look like this:

public interface HostResolver {
    List<InetAddress> resolve(String host, ClientDnsLookup clientDnsLookup) throws UnknownHostException;
}

In NodeConnectionState#currentAddress, we could replace addresses = ClientUtils.resolve(host, clientDnsLookup, hostResolver) with addresses = hostResolver.resolve(host, clientDnsLookup).

The DefaultHostResolver would do what ClientUtils.resolve does today, and we can mock it in tests like you did.

Is this something that you have considered?
@dajac I started making this change, but while working on it, I realized that I don't like requiring the HostResolver to understand the client.dns.lookup config. In the same way that MockTime is only used to mock Java built-in methods, I think the HostResolver should only be responsible for the DNS resolution (either real or mocked), and ClientUtils should be responsible for understanding the Kafka application behavior. That way, tests that mock DNS still test the actual ClientDnsLookup behavior, rather than relying on both the default resolver and any mocked resolvers to handle it correctly. It also means that if we ever add an option to ClientDnsLookup or change its behavior, we don't need to remember to update all HostResolver implementations.
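The separation of concerns argued for here might be sketched like this. These are hypothetical, simplified signatures (a boolean stands in for the ClientDnsLookup enum; the real Kafka interfaces differ): the resolver only does raw lookups, while the lookup policy lives in one utility method that mocked resolvers still exercise.

```java
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// HostResolver only does raw resolution (easy to mock in tests).
interface HostResolver {
    InetAddress[] resolve(String host) throws UnknownHostException;
}

class DefaultHostResolver implements HostResolver {
    public InetAddress[] resolve(String host) throws UnknownHostException {
        return InetAddress.getAllByName(host);
    }
}

// The client.dns.lookup policy lives here, so tests that mock DNS
// still exercise the real filtering logic.
class ClientUtilsSketch {
    static List<InetAddress> resolve(String host, boolean useAllDnsIps,
                                     HostResolver resolver) throws UnknownHostException {
        InetAddress[] all = resolver.resolve(host);
        if (useAllDnsIps)
            return Arrays.asList(all);
        // Otherwise keep only addresses of the same family as the first.
        List<InetAddress> filtered = new ArrayList<>();
        boolean firstIsV4 = all[0] instanceof Inet4Address;
        for (InetAddress a : all)
            if ((a instanceof Inet4Address) == firstIsV4)
                filtered.add(a);
        return filtered;
    }
}
```

A test can then swap in a lambda resolver returning canned addresses while the family-filtering behavior under test stays real.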
Thanks @bob-barrett. I do agree with your point. Let's keep it as you suggested.
@bob-barrett Thanks for your reply. I left a few minor comments.
@@ -81,6 +85,19 @@
    private final NetworkClient clientWithNoExponentialBackoff = createNetworkClient(reconnectBackoffMsTest);
    private final NetworkClient clientWithStaticNodes = createNetworkClientWithStaticNodes();
    private final NetworkClient clientWithNoVersionDiscovery = createNetworkClientWithNoVersionDiscovery();
    private ArrayList<InetAddress> initialAddresses = new ArrayList<>(Arrays.asList(
nit: Should we make these two static?
Catching the UnknownHostException makes it a bit annoying; I threw it in a static block to make the compiler happy. Let me know if you think there's something cleaner.
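The static-block pattern being described looks roughly like this (the class name and the address literals are illustrative; literal IPs do not trigger an actual DNS lookup):

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Arrays;
import java.util.List;

// InetAddress.getByName declares the checked UnknownHostException, which
// a plain field initializer cannot handle, so the static fields are
// assigned inside a static block instead.
class TestAddresses {
    static final List<InetAddress> INITIAL_ADDRESSES;

    static {
        try {
            INITIAL_ADDRESSES = Arrays.asList(
                InetAddress.getByName("10.200.20.100"),
                InetAddress.getByName("10.200.20.101"));
        } catch (UnknownHostException e) {
            throw new ExceptionInInitializerError(e);
        }
    }
}
```

Wrapping the checked exception in ExceptionInInitializerError satisfies definite-assignment analysis for the blank final field while still failing loudly if resolution ever did go wrong.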
@dajac @mimaison @satishd Thanks for your reviews! In addition to PR feedback, I pushed a new change to
Thanks for the updates. LGTM
Tests passed on JDK 8.
This patch changes the NetworkClient behavior to resolve the target node's hostname after disconnecting from an established connection, rather than waiting until the previously-resolved addresses are exhausted. This is to handle the scenario when the node's IP addresses have changed during the lifetime of the connection, and means that the client does not have to try to connect to invalid IP addresses until it has tried each address.

Reviewers: Mickael Maison <mickael.maison@gmail.com>, Satish Duggana <satishd@apache.org>, David Jacot <djacot@confluent.io>

Conflicts (related to junit upgrade):
clients/src/test/java/org/apache/kafka/clients/ClientUtilsTest.java
clients/src/test/java/org/apache/kafka/clients/ClusterConnectionStatesTest.java
clients/src/test/java/org/apache/kafka/clients/NetworkClientTest.java
I have picked the commit into 2.7 but could not pick it into 2.6 cleanly. @bob-barrett Could you open a PR for 2.6?
…ltipleIPsWithUseAll
TICKET = N/A
LI_DESCRIPTION = The test fails since the domain kafka.apache.org used to return 3 IPs and is now only returning two IPs. Furthermore, the upstream fix identified below cannot be cleanly cherry-picked.
EXIT_CRITERIA = when the commit 131d475 is picked from upstream: KAFKA-12193: Re-resolve IPs after a client disconnects apache#9902
* [LI-HOTFIX] Do not resolve bootstrap server during client bootstrap/startup (#315)
  Co-authored-by: Ke Hu <kehu@kehu-mn2.linkedin.biz>
* [LI-HOTFIX] use_all_dns_ips as default value for client.dns.lookup (#348)
* [LI-HOTFIX] log level change to avoid flooding the logs
* [LI-HOTFIX] Ignore the failed test ClusterConnectionStatesTest#testMultipleIPsWithUseAll (#116)
  TICKET = N/A
  LI_DESCRIPTION = The test fails since the domain kafka.apache.org used to return 3 IPs and is now only returning two IPs. Furthermore, the upstream fix identified below cannot be cleanly cherry-picked.
  EXIT_CRITERIA = when the commit 131d475 is picked from upstream: KAFKA-12193: Re-resolve IPs after a client disconnects apache#9902
* [LI-HOTFIX] Ignoring the failed tests (#188)
  TICKET = N/A
  LI_DESCRIPTION = Several tests are failing since the domain kafka.apache.org that used to resolve to more than 1 IPv4 address now only resolves to 1 IPv4 address. The upstream code has overhauled the ClusterConnectionStatesTest. We are simply ignoring these tests for now, and will get the new logic from upstream after a major version rebase.
  EXIT_CRITERIA = This hotfix can be removed in the next major version rebase

Co-authored-by: Ke Hu <kehu@linkedin.com>
Co-authored-by: Ke Hu <kehu@kehu-mn2.linkedin.biz>
Co-authored-by: Lucas Wang <luwang@linkedin.com>
* KAFKA-6863: Kafka clients should try to use multiple DNS resolved IPs (apache#4987)
  Implementation of KIP-302: Based on the new client configuration client.dns.lookup, a NetworkClient can use InetAddress.getAllByName to find all IPs and iterate over them when they fail to connect. Only uses either IPv4 or IPv6 addresses, similar to the default mode.
  Co-authored-by: Edoardo Comar <ecomar@uk.ibm.com>
  Co-authored-by: Mickael Maison <mickael.maison@gmail.com>
  Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
* [LI-HOTFIX] Ignore the failed test ClusterConnectionStatesTest#testMultipleIPsWithUseAll (#116)
* [LI-HOTFIX] Ignoring the failed tests (#188)
* Fix for KAFKA-7974: Avoid zombie AdminClient when node host isn't resolvable (apache#6305)
  * Fix for KAFKA-7974: Avoid calling disconnect() when not connecting
  * Resolve host only when currentAddress() is called
    Moves away from automatically resolving the host when the connection entry is constructed, which can leave ClusterConnectionStates in a confused state. Instead, resolution is done on demand, ensuring that the entry in the connection list is present even if the resolution failed.
* Add Javadoc to ClusterConnectionStates.connecting()
* KAFKA-9313: Set use_all_dns_ips as the new default for client.dns.lookup (KIP-602) (apache#8644)
  This applies to the producer, consumer, admin client, connect worker and inter-broker communication. ClientDnsLookup.DEFAULT has been deprecated and a warning will be logged if it's explicitly set in a client config.
  Reviewers: Mickael Maison <mickael.maison@gmail.com>, Ismael Juma <ismael@juma.me.uk>
* Update NetworkClient usage in SSLNetworkClient
* [LI-HOTFIX] Bypass cluster metadata auto refresh code path by default (#329)
  The PR #228 attempted to resolve provided bootstrap servers when the metadata exceeds a staleness threshold. The config is covered on both producer and consumer, and the default behavior without a configured value sets the timeout to Long.MAX_VALUE. However, cruise-control is affected by this behavior as it implements a similar mechanism on its own and directly uses NetworkClient. The code would fail if an empty bootstrap server is passed to NetworkClient, which is the case for internal use of CC. To resolve this, this patch makes the default value -1 and omits the code path referencing the bootstrap server when we see -1.
  EXIT_CRITERIA = When #228 is ejected

Co-authored-by: Xiongqi Wu <xiongqi.wu@gmail.com>
Co-authored-by: Edoardo Comar <ecomar@uk.ibm.com>
Co-authored-by: Lucas Wang <luwang@linkedin.com>
Co-authored-by: Nicholas Parker <nick@thelastpickle.com>
Co-authored-by: Badai Aqrandista <badaiaqrandista@gmail.com>
Co-authored-by: Joseph (Ting-Chou) Lin <lmr3796joseph@gmail.com>
If multiple addresses are available. This change is a follow-up to #9902. When re-resolving DNS after disconnecting, it is possible (likely, even) that we will resolve the same set of addresses in the same order, meaning we could attempt to reconnect to the same IP that we just disconnected from. In the case where we disconnected from the IP address because it has become unhealthy (due to a load balancer going down or a network routing layer restarting, for example), the client will need to time out before connecting to the next IP. This essentially unifies the behavior before and after KAFKA-12193: re-resolve after disconnecting while still progressing through the IP list.

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>, Andrew Schofield <aschofield@confluent.io>