
KAFKA-12193: Re-resolve IPs after a client disconnects #9902

Merged
merged 4 commits into apache:trunk from bob-barrett:KAFKA-12193 on Feb 4, 2021

Conversation

bob-barrett
Contributor

This patch changes the NetworkClient behavior to resolve the target node's hostname after disconnecting from an established connection, rather than waiting until the previously-resolved addresses are exhausted. This handles the scenario where the node's IP addresses have changed during the lifetime of the connection, and means that the client does not have to attempt a connection to each stale IP address before re-resolving.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

Member
@mimaison mimaison left a comment

Thanks for the PR. It looks good overall; I left a few comments for minor issues.

ApiVersions apiVersions,
Sensor throttleTimeSensor,
LogContext logContext) {
public NetworkClient(MetadataUpdater metadataUpdater,
Member

Do we really need this constructor? As far as I can tell, it's only called by the other 3 above. These could directly call the real one below instead of going through this new one. WDYT?

Contributor Author

Good call, fixed. These are also weirdly inconsistent (selector is before metadata and metadataUpdater sometimes, after them other times), which I guess we could clean up, but I didn't want to remove any existing public constructors.

@@ -183,6 +186,10 @@ public void disconnected(String id, long now) {
            connectingNodes.remove(id);
        } else {
            resetConnectionSetupTimeout(nodeState);
            if (nodeState.state.isConnected()) {
                // If a connection had previously been established, re-resolve DNS because the IPs may have changed
                nodeState.addresses = Collections.emptyList();
Member

Slightly weird that we're updating a "private" field here.

Also the comment is a bit misleading. We're not re-resolving DNS here but instead clearing state so if we reconnect later, the client will be forced to re-resolve then.

Member

@mimaison Java allows outer classes to access their nested classes' private members. This is described in detail in JLS 6.6.1:

A member (class, interface, field, or method) of a reference (class, interface, or array) type or a constructor of a class type is accessible only if the type is accessible and the member or constructor is declared to permit access:

  • if the member or constructor is declared private, then access is permitted if and only if it occurs within the body of the top level class (§7.6) that encloses the declaration of the member or constructor.
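
A minimal standalone illustration of that rule (hypothetical names, not Kafka code):

public class Outer {
    private static class Inner {
        private int counter; // private member of the nested class
    }

    static int bump(Inner inner) {
        // Legal: this access occurs within the body of the top-level class
        // (Outer) that encloses the declaration of Inner.counter.
        return ++inner.counter;
    }

    public static void main(String[] args) {
        System.out.println(bump(new Inner())); // prints 1
    }
}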

Member

Agree with @mimaison on the comment; it would be good to make it clear that the addresses are cleared so that they are re-resolved later when the client reconnects.

Contributor Author

Accessing the field is allowed, but I think @mimaison meant that it makes the code a little confusing and inconsistent. NodeConnectionState has some private fields and some package-private ones, and addresses was previously only modified by methods on NodeConnectionState, not accessed directly by the top-level class. I've moved the empty-list assignment into a NodeConnectionState.clearAddresses method.

Agree about the comment; I've updated it.
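
A minimal sketch of the shape of that change (illustrative; the field and method names follow the discussion above, but this is not the exact Kafka source):

import java.net.InetAddress;
import java.util.Collections;
import java.util.List;

class NodeConnectionStateSketch {
    private List<InetAddress> addresses = Collections.emptyList();

    // Clears the cached resolved addresses so that the next connection attempt
    // is forced to re-resolve DNS for this node, instead of having the outer
    // class assign the private field directly.
    void clearAddresses() {
        this.addresses = Collections.emptyList();
    }

    boolean hasResolvedAddresses() {
        return !addresses.isEmpty();
    }
}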

Member
@satishd satishd left a comment

Thanks @bob-barrett for addressing the review comments. LGTM

@dajac
Contributor

dajac commented Feb 2, 2021

@bob-barrett Thanks for the PR. For my understanding, have we considered changing the way backoff is applied instead of re-resolving when an established connection is closed? For instance, we could try all the resolved IPs before applying the backoff. However, we would still have to wait until the connection times out for each IP.

Also, I suppose that we assume that closing a connection is rather infrequent in normal circumstances, so the cost of the DNS resolution is low. On the other hand, we rely on the DNS cache if resolution happens really frequently. Am I getting this right? That seems reasonable.

I am trying to convince myself that this is the correct approach to address the issue.

@bob-barrett
Contributor Author

For instance, we could try all the resolved IPs before applying the backoff. However, we would still have to wait until the connection times out for each IP.

@dajac That's right, and that's my concern with exhausting the currently-resolved IPs. The default connection setup timeout is 10 seconds. Even without applying the backoff, 10 seconds of delay per additional IP address is pretty noticeable.

Also, I suppose that we assume that closing a connection is rather infrequent in normal circumstances, so the cost of the DNS resolution is low. On the other hand, we rely on the DNS cache if resolution happens really frequently. Am I getting this right?

I think this is the right way to think about it, yes. There's already a DNS cache, which can be configured as needed. We're effectively adding our own additional cache (with a very long TTL) on top of that by preserving the resolved IPs. If a client has been connected to a server for hours or days and then disconnects, it makes complete sense to me to re-resolve DNS.
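
For reference, the JVM-level DNS cache mentioned here can be tuned via the networkaddress.cache.ttl security property; a hedged example follows (the 30-second value is arbitrary, not a recommendation from this PR):

import java.security.Security;

public class DnsCacheTtlExample {
    public static void main(String[] args) {
        // Illustrative only: cap the JVM's positive DNS cache at 30 seconds so a
        // re-resolution after a disconnect can pick up changed IPs sooner.
        // This should be set before the first name lookup is performed.
        Security.setProperty("networkaddress.cache.ttl", "30");
    }
}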

@dajac
Contributor

dajac commented Feb 3, 2021

@bob-barrett Thanks for your answers. That makes sense and I do agree with the approach.

Contributor
@dajac dajac left a comment

@bob-barrett Thanks for the PR. I have left a meta-comment. Please, let me know what you think. Besides this, the rest looks good.

Comment on lines +109 to +110
static List<InetAddress> resolve(String host, ClientDnsLookup clientDnsLookup,
                                 HostResolver hostResolver) throws UnknownHostException {
Contributor

This method and its usages look a bit weird to me now. It resolves a host based on clientDnsLookup and a hostResolver, which in turn resolves the host itself. We also have to pass the HostResolver wherever it is called now. I wonder if we should avoid this level of indirection here and directly encapsulate the entire logic in the HostResolver.

HostResolver would look like this:

public interface HostResolver {
    List<InetAddress> resolve(String host, ClientDnsLookup clientDnsLookup) throws UnknownHostException;
}

In NodeConnectionState#currentAddress, we could replace addresses = ClientUtils.resolve(host, clientDnsLookup, hostResolver) with addresses = hostResolver.resolve(host, clientDnsLookup).

The DefaultHostResolver would do what ClientUtils.resolve does today and we can mock it in tests like you did.

Is this something that you have considered?

Contributor Author

@dajac I started making this change, but while working on it, I realized that I don't think I like requiring the HostResolver to understand the client.dns.lookup config. In the same way that MockTime is only used to mock Java built-in methods, I think the HostResolver should only be responsible for the DNS resolution (either real or mocked), and that ClientUtils should be responsible for understanding the Kafka application behavior. That way, tests that mock DNS still exercise the actual ClientDnsLookup behavior, rather than relying on both the default resolver and any mocked resolvers to handle it correctly. It also means that if we ever add an option to ClientDnsLookup or change its behavior, we don't need to remember to update all HostResolver implementations.
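
To make that split of responsibilities concrete, here is a hedged, simplified sketch (types and names are illustrative, not the exact code that was merged): the resolver only performs raw DNS lookups, while the client-utils layer applies the client.dns.lookup behavior, approximated here by filtering to a single address family.

import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Raw resolution only; knows nothing about Kafka client configs.
interface HostResolverSketch {
    InetAddress[] resolve(String host) throws UnknownHostException;
}

class DefaultHostResolverSketch implements HostResolverSketch {
    @Override
    public InetAddress[] resolve(String host) throws UnknownHostException {
        return InetAddress.getAllByName(host);
    }
}

final class ClientUtilsSketch {
    // Approximation of the client.dns.lookup handling: with "use all DNS IPs",
    // keep every address of the same family (IPv4 or IPv6) as the first result;
    // otherwise just use the first address.
    static List<InetAddress> resolve(String host, boolean useAllDnsIps,
                                     HostResolverSketch resolver) throws UnknownHostException {
        InetAddress[] all = resolver.resolve(host);
        if (!useAllDnsIps)
            return Collections.singletonList(all[0]);
        List<InetAddress> preferred = new ArrayList<>();
        Class<? extends InetAddress> family = all[0].getClass();
        for (InetAddress address : all) {
            if (family.isInstance(address))
                preferred.add(address);
        }
        return preferred;
    }
}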

Contributor

Thanks @bob-barrett. I do agree with your point. Let's keep it as you suggested.

Contributor
@dajac dajac left a comment

@bob-barrett Thanks for your reply. I left a few minor comments.

Comment on lines +109 to +110
static List<InetAddress> resolve(String host, ClientDnsLookup clientDnsLookup,
                                 HostResolver hostResolver) throws UnknownHostException {
Contributor

Thanks @bob-barrett. I do agree with your point. Let's keep it as you suggested.

@@ -81,6 +85,19 @@
private final NetworkClient clientWithNoExponentialBackoff = createNetworkClient(reconnectBackoffMsTest);
private final NetworkClient clientWithStaticNodes = createNetworkClientWithStaticNodes();
private final NetworkClient clientWithNoVersionDiscovery = createNetworkClientWithNoVersionDiscovery();
private ArrayList<InetAddress> initialAddresses = new ArrayList<>(Arrays.asList(
Contributor

nit: Should we make these two static?

Contributor Author

Catching the UnknownHostException makes it a bit annoying, so I threw it in a static block to make the compiler happy. Let me know if you think there's something cleaner.
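
For readers following along, the static-block workaround described here looks roughly like this (the class name, field name, and IP literals are placeholders, not the real test data):

import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Arrays;
import java.util.List;

class InitialAddressesSketch {
    // UnknownHostException is checked, so a plain field initializer can't call
    // InetAddress.getByName directly; a static initializer wraps and rethrows.
    static final List<InetAddress> INITIAL_ADDRESSES;

    static {
        try {
            INITIAL_ADDRESSES = Arrays.asList(
                InetAddress.getByName("10.200.20.100"),  // placeholder IPs
                InetAddress.getByName("10.200.20.101"));
        } catch (UnknownHostException e) {
            throw new ExceptionInInitializerError(e);
        }
    }
}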

@bob-barrett
Contributor Author

@dajac @mimaison @satishd Thanks for your reviews!

In addition to PR feedback, I pushed a new change to ClusterConnectionStatesTest. I noticed while testing locally that tests that depended on kafka.apache.org were failing. In #9294, we changed the tests to assume that the site resolved to 3 IPv4 addresses. However, there now appear to be 2 IPv4 addresses and one IPv6 address, which broke the tests, since ClientUtils#resolve will only return the addresses of one version or the other, not both. With the HostResolver interface, it was straightforward to mock the DNS out instead of actually resolving kafka.apache.org, which should make the tests less fragile. This may also accomplish the intent of https://issues.apache.org/jira/browse/KAFKA-10496 without needing to create an in-memory DNS server.
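
For illustration, the kind of test double this enables might look roughly like the following (names and behavior are hypothetical, not the actual test code):

import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.HashMap;
import java.util.Map;

// Returns canned addresses instead of performing real DNS lookups, so tests
// no longer depend on how kafka.apache.org happens to resolve on a given day.
class MockHostResolverSketch {
    private final Map<String, InetAddress[]> canned = new HashMap<>();

    void register(String host, InetAddress... addresses) {
        canned.put(host, addresses);
    }

    InetAddress[] resolve(String host) throws UnknownHostException {
        InetAddress[] addresses = canned.get(host);
        if (addresses == null)
            throw new UnknownHostException(host);
        return addresses;
    }
}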

Member
@mimaison mimaison left a comment

Thanks for the updates. LGTM

@mimaison
Member

mimaison commented Feb 4, 2021

Tests passed on JDK 8.
The failure on JDK 11 (kafka.api.TransactionsTest.testAbortTransactionTimeout()) looks unrelated.

@mimaison mimaison merged commit 131d475 into apache:trunk Feb 4, 2021
dajac pushed a commit that referenced this pull request Feb 4, 2021
This patch changes the NetworkClient behavior to resolve the target node's hostname after disconnecting from an established connection, rather than waiting until the previously-resolved addresses are exhausted. This is to handle the scenario when the node's IP addresses have changed during the lifetime of the connection, and means that the client does not have to try to connect to invalid IP addresses until it has tried each address.

Reviewers: Mickael Maison <mickael.maison@gmail.com>, Satish Duggana <satishd@apache.org>, David Jacot <djacot@confluent.io>

Conflicts (related to junit upgrade):
	clients/src/test/java/org/apache/kafka/clients/ClientUtilsTest.java
	clients/src/test/java/org/apache/kafka/clients/ClusterConnectionStatesTest.java
	clients/src/test/java/org/apache/kafka/clients/NetworkClientTest.java
@dajac
Contributor

dajac commented Feb 4, 2021

I have picked the commit into 2.7 but could not pick it into 2.6 cleanly. @bob-barrett Could you open a PR for 2.6?

@bob-barrett bob-barrett deleted the KAFKA-12193 branch February 4, 2021 16:31
bob-barrett added a commit to bob-barrett/kafka that referenced this pull request Feb 4, 2021
This patch changes the NetworkClient behavior to resolve the target node's hostname after disconnecting from an established connection, rather than waiting until the previously-resolved addresses are exhausted. This is to handle the scenario when the node's IP addresses have changed during the lifetime of the connection, and means that the client does not have to try to connect to invalid IP addresses until it has tried each address.

Reviewers: Mickael Maison <mickael.maison@gmail.com>, Satish Duggana <satishd@apache.org>, David Jacot <djacot@confluent.io>
bob-barrett added a commit to bob-barrett/kafka that referenced this pull request Feb 5, 2021
This patch changes the NetworkClient behavior to resolve the target node's hostname after disconnecting from an established connection, rather than waiting until the previously-resolved addresses are exhausted. This is to handle the scenario when the node's IP addresses have changed during the lifetime of the connection, and means that the client does not have to try to connect to invalid IP addresses until it has tried each address.

Reviewers: Mickael Maison <mickael.maison@gmail.com>, Satish Duggana <satishd@apache.org>, David Jacot <djacot@confluent.io>
bob-barrett added a commit to bob-barrett/kafka that referenced this pull request Feb 5, 2021
This patch changes the NetworkClient behavior to resolve the target node's hostname after disconnecting from an established connection, rather than waiting until the previously-resolved addresses are exhausted. This is to handle the scenario when the node's IP addresses have changed during the lifetime of the connection, and means that the client does not have to try to connect to invalid IP addresses until it has tried each address.

Reviewers: Mickael Maison <mickael.maison@gmail.com>, Satish Duggana <satishd@apache.org>, David Jacot <djacot@confluent.io>
bob-barrett added a commit to bob-barrett/kafka that referenced this pull request Feb 5, 2021
This patch changes the NetworkClient behavior to resolve the target node's hostname after disconnecting from an established connection, rather than waiting until the previously-resolved addresses are exhausted. This is to handle the scenario when the node's IP addresses have changed during the lifetime of the connection, and means that the client does not have to try to connect to invalid IP addresses until it has tried each address.

Reviewers: Mickael Maison <mickael.maison@gmail.com>, Satish Duggana <satishd@apache.org>, David Jacot <djacot@confluent.io>
dajac pushed a commit that referenced this pull request Feb 5, 2021
This patch changes the NetworkClient behavior to resolve the target node's hostname after disconnecting from an established connection, rather than waiting until the previously-resolved addresses are exhausted. This is to handle the scenario when the node's IP addresses have changed during the lifetime of the connection, and means that the client does not have to try to connect to invalid IP addresses until it has tried each address.

Reviewers: Mickael Maison <mickael.maison@gmail.com>, Satish Duggana <satishd@apache.org>, David Jacot <djacot@confluent.io>
gitlw added a commit to gitlw/kafka that referenced this pull request Feb 6, 2021
…ltipleIPsWithUseAll

TICKET = N/A
LI_DESCRIPTION = The test fails since the domain kafka.apache.org used to return 3 IPs and is now
only returning two IPs. Furthermore, the upstream fix identified below cannot be cleanly cherry
picked.
EXIT_CRITERIA = when the commit 131d475 is picked from upstream:
KAFKA-12193: Re-resolve IPs after a client disconnects apache#9902
gitlw added a commit to linkedin/kafka that referenced this pull request Feb 6, 2021
…ltipleIPsWithUseAll (#116)

TICKET = N/A
LI_DESCRIPTION = The test fails since the domain kafka.apache.org used to return 3 IPs and is now
only returning two IPs. Furthermore, the upstream fix identified below cannot be cleanly cherry
picked.
EXIT_CRITERIA = when the commit 131d475 is picked from upstream:
KAFKA-12193: Re-resolve IPs after a client disconnects apache#9902
rajinisivaram pushed a commit that referenced this pull request Feb 10, 2021
This patch changes the NetworkClient behavior to resolve the target node's hostname after disconnecting from an established connection, rather than waiting until the previously-resolved addresses are exhausted. This is to handle the scenario when the node's IP addresses have changed during the lifetime of the connection, and means that the client does not have to try to connect to invalid IP addresses until it has tried each address.

Reviewers: Mickael Maison <mickael.maison@gmail.com>, Satish Duggana <satishd@apache.org>, David Jacot <djacot@confluent.io>
rajinisivaram pushed a commit that referenced this pull request Feb 10, 2021
This patch changes the NetworkClient behavior to resolve the target node's hostname after disconnecting from an established connection, rather than waiting until the previously-resolved addresses are exhausted. This is to handle the scenario when the node's IP addresses have changed during the lifetime of the connection, and means that the client does not have to try to connect to invalid IP addresses until it has tried each address.

Reviewers: Mickael Maison <mickael.maison@gmail.com>, Satish Duggana <satishd@apache.org>, David Jacot <djacot@confluent.io>
rajinisivaram pushed a commit that referenced this pull request Feb 10, 2021
This patch changes the NetworkClient behavior to resolve the target node's hostname after disconnecting from an established connection, rather than waiting until the previously-resolved addresses are exhausted. This is to handle the scenario when the node's IP addresses have changed during the lifetime of the connection, and means that the client does not have to try to connect to invalid IP addresses until it has tried each address.

Reviewers: Mickael Maison <mickael.maison@gmail.com>, Satish Duggana <satishd@apache.org>, David Jacot <djacot@confluent.io>
bob-barrett added a commit to bob-barrett/kafka that referenced this pull request Feb 11, 2021
…pache#10067)

This patch changes the NetworkClient behavior to resolve the target node's hostname after disconnecting from an established connection, rather than waiting until the previously-resolved addresses are exhausted. This is to handle the scenario when the node's IP addresses have changed during the lifetime of the connection, and means that the client does not have to try to connect to invalid IP addresses until it has tried each address.

Reviewers: Mickael Maison <mickael.maison@gmail.com>, Satish Duggana <satishd@apache.org>, David Jacot <djacot@confluent.io>
bob-barrett added a commit to bob-barrett/kafka that referenced this pull request Feb 11, 2021
…pache#10067)

This patch changes the NetworkClient behavior to resolve the target node's hostname after disconnecting from an established connection, rather than waiting until the previously-resolved addresses are exhausted. This is to handle the scenario when the node's IP addresses have changed during the lifetime of the connection, and means that the client does not have to try to connect to invalid IP addresses until it has tried each address.

Reviewers: Mickael Maison <mickael.maison@gmail.com>, Satish Duggana <satishd@apache.org>, David Jacot <djacot@confluent.io>
rajinisivaram pushed a commit that referenced this pull request Feb 11, 2021
… (#10109)

This patch changes the NetworkClient behavior to resolve the target node's hostname after disconnecting from an established connection, rather than waiting until the previously-resolved addresses are exhausted. This is to handle the scenario when the node's IP addresses have changed during the lifetime of the connection, and means that the client does not have to try to connect to invalid IP addresses until it has tried each address.

Reviewers: Mickael Maison <mickael.maison@gmail.com>, Satish Duggana <satishd@apache.org>, David Jacot <djacot@confluent.io>
rajinisivaram pushed a commit that referenced this pull request Feb 11, 2021
… (#10108)

This patch changes the NetworkClient behavior to resolve the target node's hostname after disconnecting from an established connection, rather than waiting until the previously-resolved addresses are exhausted. This is to handle the scenario when the node's IP addresses have changed during the lifetime of the connection, and means that the client does not have to try to connect to invalid IP addresses until it has tried each address.

Reviewers: Mickael Maison <mickael.maison@gmail.com>, Satish Duggana <satishd@apache.org>, David Jacot <djacot@confluent.io>
@xuchenxin

xuchenxin commented Dec 7, 2021 via email

ZIDAZ pushed a commit to linkedin/kafka that referenced this pull request Jun 6, 2022
…ltipleIPsWithUseAll (#116)

TICKET = N/A
LI_DESCRIPTION = The test fails since the domain kafka.apache.org used to return 3 IPs and is now
only returning two IPs. Furthermore, the upstream fix identified below cannot be cleanly cherry
picked.
EXIT_CRITERIA = when the commit 131d475 is picked from upstream:
KAFKA-12193: Re-resolve IPs after a client disconnects apache#9902
ZIDAZ added a commit to linkedin/kafka that referenced this pull request Jun 6, 2022
* [LI-HOTFIX] Do not resolve bootstrap server during client bootstrap/startup (#315)

* do not resolve bootstrap server during client bootstrap/startup

Co-authored-by: Ke Hu <kehu@kehu-mn2.linkedin.biz>

* [LI-HOTFIX] use_all_dns_ips as default value for client.dns.lookup (#348)

* [LI-HOTFIX] log level change to avoid flooding the logs

* [LI-HOTFIX] Ignore the failed test ClusterConnectionStatesTest#testMultipleIPsWithUseAll (#116)

TICKET = N/A
LI_DESCRIPTION = The test fails since the domain kafka.apache.org used to return 3 IPs and is now
only returning two IPs. Furthermore, the upstream fix identified below cannot be cleanly cherry
picked.
EXIT_CRITERIA = when the commit 131d475 is picked from upstream:
KAFKA-12193: Re-resolve IPs after a client disconnects apache#9902

* [LI-HOTFIX] Ignoring the failed tests (#188)

TICKET = N/A
LI_DESCRIPTION = Several tests are failing since the domain kafka.apache.org, which used to resolve to more than 1 IPv4 address, is now only resolving to 1 IPv4 address.
The upstream code has overhauled the ClusterConnectionStatesTest. We are simply ignoring these tests for now, and will get the new logic from upstream after a major version rebase.
EXIT_CRITERIA = This hotfix can be removed in the next major version rebase

Co-authored-by: Ke Hu <kehu@linkedin.com>
Co-authored-by: Ke Hu <kehu@kehu-mn2.linkedin.biz>
Co-authored-by: Lucas Wang <luwang@linkedin.com>
ZIDAZ pushed a commit to linkedin/kafka that referenced this pull request Jun 14, 2022
…ltipleIPsWithUseAll (#116)

TICKET = N/A
LI_DESCRIPTION = The test fails since the domain kafka.apache.org used to return 3 IPs and is now
only returning two IPs. Furthermore, the upstream fix identified below cannot be cleanly cherry
picked.
EXIT_CRITERIA = when the commit 131d475 is picked from upstream:
KAFKA-12193: Re-resolve IPs after a client disconnects apache#9902
ZIDAZ added a commit to linkedin/kafka that referenced this pull request Jun 14, 2022
* KAFKA-6863 Kafka clients should try to use multiple DNS resolved IP (apache#4987)

Implementation of KIP-302: Based on the new client configuration `client.dns.lookup`, a NetworkClient can use InetAddress.getAllByName to find all IPs and iterate over them when they fail to connect. Only uses either IPv4 or IPv6 addresses similar to the default mode.

Co-authored-by: Edoardo Comar <ecomar@uk.ibm.com>
Co-authored-by: Mickael Maison <mickael.maison@gmail.com>

Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>

* [LI-HOTFIX] Ignore the failed test ClusterConnectionStatesTest#testMultipleIPsWithUseAll (#116)

TICKET = N/A
LI_DESCRIPTION = The test fails since the domain kafka.apache.org used to return 3 IPs and is now
only returning two IPs. Furthermore, the upstream fix identified below cannot be cleanly cherry
picked.
EXIT_CRITERIA = when the commit 131d475 is picked from upstream:
KAFKA-12193: Re-resolve IPs after a client disconnects apache#9902

* Ignoring the failed tests (#188)

[LI-HOTFIX] Ignoring the failed tests (#188)

TICKET = N/A
LI_DESCRIPTION = Several tests are failing since the domain kafka.apache.org, which used to resolve to more than 1 IPv4 address, is now only resolving to 1 IPv4 address.
The upstream code has overhauled the ClusterConnectionStatesTest. We are simply ignoring these tests for now, and will get the new logic from upstream after a major version rebase.
EXIT_CRITERIA = This hotfix can be removed in the next major version rebase

* Fix for KAFKA-7974: Avoid zombie AdminClient when node host isn't resolvable (apache#6305)

* Fix for KAFKA-7974: Avoid calling disconnect() when not connecting

* Resolve host only when currentAddress() is called

Moves away from automatically resolving the host when the connection entry is constructed, which can leave ClusterConnectionStates in a confused state.
Instead, resolution is done on demand, ensuring that the entry in the connection list is present even if the resolution failed.

* Add Javadoc to ClusterConnectionStates.connecting()

* KAFKA-9313: Set `use_all_dns_ips` as the new default for `client.dns.lookup` (KIP-602) (apache#8644)

This applies to the producer, consumer, admin client, connect worker
and inter broker communication.

`ClientDnsLookup.DEFAULT` has been deprecated and a warning
will be logged if it's explicitly set in a client config.

Reviewers: Mickael Maison <mickael.maison@gmail.com>, Ismael Juma <ismael@juma.me.uk>

* Update NetworkClient usage in SSLNetworkClient

* [LI-HOTFIX] Bypass cluster metadata auto refresh code path by default (#329)

PR #228 attempted to resolve the provided bootstrap servers when
the metadata exceeds a staleness threshold. The config is covered on
both the producer and the consumer, and the default behavior without a
configured value is to set the timeout to Long.MAX_VALUE.

However, cruise-control is affected by this behavior because it implements a
similar mechanism of its own and uses NetworkClient directly. The code
would fail if an empty bootstrap server list is passed to NetworkClient, which
is the case for CC's internal use.

To resolve this, this patch makes the default value -1 and skips
the code path referencing the bootstrap servers when -1 is seen.

EXIT_CRITERIA = When #228 is ejected

Co-authored-by: Xiongqi Wu <xiongqi.wu@gmail.com>
Co-authored-by: Edoardo Comar <ecomar@uk.ibm.com>
Co-authored-by: Lucas Wang <luwang@linkedin.com>
Co-authored-by: Nicholas Parker <nick@thelastpickle.com>
Co-authored-by: Badai Aqrandista <badaiaqrandista@gmail.com>
Co-authored-by: Joseph (Ting-Chou) Lin <lmr3796joseph@gmail.com>
lucasbru pushed a commit that referenced this pull request Nov 27, 2023
If multiple addresses are available. This change is a follow-up to #9902. When re-resolving DNS after disconnecting, it is possible (likely, even) that we will resolve the same set of addresses in the same order, meaning we could attempt to reconnect to the same IP that we just disconnected from. In the case where we disconnected from the IP address because it has become unhealthy (due to a load balancer going down or a network routing layer restarting, for example), the client will need to time out before connecting to the next IP. This essentially unifies the behavior before and after KAFKA-12193: re-resolve after disconnecting while still progressing through the IP list.

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>, Andrew Schofield <aschofield@confluent.io>
cadonna pushed a commit to cadonna/kafka that referenced this pull request Dec 11, 2023
If multiple addresses are available. This change is a follow-up to apache#9902. When re-resolving DNS after disconnecting, it is possible (likely, even) that we will resolve the same set of addresses in the same order, meaning we could attempt to reconnect to the same IP that we just disconnected from. In the case where we disconnected from the IP address because it has become unhealthy (due to a load balancer going down or a network routing layer restarting, for example), the client will need to time out before connecting to the next IP. This essentially unifies the behavior before and after KAFKA-12193: re-resolve after disconnecting while still progressing through the IP list.

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>, Andrew Schofield <aschofield@confluent.io>
ex172000 pushed a commit to ex172000/kafka that referenced this pull request Dec 15, 2023
If multiple addresses are available. This change is a follow-up to apache#9902. When re-resolving DNS after disconnecting, it is possible (likely, even) that we will resolve the same set of addresses in the same order, meaning we could attempt to reconnect to the same IP that we just disconnected from. In the case where we disconnected from the IP address because it has become unhealthy (due to a load balancer going down or a network routing layer restarting, for example), the client will need to time out before connecting to the next IP. This essentially unifies the behavior before and after KAFKA-12193: re-resolve after disconnecting while still progressing through the IP list.

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>, Andrew Schofield <aschofield@confluent.io>
cadonna added a commit that referenced this pull request Jan 4, 2024
If multiple addresses are available. This change is a follow-up to #9902. When re-resolving DNS after disconnecting, it is possible (likely, even) that we will resolve the same set of addresses in the same order, meaning we could attempt to reconnect to the same IP that we just disconnected from. In the case where we disconnected from the IP address because it has become unhealthy (due to a load balancer going down or a network routing layer restarting, for example), the client will need to time out before connecting to the next IP. This essentially unifies the behavior before and after KAFKA-12193: re-resolve after disconnecting while still progressing through the IP list.

Cherry-pick of 47e3777

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>, Andrew Schofield <aschofield@confluent.io>

Co-authored-by: Bob Barrett <bob.barrett@confluent.io>
udaynpusa pushed a commit to mapr/kafka that referenced this pull request Jan 30, 2024
…pache#10061)

This patch changes the NetworkClient behavior to resolve the target node's hostname after disconnecting from an established connection, rather than waiting until the previously-resolved addresses are exhausted. This is to handle the scenario when the node's IP addresses have changed during the lifetime of the connection, and means that the client does not have to try to connect to invalid IP addresses until it has tried each address.

Reviewers: Mickael Maison <mickael.maison@gmail.com>, Satish Duggana <satishd@apache.org>, David Jacot <djacot@confluent.io>
anurag-harness pushed a commit to anurag-harness/kafka that referenced this pull request Feb 9, 2024
If multiple addresses are available. This change is a follow-up to apache#9902. When re-resolving DNS after disconnecting, it is possible (likely, even) that we will resolve the same set of addresses in the same order, meaning we could attempt to reconnect to the same IP that we just disconnected from. In the case where we disconnected from the IP address because it has become unhealthy (due to a load balancer going down or a network routing layer restarting, for example), the client will need to time out before connecting to the next IP. This essentially unifies the behavior before and after KAFKA-12193: re-resolve after disconnecting while still progressing through the IP list.

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>, Andrew Schofield <aschofield@confluent.io>
anurag-harness added a commit to anurag-harness/kafka that referenced this pull request Feb 9, 2024
…#283)

If multiple addresses are available. This change is a follow-up to apache#9902. When re-resolving DNS after disconnecting, it is possible (likely, even) that we will resolve the same set of addresses in the same order, meaning we could attempt to reconnect to the same IP that we just disconnected from. In the case where we disconnected from the IP address because it has become unhealthy (due to a load balancer going down or a network routing layer restarting, for example), the client will need to time out before connecting to the next IP. This essentially unifies the behavior before and after KAFKA-12193: re-resolve after disconnecting while still progressing through the IP list.

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>, Andrew Schofield <aschofield@confluent.io>

Co-authored-by: Bob Barrett <bob.barrett@confluent.io>
yyu1993 pushed a commit to yyu1993/kafka that referenced this pull request Feb 15, 2024
If multiple addresses are available. This change is a follow-up to apache#9902. When re-resolving DNS after disconnecting, it is possible (likely, even) that we will resolve the same set of addresses in the same order, meaning we could attempt to reconnect to the same IP that we just disconnected from. In the case where we disconnected from the IP address because it has become unhealthy (due to a load balancer going down or a network routing layer restarting, for example), the client will need to time out before connecting to the next IP. This essentially unifies the behavior before and after KAFKA-12193: re-resolve after disconnecting while still progressing through the IP list.

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>, Andrew Schofield <aschofield@confluent.io>
clolov pushed a commit to clolov/kafka that referenced this pull request Apr 5, 2024
If multiple addresses are available. This change is a follow-up to apache#9902. When re-resolving DNS after disconnecting, it is possible (likely, even) that we will resolve the same set of addresses in the same order, meaning we could attempt to reconnect to the same IP that we just disconnected from. In the case where we disconnected from the IP address because it has become unhealthy (due to a load balancer going down or a network routing layer restarting, for example), the client will need to time out before connecting to the next IP. This essentially unifies the behavior before and after KAFKA-12193: re-resolve after disconnecting while still progressing through the IP list.

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>, Andrew Schofield <aschofield@confluent.io>