[WIP]KAFKA-9893: Configurable TCP connection timeout and improve the initial metadata fetch #8544

Closed
wants to merge 1 commit into from

@ctan888 ctan888 commented Apr 24, 2020

Selector
The Selector class will now take the new config CONNECTIONS_TIMEOUT_MS_CONFIG in its constructor.

idleExpiryManager
Currently, the idleExpiryManager uses an LRU policy to evict the oldest idle connected socket channels. Similarly, we can instantiate a second LinkedHashMap to track the socket channels that are still initiating a connection and evict those that have timed out.

Currently, all channels are kept in the same LRU map. We will split the connected and connecting socket channels into separate LRU maps (referred to below as "lruConnectingConnections" and "lruConnectedConnections").

Here's the state transition:

When a socket channel starts initiating a connection, we put it into lruConnectingConnections.

When the connection is successfully established, we move the channel from lruConnectingConnections to lruConnectedConnections.

In each selector poll, we remove the oldest timed-out socket channel, if any, from both lruConnectingConnections and lruConnectedConnections.
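
The state transition above can be sketched as follows. This is a minimal illustration, not the actual Selector or idleExpiryManager code: the class name `ConnectionExpiryTracker`, its method names, and the use of string channel IDs and explicit timestamps are all assumptions made for the example. It relies on an access-ordered `LinkedHashMap`, whose first entry is always the least recently touched one.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the two-map design; names are illustrative only.
class ConnectionExpiryTracker {
    private final long connectTimeoutMs; // budget for channels still connecting
    private final long idleTimeoutMs;    // budget for established but idle channels

    // accessOrder = true: iteration runs from least to most recently touched.
    private final Map<String, Long> lruConnectingConnections = new LinkedHashMap<>(16, 0.75f, true);
    private final Map<String, Long> lruConnectedConnections = new LinkedHashMap<>(16, 0.75f, true);

    ConnectionExpiryTracker(long connectTimeoutMs, long idleTimeoutMs) {
        this.connectTimeoutMs = connectTimeoutMs;
        this.idleTimeoutMs = idleTimeoutMs;
    }

    // Step 1: the channel starts connecting.
    void connectionInitiated(String channelId, long nowMs) {
        lruConnectingConnections.put(channelId, nowMs);
    }

    // Step 2: the connection is established, so the channel changes maps.
    void connectionEstablished(String channelId, long nowMs) {
        lruConnectingConnections.remove(channelId);
        lruConnectedConnections.put(channelId, nowMs);
    }

    // Step 3 (once per poll): evict at most one timed-out channel per map.
    String pollExpiredConnecting(long nowMs) {
        return pollExpired(lruConnectingConnections, connectTimeoutMs, nowMs);
    }

    String pollExpiredConnected(long nowMs) {
        return pollExpired(lruConnectedConnections, idleTimeoutMs, nowMs);
    }

    // Returns the ID of the oldest entry if it has timed out, otherwise null.
    private static String pollExpired(Map<String, Long> lru, long timeoutMs, long nowMs) {
        Iterator<Map.Entry<String, Long>> it = lru.entrySet().iterator();
        if (!it.hasNext()) return null;
        Map.Entry<String, Long> oldest = it.next();
        if (nowMs - oldest.getValue() < timeoutMs) return null;
        String channelId = oldest.getKey();
        it.remove();
        return channelId;
    }
}
```

Because only the oldest entry of each map is inspected, every poll does a constant amount of expiry work, and the two maps can use different timeouts.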

LeastLoadedNodeProvider
Currently, when none of the nodes provided in the --bootstrap-server option is connected, the LeastLoadedNodeProvider provides an unconnected node to the client. The Cluster class shuffles the nodes to balance the initial load, but the LeastLoadedNodeProvider always provides the same node, which is the last node after shuffling. Consequently, even if we provide several bootstrap servers, the client may not be able to connect to the cluster if any of the servers in the --bootstrap-server list is offline.
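
To make the failure mode concrete, here is a small simulation. It is only a sketch of the behaviour described above, not the real LeastLoadedNodeProvider: `FixedPickDemo`, `pickLast`, and `canBootstrap` are hypothetical names, and nodes are represented as plain strings.

```java
import java.util.List;
import java.util.Set;

// Hypothetical simulation of "always provide the last node after shuffling".
class FixedPickDemo {
    // Every call returns the same node: the last one in the shuffled list.
    static String pickLast(List<String> shuffledNodes) {
        return shuffledNodes.get(shuffledNodes.size() - 1);
    }

    // Retrying does not help, because each attempt picks the same node.
    static boolean canBootstrap(List<String> shuffledNodes, Set<String> offline, int attempts) {
        for (int i = 0; i < attempts; i++) {
            if (!offline.contains(pickLast(shuffledNodes))) return true;
        }
        return false;
    }
}
```

With three bootstrap servers and only the last shuffled one offline, `canBootstrap` returns false no matter how many attempts are allowed, even though two healthy servers are listed.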

I'm changing the provider to interact with the ClusterConnectionStates to determine which node to provide when no connection exists.

ClusterConnectionStates
ClusterConnectionStates will keep the index of the most recently provided node. In addition, a public API like the one below will be added to support round-robin node picking.

public synchronized int nextNodeIdx(int nodeSize) {
    // floorMod keeps the result non-negative even after the counter overflows
    return Math.floorMod(this.nextNodeIdx++, nodeSize);
}

The LeastLoadedNodeProvider will pass in the nodeSize to prevent an out-of-bounds exception.

When the LeastLoadedNodeProvider iterates over the node list, it can consult the ClusterConnectionStates for the index of the node it should provide.
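
The interaction between the two classes might look roughly like this. It is a sketch under stated assumptions: `RoundRobinStates` and `RoundRobinProvider` are hypothetical stand-ins for ClusterConnectionStates and LeastLoadedNodeProvider, nodes are plain strings, and `Math.floorMod` is used in place of `%` so the index stays non-negative if the counter ever overflows.

```java
import java.util.List;

// Hypothetical stand-in for the counter kept in ClusterConnectionStates.
class RoundRobinStates {
    private int nextNodeIdx = 0;

    // Returns the next index in [0, nodeSize) and advances the counter.
    synchronized int nextNodeIdx(int nodeSize) {
        return Math.floorMod(nextNodeIdx++, nodeSize);
    }
}

// Hypothetical stand-in for the provider's "no connection exists" path.
class RoundRobinProvider {
    private final RoundRobinStates states;
    private final List<String> nodes;

    RoundRobinProvider(RoundRobinStates states, List<String> nodes) {
        this.states = states;
        this.nodes = nodes;
    }

    // Each call yields a different node, cycling through the whole list.
    String nextUnconnectedNode() {
        return nodes.get(states.nextNodeIdx(nodes.size()));
    }
}
```

Passing the list size on every call means the counter itself never needs to know how many nodes exist, so the same state object keeps working if the node list changes size between calls.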

@ctan888 ctan888 force-pushed the KIP-601-dev branch 3 times, most recently from 31d5e65 to accc8d5 Compare April 27, 2020 18:24
ctan888 commented May 12, 2020

ctan888 commented May 18, 2020

Close due to #8683

dajac commented Jun 30, 2020

@d8tltanc We can close this one.

@ctan888 ctan888 closed this Jun 30, 2020
mhratson added a commit to linkedin/kafka-monitor that referenced this pull request Oct 21, 2022
- 'kafka_2.12', version: '2.4.0' -> 'kafka_2.13', version: '2.8.2'
- 'kafka-clients', version: '2.3.1' -> 'kafka-clients', version: '2.8.2'
- zookeeper 3.5.6 -> 3.8.0

## Details

new `NetworkClient` arguments:

1. `long connectionSetupTimeoutMs`,
2. `long connectionSetupTimeoutMaxMs`

are described in https://issues.apache.org/jira/browse/KAFKA-9893 and corresponding PRs:

1. apache/kafka#8544
2. apache/kafka#8683

## Testing Done

1. ./gradlew build
mhratson added a commit to linkedin/kafka-monitor that referenced this pull request Oct 21, 2022
- 'kafka_2.12', version: '2.4.0' -> 'kafka_2.12', version: '2.8.2'
- 'kafka-clients', version: '2.3.1' -> 'kafka-clients', version: '2.8.2'
- zookeeper 3.5.6 -> 3.8.0

## Details

new `NetworkClient` arguments:

1. `long connectionSetupTimeoutMs`,
2. `long connectionSetupTimeoutMaxMs`

are described in https://issues.apache.org/jira/browse/KAFKA-9893 and corresponding PRs:

1. apache/kafka#8544
2. apache/kafka#8683

## Testing Done

1. ./gradlew build
2 participants