Scrap the connection test in the client metadata update #277
Previously, if the cluster metadata was giving us back a broker which we
suspected was unavailable (since it was already in our 'dead' set) then we would
wait for the connection, and mark it as unavailable if the connection failed
(otherwise, we simply do what the cluster tells us and let the
producers/consumers deal with the connection errors). This was handy since it
let us back off nicely if a broker crashed and came back, retrying metadata
until the cluster had caught up and moved the leader to a broker that was up.
I'm now of the opinion this was more trouble than it's worth, so scrap it. Among other things, it meant doing IO while holding the client lock to begin with (fixes IO while holding lock #263).
The unfortunate side-effect of scrapping it is that the producer and consumer are more likely to fail if we don't wait long enough for the cluster to fail over leadership. The real solution if that occurs is to wait longer in the correct spot (`RetryBackoff` in the producer; currently hard-coded to 10 seconds in the consumer) instead of relying on this hack.
@Shopify/kafka
cc @luck02