
KAFKA-15627: KIP-951's Leader discovery optimisations on the client #14564

Closed
wants to merge 0 commits

Conversation

Contributor

@msn-tldr msn-tldr commented Oct 17, 2023

NOTE - This PR has since been moved here, as it was auto-closed by GitHub after a rebase of the fork, as explained in this comment.

This implements KIP-951's leader discovery optimisations on the client.

  1. Optimisation 1: On discovering a new leader, a produce batch should skip any retry backoff. This was implemented in KAFKA-15415.
  2. Optimisation 2: FetchResponse/ProduceResponse return new leader information, which is then used to update the cached metadata (a rough sketch of this flow follows below).
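
To make optimisation 2 concrete, here is a rough sketch of the flow. The helper and parameter names are illustrative rather than the actual client code; only updatePartially and Metadata.LeaderIdAndEpoch are taken from the diffs discussed further down.

// Sketch only: on NOT_LEADER_OR_FOLLOWER / FENCED_LEADER_EPOCH, KIP-951 lets the client take
// the new leader straight from the response (tagged fields) and patch the cached metadata,
// instead of waiting for a full metadata refresh.
private void maybeUpdateLeadersFromResponse(Metadata metadata,
                                            Map<TopicPartition, Errors> partitionErrors,
                                            Map<TopicPartition, Metadata.LeaderIdAndEpoch> newLeadersFromResponse,
                                            List<Node> nodeEndpointsFromResponse) {
    Map<TopicPartition, Metadata.LeaderIdAndEpoch> partitionsWithUpdatedLeaderInfo = new HashMap<>();
    for (Map.Entry<TopicPartition, Errors> entry : partitionErrors.entrySet()) {
        Errors error = entry.getValue();
        if (error == Errors.NOT_LEADER_OR_FOLLOWER || error == Errors.FENCED_LEADER_EPOCH) {
            Metadata.LeaderIdAndEpoch newLeader = newLeadersFromResponse.get(entry.getKey());
            if (newLeader != null)
                partitionsWithUpdatedLeaderInfo.put(entry.getKey(), newLeader);
        }
    }
    if (!partitionsWithUpdatedLeaderInfo.isEmpty()) {
        // One call for all affected partitions, so the cached metadata is rebuilt once.
        metadata.updatePartially(partitionsWithUpdatedLeaderInfo, nodeEndpointsFromResponse);
    }
}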

This PR focuses on optimisation 2 above. Additionally, it fixes a bug that was introduced in MetadataCache.java; details are inline in a comment.

IGNORE files: *.json, FetchResponse.java & ProduceResponse.java; they will be removed from this PR once #14627 is merged.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@@ -150,7 +150,7 @@ MetadataCache mergeWith(String newClusterId,
// We want the most recent topic ID. We start with the previous ID stored for retained topics and then
// update with newest information from the MetadataResponse. We always take the latest state, removing existing
// topic IDs if the latest state contains the topic name but not a topic ID.
Map<String, Uuid> newTopicIds = topicIds.entrySet().stream()
Map<String, Uuid> newTopicIds = this.topicIds.entrySet().stream()
Contributor Author

this was a bug; it now has a test in MetadataCacheTest.java

Contributor

😱 What was the issue?

Contributor Author

@msn-tldr msn-tldr Oct 24, 2023

The net effect of the bug was that, in the merged cache, the IDs of retained topics (from the pre-existing metadata) were lost.

As the existing comment explains, the intention of the code is to build a merged map of topic IDs:

We start with the previous ID stored for retained topics and then update with newest information from the MetadataResponse.

This should be done by initialising the merged map of topic IDs with the retained ones (this.topicIds) and then updating it with the newest information (topicIds). But the code used topicIds even to get the retained topic IDs. The bug was introduced inadvertently because of same-named variables at object vs. method scope. There is now a test to catch regressions in this behaviour.
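
For illustration, a minimal, self-contained sketch of the shadowing pattern and its fix (the class and map types here are hypothetical, not the real MetadataCache code):

import java.util.HashMap;
import java.util.Map;

class TopicIdCache {
    private final Map<String, String> topicIds = new HashMap<>(); // retained, pre-existing topic IDs

    TopicIdCache(Map<String, String> retained) {
        this.topicIds.putAll(retained);
    }

    // Buggy shape: the parameter shadows the field, so the merge starts from the *new* IDs
    // and the retained topics' IDs are silently dropped.
    Map<String, String> mergeBuggy(Map<String, String> topicIds) {
        return new HashMap<>(topicIds);
    }

    // Fixed shape: start from the retained IDs (this.topicIds), then overlay the new ones.
    Map<String, String> mergeFixed(Map<String, String> newTopicIds) {
        Map<String, String> merged = new HashMap<>(this.topicIds);
        merged.putAll(newTopicIds);
        return merged;
    }
}

Renaming the parameter (as suggested below) removes the shadowing entirely.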

Contributor

I would rename the argument currently called topicIds too, just to try to prevent a similar bug being re-introduced by accident.

@@ -83,4 +84,69 @@ public void testMissingLeaderEndpoint() {
assertEquals(nodesById.get(7), replicas.get(7));
}

@Test
public void testMergeWithThatPreExistingPartitionIsRetainedPostMerge() {
Contributor Author

test for bug fixed in MetadataCache

Contributor

@kirktrue kirktrue left a comment

Thanks for the pull request, @msn-tldr!

I didn't find anything too egregious, but there are a handful of changes I'd like to see. A good number of them are subjective, so feel free to ignore 😄

Comment on lines 376 to 394
for (Entry partitionLeader: partitionLeaders.entrySet()) {
    TopicPartition partition = (TopicPartition) partitionLeader.getKey();
    Metadata.LeaderAndEpoch currentLeader = currentLeader(partition);
    Metadata.LeaderIdAndEpoch newLeader = (LeaderIdAndEpoch) partitionLeader.getValue();
    if (!newLeader.epoch.isPresent() || !newLeader.leaderId.isPresent()) {
        log.trace("For {}, incoming leader information is incomplete {}", partition, newLeader);
        continue;
    }
    if (currentLeader.epoch.isPresent() && newLeader.epoch.get() <= currentLeader.epoch.get()) {
        log.trace("For {}, incoming leader({}) is not-newer than the one in the existing metadata {}, so ignoring.", partition, newLeader, currentLeader);
        continue;
    }
    if (!newNodes.containsKey(newLeader.leaderId.get())) {
        log.trace("For {}, incoming leader({}), the corresponding node information for node-id {} is missing, so ignoring.", partition, newLeader, newLeader.leaderId.get());
        continue;
    }
    if (!this.cache.partitionMetadata(partition).isPresent()) {
        log.trace("For {}, incoming leader({}), no longer has cached metadata so ignoring.", partition, newLeader);
        continue;
    }
Contributor

All of these checks are to handle cases where the incoming leader information is not usable, right? I'm wondering—and this is a stylistic choice—if it makes sense to throw these into a separate method shouldUpdatePartitionLeader(. . .)? Might make it more easily accessible for a dedicated unit test, too.
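
For what it's worth, one possible shape of such a helper, reusing the checks from the loop above (a sketch only; it assumes the enclosing class's log, cache and newNodes, and is not code from this PR):

private boolean shouldUpdatePartitionLeader(TopicPartition partition,
                                            Metadata.LeaderIdAndEpoch newLeader,
                                            Metadata.LeaderAndEpoch currentLeader,
                                            Map<Integer, Node> newNodes) {
    if (!newLeader.epoch.isPresent() || !newLeader.leaderId.isPresent()) {
        log.trace("For {}, incoming leader information is incomplete {}", partition, newLeader);
        return false;
    }
    if (currentLeader.epoch.isPresent() && newLeader.epoch.get() <= currentLeader.epoch.get()) {
        log.trace("For {}, incoming leader({}) is not newer than the one in the existing metadata {}, so ignoring.", partition, newLeader, currentLeader);
        return false;
    }
    if (!newNodes.containsKey(newLeader.leaderId.get())) {
        log.trace("For {}, incoming leader({}), the corresponding node information for node-id {} is missing, so ignoring.", partition, newLeader, newLeader.leaderId.get());
        return false;
    }
    if (!cache.partitionMetadata(partition).isPresent()) {
        log.trace("For {}, incoming leader({}), no longer has cached metadata so ignoring.", partition, newLeader);
        return false;
    }
    return true;
}

The loop body would then reduce to a single guard: if (!shouldUpdatePartitionLeader(partition, newLeader, currentLeader, newNodes)) continue;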

Contributor

It's a shame that use of Optional is so noisy, what with its isPresent()s and get()s everywhere 😦

Contributor Author

if it makes sense to throw these into a separate method shouldUpdatePartitionLeader(. . .)?

Right now the conditions for not updating are fairly contained, so I will skip adding the method for now.

continue;
}

MetadataResponse.PartitionMetadata existingMetadata = this.cache.partitionMetadata(partition).get();
Contributor

Is there any way that existingMetadata could be null at this point? I assume not, due to the use of the Optional wrapper here.

Contributor Author

Good call-out, but no, due to the check at line 392 making sure the Optional isPresent():


if (!this.cache.partitionMetadata(partition).isPresent()) {
    log.trace("For {}, incoming leader({}), no longer has cached metadata so ignoring.", partition, newLeader);
    continue;
}

@@ -171,6 +179,15 @@ protected void handleFetchResponse(final Node fetchTarget,
log.debug("Fetch {} at offset {} for partition {} returned fetch data {}",
fetchConfig.isolationLevel, fetchOffset, partition, partitionData);

Errors partitionError = Errors.forCode(partitionData.errorCode());
if (requestVersion >= 16 && (partitionError == Errors.NOT_LEADER_OR_FOLLOWER || partitionError == Errors.FENCED_LEADER_EPOCH)) {
Contributor

Can we add a constant, with some comments, for the magic value of 16?

Contributor Author

@msn-tldr msn-tldr Oct 24, 2023

It wasn't needed, so I removed it (from the Sender code path too). Since the new leader comes through tagged fields, for versions < 16 the tagged fields are initialised with default values, which are then ignored.
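
For reference, a rough sketch of why this works: the CurrentLeader fields added by KIP-951 are tagged fields whose defaults are -1, so a response from a broker (or request version) that doesn't carry them simply yields "absent" leader info. The accessor names and constructor below are assumptions about the generated classes, not verified here:

// Sketch only; accessor names, the -1 defaults, and the LeaderIdAndEpoch constructor are assumed.
private Metadata.LeaderIdAndEpoch leaderFromPartitionData(FetchResponseData.PartitionData partitionData) {
    int leaderId = partitionData.currentLeader().leaderId();       // -1 when no CurrentLeader was sent
    int leaderEpoch = partitionData.currentLeader().leaderEpoch(); // -1 when no CurrentLeader was sent
    Optional<Integer> id = leaderId < 0 ? Optional.empty() : Optional.of(leaderId);
    Optional<Integer> epoch = leaderEpoch < 0 ? Optional.empty() : Optional.of(leaderEpoch);
    // Empty Optionals trip the "incoming leader information is incomplete" check during the
    // metadata update, so older brokers are handled without an explicit version check.
    return new Metadata.LeaderIdAndEpoch(id, epoch);
}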

Comment on lines 623 to 641
final short requestVersion = response.requestHeader().apiVersion();
if (requestVersion >= 16 && !partitionsWithUpdatedLeaderInfo.isEmpty()) {
    List<Node> leaderNodes = produceResponse.data().nodeEndpoints().stream()
        .map(e -> new Node(e.nodeId(), e.host(), e.port(), e.rack()))
        .filter(e -> !e.equals(Node.noNode()))
        .collect(Collectors.toList());
    Set<TopicPartition> updatedPartitions = metadata.updatePartially(partitionsWithUpdatedLeaderInfo, leaderNodes);
    if (log.isTraceEnabled()) {
        updatedPartitions.forEach(
            part -> log.trace("For {} leader was updated.", part)
        );
    }
}
Contributor

Is this worth the effort to generalize and share among Sender and AbstractFetch?

Contributor Author

No, as there are subtle differences: the response types are different, and the Fetcher has extra actions after the leader is updated. The common part is really just 1-2 statements, so it's not worth it IMO.

@msn-tldr msn-tldr changed the title Kip951 client changes KAFKA-15627: KIP-951's Leader discovery optimisations on the client Oct 24, 2023

@msn-tldr msn-tldr marked this pull request as ready for review October 25, 2023 10:44
@msn-tldr
Contributor Author

@AndrewJSchofield & @kirktrue addressed the feedback so far!
thanks for the reviews.

Contributor

@AndrewJSchofield AndrewJSchofield left a comment

lgtm

Contributor

@wcarlson5 wcarlson5 left a comment

just a couple of questions, but I don't think they need to block the PR

Metadata.LeaderAndEpoch currentLeader = currentLeader(partition);
Metadata.LeaderIdAndEpoch newLeader = partitionLeader.getValue();
if (!newLeader.epoch.isPresent() || !newLeader.leaderId.isPresent()) {
    log.debug("For {}, incoming leader information is incomplete {}", partition, newLeader);
Contributor

Should we only log these if debug is enabled? Or should we move them to INFO?

@@ -198,6 +215,20 @@ protected void handleFetchSuccess(final Node fetchTarget,
fetchBuffer.add(completedFetch);
}

if (!partitionsWithUpdatedLeaderInfo.isEmpty()) {
Contributor

I'm a bit confused about why we collect partitionsWithUpdatedLeaderInfo above when it looks like all we do with them is validate them against the subscriptions later. Is there any other use for having it outside the loop?

Contributor Author

partitionsWithUpdatedLeaderInfo is collected above so that the metadata is updated once, with all the updates, via metadata.updatePartitionLeadership.

@msn-tldr
Contributor Author

msn-tldr commented Nov 1, 2023

Rebasing my fork seems to have removed the PR commits from the remote branch kip951_client_changes, which automatically closed this PR 😞
Something to be careful of in future; this GitHub behaviour is also discussed in https://stackoverflow.com/questions/46053530/pull-request-closed-unexpectedly.
@wcarlson5 I'm opening a new PR; I don't see a way around it now.
