
KAFKA-7747 Check for truncation after leader changes #6371

Merged
58 commits merged into apache:trunk on Apr 21, 2019

Conversation

mumrah
Contributor

@mumrah mumrah commented Mar 5, 2019

After the client detects a leader change, we need to check the offset of the current leader for truncation.

TODO expand on this.
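As a rough illustration of what this means for applications, here is a sketch of consuming with truncation detection surfaced to the caller. It assumes the KIP-320 LogTruncationException exposed by the consumer; the topic, group, and recovery strategy are illustrative only, not part of this patch.

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.LogTruncationException;

public class TruncationAwareConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "example-group");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        // With auto.offset.reset=none, detected divergence is surfaced to the application
        // instead of being silently reset by policy.
        props.put("auto.offset.reset", "none");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("example-topic"));
            while (true) {
                try {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                    records.forEach(r -> System.out.printf("offset=%d value=%s%n", r.offset(), r.value()));
                } catch (LogTruncationException e) {
                    // The fetched offsets diverged from the new leader's log after an unclean
                    // leader election; decide here whether to rewind, skip, or fail.
                    System.err.println("Log truncation detected: " + e.getMessage());
                    throw e;
                }
            }
        }
    }
}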

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@mumrah mumrah requested a review from hachikuji March 5, 2019 00:29
@hachikuji hachikuji self-assigned this Mar 12, 2019

@hachikuji hachikuji left a comment


Thanks for the patch. Left a few comments.

Node leader = fetch().leaderFor(tp);
if (leader == null)
    leader = Node.noNode();
// TODO: is there a race here between reading the leader node and reading the epoch? Does it matter?


The other methods are synchronized. Any reason not to do that here?
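For illustration only, the suggestion amounts to something like the following. The wrapper class and its fetch() helper below are stand-ins, since the real enclosing class isn't shown in this hunk.

import org.apache.kafka.common.Cluster;
import org.apache.kafka.common.Node;
import org.apache.kafka.common.TopicPartition;

// Hypothetical wrapper used only to illustrate the suggestion; not the real client class.
class CachedClusterView {
    private Cluster cluster = Cluster.empty();

    private synchronized Cluster fetch() {
        return cluster;
    }

    // Synchronizing here matches the other accessors, so the leader read cannot
    // interleave with a concurrent metadata update on this object.
    synchronized Node leaderNodeOrNoNode(TopicPartition tp) {
        Node leader = fetch().leaderFor(tp);
        return leader == null ? Node.noNode() : leader;
    }
}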

@mumrah
Contributor Author

mumrah commented Apr 15, 2019

retest this please

@mumrah
Contributor Author

mumrah commented Apr 16, 2019

retest this please


@hachikuji hachikuji left a comment


Just a few more comments. I think there are still some minor comments from the previous pass which haven't been addressed.

@@ -2206,6 +2226,7 @@ private boolean updateFetchPositions(final Timer timer) {
// by always ensuring that assigned partitions have an initial position.
if (coordinator != null && !coordinator.refreshCommittedOffsetsIfNeeded(timer)) return false;



nit: unneeded newline

});

// Collect positions needing validation, with backoff
Map<TopicPartition, SubscriptionState.FetchPosition> partitionsToValidate = subscriptions


Can we save the second pass over the partitions by doing this collection in the loop above? I'm just thinking about MirrorMaker-like use cases where the number of partitions could be quite large. A possible optimization is to cache the metadata update version so that we only bother redoing this check if there has actually been a metadata update.

Contributor Author


The first pass through all the partitions covers the case of the metadata changing, but the second pass is also used to resubmit the async request with backoff. We could remember the last metadata version seen and avoid unnecessary calls to the first loop.
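A rough sketch of that version-tracking idea, using a plain counter as a stand-in for the client metadata's update version; the class and names below are illustrative, not the real consumer internals.

import java.util.concurrent.atomic.AtomicInteger;

class MetadataVersionGate {
    private final AtomicInteger metadataUpdateVersion; // bumped on every metadata response
    private int lastSeenVersion = -1;

    MetadataVersionGate(AtomicInteger metadataUpdateVersion) {
        this.metadataUpdateVersion = metadataUpdateVersion;
    }

    // Returns true at most once per metadata update, so callers can skip the
    // per-partition leader/epoch comparison loop while the metadata is unchanged.
    boolean shouldRevalidate() {
        int current = metadataUpdateVersion.get();
        if (current == lastSeenVersion)
            return false;
        lastSeenVersion = current;
        return true;
    }
}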


How about we leave this for a follow-up?

Contributor Author


Works for me 👍

Long offset = this.subscriptions.position(partition);
if (offset != null)
    return offset;
SubscriptionState.FetchPosition position = this.subscriptions.validPosition(partition);


This is interesting. There is always a race with the next leader election for a valid position. Do you think we need to be strict about the timing? I guess if you provide an epoch in seek(), this would be a good way to force validation before fetching.
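For reference, a caller can attach an epoch to seek() through the KIP-320 OffsetAndMetadata overload. A small sketch (the helper name is illustrative; whether this overload forces validation before fetching depends on how this patch ends up wiring it):

import java.util.Optional;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

class SeekWithEpochExample {
    // Seeks to an offset together with the leader epoch it was observed under
    // (e.g. taken from a previously consumed record or a committed offset), so the
    // consumer can fence the position against a stale leader before fetching.
    static void seekWithEpoch(KafkaConsumer<?, ?> consumer, TopicPartition tp,
                              long offset, int leaderEpoch) {
        consumer.seek(tp, new OffsetAndMetadata(offset, Optional.of(leaderEpoch), ""));
    }
}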

@@ -2196,6 +2213,9 @@ private void close(long timeoutMs, boolean swallowException) {
* @return true iff the operation completed without timing out
*/
private boolean updateFetchPositions(final Timer timer) {
// If any partitions have been truncated due to a leader change, we need to validate the offsets
fetcher.validateOffsetsIfNeeded();


Already mentioned, but I think we can be smarter about caching some state to avoid unnecessary work here. Validation is only needed if we do an unprotected seek or a metadata update arrives. Probably this can be left for a follow-up. It's only a concern when the number of partitions and the poll frequency are high.

entry.getValue().leaderEpoch().ifPresent(epoch -> this.metadata.updateLastSeenEpochIfNewer(entry.getKey(), epoch));
this.subscriptions.seek(tp, offset);
this.subscriptions.seek(tp, position);
this.subscriptions.maybeValidatePosition(tp, leaderAndEpoch);


Hmm.. Doesn't this mean we'd only validate the committed offset if there is a change to the current leader and epoch?

Also, could leaderAndEpoch be updated by the call to updateLastSeenEpochIfNewer above?

Contributor Author


In the position I'm creating above, the leader epoch is empty, which will cause it to enter validation. This relies on the FetchPosition#safeToFetchFrom behavior. If this seems too convoluted, we could add a seekAndValidate method or something similar.
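A toy sketch of what a seekAndValidate helper could look like; the state machine below is a simplified stand-in for SubscriptionState, and the method itself is only a proposal from this thread, not an existing API.

import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import org.apache.kafka.common.TopicPartition;

class ToySubscriptionState {
    enum FetchState { FETCHING, AWAIT_VALIDATION }

    static final class Position {
        final long offset;
        final Optional<Integer> offsetEpoch;
        Position(long offset, Optional<Integer> offsetEpoch) {
            this.offset = offset;
            this.offsetEpoch = offsetEpoch;
        }
    }

    private final Map<TopicPartition, Position> positions = new HashMap<>();
    private final Map<TopicPartition, FetchState> states = new HashMap<>();

    // Explicitly couples the seek with the transition into AWAIT_VALIDATION,
    // instead of relying on the empty-epoch side effect described above.
    void seekAndValidate(TopicPartition tp, long offset, Optional<Integer> offsetEpoch) {
        positions.put(tp, new Position(offset, offsetEpoch));
        states.put(tp, FetchState.AWAIT_VALIDATION);
    }

    boolean awaitingValidation(TopicPartition tp) {
        return states.get(tp) == FetchState.AWAIT_VALIDATION;
    }
}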

offsetAndMetadata.offset(),
offsetAndMetadata.leaderEpoch(),
this.metadata.leaderAndEpoch(partition));
this.subscriptions.seek(partition, newPosition);


Don't we need to do validation on this new position? Also, did we lose the call to update the last seen epoch?

Contributor Author

@mumrah mumrah Apr 18, 2019


Also, did we lose the call to update the last seen epoch?

I'll add that back

Don't we need to do validation on this new position?

I suppose we do, yea. Perhaps another use case of the seekAndValidate method proposed above?
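For concreteness, a rough sketch of what that restored path might look like; the interfaces below are trimmed-down stand-ins rather than the real Metadata and SubscriptionState classes, and seekAndValidate is the hypothetical method discussed above.

import java.util.Optional;
import org.apache.kafka.common.TopicPartition;

class CommittedOffsetRefreshSketch {
    interface EpochTracker {
        void updateLastSeenEpochIfNewer(TopicPartition tp, int epoch);
    }

    interface Seeker {
        void seekAndValidate(TopicPartition tp, long offset, Optional<Integer> leaderEpoch);
    }

    static void refresh(EpochTracker metadata, Seeker subscriptions, TopicPartition tp,
                        long committedOffset, Optional<Integer> committedEpoch) {
        // Keep the client's view of the latest leader epoch up to date (the call the
        // review points out was dropped)...
        committedEpoch.ifPresent(epoch -> metadata.updateLastSeenEpochIfNewer(tp, epoch));
        // ...and seek so the restored position is validated against the current leader
        // before fetching resumes.
        subscriptions.seekAndValidate(tp, committedOffset, committedEpoch);
    }
}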

if (subscriptions.awaitingValidation(respTopicPartition)) {
    SubscriptionState.FetchPosition currentPosition = subscriptions.position(respTopicPartition);
    Metadata.LeaderAndEpoch currentLeader = currentPosition.currentLeader;
    if (!currentLeader.equals(cachedLeaderAndEpochs.get(respTopicPartition))) {


The checks above ensure that we are still in the validating phase and that the current leader epoch hasn't changed. I guess it is still possible that both of these are true, but the user has seeked to a different position. Perhaps we can add position to the cached data above?

Contributor Author


I think it's still okay as long as the position's epoch hasn't changed. What's the side effect if you seek to offset 10 (FETCHING), do validation (AWAIT_VALIDATION), seek to offset 30 (FETCHING), do validation again (AWAIT_VALIDATION), and then get back the OffsetsForLeaderEpoch response from the first async validation? I think as long as the position's epoch is the same, there isn't a problem. When the second response comes back, it will get ignored since we won't be in the right state. WDYT?


I agree that the important thing is that the position's epoch hasn't changed. That and the current leader epoch are the only inputs to the OffsetsForLeaderEpoch API.
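A minimal sketch of that stale-response guard, using simplified stand-in types rather than the real fetcher classes.

import java.util.Optional;

class ValidationResponseGuard {
    static final class PositionSnapshot {
        final long offset;
        final Optional<Integer> currentLeaderEpoch;
        PositionSnapshot(long offset, Optional<Integer> currentLeaderEpoch) {
            this.offset = offset;
            this.currentLeaderEpoch = currentLeaderEpoch;
        }
    }

    // requestedEpoch is the epoch snapshot taken when the async validation was sent.
    static boolean shouldApplyResponse(boolean stillAwaitingValidation,
                                       Optional<Integer> requestedEpoch,
                                       PositionSnapshot currentPosition) {
        if (!stillAwaitingValidation)
            return false; // a later seek already moved the partition out of AWAIT_VALIDATION
        // If the epoch is unchanged, the response answers the same question even if the
        // offset was re-seeked in between, which is the point made in the thread above.
        return requestedEpoch.equals(currentPosition.currentLeaderEpoch);
    }
}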

assignedState(tp).position(position);
}

public boolean maybeValidatePosition(TopicPartition tp, Metadata.LeaderAndEpoch leaderAndEpoch) {


Should this be named currentLeaderAndEpoch?


@hachikuji hachikuji left a comment


LGTM. Thanks for the patch!

@hachikuji hachikuji merged commit 409fabc into apache:trunk Apr 21, 2019
pengxiaolong pushed a commit to pengxiaolong/kafka that referenced this pull request Jun 14, 2019
KAFKA-7747 Check for truncation after leader changes (apache#6371)

After the client detects a leader change we need to check the offset of the current leader for truncation. These changes were part of KIP-320: https://cwiki.apache.org/confluence/display/KAFKA/KIP-320%3A+Allow+fetchers+to+detect+and+handle+log+truncation.

Reviewers: Jason Gustafson <jason@confluent.io>