KAFKA-14255; Return an empty record instead of an OffsetOutOfRangeException when fetching from a follower without a leader epoch #12734

dajac · 2022-10-12T13:27:20Z

Fetching from a follower is only allowed from version 11 of the fetch request. Our intent was to allow it on the assumption that clients using that version would also implement KIP-320 (leader epoch). It turns out that some clients use version 11 without KIP-320, and the broker allows this. The issue is that we don't know whether such a client fetches from the follower at the leader's direction or by mistake, e.g. based on stale metadata. The latter means that a client could end up on the follower with an offset that the follower does not have yet.

Instead of returning OffsetOutOfRangeException, we return an empty batch to the client, with the expectation that the client will retry and eventually refresh its metadata. Note that we only do this if the client uses version 11 and does not provide a leader epoch. If the client uses version 11 and provides a leader epoch, it knows that it has to consult the leader on an OffsetOutOfRangeException error.
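For illustration, the decision described above can be sketched as a small standalone function. This is a minimal sketch, not the code in this PR: `FollowerFetchSketch`, `FetchOutcome` and `decide` are hypothetical names, the leader epoch is modelled as a Scala `Option[Int]` rather than the `Optional` used in `Partition.scala`, and the ordinary range check in the second branch is an assumption about how any other out-of-range offset would be handled.

```scala
object FollowerFetchSketch {
  // Hypothetical outcome type, used only in this sketch.
  sealed trait FetchOutcome
  case object ReadFromLog extends FetchOutcome
  case object EmptyRecords extends FetchOutcome
  case object OffsetOutOfRange extends FetchOutcome

  def decide(
    isFollower: Boolean,
    currentLeaderEpoch: Option[Int], // None models a client without KIP-320
    fetchOffset: Long,
    logStartOffset: Long,
    logEndOffset: Long
  ): FetchOutcome = {
    if (isFollower && currentLeaderEpoch.isEmpty && fetchOffset > logEndOffset) {
      // The follower may simply not have replicated this offset yet, so answer
      // with an empty batch and let the client retry and refresh its metadata.
      EmptyRecords
    } else if (fetchOffset < logStartOffset || fetchOffset > logEndOffset) {
      // Assumed ordinary range check: a client that sent a leader epoch knows
      // it has to consult the leader when it sees this error.
      OffsetOutOfRange
    } else {
      ReadFromLog
    }
  }
}
```

With a follower whose log ends at offset 100, a fetch at offset 120 without a leader epoch yields `EmptyRecords`, while the same fetch with a leader epoch yields `OffsetOutOfRange`.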

Committer Checklist (excluded from commit message)

Verify design and implementation
Verify test coverage and CI build status
Verify documentation (including upgrade notes)

dajac · 2022-10-12T13:27:43Z

This is a potential alternative to #12674.

hachikuji · 2022-10-13T21:00:09Z

core/src/main/scala/kafka/cluster/Partition.scala

+    // OffsetOutOfRangeException, we return an empty batch to the client with the expectation that
+    // the client will retry and eventually refresh its metadata. Note that we only do this if the
+    // client does not provide a leader epoch and use version 11.
+    if (isFollower && !currentLeaderEpoch.isPresent && fetchOffset > initialLogEndOffset) {


I guess the main downside of this is that notification about genuine out of range errors will be delayed. The only case I can think of where this could happen is an unclean election. That might be an acceptable tradeoff, but I wonder if we can do better. The thought I had before is to use the log end offset from the leader that we learn through Fetch requests. The tricky thing is ensuring we are not relying on a stale value. I don't know of an easy way to solve that without holding onto the Produce request until the next Fetch is received. That could be complicated to implement I guess. Any alternatives?
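To make the alternative above concrete, here is a rough sketch of how a follower might classify a fetch if it tracked a leader log end offset. Everything in it is hypothetical: `LeaderEndOffsetSketch`, `classify` and `knownLeaderLogEndOffset` are illustrative names, the log start offset is ignored for brevity, and no claim is made about how the broker would actually obtain or refresh that value.

```scala
object LeaderEndOffsetSketch {
  // Hypothetical outcome type, used only in this sketch.
  sealed trait Outcome
  case object ReadFromLog extends Outcome
  case object EmptyRecords extends Outcome
  case object OffsetOutOfRange extends Outcome

  def classify(
    fetchOffset: Long,
    followerLogEndOffset: Long,
    knownLeaderLogEndOffset: Long // last value learned from the leader; may be stale
  ): Outcome = {
    if (fetchOffset <= followerLogEndOffset) {
      ReadFromLog
    } else if (fetchOffset <= knownLeaderLogEndOffset) {
      // The leader is known to have this offset, the follower just lags behind.
      EmptyRecords
    } else {
      // Looks genuinely out of range, but only as far as the known value is fresh.
      OffsetOutOfRange
    }
  }
}
```

The staleness concern shows up directly: if the leader has already advanced to 110 but the follower still believes its end offset is 100, `classify(105L, 100L, 100L)` reports `OffsetOutOfRange` for an offset that is in fact valid on the leader.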

ijuma · 2022-10-14

The combination of unclean leader election, fetch version >= 11 and no KIP-320 implementation seems rare enough that we shouldn't make things too complex for it. KIP-320 is being implemented for librdkafka as we speak, and we should file issues on the Sarama and kafkajs issue trackers asking them to implement it too. That's the only way to get truly sane behavior.

KAFKA-14255; Return an empty record instead of an OffsetOutOfRangeExc…

2b61b02

…eption when fetching from a follower without a leader epoch

dajac requested a review from hachikuji October 12, 2022 13:27

dajac mentioned this pull request Oct 12, 2022

KAFKA-14255: Fetching from follower should be disallowed if fetch from follower is disabled #12674

Closed

3 tasks

dajac marked this pull request as ready for review October 13, 2022 08:47

hachikuji reviewed Oct 13, 2022

dajac closed this Oct 27, 2022
