Skip to content

Conversation

dajac
Copy link
Member

@dajac dajac commented Oct 12, 2022

Fetching from a follower is only allowed from version 11 of the fetch request. Our intent was to allow it assuming that those would also implement KIP-320 (leader epoch). It turns out that some clients use version 11 without KIP-320 and the broker allows this. The issue is that we don't know whether the client fetches from the follower based on the order of the leader or by mistake e.g. based on stale metadata. The latter means that a client could end up on the follower with an offset that the follower does not have yet. Instead of returning OffsetOutOfRangeException, we return an empty batch to the client with the expectation that the client will retry and eventually refresh its metadata. Note that we only do this if the client does not provide a leader epoch and use version 11. If the client uses version 11 and provided a leader epoch, it knows that it has to consult the leader on an OffsetOutOfRangeException error.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

…eption when fetching from a follower without a leader epoch
@dajac
Copy link
Member Author

dajac commented Oct 12, 2022

This is an potential alternative to #12674.

// OffsetOutOfRangeException, we return an empty batch to the client with the expectation that
// the client will retry and eventually refresh its metadata. Note that we only do this if the
// client does not provide a leader epoch and use version 11.
if (isFollower && !currentLeaderEpoch.isPresent && fetchOffset > initialLogEndOffset) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the main downside of this is that notification about genuine out of range errors will be delayed. The only case I can think of where this could happen is an unclean election. That might be an acceptable tradeoff, but I wonder if we can do better. The thought I had before is to use the log end offset from the leader that we learn through Fetch requests. The tricky thing is ensuring we are not relying on a stale value. I don't know of an easy way to solve that without holding onto the Produce request until the next Fetch is received. That could be complicated to implement I guess. Any alternatives?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unclean leader election, fetch version > 11 and no KIP-320 implemented seems like it would be rare enough not to make things too complex for it. KIP-320 is being implemented for librdkafka as we speak and we should file an issue on Sarama's and kafkajs's issue tracker for them to implement it too. That's the only way to have truly sane behavior.

@dajac dajac closed this Oct 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants