
KAFKA-7401: Fix inconsistent range exception on segment recovery #6220

Merged
2 commits merged into apache:1.1 on Feb 12, 2019

Conversation

@apovzner (Contributor) commented on Feb 1, 2019

This PR fixes the "java.lang.IllegalArgumentException: inconsistent range" that occurs on broker startup after an unclean shutdown during the log cleaning phase that creates swap files, in the case where a segment's base offset is less than the log start offset. Added testRecoveryAfterCrashAndIncrementedLogStartOffset, which reproduces KAFKA-7401.
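The failure mode can be sketched as follows. This is a hedged illustration, not Kafka's actual code: `checkRange` and the offset values are hypothetical, but they mirror the situation described above, where the reload range ends up with its end offset below its start offset.

```java
// Hypothetical sketch of the failure described above (not Kafka source).
// After log cleaning produces a swap segment whose base offset is below the
// log start offset, recovery can compute a range whose end is smaller than
// its start, and a range validity check then rejects it.
public class InconsistentRangeSketch {
    public static void checkRange(long start, long end) {
        if (end < start)
            throw new IllegalArgumentException("inconsistent range");
    }

    public static void main(String[] args) {
        long logStartOffset = 100L;   // log start offset moved up by retention
        long segmentBaseOffset = 40L; // swap segment created by the cleaner
        try {
            // Reloading producer state from logStartOffset up to the segment's
            // base offset: here end (40) < start (100), so the check fires.
            checkRange(logStartOffset, segmentBaseOffset);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // prints "inconsistent range"
        }
    }
}
```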

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

val fetchDataInfo = segment.read(startOffset, None, Int.MaxValue)
if (fetchDataInfo != null)
  loadProducersFromLog(stateManager, fetchDataInfo.records)
if (stateManager.mapEndOffset < segment.baseOffset) {
Contributor:
The basic intent of this logic is to ensure that we have consistent producer state up to the base offset of the segment we're recovering before we begin the actual recovery. In this case, the log start offset is somewhere above the base offset of this segment, which makes the call to truncateAndReload above a little strange (i.e. the end offset is smaller than the start offset). This seems to work, but feels brittle. Since we do not actually do partial segment recovery, I wonder if we could do something like this:

val recoveryStartOffset = math.min(logStartOffset, segment.baseOffset)
stateManager.truncateAndReload(recoveryStartOffset, segment.baseOffset, time.milliseconds)

Then the map end offset is always no greater than the segment base offset. What do you think?
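The clamp suggested above can be sketched outside of Kafka as follows. This is a minimal Java sketch; the method name `recoveryStartOffset` is made up for illustration, and in the real code these bounds are what get passed to `stateManager.truncateAndReload`.

```java
// Minimal sketch of the clamping suggested above (illustrative, not Kafka code).
public class RecoveryOffsets {
    // Clamp the reload start so the reload range is never inverted: the start
    // offset passed to the reload is at most the segment base offset.
    public static long recoveryStartOffset(long logStartOffset, long segmentBaseOffset) {
        return Math.min(logStartOffset, segmentBaseOffset);
    }

    public static void main(String[] args) {
        // KAFKA-7401 case: segment base offset below the log start offset.
        System.out.println(recoveryStartOffset(100L, 40L)); // prints 40
        // Normal case: log start offset at or below the segment base offset.
        System.out.println(recoveryStartOffset(10L, 40L));  // prints 10
    }
}
```

With this clamp, the map end offset after the reload is always no greater than the segment base offset, which is the invariant the comment above calls out.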

@apovzner (Contributor Author) replied:
Thanks for the context. Given we want to have consistent producer state up to the base offset of the segment we are recovering, and truncateAndReload will set the map end offset properly in this case, this approach seems better. I also tested it just now, and it works (fixes the original bug). I will update the PR with your suggestion.

@hachikuji (Contributor) left a comment:

LGTM. Thanks for the patch.

@hachikuji merged commit 1b6bfda into apache:1.1 on Feb 12, 2019
@landau commented on Dec 2, 2019

Was this fix released in Kafka 1.1.2? I cannot find the download here https://kafka.apache.org/downloads or here https://archive.apache.org/dist/kafka/

The ticket, https://issues.apache.org/jira/browse/KAFKA-7401, claims it was released as part of 1.1.2.

If it hasn't been released, what would it take to get it released? We are currently unable to start one of our brokers because of this. Thank you!
