Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KAFKA-13194: bound cleaning by both LSO and HWM when firstUnstableOffsetMetadata is None #11199

Merged
merged 3 commits into from Aug 13, 2021

Conversation

lbradstreet
Copy link
Contributor

@lbradstreet lbradstreet commented Aug 11, 2021

When the high watermark is contained in a non-active segment, we are not correctly bounding it by the hwm. This means that uncommitted records may overwrite committed data. I've separated out the bounding point tests to check the hwm case in addition to the existing active segment case.

See https://issues.apache.org/jira/browse/KAFKA-13194 for a further explanation.

Copy link
Contributor

@junrao junrao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lbradstreet : Thanks for the PR. Great find! Just a couple of minor comments below.

@@ -595,8 +595,8 @@ private[log] object LogCleanerManager extends Logging {
// may be cleaned
val firstUncleanableDirtyOffset: Long = Seq(

// we do not clean beyond the first unstable offset
log.firstUnstableOffset,
// we do not clean beyond the last stable offset
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an existing issue. But could we update the comment in line 593 to include last stable offset too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. I've updated it with a better explanation of what we bound by.

val lastCleanOffset = Some(0L)
val cleanableOffsets = LogCleanerManager.cleanableOffsets(log, lastCleanOffset, time.milliseconds)
assertEquals(0L, cleanableOffsets.firstDirtyOffset, "The first cleanable offset starts at the beginning of the log.")
assertEquals(log.highWatermark, cleanableOffsets.firstUncleanableDirtyOffset, "The first uncleanable offset is bounded by the hwm.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the description of the test says bounded by LSO, should we change log.highWatermark to log.lastStableOffset and the error message accordingly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what I intended since the lastStableOffset should equal the highWatermark, but I think it makes sense to keep the intent. Instead I have checked that log.highWatermark == log.lastStableOffset in the line prior to it.

Copy link
Contributor

@junrao junrao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lbradstreet : Thanks for the updated PR. LGTM. Are the 29 test failures related to this PR?

@lbradstreet
Copy link
Contributor Author

@lbradstreet : Thanks for the updated PR. LGTM. Are the 29 test failures related to this PR?

@junrao yeah, they're looking related. Let me clean those up.

Copy link
Contributor

@junrao junrao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lbradstreet : Thanks for the updated PR. Were the java 16 tests ok?

@lbradstreet
Copy link
Contributor Author

@lbradstreet : Thanks for the updated PR. Were the java 16 tests ok?

@junrao it looks like it is one of the issues where System.exit was called without the Exit.exit override.

[2021-08-13T19:08:33.299Z] * What went wrong:
[2021-08-13T19:08:33.299Z] Execution failed for task ':core:integrationTest'.
[2021-08-13T19:08:33.299Z] > Process 'Gradle Test Executor 129' finished with non-zero exit value 1
[2021-08-13T19:08:33.299Z]   This problem might be caused by incorrect test process configuration.
[2021-08-13T19:08:33.299Z]   Please refer to the test execution section in the User Manual at https://docs.gradle.org/7.1.1/userguide/java_testing.html#sec:test_execution

I don't think it's related to this change but if we want we can rerun it to be sure.

@junrao junrao merged commit 64b8e17 into apache:trunk Aug 13, 2021
xdgrulez pushed a commit to xdgrulez/kafka that referenced this pull request Dec 22, 2021
…setMetadata is None (apache#11199)

When the high watermark is contained in a non-active segment, we are not correctly bounding it by the hwm. This means that uncommitted records may overwrite committed data. I've separated out the bounding point tests to check the hwm case in addition to the existing active segment case.

Reviewers: Jun Rao <junrao@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants