
KAFKA-9157: Avoid generating empty segments if all records are deleted after cleaning #7711

Open · wants to merge 1 commit into trunk

Conversation

@huxihx (Contributor) commented Nov 19, 2019

https://issues.apache.org/jira/browse/KAFKA-9157

If all records are deleted after cleaning, we should avoid generating empty log segments.


Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@huxihx (Contributor, author) commented Nov 20, 2019

@junrao Please review this patch. Thanks.

@junrao (Contributor) left a comment

@huxihx : Thanks for the PR. Left a few comments below.

log.appendAsLeader(record(key = 0, value = log.logEndOffset.toInt), leaderEpoch = 0)

cleaner.clean(LogToClean(new TopicPartition("test", 0), log, 0, log.activeSegment.baseOffset))
assertEquals("Empty segment(s) should not be created after cleaning.", 2, log.numberOfSegments)
@junrao (Contributor):

Not sure if this test exposes the existing issue. Even without this patch, the cleaner will collapse the first 3 segments into one segment (with 1 record in it). To expose this issue, we have to force multiple log segments to be generated in the old logic. One way to do that is to generate log segments that are more than 2 billion apart in offsets.
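For illustration, a hedged sketch of one way to force such segments in a test, assuming the Log and MemoryRecords APIs available to the cleaner tests (appendAsFollower permits arbitrary offsets; the keys and values are placeholders):

import org.apache.kafka.common.record.{CompressionType, MemoryRecords, SimpleRecord}

// Hedged sketch: append as a follower so arbitrary offsets can be chosen, and
// place the second batch more than Int.MaxValue (~2.1 billion) beyond the first.
// Relative offsets within a segment must fit in a 4-byte delta from the base
// offset, so the cleaner cannot group these segments into one.
log.appendAsFollower(MemoryRecords.withRecords(0L, CompressionType.NONE,
  new SimpleRecord("k".getBytes, "v0".getBytes)))
log.roll() // close the first segment
log.appendAsFollower(MemoryRecords.withRecords(Int.MaxValue.toLong + 1L, CompressionType.NONE,
  new SimpleRecord("k".getBytes, "v1".getBytes)))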

@huxihx (Contributor, author):

Without this patch, this test generates 4 segments: the first two are empty and the latter two are non-empty.

  log.replaceSegments(List(), segments)
} else {
  info(s"Swapping in cleaned segment ${cleanedOpt.get} for segment(s) $segments in log $log")
  log.replaceSegments(List(cleanedOpt.get), segments)
@junrao (Contributor):

Hmm, the changes here are more significant than I expected. I am wondering if they are truly necessary. For example, could we just keep the existing logic and check the size of cleaned segment? If the cleaned segment has size of 0, then we can replace the existing segments with an empty list and delete cleaned.
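A hedged sketch of that alternative in the cleaner's swap path, reusing the names from the snippet above (whether the empty file is removed via deleteIfExists() or some other path is an assumption):

// Hedged sketch: keep the existing cleaning logic, then special-case a
// zero-sized cleaned segment instead of restructuring the cleaner.
val cleaned = cleanedOpt.get
if (cleaned.size == 0) {
  // All records were removed: swap in nothing and drop the empty .cleaned file.
  log.replaceSegments(List.empty, segments)
  cleaned.deleteIfExists()
} else {
  info(s"Swapping in cleaned segment $cleaned for segment(s) $segments in log $log")
  log.replaceSegments(List(cleaned), segments)
}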

@@ -2269,7 +2269,7 @@ class Log(@volatile var dir: File,
     // delete the old files
     for (seg <- sortedOldSegments) {
       // remove the index entry
-      if (seg.baseOffset != sortedNewSegments.head.baseOffset)
+      if (sortedNewSegments.isEmpty || seg.baseOffset != sortedNewSegments.head.baseOffset)
@junrao (Contributor):

With this change, we may need to advance logStartOffset.

@huxihx (Contributor, author) commented Dec 4, 2019

@junrao Thanks for the response. I was considering the option of not creating empty cleaned segments at all, rather than first creating them and then deleting them. If you prefer the latter approach, that works for me as well.

@huxihx (Contributor, author) commented Dec 5, 2019

retest this please.

@junrao (Contributor) left a comment

@huxihx : Thanks for the updated PR. A few more comments below.

@@ -2269,7 +2269,7 @@ class Log(@volatile var dir: File,
     // delete the old files
     for (seg <- sortedOldSegments) {
       // remove the index entry
-      if (seg.baseOffset != sortedNewSegments.head.baseOffset)
+      if (sortedNewSegments.isEmpty || seg.baseOffset != sortedNewSegments.head.baseOffset)
@junrao (Contributor):

The first check seems unnecessary now since sortedNewSegments is never empty.

@huxihx (Contributor, author):

It's more of a defensive coding pattern to avoid a NoSuchElementException. Do you think it's worth keeping here?

log.activeSegment.baseOffset else allNonEmptyCleanedSegments.head.baseOffset
log.maybeIncrementHighWatermark(LogOffsetMetadata(baseOffsetOfFirstNonEmptySegment))
log.maybeIncrementLogStartOffset(baseOffsetOfFirstNonEmptySegment)
log.removeAndDeleteSegments(log.nonActiveLogSegmentsFrom(-1L).filter(_.size == 0), true)
@junrao (Contributor):

It's better to do the above logic while holding a lock of the Log to prevent any concurrent log segment changes. Could we move the above logic inside Log.replaceSegments()?
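A hedged sketch of what that move might look like (the placement and the elided swap logic are assumptions; the names mirror the snippet above):

// Hedged sketch: do the swap and the start-offset bookkeeping under the
// Log's lock so no concurrent segment change can interleave.
private[log] def replaceSegments(newSegments: Seq[LogSegment], oldSegments: Seq[LogSegment]): Unit = {
  lock synchronized {
    // ... existing swap of oldSegments for newSegments ...
    val firstNonEmptyBase = newSegments.find(_.size > 0)
      .map(_.baseOffset)
      .getOrElse(activeSegment.baseOffset)
    maybeIncrementLogStartOffset(firstNonEmptyBase)
    removeAndDeleteSegments(nonActiveLogSegmentsFrom(-1L).filter(_.size == 0), asyncDelete = true)
  }
}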


assertTrue(distinctValuesBySegmentAfterClean.size == distinctValuesBySegmentBeforeClean.size - 1)
distinctValuesBySegmentAfterClean = 0 +: distinctValuesBySegmentAfterClean
@junrao (Contributor):

I understand the need to patch the logic here. However, the way it's done seems a bit ad hoc. Could we just change the logic to get the distinct values for a particular offset range?
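For instance, a hedged sketch of a range-based helper (distinctValuesInRange and the value-decoding details are assumptions, not the test's actual helpers):

import scala.collection.JavaConverters._

// Hedged sketch: count distinct record values within an explicit offset range,
// rather than patching the per-segment counts after cleaning.
def distinctValuesInRange(log: Log, from: Long, to: Long): Int =
  log.logSegments(from, to)
    .flatMap(_.log.records.asScala)          // all records in each segment
    .map(r => TestUtils.readString(r.value)) // decode the value payload
    .toSet
    .size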

@huxihx (Contributor, author) commented Dec 17, 2019

retest this please.

@junrao (Contributor) left a comment

@huxihx : Thanks for the updated patch. A couple more suggestions below.


val (emptyCleanedSegments, nonEmptyCleanedSegments) = sortedNewSegments.partition(_.size == 0)

if (emptyCleanedSegments.nonEmpty) {
@junrao (Contributor):

Currently, we expect that if there is an empty segment, it would be the only segment in newSegments. Perhaps we could throw an IllegalArgumentException if that's not the case.

if (emptyCleanedSegments.head.baseOffset == baseOffset) {
  val newLogStartOffset = Math.max(logStartOffset, segments.higherKey(baseOffset))
  if (highWatermark < newLogStartOffset)
    maybeIncrementHighWatermark(LogOffsetMetadata(newLogStartOffset))
@junrao (Contributor):

We probably don't need this. Currently, the log cleaner never cleans beyond the HWM. If a bug causes the logStartOffset to be larger than HWM, we can just rely on the assertion in maybeIncrementLogStartOffset.

@huxihx (Contributor, author):

Several test cases did not bump the HWM when appending records, so this patch makes them fail. That's why I also update the HWM here. Would it be okay to refine those test cases instead?

@junrao (Contributor):

Hmm, in those test cases, should the HWM be higher to start with? Or is there a legitimate case that the HWM needs to be advanced when log segments are swapped? If it's the former, we will want to patch the test cases.

@huxihx (Contributor, author):

Some tests update the HWM to the LEO when they append new records, and some don't. Prior to this patch, the HWM didn't matter since the log start offset never advanced after cleaning. With this patch applied, that changes. I think it would be better to update the HWM to the LEO in those failing tests as well. Does that make sense?

@junrao (Contributor):

@huxihx : Thanks for the reply. In general, HWM is supposed to be changed only at the ReplicaManager level since it can only advance if all in-sync replicas have received the records. At the Log layer, we only know the local replica and shouldn't be advancing the HWM directly. That's why I was suggesting to fix the tests instead of changing HWM here so that the tests could pass.

@huxihx (Contributor, author) commented Dec 20, 2019

retest this please.

@huxihx (Contributor, author) commented Dec 23, 2019

retest this please.

@huxihx (Contributor, author) commented Dec 30, 2019

@junrao Please review again. Thanks.

@huxihx (Contributor, author) commented Dec 30, 2019

retest this please.

@junrao (Contributor) left a comment

@huxihx : Sorry for the delay. Thanks for the new update. Added a few more comments below.

@@ -2232,6 +2232,8 @@ class Log(@volatile var dir: File,
 * <li> Cleaner creates one or more new segments with suffix .cleaned and invokes replaceSegments().
 *      If broker crashes at this point, the clean-and-swap operation is aborted and
 *      the .cleaned files are deleted on recovery in loadSegments().
+* <li> If the new cleaned segments contain empty(zero-sized) ones, replaceSegments() tries to update high watermark and log start offset,
@junrao (Contributor):

The comment needs to be adjusted since we are no longer updating the high watermark.

@@ -2254,7 +2256,21 @@
    */
   private[log] def replaceSegments(newSegments: Seq[LogSegment], oldSegments: Seq[LogSegment], isRecoveredSwapFile: Boolean = false): Unit = {
     lock synchronized {
+      if (newSegments.size > 1 && newSegments.exists(_.size == 0))
+        throw new IllegalArgumentException(s"Expected only one cleaned segment should be created after cleaning, but we got $newSegments")
@junrao (Contributor):

replaceSegments() is called during log recovery too. So, we want to write the error message in a more general way. Something like "There should be only one new log segment if any of the new segments is empty, but we got $newSegments instead".

if (emptyCleanedSegments.nonEmpty) {
  val baseOffset = segments.firstKey()
  if (emptyCleanedSegments.head.baseOffset == baseOffset) {
    val newLogStartOffset = Math.max(logStartOffset, segments.higherKey(baseOffset))
@junrao (Contributor):

Hmm, this doesn't seem quite right. There could be multiple segments in oldSegments that the new segment is trying to replace. In that case, we want to get the higher key of the last old segment.
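For example (a hedged sketch, using sortedOldSegments as named in replaceSegments()):

// Hedged sketch: advance past the entire replaced range by keying off the
// last old segment's base offset rather than the first key in the map.
val newLogStartOffset = Math.max(logStartOffset, segments.higherKey(sortedOldSegments.last.baseOffset))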

@huxihx (Contributor, author):

@junrao Sorry for the long delay. Will take care of the comments ASAP.

@huxihx (Contributor, author) commented Feb 11, 2020

@junrao Correct me if I am wrong. The key question here is whether we should always update the log start offset when an empty cleaned segment is generated. IMO, the update should be conditional; in some cases we should not do it.

Say we have three sorted segments A, B and C, where A is not empty after cleaning. The cleaner then cleans B, and the cleaned segment generated for B is empty. In that case, we should not update the log start offset, since A is not empty. That's actually the case LogCleanerParameterizedIntegrationTest.cleanerTest wants to cover. Does that make sense?

@junrao (Contributor) commented Feb 11, 2020

@huxihx : I was thinking about the following case. Suppose that we have 4 segments A, B, C and D with starting offset 100, 200, 300 and 400. The firstDirty offset is at 400. We load segment D to build the offset map. During cleaning, segment A, B and C are grouped together and are cleaned into segment A', which is empty. After replacing segment A, B and C with segment A', it seems that the log start offset should be 400, instead of 200?

@huxihx (Contributor, author) commented Feb 14, 2020

retest this please.

@junrao (Contributor) commented Feb 20, 2020

ok to test

@junrao (Contributor) commented Feb 20, 2020

retest this please.

@junrao (Contributor) commented Feb 20, 2020

test this please

@junrao (Contributor) commented Feb 20, 2020

retest this please

@junrao (Contributor) commented Feb 21, 2020

@huxihx : The following test failure seems related to this PR.

14:24:07 kafka.log.LogCleanerIntegrationTest > testMaxLogCompactionLag FAILED
14:24:07     java.lang.AssertionError: log cleaner should have processed at least to offset 294, but lastCleaned=0
14:24:07         at org.junit.Assert.fail(Assert.java:89)
14:24:07         at org.junit.Assert.assertTrue(Assert.java:42)
14:24:07         at kafka.log.LogCleanerIntegrationTest.testMaxLogCompactionLag(LogCleanerIntegrationTest.scala:162)

@huxihx (Contributor, author) commented Jul 7, 2020

@junrao Sorry for the long delay. Please review this patch again.

@huxihx (Contributor, author) commented Jul 24, 2020

retest this please

@huxihx (Contributor, author) commented Aug 17, 2020

@junrao Could you take some time to review this patch? Thanks :)
