
KAFKA-7385: Fix log cleaner behavior when empty batches are retained #5623

Merged
merged 6 commits into apache:trunk on Sep 9, 2018

Conversation

@dhruvilshah3 (Contributor) commented Sep 7, 2018

With idempotent producers, we may leave empty batches in the log during log compaction. When filtering the data, we keep track of state such as the maxOffset and maxTimestamp of the filtered data. This patch ensures we maintain this state correctly in the case where only empty batches are retained in MemoryRecords#filterTo. Without this patch, maxOffset was not initialized in this edge case, so we appended data to the log with maxOffset = -1L, causing the append to fail and the log cleaner to crash.
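The failure mode can be sketched with a minimal, self-contained example. All names here are hypothetical, not the actual Kafka code: the point is that maxOffset starts at -1 and is only advanced by retained records, so retaining only empty batches leaves it at -1.

```java
import java.util.List;

// Minimal sketch of the bug (hypothetical names, not Kafka's API).
public class FilterSketch {
    // Buggy aggregation: only offsets of retained records advance maxOffset.
    // If every retained batch is empty, the result stays -1L.
    static long maxOffsetBuggy(List<List<Long>> retainedRecordsPerBatch) {
        long maxOffset = -1L;
        for (List<Long> records : retainedRecordsPerBatch)
            for (long offset : records)
                maxOffset = Math.max(maxOffset, offset);
        return maxOffset;
    }

    // Fixed aggregation: a retained-but-empty batch still advances maxOffset
    // to its last offset, so the result is valid whenever batches are retained.
    static long maxOffsetFixed(List<List<Long>> retainedRecordsPerBatch,
                               List<Long> retainedBatchLastOffsets) {
        long maxOffset = maxOffsetBuggy(retainedRecordsPerBatch);
        for (long lastOffset : retainedBatchLastOffsets)
            maxOffset = Math.max(maxOffset, lastOffset);
        return maxOffset;
    }
}
```

With one retained empty batch whose last offset is 41, the buggy path reports -1 while the fixed path reports 41, which is what makes the subsequent append succeed.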

@@ -371,31 +388,24 @@ public int hashCode() {
public static class FilterResult {
public final ByteBuffer output;
public final int messagesRead;
public final int bytesRead;
public final int totalBytesRead;
Contributor:

Why was only this one renamed?


for (MutableRecordBatch batch : batches) {
bytesRead += batch.sizeInBytes();
long maxOffset = -1L;
int messagesRead = 0;
Contributor:

If there are no messages retained, then don't we lose this value?

@ijuma (Contributor) commented Sep 8, 2018

@dhruvilshah3 can you please add a description to the PR explaining the fix?

private static FilterResult filterTo(TopicPartition partition, Iterable<MutableRecordBatch> batches,
RecordFilter filter, ByteBuffer destinationBuffer, int maxRecordBatchSize,
BufferSupplier decompressionBufferSupplier) {
private static class FilteredBatchesMetadata {
Contributor:

Can you please explain why we added this new class and only have a subset of fields in it?

@dhruvilshah3 (Author):

This class tracks state for all filtered batches. Previously, every branch in filterTo was cluttered with logic that tracked per-batch state and aggregated it over all the batches seen so far. With this class, each branch only tracks per-batch state, and FilteredBatchesMetadata holds the aggregate. This gives cleaner separation, avoids code repetition, and makes it easier to reason about which fields have or have not been initialized in a particular branch.

Two fields have been left out: messagesRead and bytesRead. Both are aggregated over all the data we have seen so far, regardless of whether we end up retaining it.
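The aggregation described above could look roughly like this. This is a sketch with an assumed field set based on the discussion, not the actual FilteredBatchesMetadata class:

```java
// Rough sketch of an aggregate like FilteredBatchesMetadata (field names are
// assumptions drawn from the conversation, not the real Kafka class).
public class FilteredBatchesSketch {
    long maxOffset = -1L;
    long maxTimestamp = -1L;
    long shallowOffsetOfMaxTimestamp = -1L;
    int messagesRetained = 0;
    int bytesRetained = 0;

    // Each branch of filterTo reports only per-batch values; the aggregate
    // lives in one place. Note that a retained batch with zero retained
    // records still advances maxOffset, which is the edge case this patch fixes.
    void addRetainedBatch(long batchLastOffset, long batchMaxTimestamp,
                          int recordsRetained, int sizeInBytes) {
        if (batchLastOffset > maxOffset)
            maxOffset = batchLastOffset;
        if (batchMaxTimestamp > maxTimestamp) {
            maxTimestamp = batchMaxTimestamp;
            shallowOffsetOfMaxTimestamp = batchLastOffset;
        }
        messagesRetained += recordsRetained;
        bytesRetained += sizeInBytes;
    }
}
```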

Contributor:

Thanks for the explanation. Maybe worth documenting the intent (briefly).

Contributor:

I think it might be nicer to make the fields in FilterResult mutable and move the functionality there. It's a little annoying to have an additional internal class just for accumulating a subset of the filter stats.

@hachikuji (Contributor) left a comment:

Thanks for the patch, left a few small comments.

@@ -245,18 +263,17 @@ private static FilterResult filterTo(TopicPartition partition, Iterable<MutableR
batch.producerEpoch(), batch.baseSequence(), batch.baseOffset(), batch.lastOffset(),
batch.partitionLeaderEpoch(), batch.timestampType(), batch.maxTimestamp(),
batch.isTransactional(), batch.isControlBatch());
filteredBatchesMetadata.addRetainedBatchMetadata(batch, retainedRecords.size(), true);
Contributor:

Isn't retainedRecords.size() just 0 down this path?

ByteBufferOutputStream bufferOutputStream = new ByteBufferOutputStream(destinationBuffer);
int totalBytesRead = 0;
Contributor:

Since FilterResult is mutable, could we just increment the bytesRead and messagesRead fields directly?
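The suggestion amounts to something like the following sketch, with field names assumed from the diff rather than taken from the final code:

```java
// Sketch of a mutable FilterResult-style accumulator, per the reviewer's
// suggestion: bump the read-side counters in place instead of carrying
// local variables and copying them into the result at the end.
public class MutableFilterResultSketch {
    int messagesRead = 0;
    int bytesRead = 0;

    // Called once per input batch, whether or not the batch is retained,
    // since these counters cover everything read during filtering.
    void readBatch(int recordCount, int sizeInBytes) {
        messagesRead += recordCount;
        bytesRead += sizeInBytes;
    }
}
```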

@hachikuji (Contributor) left a comment:

LGTM. Thanks for the patch.

@hachikuji hachikuji merged commit 958cdca into apache:trunk Sep 9, 2018
@dhruvilshah3 dhruvilshah3 deleted the cleaner-fix branch September 9, 2018 01:04
hachikuji pushed a commit that referenced this pull request Sep 9, 2018
…ined (#5623)

With idempotent/transactional producers, we may leave empty batches in the log during log compaction. When filtering the data, we keep track of state like `maxOffset` and `maxTimestamp` of filtered data. This patch ensures we maintain this state correctly for the case when only empty batches are left in `MemoryRecords#filterTo`. Without this patch, we did not initialize `maxOffset` in this edge case which led us to append data to the log with `maxOffset` = -1L, causing the append to fail and log cleaner to crash.

Reviewers: Jason Gustafson <jason@confluent.io>
Pengxiaolong pushed a commit to Pengxiaolong/kafka that referenced this pull request Jun 14, 2019