KAFKA-20452: Avoid creating unnecessary empty batches in LogCleaner below the High Watermark #22089
PR #17193 (KAFKA-17076) introduced logic to retain the last batch in each cleaning round even if empty, to preserve logEndOffset after compaction. As noted by @junrao, the current check uses We only need to retain an empty batch when its
  CleanedTransactionMetadata transactionMetadata,
  long legacyDeleteHorizonMs,
- long upperBoundOffsetOfCleaningRound) throws IOException {
+ long highWatermark) throws IOException {
Could we get the hw directly within this method instead of passing it in from the caller?
If we use log.highWatermark() directly, I think the main impact would be on unit tests. If the HW is modified during cleaning, different batches may see different HW values, but I don't think this would cause any issues since the HW typically only moves forward.
Yes. Since the HW is monotonic, we can simplify the method by fetching it internally rather than capturing the value too early.
I adjusted how the highWatermark parameter is passed, updated the corresponding unit tests, and all tests pass locally.
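The trade-off discussed in this thread can be illustrated with a minimal, self-contained sketch. It is not the Kafka code: the `AtomicLong` stands in for the log's high watermark, and `advance` and `observePerBatch` are hypothetical names introduced only for illustration. It shows that when the HW is read once per batch while it advances concurrently, different batches may see different values, but never a smaller one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

public class HwReadSketch {
    // Hypothetical helper: move the HW forward, ignoring attempts to move it back.
    static void advance(AtomicLong hw, long candidate) {
        hw.accumulateAndGet(candidate, Math::max);
    }

    // Reads the HW once per batch, as a method that fetches it internally would,
    // while simulating progress of the HW between batches.
    static List<Long> observePerBatch(AtomicLong hw, int batches) {
        List<Long> seen = new ArrayList<>();
        for (int i = 0; i < batches; i++) {
            seen.add(hw.get());
            advance(hw, hw.get() + 10); // simulated concurrent advance (assumption)
        }
        return seen;
    }

    public static void main(String[] args) {
        AtomicLong hw = new AtomicLong(100L); // stand-in for the log's high watermark
        System.out.println(observePerBatch(hw, 3)); // values never decrease
    }
}
```

Because the observed values are non-decreasing, a per-batch read can only make the check see a newer HW than a value captured by the caller before cleaning started.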
Hi @junrao, could you please take a look when you have a moment? Thanks!
This patch replaces upperBoundOffset with highWatermark in Cleaner.cleanSegments()/cleanInto(), updates related Javadoc and comments, and adds a test to verify that empty batches are not retained when the highWatermark is beyond the cleaned range.
Reviewers: Jun Rao <junrao@gmail.com>, PoAn Yang <payang@apache.org>, Chia-Ping Tsai <chia7712@gmail.com>
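The behaviour summarised in the commit message can be sketched as a small predicate. This is an illustration under stated assumptions, not the patch itself: the exact comparison used in cleanInto() is not shown in this conversation, `retainEmptyBatch` is a hypothetical name, and the handling of a batch that ends exactly at the high watermark is a guess. The sketch only encodes the idea from the title, that an empty batch below the high watermark does not need to be kept.

```java
public class EmptyBatchRetentionSketch {
    // Hypothetical predicate. Assumption: an empty batch is worth retaining only
    // when it is not below the high watermark; the boundary case is a guess.
    static boolean retainEmptyBatch(long batchLastOffset, long highWatermark) {
        return batchLastOffset >= highWatermark;
    }

    public static void main(String[] args) {
        long highWatermark = 100L;
        // Cleaned range ends at offset 49, well below the HW: nothing to retain.
        System.out.println(retainEmptyBatch(49L, highWatermark));
        // Batch ends above the HW: retained under the sketch's assumption.
        System.out.println(retainEmptyBatch(120L, highWatermark));
    }
}
```

The first call mirrors the scenario covered by the added test, where the highWatermark is beyond the cleaned range and no empty batch is kept.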