

@brandboat (Member) commented Oct 21, 2025

When a duplicate batch is detected, the entire MemoryRecords instance is
skipped to prevent appending duplicate data to the log. This skip is
currently silent, so this PR adds a log.info message to provide better
observability.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
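The detection step described above can be pictured with a minimal sketch, assuming an idempotent producer whose batches carry a producer epoch and a sequence range. BatchMeta and ProducerEntrySketch are hypothetical names, not Kafka classes; the retention depth of five is assumed here to match the number of recent batches Kafka's producer state keeps per producer, but the real ProducerStateEntry and BatchMetadata differ in detail.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Optional;

// Hypothetical, simplified metadata for a batch already appended to the log.
final class BatchMeta {
    final short producerEpoch;
    final int firstSeq;
    final int lastSeq;
    final long lastOffset;

    BatchMeta(short producerEpoch, int firstSeq, int lastSeq, long lastOffset) {
        this.producerEpoch = producerEpoch;
        this.firstSeq = firstSeq;
        this.lastSeq = lastSeq;
        this.lastOffset = lastOffset;
    }

    @Override
    public String toString() {
        return "BatchMeta(epoch=" + producerEpoch + ", seq=" + firstSeq + ".." + lastSeq
                + ", lastOffset=" + lastOffset + ")";
    }
}

// Hypothetical per-producer state: remembers only the most recent batches.
final class ProducerEntrySketch {
    static final int BATCHES_TO_RETAIN = 5; // assumption, see lead-in

    private final Deque<BatchMeta> recent = new ArrayDeque<>();

    void record(BatchMeta meta) {
        if (recent.size() == BATCHES_TO_RETAIN) {
            recent.removeFirst(); // evict the oldest cached batch
        }
        recent.addLast(meta);
    }

    // Same epoch and same sequence range as a cached batch means the client resent it.
    Optional<BatchMeta> findDuplicateBatch(short epoch, int firstSeq, int lastSeq) {
        for (BatchMeta m : recent) {
            if (m.producerEpoch == epoch && m.firstSeq == firstSeq && m.lastSeq == lastSeq) {
                return Optional.of(m);
            }
        }
        return Optional.empty();
    }
}
```

A resent batch (epoch 0, sequences 10..19) would be found with Optional.of(entry).flatMap(e -> e.findDuplicateBatch((short) 0, 10, 19)), mirroring the maybeLastEntry.flatMap(...) call in the diff; when the lookup returns a value, nothing is appended and only the log line records that the skip happened.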

@github-actions bot added the storage (Pull requests that target the storage module) and small (Small PRs) labels Oct 21, 2025

```java
Optional<BatchMetadata> duplicateBatch = maybeLastEntry.flatMap(e -> e.findDuplicateBatch(batch));
if (duplicateBatch.isPresent()) {
    logger.info("Found duplicate batch from client, duplicateBatchMetadata={}", duplicateBatch.get());
```
Member

Perhaps we should move the log statement to UnifiedLog#append. Since another branch there already has the log statement logger.trace("Appended message set with ..."), it would be straightforward to add a similar log for duplicate batches.

Member Author

@chia7712, thanks for the suggestion! Indeed, moving the log.info to the append method seems better.

```java
appendInfo.setLastOffset(duplicate.lastOffset());
appendInfo.setLogAppendTime(duplicate.timestamp());
appendInfo.setLogStartOffset(logStartOffset);
logger.info("Duplicate batch detected, returning AppendInfo from duplicate batch with last offset: {}, first offset: {}, next offset: {}, skipped messages: {}",
```
Member

I'm a bit concerned about log flooding if the producer has network problems, since every retried batch would produce another info line. Wouldn't trace level be adequate?

Member Author

I’m good with either info or trace. As long as we leave a debug message behind, that’s fine 😃
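The trade-off in this thread (info versus trace) can be illustrated with a minimal sketch of the duplicate branch, assuming, as the diff excerpt suggests, that the append result is filled from the cached duplicate batch instead of writing anything. CachedBatch, AppendResult, and resultForDuplicate are hypothetical names, and java.util.logging stands in for the broker's logger so the block is self-contained; the thread itself does not show which level the merged commit uses.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical cached metadata for the batch that was already appended.
final class CachedBatch {
    final long lastOffset;
    final long timestamp;
    final int offsetDelta; // lastOffset - firstOffset

    CachedBatch(long lastOffset, long timestamp, int offsetDelta) {
        this.lastOffset = lastOffset;
        this.timestamp = timestamp;
        this.offsetDelta = offsetDelta;
    }

    long firstOffset() {
        return lastOffset - offsetDelta;
    }
}

// Hypothetical stand-in for the append result returned to the caller.
final class AppendResult {
    long firstOffset;
    long lastOffset;
    long logAppendTime;
    long logStartOffset;
}

final class DuplicateAppendSketch {
    private static final Logger LOG = Logger.getLogger(DuplicateAppendSketch.class.getName());

    // Nothing is written for a duplicate; the result echoes the offsets of the original batch.
    static AppendResult resultForDuplicate(CachedBatch duplicate, long logStartOffset) {
        AppendResult info = new AppendResult();
        info.firstOffset = duplicate.firstOffset();
        info.lastOffset = duplicate.lastOffset;
        info.logAppendTime = duplicate.timestamp;
        info.logStartOffset = logStartOffset;
        // FINEST is the java.util.logging counterpart of TRACE; it is filtered out at the
        // default INFO level, so a burst of producer retries does not flood the log.
        if (LOG.isLoggable(Level.FINEST)) {
            LOG.log(Level.FINEST, "Duplicate batch detected, returning offsets {0}..{1}",
                    new Object[] {info.firstOffset, info.lastOffset});
        }
        return info;
    }
}
```

Switching Level.FINEST to Level.INFO in the sketch gives the info-level variant: every duplicate then reaches the log under default settings, which is the flooding risk raised in the review, while the trace-level variant keeps the message available only when an operator lowers the log level.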

@brandboat brandboat changed the title MINOR: add duplicate batch detection logging KAFKA-19821: Duplicated batches should be logged Oct 22, 2025
@chia7712 chia7712 merged commit 8c794fc into apache:trunk Oct 22, 2025
24 checks passed
@brandboat brandboat deleted the minor-add-log-duplicate-batch branch October 22, 2025 15:08
joshua2519 pushed a commit to joshua2519/kafka that referenced this pull request Oct 27, 2025
eduwercamacaro pushed a commit to littlehorse-enterprises/kafka that referenced this pull request Nov 12, 2025

2 participants