KAFKA-7687: Print batch level information in DumpLogSegments #5976

Merged
merged 5 commits into from
Dec 4, 2018

@huxihx huxihx commented Nov 30, 2018

DumpLogSegments should be able to print batch-level information when deep iteration is specified.

@hachikuji hachikuji self-assigned this Nov 30, 2018
@hachikuji hachikuji left a comment

Thanks @huxihx. Left some minor suggestions to consider.

@@ -289,7 +289,7 @@ object DumpLogSegments {
}
lastOffset = record.offset

- print("offset: " + record.offset + " position: " + validBytes +
+ print("- offset: " + record.offset + " position: " + validBytes +

I think there's some redundant information that we can exclude if we are printing the batch info: position, timestamp type, magic version, compression codec. Below we can also exclude the producerId, epoch, and the transactional flag.

Also, what do you think about using | instead of -?
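
To illustrate the suggestion, here is a minimal sketch of how the deep-iteration output could look once batch-wide fields are printed once per batch and each record line keeps only per-record fields behind a `|` prefix. `BatchInfo`, `RecordInfo`, and `formatBatch` are hypothetical stand-ins introduced for this example, not Kafka's actual classes; only the field labels are taken from the diff.

```scala
object BatchDumpSketch {
  // Hypothetical stand-ins for Kafka's record and batch types (assumption, not the real API).
  final case class RecordInfo(offset: Long, keySize: Int, valueSize: Int, sequence: Int)
  final case class BatchInfo(
    baseOffset: Long,
    position: Int,
    magic: Byte,
    compressionType: String,
    producerId: Long,
    producerEpoch: Short,
    isTransactional: Boolean,
    records: Seq[RecordInfo]
  )

  // Prefix for record-level lines, as proposed in the review ("|" instead of "-").
  val Indent = "|"

  // Batch-wide fields appear once on the batch line;
  // record lines carry only fields that vary per record.
  def formatBatch(batch: BatchInfo): Seq[String] = {
    val batchLine =
      s"baseOffset: ${batch.baseOffset} position: ${batch.position} magic: ${batch.magic}" +
        s" compresscodec: ${batch.compressionType} producerId: ${batch.producerId}" +
        s" producerEpoch: ${batch.producerEpoch} isTransactional: ${batch.isTransactional}"
    val recordLines = batch.records.map { r =>
      s"$Indent offset: ${r.offset} keysize: ${r.keySize} valuesize: ${r.valueSize} sequence: ${r.sequence}"
    }
    batchLine +: recordLines
  }

  def main(args: Array[String]): Unit = {
    val batch = BatchInfo(
      baseOffset = 0L, position = 0, magic = 2, compressionType = "NONE",
      producerId = 1000L, producerEpoch = 0, isTransactional = false,
      records = Seq(RecordInfo(0L, 3, 5, 0), RecordInfo(1L, 3, 7, 1)))
    formatBatch(batch).foreach(println)
  }
}
```

Keeping the batch line and the record lines in separate strings makes it easy to see which fields would otherwise be repeated for every record in the batch.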

@hachikuji hachikuji left a comment

Thanks for the updates. Just two more comments.

- print(" producerId: " + batch.producerId + " producerEpoch: " + batch.producerEpoch + " sequence: " + record.sequence +
- " isTransactional: " + batch.isTransactional +
- " headerKeys: " + record.headers.map(_.key).mkString("[", ",", "]"))
+ print(" sequence: " + record.sequence + " headerKeys: " + record.headers.map(_.key).mkString("[", ",", "]"))
} else {
print(" crc: " + record.checksumOrNull)

The old format still has individual record level checksums. I think it would make sense to add back " isvalid: " + record.isValid here.

- " " + batch.timestampType + ": " + record.timestamp + " isvalid: " + record.isValid +
- " keysize: " + record.keySize + " valuesize: " + record.valueSize + " magic: " + batch.magic +
- " compresscodec: " + batch.compressionType)
+ print(s"$INDENT offset: ${record.offset} keysize: ${record.keySize} valuesize: ${record.valueSize}")

On second thought, the timestamp is a per-record field, so can we add batch.timestampType + ": " + record.timestamp back to this string?
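
A rough sketch of the record line with the timestamp restored might look like the following. The timestamp type is a batch-level attribute that serves as the label, while the timestamp value itself varies per record. `recordLine` and its parameters are hypothetical names for illustration, and the `CreateTime` label in the usage line is assumed to be the string form of the timestamp type.

```scala
object RecordLineSketch {
  // Prefix marking a record-level line beneath its batch line.
  val Indent = "|"

  // timestampType comes from the batch and labels the value;
  // timestamp is the per-record field being added back.
  def recordLine(timestampType: String, offset: Long, timestamp: Long,
                 keySize: Int, valueSize: Int): String =
    s"$Indent offset: $offset $timestampType: $timestamp keysize: $keySize valuesize: $valueSize"

  def main(args: Array[String]): Unit =
    println(recordLine("CreateTime", 42L, 1543881600000L, 3, 5))
}
```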

@hachikuji hachikuji left a comment

@huxihx Thanks, LGTM. I pushed a minor change to only display isvalid when there is a corresponding checksum. If it looks good to you, I will go ahead and merge.
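
The idea of showing `isvalid` only alongside a checksum can be sketched as below, assuming a record exposes a nullable checksum in the way `record.checksumOrNull` in the diff suggests. `RecordView` and `checksumSuffix` are hypothetical names for this example; this is not the committed code.

```scala
object ChecksumSuffixSketch {
  // Hypothetical stand-in for a record; checksumOrNull mirrors the nullable accessor in the diff.
  final case class RecordView(checksumOrNull: java.lang.Long, isValid: Boolean)

  // Emit crc and isvalid only when the record carries its own checksum;
  // otherwise contribute nothing to the output line.
  def checksumSuffix(record: RecordView): String =
    Option(record.checksumOrNull) match {
      case Some(crc) => s" crc: $crc isvalid: ${record.isValid}"
      case None      => ""
    }

  def main(args: Array[String]): Unit = {
    println("with checksum:" + checksumSuffix(RecordView(12345L, isValid = true)))
    println("without checksum:" + checksumSuffix(RecordView(null, isValid = true)))
  }
}
```

Wrapping the nullable value in `Option` keeps the null check in one place, so `isvalid` cannot be printed for a record that has no checksum to validate.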

huxihx commented Dec 4, 2018

@hachikuji That's fine with me. It looks like a more reasonable change.

@hachikuji hachikuji merged commit f65b1c4 into apache:trunk Dec 4, 2018
pengxiaolong pushed a commit to pengxiaolong/kafka that referenced this pull request Jun 14, 2019
…p iterating (apache#5976)

DumpLogSegments should print batch level information when deep-iteration is specified.

Reviewers: Jason Gustafson <jason@confluent.io>