HDFS-17803. Compute correct checksum type when file is empty. #8322
Open
balodesecurity wants to merge 1 commit into apache:trunk
getFileChecksum() on an empty file always returned an MD5MD5CRC32Gzip
checksum regardless of the configured dfs.checksum.combine.mode.
When COMPOSITE_CRC is configured, the returned type should be
CompositeCrcFileChecksum.
Fix: in FileChecksumHelper.FileChecksumComputer.compute(), when
locatedBlocks is null or empty, check combineMode first:
- COMPOSITE_CRC → return CompositeCrcFileChecksum(crc=0, configuredType,
configuredBytesPerCrc)
- MD5MD5CRC → keep the existing backward-compatible magic value
Test: TestGetFileChecksum#testEmptyFileChecksumType verifies both modes
return the expected FileChecksum subclass for a zero-byte file.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
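The branch described in the fix can be sketched roughly as follows. This is a minimal, self-contained illustration and not the patch itself: every type here (`CombineMode`, `ChecksumType`, `FileChecksum`, and the two checksum classes) is a hypothetical stand-in that borrows names from the commit message, and the method `checksumForEmptyFile` and its parameters are assumptions made for the example.

```java
// Stand-in types for illustration only; the real classes live in Hadoop
// and carry more state and behavior than shown here.
final class EmptyFileChecksumSketch {

    enum CombineMode { MD5MD5CRC, COMPOSITE_CRC }

    enum ChecksumType { CRC32, CRC32C }

    interface FileChecksum { }

    static final class CompositeCrcFileChecksum implements FileChecksum {
        final int crc;
        final ChecksumType type;
        final int bytesPerCrc;

        CompositeCrcFileChecksum(int crc, ChecksumType type, int bytesPerCrc) {
            this.crc = crc;
            this.type = type;
            this.bytesPerCrc = bytesPerCrc;
        }
    }

    static final class MD5MD5CRC32GzipFileChecksum implements FileChecksum {
        final int bytesPerCrc;
        final long crcPerBlock;
        final byte[] md5;

        MD5MD5CRC32GzipFileChecksum(int bytesPerCrc, long crcPerBlock, byte[] md5) {
            this.bytesPerCrc = bytesPerCrc;
            this.crcPerBlock = crcPerBlock;
            this.md5 = md5;
        }
    }

    // Hypothetical helper mirroring the empty-file branch: pick the result
    // type from the configured combine mode instead of always using MD5MD5CRC.
    static FileChecksum checksumForEmptyFile(CombineMode combineMode,
                                             ChecksumType configuredType,
                                             int configuredBytesPerCrc,
                                             byte[] md5OfZeros) {
        if (combineMode == CombineMode.COMPOSITE_CRC) {
            // An empty file has no data, so the composite CRC is 0.
            return new CompositeCrcFileChecksum(0, configuredType, configuredBytesPerCrc);
        }
        // Keep the legacy magic value for backward compatibility.
        return new MD5MD5CRC32GzipFileChecksum(0, 0L, md5OfZeros);
    }

    public static void main(String[] args) {
        byte[] placeholderMd5 = new byte[16]; // placeholder digest, not the real magic value
        for (CombineMode mode : CombineMode.values()) {
            FileChecksum c = checksumForEmptyFile(mode, ChecksumType.CRC32C, 512, placeholderMd5);
            System.out.println(mode + " -> " + c.getClass().getSimpleName());
        }
    }
}
```

Running `main` prints the stand-in class chosen for each mode, which is the same property the new test is described as checking against the real classes.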
🎊 +1 overall
This message was automatically generated.
Author
@adoroszlai CI is passing. Would you mind taking a look when you get a chance? Thanks!
Summary
DistributedFileSystem.getFileChecksum() on an empty (zero-byte) file always returned an MD5MD5CRC32GzipFileChecksum value, even when the client was configured with dfs.checksum.combine.mode=COMPOSITE_CRC. This was misleading and inconsistent with what non-empty files return in the same mode.

Root Cause
In FileChecksumHelper.FileChecksumComputer.compute(), the empty-file fast path unconditionally constructed an MD5MD5CRC32GzipFileChecksum without checking combineMode.

Fix
Check combineMode in the empty-file path:
- COMPOSITE_CRC: return CompositeCrcFileChecksum(crc=0, configuredChecksumType, configuredBytesPerCrc) using the client's configured ChecksumOpt.
- MD5MD5CRC: retain the existing backward-compatible magic value (MD5MD5CRC32GzipFileChecksum(0, 0, md5OfZeros)) to avoid breaking existing tools.

Changes
- FileChecksumHelper.java: split the empty-file branch on combineMode.
- TestGetFileChecksum.java: add testEmptyFileChecksumType verifying both modes return the expected FileChecksum subclass for a zero-byte file.

Test plan
- TestGetFileChecksum#testEmptyFileChecksumType passes locally ✅
- mvn package ... -DskipTests ✅
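For readers unfamiliar with the backward-compatible magic value kept in the MD5MD5CRC branch, the following sketch shows one way a digest like md5OfZeros could be produced with the JDK alone. The class and method names are hypothetical, and the 32-byte buffer length is an assumption for illustration; the PR text only names md5OfZeros and does not state the length.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Illustrative only: computes an MD5 digest over a buffer of zero bytes.
final class Md5OfZerosSketch {

    // Assumed buffer length; not stated in the PR description.
    static final int ZERO_BUFFER_LENGTH = 32;

    static byte[] md5OfZeros(int length) {
        try {
            // new byte[length] is zero-filled by the JVM.
            return MessageDigest.getInstance("MD5").digest(new byte[length]);
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toHex(md5OfZeros(ZERO_BUFFER_LENGTH)));
    }
}
```

Because the input is a constant, the digest is a constant too, which is why existing tools can depend on it and why the MD5MD5CRC branch leaves it unchanged.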