HDFS-17803. Compute correct checksum type when file is empty. #8322

Open
balodesecurity wants to merge 1 commit into apache:trunk from balodesecurity:HDFS-17803

@balodesecurity

Summary

DistributedFileSystem.getFileChecksum() on an empty (zero-byte) file always returned an MD5MD5CRC32GzipFileChecksum value, even when the client was configured with dfs.checksum.combine.mode=COMPOSITE_CRC. This was misleading and inconsistent with what non-empty files return in the same mode.

Root Cause

In FileChecksumHelper.FileChecksumComputer.compute(), the empty-file fast path unconditionally constructed an MD5MD5CRC32GzipFileChecksum without checking combineMode.

Fix

Check combineMode in the empty-file path:

  • COMPOSITE_CRC: return CompositeCrcFileChecksum(crc=0, configuredChecksumType, configuredBytesPerCrc) using the client's configured ChecksumOpt.
  • MD5MD5CRC: retain the existing backward-compatible magic value (MD5MD5CRC32GzipFileChecksum(0, 0, md5OfZeros)) to avoid breaking existing tools.
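The two constant values named in the list above can be reproduced with plain JDK classes. The sketch below is illustrative only: it does not use Hadoop's `MD5Hash` or `CompositeCrcFileChecksum`, the class and method names are hypothetical, and the 32-byte length of the zero-filled buffer behind `md5OfZeros` is an assumption that should be checked against `FileChecksumHelper.java`. It shows that a CRC-32 over no input is 0 (consistent with `crc=0`) and how an MD5 digest over a zero-filled buffer is derived.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.zip.CRC32;

// Illustrative only: JDK stand-ins, not the Hadoop implementation.
class EmptyFileChecksumValues {

  // ASSUMPTION: length of the zero-filled buffer hashed for md5OfZeros.
  static final int ASSUMED_ZERO_BUFFER_LENGTH = 32;

  // CRC-32 when no bytes have been fed to the checksum.
  static long crcOfEmptyInput() {
    CRC32 crc = new CRC32();
    return crc.getValue();
  }

  // Hex-encoded MD5 digest of `length` zero bytes.
  static String md5OfZerosHex(int length) {
    try {
      byte[] digest = MessageDigest.getInstance("MD5").digest(new byte[length]);
      StringBuilder hex = new StringBuilder();
      for (byte b : digest) {
        hex.append(String.format("%02x", b));
      }
      return hex.toString();
    } catch (NoSuchAlgorithmException e) {
      // MD5 is a required algorithm on every Java platform.
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("crc of empty input = " + crcOfEmptyInput());
    System.out.println("md5 of zeros       = "
        + md5OfZerosHex(ASSUMED_ZERO_BUFFER_LENGTH));
  }
}
```

Because the MD5MD5CRC value depends only on a fixed buffer, it is identical for every empty file, which is why existing tools may compare against it and why the fix leaves it unchanged.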

Changes

  • FileChecksumHelper.java: split the empty-file branch on combineMode.
  • TestGetFileChecksum.java: add testEmptyFileChecksumType verifying both modes return the expected FileChecksum subclass for a zero-byte file.
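As a rough model of the split described above, the following self-contained sketch mirrors the decision made in the empty-file branch and the kind of check the new test performs. Every type here is a hypothetical stand-in (`ChecksumModel`, `LegacyMd5Model`, `CompositeCrcModel`, `forEmptyFile`), not the Hadoop classes touched by the patch, and `"CRC32C"` and `512` are sample configuration values chosen for illustration.

```java
// Stand-in model of the empty-file branch; not the code from the patch.
class EmptyFileChecksumBranch {

  enum CombineMode { MD5MD5CRC, COMPOSITE_CRC }

  // Marker interface standing in for Hadoop's FileChecksum.
  interface ChecksumModel { }

  // Stands in for MD5MD5CRC32GzipFileChecksum(0, 0, md5OfZeros).
  static final class LegacyMd5Model implements ChecksumModel { }

  // Stands in for CompositeCrcFileChecksum(crc, type, bytesPerCrc).
  static final class CompositeCrcModel implements ChecksumModel {
    final int crc;
    final String checksumType;
    final int bytesPerCrc;

    CompositeCrcModel(int crc, String checksumType, int bytesPerCrc) {
      this.crc = crc;
      this.checksumType = checksumType;
      this.bytesPerCrc = bytesPerCrc;
    }
  }

  // Chooses the result type for a zero-byte file based on the combine mode.
  static ChecksumModel forEmptyFile(CombineMode mode,
                                    String configuredType,
                                    int configuredBytesPerCrc) {
    if (mode == CombineMode.COMPOSITE_CRC) {
      // An empty file has no data, so the composite CRC is 0.
      return new CompositeCrcModel(0, configuredType, configuredBytesPerCrc);
    }
    // MD5MD5CRC keeps the backward-compatible magic value.
    return new LegacyMd5Model();
  }

  public static void main(String[] args) {
    for (CombineMode mode : CombineMode.values()) {
      ChecksumModel result = forEmptyFile(mode, "CRC32C", 512);
      System.out.println(mode + " -> " + result.getClass().getSimpleName());
    }
  }
}
```

The real test presumably asserts the concrete `FileChecksum` subclass returned by `getFileChecksum()` against a mini cluster; the model only captures the type-per-mode expectation.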

Test plan

  • TestGetFileChecksum#testEmptyFileChecksumType passes locally ✅
  • Full module build passes (mvn package ... -DskipTests) ✅

getFileChecksum() on an empty file always returned an MD5MD5CRC32Gzip
checksum regardless of the configured dfs.checksum.combine.mode.
When COMPOSITE_CRC is configured the returned type should be
CompositeCrcFileChecksum.

Fix: in FileChecksumHelper.FileChecksumComputer.compute(), when
locatedBlocks is null/empty check combineMode first:
- COMPOSITE_CRC  → return CompositeCrcFileChecksum(crc=0, configuredType,
                   configuredBytesPerCrc)
- MD5MD5CRC      → keep the existing backward-compatible magic value

Test: TestGetFileChecksum#testEmptyFileChecksumType verifies both modes
return the expected FileChecksum subclass for a zero-byte file.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@hadoop-yetus

🎊 +1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|------|-----------|---------|---------|---------|
| +0 🆗 | reexec | 19m 44s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s | | No case conflicting files found. |
| +0 🆗 | codespell | 0m 0s | | codespell was not available. |
| +0 🆗 | detsecrets | 0m 0s | | detect-secrets was not available. |
| +1 💚 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 💚 | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
| | | | | _ trunk Compile Tests _ |
| +0 🆗 | mvndep | 2m 0s | | Maven dependency ordering for branch |
| +1 💚 | mvninstall | 55m 7s | | trunk passed |
| +1 💚 | compile | 6m 18s | | trunk passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04 |
| +1 💚 | compile | 6m 35s | | trunk passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1 |
| +1 💚 | checkstyle | 2m 10s | | trunk passed |
| +1 💚 | mvnsite | 3m 12s | | trunk passed |
| +1 💚 | javadoc | 2m 28s | | trunk passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04 |
| +1 💚 | javadoc | 2m 27s | | trunk passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1 |
| +1 💚 | spotbugs | 8m 24s | | trunk passed |
| +1 💚 | shadedclient | 37m 23s | | branch has no errors when building and testing our client artifacts. |
| | | | | _ Patch Compile Tests _ |
| +0 🆗 | mvndep | 0m 31s | | Maven dependency ordering for patch |
| +1 💚 | mvninstall | 2m 26s | | the patch passed |
| +1 💚 | compile | 5m 33s | | the patch passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04 |
| +1 💚 | javac | 5m 33s | | the patch passed |
| +1 💚 | compile | 6m 20s | | the patch passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1 |
| +1 💚 | javac | 6m 20s | | the patch passed |
| +1 💚 | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 💚 | checkstyle | 1m 50s | | the patch passed |
| +1 💚 | mvnsite | 2m 34s | | the patch passed |
| +1 💚 | javadoc | 1m 37s | | the patch passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04 |
| +1 💚 | javadoc | 1m 42s | | the patch passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1 |
| +1 💚 | spotbugs | 7m 37s | | the patch passed |
| +1 💚 | shadedclient | 36m 23s | | patch has no errors when building and testing our client artifacts. |
| | | | | _ Other Tests _ |
| +1 💚 | unit | 2m 37s | | hadoop-hdfs-client in the patch passed. |
| +1 💚 | unit | 259m 12s | | hadoop-hdfs in the patch passed. |
| +1 💚 | asflicense | 0m 48s | | The patch does not generate ASF License warnings. |
| | | 473m 42s | | |

| Subsystem | Report/Notes |
|-----------|--------------|
| Docker | ClientAPI=1.54 ServerAPI=1.54 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8322/1/artifact/out/Dockerfile |
| GITHUB PR | #8322 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux d4225b18797a 5.15.0-164-generic #174-Ubuntu SMP Fri Nov 14 20:25:16 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 643ac96 |
| Default Java | Ubuntu-17.0.18+8-Ubuntu-124.04.1 |
| Multi-JDK versions | /usr/lib/jvm/java-21-openjdk-amd64:Ubuntu-21.0.10+7-Ubuntu-124.04 /usr/lib/jvm/java-17-openjdk-amd64:Ubuntu-17.0.18+8-Ubuntu-124.04.1 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8322/1/testReport/ |
| Max. process+thread count | 2363 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs-client hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8322/1/console |
| versions | git=2.43.0 maven=3.9.11 spotbugs=4.9.7 |
| Powered by | Apache Yetus 0.14.1 https://yetus.apache.org |

This message was automatically generated.

@balodesecurity
Author

@adoroszlai CI is passing — would you mind taking a look when you get a chance? Thanks!

@adoroszlai adoroszlai requested a review from jojochuang March 19, 2026 12:08