HDFS-17590. NullPointerException in createBlockReader during retry iteration #8328

Open
balodesecurity wants to merge 4 commits into apache:trunk from balodesecurity:HDFS-17590
@balodesecurity

Summary

DFSStripedInputStream.createBlockReader() initialises dnInfo to a sentinel DNAddrPair(null, null, null, null) before entering the retry loop. If refreshLocatedBlock() throws an IOException (e.g. the block's start offset is out of range after a file truncation or stale cache) before getBestNodeDNAddrPair() is called, the catch block tries:

addToLocalDeadNodes(dnInfo.info);   // dnInfo.info is still null

addToLocalDeadNodes calls deadNodes.put(null, null), and ConcurrentHashMap does not permit null keys, so a NullPointerException is thrown — masking the original IOException.

Fix: add a null guard at the top of DFSInputStream.addToLocalDeadNodes():

protected void addToLocalDeadNodes(DatanodeInfo dnInfo) {
  if (dnInfo == null) {
    return;
  }
  ...
}

This is the safest fix location because it protects all callers, not just the one in createBlockReader.
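
The failure mode and the guard can be reproduced without a cluster. The following is a minimal sketch, not the patched Hadoop code: DeadNodeGuardSketch, addUnguarded, addGuarded, and attempt are hypothetical names, String stands in for DatanodeInfo, and the thrown IOException stands in for a failing refreshLocatedBlock(). Only the ConcurrentHashMap behaviour is taken as given.

```java
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the dead-node bookkeeping; not the real DFSInputStream.
class DeadNodeGuardSketch {
  private final Map<String, String> deadNodes = new ConcurrentHashMap<>();

  // Pre-fix shape: a null argument reaches ConcurrentHashMap.put().
  void addUnguarded(String dnInfo) {
    deadNodes.put(dnInfo, dnInfo);
  }

  // Post-fix shape: null is ignored before the map is touched.
  void addGuarded(String dnInfo) {
    if (dnInfo == null) {
      return;
    }
    deadNodes.put(dnInfo, dnInfo);
  }

  int size() {
    return deadNodes.size();
  }

  // Simulates one retry iteration in which no datanode has been chosen yet.
  // Returns a description of what the caller ends up seeing.
  String attempt(boolean guarded) {
    String dn = null; // sentinel: still unset when the IOException is thrown
    try {
      try {
        // Assumption: stands in for refreshLocatedBlock() failing early.
        throw new IOException("block offset out of range");
      } catch (IOException e) {
        if (guarded) {
          addGuarded(dn);
        } else {
          addUnguarded(dn);
        }
        return "handled " + e.getClass().getSimpleName();
      }
    } catch (NullPointerException npe) {
      // The NPE from the catch block replaces the original IOException.
      return "escaped " + npe.getClass().getSimpleName();
    }
  }

  public static void main(String[] args) {
    DeadNodeGuardSketch sketch = new DeadNodeGuardSketch();
    System.out.println("unguarded: " + sketch.attempt(false));
    System.out.println("guarded:   " + sketch.attempt(true));
    System.out.println("dead nodes recorded: " + sketch.size());
  }
}
```

In the unguarded path the caller sees a NullPointerException and loses the IOException that describes the real problem; in the guarded path the IOException is handled as intended and the map stays empty.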

Test plan

  • New test TestDFSStripedInputStream#testAddNullToLocalDeadNodesIsIgnored creates a striped file, opens a DFSStripedInputStream, calls addToLocalDeadNodes(null), and asserts that no exception is thrown and that the dead-nodes map remains empty.
  • Test passes with MiniDFSCluster (EC RS-6-3 policy).

amitbalode and others added 3 commits March 9, 2026 06:13
…geType stats.

The StorageType stats map maintained a nodesInService counter using
increments/decrements (via StorageTypeStats.addNode / subtractNode).
When nodesInService dropped to 0, the entry for that storage type was
removed from the map — even when decommissioning nodes still used the
storage type and still contributed capacity data.

When the entry was later recreated by an addStorage call, it started
fresh with nodesInService = 0.  Subsequent in-service node heartbeats
then performed subtract (no-op, entry was gone) followed by add (creates
entry, nodesInService = 1), which was correct.  But any in-service node
whose subtract ran against the freshly-created entry saw nodesInService
decrement past 0 to -1, and then add brought it back to 0 — so that
node's in-service contribution was lost for the rest of the session.

Fix: add a totalNodes counter to StorageTypeStats that tracks ALL nodes
using a storage type (in-service + decommissioning + maintenance).
Change the map-entry removal condition from nodesInService == 0 to
totalNodes == 0.  An entry is now removed only when no node of any
admin state still uses that storage type, preventing the premature
removal that caused the count corruption.

Added TestStorageTypeStatsMap with 4 unit tests covering:
- Basic add/remove correctness
- Entry survival when a decommissioning node still uses the storage type
- nodesInService stability after the last in-service node decommissions
- Entry removal only when all nodes (including decommissioning) are gone
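
The removal rule described above can be sketched as follows. This is a simplified illustration under assumptions, not the committed StorageTypeStats code: StorageTypeStatsSketch, the two-value StorageType enum, and the boolean inService parameter are hypothetical, and capacity fields are omitted.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical, simplified model of a per-storage-type stats map.
class StorageTypeStatsSketch {
  enum StorageType { DISK, SSD }

  static final class Stats {
    int nodesInService; // in-service nodes only
    int totalNodes;     // in-service + decommissioning + maintenance
  }

  private final Map<StorageType, Stats> map = new EnumMap<>(StorageType.class);

  void addNode(StorageType type, boolean inService) {
    Stats s = map.computeIfAbsent(type, k -> new Stats());
    s.totalNodes++;
    if (inService) {
      s.nodesInService++;
    }
  }

  void subtractNode(StorageType type, boolean inService) {
    Stats s = map.get(type);
    if (s == null) {
      return; // nothing recorded for this storage type
    }
    s.totalNodes--;
    if (inService) {
      s.nodesInService--;
    }
    // Remove the entry only when no node of any admin state remains.
    if (s.totalNodes == 0) {
      map.remove(type);
    }
  }

  boolean contains(StorageType type) {
    return map.containsKey(type);
  }

  int nodesInService(StorageType type) {
    Stats s = map.get(type);
    return s == null ? 0 : s.nodesInService;
  }

  public static void main(String[] args) {
    StorageTypeStatsSketch stats = new StorageTypeStatsSketch();
    stats.addNode(StorageType.DISK, true);       // in-service node
    stats.addNode(StorageType.DISK, false);      // decommissioning node
    stats.subtractNode(StorageType.DISK, true);  // last in-service node leaves
    System.out.println("entry kept: " + stats.contains(StorageType.DISK));
    stats.subtractNode(StorageType.DISK, false); // decommissioning node leaves
    System.out.println("entry kept: " + stats.contains(StorageType.DISK));
  }
}
```

With removal keyed on nodesInService == 0, the entry would disappear after the third call even though a decommissioning node still uses DISK; keyed on totalNodes == 0, it survives until the fourth.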

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…utput.

Replace PrintStream with Writer in PBImageXmlWriter and
FileDistributionCalculator. Writer.write() propagates IOException
immediately, while PrintStream.print/println() silently swallows errors.

OfflineImageViewerPB wraps the output PrintStream in an OutputStreamWriter
(with explicit flush after visit) to bridge the two APIs. All test callers
of PBImageXmlWriter and FileDistributionCalculator are updated to pass
an OutputStreamWriter.
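
The difference between the two APIs can be shown with a stream that always fails. This is a standalone sketch using only java.io; WriterVsPrintStreamSketch and FailingStream are hypothetical names and nothing here is taken from PBImageXmlWriter itself.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintStream;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class WriterVsPrintStreamSketch {
  // Hypothetical sink that fails on every write, e.g. a full disk.
  static final class FailingStream extends OutputStream {
    @Override
    public void write(int b) throws IOException {
      throw new IOException("simulated write failure");
    }
  }

  // PrintStream never throws IOException; the failure is only visible
  // through checkError(), which callers rarely consult.
  static boolean printStreamHidesError() {
    PrintStream ps = new PrintStream(new FailingStream(), true);
    ps.println("<fsimage/>"); // completes without an exception
    return ps.checkError();
  }

  // A Writer surfaces the failure as an IOException. OutputStreamWriter
  // buffers encoded bytes, so the flush is what forces the error out.
  static boolean writerPropagatesError() {
    Writer w = new OutputStreamWriter(new FailingStream(), StandardCharsets.UTF_8);
    try {
      w.write("<fsimage/>");
      w.flush();
      return false;
    } catch (IOException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println("PrintStream error flag set: " + printStreamHidesError());
    System.out.println("Writer threw IOException:   " + writerPropagatesError());
  }
}
```

Because OutputStreamWriter buffers, a short write may not fail until the buffer is flushed, which is presumably why the commit adds an explicit flush after the visit.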

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…eration.

When DFSStripedInputStream.createBlockReader() calls refreshLocatedBlock()
and that call throws IOException before getBestNodeDNAddrPair() runs,
the dnInfo variable is still in its initial state (DNAddrPair with all-null
fields). The catch block then calls addToLocalDeadNodes(dnInfo.info) with
a null argument, which triggers an NPE inside ConcurrentHashMap.put().

Fix: add a null guard at the top of DFSInputStream.addToLocalDeadNodes()
so that a null dnInfo is silently ignored, preventing both the NPE and
the masking of the original IOException.

Test: testAddNullToLocalDeadNodesIsIgnored verifies that passing null to
addToLocalDeadNodes() leaves the dead-nodes map empty without throwing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@hadoop-yetus

💔 -1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:-------:|:--------|
| +0 🆗 | reexec | 0m 21s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 1s | | No case conflicting files found. |
| +0 🆗 | codespell | 0m 0s | | codespell was not available. |
| +0 🆗 | detsecrets | 0m 0s | | detect-secrets was not available. |
| +1 💚 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 💚 | test4tests | 0m 0s | | The patch appears to include 6 new or modified test files. |
| | | | | _ trunk Compile Tests _ |
| +0 🆗 | mvndep | 1m 41s | | Maven dependency ordering for branch |
| +1 💚 | mvninstall | 28m 41s | | trunk passed |
| +1 💚 | compile | 3m 35s | | trunk passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04 |
| +1 💚 | compile | 3m 51s | | trunk passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1 |
| +1 💚 | checkstyle | 1m 15s | | trunk passed |
| +1 💚 | mvnsite | 1m 45s | | trunk passed |
| +1 💚 | javadoc | 1m 16s | | trunk passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04 |
| +1 💚 | javadoc | 1m 19s | | trunk passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1 |
| +1 💚 | spotbugs | 4m 30s | | trunk passed |
| +1 💚 | shadedclient | 19m 24s | | branch has no errors when building and testing our client artifacts. |
| | | | | _ Patch Compile Tests _ |
| +0 🆗 | mvndep | 0m 22s | | Maven dependency ordering for patch |
| +1 💚 | mvninstall | 1m 23s | | the patch passed |
| +1 💚 | compile | 3m 28s | | the patch passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04 |
| +1 💚 | javac | 3m 28s | | the patch passed |
| +1 💚 | compile | 3m 41s | | the patch passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1 |
| +1 💚 | javac | 3m 41s | | the patch passed |
| +1 💚 | blanks | 0m 0s | | The patch has no blanks issues. |
| -0 ⚠️ | checkstyle | 1m 3s | /results-checkstyle-hadoop-hdfs-project.txt | hadoop-hdfs-project: The patch generated 2 new + 153 unchanged - 0 fixed = 155 total (was 153) |
| +1 💚 | mvnsite | 1m 28s | | the patch passed |
| +1 💚 | javadoc | 0m 56s | | the patch passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04 |
| +1 💚 | javadoc | 1m 3s | | the patch passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1 |
| +1 💚 | spotbugs | 4m 37s | | the patch passed |
| +1 💚 | shadedclient | 19m 55s | | patch has no errors when building and testing our client artifacts. |
| | | | | _ Other Tests _ |
| +1 💚 | unit | 1m 50s | | hadoop-hdfs-client in the patch passed. |
| -1 ❌ | unit | 173m 20s | /patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | hadoop-hdfs in the patch failed. |
| +1 💚 | asflicense | 0m 25s | | The patch does not generate ASF License warnings. |
| | | 280m 36s | | |

| Reason | Tests |
|-------:|:------|
| Failed junit tests | hadoop.hdfs.TestEncryptionZones |
| | hadoop.hdfs.TestEncryptionZonesWithKMS |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.54 ServerAPI=1.54 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8328/1/artifact/out/Dockerfile |
| GITHUB PR | #8328 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux 317c1d5b68c1 5.15.0-164-generic #174-Ubuntu SMP Fri Nov 14 20:25:16 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / a14d8f5 |
| Default Java | Ubuntu-17.0.18+8-Ubuntu-124.04.1 |
| Multi-JDK versions | /usr/lib/jvm/java-21-openjdk-amd64:Ubuntu-21.0.10+7-Ubuntu-124.04 /usr/lib/jvm/java-17-openjdk-amd64:Ubuntu-17.0.18+8-Ubuntu-124.04.1 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8328/1/testReport/ |
| Max. process+thread count | 4224 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs-client hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8328/1/console |
| versions | git=2.43.0 maven=3.9.11 spotbugs=4.9.7 |
| Powered by | Apache Yetus 0.14.1 https://yetus.apache.org |

This message was automatically generated.

3 participants