HDFS-17590. NullPointerException in createBlockReader during retry iteration #8328

Open

balodesecurity wants to merge 4 commits into apache:trunk from

Conversation
…geType stats.

The StorageType stats map maintained a nodesInService counter using increments/decrements (via StorageTypeStats.addNode / subtractNode). When nodesInService dropped to 0, the entry for that storage type was removed from the map, even when decommissioning nodes still used the storage type and still contributed capacity data. When the entry was later recreated by an addStorage call, it started fresh with nodesInService = 0. Subsequent in-service node heartbeats then performed subtract (a no-op, since the entry was gone) followed by add (which recreated the entry with nodesInService = 1), which was correct. But any in-service node whose subtract ran against the freshly created entry saw nodesInService decrement past 0 to -1, and the following add only brought it back to 0, so that node's in-service contribution was lost for the rest of the session.

Fix: add a totalNodes counter to StorageTypeStats that tracks ALL nodes using a storage type (in-service + decommissioning + maintenance), and change the map-entry removal condition from nodesInService == 0 to totalNodes == 0. An entry is now removed only when no node of any admin state still uses that storage type, preventing the premature removal that caused the count corruption.

Added TestStorageTypeStatsMap with 4 unit tests covering:
- Basic add/remove correctness
- Entry survival when a decommissioning node still uses the storage type
- nodesInService stability after the last in-service node decommissions
- Entry removal only when all nodes (including decommissioning) are gone

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
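The counting scheme the commit describes can be sketched in miniature. This is an illustrative model, not the actual HDFS source: class, enum, and method names here are simplified stand-ins for StorageTypeStats and the DatanodeStats map.

```java
import java.util.EnumMap;
import java.util.Map;

// Simplified model of the fix: each stats entry tracks nodesInService AND
// totalNodes, and the map entry is removed only when totalNodes hits 0.
public class StorageTypeStatsSketch {
    enum StorageType { DISK, SSD }

    static final class Stats {
        int nodesInService; // in-service nodes only
        int totalNodes;     // in-service + decommissioning + maintenance
    }

    static final Map<StorageType, Stats> statsMap = new EnumMap<>(StorageType.class);

    static void addNode(StorageType t, boolean inService) {
        Stats s = statsMap.computeIfAbsent(t, k -> new Stats());
        s.totalNodes++;
        if (inService) s.nodesInService++;
    }

    static void removeNode(StorageType t, boolean inService) {
        Stats s = statsMap.get(t);
        if (s == null) return;
        s.totalNodes--;
        if (inService) s.nodesInService--;
        // Old (buggy) condition was nodesInService == 0, which dropped the
        // entry even while decommissioning nodes still used the storage type.
        if (s.totalNodes == 0) {
            statsMap.remove(t);
        }
    }

    public static void main(String[] args) {
        addNode(StorageType.DISK, true);     // in-service node
        addNode(StorageType.DISK, false);    // decommissioning node
        removeNode(StorageType.DISK, true);  // last in-service node leaves
        // Entry survives: the decommissioning node still uses DISK.
        System.out.println("entry present: " + statsMap.containsKey(StorageType.DISK));
        removeNode(StorageType.DISK, false); // last node of any state leaves
        System.out.println("entry present: " + statsMap.containsKey(StorageType.DISK));
    }
}
```

With the old condition, the first removeNode call would have deleted the entry while the decommissioning node still contributed capacity data.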
…utput.

Replace PrintStream with Writer in PBImageXmlWriter and FileDistributionCalculator. Writer.write() propagates IOException immediately, while PrintStream.print/println() silently swallows errors. OfflineImageViewerPB wraps the output PrintStream in an OutputStreamWriter (with an explicit flush after visit) to bridge the two APIs. All test callers of PBImageXmlWriter and FileDistributionCalculator are updated to pass an OutputStreamWriter.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
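The behavioural difference this commit relies on can be shown with a few lines of standard-library code. The failing stream below is a stand-in for a real I/O error (e.g. a full disk), not HDFS code:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintStream;
import java.io.Writer;

// PrintStream swallows write failures (exposing them only via checkError()),
// while a Writer propagates the IOException to the caller.
public class WriterVsPrintStream {
    static final class FailingStream extends OutputStream {
        @Override public void write(int b) throws IOException {
            throw new IOException("disk full");
        }
    }

    public static void main(String[] args) throws IOException {
        PrintStream ps = new PrintStream(new FailingStream());
        ps.print("lost");   // no exception surfaces here
        ps.flush();         // underlying write fails silently
        System.out.println("printstream error flag: " + ps.checkError());

        Writer w = new OutputStreamWriter(new FailingStream(), "UTF-8");
        try {
            w.write("lost");
            w.flush();      // forces the underlying write
            System.out.println("writer: no error");
        } catch (IOException e) {
            System.out.println("writer threw: " + e.getMessage());
        }
    }
}
```

This is also why the commit adds an explicit flush after visit: both PrintStream and OutputStreamWriter buffer, so errors on buffered bytes only surface when the buffer is pushed to the underlying stream.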
…eration.

When DFSStripedInputStream.createBlockReader() calls refreshLocatedBlock() and that call throws an IOException before getBestNodeDNAddrPair() runs, the dnInfo variable is still in its initial state (a DNAddrPair with all-null fields). The catch block then calls addToLocalDeadNodes(dnInfo.info) with a null argument, which triggers an NPE inside ConcurrentHashMap.put().

Fix: add a null guard at the top of DFSInputStream.addToLocalDeadNodes() so that a null dnInfo is silently ignored, preventing both the NPE and the masking of the original IOException.

Test: testAddNullToLocalDeadNodesIsIgnored verifies that passing null to addToLocalDeadNodes() leaves the dead-nodes map empty without throwing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
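A minimal sketch of the guard, assuming a simplified shape: the real method lives in DFSInputStream and the map holds DatanodeInfo values, for which a String stands in here.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative reproduction of the fix described above, not the HDFS source.
public class DeadNodesGuardSketch {
    static final Map<String, String> deadNodes = new ConcurrentHashMap<>();

    static void addToLocalDeadNodes(String dnInfo) {
        // Null guard from the fix: ConcurrentHashMap.put rejects null keys
        // with a NullPointerException, so a null datanode is ignored instead.
        if (dnInfo == null) {
            return;
        }
        deadNodes.put(dnInfo, dnInfo);
    }

    public static void main(String[] args) {
        addToLocalDeadNodes(null);   // previously: NPE from put()
        System.out.println("dead nodes after null add: " + deadNodes.size());
        addToLocalDeadNodes("dn-1"); // normal path is unchanged
        System.out.println("dead nodes after real add: " + deadNodes.size());
    }
}
```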
💔 -1 overall

This message was automatically generated.
Summary
DFSStripedInputStream.createBlockReader() initialises dnInfo to a sentinel DNAddrPair(null, null, null, null) before entering the retry loop. If refreshLocatedBlock() throws an IOException (e.g. the block's start offset is out of range after a file truncation or stale cache) before getBestNodeDNAddrPair() is called, the catch block calls addToLocalDeadNodes(dnInfo.info) with a null argument. addToLocalDeadNodes then calls deadNodes.put(null, null), and ConcurrentHashMap does not permit null keys, so a NullPointerException is thrown, masking the original IOException.

Fix: add a null guard at the top of DFSInputStream.addToLocalDeadNodes(). This is the safest fix location because it protects all callers, not just the one in createBlockReader.

Test plan
TestDFSStripedInputStream#testAddNullToLocalDeadNodesIsIgnored creates a striped file, opens a DFSStripedInputStream, calls addToLocalDeadNodes(null), and asserts that no exception is thrown and the dead-nodes map remains empty.
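The failure sequence in the summary can be simulated outside HDFS. The sketch below is illustrative only: DNAddrPair is reduced to a single field and refreshLocatedBlock is a hypothetical stub that always fails, mimicking the pre-fix code path where the original IOException is masked by the NPE.

```java
import java.io.IOException;
import java.util.concurrent.ConcurrentHashMap;

// Simulation of the pre-fix bug: dnInfo starts as an all-null sentinel, the
// refresh step throws before a datanode is chosen, and the catch block then
// dereferences the sentinel into ConcurrentHashMap.put(null, ...).
public class RetryLoopNpeSketch {
    static final class DNAddrPair {          // stand-in for the real class
        final String info;                   // null in the sentinel
        DNAddrPair(String info) { this.info = info; }
    }

    static final ConcurrentHashMap<String, String> deadNodes = new ConcurrentHashMap<>();

    static void refreshLocatedBlock() throws IOException {
        throw new IOException("block offset out of range");
    }

    public static void main(String[] args) {
        DNAddrPair dnInfo = new DNAddrPair(null); // sentinel, as in createBlockReader
        try {
            refreshLocatedBlock();                // throws before a node is chosen
            dnInfo = new DNAddrPair("dn-1");      // never reached
        } catch (IOException e) {
            try {
                deadNodes.put(dnInfo.info, "x");  // NPE: null key masks the IOException
            } catch (NullPointerException npe) {
                System.out.println("NPE masked: " + e.getMessage());
            }
        }
    }
}
```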