
Fix NPE in mkAssignments when assignment is deleted during scheduling#8441

Merged
rzo1 merged 1 commit into master from fix/nimbus-mkassignments-npe on Mar 28, 2026

Conversation


@jnioche jnioche commented Mar 28, 2026

Summary

Fix a NullPointerException in Nimbus.mkAssignments() caused by a TOCTOU race
against ZooKeeper: state.assignmentInfo() can return null when an assignment is
deleted between the state.assignments() listing (line 2510) and the per-topology
read (line 2517).

Bug

In mkAssignments(), the code iterates over assigned topology IDs and fetches each
assignment from ZooKeeper:

Assignment currentAssignment = state.assignmentInfo(id, null);  // can return null
if (!currentAssignment.is_set_owner()) {                         // NPE

assignmentInfo() returns null when the assignment znode no longer exists. This
happens when a topology is killed or its assignment is cleaned up between the two
ZooKeeper reads — a classic TOCTOU (time-of-check-to-time-of-use) race condition.

The same method already handles this correctly elsewhere in Nimbus.java (line 3125):

Assignment assignment = state.assignmentInfo(topoId, null);
if (assignment != null && assignment.is_set_executor_node_port()) { ... }

Impact

mkAssignments runs on a recurring timer as part of Nimbus's scheduling loop. When
this NPE fires:

  1. The entire scheduling round fails — no topology in the cluster gets new or
    updated assignments for that cycle
  2. The error is persistent — if the deleted topology ID remains in the
    state.assignments() listing (e.g., due to a slow ZooKeeper cleanup), every
    scheduling round crashes until it disappears
  3. All topologies are starved — any topology needing re-assignment (new workers,
    rebalance, failed workers) is blocked, not just the one whose assignment was deleted

This makes Nimbus scheduling fragile under topology churn (rapid submit/kill cycles)
or ZooKeeper latency spikes.
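The blast radius follows from the loop structure: every topology is processed in one pass, so an uncaught exception on one entry abandons the rest of the round. A toy sketch (simplified names, not Storm's actual scheduling code) shows the effect:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of why one deleted assignment starves every topology: the round
// iterates over all topologies in a single try scope, so an NPE on one
// entry means later entries are never reached.
public class SchedulingRoundSketch {
    public static void main(String[] args) {
        Map<String, String> assignments = new LinkedHashMap<>();
        assignments.put("topo-1", "owner=alice");
        assignments.put("topo-2", null);          // deleted during scheduling
        assignments.put("topo-3", "owner=carol");

        try {
            for (Map.Entry<String, String> e : assignments.entrySet()) {
                // Mirrors calling is_set_owner() on a null assignment.
                if (!e.getValue().startsWith("owner=")) { /* repair ownership */ }
                System.out.println("scheduled " + e.getKey());
            }
        } catch (NullPointerException npe) {
            System.out.println("round aborted; topo-3 never scheduled");
        }
    }
}
```

On the next timer tick the same listing still contains topo-2, so the round fails again until the stale entry disappears.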

Fix

Add a null guard consistent with the existing pattern at line 3125:

       Assignment currentAssignment = state.assignmentInfo(id, null);
  -    if (!currentAssignment.is_set_owner()) {
  +    if (currentAssignment != null && !currentAssignment.is_set_owner()) {

When assignmentInfo() returns null, the null flows into the existingAssignments
map. All four downstream consumers in lockingMkAssignments already handle null
values from this map (lines 2566, 2576, 2581, 2663).
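The fixed control flow can be illustrated with a self-contained sketch (stand-in types and names, not the real Nimbus code): the guard skips the owner-repair branch for a null read, the null is still stored in the map, and the downstream consumer pattern tolerates it.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the fixed flow: a null assignment bypasses the owner check,
// lands in the map as a null value, and null-aware consumers handle it.
public class NullGuardSketch {
    public static void main(String[] args) {
        Map<String, String> existingAssignments = new HashMap<>();
        String[] ids   = {"topo-1", "topo-2"};
        String[] reads = {"owner=alice", null};   // topo-2 deleted mid-round

        for (int i = 0; i < ids.length; i++) {
            String current = reads[i];
            // The guard: only touch the assignment when it actually exists.
            if (current != null && !current.startsWith("owner=")) {
                current = "owner=unknown";        // stand-in for the repair branch
            }
            existingAssignments.put(ids[i], current); // null flows into the map
        }

        // Downstream consumer pattern: check the map value before using it.
        for (String id : ids) {
            String a = existingAssignments.get(id);
            System.out.println(id + (a != null ? " kept" : " has no assignment"));
        }
    }
}
```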

Test plan

  • Verify the fix compiles: mvn compile -pl storm-server
  • Review the four existingAssignments.get() call sites in
    lockingMkAssignments to confirm they handle null (they do — see lines 2566,
    2576, 2581, 2663)
  • Confirm the fix matches the existing pattern at line 3125

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@jnioche jnioche added this to the 2.8.6 milestone Mar 28, 2026
@jnioche jnioche added the bug label Mar 28, 2026
@rzo1 rzo1 merged commit 9961d32 into master Mar 28, 2026
12 checks passed
@jnioche jnioche deleted the fix/nimbus-mkassignments-npe branch March 30, 2026 08:57
