Fix potential NPE in PinotNumReplicaChanger.updateIdealState #18716
Akanksha-kedia wants to merge 1 commit into
IdealState.getInstanceStateMap() can return null when a partition has no instance assignments. Add a null check before accessing the map to avoid NullPointerException during replica count changes. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #18716     +/-   ##
============================================
- Coverage     64.51%   64.50%   -0.01%
  Complexity     1291     1291
============================================
  Files          3372     3372
  Lines        208638   208638
  Branches      32596    32596
============================================
- Hits         134604   134588      -16
- Misses        63239    63252      +13
- Partials      10795    10798       +3
Flags with carried forward coverage won't be shown.
@Jackie-Jiang @xiangfu0 could you please review this null-safety fix for
Set<String> segmentIds = idealState.getPartitionSet();
for (String segmentId : segmentIds) {
  Map<String, String> instanceStateMap = idealState.getInstanceStateMap(segmentId);
  if (instanceStateMap == null) {
When could this be null? Or will it be an empty map?
Good question. getInstanceStateMap(segmentId) delegates to ZNRecord.getMapField(segmentId), which simply does mapFields.get(segmentId). The value can be null in two cases:
- ZK data inconsistency: Helix's ZNRecord.setMapField(key, null) is a valid API call, so a segment can appear in the partition set (via getPartitionSet() → mapFields.keySet()) while having a null value stored. This can occur due to partial ZK writes or data corruption.
- Non-callback dry-run path: When this method is invoked in dry-run mode (line 69), the currentIdealState is fetched directly via getResourceIdealState(). While Pinot itself never writes null maps, an external Helix operation or ZK inconsistency could produce one.
The guard is a defensive check to avoid a silent NPE at instanceStateMap.size(). I can add a LOGGER.warn on the null branch to make it visible if preferred. Let me know.
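To illustrate the first case, here is a minimal, self-contained sketch of how a key can be present in a map's key set while its value is null. It uses a plain TreeMap as a stand-in for the record's map fields rather than the Helix classes, and the class name NullMapFieldDemo, the method countAssigned, and the segment and server names are hypothetical, made up for this example only.

```java
import java.util.Map;
import java.util.TreeMap;

public class NullMapFieldDemo {
  // Stand-in for the record's map fields: segment name -> (instance -> state).
  // Hypothetical helper, not part of Pinot or Helix.
  static int countAssigned(Map<String, Map<String, String>> mapFields) {
    int assigned = 0;
    for (String segmentId : mapFields.keySet()) {
      Map<String, String> instanceStateMap = mapFields.get(segmentId);
      if (instanceStateMap == null) {
        // Key is present but the stored value is null: skip instead of
        // throwing a NullPointerException on size().
        continue;
      }
      assigned += instanceStateMap.size();
    }
    return assigned;
  }

  public static void main(String[] args) {
    Map<String, Map<String, String>> mapFields = new TreeMap<>();
    Map<String, String> segment0 = new TreeMap<>();
    segment0.put("Server_1", "ONLINE");
    mapFields.put("segment_0", segment0);
    mapFields.put("segment_1", null); // TreeMap accepts null values

    System.out.println(mapFields.keySet().contains("segment_1")); // true
    System.out.println(mapFields.get("segment_1") == null);       // true
    System.out.println(countAssigned(mapFields));                 // 1
  }
}
```

Without the null branch, the loop would throw on segment_1 as soon as size() is called.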
We don't really handle corrupted ideal state. If that happens, we will encounter errors everywhere.
One potential optimization here is to loop directly over idealState.getRecord().getMapFields() to avoid a per-entry map lookup.
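A rough sketch of what that loop shape could look like, again with a plain Map standing in for the result of getMapFields(). The helper trimToReplicas and its trimming logic are hypothetical and only illustrate iterating over entries; they are not the actual body of updateIdealState. In line with the comment above, it assumes every value is non-null.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

public class EntryLoopSketch {
  // Hypothetical helper: for each segment, drop instances beyond numReplicas.
  // Iterating entrySet() yields key and value together, so no per-segment
  // get(segmentId) lookup is needed.
  static int trimToReplicas(Map<String, Map<String, String>> mapFields, int numReplicas) {
    int removed = 0;
    for (Map.Entry<String, Map<String, String>> entry : mapFields.entrySet()) {
      Map<String, String> instanceStateMap = entry.getValue();
      Iterator<String> it = instanceStateMap.keySet().iterator();
      while (instanceStateMap.size() > numReplicas && it.hasNext()) {
        it.next();
        it.remove(); // removes the instance from the underlying map
        removed++;
      }
    }
    return removed;
  }

  public static void main(String[] args) {
    Map<String, Map<String, String>> mapFields = new TreeMap<>();
    Map<String, String> segment0 = new TreeMap<>();
    segment0.put("Server_1", "ONLINE");
    segment0.put("Server_2", "ONLINE");
    segment0.put("Server_3", "ONLINE");
    mapFields.put("segment_0", segment0);

    System.out.println(trimToReplicas(mapFields, 2));      // 1
    System.out.println(mapFields.get("segment_0").size()); // 2
  }
}
```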
@xiangfu0 @Jackie-Jiang all CI checks pass, review comments addressed. Please review when you get a chance.
Hey @xiangfu0, would appreciate a review on this when you get a chance!
Closing based on reviewer feedback. Thank you @xiangfu0 and @Jackie-Jiang for the review!
Description
IdealState.getInstanceStateMap(segmentId) can return null when a partition exists in the ideal state but has no instance assignments. The previous code called .size() directly on the result without a null check, which would throw a NullPointerException during replica count changes.
Fix
Add a null check on instanceStateMap and skip the segment if it is null, consistent with how this scenario is handled elsewhere in the codebase.
Tests
No functional change for the normal (non-null) path. The null case is a defensive guard for an edge condition during cluster state transitions.
Checklist