Skip to content

Fix potential NPE in PinotNumReplicaChanger.updateIdealState#18716

Closed
Akanksha-kedia wants to merge 1 commit into
apache:masterfrom
Akanksha-kedia:fix/pinot-num-replica-changer-npe
Closed

Fix potential NPE in PinotNumReplicaChanger.updateIdealState#18716
Akanksha-kedia wants to merge 1 commit into
apache:masterfrom
Akanksha-kedia:fix/pinot-num-replica-changer-npe

Conversation

@Akanksha-kedia

Copy link
Copy Markdown
Contributor

Description

IdealState.getInstanceStateMap(segmentId) can return null when a partition exists in the ideal state but has no instance assignments. The previous code called .size() directly on the result without a null check, which would throw a NullPointerException during replica count changes.

Fix

Add a null check on instanceStateMap and skip the segment if it is null — consistent with how this scenario is handled elsewhere in the codebase.

Tests

No functional change for the normal (non-null) path. The null case is a defensive guard for an edge condition during cluster state transitions.

Checklist

  • No new public APIs without documentation
  • Backward compatible change
  • Code follows existing patterns in the codebase

IdealState.getInstanceStateMap() can return null when a partition has no
instance assignments. Add a null check before accessing the map to avoid
NullPointerException during replica count changes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@Akanksha-kedia

Copy link
Copy Markdown
Contributor Author

@Jackie-Jiang

@codecov-commenter

codecov-commenter commented Jun 9, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 64.50%. Comparing base (4ff9dbe) to head (daa4be2).

Additional details and impacted files
@@             Coverage Diff              @@
##             master   #18716      +/-   ##
============================================
- Coverage     64.51%   64.50%   -0.01%     
  Complexity     1291     1291              
============================================
  Files          3372     3372              
  Lines        208638   208638              
  Branches      32596    32596              
============================================
- Hits         134604   134588      -16     
- Misses        63239    63252      +13     
- Partials      10795    10798       +3     
Flag Coverage Δ
custom-integration1 100.00% <ø> (ø)
integration 100.00% <ø> (ø)
integration1 100.00% <ø> (ø)
integration2 0.00% <ø> (ø)
java-21 64.50% <ø> (-0.01%) ⬇️
temurin 64.50% <ø> (-0.01%) ⬇️
unittests 64.50% <ø> (-0.01%) ⬇️
unittests1 56.91% <ø> (-0.01%) ⬇️
unittests2 37.14% <ø> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@Akanksha-kedia

Copy link
Copy Markdown
Contributor Author

@Jackie-Jiang @xiangfu0 could you please review this null-safety fix for PinotNumReplicaChanger.updateIdealState? IdealState.getInstanceStateMap() can return null for partitions with no instance assignments — adding a guard to skip such segments.

Set<String> segmentIds = idealState.getPartitionSet();
for (String segmentId : segmentIds) {
Map<String, String> instanceStateMap = idealState.getInstanceStateMap(segmentId);
if (instanceStateMap == null) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when this could be null? or it will be an empty map?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. getInstanceStateMap(segmentId) delegates to ZNRecord.getMapField(segmentId) which simply does mapFields.get(segmentId). The value can be null in two cases:

  1. ZK data inconsistency: Helix's ZNRecord.setMapField(key, null) is a valid API call, so a segment can appear in the partition set (via getPartitionSet()mapFields.keySet()) while having a null value stored. This can occur due to partial ZK writes or data corruption.

  2. Non-callback dry-run path: When this method is invoked in dry-run mode (line 69), the currentIdealState is fetched directly via getResourceIdealState(). While Pinot itself never writes null maps, an external Helix operation or ZK inconsistency could produce one.

The guard is a defensive check to avoid a silent NPE at instanceStateMap.size(). I can add a LOGGER.warn on the null branch to make it visible if preferred — let me know.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't really handle corrupted ideal state. If that happens, we will encounter error everywhere.

One potential optimization here is to directly loop over idealState.getRecord().getMapFields() to avoid per entry map lookup

@Akanksha-kedia

Copy link
Copy Markdown
Contributor Author

@xiangfu0 @Jackie-Jiang all CI checks pass, review comments addressed. Please review when you get a chance.

@Akanksha-kedia

Copy link
Copy Markdown
Contributor Author

Hey @xiangfu0, would appreciate a review on this when you get a chance!

@Akanksha-kedia

Copy link
Copy Markdown
Contributor Author

Closing based on reviewer feedback. Thank you @xiangfu0 and @Jackie-Jiang for the review!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants