Skip to content

[fix][bk] Fix NPE in IsolatedBookieEnsemblePlacementPolicy when policy class does not match#25825

Merged
dao-jun merged 2 commits into
apache:masterfrom
Shawyeok:hotfix/fix-npe-in-isolated-bookie-ensemble-placement-policy
May 27, 2026
Merged

[fix][bk] Fix NPE in IsolatedBookieEnsemblePlacementPolicy when policy class does not match#25825
dao-jun merged 2 commits into
apache:masterfrom
Shawyeok:hotfix/fix-npe-in-isolated-bookie-ensemble-placement-policy

Conversation

@Shawyeok
Copy link
Copy Markdown
Contributor

@Shawyeok Shawyeok commented May 19, 2026

Fixes NPE in IsolatedBookieEnsemblePlacementPolicy

Motivation

A NullPointerException occurs in getExcludedBookiesWithIsolationGroups when getIsolationGroup() returns a Pair whose left value is null. This happens when the EnsemblePlacementPolicyConfig carries a policy class name that does not equal IsolatedBookieEnsemblePlacementPolicy — in that case the if block that calls pair.setLeft() / pair.setRight() is skipped entirely, leaving both sides of the MutablePair as null.

This bug was observed during the upgrade from the 2.8 branch to the 3.0 branch. The same defect may also exist in master.

Full stack trace from production:

2026-05-18T13:28:40.756Z [BookKeeperClientWorker-OrderedExecutor-35-0] ERROR org.apache.bookkeeper.common.util.SingleThreadExecutor - Error while running task: Cannot invoke "java.util.Set.contains(Object)" because the return value of "org.apache.commons.lang3.tuple.Pair.getLeft()" is null
java.lang.NullPointerException: Cannot invoke "java.util.Set.contains(Object)" because the return value of "org.apache.commons.lang3.tuple.Pair.getLeft()" is null
    at org.apache.pulsar.bookie.rackawareness.IsolatedBookieEnsemblePlacementPolicy.getExcludedBookiesWithIsolationGroups(IsolatedBookieEnsemblePlacementPolicy.java:192)
    at org.apache.pulsar.bookie.rackawareness.IsolatedBookieEnsemblePlacementPolicy.getExcludedBookies(IsolatedBookieEnsemblePlacementPolicy.java:141)
    at org.apache.pulsar.bookie.rackawareness.IsolatedBookieEnsemblePlacementPolicy.replaceBookie(IsolatedBookieEnsemblePlacementPolicy.java:127)
    at org.apache.bookkeeper.client.BookieWatcherImpl.replaceBookie(BookieWatcherImpl.java:316)
    at org.apache.bookkeeper.client.EnsembleUtils.replaceBookiesInEnsemble(EnsembleUtils.java:69)
    at org.apache.bookkeeper.client.ReadOnlyLedgerHandle.handleBookieFailure(ReadOnlyLedgerHandle.java:222)
    at org.apache.bookkeeper.client.PendingAddOp.writeComplete(PendingAddOp.java:353)
    at org.apache.bookkeeper.proto.BookieClientImpl.completeAdd(BookieClientImpl.java:287)
    at org.apache.bookkeeper.proto.BookieClientImpl.access$200(BookieClientImpl.java:79)
    at org.apache.bookkeeper.proto.BookieClientImpl$ChannelReadyForAddEntryCallback.lambda$operationComplete$0(BookieClientImpl.java:405)
    at org.apache.bookkeeper.common.util.OrderedExecutor$TimedRunnable.run(OrderedExecutor.java:203)
    at org.apache.bookkeeper.common.util.SingleThreadExecutor.safeRunTask(SingleThreadExecutor.java:137)
    at org.apache.bookkeeper.common.util.SingleThreadExecutor.run(SingleThreadExecutor.java:107)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.base/java.lang.Thread.run(Thread.java:840)

This error causes topics to time out during loading. Since there is no automatic recovery path, the only way to restore service is by restarting the broker.

Modifications

Two complementary fixes:

  1. getIsolationGroup(): Ensure not returning pair of null: add compatibility with ZkIsolatedBookieEnsemblePlacementPolicy and fallback to defaultIsolationGroups if met a mis-match policy class.

Verifying this change

  • Make sure that the change passes the CI checks.

This change added tests and can be verified as follows:

  • Added testReplaceBookieWithNonMatchingPolicyClassShouldNotThrowNPE to reproduce the exact production failure: calling replaceBookie with custom metadata whose EnsemblePlacementPolicyConfig policy class does not match IsolatedBookieEnsemblePlacementPolicy, which previously caused the NPE.

Does this pull request potentially affect one of the following parts:

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

Copy link
Copy Markdown
Member

@lhotari lhotari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

…y class does not match

MutablePair was initialized without default values, leaving left/right as null
when the ensemble placement policy class name did not match. This caused a
NullPointerException when getLeft().contains() was called.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@Shawyeok Shawyeok force-pushed the hotfix/fix-npe-in-isolated-bookie-ensemble-placement-policy branch from eb5e766 to 35f3158 Compare May 22, 2026 06:55
Copy link
Copy Markdown
Contributor

@void-ptr974 void-ptr974 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for addressing the fallback behavior.

@Shawyeok
Copy link
Copy Markdown
Contributor Author

/pulsarbot rerun-failure-checks

@Shawyeok
Copy link
Copy Markdown
Contributor Author

@lhotari @dao-jun
All tests have passed, feel free to take a look, thanks.

@dao-jun dao-jun merged commit b93fe9e into apache:master May 27, 2026
43 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants