Bugfix: Fix rackaware placement policy init error #12097
Since the release of Pulsar 2.8 and the upgrade to BK 4.12, the default rackAwarePlacementPolicy has been failing with the following exception:

```
org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl - Failed to initialize DNS Resolver org.apache.pulsar.zookeeper.ZkBookieRackAffinityMapping, used default subnet resolver : java.lang.RuntimeException: java.lang.NullPointerException java.lang.NullPointerException
```

This regression occurred in commit 4c60262.

The core of the issue is that `setConf` is called before `setBookieAddressResolver` has been called (see https://github.com/apache/bookkeeper/blob/034ef8566ad037937a4d58a28f70631175744f53/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/RackawareEnsemblePlacementPolicyImpl.java#L264-L286). This results in the NPE.

We are safe to simply not eagerly init the cache in `setConf`, as the `getRack` call will re-check the cache. We also guard against this possible NPE for safety.
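The ordering problem and the lazy-init fix can be illustrated with a minimal, self-contained sketch. This is not the actual Pulsar or BookKeeper code: the class `RackMappingSketch`, the `Resolver` interface, the `refreshCache` helper, the `/default-rack` fallback, and the addresses are all hypothetical stand-ins. Only the method names `setConf`, `setBookieAddressResolver`, and `getRack` mirror the ones mentioned above.

```java
import java.util.HashMap;
import java.util.Map;

public class RackMappingSketch {
    /** Hypothetical stand-in for a bookie address resolver. */
    public interface Resolver {
        String resolve(String bookieId);
    }

    private static final String DEFAULT_RACK = "/default-rack"; // assumed fallback value

    private Resolver resolver;              // may still be null when setConf runs
    private Map<String, String> conf;       // bookie id -> rack
    private Map<String, String> rackCache;  // resolved address -> rack, built lazily

    /** Stores the configuration only; the cache is NOT built eagerly here. */
    public void setConf(Map<String, String> conf) {
        this.conf = conf;
    }

    public void setBookieAddressResolver(Resolver resolver) {
        this.resolver = resolver;
    }

    /** Re-checks the cache on each lookup, so deferring init in setConf is safe. */
    public String getRack(String address) {
        if (rackCache == null) {
            refreshCache();
        }
        if (rackCache == null) {
            return DEFAULT_RACK; // resolver not available yet
        }
        return rackCache.getOrDefault(address, DEFAULT_RACK);
    }

    private void refreshCache() {
        if (resolver == null || conf == null) {
            // Precautionary guard: skip resolution instead of throwing an NPE.
            return;
        }
        Map<String, String> fresh = new HashMap<>();
        for (Map.Entry<String, String> entry : conf.entrySet()) {
            fresh.put(resolver.resolve(entry.getKey()), entry.getValue());
        }
        rackCache = fresh;
    }

    public static void main(String[] args) {
        RackMappingSketch mapping = new RackMappingSketch();
        Map<String, String> conf = new HashMap<>();
        conf.put("bookie-1", "/rack-a");

        mapping.setConf(conf); // resolver is still null, as in the failing call order
        System.out.println(mapping.getRack("10.0.0.1:3181")); // prints "/default-rack"

        mapping.setBookieAddressResolver(id -> "10.0.0.1:3181");
        System.out.println(mapping.getRack("10.0.0.1:3181")); // prints "/rack-a"
    }
}
```

In this sketch the first lookup falls back to the default rack rather than failing, and the cache is populated on the first lookup after the resolver becomes available.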
/pulsarbot run-failure-checks
Good work Addison.
I wonder why we didn't catch this problem with tests.
Would you mind adding a reproducer test case for the problem?
If we cannot go with a unit test, we can make a smoke integration test that simply configures rack-aware bookie placement. There is no need to fully verify the behaviour, only that Pulsar does not break.
```
}
BookieAddressResolver addressResolver = getBookieAddressResolver();
if (addressResolver == null) {
    LOG.warn("Bookie address resolver not yet initialized, skipping resolution");
```
Does this case happen?
Or is this only a precautionary null check?
Precautionary. I figure that if it can return null, we should be defensive.
@eolivelli I believe the test sets the DNS resolver manually before it runs. Hence it doesn't catch the real usage.
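A small sketch of why a test that wires the resolver first would miss the problem. The `EagerMapping` class below is a hypothetical model of the pre-fix behavior (eager resolution inside `setConf`), not the real implementation, and the bookie id and address are made up for illustration.

```java
import java.util.function.Function;

public class InitOrderReproducer {
    /** Hypothetical model of the pre-fix behavior: setConf eagerly uses the resolver. */
    static class EagerMapping {
        private Function<String, String> resolver;
        private String resolved;

        void setBookieAddressResolver(Function<String, String> resolver) {
            this.resolver = resolver;
        }

        void setConf(String bookieId) {
            // Eager init: throws NullPointerException if the resolver is not set yet.
            this.resolved = resolver.apply(bookieId);
        }
    }

    /** Returns true if setConf throws an NPE for the given call order. */
    static boolean failsWithNpe(boolean resolverFirst) {
        EagerMapping mapping = new EagerMapping();
        try {
            if (resolverFirst) {
                mapping.setBookieAddressResolver(id -> "10.0.0.1:3181");
            }
            mapping.setConf("bookie-1");
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Test-style order: resolver wired manually before setConf.
        System.out.println("resolver first fails: " + failsWithNpe(true));  // false
        // Production-style order: setConf runs before the resolver is set.
        System.out.println("setConf first fails: " + failsWithNpe(false));  // true
    }
}
```

A reproducer along these lines would need to exercise the same call order as the real client setup rather than pre-wiring the resolver.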
My idea is to add an integration test that reproduces the behaviour of a sysadmin who configures this feature. It looks like we do not have this kind of integration test. @addisonj do you think it is worth adding? By the way, please answer my comment about the null check; otherwise I am happy with this patch.
@eolivelli I do think it is worth the test, but I would suggest we make a follow-up item to do that. While we could do a smoke test and check the logs, the log is pretty subtle and easy to miss. I would instead suggest we just put the time in to do a proper test that validates placement by querying metadata. However, I don't want to hold up this fix for that test, as this is quite a big breakage.
By the way, I don't want to block this patch. I hope you will find time to follow up.
(cherry picked from commit 9975fe4)
This is difficult to test directly but we should have an integration test to validate that PlacementPolicy is working as expected.
CC @eolivelli