Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[improve][broker] PIP-192: Write the child ownership to ServiceUnitStateChannel instead of ZK when handling bundle split #18858

Merged

Conversation

Demogorgon314
Copy link
Member

@Demogorgon314 Demogorgon314 commented Dec 9, 2022

Master Issue: #16691

Motivation

When the channel handles the split event, we should write the child ownership to ServiceUnitStateChannel instead of zookeeper, since we are using the ServiceUnitStateChannel to store the bundle ownership.

Modifications

  • Write the child ownership to ServiceUnitStateChannel instead of ZK when handling bundle split.
  • Retry when split bundle has BadVersion exception.
  • Add unit test to cover this change.

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository: Demogorgon314#8

@Demogorgon314 Demogorgon314 self-assigned this Dec 9, 2022
@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Dec 9, 2022
@Demogorgon314 Demogorgon314 changed the title [improve][broker] Write the child ownership to ServiceUnitStateChannel instead of ZK when handling bundle split [improve][broker] PIP-192: Write the child ownership to ServiceUnitStateChannel instead of ZK when handling bundle split Dec 18, 2022
@codelipenghui codelipenghui added this to the 2.12.0 milestone Dec 30, 2022
@codelipenghui codelipenghui added type/feature The PR added a new feature or issue requested a new feature type/enhancement The enhancements for the existing features or docs. e.g. reduce memory usage of the delayed messages and removed type/feature The PR added a new feature or issue requested a new feature labels Dec 30, 2022
@Demogorgon314 Demogorgon314 force-pushed the Demogorgon314/PIP-192-improve-split branch from 369c5d4 to e97ca14 Compare January 4, 2023 03:24
@codecov-commenter
Copy link

codecov-commenter commented Jan 4, 2023

Codecov Report

Merging #18858 (e97ca14) into master (9ec1d07) will decrease coverage by 0.24%.
The diff coverage is 0.00%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master   #18858      +/-   ##
============================================
- Coverage     47.43%   47.19%   -0.25%     
- Complexity     9535    10685    +1150     
============================================
  Files           632      712      +80     
  Lines         59839    69722    +9883     
  Branches       6234     7485    +1251     
============================================
+ Hits          28384    32904    +4520     
- Misses        28375    33098    +4723     
- Partials       3080     3720     +640     
Flag Coverage Δ
unittests 47.19% <0.00%> (-0.25%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...xtensions/channel/ServiceUnitStateChannelImpl.java 0.00% <0.00%> (ø)
...ache/pulsar/broker/namespace/NamespaceService.java 56.38% <0.00%> (-1.84%) ⬇️
.../pulsar/broker/service/SharedConsumerAssignor.java 3.70% <0.00%> (-64.82%) ⬇️
...apache/pulsar/broker/service/EntryAndMetadata.java 0.00% <0.00%> (-40.75%) ⬇️
...e/pulsar/broker/service/EntryBatchIndexesAcks.java 82.14% <0.00%> (-10.72%) ⬇️
...ersistentStickyKeyDispatcherMultipleConsumers.java 51.96% <0.00%> (-8.34%) ⬇️
...sistent/PersistentDispatcherMultipleConsumers.java 51.96% <0.00%> (-7.18%) ⬇️
...r/service/AbstractDispatcherMultipleConsumers.java 66.12% <0.00%> (-4.84%) ⬇️
...oker/service/schema/SchemaRegistryServiceImpl.java 62.13% <0.00%> (-4.05%) ⬇️
...e/HashRangeExclusiveStickyKeyConsumerSelector.java 36.92% <0.00%> (-3.08%) ⬇️
... and 120 more

@codelipenghui
Copy link
Contributor

When the channel handles the split event, we should write the split bundle to zookeeper instead of ServiceUnitStateChannel,

@Demogorgon314 I think it's incorrect in the PR's description, right?

@Demogorgon314
Copy link
Member Author

I think it's incorrect in the PR's description, right?

@codelipenghui Thanks for reminding me, I have fixed it.

Copy link
Contributor

@codelipenghui codelipenghui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, I think the right solution is to improve the NamespaceService to decide to write to Zookeeper or service unit channel? We also have Admin API to split the bundle, it will be a problem if the load manager write the bundle split to service unit state channel, but the Admin API can't write to the service unit channel?

@Demogorgon314
Copy link
Member Author

Hmmm, I think the right solution is to improve the NamespaceService to decide to write to Zookeeper or service unit channel? We also have Admin API to split the bundle, it will be a problem if the load manager write the bundle split to service unit state channel, but the Admin API can't write to the service unit channel?

Yes, then we need to wait for this PR merge first. #19102

@Demogorgon314 Demogorgon314 force-pushed the Demogorgon314/PIP-192-improve-split branch from e97ca14 to 568a39b Compare January 31, 2023 09:27
# Conflicts:
#	pulsar-broker/src/test/java/org/apache/pulsar/broker/loadbalance/extensions/channel/ServiceUnitStateChannelTest.java
@Demogorgon314 Demogorgon314 marked this pull request as draft February 6, 2023 09:25
# Conflicts:
#	pulsar-broker/src/test/java/org/apache/pulsar/broker/loadbalance/extensions/channel/ServiceUnitStateChannelTest.java
@Demogorgon314
Copy link
Member Author

Hmmm, I think the right solution is to improve the NamespaceService to decide to write to Zookeeper or service unit channel? We also have Admin API to split the bundle, it will be a problem if the load manager write the bundle split to service unit state channel, but the Admin API can't write to the service unit channel?

@codelipenghui I've had a second thought, Actually, we should keep this method in ServiceUnitStateChannel . The split operation should call by channel when receiving the Splitting state, like the unload operation. We can support the admin API later in other PRs.

@Demogorgon314 Demogorgon314 marked this pull request as ready for review February 7, 2023 07:10
@Demogorgon314 Demogorgon314 reopened this Feb 7, 2023
bundleFactory.invalidateBundleCache(bundle.getNamespaceObject());
updateFuture.complete(splitBundles);
}).exceptionally(e -> {
// Clean the new bundle when has exception.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about logging the error here ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the updateFuture.completeExceptionally(e); is called, the log will print in this place. https://github.com/apache/pulsar/pull/18858/files#diff-191ed1df4f5804c8fd6cdaf909cca01d071a110c3e701bdc98352f99e7a92e8eR534

FutureUtil.waitForAll(futureList)
.whenComplete((__, ex) -> {
if (ex != null) {
log.warn("Clean new bundles failed,", ex);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what are the consequences of this failure ?

can you please point me out to where we handle this inconsistent state?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two cases to entry this failure.

  1. When pubAsync has an exception.
  2. When updateNamespaceBundles, updateNamespaceBundlesForPolicies has exception.

In these two cases, we won’t write the new bundle to ZK, so when the client tries to lookup, it will still find the old bundle, but some of the new bundle will be left in table-view, and when the broker is down, the unused bundle will be deleted in table-view.

&& (counter.incrementAndGet() < NamespaceService.BUNDLE_SPLIT_RETRY_LIMIT)) {
pulsar.getExecutor().schedule(() -> splitServiceUnitOnceAndRetry(namespaceService, bundleFactory,
bundle, serviceUnit, data, counter, startTime, completionFuture), 100, MILLISECONDS);
} else if (ex instanceof IllegalArgumentException) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why we are doing the test on "ex" here and on "ex.getCause()" above ?

and why this case is so special ? we should at least log that we are passing from this branch

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. It should use FutureUtil.unwrapCompletionException(ex) here.

# Conflicts:
#	pulsar-broker/src/main/java/org/apache/pulsar/broker/loadbalance/extensions/channel/ServiceUnitStateChannelImpl.java
Copy link
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@eolivelli eolivelli merged commit 950ff44 into apache:master Feb 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
doc-not-needed Your PR changes do not impact docs ready-to-test type/enhancement The enhancements for the existing features or docs. e.g. reduce memory usage of the delayed messages
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants