
[fix] [broker] Fix isolated group not work problem. #21096

Merged

Conversation

@horizonzy (Member) commented Aug 30, 2023

Modifications

When we upgraded Pulsar from 2.9.2 to 2.10.3, the isolated group feature stopped working.
We eventually found the problem: in IsolatedBookieEnsemblePlacementPolicy, when the policy reads the bookie rack configuration from the metadata store cache, it uses future.isDone() to avoid a sync operation, and if the future is incomplete it returns empty blacklists.
The cache may expire due to the Caffeine cache's getExpireAfterWriteMillis config, and once an entry expires the future may be incomplete. (#21095 will correct that behavior.)

In 2.9.2 the policy fetched the data from the metadata store synchronously; we should keep that behavior.
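For illustration, a minimal sketch of the problematic pattern described above (the class, cache, and field names follow the description; the surrounding structure is a simplified assumption, not the exact Pulsar code):

```java
// After a Caffeine expiry the entry is re-fetched asynchronously, so the
// returned future is briefly incomplete and isDone() is false.
CompletableFuture<Optional<BookiesRackConfiguration>> future =
        bookieMappingCache.get(BookieRackAffinityMapping.BOOKIE_INFO_ROOT_PATH);
if (future.isDone() && !future.isCompletedExceptionally()) {
    allGroupsBookieMapping = future.join().orElse(null);
} else {
    // Bug: with no rack mapping available, no bookies are excluded and
    // the isolation groups are silently ignored for this placement call.
    return excludedBookies;
}
```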

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository:

@hangc0276 (Contributor) left a comment

Please add a test to protect this logic.

@hangc0276 added the type/bug, area/broker, ready-to-test, category/reliability, release/3.0.2, release/2.11.3, and release/3.1.1 labels Aug 30, 2023
@hangc0276 added this to the 3.2.0 milestone Aug 30, 2023
@codecov-commenter commented Aug 31, 2023

Codecov Report

Merging #21096 (c033ac0) into master (b26ee8a) will increase coverage by 39.72%.
The diff coverage is 64.28%.

Impacted file tree graph

@@              Coverage Diff              @@
##             master   #21096       +/-   ##
=============================================
+ Coverage     33.46%   73.18%   +39.72%     
- Complexity    12179    32436    +20257     
=============================================
  Files          1623     1887      +264     
  Lines        127399   139985    +12586     
  Branches      13929    15413     +1484     
=============================================
+ Hits          42631   102449    +59818     
+ Misses        79158    29440    -49718     
- Partials       5610     8096     +2486     
| Flag | Coverage Δ |
|---|---|
| inttests | 24.14% <0.00%> (+0.06%) ⬆️ |
| systests | 25.11% <0.00%> (?) |
| unittests | 72.47% <64.28%> (+40.52%) ⬆️ |

Flags with carried forward coverage won't be shown.

| Files Changed | Coverage Δ |
|---|---|
| ...areness/IsolatedBookieEnsemblePlacementPolicy.java | 87.78% <64.28%> (+87.78%) ⬆️ |

... and 1532 files with indirect coverage changes

@Technoboy- (Contributor) left a comment

LGTM

@hangc0276 (Contributor) left a comment

Nice catch!

Comment on lines 199 to 201
```java
Optional<BookiesRackConfiguration> optional =
        bookieMappingCache.get(BookieRackAffinityMapping.BOOKIE_INFO_ROOT_PATH).get(
                metaOpTimeout, TimeUnit.MILLISECONDS);
```
Contributor:

Can the ZooKeeper thread reach here? Just to confirm it will not introduce a deadlock.
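The concern, illustrated with a self-contained plain-Java sketch (an assumption for illustration, not Pulsar code): if the single metadata/ZooKeeper event thread itself blocks in get(), the completion that must run on that same thread can never execute, and the call only returns via timeout.

```java
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class DeadlockSketch {
    public static void main(String[] args) {
        // A single thread stands in for the ZooKeeper event thread.
        ExecutorService zkEventThread = Executors.newSingleThreadExecutor();
        CompletableFuture<Optional<String>> future = new CompletableFuture<>();

        // The blocking get() occupies the only thread; the completion task
        // queued behind it cannot run, so get() ends in a TimeoutException.
        zkEventThread.submit(() -> {
            try {
                future.get(3, TimeUnit.SECONDS); // stand-in for metaOpTimeout
            } catch (Exception e) {
                System.out.println("Timed out instead of completing: " + e);
            }
        });
        zkEventThread.submit(() -> future.complete(Optional.of("rack-config")));
        zkEventThread.shutdown();
    }
}
```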

@horizonzy (Member, Author) replied:

I discussed this with hang, and we will use a more graceful way to avoid the sync operation. I will push another commit later.

@hangc0276 (Contributor) left a comment

LGTM. Please take a look at the failed tests, thanks.

@eolivelli (Contributor) left a comment

+1
Great catch

@codelipenghui (Contributor) left a comment

I would like to provide another approach for handling the cached rack configuration update.

  1. Make allGroupsBookieMapping a volatile variable.
  2. Update the cached rack configuration asynchronously.

initialize:

```java
bookieMappingCache.get(BookieRackAffinityMapping.BOOKIE_INFO_ROOT_PATH).thenAccept(opt -> {
    opt.ifPresent(bookiesRackConfiguration -> cachedRackConfiguration = bookiesRackConfiguration);
}).exceptionally(ex -> {
    log.warn("Failed to load bookies rack configuration while initialize the PlacementPolicy.");
    return null;
});
```

getExcludedBookiesWithIsolationGroups:

```java
CompletableFuture<Optional<BookiesRackConfiguration>> future =
        bookieMappingCache.get(BookieRackAffinityMapping.BOOKIE_INFO_ROOT_PATH);

BookiesRackConfiguration allGroupsBookieMapping;
future.thenAccept(opt ->
        opt.ifPresent(bookiesRackConfiguration -> cachedRackConfiguration = bookiesRackConfiguration)
).exceptionally(ex -> {
    log.warn("The newest bookie rack config is not available now.");
    return null;
});
allGroupsBookieMapping = cachedRackConfiguration;
if (allGroupsBookieMapping == null) {
    return excludedBookies;
}
```

When the broker restarts we might lose the rack configurations, but this avoids any potential deadlocks.

The initialize method is called when the broker starts, before the broker service becomes available to clients, so in most cases we will not lose the bookie rack configuration.
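A minimal sketch of the field this approach assumes (the name follows the snippets above; the declaration itself is an assumption):

```java
// volatile ensures placement calls on other threads see the value written
// by the async metadata-store callback, without extra locking.
private volatile BookiesRackConfiguration cachedRackConfiguration;
```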

```java
        cachedRackConfiguration);
        allGroupsBookieMapping = cachedRackConfiguration;
    } else {
        throw new KeeperException.NoNodeException(ZkBookieRackAffinityMapping.BOOKIE_INFO_ROOT_PATH);
```
Contributor:

How will the bookie client handle this exception, and what is the expected behavior from the producer/consumer's perspective? Will the producer/consumer be closed by this exception?

@horizonzy (Member, Author) replied:

No, we already catch this exception at line 284.

@horizonzy (Member, Author) replied:

> I would like to provide another approach for handling the cached rack configuration update. […]

It makes sense. We should print a debug log when allGroupsBookieMapping is null.
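For illustration, a hedged sketch of that debug log at the early-return from the snippet above (the message wording is an assumption):

```java
allGroupsBookieMapping = cachedRackConfiguration;
if (allGroupsBookieMapping == null) {
    // Assumed logging detail: record why isolation groups were skipped, so
    // the silent-ignore failure mode from this bug stays diagnosable.
    if (log.isDebugEnabled()) {
        log.debug("Bookie rack configuration is not available yet; "
                + "skipping isolation-group exclusion for this placement call.");
    }
    return excludedBookies;
}
```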

@codelipenghui (Contributor) replied:

> It makes sense. We should print a debug log when allGroupsBookieMapping is null.

Yes, sure.

@horizonzy (Member, Author) replied:

There is a case that needs discussing: we only update the rack config when opt.isPresent() is true. If the user deletes the ZooKeeper nodes under the path /bookies, opt.isPresent() will be false. Shall we delete the cached rack config or keep using it?

@hangc0276 (Contributor) replied:

> Shall we delete the cached rack config or keep using it?

@horizonzy We need to delete the cached rack configuration.

@horizonzy (Member, Author) replied:

What do you think of it?

@hangc0276 (Contributor) replied:

@horizonzy Would checking the exception type work? If the exception is a NoNode exception, update the cached rack configuration to null.

@horizonzy (Member, Author) replied:

It doesn't work: the ZooKeeper metadata store handles that case separately and does not throw an exception:

```java
if (code == Code.NONODE) {
    // For get operations, we return an empty optional
    op.getFuture().complete(Optional.empty());
}
```

```diff
-bookieMappingCache.get(BookieRackAffinityMapping.BOOKIE_INFO_ROOT_PATH);
+bookieMappingCache.get(BookieRackAffinityMapping.BOOKIE_INFO_ROOT_PATH)
+        .thenAccept(opt -> cachedRackConfiguration = opt.orElse(null)).exceptionally(e -> {
+            log.warn("Failed to update the newest bookies rack config.");
```
Contributor:

Change the log msg to `log.warn("Failed to update the newest bookies rack config, and use the cached rack configuration: {}", cachedRackConfiguration)`.

@horizonzy (Member, Author) replied:

This is an async operation; it won't affect the result of the current invocation, and printing this log may be misleading.

Contributor:

Makes sense.

@codelipenghui (Contributor) commented:

/pulsarbot run-failure-checks

@hangc0276 merged commit abd7bfa into apache:master Sep 5, 2023
45 checks passed
Technoboy- pushed a commit that referenced this pull request Sep 5, 2023
[fix] [broker] Fix isolated group not work problem. (#21096)
Technoboy- pushed a commit that referenced this pull request Sep 11, 2023
[fix] [broker] Fix isolated group not work problem. (#21096)
@shibd (Member) commented Oct 22, 2023

@horizonzy Can you help create a PR to cherry-pick this change to branch-2.11?

@horizonzy (Member, Author) replied:

> Can you help create a PR to cherry-pick this change to branch-2.11?

Ok.

Labels
area/broker · category/reliability (The function does not work properly in certain specific environments or failures, e.g. data lost) · cherry-picked/branch-2.10 · cherry-picked/branch-2.11 · cherry-picked/branch-3.0 · cherry-picked/branch-3.1 · doc-not-needed (Your PR changes do not impact docs) · ready-to-test · release/2.10.6 · release/2.11.3 · release/3.0.2 · release/3.1.1 · type/bug (The PR fixed a bug or an issue reported a bug)

9 participants