Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix][broker]After the broker is restarted, the cache dynamic configuration is invalid #17035

Merged
merged 7 commits into from Sep 6, 2022

Conversation

lordcheng10
Copy link
Contributor

@lordcheng10 lordcheng10 commented Aug 10, 2022

Motivation

We update some cache dynamic configurations, such as: managedLedgerCacheSizeMB=100, but after we restarted the broker, we found that the dynamically modified cache configuration on zookeeper was invalid, and the value configured in broker.conf took effect.

After restarting, the loading process of config is as follows:

  1. Load the broker.conf file and initialize the ServiceConfiguration object;

  2. Use the ServiceConfiguration object to build the ManagedLedgerStorage object. When building the managedLedgerClientFactory object, the cache configuration in managedLedgerFactoryConfig is initialized through the conf object:

    managedLedgerClientFactory = ManagedLedgerStorage.create(
    config, localMetadataStore,
    bkClientFactory, ioEventLoopGroup
    );
    this.brokerService = newBrokerService(this);

managedLedgerFactoryConfig.setMaxCacheSize(conf.getManagedLedgerCacheSizeMB() * 1024L * 1024L);
managedLedgerFactoryConfig.setCacheEvictionWatermark(conf.getManagedLedgerCacheEvictionWatermark());

  1. When creating the BrokerService object, it will read the dynamic configuration on zookeeper, update it to conf, and then register the listener for the relevant configuration update, but the execution of these listeners is not triggered:
    this.brokerService = newBrokerService(this);

    // (2) update ServiceConfiguration value by reading zk-configuration-map
    updateDynamicServiceConfiguration();
    // (3) Listener Registration
    // add listener on "maxConcurrentLookupRequest" value change
    registerConfigurationListener("maxConcurrentLookupRequest",
    (maxConcurrentLookupRequest) -> lookupRequestSemaphore.set(
    new Semaphore((int) maxConcurrentLookupRequest, false)));
    // add listener on "maxConcurrentTopicLoadRequest" value change
    registerConfigurationListener("maxConcurrentTopicLoadRequest",
    (maxConcurrentTopicLoadRequest) -> topicLoadRequestSemaphore.set(
    new Semaphore((int) maxConcurrentTopicLoadRequest, false)));
    registerConfigurationListener("loadManagerClassName", className -> {
    pulsar.getExecutor().execute(() -> {
    try {
    final LoadManager newLoadManager = LoadManager.create(pulsar);
    log.info("Created load manager: {}", className);
    pulsar.getLoadManager().get().stop();
    newLoadManager.start();
    pulsar.getLoadManager().set(newLoadManager);
    } catch (Exception ex) {
    log.warn("Failed to change load manager", ex);
    }
    });
    });
    // add listener to notify broker managedLedgerCacheSizeMB dynamic config
    registerConfigurationListener("managedLedgerCacheSizeMB", (managedLedgerCacheSizeMB) -> {
    managedLedgerFactory.getEntryCacheManager()
    .updateCacheSizeAndThreshold(((int) managedLedgerCacheSizeMB) * 1024L * 1024L);
    });
    // add listener to notify broker managedLedgerCacheEvictionWatermark dynamic config

Therefore, after the broker is restarted, the cache-related configuration used is still the configuration in the broker.conf file.

Solution:
The configuration is loaded in the following order:

  1. Register and configure the listener;
  2. Read the dynamic configuration on zookeeper and trigger the corresponding listener;

Documentation

Check the box below or label this PR directly.

Need to update docs?

  • doc-required
    (Your PR needs to update docs and you will update later)

  • doc-not-needed
    (Please explain why)

  • doc
    (Your PR contains doc changes)

  • doc-complete
    (Docs have been already added)

@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Aug 10, 2022
@lordcheng10
Copy link
Contributor Author

/pulsarbot run-failure-checks

1 similar comment
@lordcheng10
Copy link
Contributor Author

/pulsarbot run-failure-checks

@lordcheng10
Copy link
Contributor Author

/pulsarbot run-failure-checks

1 similar comment
@lordcheng10
Copy link
Contributor Author

/pulsarbot run-failure-checks

Copy link
Contributor

@codelipenghui codelipenghui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please help add tests? We are able to restart the broker during the test.

@HQebupt
Copy link
Contributor

HQebupt commented Aug 12, 2022

Good catch 👍

@lordcheng10
Copy link
Contributor Author

Could you please help add tests? We are able to restart the broker during the test.

@codelipenghui @HQebupt @Technoboy- Fixed, PTAL,thanks!

@lordcheng10
Copy link
Contributor Author

/pulsarbot run-failure-checks

3 similar comments
@lordcheng10
Copy link
Contributor Author

/pulsarbot run-failure-checks

@lordcheng10
Copy link
Contributor Author

/pulsarbot run-failure-checks

@lordcheng10
Copy link
Contributor Author

/pulsarbot run-failure-checks

Copy link
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm

This looks like a regression.
We should pick this into 2.11
@codelipenghui @zymap @lhotari

@lordcheng10
Copy link
Contributor Author

/pulsarbot run-failure-checks

2 similar comments
@lordcheng10
Copy link
Contributor Author

/pulsarbot run-failure-checks

@lordcheng10
Copy link
Contributor Author

/pulsarbot run-failure-checks

Copy link
Contributor

@Jason918 Jason918 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Jason918
Copy link
Contributor

@lordcheng10 Please solve CI failure.

@lordcheng10
Copy link
Contributor Author

/pulsarbot run-failure-checks

@lordcheng10
Copy link
Contributor Author

@lordcheng10 Please solve CI failure.

OK

@lordcheng10
Copy link
Contributor Author

/pulsarbot run-failure-checks

7 similar comments
@lordcheng10
Copy link
Contributor Author

/pulsarbot run-failure-checks

@lordcheng10
Copy link
Contributor Author

/pulsarbot run-failure-checks

@lordcheng10
Copy link
Contributor Author

/pulsarbot run-failure-checks

@lordcheng10
Copy link
Contributor Author

/pulsarbot run-failure-checks

@lordcheng10
Copy link
Contributor Author

/pulsarbot run-failure-checks

@lordcheng10
Copy link
Contributor Author

/pulsarbot run-failure-checks

@lordcheng10
Copy link
Contributor Author

/pulsarbot run-failure-checks

@lordcheng10
Copy link
Contributor Author

/pulsarbot run-failure-checks

1 similar comment
@lordcheng10
Copy link
Contributor Author

/pulsarbot run-failure-checks

@lordcheng10
Copy link
Contributor Author

/pulsarbot run-failure-checks

@lordcheng10
Copy link
Contributor Author

lordcheng10 commented Sep 6, 2022

@lordcheng10 Please solve CI failure.

Fixed,PTAL,thanks! @Jason918

@lordcheng10
Copy link
Contributor Author

lordcheng10 commented Sep 6, 2022

Could you please help add tests? We are able to restart the broker during the test.

@codelipenghui Fixed, PTAL,thanks!

@codelipenghui codelipenghui merged commit a7f1a56 into apache:master Sep 6, 2022
@mattisonchao
Copy link
Member

Hi @lordcheng10
It looks like we got many conflicts when cherry-picking it to branch-2.9.
Would you mind pushing a PR to branch-2.9? (To avoid cherry-picking involving bugs)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/broker cherry-picked/branch-2.11 doc-not-needed Your PR changes do not impact docs type/bug The PR fixed a bug or issue reported a bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants