Fix switchBackDelayNs failure problem in class of AutoClusterFailover #14442

Merged
codelipenghui merged 7 commits into apache:master from lordcheng10:Optimize_AutoClusterFailover
Feb 25, 2022
@lordcheng10 lordcheng10 commented Feb 24, 2022

Motivation

Consider the following bad case:

  1. The client is currently on secondary, and the following call switches currentPulsarServiceUrl from secondary back to primary:
    // current service url is secondary, probe whether it is down
    probeAndUpdateServiceUrl(primary, primaryAuthentication, primaryTlsTrustCertsFilePath,
    primaryTlsTrustStorePath, primaryTlsTrustStorePassword);
  2. currentPulsarServiceUrl is now primary, but probeAndCheckSwitchBack is still executed immediately afterwards, and it updates recoverTimestamp:
    if (recoverTimestamp == -1) {
    recoverTimestamp = currentTimestamp;
    } else if (currentTimestamp - recoverTimestamp >= switchBackDelayNs) {
  3. When the client later fails over from primary to secondary again and probeAndCheckSwitchBack runs, switchBackDelayNs no longer takes effect, because recoverTimestamp was already set in step 2 and is no longer equal to -1.

So we need to check currentPulsarServiceUrl.equals(primary) again before executing probeAndCheckSwitchBack(), so that recoverTimestamp is not touched when the client has already switched to primary.
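
The three steps above can be replayed with a small, self-contained model. This is only a sketch of the scenario as described here, not the actual AutoClusterFailover code or the merged diff: the class name SwitchBackSketch, the probe and replay helpers, the boolean availability flags, the example service URLs and the timestamps are all assumptions made for illustration, and the failover delay is left out to keep the model short.

```java
// Simplified, hypothetical model of the switch-back logic described above.
final class SwitchBackSketch {
    // Example URLs only; not taken from the PR.
    static final String PRIMARY = "pulsar://primary:6650";
    static final String SECONDARY = "pulsar://secondary:6650";

    final boolean guarded;        // true = re-check the current URL before probeAndCheckSwitchBack
    final long switchBackDelayNs;
    String current = SECONDARY;   // the scenario starts on the secondary cluster
    long recoverTimestamp = -1;

    SwitchBackSketch(boolean guarded, long switchBackDelayNs) {
        this.guarded = guarded;
        this.switchBackDelayNs = switchBackDelayNs;
    }

    // One probe round; cluster availability is passed in instead of being probed over the network.
    void probe(long nowNs, boolean primaryUp, boolean secondaryUp) {
        if (current.equals(PRIMARY)) {
            // Current service URL is primary: fail over if it is down (failover delay omitted).
            if (!primaryUp && secondaryUp) {
                current = SECONDARY;
            }
            return;
        }
        // Current service URL is secondary: switch to primary if secondary is down.
        if (!secondaryUp && primaryUp) {
            current = PRIMARY;
        }
        // Without the guard, this runs even when the line above just switched to primary.
        if (!guarded || !current.equals(PRIMARY)) {
            probeAndCheckSwitchBack(nowNs, primaryUp);
        }
    }

    void probeAndCheckSwitchBack(long nowNs, boolean primaryUp) {
        if (!primaryUp) {
            recoverTimestamp = -1;
            return;
        }
        if (recoverTimestamp == -1) {
            recoverTimestamp = nowNs;
        } else if (nowNs - recoverTimestamp >= switchBackDelayNs) {
            recoverTimestamp = -1;
            current = PRIMARY;
        }
    }

    // Replays steps 1 to 3 and returns the URL the client ends up on.
    static String replay(boolean guarded) {
        SwitchBackSketch s = new SwitchBackSketch(guarded, 500);
        s.probe(0, true, false);    // step 1 and 2: secondary is down, switch to primary
        s.probe(1000, false, true); // primary goes down, fail over to secondary
        s.probe(1001, true, true);  // step 3: primary has only just recovered
        return s.current;
    }

    public static void main(String[] args) {
        System.out.println("without guard: " + replay(false));
        System.out.println("with guard:    " + replay(true));
    }
}
```

In this model the unguarded run ends on primary at the very first probe after primary recovers, because recoverTimestamp still holds the value set in step 2. The guarded run stays on secondary and only starts counting switchBackDelayNs from that probe.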

Documentation

Check the box below or label this PR directly (if you have committer privilege).

Need to update docs?

  • doc-required

    (If you need help on updating docs, create a doc issue)

  • no-need-doc

    (Please explain why)

  • doc

    (If this PR contains doc changes)

@github-actions github-actions bot added the doc-not-needed label (Your PR changes do not impact docs) Feb 24, 2022

@hangc0276 hangc0276 left a comment

@lordcheng10 Nice catch, thanks for your contribution. Would you please help add a test to protect the switch back delay logic?

@lordcheng10 lordcheng10 changed the title from "Optimize AutoClusterFailover" to "Fix switchBackDelayNs failure problem in class of AutoClusterFailover" Feb 24, 2022
@lordcheng10

@lordcheng10 Nice catch, thanks for your contribution. Would you please help add a test to protect the switch back delay logic?

OK, I will add a unit test.

@codelipenghui codelipenghui added this to the 2.11.0 milestone Feb 24, 2022
@lordcheng10

@lordcheng10 Nice catch, thanks for your contribution. Would you please help add a test to protect the switch back delay logic?

Done

@lordcheng10

@hangc0276 @codelipenghui PTAL, thanks!

@lordcheng10

/pulsarbot run-failure-checks

@lordcheng10

@codelipenghui PTAL, thanks!

@codelipenghui codelipenghui merged commit 0ef7baa into apache:master Feb 25, 2022
@codelipenghui codelipenghui modified the milestones: 2.11.0, 2.10.0 Feb 25, 2022
codelipenghui pushed a commit that referenced this pull request Feb 25, 2022
Labels

doc-not-needed Your PR changes do not impact docs

3 participants