Skip to content

Fix Periodic rebalancer Timer leak#1456

Merged
jiajunwang merged 4 commits intoapache:masterfrom
xyuanlu:timer
Oct 26, 2020
Merged

Fix Periodic rebalancer Timer leak#1456
jiajunwang merged 4 commits intoapache:masterfrom
xyuanlu:timer

Conversation

@xyuanlu
Copy link
Contributor

@xyuanlu xyuanlu commented Oct 9, 2020

Issues

Description

  • Here are some details about my PR, including screenshots of any UI changes:
    We found a periodic rebalance's Timer leakage issue during log analysis. In current startPeriodRebalance, two thread may interference with each other. This may result in one timer got canceled twice, two timers are created with one timer leaked.
  void startPeriodRebalance(long period, HelixManager manager) {
    if (period != _timerPeriod) {
      logger.info("Controller starting periodical rebalance timer at period " + period);
      if (_periodicalRebalanceTimer != null) {
        _periodicalRebalanceTimer.cancel();                  <<<------ 
      } 
      _periodicalRebalanceTimer =
          new Timer("GenericHelixController_" + _clusterName + "_periodical_Timer", true);     <<<-----
     ......
  }

This PR changes Timer to use a SingleThreadScheduledExecutor and adds a synchronized block in start/stop PeriodRebalance.

Tests

  • The following tests are written for this issue:

NA

  • The following is the result of the "mvn test" command on the appropriate module:
[ERROR] Failures: 
[ERROR]   TestRoutingTableProviderPeriodicRefresh.testPeriodicRefresh:214 expected:<4> but was:<3>
[INFO]
[ERROR] Tests run: 1234, Failures: 1, Errors: 0, Skipped: 1
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  01:23 h

Commits

  • My commits all reference appropriate Apache Helix GitHub issues in their subject lines. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation (Optional)

  • In case of new functionality, my PR adds documentation in the following wiki page:

(Link the GitHub wiki you added)

Code Quality

  • My diff has been formatted using helix-style.xml
    (helix-style-intellij.xml if IntelliJ IDE is used)

@kaisun2000
Copy link
Contributor

One more thing I forget to mention before is this:

The current leaking is caused that two events are push to TaskRebalance thread and HelixRebalance thread at the same time.
For this kind of changing the config time period change, in fact, we don't need to put two events to two different thread running task or helix separately.

All the above is under assumption that periodic rebalance triggering thread (previous timer, now the is shared by task and now _periodicalRebalanceExecutor. Is there a possibility later that we will use two different (timer)?

@xyuanlu xyuanlu force-pushed the timer branch 4 times, most recently from 8f2fd94 to a0564e9 Compare October 20, 2020 19:44
Copy link
Contributor

@jiajunwang jiajunwang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Please ensure the remaining comments are addressed.

@xyuanlu
Copy link
Contributor Author

xyuanlu commented Oct 23, 2020

My PR is ready to be merged in. Approved by @jiajunwang

Final commit message:
Fix Periodic rebalancer Timer leak
In current startPeriodRebalance, two thread may interference with each other. This may result in one timer got canceled twice, two timers are created with one timer leaked. This PR changes Timer to use a SingleThreadScheduledExecutor and adds a synchronized block in start/stop PeriodRebalance.

@jiajunwang jiajunwang merged commit 41c4d48 into apache:master Oct 26, 2020
@xyuanlu xyuanlu deleted the timer branch October 26, 2020 21:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants