Change getBaselineAssignment/getBestPossibleAssignment to account for partial-WAGED clusters #1167

Merged
jiajunwang merged 4 commits into apache:master from
NealSun96:nealsun/get-baseline-assignment-fallback-change
Jul 23, 2020

Conversation

@NealSun96 NealSun96 commented Jul 22, 2020

Issues

  • My PR addresses the following Helix issues and references them in the PR description:

Fixes #1166

Description

  • The following tests are written for this issue:
    testPartialBaselineAvailability

  • Here are some details about my PR, including screenshots of any UI changes:

By design, the WAGED rebalancer only supports one-off migration of resources. As a result, getBaselineAssignment() did not handle a cluster that is only partially managed by WAGED. It tries to read baselines for all resources in the cluster, and falls back to the current states of all resources only when there is no baseline data at all. It does not handle the case where some resources have baselines and others do not: the resources without a baseline do not fall back to their current states and simply end up with no baseline. getBestPossibleAssignment() has a similar problem.

The logic now works per resource: if a resource has a baseline, the baseline is used; otherwise, its current state is used. getBestPossibleAssignment() is fixed the same way.
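
For illustration, the per-resource fallback described above could be sketched roughly as follows. This is a minimal sketch, not the code in this PR: the class `PerResourceFallback`, the method `resolve`, and the generic assignment type `A` are hypothetical names, and the real Helix methods work with Helix's own assignment and current-state types.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class PerResourceFallback {
  /**
   * Hypothetical helper (not a Helix API): for each resource, prefer the
   * persisted assignment (baseline or best possible) and fall back to the
   * resource's current state when no persisted assignment exists.
   */
  static <A> Map<String, A> resolve(Set<String> resources,
      Map<String, A> persisted, Map<String, A> currentState) {
    Map<String, A> result = new HashMap<>();
    for (String resource : resources) {
      A assignment = persisted.get(resource);
      if (assignment == null) {
        // No persisted record for this resource: use its current state.
        assignment = currentState.get(resource);
      }
      if (assignment != null) {
        // Resources with neither record are left out of the result.
        result.put(resource, assignment);
      }
    }
    return result;
  }
}
```

With this shape, a resource that was migrated to WAGED earlier keeps its persisted baseline, while a newly migrated resource starts from its current state instead of ending up with no entry.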

Tests

  • The following is the result of the "mvn test" command on the appropriate module:
[ERROR] Tests run: 1155, Failures: 5, Errors: 0, Skipped: 1, Time elapsed: 4,324.66 s <<< FAILURE! - in TestSuite
[ERROR] testResourceSubset(org.apache.helix.tools.TestClusterStateVerifier)  Time elapsed: 1.025 s  <<< FAILURE!
org.apache.helix.HelixException: Failed to create pause signal
        at org.apache.helix.tools.TestClusterStateVerifier.testResourceSubset(TestClusterStateVerifier.java:115)

[ERROR] afterMethod(org.apache.helix.tools.TestClusterStateVerifier)  Time elapsed: 1.062 s  <<< FAILURE!
java.lang.IllegalStateException: ZkClient already closed!
        at org.apache.helix.tools.TestClusterStateVerifier.afterMethod(TestClusterStateVerifier.java:98)

[ERROR] testCustomizedViewAggregation(org.apache.helix.integration.TestCustomizedViewAggregation)  Time elapsed: 12.133 s  <<< FAILURE!
java.lang.AssertionError: expected:<true> but was:<false>
        at org.apache.helix.integration.TestCustomizedViewAggregation.validateAggregationSnapshot(TestCustomizedViewAggregation.java:238)
        at org.apache.helix.integration.TestCustomizedViewAggregation.testCustomizedViewAggregation(TestCustomizedViewAggregation.java:394)

[ERROR] testStateTransitionTimeOut(org.apache.helix.integration.paticipant.TestStateTransitionTimeoutWithResource)  Time elapsed: 36.061 s  <<< FAILURE!
java.lang.AssertionError: expected:<true> but was:<false>
        at org.apache.helix.integration.paticipant.TestStateTransitionTimeoutWithResource.testStateTransitionTimeOut(TestStateTransitionTimeoutWithResource.java:171)

[ERROR] testPeriodicRefresh(org.apache.helix.integration.spectator.TestRoutingTableProviderPeriodicRefresh)  Time elapsed: 2.015 s  <<< FAILURE!
java.lang.AssertionError: expected:<7> but was:<6>
        at org.apache.helix.integration.spectator.TestRoutingTableProviderPeriodicRefresh.testPeriodicRefresh(TestRoutingTableProviderPeriodicRefresh.java:192)

[INFO] 
[INFO] Results:
[INFO] 
[ERROR] Failures: 
[ERROR]   TestCustomizedViewAggregation.testCustomizedViewAggregation:394->validateAggregationSnapshot:238 expected:<true> but was:<false>
[ERROR]   TestStateTransitionTimeoutWithResource.testStateTransitionTimeOut:171 expected:<true> but was:<false>
[ERROR]   TestRoutingTableProviderPeriodicRefresh.testPeriodicRefresh:192 expected:<7> but was:<6>
[ERROR]   TestClusterStateVerifier.afterMethod:98 » IllegalState ZkClient already closed...
[ERROR]   TestClusterStateVerifier.testResourceSubset:115 » Helix Failed to create pause...
[INFO] 
[ERROR] Tests run: 1155, Failures: 5, Errors: 0, Skipped: 1
[INFO] 
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  01:12 h
[INFO] Finished at: 2020-07-22T18:49:39-07:00
[INFO] ------------------------------------------------------------------------

Rerun

[INFO] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 38.505 s - in TestSuite
[INFO] 
[INFO] Results:
[INFO] 
[INFO] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0
[INFO] 
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  43.912 s
[INFO] Finished at: 2020-07-22T19:12:42-07:00
[INFO] ------------------------------------------------------------------------

Commits

  • My commits all reference appropriate Apache Helix GitHub issues in their subject lines. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Code Quality

  • My diff has been formatted using helix-style.xml
    (helix-style-intellij.xml if IntelliJ IDE is used)

@jiajunwang jiajunwang left a comment

Please do the same for getBestPossibleAssignment too.

@NealSun96 NealSun96 changed the title Change getBaselineAasignment to account for partial-WAGED clusters Change getBaselineAasignment/getBestPossibleAssignment to account for partial-WAGED clusters Jul 22, 2020

@jiajunwang jiajunwang left a comment

Let's add a test case to cover this scenario.

@jiajunwang jiajunwang left a comment

LGTM!

@NealSun96

This PR is ready to be merged, approved by @jiajunwang
Final commit message:

Change getBaselineAasignment/getBestPossibleAssignment to account for partial-WAGED clusters

The logics are now changed to a per resource level. If a resource has baseline/best possible, the baseline/best possible will be used; else, the current state will be used.

@jiajunwang jiajunwang merged commit e218607 into apache:master Jul 23, 2020
kaisun2000 pushed a commit to kaisun2000/helix that referenced this pull request Jul 29, 2020
… partial-WAGED clusters (apache#1167)

The logics are now changed to a per resource level. If a resource has baseline/best possible, the baseline/best possible will be used; else, the current state will be used.

Co-authored-by: Neal Sun <nesun@nesun-mn1.linkedin.biz>
huizhilu pushed a commit to huizhilu/helix that referenced this pull request Aug 16, 2020