
Introduce rebalance simulate tool #487

Closed
wants to merge 80 commits

Conversation

jiajunwang
Contributor

@jiajunwang jiajunwang commented Sep 21, 2019

Issues

  • My PR addresses the following Helix issues and references them in the PR description:

#402

Description

  • Here are some details about my PR, including screenshots of any UI changes:

The tool leverages a standalone ZK server and a Helix controller, both running locally, to simulate the rebalance process.
The tool also initializes mock participant threads to simulate state transitions.
After each operation, the tool prints the current cluster status and all state transitions that have happened since the previous printout.
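For illustration, a minimal sketch of how the simulation workflow described above might be driven; every class and method name below (RebalanceSimulator, AddNodeOperation, and so on) is a hypothetical placeholder rather than the tool's actual API:

public class SimulateExample {
  public static void main(String[] args) throws Exception {
    // Start an in-process ZK server and a local Helix controller (hypothetical helper).
    RebalanceSimulator simulator = new RebalanceSimulator("SIMULATED_CLUSTER");
    simulator.startZkAndController();

    // Start mock participant threads that execute the simulated state transitions.
    simulator.startMockParticipants(5);

    // Apply an operation; the tool then prints the cluster status and all state
    // transitions that have happened since the previous printout.
    simulator.apply(new AddNodeOperation("localhost_12918"));
    simulator.printClusterStatusAndTransitions();

    simulator.shutdown();
  }
}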

Tests

  • The following tests are written for this issue:

NA

  • The following is the result of the "mvn test" command on the appropriate module:

(Copy & paste the result of "mvn test")

Commits

  • My commits all reference appropriate Apache Helix GitHub issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation in the following wiki page:

https://github.com/apache/helix/wiki/Weight-aware-Globally-Evenly-distributed-Rebalancer

Code Quality

  • My diff has been formatted using helix-style.xml

@jiajunwang jiajunwang self-assigned this Sep 21, 2019
@jiajunwang
Contributor Author

jiajunwang commented Sep 21, 2019

To the reviewers: this is only for testing and evaluation. It should not impact business logic.
The tool will also be continuously modified as we develop different simulation requirements.

/**
* Overwrites the cluster metadata for the simulation purposes.
*/
public interface MetadataOverwrites {
Contributor

MetadataOverwriter or MetadataCopier? Or MetadataCopyTool?

Contributor Author

I want to ensure the user understands that this class is for overwriting the existing metadata, which is copied from somewhere else. So strictly speaking, it is not a copy tool.
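As a purely hypothetical sketch of how such an overwriter might look, assuming an overwrite method on the interface and the InstanceConfig capacity accessors (both are assumptions, not the PR's actual interface):

// Hypothetical sketch only: mutate metadata that was copied from a real cluster
// before feeding it into the simulation. The overwrite signature is assumed.
public class DoubleCapacityOverwrites implements MetadataOverwrites {
  @Override
  public InstanceConfig overwrite(InstanceConfig copiedConfig) {
    // Example what-if change: double every instance capacity value.
    Map<String, Integer> capacity = new HashMap<>(copiedConfig.getInstanceCapacityMap());
    capacity.replaceAll((key, value) -> value * 2);
    copiedConfig.setInstanceCapacityMap(capacity);
    return copiedConfig;
  }
}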


@Override
public String getDescription() {
return "Adding new resource: " + _resourceName;
Contributor

Nit: String format like how you do it elsewhere in other Operations
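For instance, the suggested form would be something like:

// Illustrative only: the nit suggests String.format instead of string concatenation.
return String.format("Adding new resource: %s", _resourceName);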


@Override
public String getDescription() {
return "Adding new node: " + _instanceConfig.getInstanceName();
Contributor

Nit: String format like how you do it elsewhere in other Operations

} catch (Exception e) {
return false;
}
ZkHelixClusterVerifier verifier =
Contributor

Why is verifier only being used for AddNode?

Contributor Author

@jiajunwang jiajunwang Oct 3, 2019

This is to ensure the participant process starts and registers correctly.
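A rough sketch of the kind of check described here, assuming the stock BestPossibleExternalViewVerifier builder (the tool's actual wiring may differ):

// Sketch only: confirm the newly added participant is registered and the cluster
// converges to the best possible state before the simulation continues.
ZkHelixClusterVerifier verifier =
    new BestPossibleExternalViewVerifier.Builder(clusterName)
        .setZkAddr(zkAddress)
        .build();
boolean converged = verifier.verifyByPolling(DEFAULT_WAIT_TIME, 100L);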

return false;
}, DEFAULT_WAIT_TIME);

if (!isClusterConverged()) {
Contributor

An operation could trigger multiple transitions, right? What if multiple Participants got transitions but haven't fully completed them? Then would you want to break here right away?

Contributor Author

I tried to add some more validations to ensure the converged assignment is a valid one. This should be good enough for the simulation, but still not good enough for our integration tests.
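A minimal sketch of the defensive pattern being discussed; isClusterConverged() comes from the diff above, while validateAssignment() is a hypothetical stand-in for the extra validations:

// Sketch only: poll until convergence (or timeout) instead of breaking immediately,
// then validate the converged assignment. validateAssignment() is hypothetical.
long deadline = System.currentTimeMillis() + DEFAULT_WAIT_TIME;
while (!isClusterConverged() && System.currentTimeMillis() < deadline) {
  Thread.sleep(500L);
}
if (!isClusterConverged() || !validateAssignment()) {
  throw new IllegalStateException("Cluster did not converge to a valid assignment in time.");
}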

@jiajunwang
Contributor Author

Updated to generate the report; I will refine the format later.

jiajunwang and others added 17 commits October 28, 2019 15:32
This is the initial check-in for the future development of the WAGED rebalancer.
All the components are placeholders. They will be implemented gradually.
* Adding the configuration items of the WAGED rebalancer.

Including: Instance Capacity Keys, Rebalance Preferences, Instance Capacity Details, Partition Capacity (the weight) Details.
Also adding tests to cover the new configuration items.
* Introduce the cluster model classes to support the WAGED rebalancer.

Implement the cluster model classes with the minimum necessary information to support rebalance.
Additional fields/logic might be added later once the detailed rebalance logic is implemented.

Also add related tests.
…ead of IdealState. (apache#398)

ResourceAssignment fits the usage better, and no unnecessary information will be recorded or read during the rebalance calculation.
…nment. (apache#399)

This is to avoid unnecessary information being recorded or read.
* Implement Cluster Model Provider.

The model provider is called in the WAGED rebalancer to generate the Cluster Model based on the current cluster status.
The major responsibility of the provider is to parse all the assignable replicas and identify which replicas need to be reassigned. Note that if the current best possible assignment is still valid, the rebalancer won't need to calculate for the partition assignment.

Also, add unit tests to verify the main logic.
This config will be applied to the instance when there is no (or empty) capacity configuration in the Instance Config.
Also add unit tests.
apache#388)

Add ChangeDetector interface and ResourceChangeDetector implementation

In order to efficiently react to changes happening to the cluster in the new WAGED rebalancer, a new component called ChangeDetector was added.

Changelist:
1. Add ChangeDetector interface
2. Implement ResourceChangeDetector
3. Add ResourceChangeCache, a wrapper for critical cluster metadata
4. Add an integration test, TestResourceChangeDetector
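For reference, a rough sketch of the interface's shape as described in this changelist; the method names are illustrative and may not match the merged ChangeDetector exactly:

// Rough sketch of the described component; names may differ from the merged code.
public interface ChangeDetector {
  // Refresh the detector's snapshot from the latest cluster metadata.
  void updateSnapshots(ResourceControllerDataProvider dataProvider);

  // The categories of metadata (resources, instances, cluster config, ...) that changed.
  Collection<HelixConstants.ChangeType> getChangeTypes();

  // Items added, changed, or removed since the previous snapshot, per category.
  Collection<String> getAdditionsByType(HelixConstants.ChangeType changeType);
  Collection<String> getChangesByType(HelixConstants.ChangeType changeType);
  Collection<String> getRemovalsByType(HelixConstants.ChangeType changeType);
}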
* Refactor the interfaces of hard/soft constraints and a central place to keep the softConstraint weights
…e#431)

* Refine the WAGED rebalancer-related interfaces and initially integrate them with the BestPossibleStateCalStage.

- Modify the BestPossibleStateCalStage logic to plug in the WAGED rebalancer.
- Refine ClusterModel to integrate with the ClusterDataDetector implementation.
- Enable getting the changed details for ClusterConfig in the change detector, which is required by the WAGED rebalancer.
…or integration (apache#431)

* Refine the WAGED rebalancer-related interfaces and initially integrate them with the BestPossibleStateCalStage.

- Modify the BestPossibleStateCalStage logic to plug in the WAGED rebalancer.
- Refine ClusterModel to integrate with the ClusterDataDetector implementation.
- Enable getting the changed details for ClusterConfig in the change detector, which is required by the WAGED rebalancer.

* Bring back the interface class and algorithm placeholder class that were removed prematurely.
…WAGED rebalancer. (apache#438)

CONFIG is for generic configuration items. That will be too generic for the rebalancer.
Modify to check for CLUSTER_CONFIG to avoid confusion.
This diff allows callers of getChangeType to iterate over the result of getChangeType() by changing determinePropertyMapByType so that it just returns an empty map for ClusterConfig.
…artition name (apache#440)

The replica instances are required while the rebalance algorithm generates the ResourceAssignment based on the AssignableNode instances.
Refine the methods of the AssignableNode for better code style and readability.
Also, modify the related test cases to verify state information and new methods.
For the new WAGED rebalancer, it's necessary to have a data accessor that will allow writes of data exceeding 1MB. ZooKeeper's ZNode size is capped at 1MB, so BucketDataAccessor interface and ZkBucketDataAccessor help us achieve this.
Changelist:
1. Add BucketDataAccessor and ZkBucketDataAccessor
2. Add necessary serializers
3. Add an integration test against ZK
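A simplified illustration of the bucketing idea (this is not the actual ZkBucketDataAccessor code): the serialized payload is split into chunks below ZooKeeper's 1MB ZNode limit, and each chunk is written to its own child ZNode.

// Simplified illustration of bucketing, not the actual implementation.
static List<byte[]> splitIntoBuckets(byte[] serialized, int bucketSize) {
  List<byte[]> buckets = new ArrayList<>();
  for (int offset = 0; offset < serialized.length; offset += bucketSize) {
    int end = Math.min(offset + bucketSize, serialized.length);
    buckets.add(Arrays.copyOfRange(serialized, offset, end));
  }
  return buckets; // each bucket is then written to its own child ZNode
}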
Implement a basic constraint-based algorithm: it is greedy; for each replica it picks the node with the best score and assigns the replica to that node. It does not guarantee a globally optimal result, only a locally optimal one.

The algorithm is based on a given set of constraints:

* HardConstraint: approves or denies an assignment given its condition; no assignment can bypass any "hard constraint".
* SoftConstraint: evaluates an assignment by points/rewards/scores; a higher score means a better assignment.
The goal is to satisfy all "hard constraints" while accumulating the most points (rewards) from "soft constraints".
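A minimal sketch of this greedy selection; Node, Replica, and the constraint method names are illustrative placeholders, not the actual Helix classes:

// Sketch only: filter out nodes rejected by any hard constraint, then pick the
// node with the highest total soft-constraint score for the given replica.
Node assign(Replica replica, List<Node> nodes,
    List<HardConstraint> hardConstraints, List<SoftConstraint> softConstraints) {
  Node best = null;
  double bestScore = Double.NEGATIVE_INFINITY;
  for (Node node : nodes) {
    // Any hard constraint can veto the assignment.
    boolean allowed = hardConstraints.stream().allMatch(c -> c.isSatisfied(node, replica));
    if (!allowed) {
      continue;
    }
    // Accumulate points from all soft constraints; higher is better.
    double score = softConstraints.stream().mapToDouble(c -> c.score(node, replica)).sum();
    if (score > bestScore) {
      bestScore = score;
      best = node;
    }
  }
  return best; // null if no node satisfies all hard constraints
}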
narendly and others added 9 commits November 4, 2019 18:04
For the WAGED rebalancer, we persist the cluster's mapping via AssignmentMetadataStore on every pipeline run. However, if there are no changes from the old assignment to the new assignment, this write is not necessary. This diff checks whether they are equal and skips the write if the old and new assignments are the same.
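The described check amounts to something like the following sketch (names simplified from the actual Helix code):

// Sketch only: skip the metadata store write when nothing changed.
if (!newAssignment.equals(persistedAssignment)) {
  assignmentMetadataStore.persistBestPossibleAssignment(newAssignment);
}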
…pache#574)

The ResourceToRebalance map also contains resources from the current states. This causes null pointer exceptions in the parse-all-replicas stage when a resource is not in the ideal states. This diff fixes the issue by only using the resources in the ideal states to parse all replicas.
…he#573)

Use a dry-run rebalancer to avoid updating the persisted rebalancer status in the verifiers or tests.
Also, refine several rebalancer-related interfaces so as to simplify the dry-run rebalancer implementation.
Convert the test cases back to use the BestPossibleExternalViewVerifier.

Additional fixes:
- Update the rebalancer preference on every rebalancer.compute call, since the preference might be updated at runtime.
- Fix one minor metric domain name bug in the WagedRebalancerMetricCollector.
- Minor test case fixes to make the tests more stable after the change.
Change ClusterConfig.setDefaultCapacityMap to be private.
…apache#570)

Add Java API methods for adding and validating resources for the WAGED rebalancer. This is a set of convenience APIs provided through HelixAdmin that the user can use to more easily add resources and validate them for WAGED rebalance usage.
Changelist:
1. Add API methods in HelixAdmin
2. Implement the said methods
3. Add tests
Change the calculation for baseline divergence: 0.0 means no difference, 1.0 means all are different.
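A sketch of the described calculation over a simplified replica-to-placement map (the real code operates on ResourceAssignment objects):

// Sketch only: fraction of placements that differ between baseline and new assignment.
// 0.0 means no difference, 1.0 means all placements are different.
double baselineDivergence(Map<String, String> baseline, Map<String, String> newAssignment) {
  if (baseline.isEmpty()) {
    return 0.0;
  }
  long changed = baseline.entrySet().stream()
      .filter(e -> !e.getValue().equals(newAssignment.get(e.getKey())))
      .count();
  return (double) changed / baseline.size();
}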
This change improves the rebalance speed by 2x to 5x, depending on the host capacity.

Parallelize the loop processing whenever possible to help improve performance. This does not change the logic.
Avoid some duplicate logic in the loop by moving the calculation outside the loop so it is only done once.
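As an illustration of the two optimizations (not the actual Helix loop; the helper names are hypothetical):

// Sketch only: hoist a loop-invariant computation out of the loop and process
// independent replicas in parallel.
int clusterCapacity = computeClusterCapacity(nodes); // computed once, outside the loop
replicas.parallelStream().forEach(replica -> assignReplica(replica, clusterCapacity));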
Fix the unstable test TestZeroReplicaAvoidance by waiting.
This is a temporary resolution until we fix issue apache#526. It is marked in a TODO comment so it will be easier for us to remove the wait in batch later.
We want to make the WAGED rebalancer (weight-aware) easier to use. One way to do this is to allow the user to easily add resources with weight configuration set by providing REST endpoints. This change adds the relevant REST endpoints based on the HelixAdmin APIs added in (apache#570).

Basically, this commit uses existing REST endpoints whose hierarchy is defined by REST resource. What this commit does to the existing endpoints is 1) add extra commands and 2) add a WAGED command as a QueryParam so that WAGED logic can be included.

This change is backward-compatible because it keeps the original behavior, via the @DefaultValue annotation, when no commands are provided.
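The described endpoint shape would look roughly like this JAX-RS sketch; the path and command names here are illustrative, not Helix's actual REST resource:

// Illustrative sketch of a backward-compatible command query parameter.
@PUT
@Path("{clusterId}/resources/{resourceName}")
public Response addResource(@PathParam("clusterId") String clusterId,
    @PathParam("resourceName") String resourceName,
    @DefaultValue("addResource") @QueryParam("command") String command,
    String content) {
  if ("addWagedResource".equals(command)) {
    // validate the weight configuration and add the resource via the new HelixAdmin APIs
  }
  // otherwise keep the original addResource behavior
  return Response.ok().build();
}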
jiajunwang and others added 14 commits December 5, 2019 16:42
The trim logic in the ResourceChangeSnapshot for cleaning up the IdealState should not clear the whole map. Doing so causes the WAGED rebalancer to ignore changes such as new partitions added to the partition list.
Modify the test case accordingly.
…ti-threads. (apache#636)

The previous design of RebalanceLatencyGauge does not support asynchronous metric data emitting. This PR adds support by using a ThreadLocal object.
The metric logic is not changed.
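A minimal sketch of the ThreadLocal approach described here (simplified, not the actual RebalanceLatencyGauge code):

// Sketch only: each thread measures its own latency, so metric emission also works
// when the rebalance computation happens asynchronously on another thread.
private final ThreadLocal<Long> _startTime = ThreadLocal.withInitial(() -> -1L);

public void startMeasuring() {
  _startTime.set(System.currentTimeMillis());
}

public void endMeasuring() {
  long start = _startTime.get();
  if (start >= 0) {
    recordLatency(System.currentTimeMillis() - start); // hypothetical emit helper
    _startTime.remove();
  }
}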
…pache#637)

This option will be used by the WAGED rebalancer to determine if the global rebalance should be performed asynchronously.
…ion. (apache#638)

The previous design is that both the on-demand and periodic rebalance scheduling tasks request a cache refresh. This won't always be true moving forward.
For example, the WAGED rebalancer's async baseline calculation requests a scheduled rebalance, but a cache refresh won't be necessary.
This PR does not change any business logic. It prepares for future feature changes.
This PR ensures strict backward compatibility.
apache#552)

Deep copy for mapFields and listFields in ZNRecord's copy constructor.
Change list:
1. Deep copy mapFields and listFields in ZNRecord's copy constructor.
2. Add a unit test for the deep copy constructor.
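The described change boils down to something like this sketch inside the copy constructor (simplified from the actual ZNRecord code):

// Sketch only: copy the nested maps and lists instead of reusing the source references.
this.mapFields = new TreeMap<>();
for (Map.Entry<String, Map<String, String>> entry : record.getMapFields().entrySet()) {
  this.mapFields.put(entry.getKey(), new TreeMap<>(entry.getValue()));
}
this.listFields = new TreeMap<>();
for (Map.Entry<String, List<String>> entry : record.getListFields().entrySet()) {
  this.listFields.put(entry.getKey(), new ArrayList<>(entry.getValue()));
}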
Currently, Helix charges a partition with a pending message twice, because the partition is not removed from the recovery/load-balance-needed partition set when it is charged for the pending message.

This fix addresses the problem for the recovery/load/ANY rebalance types when charging for pending messages.
…tency (apache#576)

* Minor log optimization for easy debugging of mastership handoff latency analysis

Make the log visible and human-readable when any mastership handoff happens
Added a new constructor with a custom serializer as the parameter; the default constructor uses ZNRecordSerializer.
Added cross-validated unit tests to verify that the custom/default serializers work correctly.
Create workflowContext only if config exists.
An if statement is added in the getOrInitializeWorkflowContext function:
It checks whether workflowConfig is null or not.
If workflowConfig is null, this function will not initialize the context.
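The added guard is essentially the following (simplified sketch, not the exact Helix method):

// Sketch only: do not initialize a WorkflowContext when its config is missing.
if (workflowConfig == null) {
  return null; // the caller treats a missing context as "workflow not configured"
}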
The tool leverages a standalone ZK server and a Helix controller, both running locally, to simulate the rebalance process.
The tool also initializes mock participant threads to simulate state transitions.
After each operation, the tool prints the current cluster status and all state transitions that have happened since the previous printout.
@jiajunwang
Contributor Author

This tool won't be required for now. I'm closing this PR and will leave the tool in my branch.

@jiajunwang jiajunwang closed this Dec 23, 2019