-
Notifications
You must be signed in to change notification settings - Fork 224
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce rebalance simulate tool #487
Conversation
To the reviewers, this is only for testing and evaluation. It should not impact business logic. |
...core/src/main/java/org/apache/helix/examples/rebalancer/simulator/CloneClusterSimulator.java
Outdated
Show resolved
Hide resolved
...core/src/main/java/org/apache/helix/examples/rebalancer/simulator/CloneClusterSimulator.java
Outdated
Show resolved
Hide resolved
/** | ||
* Overwrites the cluster metadata for the simulation purposes. | ||
*/ | ||
public interface MetadataOverwrites { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MetadataOverwriter or MetadataCopier? Or MetadataCopyTool?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to ensure the user understands that this class if for overwriting the existing metadata which is copied from somewhere else. So strictly speaking, it is not a copy tool.
|
||
@Override | ||
public String getDescription() { | ||
return "Adding new resource: " + _resourceName; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: String format like how you do it elsewhere in other Operations
|
||
@Override | ||
public String getDescription() { | ||
return "Adding new node: " + _instanceConfig.getInstanceName(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: String format like how you do it elsewhere in other Operations
} catch (Exception e) { | ||
return false; | ||
} | ||
ZkHelixClusterVerifier verifier = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is verifier only being used for AddNode?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is to ensure the participant process starts and registers correctly.
helix-core/src/main/java/org/apache/helix/examples/rebalancer/simulator/AbstractSimulator.java
Outdated
Show resolved
Hide resolved
helix-core/src/main/java/org/apache/helix/examples/rebalancer/simulator/AbstractSimulator.java
Outdated
Show resolved
Hide resolved
return false; | ||
}, DEFAULT_WAIT_TIME); | ||
|
||
if (!isClusterConverged()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An operation could trigger multiple transitions, right? What if multiple Participants got transitions but haven't fully completed them? Then would you want to break here right away?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to add some more validations to ensure the converged assignment is a valid one. This should be good for the simulate. But still not good enough for our integration tests.
helix-core/src/main/java/org/apache/helix/examples/rebalancer/simulator/AbstractSimulator.java
Show resolved
Hide resolved
63e5942
to
8271d74
Compare
Update for generating the report, I will refine the format later. |
7c6f1be
to
4353f67
Compare
1f561dc
to
c23ae6a
Compare
This is the intial check in for the future development of the WAGED rebalancer. All the components are placeholders. They will be implemented gradually.
* Adding the configuration items of the WAGED rebalancer. Including: Instance Capacity Keys, Rebalance Preferences, Instance Capacity Details, Partition Capacity (the weight) Details. Also adding test to cover the new configuration items.
* Introduce the cluster model classes to support the WAGED rebalancer. Implement the cluster model classes with the minimum necessary information to support rebalance. Additional field/logics might be added later once the detailed rebalance logic is implemented. Also add related tests.
…ead of IdealState. (apache#398) ResourceAssignment fit the usage better. And there will be no unnecessary information to be recorded or read during the rebalance calculation.
…nment. (apache#399) This is to avoid unnecessary information being recorded or read.
* Implement Cluster Model Provider. The model provider is called in the WAGED rebalancer to generate CLuster Model based on the current cluster status. The major responsibility of the provider is to parse all the assignable replicas and identify which replicas need to be reassigned. Note that if the current best possible assignment is still valid, the rebalancer won't need to calculate for the partition assignment. Also, add unit tests to verify the main logic.
This config will be applied to the instance when there is no (or empty) capacity configuration in the Instance Config. Also add unit tests.
apache#388) Add ChangeDetector interface and ResourceChangeDetector implementation In order to efficiently react to changes happening to the cluster in the new WAGED rebalancer, a new component called ChangeDetector was added. Changelist: 1. Add ChangeDetector interface 2. Implement ResourceChangeDetector 3. Add ResourceChangeCache, a wrapper for critical cluster metadata 4. Add an integration test, TestResourceChangeDetector
* Refactor the interfaces of hard/soft constraints and a central place to keep the softConstraint weights
…e#431) * Refine the WAGED rebalancer related interfaces and initial integrate with the BestPossibleStateCalStage. - Modify the BestPossibleStateCalStage logic to plugin the WAGED rebalancer. - Refine ClusterModel to integrate with the ClusterDataDetector implementation. - Enabling getting the changed details for Cluster Config in the change detector. Which is required by the WAGED rebalancer.
apache#431)" (apache#437) This reverts commit 08a2015.
…or integration (apache#431) * Refine the WAGED rebalancer related interfaces and initial integrate with the BestPossibleStateCalStage. - Modify the BestPossibleStateCalStage logic to plugin the WAGED rebalancer. - Refine ClusterModel to integrate with the ClusterDataDetector implementation. - Enabling getting the changed details for Cluster Config in the change detector. Which is required by the WAGED rebalancer. * Bring back the interface class and algorithm placeholder class that was removed prematurely.
…WAGED rebalancer. (apache#438) CONFIG is for generic configuration items. That will be too generic for the rebalancer. Modify to check for CLUSTER_CONFIG to avoid confusion.
This diff allows callers of getChangeType to iterate over the result of getChangeType() by changing determinePropertyMapByType so that it just returns an empty map for ClusterConfig.
…artition name (apache#440) The replica instances are required while the rebalance algorithm generating ResourceAssignment based on the AssignableNode instances. Refine the methods of the AssignableNode for better code style and readability. Also, modify the related test cases to verify state information and new methods.
For the new WAGED rebalancer, it's necessary to have a data accessor that will allow writes of data exceeding 1MB. ZooKeeper's ZNode size is capped at 1MB, so BucketDataAccessor interface and ZkBucketDataAccessor help us achieve this. Changelist: 1. Add BucketDataAccessor and ZkBucketDataAccessor 2. Add necessary serializers 3. Add an integration test against ZK
Implement basic constraint algorithm: Greedy based, each time it picks the best scores given each replica and assigns the replica to the node. It doesn't guarantee to achieve global optimal but local optimal result The algorithm is based on a given set of constraints * HardConstraint: Approve or deny the assignment given its condition, any assignment cannot bypass any "hard constraint" * SoftConstraint: Evaluate the assignment by points/rewards/scores, a higher point means a better assignment The goal is to avoid all "hard constraints" while accumulating the most points(rewards) from "soft constraints"
For the WAGED rebalancer, we persist the cluster's mapping via AssignmentMetadataStore every pipeline. However, if there are no changes made to the new assignment from the old assignment, this write is not necessary. This diff checks whether they are equal and skips the write if old and new assignments are the same.
…pache#574) ResourceToReblance map also has resources from current states. And this causes null pointer exceptions at parsing all replicas stage when the resource is not in ideal states. This diff fixes the issue by only using the resources in ideal states to parse all replicas.
…he#573) Use a dry-run rebalancer to avoid updating the persisted rebalancer status in the verifiers or tests. Also, refine several rebalancer related interfaces so as to simplify the dry-run rebalancer implementation. Convert the test cases back to use the BestPossibleExternalViewVerifier. Additional fixing: - Updating the rebalancer preference for every rebalancer.compute calls. Since the preference might be updated at runtime. - Fix one minor metric domain name bug in the WagedRebalancerMetricCollector. - Minor test case fix to make them more stable after the change.
Change ClusterConfig.setDefaultCapacityMap to be private.
…apache#570) Add Java API methods for adding and validating resources for WAGED rebalancer. This is a set of convenience APIs provided through HelixAdmin the user could use to more easily add resources and validate them for WAGED rebalance usage. Changelist: 1. Add API methods in HelixAdmin 2. Implement the said methods 3. Add tests
Change the calculation for baseline divergence: 0.0 means no difference, 1.0 means all are different.
This change improves the rebalance's speed by 2x to 5x depends on the host capacity. Parallelism the loop processing whenever possible and help to improve the performance. This does not change the logic. Avoid some duplicate logic in the loop. Put the calculation outside the loop and only do it once.
Fix the unstable test TestZeroReplicaAvoidance by waiting. This is a temporary resolution before we fix issue apache#526. Marked it in the TODO comment so easier for us to remove the wait in batch later.
We want to make WAGED rebalancer (weight-aware) easier to use. One way to do this is to allow the user to easily add resources with weight configuration set by providing REST endpoints. This change adds the relevant REST endpoints based on the HelixAdmin APIs added in (apache#570). Basically, this commit uses existing REST endpoints whose hierarchy is defined by REST resource. What this commit does to the existing endpoints is 1) Add extra commands 2) Add a WAGED command as a QueryParam so that WAGED logic could be included. This change is backward-compatible because it keeps the original behavior when no commands are provided by using @DefaultValue annotation.
The trim logic in the ResourceChangeSnapshot for cleaning up the IdealState should not clear the whole map. This will cause the WAGED rebalancer ignores changes such as new partitions into the partition list. Modify the test case accordingly.
…ti-threads. (apache#636) The previous design of RebalanceLatencyGauge won't support asynchronous metric data emitting. This PR adds support by using a ThreadLocal object. The metric logic is not changed.
…pache#637) This option will be used by the WAGED rebalancer to determine if the global rebalance should be performed asynchronously.
…ion. (apache#638) The previous design is that both on-demand and periodic rebalance scheduling task will request for a cache refresh. This won't be always true moving forward. For example, the WAGED rebalancer async baseline calculating requests for a scheduled rebalance. But cache refresh won't be necessary. This PR does not change any business logic. It prepares for future feature change. This PR ensures strict backward compatibility.
apache#552) Deep copy for mapFields and listFields in ZNRecord's copy constructor. Change list: 1. deep copy for mapFields and listFields in ZNRecord's copy constructor. 2. add unit test for the deep copy constructor.
…structor. (apache#552)" This reverts commit 2a335cf.
Current Helix will charge the partitions with pending message twice due to not remove the partition from partitions need recovery/load balance set when they are charged for pending message. This fix will fix the problem for recovery/load/ANY rebalance type charged for pending messages.
…tency (apache#576) * Minor log optimization for easy debugging mastership handoff latency analysis Make the log for visible and human-readable when any mastership handoff happens
Added a new constructor with a custom serializer as the parameter default constructor with ZnRecordSerializer added cross-validated unit tests to verify the custom/default serializer works correctly
Create workflowContext only if config exists. An if statement is added in the getOrInitializeWorkflowContext function: It checks whether workflowConfig is null or not. If workflowConfig is null, this function will not initialize the context.
The tool leverages standalone ZK, Helix controller which run locally to simulate the rebalance process. The tool also initializes mock participant threads for simulation state transitions. After each operation, the tool will print the current cluster status and all state transitions happen after the previous print out.
b0ba6e3
to
f6cd078
Compare
This tool won't be required for now. I'm closing this PR and will leave the tool in my branch. |
Issues
#402
Description
The tool leverages standalone ZK, Helix controller which run locally to simulate the rebalance process.
The tool also initializes mock participant threads for simulation state transitions.
After each operation, the tool will print the current cluster status and all state transitions happen after the previous print out.
Tests
NA
(Copy & paste the result of "mvn test")
Commits
Documentation
https://github.com/apache/helix/wiki/Weight-aware-Globally-Evenly-distributed-Rebalancer
Code Quality