
Introduce rebalance simulate tool #487

Closed
wants to merge 80 commits

Conversation

jiajunwang
Contributor

@jiajunwang jiajunwang commented Sep 21, 2019

Issues

  • My PR addresses the following Helix issues and references them in the PR description:

#402

Description

  • Here are some details about my PR, including screenshots of any UI changes:

The tool leverages a standalone ZK server and a Helix controller, both running locally, to simulate the rebalance process.
The tool also initializes mock participant threads to simulate state transitions.
After each operation, the tool prints the current cluster status and all state transitions that have happened since the previous printout.
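For illustration, a minimal sketch of how the simulation workflow described above might be driven; every class and method name below (RebalanceSimulator, AddNodeOperation, and so on) is a hypothetical placeholder rather than the tool's actual API:

public class SimulateExample {
  public static void main(String[] args) throws Exception {
    // Start an in-process ZK server and a local Helix controller (hypothetical helper).
    RebalanceSimulator simulator = new RebalanceSimulator("SIMULATED_CLUSTER");
    simulator.startZkAndController();

    // Start mock participant threads that execute the simulated state transitions.
    simulator.startMockParticipants(5);

    // Apply an operation; the tool then prints the cluster status and all state
    // transitions that have happened since the previous printout.
    simulator.apply(new AddNodeOperation("localhost_12918"));
    simulator.printClusterStatusAndTransitions();

    simulator.shutdown();
  }
}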

Tests

  • The following tests are written for this issue:

NA

  • The following is the result of the "mvn test" command on the appropriate module:

(Copy & paste the result of "mvn test")

Commits

  • My commits all reference appropriate Apache Helix GitHub issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation in the following wiki page:

https://github.com/apache/helix/wiki/Weight-aware-Globally-Evenly-distributed-Rebalancer

Code Quality

  • My diff has been formatted using helix-style.xml

@jiajunwang jiajunwang self-assigned this Sep 21, 2019
@jiajunwang
Contributor Author

jiajunwang commented Sep 21, 2019

To the reviewers: this is only for testing and evaluation. It should not impact business logic.
The tool will also be continuously modified as we develop different simulation requirements.

/**
* Overwrites the cluster metadata for the simulation purposes.
*/
public interface MetadataOverwrites {
Contributor

MetadataOverwriter or MetadataCopier? Or MetadataCopyTool?

Contributor Author

I want to ensure the user understands that this class is for overwriting the existing metadata, which is copied from somewhere else. So strictly speaking, it is not a copy tool.
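As a purely hypothetical sketch of how such an overwriter might look, assuming an overwrite method on the interface and the InstanceConfig capacity accessors (both are assumptions, not the PR's actual interface):

// Hypothetical sketch only: mutate metadata that was copied from a real cluster
// before feeding it into the simulation. The overwrite signature is assumed.
public class DoubleCapacityOverwrites implements MetadataOverwrites {
  @Override
  public InstanceConfig overwrite(InstanceConfig copiedConfig) {
    // Example what-if change: double every instance capacity value.
    Map<String, Integer> capacity = new HashMap<>(copiedConfig.getInstanceCapacityMap());
    capacity.replaceAll((key, value) -> value * 2);
    copiedConfig.setInstanceCapacityMap(capacity);
    return copiedConfig;
  }
}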


@Override
public String getDescription() {
return "Adding new resource: " + _resourceName;
Contributor

Nit: String format like how you do it elsewhere in other Operations
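For instance, the suggested form would be something like:

// Illustrative only: the nit suggests String.format instead of string concatenation.
return String.format("Adding new resource: %s", _resourceName);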


@Override
public String getDescription() {
return "Adding new node: " + _instanceConfig.getInstanceName();
Contributor

Nit: String format like how you do it elsewhere in other Operations

} catch (Exception e) {
return false;
}
ZkHelixClusterVerifier verifier =
Contributor

Why is verifier only being used for AddNode?

Contributor Author

@jiajunwang jiajunwang Oct 3, 2019

This is to ensure the participant process starts and registers correctly.
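A rough sketch of the kind of check described here, assuming the stock BestPossibleExternalViewVerifier builder (the tool's actual wiring may differ):

// Sketch only: confirm the newly added participant is registered and the cluster
// converges to the best possible state before the simulation continues.
ZkHelixClusterVerifier verifier =
    new BestPossibleExternalViewVerifier.Builder(clusterName)
        .setZkAddr(zkAddress)
        .build();
boolean converged = verifier.verifyByPolling(DEFAULT_WAIT_TIME, 100L);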

return false;
}, DEFAULT_WAIT_TIME);

if (!isClusterConverged()) {
Contributor

An operation could trigger multiple transitions, right? What if multiple Participants got transitions but haven't fully completed them? Then would you want to break here right away?

Contributor Author

I tried to add some more validations to ensure the converged assignment is a valid one. This should be good enough for the simulation, but still not good enough for our integration tests.
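A minimal sketch of the defensive pattern being discussed; isClusterConverged() comes from the diff above, while validateAssignment() is a hypothetical stand-in for the extra validations:

// Sketch only: poll until convergence (or timeout) instead of breaking immediately,
// then validate the converged assignment. validateAssignment() is hypothetical.
long deadline = System.currentTimeMillis() + DEFAULT_WAIT_TIME;
while (!isClusterConverged() && System.currentTimeMillis() < deadline) {
  Thread.sleep(500L);
}
if (!isClusterConverged() || !validateAssignment()) {
  throw new IllegalStateException("Cluster did not converge to a valid assignment in time.");
}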

@jiajunwang
Contributor Author

Updated to generate the report; I will refine the format later.

jiajunwang and others added 17 commits October 28, 2019 15:32
This is the initial check-in for the future development of the WAGED rebalancer.
All the components are placeholders. They will be implemented gradually.
* Adding the configuration items of the WAGED rebalancer.

Including: Instance Capacity Keys, Rebalance Preferences, Instance Capacity Details, Partition Capacity (the weight) Details.
Also adding tests to cover the new configuration items.
* Introduce the cluster model classes to support the WAGED rebalancer.

Implement the cluster model classes with the minimum necessary information to support rebalance.
Additional fields/logic might be added later once the detailed rebalance logic is implemented.

Also add related tests.
…ead of IdealState. (apache#398)

ResourceAssignment fits the usage better, and no unnecessary information will be recorded or read during the rebalance calculation.
…nment. (apache#399)

This is to avoid unnecessary information being recorded or read.
* Implement Cluster Model Provider.

The model provider is called in the WAGED rebalancer to generate the Cluster Model based on the current cluster status.
The major responsibility of the provider is to parse all the assignable replicas and identify which replicas need to be reassigned. Note that if the current best possible assignment is still valid, the rebalancer won't need to calculate for the partition assignment.

Also, add unit tests to verify the main logic.
This config will be applied to the instance when there is no (or empty) capacity configuration in the Instance Config.
Also add unit tests.
apache#388)

Add ChangeDetector interface and ResourceChangeDetector implementation

In order to efficiently react to changes happening to the cluster in the new WAGED rebalancer, a new component called ChangeDetector was added.

Changelist:
1. Add ChangeDetector interface
2. Implement ResourceChangeDetector
3. Add ResourceChangeCache, a wrapper for critical cluster metadata
4. Add an integration test, TestResourceChangeDetector
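For reference, a rough sketch of the interface's shape as described in this changelist; the method names are illustrative and may not match the merged ChangeDetector exactly:

// Rough sketch of the described component; names may differ from the merged code.
public interface ChangeDetector {
  // Refresh the detector's snapshot from the latest cluster metadata.
  void updateSnapshots(ResourceControllerDataProvider dataProvider);

  // The categories of metadata (resources, instances, cluster config, ...) that changed.
  Collection<HelixConstants.ChangeType> getChangeTypes();

  // Items added, changed, or removed since the previous snapshot, per category.
  Collection<String> getAdditionsByType(HelixConstants.ChangeType changeType);
  Collection<String> getChangesByType(HelixConstants.ChangeType changeType);
  Collection<String> getRemovalsByType(HelixConstants.ChangeType changeType);
}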
* Refactor the interfaces of hard/soft constraints and a central place to keep the softConstraint weights
…e#431)

* Refine the WAGED rebalancer-related interfaces and initially integrate them with the BestPossibleStateCalStage.

- Modify the BestPossibleStateCalStage logic to plug in the WAGED rebalancer.
- Refine ClusterModel to integrate with the ClusterDataDetector implementation.
- Enable getting the changed details for ClusterConfig in the change detector, which is required by the WAGED rebalancer.
…or integration (apache#431)

* Refine the WAGED rebalancer-related interfaces and initially integrate them with the BestPossibleStateCalStage.

- Modify the BestPossibleStateCalStage logic to plug in the WAGED rebalancer.
- Refine ClusterModel to integrate with the ClusterDataDetector implementation.
- Enable getting the changed details for ClusterConfig in the change detector, which is required by the WAGED rebalancer.

* Bring back the interface class and algorithm placeholder class that were removed prematurely.
…WAGED rebalancer. (apache#438)

CONFIG is for generic configuration items. That will be too generic for the rebalancer.
Modify to check for CLUSTER_CONFIG to avoid confusion.
This diff allows callers of getChangeType to iterate over the result of getChangeType() by changing determinePropertyMapByType so that it just returns an empty map for ClusterConfig.
…artition name (apache#440)

The replica instances are required while the rebalance algorithm generates the ResourceAssignment based on the AssignableNode instances.
Refine the methods of the AssignableNode for better code style and readability.
Also, modify the related test cases to verify state information and new methods.
For the new WAGED rebalancer, it's necessary to have a data accessor that will allow writes of data exceeding 1MB. ZooKeeper's ZNode size is capped at 1MB, so BucketDataAccessor interface and ZkBucketDataAccessor help us achieve this.
Changelist:
1. Add BucketDataAccessor and ZkBucketDataAccessor
2. Add necessary serializers
3. Add an integration test against ZK
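A simplified illustration of the bucketing idea (this is not the actual ZkBucketDataAccessor code): the serialized payload is split into chunks below ZooKeeper's 1MB ZNode limit, and each chunk is written to its own child ZNode.

// Simplified illustration of bucketing, not the actual implementation.
static List<byte[]> splitIntoBuckets(byte[] serialized, int bucketSize) {
  List<byte[]> buckets = new ArrayList<>();
  for (int offset = 0; offset < serialized.length; offset += bucketSize) {
    int end = Math.min(offset + bucketSize, serialized.length);
    buckets.add(Arrays.copyOfRange(serialized, offset, end));
  }
  return buckets; // each bucket is then written to its own child ZNode
}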
Implement a basic constraint-based algorithm: it is greedy; for each replica it picks the node with the best score and assigns the replica to that node. It does not guarantee a globally optimal result, only a locally optimal one.

The algorithm is based on a given set of constraints:

* HardConstraint: approves or denies an assignment given its condition; no assignment can bypass any "hard constraint".
* SoftConstraint: evaluates an assignment by points/rewards/scores; a higher score means a better assignment.
The goal is to satisfy all "hard constraints" while accumulating the most points (rewards) from "soft constraints".
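A minimal sketch of this greedy selection; Node, Replica, and the constraint method names are illustrative placeholders, not the actual Helix classes:

// Sketch only: filter out nodes rejected by any hard constraint, then pick the
// node with the highest total soft-constraint score for the given replica.
Node assign(Replica replica, List<Node> nodes,
    List<HardConstraint> hardConstraints, List<SoftConstraint> softConstraints) {
  Node best = null;
  double bestScore = Double.NEGATIVE_INFINITY;
  for (Node node : nodes) {
    // Any hard constraint can veto the assignment.
    boolean allowed = hardConstraints.stream().allMatch(c -> c.isSatisfied(node, replica));
    if (!allowed) {
      continue;
    }
    // Accumulate points from all soft constraints; higher is better.
    double score = softConstraints.stream().mapToDouble(c -> c.score(node, replica)).sum();
    if (score > bestScore) {
      bestScore = score;
      best = node;
    }
  }
  return best; // null if no node satisfies all hard constraints
}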
narendly and others added 9 commits November 4, 2019 18:04
For the WAGED rebalancer, we persist the cluster's mapping via AssignmentMetadataStore on every pipeline run. However, if there are no changes from the old assignment to the new assignment, this write is not necessary. This diff checks whether they are equal and skips the write if the old and new assignments are the same.
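The described check amounts to something like the following sketch (names simplified from the actual Helix code):

// Sketch only: skip the metadata store write when nothing changed.
if (!newAssignment.equals(persistedAssignment)) {
  assignmentMetadataStore.persistBestPossibleAssignment(newAssignment);
}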
…pache#574)

The ResourceToRebalance map also contains resources from the current states. This causes null pointer exceptions in the parse-all-replicas stage when a resource is not in the ideal states. This diff fixes the issue by only using the resources in the ideal states to parse all replicas.
…he#573)

Use a dry-run rebalancer to avoid updating the persisted rebalancer status in the verifiers or tests.
Also, refine several rebalancer-related interfaces so as to simplify the dry-run rebalancer implementation.
Convert the test cases back to use the BestPossibleExternalViewVerifier.

Additional fixes:
- Update the rebalancer preference on every rebalancer.compute call, since the preference might be updated at runtime.
- Fix one minor metric domain name bug in the WagedRebalancerMetricCollector.
- Minor test case fixes to make the tests more stable after the change.
Change ClusterConfig.setDefaultCapacityMap to be private.
…apache#570)

Add Java API methods for adding and validating resources for the WAGED rebalancer. This is a set of convenience APIs provided through HelixAdmin that the user can use to more easily add resources and validate them for WAGED rebalance usage.
Changelist:
1. Add API methods in HelixAdmin
2. Implement the said methods
3. Add tests
Change the calculation for baseline divergence: 0.0 means no difference, 1.0 means all are different.
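A sketch of the described calculation over a simplified replica-to-placement map (the real code operates on ResourceAssignment objects):

// Sketch only: fraction of placements that differ between baseline and new assignment.
// 0.0 means no difference, 1.0 means all placements are different.
double baselineDivergence(Map<String, String> baseline, Map<String, String> newAssignment) {
  if (baseline.isEmpty()) {
    return 0.0;
  }
  long changed = baseline.entrySet().stream()
      .filter(e -> !e.getValue().equals(newAssignment.get(e.getKey())))
      .count();
  return (double) changed / baseline.size();
}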
This change improves the rebalance speed by 2x to 5x, depending on the host capacity.

Parallelize the loop processing whenever possible to help improve performance. This does not change the logic.
Avoid some duplicate logic in the loop by moving the calculation outside the loop so it is only done once.
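As an illustration of the two optimizations (not the actual Helix loop; the helper names are hypothetical):

// Sketch only: hoist a loop-invariant computation out of the loop and process
// independent replicas in parallel.
int clusterCapacity = computeClusterCapacity(nodes); // computed once, outside the loop
replicas.parallelStream().forEach(replica -> assignReplica(replica, clusterCapacity));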
Fix the unstable test TestZeroReplicaAvoidance by waiting.
This is a temporary resolution until we fix issue apache#526. It is marked in a TODO comment so it will be easier for us to remove the wait in batch later.
We want to make the WAGED rebalancer (weight-aware) easier to use. One way to do this is to allow the user to easily add resources with weight configuration set by providing REST endpoints. This change adds the relevant REST endpoints based on the HelixAdmin APIs added in (apache#570).

Basically, this commit uses existing REST endpoints whose hierarchy is defined by REST resource. What this commit does to the existing endpoints is 1) add extra commands and 2) add a WAGED command as a QueryParam so that WAGED logic can be included.

This change is backward-compatible because it keeps the original behavior, via the @DefaultValue annotation, when no commands are provided.
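The described endpoint shape would look roughly like this JAX-RS sketch; the path and command names here are illustrative, not Helix's actual REST resource:

// Illustrative sketch of a backward-compatible command query parameter.
@PUT
@Path("{clusterId}/resources/{resourceName}")
public Response addResource(@PathParam("clusterId") String clusterId,
    @PathParam("resourceName") String resourceName,
    @DefaultValue("addResource") @QueryParam("command") String command,
    String content) {
  if ("addWagedResource".equals(command)) {
    // validate the weight configuration and add the resource via the new HelixAdmin APIs
  }
  // otherwise keep the original addResource behavior
  return Response.ok().build();
}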
jiajunwang and others added 14 commits December 5, 2019 16:42
The trim logic in the ResourceChangeSnapshot for cleaning up the IdealState should not clear the whole map. Doing so causes the WAGED rebalancer to ignore changes such as new partitions added to the partition list.
Modify the test case accordingly.
…ti-threads. (apache#636)

The previous design of RebalanceLatencyGauge does not support asynchronous metric data emitting. This PR adds support by using a ThreadLocal object.
The metric logic is not changed.
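A minimal sketch of the ThreadLocal approach described here (simplified, not the actual RebalanceLatencyGauge code):

// Sketch only: each thread measures its own latency, so metric emission also works
// when the rebalance computation happens asynchronously on another thread.
private final ThreadLocal<Long> _startTime = ThreadLocal.withInitial(() -> -1L);

public void startMeasuring() {
  _startTime.set(System.currentTimeMillis());
}

public void endMeasuring() {
  long start = _startTime.get();
  if (start >= 0) {
    recordLatency(System.currentTimeMillis() - start); // hypothetical emit helper
    _startTime.remove();
  }
}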
…pache#637)

This option will be used by the WAGED rebalancer to determine if the global rebalance should be performed asynchronously.
…ion. (apache#638)

The previous design is that both the on-demand and periodic rebalance scheduling tasks request a cache refresh. This won't always be true moving forward.
For example, the WAGED rebalancer's async baseline calculation requests a scheduled rebalance, but a cache refresh won't be necessary.
This PR does not change any business logic. It prepares for future feature changes.
This PR ensures strict backward compatibility.
apache#552)

Deep copy for mapFields and listFields in ZNRecord's copy constructor.
Change list:
1. Deep copy mapFields and listFields in ZNRecord's copy constructor.
2. Add a unit test for the deep copy constructor.
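The described change boils down to something like this sketch inside the copy constructor (simplified from the actual ZNRecord code):

// Sketch only: copy the nested maps and lists instead of reusing the source references.
this.mapFields = new TreeMap<>();
for (Map.Entry<String, Map<String, String>> entry : record.getMapFields().entrySet()) {
  this.mapFields.put(entry.getKey(), new TreeMap<>(entry.getValue()));
}
this.listFields = new TreeMap<>();
for (Map.Entry<String, List<String>> entry : record.getListFields().entrySet()) {
  this.listFields.put(entry.getKey(), new ArrayList<>(entry.getValue()));
}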
Currently, Helix charges a partition with a pending message twice, because the partition is not removed from the recovery/load-balance-needed partition set when it is charged for the pending message.

This fix addresses the problem for the recovery/load/ANY rebalance types when charging for pending messages.
…tency (apache#576)

* Minor log optimization for easy debugging of mastership handoff latency analysis

Make the log visible and human-readable when any mastership handoff happens
Added a new constructor with a custom serializer as the parameter; the default constructor uses ZNRecordSerializer.
Added cross-validated unit tests to verify that the custom/default serializers work correctly.
Create workflowContext only if config exists.
An if statement is added in the getOrInitializeWorkflowContext function:
It checks whether workflowConfig is null or not.
If workflowConfig is null, this function will not initialize the context.
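The added guard is essentially the following (simplified sketch, not the exact Helix method):

// Sketch only: do not initialize a WorkflowContext when its config is missing.
if (workflowConfig == null) {
  return null; // the caller treats a missing context as "workflow not configured"
}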
The tool leverages a standalone ZK server and a Helix controller, both running locally, to simulate the rebalance process.
The tool also initializes mock participant threads to simulate state transitions.
After each operation, the tool prints the current cluster status and all state transitions that have happened since the previous printout.
@jiajunwang
Contributor Author

This tool won't be required for now. I'm closing this PR and will leave the tool in my branch.

@jiajunwang jiajunwang closed this Dec 23, 2019