Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Build Topology with only required levels (FaultZone and EndNode) #2713

Conversation

zpinto
Copy link
Contributor

@zpinto zpinto commented Dec 14, 2023

Issues

  • Change all rebalancer strategies to create Topology without additional non-FaultZone or EndNode levels of the tree. This will allow for swap to work in clusters where the non-FaultZone or EndNode domain kv pairs don't directly match the swapping node. Helix Node/Instance Swap #2662

Description

  • When attempting to use Swap in clusters that had more complex topologies with levels that were not fault zone or end node, it was discovered that Swap assignment was being computed correctly. This is because the levels and all nodes in the Topology tree affect the assignment in CRUSH and CRUSHED. This changes the behavior of those rebalancer strategies to only create levels in the Topology tree for FaultZone and EndNode. This is the only thing required as the topology aware placement should only be done based on FaultZone.

Tests

  • Updated TestInstanceOperation to reproduce issues seen in testing on production like cluster. After topology changes, those tests pass.
  • Used current changes to deploy local controller for CRUSHED and WAGED clusters copied from production. Successfully carried out SWAP operation for both.

Changes that Break Backward Compatibility (Optional)

Changes to Topology class are not backwards compatible; however, the rebalancer will have different behavior and produce new assignment for existing clusters using more than a 2-layer topology. This constitutes bumping minor version and calling out in release notes in next open source release.

Documentation (Optional)

Commits

  • My commits all reference appropriate Apache Helix GitHub issues in their subject lines. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Code Quality

  • My diff has been formatted using helix-style.xml
    (helix-style-intellij.xml if IntelliJ IDE is used)

…l non-FaultZone or EndNode levels of the tree. This will allow for swap to work in clusters where the non-FaultZone or EndNode domain kv pairs don't directly match the swapping node.
…ate same instance config. Changes to copy getAssignableLiveInstances keySet to avoid ConcurrentModificationException.
@zpinto zpinto changed the base branch from master to ApplicationClusterManager December 14, 2023 23:49
@xyuanlu xyuanlu merged commit d70a9cb into apache:ApplicationClusterManager Dec 14, 2023
2 of 3 checks passed
@xyuanlu
Copy link
Contributor

xyuanlu commented Dec 15, 2023

Reopen the PR and remerge since last force push missed PR #2712.

xyuanlu pushed a commit that referenced this pull request Dec 20, 2023
* Change all rebalancer strategies to create Topology without additional non-FaultZone or EndNode levels of the tree. This will allow for swap to work in clusters where the non-FaultZone or EndNode domain kv pairs don't directly match the swapping node.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants