
Resource metric for the rebalance state could be overwritten unexpectedly #1358

@jiajunwang

Description

Describe the bug

The per-resource rebalance state metric is how Helix indicates whether a resource is being rebalanced normally. We have identified an issue that can cause this metric to report incorrect information.
Specifically, even when the best possible state calculation fails, the state may remain NORMAL.

To Reproduce

  1. Use DelayedRebalance, since it may not throw an exception even when the rebalance is not possible.
  2. Try to rebalance a resource with no available instances.
  3. Check the ResourceMonitor attribute RebalanceState: it first shows BEST_POSSIBLE_CAL_FAILED, which is then overwritten with NORMAL by a later pipeline stage.
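The sequence in the reproduction steps above can be modeled with a small, self-contained Java simulation. This is a hypothetical sketch, not Helix code: `ResourceMonitorStub`, `bestPossibleStage` and `laterStage` are illustrative stand-ins, and the two-value enum only mirrors the state names used in this issue. It assumes, as described above, that the best possible calculation records its failure without throwing, so a later stage still runs and writes the same field.

```java
import java.util.List;

// Hypothetical model of the bug; none of these names are Helix APIs.
class RebalanceStateOverwriteDemo {

    // Mirrors the two state names used in this issue (not the real Helix enum).
    enum RebalanceState { NORMAL, BEST_POSSIBLE_CAL_FAILED }

    // Stand-in for the per-resource monitor that exposes the metric.
    static class ResourceMonitorStub {
        private RebalanceState state = RebalanceState.NORMAL;

        void setRebalanceState(RebalanceState s) { state = s; }

        RebalanceState getRebalanceState() { return state; }
    }

    // Best possible stage: with no available instances it records a failure
    // but does not throw, so the pipeline keeps running.
    static void bestPossibleStage(ResourceMonitorStub monitor, List<String> liveInstances) {
        if (liveInstances.isEmpty()) {
            monitor.setRebalanceState(RebalanceState.BEST_POSSIBLE_CAL_FAILED);
        } else {
            monitor.setRebalanceState(RebalanceState.NORMAL);
        }
    }

    // A later stage that unconditionally reports NORMAL: this is the overwrite.
    static void laterStage(ResourceMonitorStub monitor) {
        monitor.setRebalanceState(RebalanceState.NORMAL);
    }

    public static void main(String[] args) {
        ResourceMonitorStub monitor = new ResourceMonitorStub();
        bestPossibleStage(monitor, List.of());            // no available instances
        System.out.println(monitor.getRebalanceState());  // BEST_POSSIBLE_CAL_FAILED
        laterStage(monitor);
        System.out.println(monitor.getRebalanceState());  // NORMAL (failure is lost)
    }
}
```

Because both stages write the same field and the last writer wins, the failure is only visible between the two writes, which matches the behavior described in step 3.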

Expected behavior

The BEST_POSSIBLE_CAL_FAILED state should remain in the ResourceMonitor until enough instances become available and the resource is rebalanced normally.
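One possible way to get this behavior, sketched under the assumption that a pipeline run can collect results from its stages before publishing the metric, is to have each stage report a status into a run-local accumulator and publish only the most severe value at the end of the run. All names here (`RunStatus`, `ResourceMonitorStub`, `runPipeline`) are hypothetical and not Helix APIs; the actual fix may take a different approach.

```java
import java.util.List;

// Hypothetical sketch of the expected behavior; none of these names are Helix APIs.
class PipelineRebalanceStateDemo {

    // Declared from least to most severe; Enum.compareTo follows declaration order.
    enum RebalanceState { NORMAL, BEST_POSSIBLE_CAL_FAILED }

    // Collects the status reported by every stage within one pipeline run.
    static class RunStatus {
        private RebalanceState worst = RebalanceState.NORMAL;

        void report(RebalanceState s) {
            if (s.compareTo(worst) > 0) {
                worst = s; // keep the most severe value seen so far
            }
        }

        RebalanceState result() { return worst; }
    }

    // Stand-in for the per-resource monitor that exposes the metric.
    static class ResourceMonitorStub {
        private RebalanceState state = RebalanceState.NORMAL;

        void setRebalanceState(RebalanceState s) { state = s; }

        RebalanceState getRebalanceState() { return state; }
    }

    static void runPipeline(ResourceMonitorStub monitor, List<String> liveInstances) {
        RunStatus status = new RunStatus();
        // Best possible stage: fails (without throwing) when no instance is available.
        status.report(liveInstances.isEmpty()
                ? RebalanceState.BEST_POSSIBLE_CAL_FAILED
                : RebalanceState.NORMAL);
        // A later stage still reports NORMAL, but can no longer mask the earlier failure.
        status.report(RebalanceState.NORMAL);
        // Publish the metric once, at the end of the run.
        monitor.setRebalanceState(status.result());
    }

    public static void main(String[] args) {
        ResourceMonitorStub monitor = new ResourceMonitorStub();
        runPipeline(monitor, List.of());                   // no available instances
        System.out.println(monitor.getRebalanceState());   // BEST_POSSIBLE_CAL_FAILED
        runPipeline(monitor, List.of());                   // still none: failure persists
        System.out.println(monitor.getRebalanceState());   // BEST_POSSIBLE_CAL_FAILED
        runPipeline(monitor, List.of("instance_0"));       // an instance appears
        System.out.println(monitor.getRebalanceState());   // NORMAL
    }
}
```

Since the metric is recomputed on every run, BEST_POSSIBLE_CAL_FAILED persists for as long as the calculation keeps failing and returns to NORMAL only after a run in which it succeeds. Keeping the enum ordered by severity lets `compareTo` pick the worst value without a separate ranking table.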

Labels: bug