Add RoundRobinServerSelector to speed up segment assignments #13367

Merged
merged 5 commits into from
Nov 16, 2022


kfaraz
Contributor

@kfaraz kfaraz commented Nov 15, 2022

Description

Segment assignments can take a very long time because the balancer strategy has to compute costs for a large number of segments.

This PR addresses the issue by making the segment assignments round-robin within a tier.
Only segment balancing takes cost-based decisions to move segments around.

Changes

  • Add dynamic config useRoundRobinSegmentAssignment with default value false
  • Add RoundRobinServerSelector. This does not implement the BalancerStrategy
    as it does not conform to that contract and may also be used in conjunction with a
    strategy (round-robin for RunRules and strategy for BalanceSegments)
  • Parameterize LoadRuleTest to test segment loading using both regular balancer strategy
    and round-robin
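
As a rough illustration of round-robin assignment within a tier, here is a minimal sketch. It is not the RoundRobinServerSelector added in this PR: the class name `RoundRobinSketch`, the `pick` method, and the use of plain strings for servers are all assumptions made for the example. The idea shown is a cursor that walks the tier's servers in a fixed order and skips any server that already holds the segment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/**
 * Illustrative only: a hypothetical round-robin picker for a single tier.
 * Names and signatures are assumptions, not Druid's actual API.
 */
class RoundRobinSketch
{
  private final List<String> servers;  // all servers of the tier, in a fixed order
  private int cursor = 0;              // index where the next search starts

  RoundRobinSketch(List<String> servers)
  {
    this.servers = new ArrayList<>(servers);
  }

  /**
   * Picks up to numReplicas servers that are not in alreadyHolding,
   * scanning at most one full circle starting from the cursor.
   */
  List<String> pick(Set<String> alreadyHolding, int numReplicas)
  {
    List<String> picked = new ArrayList<>();
    int n = servers.size();
    int start = cursor;
    for (int i = 0; i < n && picked.size() < numReplicas; i++) {
      int idx = (start + i) % n;
      String server = servers.get(idx);
      if (!alreadyHolding.contains(server)) {
        picked.add(server);
        cursor = (idx + 1) % n;  // resume right after the last server used
      }
    }
    return picked;
  }

  public static void main(String[] args)
  {
    RoundRobinSketch selector = new RoundRobinSketch(List.of("A", "B", "C"));
    System.out.println(selector.pick(Set.of(), 2));     // [A, B]
    System.out.println(selector.pick(Set.of("C"), 1));  // [A], since C is skipped
  }
}
```

No cost is computed per segment here; each pick is a bounded scan of the server list, which is where the speed-up over a cost-based strategy would come from.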

Changes not in this PR

  • Drops are still cost-based even when round-robin assignment is enabled.

Web-console change

[Screenshot: round_robin]

Release note

Add a round-robin segment strategy to speed up initial segment assignments.

Set useRoundRobinSegmentAssignment to true in the coordinator dynamic config to enable this feature.
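
One way to set the flag is to POST to the coordinator dynamic config endpoint, `/druid/coordinator/v1/config`. The sketch below only builds the request with the JDK HTTP client (Java 11+). The class name `EnableRoundRobinSketch` and the coordinator address `http://localhost:8081` are assumptions for the example. Check how your Druid version treats fields omitted from the payload before posting a partial config like this one.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Illustrative only: builds (but does not send) the dynamic config update. */
class EnableRoundRobinSketch
{
  static HttpRequest buildRequest(String coordinatorUrl)
  {
    // Only the flag from this PR is set; all other fields are left out.
    String body = "{\"useRoundRobinSegmentAssignment\": true}";
    return HttpRequest.newBuilder()
        .uri(URI.create(coordinatorUrl + "/druid/coordinator/v1/config"))
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(body))
        .build();
  }

  public static void main(String[] args)
  {
    // The coordinator address is an assumption; replace it with your own.
    HttpRequest request = buildRequest("http://localhost:8081");
    System.out.println(request.method() + " " + request.uri());
  }
}
```

To actually apply the change, the request could be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`.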


This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@rohangarg
Member

Is there a specific reason to not try using RandomBalancerStrategy for load rules?

@kfaraz
Contributor Author

kfaraz commented Nov 15, 2022

@rohangarg, RandomBalancerStrategy might not help get a tier with a uniform-ish distribution (at least in terms of number of segments).

This is because, even within a tier, the list of eligible servers passed to RandomBalancerStrategy differs from call to call: whether a server is eligible for a segment depends on whether it is already serving or loading that segment. The random integer therefore might not give a uniform distribution, since both the output range and the server at each index change in each call. The distribution might be even less uniform when multiple datasources have different numbers of replicas in a tier.

I have not validated this behaviour though. It would be interesting to see whether the random strategy behaves similarly to round-robin. Let me know what you think.

@rohangarg
Member

I think with respect to uniform distribution, even the round-robin allocation might create holes in servers since it also skips them in case a segment is already scheduled on the server.
That said, I have not compared the behavior of the two strategies either. My question was more about that itself: given that RandomBalancerStrategy has been around for some time, is there a specific reason not to pick it?
Also, I think that if RoundRobin strategy is considered superior to Random principally and practically, we should probably implement that as a BalancerStrategy and recommend it over random too.

@kfaraz
Contributor Author

kfaraz commented Nov 15, 2022

Yeah, I thought of making round-robin a balancer strategy but then decided against it as balancing is the one thing it would not be used for. Balancing with round-robin would again cause issues similar to the ones I believe random would face during assignments.
Also, the contracts for the two are fairly different. We could make them conform to each other, I guess.

Also, I think that if RoundRobin strategy is considered superior to Random principally and practically, we should probably implement that as a BalancerStrategy and recommend it over random too.

I don't think RandomBalancerStrategy is ever recommended for a real use case. It is really a bare-bones implementation, mostly for testing purposes, and is effectively a no-op as it always returns null for balancing, meaning "segment is already optimally placed".

even the round-robin allocation might create holes in servers since it also skips them in case a segment is already scheduled on the server.

I agree, round-robin would also create holes, but these would be filled in the next round itself. After any number of rounds, the distribution would be expected to be near-uniform.

Random would do it too, if we always passed the complete list of servers to the strategy (rather than only the eligible servers) and kept generating random numbers until we found an eligible server. The problem is primarily the range of the generated number, which keeps changing, thus defeating the pseudo-randomness. But given enough rounds, this too might attain a uniform distribution.
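
The alternative described above (draw over the complete server list, retry until the server is eligible) could look roughly like this. It is a sketch of the idea only, not code from Druid or from this PR; the class `RandomOverFullListSketch`, its `pick` method, and the string server names are hypothetical.

```java
import java.util.List;
import java.util.Random;
import java.util.Set;

/** Illustrative only: random pick over a fixed index range with retries. */
class RandomOverFullListSketch
{
  /**
   * Draws indices over the full server list, so the range is the same in
   * every call, and retries until the drawn server is eligible.
   * Returns null when no server is eligible.
   */
  static String pick(List<String> allServers, Set<String> ineligible, Random random)
  {
    if (ineligible.containsAll(allServers)) {
      return null;  // avoids an endless loop; also covers an empty server list
    }
    while (true) {
      String candidate = allServers.get(random.nextInt(allServers.size()));
      if (!ineligible.contains(candidate)) {
        return candidate;
      }
    }
  }

  public static void main(String[] args)
  {
    List<String> servers = List.of("A", "B", "C");
    // A and B already hold the segment, so only C can be returned.
    System.out.println(pick(servers, Set.of("A", "B"), new Random(42)));
  }
}
```

The number of retries is unbounded in the worst case, which is one practical drawback compared to a round-robin scan.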

But it would be best to validate this hypothesis 🙂. I will write out some simulation tests and share the results here.

@rohangarg
Member

Yeah, I thought of making round-robin a balancer strategy but then decided against it as balancing is the one thing it would not be used for. Balancing with round-robin would again cause issues similar to the ones I believe random would face during assignments.

I'm not sure how we would balance with a round-robin strategy out of the box without a cost function. I thought it would be like the random strategy, where the balancing logic is a no-op and only the assignment APIs of BalancerStrategy are implemented, using the findNewSegmentHomeReplicator method.

@kfaraz
Contributor Author

kfaraz commented Nov 16, 2022

@rohangarg, I ran some simulation tests with 10 historicals of equal disk capacity.
Round-robin does seem to perform better than the random strategy in the following cases, as seen from the distribution of the number of segments across the historicals.

1. datasource "wiki" with 1k segments, 2 replicas
random:      [186, 192, 192, 195, 199, 202, 207, 207, 210, 210] (max error 7%)
round-robin: [200, 200, 200, 200, 200, 200, 200, 200, 200, 200]

2. datasource "koala" with 10k segments, 2 replicas
random:       [965, 969, 989, 992, 995, 998, 1000, 1020, 1023, 1049] (max error 5%)
round-robin:  [1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000]

3. both "koala" (1 replica) and "wiki" (3 replicas) published together
random:       [1274, 1278, 1292, 1294, 1301, 1302, 1309, 1310, 1317, 1323] (max error 3%)
round-robin:  [1300, 1300, 1300, 1300, 1300, 1300, 1300, 1300, 1300, 1300]

I have also included the RoundRobinAssignmentTest in this PR.
To get the above results, you can re-run the tests in that class with round-robin disabled.

Even with random, the error rate reduces with a larger number of segments.
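
The comment does not define "max error". Assuming it means the largest relative deviation of any server's segment count from the mean, the figure for case 1 can be reproduced with the sketch below. The class `MaxErrorSketch` and its `maxError` method are hypothetical helpers, not part of the PR's tests.

```java
import java.util.Arrays;

/** Illustrative only: "max error" under an assumed definition. */
class MaxErrorSketch
{
  /** Largest relative deviation from the mean, as a fraction (0.07 means 7%). */
  static double maxError(int[] counts)
  {
    double mean = Arrays.stream(counts).average().orElseThrow();
    double worst = 0;
    for (int count : counts) {
      worst = Math.max(worst, Math.abs(count - mean) / mean);
    }
    return worst;
  }

  public static void main(String[] args)
  {
    // Case 1 ("wiki", random strategy) as listed above; the mean is 200.
    int[] wikiRandom = {186, 192, 192, 195, 199, 202, 207, 207, 210, 210};
    System.out.println(Math.round(100 * maxError(wikiRandom)) + "%");  // 7%
  }
}
```

For the round-robin rows, every count equals the mean, so the same function returns 0.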
