Coordinator drop segment selection through cost balancer #5529

Merged: 6 commits merged into apache:master from coordinator-drop-balance-fix on Apr 3, 2018

Conversation

clintropolis (Member):
Fixes issues caused by load rule segment drops not using the segment balancer, one of the problems described in #5521.

Segment drops will now use the balancer to get an iterator of ServerHolder objects, which can be consumed to find the best candidate servers to drop segments from.
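For illustration, the consuming side could look roughly like this (a sketch, not the actual LoadRule code; numToDrop and dropFromServer are hypothetical names):

    Iterator<ServerHolder> candidates = strategy.pickServersToDrop(segment, holdersInTier);
    int remaining = numToDrop; // hypothetical: number of excess replicas to remove
    while (remaining > 0 && candidates.hasNext()) {
      ServerHolder holder = candidates.next();
      if (holder.isServingSegment(segment)) {
        dropFromServer(holder, segment); // hypothetical helper that queues the drop on that server
        remaining--;
      }
    }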

@clintropolis changed the title from "drop segment selection through cost balancer" to "Coordinator drop segment selection through cost balancer" on Mar 24, 2018
@jihoonson (Contributor) left a comment:

LGTM overall. I'd like to comment on one thing: I'm not sure whether it's easy, or even possible, to add a test, but it would be great if we could have a unit test to avoid the same issue in the future. @clintropolis what do you think?

@@ -31,5 +33,10 @@

BalancerSegmentHolder pickSegmentToMove(List<ServerHolder> serverHolders);

default Iterator<ServerHolder> pickServersToDrop(DataSegment toDropSegment, NavigableSet<ServerHolder> serverHolders)
{
  return serverHolders.descendingIterator();
}
Contributor:
Would be nice to include a comment about why this descendingIterator is meaningful. (Like the comment on the old code: // Use the reverse order to get the holders with least available size first.)
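For illustration, the default implementation with such a comment could look like this (a sketch; the exact wording that was committed may differ):

    default Iterator<ServerHolder> pickServersToDrop(DataSegment toDropSegment, NavigableSet<ServerHolder> serverHolders)
    {
      // By default, use the reverse order to get the holders with least available size first.
      return serverHolders.descendingIterator();
    }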

// Use the reverse order to get the holders with least available size first.
final Iterator<ServerHolder> iterator = holdersInTier.descendingIterator();
final NavigableSet<ServerHolder> isServingSubset =
holdersInTier.stream().filter(s -> s.isServingSegment(segment)).collect(Collectors.toCollection(TreeSet::new));
Contributor:
There's still a (now pointless) check for isServingSegment below. It should be removed lest it confuse people.

Member Author (clintropolis):
I left the check in case something changed between the initial filtering and the loop and the server was no longer serving the segment, but I can remove the extra check if you think it's safe.

Contributor:
If there is a race between pickServersToDrop and this line, then there's still a race even with the check. I think it's ok to leave the check in, but maybe log a warning if it actually gets tripped. (We don't expect it to unless there is some race)

Member Author (clintropolis):
👍 added log.warn
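A sketch of what the check plus warning discussed above could look like (illustrative only; the message text and variable names are not taken from the PR):

    if (!holder.isServingSegment(segment)) {
      // Not expected to happen unless the cluster changed between pickServersToDrop and this loop.
      log.warn("Server [%s] is no longer serving segment [%s], skipping drop from this server.", holder, segment);
      continue;
    }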

@@ -31,5 +33,10 @@

BalancerSegmentHolder pickSegmentToMove(List<ServerHolder> serverHolders);

default Iterator<ServerHolder> pickServersToDrop(DataSegment toDropSegment, NavigableSet<ServerHolder> serverHolders)
Contributor:
This should have some javadoc explaining what the returned iterator should be. It's non-obvious: the "obvious" return value would be a set of servers to drop from, where the order doesn't matter. But the actual return value is ordered by preferredness, and may contain more servers than we actually want to drop from.

Member Author (clintropolis):
I went ahead and added javadoc for the whole interface, let me know if I got anything wrong.
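For reference, javadoc along the lines suggested above might read (a paraphrase of the request, not necessarily the committed text):

    /**
     * Returns an iterator of server holders from which the given segment may be dropped,
     * ordered from most preferred to least preferred drop candidate. The iterator may
     * contain more servers than the caller needs to drop from; callers should consume
     * only as many entries as they require.
     */
    default Iterator<ServerHolder> pickServersToDrop(DataSegment toDropSegment, NavigableSet<ServerHolder> serverHolders)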


try {
List<Pair<Double, ServerHolder>> results = resultsFuture.get();
return results.stream()
Contributor:
This code is slightly obtuse (what's the double? why reverse it?) and would benefit from a comment.
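One hedged reading of the snippet above, with the kind of comment being requested (the actual implementation may differ): the Double is presumably the cost of hosting the segment on each server, and the reversed ordering puts the highest-cost servers first so they become the preferred drop candidates.

    // The Double is the computed cost of keeping the segment on each server; sort in
    // descending cost order so the most expensive servers are the first drop candidates.
    final List<Pair<Double, ServerHolder>> results = resultsFuture.get();
    return results.stream()
                  .sorted((left, right) -> Double.compare(right.lhs, left.lhs))
                  .map(pair -> pair.rhs)
                  .iterator();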

@gianm (Contributor) left a comment:
👍

@clintropolis (Member Author):
The test that failed, CuratorDruidCoordinatorTest (added in #5528), is hopefully fixed by the commit in #5555 that increases the timeout.

@gianm merged commit f31dba6 into apache:master on Apr 3, 2018
jon-wei pushed a commit to implydata/druid-public that referenced this pull request Apr 4, 2018
* drop selection through cost balancer

* use collections.emptyIterator

* add test to ensure does not drop from server with larger loading queue with cost balancer

* javadocs and comments to clear things up

* random drop for completeness
gianm pushed a commit to implydata/druid-public that referenced this pull request May 16, 2018
@clintropolis deleted the coordinator-drop-balance-fix branch July 4, 2018 00:08
@jihoonson added this to the 0.12.2 milestone Jul 5, 2018
jihoonson pushed a commit to jihoonson/druid that referenced this pull request Jul 5, 2018
fjy pushed a commit that referenced this pull request Jul 6, 2018
leventov pushed a commit to metamx/druid that referenced this pull request Jul 20, 2018 (apache#5948)