Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not sort histogram buckets on shards. #8797

Closed
wants to merge 3 commits into from

Conversation

jpountz
Copy link
Contributor

@jpountz jpountz commented Dec 5, 2014

Histogram do not perform any selection of the buckets at the shard level so it
is useless to sort buckets there given that we are going to sort again on the
coordinating node once buckets with the same key have been merged.

Histogram do not perform any selection of the buckets on the shard level so it
is useless to sort buckets there given that we are going to sort again on the
coordinating node once buckets with the same key have been merged.
@rjernst
Copy link
Member

rjernst commented Dec 6, 2014

LGTM, although why can't the keys be merged with a merge sort on the coordinating node? I would just think asking for results from a single shard shouldn't break what the API promises (ie sorting)?

@jpountz
Copy link
Contributor Author

jpountz commented Dec 10, 2014

One issue is that the histogram aggregation not only allows to sort by key (for which a merge sort would work) but also by doc_count or sub aggregation, and in that case merge sorting does not work anymore. But I agree it would be cleaner to expect shards to return buckets sorted by key, merge sort on the coordinating node and finally only re-sort if the order is different from key/asc. Let me see what I can do...

…eue.

This commit makes histogram reduction a bit cleaner by expecting buckets
returned from shards to be sorted by key and merging them on-the-fly on the
coordinating node using a priority queue.
@jpountz
Copy link
Contributor Author

jpountz commented Dec 11, 2014

@rjernst I pushed a new commit that merge sorts on the coordinating node. This makes reduction simpler since histograms already needed buckets to be sorted by key in order to be able to eg. add empty buckets and probably a bit more memory-efficient too.

@@ -289,96 +292,144 @@ public B getBucketByKey(Number key) {
return FACTORY;
}

@Override
public InternalAggregation reduce(ReduceContext reduceContext) {
private static class IteratorAndCurrent<B> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's unfortunate java doesn't have something like this :/

@rjernst
Copy link
Member

rjernst commented Dec 15, 2014

Thanks @jpountz. I added some more comments. It looks good to me in general.

@jpountz
Copy link
Contributor Author

jpountz commented Dec 16, 2014

@rjernst I tried working on your suggested changes but this code is a bit tricky (eg. the min and max bounds are optional so you can have min=null and max set). Since this code that you commented on was just moved (factored to a method mainly), would you mind if I push this change as-is and work on improving the handling of bounds on another PR?

@rjernst
Copy link
Member

rjernst commented Dec 16, 2014

would you mind if I push this change as-is and work on improving the handling of bounds on another PR

Sure, sounds great.

@jpountz jpountz closed this in 5694626 Dec 17, 2014
jpountz added a commit that referenced this pull request Dec 17, 2014
…eue.

This commit makes histogram reduction a bit cleaner by expecting buckets
returned from shards to be sorted by key and merging them on-the-fly on the
coordinating node using a priority queue.

Close #8797
@clintongormley clintongormley changed the title Aggregations: Do not sort histogram buckets on shards. Do not sort histogram buckets on shards. Jun 7, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants