Do not sort histogram buckets on shards. #8797
Histograms do not perform any selection of buckets at the shard level, so sorting buckets there is useless: we sort again on the coordinating node once buckets with the same key have been merged.
LGTM, although why can't the keys be merged with a merge sort on the coordinating node? I would think that asking for results from a single shard shouldn't break what the API promises (i.e. sorting).
One issue is that the histogram aggregation allows sorting not only by key (for which a merge sort would work) but also by
This reverts commit 5f26e7e.
…eue. This commit makes histogram reduction a bit cleaner by expecting buckets returned from shards to be sorted by key and merging them on-the-fly on the coordinating node using a priority queue.
@rjernst I pushed a new commit that merge sorts on the coordinating node. This makes reduction simpler, since histograms already needed buckets to be sorted by key in order to, e.g., add empty buckets, and it is probably a bit more memory-efficient too.
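For readers following along, the merge described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `HistogramMergeSketch`, `Bucket`, and the `reduce` signature are hypothetical stand-ins, and the non-generic `IteratorAndCurrent` here only borrows the name of the helper that appears in the diff. It assumes each shard returns its buckets already sorted by key.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

final class HistogramMergeSketch {

    /** Hypothetical stand-in for a histogram bucket: a key and a document count. */
    static final class Bucket {
        final long key;
        final long docCount;

        Bucket(long key, long docCount) {
            this.key = key;
            this.docCount = docCount;
        }
    }

    /** Pairs a shard's iterator with the bucket it currently points at. */
    static final class IteratorAndCurrent {
        final Iterator<Bucket> iterator;
        Bucket current;

        IteratorAndCurrent(Iterator<Bucket> iterator) {
            this.iterator = iterator;
            this.current = iterator.next();
        }
    }

    /**
     * Merges per-shard bucket lists, each assumed to be sorted by key,
     * into one list sorted by key, summing counts of buckets that share a key.
     */
    static List<Bucket> reduce(List<List<Bucket>> shardResults) {
        // The queue orders shard iterators by the key of their current bucket.
        PriorityQueue<IteratorAndCurrent> pq =
            new PriorityQueue<>((a, b) -> Long.compare(a.current.key, b.current.key));
        for (List<Bucket> shard : shardResults) {
            Iterator<Bucket> it = shard.iterator();
            if (it.hasNext()) {
                pq.add(new IteratorAndCurrent(it));
            }
        }

        List<Bucket> merged = new ArrayList<>();
        while (!pq.isEmpty()) {
            IteratorAndCurrent top = pq.poll();
            Bucket b = top.current;
            int last = merged.size() - 1;
            if (last >= 0 && merged.get(last).key == b.key) {
                // Same key as the previously emitted bucket: fold the counts together.
                merged.set(last, new Bucket(b.key, merged.get(last).docCount + b.docCount));
            } else {
                merged.add(b);
            }
            // Advance this shard's iterator and put it back if it has more buckets.
            if (top.iterator.hasNext()) {
                top.current = top.iterator.next();
                pq.add(top);
            }
        }
        return merged;
    }
}
```

Because buckets with equal keys come out of the queue consecutively, merging only ever needs to look at the last emitted bucket, and the output is already sorted by key with no extra sort on the coordinating node.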
@@ -289,96 +292,144 @@ public B getBucketByKey(Number key) {
return FACTORY;
}

@Override
public InternalAggregation reduce(ReduceContext reduceContext) {
private static class IteratorAndCurrent<B> {
It's unfortunate Java doesn't have something like this :/
Thanks @jpountz. I added some more comments. It looks good to me in general.
@rjernst I tried working on your suggested changes, but this code is a bit tricky (e.g. the min and max bounds are optional, so you can have min=null and max set). Since the code you commented on was just moved (mainly factored out into a method), would you mind if I push this change as-is and work on improving the handling of bounds in another PR?
Sure, sounds great.
…eue. This commit makes histogram reduction a bit cleaner by expecting buckets returned from shards to be sorted by key and merging them on-the-fly on the coordinating node using a priority queue. Close #8797