Deferred aggregations prevent combinatorial explosion #6128
Conversation
@Override
public final void collect(int docId, long bucketOrdinal) throws IOException {
    int pos = Arrays.binarySearch(sortedOrds, bucketOrdinal);
Should it be a hash table instead to make the access constant-time? I think it wouldn't matter with the default size of 10, but maybe it would if the user sets shard_size to e.g. 1000?
We already make a split in the choice of collector impl for the case where the number of buckets is 1 or >1, so maybe there could be another break-point where we choose between a hash table and a sorted array?
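To illustrate the trade-off being discussed, here is a minimal, self-contained sketch (not the Elasticsearch implementation; the class and method names are hypothetical) of the two lookup strategies for testing whether a bucket ordinal survived pruning: binary search over a sorted array versus a hash-based set.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: two strategies for membership checks on surviving
// bucket ordinals. Binary search is O(log n) but cache-friendly for small n
// (e.g. the default shard_size of 10); a hash set is O(1) and may win when
// shard_size is large (e.g. 1000).
public class OrdinalLookup {
    private final long[] sortedOrds;
    private final Set<Long> ordSet;

    OrdinalLookup(long[] ords) {
        this.sortedOrds = ords.clone();
        Arrays.sort(this.sortedOrds);
        this.ordSet = new HashSet<>();
        for (long o : ords) {
            ordSet.add(o);
        }
    }

    // Sorted-array strategy, as in the snippet under review.
    boolean containsBinary(long ord) {
        return Arrays.binarySearch(sortedOrds, ord) >= 0;
    }

    // Hash-table strategy suggested in the comment above.
    boolean containsHashed(long ord) {
        return ordSet.contains(ord);
    }

    public static void main(String[] args) {
        OrdinalLookup lookup = new OrdinalLookup(new long[] { 42, 3, 7 });
        System.out.println(lookup.containsBinary(7));   // present -> true
        System.out.println(lookup.containsHashed(8));   // absent  -> false
    }
}
```

A real break-point between the two would be chosen by benchmarking, since boxing `long` keys into a `HashSet<Long>` has its own overhead; a primitive-specialized hash table would be the more realistic candidate.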
I left a few comments, but I like the new per-segment buffering of documents/buckets. I also think we should remove
I quickly looked at the last changes and they look good! Before we pull that in, I think we should make sure users would get a meaningful error if they try to use scores while replaying doc IDs, and take another look at the formatting (some missing spaces around operators/brackets and lines with trailing spaces).
…prune_first strategies
…ow for more compact data structures downstream where heavy pruning reduces the number of buckets under consideration
Added 'Deferred Aggregation' to the TermsAggregationSearchBenchmark and created a new benchmark for testing nested aggregations with different combinations of collect mode at each level.
// A scorer used for the deferred collection mode to handle any child aggs asking for scores that are not
// recorded.
static final Scorer unavailableScorer=new Scorer(null){
    private final String MSG="A limitation of the "+SubAggCollectionMode.DEPTH_FIRST.parseField.getPreferredName()+
s/DEPTH_FIRST/BREADTH_FIRST/ ?
Another "deferred" use case to consider? https://groups.google.com/forum/#!topic/elasticsearch/CtDhs0HDK2Q

I don't think it could help: building buckets based on counts is not practical, as you would need the global counts to make a decision while a shard would only have shard-local knowledge.
// A scorer used for the deferred collection mode to handle any child aggs asking for scores that are not
// recorded.
static final Scorer unavailableScorer=new Scorer(null){
    private final String MSG="A limitation of the "+SubAggCollectionMode.BREADTH_FIRST.parseField.getPreferredName()+
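The "unavailable scorer" idea above can be shown with a small, dependency-free sketch (hypothetical class and message; the real version subclasses Lucene's Scorer and reads the mode name from SubAggCollectionMode): a stand-in scorer that fails loudly with a meaningful message if any child agg asks for scores during deferred replay.

```java
// Hypothetical sketch of the "unavailable scorer" pattern: instead of
// silently returning a bogus score during doc ID replay, throw an
// exception that names the limitation so users get a meaningful error.
public class UnavailableScorer {
    private static final String MSG =
        "A limitation of the breadth_first collection mode is that scores "
        + "are not available during post-collection replay of doc IDs";

    public float score() {
        throw new UnsupportedOperationException(MSG);
    }

    public static void main(String[] args) {
        try {
            new UnavailableScorer().score();
        } catch (UnsupportedOperationException e) {
            // The error message should identify the collection-mode limitation.
            System.out.println("meaningful: " + e.getMessage().startsWith("A limitation"));
        }
    }
}
```

This addresses the earlier review ask that users get a meaningful error rather than a NullPointerException when scores are requested while replaying doc IDs.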
Can you add spaces around '=' and '+'?
LGTM, I just left comments about formatting. Can you fix these before pushing?

LGTM
…regator class to support a new mode of deferred collection. A new "breadth_first" results collection mode allows upper branches of the aggregation tree to be calculated and then pruned to a smaller selection before advancing into executing collection on child branches. Closes #6128
New BucketCollector classes to aid the recording and subsequent playback of "collect" streams in aggs to reduce combinatorial explosions where pruning of parent buckets should occur before calculating child aggs.
Aggregator base class now wraps the subAgg BucketCollectors with any required caching of collect streams for sub aggregations that are indicated as being deferred. Aggregator subclasses should now override shouldDefer to indicate any aggs that are expensive to compute, and in the buildAggregation call should subsequently call runDeferredCollections with the subset of bucket ordinals that represent the pruned parent buckets of interest.
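The record-and-replay mechanism described above can be sketched in a few lines (hypothetical names; the real BucketCollector API records per-segment and handles Lucene readers): buffer (docId, bucketOrdinal) pairs during the first pass, then replay only the pairs whose parent bucket survived pruning into the deferred child collector.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiConsumer;

// Hypothetical sketch of deferred collection: record the "collect" stream,
// prune parent buckets, then replay only the surviving ordinals so that
// expensive child aggs never see docs from pruned buckets.
public class DeferringCollector {
    // Each entry is { docId, bucketOrdinal }.
    private final List<long[]> buffer = new ArrayList<>();

    // First pass: record instead of forwarding to child aggs.
    void collect(int docId, long bucketOrdinal) {
        buffer.add(new long[] { docId, bucketOrdinal });
    }

    // Second pass: replay buffered hits for surviving ordinals only.
    void runDeferredCollections(long[] survivingOrds, BiConsumer<Integer, Long> downstream) {
        long[] sortedOrds = survivingOrds.clone();
        Arrays.sort(sortedOrds);
        for (long[] hit : buffer) {
            if (Arrays.binarySearch(sortedOrds, hit[1]) >= 0) {
                downstream.accept((int) hit[0], hit[1]);
            }
        }
    }

    public static void main(String[] args) {
        DeferringCollector collector = new DeferringCollector();
        collector.collect(1, 10);
        collector.collect(2, 11);  // bucket 11 will be pruned
        collector.collect(3, 10);
        // Only bucket ordinal 10 survives pruning.
        collector.runDeferredCollections(new long[] { 10 },
            (doc, ord) -> System.out.println(doc + "->" + ord));
    }
}
```

The pay-off is exactly the combinatorial-explosion fix the PR title describes: child aggs are computed only for the pruned-down set of parent buckets, at the cost of buffering the collect stream.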