Delegation of nextReader calls #6477

colings86 · 2014-06-12T10:08:45Z

Currently, aggregations subscribe to setNextReader calls in AggregationContext. When a new reader is used, all reader-aware objects are notified of it. With deferred aggregations, the child aggregations of a breadth-first aggregation do not need to be notified of reader changes until the replay stage. Also, at the replay stage only the child aggregations of the breadth-first aggregation need to be notified; we should not be notifying all the other aggregations of the new reader.

To solve this, the calls for setNextReader are now handled by each aggregation so it can notify its child aggregations and any other ReaderContextAware objects (e.g. ValueSources) of the new reader at the relevant time.

The same idea has also been applied to the setScorer and setTopReader calls for Aggregators and ValueSources.
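As a rough illustration of the delegation idea described above, here is a minimal, self-contained sketch. It is not the code from this PR: `SegmentContext`, `ReaderAware`, `SketchAggregator`, `EagerAggregator` and `DeferringAggregator` are hypothetical names, and `SegmentContext` stands in for Lucene's `AtomicReaderContext` so the example compiles without any dependency. The sketch assumes a deferring aggregator simply remembers the readers it saw and forwards them to its children at replay time.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Lucene's AtomicReaderContext.
final class SegmentContext {
    final int ord;

    SegmentContext(int ord) {
        this.ord = ord;
    }
}

// Hypothetical interface for anything that needs to hear about a new reader.
interface ReaderAware {
    void setNextReader(SegmentContext reader);
}

// Each aggregator forwards reader changes to its own children instead of
// every object subscribing to one central context.
abstract class SketchAggregator implements ReaderAware {
    private final List<ReaderAware> children = new ArrayList<>();
    int notifications = 0; // counts how often this aggregator was notified

    void addChild(ReaderAware child) {
        children.add(child);
    }

    @Override
    public final void setNextReader(SegmentContext reader) {
        notifications++;
        doSetNextReader(reader);
        if (shouldNotifyChildren()) {
            notifyChildren(reader);
        }
    }

    // Hook for per-aggregator work on a reader change.
    protected void doSetNextReader(SegmentContext reader) {
    }

    // Deferring aggregators override this to postpone child notification.
    protected boolean shouldNotifyChildren() {
        return true;
    }

    protected final void notifyChildren(SegmentContext reader) {
        // Index-based loop: no iterator is allocated.
        for (int i = 0; i < children.size(); i++) {
            children.get(i).setNextReader(reader);
        }
    }
}

// Notifies its children immediately (depth-first style).
class EagerAggregator extends SketchAggregator {
}

// Records readers during collection and only notifies children on replay.
class DeferringAggregator extends SketchAggregator {
    private final List<SegmentContext> seen = new ArrayList<>();

    @Override
    protected void doSetNextReader(SegmentContext reader) {
        seen.add(reader);
    }

    @Override
    protected boolean shouldNotifyChildren() {
        return false;
    }

    void replay() {
        for (SegmentContext reader : seen) {
            notifyChildren(reader);
        }
    }
}
```

In this sketch, children of a `DeferringAggregator` hear nothing during the first pass, and `replay()` reaches only that subtree, leaving sibling aggregators untouched.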

s1monw · 2014-06-12T12:44:37Z

src/main/java/org/elasticsearch/search/aggregations/AggregatorFactories.java

                        aggregators.set(owningBucketOrdinal, aggregator);
                    }
                    aggregator.collect(doc, 0);
                }

                @Override
-                public void setNextReader(AtomicReaderContext reader) {
+                public void doSetNextReader(AtomicReaderContext reader) {
+                    for (int i = 0 ; i < aggregators.size(); i++) {


maybe use a foreach loop here?

aggregators is an ObjectArray which does not implement Iterable, although maybe it should?

@jpountz any reason why it couldn't?

It's a common theme across the codebase, we don't create iterators (explicitly/implicitly) if we don't need to

oh I thought that is an array from line 66 hmmm

s1monw · 2014-06-12T12:56:52Z

@colings86 I stopped halfway through, and I wonder, given the number of simple delegate methods added, whether we should add an interface that we can implement, like SegmentAware, that has the two methods, so that we can add util methods to iterate over arrays or lists. Another example would be to have a

public static class FilteredValueSource extends ValueSource {
   private final SegmentAware delegate;

   public FilteredValueSource(SegmentAware delegate) {
      this.delegate = delegate;
   }

   public void setNextReader(IndexReaderContext ctx) {
      delegate.setNextReader(ctx);
   }

   //....
}

just to clean this up a bit and remove some of the boilerplate code?

uboness · 2014-06-12T13:45:21Z

@colings86 did you benchmark this change? In the past, calling nextReader on every aggregator was a bottleneck... perhaps it's not anymore in breadth first mode, but in depth first it still might be. In any case, we need to benchmark this change and compare it to calling nextReader on the agg context.

colings86 · 2014-06-12T13:47:00Z

@uboness no I haven't, but I will do now

colings86 · 2014-06-12T20:21:05Z

@uboness your concerns are well founded. The performance of calling nextReader is about the same for single aggregations and sibling aggregations but deteriorates when aggregations are put on multiple levels. I'll have a quick look to see if there is any fixable bottleneck but if not then maybe the original problem could be solved by providing a different context for the replay stage of the deferred aggregations so that the nextReader is not called multiple times?

jpountz · 2014-06-12T20:54:27Z

+1 to a dedicated context if calling setNextReader from the tree of aggregators proves to be too slow

uboness · 2014-06-12T21:35:14Z

@colings86 sounds like a plan, +1 on dedicated context

colings86 · 2014-06-13T10:10:29Z

Performance of nextReader calls delegated through the aggregation tree is going to be too slow. I will go with using a different context for the replay stage of deferred aggregations. This is going to be easier to implement from a new PR off master, so I will close this PR and work on a new one for that change.

jpountz · 2014-11-18T22:20:59Z

I was looking at how to integrate per-segment collection into the aggregations framework, but the fact that AggregationContext rules all calls to setNextReader makes it a nightmare, so it would be nice if we could somehow revamp this PR in a way that is still fast. Otherwise I have the feeling that it is going to cause more and more issues as we move forward.

Aggregators now return a new collector instance per segment, like Lucene 5 does with its oal.search.Collector API. This is important for us because things like whether a field is single or multi-valued are only known at the segment level. In order to do that I had to change aggregators to notify their sub aggregators of new incoming segments (pretty much in the spirit of #6477), while everything used to be centralized in the AggregationContext class. While this might slow down deeply nested aggregation trees a bit, it also makes the children aggregation and the `breadth_first` collection mode much better options, since they can now replay only what they need, while they used to have to replay the whole aggregation tree. I also took advantage of this big refactoring to remove some abstractions that were not really required, like ValuesSource.MetaData or BucketAnalysisCollector. I also split Aggregator into Aggregator and AggregatorBase in order to separate the Aggregator API from implementation helpers. Close #9544
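The per-segment collector idea in the description above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the refactored aggregations code: `SketchLeafCollector`, `SketchSegment` and `SumAggregator` are hypothetical names, and segments are modelled as plain arrays instead of Lucene readers. The point it shows is that the single-valued versus multi-valued decision is made once per segment, when the leaf collector is created, rather than once per document.

```java
// Hypothetical per-segment collector, loosely modelled on the shape of
// Lucene 5's LeafCollector (only the collect method is kept here).
interface SketchLeafCollector {
    void collect(int doc);
}

// Hypothetical segment: values[doc] holds the field values of that document.
final class SketchSegment {
    final long[][] values;

    SketchSegment(long[][] values) {
        this.values = values;
    }

    // True when every document in this segment has at most one value.
    boolean isSingleValued() {
        for (long[] docValues : values) {
            if (docValues.length > 1) {
                return false;
            }
        }
        return true;
    }
}

// Hypothetical aggregator that hands out a fresh collector per segment.
class SumAggregator {
    long sum;
    int singleValuedSegments; // how many segments took the specialized path

    SketchLeafCollector getLeafCollector(SketchSegment segment) {
        if (segment.isSingleValued()) {
            singleValuedSegments++;
            // Specialized path: no inner loop over values.
            return doc -> {
                long[] docValues = segment.values[doc];
                if (docValues.length == 1) {
                    sum += docValues[0];
                }
            };
        }
        // General path for multi-valued segments.
        return doc -> {
            for (long value : segment.values[doc]) {
                sum += value;
            }
        };
    }
}
```

A caller would ask the aggregator for a new collector each time it moves to the next segment and then feed that collector the matching doc IDs of that segment only.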

colings86 added review labels Jun 12, 2014

s1monw reviewed Jun 12, 2014
s1monw removed the review label Jun 12, 2014

colings86 closed this Jun 13, 2014

clintongormley added the enhancement label Jul 16, 2014

colings86 self-assigned this Aug 21, 2014

jpountz mentioned this pull request Dec 30, 2014

Aggregations: delegation of setNextReader calls #9098

Closed

jpountz mentioned this pull request Feb 3, 2015

Refactor aggregations to use lucene5-style collectors. #9544

Closed

clintongormley added the :Analytics/Aggregations Aggregations label Jun 7, 2015

clintongormley changed the title ~~Aggregations: Delegation of nextReader calls~~ Delegation of nextReader calls Jun 7, 2015
