New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Delegation of nextReader calls #6477
Conversation
Currently aggregations subscribe to setNextReader calls in AggregationContext. When a new reader is used all reader aware objects are notified of the new reader. With deferred aggregations children aggregations of a breath-first aggregation do not require to be notified of reader changes until the replay stage. Also, at the replay stage only the child aggregations of the breath-first aggregation need to be notified, we should not be notifying all the other aggregations of the new reader. To solve this the calls for setNextReader are now handled by each aggregation so it can notify its child aggregations and any other ReaderContextAware objects (e.g. ValueSources) of the new reader at the relevant time. The same idea has also been applied to the setScorer and setTopReader calls for Aggregators and ValueSources.
aggregators.set(owningBucketOrdinal, aggregator); | ||
} | ||
aggregator.collect(doc, 0); | ||
} | ||
|
||
@Override | ||
public void setNextReader(AtomicReaderContext reader) { | ||
public void doSetNextReader(AtomicReaderContext reader) { | ||
for (int i = 0 ; i < aggregators.size(); i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe use a foreach loop here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aggregators is an ObjectArray which does not implement Iterable, although maybe it should?
@jpountz any reason why it couldn't?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a common theme across the codebase, we don't create iterators (explicitly/implicitly) if we don't need to
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh I thought that is an array from line 66 hmmm
@colings86 I stopped half way through and I wonder given the number of added simple delegate method if we should add an interface that we can implement like public static class FilteredValueSource extends ValueSource {
private final SegmentAware delegate;
public FilteredValueSource(SegmentAware delegate) {
this.delegate = delegate;
}
public void setNextReader(IndexReaderContext ctx) {
delegate.setNextReader(ctx);
}
//....
} just to clean this up a bit and remove some of the boilerplate code? |
@colings86 did you benchmark this change? In the past, calling |
@uboness no I haven't, but I will do now |
@uboness your concerns are well founded. The performance of calling nextReader is about the same for single aggregations and sibling aggregations but deteriorates when aggregations are put on multiple levels. I'll have a quick look to see if there is any fixable bottleneck but if not then maybe the original problem could be solved by providing a different context for the replay stage of the deferred aggregations so that the nextReader is not called multiple times? |
+1 to a dedicated context if calling setNextReader from the tree of aggregators proves to be too slow |
@colings86 sounds like a plan, +1 on dedicated context |
Performance of nextReader calls delegated through the aggregation tree is going to be too slow. I will go with the using a different context for the replay stage of deferred aggregations. This is going to be easier to implement from a new PR off master so I will close this PR and work on a new one for that change. |
I was looking at how to integrate per-segment collection into the aggregations framework but the fact that |
Aggregators now return a new collector instance per segment, like Lucene 5 does with its oal.search.Collector API. This is important for us because things like knowing whether the field is single or multi-valued is only known at a segment level. In order to do that I had to change aggregators to notify their sub aggregators of new incoming segments (pretty much in the spirit of elastic#6477) while everything used to be centralized in the AggregationContext class. While this might slow down a bit deeply nested aggregation trees, this also makes the children aggregation and the `breadth_first` collection mode much better options since they can now only replay what they need while they used to have to replay the whole aggregation tree. I also took advantage of this big refactoring to remove some abstractions that were not really required like ValuesSource.MetaData or BucketAnalysisCollector. I also splitted Aggregator into Aggregator and AggregatorBase in order to separate the Aggregator API from implementation helpers.
Aggregators now return a new collector instance per segment, like Lucene 5 does with its oal.search.Collector API. This is important for us because things like knowing whether the field is single or multi-valued is only known at a segment level. In order to do that I had to change aggregators to notify their sub aggregators of new incoming segments (pretty much in the spirit of #6477) while everything used to be centralized in the AggregationContext class. While this might slow down a bit deeply nested aggregation trees, this also makes the children aggregation and the `breadth_first` collection mode much better options since they can now only replay what they need while they used to have to replay the whole aggregation tree. I also took advantage of this big refactoring to remove some abstractions that were not really required like ValuesSource.MetaData or BucketAnalysisCollector. I also splitted Aggregator into Aggregator and AggregatorBase in order to separate the Aggregator API from implementation helpers. Close #9544
Currently aggregations subscribe to setNextReader calls in AggregationContext. When a new reader is used all reader aware objects are notified of the new reader. With deferred aggregations children aggregations of a breath-first aggregation do not require to be notified of reader changes until the replay stage. Also, at the replay stage only the child aggregations of the breath-first aggregation need to be notified, we should not be notifying all the other aggregations of the new reader.
To solve this the calls for setNextReader are now handled by each aggregation so it can notify its child aggregations and any other ReaderContextAware objects (e.g. ValueSources) of the new reader at the relevant time.
The same idea has also been applied to the setScorer and setTopReader calls for Aggregators and ValueSources.