Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Delegation of nextReader calls #6477

Closed
wants to merge 1 commit into from
Closed

Delegation of nextReader calls #6477

wants to merge 1 commit into from

Conversation

colings86
Copy link
Contributor

Currently aggregations subscribe to setNextReader calls in AggregationContext. When a new reader is used all reader aware objects are notified of the new reader. With deferred aggregations children aggregations of a breath-first aggregation do not require to be notified of reader changes until the replay stage. Also, at the replay stage only the child aggregations of the breath-first aggregation need to be notified, we should not be notifying all the other aggregations of the new reader.

To solve this the calls for setNextReader are now handled by each aggregation so it can notify its child aggregations and any other ReaderContextAware objects (e.g. ValueSources) of the new reader at the relevant time.

The same idea has also been applied to the setScorer and setTopReader calls for Aggregators and ValueSources.

Currently aggregations subscribe to setNextReader calls in AggregationContext.  When a new reader is used all reader aware objects are notified of the new reader.  With deferred aggregations children aggregations of a breath-first aggregation do not require to be notified of reader changes until the replay stage. Also, at the replay stage only the child aggregations of the breath-first aggregation need to be notified, we should not be notifying all the other aggregations of the new reader.

To solve this the calls for setNextReader are now handled by each aggregation so it can notify its child aggregations and any other ReaderContextAware objects (e.g. ValueSources) of the new reader at the relevant time.

The same idea has also been applied to the setScorer and setTopReader calls for Aggregators and ValueSources.
aggregators.set(owningBucketOrdinal, aggregator);
}
aggregator.collect(doc, 0);
}

@Override
public void setNextReader(AtomicReaderContext reader) {
public void doSetNextReader(AtomicReaderContext reader) {
for (int i = 0 ; i < aggregators.size(); i++) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe use a foreach loop here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aggregators is an ObjectArray which does not implement Iterable, although maybe it should?

@jpountz any reason why it couldn't?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a common theme across the codebase, we don't create iterators (explicitly/implicitly) if we don't need to

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh I thought that is an array from line 66 hmmm

@s1monw
Copy link
Contributor

s1monw commented Jun 12, 2014

@colings86 I stopped half way through and I wonder given the number of added simple delegate method if we should add an interface that we can implement like SegmentAware that has the two methods such that we can add util methods to iterate over arrays or list. Another example would be to have a

public static class FilteredValueSource extends ValueSource {
   private final SegmentAware delegate;

   public FilteredValueSource(SegmentAware delegate) {
      this.delegate = delegate;
   }

   public void setNextReader(IndexReaderContext ctx) {
      delegate.setNextReader(ctx);
   }

   //....
} 

just to clean this up a bit and remove some of the boilerplate code?

@s1monw s1monw removed the review label Jun 12, 2014
@uboness
Copy link
Contributor

uboness commented Jun 12, 2014

@colings86 did you benchmark this change? In the past, calling nextReader on every aggregator was a bottleneck... perhaps it's not anymore in breadth first mode, but in depth first it still might be. In any case, we need to benchmark this change and compare it to calling nexReader on the agg context.

@colings86
Copy link
Contributor Author

@uboness no I haven't, but I will do now

@colings86
Copy link
Contributor Author

@uboness your concerns are well founded. The performance of calling nextReader is about the same for single aggregations and sibling aggregations but deteriorates when aggregations are put on multiple levels. I'll have a quick look to see if there is any fixable bottleneck but if not then maybe the original problem could be solved by providing a different context for the replay stage of the deferred aggregations so that the nextReader is not called multiple times?

@jpountz
Copy link
Contributor

jpountz commented Jun 12, 2014

+1 to a dedicated context if calling setNextReader from the tree of aggregators proves to be too slow

@uboness
Copy link
Contributor

uboness commented Jun 12, 2014

@colings86 sounds like a plan, +1 on dedicated context

@colings86
Copy link
Contributor Author

Performance of nextReader calls delegated through the aggregation tree is going to be too slow. I will go with the using a different context for the replay stage of deferred aggregations. This is going to be easier to implement from a new PR off master so I will close this PR and work on a new one for that change.

@jpountz
Copy link
Contributor

jpountz commented Nov 18, 2014

I was looking at how to integrate per-segment collection into the aggregations framework but the fact that AggregationContext rules all calls to setNextReader makes it a nightmare, so it would be nice if we could somehow revamp this PR in a way that it still fast. Otherwise I have the feeling that it is going to cause more and more issues as we move forward.

jpountz added a commit to jpountz/elasticsearch that referenced this pull request Feb 4, 2015
Aggregators now return a new collector instance per segment, like Lucene 5 does
with its oal.search.Collector API. This is important for us because things like
knowing whether the field is single or multi-valued is only known at a segment
level.

In order to do that I had to change aggregators to notify their sub aggregators
of new incoming segments (pretty much in the spirit of elastic#6477) while everything
used to be centralized in the AggregationContext class. While this might slow
down a bit deeply nested aggregation trees, this also makes the children
aggregation and the `breadth_first` collection mode much better options since
they can now only replay what they need while they used to have to replay the
whole aggregation tree.

I also took advantage of this big refactoring to remove some abstractions that
were not really required like ValuesSource.MetaData or BucketAnalysisCollector.
I also splitted Aggregator into Aggregator and AggregatorBase in order to
separate the Aggregator API from implementation helpers.
jpountz added a commit that referenced this pull request Feb 12, 2015
Aggregators now return a new collector instance per segment, like Lucene 5 does
with its oal.search.Collector API. This is important for us because things like
knowing whether the field is single or multi-valued is only known at a segment
level.

In order to do that I had to change aggregators to notify their sub aggregators
of new incoming segments (pretty much in the spirit of #6477) while everything
used to be centralized in the AggregationContext class. While this might slow
down a bit deeply nested aggregation trees, this also makes the children
aggregation and the `breadth_first` collection mode much better options since
they can now only replay what they need while they used to have to replay the
whole aggregation tree.

I also took advantage of this big refactoring to remove some abstractions that
were not really required like ValuesSource.MetaData or BucketAnalysisCollector.
I also splitted Aggregator into Aggregator and AggregatorBase in order to
separate the Aggregator API from implementation helpers.

Close #9544
@clintongormley clintongormley changed the title Aggregations: Delegation of nextReader calls Delegation of nextReader calls Jun 7, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants