The sampler aggregation is a single-bucket aggregator, but using it as part of the `order` of a terms aggregation fails. Below is a Sense script to reproduce:
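The original Sense script was not captured above; the following is a minimal sketch of a request that should trigger the failure (the index, field names, and agg names here are hypothetical):

```json
POST /products/_search
{
  "size": 0,
  "aggs": {
    "genres": {
      "terms": {
        "field": "genre",
        "order": { "sample>max_price": "desc" }
      },
      "aggs": {
        "sample": {
          "sampler": { "shard_size": 100 },
          "aggs": {
            "max_price": { "max": { "field": "price" } }
          }
        }
      }
    }
  }
}
```

The search request then throws an ArrayStoreException: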
org.elasticsearch.transport.RemoteTransportException: [Stallior][inet[/192.168.0.7:9300]][indices:data/read/search[phase/query]]
Caused by: java.lang.ArrayStoreException
at java.lang.System.arraycopy(Native Method)
at org.elasticsearch.search.aggregations.support.AggregationPath.subPath(AggregationPath.java:191)
at org.elasticsearch.search.aggregations.support.AggregationPath.validate(AggregationPath.java:307)
at org.elasticsearch.search.aggregations.bucket.terms.InternalOrder.validate(InternalOrder.java:145)
at org.elasticsearch.search.aggregations.bucket.terms.InternalOrder.validate(InternalOrder.java:138)
at org.elasticsearch.search.aggregations.bucket.terms.TermsAggregator.<init>(TermsAggregator.java:141)
at org.elasticsearch.search.aggregations.bucket.terms.AbstractStringTermsAggregator.<init>(AbstractStringTermsAggregator.java:39)
at org.elasticsearch.search.aggregations.bucket.terms.GlobalOrdinalsStringTermsAggregator.<init>(GlobalOrdinalsStringTermsAggregator.java:75)
at org.elasticsearch.search.aggregations.bucket.terms.TermsAggregatorFactory$ExecutionMode$2.create(TermsAggregatorFactory.java:70)
at org.elasticsearch.search.aggregations.bucket.terms.TermsAggregatorFactory.doCreateInternal(TermsAggregatorFactory.java:223)
at org.elasticsearch.search.aggregations.support.ValuesSourceAggregatorFactory.createInternal(ValuesSourceAggregatorFactory.java:57)
at org.elasticsearch.search.aggregations.AggregatorFactory.create(AggregatorFactory.java:95)
at org.elasticsearch.search.aggregations.AggregatorFactories.createTopLevelAggregators(AggregatorFactories.java:69)
at org.elasticsearch.search.aggregations.AggregationPhase.preProcess(AggregationPhase.java:77)
at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:96)
at org.elasticsearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:296)
at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:307)
at org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryTransportHandler.messageReceived(SearchServiceTransportAction.java:422)
at org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryTransportHandler.messageReceived(SearchServiceTransportAction.java:1)
at org.elasticsearch.transport.TransportService$4.doRun(TransportService.java:340)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
If you debug at org.elasticsearch.search.aggregations.support.AggregationPath.subPath(AggregationPath.java:191), you can see that the aggregator being validated is of type AggregatorFactory$1 and wraps the SamplerAggregator. This wrapper is created by the asMultiBucketAggregator(this, context, parent) call in SamplerAggregator$Factory.createInternal(...).
The reason SamplerAggregator has to be wrapped is that its collectors do not take the parentBucketOrdinal into account.
We should update the SamplerAggregator (including the diversity parts) to collect documents for each parentBucketOrdinal so that it doesn't need to be wrapped anymore and can be used in ordering like the other single-bucket aggregators.
@jpountz, can you confirm my assumption: are the parent bucket IDs that aggs are asked to collect on compact and ascending (0, 1, 2, 3...), or do I have to allow for very sparse values (7, 10342, ...)?
This dictates whether I use a map or an array for my sampler collection, and also whether I should in turn rebase the IDs of the buckets that survive the "best docs" selection process.
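Given dense, ascending ordinals (as confirmed below), a plain growable array can key the per-bucket state instead of a map. A minimal sketch of that idea, not the actual Elasticsearch code (which would use BigArrays, and a real per-bucket collector rather than the placeholder Object used here):

```java
import java.util.Arrays;

// Sketch: per-bucket sampler state keyed by a dense parent bucket ordinal.
public class PerBucketState {
    private Object[] states = new Object[1];

    // Fetch (lazily creating) the state for a bucket ordinal.
    public Object forOrdinal(long bucketOrd) {
        int ord = (int) bucketOrd;
        if (ord >= states.length) {
            // Grow geometrically so amortized growth cost stays constant.
            states = Arrays.copyOf(states, Math.max(ord + 1, states.length * 2));
        }
        if (states[ord] == null) {
            states[ord] = new Object(); // stand-in for a per-bucket "best docs" collector
        }
        return states[ord];
    }
}
```

Because the ordinals are compact, the array wastes little space and avoids the boxing and hashing overhead a map would add on every collect call.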
@markharwood Indeed, they are fine to use as array indices. However, I'm confused why you mention "surviving" buckets, as the sampler aggregator should not filter buckets. My assumption was that it would just compute a different sample on each bucket?
> My assumption was that it would just compute a different sample on each bucket?
My bad. You are correct.
On a separate point: when replaying the deferred collection(s), I need to replay collect calls in docId order along with the choice of bucket ID, and there may be more than one bucket per doc ID. A convenient way of doing this which avoids extra object allocations is to take the ScoreDocs produced by each of the samples, sneak the bucket ID into the "shardIndex" int value they hold, and then sort them for replay. A bit hacky (casting long bucket IDs to ints), but should be OK?
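The replay trick above might look like the following sketch. The ScoreDoc class here is a minimal stand-in for org.apache.lucene.search.ScoreDoc (which has the same doc/score/shardIndex fields); the method name and shape are illustrative, not the actual patch:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ReplaySketch {
    // Stand-in for org.apache.lucene.search.ScoreDoc.
    static class ScoreDoc {
        final int doc;
        final float score;
        int shardIndex; // repurposed here to carry the bucket ordinal
        ScoreDoc(int doc, float score, int shardIndex) {
            this.doc = doc; this.score = score; this.shardIndex = shardIndex;
        }
    }

    // Merge each bucket's sampled ScoreDocs into one docId-ordered replay list,
    // stashing the bucket ordinal in the otherwise-unused shardIndex slot.
    static List<ScoreDoc> prepareReplay(List<List<ScoreDoc>> perBucketSamples) {
        List<ScoreDoc> all = new ArrayList<>();
        for (int bucketOrd = 0; bucketOrd < perBucketSamples.size(); bucketOrd++) {
            for (ScoreDoc sd : perBucketSamples.get(bucketOrd)) {
                sd.shardIndex = bucketOrd; // the "hacky" part: long ordinal cast to int
                all.add(sd);
            }
        }
        // Deferred collect(doc, bucket) calls must be replayed in docId order.
        all.sort(Comparator.comparingInt(sd -> sd.doc));
        return all;
    }
}
```

No extra wrapper objects are allocated: the existing ScoreDocs are mutated and sorted in place, and the same doc can appear once per bucket that sampled it.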