Sampler Aggregator cannot be used in terms agg order #10719

Closed

colings86 opened this issue Apr 22, 2015 · 4 comments
@colings86 (Contributor)

Sampler Aggregator is a single-bucket aggregator, but if you try to use it as part of the order in a terms aggregation it fails. Below is a Sense script to reproduce the issue:

POST test/doc/1
{"color":"YELLOW","date":1500000009,"weight":105}

POST test/doc/2
{"color":"YELLOW","date":1500000008,"weight":104}

POST test/doc/3
{"color":"YELLOW","date":1500000007,"weight":103}

POST test/doc/11
{"color":"RED","date":1500000009,"weight":205}

GET test/doc/_search
{
  "size": 0,
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "date": {
        "order": "desc"
      }
    }
  ],
  "aggregations": {
    "distinctColors": {
      "terms": {
        "field": "color",
        "size": 1,
        "order": {
          "sample>max_weight.value": "asc"
        }
      },
      "aggregations": {
        "sample": {
          "sampler": {
            "shard_size": 1
          },
          "aggs": {
            "max_weight": {
              "max": {
                "field": "weight"
              }
            }
          }
        }
      }
    }
  }
}

The search request throws an ArrayStoreException:

org.elasticsearch.transport.RemoteTransportException: [Stallior][inet[/192.168.0.7:9300]][indices:data/read/search[phase/query]]
Caused by: java.lang.ArrayStoreException
    at java.lang.System.arraycopy(Native Method)
    at org.elasticsearch.search.aggregations.support.AggregationPath.subPath(AggregationPath.java:191)
    at org.elasticsearch.search.aggregations.support.AggregationPath.validate(AggregationPath.java:307)
    at org.elasticsearch.search.aggregations.bucket.terms.InternalOrder.validate(InternalOrder.java:145)
    at org.elasticsearch.search.aggregations.bucket.terms.InternalOrder.validate(InternalOrder.java:138)
    at org.elasticsearch.search.aggregations.bucket.terms.TermsAggregator.<init>(TermsAggregator.java:141)
    at org.elasticsearch.search.aggregations.bucket.terms.AbstractStringTermsAggregator.<init>(AbstractStringTermsAggregator.java:39)
    at org.elasticsearch.search.aggregations.bucket.terms.GlobalOrdinalsStringTermsAggregator.<init>(GlobalOrdinalsStringTermsAggregator.java:75)
    at org.elasticsearch.search.aggregations.bucket.terms.TermsAggregatorFactory$ExecutionMode$2.create(TermsAggregatorFactory.java:70)
    at org.elasticsearch.search.aggregations.bucket.terms.TermsAggregatorFactory.doCreateInternal(TermsAggregatorFactory.java:223)
    at org.elasticsearch.search.aggregations.support.ValuesSourceAggregatorFactory.createInternal(ValuesSourceAggregatorFactory.java:57)
    at org.elasticsearch.search.aggregations.AggregatorFactory.create(AggregatorFactory.java:95)
    at org.elasticsearch.search.aggregations.AggregatorFactories.createTopLevelAggregators(AggregatorFactories.java:69)
    at org.elasticsearch.search.aggregations.AggregationPhase.preProcess(AggregationPhase.java:77)
    at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:96)
    at org.elasticsearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:296)
    at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:307)
    at org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryTransportHandler.messageReceived(SearchServiceTransportAction.java:422)
    at org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryTransportHandler.messageReceived(SearchServiceTransportAction.java:1)
    at org.elasticsearch.transport.TransportService$4.doRun(TransportService.java:340)
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

If you debug at org.elasticsearch.search.aggregations.support.AggregationPath.subPath(AggregationPath.java:191), you can see that the aggregator being tested is of type AggregatorFactory$1, which wraps the SamplerAggregator. This wrapper is created by the asMultiBucketAggregator(this, context, parent) call in SamplerAggregator$Factory.createInternal(...).
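
To illustrate the wrapping (a hypothetical, simplified sketch, not the actual Elasticsearch source): asMultiBucketAggregator wraps a single-bucket aggregator so that each parent bucket ordinal gets its own fresh delegate, and each delegate only ever collects into ordinal 0. Something in this spirit:

// Hypothetical, simplified sketch: not the actual Elasticsearch source.
// A single-bucket aggregator that ignores parentBucketOrdinal gets wrapped
// so that each parent ordinal receives its own fresh delegate instance.
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

class MultiBucketWrapper {

    interface SingleBucketAggregator {
        void collect(int doc, long bucketOrdinal);
    }

    private final Supplier<SingleBucketAggregator> factory; // e.g. creates a SamplerAggregator
    private final List<SingleBucketAggregator> perOrdinal = new ArrayList<>();

    MultiBucketWrapper(Supplier<SingleBucketAggregator> factory) {
        this.factory = factory;
    }

    void collect(int doc, long parentBucketOrdinal) {
        // One delegate per parent ordinal, created lazily.
        while (perOrdinal.size() <= parentBucketOrdinal) {
            perOrdinal.add(factory.get());
        }
        // Each delegate only ever sees ordinal 0: the reason wrapping is needed.
        perOrdinal.get((int) parentBucketOrdinal).collect(doc, 0);
    }
}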

The reason SamplerAggregator has to be wrapped is that its collectors do not take the parentBucketOrdinal into account.

We should update the SamplerAggregator (including the Diversity parts) to collect documents for each parentBucketOrdinal so that it no longer needs to be wrapped and can be used in ordering like the other single-bucket aggregators.
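
A minimal sketch of that direction, assuming a bounded, score-based sample kept per ordinal; class and method names here are hypothetical, not the eventual implementation:

// Hypothetical sketch only: keep a separate bounded sample per
// parentBucketOrdinal so no per-bucket wrapper aggregators are needed.
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class PerOrdinalSampler {

    static final class Entry {
        final int doc;
        final float score;
        Entry(int doc, float score) { this.doc = doc; this.score = score; }
    }

    private final int shardSize; // max docs sampled per bucket, as in the sampler's shard_size
    private final List<PriorityQueue<Entry>> samples = new ArrayList<>();

    PerOrdinalSampler(int shardSize) {
        this.shardSize = shardSize;
    }

    void collect(int doc, float score, long parentBucketOrdinal) {
        while (samples.size() <= parentBucketOrdinal) {
            // Min-heap on score: the weakest sampled doc is evicted first.
            samples.add(new PriorityQueue<>(Comparator.comparingDouble((Entry e) -> e.score)));
        }
        PriorityQueue<Entry> sample = samples.get((int) parentBucketOrdinal);
        sample.add(new Entry(doc, score));
        if (sample.size() > shardSize) {
            sample.poll(); // keep only the top shardSize docs for this bucket
        }
    }
}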

@markharwood (Contributor)

@jpountz, can you confirm my assumption: are the parent bucket IDs that aggs are asked to collect on compact and ascending (0, 1, 2, 3, ...), or do I have to allow for very sparse values (7, 10342, ...)?
This dictates whether I use a map or an array in my sampler collection, and also whether I should in turn rebase the IDs of the buckets that survive the "best docs" selection process.
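
For illustration, the two storage options being weighed, with hypothetical names (a growable array for dense ordinals versus a map for sparse ones):

// Hypothetical illustration of the map-vs-array trade-off above.
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class OrdinalStorage {

    // Option A: compact, ascending ordinals (0, 1, 2, ...) allow a growable
    // array indexed directly by ordinal; O(1) lookups, no boxing.
    private long[] dense = new long[8];

    void incrementDense(long ord) {
        if (ord >= dense.length) {
            dense = Arrays.copyOf(dense, Math.max((int) ord + 1, dense.length * 2));
        }
        dense[(int) ord]++;
    }

    // Option B: very sparse ordinals (7, 10342, ...) would force a map,
    // paying hashing plus Long boxing on every collect.
    private final Map<Long, Long> sparse = new HashMap<>();

    void incrementSparse(long ord) {
        sparse.merge(ord, 1L, Long::sum);
    }
}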

@jpountz (Contributor) commented Apr 23, 2015

@markharwood Indeed, they are fine to use as array indices. However, I'm confused about why you mention "surviving" buckets, as the sampler aggregator should not filter buckets? My assumption was that it would just compute a different sample on each bucket?

@markharwood (Contributor)

> My assumption was that it would just compute a different sample on each bucket?

My bad. You are correct.
On a separate point: when replaying the deferred collection(s), I need to replay collects in docID order along with the choice of bucket ID, and there may be more than one bucket per doc ID. A convenient way of doing this that avoids extra object allocations is to take the ScoreDocs produced from each of the samples, sneak the bucket ID into the "shardIndex" int value they hold, and then sort them for replay. A bit hacky (casting long bucket IDs to ints), but should be OK?
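
A sketch of that trick, using Lucene's real ScoreDoc fields (doc, score, shardIndex) but a hypothetical prepareReplay helper:

// Sketch of the shardIndex trick described above. Assumes the bucket IDs fit
// in an int (hence the "hacky" caveat about casting long ordinals down).
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

import org.apache.lucene.search.ScoreDoc;

class DeferredReplay {

    // Flatten the per-bucket samples into one list sorted by docID; the
    // bucket ID rides along in ScoreDoc.shardIndex, so no new objects are
    // allocated per collected doc.
    static List<ScoreDoc> prepareReplay(List<ScoreDoc[]> perBucketSamples) {
        List<ScoreDoc> all = new ArrayList<>();
        for (int bucket = 0; bucket < perBucketSamples.size(); bucket++) {
            for (ScoreDoc sd : perBucketSamples.get(bucket)) {
                sd.shardIndex = bucket; // smuggle the bucket ID into shardIndex
                all.add(sd);
            }
        }
        // Replay must proceed in docID order; the same doc may appear once
        // per bucket, so ties on doc are expected and harmless.
        all.sort(Comparator.comparingInt((ScoreDoc sd) -> sd.doc));
        return all;
    }
}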

@jpountz (Contributor) commented Apr 23, 2015

This hack sounds OK to me; if you have more than Integer.MAX_VALUE buckets to collect for such an aggregator, you will have other issues anyway.
