Sampler aggregation #10221

markharwood · 2015-03-23T15:28:20Z

Used to limit any nested aggregations' processing to a sample of the top-scoring documents.

Optionally, a “diversify” setting can limit the number of collected matches that share a common value such as an "author".
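
As a rough illustration of the behaviour described above, here is a minimal sketch that keeps the top-scoring hits on one shard while capping how many may share the same value. It is a greedy, sort-based approximation written for this note, not the Lucene-backed collection the PR relies on. `DiversifiedSampleSketch`, `Hit`, `sample`, `shardSize` and `maxDocsPerValue` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DiversifiedSampleSketch {

    /** Hypothetical stand-in for a scored hit on a single shard. */
    public static final class Hit {
        public final int docId;
        public final float score;
        public final String author; // value of the diversifying field

        public Hit(int docId, float score, String author) {
            this.docId = docId;
            this.score = score;
            this.author = author;
        }
    }

    /**
     * Returns up to shardSize hits in descending score order, keeping at most
     * maxDocsPerValue hits for any single author value.
     */
    public static List<Hit> sample(List<Hit> hits, int shardSize, int maxDocsPerValue) {
        // Copy before sorting so the caller's list is left untouched.
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());

        Map<String, Integer> keptPerValue = new HashMap<>();
        List<Hit> sample = new ArrayList<>();
        for (Hit hit : sorted) {
            if (sample.size() >= shardSize) {
                break; // the sample is full
            }
            int kept = keptPerValue.getOrDefault(hit.author, 0);
            if (kept >= maxDocsPerValue) {
                continue; // this author already has its quota of hits
            }
            keptPerValue.put(hit.author, kept + 1);
            sample.add(hit);
        }
        return sample;
    }
}
```

Any sub-aggregations would then run only over the returned hits rather than over every matching document.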

The original "DeferringBucketCollector" is now abstract. The bulk of the original code moves to the new subclass BestBucketsDeferringCollector, and the new alternative deferring policy is implemented in the BestDocsDeferringCollector subclass.

The diversifying logic relies on Lucene 5.1, which has changes to support this specialized form of result collection.

Closes #8108

colings86 · 2015-03-23T15:52:33Z

src/main/java/org/elasticsearch/search/aggregations/bucket/BestBucketsDeferringCollector.java

+import java.util.ArrayList;
+import java.util.List;
+
+public class BestBucketsDeferringCollector extends DeferringBucketCollector {


Now that we have multiple implementations of DeferringBucketCollector, could we have a class-level Javadoc on the implementation to describe briefly what each one is trying to achieve?

colings86 · 2015-03-24T10:07:58Z

docs/reference/search/aggregations/bucket/sampler-aggregation.asciidoc

+[[search-aggregations-bucket-sampler-aggregation]]
+=== Sampler Aggregation
+
+A filtering aggregation used to limit any nested aggregations' processing to a sample of the top-scoring documents.


Can we say sub-aggregation here so we don't overload 'nested aggregation' which could cause confusion?

markharwood · 2015-04-14T11:16:57Z

@colings86 rebased on latest master if you get a chance to review

colings86 · 2015-04-14T12:20:34Z

docs/reference/search/aggregations/bucket/sampler-aggregation.asciidoc

@@ -0,0 +1,160 @@
+[[search-aggregations-bucket-sampler-aggregation]]
+=== Sampler Aggregation


I can't see it if it's already there, but we should mark this feature as experimental in the docs.

colings86 · 2015-04-15T14:57:23Z

@markharwood Left some comments

markharwood · 2015-04-15T17:52:13Z

@colings86 Thanks for the review. I added a couple of comments above on execution_hint test coverage and updated the code based on your other comments.

markharwood · 2015-04-16T11:29:34Z

@jpountz @clintongormley This PR allows users to do analytics on a sample where you can also choose to diversify results on the basis of a particular field (e.g. analyse top X tweets but no more than Y tweets from a single Twitter account on each shard).

The question is what is the least-worst thing to do on each shard given the unmapped-field problem, i.e. when the chosen diversifying field doesn't exist on one of the indexes/shards being queried:

Throw an error
Return no results (because we can't guarantee diversification)
Return top results but without applying any of the diversity constraints


markharwood · 2015-04-20T13:11:40Z

Took a decision with Colin on the 2 remaining questions:

execution_hint follows the precedent set in the terms agg: it never errors, e.g. when it is used with a numeric field it is simply ignored.
Unmapped choices of diversifying field will return no results rather than a sample of undiversified results.
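
The unmapped-field decision can be sketched as follows, assuming a shard only needs to know which fields it has mapped. The three policies mirror the options listed earlier in the thread, and `EMPTY_SAMPLE` is the one chosen here. `UnmappedDiversityPolicySketch`, `Policy` and `sampleOnShard` are hypothetical names, not classes from this PR, and the diversity filtering itself is left out.

```java
import java.util.Collections;
import java.util.List;
import java.util.Set;

public class UnmappedDiversityPolicySketch {

    /** The three options raised in the discussion; the names are hypothetical. */
    public enum Policy { THROW_ERROR, EMPTY_SAMPLE, UNDIVERSIFIED_SAMPLE }

    /**
     * Decides what a shard returns when asked to diversify on a field.
     * topDocIds stands in for the shard's top-scoring documents.
     */
    public static List<Integer> sampleOnShard(Set<String> mappedFields,
                                              String diversifyField,
                                              List<Integer> topDocIds,
                                              Policy policy) {
        if (mappedFields.contains(diversifyField)) {
            return topDocIds; // field exists; diversity filtering elided in this sketch
        }
        switch (policy) {
            case THROW_ERROR:
                throw new IllegalArgumentException("Unmapped diversifying field: " + diversifyField);
            case UNDIVERSIFIED_SAMPLE:
                return topDocIds; // top results without any diversity constraint
            case EMPTY_SAMPLE:
            default:
                return Collections.emptyList(); // diversity cannot be guaranteed
        }
    }
}
```

Returning an empty sample means a shard that lacks the field contributes nothing, so the merged result never mixes diversified and undiversified samples.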

markharwood · 2015-04-20T13:12:15Z

Poke @colings86

colings86 · 2015-04-20T15:32:53Z

LGTM

markharwood · 2015-04-21T09:41:08Z

Pushed to master 63db34f

markharwood added >feature v2.0.0-beta1 review labels Mar 23, 2015

colings86 reviewed Mar 23, 2015

s1monw assigned colings86 Mar 24, 2015

colings86 reviewed Mar 24, 2015

colings86 reviewed Apr 14, 2015

markharwood added 3 commits April 20, 2015 11:17

2dc62e9: New feature - Sampler aggregation used to limit any nested aggregations' processing to a sample of the top-scoring documents. Optionally, a “diversify” setting can limit the number of collected matches that share a common value such as an "author". Closes #8108

33d10ba: Small changes: DiversifiedSamplerAggregator renamed to DiversifiedMapSamplerAggregator, added nestedSamples test.

e219732: Fixed issue with unmapped choices of diversifying field. Added tests for this condition.

markharwood added :Analytics/Aggregations Aggregations and removed review labels Apr 21, 2015

markharwood closed this Apr 21, 2015

clintongormley mentioned this pull request May 25, 2015

Aggregations: new “Sampler” provides a filter for top-scoring docs #8191

Closed

clintongormley changed the title ~~New feature - Sampler aggregation~~ Sampler aggregation Jun 6, 2015

colings86 mentioned this pull request Aug 4, 2016

Should we remove/modify some of the experiment tags in the documentation #19798

Closed
