Docs: Update execution hint docs for Significant terms agg
copied over the relevant pieces from the terms agg

Closes elastic#8532
bleskes committed Nov 18, 2014
1 parent e736dd5 commit 053e13f
Showing 1 changed file with 25 additions and 7 deletions.
@@ -474,12 +474,29 @@ http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html#UNIX_LINES

===== Execution hint

There are two mechanisms by which terms aggregations can be executed: either by using field values directly in order to aggregate
data per-bucket (`map`), or by using ordinals of the field values instead of the values themselves (`ordinals`). Although the
latter execution mode can be expected to be slightly faster, it is only available for use when the underlying data source exposes
those terms ordinals. Moreover, it may actually be slower if most field values are unique. Elasticsearch tries to have sensible
defaults when it comes to the execution mode that should be used, but in case you know that an execution mode may perform better
than the other one, you have the ability to provide Elasticsearch with a hint:
added[1.2.0] Added the `global_ordinals`, `global_ordinals_hash` and `global_ordinals_low_cardinality` execution modes

deprecated[1.3.0] Removed the `ordinals` execution mode

There are different mechanisms by which terms aggregations can be executed:

- by using field values directly in order to aggregate data per-bucket (`map`)
- by using ordinals of the field and preemptively allocating one bucket per ordinal value (`global_ordinals`)
- by using ordinals of the field and dynamically allocating one bucket per ordinal value (`global_ordinals_hash`)

Elasticsearch tries to have sensible defaults, so this generally doesn't need to be configured.

`map` should only be considered when very few documents match a query. Otherwise the ordinals-based execution modes
are significantly faster. By default, `map` is only used when running an aggregation on scripts, since they don't have
ordinals.

`global_ordinals` is the second fastest option, but the fact that it preemptively allocates buckets can be memory-intensive,
especially if you have one or more sub aggregations. It is used by default on top-level terms aggregations.

`global_ordinals_hash`, in contrast to `global_ordinals` and `global_ordinals_low_cardinality`, allocates buckets dynamically,
so memory usage is linear in the number of values of the documents that are part of the aggregation scope. It is used by default
in inner aggregations.
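
The difference between preemptive and dynamic bucket allocation described above can be illustrated with a small conceptual sketch. This is not Elasticsearch's implementation: `countDense` and `countSparse` are hypothetical names, and ordinals are modeled simply as small integers assigned to distinct terms.

```javascript
// Conceptual sketch only; not Elasticsearch code.
// Each matching document is represented by the ordinal of its term.

// global_ordinals-style: one counter per ordinal, allocated up front.
function countDense(docOrdinals, ordinalCount) {
  const buckets = new Array(ordinalCount).fill(0); // memory grows with ordinalCount
  for (const ord of docOrdinals) {
    buckets[ord] += 1;
  }
  return buckets;
}

// global_ordinals_hash-style: counters are created only for ordinals that occur.
function countSparse(docOrdinals) {
  const buckets = new Map(); // memory grows with the distinct ordinals seen
  for (const ord of docOrdinals) {
    buckets.set(ord, (buckets.get(ord) || 0) + 1);
  }
  return buckets;
}

// Hypothetical input: three matching documents out of 1000 possible ordinals.
const matching = [3, 3, 7];
const dense = countDense(matching, 1000);
const sparse = countSparse(matching);
```

In this toy model the dense variant holds 1000 counters regardless of how many documents match, while the sparse variant holds one counter per distinct ordinal that actually appears. That is the trade-off the text attributes to the two modes.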


[source,js]
--------------------------------------------------
@@ -495,6 +512,7 @@ than the other one, you have the ability to provide Elasticsearch with a hint:
}
--------------------------------------------------

<1> the possible values are `map` and `ordinals`
<1> the possible values are `map`, `global_ordinals` and `global_ordinals_hash`

Please note that Elasticsearch will ignore this execution hint if it is not applicable.
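
As a rough illustration of where the hint goes, the following sketch builds a request body for a significant terms aggregation and sets `execution_hint` on it. The aggregation name `significant_tags` and the field `tags` are placeholders assumed for this example; they are not taken from the commit.

```javascript
// Hypothetical request body: the aggregation name and field are placeholders.
const requestBody = {
  aggs: {
    significant_tags: {
      significant_terms: {
        field: "tags",
        // One of the values listed above: map, global_ordinals, global_ordinals_hash.
        execution_hint: "global_ordinals_hash"
      }
    }
  }
};

// Serialize to the JSON that would be sent as the search request body.
const requestJson = JSON.stringify(requestBody, null, 2);
console.log(requestJson);
```

Because Elasticsearch ignores a hint that is not applicable, setting the value this way expresses a preference and does not guarantee a particular execution mode.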
