Commit

Documented the query cache module

Related to #7161 and #7167

clintongormley authored and areek committed Sep 8, 2014
1 parent 5710c22 commit c38252d
Showing 6 changed files with 233 additions and 51 deletions.
2 changes: 2 additions & 0 deletions docs/reference/index-modules.asciidoc
@@ -72,6 +72,8 @@ include::index-modules/translog.asciidoc[]

include::index-modules/cache.asciidoc[]

include::index-modules/query-cache.asciidoc[]

include::index-modules/fielddata.asciidoc[]

include::index-modules/codec.asciidoc[]
145 changes: 145 additions & 0 deletions docs/reference/index-modules/query-cache.asciidoc
@@ -0,0 +1,145 @@
[[index-modules-shard-query-cache]]
== Shard query cache

coming[1.4.0]

When a search request is run against an index or against many indices, each
involved shard executes the search locally and returns its local results to
the _coordinating node_, which combines these shard-level results into a
``global'' result set.

The shard-level query cache module caches the local results on each shard.
This allows frequently used (and potentially heavy) search requests to return
results almost instantly. The query cache is a very good fit for the logging
use case, where only the most recent index is being actively updated --
results from older indices will be served directly from the cache.

[IMPORTANT]
==================================
For now, the query cache will only cache the results of search requests
where <<count,`?search_type=count`>> is used, so it will not cache `hits`,
but it will cache `hits.total`, <<search-aggregations,aggregations>>, and
<<search-suggesters,suggestions>>.
Queries that use `now` (see <<date-math>>) cannot be cached.
==================================

[float]
=== Cache invalidation

The cache is smart -- it keeps the same _near real-time_ promise as uncached
search.

Cached results are invalidated automatically whenever the shard refreshes, but
only if the data in the shard has actually changed. In other words, you will
always get the same results from the cache as you would for an uncached search
request.

The longer the refresh interval, the longer that cached entries will remain
valid. If the cache is full, the least recently used cache keys will be
evicted.
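For example, if your data can tolerate results that are up to 30 seconds old,
lengthening the refresh interval keeps cached entries valid for longer. A
minimal sketch using the <<indices-update-settings,`update-settings`>> API
(the `30s` value is purely illustrative):

[source,json]
-----------------------------
curl -XPUT localhost:9200/my_index/_settings -d'
{ "index.refresh_interval": "30s" }
'
-----------------------------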

The cache can be expired manually with the <<indices-clearcache,`clear-cache` API>>:

[source,json]
------------------------
curl -XPOST 'localhost:9200/kimchy,elasticsearch/_cache/clear?query_cache=true'
------------------------

[float]
=== Enabling caching by default

The cache is not enabled by default, but can be enabled when creating a new
index as follows:

[source,json]
-----------------------------
curl -XPUT localhost:9200/my_index -d'
{
"settings": {
"index.cache.query.enable": true
}
}
'
-----------------------------

It can also be enabled or disabled dynamically on an existing index with the
<<indices-update-settings,`update-settings`>> API:

[source,json]
-----------------------------
curl -XPUT localhost:9200/my_index/_settings -d'
{ "index.cache.query.enable": true }
'
-----------------------------

[float]
=== Enabling caching per request

The `query_cache` query-string parameter can be used to enable or disable
caching on a *per-query* basis. If set, it overrides the index-level setting:

[source,json]
-----------------------------
curl 'localhost:9200/my_index/_search?search_type=count&query_cache=true' -d'
{
"aggs": {
"popular_colors": {
"terms": {
"field": "colors"
}
}
}
}
'
-----------------------------

IMPORTANT: If your query uses a script whose result is not deterministic (e.g.
it uses a random function or references the current time) you should set the
`query_cache` flag to `false` to disable caching for that request.
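For example, to bypass the cache for a single request, reusing the
`popular_colors` aggregation from above purely for illustration:

[source,json]
-----------------------------
curl 'localhost:9200/my_index/_search?search_type=count&query_cache=false' -d'
{
  "aggs": {
    "popular_colors": {
      "terms": {
        "field": "colors"
      }
    }
  }
}
'
-----------------------------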

[float]
=== Cache key

The whole JSON body is used as the cache key. This means that if the JSON
changes -- for instance if keys are output in a different order -- then the
cache key will not be recognised.

TIP: Most JSON libraries support a _canonical_ mode which ensures that JSON
keys are always emitted in the same order. This canonical mode can be used in
the application to ensure that a request is always serialized in the same way.
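As an illustration, the two bodies below describe exactly the same search, but
because their keys are serialized in a different order they would be stored
under two separate cache keys:

[source,json]
-----------------------------
{ "query": { "match_all": {} }, "aggs": { "popular_colors": { "terms": { "field": "colors" } } } }

{ "aggs": { "popular_colors": { "terms": { "field": "colors" } } }, "query": { "match_all": {} } }
-----------------------------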

[float]
=== Cache settings

The cache is managed at the node level, and has a default maximum size of `1%`
of the heap. This can be changed in the `config/elasticsearch.yml` file with:

[source,yaml]
--------------------------------
indices.cache.query.size: 2%
--------------------------------

Also, you can use the `indices.cache.query.expire` setting to specify a TTL
for cached results, but there should be no reason to do so. Remember that
stale results are automatically invalidated when the index is refreshed. This
setting is provided for completeness' sake only.
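If you do decide to set a TTL anyway, it lives alongside the size setting in
`config/elasticsearch.yml`; the `60m` value below is purely illustrative:

[source,yaml]
--------------------------------
indices.cache.query.expire: 60m
--------------------------------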

[float]
=== Monitoring cache usage

The size of the cache (in bytes) and the number of evictions can be viewed
by index, with the <<indices-stats,`indices-stats`>> API:

[source,json]
------------------------
curl -XGET 'localhost:9200/_stats/query_cache?pretty&human'
------------------------

or by node with the <<cluster-nodes-stats,`nodes-stats`>> API:

[source,json]
------------------------
curl -XGET 'localhost:9200/_nodes/stats/indices/query_cache?pretty&human'
------------------------
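As a rough illustration only -- the exact field names are an assumption and
may differ between versions -- the `query_cache` section of the response
reports the cache size and the eviction count, along these lines:

[source,json]
------------------------
"query_cache": {
  "memory_size_in_bytes": 4759234,
  "evictions": 12
}
------------------------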
6 changes: 3 additions & 3 deletions docs/reference/indices/clearcache.asciidoc
@@ -9,9 +9,9 @@ associated with one or more indices.
$ curl -XPOST 'http://localhost:9200/twitter/_cache/clear'
--------------------------------------------------

The API, by default, will clear all caches. Specific caches can be
cleaned explicitly by setting `filter`, `field_data` or `id_cache` to
`true`.
The API, by default, will clear all caches. Specific caches can be cleaned
explicitly by setting `filter`, `field_data`, `query_cache` coming[1.4.0],
or `id_cache` to `true`.
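For example, to clear only the filter and query caches on the `twitter`
index, leaving field data untouched (a sketch):

[source,json]
--------------------------------------------------
curl -XPOST 'http://localhost:9200/twitter/_cache/clear?filter=true&query_cache=true'
--------------------------------------------------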

All caches relating to a specific field(s) can also be cleared by
specifying `fields` parameter with a comma delimited list of the
36 changes: 24 additions & 12 deletions docs/reference/indices/stats.asciidoc
@@ -39,20 +39,32 @@ specified as well in the URI. Those stats can be any of:
groups). The `groups` parameter accepts a comma separated list of group names.
Use `_all` to return statistics for all groups.

`warmer`:: Warmer statistics.
`merge`:: Merge statistics.
`fielddata`:: Fielddata statistics.
`flush`:: Flush statistics.
`completion`:: Completion suggest statistics.
`refresh`:: Refresh statistics.
`suggest`:: Suggest statistics.

Some statistics allow per field granularity which accepts a comma-separated list of included fields. By default all fields are included:
`completion`:: Completion suggest statistics.
`fielddata`:: Fielddata statistics.
`flush`:: Flush statistics.
`merge`:: Merge statistics.
`query_cache`:: <<index-modules-shard-query-cache,Shard query cache>> statistics. coming[1.4.0]
`refresh`:: Refresh statistics.
`suggest`:: Suggest statistics.
`warmer`:: Warmer statistics.
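For example, a hedged sketch of limiting the response to a few of the
statistics above (the `my_index` name is illustrative):

[source,json]
--------------------------------------------------
curl 'localhost:9200/my_index/_stats/refresh,merge,query_cache'
--------------------------------------------------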

Some statistics allow per-field granularity, which accepts a comma-separated
list of included fields. By default all fields are included:

[horizontal]
`fields`:: List of fields to be included in the statistics. This is used as the default list unless a more specific field list is provided (see below).
`completion_fields`:: List of fields to be included in the Completion Suggest statistics
`fielddata_fields`:: List of fields to be included in the Fielddata statistics
`fields`::

List of fields to be included in the statistics. This is used as the
default list unless a more specific field list is provided (see below).

`completion_fields`::

List of fields to be included in the Completion Suggest statistics.

`fielddata_fields`::

List of fields to be included in the Fielddata statistics.


Here are some samples:

18 changes: 15 additions & 3 deletions docs/reference/search/aggregations.asciidoc
@@ -104,9 +104,9 @@ are being aggregated. The values are typically extracted from the fields of the
can also be generated using scripts.

Numeric metrics aggregations are a special type of metrics aggregation which output numeric values. Some aggregations output
a single numeric metric (e.g. `avg`) and are called `single-value numeric metrics aggregation`, others generate multiple
metrics (e.g. `stats`) and are called `multi-value numeric metrics aggregation`. The distinction between single-value and
multi-value numeric metrics aggregations plays a role when these aggregations serve as direct sub-aggregations of some
bucket aggregations (some bucket aggregations enable you to sort the returned buckets based on the numeric metrics in each bucket).


@@ -125,6 +125,18 @@ aggregated for the buckets created by their "parent" bucket aggregation.
There are different bucket aggregators, each with a different "bucketing" strategy. Some define a single bucket, some
define fixed number of multiple buckets, and others dynamically create the buckets during the aggregation process.

[float]
=== Caching heavy aggregations

coming[1.4.0]

Frequently used aggregations (e.g. for display on the home page of a website)
can be cached for faster responses. These cached results are the same results
that would be returned by an uncached aggregation -- you will never get stale
results.

See <<index-modules-shard-query-cache>> for more details.
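For example, a home-page aggregation might be cached on a per-request basis as
follows -- the index, field, and aggregation names are illustrative only:

[source,json]
--------------------------------------------------
curl 'localhost:9200/website/_search?search_type=count&query_cache=true' -d'
{
  "aggs": {
    "popular_tags": {
      "terms": { "field": "tags" }
    }
  }
}
'
--------------------------------------------------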


include::aggregations/metrics.asciidoc[]

77 changes: 44 additions & 33 deletions docs/reference/search/request-body.asciidoc
@@ -46,39 +46,50 @@ And here is a sample response:
[float]
=== Parameters

[cols="<,<",options="header",]
|=======================================================================
|Name |Description
|`timeout` |A search timeout, bounding the search request to be executed
within the specified time value and bail with the hits accumulated up to
that point when expired. Defaults to no timeout. See <<time-units>>.

|`from` |The starting from index of the hits to return. Defaults to `0`.

|`size` |The number of hits to return. Defaults to `10`.

|`search_type` |The type of the search operation to perform. Can be
`dfs_query_then_fetch`, `dfs_query_and_fetch`, `query_then_fetch`,
`query_and_fetch`. Defaults to `query_then_fetch`. See
<<search-request-search-type,_Search Type_>> for
more details on the different types of search that can be performed.

|coming[1.4.0] `terminate_after` |The maximum number of documents to collect for
each shard, upon reaching which the query execution will terminate early.
If set, the response will have a boolean field `terminated_early` to
indicate whether the query execution has actually terminated_early.
Defaults to no terminate_after.
|=======================================================================

Out of the above, the `search_type` is the one that can not be passed
within the search request body, and in order to set it, it must be
passed as a request REST parameter.

The rest of the search request should be passed within the body itself.
The body content can also be passed as a REST parameter named `source`.

Both HTTP GET and HTTP POST can be used to execute search with body.
Since not all clients support GET with body, POST is allowed as well.
[horizontal]
`timeout`::

A search timeout, bounding the search request to be executed within the
specified time value and bail with the hits accumulated up to that point
when expired. Defaults to no timeout. See <<time-units>>.

`from`::

The starting from index of the hits to return. Defaults to `0`.

`size`::

The number of hits to return. Defaults to `10`.

`search_type`::

The type of the search operation to perform. Can be
`dfs_query_then_fetch`, `dfs_query_and_fetch`, `query_then_fetch`,
`query_and_fetch`. Defaults to `query_then_fetch`. See
<<search-request-search-type,_Search Type_>> for more.

`query_cache`::

coming[1.4.0] Set to `true` or `false` to enable or disable the caching
of search results for requests where `?search_type=count` is used, i.e.
aggregations and suggestions. See <<index-modules-shard-query-cache>>.

`terminate_after`::

coming[1.4.0] The maximum number of documents to collect for each shard,
upon reaching which the query execution will terminate early. If set, the
response will have a boolean field `terminated_early` to indicate whether
the query execution has actually terminated early. By default, no
`terminate_after` is applied.


Out of the above, the `search_type` and the `query_cache` must be passed as
query-string parameters. The rest of the search request should be passed
within the body itself. The body content can also be passed as a REST
parameter named `source`.

Both HTTP GET and HTTP POST can be used to execute search with body. Since not
all clients support GET with body, POST is allowed as well.
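For example, a minimal sketch of a request that sets some of the body
parameters above (the index name and query are illustrative):

[source,json]
--------------------------------------------------
curl -XPOST 'localhost:9200/my_index/_search' -d'
{
  "from": 0,
  "size": 10,
  "timeout": "10s",
  "query": { "match_all": {} }
}
'
--------------------------------------------------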


include::request/query.asciidoc[]
