Cut over to the Lucene filter cache #10897

Merged
merged 1 commit into from May 4, 2015
3 changes: 0 additions & 3 deletions dev-tools/forbidden/core-signatures.txt
@@ -39,9 +39,6 @@ org.apache.lucene.index.IndexReader#decRef()
org.apache.lucene.index.IndexReader#incRef()
org.apache.lucene.index.IndexReader#tryIncRef()

@defaultMessage QueryWrapperFilter is cacheable by default - use Queries#wrap instead
org.apache.lucene.search.QueryWrapperFilter#<init>(org.apache.lucene.search.Query)

@defaultMessage Pass the precision step from the mappings explicitly instead
org.apache.lucene.search.NumericRangeQuery#newDoubleRange(java.lang.String,java.lang.Double,java.lang.Double,boolean,boolean)
org.apache.lucene.search.NumericRangeQuery#newFloatRange(java.lang.String,java.lang.Float,java.lang.Float,boolean,boolean)
3 changes: 0 additions & 3 deletions docs/reference/cluster/update-settings.asciidoc
@@ -153,9 +153,6 @@ due to forced awareness or allocation filtering.
`indices.cache.filter.size`::
See <<index-modules-cache>>

`indices.cache.filter.expire` (time)::
See <<index-modules-cache>>

[float]
==== TTL interval

6 changes: 6 additions & 0 deletions docs/reference/migration/migrate_2_0.asciidoc
@@ -418,6 +418,12 @@ favour or `bool`.
The `execution` option of the `terms` filter is now deprecated and ignored if
provided.

The `_cache` and `_cache_key` parameters of filters are deprecated in the REST
layer and removed in the Java API. If they are specified, they are ignored.
Instead, filters are always used as their own cache key, and Elasticsearch
decides by itself whether to cache a filter based on how often it is used.
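
As an illustration of this note (a sketch, not part of the committed text), a
`prefix` filter that previously opted in to caching would have looked like the
following. The field name and value are only examples:

[source,js]
--------------------------------------------------
{
    "constant_score" : {
        "filter" : {
            "prefix" : {
                "user" : "ki",
                "_cache" : true <1>
            }
        }
    }
}
--------------------------------------------------
<1> Deprecated: still accepted by the REST layer, but ignored.

Assuming no other changes are needed, the equivalent request simply drops the
parameter and leaves the caching decision to Elasticsearch:

[source,js]
--------------------------------------------------
{
    "constant_score" : {
        "filter" : {
            "prefix" : { "user" : "ki" }
        }
    }
}
--------------------------------------------------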

=== Snapshot and Restore

The obsolete parameters `expand_wildcards_open` and `expand_wildcards_close` are no longer
87 changes: 8 additions & 79 deletions docs/reference/query-dsl/filters.asciidoc
@@ -10,85 +10,14 @@ As a general rule, filters should be used instead of queries:
[[caching]]
=== Filters and Caching

Filters can be a great candidate for caching. Caching the result of a
filter does not require a lot of memory, and will cause other queries
executing against the same filter (same parameters) to be blazingly
fast.

However the cost of caching is not the same for all filters. For
instance some filters are already fast out of the box while caching could
add significant overhead, and some filters produce results that are already
cacheable so caching them is just a matter of putting the result in the
cache.

The default caching policy, `_cache: auto`, tracks the 1000 most recently
used filters on a per-index basis and makes decisions based on their
frequency.

[float]
==== Filters that read directly the index structure

Some filters can directly read the index structure and potentially jump
over large sequences of documents that are not worth evaluating (for
instance when these documents do not match the query). Caching these
filters introduces overhead given that all documents that the filter
matches need to be consumed in order to be loaded into the cache.

These filters, which include the <<query-dsl-term-filter,term>> and
<<query-dsl-term-query,query>> filters, are only cached after they
appear 5 times or more in the history of the 1000 most recently used
filters.

[float]
==== Filters that produce results that are already cacheable

Some filters produce results that are already cacheable, and the difference
between caching and not caching them is the act of placing the result in
the cache or not. These filters, which include the
<<query-dsl-terms-filter,terms>>,
<<query-dsl-prefix-filter,prefix>>, and
<<query-dsl-range-filter,range>> filters, are by default cached after they
appear twice or more in the history of the most 1000 recently used filters.

[float]
==== Computational filters

Some filters need to run some computation in order to figure out whether
a given document matches a filter. These filters, which include the geo and
<<query-dsl-script-filter,script>> filters, but also the
<<query-dsl-terms-filter,terms>> and <<query-dsl-range-filter,range>>
filters when using the `fielddata` execution mode are never cached by default,
as it would require to evaluate the filter on all documents in your indices
while they can otherwise be only evaluated on documents that match the query.

[float]
==== Compound filters

The last type of filters are those working with other filters, and includes
the <<query-dsl-bool-filter,bool>>,
<<query-dsl-and-filter,and>>,
<<query-dsl-not-filter,not>> and
<<query-dsl-or-filter,or>> filters.

There is no general rule about these filters. Depending on the filters that
they wrap, they will sometimes return a filter that dynamically evaluates the
sub filters and sometimes evaluate the sub filters eagerly in order to return
a result that is already cacheable, so depending on the case, these filters
will be cached after they appear 2+ or 5+ times in the history of the most
1000 recently used filters.

[float]
==== Overriding the default behaviour

All filters allow to set `_cache` element on them to explicitly control
caching. It accepts 3 values: `true` in order to cache the filter, `false`
to make sure that the filter will not be cached, and `auto`, which is the
default and will decide on whether to cache the filter based on the cost
to cache it and how often it has been used as explained above.

Filters also allow to set `_cache_key` which will be used as the
caching key for that filter. This can be handy when using very large
filters (like a terms filter with many elements in it).
Filters can be a great candidate for caching. Caching the document set that
a filter matches does not require much memory and can help improve
execution speed of queries.

Elasticsearch decides whether to cache filters based on how often they are
used. For this reason you might occasionally see better performance by
splitting complex filters into a static part that Elasticsearch will cache and
a dynamic part which is less costly than the original filter.
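
A minimal sketch of such a split (illustrative only, not part of the committed
text) is shown below. It assumes a `bool` filter whose clauses may be cached
independently. The field names and values are hypothetical, and whether the
static clause is actually cached depends on how often it is reused:

[source,js]
--------------------------------------------------
{
    "filtered" : {
        "query" : {
            "term" : { "name.first" : "shay" }
        },
        "filter" : {
            "bool" : {
                "must" : [
                    {
                        "prefix" : { "name.second" : "ba" } <1>
                    },
                    {
                        "range" : {
                            "postDate" : { <2>
                                "from" : "2010-03-01",
                                "to" : "2010-04-01"
                            }
                        }
                    }
                ]
            }
        }
    }
}
--------------------------------------------------
<1> Static part (assumed identical across requests): a candidate for caching
once it has been seen often enough.
<2> Dynamic part (assumed to change from request to request): unlikely to be
reused, but cheaper to evaluate on its own than the combined filter.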

include::filters/and-filter.asciidoc[]

37 changes: 0 additions & 37 deletions docs/reference/query-dsl/filters/and-filter.asciidoc
@@ -32,40 +32,3 @@ filters. Can be placed within queries that accept a filter.
}
--------------------------------------------------

[float]
==== Caching

The result of the filter is only cached by default if there is evidence of
reuse. It is possible to opt-in explicitely for caching by setting `_cache`
to `true`. Since the `_cache` element requires to be set on the `and` filter
itself, the structure then changes a bit to have the filters provided within a
`filters` element:

[source,js]
--------------------------------------------------
{
"filtered" : {
"query" : {
"term" : { "name.first" : "shay" }
},
"filter" : {
"and" : {
"filters": [
{
"range" : {
"postDate" : {
"from" : "2010-03-01",
"to" : "2010-04-01"
}
}
},
{
"prefix" : { "name.second" : "ba" }
}
],
"_cache" : true
}
}
}
}
--------------------------------------------------
@@ -230,11 +230,3 @@ are not supported. Here is an example:
}
--------------------------------------------------

[float]
==== Caching

The result of the filter is not cached by default. The `_cache` can be
set to `true` to cache the *result* of the filter. This is handy when
the same bounding box parameters are used on several (many) other
queries. Note, the process of caching the first execution is higher when
caching (since it needs to satisfy different queries).
8 changes: 0 additions & 8 deletions docs/reference/query-dsl/filters/geo-distance-filter.asciidoc
@@ -172,11 +172,3 @@ The `geo_distance` filter can work with multiple locations / points per
document. Once a single location / point matches the filter, the
document will be included in the filter.

[float]
==== Caching

The result of the filter is not cached by default. The `_cache` can be
set to `true` to cache the *result* of the filter. This is handy when
the same point and distance parameters are used on several (many) other
queries. Note, the process of caching the first execution is higher when
caching (since it needs to satisfy different queries).
8 changes: 0 additions & 8 deletions docs/reference/query-dsl/filters/geo-polygon-filter.asciidoc
@@ -116,11 +116,3 @@ The filter *requires* the
<<mapping-geo-point-type,geo_point>> type to be
set on the relevant field.

[float]
==== Caching

The result of the filter is not cached by default. The `_cache` can be
set to `true` to cache the *result* of the filter. This is handy when
the same points parameters are used on several (many) other queries.
Note, the process of caching the first execution is higher when caching
(since it needs to satisfy different queries).
9 changes: 0 additions & 9 deletions docs/reference/query-dsl/filters/geo-shape-filter.asciidoc
@@ -110,12 +110,3 @@ shape:
}
--------------------------------------------------

[float]
==== Caching

The result of the Filter is not cached by default. Setting `_cache` to
`true` will mean the results of the Filter will be cached. Since shapes
can contain 10s-100s of coordinates and any one differing means a new
shape, it may make sense to only using caching when you are sure that
the shapes will remain reasonably static.

7 changes: 0 additions & 7 deletions docs/reference/query-dsl/filters/geohash-cell-filter.asciidoc
@@ -61,10 +61,3 @@ next to the given cell.
}
--------------------------------------------------

[float]
==== Caching

The result of the filter is not cached by default. The
`_cache` parameter can be set to `true` to turn caching on.
By default the filter uses the resulting geohash cells as a cache key.
This can be changed by using the `_cache_key` option.
6 changes: 0 additions & 6 deletions docs/reference/query-dsl/filters/has-child-filter.asciidoc
@@ -88,9 +88,3 @@ APIS, eg:
curl -XGET "http://localhost:9200/_stats/id_cache?pretty&human"
--------------------------------------------------

[float]
==== Caching

The `has_child` filter cannot be cached in the filter cache. The `_cache`
and `_cache_key` options are a no-op in this filter. Also any filter that
wraps the `has_child` filter either directly or indirectly will not be cached.
6 changes: 0 additions & 6 deletions docs/reference/query-dsl/filters/has-parent-filter.asciidoc
@@ -63,9 +63,3 @@ APIS, eg:
curl -XGET "http://localhost:9200/_stats/id_cache?pretty&human"
--------------------------------------------------

[float]
==== Caching

The `has_parent` filter cannot be cached in the filter cache. The `_cache`
and `_cache_key` options are a no-op in this filter. Also any filter that
wraps the `has_parent` filter either directly or indirectly will not be cached.
8 changes: 2 additions & 6 deletions docs/reference/query-dsl/filters/nested-filter.asciidoc
@@ -2,10 +2,7 @@
=== Nested Filter

A `nested` filter works in a similar fashion to the
<<query-dsl-nested-query,nested>> query, except it's
used as a filter. It follows exactly the same structure, but also allows
to cache the results (set `_cache` to `true`), and have it named (set
the `_name` value). For example:
<<query-dsl-nested-query,nested>> query. For example:

[source,js]
--------------------------------------------------
@@ -26,8 +23,7 @@ the `_name` value). For example:
}
]
}
},
"_cache" : true
}
}
}
}
30 changes: 0 additions & 30 deletions docs/reference/query-dsl/filters/not-filter.asciidoc
@@ -50,33 +50,3 @@ Or, in a longer form with a `filter` element:
}
--------------------------------------------------

[float]
==== Caching

The result of the filter is only cached if there is evidence of reuse.
The `_cache` can be set to `true` in order to cache it (though usually
not needed). Here is an example:

[source,js]
--------------------------------------------------
{
"filtered" : {
"query" : {
"term" : { "name.first" : "shay" }
},
"filter" : {
"not" : {
"filter" : {
"range" : {
"postDate" : {
"from" : "2010-03-01",
"to" : "2010-04-01"
}
}
},
"_cache" : true
}
}
}
}
--------------------------------------------------
33 changes: 0 additions & 33 deletions docs/reference/query-dsl/filters/or-filter.asciidoc
@@ -27,36 +27,3 @@ filters. Can be placed within queries that accept a filter.
}
--------------------------------------------------

[float]
==== Caching

The result of the filter is only cached by default if there is evidence
of reuse. The `_cache` can be
set to `true` in order to cache it (though usually not needed). Since
the `_cache` element requires to be set on the `or` filter itself, the
structure then changes a bit to have the filters provided within a
`filters` element:

[source,js]
--------------------------------------------------
{
"filtered" : {
"query" : {
"term" : { "name.first" : "shay" }
},
"filter" : {
"or" : {
"filters" : [
{
"term" : { "name.second" : "banon" }
},
{
"term" : { "name.nick" : "kimchy" }
}
],
"_cache" : true
}
}
}
}
--------------------------------------------------
19 changes: 0 additions & 19 deletions docs/reference/query-dsl/filters/prefix-filter.asciidoc
@@ -16,22 +16,3 @@ a filter. Can be placed within queries that accept a filter.
}
--------------------------------------------------

[float]
==== Caching

The result of the filter is cached by default if there is evidence of reuse.
The `_cache` can be set to `true` in order to cache it. Here is an example:

[source,js]
--------------------------------------------------
{
"constant_score" : {
"filter" : {
"prefix" : {
"user" : "ki",
"_cache" : true
}
}
}
}
--------------------------------------------------
31 changes: 0 additions & 31 deletions docs/reference/query-dsl/filters/query-filter.asciidoc
@@ -19,34 +19,3 @@ that accept a filter.
}
--------------------------------------------------

[float]
==== Caching

The result of the filter is only cached by default if there is evidence of reuse.

The `_cache` can be
set to `true` to cache the *result* of the filter. This is handy when
the same query is used on several (many) other queries. Note, the
process of caching the first execution is higher when not caching (since
it needs to satisfy different queries).

Setting the `_cache` element requires a different format for the
`query`:

[source,js]
--------------------------------------------------
{
"constantScore" : {
"filter" : {
"fquery" : {
"query" : {
"query_string" : {
"query" : "this AND that OR thus"
}
},
"_cache" : true
}
}
}
}
--------------------------------------------------