Terms filter lookup caching should cache values, not filters. #9027

Closed

jpountz
Contributor

@jpountz jpountz commented Dec 22, 2014

The terms filter lookup mechanism today caches filters. Because of this, the
cache values depend on two things: the values that can be found in the lookup
index AND the mapping of the local index, since changing the mapping can change
the way that the filter is parsed. We should make the cache depend solely on
the content of the lookup index.

For instance the issue I was seeing was due to the following scenario:

  • create index1 with `_id` indexed
  • run terms filter with caching, the parsed filter looks like `_id: 1 OR _id: 2`
  • remove index1
  • create index1 with `_id` not indexed
  • run terms filter without caching, the parsed filter is `_uid: type#1 OR _uid: type#2` (the `_id` field mapper knows how to use the `_uid` field when `_id` is not indexed)
  • run terms filter with caching, the filter is fetched from the cache: `_id: 1 OR _id: 2` but does not match anything since `_id` is not indexed.
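
The scenario above can be reproduced with a small toy model. This is only a sketch and not Elasticsearch code: `parse_terms_filter`, `FilterCache`, `ValueCache`, `fetch_values` and the `"lookup-key"` string are all hypothetical names, and filters are represented as plain strings. It contrasts a cache that stores the parsed filter with one that stores only the looked-up values and re-parses them against the current mapping.

```python
# Toy model of the two caching strategies. Not Elasticsearch code:
# every name below is hypothetical and chosen for illustration.

def parse_terms_filter(values, id_indexed, doc_type="type"):
    """Build a filter string the way the local mapping dictates."""
    if id_indexed:
        return " OR ".join(f"_id: {v}" for v in values)
    # When _id is not indexed, the filter falls back to the _uid field.
    return " OR ".join(f"_uid: {doc_type}#{v}" for v in values)


class FilterCache:
    """Caches the parsed filter, so the local mapping leaks into the cache."""

    def __init__(self, fetch_values):
        self._fetch_values = fetch_values
        self._cache = {}

    def get_filter(self, lookup_key, id_indexed):
        if lookup_key not in self._cache:
            values = self._fetch_values(lookup_key)
            self._cache[lookup_key] = parse_terms_filter(values, id_indexed)
        # A cache hit ignores the current mapping entirely.
        return self._cache[lookup_key]


class ValueCache:
    """Caches only the looked-up values and re-parses on every request."""

    def __init__(self, fetch_values):
        self._fetch_values = fetch_values
        self._cache = {}

    def get_filter(self, lookup_key, id_indexed):
        if lookup_key not in self._cache:
            self._cache[lookup_key] = tuple(self._fetch_values(lookup_key))
        # Parsing always uses the current mapping.
        return parse_terms_filter(self._cache[lookup_key], id_indexed)


def fetch_values(lookup_key):
    # Stand-in for reading the terms from the lookup index.
    return ["1", "2"]


filter_cache = FilterCache(fetch_values)
value_cache = ValueCache(fetch_values)

# index1 is first created with _id indexed: both caches get populated.
filter_cache.get_filter("lookup-key", id_indexed=True)
value_cache.get_filter("lookup-key", id_indexed=True)

# index1 is removed and recreated with _id not indexed.
stale = filter_cache.get_filter("lookup-key", id_indexed=False)
fresh = value_cache.get_filter("lookup-key", id_indexed=False)

print(stale)  # still targets _id, which is no longer indexed
print(fresh)  # targets _uid, matching the new mapping
```

In this model the value cache pays the parsing cost on every request, which is the trade-off of making the cache depend only on the lookup index.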

@martijnvg
Member

LGTM

@jpountz jpountz closed this in 67eba23 Dec 24, 2014
@jpountz jpountz deleted the fix/terms_lookup_caching branch December 24, 2014 09:11
jpountz added a commit that referenced this pull request Dec 24, 2014
jpountz added a commit that referenced this pull request Dec 24, 2014
jpountz added a commit that referenced this pull request Dec 24, 2014
@lmenezes
Contributor

@jpountz Probably too late for this conversation... but wouldn't it be possible to make this more flexible, letting users choose what to cache?

My particular use case is that I use lookup quite heavily, and the lists of terms can be quite big. Even though this is slow, it's only slow once...

This change will both increase memory usage on my side (keeping the list of terms) and slow things down in general. I use a _cache_key which is hour-based, so I only pay the full price of lookups once every hour, which is quite OK.

I can also see the relationship between this and:
#8573
#9176
#9056

but I still wonder whether it is possible to keep the option of caching the resulting filter of a terms lookup filter.

thanks :)

@clintongormley clintongormley added the :Core/Infra/Core Core issues without another label label Mar 19, 2015
@clintongormley clintongormley changed the title Core: Terms filter lookup caching should cache values, not filters. Terms filter lookup caching should cache values, not filters. Jun 7, 2015
@clintongormley clintongormley added :Search/Search Search-related issues that do not fall into other categories and removed :Core/Infra/Core Core issues without another label labels Jun 7, 2015
mute pushed a commit to mute/elasticsearch that referenced this pull request Jul 29, 2015
mute pushed a commit to mute/elasticsearch that referenced this pull request Jul 29, 2015
Labels
>bug :Search/Search Search-related issues that do not fall into other categories v1.3.8 v1.4.3 v1.5.0 v2.0.0-beta1
4 participants