Query DSL: Allow to associate a custom cache key with a filter #1142

Closed
kimchy opened this issue Jul 21, 2011 · 5 comments

Comments

@kimchy
Member

kimchy commented Jul 21, 2011

Filters, when cached, use the filter itself as the cache key. The filter itself can be quite big, memory-wise, for example when using a terms filter with many terms. Allow associating a custom cache key (which should be unique based on the content of the filter) that will be used instead of the actual filter, thus saving memory. For example, a user-based filter of friends can be cached like this (for a user named kimchy):

{
    "terms" : {
        "friends" : ["first", "second", "third"],
        "_cache_key" : "kimchy_friends"
    }
}

This will cache the filter under kimchy_friends instead of using the filter itself (which includes the whole list of friends).

The _cache_key can be placed on all filters, at the same level as _cache and _name.
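
To illustrate the idea, here is a minimal Python sketch of how a cache key could be chosen. It is not Elasticsearch's actual implementation: the function name `cache_key_for` and the use of canonical JSON as the "filter itself" fallback are assumptions made only for this example.

```python
import json


def cache_key_for(filter_body: dict) -> str:
    """Pick the key a filter would be cached under (hypothetical helper).

    If the filter carries a `_cache_key`, use it; otherwise fall back to the
    filter itself, represented here by its canonical JSON form.
    """
    for params in filter_body.values():
        if isinstance(params, dict) and "_cache_key" in params:
            return params["_cache_key"]
    # Fallback: the whole filter is the key, so the key grows with the filter.
    return json.dumps(filter_body, sort_keys=True)


with_key = {
    "terms": {
        "friends": ["first", "second", "third"],
        "_cache_key": "kimchy_friends",
    }
}
without_key = {"terms": {"friends": ["first", "second", "third"]}}

custom_key = cache_key_for(with_key)
default_key = cache_key_for(without_key)
```

With a long friends list, `default_key` grows with every term, while `custom_key` stays the same short string. The flip side is that the caller is responsible for keeping the custom key unique to the filter's content.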

@kimchy kimchy closed this as completed in fbd6e85 Jul 21, 2011
@clintongormley

Hiya - is this the best way to do this? I can see issues arising when people reuse the _cache_key incorrectly. Wouldn't it be better just to make a SHA1 of the filter, and use that as a unique ID?

@kimchy
Member Author

kimchy commented Jul 21, 2011

@clintongormley People will always make mistakes, but that does not mean we should not expose features because of it. The downside of using SHA1 / MD5 on the data is the cost of calculating it. I'm not saying that we shouldn't do it, possibly even trying to guess the size vs. cost tradeoff automatically, but there should be an option for people to provide the cache key.

@clintongormley

@kimchy My question is: what functional purpose exists for exposing this in the API?

All people care about is whether a filter is cached or not, they shouldn't care what name you use to cache the filter internally.

Re cost, you could just say: if length $cache_id > $max, then $cache_id = sha1($cache_id), so you'll be hashing the minority of cached filters
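
The size check suggested above could look roughly like the following Python sketch. The threshold `MAX_KEY_BYTES` and the function name `compact_cache_key` are hypothetical and are not Elasticsearch settings or APIs; only `hashlib.sha1` is a real library call.

```python
import hashlib

MAX_KEY_BYTES = 256  # hypothetical threshold, chosen only for illustration


def compact_cache_key(raw_key: str, max_bytes: int = MAX_KEY_BYTES) -> str:
    """Hash only the keys that are larger than the threshold."""
    encoded = raw_key.encode("utf-8")
    if len(encoded) <= max_bytes:
        # Small keys are kept as-is, so no hashing cost is paid for them.
        return raw_key
    # Large keys are replaced by a fixed-size 40-character SHA-1 hex digest.
    return hashlib.sha1(encoded).hexdigest()


short_key = compact_cache_key("friends:first,second,third")
long_key = compact_cache_key("friends:" + ",".join(f"user{i}" for i in range(1000)))
```

Here `short_key` is returned unchanged, while `long_key` collapses to a 40-character digest, so the hashing cost is paid only for the oversized keys.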

@kimchy
Member Author

kimchy commented Jul 21, 2011

> All people care about is whether a filter is cached or not, they shouldn't care what name you use to cache the filter internally.

Now :), there are more things that will happen down the road where this makes sense. A simple example is invalidating specific filters based on cache key patterns. There are others as well, though it's too early (nothing is baked yet) to write about them...
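
As a rough illustration of pattern-based invalidation, here is a Python sketch over a plain dictionary. Nothing like this is specified in the issue: the `invalidate` function, the glob-style pattern syntax, and the dictionary cache are all assumptions for the example.

```python
from fnmatch import fnmatchcase


def invalidate(cache: dict, pattern: str) -> list:
    """Remove every entry whose key matches a glob pattern (hypothetical API)."""
    # Collect matches first so the dict is not mutated while iterating over it.
    doomed = [key for key in cache if fnmatchcase(key, pattern)]
    for key in doomed:
        del cache[key]
    return doomed


# Placeholder values stand in for cached filter results.
cache = {
    "kimchy_friends": "cached-result-1",
    "kimchy_groups": "cached-result-2",
    "clint_friends": "cached-result-3",
}
removed = invalidate(cache, "kimchy_*")
```

This only works with human-chosen keys such as `kimchy_friends`. Keys derived from the filter content or from a hash carry no structure to match against.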

> Re cost, you could just say: if length $cache_id > $max, then $cache_id = sha1($cache_id), so you'll be hashing the minority of cached filters

Agreed, that's what I meant by doing size vs. cost. As to whether it's a minority or not, it really depends on the app, so you want to allow them to control it. This size vs. cost thingy can actually be the default for something like the terms filter (the obvious heavy one); maybe open an issue for it?

@clintongormley

Opened: #1146
