New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature request: ability to get the list of keys, or a subset of them. #163

Open
mjharwood opened this Issue Jun 9, 2017 · 6 comments

Comments

Projects
None yet
3 participants
@mjharwood

mjharwood commented Jun 9, 2017

I needed the ability to remove all keys matching particular patterns, where one key might match more than one pattern, so regions would not suffice.

The work around is to have OnAdd/OnRemove hooks that keep track of all the keys, but this is inefficient as many of the backends have clear ways to get the keys (in memory and redis are the ones I'm using for the project).

@MichaCo

This comment has been minimized.

Show comment
Hide comment
@MichaCo

MichaCo Jun 9, 2017

Owner

I totally get that this is kinda a great feature.
The problem is though, distributed cache systems are not meant to be queried. Most of them don't support it at all or have n+1 issues like Redis' Keys command.

What I would suggest is, how about building that feature for in-memory cache only? For in-memory, this is actually relatively easy to control and do and fast.

So, how about make that feature available client side only and in case you have a distributed layer, you'll have to either not use the feature or build your own solution.

Maybe there will be solution for Redis some day with all the new features coming in 4.0, but right now, I really don't want to have any method calling keys in this library. Also, you call keys only for one server. This will not work in a cluster for example.

What do you think?

Owner

MichaCo commented Jun 9, 2017

I totally get that this is kinda a great feature.
The problem is though, distributed cache systems are not meant to be queried. Most of them don't support it at all or have n+1 issues like Redis' Keys command.

What I would suggest is, how about building that feature for in-memory cache only? For in-memory, this is actually relatively easy to control and do and fast.

So, how about make that feature available client side only and in case you have a distributed layer, you'll have to either not use the feature or build your own solution.

Maybe there will be solution for Redis some day with all the new features coming in 4.0, but right now, I really don't want to have any method calling keys in this library. Also, you call keys only for one server. This will not work in a cluster for example.

What do you think?

@mjharwood

This comment has been minimized.

Show comment
Hide comment
@mjharwood

mjharwood Jun 9, 2017

The library for redis (StackExchange.Redis) does use SCAN with a pattern match rather than KEYS if the redis server is new enough to support it (>= 2.8.0), so it's somewhat less intense.

We have two use cases:

  1. In memory + redis + redis backplane: as long as we use the same timeout for both layers, then in-memory should be a near-instantaneous copy of redis (at least for that region). So that would work except for the corner case where it hasn't propagated yet, but you have that same problem with SCAN/KEYS.
  2. Pure redis: choices here are either SCAN or hashes of keys (either one global, or one per pattern).

Your suggestion would work for case 1, and maybe that's more useful to others and thus worth implementing at the library level.

The "hashes of keys" solution is very similar to the way regions support is implemented (at least for the memory and redis cases), except on multiple dimensions rather than just the one.

Example:
Key1:Foo=a:Bar=x
Key2:Foo=a:Bar=y
Key2:Foo=b:Bar=y

and we want to be able to either remove all where "Foo=a" or, at a different time, "Bar=y" (basically deleting on named property)

Maybe there's a better way to solve this particular case.

mjharwood commented Jun 9, 2017

The library for redis (StackExchange.Redis) does use SCAN with a pattern match rather than KEYS if the redis server is new enough to support it (>= 2.8.0), so it's somewhat less intense.

We have two use cases:

  1. In memory + redis + redis backplane: as long as we use the same timeout for both layers, then in-memory should be a near-instantaneous copy of redis (at least for that region). So that would work except for the corner case where it hasn't propagated yet, but you have that same problem with SCAN/KEYS.
  2. Pure redis: choices here are either SCAN or hashes of keys (either one global, or one per pattern).

Your suggestion would work for case 1, and maybe that's more useful to others and thus worth implementing at the library level.

The "hashes of keys" solution is very similar to the way regions support is implemented (at least for the memory and redis cases), except on multiple dimensions rather than just the one.

Example:
Key1:Foo=a:Bar=x
Key2:Foo=a:Bar=y
Key2:Foo=b:Bar=y

and we want to be able to either remove all where "Foo=a" or, at a different time, "Bar=y" (basically deleting on named property)

Maybe there's a better way to solve this particular case.

@MichaCo

This comment has been minimized.

Show comment
Hide comment
@MichaCo

MichaCo Jun 9, 2017

Owner

I think the Redis use-case would always be very dependent on how you setup Redis and what you actually need.
You could still use CacheManager for layered cache and backplane, pass in a ConnectionMultiplexer to the configuration and use that one directly for searching for keys. I think that would be a good compromise.

Storing another lookup in Redis for all keys is just not something I'd like to have. That's not performing very well as you would have to keep it somewhat in sync. There could be millions of keys eventually...

Yes this would be similar to the regions implementation with Redis. But I know that the list of keys, i keep for a region, might not be 100% accurate and it doesn't really matter in that case. Again, if you would store really tons of keys in one region, a clear region would also be pretty slow. But it is a trade off for that feature which I think is fine.

In addition to all that, only Redis has good support for searching at all. Memcached and other distributed caches don't support that at all. That's why I never added that feature in the first place ;)

But again, I think it would be fine to have that implemented for all the in-memory caches.

Owner

MichaCo commented Jun 9, 2017

I think the Redis use-case would always be very dependent on how you setup Redis and what you actually need.
You could still use CacheManager for layered cache and backplane, pass in a ConnectionMultiplexer to the configuration and use that one directly for searching for keys. I think that would be a good compromise.

Storing another lookup in Redis for all keys is just not something I'd like to have. That's not performing very well as you would have to keep it somewhat in sync. There could be millions of keys eventually...

Yes this would be similar to the regions implementation with Redis. But I know that the list of keys, i keep for a region, might not be 100% accurate and it doesn't really matter in that case. Again, if you would store really tons of keys in one region, a clear region would also be pretty slow. But it is a trade off for that feature which I think is fine.

In addition to all that, only Redis has good support for searching at all. Memcached and other distributed caches don't support that at all. That's why I never added that feature in the first place ;)

But again, I think it would be fine to have that implemented for all the in-memory caches.

@twgraham

This comment has been minimized.

Show comment
Hide comment
@twgraham

twgraham Feb 1, 2018

From quiet observation, I think there's two features here:

  1. Ability to search or match on keys in the cache in the hopes of doing...
  2. Retrieving multiple cache items from a list of given keys (also requested in #186)

While i understand 1. being quite expensive in some distributed caches such as Redis, and non existent in others, it is largely seperate from 2.

Currently, there might be situations where i need to get ~100 items from the cache, and if these are in Redis, this could mean 100 requests. If there was a IEnumerable<T> GetMany<T>(IEnumerable<string> keys) on the cache, then atleast we could direct Redis to do a single MGET originating from a single request instead of 100 GET originating from 100 requests (please correct me if I'm wrong).

Worst case scenario, the distributed cache doesn't have a facility for getting multiple keys in one command, and cache manager has to do what the user is otherwise forced to do - iterate and get each key individually.

twgraham commented Feb 1, 2018

From quiet observation, I think there's two features here:

  1. Ability to search or match on keys in the cache in the hopes of doing...
  2. Retrieving multiple cache items from a list of given keys (also requested in #186)

While i understand 1. being quite expensive in some distributed caches such as Redis, and non existent in others, it is largely seperate from 2.

Currently, there might be situations where i need to get ~100 items from the cache, and if these are in Redis, this could mean 100 requests. If there was a IEnumerable<T> GetMany<T>(IEnumerable<string> keys) on the cache, then atleast we could direct Redis to do a single MGET originating from a single request instead of 100 GET originating from 100 requests (please correct me if I'm wrong).

Worst case scenario, the distributed cache doesn't have a facility for getting multiple keys in one command, and cache manager has to do what the user is otherwise forced to do - iterate and get each key individually.

@MichaCo

This comment has been minimized.

Show comment
Hide comment
@MichaCo

MichaCo Feb 2, 2018

Owner

@twgraham You are absolutely right that those are two different operations.
That being said, there is not really any point for a GetMany implementation right now because
a) I'm using hashes in Redis and there is no operation in Redis where I can get multiple fields from multiple hashes at the same time. All ops on hashes are for one key.
b) With CacheManager you can use multiple layers, e.g. in-memory + Redis. Maybe some keys are already cached in-memory, doing a get would eventually return it from in-memory (much faster) or fall back to redis. This might be actually faster than any multi get against Redis (just depends on how the Cache is populated...)
c) The other distributed caches don't even have the ability to retrieve n keys, would do a manual iterator anyways. And for everything else, other in-memory caches, doing a get in a loop is really fast and usually the only way to retrieve n keys anyways.

That's why this feature doesn't exist, I'd implement it as an iterator on the top level anyways. Then, a MGet might have draw backs, you have to handle what should happen if one or more given keys do not exist. I could return either nulls to keep positions or only return found key/values. Also, I'd have to return both, key and value. There might be more logic involved, so, I think it makes more sense for anyone who needs it, to implement it with a simple loop+ custom logic...

Regarding the "search" like feature, I'll look more into it again when I find some time. Still interested in getting that feature done. I'm just not very happy with the current approach in the related PR and might have to change it a lot ...

Owner

MichaCo commented Feb 2, 2018

@twgraham You are absolutely right that those are two different operations.
That being said, there is not really any point for a GetMany implementation right now because
a) I'm using hashes in Redis and there is no operation in Redis where I can get multiple fields from multiple hashes at the same time. All ops on hashes are for one key.
b) With CacheManager you can use multiple layers, e.g. in-memory + Redis. Maybe some keys are already cached in-memory, doing a get would eventually return it from in-memory (much faster) or fall back to redis. This might be actually faster than any multi get against Redis (just depends on how the Cache is populated...)
c) The other distributed caches don't even have the ability to retrieve n keys, would do a manual iterator anyways. And for everything else, other in-memory caches, doing a get in a loop is really fast and usually the only way to retrieve n keys anyways.

That's why this feature doesn't exist, I'd implement it as an iterator on the top level anyways. Then, a MGet might have draw backs, you have to handle what should happen if one or more given keys do not exist. I could return either nulls to keep positions or only return found key/values. Also, I'd have to return both, key and value. There might be more logic involved, so, I think it makes more sense for anyone who needs it, to implement it with a simple loop+ custom logic...

Regarding the "search" like feature, I'll look more into it again when I find some time. Still interested in getting that feature done. I'm just not very happy with the current approach in the related PR and might have to change it a lot ...

@twgraham

This comment has been minimized.

Show comment
Hide comment
@twgraham

twgraham Feb 2, 2018

Thanks for the explanation.

Regarding a) do you have any further reference material (code aside) about how cache manager uses Redis to store and retrieve values?

In regards to b) my concern is for cold starts, and expiration (although I suppose that's up to me to tune).

I suppose we can agree on c).

twgraham commented Feb 2, 2018

Thanks for the explanation.

Regarding a) do you have any further reference material (code aside) about how cache manager uses Redis to store and retrieve values?

In regards to b) my concern is for cold starts, and expiration (although I suppose that's up to me to tune).

I suppose we can agree on c).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment