Skip to content
This repository was archived by the owner on Mar 19, 2025. It is now read-only.

Add custom function for filtering cached arguments. #1

Merged
merged 14 commits into from
Jan 5, 2018

Conversation

andreip
Copy link

@andreip andreip commented Jan 3, 2018

Adds ability to only use certain fields as cache key.

@andreip
Copy link
Author

andreip commented Jan 3, 2018

^ huh, that's unexpected. On local I get FAILED (failures=4) when running python setup.py test.

Copy link

@tommikaikkonen tommikaikkonen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a monster comment discussing the API we want for this.

@@ -11,7 +11,8 @@
logger = logging.getLogger("cache_utils")


def cached(timeout, group=None, backend=None, key=None):
def cached(timeout, group=None, backend=None, key=None,
filter_args_kwargs=None):
Copy link

@tommikaikkonen tommikaikkonen Jan 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a very useful feature 💯 I have some opinions about the API for this, let me know your thoughts!

The goal here is to allow the user to modify the cache key for each call, right? filter_args_kwargs as a concept seems slightly too restrictive than it needs to be: we may also want to transform them. (Or if you want to live dangerously, add something new to the key, like a mutable global variable :feelsgood:)

Further, the return value doesn't have to be in the args & kwargs, tuple and dict form—we just need a serializable value we can use as a cache key. I think the simplest interface would be a function that maps arguments to any Python value that may be serialized to a concrete cache key.

I would separate out the following things and allow the user to customize each:

  • Calculating the function cache key (the prefix)
  • Based on function arguments on a single call, determine the unserialized cache key value
  • Serializing cache key value above (JSON? Pickle? hash? django-cache-utils casts to string)

Calculating the function cache key

The generated function key in django-cache-utils is something like cacheable_scrape_article[cached]:431. We may want to make the key kwarg accept a callable to facilitate this, e.g.

@cached(..., key=lambda fn: f'{fn.__name__}_version_1')

That would allow the default value of key to be the function that returns cacheable_scrape_article[cached]:431 based on the function.

Determining the cache key for a single call

Specifying the cache key for a call would be facilitated with a keyword argument similar to filter_args_kwargs; I think the most intuitive kwarg name would be key, but that is already used for the function key. Without having to think about backwards compatibility, I would rename key to fn_key and have the call cache key function specified in key.

Serializing the key value for a call

Now that we have the key (Python) value for a call, we can use the user-specified serializer to turn it into a key value usable by the cache backend. django-cache-utils uses a simple string cast, but I think JSON/pickle would be more robust. Then the whole cache key would be serialize(fn_key(fn)) + serialize(call_key(*args, **kwargs))


Obviously we don't need to make all these changes at once, but it would be useful to have a consistent, robust API in mind as we make improvements.

Copy link
Author

@andreip andreip Jan 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that renaming key to fn_key and making it a callable with the default work as is now would be nice.

Also now that key is free, we can use that for computing the paramters to a python value, I wanted to use that but was taken.

Lastly, I guess a nice to have would be to add other serialization methods, yep. But this isn't an immediate option to use since we use cached in a bunch of places and changing this to sth other than the _args_to_unicode with prefix [cached] would be equivalent to invalidating all our cache. We could start using different serialization protocols for new code though.

I've changed the code to support this. At first I was hesitant to do too many changes and the odd implementation from before encouraged hacks. I like your idea with splitting the args and the function conversion to serializables and using those. I just used str for now, I didn't look much into what the final output needs to be so memcached/group_backend.py etc. related code would accept it. Moreover, kept backwards compatibility so our old cache is hit with this update. Let me know what you think and if you spot any issues 👍 .

""" Construct readable cache key """
if filter_args_kwargs and inspect.isfunction(filter_args_kwargs):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does callable work here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess that's less restrictive, we can go with that.

and returns the new args and kwargs. Those are
combined with function name to get the final key.
E.g.
def filter_args_kwargs(args, kwargs):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had some discussion about filter_arg_kwargs(args, kwargs) versus filter_args_kwargs(*args, **kwargs) in my other comment, but now I see that this code comment was just outdated and that the actual implementation is like filter_args_kwargs(*args, **kwargs) which I think is the better choice. I removed that discussion from my other comment.

@andreip
Copy link
Author

andreip commented Jan 4, 2018

^ tests don't work well for me, I think because I don't have memcached setup locally. Not sure if on circleCI it worked or not, but now seems some tests fail, so might have worked, idk.

Nevertheless, should sort out the tests thing before merging this in. I'm not sure if all tests passed beforehand since some were failing for me because I don't have memcached running. Putting below which ones failed for me in master probably due to not running memcached on local.

======================================================================
FAIL: test_decorator (cache_utils.tests.DecoratorTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/andrei/repos/upbeat/django-cache-utils/cache_utils/tests.py", line 161, in test_decorator
    self.assertEqual(my_func(), "1")
AssertionError: '2' != '1'
- 2
+ 1


======================================================================
FAIL: test_func_invalidation (cache_utils.tests.InvalidationTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/andrei/repos/upbeat/django-cache-utils/cache_utils/tests.py", line 119, in test_func_invalidation
    self.assertEqual(my_func(1, 2), 1)
AssertionError: 2 != 1

======================================================================
FAIL: test_group_invalidation (cache_utils.tests.InvalidationTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/andrei/repos/upbeat/django-cache-utils/cache_utils/tests.py", line 98, in test_group_invalidation
    self.assertEqual(cache.get('vasia', group='names'), 'foo')
AssertionError: None != 'foo'

======================================================================
FAIL: test_method_invalidation (cache_utils.tests.InvalidationTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/andrei/repos/upbeat/django-cache-utils/cache_utils/tests.py", line 139, in test_method_invalidation
    self.assertEqual(foo.bar(1), 1)
AssertionError: 2 != 1

Copy link

@tommikaikkonen tommikaikkonen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made the following changes:

  • Removed support for decorating methods

  • Improved fn_key: omitted line number, used __qualname__ instead of __name__ (is there a case we would still need the line number?)

  • Simplified the default implementation for key not to consider serialization

  • Added legacy_fn_key and legacy_key to remain compatible with previous cache key generation, added a test for that. Usage would be like this:

    from cache_utils import cached, legacy_fn_key, legacy_key
    
    @cached(3600, fn_key=legacy_fn_key, key=legacy_key)
    def foo(...):
        ...

Copy link
Author

@andreip andreip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that you simplified the code even more and kept an option for the legacy way of doing it. We probably will be using those ourselves for old code.

See the only question with None I had below. Otherwise I think we can merge this in.

@@ -46,16 +71,26 @@ def test(*args, **kwargs):
cache_backend = caches['default']

def _cached(func):
func_type = _func_type(func)
fn_key_str = str(
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this could become str(None) = "None" (if you explicitly pass fn_key=None for any reason) and then it wouldn't raise an error below.

Do you think we should maybe also add an if above for fn_key to not be falsy? Or if a user provides an argument, then maybe they check docs? But there aren't any, so the code in this case.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch! We can make a hard requirement for fn_key to either be a callable or a str, and the result of a callable fn_key to be a str to remove the str casting. If these requirements fail, we raise an exception. This should avoid any unexpected behavior.



def default_fn_key(func):
return ".".join([func.__module__, func.__qualname__])
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this, didn't like lineno either, and qualname is better. But I suppose we'll have to make sure to keep on using the legacy ones for the caches we've already set up.

# try to get the value from cache
key = _get_key(wrapper._full_name, func_type, args, kwargs)
key = _get_key(*args, **kwargs)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

also, makes perfect sense to move all the _get_key and fn_key_str computations inside the _cached context, since we have the function and can drop one param.

@tommikaikkonen
Copy link

Re: setting up memcached locally and getting the tests to run, this is all I had to do:

brew install libmemcached
brew services start memcached

@tommikaikkonen tommikaikkonen merged commit 031b162 into master Jan 5, 2018
@tommikaikkonen tommikaikkonen deleted the feature/filter_cached_arguments branch January 5, 2018 20:11
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants