-
Notifications
You must be signed in to change notification settings - Fork 154
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
weakref rather than str(class_instance) due to id reuse #641
Comments
I'm not really sure what you're suggesting or what the issue is.. What do you want to change to a weakref? |
The issue is that It would be better to either use weakref directly (rather than |
Right, I think I understand what you were trying to say. The first argument of However, this is unlikely, as it's unlikely you'd be throwing away and recreating new instances of something you wanted to cache results on. So, in practice, I doubt this is a real issue. weakref is not a |
In my case, I caught it due to a flaky test, but you could imagine an ORM type use case where you have a user object or something with a |
But as i say at the bottom of the inital message, it's not even just the self argument. you could also have a situation where you are passing a user object into a function (not even a class method), and if the user object has the same id it will map back to the same cached result, again that would be bad. |
I'm not convince it's likely to cause issues in practice. But, it has just highlighted another more likely issue:
All instances would get the same cache key with this class. So, maybe we should combine |
i think weakref is the way to go if possible; caching is likely one of the main purposes for that to exist at all |
Yep, maybe that is a concern, but I don't see a better solution than combining
Again, cache keys are |
It may also be a desired result that you get the same (cached) result when doing something like:
Even though they are not the same instance. So, actually, I think sticking to |
I was thinking of something like: obj_unique_str_from_ref = weakref.WeakKeyDictionary()
def upsert_unique_str for_obj(ob):
if ob not in obj_unique_str_from_ref:
obj_unique_str_from_ref[ob] = uuid() # or whatever
return obj_unique_str_from_ref[ob]
...
def _key_from_args(...):
..."-".join(upsert_unique_str(a) for a in args) # very roughly This would work for in-memory and would prevent accidental reuse of keys when using redis, but would not allow reuse of keys across server instances....which defeats the point of redis. so you'd need another way to allow users to say they really do know what they are doing. Maybe you check for the presence of a specific magic method (not sure if ..but i'm definitely not claiming to be an expert! |
OK, that could work, but I think reusing from different instances might be useful in some cases. A custom I think we could add some examples that demonstrate appropriate uses, like:
Note that Sqlalchemy produces a |
I'm not going to push this any further than this message, but I do think the current behaviour is dangerous and should be addressed in some form, at the very least with a prominent warning in the docs. Yes in many real-world cases there will be a sensible from functools import lru_cache
import aiocache
import asyncio
import random
class Thing:
pass
@lru_cache(maxsize=10)
def functools_cached_thing_argument(a_thing):
return random.random()
@aiocache.cached()
async def aiocache_cached_thing_argument(a_thing):
return random.random()
async def main():
functools_res_strs = set()
aiocache_res_strs = set()
for idx in range(100):
thing = Thing()
functools_res = functools_cached_thing_argument(thing)
aiocache_res = await aiocache_cached_thing_argument(thing)
print(f"{idx}: functools_res={functools_res}, aiocache_res={aiocache_res}, str={str(thing)}")
assert functools_res not in functools_res_strs
functools_res_strs.add(functools_res)
# if you uncomment this assertion it will fail, but the assertion above never fails.
#assert aiocache_res not in aiocache_res_strs
aiocache_res_strs.add(aiocache_res)
asyncio.get_event_loop().run_until_complete(main()) As far as I can tell the functools |
Umm, it appears to suffer the same issue in regards to keys. The only reason you don't hit the issue in your example is because the objects remain referenced in the cache and are never garbage collected. Here it returns a hashed tuple of the arguments as the key: If you leave maxsize at the default value and uncomment the assert, you'll see the aiocache version passes as well, because no objects have been garbage collected. Your idea of a WeakKeyDictionary mapping to uuids seems like the only potential solution currently. It also still seems useful more often than not to have this duplication on the str() value, when considering temporary ORM objects and multiple processes (which you're more likely to be wanting if you're using a redis cache or something). So, it seems to me like something that should be disabled on a case-by-case basis, and that can already be done with a custom |
Yes, but that misses the point. The functools version can never map two different objects to the same cache key* as a side effect of the In other words, the test here is the exact opposite of what you suggest, namely comment out the call to the functools version (rather than increase its ability to prevent GC) and then the aiocache version is left to fend for itself, and things actually get worse. *Unless we consider custom hash implementation or collisions. |
I just mean that we are solving a fundamentally different problem and there is nothing we can take from lru_cache's implementation. Currently WeakKeyDictionary is the only proposal to fix this. But, if you're using an external cache, that's almost certainly because you want the cache to persist between executions and/or to be shared between multiple processes. WeakKeyDictionary would essentially reset the cache on every execution (but, leave stale data around). I can't see a solution currently (other than an opt-in for WeakKeyDictionary, or customising |
See this on SO and links within, or a simple test case as follows:
This prints:
This means that the cache is not scoped properly. It sounds like using weakref is the better option if possible.
This is where it is used currently. Note that this actually applies to all references in the args/kwargs, not just the
self
object.The text was updated successfully, but these errors were encountered: