
The default behaviour for in-memory cache should not include serialization #269

Closed
eu9ene opened this issue Jun 14, 2017 · 8 comments


eu9ene commented Jun 14, 2017

The basic use case for aiocache is simply wrapping an async function with a decorator to get in-memory caching, like this:

from aiocache import cached

@cached(ttl=600)
async def get_smth_async(id):
  ...

Obviously, the user doesn't expect any serialization in this case: it is redundant and adds a performance cost. So the default serializer should do nothing, like DefaultSerializer in previous versions. Currently JsonSerializer is used as the default.

For non-in-memory use cases the user should explicitly specify what type of serialization they need.

Maybe different default serializers should be used for different cache types.
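To illustrate the cost being discussed, here is a minimal stdlib-only sketch contrasting what a JSON-based serializer effectively does on every set/get with storing the reference directly (the function names are illustrative, not aiocache's API):

```python
import json
import timeit

def json_roundtrip(value):
    # What a JSON serializer effectively does across a set + get
    return json.loads(json.dumps(value))

def null_roundtrip(value):
    # What a "do nothing" serializer does: return the reference as-is
    return value

value = {"id": 1, "items": list(range(100))}

assert json_roundtrip(value) == value       # equal by value...
assert json_roundtrip(value) is not value   # ...but a fresh object every time
assert null_roundtrip(value) is value       # same reference, zero copying

# The overhead is easy to measure:
print(timeit.timeit(lambda: json_roundtrip(value), number=10_000))
print(timeit.timeit(lambda: null_roundtrip(value), number=10_000))
```

On an in-memory backend the round trip buys nothing, which is the redundancy the comment above points at.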

argaen (Member) commented Jun 15, 2017

Hey @xdanmer, you have a point there, and I've thought about this already. For the current version my aim was to keep it simple and focus on one guarantee: given an input, you get the same output no matter what backend you are using. Here are some points on why I didn't go for what you are proposing:

  • As already explained, no matter what backend you are using, given an input the output must be the same.
  • DefaultSerializer from previous versions, where no serialization happened, was confusing and produced unexpected outputs depending on which backend you were using, which is not acceptable.
  • I've thought about adding a NullSerializer which wouldn't apply serialization, but when using it with the memory backend, if the user stores mutable objects it can have some really ugly side effects. For example:
await cache.set("key", {"a": 1, "b": 2})    # here we store the dict in the cache
....  # random stuff
value = await cache.get("key")    # dict is returned
value['c'] = 3   # dict in the cache will also contain the key 'c'!! I don't think this is expected behavior!

In order for the above example to behave correctly in those cases, a copy of the mutable object would have to be made, and that also adds overhead.
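A deep-copying variant that would make the example above behave "correctly" could look like this toy synchronous sketch (illustrative only, not aiocache code), at the price of copying on every operation:

```python
from copy import deepcopy

class CopyingCache:
    """Toy in-memory cache that deep-copies on both set and get,
    so callers can never mutate the stored value (illustrative only)."""

    def __init__(self):
        self._store = {}

    def set(self, key, value):
        self._store[key] = deepcopy(value)

    def get(self, key):
        return deepcopy(self._store[key])

cache = CopyingCache()
cache.set("key", {"a": 1, "b": 2})
value = cache.get("key")
value["c"] = 3                      # mutates the returned copy only
assert "c" not in cache.get("key")  # the cached dict is untouched
```

The two deepcopy calls are exactly the overhead being weighed against the surprise of shared references.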

Because of this, I decided not to support this in the package. Anyway, in case you really want this, you can always create your own serializer and pass it to the decorator (or even use aliases).

If you can think of a proposal which is simple and solves those edge cases cleanly, I'm happy to discuss it :)

eu9ene (Author) commented Jun 15, 2017

@argaen, thanks for the detailed answer, but I disagree with some points, so let's discuss.

  • First of all, I think that in-memory caching and caching in external storage are two different concepts, and we should not try to make their behaviour absolutely equal. Users should understand where they store objects and what to expect.
  • As for object mutability, to my mind it should be up to the user. Recently I had a case where I needed an in-memory cache of mutable objects: the point was to update some objects in memory and periodically store them to the db. Objects are loaded from the db on demand and expire some time after the last update. Also, if we check popular caching libraries, I'm sure the default behaviour is no deep cloning of the object on caching, so being able to change a cached object is more intuitive for users. Users understand that they cache an object reference, not a value. It is the same as passing an object reference to a function or storing it as a field of another object.
  • Lack of overhead is very important for in-memory caching; we use it because it's faster. With asyncio we can have high loads, and this constant deserialization can become a bottleneck.
  • I agree that NullSerializer or maybe EmptySerializer are better names than DefaultSerializer.
  • Given the above points, to keep things simple for the user, it's better to make the default behaviour intuitive and not force them to implement empty serializers or specify serializers for basic cases.

Overall, the current solution looks like a limitation of the current architecture, so maybe we should try to change it a bit; for example, make serialization optional or assume that every backend has its own default serializer. Of course, that's only if you agree on the object-mutability point; otherwise everything is fine, though, in my opinion, rather unintuitive.

argaen (Member) commented Jun 15, 2017

Lack of overhead is very important for in-memory caching; we use it because it's faster. With asyncio we can have high loads, and this constant deserialization can become a bottleneck.

You raise a good point here with asyncio and being a bottleneck for the reactor.

I'm going to check how other libraries deal with in-memory caching. If the majority go for not using deep copies, I'll go for that too, since it ends up being more intuitive for most users.

Anyway, if we end up doing this, this is how I imagine the end picture:

  • A new serializer called NullSerializer which basically does nothing (with the current design there is always a serializer in the middle, and I don't want to start putting ifs around)
  • Each cache should have a default serializer:
    • Memory: NullSerializer
    • Redis: JsonSerializer
    • Memcached: JsonSerializer
  • NullSerializer will come with a super BIG disclaimer about mutability of objects, if I find out other libraries are not doing the deep copies either
  • Decorators will end up using the default serializers of the caches rather than having a default for all.

Makes sense?
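A NullSerializer along those lines could be as small as this (a sketch based on the dumps/loads interface the discussion implies; treat the details as assumptions, not aiocache's actual code):

```python
class NullSerializer:
    """Pass values through untouched.

    WARNING (per the discussion above): with the memory backend the
    cache then holds a live reference, so mutating a value returned
    by get() also mutates what is stored in the cache.
    """

    def dumps(self, value):
        return value

    def loads(self, value):
        return value

s = NullSerializer()
obj = {"a": 1}
assert s.dumps(obj) is obj  # no copy on the way in
assert s.loads(obj) is obj  # no copy on the way out
```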

Also, just to have one more opinion on that, @pfreixes what do you think?

eu9ene (Author) commented Jun 15, 2017

@argaen yes, I agree, let's check libraries. One example is CacheManager, a popular caching library for .NET: https://github.com/MichaCo/CacheManager. It's not Python, but anyway.
From the documentation:

In general, serialization is only used by CacheManager, if the cache handle cannot store the object reference in memory. This is the case for any distributed cache, because in those scenarios, the cache value has to be stored remotely/out of process. For cache handles like the DictionaryCacheHandle or SystemRuntimeCaching, serialization is not needed and doesn't have to be configured at all.

Why / When to use Update
Updating a cache item in a distributed cache is different from just changing the item within an in-process cache. With in-process caches, we can ensure thread safe writes, and with poco objects, the in-process cache will just keep the reference to that object and therefor always holds the same version for all threads.

argaen (Member) commented Jun 17, 2017

Yup, and the Python standard library's lru_cache does the same:

from functools import lru_cache


@lru_cache()
def what():
    return [2]


def call():
    res = what()
    print(res)   # output is [2]
    res.append(3)
    res.append(5)
    res.append(8)
    another_res = what()
    print(another_res)   # output is [2, 3, 5, 8]


call()

If the Python standard library does that, I'm going for it too :)

@argaen argaen added the feature label Jun 17, 2017
argaen (Member) commented Jun 17, 2017

So, TODO:

  • Add a new NullSerializer with a big disclaimer about mutability when used with the memory backend, and that it won't work (in most cases) with other backends
  • Each backend should have its own serializer class
  • Decorators should pick those backend serializers as defaults
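The "each backend has its own default serializer" item can be sketched like this (all class names here are stand-ins for the discussion, not the real aiocache classes):

```python
import json

class JsonSerializer:
    def dumps(self, value):
        return json.dumps(value)

    def loads(self, value):
        return json.loads(value)

class NullSerializer:
    def dumps(self, value):
        return value

    def loads(self, value):
        return value

class SimpleMemoryCache:
    DEFAULT_SERIALIZER = NullSerializer   # reference stays in-process

class RedisCache:
    DEFAULT_SERIALIZER = JsonSerializer   # value must cross the wire

def make_cache(cache_cls, serializer=None):
    # Decorators/constructors fall back to the backend's own default
    cache = cache_cls()
    if serializer is None:
        serializer = cache_cls.DEFAULT_SERIALIZER()
    cache.serializer = serializer
    return cache

assert isinstance(make_cache(SimpleMemoryCache).serializer, NullSerializer)
assert isinstance(make_cache(RedisCache).serializer, JsonSerializer)
```

An explicit serializer argument still overrides the backend default, which keeps the old behaviour reachable.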

@argaen argaen added this to the 0.7.0 milestone Jun 17, 2017
argaen (Member) commented Jun 17, 2017

Hey @xdanmer, can you give master a try and see if the behavior is now as expected?

eu9ene (Author) commented Jun 20, 2017

@argaen, yes, the behavior is as expected in the latest master. NullSerializer works by default for the memory cache. Thank you!
