Adds a whole comparison with other ORM caches.
1 parent 6249ade, commit 130be2a
Showing 7 changed files with 230 additions and 37 deletions.
@@ -0,0 +1,210 @@ | ||
Introduction | ||
------------ | ||
|
||
Should you use it? | ||
.................. | ||
|
||
Django-cachalot is the perfect speedup tool for most Django projects. | ||
It will speed up a website of 100 000 visits per month without any problem. | ||
In fact, **the more visitors you have, the faster the website becomes**. | ||
That’s because every possible SQL query on the project ends up being cached. | ||
|
||
Django-cachalot is especially efficient in the Django administration website | ||
since it’s unfortunately badly optimised (use foreign keys in list_editable | ||
if you need to be convinced). | ||
|
||
However, it’s not suited for projects where there is **a high number | ||
of modifications per minute** on each table, like a social network with | ||
more than 30 messages per minute. Django-cachalot may still give a small | ||
speedup in such cases, but it may also slow things down a bit | ||
(in the worst case scenario, a 20% slowdown, | ||
according to :ref:`the benchmark <Benchmark>`). | ||
If you have a website like that, optimising your SQL database and queries | ||
is the number one thing you have to do. | ||
|
||
There is also an obvious case where you don’t need django-cachalot: | ||
when the project is already fast enough (all pages load in less than 300 ms). | ||
Like any other dependency, django-cachalot is a potential source of problems | ||
(even though it’s currently bug free). | ||
Don’t use dependencies you can avoid; a “future you” may thank you for that. | ||
|
||
Features | ||
........ | ||
|
||
- **Saves in cache the results of any SQL query** generated by the Django ORM | ||
that reads data. These saved results are then returned instead | ||
of executing the same SQL query, which is faster. | ||
- The first execution of a query is about 10% slower, but the following | ||
executions are much faster (7× faster on average). | ||
- Automatically invalidates saved results, | ||
so that **you never get stale results**. | ||
- **Invalidates per table, not per object**: if you change an object, | ||
all the queries done on other objects of the same model are also invalidated. | ||
It is unfortunately technically impossible to make a reliable | ||
per-object cache. Don’t be fooled by packages claiming to have | ||
that per-object feature: they are unreliable and dangerous for your data. | ||
- **Handles everything in the ORM**. You can use the most advanced features | ||
from the ORM without a single issue: django-cachalot is extremely robust. | ||
- Easy control thanks to :ref:`settings` and :ref:`a simple API <API>`. | ||
But that’s only required if you have a complex infrastructure. Most people | ||
will never use settings or the API. | ||
- A few bonus features like | ||
:ref:`a signal triggered at each database change <Signal>` | ||
(including bulk changes) and | ||
:ref:`a template tag for a better template fragment caching <Template tag>`. | ||
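
The per-table invalidation listed above can be pictured with a minimal sketch. This is not django-cachalot’s actual code: ``TableCache``, ``get_or_run`` and ``invalidate_table`` are hypothetical names, a plain dict stands in for the cache backend, and the technique shown (a version counter per table included in each cache key) is only one possible way to obtain this behaviour.

```python
class TableCache:
    """Toy per-table query cache (hypothetical, for illustration only)."""

    def __init__(self):
        self.versions = {}  # table name -> version counter
        self.results = {}   # (sql, table versions) -> cached rows

    def _key(self, sql, tables):
        # The key embeds the current version of every table the query reads.
        versions = tuple((t, self.versions.get(t, 0)) for t in sorted(tables))
        return (sql, versions)

    def get_or_run(self, sql, tables, run_query):
        """Return cached rows for ``sql``; run the query only on a miss."""
        key = self._key(sql, tables)
        if key not in self.results:
            self.results[key] = run_query(sql)
        return self.results[key]

    def invalidate_table(self, table):
        """Bump the table version so older keys are never looked up again."""
        self.versions[table] = self.versions.get(table, 0) + 1


executed = []  # records every query that really reaches the "database"


def run_query(sql):
    executed.append(sql)
    return [len(executed)]


cache = TableCache()
first = cache.get_or_run("SELECT * FROM book", ["book"], run_query)
second = cache.get_or_run("SELECT * FROM book", ["book"], run_query)  # hit
cache.invalidate_table("book")  # any write to "book" invalidates the table
third = cache.get_or_run("SELECT * FROM book", ["book"], run_query)   # miss
```

Changing a single book invalidates every cached query on the ``book`` table, which is the trade-off described in the list: coarser invalidation in exchange for never serving stale rows.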
|
||
Comparison with similar tools | ||
............................. | ||
|
||
This comparison was done in October 2015. It compares django-cachalot | ||
to the other popular automatic ORM caches at the moment: | ||
`django-cache-machine <https://github.com/django-cache-machine/django-cache-machine>`_ | ||
& `django-cacheops <https://github.com/Suor/django-cacheops>`_. | ||
|
||
Features | ||
~~~~~~~~ | ||
|
||
======================================================== ========= ============= ========= | ||
Feature cachalot cache-machine cacheops | ||
======================================================== ========= ============= ========= | ||
Type of invalidation per table per object per table | ||
CPU & memory performance optimal bad terrible | ||
Easy to install ✔ ✘ quite | ||
Cache agnostic ✔ ✔ ✘ | ||
Reliable ✔ ✘ quite | ||
Handles ``QuerySet.count`` ✔ ✘ ✔ | ||
Handles empty queries ✔ ✘ ✔ | ||
Handles multi-table inheritance ✔ probably not ✘ | ||
Handles proxy models ✔ ✘ ✔ | ||
Handles many-to-many fields ✔ ✘ ✔ | ||
Handles transactions ✔ probably not ✘ | ||
Handles ``QuerySet.aggregate``/``annotate`` ✔ probably not ✘ | ||
Handles ``QuerySet.bulk_create``/``update``/``delete`` ✔ probably not ✘ | ||
Handles ``QuerySet.select_related``/``prefetch_related`` ✔ partially ✘ | ||
Handles ``cursor.execute`` ✔ ✘ ✘ | ||
Handles GeoDjango ✔ maybe ✔ | ||
Handles django.contrib.postgres ✔ maybe partially | ||
======================================================== ========= ============= ========= | ||
|
||
To find out whether a package supports a feature, I searched in the documentation, | ||
the issues, the tests and the code. | ||
I really tried to avoid writing “maybe”, “probably not”, etc. | ||
Unfortunately, the absence of tests for such cases and sometimes the confusion | ||
of the authors themselves about these features make it difficult to know | ||
whether they support a feature or not. | ||
|
||
Explanations | ||
~~~~~~~~~~~~ | ||
|
||
Of course, I can’t just throw a table with such | ||
“Reliable” and “CPU & memory performance” lines without explanation. | ||
My goal is not to start another stupid open source conflict, nor | ||
to be pretentious about my work. I’m just trying to inform users here, so they | ||
can fully grasp the consequences of using one or another tool. | ||
I actually used django-cache-machine in production for a week | ||
and django-cacheops for a month. On both solutions, I faced a lot | ||
of invalidation issues, and the bigger the cache became, | ||
the worse the performance was. | ||
|
||
I now know the reason for these issues: in short, they are due to | ||
their invalidation systems. Read the following paragraphs for more detail. | ||
|
||
django-cache-machine | ||
'''''''''''''''''''' | ||
|
||
django-cache-machine is using “flush lists” to remember which SQL queries are | ||
linked to which objects. This is the approach I chose when I created | ||
a prototype of django-cachalot, except it was invalidated per table, | ||
not per object like django-cache-machine does. Unfortunately, there are several | ||
important issues due to this approach that led me to drop it. | ||
|
||
The smaller issue is that each time you execute a new SQL query, | ||
django-cache-machine needs to fetch the “flush list” from the cache, | ||
update it and add it back to the cache. This means we have to make two | ||
cache calls in addition to the cache call to store the SQL query results. | ||
It may seem tiny, but when your cache size increases, | ||
the “flush lists” start becoming huge (a list of hundreds of cache keys | ||
for each database object), leading to an exponentially growing cache size | ||
and a longer time to fetch the always-growing “flush list”. | ||
So **bad memory and CPU usage**. | ||
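
The bookkeeping cost described above can be sketched as follows. This is an illustration of the “flush list” idea only, not django-cache-machine’s code: ``CountingCache``, ``store_query`` and the ``flush:`` key prefix are hypothetical, and a dict that counts its round trips stands in for a real cache.

```python
class CountingCache(dict):
    """Dict-backed stand-in for a cache that counts get/set round trips."""

    def __init__(self):
        super().__init__()
        self.calls = 0

    def cache_get(self, key, default=None):
        self.calls += 1
        return self.get(key, default)

    def cache_set(self, key, value):
        self.calls += 1
        self[key] = value


def store_query(cache, query_key, rows, object_keys):
    """Store query results, then register the query in each flush list."""
    cache.cache_set(query_key, rows)  # one call for the results themselves
    for obj_key in object_keys:
        flush_key = "flush:" + obj_key
        flush_list = cache.cache_get(flush_key, [])           # extra call 1
        cache.cache_set(flush_key, flush_list + [query_key])  # extra call 2


cache = CountingCache()
# 100 different queries that all return the same object.
for i in range(100):
    store_query(cache, f"query:{i}", ["row"], ["book:1"])

flush_list_length = len(cache["flush:book:1"])
total_calls = cache.calls
```

In this sketch each new query costs three cache calls when it touches one object, and the flush list of that object gains one entry per query, so it has to be fetched and rewritten in full every time.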
|
||
The second issue is only linked to the per-object invalidation. | ||
When django-cache-machine invalidates an object, it also needs to invalidate | ||
the queries of the related objects, otherwise they may contain stale data. | ||
Django-cache-machine invalidates foreign keys only, not many-to-many | ||
or generic foreign keys (because… I don’t know). This degrades performance | ||
at each writing operation to the database, because it needs to fetch | ||
related objects, fetch “flush lists” and delete these cache keys. | ||
And of course it can’t invalidate basic queries such as count or empty queries | ||
(probably aggregations too, but I’m not sure). | ||
|
||
Last but not least: a critical issue. It simply proves that the | ||
django-cache-machine team **doesn’t know how caches work**. | ||
Caches are fast because they are stupid: when your cache is full and | ||
needs room, it randomly fetches a few keys, selects the older ones if possible | ||
then deletes them. This means that **a cache key with a 1 year timeout | ||
can be deleted before a cache key with a 1 minute timeout**. | ||
But django-cache-machine assumes its “flush lists” will always stay longer | ||
in cache than the saved query results will, because they have the same timeout | ||
and “flush lists” are saved a few milliseconds after query results. | ||
Until the cache is full, this is kind of true because no cache key is deleted. | ||
But when it is full, the “flush list” can be removed at any moment, | ||
so the other cache keys will never be invalidated until they are deleted. | ||
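
The failure mode described above can be reproduced with a toy example. The key names and the ``invalidate_object`` helper are hypothetical, and the eviction is simulated by hand with ``del``; the point is only that losing the “flush list” first leaves the cached query unreachable for invalidation.

```python
# A dict stands in for the cache; it holds one cached query and the
# flush list that links it to the object it contains.
cache = {
    "query:all_books": ["Moby Dick"],
    "flush:book:1": ["query:all_books"],
}


def invalidate_object(cache, obj_key):
    """Delete every cached query listed in the object's flush list."""
    for query_key in cache.pop("flush:" + obj_key, []):
        cache.pop(query_key, None)


# The cache is full and evicts a key to make room. Nothing guarantees
# that query results go first: here the flush list is the one removed.
del cache["flush:book:1"]

# The book is then modified in the database. Invalidation finds no
# flush list, so it has nothing to delete.
invalidate_object(cache, "book:1")

# The outdated results are still served from the cache.
stale_rows = cache.get("query:all_books")
```

After these steps ``stale_rows`` still holds the old data, and it will stay in the cache until the key itself expires or is evicted.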
|
||
**To sum up, django-cache-machine has bad memory and CPU performance | ||
and is absolutely not reliable.** | ||
|
||
django-cacheops | ||
''''''''''''''' | ||
|
||
django-cacheops uses | ||
`a debug feature from Redis, KEYS, <http://redis.io/commands/KEYS>`_ | ||
to invalidate cache keys (that’s why it only supports Redis). | ||
It’s a feature that becomes linearly slower as your cache size grows. | ||
I measured that a single call of this command by django-cacheops | ||
slows down any database save by 50 ms to 3.5 seconds, | ||
depending on your database and cache sizes. | ||
The problem is also that django-cacheops runs this command several times | ||
at each save. Suppose you have a model with 3 many-to-many fields. Suppose you save | ||
an object with 3 related objects in each many-to-many field. django-cacheops will run | ||
the Redis ``KEYS`` command at least 10 times! If you have | ||
a large cache and database, it means **you can wait 30 seconds | ||
while this object is saved!** | ||
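
To see why a ``KEYS``-style lookup gets slower as the cache grows, here is a minimal sketch of a pattern search over a key space. It does not reproduce Redis internals or django-cacheops’ key naming: the ``keys`` function and the ``query:…`` key format are hypothetical. What it shows is the linear scan: every key in the store is tested against the pattern, however few of them match.

```python
from fnmatch import fnmatchcase


def keys(store, pattern):
    """Return all keys matching ``pattern`` by scanning the whole store."""
    return [key for key in store if fnmatchcase(key, pattern)]


# 10 000 unrelated entries plus a single entry for the model we care about.
store = {f"query:author:{i}": i for i in range(10_000)}
store["query:book:1"] = "cached rows"

# All 10 001 keys are examined in order to find this single match.
matches = keys(store, "query:book:*")
```

Running such a scan once per related object, as in the many-to-many example above, multiplies this cost by the number of scans.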
|
||
Another bad consequence of that use of the ``KEYS`` command is that Redis jumps | ||
to a 100% CPU usage when the command is running, degrading performance for | ||
other users or even blocking them until the command is finished. | ||
|
||
More generally, the workflow of django-cacheops is totally unoptimised. | ||
When an object is modified, an ``invalidate_obj`` function is called, | ||
calling an ``invalidate_dict`` function, calling the ``manage.py invalidate`` | ||
command with a serialized version of the object (yes!) | ||
calling an ``invalidate_model`` function that calls the Redis ``KEYS`` command | ||
to get all the cache keys from that model then delete them. | ||
And as I said above, it executes all that N times, | ||
N being the number of related objects to the current object, | ||
even though multiple objects have the same model and we therefore | ||
don’t need to invalidate the model multiple times. | ||
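
The redundancy described above can be made concrete with the earlier example of an object that has 3 many-to-many fields with 3 related objects each. The model names and the ``models_to_invalidate`` helper are hypothetical; the sketch only counts how many per-model invalidations are really needed once duplicates are removed.

```python
def models_to_invalidate(objects):
    """Collapse (model, pk) pairs to the distinct set of model names."""
    return {model for model, _pk in objects}


# The saved object itself, plus 3 related objects for each of its
# 3 many-to-many fields.
objects = [("book", 1)] + [
    (model, pk) for model in ("author", "tag", "genre") for pk in (1, 2, 3)
]

per_object_invalidations = len(objects)
per_model_invalidations = len(models_to_invalidate(objects))
```

Invalidating once per object gives 10 invalidations here, while invalidating once per distinct model gives 4.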
|
||
**To sum up, django-cacheops has terrible performance, | ||
but is reliable on what it handles. | ||
If you set it up correctly and never use some features such as | ||
transactions (used by Django admin), | ||
multi-table inheritance, or | ||
raw queries (the three features being used by Wagtail and django CMS), | ||
you’re good to go.** | ||
|
||
Number of lines of code | ||
~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
Django-cachalot tries to be as minimalist as possible, while handling most | ||
use cases. Being minimalist is essential to create maintainable projects, | ||
and having a large test suite is essential to achieve excellent quality. | ||
The statistics below speak for themselves… | ||
|
||
============ ======== ============= ======== | ||
Project part cachalot cache-machine cacheops | ||
============ ======== ============= ======== | ||
Application 743 843 1662 | ||
Tests 3023 659 1491 | ||
============ ======== ============= ======== |