[1.1.X] Created a 'DB optimization' topic, with cross-refs to relevant sections.

Also fixed #10291, which was related, and cleaned up some inconsistent doc labels.

Backport of r12229 from trunk


git-svn-id: http://code.djangoproject.com/svn/django/branches/releases/1.1.X@12230 bcc190cf-cafb-0310-a4f2-bffc1f526a37
commit ad6368809cd6583eda23f3aed5ccae0ae44f15b6 1 parent 9041b1a
Luke Plant authored January 16, 2010
2  docs/faq/models.txt
@@ -3,6 +3,8 @@
3 3
 FAQ: Databases and models
4 4
 =========================
5 5
 
  6
+.. _faq-see-raw-sql-queries:
  7
+
6 8
 How can I see the raw SQL queries Django is running?
7 9
 ----------------------------------------------------
8 10
 
3  docs/index.txt
@@ -70,7 +70,8 @@ The model layer
70 70
     * **Other:**
71 71
       :ref:`Supported databases <ref-databases>` |
72 72
       :ref:`Legacy databases <howto-legacy-databases>` |
73  
-      :ref:`Providing initial data <howto-initial-data>`
  73
+      :ref:`Providing initial data <howto-initial-data>` |
  74
+      :ref:`Optimize database access <topics-db-optimization>`
74 75
 
75 76
 The template layer
76 77
 ==================
33  docs/ref/models/querysets.txt
@@ -66,6 +66,18 @@ You can evaluate a ``QuerySet`` in the following ways:
66 66
       iterating over a ``QuerySet`` will take advantage of your database to
67 67
       load data and instantiate objects only as you need them.
68 68
 
  69
+    * **bool().** Testing a ``QuerySet`` in a boolean context, such as using
  70
+      ``bool()``, ``or``, ``and`` or an ``if`` statement, will cause the query
  71
+      to be executed. If there is at least one result, the ``QuerySet`` is
  72
+      ``True``, otherwise ``False``. For example::
  73
+
  74
+          if Entry.objects.filter(headline="Test"):
  75
+             print "There is at least one Entry with the headline Test"
  76
+
  77
+      Note: *Don't* use this if all you want to do is determine if at least one
  78
+      result exists, and don't need the actual objects. It's more efficient to
  79
+      use ``exists()`` (see below).
  80
+
69 81
 .. _pickling QuerySets:
70 82
 
71 83
 Pickling QuerySets
@@ -302,7 +314,7 @@ a model which defines a default ordering, or when using
302 314
 ordering was undefined prior to calling ``reverse()``, and will remain
303 315
 undefined afterward).
304 316
 
305  
-.. _querysets-distinct:
  317
+.. _queryset-distinct:
306 318
 
307 319
 ``distinct()``
308 320
 ~~~~~~~~~~~~~~
@@ -336,6 +348,8 @@ query spans multiple tables, it's possible to get duplicate results when a
336 348
     ``values()`` call.
337 349
 
338 350
 
  351
+.. _queryset-values:
  352
+
339 353
 ``values(*fields)``
340 354
 ~~~~~~~~~~~~~~~~~~~
341 355
 
@@ -616,7 +630,7 @@ call, since they are conflicting options.
616 630
 Both the ``depth`` argument and the ability to specify field names in the call
617 631
 to ``select_related()`` are new in Django version 1.0.
618 632
 
619  
-.. _extra:
  633
+.. _queryset-extra:
620 634
 
621 635
 ``extra(select=None, where=None, params=None, tables=None, order_by=None, select_params=None)``
622 636
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -1043,17 +1057,18 @@ Example::
1043 1057
 
1044 1058
 If you pass ``in_bulk()`` an empty list, you'll get an empty dictionary.
1045 1059
 
  1060
+.. _queryset-iterator:
  1061
+
1046 1062
 ``iterator()``
1047 1063
 ~~~~~~~~~~~~~~
1048 1064
 
1049 1065
 Evaluates the ``QuerySet`` (by performing the query) and returns an
1050  
-`iterator`_ over the results. A ``QuerySet`` typically reads all of
1051  
-its results and instantiates all of the corresponding objects the
1052  
-first time you access it; ``iterator()`` will instead read results and
1053  
-instantiate objects in discrete chunks, yielding them one at a
1054  
-time. For a ``QuerySet`` which returns a large number of objects, this
1055  
-often results in better performance and a significant reduction in
1056  
-memory use.
  1066
+`iterator`_ over the results. A ``QuerySet`` typically caches its
  1067
+results internally so that repeated evaluations do not result in
  1068
+additional queries; ``iterator()`` will instead read results directly,
  1069
+without doing any caching at the ``QuerySet`` level. For a
  1070
+``QuerySet`` which returns a large number of objects, this often
  1071
+results in better performance and a significant reduction in memory
1057 1072
 
1058 1073
 Note that using ``iterator()`` on a ``QuerySet`` which has already
1059 1074
 been evaluated will force it to evaluate again, repeating the query.
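
The difference between cached evaluation and ``iterator()`` described in the hunk above can be pictured with a toy class. This is only a sketch of the idea, not Django's implementation: ``LazyResults``, ``fetch_rows`` and ``query_count`` are hypothetical names invented for illustration.

```python
class LazyResults:
    """Toy stand-in for a lazy, result-caching query (hypothetical, not Django code)."""

    def __init__(self, fetch_rows):
        self._fetch_rows = fetch_rows  # callable that "runs the query"
        self._cache = None             # filled on the first full evaluation
        self.query_count = 0           # how many times the query has run

    def _run(self):
        self.query_count += 1
        return self._fetch_rows()

    def __iter__(self):
        # The first iteration runs the query and keeps every row in memory;
        # later iterations reuse the cached rows.
        if self._cache is None:
            self._cache = list(self._run())
        return iter(self._cache)

    def iterator(self):
        # Hands rows out without storing them, so every call re-runs the query.
        return iter(self._run())


results = LazyResults(lambda: (n * n for n in range(5)))
first = list(results)                # runs the "query" and fills the cache
second = list(results)               # served from the cache
streamed = list(results.iterator())  # bypasses the cache, runs the "query" again
```

In this sketch the cached path trades memory for fewer queries, while the streaming path does the opposite, which is the trade-off the revised ``iterator()`` text describes.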
2  docs/topics/db/aggregation.txt
@@ -353,7 +353,7 @@ without any harmful effects, since that is already playing a role in the
353 353
 query.
354 354
 
355 355
 This behavior is the same as that noted in the queryset documentation for
356  
-:ref:`distinct() <querysets-distinct>` and the general rule is the same:
  356
+:ref:`distinct() <queryset-distinct>` and the general rule is the same:
357 357
 normally you won't want extra columns playing a part in the result, so clear
358 358
 out the ordering, or at least make sure it's restricted only to those fields
359 359
 you also select in a ``values()`` call.
1  docs/topics/db/index.txt
@@ -16,3 +16,4 @@ model maps to a single database table.
16 16
    managers
17 17
    sql
18 18
    transactions
  19
+   optimization
248  docs/topics/db/optimization.txt
... ...
@@ -0,0 +1,248 @@
  1
+.. _topics-db-optimization:
  2
+
  3
+============================
  4
+Database access optimization
  5
+============================
  6
+
  7
+Django's database layer provides various ways to help developers get the most
  8
+out of their databases. This document gathers together links to the relevant
  9
+documentation, and adds various tips, organized under a number of headings that
  10
+outline the steps to take when attempting to optimize your database usage.
  11
+
  12
+Profile first
  13
+=============
  14
+
  15
+As general programming practice, this goes without saying. Find out :ref:`what
  16
+queries you are doing and what they are costing you
  17
+<faq-see-raw-sql-queries>`. You may also want to use an external project like
  18
+'django-debug-toolbar', or a tool that monitors your database directly.
  19
+
  20
+Remember that you may be optimizing for speed or memory or both, depending on
  21
+your requirements. Sometimes optimizing for one will be detrimental to the
  22
+other, but sometimes they will help each other. Also, work that is done by the
  23
+database process might not have the same cost (to you) as the same amount of
  24
+work done in your Python process. It is up to you to decide what your
  25
+priorities are, where the balance must lie, and profile all of these as required
  26
+since this will depend on your application and server.
  27
+
  28
+With everything that follows, remember to profile after every change to ensure
  29
+that the change is a benefit, and a big enough benefit given the decrease in
  30
+readability of your code. **All** of the suggestions below come with the caveat
  31
+that in your circumstances the general principle might not apply, or might even
  32
+be reversed.
  33
+
  34
+Use standard DB optimization techniques
  35
+=======================================
  36
+
  37
+...including:
  38
+
  39
+* Indexes. This is a number one priority, *after* you have determined from
  40
+  profiling what indexes should be added. Use :attr:`django.db.models.Field.db_index` to add
  41
+  these from Django.
  42
+
  43
+* Appropriate use of field types.
  44
+
  45
+We will assume you have done the obvious things above. The rest of this document
  46
+focuses on how to use Django in such a way that you are not doing unnecessary
  47
+work. This document also does not address other optimization techniques that
  48
+apply to all expensive operations, such as :ref:`general purpose caching
  49
+<topics-cache>`.
  50
+
  51
+Understand QuerySets
  52
+====================
  53
+
  54
+Understanding :ref:`QuerySets <ref-models-querysets>` is vital to getting good
  55
+performance with simple code. In particular:
  56
+
  57
+Understand QuerySet evaluation
  58
+------------------------------
  59
+
  60
+To avoid performance problems, it is important to understand:
  61
+
  62
+* that :ref:`QuerySets are lazy <querysets-are-lazy>`.
  63
+
  64
+* when :ref:`they are evaluated <when-querysets-are-evaluated>`.
  65
+
  66
+* how :ref:`the data is held in memory <caching-and-querysets>`.
  67
+
  68
+Understand cached attributes
  69
+----------------------------
  70
+
  71
+As well as caching of the whole ``QuerySet``, there is caching of the result of
  72
+attributes on ORM objects. In general, attributes that are not callable will be
  73
+cached. For example, assuming the :ref:`example weblog models
  74
+<queryset-model-example>`:
  75
+
  76
+  >>> entry = Entry.objects.get(id=1)
  77
+  >>> entry.blog   # Blog object is retrieved at this point
  78
+  >>> entry.blog   # cached version, no DB access
  79
+
  80
+But in general, callable attributes cause DB lookups every time::
  81
+
  82
+  >>> entry = Entry.objects.get(id=1)
  83
+  >>> entry.authors.all()   # query performed
  84
+  >>> entry.authors.all()   # query performed again
  85
+
  86
+Be careful when reading template code - the template system does not allow use
  87
+of parentheses, but will call callables automatically, hiding the above
  88
+distinction.
  89
+
  90
+Be careful with your own custom properties - it is up to you to implement
  91
+caching.
  92
+
  93
+Use the ``with`` template tag
  94
+-----------------------------
  95
+
  96
+To make use of the caching behaviour of ``QuerySet``, you may need to use the
  97
+:ttag:`with` template tag.
  98
+
  99
+Use ``iterator()``
  100
+------------------
  101
+
  102
+When you have a lot of objects, the caching behaviour of the ``QuerySet`` can
  103
+cause a large amount of memory to be used. In this case,
  104
+:ref:`QuerySet.iterator() <queryset-iterator>` may help.
  105
+
  106
+Do database work in the database rather than in Python
  107
+======================================================
  108
+
  109
+For instance:
  110
+
  111
+* At the most basic level, use :ref:`filter and exclude <queryset-api>` to
  112
+  do filtering in the database to avoid loading data into your Python process, only
  113
+  to throw much of it away.
  114
+
  115
+* Use :ref:`F() object query expressions <query-expressions>` to do filtering
  116
+  against other fields within the same model.
  117
+
  118
+* Use :ref:`annotate to do aggregation in the database <topics-db-aggregation>`.
  119
+
  120
+If these aren't enough to generate the SQL you need:
  121
+
  122
+Use ``QuerySet.extra()``
  123
+------------------------
  124
+
  125
+A less portable but more powerful method is :ref:`QuerySet.extra()
  126
+<queryset-extra>`, which allows some SQL to be explicitly added to the query.
  127
+If that still isn't powerful enough:
  128
+
  129
+Use raw SQL
  130
+-----------
  131
+
  132
+Write your own :ref:`custom SQL to retrieve data <topics-db-sql>`. Use
  133
+``django.db.connection.queries`` to find out what Django is writing for you and
  134
+start from there.
  135
+
  136
+Retrieve everything at once if you know you will need it
  137
+========================================================
  138
+
  139
+Hitting the database multiple times for different parts of a single 'set' of
  140
+data that you will need all parts of is, in general, less efficient than
  141
+retrieving it all in one query. This is particularly important if you have a
  142
+query that is executed in a loop, and could therefore end up doing many database
  143
+queries, when only one was needed. So:
  144
+
  145
+Use ``QuerySet.select_related()``
  146
+---------------------------------
  147
+
  148
+Understand :ref:`QuerySet.select_related() <select-related>` thoroughly, and use it:
  149
+
  150
+* in view code,
  151
+
  152
+* and in :ref:`managers and default managers <topics-db-managers>` where
  153
+  appropriate. Be aware when your manager is and is not used; sometimes this is
  154
+  tricky so don't make assumptions.
  155
+
  156
+Don't retrieve things you don't need
  157
+====================================
  158
+
  159
+Use ``QuerySet.values()`` and ``values_list()``
  160
+-----------------------------------------------
  161
+
  162
+When you just want a dict/list of values, and don't need ORM model objects, make
  163
+appropriate usage of :ref:`QuerySet.values() <queryset-values>`.
  164
+These can be useful for replacing model objects in template code - as long as
  165
+the dicts you supply have the same attributes as those used in the template, you
  166
+are fine.
  167
+
  168
+Use ``QuerySet.count()``
  169
+------------------------
  170
+
  171
+...if you only want the count, rather than doing ``len(queryset)``.
  172
+
  173
+But:
  174
+
  175
+Don't overuse ``count()``
  176
+-------------------------
  177
+
  178
+If you are going to need other data from the QuerySet, just evaluate it.
  179
+
  180
+For example, assuming an Email class that has a ``body`` attribute and a
  181
+many-to-many relation to User, the following template code is optimal:
  182
+
  183
+.. code-block:: html+django
  184
+
  185
+   {% if display_inbox %}
  186
+     {% with user.emails.all as emails %}
  187
+       {% if emails %}
  188
+         <p>You have {{ emails|length }} email(s)</p>
  189
+         {% for email in emails %}
  190
+           <p>{{ email.body }}</p>
  191
+         {% endfor %}
  192
+       {% else %}
  193
+         <p>No messages today.</p>
  194
+       {% endif %}
  195
+     {% endwith %}
  196
+   {% endif %}
  197
+
  198
+
  199
+It is optimal because:
  200
+
  201
+ 1. Since QuerySets are lazy, this does no database queries if ``display_inbox`` is False.
  202
+
  203
+ #. Use of ``with`` means that we store ``user.emails.all`` in a variable for
  204
+    later use, allowing its cache to be re-used.
  205
+
  206
+ #. The line ``{% if emails %}`` causes ``QuerySet.__nonzero__()`` to be called,
  207
+    which causes the ``user.emails.all()`` query to be run on the database, and
  208
+    at least the first row to be turned into an ORM object. If there aren't
  209
+    any results, it will return False, otherwise True.
  210
+
  211
+ #. The use of ``{{ emails|length }}`` calls ``QuerySet.__len__()``, filling
  212
+    out the rest of the cache without doing another query.
  213
+
  214
+ #. The ``for`` loop iterates over the already filled cache.
  215
+
  216
+In total, this code does either one or zero database queries. The only
  217
+deliberate optimization performed is the use of the ``with`` tag. Using
  218
+``QuerySet.count()`` at any point would cause additional queries.
  219
+
  220
+Use ``QuerySet.update()`` and ``delete()``
  221
+------------------------------------------
  222
+
  223
+Rather than retrieve a load of objects, set some values, and save them
  224
+individually, use a bulk SQL UPDATE statement, via :ref:`QuerySet.update()
  225
+<topics-db-queries-update>`. Similarly, do :ref:`bulk deletes
  226
+<topics-db-queries-delete>` where possible.
  227
+
  228
+Note, however, that these bulk update methods cannot call the ``save()`` or ``delete()``
  229
+methods of individual instances, which means that any custom behaviour you have
  230
+added for these methods will not be executed, including anything driven from the
  231
+normal database object :ref:`signals <ref-signals>`.
  232
+
  233
+Don't retrieve things you already have
  234
+======================================
  235
+
  236
+Use foreign key values directly
  237
+-------------------------------
  238
+
  239
+If you only need a foreign key value, use the foreign key value that is already on
  240
+the object you've got, rather than getting the whole related object and taking
  241
+its primary key. i.e. do::
  242
+
  243
+   entry.blog_id
  244
+
  245
+instead of::
  246
+
  247
+   entry.blog.id
  248
+
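
Several of the tips in the new optimization topic (filter in the database, count in the database, update in bulk) come down to sending one SQL statement instead of moving rows into Python. The sketch below illustrates that with the standard-library ``sqlite3`` module rather than the Django ORM. The ``entry`` table and its ``rating`` column are assumptions made up for the example and are not part of the weblog models the docs refer to.

```python
import sqlite3

# In-memory database with a hypothetical "entry" table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entry (id INTEGER PRIMARY KEY, headline TEXT, rating INTEGER)"
)
conn.executemany(
    "INSERT INTO entry (headline, rating) VALUES (?, ?)",
    [("Entry %d" % i, i % 5) for i in range(100)],
)

# Wasteful: pull every row into Python, then filter and count there.
all_rows = conn.execute("SELECT id, headline, rating FROM entry").fetchall()
count_in_python = len([row for row in all_rows if row[2] >= 3])

# Cheaper: the database filters and counts, and returns a single number.
count_in_db = conn.execute(
    "SELECT COUNT(*) FROM entry WHERE rating >= ?", (3,)
).fetchone()[0]

# Bulk update: one UPDATE statement instead of one statement per row.
cursor = conn.execute("UPDATE entry SET rating = 0 WHERE rating >= ?", (3,))
rows_updated = cursor.rowcount
conn.commit()

# After the bulk update, no rows match the original filter any more.
remaining = conn.execute(
    "SELECT COUNT(*) FROM entry WHERE rating >= ?", (3,)
).fetchone()[0]
```

Both counting approaches give the same answer here; the difference is how much data crosses from the database into the Python process. As the topic itself stresses, whether that matters in a given application should be confirmed by profiling.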
2  docs/topics/db/sql.txt
@@ -83,6 +83,6 @@ An easier option?
83 83
 
84 84
 A final note: If all you want to do is a custom ``WHERE`` clause, you can just
85 85
 use the ``where``, ``tables`` and ``params`` arguments to the
86  
-:ref:`extra clause <extra>` in the standard queryset API.
  86
+:ref:`extra clause <queryset-extra>` in the standard queryset API.
87 87
 
88 88
 .. _Python DB-API: http://www.python.org/dev/peps/pep-0249/
