Added documentation to explain the gains and losses when using utf8_bin
collation in MySQL. This should help people to make a reasonably informed
decision. Usually, leaving the MySQL collation alone will be the best solution,
but if you must change it, this gives a start to the information you need and
pointers to the appropriate place in the MySQL docs.

There's a small chance I also got all the necessary Sphinx markup correct
(it builds without errors, but I may have missed some chances for glory and
linkage).

Fixed #2335, #8506.


git-svn-id: http://code.djangoproject.com/svn/django/trunk@8568 bcc190cf-cafb-0310-a4f2-bffc1f526a37
commit f2b389b354165cceb578aa3b13bec88f0e44c654 1 parent b2c2c3a
Malcolm Tredinnick malcolmt authored
59 docs/ref/databases.txt
@@ -95,6 +95,65 @@ This ensures all tables and columns will use UTF-8 by default.
.. _create your database: http://dev.mysql.com/doc/refman/5.0/en/create-database.html
+.. _mysql-collation:
+
+Collation settings
+~~~~~~~~~~~~~~~~~~
+
+The collation setting for a column controls the order in which data is sorted
+as well as which strings compare as equal. It can be set database-wide, per
+table and per column. This is `documented thoroughly`_ in the MySQL
+documentation. In all cases, you set the collation by directly manipulating
+the database tables; Django doesn't provide a way to set this on the model
+definition.
+
+.. _documented thoroughly: http://dev.mysql.com/doc/refman/5.0/en/charset.html
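Because the collation is changed outside Django, it can help to see what "directly manipulating the database" looks like at each of the three levels. The sketch below is a hypothetical helper, `collation_statements()`, that only builds MySQL `ALTER` statements as strings; you would run them yourself in the `mysql` client. The database, table and column names in the usage line are made up for illustration.

```python
def collation_statements(database, table, column, column_type,
                         collation="utf8_bin", not_null=True):
    """Hypothetical helper: return ALTER statements that apply `collation`
    database-wide, to a whole table, and to a single column."""
    charset = f"CHARACTER SET utf8 COLLATE {collation}"
    null_clause = " NOT NULL" if not_null else ""
    return {
        # Sets the default for tables created later; existing tables keep
        # whatever collation they already have.
        "database": f"ALTER DATABASE `{database}` {charset};",
        # Converts every character column already in the table.
        "table": f"ALTER TABLE `{table}` CONVERT TO {charset};",
        # MODIFY replaces the whole column definition, so the type and
        # nullability are restated; CHARACTER SET/COLLATE follow the type.
        "column": (f"ALTER TABLE `{table}` MODIFY `{column}` "
                   f"{column_type} {charset}{null_clause};"),
    }


for level, sql in collation_statements(
        "mysite", "myapp_person", "name", "VARCHAR(100)").items():
    print(level, sql)
```

Restating the full column type in `MODIFY` is deliberate: MySQL replaces the old definition, so anything omitted (such as `NOT NULL`) would be dropped.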
+
+By default, with a UTF-8 database, MySQL will use the
+``utf8_general_ci`` collation. This results in all string equality
+comparisons being done in a *case-insensitive* manner. That is, ``"Fred"`` and
+``"freD"`` are considered equal at the database level. If you have a unique
+constraint on a field, you cannot insert both ``"aa"`` and ``"AA"`` into the
+same column, since they compare as equal (and, hence, non-unique) with the
+default collation.
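To make the uniqueness consequence concrete, here is a small pure-Python model of a unique column under the two collations. It is only an approximation: `ci_key()` treats `utf8_general_ci` as plain case-folding, while the real collation has further rules (for example for accented letters) that are not modelled. The names `ci_key`, `bin_key` and `insert_unique` are hypothetical.

```python
def ci_key(s):
    # Rough stand-in for utf8_general_ci: case is ignored. The real
    # collation has extra rules (e.g. for accented letters) skipped here.
    return s.casefold()


def bin_key(s):
    # utf8_bin compares the UTF-8 bytes, so case matters.
    return s.encode("utf-8")


COLLATIONS = {"utf8_general_ci": ci_key, "utf8_bin": bin_key}


def insert_unique(values, collation):
    """Mimic a UNIQUE column: reject any value that compares equal,
    under `collation`, to one already stored."""
    key = COLLATIONS[collation]
    seen = set()
    for value in values:
        k = key(value)
        if k in seen:
            raise ValueError(f"Duplicate entry {value!r} under {collation}")
        seen.add(k)
    return list(values)
```

With this model, `insert_unique(["aa", "AA"], "utf8_bin")` keeps both values, while the same call with `"utf8_general_ci"` raises `ValueError`, matching the behaviour described above.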
+
+In many cases, this default will not be a problem. However, if you really want
+case-sensitive comparisons on a particular column or table, you can change
+the column or table to use the ``utf8_bin`` collation. The main thing to be
+aware of in this case is that if you are using MySQLdb 1.2.2, the database
+backend in Django will then return bytestrings (instead of unicode strings)
+for any character fields it receives from the database. This is a strong
+variation from Django's normal practice of *always* returning unicode strings.
+It is up to you, the developer, to handle the fact that you will receive
+bytestrings if you configure your table(s) to use ``utf8_bin`` collation.
+Django itself should work smoothly with such columns, but your code must be
+prepared to call ``django.utils.encoding.smart_unicode()`` at times if it
+really wants to work with consistent data. Django will not do this for you:
+the database backend layer and the model population layer are separated
+internally, so the database layer doesn't know it needs to make this
+conversion in this one particular case.
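The conversion itself is small. So that the sketch runs without Django installed, it uses a hypothetical stand-in, `as_text()`, that approximates what `smart_unicode()` does for these values: decode bytestrings as UTF-8, pass text through, and stringify anything else. `normalize_row()` is also hypothetical and assumes the row arrives as a dict of field name to value.

```python
def as_text(value, encoding="utf-8"):
    """Hypothetical stand-in for django.utils.encoding.smart_unicode():
    decode bytestrings, return text unchanged, str() anything else."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    if isinstance(value, str):
        return value
    return str(value)


def normalize_row(row):
    """Decode only the bytestring values of a fetched row (a dict),
    leaving integers, dates and other values untouched."""
    return {name: as_text(v) if isinstance(v, bytes) else v
            for name, v in row.items()}
```

Decoding only `bytes` values in `normalize_row()` matters: passing every value through `as_text()` would also turn integers into strings.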
+
+If you're using MySQLdb 1.2.1p2, Django's standard
+:class:`~django.db.models.CharField` class will return unicode strings even
+with ``utf8_bin`` collation. However, :class:`~django.db.models.TextField`
+fields will be returned as an ``array.array`` instance (from Python's standard
+``array`` module). There isn't a lot Django can do about that, since, again,
+the information needed to make the necessary conversions isn't available when
+the data is read in from the database. This problem was `fixed in MySQLdb
+1.2.2`_, so if you want to use :class:`~django.db.models.TextField` with
+``utf8_bin`` collation, upgrading to version 1.2.2 and then dealing with the
+bytestrings (which shouldn't be too difficult) is the recommended solution.
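If you are stuck on MySQLdb 1.2.1p2 for a while, the `array.array` values can be coerced to text in the same place as the bytestrings. The sketch below is a hypothetical helper, `text_from_db()`, written for Python 3. It builds its test input with typecode `'B'` only to simulate the driver's return value, and assumes the array's raw contents are the UTF-8 encoded column data.

```python
from array import array


def text_from_db(value, encoding="utf-8"):
    """Hypothetical helper: coerce a value read from a utf8_bin TextField
    to text, whether it arrives as an array, a bytestring or text."""
    if isinstance(value, array):
        # Assumed: the array's raw contents are the UTF-8 encoded data.
        value = value.tobytes()
    if isinstance(value, bytes):
        value = value.decode(encoding)
    return value


print(text_from_db(array("B", "héllo".encode("utf-8"))))
```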
+
+Should you decide to use ``utf8_bin`` collation for some of your tables with
+MySQLdb 1.2.1p2, you should still use ``utf8_general_ci`` (the default)
+collation for the :class:`django.contrib.sessions.models.Session` table
+(usually called ``django_session``) and the
+:class:`django.contrib.admin.models.LogEntry` table (usually called
+``django_admin_log``). Those are the two standard tables that use
+:class:`~django.db.models.TextField` internally.
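One way to respect that advice when switching many tables at once is to generate the conversion statements from a table list and skip the two contrib tables named above. The helper `utf8_bin_conversions()` and the `myapp_person` table are hypothetical; the output is SQL for you to review and run by hand.

```python
# Tables that should keep the default collation under MySQLdb 1.2.1p2,
# because they use TextField internally.
KEEP_DEFAULT = {"django_session", "django_admin_log"}


def utf8_bin_conversions(tables):
    """Hypothetical helper: ALTER statements switching `tables` to
    utf8_bin, leaving the tables in KEEP_DEFAULT alone."""
    return [
        f"ALTER TABLE `{t}` CONVERT TO CHARACTER SET utf8 COLLATE utf8_bin;"
        for t in tables
        if t not in KEEP_DEFAULT
    ]
```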
+
+.. _fixed in MySQLdb 1.2.2: http://sourceforge.net/tracker/index.php?func=detail&aid=1495765&group_id=22307&atid=374932
+
Connecting to the database
--------------------------
15 docs/ref/models/fields.txt
@@ -340,6 +340,14 @@ The admin represents this as an ``<input type="text">`` (a single-line input).
The maximum length (in characters) of the field. The max_length is enforced
at the database level and in Django's validation.
+.. admonition:: MySQL users
+
+ If you are using this field with MySQLdb 1.2.2 and the ``utf8_bin``
+ collation (which is *not* the default), there are some issues to be aware
+ of. Refer to the :ref:`MySQL database notes <mysql-collation>` for
+ details.
+
+
``CommaSeparatedIntegerField``
------------------------------
@@ -689,6 +697,13 @@ Like an :class:`IntegerField`, but only allows values under a certain
A large text field. The admin represents this as a ``<textarea>`` (a multi-line
input).
+.. admonition:: MySQL users
+
+ If you are using this field with MySQLdb 1.2.1p2 and the ``utf8_bin``
+ collation (which is *not* the default), there are some issues to be aware
+ of. Refer to the :ref:`MySQL database notes <mysql-collation>` for
+ details.
+
``TimeField``
-------------
17 docs/ref/models/querysets.txt
@@ -729,16 +729,13 @@ anything. It has now been changed to behave the same as ``id__isnull=True``.
.. admonition:: MySQL comparisons
- In MySQL, whether or not ``exact`` comparisons are case-sensitive depends
- upon the collation setting of the table involved. The default is usually
- ``latin1_swedish_ci`` or ``utf8_swedish_ci``, which results in
- case-insensitive comparisons. Change the collation to
- ``latin1_swedish_cs`` or ``utf8_bin`` for case sensitive comparisons.
-
- For more details, refer to the MySQL manual section about `character sets
- and collations`_.
-
-.. _character sets and collations: http://dev.mysql.com/doc/refman/5.0/en/charset.html
+ In MySQL, ``exact`` comparisons are case-insensitive by default. This is
+ controlled by the collation setting on the database tables (this is a
+ database setting, *not* a Django setting). It is possible to configure your
+ MySQL tables to use case-sensitive comparisons, but there are some
+ trade-offs involved. For more information about this, see the
+ :ref:`collation section <mysql-collation>` in the
+ :ref:`databases <ref-databases>` documentation.
iexact
~~~~~~