Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
Rob Vaterlaus
committed
Sep 14, 2010
0 parents
commit c596123
Showing
20 changed files
with
1,989 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
<?xml version="1.0" encoding="UTF-8"?> | ||
<projectDescription> | ||
<name>django_cassandra_backend</name> | ||
<comment></comment> | ||
<projects> | ||
</projects> | ||
<buildSpec> | ||
<buildCommand> | ||
<name>org.python.pydev.PyDevBuilder</name> | ||
<arguments> | ||
</arguments> | ||
</buildCommand> | ||
</buildSpec> | ||
<natures> | ||
<nature>org.python.pydev.pythonNature</nature> | ||
</natures> | ||
</projectDescription> |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
<?xml version="1.0" encoding="UTF-8" standalone="no"?> | ||
<?eclipse-pydev version="1.0"?> | ||
|
||
<pydev_project> | ||
<pydev_property name="org.python.pydev.PYTHON_PROJECT_INTERPRETER">Default</pydev_property> | ||
<pydev_property name="org.python.pydev.PYTHON_PROJECT_VERSION">python 2.6</pydev_property> | ||
|
||
</pydev_project> |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,153 @@ | ||
Introduction | ||
============ | ||
This is an early development release of a Django backend for the Cassandra database. | ||
It has only been under development for a short time and there are almost certainly | ||
issues/bugs with this release -- see the end of this document for a list of known | ||
issues. Needless to say, you shouldn't use this release in a production setting, the | ||
format of the data stored in Cassandra may change in future versions, there's no | ||
promise of backwards compatibility with this version, and so on. | ||
|
||
Please let me know if you find any bugs or have any suggestions for how to improve | ||
the backend. You can contact me at: rob.vaterlaus@bigswitch.com | ||
|
||
Installation | ||
============ | ||
The backend requires the 0.7 version of Cassandra. 0.7 has several features | ||
(e.g. programmatic creation/deletion of keyspaces & column families, secondary index | ||
support) that are useful with running as a Django database backend, so I targeted | ||
that version instead of 0.6. Unfortunately, the Cassandra Thrift API changed between | ||
0.6 and 0.7 so the two version are incompatible. | ||
|
||
There's a beta1 version of 0.7 available at the Cassandra web site. I'm actually using a | ||
somewhat later daily binary release dated 8/23. I obtained the daily release by following | ||
the "Latest Builds" link in the Cassandra downloads page, but the last few times I've | ||
tried it that link was dead, so I'm not sure what's going on with that. I had switched | ||
to the 8/23 release, because I had read that there was an issue with the secondary | ||
index support in the beta1 release and I was trying to get secondary index support | ||
working in the backend. As it turns out I was still seeing problems with the 8.23 | ||
release, so I wound up disabling the secondary index code (details below), so it's | ||
possible/probable that the backend will work with the beta1 release, especially | ||
if you don't try to enable secondary index support (i.e. don't set the db_index | ||
to True for any of the fields). But I haven't tested with beta1, so no promises. | ||
|
||
The backend also requires the Django-nonrel fork of Django and djangotoolbox. | ||
Both are available here: <http://www.allbuttonspressed.com/projects/django-nonrel>. | ||
I installed the Django-nonrel version of Django globally in site-packages and | ||
copied djangotoolbox into the directory where I'm testing the Cassandra backend, | ||
but there are probably other (better?) ways to install those things. | ||
|
||
You also need to generate the Python Thrift API code as described in the Cassandra | ||
documentation and copy the generated "cassandra" directory (from Cassandra's | ||
interface/gen-py directory) over to the top-level Django project directory. | ||
|
||
To configure a project to use the Cassandra backend all you have to do is change | ||
the database settings in the settings.py file. Change the ENGINE value to be | ||
'django_cassandra.db' and the NAME value to be the name of the keyspace to use. | ||
You can set HOST and PORT to override the default values of 'localhost' and 9160. | ||
In theory you can also set USER and PASSWORD if you're using authentication with | ||
Cassandra, but this hasn't been tested yet, so it may not work. | ||
|
||
Configure Cassandra as described in the Cassandra documentation. | ||
If want to be able to do range queries over primary keys then you need to set the | ||
partitioner in the cassandra.yaml config file to be the OrderPreservingPartitioner. | ||
|
||
Once you're finished configuring Cassandra start up the Cassandra daemon process as | ||
described in the Cassandra documentation. | ||
|
||
Run syncdb. This creates the keyspace (if necessary) and the column families for the | ||
models in the installed apps. The Cassandra backend creates one column family per | ||
model. It will use the db_table value from the meta settings for the name of the | ||
column family if it's specified; otherwise it uses the default name similar to | ||
other backends. | ||
|
||
Now you should be able to use the normal model and query set calls from you | ||
Django code. | ||
|
||
This release includes a test project and app. If you want to use the backend in | ||
another project you just need to copy the django_cassandra directory to the | ||
top-level directory of the project (along with the cassandra and djangotoolbox | ||
directories). | ||
|
||
What Works | ||
========== | ||
- the basics: creating model instances, querying (get/filter/exclude), count, | ||
update/save, delete, order_by | ||
- efficient queries for exact matches on the primary key. It can also do range | ||
queries on the primary key, but your Cassandra cluster must be configured to use the | ||
OrderPreservingPartitioner if you want to do that. Unfortunately, currently it | ||
doesn't fail gracefully if you try to do range queries when using the | ||
RandomPartitioner, so just don't do that for now :-) | ||
- inefficient queries for everything else that can't be done efficiently in | ||
Cassandra. The basic approach used in the query processing code is to first try | ||
to prune the number of rows to look at by finding a part of the query that can | ||
be evaluated efficiently (i.e. a primary key filter predicate or a secondary | ||
index predicate, once that's working). Then it evaluates the remaining filter | ||
predicates over the pruned rows to obtain the final result. If there's no part | ||
of the query that can be evaluated efficiently, then it just fetches the entire | ||
set of rows and does all of the filtering in the backend code. | ||
- programmatic creation of the keyspace & column families via syncdb | ||
- Django admin UI, except for users in the auth application (see below) | ||
- I think all of the filter operations (e.g. gt, startswith, regex, etc.) are supported | ||
although it's possible I missed something | ||
- complex queries with Q nodes | ||
|
||
What Doesn't Work (Yet) | ||
======================= | ||
- Secondary Index Support: There's code in there to use secondary indexes, but | ||
I was seeing weird results when I tried to execute Cassandra queries using the | ||
secondary indexes so I disabled that code. Hopefully that's just an issue with the | ||
specific version of Cassandra I'm using, but I haven't tried it out with a more | ||
recent version to see if it's working now. If you're feeling adventurous you could | ||
try it out with a newer version of Cassandra and enable the secondary index code | ||
by setting the value of SECONDARY_INDEX_SUPPORT_ENABLED to True in predicate.py. | ||
You enable secondary index support for fields by setting the db_index argument to | ||
True when constructing the field. | ||
- I haven't tested all of the different field types, so there are probably | ||
issues there with how the data is converted to and from Cassandra with some of the | ||
field types. My use case was mostly string fields, so most of the testing was with | ||
that. I've also tried out date, datetime, time, and decimal fields, so I think | ||
those should work too, but I haven't tried anything else. | ||
- joins | ||
- chunked queries. It just tries to get everything all at once from Cassandra. | ||
Currently the maximum that it can get (i.e. the count value in the Cassandra | ||
Thrift API) is set semi-arbitrarily to 10000, so if you try to query over a | ||
column family with more rows (or columns) than that it may not work. | ||
Probably the value could be set higher than that, but at some point Cassandra | ||
fails if it's too big (i.e. it didn't work if I set it to 0x7fffffff). | ||
If you want to make it bigger you can change the MAX_FETCH_COUNT variable | ||
in compiler.py. | ||
- ListModel/ListField support from djangotoolbox (I think?). I haven't | ||
investigated how this works and if it's feasible to support in Cassandra, | ||
although I'm guessing it probably wouldn't be too hard. For now, this means | ||
that several of the unit tests from djangotoolbox fail if you have that | ||
in your installed apps. | ||
- there's no way to configure the settings used to create the keyspaces | ||
and column families (e.g. replication strategy, replication factor) or the | ||
read & write consistency levels used when querying or inserting/mutating | ||
columns in Cassandra. My plan was to add global database settings and | ||
per-model Meta settings to configure those things, but I haven't gotten to | ||
it yet. | ||
- Cassandra authentication. Actually this may work but I haven't tested it yet. | ||
There's code in there that tries to login to Cassandra if the USER and | ||
PASSWORD are specified in the database settings, but I've only tested with | ||
the AllowAllAuthenticator. | ||
- probably a lot of other stuff that I've forgotten or am unaware of :-) | ||
|
||
Known Issues | ||
============ | ||
- I haven't been able to get the admin UI to work for users in the Django | ||
authentication middleware. I included djangotoolbox in my installed apps, as | ||
suggested on the Django-nonrel web site, which got my further, but I still get | ||
an error in some Django template code that tries to render a change list (I think). | ||
I still need to track down what's going on there. | ||
- f you enable the authentication and session middleware a bunch of the | ||
associated unit tests fail if you run all of the unit tests. | ||
This may be related to the issue with editing users in the admin UI | ||
- the code needs a cleanup pass for things like the exception handling/safety, | ||
some refactoring, more pydoc comments, etc. | ||
- I have a feeling there are some places where I haven't completely leveraged | ||
the code in djangotoolbox, so there may be places where I haven't done | ||
things in the optimal way | ||
- the error handling/messaging isn't great for things like the Cassandra | ||
daemon not running, a versioning mismatch between client and Cassandra | ||
daemon, etc. |
Empty file.
Empty file.
Empty file.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,88 @@ | ||
from djangotoolbox.db.base import NonrelDatabaseFeatures, \ | ||
NonrelDatabaseOperations, NonrelDatabaseWrapper, NonrelDatabaseClient, \ | ||
NonrelDatabaseValidation, NonrelDatabaseIntrospection, \ | ||
NonrelDatabaseCreation | ||
|
||
from thrift import Thrift | ||
from thrift.transport import TTransport | ||
from thrift.transport import TSocket | ||
from thrift.protocol import TBinaryProtocol | ||
from cassandra import Cassandra | ||
from cassandra.ttypes import * | ||
import time | ||
from .creation import DatabaseCreation | ||
from .introspection import DatabaseIntrospection | ||
|
||
class DatabaseFeatures(NonrelDatabaseFeatures): | ||
string_based_auto_field = True | ||
|
||
class DatabaseOperations(NonrelDatabaseOperations): | ||
compiler_module = __name__.rsplit('.', 1)[0] + '.compiler' | ||
|
||
def sql_flush(self, style, tables, sequence_list): | ||
for table_name in tables: | ||
self.connection.creation.flush_table(table_name) | ||
return "" | ||
|
||
class DatabaseClient(NonrelDatabaseClient): | ||
pass | ||
|
||
class DatabaseValidation(NonrelDatabaseValidation): | ||
pass | ||
|
||
# TODO: Maybe move this somewhere else? db.utils.py maybe? | ||
class CassandraConnection(object): | ||
def __init__(self, client, transport): | ||
self.client = client | ||
self.transport = transport | ||
|
||
def commit(self): | ||
pass | ||
|
||
def open(self): | ||
if self.transport: | ||
self.transport.open() | ||
|
||
def close(self): | ||
if self.transport: | ||
self.transport.close() | ||
|
||
class DatabaseWrapper(NonrelDatabaseWrapper): | ||
def __init__(self, *args, **kwds): | ||
super(DatabaseWrapper, self).__init__(*args, **kwds) | ||
|
||
# Set up the associated backend objects | ||
self.features = DatabaseFeatures(self) | ||
self.ops = DatabaseOperations(self) | ||
self.client = DatabaseClient(self) | ||
self.creation = DatabaseCreation(self) | ||
self.validation = DatabaseValidation(self) | ||
self.introspection = DatabaseIntrospection(self) | ||
|
||
# Get the host and port specified in the database backend settings. | ||
# Default to the standard Cassandra settings. | ||
host = self.settings_dict.get('HOST') | ||
if not host or host == '': | ||
host = 'localhost' | ||
port = self.settings_dict.get('PORT') | ||
if not port or port == '': | ||
port = 9160 | ||
|
||
# Create the client connection to the Cassandra daemon | ||
socket = TSocket.TSocket(host, port) | ||
transport = TTransport.TFramedTransport(TTransport.TBufferedTransport(socket)) | ||
protocol = TBinaryProtocol.TBinaryProtocolAccelerated(transport) | ||
client = Cassandra.Client(protocol) | ||
|
||
# Create our connection wrapper | ||
self.db_connection = CassandraConnection(client, transport) | ||
self.db_connection.open() | ||
|
||
version = client.describe_version() | ||
# FIXME: Should do some version check here to make sure that we're | ||
# talking to a cassandra daemon that supports the operations we require | ||
|
||
# Set up the Cassandra keyspace | ||
keyspace_name = self.settings_dict.get('NAME') | ||
self.creation.init_keyspace(keyspace_name) | ||
|
Oops, something went wrong.