MongoDB is great, but for this specific use case, Cassandra looks like a better choice from a resource-usage perspective.
If someone watching this project has a better idea, or would like to help, let me know.
This needs to be done soon(ish) if I'm going to keep growing the database. It's at 50GB (for about 14 million comments), and while I've got lots of hard-drive space, I don't have an infinite amount.
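For a rough sense of scale, here is a back-of-the-envelope sketch based on the figures above. It assumes "50GB" means decimal gigabytes and that storage grows linearly with comment count; the function names and the 1 TB disk are purely illustrative.

```python
# Rough storage estimate from the numbers quoted above.
DB_SIZE_BYTES = 50 * 10**9   # 50 GB, assumed to be decimal gigabytes
COMMENT_COUNT = 14_000_000   # "about 14 million comments"


def bytes_per_comment(db_size=DB_SIZE_BYTES, comments=COMMENT_COUNT):
    """Average on-disk bytes per stored comment."""
    return db_size / comments


def comments_that_fit(disk_bytes, db_size=DB_SIZE_BYTES, comments=COMMENT_COUNT):
    """Comments that fit on a disk, assuming linear growth (integer math)."""
    return disk_bytes * comments // db_size


if __name__ == "__main__":
    print(round(bytes_per_comment()))   # about 3.6 KB per comment
    print(comments_that_fit(10**12))    # hypothetical 1 TB disk
```

That works out to roughly 3.5 KB per comment, so any reduction in per-comment overhead (indexes, field names, padding) adds up quickly at this volume.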
Which library are you thinking of using? I'm sure you already know, but pycassa doesn't seem to support Python 3 yet: pycassa/pycassa#178
I'm not sure how much of a problem that will be, but someone on SO recommended using the Python CQL driver.
Dammit. Maybe I'm better off using SQLAlchemy with PostgreSQL. At least then I get ACID. Plus, assuming I design things right, it should then be able to use SQLite for testing.
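A minimal sketch of what that could look like, assuming SQLAlchemy Core. The `comments` table and its columns are hypothetical (the real schema isn't specified here), and the PostgreSQL URL in the comment is a placeholder. The idea is that only the connection URL changes between testing and production.

```python
from sqlalchemy import Column, Integer, MetaData, Table, Text, create_engine

metadata = MetaData()

# Hypothetical schema, for illustration only.
comments = Table(
    "comments",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("author", Text, nullable=False),
    Column("body", Text, nullable=False),
)


def make_engine(url="sqlite:///:memory:"):
    """Create an engine and the tables.

    Defaults to in-memory SQLite for tests; in production the URL would be
    something like "postgresql://user:password@localhost/dbname" (placeholder).
    """
    engine = create_engine(url)
    metadata.create_all(engine)
    return engine


def add_comment(engine, author, body):
    """Insert one comment inside a transaction (committed on success)."""
    with engine.begin() as conn:
        conn.execute(comments.insert(), {"author": author, "body": body})


def count_comments(engine):
    """Return the number of stored comments."""
    with engine.connect() as conn:
        return len(conn.execute(comments.select()).fetchall())
```

In a test, `make_engine()` with no arguments gives a throwaway SQLite database; the same functions should work against PostgreSQL as long as the schema sticks to types both backends support.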
The CoffeeScript rewrite will support SQLite, MySQL, and PostgreSQL. When it comes down to storage size, MySQL's compression should work well.