by Jens Alfke
TouchDB is a lightweight CouchDB-compatible database engine suitable for embedding into mobile apps. Think of it this way: If CouchDB is MySQL, then TouchDB is SQLite.
By “CouchDB-compatible” I mean that it can replicate with Apache CouchDB, and that its data model and high-level design are “Couch-like” enough to be familiar to CouchDB/Couchbase developers. Its REST API is invoked somewhat differently (since it runs in-process) but is nearly identical to CouchDB’s, apart from omitting some CouchDB features (like user accounts) that aren’t useful in mobile apps. Its implementation is not based on CouchDB’s (it’s not even written in Erlang).
By “suitable for embedding into mobile apps”, I mean that it meets the following requirements:
And by “mobile apps” I’m focusing on iOS and Android, although there’s no reason we couldn’t extend this to other platforms like Windows Phone. And it’s not limited to mobile OSs — the initial Objective-C implementation runs on Mac OS as well.
Couchbase Mobile successfully crammed CouchDB into a form that’s embeddable into iOS and Android apps. But its code size (~4MB) and startup time (5-10 sec on typical devices) are serious problems, which deter some developers. Improving these is a top priority for the next major release.
Unfortunately, further optimizations were not easy or obvious. We had already stripped out all the unnecessary modules we could find, and startup time was dominated by the overhead of the Erlang bytecode loader and interpreter. Further improvements would require insight from Erlang/BEAM gurus and would likely be incremental. That’s not enough; even if we cut its code size and launch time in half, developers would still (justifiably) grumble.
Last fall (2011) we decided it was worth experimenting to see if a ground-up reimplementation focused on mobile needs could deliver the order-of-magnitude improvements we’d like to have. It quickly became clear that it could, and we’ve been working on TouchDB ever since.
TouchDB is implemented in the platform’s preferred language (Objective-C on iOS, Java on Android) as a library that developers link into their apps. Instead of a custom B-tree storage engine, it uses the ubiquitously-available SQLite. Instead of HTTP, it has a traditional API in the implementation language.
Language choices are always controversial, and I’ve talked a bit with other developers about what language to use for a mobile database. There seems to be agreement on native code rather than interpreted languages, for performance reasons. There are nontraditional languages that compile to native code (like Haskell), but they often have performance limitations due to other runtime factors like garbage collection or lack of mutable state.
That doesn’t leave a lot of choices. C is ubiquitous, but has no high-level data structures or object model; there are libraries like GLib that add these, but they add code size. C++ doesn’t have those problems, but its complexity often leads to code bloat and hard-to-maintain code. In addition, writing cross-platform code in either is tricky, since essential APIs like networking and threads are platform-specific. D is a sweet-looking language but has limited compiler support, which rules out its use on iOS or Android.
I’ve concluded that cross-platform code may be a luxury we don’t have room for in mobile. Instead, we should design a clean and well-documented architecture, write a solid reference implementation, and port that as needed. The source code isn’t that big anyway.
For iOS I’ve gone with Objective-C, which is the platform’s preferred language. As a superset of C it’s compact and understandable, it has a strong object model, and the Cocoa frameworks provide excellent data structures. It has the least ‘impedance mismatch’ on iOS, which helps keep code size and startup time down.
On Android, Java is the best choice. While the JVM itself is large and slow to start up, that’s a non-issue for Android apps since Java is already running. Once running, its performance is pretty good, thanks to the maturity of JIT optimization. Conversely, a native-code library would have to be called from Java through JNI, and calling back and forth between Java and native code has nontrivial overhead, which has been known to cause performance problems in real projects. (There is a Java/Android port under development.)
Running in-process is the obvious choice. iOS doesn’t allow apps to spawn subprocesses at all. Android does, but there are enough problems with subprocess cleanup that we stopped doing it. And since the database has only a single app as a client, it makes sense to minimize overhead by putting it right in the client’s process.
CouchDB contains a very clever B-tree engine with a lot of valuable characteristics. But rewriting it in a new language would be a big undertaking. There are a lot of other storage engines out there (Berkeley DB, Kyoto Cabinet, LevelDB…) but we’d have to consider their code size and licenses.
In the end, I think SQLite is the best choice. It isn’t a perfect fit for TouchDB’s data model, but you can’t beat its code size: effectively zero, since it’s already built into the OS on both iOS and Android. It doesn’t scale up especially well, but that’s not an issue for mobile apps. SQLite is already widely used as the persistence layer for many apps, not least because Apple’s CoreData framework uses it.
SQLite is of course B-tree based, but its low-level B-tree API is considered internal and not recommended for developer use. Instead we can just use the regular SQL API to define tables and indexes for what we need. This has the advantage of moving a fair bit of the database logic into SQL, making it cross-platform and reducing the size of the source code.
A REST API makes great sense for a traditional server, which has to be accessed via IPC anyway. It’s not such a good fit for an embedded storage engine. Couchbase Mobile used HTTP over the loopback interface, presenting the typical CouchDB API to the app, but this was problematic. The socket is available to other apps running on the device, which introduces security concerns (we had to turn on auth); and iOS sometimes closes the sockets while the app is in the background, requiring a restart of the HTTP server on wake. There’s also the overhead of marshaling application data to and from HTTP request/response messages.
Ideally we can instead craft an API in the app developer’s language, which will fit cleanly into the platform’s APIs and abstractions. CouchCocoa already does this on iOS, and Ektorp on Android.
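To make the contrast with HTTP concrete, here is a minimal Java sketch of what a native document API might look like. The app passes and receives ordinary `Map` objects, with no request/response marshaling in between. Everything here is an assumption for illustration. `DocumentStore`, `put`, `get` and `ConflictException` are hypothetical names, not the TouchDB, CouchCocoa or Ektorp API, and an in-memory `HashMap` stands in for the SQLite tables. The sketch does follow CouchDB’s rule that an update must name the document’s current revision.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.UUID;

/**
 * Hypothetical sketch of a native, in-process document API.
 * Names are illustrative only (not TouchDB's or Ektorp's API);
 * an in-memory map stands in for the SQLite tables.
 */
public class DocumentStore {

    /** One stored revision of a document. */
    public static final class Revision {
        public final String revId;
        public final Map<String, Object> body;

        Revision(String revId, Map<String, Object> body) {
            this.revId = revId;
            this.body = body;
        }
    }

    /** Raised when an update names a stale revision (CouchDB answers 409 Conflict). */
    public static final class ConflictException extends RuntimeException {
        ConflictException(String docId) {
            super("Conflict updating " + docId);
        }
    }

    private final Map<String, Revision> docs = new HashMap<>();

    /**
     * Creates or updates a document and returns its new revision ID.
     * prevRevId must be null for a new document, or the current revision ID for an update.
     */
    public String put(String docId, String prevRevId, Map<String, Object> properties) {
        Revision current = docs.get(docId);
        String currentRevId = (current == null) ? null : current.revId;
        if (!Objects.equals(currentRevId, prevRevId)) {
            throw new ConflictException(docId);
        }
        int generation = (current == null) ? 1 : generationOf(current.revId) + 1;
        // Couch-style "generation-suffix" revision ID; the suffix here is random for brevity.
        String revId = generation + "-" + UUID.randomUUID().toString().replace("-", "");
        // Store an unmodifiable copy so later edits to the caller's map can't leak in.
        docs.put(docId, new Revision(revId, Collections.unmodifiableMap(new HashMap<>(properties))));
        return revId;
    }

    /** Returns the current revision, or null if the document doesn't exist. */
    public Revision get(String docId) {
        return docs.get(docId);
    }

    private static int generationOf(String revId) {
        return Integer.parseInt(revId.substring(0, revId.indexOf('-')));
    }

    public static void main(String[] args) {
        DocumentStore db = new DocumentStore();
        Map<String, Object> props = new HashMap<>();
        props.put("title", "Groceries");

        String rev1 = db.put("list1", null, props);   // new document: no previous revision
        String rev2 = db.put("list1", rev1, props);   // update based on the current revision
        try {
            db.put("list1", rev1, props);              // stale revision: rejected
        } catch (ConflictException e) {
            System.out.println(e.getMessage());        // prints "Conflict updating list1"
        }
        System.out.println(db.get("list1").revId.equals(rev2)); // prints "true"
    }
}
```

The sketch shows only the shape of the calls. A real implementation would persist revisions in SQLite and keep revision history for replication.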
It’s also feasible to write an HTTP adapter that can accept request objects and return responses, without going through an actual socket. The Objective-C implementation has one, which allows CouchCocoa to work with TouchDB unmodified just by changing the server URL scheme from “http:” to “touchdb:”.
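The adapter idea can also be sketched in Java. In this hypothetical example, a request is a plain object handed straight to an in-process handler. A small client picks the transport by URL scheme, so code written against `http:` URLs could switch to `touchdb:` without other changes. The class names, paths and status codes are illustrative assumptions. This is not the Objective-C implementation’s API.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of an in-process "HTTP" adapter. Requests and responses are
 * plain objects; the "touchdb:" transport answers them directly, with no socket.
 * All class and method names here are illustrative assumptions.
 */
public class LocalTransportDemo {

    /** What would otherwise be serialized onto a socket. */
    static final class Request {
        final String method;
        final URI uri;
        final String body;

        Request(String method, URI uri, String body) {
            this.method = method;
            this.uri = uri;
            this.body = body;
        }
    }

    static final class Response {
        final int status;
        final String body;

        Response(int status, String body) {
            this.status = status;
            this.body = body;
        }
    }

    /** Anything that can turn a request into a response. */
    interface Transport {
        Response send(Request request);
    }

    /** In-process handler: a toy store of JSON strings keyed by URL path. */
    static final class InProcessHandler implements Transport {
        private final Map<String, String> docs = new HashMap<>();

        @Override
        public Response send(Request request) {
            String path = request.uri.getPath();
            switch (request.method) {
                case "PUT":
                    docs.put(path, request.body);
                    return new Response(201, "{\"ok\":true}");
                case "GET": {
                    String doc = docs.get(path);
                    return (doc == null) ? new Response(404, "{\"error\":\"not_found\"}")
                                         : new Response(200, doc);
                }
                default:
                    return new Response(405, "{\"error\":\"method_not_allowed\"}");
            }
        }
    }

    /** Client-side dispatch by URL scheme, so calling code is the same for every transport. */
    static final class Client {
        private final Map<String, Transport> transports = new HashMap<>();

        void register(String scheme, Transport transport) {
            transports.put(scheme, transport);
        }

        Response request(String method, String url, String body) {
            URI uri = URI.create(url);
            Transport transport = transports.get(uri.getScheme());
            if (transport == null) {
                throw new IllegalArgumentException("No transport for scheme: " + uri.getScheme());
            }
            return transport.send(new Request(method, uri, body));
        }
    }

    public static void main(String[] args) {
        Client client = new Client();
        // Only the in-process transport is registered here; an HTTP transport for "http:"
        // would be registered the same way, and the calls below would not change.
        client.register("touchdb", new InProcessHandler());

        Response put = client.request("PUT", "touchdb:///mydb/doc1", "{\"title\":\"Groceries\"}");
        Response get = client.request("GET", "touchdb:///mydb/doc1", null);
        System.out.println(put.status + " " + get.status + " " + get.body);
        // prints: 201 200 {"title":"Groceries"}
    }
}
```

Keying transports by scheme mirrors the point in the text. The calling code stays the same, and only the URL scheme decides whether a request ever touches a socket.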