by Jens Alfke
TouchDB’s replication protocol is compatible with CouchDB. This interoperability is an important feature, but implementing it was challenging because much of CouchDB’s replication protocol is undocumented. In the future I would like to see an explicit spec for replication, to ensure that different products remain compatible. For now I’ll document it here, as I understand it.
Note: If you want to follow along, this algorithm is implemented in TouchDB’s
TDReplicatorand its subclasses
TDPuller(plus a number of helper classes.)
These notes were derived from reading the API documentation on the CouchDB wiki and from conversation with engineers who’ve worked on CouchDB’s replicator (Damien Katz and Filipe Manana). But don’t take them as gospel.
There really isn’t a separate “protocol” per se for replication. Instead, replication uses CouchDB’s REST API and data model. It’s therefore a bit difficult to talk about replication independently of the rest of CouchDB. In this document I’ll focus on the algorithm used, and link to documentation of the APIs it invokes. The “protocol” is simply the set of those APIs operating over HTTP.
Given a source and a target database, identify all current revisions (including deletions) in the source that do not exist in the target, and copy them (with contents, attachments and histories) to the target. Afterwards, all current revisions in the source exist at the target and have the same revision histories there.
Secondary goal: Do this without redundantly transferring the contents of any revisions that already exist at the target.
Note: A current revision is one that has not been replaced, i.e. a leaf node in the revision tree. Most of the time a document has only one current revision, but multiple current revisions can exist and that’s called a conflict.
_changesfeed, starting just past the last source sequence ID (if any). Use the “
?style=all_docs” URL parameter so that conflicting revisions will be included. In continuous replication you should use the “
?feed=longpoll” or “
?feed=continuous” mode and leave the algorithm running indefinitely to process changes as they occur. Filtered replication will specify the name of a filter function in this URL request.
_changesfeed and send them to the target database’s
_revs_diff. The result will contain the subset of those revisions that are not in the target database.
GETeach such revision from the source database. Use the
?revs=trueURL parameter to include its list of parent revisions, so the target database can update its revision tree. Use “
?attachments=true” so the revision data will include attachment bodies. Also use the “
?atts_since” query parameter to pass a list of revisions that the target already has, so the source can optimize by not including the bodies of attachments already known to the target.
_bulk_docsAPI, with the
new_edits:falseJSON property to preserve their revision IDs.
There’s also a ladder diagram which shows these steps along with the interaction between the replicator and source/target db’s.
— The replication algorithm does not have to run on either the source’s or target’s server. It could be run from anywhere with read access to the source and write access to the target. However, it’s nearly always run by either the source or target server (and TouchDB only supports those modes). Replication run by the source is commonly called a “push”, and by the target is called a “pull”. An implementation run by the source or target server may optimize by using lower-level APIs to operate on the local database; for example, it might listen for internal change notifications rather than reading the
— Replication does not transfer obsolete revisions of documents, only the current ones. This derives from the behavior of the
_changes feed, which only lists current revisions. Replication does transfer the revision history of each document, which is just the list of IDs of prior revisions; this is to make it possible for the database to identify common ancestors and merge revision histories into a tree.
— Sequence IDs are usually but not necessarily numeric. (Currently the only exception I know of is BigCouch.) Non-numeric sequence IDs are not intrinsically ordered, i.e. they are opaque strings that can only be compared for equality. To compare their ordering (when checkpointing) you have to keep an ordered list of sequence IDs as they appeared in the
_changes feed and compare their indices in that.
— For efficiency, the algorithm should run in parallel, as a data-flow system, with multiple steps active at the same time. This reduces the overhead of network and database latency.
— Also for efficiency, the number of revisions passed in a single
_bulk_docs call should be large. This means the implementation should group together revisions arriving from previous steps until a sufficient number have arrived or sufficient time has elapsed.
— From my limited testing, the performance bottleneck in the current algorithm seems to be in fetching the new revisions from the source. I think this is due to the overhead of handling many separate HTTP requests. It should be possible to speed up replication by introducing a new API call that fetches revisions in bulk. (The
_all_docs call can fetch a list of revisions, but currently can’t be told to include revision histories.)
- A limited case of the above-mentioned bulk-get optimization is possible with the current API: revisions of generation 1 (revision ID starts with “
1-”) can be fetched in bulk via
_all_docs, because by definition they have no revision histories. Unfortunately
_all_docs can’t include attachment bodies, so if it returns a document whose JSON indicates it has attachments, those will have to be fetched separately. Nonetheless, this optimization can help significantly, and is currently implemented in TouchDB.
These are the CouchDB REST API calls that TouchDB makes to the remote database.
GET /_local/checkpointid — To read the last checkpoint
PUT /_local/checkpointid — To save a new checkpoint
PUT /db — If told to create remote database
/_revs_diff— To find which revs are not known to the remote db
/_bulk_docs— To upload revisions
?new_edits=false— To upload a single doc with attachments
&heartbeat=heartbeat — To find changes since the last pull (feed will be
&revs=true&attachments=true&atts_since=lastrev — To download a single doc with attachments
/_all_docs?include_docs=true— To download first-generation revisions in bulk