
by Jens Alfke

Introduction

TouchDB’s replication protocol is compatible with CouchDB. This interoperability is an important feature, but implementing it was challenging because much of CouchDB’s replication protocol is undocumented. In the future I would like to see an explicit spec for replication, to ensure that different products remain compatible. For now I’ll document it here, as I understand it.

Note: If you want to follow along, this algorithm is implemented in TouchDB’s TDReplicator and its subclasses TDPusher and TDPuller (plus a number of helper classes).

These notes were derived from reading the API documentation on the CouchDB wiki and from conversation with engineers who’ve worked on CouchDB’s replicator (Damien Katz and Filipe Manana). But don’t take them as gospel.

Protocol? What Protocol?

There really isn’t a separate “protocol” per se for replication. Instead, replication uses CouchDB’s REST API and data model. It’s therefore a bit difficult to talk about replication independently of the rest of CouchDB. In this document I’ll focus on the algorithm used, and link to documentation of the APIs it invokes. The “protocol” is simply the set of those APIs operating over HTTP.

Algorithm

Goal

Given a source and a target database, identify all current revisions (including deletions) in the source that do not exist in the target, and copy them (with contents, attachments and histories) to the target. Afterwards, all current revisions in the source exist at the target and have the same revision histories there.

Secondary goal: Do this without redundantly transferring the contents of any revisions that already exist at the target.

Note: A current revision is one that has not been replaced, i.e. a leaf node in the revision tree. Most of the time a document has only one current revision, but multiple current revisions can exist and that’s called a conflict.

Steps

  1. Get a unique identifier from the source database (which may just be its URL).
  2. Use this identifier to generate the doc ID of a special (_local, non-replicated) document of the target database, to look up a stored value: the last source sequence ID (also called a “checkpoint”) that was read and processed by the previous replication. (It’s OK if this value is missing for some reason; it’s just an optimization.)
  3. Fetch the source database’s _changes feed, starting just past the last source sequence ID (if any). Use the “?style=all_docs” URL parameter so that conflicting revisions will be included. In continuous replication you should use the “?feed=longpoll” or “?feed=continuous” mode and leave the algorithm running indefinitely to process changes as they occur. Filtered replication will specify the name of a filter function in this URL request.
  4. Collect a group of document/revision ID pairs from the _changes feed and send them to the target database’s _revs_diff. The result will contain the subset of those revisions that are not in the target database.
  5. GET each such revision from the source database. Use the ?revs=true URL parameter to include its list of parent revisions, so the target database can update its revision tree. Use “?attachments=true” so the revision data will include attachment bodies. Also use the “?atts_since” query parameter to pass a list of revisions that the target already has, so the source can optimize by not including the bodies of attachments already known to the target.
  6. Collect a group of revisions fetched by the previous step, and store them into the target database using the _bulk_docs API, with the new_edits:false JSON property to preserve their revision IDs.
  7. After a group of revisions is stored, save a checkpoint: update the last source sequence ID value in the target database. It should be the latest sequence ID whose revision, and all revisions before it, have been added to the target. (Even if some revisions are rejected by a target validation handler, they still count as ‘added’ for this purpose.) A sketch of these steps as code follows this list.

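To make the control flow concrete, here is a minimal one-shot sketch of these steps in Python, using the third-party requests library. It is an illustration, not TouchDB’s implementation: the checkpoint-ID hashing scheme is an invented placeholder, and error handling, batching, continuous mode, filters, the atts_since optimization and multipart attachment uploads are all omitted.

```python
import hashlib

import requests


def replicate(source, target):
    """One replication pass from source to target (both database URLs,
    e.g. "http://localhost:5984/db")."""
    # Steps 1-2: derive a stable checkpoint doc ID and read the saved
    # checkpoint from the target. (This hashing scheme is invented; any
    # scheme that is stable across runs will do.)
    checkpoint_id = hashlib.md5(f"{source}->{target}".encode()).hexdigest()
    checkpoint_url = f"{target}/_local/{checkpoint_id}"
    r = requests.get(checkpoint_url)
    checkpoint = r.json() if r.status_code == 200 else {}
    since = checkpoint.get("lastSequence", 0)

    # Step 3: read the _changes feed, including conflicting revisions.
    changes = requests.get(f"{source}/_changes",
                           params={"style": "all_docs", "since": since}).json()

    # Step 4: ask the target which of those revisions it lacks.
    revs = {row["id"]: [c["rev"] for c in row["changes"]]
            for row in changes["results"]}
    missing = requests.post(f"{target}/_revs_diff", json=revs).json()

    # Step 5: fetch each missing revision, with its revision history and
    # inline (base64) attachment bodies.
    docs = []
    for doc_id, info in missing.items():
        for rev in info["missing"]:
            docs.append(requests.get(
                f"{source}/{doc_id}",
                params={"rev": rev, "revs": "true", "attachments": "true"},
                headers={"Accept": "application/json"}).json())

    # Step 6: store the revisions in the target, preserving revision IDs.
    requests.post(f"{target}/_bulk_docs",
                  json={"docs": docs, "new_edits": False})

    # Step 7: save the checkpoint. The _local doc keeps the _rev it got
    # from the earlier GET, which the PUT must include to update it.
    checkpoint["lastSequence"] = changes["last_seq"]
    requests.put(checkpoint_url, json=checkpoint)
```
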
There’s also a ladder diagram that shows these steps along with the interaction between the replicator and the source and target databases.

Notes

— The replication algorithm does not have to run on either the source’s or target’s server. It could be run from anywhere with read access to the source and write access to the target. However, it’s nearly always run by either the source or target server (and TouchDB only supports those modes). Replication run by the source is commonly called a “push”, and by the target is called a “pull”. An implementation run by the source or target server may optimize by using lower-level APIs to operate on the local database; for example, it might listen for internal change notifications rather than reading the _changes feed.

— Replication does not transfer obsolete revisions of documents, only the current ones. This derives from the behavior of the _changes feed, which only lists current revisions. Replication does transfer the revision history of each document, which is just the list of IDs of prior revisions; this is to make it possible for the database to identify common ancestors and merge revision histories into a tree.
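
For illustration, this is the shape of a document fetched with ?revs=true, written as a Python literal with invented IDs. The _revisions property holds the generation number of the current revision and the revision-ID suffixes from newest to oldest:

```python
doc = {
    "_id": "mydoc",
    "_rev": "3-917fa23",
    "title": "Example",
    "_revisions": {                                # revision history
        "start": 3,                                # generation of "_rev"
        "ids": ["917fa23", "a5be949", "967a00d"],  # suffixes, newest first
    },
}
```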

— Sequence IDs are usually but not necessarily numeric. (Currently the only exception I know of is BigCouch.) Non-numeric sequence IDs are not intrinsically ordered, i.e. they are opaque strings that can only be compared for equality. To compare their ordering (when checkpointing) you have to keep an ordered list of sequence IDs as they appeared in the _changes feed and compare their indices in that list.
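
A minimal sketch of that bookkeeping, assuming sequence IDs are opaque strings:

```python
# Remember the position at which the _changes feed produced each
# sequence ID, then compare positions instead of values.
feed_order = {}  # sequence ID -> index of its row in the feed

def record(seq):
    feed_order.setdefault(seq, len(feed_order))

def comes_before(a, b):
    """True if sequence a appeared earlier in the _changes feed than b."""
    return feed_order[a] < feed_order[b]
```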

Performance

— For efficiency, the algorithm should run in parallel, as a data-flow system, with multiple steps active at the same time. This reduces the overhead of network and database latency.

— Also for efficiency, the number of revisions passed in a single _revs_diff or _bulk_docs call should be large. This means the implementation should group together revisions arriving from previous steps until a sufficient number have arrived or sufficient time has elapsed.
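
One way to express that grouping is a small buffer that flushes on either a count or a time threshold. This is a sketch; the names and thresholds are illustrative, not values TouchDB uses:

```python
import time

class Batcher:
    """Buffers items and flushes when enough have accumulated or the
    oldest buffered item has waited too long. (For simplicity the time
    check runs only when an item arrives; a real implementation would
    also arm a timer.)"""

    def __init__(self, flush_fn, max_items=100, max_latency=0.5):
        self.flush_fn = flush_fn        # e.g. one _revs_diff or _bulk_docs call
        self.max_items = max_items
        self.max_latency = max_latency  # seconds
        self.items = []
        self.first_at = None

    def add(self, item):
        if not self.items:
            self.first_at = time.monotonic()
        self.items.append(item)
        if (len(self.items) >= self.max_items
                or time.monotonic() - self.first_at >= self.max_latency):
            self.flush()

    def flush(self):
        if self.items:
            self.flush_fn(self.items)
            self.items, self.first_at = [], None
```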

— From my limited testing, the performance bottleneck in the current algorithm seems to be in fetching the new revisions from the source. I think this is due to the overhead of handling many separate HTTP requests. It should be possible to speed up replication by introducing a new API call that fetches revisions in bulk. (The _all_docs call can fetch a list of revisions, but currently can’t be told to include revision histories.)

— A limited case of the above-mentioned bulk-get optimization is possible with the current API: revisions of generation 1 (revision ID starts with “1-”) can be fetched in bulk via _all_docs, because by definition they have no revision histories. Unfortunately _all_docs can’t include attachment bodies, so if it returns a document whose JSON indicates it has attachments, those will have to be fetched separately. Nonetheless, this optimization can help significantly, and is currently implemented in TouchDB.
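
A sketch of that shortcut, again using the requests library; fetch_first_generation is a hypothetical helper, not a TouchDB API:

```python
import requests

def fetch_first_generation(source, doc_ids):
    """Bulk-fetch current revisions of first-generation docs via _all_docs."""
    rows = requests.post(f"{source}/_all_docs",
                         params={"include_docs": "true"},
                         json={"keys": doc_ids}).json()["rows"]
    ready, needs_single_get = [], []
    for row in rows:
        doc = row.get("doc")
        if doc is None or not doc["_rev"].startswith("1-"):
            continue  # deleted/missing, or updated since _changes was read
        if "_attachments" in doc:
            needs_single_get.append(doc)  # _all_docs can't include bodies
        else:
            ready.append(doc)
    return ready, needs_single_get
```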

API Calls Used

These are the CouchDB REST API calls that TouchDB makes to the remote database.

Either Push or Pull:

  • GET /db/_local/checkpointid — To read the last checkpoint
  • PUT /db/_local/checkpointid — To save a new checkpoint

Push Only:

  • PUT /db — If told to create the remote database
  • POST /db/_revs_diff — To find which revs are not known to the remote db (example exchange below)
  • POST /db/_bulk_docs — To upload revisions
  • PUT /db/docid?new_edits=false — To upload a single doc with attachments
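
For illustration, a _revs_diff exchange looks like this, written as Python literals with invented IDs:

```python
# Request body for POST /db/_revs_diff:
request_body = {"mydoc": ["1-967a00d", "2-7051cbe"]}

# Response: only the revisions the target lacks, plus any ancestors it
# already has (useful as the atts_since list when fetching):
response_body = {"mydoc": {"missing": ["2-7051cbe"],
                           "possible_ancestors": ["1-967a00d"]}}
```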

Pull Only:

  • GET /db/_changes?style=all_docs&feed=feed&since=since&limit=limit&heartbeat=heartbeat — To find changes since the last pull (feed will be normal or longpoll; example response below)
  • GET /db/docid?rev=revid&revs=true&attachments=true&atts_since=lastrev — To download a single doc with attachments
  • POST /db/_all_docs?include_docs=true — To download first-generation revisions in bulk
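
For illustration, a _changes response with style=all_docs looks roughly like this, written as a Python literal with invented IDs:

```python
changes = {
    "results": [
        # "mydoc" is in conflict: style=all_docs lists both leaf revisions.
        {"seq": 12, "id": "mydoc",
         "changes": [{"rev": "3-917fa23"}, {"rev": "3-0e871ef"}]},
        # Deletions appear in the feed too, flagged with "deleted".
        {"seq": 13, "id": "otherdoc", "deleted": True,
         "changes": [{"rev": "2-eec205a"}]},
    ],
    "last_seq": 13,
}
```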