
Multi Channel Replication

Chris Anderson edited this page Jul 30, 2013 · 2 revisions

Multi-Channel Replication

The Sync Gateway supports a superset of the CouchDB replication protocol, extended to support channels. Channels are like tags applied to documents. They have several purposes:

  • Subsetting a database, to make it convenient and efficient for a client to sync with only the data it needs. (For example, channels like "sales", "marketing", "2012" could be used to identify documents relevant to certain roles in an organization.)
  • Access control, preventing clients from seeing data they shouldn't. (For example, a "zoe" channel can be used to mark documents visible to user Zoe. Her client won't be able to replicate or view documents not visible to her.)
  • Automatically configuring what data clients receive based on the client's settings. (For example, if documents are tagged with month and year channels, a client can adjust its subscriptions based on what years the user selects in the UI.)

Data Model

Channels

A channel is merely a string. For the sake of sanity, the character set is somewhat restricted. There are no objects or entities that directly represent channels.

Documents

Every revision of a document has a set of channels. These are not part of the document body; they are stored as metadata that clients do not see.

Channels are assigned by an application-defined sync function. This JavaScript function is called every time a new revision is added to the database. It uses the document contents to decide what channels the document should be in.
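As a rough illustration, a sync function might look like the sketch below. It assumes the gateway hands the function a `channel()` call for assigning channels; here that call is a local stub passed in as a parameter so the example runs on its own, and the `department` and `date` properties are hypothetical.

```javascript
// Stand-in for the gateway runtime: collects the channels a sync function assigns.
// channel() is a stub for this sketch, not the gateway's own implementation.
function runSync(syncFn, doc, oldDoc) {
  const assigned = new Set();
  const channel = (...names) => {
    // Accept channel("a"), channel("a", "b") or channel(["a", "b"]).
    for (const n of names.flat()) {
      if (typeof n === "string") assigned.add(n);
    }
  };
  syncFn(doc, oldDoc, channel);
  return [...assigned].sort();
}

// Hypothetical application rule: tag each document with its department and year.
function exampleSync(doc, oldDoc, channel) {
  if (doc.department) channel(doc.department);
  if (doc.date) channel(doc.date.slice(0, 4));
}

const assignedChannels = runSync(
  exampleSync,
  { department: "sales", date: "2012-06-01" },
  null
);
console.log(assignedChannels);
```

Because the function is re-run for every new revision, a document's channels follow its contents: changing `department` in a later revision would move the document to a different channel.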

Users

Every user account has a set of channels it's allowed to access. Any attempt to read a document that isn't tagged with one of the user's channels will fail with a 401 Unauthorized status. Replication is implicitly filtered to the user's accessible channels, as is _all_docs.

A user has a document containing its properties; this document is only available via the REST API on the admin port, not on the regular port.

A user's set of channels is derived from several sources:

  1. The user document has an explicit list of channels.
  2. A user inherits the channels available to each of its roles (see Roles, below).
  3. A sync function can grant a user access to channels.

To grant access, the sync function calls a special access(users, channels) function, which grants the given user or users access to the given channels. Documents can thereby define their own access control: for example, a document representing a chat could have a members property that lists who's allowed to view that chat.
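The chat example could be sketched as follows. The `access()` call is stubbed locally and passed in as a parameter so the snippet is self-contained; the `type` and `id` properties and the `"chat-" + id` channel naming are assumptions made for illustration.

```javascript
// Stub runtime: records the grants made through access(users, channels).
function collectGrants(syncFn, doc) {
  const grants = {}; // user name -> Set of channel names
  const toList = (v) => (Array.isArray(v) ? v : [v]);
  const access = (users, channels) => {
    for (const u of toList(users)) {
      grants[u] = grants[u] || new Set();
      for (const c of toList(channels)) grants[u].add(c);
    }
  };
  syncFn(doc, access);
  // Convert the Sets to sorted arrays for easy inspection.
  return Object.fromEntries(
    Object.entries(grants).map(([u, set]) => [u, [...set].sort()])
  );
}

// Hypothetical rule: every member of a chat may read that chat's channel.
function chatSync(doc, access) {
  if (doc.type === "chat" && Array.isArray(doc.members)) {
    access(doc.members, "chat-" + doc.id);
  }
}

const chatGrants = collectGrants(chatSync, {
  type: "chat",
  id: "lunch",
  members: ["zoe", "alice"],
});
console.log(chatGrants);
```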

Roles

A role is an entity like a Unix group. Like a user, a role is a document and has a name and access to a set of channels. Users may belong to zero or more roles, and inherit the channel access of their roles.

Like a user, a role has an explicit list of channels, and can be granted access by a sync function. Roles cannot inherit from other roles.
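Putting the three sources together, a user's effective channel set is a union. The sketch below assumes a simplified in-memory shape for users and roles (`explicit`, `granted`, `roles`); these property names are hypothetical and not the gateway's actual document schema.

```javascript
// Compute the channels a user can access: its own explicit and granted
// channels, plus those of each role it belongs to. Roles are looked up one
// level deep only, since roles cannot inherit from other roles.
function effectiveChannels(user, roles) {
  const result = new Set([...(user.explicit || []), ...(user.granted || [])]);
  for (const roleName of user.roles || []) {
    const role = roles[roleName];
    if (!role) continue; // unknown role names contribute nothing
    for (const c of [...(role.explicit || []), ...(role.granted || [])]) {
      result.add(c);
    }
  }
  return [...result].sort();
}

const exampleRoles = { sales: { explicit: ["sales"], granted: ["2012"] } };
const zoeUser = { explicit: ["zoe"], granted: ["chat-lunch"], roles: ["sales"] };
const zoeChannels = effectiveChannels(zoeUser, exampleRoles);
console.log(zoeChannels);
```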

Implementation

Filtering By Channels

When a client reads the _changes feed, the set of documents returned is filtered by channels. The set of channels accessible to the user is always taken into account. The client can also pass a list of channels as a query parameter, which will be intersected with the set of accessible channels to further limit the output.
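The intersection step can be sketched like this. The name and wire format of the query parameter are not specified here, so the requested channels are simply modeled as an array that may be absent.

```javascript
// Decide which channels a _changes request may read.
// accessible: channels the user is allowed to see.
// requested:  channels named by the client, or undefined/empty if none were given.
function channelsForFeed(accessible, requested) {
  if (!requested || requested.length === 0) return [...accessible];
  const allowed = new Set(accessible);
  // Channels the user cannot access are dropped from the request.
  return requested.filter((c) => allowed.has(c));
}

const feedChannels = channelsForFeed(
  ["sales", "2012", "zoe"],
  ["2012", "marketing"]
);
console.log(feedChannels);
```

Note that asking for an inaccessible channel ("marketing" above) narrows the result rather than widening it; a request can only reduce what the user would otherwise see.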

The Sync Gateway filters efficiently. It creates a special Couchbase view that tracks which documents are in which channels. The keys in this view are of the form [channel, sequence] where sequence is the sequence number of the document's revision. It's therefore efficient to use a query to retrieve all the documents in a given channel starting from a certain sequence, which is exactly what the _changes feed needs.
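To make the key layout concrete, here is an in-memory stand-in for such a view. The rows and document IDs are invented, and the lookup treats the starting sequence as exclusive, which is an assumption. A real view would answer this with a key-range query over the sorted index rather than a linear scan.

```javascript
// In-memory stand-in for the view: rows sorted by [channel, sequence].
const viewRows = [
  { key: ["marketing", 2], id: "doc-b" },
  { key: ["marketing", 5], id: "doc-d" },
  { key: ["sales", 1], id: "doc-a" },
  { key: ["sales", 2], id: "doc-b" },
  { key: ["sales", 4], id: "doc-c" },
];

// Return the rows of one channel whose sequence is greater than `since`.
// Because the rows are sorted by [channel, sequence], the matches form one
// contiguous run, which is what makes a range query on a real index cheap.
function queryChannel(rows, channel, since) {
  return rows
    .filter(({ key: [ch, seq] }) => ch === channel && seq > since)
    .map(({ key: [, seq], id }) => ({ seq, id }));
}

const salesSince1 = queryChannel(viewRows, "sales", 1);
console.log(salesSince1);
```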

The feed actually needs to look at multiple channels. It does this by running one query per channel and folding the results together so that no document appears more than once. The folding can be done by walking through the query results in parallel: since each is ordered by sequence, documents will appear in the same order in each list, so it's easy to match and remove duplicates. (This also allows the feed's output to list the subset of channels each returned document belongs to.)
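One way to picture the folding is a merge over the per-channel lists, as in the sketch below. It assumes each list is sorted by ascending sequence and that a sequence number identifies exactly one document revision; the data shapes are hypothetical and this is not the gateway's actual code.

```javascript
// Merge per-channel query results into a single feed ordered by sequence.
// feeds: { channelName: [{ seq, id }, ...] }, each list sorted by seq.
function mergeChannelFeeds(feeds) {
  const names = Object.keys(feeds);
  const cursors = Object.fromEntries(names.map((n) => [n, 0]));
  const merged = [];
  for (;;) {
    // Find the lowest sequence among the current heads of all lists.
    let next = Infinity;
    for (const n of names) {
      const entry = feeds[n][cursors[n]];
      if (entry && entry.seq < next) next = entry.seq;
    }
    if (next === Infinity) break; // every list is exhausted
    // Consume that sequence from every list that has it, so the document
    // is emitted once, annotated with the channels it was found in.
    const out = { seq: next, id: null, channels: [] };
    for (const n of names) {
      const entry = feeds[n][cursors[n]];
      if (entry && entry.seq === next) {
        out.id = entry.id;
        out.channels.push(n);
        cursors[n]++;
      }
    }
    merged.push(out);
  }
  return merged;
}

const mergedFeed = mergeChannelFeeds({
  sales: [{ seq: 1, id: "doc-a" }, { seq: 2, id: "doc-b" }, { seq: 4, id: "doc-c" }],
  marketing: [{ seq: 2, id: "doc-b" }, { seq: 5, id: "doc-d" }],
});
console.log(mergedFeed);
```

Here "doc-b" belongs to both channels but appears once in the merged output, with both channel names attached.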

Access Control

Users and roles can be granted access to channels by sync functions. This means the accessible channels can change dynamically as documents are updated. This is done as follows:

  • Every document revision's metadata has a property that lists the access it grants. Every call to access() by the sync function adds to this list.
  • Every user/role document has an all_channels property that caches the complete set of channels it's allowed to access. This is normally a JSON array but can be invalidated by setting it to nil.
  • When a document is updated and either the previous or the current revision grants channel access, every user and role named in either revision's access list has its all_channels property cleared to nil to invalidate it.
  • When a user's or role's channels are looked up, but all_channels is nil, it's regenerated by querying a Couchbase view that's based on that document access property. The keys of this view are user/role names, and the values are the lists of channels granted. So one query of this view finds all of the granted channels. The results are saved back to the all_channels property.
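The invalidate-and-regenerate cycle above can be sketched with an in-memory model. Everything here is a simplification under stated assumptions: `null` stands in for nil, a plain object scan stands in for the Couchbase view query, and only sync-function grants are tracked (explicit channels and role inheritance are left out).

```javascript
class AccessCache {
  constructor() {
    this.principals = {}; // user/role name -> { all_channels: string[] | null }
    this.docs = {};       // doc ID -> access list of the current revision
  }

  addPrincipal(name) {
    this.principals[name] = { all_channels: null };
  }

  // Store a new revision's access list ({ name: [channels] }) and invalidate
  // every principal named by either the previous or the new revision.
  putDoc(docId, access) {
    const previous = this.docs[docId] || {};
    this.docs[docId] = access;
    const affected = new Set([...Object.keys(previous), ...Object.keys(access)]);
    for (const name of affected) {
      if (this.principals[name]) this.principals[name].all_channels = null;
    }
  }

  // Return the cached channel list, regenerating it if it was invalidated.
  channelsFor(name) {
    const p = this.principals[name];
    if (p.all_channels === null) {
      const found = new Set();
      // Stand-in for the single view query keyed by user/role name.
      for (const access of Object.values(this.docs)) {
        for (const c of access[name] || []) found.add(c);
      }
      p.all_channels = [...found].sort(); // save the result back to the cache
    }
    return p.all_channels;
  }
}

const cache = new AccessCache();
cache.addPrincipal("zoe");
cache.putDoc("chat1", { zoe: ["chat-lunch"] });
console.log(cache.channelsFor("zoe"));
cache.putDoc("chat1", {}); // new revision removes zoe from the chat
console.log(cache.channelsFor("zoe"));
```

Invalidating on both the old and new access lists matters for revocation: in the last step the new revision no longer names "zoe", so only the previous revision's list identifies her cache as stale.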