Transfer new revisions as deltas #168

Closed
snej opened this issue Oct 24, 2013 · 3 comments

Comments

@snej
Member

commented Oct 24, 2013

The replicator could transfer a new revision as a delta encoding from the latest base revision that's known to both parties. (Ditto for attachments.)

Pros:

  • This would save a significant amount of bandwidth, especially for large documents, and doubly-especially for large attachments.

Cons:

  • The source database needs to keep older revisions around so it can generate deltas from them (or perhaps it can keep just those deltas themselves); this increases storage requirements.
  • Creating deltas takes CPU; I think the expense is roughly comparable to zip-type compression.
  • Canonical JSON encoding becomes crucial, because the delta has to be applied to the exact sequence of bytes it was generated from.
  • Requires defining new REST calls or at least options.
  • The improvement may be drowned out by the overhead of per-revision GET requests, unless we do this as part of the proposed _bulk_get request or a WebSocket-based API.
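
As a rough illustration of the trade-offs listed above, here is a minimal Python sketch. It is not the replicator's code. It assumes a simple canonical form (sorted keys, no optional whitespace), which is weaker than a full canonical-JSON spec, and it stands in for a real delta algorithm by using zlib with the base revision as a preset dictionary. The helper names `canonical_json`, `make_delta` and `apply_delta` are hypothetical.

```python
import json
import zlib

def canonical_json(doc):
    """Serialize with sorted keys and no optional whitespace, so equal docs give equal bytes."""
    return json.dumps(doc, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")

def make_delta(base, target):
    """Compress `target` using `base` as zlib's preset dictionary (a stand-in for a real delta)."""
    # Note: deflate only looks back 32 KB, so this trick only helps for small bases.
    compressor = zlib.compressobj(level=9, zdict=base)
    return compressor.compress(target) + compressor.flush()

def apply_delta(base, delta):
    """Rebuild the target; `base` must be byte-for-byte what make_delta() was given."""
    decompressor = zlib.decompressobj(zdict=base)
    return decompressor.decompress(delta) + decompressor.flush()

# Two revisions of a made-up document that differ in a single field.
rev1 = {"type": "profile", "name": "Ann", "bio": "Likes databases and sync. " * 20, "visits": 1}
rev2 = dict(rev1, visits=2)

base = canonical_json(rev1)
target = canonical_json(rev2)
delta = make_delta(base, target)
print(len(target), "bytes in full,", len(delta), "bytes as a delta")
```

If the receiver re-serialized revision 1 with a different key order or different whitespace, `apply_delta` would fail or produce garbage, which is why the canonical encoding matters.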
@snej

Member Author

commented Oct 24, 2013

This issue of course implies the existence of a corresponding issue in sync_gateway.

Couchbase Lite (and/or the gateway) could also store non-leaf revisions of documents as reverse deltas from the child revisions — this is the same trick most version-control systems use. Basically that means that, when saving a new revision B whose parent is A, you compute the delta from B to A (A-B) and replace A's JSON with the pair {delta, B.revid}. To reconstitute A you fetch or reconstitute B, and then apply the delta to it.

The annoying part is that, if pushing revision B, the delta you need to send is the inverse (B-A). I think some delta algorithms can derive the inverse directly from the delta; otherwise you have to first reconstitute A, then compute B-A and send that. For this reason it might be best to defer compacting A to a delta until after B has been pushed to at least one destination.
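
The reverse-delta storage described above could be sketched like this. It is an illustrative in-memory model only, not Couchbase Lite's storage layer: the `RevStore` class and its methods are hypothetical, and zlib with a preset dictionary again stands in for a real delta algorithm. Since that stand-in cannot invert a delta, `forward_delta` takes the slow path of reconstituting the parent first.

```python
import zlib

def make_delta(base, target):
    compressor = zlib.compressobj(level=9, zdict=base)
    return compressor.compress(target) + compressor.flush()

def apply_delta(base, delta):
    decompressor = zlib.decompressobj(zdict=base)
    return decompressor.decompress(delta) + decompressor.flush()

class RevStore:
    """Keeps leaf revisions in full and parents as reverse deltas from a child."""

    def __init__(self):
        self._full = {}     # rev ID -> full body bytes
        self._reverse = {}  # rev ID -> (delta from child to this rev, child rev ID)

    def save(self, rev_id, body, parent=None):
        self._full[rev_id] = body
        if parent in self._full:
            # Replace the parent's body with the pair {delta, child rev ID}.
            parent_body = self._full.pop(parent)
            self._reverse[parent] = (make_delta(body, parent_body), rev_id)

    def get(self, rev_id):
        if rev_id in self._full:
            return self._full[rev_id]
        # Fetch or reconstitute the child, then apply the reverse delta to it.
        delta, child = self._reverse[rev_id]
        return apply_delta(self.get(child), delta)

    def forward_delta(self, parent, rev_id):
        """Delta to send when pushing rev_id to a peer that already has parent."""
        return make_delta(self.get(parent), self.get(rev_id))

store = RevStore()
body_a = b'{"name":"Ann","visits":1}'
body_b = b'{"name":"Ann","visits":2}'
body_c = b'{"name":"Ann","visits":3}'
store.save("1-a", body_a)
store.save("2-b", body_b, parent="1-a")
store.save("3-c", body_c, parent="2-b")
```

Reading `1-a` now walks the chain `3-c` to `2-b` to `1-a`, so old revisions get slower to read as the chain grows; deferring compaction until after a push, as suggested above, avoids paying that cost on the common path.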

@snej

Member Author

commented Oct 24, 2013

Some delta algorithms:

  • zdelta — cleverly derived from zlib. BSD-type license.
  • xdelta 3 — supposed to be very efficient. GPL-licensed 👎 with commercial licenses available.
  • open-vcdiff — Generates same delta format as xdelta3. Google project. Apache license.
  • JSON Patch format — I think this is kind of silly since it's so verbose, but it might be effective for really big documents?
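
To make the verbosity point about JSON Patch (RFC 6902) concrete, here is a small Python sketch. The `apply_patch` function is a hypothetical, deliberately incomplete applier: it handles only `add`, `replace` and `remove` on object members, and ignores arrays and the other operations in the RFC. The documents are made up.

```python
import copy
import json

def apply_patch(doc, patch):
    """Apply a JSON Patch restricted to add/replace/remove on object members."""
    result = copy.deepcopy(doc)
    for op in patch:
        # Split the JSON Pointer path and undo its ~1 and ~0 escapes (in that order).
        keys = [k.replace("~1", "/").replace("~0", "~")
                for k in op["path"].split("/")[1:]]
        parent = result
        for key in keys[:-1]:
            parent = parent[key]
        if op["op"] in ("add", "replace"):
            parent[keys[-1]] = op["value"]
        elif op["op"] == "remove":
            del parent[keys[-1]]
        else:
            raise ValueError("unsupported op: " + op["op"])
    return result

old_doc = {"type": "profile", "name": "Ann",
           "address": {"city": "Paris", "zip": "75001"}, "nickname": "annie"}
patch = [
    {"op": "replace", "path": "/address/city", "value": "Lyon"},
    {"op": "remove", "path": "/nickname"},
]
new_doc = apply_patch(old_doc, patch)
print(len(json.dumps(patch)), "bytes of patch for a",
      len(json.dumps(new_doc)), "byte document")
```

For a document this small the patch is about as large as the document itself, so a JSON Patch only pays off when the document is much bigger than the change.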

@snej snej added the icebox label Jun 20, 2014

@jessliu jessliu modified the milestones: Future, 1.2.0 Oct 10, 2014

@zgramana zgramana added backlog and removed icebox labels Feb 20, 2015

@snej

Member Author

commented Feb 20, 2015

See also the more-recent internal design doc.

snej added a commit to couchbase/sync_gateway that referenced this issue Feb 23, 2015

Added delta-compression support for getting attachments
* With GET /db/doc, adding "?deltas=true" will allow delta-compression
  of attachments, provided both ?attachments and ?atts_since are
  specified (the latter is needed so that the gateway knows which
  versions of attachments the client already has.)
  Delta-encoded attachments will be indicated by an "encoding" value of
  "zdelta", and an extra "deltaSrc" property whose value is the digest
  of the attachment to use as the delta source.
* The same is true of _bulk_get.
* With GET /db/doc/attachment, adding "?deltas=XXX,YYY,ZZZ", where the
  values are digests of previous versions of the attachment, will allow
  delta compression. The response will have a Content-Encoding "zdelta"
  and an "X-Delta-Source" header whose value is the digest of the source
  attachment.

See https://github.com/couchbaselabs/couchbase-lite-api/wiki/Delta-Compression
For #452; see also couchbase/couchbase-lite-ios#168
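
A client-side view of the request described in this commit message might look like the following Python sketch. The parameter names `deltas`, `attachments` and `atts_since` and the `encoding` / `deltaSrc` properties come from the commit message; everything else is an assumption: the helper names, the host and port, the digests, and the choice to send `atts_since` as a JSON array of revision IDs.

```python
import json
from urllib.parse import quote, urlencode

def doc_url(db_url, doc_id, known_revs):
    """Build a GET /db/doc URL that opts in to delta-compressed attachments."""
    query = urlencode({
        "attachments": "true",
        "atts_since": json.dumps(known_revs),  # assumed format: JSON array of rev IDs
        "deltas": "true",
    })
    return f"{db_url}/{quote(doc_id, safe='')}?{query}"

def delta_attachments(doc):
    """Map attachment name -> digest of its delta source, for zdelta-encoded attachments."""
    return {name: meta["deltaSrc"]
            for name, meta in doc.get("_attachments", {}).items()
            if meta.get("encoding") == "zdelta"}

# Hypothetical host and document, for illustration only.
url = doc_url("http://localhost:4984/db", "doc1", ["1-abc"])
sample_response = {
    "_id": "doc1",
    "_attachments": {
        "photo.jpg": {"encoding": "zdelta", "deltaSrc": "sha1-hypothetical-digest"},
        "notes.txt": {"content_type": "text/plain"},
    },
}
```

Attachments without the `zdelta` encoding are left out of the result, so a caller can treat them as ordinary full bodies.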

snej added a commit that referenced this issue Feb 24, 2015

Delta compression for pulling attachments
Obviously requires an updated compatible Sync Gateway. But if the server
doesn't support this feature, it's a no-op and CBL accepts the non-
compressed revision as usual.
For #168
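
The fallback behavior described in this commit message can be sketched as below. This is an assumption-laden illustration in Python, not the Couchbase Lite code: `decode_attachment` is a hypothetical helper, headers are modeled as a plain dict with exact-case keys, and the zdelta decoder is passed in as `apply_delta` because the Python standard library has no zdelta implementation.

```python
def decode_attachment(headers, body, local_by_digest, apply_delta):
    """Return the full attachment bytes, whether or not the server sent a delta."""
    if headers.get("Content-Encoding") != "zdelta":
        # Server without delta support (or no usable source): body is already complete.
        return body
    # The header names the digest of the attachment version the delta was made from.
    source = local_by_digest[headers["X-Delta-Source"]]
    return apply_delta(source, body)

# Stand-in decoder for demonstration only; a real client would call a zdelta library.
def fake_apply_delta(source, delta):
    return source + delta

local = {"sha1-hypothetical-old": b"hello"}
plain = decode_attachment({}, b"hello world", local, fake_apply_delta)
patched = decode_attachment(
    {"Content-Encoding": "zdelta", "X-Delta-Source": "sha1-hypothetical-old"},
    b" world", local, fake_apply_delta)
```

Because the plain path ignores the delta machinery entirely, a client written this way keeps working against a server that has never heard of the `deltas` parameter.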

snej added a commit to couchbase/sync_gateway that referenced this issue Feb 26, 2015

Added delta-compression support for getting JSON docs
When deltas are enabled in a request, the JSON body may now also be
delta-compressed. The source of the delta will be the first available
revision listed in the `atts_since` parameter. This revision ID will
appear in the `X-Delta-Source` header.

See https://github.com/couchbaselabs/couchbase-lite-api/wiki/Delta-Compression
For #452; see also couchbase/couchbase-lite-ios#168
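
One reading of "the first available revision listed in the `atts_since` parameter" is sketched below in Python. It is a guess at the selection rule, not the Sync Gateway implementation: `pick_delta_source` and `respond` are hypothetical names, revisions are modeled as a dict of bodies the server still holds, and the delta encoder is passed in as `make_delta`.

```python
def pick_delta_source(atts_since, available_revs):
    """Return the first revision ID in atts_since that the server still has, else None."""
    for rev_id in atts_since:
        if rev_id in available_revs:
            return rev_id
    return None

def respond(body_by_rev, target_rev, atts_since, make_delta):
    """Return (headers, body): a delta when a source exists, else the full body."""
    source = pick_delta_source(atts_since, body_by_rev)
    if source is None:
        return {}, body_by_rev[target_rev]
    delta = make_delta(body_by_rev[source], body_by_rev[target_rev])
    return {"X-Delta-Source": source}, delta

# Stand-in encoder for demonstration only: it just tags the target with a prefix.
def fake_make_delta(source, target):
    return b"delta:" + target

bodies = {"2-b": b'{"n":2}', "3-c": b'{"n":3}'}
with_source = respond(bodies, "3-c", ["1-a", "2-b"], fake_make_delta)
without_source = respond(bodies, "3-c", ["1-a"], fake_make_delta)
```

Here `1-a` has been compacted away, so the first request falls through to `2-b`, and the second gets the full body with no `X-Delta-Source` header.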

@snej snej added in progress and removed backlog labels Mar 10, 2015

@zgramana zgramana modified the milestones: 1.2, Future May 1, 2015

@pasin pasin closed this May 22, 2015

@pasin pasin removed the in progress label May 22, 2015

@zgramana zgramana reopened this May 22, 2015

@zgramana zgramana added the backlog label May 22, 2015

@snej snej modified the milestones: 1.3, 1.2 Oct 16, 2015

@snej snej added icebox and removed backlog labels Oct 16, 2015

@snej snej modified the milestones: 1.3, 1.4 May 26, 2016

@pasin pasin removed this from the 1.4.0 milestone Dec 19, 2016

@djpongh djpongh added this to the 2.0.0 milestone Jun 5, 2017

@djpongh djpongh modified the milestones: 2.0.0, 2.1.0 Nov 30, 2017

@djpongh djpongh added P2: medium and removed P1: high labels Feb 7, 2018

@djpongh djpongh removed the P2: medium label Apr 3, 2018

@djpongh djpongh modified the milestones: 2.1.0, 2.2.0 Apr 3, 2018

@djpongh djpongh modified the milestones: 2.5.0, Iridium Sep 19, 2018

@djpongh djpongh added P1: high backlog and removed icebox labels Nov 1, 2018

@pasin pasin closed this Nov 30, 2018

@pasin pasin removed the backlog label Nov 30, 2018
