
Incremental mode #11

Open
xpqz opened this issue Apr 27, 2017 · 17 comments
@xpqz
Contributor

xpqz commented Apr 27, 2017

Incremental backup mode: retain the seq id from the last completed run, allow the backup tool to start from there.

Incremental restore mode: (this may already be possible) restore from a set of incremental backups.

@ricellis ricellis added this to the Later milestone Apr 27, 2017
@mikerhodes
Member

I think the core design concern here is safety -- could an incremental backup file list its "parent" file, for example via something like a GUID?

The restore tool could then construct a restore path from a complete backup via intermediary incremental backups, and refuse if that chain wasn't complete.

Other sanity checks are possible such as:

  • Store a completion seq in a backup file's header/footer. The backup tool can then resume from a previous backup file, rather than requiring the user to sort out the seq values.
  • Store a start seq in a backup file's header/footer. The restore tool can check that the chain of completion seq to start seq is complete for a given restore chain of files.
  • We should encourage regular complete backups. As the incremental chain gets longer, its fidelity will decrease (as backup files may get lost) and restore time will increase.

A --force flag could override this so that a partial restore is possible in case a backup file was lost.
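The chain check described above could be sketched roughly as follows. This is a hypothetical `verifyChain` helper; the `guid`/`parentGuid`/`startSeq`/`completionSeq` metadata fields are assumptions from this discussion, not part of couchbackup today:

```javascript
// Sketch: verify that a list of backup metadata records forms an unbroken
// restore chain, from a full backup through each incremental.
// Each record is assumed to carry { guid, parentGuid, startSeq, completionSeq }.
function verifyChain(backups) {
  if (backups.length === 0 || backups[0].parentGuid !== null) {
    return false; // chain must start with a full backup (no parent)
  }
  for (let i = 1; i < backups.length; i++) {
    const prev = backups[i - 1];
    const curr = backups[i];
    // each incremental must name its parent and start where the parent left off
    if (curr.parentGuid !== prev.guid || curr.startSeq !== prev.completionSeq) {
      return false;
    }
  }
  return true;
}

const chain = [
  { guid: 'a', parentGuid: null, startSeq: '0', completionSeq: '100-abc' },
  { guid: 'b', parentGuid: 'a', startSeq: '100-abc', completionSeq: '150-def' }
];
console.log(verifyChain(chain)); // true
```

A restore tool could run a check like this before starting, and refuse (absent --force) when it returns false.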

@ricellis ricellis moved this from New to Icebox in couchbackup Triage Sep 21, 2017
@pulkitanchalia

Is this feature coming in the near future?

@jareware

Is this not what --resume true is for?

@ricellis
Member

No, --resume is different; see #218 (comment)

@wmbutler
Contributor

wmbutler commented Jul 2, 2018

Any word on planned support for incremental backups?

@ricellis
Member

ricellis commented Jul 3, 2018

Not at this time, no.

@wmbutler
Contributor

We have some databases that are taking 8-12 hours to back up. Can we get a concerted effort to look at this?

@baversjo

baversjo commented Oct 17, 2018

We're streaming backups up to S3. Would it be possible to implement a solution that doesn't require downloading the entire backup file from S3 to start a new (incremental) backup? Maybe there could be an "index" file that records all the incremental backups and which backup is the last full backup (the base for all increments). Also, maybe the backup system could store at most 14 incremental backups and force a full backup after that?
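The "force a full backup after N increments" part of this suggestion could be sketched as below. All field names (`lastFull`, `incrementals`) and the index format are hypothetical, not an existing couchbackup format:

```javascript
// Sketch: decide whether the next run should be a full or incremental
// backup, given an index object listing backups since the last full one.
const MAX_INCREMENTALS = 14; // force a full backup after this many increments

function nextBackupType(index) {
  // index: { lastFull: string|null, incrementals: string[] }
  if (!index.lastFull) return 'full';                           // never backed up fully
  if (index.incrementals.length >= MAX_INCREMENTALS) return 'full';
  return 'incremental';
}

console.log(nextBackupType({ lastFull: null, incrementals: [] }));            // 'full'
console.log(nextBackupType({ lastFull: '2018-10-01', incrementals: ['a'] })); // 'incremental'
```

Because the index file is tiny, only it (not the backup data itself) would need to be fetched from S3 before each run.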

@wmbutler
Contributor

wmbutler commented Dec 2, 2018

I've been playing with the /_changes endpoint. Seems to me we could fork this repo and start experimenting with the idea of adding a last-event-id flag to pass through. Only two things are really needed:

  1. The backup must record the seq from the /_changes endpoint's response
  2. We need to store the seq so we know where to pick up from

It might make sense to store this in a text file alongside the backup. The file could be called last-event-id.txt and contain the seq value.

{
    "results": [
        {
            "seq": "6-g1AAAAMteJyl0TEOwiAUgGG0JuopdHNrSii2nfQm-h4PU03FxFRXvYneRG-iN6kgi91qu0BCwvcHXsEYG-UBsRmhOhz1kpCHQGdtylxDUeZRxENVHE4EpgyNLgt7oQ8MJ1VV7fIA2N4eDBWpVPCY2PhkSG-2RtPfKE7tiou6CxKyjHdzl85d1dxUCJJI3dy1cy81N5GY2G_o5JqBXdnVbpa-ObvnbZGkBNBca1m--_LDlfvfcjxHrQRvWxZNy09ffrly4MsbOZepbK61fPPbl38mGSFEGdUmufsAPT4DhQ",
            "id": "a0234f3e12be9d3faec1510e356a2257",
            "changes": [
                {
                    "rev": "1-42fd8ac8d7af38c45be37171b344f6a5"
                }
            ],
            "doc": {
                "_id": "a0234f3e12be9d3faec1510e356a2257",
                "_rev": "1-42fd8ac8d7af38c45be37171b344f6a5",
                "first": "frank"
            }
        }
    ],
    "last_seq": "6-g1AAAAMteJyl0TEOwiAUgGG0JuopdHNrSii2nfQm-h4PU03FxFRXvYneRG-iN6kgi91qu0BCwvcHXsEYG-UBsRmhOhz1kpCHQGdtylxDUeZRxENVHE4EpgyNLgt7oQ8MJ1VV7fIA2N4eDBWpVPCY2PhkSG-2RtPfKE7tiou6CxKyjHdzl85d1dxUCJJI3dy1cy81N5GY2G_o5JqBXdnVbpa-ObvnbZGkBNBca1m--_LDlfvfcjxHrQRvWxZNy09ffrly4MsbOZepbK61fPPbl38mGSFEGdUmufsAPT4DhQ",
    "pending": 0
}

@wmbutler
Contributor

wmbutler commented Dec 2, 2018

> We're streaming backups up to S3. Would it be possible to implement a solution that doesn't require download of the entire backup file from S3, to start a new (incremental) backup? Maybe there could be an "index" file that has information about all incremental backups and what backup is the last full backup (base for all increments). Also, maybe the backup system will max store 14 incremental backups and force a full backup after that?

I think an incremental would be assumed if the user didn't pass in the last-event-id flag discussed above.

@wmbutler
Contributor

wmbutler commented Dec 2, 2018

Looks like you are already capturing the lastSeq value upon successful backup.

https://github.com/cloudant/couchbackup/blob/e5517bad8559e44e72d5a0a43b1ad9df064fcf77/includes/spoolchanges.js#L90

Seems to me each database could have a manifest for managing incremental backups. Each line of the file could just be the latest successful lastSeq.

-last-event-id.txt

1-g1AAAAKJeJyl0UEOgjAQBdBRTNRT6M4doUECrOQmOu3UIKklMcWt3kRvojfRm2CbrtgJbKZJk_9mplUAsCgDgg1xUV9kQZyFSFepTSlRmTKKWChU3RBqE2pplA1MEfiqbduqDBDO9mIuSGQx2xIsG03yeNKSeqN8bSvfdV1MMM_ZOLdw7t65CP3TB5e-daZKE57aZUdNpWe2wt0eln44e-LtOM0I8X9tYOen7_wa8iZeeHvhM1z4esH9S_UDtxTSKQ
3-g1AAAALbeJy10EEOgjAQBdBRTNRT6M4doQFCWclNdNqpQYIlMcWt3kRvojfRm2ArK3YKcTNNJun7P1MCwCz3CFYkZHVUGQnmI52UNrnC0uRBwHxZVjWhNr5WprQfxghi0TRNkXsIB7uYSpI8ZBHBvNakdnut6GdULO0U666LMaYpG-Zmzt10XB6GFAsa5m6de-64SSwSe4ZBrp7YCRf7WPrq7FFrhwknxO-1nsm3NvleIPQUHq3wdN3Hn-4UIeco_9791SY3tnvxBvS-6xc
5-g1AAAAMTeJy10EEOgjAQQNERTNRT6M4doSlIWclNdNqpQYIlMeBWb6I30ZvoTbDIwrBTiZtpMknfTyYHgHHqEsxJqmKvE5LMQzpoU6Ya8zL1feapvKgITekZXeb2g4Mgp3VdZ6mLsLOLkSIlOAsIJpUhvdkaTV-jcmanXHZdDDGOWT83adxVxxWcUyipn7tu3GPHjUIZ2TP0cs3QTjjZx9Lnxh60No8EIX6u_Vi-tOVrU3Ze5WAhteLs7-VbW76_yxSgEKj-Xn605TpDyJ6HkfzV

With this in place, the script could look in the same location where the target file was to be written for -last-event-id.txt, strip off the last line, and use that as the value for last-event-id

@wmbutler
Contributor

wmbutler commented Dec 3, 2018

I spent the weekend working on this. Instructions are in the Readme. Anyone interested in further collaboration would be welcome. I've tested it and it works. It keeps mostly with the spirit of existing functionality but the code might be a little sloppy in places. Hoping the Cloudant team will get a developer to review and fine tune things.

As it stands, it creates a new log with _0, _1, _2 appended for each occurrence where there is a revision. This means that end users can set the recurrence interval in crontab to whatever they like: 1 hr, 6 hrs, 1 day etc.

It's not an npm package, but I included instructions in the readme for forking my repo and installing it.

https://github.com/wmbutler/couchbackup

@emlaver
Contributor

emlaver commented Dec 3, 2018

@wmbutler Please open a PR and follow our contributing guidelines (e.g. added tests for code changes) to have our team review your changes.

@ricellis
Member

ricellis commented Dec 3, 2018

One of the reasons this feature has been outstanding for a long while is that the simple solutions do not offer guarantees of completeness. That is:
i) guaranteeing that a set of incremental backup files is restorable
ii) guaranteeing that a restore from a set of incremental files results in a complete database

Meeting these criteria would likely mean implementing a significant part of the replication protocol. I don't think we'd be able to accept an incremental backup solution that didn't offer this level of robustness.

@wmbutler
Contributor

wmbutler commented Dec 3, 2018

@ricellis Would love to hear more detail. In reviewing the backup file, it appears to be an array of documents. All I'm doing is creating a series of text files, each with an array of docs. It's basically a changelog-driven means of creating multiple backup files. I'm not aware of any additional complexity regarding your statement:

i) guaranteeing that a set of incremental backup files is restorable

file_1

[
{},
{},
{}
]

file_2

[
{},
{}
]

I don't see how this is much different from
file

[
{},
{},
{},
{},
{}
]

It just seems to me that, as a large company (IBM), it might make sense to dedicate a couple of hundred engineering hours to this pursuit. Failure to do so will mean losing customers to solutions that offer modern backup practices.

@baversjo

baversjo commented Dec 3, 2018

Backing up our databases takes hours and lots of resources on our cluster. I agree with @wmbutler: an incremental backup solution is, for my company, probably the most important missing feature in Cloudant, especially as we've been told by our AEs to migrate from the managed, built-in backup tool in the Cloudant dedicated cluster to this open source solution. So indeed it would be nice for Cloudant to dedicate some resources to this. You'd think it can't be that complex, as the couch database is essentially an "append only" log file.

@ghost

ghost commented Apr 24, 2019

Jumping in to say that this would be nice too; maybe we can start with a basic implementation as @wmbutler proposed, activated with a flag and with a warning in the README regarding that flag?

@ricellis ricellis removed this from the Later milestone Feb 14, 2024