Backup and Restore for etcd #36

technosophos · 2016-01-12T23:35:17Z

In the event of a catastrophic etcd cluster failure, etcd should be able to restart itself and initialize into a previous known-good state.

Cluster failure happens when all of the nodes in an etcd cluster are terminated.

Currently, when a cluster fails, the first node to recover re-initializes the discovery process with the etcd-discovery service, but it does not recover the data.

What we want is for one or more nodes in a cluster to ship WAL files to a known location (and maybe full backups as well) at periodic intervals. Then, when the cluster fails, the recovering cluster should grab the last successful backup and import the data from that file.
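To make the intended flow concrete, here is a rough sketch of "ship a backup to a known location, then restore from the last successful one". It is not how etcd itself does this. The plain directory copy is only a stand-in for etcd's own backup tooling (copying a live data directory is not safe in general), and the names `ship_backup`, `latest_successful_backup`, `restore_latest`, and the `BACKUP_COMPLETE` marker file are all hypothetical.

```python
import shutil
import time
from pathlib import Path
from typing import Optional

# Hypothetical marker, written only after a backup has been fully copied.
# A backup directory without it is treated as failed or still in progress.
COMPLETE_MARKER = "BACKUP_COMPLETE"


def ship_backup(data_dir: Path, backup_root: Path,
                now: Optional[float] = None) -> Path:
    """Copy data_dir into a new UTC-timestamped directory under backup_root.

    Assumption: a plain copy stands in for a real etcd backup step.
    """
    stamp = time.strftime("%Y%m%dT%H%M%SZ", time.gmtime(now))
    dest = backup_root / stamp
    shutil.copytree(data_dir, dest)      # raises if dest already exists
    (dest / COMPLETE_MARKER).touch()     # mark the backup as successful
    return dest


def latest_successful_backup(backup_root: Path) -> Optional[Path]:
    """Return the newest backup that carries the completion marker."""
    if not backup_root.is_dir():
        return None
    complete = [p for p in backup_root.iterdir()
                if (p / COMPLETE_MARKER).is_file()]
    # Timestamped names sort lexicographically in chronological order.
    return max(complete, key=lambda p: p.name, default=None)


def restore_latest(backup_root: Path, data_dir: Path) -> bool:
    """Replace data_dir with the last successful backup, if one exists."""
    src = latest_successful_backup(backup_root)
    if src is None:
        return False                     # nothing to restore from
    if data_dir.exists():
        shutil.rmtree(data_dir)
    shutil.copytree(src, data_dir,
                    ignore=shutil.ignore_patterns(COMPLETE_MARKER))
    return True
```

In this sketch a node would call `ship_backup` on a timer, and the first node to come back after a full cluster failure would call `restore_latest` before starting etcd. The completion marker is what makes "last successful backup" well defined: a copy that was interrupted mid-way is never picked up.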

I believe that the best way to accomplish this will be to use etcd's snapshot backup/restore system. https://github.com/coreos/etcd/blob/master/Documentation/04_to_2_snapshot_migration.md

technosophos · 2016-01-12T23:37:37Z

Fulfills the following requirements from deis/deis#4809:

Recoverability: documented backup + restore
Availability: HA configuration deployed by default

It should also cover the recovery tests in:

Testing: single, multi-node failure, recovery

rimusz · 2016-03-01T20:05:22Z

this one should be closed, as we have no etcd anymore

technosophos added the enhancement label Jan 12, 2016

technosophos assigned smothiki Jan 12, 2016

technosophos added this to the v2.0-beta1 milestone Jan 12, 2016

slack removed this from the v2.0-beta1 milestone Feb 22, 2016
