Backup and Restore for etcd #36

Open
technosophos opened this issue Jan 12, 2016 · 2 comments

@technosophos

In the event of a catastrophic etcd cluster failure, etcd should be able to restart itself and initialize into a previous known-good state.

Cluster failure happens when all of the nodes in an etcd cluster are terminated.

Currently, when a cluster fails, the first node to recover will re-initialize the discovery process with the etcd-discovery service. But it will not recover the data.

What we want is for one or more nodes in a cluster to ship WAL logs to a known location (and maybe full backups as well) at periodic intervals. Then, when a cluster fails, it should grab the last successful backup and import the data from that file.

I believe that the best way to accomplish this will be to use etcd's snapshot backup/restore system. https://github.com/coreos/etcd/blob/master/Documentation/04_to_2_snapshot_migration.md
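
To make the idea concrete, here is a rough sketch of the periodic-backup half and of picking the "last successful backup" on recovery. It assumes the etcd v2 `etcdctl backup --data-dir ... --backup-dir ...` command; the timestamped directory layout, the `BACKUP_OK` marker file, and all function names are hypothetical, just for illustration. The restore itself (starting etcd on the chosen directory) is not shown.

```python
import subprocess
import time
from pathlib import Path
from typing import List, Optional

# Hypothetical marker file: written only after a backup command succeeds,
# so half-written backups are never picked for a restore.
DONE_MARKER = "BACKUP_OK"


def backup_argv(data_dir: str, dest: str) -> List[str]:
    """Build the etcd v2 backup command line."""
    return ["etcdctl", "backup", "--data-dir", data_dir, "--backup-dir", dest]


def take_backup(data_dir: str, root: Path, run=subprocess.run) -> Path:
    """Write a timestamped backup under root and mark it as successful."""
    # UTC timestamps in this format sort lexicographically by age.
    dest = root / time.strftime("%Y%m%dT%H%M%SZ", time.gmtime())
    dest.mkdir(parents=True, exist_ok=False)
    run(backup_argv(data_dir, str(dest)), check=True)  # raises on failure
    (dest / DONE_MARKER).touch()  # only reached if the command succeeded
    return dest


def latest_successful_backup(root: Path) -> Optional[Path]:
    """Return the newest backup directory that carries the success marker."""
    if not root.is_dir():
        return None
    candidates = [
        d for d in root.iterdir()
        if d.is_dir() and (d / DONE_MARKER).exists()
    ]
    return max(candidates, key=lambda d: d.name, default=None)
```

A node would call `take_backup` on a timer, and a recovering node would call `latest_successful_backup` before deciding whether to start from backed-up data or from an empty data directory. The `root` here is a local path; shipping it to a shared "known location" is left open.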

@technosophos

Fulfills the following requirements from deis/deis#4809:

  • Recoverability: documented backup + restore
  • Availability: HA configuration deployed by default

It should also cover the recovery tests in:

  • Testing: single, multi-node failure, recovery

@slack removed this from the v2.0-beta1 milestone on Feb 22, 2016

@rimusz

rimusz commented Mar 1, 2016

this one should be closed, as we have no etcd anymore
