In the event of a catastrophic etcd cluster failure, etcd should be able to restart itself and initialize into a previous known-good state.
Cluster failure happens when all of the nodes in an etcd cluster are terminated.
Currently, when a cluster fails, the first node to recover re-initializes the discovery process with the etcd-discovery service, but it does not recover the data.
What we want is for one or more nodes in a cluster to ship WAL logs (and possibly full backups as well) to a known location at periodic intervals. Then, when a cluster fails, the first recovering node should grab the last successful backup and import the data from it.
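To make the shipping side concrete, here is a minimal sketch of a sidecar that periodically snapshots the local etcd data dir and copies the result to a shared "known location". The paths (`/var/lib/etcd`, `/mnt/etcd-backups`), the 15-minute interval, and the sidecar itself are assumptions for illustration, not anything etcd provides; it simply shells out to the v2 `etcdctl backup` subcommand.

```go
// backup_shipper.go — sketch of a sidecar that periodically ships etcd backups
// to a known location. Paths and the interval are placeholders, not etcd defaults.
package main

import (
	"fmt"
	"log"
	"os/exec"
	"time"
)

func shipBackup(dataDir, backupRoot string) error {
	dest := fmt.Sprintf("%s/backup-%d", backupRoot, time.Now().Unix())
	// `etcdctl backup` copies the snapshot and WAL into dest, rewriting the
	// member/cluster IDs so the copy can later seed a fresh cluster.
	cmd := exec.Command("etcdctl", "backup",
		"--data-dir", dataDir,
		"--backup-dir", dest)
	if out, err := cmd.CombinedOutput(); err != nil {
		return fmt.Errorf("etcdctl backup failed: %v: %s", err, out)
	}
	log.Printf("shipped backup to %s", dest)
	return nil
}

func main() {
	const (
		dataDir    = "/var/lib/etcd"     // hypothetical member data dir
		backupRoot = "/mnt/etcd-backups" // hypothetical durable "known location"
		interval   = 15 * time.Minute
	)
	for range time.Tick(interval) {
		if err := shipBackup(dataDir, backupRoot); err != nil {
			log.Println(err)
		}
	}
}
```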
I believe that the best way to accomplish this will be to use etcd's snapshot backup/restore system. https://github.com/coreos/etcd/blob/master/Documentation/04_to_2_snapshot_migration.md
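For the restore half, the recovery flow would roughly be: pick the newest shipped backup, copy it into place as the data dir, and start the first member with `--force-new-cluster` so it discards the old membership and comes up as a single-node cluster holding the restored data. Below is a sketch of that bootstrap step; the paths, member name, and copy strategy are assumptions for illustration.

```go
// restore_bootstrap.go — sketch of the first-node recovery path: find the newest
// backup in the known location, copy it into place, and start etcd with
// --force-new-cluster so it re-forms a one-member cluster from that data.
package main

import (
	"log"
	"os"
	"os/exec"
	"path/filepath"
	"sort"
)

// newestBackup returns the lexically last backup-<unix-ts> directory, which is
// also the most recent one because the names embed a Unix timestamp.
func newestBackup(backupRoot string) (string, error) {
	dirs, err := filepath.Glob(filepath.Join(backupRoot, "backup-*"))
	if err != nil || len(dirs) == 0 {
		return "", err
	}
	sort.Strings(dirs)
	return dirs[len(dirs)-1], nil
}

func main() {
	backup, err := newestBackup("/mnt/etcd-backups")
	if err != nil || backup == "" {
		log.Fatalf("no backup found: %v", err)
	}

	dataDir := "/var/lib/etcd-restore" // hypothetical restore target
	if out, err := exec.Command("cp", "-a", backup, dataDir).CombinedOutput(); err != nil {
		log.Fatalf("copy backup: %v: %s", err, out)
	}

	// --force-new-cluster drops the peer list recorded in the WAL and starts
	// this node as a fresh single-member cluster with the restored keyspace.
	cmd := exec.Command("etcd",
		"--name", "recovered-node-0",
		"--data-dir", dataDir,
		"--force-new-cluster")
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	log.Printf("restoring from %s", backup)
	log.Fatal(cmd.Run())
}
```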