This repository has been archived by the owner on Oct 16, 2020. It is now read-only.

randomize locksmithd reboot window #2610

Open
dabeck opened this issue Aug 30, 2019 · 1 comment

dabeck commented Aug 30, 2019

Feature Request

I'm looking for a way to tell locksmithd to apply the reboot strategy at a random time. For example, I have a set of nodes in my environment that use the "reboot" strategy with a reboot window configured via cloud-init. In some cases those nodes reboot at exactly the same time, and my applications go down. What I'm looking for is a way to say something like: this is the reboot window, but reboot at a random time within it, so that the chance of concurrent reboots is minimal.
Is this already possible or do you have any recommendations on this?

I know about the etcd-lock option, but my nodes don't have etcd set up, so etcd-lock is not an option for me.
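
To make the request concrete, here is a minimal sketch of the behaviour being asked for: each node independently picks a uniformly random start time inside its reboot window. This is not something locksmithd provides; `pick_reboot_time`, its parameters, and the example window are all hypothetical names and values chosen for illustration.

```python
import random
from datetime import datetime, timedelta


def pick_reboot_time(window_start, window_length, downtime=timedelta(0), rng=random):
    """Return a uniformly random reboot start inside the window.

    `downtime` is a hypothetical estimate of how long a reboot takes; it is
    subtracted so that the reboot also finishes inside the window.
    """
    slack = window_length - downtime
    if slack < timedelta(0):
        raise ValueError("window is shorter than the expected downtime")
    offset_seconds = rng.uniform(0, slack.total_seconds())
    return window_start + timedelta(seconds=offset_seconds)


# Example: a one-hour window starting at 04:00 with about 5 minutes of downtime.
start = datetime(2019, 8, 30, 4, 0)
reboot_at = pick_reboot_time(start, timedelta(hours=1), timedelta(minutes=5))
print(reboot_at)
```

Each node would draw its own offset, so nothing in this scheme prevents two nodes from landing close together; it only makes that less likely.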

Environment

OpenStack


lucab commented Aug 30, 2019

@dabeck thanks for the interesting feedback!

> This is the reboot window but you should reboot at a random time in this window so the possibility of a concurrently reboot is minimal.

This same discussion recently came up in an offline architectural conversation around Zincati, and we reached the conclusion that we don't plan to implement this.
The rationale is that it would try to tackle a hybrid case between "reboots are independent" and "reboots are not independent (cluster-wise)", which comes with its own development, testing, and maintenance costs. More importantly, whether node reboots are independent is a boolean property, so the hybrid case should be properly folded into one of the two options.

Applying this to your specific case: your reboots are indeed NOT independent, and they need to be coordinated somehow, based on information that is known to the cluster but not to every single node.
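
As a rough illustration of why randomization alone does not coordinate anything, the sketch below estimates how often at least two nodes would still be down at the same time if each picked a uniformly random start. It assumes start times are drawn independently over an interval of `window_seconds` and that every reboot takes the same `downtime_seconds`; the function name and all numbers are hypothetical.

```python
def overlap_probability(nodes, window_seconds, downtime_seconds):
    """Probability that at least two reboots overlap.

    Uses the standard result for n independent uniform points on an interval
    of length L: all pairwise gaps exceed d with probability
    (1 - (n - 1) * d / L) ** n, provided (n - 1) * d < L.
    """
    if nodes < 2:
        return 0.0
    blocked = (nodes - 1) * downtime_seconds
    if blocked >= window_seconds:
        return 1.0  # the reboots cannot all fit without overlapping
    return 1.0 - (1.0 - blocked / window_seconds) ** nodes


# Hypothetical example: 5 nodes, a one-hour window, 2 minutes of downtime each.
print(round(overlap_probability(5, 3600, 120), 2))  # prints 0.51
```

Under these assumptions roughly half of all update rounds would still see two nodes down together, which is why the overlap has to be ruled out by coordination rather than made less likely by chance.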

> Is this already possible or do you have any recommendations on this?
> I know about the etcd-lock option but my nodes don't have etcd setup, so etcd-lock is not an option for me.

My recommendation would be to acknowledge that your reboots need to be orchestrated somehow, cluster-wide.
Locksmith only supports etcd2 for that, so you need to either provide an etcd2 cluster or come up with a similar solution.
You don't need to have etcd running on each node, as locksmith should allow you to specify which etcd endpoint to use (which could also be somewhere remote).
Alternatively, if you are using Kubernetes, you may want to have a look at https://github.com/coreos/container-linux-update-operator.
