randomize locksmithd reboot window #2610

Open
dabeck opened this issue Aug 30, 2019 · 1 comment

Comments

@dabeck

commented Aug 30, 2019

Feature Request

I'm looking for a way to tell locksmithd to execute the reboot strategy at a random time. For example, I have a set of nodes in my environment that use the "reboot" strategy with a reboot window configured via cloud-init. In some cases those nodes reboot at exactly the same time and my applications go down. What I'm looking for is a way to say something like: this is the reboot window, but you should reboot at a random time within it, so that the chance of concurrent reboots is minimal.
Is this already possible, or do you have any recommendations?

I know about the etcd-lock option, but my nodes don't have etcd set up, so etcd-lock is not an option for me.

Environment

OpenStack

@lucab

Member

commented Aug 30, 2019

@dabeck thanks for the interesting feedback!

> This is the reboot window but you should reboot at a random time in this window so the possibility of a concurrently reboot is minimal.

This same discussion recently came up in an offline architectural conversation around Zincati, and we concluded that we don't plan to implement this.
The rationale is that it tries to tackle a hybrid case between "reboots are independent" and "reboots are not independent (cluster-wise)", which comes with its own development, testing, and maintenance costs. More importantly, whether node reboots are independent is a boolean property, so the hybrid case should be folded into one of the two options.

Applying this to your specific case: your reboots are NOT independent, and they need to be coordinated based on information that is known by the cluster but not by each individual node.
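
To put a rough number on why a random time inside the window does not make reboots independent, here is a minimal sketch. It is not locksmith or Zincati code. It assumes each node picks a start time uniformly at random within the window and is then unavailable for a fixed downtime; the function names and the 5-minute downtime used in the example are illustrative assumptions.

```python
import random


def overlap_probability(nodes, window_min, downtime_min):
    """Exact chance that at least two nodes are down at the same time.

    Assumes every node independently picks a uniform start time in
    [0, window_min] and is unavailable for downtime_min minutes after it.
    """
    slack = 1 - (nodes - 1) * downtime_min / window_min
    if slack <= 0:
        # The window cannot even fit the reboots back to back.
        return 1.0
    return 1 - slack ** nodes


def simulate(nodes, window_min, downtime_min, trials=20000, seed=0):
    """Monte Carlo estimate of the same probability, as a sanity check."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        starts = sorted(rng.uniform(0, window_min) for _ in range(nodes))
        # Two reboots overlap when neighbouring start times are closer
        # together than the downtime.
        if any(b - a < downtime_min for a, b in zip(starts, starts[1:])):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    # Hypothetical numbers: 3 nodes, a 60-minute window, 5 minutes of downtime.
    print(round(overlap_probability(3, 60, 5), 3))
    print(round(simulate(3, 60, 5), 3))
```

Under these assumptions, three nodes sharing a one-hour window still overlap in roughly 42% of update rounds, so randomization lowers the risk but does not remove it.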

> Is this already possible or do you have any recommendations on this?
> I know about the etcd-lock option but my nodes don't have etcd setup, so etcd-lock is not an option for me.

My recommendation would be to acknowledge that your reboots need to be orchestrated cluster-wide.
Locksmith only supports etcd2 for that, so you need to either provide an etcd2 cluster or come up with a similar solution.
You don't need to run etcd on each node, as locksmith should allow you to specify which etcd endpoint to use (which could also be somewhere remote).
Alternatively, if you are using Kubernetes, you may want to have a look at https://github.com/coreos/container-linux-update-operator.
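
For anyone considering the "similar solution" route, the coordination boils down to a shared counting lock that limits how many nodes may reboot at once. The sketch below is a hypothetical in-memory stand-in, not locksmith's implementation: the class and method names are made up for illustration, and a real deployment would keep this state in a shared store with atomic updates (such as etcd) rather than in one process.

```python
import threading


class RebootSemaphore:
    """Hypothetical in-memory stand-in for a cluster-wide reboot lock."""

    def __init__(self, max_holders=1):
        self._max = max_holders
        self._holders = set()
        self._mutex = threading.Lock()

    def try_lock(self, node_id):
        """Return True if node_id holds the lock and may reboot now."""
        with self._mutex:
            if node_id in self._holders:
                return True  # already holding, e.g. retry after a crash
            if len(self._holders) >= self._max:
                return False  # too many nodes rebooting; try again later
            self._holders.add(node_id)
            return True

    def unlock(self, node_id):
        """Release the lock once the node is back up after its reboot."""
        with self._mutex:
            self._holders.discard(node_id)

    def holders(self):
        """Return the nodes currently allowed to reboot."""
        with self._mutex:
            return sorted(self._holders)


if __name__ == "__main__":
    lock = RebootSemaphore(max_holders=1)
    print(lock.try_lock("node-a"))  # first node gets the slot
    print(lock.try_lock("node-b"))  # second node has to wait
    lock.unlock("node-a")
    print(lock.try_lock("node-b"))  # slot is free again
```

The important property is that the decision uses cluster-wide state, the set of current holders, which no single node can know on its own.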
