Protect controller from becoming unscheduleable #14

jahkeup · 2019-11-11T19:55:34Z

Also, the Operator Controller should handle killing itself, especially in a single-node cluster! It should not prevent itself from being scheduled in a cluster.

Originally posted by @jahkeup in bottlerocket-os/bottlerocket#239 (comment)

jahkeup · 2020-03-12T22:52:27Z

One way to protect the controller could be to have the controller save its hosting node to be updated last. Once it has updated the other nodes, the controller would delete its Pod so that it gets rescheduled, and only once started elsewhere would it continue on to update that last node.

The controller's deployment should then include bottlerocket.aws/update-available in its antiAffinity weighted selector (preferring update-available==false) so that it lands on updated hosts first.

This method wouldn't account for a single-node cluster or one where only one node was Ready and Schedulable. The controller will have to check that it considers itself reschedulable before stopping its Pod.
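A minimal sketch of the ordering and the reschedulability check described above, in Python for illustration only. The `Node` type, its fields, and both function names are hypothetical and are not taken from the controller's code; `update_available` simply stands in for the `bottlerocket.aws/update-available` label.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    # Hypothetical, simplified view of a cluster node.
    name: str
    ready: bool = True
    schedulable: bool = True
    # Stand-in for the bottlerocket.aws/update-available label.
    update_available: bool = True


def update_order(nodes, controller_node):
    """Return the nodes that need an update, with the controller's host last."""
    pending = [n for n in nodes if n.update_available]
    others = [n for n in pending if n.name != controller_node]
    own = [n for n in pending if n.name == controller_node]
    return others + own


def can_reschedule(nodes, controller_node):
    """True if some other Ready and schedulable node could host the controller."""
    return any(
        n.ready and n.schedulable and n.name != controller_node
        for n in nodes
    )
```

Under these assumptions, the controller would only delete its own Pod when `can_reschedule` returns true. In a single-node cluster it returns false, so the controller would need a different path for updating its own host.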

cbgbt · 2022-04-05T21:47:50Z

Some thoughts from conversation with @somnusfish:

Consider evicting the controller first when doing a drain
Add a timeout to drains, error out and trigger our new crash loop handling code (0.2.0: Handle update-reboot failures/ "crash loops" #123) if they get stuck. Drains should never roll forward on timeouts.
Add PDBs to apiserver deployment to ensure we always have at least 2 running in the cluster.
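The first two suggestions above could look roughly like the following Python sketch. Everything here is an assumption made for illustration: `drain`, `DrainTimeout`, the `evict` callback (which is handed the remaining time budget), and the `is_controller` predicate are hypothetical names, not brupop or Kubernetes APIs.

```python
import time


class DrainTimeout(Exception):
    """Raised when a drain runs out of time; the caller must not roll forward."""


def drain(pods, evict, timeout_s, is_controller=lambda pod: False,
          clock=time.monotonic):
    """Evict pods with the controller first, failing once the deadline passes."""
    deadline = clock() + timeout_s
    # sorted() is stable, so non-controller pods keep their relative order.
    ordered = sorted(pods, key=lambda pod: not is_controller(pod))
    evicted = []
    for pod in ordered:
        remaining = deadline - clock()
        if remaining <= 0:
            raise DrainTimeout(f"drain timed out before evicting {pod!r}")
        # Hypothetical callback: expected to give up within `remaining` seconds.
        evict(pod, remaining)
        evicted.append(pod)
    return evicted
```

A caller would treat `DrainTimeout` as a failed update step and hand it to the crash loop handling rather than proceeding with the reboot.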

cbgbt · 2022-04-05T21:48:36Z

We want to add the ability to allow brupop to update many nodes simultaneously, which makes this more important. Adding this to the 1.0.0 release milestone.

jahkeup changed the title ~~dogswatch: prevent Controller from unscheduleable conditions~~ dogswatch: protect Controller from unscheduleable conditions Nov 11, 2019

webern transferred this issue from bottlerocket-os/bottlerocket Feb 26, 2020

jahkeup changed the title ~~dogswatch: protect Controller from unscheduleable conditions~~ Protect controller from becoming unscheduleable Feb 27, 2020

jhaynes added this to the Backlog milestone May 21, 2021

jhaynes added enhancement type/enhancement priority/p0 and removed enhancement labels May 21, 2021

jhaynes modified the milestones: Backlog, next May 21, 2021

jhaynes modified the milestones: next, next+1 Jul 28, 2021

Vaishvenk added this to Feature Backlog in Bottlerocket Roadmap Aug 6, 2021

cbgbt modified the milestones: brupop 0.1.x next, Backlog Feb 21, 2022

cbgbt removed the status/notstarted label Apr 5, 2022

cbgbt modified the milestones: Backlog, brupop 1.0.0 Apr 5, 2022

somnusfish assigned somnusfish and unassigned somnusfish May 2, 2022

gthao313 self-assigned this May 2, 2022

gthao313 mentioned this issue Jun 21, 2022

Protect controller from becoming unschedulable #214

Merged

3 tasks

gthao313 modified the milestones: brupop 1.0.0, brupop 0.2.2 Jul 13, 2022

gthao313 closed this as completed Jul 20, 2022
