A Kubernetes operator to manage updates of Container Linux by CoreOS

Container Linux Update Operator

Container Linux Update Operator is a node reboot controller for Kubernetes running Container Linux images. When a reboot is needed after updating the system via update_engine, the operator will drain the node before rebooting it.

Container Linux Update Operator fulfills the same purpose as locksmith, but has better integration with Kubernetes by explicitly marking a node as unschedulable and deleting pods on the node before rebooting.

Design

Original proposal

Container Linux Update Operator is divided into two parts: update-operator and update-agent.

update-agent runs as a DaemonSet on each node, waiting for an UPDATE_STATUS_UPDATED_NEED_REBOOT signal from update_engine over D-Bus. When it receives the signal, it indicates via node annotations that the node needs a reboot.
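
As a rough illustration of the agent's role, the sketch below maps an update_engine status string to a node annotation. The annotation key example.com/reboot-needed and the function annotationsFor are hypothetical names chosen for this example, not the project's actual identifiers. The real agent receives the status over D-Bus and writes annotations through the Kubernetes API, both of which are omitted here.

```go
package main

import "fmt"

// Status value named in the text above: update_engine reports it once an
// update has been applied and only a reboot remains.
const statusNeedReboot = "UPDATE_STATUS_UPDATED_NEED_REBOOT"

// Hypothetical annotation key, used only for this sketch.
const rebootNeededKey = "example.com/reboot-needed"

// annotationsFor returns the annotations an agent might set on its node
// for a given update_engine status.
func annotationsFor(status string) map[string]string {
	needed := "false"
	if status == statusNeedReboot {
		needed = "true"
	}
	return map[string]string{rebootNeededKey: needed}
}

func main() {
	fmt.Println(annotationsFor(statusNeedReboot))
	fmt.Println(annotationsFor("UPDATE_STATUS_IDLE"))
}
```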

update-operator runs as a Deployment. It watches for changes to node annotations and reboots nodes as needed, coordinating reboots across the cluster so that not too many nodes reboot at once.

Currently, update-operator only reboots one node at a time.
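
The one-node-at-a-time rule can be sketched as a selection over node annotations. This is a minimal illustration under stated assumptions, not the operator's actual code: the annotation keys and the function nextToReboot are hypothetical, and nodes are modelled as plain maps rather than Kubernetes API objects.

```go
package main

import (
	"fmt"
	"sort"
)

// Hypothetical annotation keys, used only for this sketch.
const (
	rebootNeededKey     = "example.com/reboot-needed"
	rebootInProgressKey = "example.com/reboot-in-progress"
)

// nextToReboot returns the name of the node that may reboot next, or ""
// if no node should reboot. nodes maps a node name to its annotations.
func nextToReboot(nodes map[string]map[string]string) string {
	var candidates []string
	for name, ann := range nodes {
		if ann[rebootInProgressKey] == "true" {
			// A node is already rebooting, so no other node may start.
			return ""
		}
		if ann[rebootNeededKey] == "true" {
			candidates = append(candidates, name)
		}
	}
	if len(candidates) == 0 {
		return ""
	}
	// Sort so the choice is deterministic despite random map iteration order.
	sort.Strings(candidates)
	return candidates[0]
}

func main() {
	nodes := map[string]map[string]string{
		"node-a": {rebootNeededKey: "true"},
		"node-b": {rebootNeededKey: "true"},
		"node-c": {},
	}
	fmt.Println(nextToReboot(nodes)) // node-a

	// Once any node is mid-reboot, nothing else is selected.
	nodes["node-b"][rebootInProgressKey] = "true"
	fmt.Println(nextToReboot(nodes) == "") // true
}
```

Picking the alphabetically first candidate is an arbitrary choice made to keep the example deterministic; the source does not specify an ordering.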

Requirements

  • A Kubernetes cluster (>= 1.6) running on Container Linux
  • The update-engine.service systemd unit on each machine should be unmasked, enabled, and started
  • The locksmithd.service systemd unit on each machine should be masked and stopped

To unmask a service, run systemctl unmask <name>. To enable a service, run systemctl enable <name>. To start/stop a service, run systemctl start <name> or systemctl stop <name> respectively.

Usage

Create the update-operator Deployment and the update-agent DaemonSet:

kubectl apply -f examples/deploy -R

Test

To test that it is working, SSH to a node and either trigger an update check by running update_engine_client -check_for_update, or simulate that a reboot is needed by running locksmithctl send-need-reboot.