Netflix-Skunkworks/s3-flash-bootloader


This is a minimal in-memory operating system for flashing new disk images onto existing servers. It is especially useful for stateful services running on cloud instances that cannot preserve ephemeral state (e.g., Amazon EC2 instance store).

This system allows you to perform in-place upgrades of a server's entire software stack, producing a software configuration identical to that of a freshly launched server image. It may either complement or entirely replace a configuration management tool. When it replaces a configuration management tool, it achieves many of the benefits of immutable servers, even when hardware is long-lived.

To learn more about the development of s3-flash-bootloader, check out the introductory blog post.

Prerequisites

  • The existing system must have GRUB installed
  • The existing system cannot be running on a PV virtualized server
  • There must be enough RAM to store the compressed OS image, with some to spare for the bootloader
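
The GRUB and RAM prerequisites can be checked before installing. The following is a minimal sketch, not part of this repository: the function names are hypothetical, the 512 MiB default headroom is an assumed figure (the bootloader's actual memory needs are not stated here), and the GRUB check is only a heuristic that looks for a grub or grub2 directory under /boot. It does not detect PV virtualization.

```shell
#!/bin/bash
# Hypothetical pre-flight checks. Function names and the default headroom
# are assumptions for illustration, not values taken from the bootloader.

# Succeeds if total memory can hold the compressed image plus headroom.
# Arguments: image size (KiB), total RAM (KiB), optional headroom (KiB).
has_enough_ram() {
    [ "$2" -ge $(( $1 + ${3:-524288} )) ]
}

# Prints total system memory in KiB, read from /proc/meminfo (Linux only).
total_ram_kib() {
    awk '/^MemTotal:/ { print $2 }' /proc/meminfo
}

# Heuristic: succeeds if a GRUB directory exists under the boot directory.
# Argument: boot directory (defaults to /boot).
has_grub_dir() {
    [ -d "${1:-/boot}/grub" ] || [ -d "${1:-/boot}/grub2" ]
}
```

For example, `has_grub_dir && has_enough_ram "$image_kib" "$(total_ram_kib)"` succeeds only when both checks pass, where `image_kib` is the size of your compressed image.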

Installing

Installing s3-flash-bootloader on your system overwrites any existing boot configuration. After installation, your system will not be able to boot, except into s3-flash-bootloader. For this reason, two safety mechanisms are built into the bootloader:

  1. If downloading fails (for example, because of missing permissions or a nonexistent image), the bootloader will reboot back into your original OS without flashing the root volume.
  2. You can include your public SSH keys in the bootloader so you can still SSH into the bootloader itself while it is running.

We publish a release tarball for x86_64. Installation of the bootloader will depend on your environment. A minimal install script might look like:

#!/bin/bash
set -e
# Make a backup so that the bootloader can restore the machine if downloading fails
cp -a /boot /boot.bak
# Install the bootloader
tar -C /boot -xvf "bootloader-$(uname -m).tar.gz"
cd /boot

# Tell the bootloader where to load the new AMI from
/boot/configure_bootloader.sh <bucket>/<key> /dev/disk/by-label/cloudimg-rootfs
# Add an SSH key so that you can access the bootloader while it is running
/boot/add_ssh_key.sh ~/.ssh/id_ed25519.pub

Producing flashable images

s3-flash-bootloader requires lz4-compressed full disk images stored in S3. Included in the examples directory of this repo is a script to upload the contents of an AMI to S3. We hope this script can serve as a useful template as you adapt it to fit your environment. At Netflix we tie similar automation into our AMI baking pipelines, which upload the disk image directly to S3 in addition to the normal snapshot and publish.
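
As a rough illustration of the idea, a block device can be streamed through lz4 into S3 without staging the image on local disk. This is a sketch under assumptions, not the script from the examples directory: the function names are hypothetical, it assumes the lz4 and aws command-line tools are installed and credentialed, and the exact image layout the bootloader expects should be confirmed against the repository's own example.

```shell
#!/bin/bash
# Hypothetical helpers for producing and uploading a compressed image.

# Prints the S3 URI for a bucket and key, dropping one leading slash
# from the key if present.
s3_uri() {
    printf 's3://%s/%s\n' "$1" "${2#/}"
}

# Reads a block device, compresses the stream with lz4, and uploads it.
# Arguments: device path, bucket, key.
# Requires read access to the device, plus the lz4 and aws CLIs.
upload_image() {
    dd if="$1" bs=4M | lz4 -c | aws s3 cp - "$(s3_uri "$2" "$3")"
}
```

A call might look like `upload_image /dev/xvdf my-bucket images/root.lz4`, where the device, bucket, and key are placeholders. For very large streams, the AWS CLI's `--expected-size` option to `aws s3 cp` may be needed so that the multipart upload is sized correctly.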

Usage note

Most stateful services depend on caches to achieve good performance. After a system is rebooted, it is likely that these caches will be cold, and unable to offer appropriate performance. We recommend re-warming caches before applying traffic to any rebooted stateful service.

  • Within Netflix, we use happycache, which works for databases that rely on the Linux page cache (e.g., Cassandra, Postgres)
  • pgfadvise_loader is available as an extension to Postgres
  • MySQL can dump and reload its buffer pool
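
For the MySQL case, the dump and reload can be triggered on demand through the `innodb_buffer_pool_dump_now` and `innodb_buffer_pool_load_now` system variables. The sketch below is illustrative only: the function names are hypothetical, and it assumes the `mysql` client can reach a local server with sufficient privileges.

```shell
#!/bin/bash
# Hypothetical wrappers around MySQL's InnoDB buffer pool dump/load.

# Prints the SQL statement for an action ("dump" or "load").
buffer_pool_sql() {
    case "$1" in
        dump) echo "SET GLOBAL innodb_buffer_pool_dump_now = ON;" ;;
        load) echo "SET GLOBAL innodb_buffer_pool_load_now = ON;" ;;
        *) echo "usage: buffer_pool_sql dump|load" >&2; return 1 ;;
    esac
}

# Runs the action against a server using the mysql client's defaults.
buffer_pool() {
    sql=$(buffer_pool_sql "$1") || return 1
    mysql -e "$sql"
}
```

Running `buffer_pool dump` before rebooting and `buffer_pool load` afterwards is one way to use this. The dump is written to the file named by `innodb_buffer_pool_filename` in the data directory, so it only helps if that directory lives on a volume the flash does not overwrite.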

Similar software

Many operating systems have built-in mechanisms for performing in-place upgrades using full images, including most network-device OSes, Chromebooks, and container OSes like Container Linux. s3-flash-bootloader uses similar mechanisms, but intentionally does not require any integration with the OS being booted. As a result, we are compatible with any OS image.

Building

You may also build from source by cloning this repository and running ./build.sh. The build process uses debootstrap, and generally assumes a Debian-like OS.
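
A source build might be scripted roughly as follows. This is a sketch with assumptions: the clone URL is inferred from the repository name, the helper names are hypothetical, and the list of required tools (git and debootstrap) may be incomplete. debootstrap usually needs root and is often installed outside a non-root user's PATH, so the check may need adjusting.

```shell
#!/bin/bash
# Hypothetical build helpers; the tool list and clone URL are assumptions.

# Prints each named command that is not on PATH.
# Succeeds only if none are missing.
missing_commands() {
    missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd"
            missing=1
        fi
    done
    return "$missing"
}

# Clones the repository and runs build.sh in a subshell, so the
# caller's working directory is left unchanged.
build_bootloader() {
    missing_commands git debootstrap || return 1
    git clone https://github.com/Netflix-Skunkworks/s3-flash-bootloader.git || return 1
    ( cd s3-flash-bootloader && ./build.sh )
}
```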

Contributing

We hope this code is useful for you, and we are happy to accept bugfixes. If you prefer to build new features, we encourage you to fork and adapt this to your needs.