Firecracker snapshot versioning

This document describes how Firecracker persists its state across multiple versions, diving deep into the snapshot format, encoding, compatibility and limitations.

Introduction

The design behind the snapshot implementation enables version-tolerant save and restore across a range of Firecracker versions, which we call a version space. For example, one can pause a microVM, save it to disk with Firecracker version 0.23.0 and later load it in Firecracker version 0.24.0. It also works in reverse: Firecracker version 0.23.0 can load what 0.24.0 saves, provided the snapshot is created for target version 0.23.

Below is an example graph showing backward and forward snapshot compatibility. This is the general picture, but keep in mind that some version translations may not be possible when new features are added.

[Figure: version graph]

A non-exhaustive list of how cross-version snapshot support can be used:

Example scenario #1 - load snapshot from older version:

  • Start Firecracker v0.23 → Boot microVM → Workload starts → Pause → CreateSnapshot(snap) → kill microVM
  • Start Firecracker v0.24 → LoadSnapshot → Resume → Workload continues

Example scenario #2 - load snapshot in older version:

  • Start Firecracker v0.24 → Boot microVM → Workload starts → Pause → CreateSnapshot(snap, “0.23”) → kill microVM
  • Start Firecracker v0.23 → LoadSnapshot(snap) → Resume → Workload continues

Example scenario #3 - load an older snapshot in a newer version, then save it for an older version:

  • Start Firecracker v0.24 → LoadSnapshot(older_snap) → Resume → Workload continues → Pause → CreateSnapshot(snap, “0.23”) → kill microVM
  • Start Firecracker v0.23 → LoadSnapshot(snap) → Resume → Workload continues

Overview

Firecracker persists the microVM state as 2 separate objects:

  • a guest memory file
  • a microVM state file.

The block devices attached to the microVM are not considered part of the state and need to be managed separately.

Guest memory

The guest memory file contains the microVM memory saved as a dump of all pages.

MicroVM state

In the microVM state file, Firecracker stores the internal state of the VMM (device emulation, KVM and vCPUs), with two exceptions: serial emulation and the vsock backend.

As we continuously improve and extend Firecracker by adding new capabilities, devices or enhancements, the microVM state file may change both structurally and semantically with each new release. The state file includes versioning information, and each Firecracker release implements distinct save/restore logic for the supported version space.
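As a rough illustration of a version space, the sketch below maps Firecracker release strings to snapshot data versions and rejects targets outside the supported range. The table contents and the names `VERSION_SPACE` and `data_version_for` are hypothetical and are not taken from Firecracker's code.

```rust
/// Hypothetical table: Firecracker release -> snapshot data version.
/// The entries are illustrative only.
const VERSION_SPACE: &[(&str, u16)] = &[("0.23.0", 1), ("0.24.0", 2)];

/// Returns the snapshot data version for a target Firecracker release,
/// or `None` if the release is outside the supported version space.
pub fn data_version_for(target: &str) -> Option<u16> {
    VERSION_SPACE
        .iter()
        .find(|(release, _)| *release == target)
        .map(|(_, data_version)| *data_version)
}
```

With this table, a request to save for a release that is not listed returns `None`, so the caller can refuse it before any state is written.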

MicroVM state file format

A microVM state file is further split into four different fields:

| Field    | Bits | Description                                                                          |
|----------|------|--------------------------------------------------------------------------------------|
| magic_id | 64   | Firecracker snapshot, architecture (x86_64/aarch64) and storage version.             |
| version  | 16   | The snapshot version number internally mapped 1:1 to a specific Firecracker version. |
| state    | N    | Bincode blob containing the microVM state.                                           |
| crc      | 64   | Optional CRC64 sum of magic_id, version and state fields.                            |

Note: the last 16 bits of magic_id encode the storage version which specifies the encoding used for the version and state fields. The current implementation sets this field to 1, which identifies it as a Serde bincode compatible encoder/decoder.
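To make the layout concrete, here is a minimal sketch that splits a raw buffer into the four fields from the table above. It rests on assumptions the text does not confirm: integers are fixed-width little-endian, the trailing CRC is always present, and "the last 16 bits" of magic_id means the least significant 16 bits. The names `StateFile`, `split_state_file` and `storage_version` are illustrative.

```rust
/// The four fields of a microVM state file (illustrative type).
#[derive(Debug, PartialEq)]
pub struct StateFile<'a> {
    pub magic_id: u64,
    pub version: u16,
    pub state: &'a [u8],
    pub crc: u64,
}

/// Splits a raw buffer into magic_id, version, state and crc.
/// Assumes little-endian integers and a CRC that is always present.
pub fn split_state_file<'a>(buf: &'a [u8]) -> Option<StateFile<'a>> {
    const HEADER: usize = 8 + 2; // magic_id (64 bits) + version (16 bits)
    const CRC: usize = 8; // crc (64 bits)
    if buf.len() < HEADER + CRC {
        return None;
    }
    let (header, rest) = buf.split_at(HEADER);
    let (state, crc_bytes) = rest.split_at(rest.len() - CRC);

    let mut magic = [0u8; 8];
    magic.copy_from_slice(&header[..8]);
    let mut crc = [0u8; 8];
    crc.copy_from_slice(crc_bytes);

    Some(StateFile {
        magic_id: u64::from_le_bytes(magic),
        version: u16::from_le_bytes([header[8], header[9]]),
        state,
        crc: u64::from_le_bytes(crc),
    })
}

/// Storage version, read here as the least significant 16 bits of magic_id.
pub fn storage_version(magic_id: u64) -> u16 {
    (magic_id & 0xFFFF) as u16
}
```

The sketch does not verify the checksum, because the text does not name the CRC64 variant in use.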

Version tolerant ser/de

Firecracker reads and writes the state blob of the snapshot using separate serialization and deserialization logic for each version. This logic is mostly autogenerated by a Rust procedural macro based on struct and enum annotations, so the annotated structures themselves carry their version history. The versioning logic is generated by parsing a structure's history log (encoded using Rust annotations) and emitting Rust code.

Versioned serialization and deserialization is divided into two translation layers:

  • field translator,
  • semantic translator.

The field translator implements the logic to convert between different versions of the same Rust POD structure: it can deserialize or serialize from source version to target. The translation is done field by field - the common fields are copied from source to target, and the fields that are unique to the target are (de)serialized with their default values.

The semantic translator is only concerned with translating the semantics of the serialized/deserialized fields.

The field translator is generated automatically through a procedural macro, and the semantic translation methods have to be annotated in the structure by the user.
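The two layers can be sketched by hand, without the procedural macro. In the example below, the `DeviceState` structure, its fields and its version numbers are hypothetical. The code only illustrates the idea of writing the fields a target version knows, defaulting the fields a source version lacks, and rejecting a translation that would lose meaning. It is not the code the macro generates.

```rust
/// Hypothetical device state; `features` was added in version 2.
#[derive(Debug, Default, PartialEq)]
pub struct DeviceState {
    pub queue_size: u16, // present since version 1
    pub features: u32,   // present since version 2, defaults to 0
}

impl DeviceState {
    /// Field translation on save: write only the fields the target version knows.
    pub fn serialize(&self, target: u16) -> Result<Vec<u8>, String> {
        // Semantic translation: refuse to drop state an older version cannot express.
        if target < 2 && self.features != 0 {
            return Err(format!(
                "features cannot be represented at version {}",
                target
            ));
        }
        let mut out = self.queue_size.to_le_bytes().to_vec();
        if target >= 2 {
            out.extend_from_slice(&self.features.to_le_bytes());
        }
        Ok(out)
    }

    /// Field translation on load: copy the common fields, default the rest.
    pub fn deserialize(bytes: &[u8], source: u16) -> Option<DeviceState> {
        let expected_len = if source >= 2 { 6 } else { 2 };
        if bytes.len() != expected_len {
            return None;
        }
        let queue_size = u16::from_le_bytes([bytes[0], bytes[1]]);
        let features = if source >= 2 {
            u32::from_le_bytes([bytes[2], bytes[3], bytes[4], bytes[5]])
        } else {
            0 // default value for a field the source version does not have
        };
        Some(DeviceState { queue_size, features })
    }
}
```

Saving a state with non-zero `features` at version 1 fails in this sketch, which mirrors the case where a newer device exposes something an older version cannot express.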

This block diagram illustrates the concept:

[Figure: Versionize block diagram]

VM state encoding

During research and prototyping we considered multiple storage formats. The criteria used for comparing them were: performance, size, Rust support, specification, versioning support, community and tooling. Performance, size and Rust support are hard requirements, while all the others can be subject to trade-offs. More info about this comparison can be found here.

Key benefits of using bincode:

  • Minimal snapshot size overhead
  • Minimal CPU overhead
  • Simple implementation

The current implementation relies on the Serde bincode encoder.

Versionize is compatible with Serde when using the bincode backend: structures serialized with Versionize at a specific version can be deserialized with Serde, and structures serialized with Serde can be deserialized with Versionize.
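The size benefit comes from a compact binary layout with no field names or type tags. The hand-rolled encoder below imitates that style, assuming fixed-width little-endian integers and a 64-bit length prefix for strings. It does not call the bincode crate, and the `Record` type is hypothetical.

```rust
/// Hypothetical record used only for this illustration.
pub struct Record {
    pub id: u32,
    pub name: String,
}

/// Encodes the record as: id (4 bytes, little-endian), name length
/// (8 bytes, little-endian), then the raw UTF-8 bytes of the name.
/// No field names or type tags are written.
pub fn encode(record: &Record) -> Vec<u8> {
    let mut out = Vec::with_capacity(4 + 8 + record.name.len());
    out.extend_from_slice(&record.id.to_le_bytes());
    out.extend_from_slice(&(record.name.len() as u64).to_le_bytes());
    out.extend_from_slice(record.name.as_bytes());
    out
}
```

A record with a four-character name encodes to 16 bytes under these assumptions: 4 for the id, 8 for the length prefix and 4 for the name.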

Snapshot compatibility

Host kernel

The minimum kernel version required by Firecracker snapshots is 4.14. Snapshots can be saved and restored on the same kernel version without any issues. There might be issues when restoring snapshots created on a different host kernel version, even when using the same Firecracker version.

SnapshotCreate and SnapshotLoad operations across different host kernels are considered unstable in Firecracker, as the saved KVM state might have different semantics on different kernels.

Device model

The current Firecracker devices are backwards compatible up to the version that introduces them. Ideally this property would be kept over time, but there are situations when a new version of a device exposes new features to the guest that do not exist in an older version. In such cases restoring a snapshot at an older version becomes impossible without breaking the guest workload.

The microVM state file links some resources that are external to the snapshot:

  • tap devices by device name,
  • block devices by block file path,
  • vsock backing Unix domain socket by socket name.

To successfully restore a microVM one should check that:

  • tap devices are available, their names match their original names since these are the values saved in the microVM state file, and they are accessible to the Firecracker process where the microVM is being restored,
  • block devices are set up at their original relative or absolute paths with the proper permissions, as the Firecracker process with the restored microVM will attempt to access them exactly as they were accessed in the original Firecracker process,
  • the vsock backing Unix domain socket is available, its name matches the original name, and it is accessible to the new Firecracker process.
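Part of this checklist can be automated before calling LoadSnapshot. The sketch below only checks that tap devices and block files exist. It does not check permissions, and it leaves out the vsock socket because the text does not say whether the socket file must already exist at restore time. The type and function names are hypothetical, and the tap lookup assumes a Linux host where interfaces appear as entries under a directory such as `/sys/class/net`.

```rust
use std::path::{Path, PathBuf};

/// External resources referenced by a microVM state file (illustrative type).
pub struct ExternalResources {
    pub tap_names: Vec<String>,
    pub block_paths: Vec<PathBuf>,
}

/// Returns a description of every resource that appears to be missing.
/// `net_root` is the directory listing network interfaces by name,
/// assumed to be /sys/class/net on a typical Linux host.
pub fn missing_resources(res: &ExternalResources, net_root: &Path) -> Vec<String> {
    let mut missing = Vec::new();
    for tap in &res.tap_names {
        if !net_root.join(tap).exists() {
            missing.push(format!("tap device {}", tap));
        }
    }
    for path in &res.block_paths {
        if !path.exists() {
            missing.push(format!("block file {}", path.display()));
        }
    }
    missing
}
```

An empty result means no resource was found missing, not that the restore will succeed, since access rights and the vsock socket are not examined.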

CPU model

Firecracker's microVM snapshot functionality is available for Intel, AMD and ARM64 CPU models that support the hardware virtualization extensions; more details are available here. Snapshots are not compatible across CPU architectures, or even across CPU models of the same architecture. They are only compatible if the CPU features exposed to the guest are invariant between saving and restoring the snapshot. The trivial scenario is creating and restoring snapshots on hosts that have the same CPU model.

Restoring from an Intel snapshot on AMD (or vice-versa) is not supported.

It is important to note that guest workloads can still execute instructions that are masked by CPUID; saving and restoring such workloads leads to undefined results. Firecracker retrieves the state of a discrete list of MSRs from KVM, specifically the MSRs corresponding to the features exposed to the guest.
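One way to reason about the invariance requirement is as a pre-restore check: the vendor must match, and every feature the guest saw at save time must still be offered by the restore host. The sketch below encodes that as a necessary condition only. It is not sufficient, and it is not a check Firecracker is documented to perform. `CpuProfile` and `can_restore` are hypothetical names.

```rust
use std::collections::BTreeSet;

/// Illustrative description of a CPU: vendor string plus feature names.
pub struct CpuProfile {
    pub vendor: String,
    pub features: BTreeSet<String>,
}

/// Conservative check: same vendor, and every feature exposed to the guest
/// at save time is still available on the restore host.
pub fn can_restore(guest_at_save: &CpuProfile, restore_host: &CpuProfile) -> bool {
    guest_at_save.vendor == restore_host.vendor
        && guest_at_save.features.is_subset(&restore_host.features)
}
```

Passing this check does not cover instructions the guest may run despite CPUID masking, which remain outside what a feature comparison can detect.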

Implementation

To enable Firecracker cross-version snapshots we have designed and built two crates:

  • versionize - defines the Versionize trait, implements serialization of primitive types and provides a helper type to map Firecracker versions to individual structure versions.
  • versionize_derive - exports a procedural macro that consumes structures and enums and their annotations to produce an implementation of the Versionize trait.

The microVM state file format is implemented in the snapshot crate in the Firecracker repository. All Firecracker devices implement the Persist trait which exposes an interface that enables creating from and saving to the microVM state.
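The save/restore interface can be pictured as a trait with an associated state type. The trait below is a simplified stand-in written for this document. The actual Persist trait in the Firecracker repository may differ in its signature, and the `Counter` device is hypothetical.

```rust
/// Simplified stand-in for a persistence trait (not Firecracker's definition).
pub trait Persist: Sized {
    /// Serializable snapshot of the object's state.
    type State;
    /// Inputs needed at restore time that are not stored in the state.
    type ConstructorArgs;
    type Error;

    /// Captures the current state.
    fn save(&self) -> Self::State;
    /// Rebuilds the object from constructor arguments and saved state.
    fn restore(args: Self::ConstructorArgs, state: &Self::State) -> Result<Self, Self::Error>;
}

/// Hypothetical device with one piece of runtime state.
#[derive(Debug, PartialEq)]
pub struct Counter {
    pub id: String,
    pub ticks: u64,
}

#[derive(Debug, Clone, PartialEq)]
pub struct CounterState {
    pub ticks: u64,
}

impl Persist for Counter {
    type State = CounterState;
    type ConstructorArgs = String; // the device id is supplied by the caller
    type Error = String;

    fn save(&self) -> CounterState {
        CounterState { ticks: self.ticks }
    }

    fn restore(id: String, state: &CounterState) -> Result<Self, String> {
        if id.is_empty() {
            return Err("id must not be empty".to_string());
        }
        Ok(Counter { id, ticks: state.ticks })
    }
}
```

Keeping constructor arguments separate from the saved state reflects the split described earlier, where resources external to the snapshot have to be provided again at restore time.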