VM snapshot (checkpoint) and restore

## Summary

Add VM snapshot (checkpoint) and restore support. Cloud Hypervisor's native `vm.pause` + `vm.snapshot` API captures full VM state (CPU, memory, devices); cocoonv2 copies the COW disk alongside it, enabling fast resume from a saved point.

## Background

Cloud Hypervisor provides snapshot/restore via its REST API, shown here through the `ch-remote` CLI wrapper:

```bash
# Snapshot (VM must be paused first)
ch-remote --api-socket /path/to/api.sock pause
ch-remote --api-socket /path/to/api.sock snapshot file:///path/to/snapshot

# Restore (new CH process, VM starts in paused state)
cloud-hypervisor --api-socket /path/to/new.sock --restore source_url=file:///path/to/snapshot
ch-remote --api-socket /path/to/new.sock resume
```

Snapshot produces three files:
- `config.json` — VM configuration (human-editable, disk paths can be modified before restore)
- `state.json` — CPU registers, MSR, virtio device state
- `memory-ranges` — full guest RAM dump (size = guest memory)

Disks are NOT included in the snapshot — they must be copied separately.

### Key CH Behaviors

- `--restore` is mutually exclusive with all VM config flags (`--cpus`, `--memory`, `--disk`, `--net`, etc.) — config comes entirely from the snapshot's `config.json`
- `--api-socket` CAN be used alongside `--restore`
- If the original VM used `tap=<name>` (CH manages tap), CH recreates the tap automatically on restore — no `net_fds` needed
- After restore + resume, the VM is a normal running VM. Subsequent stop/start uses standard CLI args built from VMRecord (cold boot from disk, memory state is lost)
- `config.json` disk paths are absolute — must be updated if restoring to a different directory

## Proposed Design

### Storage Layout

```
runDir/{vmID}/
├── cow.raw                        # active COW disk
├── api.sock, ch.pid, ...          # runtime files
└── snapshots/
    └── {snapshot-name}/
        ├── meta.json              # cocoonv2 metadata (VMRecord + ImageBlobIDs)
        ├── config.json            # CH VM config (from vm.snapshot)
        ├── state.json             # CH device state (from vm.snapshot)
        ├── memory-ranges          # guest RAM dump (from vm.snapshot)
        └── cow.raw                # COW disk copy (reflink or rsync --sparse)
```
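The layout above could be captured in a single helper so that snapshot, restore, and cleanup code all agree on file locations. This is a sketch only; `snapshotPaths` and `snapshotLayout` are hypothetical names, and the name validation is an added assumption to keep a user-supplied snapshot name from escaping the `snapshots/` directory.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// snapshotPaths lists every file of one snapshot, following the proposed layout.
type snapshotPaths struct {
	Dir    string // runDir/{vmID}/snapshots/{name}
	Meta   string // cocoonv2 metadata
	Config string // CH VM config
	State  string // CH device state
	Memory string // guest RAM dump
	COW    string // COW disk copy
}

// snapshotLayout resolves the paths for snapshot `name` of VM `vmID`.
// It rejects names that could traverse out of the snapshots directory.
func snapshotLayout(runDir, vmID, name string) (snapshotPaths, error) {
	if name == "" || name == "." || name == ".." || strings.ContainsAny(name, `/\`) {
		return snapshotPaths{}, fmt.Errorf("invalid snapshot name %q", name)
	}
	dir := filepath.Join(runDir, vmID, "snapshots", name)
	return snapshotPaths{
		Dir:    dir,
		Meta:   filepath.Join(dir, "meta.json"),
		Config: filepath.Join(dir, "config.json"),
		State:  filepath.Join(dir, "state.json"),
		Memory: filepath.Join(dir, "memory-ranges"),
		COW:    filepath.Join(dir, "cow.raw"),
	}, nil
}
```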

### Snapshot Flow

```
Snapshot(vmID, name):
  1. PUT /api/v1/vm.pause
  2. PUT /api/v1/vm.snapshot  destination_url = "file://" + runDir/{vmID}/snapshots/{name}/
  3. Copy COW disk to snapshot dir:
     - btrfs/XFS (reflink-enabled): FICLONE ioctl (near-instant, no extra space until blocks diverge)
     - ext4: rsync --sparse (slower, VM stays paused during copy)
  4. Write meta.json (VMRecord + ImageBlobIDs + timestamp)
  5. PUT /api/v1/vm.resume (also on failure of steps 2-4, so the guest is never left paused)
```

### Restore Flow

```
Restore(vmID, snapshotName):
  1. Read meta.json → full VMRecord with StorageConfigs, NetworkConfigs, BootConfig
  2. Create new VM directory, copy COW disk back
  3. Patch config.json disk paths to point to new VM directory
  4. recoverNetwork: CNI DEL + ADD with IP= arg + TC redirect setup
     (reuses existing recovery code from host-reboot network recovery)
  5. nsenter netns → cloud-hypervisor --api-socket X --restore source_url=...
  6. PUT /api/v1/vm.resume
  7. Write DB record (standard VMRecord, subsequent start/stop uses normal flow)
```
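Step 5 of the restore flow amounts to building one command line. A possible sketch is shown below; `restoreArgv` is a hypothetical helper, and `netnsPath` is assumed to be the path of the network namespace file created by CNI. Note that no VM config flags appear, because `--restore` takes its configuration from the snapshot's `config.json`.

```go
package main

// restoreArgv builds the argv for launching CH in restore mode inside the
// VM's network namespace. snapDir must be an absolute path so that the
// resulting source_url has the form file:///...
func restoreArgv(netnsPath, apiSock, snapDir string) []string {
	return []string{
		"nsenter", "--net=" + netnsPath,
		"cloud-hypervisor",
		"--api-socket", apiSock,
		"--restore", "source_url=file://" + snapDir,
	}
}
```

The result can be passed to `exec.CommandContext(ctx, argv[0], argv[1:]...)`; the VM then comes up paused and waits for `vm.resume`.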

### Post-Restore Lifecycle

After restore, the VM has a complete VMRecord in the DB. Subsequent operations use the standard code path:

- **Stop then Start**: Normal cold boot from disk (memory state lost, uses `--kernel`/`--firmware` + full CLI args)
- **Host reboot then Start**: `recoverNetwork` + normal cold boot
- **Want to preserve memory state on every stop?**: Change Stop flow to `pause → snapshot → terminate`, and Start checks for existing snapshot before deciding `--restore` vs normal boot

## New REST API Calls Needed

| Endpoint | Method | Purpose |
|---|---|---|
| `/api/v1/vm.pause` | PUT | Pause VM before snapshot |
| `/api/v1/vm.resume` | PUT | Resume VM after snapshot |
| `/api/v1/vm.snapshot` | PUT | Create snapshot `{"destination_url":"file:///path"}` |

These are existing CH API endpoints not currently used by cocoonv2.

## Interface Changes

```go
type Hypervisor interface {
    // ... existing methods ...
    Snapshot(ctx context.Context, ref string, name string) error
    Restore(ctx context.Context, ref string, name string) (*types.VM, error)
}
```

## Open Questions

### 1. Snapshot lifecycle: tied to VM or independent?

**Option A — Tied to VM**: Snapshots live in `runDir/{vmID}/snapshots/`. Deleting the VM deletes all snapshots. Simple, no orphan cleanup needed.

**Option B — Independent storage**: Snapshots live in `rootDir/snapshots/{snapshotID}/`. Survive VM deletion, can rebuild a VM from snapshot. Requires separate GC module.

### 2. Pause duration during snapshot

The VM must stay paused until the COW disk copy completes. For large disks on ext4 (no reflink), this could take minutes.

Possible mitigations:
- Recommend btrfs/XFS for `runDir` (instant reflink clone)
- Accept the downtime for ext4 users
- Investigate if we can resume first and copy disk after (sacrifices strict consistency — guest may have written new data between snapshot and disk copy)

### 3. memory-ranges file size

The `memory-ranges` file is the same size as guest RAM: a VM with 4 GB of RAM produces a 4 GB file, and every additional snapshot adds another full copy.

Options:
- Accept as-is (disk is cheap)
- Compress with zstd (guest unused memory pages are mostly zeros, good compression ratio)
- Limit number of snapshots per VM

### 4. Image blob GC protection

Snapshots reference read-only layers (EROFS blobs for OCI, base qcow2 for cloudimg) that are NOT included in the snapshot. If the image is garbage-collected, restore fails.

Solution: Include `ImageBlobIDs` in `meta.json`. GC must check snapshot references before collecting blobs.

### 5. Scale-to-zero mode

Optional future enhancement: change Stop to `pause → snapshot → terminate` and Start to `check snapshot → --restore or cold boot`. This gives ~200ms wake-up time (per Koyeb benchmarks) but doubles stop time and disk usage.

## Known CH Limitations

- No incremental snapshots (always full memory dump)
- virtiofs root restore hangs (CH Issue #6931) — cocoonv2 doesn't use virtiofs
- Cross-CH-version restore not supported
- VFIO (GPU passthrough) VMs cannot snapshot
- Hot-plugged memory regions not restored (CH Issue #3165)

