feat: add snapshot, export, import, and clone operations #217

Merged
DorianZheng merged 16 commits into boxlite-ai:main from
joeyaflores:feat/snapshot-export-import-clone
Feb 18, 2026

@joeyaflores joeyaflores commented Feb 7, 2026

Summary

Adds four new capabilities to BoxliteRuntime for box state persistence and portability:

  • Export/Import: Package a stopped box as a portable .boxlite tar archive (both disks + config metadata). Import recreates the box with a new identity on any compatible installation.
  • Snapshot/Restore: Create and restore QCOW2 internal snapshots of stopped boxes, with metadata stored in SQLite. Includes list_snapshots() and delete_snapshot().
  • Clone: duplicate() creates an independent full-copy of a stopped box (no COW backing-file coupling).
  • Python SDK: All operations exposed as async methods on the Boxlite class, plus SnapshotRecord type.

Foundation changes

  • fix(disk): Unified QCOW2 disk filename constants — fixes a bug where init/mod.rs hardcoded "root.qcow2" while layout.rs used "disk.qcow2"
  • feat(db): Auto-migration support for SQLite schema (v4→v5) instead of erroring on version mismatch
  • feat(db): Snapshots table with SnapshotStore CRUD operations
  • feat(disk): qemu-img wrapper for convert, snapshot, and full-copy operations
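
For illustration, a minimal sketch of what a qemu-img wrapper along these lines could look like. It assumes the standard `qemu-img convert -O qcow2` and `qemu-img snapshot -c/-a/-d` invocations; the names `SnapshotOp`, `convert_args`, `snapshot_args`, and `run_qemu_img` are hypothetical and are not taken from the PR's `qemu_img.rs`.

```rust
use std::ffi::OsString;
use std::io;
use std::path::Path;
use std::process::Command;

/// Hypothetical snapshot sub-operations; each maps to a `qemu-img snapshot` flag.
#[derive(Debug, Clone, Copy)]
pub enum SnapshotOp {
    Create, // -c
    Apply,  // -a
    Delete, // -d
}

/// Arguments for converting an image (and any backing chain) into a standalone qcow2.
pub fn convert_args(src: &Path, dst: &Path) -> Vec<OsString> {
    vec![
        OsString::from("convert"),
        OsString::from("-O"),
        OsString::from("qcow2"),
        src.as_os_str().to_os_string(),
        dst.as_os_str().to_os_string(),
    ]
}

/// Arguments for an internal-snapshot operation on a single disk.
pub fn snapshot_args(op: SnapshotOp, name: &str, disk: &Path) -> Vec<OsString> {
    let flag = match op {
        SnapshotOp::Create => "-c",
        SnapshotOp::Apply => "-a",
        SnapshotOp::Delete => "-d",
    };
    vec![
        OsString::from("snapshot"),
        OsString::from(flag),
        OsString::from(name),
        disk.as_os_str().to_os_string(),
    ]
}

/// Runs qemu-img and turns a missing binary into a readable error message.
pub fn run_qemu_img(args: &[OsString]) -> Result<(), String> {
    let output = Command::new("qemu-img").args(args).output().map_err(|e| {
        if e.kind() == io::ErrorKind::NotFound {
            "qemu-img not found; is QEMU installed and on PATH?".to_string()
        } else {
            format!("failed to run qemu-img: {e}")
        }
    })?;
    if output.status.success() {
        Ok(())
    } else {
        let stderr = String::from_utf8_lossy(&output.stderr);
        Err(format!("qemu-img failed: {}", stderr.trim()))
    }
}
```

Keeping argument construction separate from process execution makes the flag mapping checkable without qemu-img installed.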

Design decisions

  • All operations require the box to be stopped (no live snapshot/quiesce in v1)
  • Export flattens COW chains via qemu-img convert so archives are fully standalone
  • Clone uses full copy (not COW) to avoid lifecycle coupling between original and clone
  • Method named duplicate() to avoid conflict with Rust's Clone trait
  • Schema migration is additive (CREATE TABLE IF NOT EXISTS) with version tracking
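
A rough sketch of the forward-migration idea with version tracking, written without SQLite so it stays self-contained. The step table, the `MigrationError` type, and the SQL strings are placeholders for illustration, not the PR's actual migration code or DDL.

```rust
/// Latest schema version known to this build (v5 in this PR).
pub const CURRENT_SCHEMA_VERSION: u32 = 5;

#[derive(Debug, PartialEq)]
pub enum MigrationError {
    /// The database was written by a newer build; refuse to touch it.
    TooNew { found: u32, supported: u32 },
    /// No migration step is registered starting from this version.
    NoPath { from: u32 },
}

/// SQL for the step that upgrades `from` to `from + 1` (placeholder statements).
fn step_sql(from: u32) -> Option<&'static str> {
    match from {
        2 => Some("-- v2 -> v3 (placeholder)"),
        3 => Some("-- v3 -> v4 (placeholder)"),
        4 => Some("CREATE TABLE IF NOT EXISTS snapshots (id TEXT PRIMARY KEY)"),
        _ => None,
    }
}

/// Walks the chain from `found` up to the current version, handing each
/// step to `apply` together with the version it produces.
pub fn migrate(found: u32, mut apply: impl FnMut(u32, &str)) -> Result<u32, MigrationError> {
    if found > CURRENT_SCHEMA_VERSION {
        return Err(MigrationError::TooNew {
            found,
            supported: CURRENT_SCHEMA_VERSION,
        });
    }
    let mut version = found;
    while version < CURRENT_SCHEMA_VERSION {
        let sql = step_sql(version).ok_or(MigrationError::NoPath { from: version })?;
        apply(version + 1, sql);
        version += 1;
    }
    Ok(version)
}
```

In a real store, `apply` would execute the statement and bump the stored version inside one transaction, so a failed step leaves the database at the previous version.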

Test plan

  • All 6 new DB tests pass (snapshot CRUD, migration v4→v5, version rejection)
  • All 33 DB + manager tests pass
  • Full workspace compiles clean (cargo check)
  • 290/312 tests pass (17 pre-existing flaky failures from fd exhaustion, unrelated)
  • Manual test: export a stopped box, import on another machine (verified via archive import round-trip)
  • Manual test: snapshot, modify box, restore, verify rollback (verified: create → list → restore → delete)
  • Manual test: duplicate a box, verify independence (verified: clone has own disk files, different ID)

Centralize QCOW2 disk filenames into disk::constants instead of
hardcoding strings across multiple modules. Fixes a bug where
init/mod.rs used "root.qcow2" while layout.rs used "disk.qcow2".

Also adds guest_rootfs_disk_path() to BoxFilesystemLayout.

Replace the hard error on schema version mismatch with automatic
forward migration. Older databases are now migrated in-place through
the full migration chain (v2→v3→v4→v5).

Adds migration 4→5 which creates the snapshots table for upcoming
snapshot functionality. Includes tests for v4→v5 migration and
rejection of newer schema versions.

Add SnapshotStore with CRUD operations for snapshot metadata
persistence. Each snapshot records a point-in-time capture of a
box's disk state with a name unique per box.

Also adds db() accessor to BoxStore for sibling modules.

Wrapper functions for qemu-img CLI operations: convert (flatten COW
chains), snapshot create/apply/delete, and full copy. Returns clear
errors if qemu-img is not installed.

- export: package a stopped box as a portable .boxlite tar archive
- import: recreate a box from a .boxlite archive with new identity
- snapshot/restore: QCOW2 internal snapshots for stopped boxes
- list_snapshots/delete_snapshot: snapshot management
- duplicate: full-copy clone of a stopped box (no COW coupling)

All operations require the box to be stopped. Export flattens COW
chains so archives are fully standalone.

Expose new runtime operations to the Python SDK:
- export(), import_archive()
- snapshot(), restore(), list_snapshots(), delete_snapshot()
- duplicate()

Adds PySnapshotRecord type for snapshot metadata.

Copilot AI review requested due to automatic review settings February 7, 2026 07:40

Copilot AI left a comment

Pull request overview

Adds box state persistence/portability features to Boxlite (export/import archives, QCOW2 snapshots with SQLite metadata, and full-copy duplication) and exposes snapshot/export/import/clone APIs via the Python SDK.

Changes:

  • Introduces runtime operations for export/import (.boxlite tar archive), snapshot/restore/list/delete, and full-copy duplicate.
  • Adds SQLite schema v5 with a new snapshots table plus auto-migration support and snapshot CRUD store.
  • Adds a qemu-img wrapper to perform QCOW2 flattening/copying and internal snapshot operations.

Reviewed changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| sdks/python/src/snapshots.rs | Adds SnapshotRecord Python type for snapshot metadata. |
| sdks/python/src/runtime.rs | Exposes async Python APIs: export/import, snapshot/restore/list/delete, duplicate. |
| sdks/python/src/lib.rs | Registers snapshot bindings in the Python module. |
| boxlite/src/runtime/snapshots.rs | Implements snapshot/restore/list/delete operations (QCOW2 + DB metadata). |
| boxlite/src/runtime/portability.rs | Implements export/import archive format, plus stopped-box resolver helper. |
| boxlite/src/runtime/clone.rs | Implements full-copy duplication of stopped boxes. |
| boxlite/src/runtime/mod.rs | Wires in new runtime modules and re-exports ArchiveManifest. |
| boxlite/src/runtime/layout.rs | Centralizes disk paths using shared filename constants; adds guest rootfs disk accessor. |
| boxlite/src/runtime/core.rs | Makes rt_impl pub(crate) for internal module access. |
| boxlite/src/litebox/manager.rs | Exposes DB handle to enable snapshot store usage from runtime. |
| boxlite/src/litebox/init/tasks/guest_rootfs.rs | Uses the new guest rootfs disk path accessor. |
| boxlite/src/litebox/init/mod.rs | Fixes hardcoded QCOW2 filename to use shared constant. |
| boxlite/src/lib.rs | Re-exports SnapshotRecord from the crate for SDK consumers. |
| boxlite/src/disk/qemu_img.rs | Adds qemu-img command wrapper for convert/snapshot/copy operations. |
| boxlite/src/disk/mod.rs | Exposes qemu_img module internally. |
| boxlite/src/disk/constants.rs | Adds centralized disk filename constants. |
| boxlite/src/db/schema.rs | Bumps schema to v5 and adds snapshots table schema. |
| boxlite/src/db/mod.rs | Adds snapshot store module, auto-migration to v5, and tests for migration/table creation. |
| boxlite/src/db/snapshots.rs | Adds snapshot metadata store + tests. |
| boxlite/src/db/boxes.rs | Exposes DB handle from BoxStore for upstream callers. |

Comment on lines +45 to +49
record.id,
record.box_id,
record.name,
record.description,
record.created_at,

Copilot AI Feb 7, 2026

SnapshotStore::save takes &SnapshotRecord, but the rusqlite::params![record.id, ...] arguments attempt to move String fields out of a shared reference, which will not compile. Pass references (e.g., &record.id, &record.box_id, etc.) or clone explicitly.

Suggested change
- record.id,
- record.box_id,
- record.name,
- record.description,
- record.created_at,
+ &record.id,
+ &record.box_id,
+ &record.name,
+ &record.description,
+ &record.created_at,

}
}

/// Get a reference to the underlying database.

Copilot AI Feb 7, 2026

Docstring says "Get a reference to the underlying database", but this returns a cloned Database handle by value. Consider adjusting wording (e.g., "Get a handle to...") to avoid implying a borrowed reference.

Suggested change
- /// Get a reference to the underlying database.
+ /// Get a handle to the underlying database.

Comment on lines +41 to +92
pub async fn duplicate(
    &self,
    id_or_name: &str,
    new_name: Option<String>,
) -> BoxliteResult<BoxInfo> {
    let rt = &self.rt_impl;

    let (src_config, _state) = resolve_stopped_box(rt, id_or_name)?;

    let src_home = &src_config.box_home;
    let src_container_disk = src_home.join(disk_filenames::CONTAINER_DISK);
    let src_guest_disk = src_home.join(disk_filenames::GUEST_ROOTFS_DISK);

    if !src_container_disk.exists() {
        return Err(BoxliteError::Storage(format!(
            "Container disk not found at {}",
            src_container_disk.display()
        )));
    }

    // Generate new box identity
    let box_id = BoxID::new();
    let container_id = ContainerID::new();
    let now = Utc::now();

    let box_home = rt.layout.boxes_dir().join(box_id.as_str());
    let socket_path = rt_filenames::unix_socket_path(rt.layout.home_dir(), box_id.as_str());
    let ready_socket_path = box_home.join("sockets").join("ready.sock");

    // Create box directory
    std::fs::create_dir_all(&box_home).map_err(|e| {
        BoxliteError::Storage(format!(
            "Failed to create box directory {}: {}",
            box_home.display(),
            e
        ))
    })?;

    // Full-copy disks (flattens any COW chains)
    let dst_container_disk = box_home.join(disk_filenames::CONTAINER_DISK);
    if let Err(e) = qemu_img::full_copy(&src_container_disk, &dst_container_disk) {
        let _ = std::fs::remove_dir_all(&box_home);
        return Err(e);
    }

    if src_guest_disk.exists() {
        let dst_guest_disk = box_home.join(disk_filenames::GUEST_ROOTFS_DISK);
        if let Err(e) = qemu_img::full_copy(&src_guest_disk, &dst_guest_disk) {
            let _ = std::fs::remove_dir_all(&box_home);
            return Err(e);
        }
    }

Copilot AI Feb 7, 2026

duplicate() is async but uses blocking filesystem operations and runs qemu-img directly, which can block the Tokio runtime thread. Consider using tokio::task::spawn_blocking for disk copy / filesystem work, consistent with other runtime methods that offload blocking DB work.

Comment on lines +219 to +240
// Create box directory
std::fs::create_dir_all(&box_home).map_err(|e| {
    BoxliteError::Storage(format!(
        "Failed to create box directory {}: {}",
        box_home.display(),
        e
    ))
})?;

// Move disk files into box directory
std::fs::rename(&extracted_container, box_home.join(disk_filenames::CONTAINER_DISK))
    .map_err(|e| {
        BoxliteError::Storage(format!("Failed to install container disk: {}", e))
    })?;

let extracted_guest = temp_dir.path().join(disk_filenames::GUEST_ROOTFS_DISK);
if extracted_guest.exists() {
    std::fs::rename(&extracted_guest, box_home.join(disk_filenames::GUEST_ROOTFS_DISK))
        .map_err(|e| {
            BoxliteError::Storage(format!("Failed to install guest rootfs disk: {}", e))
        })?;
}

Copilot AI Feb 7, 2026

On failure to rename disk files into box_home (either container or guest), the function returns an error but does not clean up the partially created box_home directory and/or any disk file already moved. Consider adding cleanup (remove box_home and any moved files) before returning to avoid leaving orphaned directories/disks when import fails mid-way.
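
One way to get the cleanup described in this comment is a small drop guard, sketched here with only the standard library. `DirCleanupGuard` and `install_disks` are hypothetical names used for illustration; they are not part of the PR.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Removes a directory tree on drop unless `commit()` was called.
pub struct DirCleanupGuard {
    path: PathBuf,
    armed: bool,
}

impl DirCleanupGuard {
    pub fn new(path: impl Into<PathBuf>) -> Self {
        Self { path: path.into(), armed: true }
    }

    /// Call once every step has succeeded so the directory is kept.
    pub fn commit(mut self) {
        self.armed = false;
    }
}

impl Drop for DirCleanupGuard {
    fn drop(&mut self) {
        if self.armed {
            // Best effort: the original failure is the error worth reporting.
            let _ = fs::remove_dir_all(&self.path);
        }
    }
}

/// Hypothetical stand-in for the import step that moves disks into `box_home`.
pub fn install_disks(box_home: &Path, disks: &[(PathBuf, &str)]) -> std::io::Result<()> {
    fs::create_dir_all(box_home)?;
    let guard = DirCleanupGuard::new(box_home);
    for (src, file_name) in disks {
        // Any early return via `?` drops `guard`, which removes `box_home`
        // together with disks that were already moved into it.
        fs::rename(src, box_home.join(file_name))?;
    }
    guard.commit();
    Ok(())
}
```

The guard covers every early return in one place, instead of repeating `remove_dir_all` in each error branch.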

Comment on lines +165 to +169
fn test_db() -> Database {
    let temp_dir = TempDir::new().unwrap();
    let db_path = temp_dir.path().join("test.db");
    Database::open(&db_path).unwrap()
}

Copilot AI Feb 7, 2026

test_db() creates a TempDir and drops it at function exit, but returns a Database that points at a path inside that temp directory. This can cause tests to behave differently across platforms (e.g., Windows cannot delete open files) and can make the DB path invalid after return. Keep the TempDir alive for the test (return it alongside Database, store it in the test scope, or use tempfile::NamedTempFile/tempdir in each test).
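
The lifetime issue can be shown with a std-only stand-in for `tempfile::TempDir`. `ScratchDir` and both helper functions below are hypothetical and exist only to contrast the two shapes; a real test would return the `TempDir` next to the `Database`.

```rust
use std::fs;
use std::path::{Path, PathBuf};
use std::sync::atomic::{AtomicU32, Ordering};

/// Std-only stand-in for a temp-dir guard: the directory is deleted on drop.
pub struct ScratchDir(PathBuf);

impl ScratchDir {
    pub fn new() -> std::io::Result<Self> {
        static NEXT: AtomicU32 = AtomicU32::new(0);
        let name = format!(
            "scratch-{}-{}",
            std::process::id(),
            NEXT.fetch_add(1, Ordering::Relaxed)
        );
        let path = std::env::temp_dir().join(name);
        fs::create_dir_all(&path)?;
        Ok(Self(path))
    }

    pub fn path(&self) -> &Path {
        &self.0
    }
}

impl Drop for ScratchDir {
    fn drop(&mut self) {
        let _ = fs::remove_dir_all(&self.0);
    }
}

/// Problematic shape: the guard is dropped on return, so the directory
/// behind the returned path has already been removed.
pub fn db_path_dangling() -> PathBuf {
    let dir = ScratchDir::new().unwrap();
    dir.path().join("test.db")
}

/// Suggested shape: hand the guard back so the caller keeps the directory alive.
pub fn db_path_with_guard() -> (ScratchDir, PathBuf) {
    let dir = ScratchDir::new().unwrap();
    let db_path = dir.path().join("test.db");
    (dir, db_path)
}
```

A test would bind the guard to a named variable such as `_dir`; binding it to a bare `_` drops it immediately.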

Comment on lines +64 to +78
// Create QCOW2 internal snapshots on both disks
qemu_img::snapshot_create(&container_disk, snapshot_name)?;

if guest_disk.exists() {
    qemu_img::snapshot_create(&guest_disk, snapshot_name)?;
}

// Store metadata in database
let record = SnapshotStore::create_record(
    config.id.as_str(),
    snapshot_name,
    description,
);
snapshot_store.save(&record)?;


Copilot AI Feb 7, 2026

If snapshot_store.save(&record) fails after the QCOW2 internal snapshots are created, the disks will still contain the new snapshot but there will be no corresponding DB record. Consider writing metadata first in a transaction and only creating disk snapshots after, or add rollback logic (delete the QCOW2 snapshots) on DB failure to keep disk + DB consistent.
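
The rollback variant of this suggestion could be sketched as follows. `snapshot_with_rollback` is a hypothetical helper; the three closures stand in for the disk snapshot creation, the metadata save, and the disk snapshot deletion.

```rust
/// Runs `create_on_disk`, then `save_metadata`. If saving fails, calls
/// `delete_on_disk` so that disk state and database do not drift apart.
pub fn snapshot_with_rollback<E>(
    create_on_disk: impl FnOnce() -> Result<(), E>,
    save_metadata: impl FnOnce() -> Result<(), E>,
    delete_on_disk: impl FnOnce() -> Result<(), E>,
) -> Result<(), E> {
    create_on_disk()?;
    if let Err(save_err) = save_metadata() {
        // Best-effort rollback; the metadata error is the one reported.
        let _ = delete_on_disk();
        return Err(save_err);
    }
    Ok(())
}
```

This sketch swallows a rollback failure. A fuller version might log it, since a failed rollback leaves an orphaned snapshot on disk.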

Self { db }
}

/// Get a reference to the underlying database.

Copilot AI Feb 7, 2026

Docstring says "Get a reference to the underlying database", but the method returns a cloned Database handle by value. Consider rewording to avoid implying a borrowed reference.

Suggested change
- /// Get a reference to the underlying database.
+ /// Get a cloned handle to the underlying database.

Comment on lines +66 to +96
pub async fn export(
    &self,
    id_or_name: &str,
    output_path: &Path,
) -> BoxliteResult<()> {
    let rt = &self.rt_impl;

    // Resolve box and verify it's stopped
    let (config, _state) = resolve_stopped_box(rt, id_or_name)?;

    let box_home = &config.box_home;
    let container_disk = box_home.join(disk_filenames::CONTAINER_DISK);
    let guest_disk = box_home.join(disk_filenames::GUEST_ROOTFS_DISK);

    // Validate disks exist
    if !container_disk.exists() {
        return Err(BoxliteError::Storage(format!(
            "Container disk not found at {}",
            container_disk.display()
        )));
    }

    // Create temp directory for flattened disks (same filesystem for efficiency)
    let temp_dir = tempfile::tempdir_in(rt.layout.temp_dir()).map_err(|e| {
        BoxliteError::Storage(format!("Failed to create temp directory: {}", e))
    })?;

    // Flatten COW disks to standalone images
    let flat_container = temp_dir.path().join(disk_filenames::CONTAINER_DISK);
    qemu_img::convert(&container_disk, &flat_container)?;


Copilot AI Feb 7, 2026

export() is declared async but performs blocking work (filesystem I/O, qemu-img subprocess execution, tar building) directly on the async task. This can block the Tokio runtime thread. Consider moving the heavy/blocking sections into tokio::task::spawn_blocking (or making the API synchronous) to avoid starving other async work.

Comment on lines +30 to +69
pub async fn snapshot(
    &self,
    id_or_name: &str,
    snapshot_name: &str,
    description: &str,
) -> BoxliteResult<SnapshotRecord> {
    let rt = &self.rt_impl;

    let (config, _state) = resolve_stopped_box(rt, id_or_name)?;

    let box_home = &config.box_home;
    let container_disk = box_home.join(disk_filenames::CONTAINER_DISK);
    let guest_disk = box_home.join(disk_filenames::GUEST_ROOTFS_DISK);

    // Validate container disk exists
    if !container_disk.exists() {
        return Err(BoxliteError::Storage(format!(
            "Container disk not found at {}",
            container_disk.display()
        )));
    }

    // Check for duplicate snapshot name
    let snapshot_store = SnapshotStore::new(rt.box_manager.db());
    if snapshot_store
        .get_by_name(config.id.as_str(), snapshot_name)?
        .is_some()
    {
        return Err(BoxliteError::AlreadyExists(format!(
            "snapshot '{}' already exists for box '{}'",
            snapshot_name, id_or_name
        )));
    }

    // Create QCOW2 internal snapshots on both disks
    qemu_img::snapshot_create(&container_disk, snapshot_name)?;

    if guest_disk.exists() {
        qemu_img::snapshot_create(&guest_disk, snapshot_name)?;
    }

Copilot AI Feb 7, 2026

snapshot() is async but runs blocking operations (disk existence checks, qemu-img subprocess calls, SQLite access) directly on the async task. Consider offloading the blocking portions to tokio::task::spawn_blocking to avoid blocking the async runtime, similar to the pattern used elsewhere in rt_impl for DB calls.

@DorianZheng
Member

@joeyaflores Hi, thanks for the contribution! This is incredible. Will take a look ASAP.

@DorianZheng
Member

Hi @joeyaflores. I just updated the interfaces. Could you take a look and update the current API based on it? It's updated based on community feedback.

#205 (comment)

rename snapshots table to box_snapshot with updated columns: snapshot_dir,
guest_disk_size_bytes, container_disk_size_bytes, size_bytes. rename
SnapshotRecord to SnapshotInfo. add v5->v6 migration path.

add SnapshotOptions, ExportOptions, CloneOptions types. add Snapshotting,
Restoring, Exporting transient states to BoxStatus with transitions from
Stopped. update status_to_string in c, node, and python sdks. add
SNAPSHOTS_DIR disk constant.

move snapshot/export/clone operations to LiteBox methods. snapshots use
external cow files instead of qcow2 internal snapshots. clones use cow by
default. export produces tar.zst archives (v2) with sha-256 checksums.
import auto-detects v1/v2 format and returns LiteBox handle.

add SnapshotHandle sub-resource on LiteBox, zstd dependency, snapshot
layout helpers. remove old runtime/snapshots.rs and runtime/clone.rs.
clean up qemu_img to only keep convert and full_copy.

add PySnapshotHandle with create/list/get/remove/restore methods as a
sub-resource on PyBox. add export() and clone_box() methods on PyBox.
add PySnapshotOptions, PyExportOptions, PyCloneOptions option types.
remove old runtime-level snapshot/export/duplicate methods. import_archive
now returns PyBox instead of PyBoxInfo.

LiteBox doesn't derive Clone so there's no conflict. The name
parameter is now mandatory per the approved API spec.

Add quiesce, quiesce_timeout_secs, and stop_on_quiesce_fail with
defaults matching the spec (true/30/true). Currently no-op; reserved
for future guest-side FIFREEZE support.

guest_disk_size_bytes -> guest_disk_bytes
container_disk_size_bytes -> container_disk_bytes

Applies to DB columns, SnapshotInfo struct, and Python bindings.
No migration needed since v6 schema is unreleased.

- original_name -> box_name
- options: BoxOptions -> image: String (extract from rootfs spec)
- checksums: HashMap -> guest_disk_checksum + container_disk_checksum
- remove files: Vec<String>
- add sha256: prefix to checksum values
- import reconstructs BoxOptions from image field

Import now also requires name as &str instead of Option<String>.

@joeyaflores
Contributor Author

hi @DorianZheng — went through the spec in #205 and updated everything to match. changes:

Renames:

  • clone_box() to clone(), made name required
  • import() now takes name: &str instead of Option
  • python side: import_archive() to import_box()

SnapshotOptions - added the three quiesce fields (quiesce, quiesce_timeout_secs, stop_on_quiesce_fail) with
the spec defaults. No-op for now, but the API surface is ready for FIFREEZE.
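
As a sketch, the three fields and their defaults (true/30/true) could be modelled like this. The field types, in particular `u64` for the timeout, are assumptions; the merged struct may carry additional fields.

```rust
/// Sketch of the quiesce-related snapshot options; field types are assumed.
#[derive(Debug, Clone, PartialEq)]
pub struct SnapshotOptions {
    /// Ask the guest to quiesce before the snapshot (currently a no-op).
    pub quiesce: bool,
    /// How long to wait for the guest to quiesce, in seconds.
    pub quiesce_timeout_secs: u64,
    /// Whether to stop the box if quiescing fails.
    pub stop_on_quiesce_fail: bool,
}

impl Default for SnapshotOptions {
    fn default() -> Self {
        Self {
            quiesce: true,
            quiesce_timeout_secs: 30,
            stop_on_quiesce_fail: true,
        }
    }
}
```

Callers can then override a single field with struct update syntax, e.g. `SnapshotOptions { quiesce_timeout_secs: 5, ..Default::default() }`.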

SnapshotInfo - renamed guest_disk_size_bytes/container_disk_size_bytes to
guest_disk_bytes/container_disk_bytes. DB columns updated too (v6 is unreleased so no extra migration).

ArchiveManifest:

  • original_name -> box_name
  • Replaced options: BoxOptions with just image: String - export extracts it from the rootfs spec, import
    reconstructs default BoxOptions from it
  • flattened the checksums HashMap into guest_disk_checksum + container_disk_checksum with a sha256: prefix
  • Dropped the files vec
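
A small sketch of validating the prefixed checksum values. It assumes the value is `sha256:` followed by a 64-character hex digest; the exact encoding and the helper names `parse_sha256_checksum` and `checksum_matches` are assumptions, and computing the digest itself is left to a hashing crate.

```rust
/// Splits a manifest checksum of the assumed form `sha256:<64 hex chars>`
/// and returns the hex digest part.
pub fn parse_sha256_checksum(value: &str) -> Result<&str, String> {
    let hex = value
        .strip_prefix("sha256:")
        .ok_or_else(|| format!("unsupported checksum format: {value}"))?;
    if hex.len() != 64 || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return Err(format!("malformed sha256 digest: {hex}"));
    }
    Ok(hex)
}

/// Compares a manifest value against a hex digest computed elsewhere.
pub fn checksum_matches(manifest_value: &str, computed_hex: &str) -> Result<bool, String> {
    Ok(parse_sha256_checksum(manifest_value)?.eq_ignore_ascii_case(computed_hex))
}
```

Carrying the algorithm in the value lets a later archive version switch hashes without adding manifest fields.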

everything compiles, tests pass, and clippy is clean

@DorianZheng
Member

Hi @joeyaflores. There are three points I would like to discuss:

  1. boxlite/src/litebox/clone.rs (clone_cow): the Disk returned by create_cow_child_disk is dropped immediately. Since it is non-persistent, Drop removes the newly created clone disk file.
  2. boxlite/src/litebox/snapshot.rs (do_create/do_restore): same RAII issue for active COW child disks; files can be deleted right after creation.
  3. boxlite/src/litebox/snapshot.rs (remove): guard only checks direct backing. It must validate the full qcow2 backing chain (direct + indirect) before allowing snapshot deletion.
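
For point 3, a sketch of a full-chain check. It assumes the child-to-backing-file relation has already been collected into a map, for example from `qemu-img info --backing-chain`; `chain_contains` and `dependents` are hypothetical helpers, not code from the PR.

```rust
use std::collections::{HashMap, HashSet};
use std::path::{Path, PathBuf};

/// Follows `backing` links (child -> parent) starting at `disk` and reports
/// whether `target` appears anywhere in the chain, directly or indirectly.
pub fn chain_contains(backing: &HashMap<PathBuf, PathBuf>, disk: &Path, target: &Path) -> bool {
    let mut seen: HashSet<&Path> = HashSet::new();
    let mut current = disk;
    while let Some(parent) = backing.get(current) {
        if parent.as_path() == target {
            return true;
        }
        // Guard against a corrupted chain that loops back on itself.
        if !seen.insert(parent.as_path()) {
            break;
        }
        current = parent.as_path();
    }
    false
}

/// Returns the active disks whose chain depends on `snapshot_disk`.
/// A snapshot would only be safe to delete when this list is empty.
pub fn dependents<'a>(
    backing: &HashMap<PathBuf, PathBuf>,
    active_disks: &'a [PathBuf],
    snapshot_disk: &Path,
) -> Vec<&'a Path> {
    active_disks
        .iter()
        .map(PathBuf::as_path)
        .filter(|d| chain_contains(backing, *d, snapshot_disk))
        .collect()
}
```

A direct-only guard corresponds to checking just the first `backing.get` result, which would miss a box whose active disk sits two or more layers above the snapshot.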

Copilot AI review requested due to automatic review settings February 18, 2026 03:00

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@DorianZheng DorianZheng merged commit 55028f4 into boxlite-ai:main Feb 18, 2026

uran0sH commented Feb 18, 2026

Hi @DorianZheng. After this branch was merged, the main branch no longer compiles.

@DorianZheng
Member

Hi @uran0sH. I'm fixing it
