feat: add snapshot, export, import, and clone operations #217
DorianZheng merged 16 commits into boxlite-ai:main
Centralize QCOW2 disk filenames into disk::constants instead of hardcoding strings across multiple modules. Fixes a bug where init/mod.rs used "root.qcow2" while layout.rs used "disk.qcow2". Also adds guest_rootfs_disk_path() to BoxFilesystemLayout.
Replace the hard error on schema version mismatch with automatic forward migration. Older databases are now migrated in-place through the full migration chain (v2→v3→v4→v5). Adds migration 4→5 which creates the snapshots table for upcoming snapshot functionality. Includes tests for v4→v5 migration and rejection of newer schema versions.
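A minimal sketch of the forward-migration idea described in this commit, assuming a simple integer schema version. The names `migrate`, `apply_step`, and `MigrationError` are hypothetical and not taken from the PR; real steps would execute SQL rather than append to a log.

```rust
/// Highest schema version this build understands (v5, per the commit message).
const CURRENT_SCHEMA_VERSION: u32 = 5;

#[derive(Debug, PartialEq)]
enum MigrationError {
    /// The database was written by a newer build; refuse to touch it.
    NewerThanSupported { found: u32, supported: u32 },
    /// No migration step is registered for this version.
    NoPath { from: u32 },
}

/// Apply a single migration step and return the resulting version.
/// Hypothetical: each step only records what it would do.
fn apply_step(from: u32, log: &mut Vec<String>) -> Result<u32, MigrationError> {
    match from {
        2 => {
            log.push("v2->v3".to_string());
            Ok(3)
        }
        3 => {
            log.push("v3->v4".to_string());
            Ok(4)
        }
        4 => {
            log.push("v4->v5: create snapshots table".to_string());
            Ok(5)
        }
        other => Err(MigrationError::NoPath { from: other }),
    }
}

/// Walk the chain one step at a time until the current version is reached.
fn migrate(mut version: u32, log: &mut Vec<String>) -> Result<u32, MigrationError> {
    if version > CURRENT_SCHEMA_VERSION {
        return Err(MigrationError::NewerThanSupported {
            found: version,
            supported: CURRENT_SCHEMA_VERSION,
        });
    }
    while version < CURRENT_SCHEMA_VERSION {
        version = apply_step(version, log)?;
    }
    Ok(version)
}
```

Stepping one version at a time keeps each migration small and lets a v2 database reach v5 through the same code path that a v4 database uses for its last step.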
Add SnapshotStore with CRUD operations for snapshot metadata persistence. Each snapshot records a point-in-time capture of a box's disk state with a name unique per box. Also adds db() accessor to BoxStore for sibling modules.
Wrapper functions for qemu-img CLI operations: convert (flatten COW chains), snapshot create/apply/delete, and full copy. Returns clear errors if qemu-img is not installed.
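One way such a wrapper could surface a clear error when the binary is missing is to map `ErrorKind::NotFound` from `std::process::Command`. This is a sketch, not the PR's code: `run_tool` and the `String` error type are assumptions, while `qemu-img convert -O qcow2 <src> <dst>` is the standard invocation for writing a standalone QCOW2 image.

```rust
use std::ffi::OsStr;
use std::io::ErrorKind;
use std::path::Path;
use std::process::Command;

/// Run an external tool and turn "binary not found" into a readable error.
/// Hypothetical helper; the PR's wrapper returns its own error type.
fn run_tool(program: &str, args: &[&OsStr]) -> Result<(), String> {
    let output = Command::new(program).args(args).output().map_err(|e| {
        if e.kind() == ErrorKind::NotFound {
            format!("{program} not found; install it and make sure it is on PATH")
        } else {
            format!("failed to run {program}: {e}")
        }
    })?;

    if output.status.success() {
        Ok(())
    } else {
        Err(format!(
            "{program} exited with {}: {}",
            output.status,
            String::from_utf8_lossy(&output.stderr).trim()
        ))
    }
}

/// Flatten a QCOW2 image (including any backing chain) into a standalone file.
fn convert(src: &Path, dst: &Path) -> Result<(), String> {
    run_tool(
        "qemu-img",
        &[
            OsStr::new("convert"),
            OsStr::new("-O"),
            OsStr::new("qcow2"),
            src.as_os_str(),
            dst.as_os_str(),
        ],
    )
}
```

Passing arguments as `OsStr` avoids lossy conversion of non-UTF-8 paths before they reach the subprocess.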
- export: package a stopped box as a portable .boxlite tar archive
- import: recreate a box from a .boxlite archive with new identity
- snapshot/restore: QCOW2 internal snapshots for stopped boxes
- list_snapshots/delete_snapshot: snapshot management
- duplicate: full-copy clone of a stopped box (no COW coupling)

All operations require the box to be stopped. Export flattens COW chains so archives are fully standalone.
Expose new runtime operations to the Python SDK:
- export(), import_archive()
- snapshot(), restore(), list_snapshots(), delete_snapshot()
- duplicate()

Adds PySnapshotRecord type for snapshot metadata.
Pull request overview
Adds box state persistence/portability features to Boxlite (export/import archives, QCOW2 snapshots with SQLite metadata, and full-copy duplication) and exposes snapshot/export/import/clone APIs via the Python SDK.
Changes:
- Introduces runtime operations for export/import (`.boxlite` tar archive), snapshot/restore/list/delete, and full-copy duplicate.
- Adds SQLite schema v5 with a new `snapshots` table plus auto-migration support and snapshot CRUD store.
- Adds a `qemu-img` wrapper to perform QCOW2 flattening/copying and internal snapshot operations.
Reviewed changes
Copilot reviewed 20 out of 20 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| sdks/python/src/snapshots.rs | Adds SnapshotRecord Python type for snapshot metadata. |
| sdks/python/src/runtime.rs | Exposes async Python APIs: export/import, snapshot/restore/list/delete, duplicate. |
| sdks/python/src/lib.rs | Registers snapshot bindings in the Python module. |
| boxlite/src/runtime/snapshots.rs | Implements snapshot/restore/list/delete operations (QCOW2 + DB metadata). |
| boxlite/src/runtime/portability.rs | Implements export/import archive format, plus stopped-box resolver helper. |
| boxlite/src/runtime/clone.rs | Implements full-copy duplication of stopped boxes. |
| boxlite/src/runtime/mod.rs | Wires in new runtime modules and re-exports ArchiveManifest. |
| boxlite/src/runtime/layout.rs | Centralizes disk paths using shared filename constants; adds guest rootfs disk accessor. |
| boxlite/src/runtime/core.rs | Makes rt_impl pub(crate) for internal module access. |
| boxlite/src/litebox/manager.rs | Exposes DB handle to enable snapshot store usage from runtime. |
| boxlite/src/litebox/init/tasks/guest_rootfs.rs | Uses the new guest rootfs disk path accessor. |
| boxlite/src/litebox/init/mod.rs | Fixes hardcoded QCOW2 filename to use shared constant. |
| boxlite/src/lib.rs | Re-exports SnapshotRecord from the crate for SDK consumers. |
| boxlite/src/disk/qemu_img.rs | Adds qemu-img command wrapper for convert/snapshot/copy operations. |
| boxlite/src/disk/mod.rs | Exposes qemu_img module internally. |
| boxlite/src/disk/constants.rs | Adds centralized disk filename constants. |
| boxlite/src/db/schema.rs | Bumps schema to v5 and adds snapshots table schema. |
| boxlite/src/db/mod.rs | Adds snapshot store module, auto-migration to v5, and tests for migration/table creation. |
| boxlite/src/db/snapshots.rs | Adds snapshot metadata store + tests. |
| boxlite/src/db/boxes.rs | Exposes DB handle from BoxStore for upstream callers. |
    record.id,
    record.box_id,
    record.name,
    record.description,
    record.created_at,
SnapshotStore::save takes &SnapshotRecord, but the rusqlite::params![record.id, ...] arguments attempt to move String fields out of a shared reference, which will not compile. Pass references (e.g., &record.id, &record.box_id, etc.) or clone explicitly.
    - record.id,
    - record.box_id,
    - record.name,
    - record.description,
    - record.created_at,
    + &record.id,
    + &record.box_id,
    + &record.name,
    + &record.description,
    + &record.created_at,
            }
        }

        /// Get a reference to the underlying database.
Docstring says "Get a reference to the underlying database", but this returns a cloned Database handle by value. Consider adjusting wording (e.g., "Get a handle to...") to avoid implying a borrowed reference.
    - /// Get a reference to the underlying database.
    + /// Get a handle to the underlying database.
boxlite/src/runtime/clone.rs
Outdated
    pub async fn duplicate(
        &self,
        id_or_name: &str,
        new_name: Option<String>,
    ) -> BoxliteResult<BoxInfo> {
        let rt = &self.rt_impl;

        let (src_config, _state) = resolve_stopped_box(rt, id_or_name)?;

        let src_home = &src_config.box_home;
        let src_container_disk = src_home.join(disk_filenames::CONTAINER_DISK);
        let src_guest_disk = src_home.join(disk_filenames::GUEST_ROOTFS_DISK);

        if !src_container_disk.exists() {
            return Err(BoxliteError::Storage(format!(
                "Container disk not found at {}",
                src_container_disk.display()
            )));
        }

        // Generate new box identity
        let box_id = BoxID::new();
        let container_id = ContainerID::new();
        let now = Utc::now();

        let box_home = rt.layout.boxes_dir().join(box_id.as_str());
        let socket_path = rt_filenames::unix_socket_path(rt.layout.home_dir(), box_id.as_str());
        let ready_socket_path = box_home.join("sockets").join("ready.sock");

        // Create box directory
        std::fs::create_dir_all(&box_home).map_err(|e| {
            BoxliteError::Storage(format!(
                "Failed to create box directory {}: {}",
                box_home.display(),
                e
            ))
        })?;

        // Full-copy disks (flattens any COW chains)
        let dst_container_disk = box_home.join(disk_filenames::CONTAINER_DISK);
        if let Err(e) = qemu_img::full_copy(&src_container_disk, &dst_container_disk) {
            let _ = std::fs::remove_dir_all(&box_home);
            return Err(e);
        }

        if src_guest_disk.exists() {
            let dst_guest_disk = box_home.join(disk_filenames::GUEST_ROOTFS_DISK);
            if let Err(e) = qemu_img::full_copy(&src_guest_disk, &dst_guest_disk) {
                let _ = std::fs::remove_dir_all(&box_home);
                return Err(e);
            }
        }
duplicate() is async but uses blocking filesystem operations and runs qemu-img directly, which can block the Tokio runtime thread. Consider using tokio::task::spawn_blocking for disk copy / filesystem work, consistent with other runtime methods that offload blocking DB work.
    // Create box directory
    std::fs::create_dir_all(&box_home).map_err(|e| {
        BoxliteError::Storage(format!(
            "Failed to create box directory {}: {}",
            box_home.display(),
            e
        ))
    })?;

    // Move disk files into box directory
    std::fs::rename(&extracted_container, box_home.join(disk_filenames::CONTAINER_DISK))
        .map_err(|e| {
            BoxliteError::Storage(format!("Failed to install container disk: {}", e))
        })?;

    let extracted_guest = temp_dir.path().join(disk_filenames::GUEST_ROOTFS_DISK);
    if extracted_guest.exists() {
        std::fs::rename(&extracted_guest, box_home.join(disk_filenames::GUEST_ROOTFS_DISK))
            .map_err(|e| {
                BoxliteError::Storage(format!("Failed to install guest rootfs disk: {}", e))
            })?;
    }
On failure to rename disk files into box_home (either container or guest), the function returns an error but does not clean up the partially created box_home directory and/or any disk file already moved. Consider adding cleanup (remove box_home and any moved files) before returning to avoid leaving orphaned directories/disks when import fails mid-way.
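The cleanup suggested here could look roughly like the following sketch. `install_disks` is a hypothetical helper, and it assumes `box_home` is freshly created for the imported box, so removing it on failure cannot delete unrelated data.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Move each (source path, target file name) pair into `box_home`.
/// If any step fails, remove `box_home` again so no half-installed box is left.
/// Assumption: `box_home` belongs to a brand-new box identity.
fn install_disks(box_home: &Path, disks: &[(&Path, &str)]) -> io::Result<()> {
    fs::create_dir_all(box_home)?;

    let result = disks
        .iter()
        .try_for_each(|(src, name)| fs::rename(src, box_home.join(name)));

    if result.is_err() {
        // Best-effort cleanup; the original rename error is what gets reported.
        let _ = fs::remove_dir_all(box_home);
    }
    result
}
```

Collecting the renames behind a single result gives one cleanup point, instead of repeating `remove_dir_all` in every `map_err` branch.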
    fn test_db() -> Database {
        let temp_dir = TempDir::new().unwrap();
        let db_path = temp_dir.path().join("test.db");
        Database::open(&db_path).unwrap()
    }
test_db() creates a TempDir and drops it at function exit, but returns a Database that points at a path inside that temp directory. This can cause tests to behave differently across platforms (e.g., Windows cannot delete open files) and can make the DB path invalid after return. Keep the TempDir alive for the test (return it alongside Database, store it in the test scope, or use tempfile::NamedTempFile/tempdir in each test).
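A sketch of the "return the guard alongside the handle" option, using only the standard library. `ScratchDir` stands in for `tempfile::TempDir` and `FakeDb` stands in for `Database`; both are hypothetical types written for this illustration.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Stand-in for `tempfile::TempDir`: removes the directory when dropped.
struct ScratchDir(PathBuf);

impl ScratchDir {
    fn new(tag: &str) -> io::Result<Self> {
        let path = std::env::temp_dir().join(format!("{tag}_{}", std::process::id()));
        fs::create_dir_all(&path)?;
        Ok(Self(path))
    }

    fn path(&self) -> &Path {
        &self.0
    }
}

impl Drop for ScratchDir {
    fn drop(&mut self) {
        let _ = fs::remove_dir_all(&self.0);
    }
}

/// Stand-in for `Database`: only remembers where its file lives.
struct FakeDb {
    path: PathBuf,
}

impl FakeDb {
    fn open(path: &Path) -> io::Result<Self> {
        fs::write(path, b"")?;
        Ok(Self { path: path.to_path_buf() })
    }
}

/// Return the directory guard with the handle so the directory
/// outlives this function and is removed only when the test drops it.
fn test_db() -> (FakeDb, ScratchDir) {
    let dir = ScratchDir::new("boxlite_sketch_db").unwrap();
    let db = FakeDb::open(&dir.path().join("test.db")).unwrap();
    (db, dir)
}
```

A test would bind both values, for example `let (db, _dir) = test_db();`, so the directory stays alive until the end of the test scope.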
boxlite/src/runtime/snapshots.rs
Outdated
    // Create QCOW2 internal snapshots on both disks
    qemu_img::snapshot_create(&container_disk, snapshot_name)?;

    if guest_disk.exists() {
        qemu_img::snapshot_create(&guest_disk, snapshot_name)?;
    }

    // Store metadata in database
    let record = SnapshotStore::create_record(
        config.id.as_str(),
        snapshot_name,
        description,
    );
    snapshot_store.save(&record)?;
If snapshot_store.save(&record) fails after the QCOW2 internal snapshots are created, the disks will still contain the new snapshot but there will be no corresponding DB record. Consider writing metadata first in a transaction and only creating disk snapshots after, or add rollback logic (delete the QCOW2 snapshots) on DB failure to keep disk + DB consistent.
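The rollback variant could be structured as in this sketch, which abstracts the disk and database calls behind closures. `snapshot_with_rollback` and its parameters are hypothetical; in the PR the corresponding calls would be the qemu-img snapshot create/delete wrappers and `snapshot_store.save`.

```rust
/// Create a snapshot on each disk in order, then persist metadata.
/// If any create step or the final save fails, delete the snapshots
/// that were already created (in reverse order) and return the error.
fn snapshot_with_rollback<C, D, S>(
    disks: &[&str],
    mut create: C,
    mut delete: D,
    save: S,
) -> Result<(), String>
where
    C: FnMut(&str) -> Result<(), String>,
    D: FnMut(&str),
    S: FnOnce() -> Result<(), String>,
{
    let mut done: Vec<&str> = Vec::new();
    let mut result = Ok(());

    for &disk in disks {
        result = create(disk);
        if result.is_err() {
            break;
        }
        done.push(disk);
    }

    if result.is_ok() {
        result = save();
    }

    if result.is_err() {
        // Undo only what actually succeeded, newest first.
        for &disk in done.iter().rev() {
            delete(disk);
        }
    }
    result
}
```

Tracking completed steps in `done` also covers the case where the container disk snapshot succeeds but the guest disk snapshot fails, which the original excerpt leaves unhandled.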
            Self { db }
        }

        /// Get a reference to the underlying database.
Docstring says "Get a reference to the underlying database", but the method returns a cloned Database handle by value. Consider rewording to avoid implying a borrowed reference.
    - /// Get a reference to the underlying database.
    + /// Get a cloned handle to the underlying database.
boxlite/src/runtime/portability.rs
Outdated
    pub async fn export(
        &self,
        id_or_name: &str,
        output_path: &Path,
    ) -> BoxliteResult<()> {
        let rt = &self.rt_impl;

        // Resolve box and verify it's stopped
        let (config, _state) = resolve_stopped_box(rt, id_or_name)?;

        let box_home = &config.box_home;
        let container_disk = box_home.join(disk_filenames::CONTAINER_DISK);
        let guest_disk = box_home.join(disk_filenames::GUEST_ROOTFS_DISK);

        // Validate disks exist
        if !container_disk.exists() {
            return Err(BoxliteError::Storage(format!(
                "Container disk not found at {}",
                container_disk.display()
            )));
        }

        // Create temp directory for flattened disks (same filesystem for efficiency)
        let temp_dir = tempfile::tempdir_in(rt.layout.temp_dir()).map_err(|e| {
            BoxliteError::Storage(format!("Failed to create temp directory: {}", e))
        })?;

        // Flatten COW disks to standalone images
        let flat_container = temp_dir.path().join(disk_filenames::CONTAINER_DISK);
        qemu_img::convert(&container_disk, &flat_container)?;
export() is declared async but performs blocking work (filesystem I/O, qemu-img subprocess execution, tar building) directly on the async task. This can block the Tokio runtime thread. Consider moving the heavy/blocking sections into tokio::task::spawn_blocking (or making the API synchronous) to avoid starving other async work.
boxlite/src/runtime/snapshots.rs
Outdated
    pub async fn snapshot(
        &self,
        id_or_name: &str,
        snapshot_name: &str,
        description: &str,
    ) -> BoxliteResult<SnapshotRecord> {
        let rt = &self.rt_impl;

        let (config, _state) = resolve_stopped_box(rt, id_or_name)?;

        let box_home = &config.box_home;
        let container_disk = box_home.join(disk_filenames::CONTAINER_DISK);
        let guest_disk = box_home.join(disk_filenames::GUEST_ROOTFS_DISK);

        // Validate container disk exists
        if !container_disk.exists() {
            return Err(BoxliteError::Storage(format!(
                "Container disk not found at {}",
                container_disk.display()
            )));
        }

        // Check for duplicate snapshot name
        let snapshot_store = SnapshotStore::new(rt.box_manager.db());
        if snapshot_store
            .get_by_name(config.id.as_str(), snapshot_name)?
            .is_some()
        {
            return Err(BoxliteError::AlreadyExists(format!(
                "snapshot '{}' already exists for box '{}'",
                snapshot_name, id_or_name
            )));
        }

        // Create QCOW2 internal snapshots on both disks
        qemu_img::snapshot_create(&container_disk, snapshot_name)?;

        if guest_disk.exists() {
            qemu_img::snapshot_create(&guest_disk, snapshot_name)?;
        }
snapshot() is async but runs blocking operations (disk existence checks, qemu-img subprocess calls, SQLite access) directly on the async task. Consider offloading the blocking portions to tokio::task::spawn_blocking to avoid blocking the async runtime, similar to the pattern used elsewhere in rt_impl for DB calls.
@joeyaflores Hi, thanks for the contribution! This is incredible. Will take a look ASAP.

Hi @joeyaflores. I just updated the interfaces. Could you take a look and update the current API based on it? It's updated based on community feedback.
rename snapshots table to box_snapshot with updated columns: snapshot_dir, guest_disk_size_bytes, container_disk_size_bytes, size_bytes. rename SnapshotRecord to SnapshotInfo. add v5->v6 migration path.
add SnapshotOptions, ExportOptions, CloneOptions types. add Snapshotting, Restoring, Exporting transient states to BoxStatus with transitions from Stopped. update status_to_string in c, node, and python sdks. add SNAPSHOTS_DIR disk constant.
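A sketch of how the transient states might gate transitions. Only the three new variants and the fact that they are entered from `Stopped` come from the commit message; the `Running` variant, the return edge back to `Stopped`, and the method names are assumptions made for illustration.

```rust
/// Subset of box states relevant to this sketch.
/// `Running` is included only to show a rejected transition (assumed variant).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BoxStatus {
    Running,
    Stopped,
    Snapshotting,
    Restoring,
    Exporting,
}

impl BoxStatus {
    /// Transient states exist only while an operation is in flight.
    fn is_transient(self) -> bool {
        matches!(
            self,
            BoxStatus::Snapshotting | BoxStatus::Restoring | BoxStatus::Exporting
        )
    }

    /// Transient states are entered from `Stopped`.
    /// Assumption: they return to `Stopped` when the operation finishes.
    fn can_transition_to(self, next: BoxStatus) -> bool {
        match (self, next) {
            (BoxStatus::Stopped, n) if n.is_transient() => true,
            (s, BoxStatus::Stopped) if s.is_transient() => true,
            _ => false,
        }
    }
}
```

Encoding the check in one place means a second snapshot or export request against a box that is already in a transient state is rejected by the state machine rather than by ad hoc checks in each operation.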
move snapshot/export/clone operations to LiteBox methods. snapshots use external cow files instead of qcow2 internal snapshots. clones use cow by default. export produces tar.zst archives (v2) with sha-256 checksums. import auto-detects v1/v2 format and returns LiteBox handle. add SnapshotHandle sub-resource on LiteBox, zstd dependency, snapshot layout helpers. remove old runtime/snapshots.rs and runtime/clone.rs. clean up qemu_img to only keep convert and full_copy.
add PySnapshotHandle with create/list/get/remove/restore methods as a sub-resource on PyBox. add export() and clone_box() methods on PyBox. add PySnapshotOptions, PyExportOptions, PyCloneOptions option types. remove old runtime-level snapshot/export/duplicate methods. import_archive now returns PyBox instead of PyBoxInfo.
LiteBox doesn't derive Clone so there's no conflict. The name parameter is now mandatory per the approved API spec.
Add quiesce, quiesce_timeout_secs, and stop_on_quiesce_fail with defaults matching the spec (true/30/true). Currently no-op; reserved for future guest-side FIFREEZE support.
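The defaults named in this commit (true/30/true) map naturally onto a `Default` impl. The field names come from the commit message; the field types, in particular `u64` for the timeout, are assumptions, and the real struct may carry additional fields.

```rust
/// Options controlling how a snapshot is taken.
/// Per the commit message these fields are currently no-ops,
/// reserved for future guest-side FIFREEZE support.
#[derive(Debug, Clone, PartialEq)]
struct SnapshotOptions {
    /// Whether to quiesce guest filesystems before snapshotting.
    quiesce: bool,
    /// How long to wait for quiescing, in seconds (type assumed).
    quiesce_timeout_secs: u64,
    /// Whether to stop the box if quiescing fails.
    stop_on_quiesce_fail: bool,
}

impl Default for SnapshotOptions {
    fn default() -> Self {
        Self {
            quiesce: true,
            quiesce_timeout_secs: 30,
            stop_on_quiesce_fail: true,
        }
    }
}
```

Callers can then override a single field with struct update syntax, for example `SnapshotOptions { quiesce: false, ..Default::default() }`.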
guest_disk_size_bytes -> guest_disk_bytes
container_disk_size_bytes -> container_disk_bytes

Applies to DB columns, SnapshotInfo struct, and Python bindings. No migration needed since v6 schema is unreleased.
- original_name -> box_name
- options: BoxOptions -> image: String (extract from rootfs spec)
- checksums: HashMap -> guest_disk_checksum + container_disk_checksum
- remove files: Vec<String>
- add sha256: prefix to checksum values
- import reconstructs BoxOptions from image field

Import now also requires name as &str instead of Option<String>.
hi @DorianZheng — went through the spec in #205 and updated everything to match. changes: Renames:
SnapshotOptions - added the three quiesce fields (quiesce, quiesce_timeout_secs, stop_on_quiesce_fail) with SnapshotInfo - renamed guest_disk_size_bytes/container_disk_size_bytes to ArchiveManifest"
everything compiles, tests pass, and clippy is clean

Hi @joeyaflores. There are three points I would like to discuss

Hi @DorianZheng. After this branch was merged, the main branch no longer compiles.

Hi @uran0sH. I'm fixing it
Summary
Adds four new capabilities to BoxliteRuntime for box state persistence and portability:
- … `.boxlite` tar archive (both disks + config metadata). Import recreates the box with a new identity on any compatible installation.
- … `list_snapshots()` and `delete_snapshot()`.
- … `duplicate()` creates an independent full copy of a stopped box (no COW backing-file coupling).
- … `Boxlite` class, plus `SnapshotRecord` type.

Foundation changes
- … `init/mod.rs` hardcoded `"root.qcow2"` while `layout.rs` used `"disk.qcow2"`
- … `SnapshotStore` CRUD operations
- … `qemu-img` wrapper for convert, snapshot, and full-copy operations

Design decisions
- … `qemu-img convert` so archives are fully standalone
- … `duplicate()` to avoid conflict with Rust's `Clone` trait
- … (`CREATE TABLE IF NOT EXISTS`) with version tracking

Test plan
- … (`cargo check`)