refactor: clone/restore inherit all resources from snapshot#27
Conversation
Both hypervisors restore guest CPU and memory from the snapshot's binary memory image; values written into config.json or rejected at the FC API do not change what the guest sees. Removing the flags eliminates the illusion of overridable resources and the dead validation paths that went with them. CLI: removes --cpu/--memory from clone and restore. mergeResourceFlags collapses to mergeStorageFlag. CH: patchOptions drops cpu/memory/windows fields; patchCHConfig drops its rawData parameter (no callers passed bytes); patchMemoryAndBalloon and patchBalloonRaw deleted. FC: validateRestoreOverrides and the cloneAfterExtract CPU/memory mismatch checks deleted; RestoreSpec/DirectRestoreSpec.OverrideCheck removed. Docs: README clone/restore tables and snapshot section updated; KNOWN_ISSUES drops the entry from PR #26 and the stale FC bullet.
All clone/restore resources now inherit verbatim from the snapshot. Both hypervisors restore guest state from the snapshot's binary device state, so storage and NIC count are equally fixed at snapshot time — same root cause as CPU/memory. The hot-swap NIC logic in clone stays: clone always rebuilds NICs at the snapshot's count to fix the MAC binding issue (binary device state holds the snapshot-time MACs). CLI: drops --storage from restore and --storage/--nics from clone. mergeStorageFlag and validateFCCloneOverrides deleted (fully dead). RestoreVMConfigFromFlags moves the NIC-count guard inside, eliminating two duplicate checks in run.go. Backends: removes the four storage-resize call sites in clone/restore (CH and FC). qemuExpandImage / ExpandRawImage stay — still used by vm run for fresh COW sizing. CH restoreAfterExtract loses an unused cowPath parameter. Docs: README clone/restore tables and KNOWN_ISSUES updated to state that all resources inherit from the snapshot.
RestoreVMConfigFromFlags previously copied vm.Config (current record) and
passed it to FinalizeRestore, which writes it back as the persisted
config. With a same-lineage snapshot the values match, but two paths
break the invariant:
- vm restore --from-dir --force on a foreign lineage with the same
NIC count silently overlays snapshot resources onto the guest while
the record retains the old VM's CPU/memory/storage.
- Records carrying drift from older builds (when --cpu / --memory on
restore was a CLI knob that never actually grew the guest) keep
propagating the wrong numbers through every subsequent restore.
Fix: build VMConfig from snapCfg.Config so the persisted record matches
what the hypervisor reconstructs in the guest. VM identity (Name) stays;
--on-demand still comes from the flag.
There was a problem hiding this comment.
Pull request overview
Removes the (previously ineffective) resource override flags from cocoon vm clone / cocoon vm restore, aligning CLI and hypervisor behavior with the reality that snapshots restore fixed binary device state (CPU/memory/storage/NIC count).
Changes:
- Dropped
--cpu,--memory,--storage,--nicsfrom clone/restore and removed now-dead flag merge/override-validation logic. - Simplified Cloud Hypervisor clone/restore patching to stop attempting CPU/memory/storage resizing and simplified
parseCHConfig/patchCHConfigplumbing. - Updated docs to reflect strict resource inheritance semantics for clone/restore.
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| README.md | Removes clone/restore resource flags from docs and updates constraints/capability matrix wording. |
| KNOWN_ISSUES.md | Rewrites known issue to reflect clone/restore resource inheritance; removes obsolete CPU/memory override notes. |
| hypervisor/backend.go | Removes OverrideCheck hooks from restore spec types. |
| hypervisor/restore.go | Deletes OverrideCheck invocation from restore sequences. |
| hypervisor/firecracker/clone.go | Removes FC clone-time CPU/memory override rejection and storage expansion. |
| hypervisor/firecracker/direct.go | Removes FC direct-restore override checking hook wiring. |
| hypervisor/firecracker/restore.go | Removes FC restore override checking hook and storage expansion; deletes override validator. |
| hypervisor/cloudhypervisor/clone.go | Stops trying to resize storage and patch CPU/memory; updates CH config parsing + patch call sites. |
| hypervisor/cloudhypervisor/clone_test.go | Updates tests for new patchCHConfig signature; removes CPU/memory/balloon patch tests. |
| hypervisor/cloudhypervisor/direct.go | Updates direct clone/restore to new parseCHConfig signature and restoreAfterExtract call. |
| hypervisor/cloudhypervisor/helper.go | Updates snapshot integrity helper to new parseCHConfig signature. |
| hypervisor/cloudhypervisor/patch.go | Simplifies patch options and removes CPU/memory/balloon patching logic. |
| hypervisor/cloudhypervisor/restore.go | Removes storage resizing and CPU/memory patching; simplifies restoreAfterExtract signature. |
| hypervisor/cloudhypervisor/snapshot.go | Updates snapshot meta building to new parseCHConfig signature. |
| cmd/vm/commands.go | Removes clone/restore resource flags from Cobra command definitions. |
| cmd/vm/run.go | Removes clone/restore resource override handling and duplicate NIC-count checks; keeps NIC hot-swap behavior at snapshot NIC count. |
| cmd/core/helpers.go | Makes clone inherit CPU/memory/storage from snapshot; refactors restore config construction and moves NIC-count guard inside. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Restoring from a snapshot reuses the VM's existing CNI namespace, TAPs, and IP allocation — those were bound to vm.Config.Network at create time. The previous fix copied snapCfg.Config wholesale into the persisted record, which would overwrite Network too. With a foreign --from-dir --force snapshot whose source VM ran on a different CNI conflist, the record would advertise a CNI selection that the live namespace doesn't actually use. Other Config fields (CPU/Memory/Storage, Image*, QueueSize, Windows, SharedMemory, NoDirectIO) genuinely follow the snapshot — the guest is literally reconstructed from snapshot state — so they keep coming from snapCfg.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 17 out of 17 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 17 out of 17 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…dation Three review fixes: 1. Clone Flags table missed --on-demand and --from-dir even though addCloneFlags registers both — added rows so the README matches the CLI surface. 2. KNOWN_ISSUES'\'' resource section claimed restore inherits NIC count from the snapshot. Restore actually requires the snapshot NIC count to match the target VM (mismatch is rejected); reworded. 3. RestoreVMConfigFromFlags now calls VMConfig.Validate() on the assembled config so a tampered or malformed --from-dir --force snapshot.json (negative CPU, zero memory, etc.) errors out before FinalizeRestore overwrites the VM record.
CloneVMConfigFromFlags called Validate() before prepareClone applied the 'cocoon-clone-<id>' default name, so 'cocoon vm clone <snap>' without --name failed with 'vm name cannot be empty' instead of taking the default. Move Validate to prepareClone, right after the default is filled in. Other Config fields are still pre-validated transitively (snapshot save validates them at write time).
Summary
Removes
--cpu,--memory,--storage, and--nicsfromcocoon vm cloneandcocoon vm restore. Both hypervisors reconstruct the guest from the snapshot's binary device state, so none of these can actually be grown — the previous flag layer was Potemkin validation.The hot-swap NIC logic stays: clone still removes the snapshot's NICs and adds fresh ones at the same count, because the binary state binds the old MACs/TAPs to the source VM.
What changed
CLI
addCloneFlagsandrestoreCmdlose--cpu,--memory,--storage,--nics.mergeResourceFlags→mergeStorageFlag→ deleted (fully dead).CloneVMConfigFromFlagsinherits CPU/memory/storage from snapshot, NIC count fromcfg.NICs.RestoreVMConfigFromFlagsbuilds the persisted record fromsnapCfg.Config(so the record matches what the hypervisor reconstructs) except forNetwork, which stays fromvm.Config: the CNI namespace, TAPs, and IPs were allocated at create time and survive restore. NIC count must match the target VM (gated inside the helper, eliminating duplicate checks in run.go).validateFCCloneOverridesdeleted (fully dead).Cloud Hypervisor
patchOptionsdropscpu,memory,windowsfields.patchCHConfigno longer patches CPU/memory; signature simplified.patchMemoryAndBalloonandpatchBalloonRawdeleted.parseCHConfigreturns(*chVMConfig, error)instead of(*chVMConfig, []byte, error).qemuExpandImagecalls in clone/restore are gone (storage strictly inherits).restoreAfterExtractloses an unusedcowPathparameter.Firecracker
validateRestoreOverrides, the CPU/memory checks incloneAfterExtract, and theExpandRawImagecalls in clone/restore are all gone.RestoreSpec.OverrideCheckandDirectRestoreSpec.OverrideCheckfields removed.Docs
What Gets Capturedmade explicit that snapshot metadata still records the full resource topology.Persisted record after restore
Net diff
Test plan
make lintclean on darwin + linuxgo test ./...all passvm clone/vm restorereject--cpu,--memory,--storage,--nicsasunknown flagvm run --helpstill has the four resource flagsvm inspectshows snapshot's resources; running guest matches; Network field preservedWhy drop
--storagetooSame root cause as CPU/memory. Host file resizes (qemu-img reports new size), but virtio-blk on
vm.restoreis reconstructed from the snapshot's binary device state with original capacity baked in. Plus--storagewas ambiguous in multi-data-disk setups (only addressed COW). Dropping the illusion is the cleaner contract.