Skip to content

refactor: clone/restore inherit all resources from snapshot#27

Merged
CMGS merged 10 commits into
masterfrom
feat/clone-restore-fixed-resources
May 6, 2026
Merged

refactor: clone/restore inherit all resources from snapshot#27
CMGS merged 10 commits into
masterfrom
feat/clone-restore-fixed-resources

Conversation

@CMGS
Copy link
Copy Markdown
Contributor

@CMGS CMGS commented May 6, 2026

Summary

Removes --cpu, --memory, --storage, and --nics from cocoon vm clone and cocoon vm restore. Both hypervisors reconstruct the guest from the snapshot's binary device state, so none of these can actually be grown — the previous flag layer was Potemkin validation.

The hot-swap NIC logic stays: clone still removes the snapshot's NICs and adds fresh ones at the same count, because the binary state binds the old MACs/TAPs to the source VM.

What changed

CLI

  • addCloneFlags and restoreCmd lose --cpu, --memory, --storage, --nics.
  • mergeResourceFlagsmergeStorageFlag → deleted (fully dead).
  • CloneVMConfigFromFlags inherits CPU/memory/storage from snapshot, NIC count from cfg.NICs.
  • RestoreVMConfigFromFlags builds the persisted record from snapCfg.Config (so the record matches what the hypervisor reconstructs) except for Network, which stays from vm.Config: the CNI namespace, TAPs, and IPs were allocated at create time and survive restore. NIC count must match the target VM (gated inside the helper, eliminating duplicate checks in run.go).
  • validateFCCloneOverrides deleted (fully dead).

Cloud Hypervisor

  • patchOptions drops cpu, memory, windows fields.
  • patchCHConfig no longer patches CPU/memory; signature simplified.
  • patchMemoryAndBalloon and patchBalloonRaw deleted.
  • parseCHConfig returns (*chVMConfig, error) instead of (*chVMConfig, []byte, error).
  • qemuExpandImage calls in clone/restore are gone (storage strictly inherits).
  • restoreAfterExtract loses an unused cowPath parameter.

Firecracker

  • validateRestoreOverrides, the CPU/memory checks in cloneAfterExtract, and the ExpandRawImage calls in clone/restore are all gone.
  • RestoreSpec.OverrideCheck and DirectRestoreSpec.OverrideCheck fields removed.

Docs

  • README clone-flag table reduced to non-resource knobs; restore section gets its own flag table; What Gets Captured made explicit that snapshot metadata still records the full resource topology.
  • KNOWN_ISSUES updated.

Persisted record after restore

Field Source after restore Why
CPU, Memory, Storage snapshot guest is reconstructed from snapshot binary state
Image, ImageDigest, ImageType snapshot rootfs lineage matches the snapshot's
Windows, SharedMemory snapshot baked into the snapshot's config
QueueSize, DiskQueueSize, NoDirectIO snapshot virtio device shape baked in; future cold boots read from patched config.json
Network target VM CNI namespace, TAPs, IPs were allocated at create time and survive restore
Name target VM VM identity is preserved

Net diff

17 files, +80 / -446

Test plan

  • make lint clean on darwin + linux
  • go test ./... all pass
  • /code senior review walkthrough
  • /simplify three-agent review (round 1) + combined review (round 2)
  • Testbed E2E (35.240.182.52)
    • vm clone / vm restore reject --cpu, --memory, --storage, --nics as unknown flag
    • vm run --help still has the four resource flags
    • Clone with no flags inherits CPU/memory/storage/NICs verbatim from snapshot
    • Restore: post-restore vm inspect shows snapshot's resources; running guest matches; Network field preserved
    • NIC hot-swap still produces correct fresh MACs on clone

Why drop --storage too

Same root cause as CPU/memory. Host file resizes (qemu-img reports new size), but virtio-blk on vm.restore is reconstructed from the snapshot's binary device state with original capacity baked in. Plus --storage was ambiguous in multi-data-disk setups (only addressed COW). Dropping the illusion is the cleaner contract.

CMGS added 2 commits May 6, 2026 11:22
Both hypervisors restore guest CPU and memory from the snapshot's binary
memory image; values written into config.json or rejected at the FC API
do not change what the guest sees. Removing the flags eliminates the
illusion of overridable resources and the dead validation paths that
went with them.

CLI: removes --cpu/--memory from clone and restore. mergeResourceFlags
collapses to mergeStorageFlag.

CH: patchOptions drops cpu/memory/windows fields; patchCHConfig drops
its rawData parameter (no callers passed bytes); patchMemoryAndBalloon
and patchBalloonRaw deleted.

FC: validateRestoreOverrides and the cloneAfterExtract CPU/memory mismatch
checks deleted; RestoreSpec/DirectRestoreSpec.OverrideCheck removed.

Docs: README clone/restore tables and snapshot section updated;
KNOWN_ISSUES drops the entry from PR #26 and the stale FC bullet.
All clone/restore resources now inherit verbatim from the snapshot. Both
hypervisors restore guest state from the snapshot's binary device state,
so storage and NIC count are equally fixed at snapshot time — same root
cause as CPU/memory.

The hot-swap NIC logic in clone stays: clone always rebuilds NICs at the
snapshot's count to fix the MAC binding issue (binary device state holds
the snapshot-time MACs).

CLI: drops --storage from restore and --storage/--nics from clone.
mergeStorageFlag and validateFCCloneOverrides deleted (fully dead).
RestoreVMConfigFromFlags moves the NIC-count guard inside, eliminating
two duplicate checks in run.go.

Backends: removes the four storage-resize call sites in clone/restore
(CH and FC). qemuExpandImage / ExpandRawImage stay — still used by vm
run for fresh COW sizing. CH restoreAfterExtract loses an unused
cowPath parameter.

Docs: README clone/restore tables and KNOWN_ISSUES updated to state
that all resources inherit from the snapshot.
@CMGS CMGS changed the title refactor: drop --cpu/--memory from clone/restore refactor: clone/restore inherit all resources from snapshot May 6, 2026
CMGS added 2 commits May 6, 2026 12:09
RestoreVMConfigFromFlags previously copied vm.Config (current record) and
passed it to FinalizeRestore, which writes it back as the persisted
config. With a same-lineage snapshot the values match, but two paths
break the invariant:

  - vm restore --from-dir --force on a foreign lineage with the same
    NIC count silently overlays snapshot resources onto the guest while
    the record retains the old VM's CPU/memory/storage.
  - Records carrying drift from older builds (when --cpu / --memory on
    restore was a CLI knob that never actually grew the guest) keep
    propagating the wrong numbers through every subsequent restore.

Fix: build VMConfig from snapCfg.Config so the persisted record matches
what the hypervisor reconstructs in the guest. VM identity (Name) stays;
--on-demand still comes from the flag.
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Removes the (previously ineffective) resource override flags from cocoon vm clone / cocoon vm restore, aligning CLI and hypervisor behavior with the reality that snapshots restore fixed binary device state (CPU/memory/storage/NIC count).

Changes:

  • Dropped --cpu, --memory, --storage, --nics from clone/restore and removed now-dead flag merge/override-validation logic.
  • Simplified Cloud Hypervisor clone/restore patching to stop attempting CPU/memory/storage resizing and simplified parseCHConfig / patchCHConfig plumbing.
  • Updated docs to reflect strict resource inheritance semantics for clone/restore.

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
README.md Removes clone/restore resource flags from docs and updates constraints/capability matrix wording.
KNOWN_ISSUES.md Rewrites known issue to reflect clone/restore resource inheritance; removes obsolete CPU/memory override notes.
hypervisor/backend.go Removes OverrideCheck hooks from restore spec types.
hypervisor/restore.go Deletes OverrideCheck invocation from restore sequences.
hypervisor/firecracker/clone.go Removes FC clone-time CPU/memory override rejection and storage expansion.
hypervisor/firecracker/direct.go Removes FC direct-restore override checking hook wiring.
hypervisor/firecracker/restore.go Removes FC restore override checking hook and storage expansion; deletes override validator.
hypervisor/cloudhypervisor/clone.go Stops trying to resize storage and patch CPU/memory; updates CH config parsing + patch call sites.
hypervisor/cloudhypervisor/clone_test.go Updates tests for new patchCHConfig signature; removes CPU/memory/balloon patch tests.
hypervisor/cloudhypervisor/direct.go Updates direct clone/restore to new parseCHConfig signature and restoreAfterExtract call.
hypervisor/cloudhypervisor/helper.go Updates snapshot integrity helper to new parseCHConfig signature.
hypervisor/cloudhypervisor/patch.go Simplifies patch options and removes CPU/memory/balloon patching logic.
hypervisor/cloudhypervisor/restore.go Removes storage resizing and CPU/memory patching; simplifies restoreAfterExtract signature.
hypervisor/cloudhypervisor/snapshot.go Updates snapshot meta building to new parseCHConfig signature.
cmd/vm/commands.go Removes clone/restore resource flags from Cobra command definitions.
cmd/vm/run.go Removes clone/restore resource override handling and duplicate NIC-count checks; keeps NIC hot-swap behavior at snapshot NIC count.
cmd/core/helpers.go Makes clone inherit CPU/memory/storage from snapshot; refactors restore config construction and moves NIC-count guard inside.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread cmd/core/helpers.go Outdated
Comment thread hypervisor/cloudhypervisor/clone.go
Restoring from a snapshot reuses the VM's existing CNI namespace, TAPs,
and IP allocation — those were bound to vm.Config.Network at create
time. The previous fix copied snapCfg.Config wholesale into the
persisted record, which would overwrite Network too. With a foreign
--from-dir --force snapshot whose source VM ran on a different CNI
conflist, the record would advertise a CNI selection that the live
namespace doesn't actually use.

Other Config fields (CPU/Memory/Storage, Image*, QueueSize, Windows,
SharedMemory, NoDirectIO) genuinely follow the snapshot — the guest is
literally reconstructed from snapshot state — so they keep coming from
snapCfg.
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread README.md Outdated
Comment thread cmd/core/helpers.go Outdated
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread README.md
Comment thread KNOWN_ISSUES.md Outdated
Comment thread cmd/core/helpers.go Outdated
…dation

Three review fixes:

1. Clone Flags table missed --on-demand and --from-dir even though
   addCloneFlags registers both — added rows so the README matches the
   CLI surface.

2. KNOWN_ISSUES'\'' resource section claimed restore inherits NIC count
   from the snapshot. Restore actually requires the snapshot NIC count
   to match the target VM (mismatch is rejected); reworded.

3. RestoreVMConfigFromFlags now calls VMConfig.Validate() on the
   assembled config so a tampered or malformed --from-dir --force
   snapshot.json (negative CPU, zero memory, etc.) errors out before
   FinalizeRestore overwrites the VM record.
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 1 comment.

Comment thread cmd/core/helpers.go Outdated
CloneVMConfigFromFlags called Validate() before prepareClone applied the
'cocoon-clone-<id>' default name, so 'cocoon vm clone <snap>' without
--name failed with 'vm name cannot be empty' instead of taking the
default. Move Validate to prepareClone, right after the default is
filled in. Other Config fields are still pre-validated transitively
(snapshot save validates them at write time).
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated no new comments.

@CMGS CMGS merged commit 50df13c into master May 6, 2026
4 checks passed
@CMGS CMGS deleted the feat/clone-restore-fixed-resources branch May 6, 2026 06:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants