docs(known-issues): clarify clone/restore --cpu/--memory cannot grow#26
Merged
Conversation
Previous entry implied --cpu changes vCPU count but disk num_queues follows. Empirical verification (E2E run) shows neither vCPU count nor memory size actually grow on the guest side: cocoon patches config.json correctly but CH's vm.restore reconstructs from binary state at snapshot's original size. Update the entry with a clear table and the corrected consequence.
10 tasks
CMGS
added a commit
that referenced
this pull request
May 6, 2026
* refactor: drop --cpu/--memory from clone/restore Both hypervisors restore guest CPU and memory from the snapshot's binary memory image; values written into config.json or rejected at the FC API do not change what the guest sees. Removing the flags eliminates the illusion of overridable resources and the dead validation paths that went with them. CLI: removes --cpu/--memory from clone and restore. mergeResourceFlags collapses to mergeStorageFlag. CH: patchOptions drops cpu/memory/windows fields; patchCHConfig drops its rawData parameter (no callers passed bytes); patchMemoryAndBalloon and patchBalloonRaw deleted. FC: validateRestoreOverrides and the cloneAfterExtract CPU/memory mismatch checks deleted; RestoreSpec/DirectRestoreSpec.OverrideCheck removed. Docs: README clone/restore tables and snapshot section updated; KNOWN_ISSUES drops the entry from PR #26 and the stale FC bullet. * refactor: drop --storage and --nics from clone/restore too All clone/restore resources now inherit verbatim from the snapshot. Both hypervisors restore guest state from the snapshot's binary device state, so storage and NIC count are equally fixed at snapshot time — same root cause as CPU/memory. The hot-swap NIC logic in clone stays: clone always rebuilds NICs at the snapshot's count to fix the MAC binding issue (binary device state holds the snapshot-time MACs). CLI: drops --storage from restore and --storage/--nics from clone. mergeStorageFlag and validateFCCloneOverrides deleted (fully dead). RestoreVMConfigFromFlags moves the NIC-count guard inside, eliminating two duplicate checks in run.go. Backends: removes the four storage-resize call sites in clone/restore (CH and FC). qemuExpandImage / ExpandRawImage stay — still used by vm run for fresh COW sizing. CH restoreAfterExtract loses an unused cowPath parameter. Docs: README clone/restore tables and KNOWN_ISSUES updated to state that all resources inherit from the snapshot. * docs: add Restore Flags table to README * fix: restore record reflects snapshot resources, not stale VM record RestoreVMConfigFromFlags previously copied vm.Config (current record) and passed it to FinalizeRestore, which writes it back as the persisted config. With a same-lineage snapshot the values match, but two paths break the invariant: - vm restore --from-dir --force on a foreign lineage with the same NIC count silently overlays snapshot resources onto the guest while the record retains the old VM's CPU/memory/storage. - Records carrying drift from older builds (when --cpu / --memory on restore was a CLI knob that never actually grew the guest) keep propagating the wrong numbers through every subsequent restore. Fix: build VMConfig from snapCfg.Config so the persisted record matches what the hypervisor reconstructs in the guest. VM identity (Name) stays; --on-demand still comes from the flag. * docs: clarify restore resources come from snapshot, NICs must match VM * fix: preserve VM Network on restore Restoring from a snapshot reuses the VM's existing CNI namespace, TAPs, and IP allocation — those were bound to vm.Config.Network at create time. The previous fix copied snapCfg.Config wholesale into the persisted record, which would overwrite Network too. With a foreign --from-dir --force snapshot whose source VM ran on a different CNI conflist, the record would advertise a CNI selection that the live namespace doesn't actually use. Other Config fields (CPU/Memory/Storage, Image*, QueueSize, Windows, SharedMemory, NoDirectIO) genuinely follow the snapshot — the guest is literally reconstructed from snapshot state — so they keep coming from snapCfg. * docs: clarify snapshot metadata records resource topology * fix: PR review fallout — clone flag table, restore wording, snap validation Three review fixes: 1. Clone Flags table missed --on-demand and --from-dir even though addCloneFlags registers both — added rows so the README matches the CLI surface. 2. KNOWN_ISSUES'\'' resource section claimed restore inherits NIC count from the snapshot. Restore actually requires the snapshot NIC count to match the target VM (mismatch is rejected); reworded. 3. RestoreVMConfigFromFlags now calls VMConfig.Validate() on the assembled config so a tampered or malformed --from-dir --force snapshot.json (negative CPU, zero memory, etc.) errors out before FinalizeRestore overwrites the VM record. * fix: defer Validate to after clone name default CloneVMConfigFromFlags called Validate() before prepareClone applied the 'cocoon-clone-<id>' default name, so 'cocoon vm clone <snap>' without --name failed with 'vm name cannot be empty' instead of taking the default. Move Validate to prepareClone, right after the default is filled in. Other Config fields are still pre-validated transitively (snapshot save validates them at write time). * docs: trim comments in clone/restore helpers
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Empirical verification during the overnight E2E regression showed that the existing "Clone/restore disk queue count is immutable" entry was misleading: the implication was that `--cpu` does change vCPU count and only disk `num_queues` lags. Actual behavior — cocoon patches `config.json` correctly (`boot_vcpus: 4`, `memory.size: 2147483648`) but the guest still sees the snapshot's original `nproc` and `MemTotal`. CH's `vm.restore` reconstructs the guest from binary memory file and saved CPU state, both sized at snapshot time.
Update the entry to:
No code change.
Test plan