feat(network): always provision per-VM netns under CNI, even with 0 NICs#44
Merged
Conversation
33d55a3 to
f0cb8e5
Compare
There was a problem hiding this comment.
Pull request overview
This PR fixes a networking edge case where running/cloning a VM with --nics 0 under CNI would leave the hypervisor process in the host network namespace, preventing later vm net resize --nics N. It does so by making per-VM network namespace provisioning independent of NIC count and by persisting per-VM network identity at the VM level (with backward-compatible fallbacks).
Changes:
- Add
Network.Prepare(...)and call it unconditionally so CNI can provision a per-VM netns even whennics == 0, while bridge remains a no-op. - Promote
net_backend/netns_path/net_bridge_devto VM-level persisted fields and addResolved*()accessors for backward compatibility. - Update hypervisor launch paths and
vm net resizeplumbing selection to use VM-level resolved network identity instead oflen(NetworkConfigs) > 0.
Reviewed changes
Copilot reviewed 26 out of 26 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
types/vm.go |
Adds VM-level persisted network identity fields and Resolved*() accessors; introduces types.NetSetup. |
types/vm_test.go |
Adds unit tests for the new Resolved*() accessors. |
README.md |
Updates NIC hot-resize documentation, including 0→N resize behavior under CNI/bridge. |
network/network.go |
Extends the Network interface with Prepare(...). |
network/cni/create.go |
Implements CNI.Prepare to idempotently ensure the per-VM netns exists when configured. |
network/bridge/bridge_other.go |
Implements Prepare for non-Linux bridge builds (unsupported). |
network/bridge/bridge_linux.go |
Implements bridge Prepare as a no-op for host-netns operation. |
KNOWN_ISSUES.md |
Updates documentation to reflect longer eject wait timeout. |
hypervisor/hypervisor.go |
Updates hypervisor interface to accept types.NetSetup for create/clone/direct-clone. |
hypervisor/firecracker/start.go |
Removes withNetwork gating and uses resolved netns path for process launch. |
hypervisor/firecracker/restore.go |
Aligns restore launch with new launchProcess signature. |
hypervisor/firecracker/direct.go |
Updates DirectClone signature to accept types.NetSetup. |
hypervisor/firecracker/create.go |
Updates Create signature to accept types.NetSetup. |
hypervisor/firecracker/clone.go |
Updates Clone flow to accept types.NetSetup and persist VM-level network identity. |
hypervisor/create.go |
Persists VM-level network identity fields during create. |
hypervisor/cloudhypervisor/start.go |
Removes withNetwork gating and uses resolved netns path for process launch. |
hypervisor/cloudhypervisor/restore.go |
Aligns restore launch with new launchProcess signature. |
hypervisor/cloudhypervisor/netresize.go |
Increases NIC eject wait timeout to 30s. |
hypervisor/cloudhypervisor/direct.go |
Updates DirectClone signature to accept types.NetSetup. |
hypervisor/cloudhypervisor/create.go |
Updates Create signature to accept types.NetSetup. |
hypervisor/cloudhypervisor/clone.go |
Updates Clone flow to accept types.NetSetup and persist VM-level network identity. |
hypervisor/clone.go |
Threads types.NetSetup through clone helpers. |
hypervisor/backend.go |
Replaces NetworkConfigs field in CreateSpec with Net types.NetSetup. |
cmd/vm/run.go |
Makes initNetwork always call Prepare and passes types.NetSetup into hypervisor create/clone. |
cmd/vm/netresize.go |
Selects plumbing based on resolved VM-level network identity rather than NIC presence. |
cmd/vm/lifecycle.go |
Uses resolved VM-level network identity for provider selection and adds Prepare to recovery. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
f0cb8e5 to
ce50b6e
Compare
CH/FC born from `vm run --nics 0` (or `vm clone --nics 0`) used to land
in the host netns because initNetwork short-circuited and no per-VM
netns was created. A later `vm net resize --nics N` was rejected by
plumbingForVM's zero-NICs guard, because the CH process couldn't see
NICs added into a netns it doesn't live in. The interface implied 0→N
was supported by symmetry but it wasn't — that asymmetry is the bug.
Move netns ownership onto the network module via a new Prepare step:
- network.Network gains Prepare(ctx, vmID, vmCfg) (netnsPath, error).
- CNI.Prepare ensureNetns idempotently and returns the path; "" when no
conflist is configured.
- Bridge.Prepare is a no-op (bridge stays in host netns by design).
- initNetwork always calls Prepare regardless of NIC count.
- Add is only called when nics > 0.
Carry the result via a single types.NetSetup{Backend, NetnsPath,
BridgeDev, NICs} struct. Hypervisor.Create/Clone/DirectClone now take
NetSetup in place of `[]*NetworkConfig` so the same value flows from
the network module to the persisted VM record without a transient
duplicate slot on VMConfig.
Persist NetBackend / NetnsPath / NetBridgeDev at types.VM level (per
VM, not per NIC), with ResolvedNetX() fallbacks for old records that
still carry these on NetworkConfigs[0]. start.go / restore.go /
clone.go read rec.ResolvedNetnsPath() directly; the withNetwork
len(rec.NetworkConfigs) > 0 gate is gone — it conflated "has NICs"
with "needs netns entry", same bug one layer down.
netresize/plumbingForVM and lifecycle/providerForVM dispatch via
vm.ResolvedNetBackend() so a 0-NIC VM picks the right provider on
resize-up; the "zero NICs; resize up not supported" hard error is gone.
Bridge mode was always mechanically capable of 0→N (no netns concept);
it now shares the same code path as CNI without the spurious guard.
README updated to drop the "cannot resize up from zero" caveat. Adds
TestVMResolvedNetFields covering VM-level, NIC[0]-fallback, and empty
cases.
ce50b6e to
bd9b2be
Compare
…ess takes netns directly Senior + /simplify pass on PR #44: - VM.{IsBridge,IsCNI} helpers; Resolved*/ApplyNetSetup are nil-safe - launchProcess(netns) directly drops the partial-VM ferry in CH/FC clone - printPostCloneHints reads vm.NetworkConfigs (no second arg) - initNetwork rolls back on Prepare failure; caches netProvider.Type() - recoverNetwork splits nil/empty-backend checks - Tighten godoc and error strings
Single source of truth for VM network state. Anonymous embed preserves JSON wire format (net_backend, netns_path, net_bridge_dev, network_configs all stay flat on VM). Old records load unchanged. - NetSetup gains JSON tags matching VM's existing field names - VM drops the four duplicate fields; embeds NetSetup - info.ApplyNetSetup(net) → info.NetSetup = net (direct assignment) - setup.NICs → setup.NetworkConfigs (rename in NetSetup)
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
cocoon vm run --nics 0(orvm clone --nics 0) under CNI silently put the CH/FC process in the host netns becauseinitNetworkshort-circuited onnics <= 0. A subsequentvm net resize --nics Nwas then rejected byplumbingForVM:This hard-codes an asymmetry users can't see from the CLI surface —
--nics 0looks like a valid lower bound on--nics N, but it actually puts the VM in a different mode (no per-VM netns, can't be lifted back up).Bridge mode was mechanically fine (no netns concept at all) but got blocked by the same guard.
Fix
Move netns ownership entirely into the network module:
network.NetworkgainsPrepare(ctx, vmID, vmCfg) (netnsPath, error).CNI.Prepareidempotently creates the per-VM netns and returns its path. Returns""when no CNI conflist is configured.Bridge.Prepareis a no-op (bridge stays in host netns by design).initNetworkalways callsPrepare, regardless of NIC count, and writes the result ontovmCfg.{NetBackend, NetnsPath, NetBridgeDev}(transientjson:"-"slots).Addis only called whennics > 0.Persistence:
NetBackend/NetnsPath/NetBridgeDevfrom per-NICNetworkConfigto VM-leveltypes.VM. They are the per-VM identity of "which netns CH lives in" — were always denormalized onto NIC[0]. The persisted JSON keeps the new top-level fields (net_backend,netns_path,net_bridge_dev).Resolved*()accessors fall back toNetworkConfigs[0]for pre-PR records so existing VMs and snapshots keep working without migration.Hypervisor consumers:
CH.launchProcess/FC.launchProcessno longer take thewithNetwork boolparameter — thelen(rec.NetworkConfigs) > 0gate conflated "has NICs" with "needs netns entry", which is the same bug at the hypervisor layer. They readrec.ResolvedNetnsPath()instead.cmd/vm/netresize.plumbingForVMandcmd/vm/lifecycle.providerForVMdispatch viavm.ResolvedNetBackend()and readvm.ResolvedNetBridgeDev()for bridge mode; the zero-NICs hard error is gone.Result
vm run --nics 0(CNI)cocoon-<vmID>netnsvm run --nics 0(bridge)vm run --nics N(CNI)cocoon-<vmID>netnsvm net resize --nics Non 0-NIC CNI VMvm net resize --nics Non 0-NIC bridge VMvm net resize --nics 0(any)Test plan
make lintlinux + darwin: 0 issuesgo test ./...: greencocoon vm run --nics 0under CNI →ip netns ls | grep cocoon-shows the netns;cocoon vm net resize --nics 2succeeds and NICs land in the right netns--bridge cni0→ no netns, TAPs land on host bridge