Merged
2 changes: 1 addition & 1 deletion KNOWN_ISSUES.md
@@ -244,7 +244,7 @@ Control-plane traffic from cocoon-managed hosts (vk-cocoon, `cocoon vm exec`) go

## NIC hot-remove leaves the PCI slot pending on Cloud Hypervisor

`cocoon vm net --nics N` waits for CH's `device_tree` to drop the removed device (polled via `vm.info` until the entry disappears, max 10 s) before tearing down host plumbing. CH only frees the PCI slot — and unregisters the device's ioeventfds — when the guest writes to the ACPI hot-plug controller (B0EJ) in response to the SCI raised by `remove-device`. If the guest never ACKs (driver wedged, paused, NDIS halted), the wait times out, the cocoon record is left intact, and the command returns an error so the user can quiesce the guest and retry.
`cocoon vm net --nics N` waits for CH's `device_tree` to drop the removed device (polled via `vm.info` until the entry disappears, max 30 s) before tearing down host plumbing. CH only frees the PCI slot — and unregisters the device's ioeventfds — when the guest writes to the ACPI hot-plug controller (B0EJ) in response to the SCI raised by `remove-device`. If the guest never ACKs (driver wedged, paused, NDIS halted), the wait times out, the cocoon record is left intact, and the command returns an error so the user can quiesce the guest and retry.

If the host plumbing teardown itself fails after a successful eject, cocoon truncates the record anyway and surfaces a warning — the orphan TAP / veth / CNI lease is cleaned up by `cocoon vm rm` or the next gc cycle. The user still has to consider:

4 changes: 2 additions & 2 deletions README.md
@@ -547,9 +547,9 @@ cocoon vm net my-vm --nics 2
cocoon vm net my-vm --nics 1
```

Cocoon manages **host-side** plumbing only. CH's `vm.remove-device` marks the slot for ejection but the actual eject only happens when the guest cooperates via ACPI (B0EJ write). The host TAP / veth / CNI lease are torn down immediately after the API call regardless. Quiesce in-guest NIC state (driver unbind, NetworkManager removal, Windows NDIS halt) **before** reducing the count, or the in-guest driver will reference plumbing that no longer exists.
On NIC removal, cocoon waits for the guest to ACK B0EJ (CH polls `device_tree` until the device disappears) before tearing down the host TAP / veth / CNI lease. If the guest never ACKs within the eject timeout, the command fails and leaves the cocoon record + host plumbing intact so the operator can quiesce the guest (driver unbind, NetworkManager removal, Windows NDIS halt) and retry.

A VM started with zero NICs cannot be resized up — CH was launched in the host netns (no `NetworkConfigs` to derive a per-VM netns from), so later plumbing can't reach it. To recover networking on a 0-NIC snapshot, clone with `cocoon vm clone --nics 1 --network <conflist>` (or `--bridge <dev>`): the clone starts with NICs from the start, putting CH in the right netns from boot.
Resize from zero is supported: under CNI, `--nics 0` still provisions a per-VM netns at boot (CH lives in it from the start), so a later `cocoon vm net --nics N` hot-plugs into the same namespace. Bridge mode keeps CH in the host netns regardless of NIC count, so 0→N adds TAPs onto the configured bridge.

## Windows Support

Expand Down
41 changes: 27 additions & 14 deletions cmd/vm/lifecycle.go
@@ -373,11 +373,18 @@ func (h Handler) recoverNetwork(ctx context.Context, conf *config.Config, hyper

for _, ref := range refs {
vm := byID[ref]
if vm == nil || len(vm.NetworkConfigs) == 0 {
if vm == nil {
continue
}

netProvider, provErr := providerForVM(conf, cniProvider, bridgeProviders, vm.NetworkConfigs)
backend := vm.ResolvedNetBackend()
if backend == "" {
continue
}
// Bridge 0-NIC: no TAP, no netns — nothing to recover.
if backend == types.BackendBridge && len(vm.NetworkConfigs) == 0 {
continue
}
netProvider, provErr := providerForVM(conf, cniProvider, bridgeProviders, vm)
if provErr != nil {
logger.Warnf(ctx, "skip recovery for VM %s: %v", vm.ID, provErr)
continue
@@ -386,31 +393,37 @@ func (h Handler) recoverNetwork(ctx context.Context, conf *config.Config, hyper
continue
}
logger.Warnf(ctx, "network missing for VM %s, recovering", vm.ID)
if _, prepErr := netProvider.Prepare(ctx, vm.ID, &vm.Config); prepErr != nil {
logger.Warnf(ctx, "prepare netns for VM %s: %v (start will fail)", vm.ID, prepErr)
continue
}
if len(vm.NetworkConfigs) == 0 {
continue
}
if _, recoverErr := netProvider.Add(ctx, vm.ID, &vm.Config, network.AddRecover(vm.NetworkConfigs)...); recoverErr != nil {
logger.Warnf(ctx, "recover network for VM %s: %v (start will fail)", vm.ID, recoverErr)
}
}
}

// providerForVM picks the network provider from persisted NetworkConfig; cniProvider may be nil (lazy-init), bridgeCache must be non-nil.
func providerForVM(conf *config.Config, cniProvider network.Network, bridgeCache map[string]network.Network, configs []*types.NetworkConfig) (network.Network, error) {
if len(configs) == 0 {
return nil, fmt.Errorf("no network configs")
// providerForVM picks the provider from VM state. cniProvider may be nil; bridgeCache must be non-nil.
func providerForVM(conf *config.Config, cniProvider network.Network, bridgeCache map[string]network.Network, vm *types.VM) (network.Network, error) {
if vm == nil {
return nil, fmt.Errorf("no VM record")
}
// All NICs on a VM share the same backend.
cfg := configs[0]
if cfg.Backend == types.BackendBridge {
if cfg.BridgeDev == "" {
if vm.ResolvedNetBackend() == types.BackendBridge {
dev := vm.ResolvedNetBridgeDev()
if dev == "" {
return nil, fmt.Errorf("bridge backend but no bridge device persisted")
}
if cached, ok := bridgeCache[cfg.BridgeDev]; ok {
if cached, ok := bridgeCache[dev]; ok {
return cached, nil
}
p, err := cmdcore.InitBridgeNetwork(conf, cfg.BridgeDev)
p, err := cmdcore.InitBridgeNetwork(conf, dev)
if err != nil {
return nil, err
}
bridgeCache[cfg.BridgeDev] = p
bridgeCache[dev] = p
return p, nil
}
// "cni" or empty (backward compat).
Expand Down
16 changes: 10 additions & 6 deletions cmd/vm/netresize.go
@@ -22,7 +22,7 @@ func (h Handler) NetResize(cmd *cobra.Command, args []string) error {
if err != nil {
return fmt.Errorf("vm net: %w", err)
}
plumbing, err := plumbingForVM(conf, vm.NetworkConfigs)
plumbing, err := plumbingForVM(conf, vm)
if err != nil {
return fmt.Errorf("vm net: %w", err)
}
@@ -43,10 +43,14 @@ func (h Handler) NetResize(cmd *cobra.Command, args []string) error {
return nil
}

// plumbingForVM picks the provider matching the VM's existing NICs; zero NICs is fatal (use `vm clone --nics N` instead).
func plumbingForVM(conf *config.Config, configs []*types.NetworkConfig) (network.Network, error) {
if len(configs) == 0 {
return nil, fmt.Errorf("zero NICs; resize up not supported (use `vm clone --nics N` instead)")
// plumbingForVM picks the provider from persisted VM state; 0-NIC works because NetBackend persists.
func plumbingForVM(conf *config.Config, vm *types.VM) (network.Network, error) {
backend := vm.ResolvedNetBackend()
if backend == "" {
return nil, fmt.Errorf("no network backend on VM; cannot resize")
}
return providerForVM(conf, nil, map[string]network.Network{}, configs)
if backend == types.BackendCNI && vm.ResolvedNetnsPath() == "" {
return nil, fmt.Errorf("CNI backend but no netns; resize would target host netns")
}
return providerForVM(conf, nil, map[string]network.Network{}, vm)
}
72 changes: 42 additions & 30 deletions cmd/vm/run.go
@@ -122,14 +122,14 @@ func (h Handler) Clone(cmd *cobra.Command, args []string) error {
})
defer stop()

vmCfg, vmID, netProvider, networkConfigs, err := h.prepareClone(ctx, cmd, conf, cfg)
vmCfg, vmID, netProvider, netSetup, err := h.prepareClone(ctx, cmd, conf, cfg)
if err != nil {
return err
}

logger.Infof(ctx, "cloning VM from snapshot %s ...", snapRef)

vm, cloneErr := hyper.Clone(ctx, vmID, vmCfg, networkConfigs, &cfg, stream)
vm, cloneErr := hyper.Clone(ctx, vmID, vmCfg, netSetup, &cfg, stream)
if cloneErr != nil {
rollbackNetwork(ctx, netProvider, vmID)
return fmt.Errorf("clone VM: %w", cloneErr)
Expand All @@ -139,7 +139,7 @@ func (h Handler) Clone(cmd *cobra.Command, args []string) error {
return jsonErr
}
logger.Infof(ctx, "VM cloned: %s (name: %s)", vm.ID, vm.Config.Name)
printPostCloneHints(vm, networkConfigs)
printPostCloneHints(vm)
return nil
}

@@ -285,7 +285,7 @@ func (h Handler) cloneFromDir(ctx context.Context, cmd *cobra.Command, conf *con
}

func (h Handler) cloneFromSrcDir(ctx context.Context, cmd *cobra.Command, conf *config.Config, dcr hypervisor.Direct, cfg types.SnapshotConfig, srcDir, sourceLabel string, logger *log.Fields) error {
vmCfg, vmID, netProvider, networkConfigs, err := h.prepareClone(ctx, cmd, conf, cfg)
vmCfg, vmID, netProvider, netSetup, err := h.prepareClone(ctx, cmd, conf, cfg)
if err != nil {
return err
}
@@ -295,7 +295,7 @@ func (h Handler) cloneFromSrcDir(ctx context.Context, cmd *cobra.Command, conf *
logger.Infof(ctx, "cloning VM from %s ...", sourceLabel)
}

vm, cloneErr := dcr.DirectClone(ctx, vmID, vmCfg, networkConfigs, &cfg, srcDir)
vm, cloneErr := dcr.DirectClone(ctx, vmID, vmCfg, netSetup, &cfg, srcDir)
if cloneErr != nil {
rollbackNetwork(ctx, netProvider, vmID)
return fmt.Errorf("clone VM: %w", cloneErr)
@@ -305,7 +305,7 @@ func (h Handler) cloneFromSrcDir(ctx context.Context, cmd *cobra.Command, conf *
return cmdcore.OutputJSON(vm)
}
logger.Infof(ctx, "VM cloned: %s (name: %s)", vm.ID, vm.Config.Name)
printPostCloneHints(vm, networkConfigs)
printPostCloneHints(vm)
return nil
}

@@ -328,24 +328,24 @@ func snapshotSource(cmd *cobra.Command, args []string, baseArgs int) (string, st
return "", args[baseArgs], nil
}

func (h Handler) prepareClone(ctx context.Context, cmd *cobra.Command, conf *config.Config, cfg types.SnapshotConfig) (*types.VMConfig, string, network.Network, []*types.NetworkConfig, error) {
func (h Handler) prepareClone(ctx context.Context, cmd *cobra.Command, conf *config.Config, cfg types.SnapshotConfig) (*types.VMConfig, string, network.Network, types.NetSetup, error) {
vmCfg, err := cmdcore.CloneVMConfigFromFlags(cmd, cfg)
if err != nil {
return nil, "", nil, nil, err
return nil, "", nil, types.NetSetup{}, err
}
vmID := utils.GenerateID()
if vmCfg.Name == "" {
vmCfg.Name = "cocoon-clone-" + network.VMIDPrefix(vmID)
}
if err = vmCfg.Validate(); err != nil {
return nil, "", nil, nil, err
return nil, "", nil, types.NetSetup{}, err
}

// Auto-pull base image if --pull is set (cross-node clone).
if pull, _ := cmd.Flags().GetBool("pull"); pull && vmCfg.Image != "" && vmCfg.ImageType != "" {
backends, initErr := cmdcore.InitImageBackends(ctx, conf)
if initErr != nil {
return nil, "", nil, nil, fmt.Errorf("init image backends: %w", initErr)
return nil, "", nil, types.NetSetup{}, fmt.Errorf("init image backends: %w", initErr)
}
cmdcore.EnsureImage(ctx, backends, vmCfg)
}
@@ -354,16 +354,16 @@ func (h Handler) prepareClone(ctx context.Context, cmd *cobra.Command, conf *con
nics := cfg.NICs
if cmd.Flags().Changed("nics") {
if conf.UseFirecracker {
return nil, "", nil, nil, fmt.Errorf("--nics override on clone is Cloud Hypervisor only (FC network_overrides retargets existing NICs, not resize)")
return nil, "", nil, types.NetSetup{}, fmt.Errorf("--nics override on clone is Cloud Hypervisor only (FC network_overrides retargets existing NICs, not resize)")
}
nics, _ = cmd.Flags().GetInt("nics")
}
netProvider, networkConfigs, err := initNetwork(ctx, conf, vmID, nics, vmCfg, tapQueues(vmCfg.CPU, conf.UseFirecracker), bridgeDev)
netProvider, netSetup, err := initNetwork(ctx, conf, vmID, nics, vmCfg, tapQueues(vmCfg.CPU, conf.UseFirecracker), bridgeDev)
if err != nil {
return nil, "", nil, nil, err
return nil, "", nil, types.NetSetup{}, err
}

return vmCfg, vmID, netProvider, networkConfigs, nil
return vmCfg, vmID, netProvider, netSetup, nil
}

func (h Handler) restoreDirect(ctx context.Context, cmd *cobra.Command, snapRef, vmRef string, vmCfg *types.VMConfig, snapBackend snapshot.Snapshot, hyper hypervisor.Hypervisor, logger *log.Fields) (bool, error) {
@@ -448,12 +448,12 @@ func (h Handler) createVM(cmd *cobra.Command, image string) (context.Context, *t
vmID := utils.GenerateID()

nics, _ := cmd.Flags().GetInt("nics")
netProvider, networkConfigs, err := initNetwork(ctx, conf, vmID, nics, vmCfg, tapQueues(vmCfg.CPU, conf.UseFirecracker), bridgeDev)
netProvider, netSetup, err := initNetwork(ctx, conf, vmID, nics, vmCfg, tapQueues(vmCfg.CPU, conf.UseFirecracker), bridgeDev)
if err != nil {
return nil, nil, nil, err
}

info, createErr := hyper.Create(ctx, vmID, vmCfg, storageConfigs, networkConfigs, bootCfg)
info, createErr := hyper.Create(ctx, vmID, vmCfg, storageConfigs, netSetup, bootCfg)
if createErr != nil {
rollbackNetwork(ctx, netProvider, vmID)
return nil, nil, nil, fmt.Errorf("create VM: %w", createErr)
@@ -469,10 +469,7 @@ func tapQueues(cpu int, useFC bool) int {
return cpu
}

func initNetwork(ctx context.Context, conf *config.Config, vmID string, nics int, vmCfg *types.VMConfig, queues int, bridgeDev string) (network.Network, []*types.NetworkConfig, error) {
if nics <= 0 {
return nil, nil, nil
}
func initNetwork(ctx context.Context, conf *config.Config, vmID string, nics int, vmCfg *types.VMConfig, queues int, bridgeDev string) (network.Network, types.NetSetup, error) {
var netProvider network.Network
var err error
if bridgeDev != "" {
@@ -481,18 +478,33 @@ func initNetwork(ctx context.Context, conf *config.Config, vmID string, nics int
netProvider, err = cmdcore.InitNetwork(conf)
}
if err != nil {
return nil, nil, fmt.Errorf("init network: %w", err)
return nil, types.NetSetup{}, fmt.Errorf("init network: %w", err)
}
nsPath, err := netProvider.Prepare(ctx, vmID, vmCfg)
if err != nil {
rollbackNetwork(ctx, netProvider, vmID)
return nil, types.NetSetup{}, fmt.Errorf("prepare network: %w", err)
}
// Override CPU for TAP queue count — FC uses single-queue, CH uses per-vCPU queues.
// The network layer derives TAP queues from vmCfg.CPU.
backend := netProvider.Type()
// CNI no-conflist + 0 NICs runs in host netns; empty backend so resize won't mispick CNI.
if nics <= 0 && backend == types.BackendCNI && nsPath == "" {
return netProvider, types.NetSetup{}, nil
}
setup := types.NetSetup{NetBackend: backend, NetnsPath: nsPath, NetBridgeDev: bridgeDev}
if nics <= 0 {
return netProvider, setup, nil
}
// Override CPU for TAP queue count (FC=1, CH=per-vCPU); network reads vmCfg.CPU.
origCPU := vmCfg.CPU
vmCfg.CPU = queues
configs, err := netProvider.Add(ctx, vmID, vmCfg, network.AddRange(0, nics)...)
vmCfg.CPU = origCPU
if err != nil {
return nil, nil, fmt.Errorf("configure network: %w", err)
rollbackNetwork(ctx, netProvider, vmID)
return nil, types.NetSetup{}, fmt.Errorf("configure network: %w", err)
}
return netProvider, configs, nil
setup.NetworkConfigs = configs
return netProvider, setup, nil
}

func rollbackNetwork(ctx context.Context, netProvider network.Network, vmID string) {
Expand All @@ -504,7 +516,7 @@ func rollbackNetwork(ctx context.Context, netProvider network.Network, vmID stri
}
}

func printPostCloneHints(vm *types.VM, networkConfigs []*types.NetworkConfig) {
func printPostCloneHints(vm *types.VM) {
if vm.Config.Windows {
fmt.Println()
fmt.Println("Windows clone: NICs hot-swapped with new MAC addresses.")
@@ -530,7 +542,7 @@ func printPostCloneHints(vm *types.VM, networkConfigs []*types.NetworkConfig) {
// FC clone: guest MAC is baked in vmstate (source VM's MAC).
// Must change guest MAC before networkd config takes effect.
if vm.Hypervisor == string(config.HypervisorFirecracker) {
printFCMACHints(networkConfigs)
printFCMACHints(vm.NetworkConfigs)
}

fmt.Println()
@@ -540,7 +552,7 @@ func printPostCloneHints(vm *types.VM, networkConfigs []*types.NetworkConfig) {
if isCloudimg {
printCloudimgNetworkHints()
} else {
printOCINetworkHints(vm, networkConfigs)
printOCINetworkHints(vm)
}
fmt.Println()
}
@@ -563,14 +575,14 @@ func printCloudimgNetworkHints() {
fmt.Println(" cloud-init modules --mode=config && systemctl restart systemd-networkd")
}

func printOCINetworkHints(vm *types.VM, networkConfigs []*types.NetworkConfig) {
func printOCINetworkHints(vm *types.VM) {
fmt.Println()
fmt.Printf(" # Set hostname\n")
fmt.Printf(" hostnamectl set-hostname %s\n", vm.Config.Name)

var staticNICs []nicHint
var dhcpMACs []string
for _, nc := range networkConfigs {
for _, nc := range vm.NetworkConfigs {
if nc == nil || nc.MAC == "" {
continue
}
8 changes: 5 additions & 3 deletions cmd/vm/status_test.go
@@ -62,9 +62,11 @@ func TestVMIPsAndSort(t *testing.T) {
},
},
CreatedAt: now.Add(-time.Minute),
NetworkConfigs: []*types.NetworkConfig{
{Network: &types.Network{IP: "10.0.0.2"}},
{Network: &types.Network{IP: "10.0.0.3"}},
NetSetup: types.NetSetup{
NetworkConfigs: []*types.NetworkConfig{
{Network: &types.Network{IP: "10.0.0.2"}},
{Network: &types.Network{IP: "10.0.0.3"}},
},
},
},
}
2 changes: 1 addition & 1 deletion hypervisor/backend.go
@@ -101,7 +101,7 @@ type DirectRestoreSpec struct {
type CreateSpec struct {
VMCfg *types.VMConfig
StorageConfigs []*types.StorageConfig
NetworkConfigs []*types.NetworkConfig
Net types.NetSetup
BootConfig *types.BootConfig
Prepare func(ctx context.Context, vmID string, vmCfg *types.VMConfig, storageConfigs []*types.StorageConfig, networkConfigs []*types.NetworkConfig, boot *types.BootConfig) ([]*types.StorageConfig, error)
}
12 changes: 6 additions & 6 deletions hypervisor/clone.go
@@ -39,9 +39,9 @@ func (b *Backend) CloneSetup(ctx context.Context, vmID string, vmCfg *types.VMCo
// snapshot lives on the same host (no tar streaming needed).
func (b *Backend) DirectCloneBase(
ctx context.Context, vmID string, vmCfg *types.VMConfig,
networkConfigs []*types.NetworkConfig, snapshotConfig *types.SnapshotConfig, srcDir string,
net types.NetSetup, snapshotConfig *types.SnapshotConfig, srcDir string,
cloneFiles func(dstDir, srcDir string) error,
afterExtract func(ctx context.Context, vmID string, vmCfg *types.VMConfig, networkConfigs []*types.NetworkConfig, runDir, logDir string, now time.Time) (*types.VM, error),
afterExtract func(ctx context.Context, vmID string, vmCfg *types.VMConfig, net types.NetSetup, runDir, logDir string, now time.Time) (*types.VM, error),
) (_ *types.VM, err error) {
runDir, logDir, now, cleanup, err := b.CloneSetup(ctx, vmID, vmCfg, snapshotConfig)
if err != nil {
@@ -57,15 +57,15 @@ func (b *Backend) DirectCloneBase(
return nil, fmt.Errorf("clone snapshot files: %w", err)
}

return afterExtract(ctx, vmID, vmCfg, networkConfigs, runDir, logDir, now)
return afterExtract(ctx, vmID, vmCfg, net, runDir, logDir, now)
}

// CloneFromStream clones from a tar stream into a fresh runDir. Used when
// the snapshot arrives over the network (cross-node clone).
func (b *Backend) CloneFromStream(
ctx context.Context, vmID string, vmCfg *types.VMConfig,
networkConfigs []*types.NetworkConfig, snapshotConfig *types.SnapshotConfig, snapshot io.Reader,
afterExtract func(ctx context.Context, vmID string, vmCfg *types.VMConfig, networkConfigs []*types.NetworkConfig, runDir, logDir string, now time.Time) (*types.VM, error),
net types.NetSetup, snapshotConfig *types.SnapshotConfig, snapshot io.Reader,
afterExtract func(ctx context.Context, vmID string, vmCfg *types.VMConfig, net types.NetSetup, runDir, logDir string, now time.Time) (*types.VM, error),
) (_ *types.VM, err error) {
runDir, logDir, now, cleanup, err := b.CloneSetup(ctx, vmID, vmCfg, snapshotConfig)
if err != nil {
@@ -81,7 +81,7 @@ func (b *Backend) CloneFromStream(
return nil, fmt.Errorf("extract snapshot: %w", err)
}

return afterExtract(ctx, vmID, vmCfg, networkConfigs, runDir, logDir, now)
return afterExtract(ctx, vmID, vmCfg, net, runDir, logDir, now)
}

// FinalizeClone updates the cloned VM's record in place after restore-and-resume.