
fix(metadata): netplan name fallback for cloudimg clone DHCP recovery #34

Merged
CMGS merged 1 commit into master from chore/clone-recovery-cidata
May 7, 2026

@CMGS commented on May 7, 2026

Summary

A 5-line addition to the metadata template: append a `zfallback` netplan ethernet block (matched by name `e*`, with `dhcp4: true`) after the per-NIC, MAC-keyed `idN` blocks.

systemd-networkd evaluates `.network` files in lexicographic order, and the first file whose match conditions fit an interface wins; later matches are ignored. The `z` prefix sorts `10-netplan-zfallback.network` after every MAC-matched `10-netplan-id*.network` file, so an `idN` file is always picked first when its MAC matches. The name-matched fallback only kicks in when none of the `idN` files match, i.e. on a clone whose new MAC isn't in the snapshot's netplan config.
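The first-match rule can be modeled in a few lines. The sketch below is a simplified, hypothetical model of the selection, not systemd-networkd's actual code: it handles only exact MAC matches and name globs, uses Go's `path.Match` as a stand-in for the shell-style globbing networkd performs, and compares MACs as exact strings. The interface names and MACs are made up for illustration.

```go
package main

import (
	"fmt"
	"path"
	"sort"
)

// networkFile models only the [Match] conditions used in this PR.
type networkFile struct {
	Name      string // file name, e.g. "10-netplan-id0.network"
	MatchMAC  string // exact MAC match; empty means "any MAC"
	MatchName string // glob on the interface name; empty means "any name"
}

type iface struct {
	Name string
	MAC  string
}

// matches reports whether every configured condition holds for i.
func (f networkFile) matches(i iface) bool {
	if f.MatchMAC != "" && f.MatchMAC != i.MAC {
		return false
	}
	if f.MatchName != "" {
		ok, err := path.Match(f.MatchName, i.Name)
		if err != nil || !ok {
			return false
		}
	}
	return true
}

// selectNetworkFile sorts files by name and returns the first that matches.
func selectNetworkFile(files []networkFile, i iface) (string, bool) {
	sorted := append([]networkFile(nil), files...)
	sort.Slice(sorted, func(a, b int) bool { return sorted[a].Name < sorted[b].Name })
	for _, f := range sorted {
		if f.matches(i) {
			return f.Name, true
		}
	}
	return "", false
}

func main() {
	files := []networkFile{
		{Name: "10-netplan-zfallback.network", MatchName: "e*"},
		{Name: "10-netplan-id0.network", MatchMAC: "52:54:00:aa:bb:cc"},
	}
	src := iface{Name: "enp0s3", MAC: "52:54:00:aa:bb:cc"}   // snapshot's MAC
	clone := iface{Name: "enp0s3", MAC: "52:54:00:dd:ee:ff"} // new MAC after clone
	for _, i := range []iface{src, clone} {
		name, _ := selectNetworkFile(files, i)
		fmt.Printf("%s -> %s\n", i.MAC, name)
	}
}
```

With the source MAC, `10-netplan-id0.network` wins; a clone with a new MAC on the same `enp0s3` name falls through to `10-netplan-zfallback.network`. An interface whose name does not start with `e` would match neither file in this model.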

Effect

  • Source VMs: no change. Their MAC matches `idN.match.macaddress`, so the `idN` file takes precedence over zfallback.
  • DHCP-network cloudimg clones: previously had no working network (netplan was still keyed to the source MAC, with no fallback). Now zfallback DHCPs the new interface automatically, so the clone self-heals without manual hints.
  • Static-IP cloudimg clones: unchanged. zfallback requests DHCP, but cocoon static networks have no DHCP server, so the fallback cannot obtain an address. These clones still require console-driven hints (`cloud-init clean + reapply`). This is a network-mode limitation, not a software gap, and is out of scope for this PR.

Verified on testbed (35.240.182.52, cocoon master + this fix)

| Backend | Image | Network | Result |
|---|---|---|---|
| CH | ubuntu cloudimg 22.04 | cocoon (static) | ✅ src OK, clone unreachable post-clone (expected: no DHCP server to fall back to) |
| CH | ubuntu cloudimg 22.04 | dnsmasq-dhcp | ✅ src OK; clone DHCPs at 10.99.0.69 via zfallback; reboot persists |
| CH | ubuntu OCI 24.04 | cocoon (static) | ✅ regression: src+clone+hints+reboot all green (20/20) |
| CH | ubuntu OCI 24.04 | dnsmasq-dhcp | ✅ regression: src+clone+hints+reboot all green |

What was reverted from earlier exploration

I initially also added `r.FirstBooted = false` to `FinalizeClone` and `dbus` to the OCI Dockerfiles, but:

  • `FirstBooted = true` on a clone is intentional by design: the clone is already booted after restore (it's running). Reverted.
  • `dbus` is already pulled in transitively by `systemd` in the existing OCI images. The "Failed to connect to bus" warning seen in earlier tests was a timing race (cocoon-agent invoking `hostnamectl` before `dbus.socket` was ready), not a missing package. Reverted.

So the final diff is exactly 5 lines.

Test plan

  • go test ./metadata/...
  • go test ./...
  • make lint (linux + darwin)
  • testbed regression: ubuntu OCI 24.04 static + dhcp (20/20)
  • testbed cloudimg 22.04 dhcp: clone auto-recovers via zfallback
  • testbed cloudimg 22.04 static: src OK, clone documented limitation confirmed
  • CI build-os-images
  • (Optional) confirm zfallback doesn't break Windows cloudimg or other guest distros. Windows uses a different network-config emission path, which is not exercised here.

…covery

Append a "zfallback" netplan ethernet block (match by name=e*, dhcp4:true)
after the per-NIC MAC-keyed ones. The lexicographic 'z' prefix ensures
systemd-networkd processes the MAC-matched 10-netplan-id*.network files
first; the name-matched 10-netplan-zfallback.network only kicks in when
no idN matches — i.e. on a clone whose new MAC isn't in the snapshot's
netplan. DHCP-network cloudimg clones now self-heal at next boot;
static-IP cloudimg clones remain unchanged (no DHCP server to fall back
to is a network-mode limitation, not a software gap).
@CMGS merged commit be35341 into master on May 7, 2026
4 checks passed
@CMGS deleted the chore/clone-recovery-cidata branch on May 7, 2026 02:25