appliance primitives#19

Merged
solsson merged 3 commits into main from appliance-primitives on May 4, 2026

@solsson solsson commented May 4, 2026

stuff we used to do in bash

Yolean k8s-qa and others added 3 commits May 4, 2026 14:22
  - lifecycle subcommands: pause / resume / stop / start, plus
    graceful Stop on docker / multipass / qemu (qemu via ssh
    poweroff with SIGKILL fallback).
  - cross-provisioner preflight: host-port and kubeconfig-context
    collision checks before provision.
  - prepare-export via virt-customize + a shared in-guest script
    (identity reset, netplan generic-NIC match, cloud-init clean,
    fstrim, systemd-timesyncd enable). Same script also runs
    inline on Hetzner Packer.
  - export <bundle-dir> --format=qcow2|raw|vmdk|ova|gcp-tar.
    --vmdk-subformat picks streamOptimized (ESXi) vs
    monolithicSparse (VirtualBox-friendly). gcp-tar produces the
    tar.gz-of-disk.raw layout Compute Engine custom images
    expect, with FormatGNU pinning to avoid PAX header rejection
    for >8 GiB disks.
  - bundled local-path-provisioner with appliance defaults
    (path /data/yolean, namespace_name pattern, Retain reclaim).
    Wired into qemu / docker / multipass at provision time.
  - bundled echo workload using the envoy echo filter.
  - state sidecar so cluster.Lookup picks the right qemu SSH
    port (the field already landed in #18; this commit adds the
    sidecar that produces it).
  - first-boot data-seed via y-cluster-data-seed.service so an
    appliance shipped with /data/yolean populated re-creates that
    data on the customer's empty drive at first boot. Marker-
    based four-state decision (no-op / seed / conflict-bail);
    k3s.service Requires= drop-in blocks startup on conflict.
  - manifests staging: `y-cluster manifests add <name> <path|->`
    writes to /var/lib/y-cluster/manifests-staging/ on the
    cluster node; prepare-export moves the staged dir to k3s's
    auto-apply path so the customer's first boot runs them.
  - cluster.RunShell helper for arbitrary node command execution
    (used by manifests-add; available to future subcommands).
  - appliance-stateful test fixture split into yconverge namespace
    + workload modules to demonstrate the cue dep contract.

Test coverage: every Go package has unit tests; embedded shell
scripts (data_seed_check.sh) have driver tests that exercise
the marker / conflict / no-op state machine. e2e/qemu_test.go
exercises the full provision -> prepare-export -> export flow.
The bash e2e drivers under scripts/ are NOT in this PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three issues caught by golangci-lint v2.11.4 on PR #19:

- pkg/provision/docker/docker.go: defer cli.Close() drops the
  error return. Wrap in a func to silence errcheck, matching the
  pattern already used elsewhere in the file (CheckPrerequisites,
  Provision).
- pkg/cluster/lookup.go: ResolveClusterName passed a literal nil
  to readClusterName's context.Context parameter (SA1012).
  context.TODO() is the documented placeholder when the caller
  has no real ctx to plumb.
- pkg/provision/qemu/export.go: convert the if/else if/else chain
  on opts.Format into a tagged switch (QF1003). Pure refactor.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
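
The three lint fixes in the commit above follow well-known Go patterns. The sketch below shows each one in isolation; it is not the repository's code. fakeClient stands in for the Docker client, the bodies of readClusterName and resolveClusterName are invented for illustration, and the extension mapping in extensionFor is an assumption (only the format names come from the export bullet).

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// fakeClient is a hypothetical stand-in for a client with Close() error.
type fakeClient struct{ closed bool }

func (c *fakeClient) Close() error { c.closed = true; return nil }

// errcheck: a bare `defer c.Close()` silently drops the error return.
// Wrapping it in a func makes the discard explicit.
func useClient(c *fakeClient) {
	defer func() { _ = c.Close() }()
	// ... work with the client ...
}

// readClusterName is a hypothetical callee that requires a non-nil context.
func readClusterName(ctx context.Context, name string) (string, error) {
	if err := ctx.Err(); err != nil {
		return "", err
	}
	if name == "" {
		return "", errors.New("empty cluster name")
	}
	return name, nil
}

// SA1012: pass context.TODO() rather than a literal nil when the caller
// has no real context to plumb through yet.
func resolveClusterName(name string) (string, error) {
	return readClusterName(context.TODO(), name)
}

// QF1003: an if/else-if chain comparing one value against constants reads
// better as a tagged switch. The extensions here are illustrative only.
func extensionFor(format string) (string, error) {
	switch format {
	case "qcow2":
		return ".qcow2", nil
	case "raw":
		return ".raw", nil
	case "vmdk":
		return ".vmdk", nil
	case "ova":
		return ".ova", nil
	case "gcp-tar":
		return ".tar.gz", nil
	default:
		return "", fmt.Errorf("unsupported format %q", format)
	}
}

func main() {
	c := &fakeClient{}
	useClient(c)
	fmt.Println("client closed:", c.closed)

	name, err := resolveClusterName("appliance")
	fmt.Println(name, err)

	ext, err := extensionFor("vmdk")
	fmt.Println(ext, err)
}
```
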
Two gaps in the appliance-primitives PR's own coverage that the
existing e2e suite does not address:

- pkg/provision/docker.Stop ships with no e2e gate. TestDocker_Stop
  proves it preserves the container as exited (not removed) and
  that the host-side apiserver port forward dies with the
  container -- mirroring TestQemu_StopStart's regression posture
  for the docker half of the appliance lifecycle.
- testdata/appliance-stateful/ ships referenced only by an
  export.go comment. TestDocker_ApplianceStateful makes the
  fixture useful: provision -> yconverge the namespace + base
  modules -> assert the StatefulSet rolls out (via the cue
  rollout check) and the PVC binds against k3s's bundled
  local-path provisioner. Closes the localstorage runtime gap
  that pkg/provision/localstorage/install_test.go can't reach.

Both are //go:build e2e && docker, run on ubuntu-latest in CI's
existing e2e job. Local runtime ~70s + ~80s respectively.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

solsson commented May 4, 2026

Two coverage additions pushed to this branch as 5e4830f:

  • TestDocker_Stop: exercises the new pkg/provision/docker.Stop (preserves the container as exited, breaks the host port forward).
  • TestDocker_ApplianceStateful: converges testdata/appliance-stateful/ end-to-end and asserts that the StatefulSet rolls out and the PVC binds via k3s's local-path provisioner. Without this test the fixture ships unused.

Both pass locally.

A few open questions / follow-up items I'd flag rather than block on:

Q1 (asymmetry). pkg/provision/qemu/lifecycle.go exports both Start and Stop; pkg/provision/docker/docker.go:423 exports only Stop. Is the docker Start half intentionally deferred (the docstring on docker.Stop says "the y-cluster equivalent isn't wired yet for the docker backend"), or worth filing as a tracking issue so the lifecycle is symmetric across providers? TestDocker_Stop will need a _StopStart sibling once Start lands.

Q2 (released-binary smoke for new subcommands). The new start, stop, manifests, localstorage, and echo subcommands are unit-tested in-process but never exercised through the released binary. This is the same risk class as #16: bugs that only manifest in the released-binary-from-bash invocation. Worth a follow-up CI job that downloads the freshly built linux/amd64 artifact and smoke-runs each new subcommand. Cheap, since it reuses the artifact that build (linux, amd64) already uploads.

Q3 (qemu export structural validation on cron). TestQemu_ExportImport and the new export unit tests cover the format invariants, but no GHA-hosted lane verifies that the produced .ova/.vmdk/.raw/.gcp-tar artifacts are structurally valid (qemu-img info, tar -tvf, OVF schema check). A nightly cron job using TCG (slow, but no /dev/kvm needed) would catch artifact-shape regressions without requiring real ESXi/cloud targets. Boot-testing against real targets is out of scope for GHA-hosted runners and would need cloud credentials.

Q4 (preflight at provision time). preflight_test.go covers the matrix of failure modes in isolation. Are the preflight checks actually invoked at the start of Provision (docker / qemu / multipass)? If yes, a one-line comment in the test would make that obvious; if no, that's a wiring gap.


solsson commented May 4, 2026

replies from maintainer agent:

Q1 (docker Start). Not deferred for any architectural reason, just unimplemented — file as a tracking issue and TestDocker_Stop grows a _StopStart sibling once it lands.

Q2 (released-binary smoke). Agree — same risk class as #16. Cheap follow-up CI job that downloads the existing build (linux, amd64) artifact and --help / dry-runs each new subcommand. Tracking issue.

Q3 (export structural validation on cron). Agree. Nightly TCG lane, no /dev/kvm: qemu-img info on .qcow2/.raw/.vmdk, tar -tvf on .ova/.gcp-tar, OVF schema check on the .ova manifest. Not boot-test (that needs real targets). Tracking issue.

Q4 (preflight at provision time). Already wired:

  • pkg/provision/docker/docker.go:118 — HostPorts + ContextName + ContextCluster + KubeconfigPath, before kubeconfig and CheckPrerequisites.
  • pkg/provision/qemu/qemu.go:210 — same shape, HostPorts via preflightHostPorts(cfg).
  • pkg/provision/multipass/multipass.go:123 — context check only (multipass binds on the bridge, not host ports).

The gap is purely documentation — preflight_test.go doesn't link to its callers. Happy to push a one-line // Wired into <pkg>.Provision header on this branch before merge.

@solsson solsson merged commit c2fc073 into main May 4, 2026
11 checks passed