Appliance primitives followup #21
Merged
Conversation
…E doc
Two squashed commits:
- adds APPLIANCE_MAINTENANCE.md (the appliance maintenance design doc)
- implements the mount-required seed gate (so the data-seed unit
fails fast when /data/yolean is not yet mounted), the cloud-init
bypass on customer first boot, and the aligned k3s.service drop-in.
Files:
- pkg/provision/qemu/data_seed.{service,go} + data_seed_check.sh +
k3s_data_seed.conf -- the seed unit + check script + k3s drop-in
- pkg/provision/qemu/prepare_inguest.sh -- cloud-init bypass
- pkg/provision/qemu/lifecycle.go + prepare_export_test.go +
data_seed_test.go -- coverage and diagnostic-start path
- cmd/y-cluster/manifests.go -- staged-manifest auto-apply at
customer first boot
- e2e/qemu_test.go -- e2e cover for the mount-required gate
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a typed gateway check alongside wait / rollout / exec. Solves
the false-positive class that motivated specs/y-cluster/
FEATURE_REQUEST_HTTP_CHECK.md: an `exec curl ... | grep 302`
silently passes against the wrong cluster when /etc/hosts or
upstream DNS resolves the configured hostname to a remote IP that
also happens to redirect.
Why "gateway" and not "http"
The check's value comes from auto-discovering the local Gateway's
programmed address (Gateway.status.addresses) and pinning curl's
dial target via --resolve, so the request actually traverses
Gateway -> HTTPRoute -> backend instead of whatever the host's
resolver thinks the hostname means. Calling that "http" would
invite authors to expect a generic curl wrapper. Naming it
"gateway" makes the contract explicit and lets the runtime stay
opinionated.
How it runs
- discoverGatewayAddress shells out to `kubectl get gateway -A
-o json`, filters by gatewayClassName (or accepts any if
empty), picks the first programmed Status.Addresses[].Value.
- runGatewayProbe launches `kubectl run --rm --restart=Never
--image=curlimages/curl:8.10.1` with curl args:
--resolve <host>:<port>:<gateway-addr>
-H "Host: <host>"
-w 'HTTP_CODE:%{http_code}\nLOCATION:%{redirect_url}\n'
The Pod runs in the cluster's network so cluster DNS / Service
routing reflect the real consumer-side path. --resolve forces
the dial target so the probe can't accidentally hit prod even
if upstream DNS resolves the hostname to a public IP.
- parseGatewayProbeOutput + validateGatewayProbeResult are
pure functions on the curl -w output; unit tests cover them
without spinning up kubectl.
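A minimal sketch of what that parse half could look like, assuming a
small result struct with Code and Location fields -- the package,
struct, and field names here are illustrative, not necessarily the
types the PR actually uses:

    package checks // illustrative package name

    import (
        "errors"
        "fmt"
        "strconv"
        "strings"
    )

    // gatewayProbeResult holds the two values the -w template emits.
    type gatewayProbeResult struct {
        Code     int    // value of the HTTP_CODE: line
        Location string // value of the LOCATION: line; empty when there was no redirect
    }

    // parseGatewayProbeOutput scans the combined pod output for the
    // HTTP_CODE: and LOCATION: markers and ignores everything else
    // (response body, kubectl run chatter).
    func parseGatewayProbeOutput(out string) (gatewayProbeResult, error) {
        var res gatewayProbeResult
        foundCode := false
        for _, line := range strings.Split(out, "\n") {
            switch {
            case strings.HasPrefix(line, "HTTP_CODE:"):
                code, err := strconv.Atoi(strings.TrimSpace(strings.TrimPrefix(line, "HTTP_CODE:")))
                if err != nil {
                    return res, fmt.Errorf("malformed HTTP_CODE line %q: %w", line, err)
                }
                res.Code = code
                foundCode = true
            case strings.HasPrefix(line, "LOCATION:"):
                res.Location = strings.TrimSpace(strings.TrimPrefix(line, "LOCATION:"))
            }
        }
        if !foundCode {
            return res, errors.New("probe output contained no HTTP_CODE line")
        }
        return res, nil
    }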
Cue surface
#Gateway lands as the fourth variant of #Check:
kind: "gateway"
url: string
expectCode: *[200] | [...int]
expectLocation?: string
resolve?: string
gatewayClassName?: string
timeout: *"60s" | string
description: *"" | string
expectCode is always a list -- single-status callers write
[302]. expectLocation is a Go regexp matched against the
response Location header (the canonical use case: pin the
oauth2 redirect realm).
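A minimal sketch of how that validation could look in Go, reusing the
illustrative gatewayProbeResult from the parse sketch above
(additionally needs "regexp"); the real signature in the PR may
differ:

    // validateGatewayProbeResult applies the check's expectations to
    // a parsed probe result.
    func validateGatewayProbeResult(res gatewayProbeResult, expectCode []int, expectLocation string) error {
        codes := expectCode
        if len(codes) == 0 {
            codes = []int{200} // mirrors the cue default for an omitted expectCode
        }
        matched := false
        for _, c := range codes {
            if res.Code == c {
                matched = true
                break
            }
        }
        if !matched {
            return fmt.Errorf("status %d not in expected set %v", res.Code, codes)
        }
        if expectLocation != "" {
            re, err := regexp.Compile(expectLocation)
            if err != nil {
                return fmt.Errorf("expectLocation is not a valid Go regexp: %w", err)
            }
            if !re.MatchString(res.Location) {
                return fmt.Errorf("Location %q does not match %q", res.Location, expectLocation)
            }
        }
        return nil
    }

The canonical caller shape is expectCode [302] plus an expectLocation
regexp pinning the oauth2 realm.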
Coverage
- parseGatewayProbeOutput: happy path, missing code, malformed,
no-Location.
- validateGatewayProbeResult: default code, list match, regex
match, regex compile failure.
- splitURLHostPort: http/https default ports, explicit port,
IPv6, missing host, unsupported scheme.
- pickGatewayAddress (sketched after this list): class-narrowed,
any-class, no-match, none-programmed.
- ParseChecks_GatewayCheck: cue -> Go round-trip with the
canonical 302 + Location-regex shape.
- ParseChecks_GatewayCheck_DefaultCode: omitted expectCode
yields [200].
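The pickGatewayAddress cases above reduce to a small pure function
over the decoded `kubectl get gateway -A -o json` output. A minimal
sketch (needs only "fmt"), assuming trimmed-down structs that mirror
just the fields it reads -- the PR's actual types may differ:

    // gatewayItem mirrors the fields read from the kubectl JSON.
    type gatewayItem struct {
        Spec struct {
            GatewayClassName string `json:"gatewayClassName"`
        } `json:"spec"`
        Status struct {
            Addresses []struct {
                Value string `json:"value"`
            } `json:"addresses"`
            Conditions []struct {
                Type   string `json:"type"`
                Status string `json:"status"`
            } `json:"conditions"`
        } `json:"status"`
    }

    // pickGatewayAddress returns the first address of a Gateway whose
    // Programmed condition is True, optionally narrowed by class.
    func pickGatewayAddress(items []gatewayItem, gatewayClassName string) (string, error) {
        for _, gw := range items {
            if gatewayClassName != "" && gw.Spec.GatewayClassName != gatewayClassName {
                continue // class-narrowed: skip other classes
            }
            programmed := false
            for _, c := range gw.Status.Conditions {
                if c.Type == "Programmed" && c.Status == "True" {
                    programmed = true
                    break
                }
            }
            if !programmed || len(gw.Status.Addresses) == 0 {
                continue // none-programmed, or no address published yet
            }
            return gw.Status.Addresses[0].Value, nil
        }
        return "", fmt.Errorf("no programmed Gateway found (class %q)", gatewayClassName)
    }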
Deferred to follow-ups
- method / headers / expectBody / tlsInsecure (canonical
redirect reproducer doesn't need them).
- e2e against a real cluster (would need an envoy-gateway lane
in the e2e suite).
- Migrating existing exec-curl-grep checks in checkit /
cluster-local to kind: "gateway".
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The original TestQemu_Seed_GateAndBypass exercised states 3 (gate
fires when no volume is attached), 4 (bypass extracts), and 7
(sshd unaffected) -- the regression-posture half of the seed
contract. The production happy path -- a labeled `y-cluster-data`
volume attached at boot, the pre-baked LABEL fstab entry mounting
it, and the upgrade fast path on the next boot -- was only covered
manually via the GCP appliance run.
The GCP test caught three real bugs (label mismatch, e2label
idempotency, boot-disk-size constraint) that no other test caught,
so the gap was real -- but GCP costs money + minutes per
iteration.
TestQemu_Seed_VolumeAttached closes that gap deterministically
and locally:
* Provision + plant sentinel + stop + prepare-export (~2 min,
same shape as the existing TestQemu_Seed_GateAndBypass setup).
* Build a labeled ext4 qcow2 via `qemu-img create` +
`virt-format --filesystem=ext4 --label=y-cluster-data`.
* Boot 1 (state 1): StartForDiagnosticWithDisks attaches the
labeled disk, fstab mounts it, seed unit sees mountpoint +
only lost+found + extracts. Assert /data/yolean is a
mountpoint, sentinel restored, marker present, NO bypass
sentinel, k3s reaches Ready via Requires=.
* Boot 2 (state 5): same disk, marker now present from boot 1.
Assert seed unit reaches active via the marker-respect path
(journal mentions "marker present", does NOT mention
"extracting"), sentinel unchanged, k3s reaches Ready.
Supporting plumbing:
* `qemu.StartForDiagnosticWithDisks(ctx, cacheDir, name,
extraDisks, logger)` -- the diagnostic boot path with extra
qcow2/raw drives appended after the boot+seed disks. The
qemu provisioner deliberately doesn't manage data disks
in production (the appliance contract is "customer attaches a
labeled volume"), so this stays test-facing for now; the
StartForDiagnostic non-Disks variant is unchanged.
* `Cluster.extraDisks` is the unexported field threaded into
`startVM`; production calls leave it nil.
* Test helpers `makeLabeledDataDisk` (qemu-img + virt-format)
and `waitForK3sReady` (in-VM kubectl polling, since
diagnostic boots don't import the kubeconfig host-side).
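A minimal sketch of what a helper like makeLabeledDataDisk could look
like; the disk size and file name are illustrative, but the
qemu-img + virt-format two-step and the y-cluster-data label are the
ones described above:

    package qemu_test // illustrative; lives next to the e2e test

    import (
        "os/exec"
        "path/filepath"
        "testing"
    )

    // makeLabeledDataDisk builds a small qcow2 with an ext4 filesystem
    // labeled y-cluster-data, mimicking the volume a customer attaches.
    func makeLabeledDataDisk(t *testing.T, dir string) string {
        t.Helper()
        disk := filepath.Join(dir, "data.qcow2")

        // Allocate the empty qcow2 (size illustrative; the test ends up
        // with roughly 150 MB on disk).
        if out, err := exec.Command("qemu-img", "create", "-f", "qcow2", disk, "200M").CombinedOutput(); err != nil {
            t.Fatalf("qemu-img create: %v\n%s", err, out)
        }
        // Lay down ext4 with the label the pre-baked fstab entry mounts by.
        if out, err := exec.Command("virt-format", "-a", disk,
            "--filesystem=ext4", "--label=y-cluster-data").CombinedOutput(); err != nil {
            t.Fatalf("virt-format: %v\n%s", err, out)
        }
        return disk
    }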
Runtime: ~3 min on a tuned local box, ~150 MB extra disk file in
t.TempDir() (cleaned up at test exit). Skip-on-prereq for /dev/kvm,
virt-customize, virt-format. Now states 1, 3, 4, 5, 7 of the seed
matrix are deterministic-locally; states 2 and 6 stay unit-tested
via the embedded shell script under pkg/provision/qemu.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Synthetic Pod sleeps 15s in its SIGTERM handler while writing one
marker file per second to a local-path PVC, ending with done.txt.
Local-path PVs land under /data/yolean which prepare-export packs
into /var/lib/y-cluster/data-seed.tar.zst. The test cracks the
tarball open via guestfish + zstd + tar and asserts step-15.txt +
done.txt are present in the seed bundle, which proves the kubelet
let the SIGTERM handler run to completion within the 30s
terminationGracePeriodSeconds across the qemu shutdown path.
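A minimal sketch of that synthetic workload, written as the kind of
manifest the test could apply in-VM (expressed here as a Go constant);
the pod name, image, and PVC name are illustrative -- only the
trap-and-marker behaviour and the 30s grace period come from the
description above:

    // gracePeriodPodYAML is the synthetic workload the test applies in-VM.
    // Names, image, and PVC plumbing are illustrative.
    const gracePeriodPodYAML = `
    apiVersion: v1
    kind: Pod
    metadata:
      name: grace-writer
    spec:
      terminationGracePeriodSeconds: 30
      containers:
      - name: writer
        image: busybox
        command: ["sh", "-c"]
        args:
        - |
          # SIGTERM handler: one marker per second for 15s, then done.txt.
          trap 'i=1; while [ $i -le 15 ]; do touch /data/step-$i.txt; i=$((i+1)); sleep 1; done; touch /data/done.txt; exit 0' TERM
          touch /data/started.txt   # the pre-stop wait loop polls for this
          while true; do sleep 1; done
        volumeMounts:
        - name: data
          mountPath: /data
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: grace-writer-data   # bound to a local-path PVC
    `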
Catches:
- SIGTERM not delivered to pods on `y-cluster stop` -> no
markers past started.txt.
- Grace period cut short (kubelet kill at <15s) -> step-N.txt
for some N<15, no step-15.txt, no done.txt.
- PVC didn't reach /data/yolean -> started.txt timeout in the
pre-stop wait loop, fail before stop.
Why it matters: mariadb's grastate.dat write happens inside its
own SIGTERM handler. A torn shutdown means grastate.dat is not in
the seed; the customer-side first boot (fresh disk + extracted
seed) ends up in galera force-bootstrap with a missing-file sed
crash, and keycloak-admin returns the envoy `UD` response flag
("unconditional drop overload") because no upstream is healthy.
This synthetic workload tests the same property (the termination
handler gets its full grace window) without depending on mariadb
specifically.
Path parity with appliance-qemu-to-gcp.sh: the script's local
stages 1-3 invoke `y-cluster provision/stop/prepare-export` over
the CLI; those subcommands call qemu.Provision/Stop/PrepareExport
from pkg/provision/qemu, which is what this test calls directly.
No CLI vs library divergence -- it's the same code path.
Build-tagged `e2e && kvm` to match the rest of the qemu e2e
suite. Adds ~5 min to a serial run of the e2e suite (in line
with what other qemu tests cost).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previously teardown unconditionally logged "teardown complete, disk
and keypair deleted" regardless of what actually happened. The
os.Remove calls swallowed all errors (including IsNotExist) with
`_ =`, so a teardown against an already-empty cache dir -- common
after prepare-export consumed the qcow2, or when an operator pointed
at the wrong --cacheDir -- still claimed deletion that never
occurred.
Now we track which artefacts the loop actually removed, log
"teardown complete, no artefacts found to delete" when the list is
empty, and emit a `removed` field listing the basenames when it
isn't. Real os.Remove failures (anything other than IsNotExist)
surface as Warn lines instead of disappearing.
Two new unit tests pin both halves of the contract via
zaptest/observer: empty-cache must NOT claim deletion; a cache with
disk + keypair must list both in the `removed` field.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
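A minimal sketch of the tracking shape described above; the function
name and artefact-list plumbing are illustrative, not the PR's exact
code:

    package qemu // illustrative placement

    import (
        "os"
        "path/filepath"

        "go.uber.org/zap"
    )

    // teardownArtefacts removes the given paths and reports only what
    // was actually deleted.
    func teardownArtefacts(logger *zap.Logger, paths []string) {
        var removed []string
        for _, p := range paths {
            switch err := os.Remove(p); {
            case err == nil:
                removed = append(removed, filepath.Base(p))
            case os.IsNotExist(err):
                // Nothing there; neither an error nor a removal.
            default:
                // Real failures surface instead of being swallowed by `_ =`.
                logger.Warn("failed to remove artefact", zap.String("path", p), zap.Error(err))
            }
        }
        if len(removed) == 0 {
            logger.Info("teardown complete, no artefacts found to delete")
            return
        }
        logger.Info("teardown complete", zap.Strings("removed", removed))
    }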
tweaks for #19 and #20