
Harden rollout VM lab startup #96

Merged
chrisbliss18 merged 3 commits into v2 from feature/rollout-vm-lab-self-start
May 1, 2026

@chrisbliss18
Contributor

Summary

This hardens the rollout VM lab so snapshot-backed rehearsal targets can start from an offline lab topology without hanging in the staging step.

The lab harness now adds a start-topology command and calls it from install-v2, so the db, v1, and v2 lab guests are started before the SSH waits and artifact staging begin. The helper is intentionally narrow: it touches only the three prefix-derived lab domains, validates that the complete db/v1/v2 topology exists before starting anything, auto-starts only VMs that are cleanly shut off, and refuses unexpected states such as crashed, paused, or suspended, leaving those VMs for operator inspection.
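As a rough illustration of the validate-then-start flow described above, here is a minimal bash sketch. It is not the code from scripts/rollout-vm-lab.sh: the function name, the decision to check every state before starting any domain, and the choice to leave already-running guests alone are assumptions. It relies only on the standard virsh domstate and virsh start commands.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a validate-then-start helper (not the real script).

start_topology() {
  local prefix="$1"
  # The only domains this helper may touch are derived from the prefix.
  local vms=("${prefix}-db" "${prefix}-v1" "${prefix}-v2")
  local vm state missing=0
  local -a to_start=()

  echo "INFO start_topology prefix=${prefix} vms=$(IFS=,; echo "${vms[*]}")"

  # Pass 1: the complete topology must exist before anything is started.
  for vm in "${vms[@]}"; do
    if ! virsh domstate "$vm" >/dev/null 2>&1; then
      echo "WARN missing_vm_domain=${vm}"
      missing=1
    fi
  done
  if (( missing )); then
    echo "FAIL topology is incomplete; run create-topology first or check JETMON_ROLLOUT_PREFIX=${prefix}"
    return 1
  fi

  # Pass 2: refuse any state other than a clean shut off.
  for vm in "${vms[@]}"; do
    state="$(virsh domstate "$vm" | head -n 1)"
    case "$state" in
      "shut off") to_start+=("$vm") ;;
      running)    ;;  # assumption: guests that are already up are left alone
      *)
        echo "FAIL VM is not in a safe auto-start state: ${vm} state=\"${state}\""
        return 1
        ;;
    esac
  done

  # Pass 3: start only the domains that were cleanly shut off.
  for vm in "${to_start[@]}"; do
    virsh start "$vm" >/dev/null || return 1
    echo "PASS vm_started=${vm} previous_state=\"shut off\""
  done
}
```

Separating the checks from the start loop means a wrong prefix or a single paused guest stops the run before any domain has been started, so a failed run leaves nothing partially started.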

The VM lab docs were updated to make that behavior explicit near the setup instructions and snapshot-backed flow notes.

Why

During rollout readiness rehearsal, the make rollout-vm-lab-snapshot-all-smoke target assumed the VMs were already running before staging v2. When the baseline snapshots left the domains shut off, the target could sit quietly waiting for SSH. This change makes the rehearsal target self-contained while keeping VM lifecycle automation tightly scoped and easy to audit.

Example output

Powered-off lab topology now starts explicitly before staging:

INFO start_topology prefix=jetmon-rollout vms=jetmon-rollout-db,jetmon-rollout-v1,jetmon-rollout-v2
PASS vm_started=jetmon-rollout-db previous_state="shut off"
PASS vm_started=jetmon-rollout-v1 previous_state="shut off"
PASS vm_started=jetmon-rollout-v2 previous_state="shut off"

Wrong prefix or missing topology fails before any start attempts:

INFO start_topology prefix=jetmon-missing vms=jetmon-missing-db,jetmon-missing-v1,jetmon-missing-v2
WARN missing_vm_domain=jetmon-missing-db
WARN missing_vm_domain=jetmon-missing-v1
WARN missing_vm_domain=jetmon-missing-v2
FAIL topology is incomplete; run create-topology first or check JETMON_ROLLOUT_PREFIX=jetmon-missing

Unsafe libvirt states are refused for inspection:

FAIL VM is not in a safe auto-start state: jetmon-rollout-v2 state="paused"

Validation

  • bash -n scripts/rollout-vm-lab.sh
  • git diff --check
  • make rollout-vm-lab-stage-v2 from a powered-off topology
  • bad-prefix start-topology simulation
  • paused-VM start-topology simulation
  • make rollout-vm-lab-snapshot-execute-smoke
  • make rollout-docs-verify
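
The list above does not say how the bad-prefix and paused-VM simulations were run. One way to rehearse such cases without touching libvirt is to shadow virsh with a shell function, as in the sketch below. This is an assumption about tooling rather than what the PR did; the stub and the FAKE_VIRSH_STATES variable are hypothetical names, and the messages it prints are simulated rather than copied from virsh.

```shell
#!/usr/bin/env bash
# Hypothetical stub that shadows the real virsh binary in the current shell.
# FAKE_VIRSH_STATES holds "domain=state" pairs separated by ';', for example:
#   FAKE_VIRSH_STATES='jetmon-rollout-db=shut off;jetmon-rollout-v2=paused'
virsh() {
  local cmd="$1" domain="$2" pair
  local -a pairs=()
  IFS=';' read -r -a pairs <<< "${FAKE_VIRSH_STATES:-}"

  for pair in "${pairs[@]}"; do
    if [[ "${pair%%=*}" == "$domain" ]]; then
      case "$cmd" in
        # Print the state and then a blank line, as virsh domstate does.
        domstate) printf '%s\n\n' "${pair#*=}"; return 0 ;;
        start)    echo "Domain '${domain}' started (simulated)"; return 0 ;;
        *)        echo "stub: unsupported command '${cmd}'" >&2; return 2 ;;
      esac
    fi
  done

  # Unknown domains fail, which is what a bad prefix would look like.
  echo "error: failed to get domain '${domain}' (simulated)" >&2
  return 1
}

# Usage:
#   FAKE_VIRSH_STATES='jetmon-rollout-db=shut off;jetmon-rollout-v2=paused'
#   virsh domstate jetmon-rollout-v2   # prints "paused" and a blank line
#   virsh domstate jetmon-missing-db   # returns a non-zero status
```

Because bash resolves functions before looking up commands on PATH, any helper sourced into the same shell would call this stub in place of the real binary.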

Chris Jean added 3 commits April 30, 2026 22:36
Make the VM lab staging path start db, v1, and v2 guests before waiting for SSH. This keeps snapshot-backed smoke targets self-contained when the baseline snapshots leave the domains shut off, which avoids a quiet install-v2 wait during rollout rehearsals.

Document the explicit start-topology command and clarify that the Makefile VM lab targets cover VM startup before staging artifacts and running snapshot flows.

Refine the VM lab startup helper so it only auto-starts domains in known safe inactive states and refuses ambiguous libvirt states like paused or suspended. The previous version relied on virsh start failing, which was correct but less useful for an operator running a rollout rehearsal.

Also document the exact prefix-derived lab domains that start-topology may touch so the VM lifecycle boundary is visible near the setup instructions.

Tighten start-topology after repeated rollout rehearsal review. The helper now validates the complete db/v1/v2 topology before starting any domain, refuses crashed or otherwise unexpected libvirt states instead of restarting them, and keeps the operator-facing error focused on the prefix or setup step to fix.

This prevents partial auto-start behavior when the lab prefix is wrong or the topology is incomplete, while preserving the intended self-start behavior for cleanly shut-off snapshot-backed lab VMs.
@chrisbliss18 chrisbliss18 merged commit 22af147 into v2 May 1, 2026
@chrisbliss18 chrisbliss18 deleted the feature/rollout-vm-lab-self-start branch May 1, 2026 13:24