Plan: Launchplane-owned runner host hygiene #474

## Objective
Make runner-host hygiene a Launchplane-owned operational workflow instead of scattered repo-local Docker pruning.

## Finish Line

`chris-testing` hygiene is globally scheduled and auditable via Launchplane.

## Current Status

State: Phase-one mutating hygiene is complete. Implemented and merged so far: the runner-host hygiene report, dry-run apply planning, adapter-boundary planning, Launchplane-owned audit storage, service audit evidence ingress, response summary fields, a dedicated self-hosted ops executor lane, typed report counters, typed Docker reclaimable evidence, and structured command-timeout normalization. PR #886 added typed observation counters to `RunnerHostHygieneReport`, parses live `docker system df --format` reclaimable values into `docker_reclaimable_bytes`, and fails closed if Docker summary evidence cannot produce reclaimable bytes. PR #887 converts local runner command timeouts into structured `RemoteCommandResult(returncode=124)` failures with captured stdout/stderr instead of uncaught exceptions. The late auto-review finding about an active-build self-match was stale for current main: PR #885 had already fixed the probe with bracketed patterns, and mutate run `26366919592` proved the lane can execute.
Next action: Add the next read-only inventory layer for per-resource image/volume facts: image repository/tag/id/age/size/dangling/in-use hints; volume name/driver/labels/mountpoint/container references/size where feasible; and explicit no-touch classification for warm builders and runner/bootstrap state. Separately, draft the `chris-testing` replacement runbook.
Blocked by: No native issue blocker.
Waiting for: Operator decision after read-only inventory shows concrete image/volume candidates; do not approve phase-two mutation from aggregate reclaimable totals alone.
Last verified: 2026-05-24 after PR #887 merge commit `8f20676205b4a5ca660ca37a2566b7df46231c4c`; main CI, Security, CodeQL passed and live health returned `status: ok` with `storage_backend: postgres`. The most recent live hygiene dry-run proof remains Runner Host Hygiene run `26367597023` from PR #886, which wrote typed pre-apply evidence with `free_disk_bytes=331590270976`, `docker_reclaimable_bytes=148520000000`, `runner_workdir_bytes=0`, `orphan_buildkit_containers=0`, `orphan_buildkit_volumes=0`, and preserved warm builders `odoo-docker:verify-devtools` and `odoo-docker:verify-runtime`.
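
The fail-closed reclaimable parsing described for PR #886 can be sketched as follows. This is an illustrative sketch, not the Launchplane implementation: it assumes the summary arrives as one JSON object per line (as `docker system df --format '{{json .}}'` prints) with a `Reclaimable` field such as `1.2GB (50%)`, and that sizes use decimal units. The names `parse_size`, `docker_reclaimable_bytes` (as a function), and `ReclaimableParseError` are hypothetical.

```python
import json
import re

# Assumption: Docker's human-readable sizes use decimal (base-1000) units.
_UNITS = {"B": 1, "KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12, "PB": 10**15}
_SIZE_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([kKMGTP]?B)\b")


class ReclaimableParseError(ValueError):
    """Raised when Docker summary evidence cannot yield reclaimable bytes."""


def parse_size(text: str) -> int:
    """Convert a leading size such as '1.2GB (50%)' into a byte count."""
    match = _SIZE_RE.match(text)
    if match is None:
        raise ReclaimableParseError(f"unrecognized size: {text!r}")
    number, unit = match.groups()
    return round(float(number) * _UNITS[unit.upper()])


def docker_reclaimable_bytes(df_output: str) -> int:
    """Sum the Reclaimable column; any unparseable row aborts the whole result."""
    rows = [line for line in df_output.splitlines() if line.strip()]
    if not rows:
        # Fail closed: no evidence is not the same as zero reclaimable bytes.
        raise ReclaimableParseError("empty docker summary")
    total = 0
    for line in rows:
        try:
            row = json.loads(line)
            total += parse_size(row["Reclaimable"])
        except (json.JSONDecodeError, KeyError, TypeError) as exc:
            raise ReclaimableParseError(f"bad summary row: {line!r}") from exc
    return total
```

Raising instead of defaulting to `0` keeps a malformed summary from being recorded as "nothing to reclaim", which is the point of failing closed.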

## Scope
- Add a Launchplane-owned global maintenance model for self-hosted runner hosts such as `chris-testing`.
- Preserve a narrow host-side executor/script for privileged Docker operations.
- Record before/after disk and Docker evidence, cleanup mode, retained builder budgets, and any skipped legacy state.
- Move shared host Docker pruning away from product repos, especially the existing `verireel` runner Docker prune workflow.
- Keep product-specific GHCR retention and preview lifecycle cleanup in their product/Launchplane domains.

## Acceptance Criteria

- There is one canonical scheduled owner for `chris-testing` Docker/BuildKit hygiene.
- Routine hygiene is bounded and preserves known warm builders: `odoo-docker-chris-testing` and `odoo-enterprise-chris-testing`.
- The first mutating pass is treated as phase-one bounded BuildKit pruning; post-run evidence decides whether to push forward into broader cleanup.
- Legacy/orphan BuildKit containers, images, and volumes are reported by default and removed only through an explicit reviewed retirement mode.
- Launchplane stores or exposes durable evidence for each run: host, caller, mode, before/after `df`, Docker summary, builder volumes, reclaimed estimate, and failures.
- Product repos no longer run broad shared-host Docker prune jobs.
- Docs describe when to use Launchplane hygiene versus repo-specific cleanup.
- The plan includes a `chris-testing` fragility/replacement runbook: what roles the host performs, what labels/service users/config it needs, what caches are intentionally warm, and how to stand up a replacement or parallel runner if the host fails.
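
The preserve/report/remove rules in the criteria above could be expressed as a small planning function. This is a hedged sketch under stated assumptions: the mode names (`REPORT`, `APPLY`, `RETIRE`), the `Resource` shape, the `kind` values, and the action strings are hypothetical and are not taken from Launchplane. Only the two warm builder names come from this plan.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    REPORT = "report"  # read-only, never mutates
    APPLY = "apply"    # routine bounded BuildKit pruning
    RETIRE = "retire"  # explicit, reviewed retirement of legacy/orphan state


@dataclass(frozen=True)
class Resource:
    name: str
    kind: str            # hypothetical kinds: "builder", "buildkit-cache", "container", "image", "volume"
    orphan: bool = False


# Warm builders named in the acceptance criteria; they are never touched.
WARM_BUILDERS = frozenset({"odoo-docker-chris-testing", "odoo-enterprise-chris-testing"})


def plan_action(resource: Resource, mode: Mode) -> str:
    """Return the planned action for one resource under the given mode."""
    if resource.name in WARM_BUILDERS:
        return "preserve"
    if mode is Mode.REPORT:
        return "report"
    if resource.orphan:
        # Orphans are reported by default and removed only in retirement mode.
        return "remove" if mode is Mode.RETIRE else "report"
    if resource.kind == "buildkit-cache":
        return "prune-bounded"
    return "keep"
```

Checking the warm-builder set before the mode means no mode, including retirement, can plan a mutation against a preserved builder.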

## Relationships

- Related to `cbusillo/claude-local-machine` runner-cache docs and `scripts/chris-testing-docker-hygiene.sh`.
- Conflicts with `cbusillo/verireel` `.github/workflows/runner-docker-prune.yml`, which currently performs shared `chris-testing` Docker pruning from a product repo. The Launchplane-owned hygiene path should replace this workflow before broad scheduled pruning is enabled globally.
- Product repos may keep product-scoped cleanup such as preview teardown and GHCR retention, but they should not own shared runner-host Docker/BuildKit pruning.
- Related to cbusillo/launchplane#636 (https://github.com/cbusillo/launchplane/issues/636).

## Validation
- Run report mode against `chris-testing` without mutation.
- Run apply mode on schedule or manual dispatch and verify bounded cleanup only.
- Verify Odoo warm builders remain after cleanup and warm publish stays fast.
- Verify Launchplane evidence/audit records are written and visible.
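
The post-apply checks above might be automated by comparing before and after evidence. The sketch below is illustrative only: `HostEvidence` and `validate_bounded_cleanup` are hypothetical names, and the only field borrowed from the recorded evidence is `free_disk_bytes`. It does not measure warm publish speed, which would need a separate timing check.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class HostEvidence:
    free_disk_bytes: int
    builders: frozenset[str]  # builder names observed on the host


def validate_bounded_cleanup(
    before: HostEvidence,
    after: HostEvidence,
    warm_builders: frozenset[str],
) -> list[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    problems: list[str] = []
    missing = warm_builders - after.builders
    if missing:
        problems.append(f"warm builders missing after cleanup: {sorted(missing)}")
    if after.free_disk_bytes < before.free_disk_bytes:
        problems.append("free disk decreased during cleanup")
    return problems
```

Returning a list of problems rather than raising on the first one lets a single audit record carry every failed check from the run.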

## Decisions
- Prefer Launchplane as the global control plane for runner-host hygiene.
- Keep privileged host mutation narrow and explicit; Launchplane should own intent, authorization, schedule, and evidence.
- Do not make this repo-by-repo cleanup.

## Open Questions

- After the first `mutate=true` bounded BuildKit prune, does post-run evidence show enough reclaimed disk, or should Launchplane add an explicitly reviewed second mode for orphan image/volume cleanup?
- What retention budgets should be encoded after the Odoo consolidation: images, generic BuildKit, Odoo builders, action runner `_work`, and logs?
- How fragile is `chris-testing` now that it has grown from a basic runner into a multi-role host for Odoo verification, warm builders, and Launchplane hygiene operations?
- What is the target recovery design: rebuild `chris-testing` from documented steps, maintain a warm standby, or split responsibilities across dedicated runner hosts?
- What minimum replacement runbook is required before relying on the host for scheduled hygiene: OS/packages, Docker/BuildKit setup, GitHub runner registration, labels, service users, Launchplane repo variables, OIDC grants, warm builder seeding, and dry-run validation?

