Remove STONITH eviction + self-watchdog (keep poweroff) #271

Merged
posix4e merged 1 commit into main from chore/remove-stonith on May 30, 2026
@posix4e posix4e commented May 30, 2026

Simplification: rip out the STONITH fencing for now.

kill_old_tunnels + self_watchdog churned the Cloudflare tunnel hand-off, causing the 502 flap during cutover that fails the no-retry "Verify CP health payload" deploy step. On the SSH/prod path the relaunch already destroys the old CP VM before booting the new one, so the fencing was redundant there.
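For readers unfamiliar with the removed fencing, here is a minimal sketch of how the two halves interacted. It assumes the behavior described in the commit message (the new CP deletes tunnels by name prefix, and the old CP powers off once its own tunnel disappears). The function names, signatures, and tunnel names are hypothetical and are not taken from the actual `stonith` module.

```rust
// Hypothetical sketch: none of these signatures or tunnel names come from the PR.

/// What the new CP's eviction pass would select for deletion: every tunnel
/// sharing the name prefix, except the caller's own tunnel.
fn tunnels_to_evict<'a>(prefix: &str, own: &str, live: &[&'a str]) -> Vec<&'a str> {
    live.iter()
        .copied()
        .filter(|name| name.starts_with(prefix) && *name != own)
        .collect()
}

#[derive(Debug, PartialEq)]
enum WatchdogAction {
    KeepRunning,
    Poweroff,
}

/// One poll of the old CP's watchdog: power off once its own tunnel is gone
/// from the list of live tunnels.
fn watchdog_poll(own: &str, live: &[&str]) -> WatchdogAction {
    if live.contains(&own) {
        WatchdogAction::KeepRunning
    } else {
        WatchdogAction::Poweroff
    }
}

fn main() {
    // Assumed tunnel list as the new CP might see it during cutover.
    let live = vec!["cp-old", "cp-new", "worker-1"];

    // The new CP evicts every other tunnel with the CP prefix.
    let evicted = tunnels_to_evict("cp-", "cp-new", &live);
    let remaining: Vec<&str> = live
        .iter()
        .copied()
        .filter(|name| !evicted.contains(name))
        .collect();

    // The old CP's next poll no longer finds its tunnel and powers off.
    println!("evicted: {:?}", evicted);
    println!("old CP action: {:?}", watchdog_poll("cp-old", &remaining));
}
```

In this model the old CP goes away as a side effect of the new CP's eviction pass, which is the hand-off churn the PR removes.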

  • Removed stonith::kill_old_tunnels, stonith::self_watchdog, their cp::run spawns, and the broadcast / with_graceful_shutdown plumbing only the watchdog drove.
  • Kept stonith::poweroff (fatal-boot-error paths still use it).
  • Trade-off: orphaned old CP tunnels linger in CF until force-cleanup-tunnels.yml reaps them — acceptable for now.
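The retained poweroff path can be pictured roughly as below. This is a sketch under assumptions, not the real `stonith::poweroff`: the `BootError` type, the `boot_or_poweroff` helper, and the injected poweroff hook are all hypothetical, and the hook stands in for whatever actually shuts the VM down so the example stays safe to run.

```rust
use std::fmt;

/// Hypothetical fatal boot error; the real error type is not shown in the PR.
#[derive(Debug)]
struct BootError(String);

impl fmt::Display for BootError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

/// Runs a boot step. On a fatal error, logs it and calls the poweroff hook
/// with the reason. Returns true only if boot succeeded.
fn boot_or_poweroff<B, P>(boot: B, mut poweroff: P) -> bool
where
    B: FnOnce() -> Result<(), BootError>,
    P: FnMut(&str),
{
    match boot() {
        Ok(()) => true,
        Err(e) => {
            eprintln!("fatal boot error: {e}");
            poweroff(&e.to_string());
            false
        }
    }
}

fn main() {
    // Record poweroff requests instead of shutting anything down.
    let mut reasons: Vec<String> = Vec::new();
    let ok = boot_or_poweroff(
        || Err(BootError("example fatal error".to_string())),
        |reason| reasons.push(reason.to_string()),
    );
    println!("booted: {ok}, poweroff reasons: {reasons:?}");
}
```

Injecting the hook is only a convenience for the sketch; it makes the fatal path testable without a real shutdown.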

clippy clean, 75 tests pass.

🤖 Generated with Claude Code

The new-CP-evicts-old-CP fencing — `kill_old_tunnels` (delete the old
CP's tunnel by name prefix) and `self_watchdog` (poll CF, poweroff the
old CP when its tunnel vanishes) — churned the Cloudflare tunnel
hand-off, producing the 502 flap that the brittle "Verify CP health
payload" deploy step trips on. On the SSH/prod path the relaunch already
destroys the old CP VM before booting the new one, so the fencing was
redundant there anyway.

Removed: `stonith::kill_old_tunnels`, `stonith::self_watchdog`, their
spawns in `cp::run`, and the broadcast/`with_graceful_shutdown` plumbing
that only the watchdog drove. Kept `stonith::poweroff` (used by the
fatal-boot-error paths). Orphaned old CP tunnels now linger in CF until
`force-cleanup-tunnels.yml` reaps them — acceptable for now.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@posix4e posix4e merged commit bff610a into main May 30, 2026
1 check passed
@posix4e posix4e deleted the chore/remove-stonith branch May 31, 2026 11:27