chore(runner): pin image digest, add healthcheck, preflight token check #237
Merged
This was referenced May 24, 2026
Code Review
This pull request introduces a preflight check to verify the presence of the GitHub runner token and adds container health checks to both the Docker Compose configuration and the Makefile's diagnostic tools. Additionally, the runner image is now pinned by digest, and automatic updates for the runner agent have been enabled. Feedback was provided regarding an inconsistency in the Makefile's health check validation logic, where the default status value should be unified to ensure accurate reporting and validation.
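The unified-default feedback can be illustrated with a small shell sketch. The names `NO_HEALTHCHECK_DEFAULT`, `normalize_health`, and `health_ok` are hypothetical and do not come from the Makefile in this PR. The sketch only assumes that an empty status string stands for a container that defines no healthcheck. The point is that reporting and validation read the same default, so they cannot drift apart.

```shell
# Hypothetical sketch; none of these names appear in the PR's Makefile.
# Single default used for containers that define no healthcheck (assumption:
# such containers are represented by an empty status string).
NO_HEALTHCHECK_DEFAULT=healthy

# Print the status to report, substituting the shared default when empty.
normalize_health() {
  printf '%s\n' "${1:-$NO_HEALTHCHECK_DEFAULT}"
}

# Succeed only when the normalized status is "healthy".
health_ok() {
  [ "$(normalize_health "${1:-}")" = "healthy" ]
}
```

With this shape, `normalize_health ""` prints `healthy`, and `health_ok unhealthy` returns a non-zero status.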
fb2be64 to 8ec28ed
JacobPEvans added a commit to dryvist/ai-assistant-instructions that referenced this pull request on May 24, 2026
The existing Runner Choice rule covered RunsOn vs github-hosted but said nothing about on-prem self-hosted runners. Two repos use them (orbstack-kubernetes, ansible-proxmox-apps), and the recurring token-refresh failure in orbstack-kubernetes (PR #234, dryvist/orbstack-kubernetes#237) shows what happens when the on-prem path has no documented requirements.
- Reframe the lead paragraph so RunsOn is the default and on-prem is an explicit exception for hardware-bound jobs, not a routine choice.
- Add an on-prem row to the decision table with the actual in-use labels.
- Add an "On-prem runner requirements" subsection listing the five non-negotiables for any on-prem runner: GitHub App auth (not PAT), digest-pinned image, healthcheck, dead-man's-switch heartbeat, pre-flight secret check.
Reference implementation called out as the orbstack-kubernetes runner once dryvist/orbstack-kubernetes#237 and #238 land.
Assisted-by: Claude <noreply@anthropic.com>
This was referenced May 24, 2026
JacobPEvans added a commit that referenced this pull request on May 24, 2026
The ignorePatterns entry pointed at `kubernetes-monitoring` — the repo's previous name before c21d9ca renamed it to `orbstack-kubernetes`. The pattern hasn't matched any actual self-references since that rename, which means every github.com self-ref in CHANGELOG.md (release-please compare links), docs, and READMEs has been getting hit by markdown-link-check and randomly failing pre-commit with transient GitHub 502s. PRs #234, #237, #240, #241 have all hit this flake in the last 24 hours. Fix: update the pattern to the current repo name. Self-references are safe to skip because the CHANGELOG compare links and issue links are machine-generated by release-please and can't have typos; cross-repo links are still validated. Assisted-by: Claude <noreply@anthropic.com>
JacobPEvans added a commit that referenced this pull request on May 24, 2026
The configmap template for Cribl Stream wraps the HEC URL in double
quotes (k8s/monitoring/cribl-stream-standalone/configmap-cribl-config.yaml
line 12), so after sed substitution the deployed outputs.yml renders as:
url: "https://10.0.1.200:8088/services/collector"
The helper's regex required strictly unquoted values, giving a false
negative on every run and failing test_splunk_hec_url_matches_secret.
This has blocked PR #234 and #237 on E2E for two days.
Fix: allow an optional matching pair of single or double quotes around
the URL value. Tested against six representative cases (double-quoted,
single-quoted, unquoted, mismatched-quote, unsubstituted placeholder,
different URL) — all behave correctly.
Assisted-by: Claude <noreply@anthropic.com>
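The quoting rule described in this commit message can be sketched in shell. This is not the helper from the repository, whose code is not shown here. `hec_url_line_matches` is a hypothetical name, and the sketch assumes the rendered line has the form `url: <value>` with optional leading indentation.

```shell
# Hypothetical helper, not the repository's implementation.
# Succeeds when LINE is "url: VALUE" and VALUE is the expected URL, either
# bare or wrapped in one matching pair of single or double quotes.
hec_url_line_matches() {
  _line=$1
  _expected=$2
  # Strip leading whitespace so indented YAML lines compare cleanly.
  _line=${_line#"${_line%%[![:space:]]*}"}
  case "$_line" in
    "url: $_expected")     return 0 ;;  # unquoted
    "url: \"$_expected\"") return 0 ;;  # double-quoted
    "url: '$_expected'")   return 0 ;;  # single-quoted
    *)                     return 1 ;;  # mismatched quotes, other URL, placeholder
  esac
}
```

Enumerating the three accepted forms avoids a backreference-based regex: a mismatched pair such as an opening `"` with a closing `'` matches none of the patterns and is rejected.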
Three additive hardening changes to the self-hosted runner; addresses the recurring token-refresh failure loop that blocked #234 at merge.
- Pin myoung34/github-runner to the current ubuntu-jammy multi-arch manifest digest (sha256:0d48...) so pulls are deterministic and Renovate's docker-compose manager tracks future builds.
- Drop DISABLE_AUTO_UPDATE=1 so the actions/runner binary self-updates on registration; combined with the Renovate-managed image digest this keeps both layers current without manual docker pull cycles.
- Add a Docker healthcheck (curl https://api.github.com/zen) and surface it through runner-doctor-container so reachability regressions show up in `docker compose ps` and the doctor target instead of failing silently.
- Add runner-preflight as a dependency of runner-foreground and runner-start; doppler-injected GH_PAT_RUNNER_TOKEN is asserted non-empty before docker compose up so the LaunchAgent stops the silent 30s retry loop and surfaces the actionable error in stderr.log.
Follow-up: migrate from fine-grained PAT to GitHub App auth (myoung34/github-runner natively supports APP_ID/APP_PRIVATE_KEY and mints installation tokens internally, with no expiry and no manual rotation). Captured as a separate issue because it requires creating a GitHub App in the org and updating Doppler.
Assisted-by: Claude <noreply@anthropic.com>
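A minimal sketch of the preflight idea, assuming a POSIX shell. `require_nonempty_env` is a hypothetical helper, not the recipe behind the `runner-preflight` target, whose contents are not shown here. It only illustrates asserting that a named variable is non-empty and printing one actionable line to stderr.

```shell
# Hypothetical preflight helper; not the PR's actual Makefile recipe.
# Usage: require_nonempty_env VAR_NAME
# Returns 0 when the variable is set and non-empty, 1 when it is empty or
# unset, and 2 when the argument is not a valid variable name.
require_nonempty_env() {
  _name=${1:-}
  # Reject anything that is not a plain identifier before using eval.
  case "$_name" in
    ''|[0-9]*|*[!A-Za-z0-9_]*)
      printf 'preflight: invalid variable name: %s\n' "$_name" >&2
      return 2 ;;
  esac
  # Indirect lookup; the :- default keeps this safe under "set -u".
  eval "_value=\${$_name:-}"
  if [ -z "$_value" ]; then
    printf 'preflight: %s is empty or unset; check the injected environment\n' "$_name" >&2
    return 1
  fi
  return 0
}
```

A caller could gate startup with `require_nonempty_env GH_PAT_RUNNER_TOKEN && docker compose up`, so an empty token yields a single error line rather than a container that starts and fails repeatedly.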
8ec28ed to 219fcaf
Summary
Three additive hardening changes to the self-hosted runner so the recurring token-refresh / silent-failure loop that blocked #234 has fewer ways to bite. This follows the request to make the runner reliable through best practices and latest versions, not by piling on validation logic.
- Pin the image to `myoung34/github-runner:ubuntu-jammy@sha256:0d48…` so pulls are deterministic and Renovate's docker-compose manager auto-bumps the digest when the upstream ubuntu-jammy build moves.
- Drop `DISABLE_AUTO_UPDATE=1`. This lets the actions/runner binary self-update on registration; combined with the Renovate-managed image digest, both layers stay current without manual `docker pull` cycles.
- Add a healthcheck that runs `curl https://api.github.com/zen` every 60s. It surfaces in `docker compose ps` and `make runner-doctor-container`, so DNS/reachability regressions fail loudly instead of silently.
- `runner-preflight` is now a prerequisite of `runner-foreground` and `runner-start`. It asserts that `GH_PAT_RUNNER_TOKEN` is non-empty in the Doppler-injected env before `docker compose up`, so the LaunchAgent's silent 30s retry loop becomes a single actionable line in `~/Library/Logs/orbstack-runner/stderr.log`.
Out of scope (separate follow-up)
Migration from fine-grained PAT to GitHub App auth. The myoung34 image natively supports `APP_ID` + `APP_PRIVATE_KEY` and mints installation tokens internally (auto-refreshing, no rotation), but it requires creating a GitHub App in the org and updating Doppler, so it is captured as its own issue. This PR keeps the existing PAT path intact.
Test plan
- `pre-commit run --files Makefile docker/actions-runner/docker-compose.yml`: passes
- `make -n runner-foreground`: dependency chain renders correctly (kubeconfig → preflight → compose up)
- `make runner-preflight` against live Doppler: exits 0
- `make runner-doctor-container` against the currently-running legacy container: passes (the `{{if .State.Health}}…{{else}}healthy{{end}}` template gracefully handles the no-healthcheck container during the transition)
- `launchctl kickstart -k gui/$(id -u)/$(make -s runner-print-label)`; confirm `make runner-doctor` reports `health: healthy` on the new container
Assisted-by: Claude <noreply@anthropic.com>
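The healthcheck probe from the summary can be approximated as a standalone shell function. `probe_github_api` is a hypothetical name and the 10-second timeout is an assumption. The PR states only the URL and the 60s interval, and its actual healthcheck definition in docker-compose.yml is not shown here.

```shell
# Hypothetical probe; the PR's real healthcheck definition is not shown here.
# Usage: probe_github_api [URL] [TIMEOUT_SECONDS]
# Succeeds when the endpoint answers without an HTTP error status.
probe_github_api() {
  _url=${1:-https://api.github.com/zen}
  _timeout=${2:-10}   # seconds; assumed value, not taken from the PR
  # -f: exit non-zero on HTTP errors; -sS: quiet, but still print failures;
  # --max-time: bound the whole request so a hung connection fails the probe.
  curl -fsS --max-time "$_timeout" "$_url" >/dev/null
}
```

Because the function returns curl's exit status, DNS failures, timeouts, and HTTP error responses all surface as a non-zero result that a healthcheck or doctor target can act on.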