feat(ci): add _ci-gate.yml shared workflow with queue watchdog #247
JacobPEvans merged 2 commits into main
Centralizes the conditional-required-check / Merge Gatekeeper pattern that previously lived as a duplicated ~110-line `ci-gate.yml` in every consumer repo.

New reusable workflow with four jobs:

- changes: dorny/paths-filter on caller-supplied filter YAML
- conditional checks (nix_validate, markdown_lint, file_size, python_security): each gated on (input toggle && filter output)
- watchdog: sleeps queue_timeout_minutes (default 10) then cancels any sibling job still in `queued` state via the GitHub API
- gate (name "Merge Gate"): re-actors/alls-green aggregator

The watchdog closes a class of bugs first observed on nix-home#205: when a `needs:` dependency is stuck queued (e.g. a self-hosted runner never claims it), the dependent gate job, even with `if: !cancelled()`, never schedules, leaving the required `Merge Gate` status absent forever and the PR unmergeable. By forcing a terminal state, the watchdog guarantees the gate always evaluates within queue_timeout_minutes + watchdog runtime.

Also adds `timeout-minutes: 30` to _nix-validate.yml's validate job so a wedged running job (different failure mode, same symptom) also terminates cleanly.

Watchdog logic lives in scripts/ci-gate-watchdog.sh, sparse-checked-out at runtime so the YAML stays under the inline-script-guard threshold and the logic is shellcheck-testable in isolation.

Branch protection unchanged: the gate job's `name: Merge Gate` matches existing required_status_checks rulesets across all consumer repos.

Follow-up PRs will migrate consumer repos to thin callers of this workflow, starting with ai-assistant-instructions as the canary.

(claude)
Summary of Changes (Gemini Code Assist)

This pull request introduces a centralized CI gatekeeper mechanism to standardize and improve the reliability of required status checks across multiple repositories. By implementing a watchdog job that monitors and cancels stalled queued jobs, it prevents PRs from becoming unmergeable due to stuck workflow dependencies, ensuring that the 'Merge Gate' status consistently reports a terminal result.
Pull request overview
Adds a reusable CI “Merge Gatekeeper” workflow to centralize conditional required checks across consumer repos, including a watchdog to prevent required status checks from getting stuck when self-hosted runner jobs remain queued.
Changes:
- Introduce reusable workflow `.github/workflows/_ci-gate.yml` with change detection, conditional check jobs, a queue watchdog, and an aggregate “Merge Gate”.
- Add `scripts/ci-gate-watchdog.sh` to cancel sibling jobs stuck in `queued` after a configurable timeout.
- Add `timeout-minutes: 30` to `_nix-validate.yml`’s validate job as defense-in-depth for wedged running jobs.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `scripts/ci-gate-watchdog.sh` | Implements the queue watchdog logic used by the new CI gate workflow. |
| `.github/workflows/_nix-validate.yml` | Adds a hard timeout to avoid indefinitely running Nix validation blocking the gate. |
| `.github/workflows/_ci-gate.yml` | New reusable workflow coordinating conditional checks + watchdog + required “Merge Gate” aggregator. |
- gate: `always() && !cancelled()` ensures the gate runs even when a dependency fails (not just when not cancelled)
- watchdog: add `contents: read` permission so sparse-checkout succeeds when job-level perms override workflow-level perms
- watchdog script: use awk for float-safe minute→second conversion
- watchdog script: `gh api --paginate` to catch queued jobs beyond the first 100 in large runs

(claude)
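The two watchdog-script fixes can be illustrated with a short shell sketch. This is not the contents of `scripts/ci-gate-watchdog.sh`: the function names, the round-to-nearest choice, and the argument order are assumptions made for illustration. The jobs-listing endpoint and the `--paginate` / `--jq` flags of `gh api` are standard GitHub CLI features. The cancellation step itself is left out, since the PR text does not show how it is performed.

```shell
# Hypothetical helpers; names are illustrative, not taken from the PR.

# Convert a possibly fractional minute count to whole seconds.
# Shell arithmetic ($(( ))) rejects values like "0.5", so awk does the math
# and rounds to the nearest second.
minutes_to_seconds() {
  awk -v m="$1" 'BEGIN { printf "%d\n", m * 60 + 0.5 }'
}

# Print the IDs of jobs in a workflow run that are still queued.
# $1 is "owner/repo", $2 is the run ID.
# --paginate follows every page, so jobs past the first 100 are included.
list_queued_job_ids() {
  gh api --paginate \
    "repos/$1/actions/runs/$2/jobs?per_page=100" \
    --jq '.jobs[] | select(.status == "queued") | .id'
}
```

A caller might then wait with something like `sleep "$(minutes_to_seconds "$QUEUE_TIMEOUT_MINUTES")"` before listing queued jobs, where `QUEUE_TIMEOUT_MINUTES` is a hypothetical environment variable carrying the `queue_timeout_minutes` input.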
Summary
- `_ci-gate.yml` reusable workflow centralizing the conditional-required-check / Merge Gatekeeper pattern that currently lives as ~110 lines of duplicated boilerplate in each of 8 consumer repos.
- `Queue Watchdog` job cancels sibling jobs still in `queued` state after a configurable timeout, so the required `Merge Gate` status can never be stuck-pending indefinitely.
- `scripts/ci-gate-watchdog.sh` carries the watchdog logic (extracted from inline YAML so it's shellcheck-testable and stays under the inline-script-guard threshold).
- Adds `timeout-minutes: 30` to `_nix-validate.yml`'s validate job as defense-in-depth against the running-job variant of the same symptom.

Why

`nix-home#205` exposed the exact failure mode the watchdog now prevents: when a self-hosted runner failed to claim the `Nix Validate / Validate` job, that job sat in `queued` for the entire PR lifetime. Because GitHub Actions only schedules a `needs:` dependent once every upstream job reaches a terminal state (success / failure / cancelled / skipped), and `queued` is not terminal, the dependent `Merge Gate` job never scheduled, even with `if: ${{ !cancelled() }}`. The required check stayed permanently absent and the PR was unmergeable.

The watchdog closes the loop by waiting `queue_timeout_minutes` (default 10) and then cancelling any sibling job still in `queued`. Cancellation is terminal, so the gate always evaluates within a bounded time and `Merge Gate` always reports: pass, fail, or "stuck job cancelled → fail" (the right semantic; investigate the runner instead of letting the PR rot).
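The three possible gate outcomes can be modelled in a few lines of shell. This is only an illustration of the semantics described above, not how re-actors/alls-green is implemented; the function name is hypothetical, and treating `skipped` as acceptable is an assumption about how the conditional checks are configured.

```shell
# Illustrative model of the aggregation rule, not alls-green's implementation.
# Arguments are upstream job results: success, failure, cancelled, or skipped.
gate_verdict() {
  for result in "$@"; do
    case "$result" in
      success|skipped) ;;            # assumption: skipped checks are allowed
      *) echo "fail"; return 1 ;;    # failure, cancelled, or anything unexpected
    esac
  done
  echo "pass"
}
```

Under this model, a job the watchdog cancelled contributes `cancelled`, so the gate reports `fail` rather than staying absent.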
A follow-up DRY win: each consumer repo's `ci-gate.yml` will collapse to ~25 lines of thin caller.
Filter convention
Callers define filters under conventional names; the shared workflow ignores anything outside this set:
Each conditional check is also gated by an opt-in input boolean (`nix_validate: true`, etc.). All default to `false` so a caller only pays for what it wants.
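The double gating (opt-in toggle and filter output) amounts to a logical AND with a `false` default. A minimal shell sketch of that rule follows; in the workflow itself it would presumably be an `if:` expression, and the helper name here is hypothetical.

```shell
# Hypothetical helper: a check runs only when the caller opted in AND the
# matching path filter reported a change. An unset or empty value counts
# as false, mirroring the opt-in default.
should_run_check() {
  toggle="${1:-false}"
  changed="${2:-false}"
  [ "$toggle" = "true" ] && [ "$changed" = "true" ]
}
```

For example, `should_run_check true false` returns non-zero: the caller enabled the check, but no matching files changed, so the check is skipped.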
Branch protection
No rule changes needed. The gate job keeps `name: Merge Gate`, matching existing required_status_checks rulesets across all consumer repos.
Test plan
Rollback
Callers reference `@main`, so reverting rolls every consumer back automatically.
🤖 Generated with Claude Code