feat(ci): on-demand golden regenerate workflow#577

Merged: TaprootFreak merged 4 commits into `develop` from `feat/golden-regenerate-workflow` on May 25, 2026

@TaprootFreak

What

Permanent `workflow_dispatch`-only workflow `.github/workflows/golden-regenerate.yaml` that regenerates visual-regression baselines on the dfx01 self-hosted runner and commits them back to the dispatched branch as `github-actions[bot]`.

Replaces the previous pattern of introducing and removing a temporary `golden-bootstrap.yaml` per regen cycle (documented in `docs/visual-regression-tests.md` until this PR).

Trigger

```bash
gh workflow run golden-regenerate.yaml --ref <feature-branch>
```

No inputs needed — the workflow uses `github.ref` as the checkout ref.
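
For illustration, the dispatch can be wrapped so that the local checkout picks up the bot's commit afterwards. This is only a sketch and not part of the PR: the function name `regenerate_goldens`, the `DISPATCH_SETTLE_SECONDS` variable, and the fixed pause are assumptions, and it presumes an authenticated `gh` and a remote named `origin`.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical wrapper (not part of this PR): dispatch the workflow for a
# branch, wait for the run to finish, then fast-forward the local checkout.
regenerate_goldens() {
  local branch="$1"
  local run_id

  gh workflow run golden-regenerate.yaml --ref "$branch"

  # Assumption: a short pause gives GitHub time to register the new run
  # before it is looked up; tune this or replace it with polling.
  sleep "${DISPATCH_SETTLE_SECONDS:-5}"

  # Newest run of this workflow on the branch.
  run_id="$(gh run list --workflow golden-regenerate.yaml --branch "$branch" \
    --limit 1 --json databaseId --jq '.[0].databaseId')"

  # --exit-status makes the wrapper fail when the run itself fails.
  gh run watch "$run_id" --exit-status

  # Pick up the bot's commit, if one was pushed.
  git pull --ff-only origin "$branch"
}
```

Called as `regenerate_goldens <feature-branch>`, the wrapper stops at `gh run watch` when the run fails, which is the expected outcome on a protected ref.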

Behaviour

  • Runs on `[self-hosted, macOS, ARM64, m3-ultra, realunit-app]` — same labels as the `golden-tests` job in `pull-request.yaml`.
  • Setup steps mirror `golden-tests` 1:1 (Flutter 3.41.6 via subosito, `flutter pub get`, generators, build_runner).
  • Regen: `flutter test test/goldens --update-goldens`.
  • Auto-commit: configures git as `github-actions[bot]` (`41898282+github-actions[bot]@users.noreply.github.com`), runs `git add test/goldens/`, exits cleanly when there is no diff, and otherwise commits `test(goldens): regenerate baselines on dfx01` and pushes.
  • Concurrency group `golden-regenerate-<ref>` with cancel-in-progress so two parallel dispatches on the same ref don't race.
  • 30 min timeout (matches `golden-tests`).
  • `permissions: contents: write` and `token: ${{ secrets.GITHUB_TOKEN }}` on checkout — load-bearing for the push.
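
The auto-commit behaviour described in the list above can be sketched as a small shell function. This is an approximation under assumptions, not the workflow's actual step: the function name `commit_goldens_if_changed` and its printed status words are hypothetical, and the pathspec is a parameter because the staged path is narrowed in a later commit of this PR.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: stage regenerated baselines and commit only when
# something actually changed. Prints "no-diff" or "committed" so the caller
# can decide whether a push is needed.
commit_goldens_if_changed() {
  local pathspec="$1"

  # Bot identity as given in the PR description.
  git config user.name "github-actions[bot]"
  git config user.email "41898282+github-actions[bot]@users.noreply.github.com"

  git add -- "$pathspec"

  # Exit status 0 from `git diff --cached --quiet` means nothing is staged.
  if git diff --cached --quiet; then
    echo "no-diff"
    return 0
  fi

  git commit --quiet -m "test(goldens): regenerate baselines on dfx01"
  echo "committed"
}

# Sketch of the surrounding step:
#   flutter test test/goldens --update-goldens
#   [ "$(commit_goldens_if_changed test/goldens/)" = "committed" ] && git push
```

Keeping the no-diff case as a clean exit means a dispatch on a branch whose baselines are already current finishes green without creating an empty commit.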

Failure mode on protected branches

`develop` and `main` are protected by ruleset. A dispatch against either fails cleanly on the `git push` step: no force-push, no bypass. So that the regen output is not lost, the workflow uploads the regenerated PNGs as a `golden-baselines` artifact whenever the push step fails; rsync them onto a feature branch and commit there.
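
The recovery path might look like the following. It is a hedged sketch: `sync_baselines` is a hypothetical helper, the run ID and both directories are placeholders, and the artifact's internal directory layout is not stated in this PR, so inspect the download before syncing.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: copy downloaded baselines into a checkout.
# Source and destination are passed explicitly because the artifact's
# internal layout is an unknown here.
sync_baselines() {
  local src="$1" dest="$2"
  mkdir -p "$dest"
  # Trailing slashes copy the contents of src into dest. No --delete is
  # used, so baselines missing from the artifact are left untouched.
  rsync -a "$src"/ "$dest"/
}

# Possible use on a feature branch (all values are placeholders):
#   gh run download <run-id> --name golden-baselines --dir /tmp/golden-baselines
#   sync_baselines /tmp/golden-baselines <goldens-dir-in-checkout>
#   git add <goldens-dir-in-checkout> && git commit
```

Omitting `--delete` is a deliberately conservative choice for a fallback path: a partial artifact then cannot remove baselines that already exist on the branch.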

Doc updates

  • `docs/visual-regression-tests.md`: replaced the bootstrap-pattern sections "Initial bootstrap" and "Adding a new golden test" (bootstrap steps 1-4) with the one-command `gh workflow run golden-regenerate.yaml --ref <feature-branch>` flow, and updated the "Reacting to a CI drift", "Flutter SDK bumps", and "dfx01 outage fallback" sections to reference the new workflow.
  • `docs/handbook/README.md`: handbook-screenshot regen now points at the same workflow instead of asking for a local `flutter test --update-goldens` run.

Out of scope

  • `DFXServer/server/infrastructure/dfx01/actions-runners/golden-tests-recipe.md` (different repo) still references the old bootstrap pattern. Separate PR there.

Verification

  • `python3 -c "import yaml; yaml.safe_load(open('.github/workflows/golden-regenerate.yaml'))"` passes.
  • `actionlint` clean except for the two expected "unknown label" warnings on `m3-ultra` and `realunit-app` (same as the existing `golden-tests` job — custom self-hosted runner labels are not in actionlint's default known-labels list).

Permanent workflow_dispatch workflow that regenerates the visual-
regression baselines on the dfx01 self-hosted runner and commits them
back to the dispatched branch as github-actions[bot]. Replaces the
previous pattern of introducing and removing a temporary
golden-bootstrap.yaml per regen cycle.

Usage: gh workflow run golden-regenerate.yaml --ref <feature-branch>

Setup steps mirror the golden-tests job in pull-request.yaml so the
regenerated baselines render under the exact toolchain that validates
them. Concurrency group golden-regenerate-<ref> with cancel-in-progress
keeps two parallel dispatches on the same ref from racing.

On a protected ref (develop, main) the push fails by design — no
force-push, no bypass. The regenerated PNGs are still uploaded as a
golden-baselines artifact so they can be rsynced onto a feature branch.

docs/visual-regression-tests.md replaces the bootstrap/download/rsync
section with the one-command flow. docs/handbook/README.md picks up
the same change for handbook screenshot regeneration.

- golden-regenerate.yaml: narrow `git add` to `test/goldens/screens/*/goldens/`
  so alchemist's transient `failures/` dirs never accidentally land in a
  bot-pushed commit (MINOR 3 from review).
- golden-regenerate.yaml: tighten fallback artifact `if-no-files-found`
  from `warn` to `error`. An empty fallback artifact is worse than no
  artifact at all — the user expects the artifact precisely because the
  push failed; a silent empty is a footgun (MINOR 5).
- docs/visual-regression-tests.md: rewrite the stale intro that still
  claimed "5 screens, 8 baseline PNGs" (left over from PR #541's pilot
  description) → 57 page files / 68 Golden PNGs, validated by the
  required `Visual Regression` check (MINOR 4).
- docs/visual-regression-tests.md: extend the "On a protected ref the
  push fails by design" paragraph to also cover the parallel-human-push
  / non-fast-forward race — same artifact-fallback path, same recovery
  (MAJOR 1).

Round-2 review caught that `test/goldens/screens/*/goldens/` (the narrow
pattern from round 1) silently matches zero files — git pathspecs are
not shell globs, the trailing `/goldens/` anchors a directory exactly
and `*` does not expand through it. Result: every workflow_dispatch run
would have aborted at `git add` with `fatal: pathspec ... did not match
any files` before reaching the no-diff early-exit. Regression of the
round-1 narrow-add fix.

Switch to `'test/goldens/screens/**/goldens/**'` — verified locally:
matches the same files as the artifact path on line 92 and excludes
alchemist's transient `failures/` directories (alchemist writes them
one level up at `<feature>/failures/...`, not inside `goldens/`).
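
The difference between the two pathspecs can be reproduced in a scratch repository. A minimal sketch, assuming a hypothetical `screens/home/` feature directory and made-up file names; `make_scratch_repo` and `count_pathspec_matches` are illustrative helpers, not part of the workflow.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Build a throwaway repo with one baseline and one alchemist-style failure
# file (hypothetical names), stage both, and print the repo path.
make_scratch_repo() {
  local repo
  repo="$(mktemp -d)"
  git -C "$repo" init --quiet
  mkdir -p "$repo/test/goldens/screens/home/goldens" \
           "$repo/test/goldens/screens/home/failures"
  : > "$repo/test/goldens/screens/home/goldens/home_page.png"
  : > "$repo/test/goldens/screens/home/failures/home_page_diff.png"
  git -C "$repo" add -A
  echo "$repo"
}

# Count tracked files that a git pathspec matches in the current repo.
# Quoting "$1" keeps the shell from expanding the pattern, so git sees it
# verbatim, as it would for a quoted pathspec in a workflow step.
count_pathspec_matches() {
  git ls-files -- "$1" | wc -l | tr -d ' '
}
```

In that scratch layout, `count_pathspec_matches 'test/goldens/screens/*/goldens/'` reports 0, while `count_pathspec_matches 'test/goldens/screens/**/goldens/**'` reports 1 and leaves the file under `failures/` out.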

Also: rewrite stale "Pilot scope" section in visual-regression-tests.md
that still listed only the 5 pilot screens — replaced with a generic
"Layout" section describing the per-feature directory convention, which
is now accurate at 57 page files / 68 PNGs.

Fixes the final stale "pilot" reference at
docs/visual-regression-tests.md:80, caught by the subagent round-3
review. It sat inside the "Adding a new golden test" how-to bullet
rather than in a scope claim, but the term "pilot" is no longer
accurate now that every page has a Golden.
@TaprootFreak TaprootFreak marked this pull request as ready for review May 25, 2026 18:34
@TaprootFreak TaprootFreak merged commit 62b966d into develop May 25, 2026
12 checks passed
@TaprootFreak TaprootFreak deleted the feat/golden-regenerate-workflow branch May 25, 2026 20:06