
chore(ci): shadow CI jobs on blacksmith for runner trial#54559

Merged
gantoine merged 8 commits into master from georges/blacksmith on Apr 16, 2026

@gantoine gantoine commented Apr 14, 2026

Problem

We want to evaluate Blacksmith runners but need apples-to-apples performance data to make an informed cost/performance decision. A one-shot cut-over would make it impossible to compare runners on identical commits and workloads, and a narrow shadow of a few jobs would only cover part of our CI spend.

Changes

For most jobs that declare runs-on and do actual compute on the runner, this PR runs a parallel Blacksmith shadow on the same commit. The trial is gated behind the repo variable BLACKSMITH_SHADOW_ENABLED so it can be toggled without code changes, and every shadow is continue-on-error: true and excluded from required-check lists.

Approach chosen per job shape:

  • Single-shot jobs → matrix-expanded with a 2-entry runner dimension. ~6 lines of diff each; side effects (artifact uploads, cache saves) are gated to the Depot entry so shadows don't collide.
    • ci-python.yml, ci-proto.yml (3 jobs), ci-mcp.yml, ci-dagster.yml (added runner dim to existing CH-version matrix), ci-rust-flags-integration.yml, ci-nodejs.yml (3 jobs including sharded tests), ci-rust.yml (build + linting), ci-backend.yml (repo-checks + async-migrations), ci-storybook.yml
  • Jobs with dynamic / complex matrices → separate -blacksmith shadow job with a trimmed hardcoded subset (1-2 shards) — fusing dimensions into a fromJson matrix breaks downstream needs.X.outputs semantics.
    • ci-rust.yml → test-blacksmith (2 representative packages)
    • ci-blacksmith-shadow.yml (new file) → turbo-discover-shadow, turbo-tests-shadow, check-migrations-shadow, django-shadow

Deliberately skipped: the 11 container-build / CD workflows (_rust-build-images.yml, cd-*-image.yml, container-images-*.yml, ci-*-container.yml, livestream-docker-image.yml, llm-gateway-cd.yml). They use build-push-action, which offloads the build to a cloud builder, so the runner is just an orchestrator and Blacksmith would produce near-identical timings. Shadowing them would also risk double-pushing images.

Apples-to-apples comparison tool: .github/scripts/compare-ci-runners.py scrapes job durations via gh, pairs (workflow, base_job_name, sha) across runners, and reports median / p95 / speedup in markdown or CSV. Pairs work because shadows run on the same SHA as originals and encode the runner label in the job name (for matrix-expanded jobs) or use a -blacksmith / shadow suffix (for dedicated shadow jobs).
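
The pairing step described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the contents of compare-ci-runners.py: the record fields, the `(depot)` / `(blacksmith)` job-name convention, the helper names (`split_runner`, `pair_jobs`, `summarize`), and the definition of speedup as a ratio of medians are all hypothetical choices made for the example.

```python
import math
import statistics
import re
from collections import defaultdict

# Hypothetical job records; a real tool would obtain these via `gh`.
JOBS = [
    {"workflow": "ci-python", "name": "quality (depot)", "sha": "aaa", "seconds": 300},
    {"workflow": "ci-python", "name": "quality (blacksmith)", "sha": "aaa", "seconds": 200},
    {"workflow": "ci-python", "name": "quality (depot)", "sha": "bbb", "seconds": 320},
    {"workflow": "ci-python", "name": "quality (blacksmith)", "sha": "bbb", "seconds": 240},
    {"workflow": "ci-rust", "name": "test", "sha": "aaa", "seconds": 600},
    {"workflow": "ci-rust", "name": "test-blacksmith", "sha": "aaa", "seconds": 400},
]


def split_runner(name):
    """Return (base_job_name, runner) under the assumed naming conventions."""
    match = re.fullmatch(r"(.+) \((depot|blacksmith)\)", name)
    if match:
        return match.group(1), match.group(2)
    for suffix in ("-blacksmith", "-shadow"):
        if name.endswith(suffix):
            return name[: -len(suffix)], "blacksmith"
    return name, "depot"


def pair_jobs(jobs):
    """Group durations by (workflow, base_job_name, sha); keep keys seen on both runners."""
    groups = defaultdict(dict)
    for job in jobs:
        base, runner = split_runner(job["name"])
        groups[(job["workflow"], base, job["sha"])][runner] = job["seconds"]
    return {key: runs for key, runs in groups.items() if len(runs) == 2}


def p95(values):
    """Nearest-rank 95th percentile."""
    ordered = sorted(values)
    rank = max(1, math.ceil(0.95 * len(ordered)))
    return ordered[rank - 1]


def summarize(pairs):
    """Aggregate paired durations per (workflow, base_job_name)."""
    by_job = defaultdict(lambda: {"depot": [], "blacksmith": []})
    for (workflow, base, _sha), runs in pairs.items():
        for runner, seconds in runs.items():
            by_job[(workflow, base)][runner].append(seconds)
    report = {}
    for key, durations in by_job.items():
        depot_median = statistics.median(durations["depot"])
        blacksmith_median = statistics.median(durations["blacksmith"])
        report[key] = {
            "depot_median": depot_median,
            "blacksmith_median": blacksmith_median,
            "depot_p95": p95(durations["depot"]),
            "blacksmith_p95": p95(durations["blacksmith"]),
            # Assumed definition: values above 1 mean Blacksmith was faster.
            "speedup": depot_median / blacksmith_median,
        }
    return report


if __name__ == "__main__":
    for key, stats in summarize(pair_jobs(JOBS)).items():
        print(key, stats)
```

Keying on the SHA is what makes the comparison apples-to-apples: a duration is only counted when both runners executed the same job on the same commit.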

Enable: set repo variable BLACKSMITH_SHADOW_ENABLED=true in Settings → Variables → Actions. Flip to false to halt shadows mid-trial without a code change. After a few days, run:

python3 .github/scripts/compare-ci-runners.py --days 7 > report.md

How did you test this code?

  • YAML validated with yaml.safe_load across all 10 edited workflow files
  • Ran repo's timeout linter (.github/scripts/check-ci-timeouts.py) — all jobs have timeout-minutes
  • actionlint via docker — no new errors introduced (only pre-existing shellcheck/label warnings the repo already tolerates)
  • Not yet tested end-to-end on CI. I'm an agent and haven't validated that the matrix-expanded jobs actually allocate runners on Blacksmith and produce paired data — that requires enabling the repo var. Worth running with the flag on against a sacrificial PR first.

Open items for human follow-up:

  • Run compare-ci-runners.py --days 1 after one day's data to sanity-check pairing before leaving shadows on for the full week.

Publish to changelog?

no

Docs update

skip-inkeep-docs

🤖 LLM context

Authored with Claude Opus 4.6 (1M context)

Runs each Depot-labeled CI job in parallel on a matching Blacksmith tier,
gated by repo variable BLACKSMITH_SHADOW_ENABLED so the trial can be
flipped off without a code change. Shadows are continue-on-error and not
wired into any required-check needs list.

Approach:
- Single-shot jobs use a 2-entry matrix on `runner` (adds ~6 lines each).
- Jobs with dynamic/complex matrices get a separate `-blacksmith` or
  `-shadow` job with a trimmed matrix in ci-blacksmith-shadow.yml.
- Artifact uploads, cache saves, and side effects are gated to depot only.
- Container-build / CD workflows are intentionally skipped because
  depot/build-push-action offloads the build to Depot's cloud builder, so
  the runner is just an orchestrator and timing would be near-identical.

Post-trial pairing via .github/scripts/compare-ci-runners.py, which scrapes
gh job durations, pairs (workflow, job, sha) across runners, and reports
median/p95 speedup.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@gantoine gantoine changed the title from "chore(ci): shadow depot jobs on blacksmith for runner trial" to "chore(ci): shadow CI jobs on blacksmith for runner trial" on Apr 14, 2026
gantoine and others added 2 commits April 14, 2026 19:28
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Folds the playwright shadow into the main workflow via a 2-entry runner
matrix gated on BLACKSMITH_SHADOW_ENABLED. Shadow leg is
continue-on-error; artifact uploads, Visual Review run, Cloudflare
deploy, PR comment, and screenshot patches remain depot-only via
!matrix.is_shadow. Drops the redundant playwright-shadow stub.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

github-actions Bot commented Apr 15, 2026

🎭 Playwright report · View test results →

⚠️ 3 flaky tests:

  • Logout in another tab results in logout in the current tab too (chromium)
  • Save view (chromium)
  • Materialize view pane (chromium)

These issues are not necessarily caused by your changes.
Annoyed by this comment? Help fix flakies and failures and it'll disappear!

@gantoine gantoine marked this pull request as ready for review April 15, 2026 12:51
Copilot AI review requested due to automatic review settings April 15, 2026 12:51
@assign-reviewers-posthog assign-reviewers-posthog Bot requested a review from a team April 15, 2026 12:51

greptile-apps Bot commented Apr 15, 2026

Prompt To Fix All With AI
This is a comment left during a code review.
Path: .github/scripts/compare-ci-runners.py
Line: 59-68

Comment:
**`-L 10` will truncate data to hours, not 7 days**

The fetch limit of `10` means only the last 10 workflow runs are retrieved before the `since` date filter is applied. On a repo running CI many times per day, 10 runs covers a few hours at most, making the `--days 7` window useless in practice. The comment directly above (which says `-L 1000`) suggests this was the intended value.

```suggestion
            "-L",
            "1000",
```

How can I resolve this? If you propose a fix, please make it concise.
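
The truncation this comment describes can be reproduced in isolation. The sketch below does not call `gh`; it uses synthetic timestamps and assumes, purely for illustration, one workflow run every 30 minutes. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def fetch_runs(all_runs, limit):
    """Stand-in for a newest-first listing capped at `limit` results."""
    return sorted(all_runs, reverse=True)[:limit]


def runs_in_window(all_runs, limit, since):
    """Apply the date filter only after the fetch limit has been applied."""
    return [run for run in fetch_runs(all_runs, limit) if run >= since]


now = datetime(2026, 4, 15, 12, 0, tzinfo=timezone.utc)
# Assumed cadence: one run every 30 minutes, i.e. 336 runs over 7 days.
all_runs = [now - timedelta(minutes=30 * i) for i in range(7 * 48)]
since = now - timedelta(days=7)

small = runs_in_window(all_runs, 10, since)
large = runs_in_window(all_runs, 1000, since)
span_hours = (max(small) - min(small)).total_seconds() / 3600

if __name__ == "__main__":
    print(len(small), "runs spanning", span_hours, "hours with limit 10")
    print(len(large), "runs with limit 1000")
```

With the limit at 10 the date filter has nothing left to do, because the cap has already discarded everything older than a few hours.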

---

This is a comment left during a code review.
Path: .github/workflows/ci-e2e-playwright.yml
Line: 173-174

Comment:
**Matrix output race condition silently breaks Visual Review**

`playwright` is now a 2-entry matrix job. GitHub Actions resolves matrix job-level outputs from whichever entry finishes last. The shadow entry's `vr-create` step is skipped (`!matrix.is_shadow`), so its `vr_run_id` is `""`. If the shadow entry completes after the depot entry — which is unpredictable — `needs.playwright.outputs.vr_run_id` becomes `""`, and `handle-screenshots` skips both the "Complete Visual Review run" step and the VR CLI install (both gated on `needs.playwright.outputs.vr_run_id != ''`), silently abandoning every VR run.

A safe fix is to pin the output to only the depot matrix entry using a dedicated `outputs` step that is skipped for shadows, or move the VR run ID to a separate job output artifact instead of a matrix output.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: .github/workflows/ci-e2e-playwright.yml
Line: 741-742

Comment:
**Unrelated job rename may break branch protection**

`capture-run-time` has been renamed to `calculate-running-time`. This change is unrelated to the shadow trial but would silently remove the old job name from any branch protection required-status-check list. If `capture-run-time` was a required check, merging to master would become unguarded until the protection rule is updated.

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "cleanup"

Comment thread .github/scripts/compare-ci-runners.py
Comment thread .github/workflows/ci-e2e-playwright.yml Outdated
Comment thread .github/workflows/ci-e2e-playwright.yml Outdated
Copilot AI left a comment

Pull request overview

Adds a repo-variable–gated “shadow” lane to run most compute-heavy CI jobs on Blacksmith in parallel with the existing Depot runners, enabling apples-to-apples timing comparisons on the same SHAs.

Changes:

  • Matrix-expands many existing CI jobs to run on both Depot and Blacksmith (shadow runs are continue-on-error and avoid conflicting side effects like artifact uploads / cache saves).
  • Adds a dedicated ci-blacksmith-shadow.yml workflow for jobs that can’t be safely matrix-expanded (complex/dynamic matrices, side effects).
  • Adds .github/scripts/compare-ci-runners.py to scrape GitHub Actions job durations and report paired Depot vs Blacksmith statistics.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| .github/workflows/ci-storybook.yml | Matrix-expands Storybook build to add an optional Blacksmith shadow and gates cache/artifact side effects. |
| .github/workflows/ci-rust.yml | Matrix-expands Rust build/lint; adds a separate Blacksmith-only shadow test job subset. |
| .github/workflows/ci-rust-flags-integration.yml | Matrix-expands Rust /flags integration tests to add an optional Blacksmith shadow. |
| .github/workflows/ci-python.yml | Matrix-expands Python quality job; gates cache/test-result artifact writes for shadows. |
| .github/workflows/ci-proto.yml | Matrix-expands proto lint/breaking/codegen checks to add optional Blacksmith shadows. |
| .github/workflows/ci-nodejs.yml | Matrix-expands Node lint/build and adds a runner dimension to sharded tests for optional Blacksmith shadows. |
| .github/workflows/ci-mcp.yml | Matrix-expands MCP integration tests to add an optional Blacksmith shadow. |
| .github/workflows/ci-e2e-playwright.yml | Matrix-expands Playwright job to add an optional Blacksmith shadow and disables shadow side effects (VR, artifacts, comments, deploy). |
| .github/workflows/ci-dagster.yml | Adds runner dimension to Dagster test matrix and avoids artifact writes for shadows. |
| .github/workflows/ci-blacksmith-shadow.yml | New workflow containing Blacksmith-only trimmed shadows for complex backend jobs. |
| .github/workflows/ci-backend.yml | Matrix-expands select backend jobs (repo checks, async migrations) for optional Blacksmith shadows and gates artifact writes. |
| .github/scripts/compare-ci-runners.py | New CLI to fetch CI job durations via gh and compute paired Depot vs Blacksmith medians/p95/speedup. |

Comment thread .github/workflows/ci-blacksmith-shadow.yml Outdated
Comment thread .github/workflows/ci-nodejs.yml Outdated
Comment thread .github/workflows/ci-nodejs.yml Outdated
Comment thread .github/workflows/ci-rust.yml
Comment thread .github/scripts/compare-ci-runners.py Outdated
Comment thread .github/scripts/compare-ci-runners.py
Comment thread .github/scripts/compare-ci-runners.py
Speedup was silently contaminated by failed runs — a crash-fast job
paired against a successful slow job was reported as a speedup. Now:

- Speedup uses only success↔success pairs
- Per-runner failure counts (x/y) surface runner-specific instability
- Mixed-conclusion pairs are reported separately
- New "Unpaired runs" table shows where the shadow didn't fire
- CSV gains depot_conclusion/blacksmith_conclusion columns
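
A rough sketch of why the success-only filter matters, using invented data. The record shape and function names are hypothetical and are not taken from compare-ci-runners.py; speedup is assumed here to be the ratio of median durations.

```python
import statistics

# Hypothetical paired records: each runner maps to (conclusion, seconds).
PAIRS = [
    {"job": "quality", "depot": ("success", 300), "blacksmith": ("success", 200)},
    {"job": "quality", "depot": ("success", 310), "blacksmith": ("failure", 15)},
    {"job": "quality", "depot": ("failure", 20), "blacksmith": ("failure", 18)},
]


def split_pairs(pairs):
    """Separate success-on-both pairs from pairs whose conclusions differ."""
    clean = [
        p for p in pairs
        if p["depot"][0] == "success" and p["blacksmith"][0] == "success"
    ]
    mixed = [p for p in pairs if p["depot"][0] != p["blacksmith"][0]]
    return clean, mixed


def failure_counts(pairs):
    """Per-runner failures formatted as x/y."""
    counts = {}
    for runner in ("depot", "blacksmith"):
        failed = sum(1 for p in pairs if p[runner][0] != "success")
        counts[runner] = f"{failed}/{len(pairs)}"
    return counts


def speedup(pairs):
    """Ratio of median durations; above 1 means Blacksmith was faster."""
    depot = statistics.median([p["depot"][1] for p in pairs])
    blacksmith = statistics.median([p["blacksmith"][1] for p in pairs])
    return depot / blacksmith


clean, mixed = split_pairs(PAIRS)

if __name__ == "__main__":
    print("naive speedup:", round(speedup(PAIRS), 2))
    print("success-only speedup:", speedup(clean))
    print("mixed pairs:", len(mixed), "failures:", failure_counts(PAIRS))
```

In this invented data set the naive figure is inflated by two crash-fast Blacksmith runs, while the success-only figure reflects the single pair in which both runners finished the work.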

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@rnegron rnegron left a comment

approving to unblock!

Copilot AI left a comment

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 4 comments.


Comment thread .github/workflows/ci-blacksmith-shadow.yml
Comment thread .github/workflows/ci-blacksmith-shadow.yml Outdated
Comment thread .github/workflows/ci-blacksmith-shadow.yml Outdated
Comment thread .github/scripts/compare-ci-runners.py Outdated
@gantoine gantoine merged commit cb1fd0b into master Apr 16, 2026
248 of 251 checks passed
@gantoine gantoine deleted the georges/blacksmith branch April 16, 2026 15:31
@deployment-status-posthog

deployment-status-posthog Bot commented Apr 16, 2026

Deploy status

| Environment | Status | Deployed At | Workflow |
| --- | --- | --- | --- |
| dev | ✅ Deployed | 2026-04-16 16:02 UTC | Run |
| prod-us | ✅ Deployed | 2026-04-16 16:12 UTC | Run |
| prod-eu | ✅ Deployed | 2026-04-16 16:18 UTC | Run |


3 participants