chore(ci): shadow CI jobs on blacksmith for runner trial by gantoine · Pull Request #54559 · PostHog/posthog

gantoine · 2026-04-14T18:57:53Z

Problem

We want to evaluate Blacksmith runners but need apples-to-apples performance data to make an informed cost/performance decision. A one-shot cut-over would make it impossible to compare runners on identical commits and workloads, and a narrow shadow of a few jobs would only cover part of our CI spend.

Changes

For most runs-on job that does actual compute on the runner, this PR runs a parallel Blacksmith shadow on the same commit. The trial is gated behind repo variable BLACKSMITH_SHADOW_ENABLED so it can be toggled without code changes, and every shadow is continue-on-error: true and excluded from required-check lists.

Approach chosen per job shape:

Single-shot jobs → matrix-expanded with a 2-entry runner dimension. ~6 lines of diff each; side effects (artifact uploads, cache saves) gated to the entry so shadows don't collide.
- ci-python.yml, ci-proto.yml (3 jobs), ci-mcp.yml, ci-dagster.yml (added runner dim to existing CH-version matrix), ci-rust-flags-integration.yml, ci-nodejs.yml (3 jobs including sharded tests), ci-rust.yml (build + linting), ci-backend.yml (repo-checks + async-migrations), ci-storybook.yml
Jobs with dynamic / complex matrices → separate -blacksmith shadow job with a trimmed hardcoded subset (1-2 shards) — fusing dimensions into a fromJson matrix breaks downstream needs.X.outputs semantics.
- ci-rust.yml → test-blacksmith (2 representative packages)
- ci-blacksmith-shadow.yml (new file) → turbo-discover-shadow, turbo-tests-shadow, check-migrations-shadow, django-shadow

Deliberately skipped: the 11 container-build / CD workflows (_rust-build-images.yml, cd-*-image.yml, container-images-*.yml, ci-*-container.yml, livestream-docker-image.yml, llm-gateway-cd.yml). They use build-push-action, which offloads the build to a cloud builder — the runner is just orchestrator, so Blacksmith would produce near-identical timings. Shadowing them would also risk double-pushing images.

Apples-to-apples comparison tool: .github/scripts/compare-ci-runners.py scrapes job durations via gh, pairs (workflow, base_job_name, sha) across runners, and reports median / p95 / speedup in markdown or CSV. Pairs work because shadows run on the same SHA as originals and encode the runner label in the job name (for matrix-expanded jobs) or use a -blacksmith / shadow suffix (for dedicated shadow jobs).

Enable: set repo variable BLACKSMITH_SHADOW_ENABLED=true in Settings → Variables → Actions. Flip to false to halt shadows mid-trial without a code change. After a few days, run:

python3 .github/scripts/compare-ci-runners.py --days 7 > report.md

How did you test this code?

YAML validated with yaml.safe_load across all 10 edited workflow files
Ran repo's timeout linter (.github/scripts/check-ci-timeouts.py) — all jobs have timeout-minutes
actionlint via docker — no new errors introduced (only pre-existing shellcheck/label warnings the repo already tolerates)
Not yet tested end-to-end on CI. I'm an agent and haven't validated that the matrix-expanded jobs actually allocate runners on Blacksmith and produce paired data — that requires enabling the repo var. Worth running with the flag on against a sacrificial PR first.

Open items for human follow-up:

Run compare-ci-runners.py --days 1 after one day's data to sanity-check pairing before leaving shadows on for the full week.

Publish to changelog?

no

Docs update

skip-inkeep-docs

🤖 LLM context

Authored with Claude Opus 4.6 (1M context)

Runs each Depot-labeled CI job in parallel on a matching Blacksmith tier, gated by repo variable BLACKSMITH_SHADOW_ENABLED so the trial can be flipped off without a code change. Shadows are continue-on-error and not wired into any required-check needs list. Approach: - Single-shot jobs use a 2-entry matrix on `runner` (adds ~6 lines each). - Jobs with dynamic/complex matrices get a separate `-blacksmith` or `-shadow` job with a trimmed matrix in ci-blacksmith-shadow.yml. - Artifact uploads, cache saves, and side effects are gated to depot only. - Container-build / CD workflows are intentionally skipped because depot/build-push-action offloads the build to Depot's cloud builder, so the runner is just an orchestrator and timing would be near-identical. Post-trial pairing via .github/scripts/compare-ci-runners.py, which scrapes gh job durations, pairs (workflow, job, sha) across runners, and reports median/p95 speedup. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Folds the playwright shadow into the main workflow via a 2-entry runner matrix gated on BLACKSMITH_SHADOW_ENABLED. Shadow leg is continue-on-error; artifact uploads, Visual Review run, Cloudflare deploy, PR comment, and screenshot patches remain depot-only via !matrix.is_shadow. Drops the redundant playwright-shadow stub. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

github-actions · 2026-04-15T12:26:05Z

🎭 Playwright report · View test results →

⚠️ 3 flaky tests:

Logout in another tab results in logout in the current tab too (chromium)
Save view (chromium)
Materialize view pane (chromium)

These issues are not necessarily caused by your changes.
Annoyed by this comment? Help fix flakies and failures and it'll disappear!

greptile-apps · 2026-04-15T12:54:44Z

Prompt To Fix All With AI

This is a comment left during a code review.
Path: .github/scripts/compare-ci-runners.py
Line: 59-68

Comment:
**`-L 10` will truncate data to hours, not 7 days**

The fetch limit of `10` means only the last 10 workflow runs are retrieved before the `since` date filter is applied. On a repo running CI many times per day, 10 runs covers a few hours at most, making the `--days 7` window useless in practice. The comment directly above (which says `-L 1000`) suggests this was the intended value.

```suggestion
            "-L",
            "1000",
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: .github/workflows/ci-e2e-playwright.yml
Line: 173-174

Comment:
**Matrix output race condition silently breaks Visual Review**

`playwright` is now a 2-entry matrix job. GitHub Actions resolves matrix job-level outputs from whichever entry finishes last. The shadow entry's `vr-create` step is skipped (`!matrix.is_shadow`), so its `vr_run_id` is `""`. If the shadow entry completes after the depot entry — which is unpredictable — `needs.playwright.outputs.vr_run_id` becomes `""`, and `handle-screenshots` skips both the "Complete Visual Review run" step and the VR CLI install (both gated on `needs.playwright.outputs.vr_run_id != ''`), silently abandoning every VR run.

A safe fix is to pin the output to only the depot matrix entry using a dedicated `outputs` step that is skipped for shadows, or move the VR run ID to a separate job output artifact instead of a matrix output.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: .github/workflows/ci-e2e-playwright.yml
Line: 741-742

Comment:
**Unrelated job rename may break branch protection**

`capture-run-time` has been renamed to `calculate-running-time`. This change is unrelated to the shadow trial but would silently remove the old job name from any branch protection required-status-check list. If `capture-run-time` was a required check, merging to master would become unguarded until the protection rule is updated.

How can I resolve this? If you propose a fix, please make it concise.

_{Reviews (1): Last reviewed commit: "cleanup" | Re-trigger Greptile}

Copilot

Pull request overview

Adds a repo-variable–gated “shadow” lane to run most compute-heavy CI jobs on Blacksmith in parallel with the existing Depot runners, enabling apples-to-apples timing comparisons on the same SHAs.

Changes:

Matrix-expands many existing CI jobs to run on both Depot and Blacksmith (shadow runs are continue-on-error and avoid conflicting side effects like artifact uploads / cache saves).
Adds a dedicated ci-blacksmith-shadow.yml workflow for jobs that can’t be safely matrix-expanded (complex/dynamic matrices, side effects).
Adds .github/scripts/compare-ci-runners.py to scrape GitHub Actions job durations and report paired Depot vs Blacksmith statistics.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 7 comments.

Show a summary per file

File	Description
.github/workflows/ci-storybook.yml	Matrix-expands Storybook build to add an optional Blacksmith shadow and gates cache/artifact side effects.
.github/workflows/ci-rust.yml	Matrix-expands Rust build/lint; adds a separate Blacksmith-only shadow test job subset.
.github/workflows/ci-rust-flags-integration.yml	Matrix-expands Rust `/flags` integration tests to add an optional Blacksmith shadow.
.github/workflows/ci-python.yml	Matrix-expands Python quality job; gates cache/test-result artifact writes for shadows.
.github/workflows/ci-proto.yml	Matrix-expands proto lint/breaking/codegen checks to add optional Blacksmith shadows.
.github/workflows/ci-nodejs.yml	Matrix-expands Node lint/build and adds a runner dimension to sharded tests for optional Blacksmith shadows.
.github/workflows/ci-mcp.yml	Matrix-expands MCP integration tests to add an optional Blacksmith shadow.
.github/workflows/ci-e2e-playwright.yml	Matrix-expands Playwright job to add an optional Blacksmith shadow and disables shadow side effects (VR, artifacts, comments, deploy).
.github/workflows/ci-dagster.yml	Adds runner dimension to Dagster test matrix and avoids artifact writes for shadows.
.github/workflows/ci-blacksmith-shadow.yml	New workflow containing Blacksmith-only trimmed shadows for complex backend jobs.
.github/workflows/ci-backend.yml	Matrix-expands select backend jobs (repo checks, async migrations) for optional Blacksmith shadows and gates artifact writes.
.github/scripts/compare-ci-runners.py	New CLI to fetch CI job durations via `gh` and compute paired Depot vs Blacksmith medians/p95/speedup.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Speedup was silently contaminated by failed runs — a crash-fast job paired against a successful slow job was reported as a speedup. Now: - Speedup uses only success↔success pairs - Per-runner failure counts (x/y) surface runner-specific instability - Mixed-conclusion pairs are reported separately - New "Unpaired runs" table shows where the shadow didn't fire - CSV gains depot_conclusion/blacksmith_conclusion columns Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

rnegron

approving to unblock!

Copilot

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 4 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

deployment-status-posthog · 2026-04-16T16:02:24Z

Deploy status

Environment	Status	Deployed At	Workflow
dev	✅ Deployed	2026-04-16 16:02 UTC	Run
prod-us	✅ Deployed	2026-04-16 16:12 UTC	Run
prod-eu	✅ Deployed	2026-04-16 16:18 UTC	Run

gantoine changed the title ~~chore(ci): shadow depot jobs on blacksmith for runner trial~~ chore(ci): shadow CI jobs on blacksmith for runner trial Apr 14, 2026

gantoine and others added 2 commits April 14, 2026 19:28

chore(ci): limit GITHUB_TOKEN permissions in rust workflow jobs

cadc8f4

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

cleanup

f1ee3f5

gantoine marked this pull request as ready for review April 15, 2026 12:51

Copilot AI review requested due to automatic review settings April 15, 2026 12:51

assign-reviewers-posthog Bot requested a review from a team April 15, 2026 12:51

Copilot started reviewing on behalf of gantoine April 15, 2026 12:52 View session

greptile-apps Bot reviewed Apr 15, 2026

View reviewed changes

Comment thread .github/scripts/compare-ci-runners.py

Comment thread .github/workflows/ci-e2e-playwright.yml Outdated

Comment thread .github/workflows/ci-e2e-playwright.yml Outdated

Copilot AI reviewed Apr 15, 2026

View reviewed changes

rnegron approved these changes Apr 15, 2026

View reviewed changes

gantoine added 2 commits April 15, 2026 18:09

Merge branch 'master' into georges/blacksmith

5171882

fixes from bot review

0a46cc8

gantoine requested a review from Copilot April 15, 2026 19:22

Copilot started reviewing on behalf of gantoine April 15, 2026 19:22 View session

Copilot AI reviewed Apr 15, 2026

View reviewed changes

Comment thread .github/workflows/ci-blacksmith-shadow.yml

Comment thread .github/workflows/ci-blacksmith-shadow.yml Outdated

Comment thread .github/workflows/ci-blacksmith-shadow.yml Outdated

Comment thread .github/scripts/compare-ci-runners.py Outdated

fixes

7aa0940

gantoine merged commit cb1fd0b into master Apr 16, 2026
248 of 251 checks passed

gantoine deleted the georges/blacksmith branch April 16, 2026 15:31

mendral-app Bot mentioned this pull request Apr 20, 2026

fix(ci): add missing Temporal search attributes registration to blacksmith shadow #55388

Merged

gantoine mentioned this pull request Apr 20, 2026

ci: shadow typecheck/unit/integration jobs on Blacksmith PostHog/code#1744

Open

6 tasks

Conversation

gantoine commented Apr 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

Changes

How did you test this code?

Publish to changelog?

Docs update

🤖 LLM context

Uh oh!

github-actions Bot commented Apr 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

greptile-apps Bot commented Apr 15, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

rnegron left a comment

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

deployment-status-posthog Bot commented Apr 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Deploy status

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

gantoine commented Apr 14, 2026 •

edited

Loading

github-actions Bot commented Apr 15, 2026 •

edited

Loading

deployment-status-posthog Bot commented Apr 16, 2026 •

edited

Loading