Speed up preview deploys and CI: clone payloadcms-dev + cache seeded dev.db & Playwright #1060

Closed
rchlfryn wants to merge 6 commits into chore/upgrade-next-16 from chore/preview-deploy-speedup

@rchlfryn
Collaborator

@rchlfryn rchlfryn commented May 5, 2026

Description

Bundle of CI/deploy speedups, plus consistency cleanups.

Stacked on #1059 (Next 16 upgrade). Will retarget to main after that merges.

Related Issues

N/A. Investigation came out of #1059's CI timing — the preview deploy run for that PR took 11 minutes, of which:

| Step | Duration |
| --- | --- |
| Seed the database | 3m 26s |
| Build Project Artifacts | 2m 17s |
| Deploy to Vercel | 2m 30s |
| Run database migrations | 1m 35s |
| Other | ~1m |

Key Changes

1. Preview deploys clone payloadcms-dev instead of seeding from scratch

.github/workflows/preview.yaml:

  • turso db create "${name}" --wait → turso db create "${name}" --from-db payloadcms-dev --wait. payloadcms-dev is already maintained by development.yaml and sync-prod-to-dev.yml, so it has all migrations applied and prod-derived data. Cloning takes seconds.
  • pnpm seed:standalone step removed entirely — clone has the data.
  • pnpm migrate kept — idempotent. No-op when the PR doesn't add migrations; applies just the new ones when it does.
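
As a rough illustration of the steps above, the relevant part of preview.yaml might look something like the sketch below. The step names and the way `name` is populated are assumptions for illustration (the actual workflow is not shown here), and any Turso authentication or database URL environment variables are omitted.

```yaml
# Sketch only: step names and the value of `name` are illustrative,
# not copied from preview.yaml.
- name: Create preview database as a clone of payloadcms-dev
  env:
    name: preview-pr-${{ github.event.pull_request.number }} # hypothetical naming scheme
  run: turso db create "${name}" --from-db payloadcms-dev --wait

# No seed step: the clone already contains data.

- name: Run database migrations
  # Idempotent: a no-op unless the PR itself adds new migrations.
  run: pnpm migrate
```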

2. CI caches for the build & e2e jobs (was #1061, merged in)

.github/workflows/ci.yaml:

  • Seeded dev.db cache (~3 min saved on hit). Cache key hashes seed scripts, migrations, payload config, collections, globals, and pnpm-lock.yaml. Applied to both build and e2e jobs (shared key, so they warm each other). Seed step skipped via if: steps.db-cache.outputs.cache-hit != 'true' on a hit.
  • Playwright browser binaries cache (~30-60s saved on hit). Cache key includes pnpm-lock.yaml. We still call playwright install --with-deps so apt system deps stay in sync.
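
A minimal sketch of how these two caches could be wired up with actions/cache. The `db-cache` step id and the `cache-hit` condition come from the description above; the file paths inside `hashFiles`, the location of dev.db, the key prefixes, and the step names are assumptions and would need to match the real repository layout.

```yaml
# Sketch only: paths and key prefixes below are assumed, not taken from ci.yaml.
- name: Restore seeded dev.db
  id: db-cache
  uses: actions/cache@v4
  with:
    path: dev.db # assumed location of the seeded SQLite file
    key: seeded-db-${{ runner.os }}-${{ hashFiles('src/seed/**', 'src/migrations/**', 'src/payload.config.ts', 'src/collections/**', 'src/globals/**', 'pnpm-lock.yaml') }}

- name: Seed the database
  # Skipped when the cache step above restored an exact key match.
  if: steps.db-cache.outputs.cache-hit != 'true'
  run: pnpm seed:standalone

- name: Cache Playwright browsers
  uses: actions/cache@v4
  with:
    path: ~/.cache/ms-playwright # Playwright's default browser directory on Linux
    key: playwright-${{ runner.os }}-${{ hashFiles('pnpm-lock.yaml') }}

- name: Install Playwright browsers and system deps
  # Still runs on a cache hit so apt system deps stay in sync;
  # the browser download itself is skipped when binaries are present.
  run: pnpm exec playwright install --with-deps
```

Because the build and e2e jobs would compute the same `key` from the same files, whichever job finishes first saves the entry and later runs restore it.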

3. pnpm install --frozen-lockfile --ignore-workspace everywhere

Replaced pnpm ii (= pnpm --ignore-workspace install) across all 11 callsites in 6 workflow files: ci.yaml (×7), preview.yaml, development.yaml, production.yaml, dependabot-auto-format.yml, sync-prod-to-dev.yml. Frozen-lockfile skips lockfile reconciliation on CI and prevents accidental lockfile mutations from CI runs. Small per-job speedup, bigger consistency/safety win.

4. Drop Playwright webServer.timeout from 5 min to 90s

playwright.config.ts: the old 5-minute timeout was sized for slow webpack dev boots, since E2E uses pnpm dev as its server. Next 16's default Turbopack dev server boots in ~340ms (per #1059's tests), so 90s is plenty and we now fail fast when the server actually has trouble booting.

5. Cache .next/cache between preview deploys

vercel build runs next build underneath, which uses .next/cache for webpack/SWC module caches and ISR fetch cache. Caching that between preview runs lets subsequent deploys reuse incremental compilation state. Typical Next.js project savings on the build step are 30-60% when the cache hits.

Cache-key hierarchy:

  • Primary: nextjs-cache-<os>-<lockfile-hash>-<sha> (per-commit, always saves fresh)
  • Fallback 1: nextjs-cache-<os>-<lockfile-hash>- (most recent cache, same deps)
  • Fallback 2: nextjs-cache-<os>- (any cache, e.g. when lockfile changes)
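
The hierarchy above maps onto the `key` and `restore-keys` inputs of actions/cache roughly as follows. This is a sketch: the step name and the choice of pnpm-lock.yaml as the hashed lockfile are assumptions, and the actual step in preview.yaml may differ.

```yaml
# Sketch only: step name is illustrative.
- name: Cache .next/cache
  uses: actions/cache@v4
  with:
    path: .next/cache
    # Primary key is unique per commit, so every run saves a fresh entry.
    key: nextjs-cache-${{ runner.os }}-${{ hashFiles('pnpm-lock.yaml') }}-${{ github.sha }}
    # Prefix matches, tried in order when the primary key misses.
    restore-keys: |
      nextjs-cache-${{ runner.os }}-${{ hashFiles('pnpm-lock.yaml') }}-
      nextjs-cache-${{ runner.os }}-
```

When a restore key matches several entries, actions/cache restores the most recently created one, which is what makes the "most recent cache, same deps" fallback work.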

6. Shard E2E job by Playwright project (admin / frontend)

ci.yaml's e2e job now uses strategy.matrix to run admin and frontend specs on parallel runners via the existing test:e2e:admin / test:e2e:frontend scripts. Wall-clock for E2E is roughly halved on the test-execution side. fail-fast: false so a failure in one shard doesn't cancel the other.

Important: this changes the GitHub check-run names. The single e2e check becomes two checks: e2e (admin) and e2e (frontend). Branch protection / merge-queue required-status-checks need updating — remove e2e and add both e2e (admin) and e2e (frontend). (One-time settings change.)

Playwright report artifacts are also split: playwright-report-admin and playwright-report-frontend, since matrix jobs can't share the same artifact name.
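
A sketch of what the sharded job could look like. The matrix key name `project`, the runner label, the `if: always()` upload condition, and the report path are assumptions; the script names and artifact names are the ones described above.

```yaml
# Sketch only: setup steps omitted; matrix key name and runner are assumed.
e2e:
  runs-on: ubuntu-latest
  strategy:
    fail-fast: false # a failure in one shard does not cancel the other
    matrix:
      project: [admin, frontend]
  steps:
    # ...checkout, install, cache restore, and build steps omitted...
    - name: Run E2E tests for this shard
      run: pnpm test:e2e:${{ matrix.project }}

    - name: Upload Playwright report
      if: always() # upload even when tests fail
      uses: actions/upload-artifact@v4
      with:
        name: playwright-report-${{ matrix.project }}
        path: playwright-report/ # Playwright's default HTML report directory
```

GitHub appends the matrix value to the job name, which is why the checks show up as e2e (admin) and e2e (frontend).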

Expected timing

Preview deploy: ~11m → ~5m

| Step | Before | After |
| --- | --- | --- |
| Create DB (now: clone) | ~5s | ~5s |
| Run migrations | 1m 35s | ~5s (no-op for most PRs) |
| Seed | 3m 26s | removed |
| Build (warm .next/cache) | 2m 17s | ~1m to 1m 30s |
| Deploy | 2m 30s | 2m 30s |

CI build & e2e jobs: ~3-4m saved on cache hit, plus E2E roughly halves with sharding.

How to test

  • Open a no-op PR after this merges. Confirm preview deploys in ~5-6 min.
  • Verify the deployed site has prod-derived content (real tenants/pages, not synthetic seed data).
  • Open a PR that adds a migration; confirm pnpm migrate runs and applies it on the cloned DB.
  • First preview build is cold (no .next/cache yet). Push a follow-up commit; the second build should be noticeably faster, and the GitHub Actions cache restore step should report a hit.
  • Confirm build/e2e jobs in CI show the seed step as skipped after the cache is warmed.
  • Confirm Playwright chromium download is fast on a second e2e run.
  • Confirm e2e (admin) and e2e (frontend) both appear as separate checks in the merge queue.
  • Update branch protection / required status checks to replace e2e with e2e (admin) + e2e (frontend).

Tradeoffs / risks

  • Data exposure on previews: previews now contain real user accounts and prod-derived content — same as the existing payloadcms-dev deployment. Risk parity with dev.
  • No more bootstrap@avy.com login: the seed creates that test user; cloning from dev/prod doesn't have it. Reviewers/QA log in with real prod accounts.
  • Race window during dev refresh: development.yaml and sync-prod-to-dev.yml do destroy + create --from-db payloadcms-prod. A preview that lands in the ~5-second destroy/recreate window will fail at --from-db. Cost: rerun the workflow.
  • Seed determinism (CI cache): assumes pnpm seed:standalone produces the same DB content for the same inputs. If hidden non-determinism (e.g. Date.now() in a field) causes flakes, the cleanest fix is making the seed deterministic; workaround is expanding the cache key.
  • .next/cache correctness: the cache restoration is keyed by lockfile + commit SHA with restore-keys for fallback. If a stale cache somehow produces an incorrect build (Next/webpack bugs do happen), bumping the lockfile or running a no-op rebase forces a cold build. Cache size is bounded; GitHub keeps caches up to 10 GB and evicts LRU.
  • E2E sharding roughly doubles runner-minute cost per E2E run in exchange for ~50% wall-clock savings. Acceptable since E2E only runs on merge_group, not every PR push.
  • Required-check rename: e2e → e2e (admin) + e2e (frontend). Merges may stall until branch protection settings are updated.

Migration Explanation

No application database migrations. CI/deploy-only change.

Future enhancements

  • Bigger CI runners (runs-on: ubuntu-latest-4-cores) — would shrink the build job especially. Cost: more GitHub Actions minutes.
  • Composite action for the setup repo + pnpm + node + install boilerplate (duplicated 11+ times). Pure refactor, no perf change.

🤖 Generated with Claude Code

@rchlfryn rchlfryn force-pushed the chore/preview-deploy-speedup branch from 59f9feb to cdf40b3 Compare May 5, 2026 01:11
@rchlfryn rchlfryn changed the title Speed up preview deploys via Turso template-DB clone Speed up preview deploys by cloning payloadcms-dev May 5, 2026
@rchlfryn rchlfryn force-pushed the chore/upgrade-next-16 branch from 029a927 to d5dcbe1 Compare May 5, 2026 01:33
Each preview previously ran ~5 minutes of migrations + seed-from-scratch
(3m 26s seed + 1m 35s migrations) on a brand-new Turso database. The
existing payloadcms-dev DB is already maintained by development.yaml
(every push to main) and sync-prod-to-dev.yml (nightly) — it has prod
data, all migrations applied, and is always available. Cloning from it
takes seconds.

Changes:
- preview.yaml: turso db create now passes --from-db payloadcms-dev so
  the new preview DB starts as a copy of dev instead of empty.
- The seed step is removed (cloned DB already has data).
- pnpm migrate is kept — runs idempotently on the clone, only applies
  migrations the PR itself adds.
- pnpm ii → pnpm install --frozen-lockfile --ignore-workspace, skipping
  lockfile reconciliation on CI.

Risk parity: previews now contain the same prod-derived data that the
dev environment already exposes. Reviewers log in with their real prod
accounts (the seed's bootstrap@avy.com isn't present in dev/prod).

Expected: preview deploys drop from ~11 min to ~6 min.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@rchlfryn rchlfryn force-pushed the chore/preview-deploy-speedup branch from cdf40b3 to 26e17a5 Compare May 5, 2026 01:34
rchlfryn and others added 2 commits May 4, 2026 18:34
Two GitHub Actions caches that target the slowest non-build steps in
the build and e2e jobs:

- Seeded dev.db cache (~3 minutes saved). Cache key hashes the seed
  scripts, migrations, payload config, collections, globals, and the
  lockfile — so any change that would alter the seed output evicts the
  cache. Otherwise, the seed step is skipped via a conditional `if`.
  Applied to both `build` and `e2e` jobs (they share the same key, so
  whichever runs first warms the cache for the other).
- Playwright browser binaries cache (~30-60s saved). Cache key includes
  pnpm-lock.yaml so a `@playwright/test` bump invalidates it. We still
  call `playwright install --with-deps` to keep apt system deps in
  sync.

Cache misses (seed-related files changed) fall back to the original
behavior — full seed and full chromium download.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cache seeded dev.db and Playwright browsers in CI
@rchlfryn rchlfryn changed the title Speed up preview deploys by cloning payloadcms-dev Speed up preview deploys and CI: clone payloadcms-dev + cache seeded dev.db & Playwright May 5, 2026
@github-actions
Contributor

github-actions Bot commented May 5, 2026

…meout

- Replace `pnpm ii` (= `pnpm --ignore-workspace install`) with
  `pnpm install --frozen-lockfile --ignore-workspace` across all
  remaining workflows (ci.yaml ×7, development, production,
  dependabot-auto-format, sync-prod-to-dev). preview.yaml was already
  changed earlier in this PR. Frozen-lockfile skips lockfile
  reconciliation on CI and prevents accidental lockfile mutations
  from CI runs.
- playwright.config.ts: drop webServer.timeout from 300000ms (5 min)
  to 90000ms (90s). Was sized for slow webpack dev boots; Next 16's
  default Turbopack dev boots in ~340ms, so 90s is plenty and we now
  fail fast when the server actually has trouble booting.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`vercel build` runs `next build` underneath, which uses .next/cache for
webpack/SWC module caches and ISR fetch cache. Caching this between
runs lets subsequent preview deploys reuse incremental compilation
state, typically shrinking the build step by 30-60%.

Cache key hierarchy:
- Primary: nextjs-cache-<os>-<lockfile-hash>-<sha> (per-commit, always
  saves a fresh entry)
- Fallback 1: nextjs-cache-<os>-<lockfile-hash>- (most recent cache
  with the same dependency tree)
- Fallback 2: nextjs-cache-<os>- (any cache for this OS, when the
  lockfile changes)

Cache size is typically 100-300 MB; well within GitHub's 10 GB repo
limit. Cache misses fall back to today's behavior (cold build).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Splits the e2e job across two parallel runners using strategy.matrix on
the Playwright project name. Each shard runs only its own specs via the
existing `pnpm test:e2e:admin` / `pnpm test:e2e:frontend` scripts.

Wall-clock for E2E is roughly halved on the test-execution side.
Setup duplicates across shards, but the seed and Playwright caches
(also added in this PR) cover most of it — main duplicated cost is
the build (~2m) which we'd pay once before sharding anyway.

`fail-fast: false` so a failure in one shard doesn't cancel the
other — we want both reports.

Note for merge-queue setup: both `e2e (admin)` and `e2e (frontend)`
need to be added as required status checks if the old `e2e` check
was required.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@rchlfryn
Collaborator Author

rchlfryn commented May 5, 2026

Closing because we don't need to spend time on this at the moment

@rchlfryn rchlfryn closed this May 5, 2026