
Make the test suite parallel-safe and speed up CI #1273

Merged
RhysSullivan merged 15 commits into main from ci/parallel-safe-tests
Jul 2, 2026

@RhysSullivan

Owner

CI on main has been red on roughly half of recent runs, all from load-dependent flakes rather than product bugs. This reworks the test infrastructure so suites run reliably in parallel, and adds caching so runs are faster.

Flakes fixed

  • host-selfhost cluster timeouts (the dominant failure): the six integration files booted a full app graph at module load while turbo ran every package's vitest concurrently, starving 2-core runners. Boots now happen in beforeAll, the package's files run serially (fileParallelism: false), and the CI Test job caps turbo at --concurrency=3 via TURBO_TEST_CONCURRENCY (local dev is unaffected). The test budget is sized for a loaded runner, since scope-isolation fans out dozens of concurrent requests.
  • cloud db.test.ts ECONNRESET: the per-scope DB teardown called sql.end() without awaiting it, so an old connection's teardown raced the next test's connect against the single-connection PGlite socket server. Teardown is now awaited in the finalizer.
  • sdk oauth test-server races: makeTestHttpServer could hand back a server before the socket reliably accepted under load. It now probes readiness with a raw TCP connect (invisible to request-recording fixtures) before returning.
  • graphql plugin introspection assertions: the request recorder is eventually consistent with connect; tests now poll for the recorded introspection request instead of asserting immediately.
  • stdio-MCP e2e boot timeout: more headroom for the cold vite optimizeDeps boot, keeping the boot-wait < test-timeout gap so the boot diagnostic still surfaces.
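The db.test.ts item is easiest to see in isolation. Below is a minimal sketch of the race, not the code from apps/cloud/src/db/db.ts: `SingleSlotServer` and `FakeConnection` are hypothetical stand-ins for the single-connection PGlite socket server and the SQL client, and the 5 ms close delay is arbitrary.

```typescript
// Hypothetical stand-in for a server that accepts one connection at a time.
class SingleSlotServer {
  private busy = false;

  connect(): FakeConnection {
    if (this.busy) {
      throw new Error("ECONNRESET: slot still held by the previous connection");
    }
    this.busy = true;
    return new FakeConnection(() => {
      this.busy = false;
    });
  }
}

// Hypothetical client whose close completes asynchronously.
class FakeConnection {
  constructor(private readonly release: () => void) {}

  end(): Promise<void> {
    return new Promise<void>((resolve) =>
      setTimeout(() => {
        this.release();
        resolve();
      }, 5),
    );
  }
}

// Fire-and-forget teardown: returns before the slot is released.
function closeWithoutAwait(conn: FakeConnection): void {
  void conn.end();
}

// Awaited teardown: the caller cannot continue until the slot is free.
async function closeAwaited(conn: FakeConnection): Promise<void> {
  await conn.end();
}

async function demo(): Promise<{ raced: boolean; awaitedOk: boolean }> {
  const server = new SingleSlotServer();

  // Old behaviour: the next connect runs while the close is still pending.
  closeWithoutAwait(server.connect());
  let raced = false;
  try {
    server.connect();
  } catch {
    raced = true;
  }

  // Let the pending close finish before trying the awaited variant.
  await new Promise<void>((resolve) => setTimeout(resolve, 20));

  // New behaviour: each close is awaited, so the next connect finds a free slot.
  let awaitedOk = true;
  try {
    await closeAwaited(server.connect());
    await closeAwaited(server.connect());
  } catch {
    awaitedOk = false;
  }
  return { raced, awaitedOk };
}
```

In this model the un-awaited close makes the second connect fail, while the awaited close lets back-to-back connects succeed.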

CI changes

  • Push-to-main runs get unique concurrency groups (PRs keep cancel-in-progress), so rapid merges no longer cancel main's verdict — this previously let a lint failure land unnoticed.
  • Caching: bun package cache in every job, Playwright browsers in e2e jobs, GHA layer cache for the self-host Docker image.
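The concurrency-group rule can be written down as a small function. This is only a model of the behaviour described above, not the workflow expression itself: the `concurrencyFor` name, the `RunContext` shape, and the group naming scheme are all assumptions.

```typescript
// Hypothetical model of a CI run; field names are illustrative only.
interface RunContext {
  eventName: "push" | "pull_request";
  ref: string; // e.g. "refs/heads/main" or "refs/pull/1273/merge"
  sha: string;
}

interface ConcurrencySetting {
  group: string;
  cancelInProgress: boolean;
}

function concurrencyFor(workflow: string, run: RunContext): ConcurrencySetting {
  if (run.eventName === "push") {
    // One group per commit: a later push never cancels an earlier run,
    // so every commit on main gets its own verdict.
    return { group: `${workflow}-${run.sha}`, cancelInProgress: false };
  }
  // PR runs share a group per ref, so a new push to the PR cancels the old run.
  return { group: `${workflow}-${run.ref}`, cancelInProgress: true };
}
```

Two pushes to main with different SHAs land in different groups, while two runs for the same PR ref share one group and the newer run cancels the older.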

Out of scope: the cloud dev-server SSE/OTel memory growth behind the e2e shard degradation (tracked separately) and shard rebalancing.

Verified with typecheck, lint, and repeated forced full-suite runs (turbo cache disabled); the suite is green back-to-back where it previously failed most forced runs.

@pkg-pr-new

pkg-pr-new Bot commented Jul 2, 2026


@executor-js/cli

npm i https://pkg.pr.new/@executor-js/cli@1273

@executor-js/config

npm i https://pkg.pr.new/@executor-js/config@1273

@executor-js/execution

npm i https://pkg.pr.new/@executor-js/execution@1273

@executor-js/sdk

npm i https://pkg.pr.new/@executor-js/sdk@1273

@executor-js/codemode-core

npm i https://pkg.pr.new/@executor-js/codemode-core@1273

@executor-js/runtime-quickjs

npm i https://pkg.pr.new/@executor-js/runtime-quickjs@1273

@executor-js/plugin-file-secrets

npm i https://pkg.pr.new/@executor-js/plugin-file-secrets@1273

@executor-js/plugin-graphql

npm i https://pkg.pr.new/@executor-js/plugin-graphql@1273

@executor-js/plugin-keychain

npm i https://pkg.pr.new/@executor-js/plugin-keychain@1273

@executor-js/plugin-mcp

npm i https://pkg.pr.new/@executor-js/plugin-mcp@1273

@executor-js/plugin-onepassword

npm i https://pkg.pr.new/@executor-js/plugin-onepassword@1273

@executor-js/plugin-openapi

npm i https://pkg.pr.new/@executor-js/plugin-openapi@1273

executor

npm i https://pkg.pr.new/executor@1273

commit: b0ed1da

@greptile-apps

greptile-apps Bot commented Jul 2, 2026


Greptile Summary

This PR fixes a cluster of load-dependent CI flakes and adds caching to speed up runs. The root causes are each addressed with a targeted, minimal change: heavy app boots moved to beforeAll, DB teardown now awaited, HTTP test servers probed at the TCP level before returning, and GraphQL introspection assertions converted to polling.

  • Boot serialization: all six host-selfhost integration files now boot their app graph in beforeAll rather than at module load; the vitest config switches to fileParallelism: false + maxWorkers: 1 with 120s/60s test and hook budgets, and TURBO_TEST_CONCURRENCY=3 caps turbo concurrency on CI runners.
  • Infrastructure fixes: sql.end() is now awaited in the DB finalizer (fixes ECONNRESET), makeTestHttpServer probes TCP readiness before returning (fixes OAuth server races), and GraphQL introspection assertions poll via waitForRecordedRequests (fixes eventually-consistent Yoga recorder).
  • isAsyncResultLoading semantic change: waiting states that carry a stale value are no longer treated as loading, enabling stale-while-revalidate rendering; the existing test suite is updated to cover both branches.
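The polling approach can be sketched without Effect. The real waitForRecordedRequests polls a Ref-backed log; the version below is a plain-Promise analogue under assumed names (`waitForRecorded`, `RecordedRequest`) and only borrows the 100 × 50 ms budget quoted in this review.

```typescript
// Hypothetical shape of a recorded request; the real recorder stores more.
interface RecordedRequest {
  readonly body: string;
}

// Poll `read` until some request satisfies `predicate`, or give up after
// `attempts` polls spaced `intervalMs` apart.
async function waitForRecorded(
  read: () => readonly RecordedRequest[],
  predicate: (request: RecordedRequest) => boolean,
  attempts = 100,
  intervalMs = 50,
): Promise<RecordedRequest> {
  for (let i = 0; i < attempts; i++) {
    const match = read().find(predicate);
    if (match !== undefined) return match;
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`no matching request after ${attempts} attempts`);
}
```

A test would then await `waitForRecorded(() => log, (r) => r.body.includes("__schema"))` instead of asserting on the log immediately after connect.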

Confidence Score: 5/5

Safe to merge; all changes are scoped to test infrastructure, CI configuration, and one well-tested behavioral tweak to isAsyncResultLoading.

Every fix addresses a documented, specific root cause with a matching test or structural guard. The db.ts and async-result.ts changes are the only production-code modifications: the former removes a fire-and-forget that was already wrapped in Effect.ignore, and the latter has explicit before/after test coverage for both branches of the new condition. No auth, data, or request-path logic is touched.
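The isAsyncResultLoading change reduces to one extra condition. The sketch below uses a simplified, hypothetical `AsyncResult` union (the type in packages/react/src/lib/async-result.ts is likely richer) to show which states count as loading after the change.

```typescript
// Hypothetical, simplified result type for illustration only.
type AsyncResult<A> =
  | { readonly kind: "initial" }
  | { readonly kind: "waiting"; readonly staleValue?: A }
  | { readonly kind: "ready"; readonly value: A };

function isLoading<A>(result: AsyncResult<A>): boolean {
  switch (result.kind) {
    case "initial":
      return true;
    case "waiting":
      // Stale-while-revalidate: a waiting state that still carries a value
      // renders that value instead of a loading indicator.
      // (This sketch treats `undefined` as "no value".)
      return result.staleValue === undefined;
    case "ready":
      return false;
  }
}
```

Under this rule a refetch keeps the previous data on screen, and only the first load (or a wait with nothing to show) renders the loading state.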

e2e/cloud/auth-routing-flow.test.ts contains a page.waitForTimeout(250) sleep; otherwise no files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| .github/workflows/ci.yml | Adds Bun package cache, Playwright browser cache, GHA Docker layer cache, per-SHA concurrency group for push-to-main, and TURBO_TEST_CONCURRENCY=3 cap for the Test job. |
| apps/cloud/src/db/db.ts | Removes the fire-and-forget fork of sql.end() and awaits it directly; fixes the ECONNRESET race between per-scope DB teardown and the next test's connection. |
| apps/host-selfhost/vitest.config.ts | Replaces maxForks:2 parallelism cap with fileParallelism:false + maxWorkers:1 (fully serial), raises testTimeout to 120s and hookTimeout to 60s to accommodate loaded-runner boots. |
| packages/core/sdk/src/testing.ts | Adds a TCP-level readiness probe (up to 100 retries at 10 ms) after the HTTP server binds, before returning the test server shape; prevents connect failures on loaded runners. |
| packages/plugins/graphql/src/testing/index.ts | Adds waitForRecordedRequests helper that polls the Ref-backed request log until a predicate matches or 100×50ms exhausts; addresses eventual consistency of Yoga's async captureRequest path. |
| packages/react/src/lib/async-result.ts | Changes isAsyncResultLoading so waiting+value is not loading (stale-while-revalidate); only waiting-without-value and initial states are treated as loading. Test coverage updated to match. |
| e2e/cloud/auth-routing-flow.test.ts | Adds networkidle wait + retry fill to guard against hydration clearing the org-name input; includes a page.waitForTimeout(250) sleep that violates the no-sleep policy. |
| apps/host-selfhost/src/boot.test.ts | Moves app-graph construction from module-level top-level await to beforeAll, preventing multiple concurrent heavy boots at module load time. |
| package.json | Threads TURBO_TEST_CONCURRENCY into the turbo --concurrency flag via ${VAR:+expansion}; unset locally leaves behavior unchanged, CI sets it to 3. |
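The `${VAR:+...}` expansion in package.json emits the flag only when the variable is set and non-empty. The same rule in TypeScript, as a sketch: `turboTestArgs` and the base `run test` arguments are assumptions for illustration, not the script in this repository.

```typescript
// Build turbo CLI arguments; the flag appears only when the variable is
// set and non-empty, mirroring shell ${VAR:+word} semantics.
function turboTestArgs(env: Record<string, string | undefined>): string[] {
  const args = ["run", "test"]; // assumed base command
  const limit = env.TURBO_TEST_CONCURRENCY;
  if (limit !== undefined && limit !== "") {
    args.push(`--concurrency=${limit}`);
  }
  return args;
}
```

With the variable unset (local dev) the arguments are unchanged; with `TURBO_TEST_CONCURRENCY=3` (CI) the cap is appended.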

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    subgraph Before["Before (flaky)"]
        B1["Module load: top-level await\nmakeSelfHostTestApp / makeSelfHostApiHandler\n× 6 files in parallel"]
        B2["All 6 heavy boots run concurrently\nat module-evaluation time"]
        B3["CPU starvation on 2-core runner\n→ in-flight requests stall → timeout"]
        B1 --> B2 --> B3
    end

    subgraph After["After (stable)"]
        A1["Module load: declare handler/dispose\nvariables only (no I/O)"]
        A2["beforeAll: import + boot\none file at a time\n(fileParallelism: false, maxWorkers: 1)"]
        A3["Tests run with 120s budget\nsized for loaded runner"]
        A4["afterAll: dispose()"]
        A1 --> A2 --> A3 --> A4
    end

    subgraph SupportingFixes["Supporting fixes"]
        S1["makeTestHttpServer:\nTCP probe before returning\n(100 × 10ms retries)"]
        S2["db.ts close():\nawait sql.end() — no more\nECONNRESET race"]
        S3["GraphQL plugin tests:\nwaitForRecordedRequests polling\n(100 × 50ms) for eventual-\nconsistent Yoga recorder"]
        S4["CI workflow:\nper-SHA concurrency group for push,\nbun + playwright cache,\nTURBO_TEST_CONCURRENCY=3"]
    end
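The "After" path depends on a few Vitest options. A sketch of what the host-selfhost config might contain, using only the values quoted in this review; the variable name and the plain-object form are assumptions, and the real file may use defineConfig and carry other settings.

```typescript
// Sketch of the relevant part of apps/host-selfhost/vitest.config.ts.
// In the real file this object would be the default export.
const selfHostVitestConfig = {
  test: {
    // Run test files one at a time, so only one heavy app boot is in flight.
    fileParallelism: false,
    maxWorkers: 1,
    // Budgets sized for a loaded 2-core runner (milliseconds).
    testTimeout: 120_000,
    hookTimeout: 60_000,
  },
};
```

Because the boot now happens inside beforeAll, it is hookTimeout (60s) rather than testTimeout that bounds it.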

Reviews (3): Last reviewed commit: "Match the new sign-in heading in the sel..."

Comment thread .github/workflows/ci.yml
Comment on lines +158 to 166
- name: Cache Playwright browsers
  uses: actions/cache@v4
  with:
    path: ~/.cache/ms-playwright
    key: ${{ runner.os }}-playwright-1.60.0

# Install from e2e so bunx resolves ITS pinned playwright (the version
# the tests run against) rather than floating to the latest.
- name: Install Playwright Chromium


P2 The Playwright browser cache key is hardcoded to 1.60.0. If the Playwright version is bumped in bun.lock without updating this string, CI will keep serving the old browser binaries from cache, which can cause subtle test failures (features or fixes tied to the new version are absent, or ABI mismatches occur). Deriving the key from the lockfile keeps it in sync automatically.

Suggested change
Current:
- name: Cache Playwright browsers
  uses: actions/cache@v4
  with:
    path: ~/.cache/ms-playwright
    key: ${{ runner.os }}-playwright-1.60.0
# Install from e2e so bunx resolves ITS pinned playwright (the version
# the tests run against) rather than floating to the latest.
- name: Install Playwright Chromium
Suggested:
- name: Cache Playwright browsers
  uses: actions/cache@v4
  with:
    path: ~/.cache/ms-playwright
    key: ${{ runner.os }}-playwright-${{ hashFiles('e2e/bun.lock', 'bun.lock') }}
# Install from e2e so bunx resolves ITS pinned playwright (the version
# the tests run against) rather than floating to the latest.
- name: Install Playwright Chromium

Comment on lines +116 to +122
yield* Effect.callback<void, TestHttpServerServeError>((resume) => {
  const socket = createConnection({ host: "127.0.0.1", port: address.port }, () => {
    socket.end();
    resume(Effect.void);
  });
  socket.on("error", (cause) => resume(Effect.fail(new TestHttpServerServeError({ cause }))));
}).pipe(Effect.retry(Schedule.both(Schedule.spaced("10 millis"), Schedule.recurs(100))));


P2 TCP probe socket not cleaned up on interruption

Effect.callback without a returned cleanup function means if the outer fiber is interrupted mid-probe (e.g., a test times out while the server is still booting), the in-flight createConnection socket is not destroyed. On a loaded CI runner this leaves dangling half-open sockets for the duration of the connect timeout. The fix is to return a cleanup from the callback that calls socket.destroy().
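The cleanup idea can be shown with plain Node APIs. This is a sketch, not the Effect code from the PR: an AbortSignal stands in for fiber interruption, and `probeOnce`, `waitUntilAccepting`, and `demo` are hypothetical names. The retry budget (100 retries, 10 ms apart) follows the snippet above.

```typescript
import { createConnection, createServer } from "node:net";

// One TCP connect attempt that destroys its socket if the caller aborts.
function probeOnce(port: number, signal: AbortSignal): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    if (signal.aborted) {
      reject(new Error("probe interrupted"));
      return;
    }
    const socket = createConnection({ host: "127.0.0.1", port });
    const onAbort = () => {
      socket.destroy(); // the cleanup the review asks for
      reject(new Error("probe interrupted"));
    };
    signal.addEventListener("abort", onAbort, { once: true });
    socket.once("connect", () => {
      signal.removeEventListener("abort", onAbort);
      socket.end();
      resolve();
    });
    socket.once("error", (cause) => {
      signal.removeEventListener("abort", onAbort);
      reject(cause);
    });
  });
}

// Retry the probe until the port accepts, the retries run out, or the
// caller aborts.
async function waitUntilAccepting(
  port: number,
  signal: AbortSignal,
  retries = 100,
  delayMs = 10,
): Promise<void> {
  for (let attempt = 0; ; attempt++) {
    try {
      await probeOnce(port, signal);
      return;
    } catch (cause) {
      if (signal.aborted || attempt >= retries) throw cause;
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
}

// Usage: start a throwaway server on an ephemeral port and probe it.
async function demo(): Promise<boolean> {
  const server = createServer((socket) => socket.end());
  await new Promise<void>((resolve) => server.listen(0, "127.0.0.1", () => resolve()));
  const address = server.address();
  if (address === null || typeof address === "string") {
    throw new Error("unexpected server address");
  }
  await waitUntilAccepting(address.port, new AbortController().signal);
  server.close();
  return true;
}
```

In the Effect version the equivalent step is returning a finalizer from the callback that calls socket.destroy(), so interrupting the fiber tears the socket down instead of leaving it to the connect timeout.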

@github-actions

github-actions Bot commented Jul 2, 2026

Contributor

Cloudflare preview

Torn down — the PR is closed.

@RhysSullivan RhysSullivan merged commit 8652c99 into main Jul 2, 2026
23 checks passed