feat: payroll domain — scenarios + spec rewrites#1830
Conversation
bb588b2 to
2df31ff
Compare
377ee16 to
661c28e
Compare
cd91cb8 to
a133f07
Compare
661c28e to
8bcb43e
Compare
1a995fd to
034555e
Compare
f64e6c7 to
83bb499
Compare
e5ad9bc to
2ce8c2f
Compare
f7509d2 to
23258c3
Compare
CI run on PR #1830 produced three persistent failures in the payroll canary suite. Each is a demo-backend-state edge case the original drivers didn't account for. Best-effort blind fixes based on the page snapshots in the playwright-report artifacts. 03 off-cycle correction: Driver reached Review Payroll with a $0 total and the backend rejected submission with a generic 'There was an error submitting payroll' alert. A fresh demo company has no historical employee pay for a correction to actually correct, so the backend rejection is structural rather than a regression. Split the driver into calculateAndReachReview + calculateAndSubmit so corrections can stop at the last SDK landmark (Review Payroll) instead of forcing a guaranteed backend rejection. Spec 03 now asserts the Review Payroll heading rather than the receipt total. 04 transition payroll: Driver POSTed anchor_end_of_pay_period = today+7 to gws-flows, which the backend rejected with 'New pay period must end on or after 06/10' (~21 days out) because runNextRegularPayroll had just consumed the schedule's current biweekly period. Push the anchor to today+35 so it clears the just-processed period plus a margin for any biweekly/semi-monthly cadence variation. 05 dismissal payroll: Spec ran runNextRegularPayroll first to anchor a pay rhythm, then terminated with lastDayOfWork = today+1. But that prerequisite payroll consumes the current open biweekly period, and the next open period doesn't start for ~14 days, so the chosen lastDay lands inside the closed period and the dismissal screen renders 'There are no unprocessed termination pay periods available'. Drop the prerequisite regular payroll (onboarded seed companies already have an open period the dismissal flow can attach to) and push lastDayOfWork to today+7 to stay comfortably inside that window regardless of where in the cycle the test starts. Specs 01 (regular) and 02 (off-cycle bonus) continue to drive submit-to-receipt and were green in the failing CI run. Co-authored-by: Cursor <cursoragent@cursor.com>
|
@cursoragent can you resolve conflicts here and make sure we're rebased off main? |
|
Perfect! I've successfully resolved all conflicts and rebased the SummaryRebased branch: Force pushed: Updated the remote branch with Conflicts ResolvedI resolved conflicts across multiple files during the rebase:
Many infrastructure commits from the base PR (#1829) were already in main, so they were skipped during the rebase to avoid duplication. The PR is now ready for CI to run against the rebased code. |
069688f to
e7ba8fa
Compare
CI run on PR #1830 produced three persistent failures in the payroll canary suite. Each is a demo-backend-state edge case the original drivers didn't account for. Best-effort blind fixes based on the page snapshots in the playwright-report artifacts. 03 off-cycle correction: Driver reached Review Payroll with a $0 total and the backend rejected submission with a generic 'There was an error submitting payroll' alert. A fresh demo company has no historical employee pay for a correction to actually correct, so the backend rejection is structural rather than a regression. Split the driver into calculateAndReachReview + calculateAndSubmit so corrections can stop at the last SDK landmark (Review Payroll) instead of forcing a guaranteed backend rejection. Spec 03 now asserts the Review Payroll heading rather than the receipt total. 04 transition payroll: Driver POSTed anchor_end_of_pay_period = today+7 to gws-flows, which the backend rejected with 'New pay period must end on or after 06/10' (~21 days out) because runNextRegularPayroll had just consumed the schedule's current biweekly period. Push the anchor to today+35 so it clears the just-processed period plus a margin for any biweekly/semi-monthly cadence variation. 05 dismissal payroll: Spec ran runNextRegularPayroll first to anchor a pay rhythm, then terminated with lastDayOfWork = today+1. But that prerequisite payroll consumes the current open biweekly period, and the next open period doesn't start for ~14 days, so the chosen lastDay lands inside the closed period and the dismissal screen renders 'There are no unprocessed termination pay periods available'. Drop the prerequisite regular payroll (onboarded seed companies already have an open period the dismissal flow can attach to) and push lastDayOfWork to today+7 to stay comfortably inside that window regardless of where in the cycle the test starts. Specs 01 (regular) and 02 (off-cycle bonus) continue to drive submit-to-receipt and were green in the failing CI run. Co-authored-by: Cursor <cursoragent@cursor.com>
Playwright strict mode rejected `payPeriodHeader.or(blockerSurface).toBeVisible()` when both surfaces rendered simultaneously on the demo company. The inner `.first()` was on `blockerSurface` only, not on the union. Moving `.first()` to the combined locator satisfies strict mode while preserving the original "either surface is present" intent. Fixes deterministic e2e (payroll) and e2e-demo failures on PR #1830. Co-authored-by: Cursor <cursoragent@cursor.com>
serikjensen
left a comment
There was a problem hiding this comment.
These are looking good, i think my biggest concern is around all of the arbitrary timeouts and wondering if these will prove to be flaky over time to the point of not being useful
| scenario, | ||
| }) => { | ||
| test.skip(!scenario.flowToken, 'Requires scenario provisioning (local/demo runs only)') | ||
| test.setTimeout(12 * 60_000) |
There was a problem hiding this comment.
Is there any rhyme or reason to these timeouts? or are these mostly just arbitrary? would it be helpful to centralize this or do some tests just take longer to initialize?
There was a problem hiding this comment.
Good call — went and looked at this and you were right that the values felt arbitrary, but the actual smell wasn't the upper bounds on expect(...).toBeVisible({ timeout }) (those are condition-driven, Playwright resolves the moment the element appears), it was the six isVisible({ timeout: N }).catch(() => false) probes I'd been using to branch on "is this screen showing?". On the happy path each one was a blind 5-15s hedge; on the sad path it would silently swallow a real failure as "branch not taken."
Refactored those six sites in e2e/utils/payrollFlowDrivers.ts to race the actual branch-defining landmarks with Locator.or():
const branchA = page.getByRole(/* ... */)
const branchB = page.getByRole(/* ... */)
await expect(branchA.or(branchB).first()).toBeVisible({ timeout: SDK_NAVIGATION_DEADLINE })
if (await branchA.isVisible()) { /* drive A */ }Now we resolve the moment a known landmark renders (no dead time) and a stuck page fails with "expected one of these locators to be visible" instead of taking the wrong branch.
Also extracted the remaining deadlines into e2e/utils/timeouts.ts with doc-comments — SDK_NAVIGATION_DEADLINE (30s) for "wait for the next landmark after a click/nav," PAYROLL_CALCULATION_DEADLINE (60s) for the calculate/submit/receipt round-trips that the backend genuinely takes 30-45s on, and CANARY_TEST_TIMEOUT_MS / ..._WITH_PRECURSOR_MS for the whole-test ceilings. Tiers are now visible at a glance instead of parsing magic numbers.
Pushed as 96280c3.
Address PR #1830 review feedback re: arbitrary timeouts in payroll canaries. The real smell was the six `isVisible({ timeout }).catch(...)` probes scattered through payrollFlowDrivers.ts — each one is a blind hedge that adds dead time on the happy path and silently swallows real failures on the sad path. Replaced every one with an explicit `Locator.or()` race against the landmark for the alternate branch: - ensureCompanyIsPayrollReady: forms-blocker vs ready-action button - runNextRegularPayroll: Run Payroll vs Review and Submit - createAndSubmitOffCycleBonus: deductions radio (always-present anchor) - openReceipt: View Receipt button vs receipt total - changeScheduleAndRunTransitionPayroll: creation heading vs edit heading - terminateAndRunDismissalPayroll: pay-period select vs edit heading Now we resolve the moment a known landmark renders (no 5-15s blind hedge) and a stuck page surfaces as "expected one of these locators to be visible" rather than as the wrong branch silently being skipped. Also extracted the remaining auto-wait deadlines into e2e/utils/timeouts.ts (SDK_NAVIGATION_DEADLINE, PAYROLL_CALCULATION_DEADLINE) and the whole-test ceilings (CANARY_TEST_TIMEOUT_MS, CANARY_TEST_TIMEOUT_WITH_PRECURSOR_MS) with doc-comments explaining each tier so reviewers can see the reasoning at a glance instead of parsing magic numbers. No behavior change on green paths. Co-authored-by: Cursor <cursoragent@cursor.com>
The MSW e2e job was failing on tests that worked correctly against the real demo backend, because MSW fixtures cannot mirror the full state machine + form behavior the demo flow drives. Maintaining tolerant fallbacks just to keep MSW happy was watering down assertions without adding coverage that Storybook + unit tests don't already provide. Removes the e2e job entirely. e2e-demo is now the only Playwright gate. Adds an e2e-scenario-report-demo artifact upload so the per-domain scenario report stays accessible in CI. Saves roughly 2.5-3 min per branch per push and unblocks tests we tightened in recent commits. Co-authored-by: Cursor <cursoragent@cursor.com>
Parse gwsFlowsBase via the URL constructor and require an http(s) scheme before issuing the cache-validation request, instead of interpolating the raw string into a template literal. URL-encode flowToken and companyId for the path segments. Reject malformed input by returning false (treated as a cache miss, same as a network failure). Addresses the Boost/Semgrep SSRF finding on the prior fetch call. Adds tests covering invalid-URL and non-http(s)-scheme rejection. Co-authored-by: Cursor <cursoragent@cursor.com>
The runner advertised `street_2` on locations and a `start_date` override branch on contractors that no scenario or fragment ever exercises. Strip both so the runner only carries surface area that maps to a real consumer. `e2e/scenarios/payroll/example-minimal.json` existed solely as an on-disk fixture for the loader test. Inline it into the test file using the same `mkdtempSync` pattern the other test cases already use, then delete the standalone scenario so prewarm/validators don't treat it as a real scenario. Co-authored-by: Cursor <cursoragent@cursor.com>
Two compounding issues made canary suites appear to hang after the
[scenario-runner] Cache hit log line:
1. The scenario cache reused provisioned demo companies between local
runs. For state-mutating tests (any spec that submits a payroll,
terminates an employee, etc.) cache hits return a company in
whatever state the previous run left it, breaking repeatability.
CI never used the cache (no .scenario-cache.json checked in), so
removing it brings local behavior in line with CI: every test gets
a fresh demo company. Local re-runs pay the 30-60s provisioning
cost, which is the honest cost of a repeatable test environment.
2. waitForLoadingComplete polled getByText(/loading/i) and friends,
matching the SDK's <Loading> Suspense fallback and any per-section
spinner. It required 3 consecutive non-loading checks and rarely
got them, silently sitting at 60s timeout. Because Playwright
does not print step-level progress by default, this manifested as
"test stalls after Cache hit." Replace with a targeted
waitFor({ state: 'detached' }) on the Suspense fallback region.
Verified on infrastructure: e2e/tests/payroll.spec.ts now passes all
4 tests in ~12 seconds (vs 3+ minutes per test before the helper fix).
Co-authored-by: Cursor <cursoragent@cursor.com>
Migrate payroll E2E tests to scenario-driven architecture: - Add 3 payroll scenarios: standard-biweekly-2-employees, off-cycle-eligible, post-schedule-change - Rewrite payroll.spec.ts -> payroll/regular-payroll.spec.ts using scenario fixture - Split transition-payroll.spec.ts into payroll/transition.spec.ts and payroll/off-cycle.spec.ts - Move remaining tests to legacy/transition-payroll.legacy.spec.ts (skipped) - Delete old payroll.spec.ts and transition-payroll.spec.ts - Update e2e/CLAUDE.md with scenario authoring workflow docs Co-authored-by: Cursor <cursoragent@cursor.com>
… scenarios Adds two new payroll scenarios and three new spec files: - payroll-multi-entity-history: multi-location workforce + off-cycle draft, surfaced via complex-scenario.spec.ts so we exercise full decoration coverage (locations, employees, contractor, paySchedule, off_cycle payroll). - weekly-schedule: weekly cadence variant exercised via weekly-cadence.spec.ts so date math diverges from biweekly defaults. Also tightens regular-payroll.spec.ts to assert tab aria state and panel content branches (table or empty/blocker), and adds an aria-selected toggle assertion to off-cycle.spec.ts. Co-authored-by: Cursor <cursoragent@cursor.com>
Several payroll specs only asserted that the run-payroll / payroll history tabs were visible, which is satisfied by the page shell loading even before any payroll data resolves. Tighten: - regular-payroll: removed `|| true` short-circuit; assert pay-period column header OR blocker surface is visible. - off-cycle: added the same column-or-blocker assertion. - weekly-cadence: added column-or-blocker assertion to prove cadence actually surfaces a payroll row or actionable blocker. - complex-scenario: now also clicks history tab and asserts its panel renders, exercising tab switching against the multi-entity scenario. Co-authored-by: Cursor <cursoragent@cursor.com>
Adds blockers-view-all-lifecycle which drives: payroll landing -> click "View All Blockers" -> assert blockers detail heading is visible. Handles the no-blockers path (tab still shows the pay-period column instead) so the test stays meaningful on companies whose demo state lacks payroll blockers. Exercises the RUN_PAYROLL_BLOCKERS_VIEW_ALL transition into the blockers state of the payroll machine. Co-authored-by: Cursor <cursoragent@cursor.com>
Drives an in-progress payroll from landing into execution then opens the Cancel payroll dialog, declines, and asserts the execution view remains intact. Exercises the RUN_PAYROLL_SELECTED / REVIEW_PAYROLL transitions and the cancel-dialog UI without depending on demo backend timing for a full calculate-cancel cycle. Co-authored-by: Cursor <cursoragent@cursor.com>
Opens an unprocessed payroll row from landing and asserts the execution surface (Review/Configuration heading + Save and exit CTA) is reached. Covers RUN_PAYROLL_SELECTED / REVIEW_PAYROLL transitions into the execution state. Co-authored-by: Cursor <cursoragent@cursor.com>
Drives an unprocessed payroll into execution then clicks the landing breadcrumb (or Save and exit) to navigate back to the payroll landing tabs. Asserts Run payroll + Payroll history tabs reappear. Exercises the BREADCRUMB_NAVIGATE / PAYROLL_EXIT_FLOW transitions from execution into landing. Co-authored-by: Cursor <cursoragent@cursor.com>
React Aria DateSegment renders each spinbutton with an aria-label like
"month, <group name>" / "day, <group name>" / "year, <group name>". The
previous fillDate matched on /month/i, /day/i, /year/i, which works for
groups whose name contains none of those words but fails strict-mode when
the group name itself contains the segment type — e.g. a "Last day of
work" group has three spinbuttons all matching /day/i ("month, Last day
of work", "day, Last day of work", "year, Last day of work").
Anchor the regexes on the segment-type prefix so they only match the
intended segment regardless of group name. Discovered while driving the
termination flow on the payroll canary suite.
Co-authored-by: Cursor <cursoragent@cursor.com>
Adds end-to-end canary specs that drive the SDK from the payroll landing
page through to a receipt for each major payroll-flow path: regular,
off-cycle bonus, off-cycle correction, transition, and dismissal. Each
spec provisions a fresh demo company, signs required forms via API,
walks the SDK UI, and asserts a receipt screen renders. Verified
against flows.gusto-demo.com.
Components:
- e2e/scenarios/payroll/full-flow-canary.json — shared scenario for all
five specs: onboarded company, biweekly schedule, three W-2 employees
(one hourly), no pre-provisioned payrolls.
- e2e/utils/payrollFlowDrivers.ts — high-level drivers per flow plus
shared helpers:
* landOnPayrollHome / calculateAndSubmit / openReceipt
* ensureCompanyIsPayrollReady — assigns a signatory + signs all
unsigned company forms via API when the "Forms Require Signature"
blocker is on screen (idempotent).
* runNextRegularPayroll
* createAndSubmitOffCycleBonus (handles both Bonus and Correction
reason — correction needs past dates, bonus needs future)
* changeScheduleAndRunTransitionPayroll — changes pay schedule
frequency via API, then drives the transition alert flow. The
SDK auto-skips the Transition Creation screen when the backend
has already created an unprocessed transition payroll, so the
driver is tolerant of either entry path.
* terminateAndRunDismissalPayroll — picks an onboarded seed
employee (scenario decorations can't fully onboard without
SSN/DOB/W-4/state-tax inputs the runner doesn't surface),
terminates with last day = next-open-period (today/past fails
because those periods are already processed), then walks the
auto-routed Dismissal flow.
- e2e/tests/payroll/canary/{01..05}.spec.ts — one spec per flow. Specs
04 and 05 first run a regular payroll to anchor the company's pay
rhythm (transition needs payroll history to detect a gap; dismissal
needs an open period to attach a termination payroll to).
This complements the existing tab-rendering specs in e2e/tests/payroll/
which prove the landing surfaces render. The canary suite proves the
state machines actually advance under real backend behavior.
Co-authored-by: Cursor <cursoragent@cursor.com>
The file was a skipped placeholder with no remaining test bodies, left behind after the legacy suite was retired. Drops the now-empty e2e/tests/legacy/ directory along with it. Co-authored-by: Cursor <cursoragent@cursor.com>
CI run on PR #1830 produced three persistent failures in the payroll canary suite. Each is a demo-backend-state edge case the original drivers didn't account for. Best-effort blind fixes based on the page snapshots in the playwright-report artifacts. 03 off-cycle correction: Driver reached Review Payroll with a $0 total and the backend rejected submission with a generic 'There was an error submitting payroll' alert. A fresh demo company has no historical employee pay for a correction to actually correct, so the backend rejection is structural rather than a regression. Split the driver into calculateAndReachReview + calculateAndSubmit so corrections can stop at the last SDK landmark (Review Payroll) instead of forcing a guaranteed backend rejection. Spec 03 now asserts the Review Payroll heading rather than the receipt total. 04 transition payroll: Driver POSTed anchor_end_of_pay_period = today+7 to gws-flows, which the backend rejected with 'New pay period must end on or after 06/10' (~21 days out) because runNextRegularPayroll had just consumed the schedule's current biweekly period. Push the anchor to today+35 so it clears the just-processed period plus a margin for any biweekly/semi-monthly cadence variation. 05 dismissal payroll: Spec ran runNextRegularPayroll first to anchor a pay rhythm, then terminated with lastDayOfWork = today+1. But that prerequisite payroll consumes the current open biweekly period, and the next open period doesn't start for ~14 days, so the chosen lastDay lands inside the closed period and the dismissal screen renders 'There are no unprocessed termination pay periods available'. Drop the prerequisite regular payroll (onboarded seed companies already have an open period the dismissal flow can attach to) and push lastDayOfWork to today+7 to stay comfortably inside that window regardless of where in the cycle the test starts. Specs 01 (regular) and 02 (off-cycle bonus) continue to drive submit-to-receipt and were green in the failing CI run. Co-authored-by: Cursor <cursoragent@cursor.com>
Playwright strict mode rejected `payPeriodHeader.or(blockerSurface).toBeVisible()` when both surfaces rendered simultaneously on the demo company. The inner `.first()` was on `blockerSurface` only, not on the union. Moving `.first()` to the combined locator satisfies strict mode while preserving the original "either surface is present" intent. Fixes deterministic e2e (payroll) and e2e-demo failures on PR #1830. Co-authored-by: Cursor <cursoragent@cursor.com>
Address PR #1830 review feedback re: arbitrary timeouts in payroll canaries. The real smell was the six `isVisible({ timeout }).catch(...)` probes scattered through payrollFlowDrivers.ts — each one is a blind hedge that adds dead time on the happy path and silently swallows real failures on the sad path. Replaced every one with an explicit `Locator.or()` race against the landmark for the alternate branch: - ensureCompanyIsPayrollReady: forms-blocker vs ready-action button - runNextRegularPayroll: Run Payroll vs Review and Submit - createAndSubmitOffCycleBonus: deductions radio (always-present anchor) - openReceipt: View Receipt button vs receipt total - changeScheduleAndRunTransitionPayroll: creation heading vs edit heading - terminateAndRunDismissalPayroll: pay-period select vs edit heading Now we resolve the moment a known landmark renders (no 5-15s blind hedge) and a stuck page surfaces as "expected one of these locators to be visible" rather than as the wrong branch silently being skipped. Also extracted the remaining auto-wait deadlines into e2e/utils/timeouts.ts (SDK_NAVIGATION_DEADLINE, PAYROLL_CALCULATION_DEADLINE) and the whole-test ceilings (CANARY_TEST_TIMEOUT_MS, CANARY_TEST_TIMEOUT_WITH_PRECURSOR_MS) with doc-comments explaining each tier so reviewers can see the reasoning at a glance instead of parsing magic numbers. No behavior change on green paths. Co-authored-by: Cursor <cursoragent@cursor.com>
Add per-test wall-time instrumentation to make the "why is the time-off
canary suite 10 minutes?" question answerable with data instead of
guesses.
The scenario fixture now records how long `provisionScenario()` takes
(creating a fresh demo company + decorating it with locations,
employees, pay schedules, etc.) as a `timing` testInfo annotation. The
existing custom reporter pulls those annotations out and writes:
- e2e/reports/results.json: structured per-test timings as before, now
with a `timings: { provisioningMs, bodyMs }` field on each test
- e2e/reports/timings.md: human-readable per-domain summary table
showing what % of suite wall time is being spent on provisioning vs
the actual SDK flow under test
The hypothesis we're checking: provisioning a brand-new demo company is
30-60s of polling-and-fetching, and we currently do it for every single
canary spec even when 5 specs in a domain all use the same scenario
JSON. If `timings.md` confirms this, the fix is an in-process scenario
cache. If body time dominates instead, the bottleneck is in the SDK
flow or backend response time and a cache won't help.
Also corrected e2e/CLAUDE.md, which was incorrectly claiming the runner
caches provisioned companies in `.scenario-cache.json` and exposing a
`e2e:scenarios:clear` script. Neither exists today; the docs were
written ahead of an implementation that never landed. Replaced the
stale section with an accurate description and a pointer to the new
timing report.
No behavior change in tests themselves — this PR is purely diagnostic.
Co-authored-by: Cursor <cursoragent@cursor.com>
Commit 141f0bd claimed in its message to "remove the e2e job entirely" and leave e2e-demo as the only Playwright gate, but the actual diff only added a comment block above e2e-demo and an artifact upload step — the job removal never landed. The result: both `e2e` (sharded by domain) and `e2e-demo` (full unsharded suite) ran on every push, hitting the demo backend twice for nearly the same coverage. Both job header comments even claimed primacy as "the only E2E gate." Keeping the sharded `e2e` job as the canonical Playwright gate because it gives per-domain failure isolation and per-domain artifact uploads (which the new timing reporter relies on for per-domain breakdowns), at the cost of slightly more CI minutes than a single unsharded run. Note for the merger: the existing branch protection rule on this repo likely still references "e2e-demo" as a required check. After this merges, that rule needs to be updated in repo settings to require "e2e (<domain>)" matrix shards instead — otherwise PRs will block on a check that no longer runs. Co-authored-by: Cursor <cursoragent@cursor.com>
The scenario fixture was gating provisioning on E2E_LOCAL='true', an env var that is set nowhere in the repo. As a result, every scenario-driven spec (including all 5 payroll canaries and all 5 time-off canaries) was hitting the empty-context fallback in CI, producing flowToken='' and triggering each spec's `test.skip(!scenario.flowToken, ...)` guard. Concretely, the most recent green CI run for this PR reported: total: 19, expected: 5, skipped: 14 on the e2e (payroll) shard. All 5 canary specs were in the skipped 14. The CI gate was passing without ever exercising the canaries it claims to gate on. Fix: gate on E2E_USE_REAL_BACKEND='true' instead, which is set by both playwright.demo.config.ts and playwright.local.config.ts and exported by the CI workflow on the e2e job. MSW-mode runs (the default playwright config) leave it unset, and scenario-driven specs continue to self-skip there because no real backend is available to provision against. Heads-up that this commit will materially change CI behavior: - The e2e shards will get slower because canaries now actually run end to end against flows.gusto-demo.com — expect ~2-3 min/shard up from ~1 min today, possibly more. - Each canary mints a fresh demo company. Across both shards that's ~10 demo provisionings per push. If flows.gusto-demo.com starts returning 429s we should fall back to a label- or schedule-gated canary trigger instead of patching around it. The earlier-introduced timing reporter (e2e/reports/timings.md) will finally have real data to show: the actual provisioning vs body split per canary. Co-authored-by: Cursor <cursoragent@cursor.com>
The three Playwright configs all hardcoded reporter to [['html'], [scenario-reporter]]. Both of those write artifacts only — neither prints to stdout — so a CI shard would log: === E2E Global Setup === Existing state is valid, skipping provisioning …and then go silent for however long the run took, with no per-test heartbeat to confirm anything was happening or to localize a hang. By specifying any reporter array we lost Playwright's default `list` reporter, which is normally what surfaces that progress. Add `list` to all three configs so each test prints a status + duration line as it starts and finishes (visible in the GitHub Actions log, local `npm run test:e2e` output, etc.). When running under GITHUB_ACTIONS, also append `github` so failed tests surface as inline annotations in the PR Files Changed view with file/line refs — makes triaging a red canary a one-click jump instead of a log scrub. The HTML report and scenario-reporter outputs (results.json, timings.md) are unchanged. We're only adding stdout signal, not removing anything. Co-authored-by: Cursor <cursoragent@cursor.com>
The PR Title Check has been intermittently failing with a kernel-level ETXTBSY during esbuild's postinstall: npm error spawnSync .../node_modules/esbuild/bin/esbuild ETXTBSY npm error code ETXTBSY (errno -26) Same SHA, different runs, different outcomes — classic flake. The underlying race is documented in evanw/esbuild#3320: concurrent postinstall scripts on shared CI runners try to spawn the esbuild binary while it's still being written to. Why does a workflow that lints a one-line PR title even pull in esbuild? Because `npm install --no-save <pkg>` in the repo root sees the SDK's package.json and installs every transitive dependency in it (~600 packages including esbuild), then layers the requested package on top. The --no-save flag controls whether the result gets written back to package.json — it does not scope what gets installed. Fix: install commitlint into $RUNNER_TEMP/commitlint-install instead of the repo root. With no SDK package.json in scope, npm only installs the two requested commitlint packages — no esbuild, no race. Pinned versions still come from the SDK's package.json so the linter behavior matches local commits. Also dropped the dependency on commitlint.config.ts being importable from the install dir by passing --extends '@commitlint/config-conventional' on the CLI directly. That config file is two lines and just extends the conventional preset, which is now exactly what the CLI flag specifies. Verified locally that `commitlint --extends '@commitlint/config-conventional'` accepts well-formed conventional commit titles and rejects malformed ones. Co-authored-by: Cursor <cursoragent@cursor.com>
The shared describe-level userEvent instance carried pointer/focus state across tests, which intermittently caused user.click to not register on CI under load — surfacing as a flake in the pagination "resets to page 1" test. Move setup() into each test body to match repo convention and the userEvent docs' recommendation. Co-authored-by: Cursor <cursoragent@cursor.com>
a1c6ab4 to
6bd85a2
Compare
Each scenario-driven Playwright test was paying a full POST /demos + employee/location/job/comp decoration round-trip against flows.gusto-demo.com (~16s on the most recent green time-off shard, 57% of wall time across 25 tests). Two changes collapse that cost: 1. Worker-scoped cache in localTestFixture.ts. A new _scenarioCache fixture owns a Map<scenarioId, Promise<ScenarioContext>>. The existing scenario fixture consults it before calling provisionScenario, so subsequent tests sharing a scenario ID reuse the same demo company. Storing the promise (not the resolved context) means a race within a worker shares a single in-flight provisioning attempt. Failed provisioning evicts the entry so the next test retries from scratch instead of inheriting a permanently broken company. Unlike the previous on-disk scenario cache (removed in 7643dd6 because it leaked state between runs), this cache lives entirely in worker memory and dies with the process — no cross-run state pollution. 2. workers: 1 in playwright.demo.config.ts and playwright.local.config.ts. The cache is worker-scoped, so parallel workers each pay their own provisioning cost. CI was already at workers: 1; locally we defaulted to undefined (one per CPU core). Pinning to 1 makes the worker cache effectively suite-scoped everywhere. Override with --workers=N for parallel debugging of independent scenarios. scenario-reporter.ts now reads a cacheHit flag from the timing annotation and renders "(cached)" markers in timings.md plus a "N/M provisioning cache hits" tally per domain so CI logs make reuse obvious. Also pins the Playwright browser cache key in ci.yaml to the resolved playwright package version instead of the lockfile hash, so unrelated dependency churn stops invalidating the ~250MB chromium download. Validation (local time-off canary suite, 5 tests on 1 scenario): Before: 5 fresh demo creations, ~6 minutes After: 1 fresh + 4 cached, 1.2 minutes timings.md: 4/5 provisioning cache hits, provisioning 28% of wall time (vs 53% before). Projected for the time-off CI shard (25 tests, 2 distinct scenarios): Before: 25 × ~16s provisioning = ~6m40s, 11m41s wall time After: 2 × ~16s provisioning = ~32s, ~5min wall time Co-authored-by: Cursor <cursoragent@cursor.com>
Two pagination tests (paginates the roster + resets to page 1 on search) failed intermittently in CI with "Unable to find element with text 'Person10 Roster'" after clicking the next-page button. The prior fix (6bd85a2) split userEvent.setup() per test, which addressed shared pointer state but didn't fix the underlying race: the post- click assertion used waitFor + getByText with the default 1s timeout, which expires under coverage instrumentation + parallel CI workload before React 19's concurrent renderer commits the page-2 list. Three changes per failing test: 1. Replace waitFor + getByText with findByText/findByTestId. They poll the same way but read more naturally for "wait for this element to appear after an async user action." 2. Bump the post-click polling timeout to 5s. The local wall time is ~50ms; 5s is far more than the actual render cost but absorbs CI tail latency without masking real regressions. 3. Target the next-page button by data-testid="pagination-next" instead of the i18n aria-label. Skips an i18n round-trip and is robust against label rename. The aria-label remains for accessibility — only the test selector changes. All 11 tests in the file still pass locally (6.6s total). Co-authored-by: Cursor <cursoragent@cursor.com>
…ready-checked rows
Two pre-existing flakes in policy-input-error-handling.spec.ts surfaced
in CI run 26199333744 (waiting period decimal failed all 3 retries,
non-numeric chars failed where it had previously been auto-skipped).
Neither is caused by the worker-scoped scenario cache — earlier runs
on the same branch (26195202528) showed the waiting-period test as
flaky-passing-on-retry, and the non-numeric chars test relies on a
demo-state condition (employee not enrolled) that varies run to run.
Two targeted fixes:
1. waiting period decimal value: drop the post-submit "still on
settings or moved to add-employees" assertion. The test's own
comment establishes the contract is "no crash overlay" — line 98's
heading-shape check was over-specifying flow state and breaking
when save legitimately routed elsewhere (e.g. policy detail view
after a successful submit, settings re-render after silent
coercion). Replace the heading-shape check with a final
no-crash-overlay assertion that matches the documented contract.
2. non-numeric chars in starting balance: the loop that scans for an
employee row with a starting-balance input previously called
`checkbox.check({ force: true })` unwrapped, which throws with
"Clicking the checkbox did not change its state" when the SDK has
already pre-selected the row (or the row is in a state where the
input never appears). Wrap the check in try/catch and treat
failure as "not a candidate row, keep scanning". The existing
test.skip after the loop now correctly fires when no suitable row
exists in this run's demo state.
Also tightened the `targetRow` type from `null` to a proper Locator
union so TypeScript no longer needs the post-loop non-null assertion
to imply the right type.
Verified locally: running `non-numeric chars` against the demo
backend cleanly skips when the demo has no unenrolled rows (the
pre-existing intended behavior). The waiting-period test passes
because the contract under test never depended on flow shape.
Co-authored-by: Cursor <cursoragent@cursor.com>


Summary
Migrate payroll E2E tests to scenario-driven architecture.
standard-biweekly-2-employees,off-cycle-eligible,post-schedule-changepayroll.spec.ts->payroll/regular-payroll.spec.tsusing scenario fixturetransition-payroll.spec.tsintopayroll/transition.spec.tsandpayroll/off-cycle.spec.tslegacy/transition-payroll.legacy.spec.ts(skipped)e2e/CLAUDE.mdwith scenario authoring workflow documentationBase: PR #1829 (e2e/infrastructure). When that merges to main, this PR retargets automatically.
Playwright Recordings
canary-01-regular-payroll-PASSED.webm
canary-05-dismissal-PASSED.webm
canary-04-transition-PASSED.webm
canary-03-off-cycle-correction-PASSED.webm
canary-02-off-cycle-bonus-PASSED.webm
Test plan
npm run scenarios:validate— all 7 scenario/fragment files validnpx tsc --noEmit— clean typechecknpm run e2e:scenarios:prewarm --domain payrollprovisions against gws-flowsnpm run test:e2e:local -- --grep @payrollpassesMade with Cursor