feat: add performance regression gate to CI (CI-03)#918
Conversation
Checks vite build output against configurable size thresholds (entry chunk, single chunk, total JS) and emits GitHub Actions annotations for warnings and errors. Produces a JSON report for trend tracking. Refs #872
Parses k6 --summary-export JSON and emits GitHub Actions annotations. Reports near-threshold warnings (within 20% of limit) and produces a JSON report for historical comparison. Refs #872
Lower error rate limit from 2% to 1% and add p95<2000ms hard gate alongside existing aspirational p95<1200ms target. Refs #872
Combines k6 API load testing with frontend bundle size checks in a single CI job. Emits GitHub Actions annotations for threshold violations and near-threshold warnings. Uploads JSON reports as artifacts for historical trend tracking. Refs #872
Add performance-regression-gate job to ci-extended (triggered by 'performance' label or manual dispatch) and ci-nightly. Update topology comment in ci-required.yml. Refs #872
Add section describing k6 API thresholds, frontend bundle size thresholds, override environment variables, and artifact outputs. Refs #872
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5d5825e72f
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| node scripts/ci/check-bundle-size.mjs \ | ||
| --dist frontend/taskdeck-web/dist \ | ||
| --output-json frontend/taskdeck-web/test-results/perf/bundle-size-report.json |
There was a problem hiding this comment.
Fail bundle-size gate on budget breaches
This workflow step invokes check-bundle-size.mjs without --fail-on-error, so bundle budget violations only emit ::error annotations but do not fail the job. Because the script defaults to non-failing mode unless that flag is set, PRs and nightly runs can exceed the configured entry/single/total JS limits while the performance gate still passes.
Useful? React with 👍 / 👎.
| checks: ["rate>0.99"], | ||
| http_req_duration: ["p(95)<1200", "p(99)<2500"], | ||
| // CI gate: p95 must stay below 2000ms (issue #872); aspirational target p95<1200ms | ||
| http_req_duration: ["p(95)<2000", "p(95)<1200", "p(99)<2500"], |
There was a problem hiding this comment.
Remove hard p95<1200 threshold from CI gate
Adding both p(95)<2000 and p(95)<1200 makes k6 enforce the stricter 1200ms threshold as a hard failure, since k6 treats each threshold expression as pass/fail and a false expression fails the test. That means runs with p95 between 1200ms and 2000ms will still fail, which contradicts the stated intent that 2000ms is the gate and 1200ms is aspirational.
Useful? React with 👍 / 👎.
- Remove aspirational p95<1200ms from k6 thresholds (all k6 thresholds are enforced, so aspirational would cause false CI failures). Moved to check-k6-thresholds.mjs as a warning instead. - Add --fail-on-error to bundle size check in CI workflow so error thresholds actually fail the step. - Use path.dirname instead of string slicing for cross-platform path handling in both CI scripts. Refs #872
Adversarial Self-ReviewIssues Found and Fixed (b3ee9fe)
Remaining Considerations
|
There was a problem hiding this comment.
Code Review
This pull request introduces performance regression gates in the CI pipeline by adding documentation and scripts to monitor frontend bundle sizes and k6 API performance thresholds. Key feedback includes correcting the reported limit in bundle size warnings, fixing fragile command-line argument parsing in the k6 threshold script, and addressing a conflict in k6 threshold definitions where multiple p95 targets would cause premature test failures. Additionally, it was suggested to dynamically derive near-threshold warnings from the k6 summary data rather than using hardcoded constants.
| } else if (entryKB > WARN_ENTRY_KB) { | ||
| const msg = `Entry chunk ${entryChunk.name} is ${formatKB(entryChunk.sizeBytes)} KB, approaching limit of ${MAX_ENTRY_KB} KB (warn at ${WARN_ENTRY_KB} KB)`; | ||
| console.log(`::warning::${msg}`); | ||
| violations.push({ level: "warning", metric: "entry_chunk_kb", value: entryKB, limit: MAX_ENTRY_KB, message: msg }); |
There was a problem hiding this comment.
In the warning case, the limit field in the violation object is set to MAX_ENTRY_KB (the error threshold), but the warning was triggered by exceeding WARN_ENTRY_KB. This makes the JSON report misleading, as the value will appear to be below the limit despite being flagged as a violation. The limit field should reflect the threshold that was actually breached for that specific finding.
| violations.push({ level: "warning", metric: "entry_chunk_kb", value: entryKB, limit: MAX_ENTRY_KB, message: msg }); | |
| violations.push({ level: "warning", metric: "entry_chunk_kb", value: entryKB, limit: WARN_ENTRY_KB, message: msg }); |
| import { readFileSync, writeFileSync, mkdirSync } from "node:fs"; | ||
| import { dirname } from "node:path"; | ||
|
|
||
| const args = process.argv.slice(2); |
There was a problem hiding this comment.
The logic for finding summaryPath is fragile. Using args.find((a) => !a.startsWith("--")) will incorrectly identify the value of a flag (e.g., the path provided to --output-json) as the summary file path if that flag appears before the positional argument. This will cause the script to fail or read the wrong file.
| const args = process.argv.slice(2); | ||
| const summaryPath = args.find((a) => !a.startsWith("--")); | ||
| const failOnBreach = args.includes("--fail-on-breach"); | ||
| const outputJsonIdx = args.indexOf("--output-json"); |
There was a problem hiding this comment.
There is no check to ensure that an argument actually follows the --output-json flag. If the flag is provided as the last argument, args[outputJsonIdx + 1] will be undefined, which will cause mkdirSync or writeFileSync to throw an error later in the script.
| const outputJsonIdx = args.indexOf("--output-json"); | |
| const outputJson = (outputJsonIdx !== -1 && args[outputJsonIdx + 1]) ? args[outputJsonIdx + 1] : null; |
| // Check for near-threshold conditions and aspirational targets | ||
| const p95Limit = 2000; // ms -- hard gate (issue #872) | ||
| const p95Aspirational = 1200; // ms -- aspirational target (warning only) |
There was a problem hiding this comment.
Hardcoding these limits (2000ms, 1%) in the script creates a maintenance risk as they are likely to drift from the actual thresholds defined in the k6 test scripts. Since the k6 JSON summary already contains the threshold definitions and their results (which you are already parsing in lines 45-59), it would be more robust to derive the 'near-threshold' warnings dynamically from the limits found in the JSON data rather than using hardcoded constants.
| checks: ["rate>0.99"], | ||
| http_req_duration: ["p(95)<1200", "p(99)<2500"], | ||
| // CI gate: p95 must stay below 2000ms (issue #872) | ||
| // Aspirational target p95<1200ms is enforced as a warning via check-k6-thresholds.mjs |
There was a problem hiding this comment.
The inclusion of "p(95)<1200" as a threshold here conflicts with the intended CI gate of 2000ms defined in the documentation and PR description. In k6, if any threshold in the list is breached, the entire test run is marked as failed (non-zero exit code). This effectively makes 1200ms the hard gate, rendering the 2000ms threshold redundant and causing CI to fail on performance levels that were intended to only trigger warnings or be 'aspirational'.
| // Aspirational target p95<1200ms is enforced as a warning via check-k6-thresholds.mjs | |
| http_req_duration: ["p(95)<2000", "p(99)<2500"], |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b3ee9fe13a
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
|
||
| function isEntryChunk(name) { | ||
| // Vite names the main entry chunk as index-<hash>.js | ||
| return /^index-[A-Za-z0-9]+\.js$/.test(name); |
There was a problem hiding this comment.
Accept full Vite hash alphabet for entry chunk matching
The entry-chunk detector only matches index- names whose hash is strictly alphanumeric, but this repo’s Vite build already emits chunk hashes containing _ and - (for example OpsConsoleView-CgIU3_Wh.js and useWorkspaceOnboardingActions-sTMco-zd.js). If the index-* hash includes either character, entryChunk becomes null and the 150 KB entry budget is silently skipped, allowing regressions between the entry limit and the 250 KB single-chunk limit to pass.
Useful? React with 👍 / 👎.
| const assetsDir = join(distDir, "assets"); | ||
| const jsFiles = collectJsFiles(assetsDir); |
There was a problem hiding this comment.
Include root-level JS files in total bundle-size budget
The script computes total_js_kb by scanning only dist/assets, but this frontend build also generates JavaScript at the dist root (notably sw.js and workbox-*.js from the PWA pipeline). Those files are shipped to users yet excluded from the total budget and JSON report, so CI can report “total JS size” as healthy even when root-level JS regresses significantly.
Useful? React with 👍 / 👎.
Adversarial Review of PR #918 (Performance Regression Gate)I have reviewed all 9 bot comments, all changed/new files at the latest commit (b3ee9fe), and performed my own adversarial analysis. Here are consolidated findings. Bot Comments: TriageAlready fixed by self-review commit (b3ee9fe):
Still open -- legitimate issues that need fixes:
Bot comments I disagree with / low priority:
My Own Findings (beyond bot comments)7. ci-extended.yml: Missing 8. 9. No 10. Workflow does Summary of required fixes (ordered by severity)
Items 4 and 7 should be fixed before merge. Items 1, 2, 3 are worth fixing while we're here. |
- Widen entry chunk regex to include _ and - (Base64url hash chars) - Fix warning violations to report WARN_*_KB limit instead of MAX_*_KB - Guard --output-json against missing value in k6 threshold script - Fix fragile positional arg parsing to skip flag values correctly - Add missing needs: [backend-solution] to ci-extended perf gate job
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 91a5d5e52a
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| needs: | ||
| - backend-solution |
There was a problem hiding this comment.
Remove unmet
needs from performance-gate trigger path
In ci-extended, this job is intended to run for PRs labeled performance, but it also declares needs: backend-solution. Because backend-solution itself only runs for testing-labeled PRs, a PR with only the performance label will skip backend-solution and therefore skip this new gate entirely. That silently disables the performance regression checks in the exact trigger path documented for this workflow.
Useful? React with 👍 / 👎.
Summary
performance) and nightly workflows::warning/::errorannotations and produce JSON artifacts for historical trend trackingCloses #872
Changes
scripts/ci/check-bundle-size.mjsscripts/ci/check-k6-thresholds.mjstests/load/k6/board-heavy-load.js.github/workflows/reusable-performance-regression-gate.yml.github/workflows/ci-extended.yml.github/workflows/ci-nightly.yml.github/workflows/ci-required.ymldocs/PERFORMANCE_BUDGETS.mdAcceptance Criteria Coverage
Test plan
node scripts/ci/check-bundle-size.mjs --dist frontend/taskdeck-web/distpasses locally aftervite buildperformancelabel to validate end-to-end