
feat: add performance regression gate to CI (CI-03) #918

Merged
Chris0Jeky merged 8 commits into main from feat/ci-03-performance-regression-gate
Apr 22, 2026

@Chris0Jeky
Owner

Summary

  • Add k6 API performance thresholds enforced in CI (p95 < 2s, error rate < 1%) with near-threshold warnings
  • Add frontend bundle size checks (entry chunk < 150 KB, single chunk < 250 KB, total JS < 1200 KB)
  • Wire performance regression gate into ci-extended (label: performance) and nightly workflows
  • Both checks emit GitHub Actions ::warning / ::error annotations and produce JSON artifacts for historical trend tracking

Closes #872

Changes

| File | Purpose |
| --- | --- |
| scripts/ci/check-bundle-size.mjs | Frontend bundle size threshold script |
| scripts/ci/check-k6-thresholds.mjs | k6 JSON summary parser with GHA annotations |
| tests/load/k6/board-heavy-load.js | Tighten error rate to 1%, add p95<2s hard gate |
| .github/workflows/reusable-performance-regression-gate.yml | Reusable workflow combining both checks |
| .github/workflows/ci-extended.yml | Add performance-regression-gate job |
| .github/workflows/ci-nightly.yml | Add performance-regression-gate job |
| .github/workflows/ci-required.yml | Update topology comment |
| docs/PERFORMANCE_BUDGETS.md | Document CI enforcement section |

Acceptance Criteria Coverage

  • k6 thresholds enforced in ci-extended lane (fail on p95 > 2s, error rate > 1%)
  • Frontend build size threshold enforced (warn if main bundle exceeds limit)
  • Performance budget violations logged as CI step warnings
  • Historical trend tracking (JSON artifacts with timestamps for cross-run comparison)

Test plan

  • Verify node scripts/ci/check-bundle-size.mjs --dist frontend/taskdeck-web/dist passes locally after vite build
  • Verify workflow YAML is valid (actionlint in CI)
  • Trigger ci-extended with performance label to validate end-to-end
  • Verify k6 threshold tightening does not cause false positives on baseline load

Checks vite build output against configurable size thresholds (entry
chunk, single chunk, total JS) and emits GitHub Actions annotations
for warnings and errors. Produces a JSON report for trend tracking.

Refs #872
Parses k6 --summary-export JSON and emits GitHub Actions annotations.
Reports near-threshold warnings (within 20% of limit) and produces a
JSON report for historical comparison.

Refs #872
Lower error rate limit from 2% to 1% and add p95<2000ms hard gate
alongside existing aspirational p95<1200ms target.

Refs #872
Combines k6 API load testing with frontend bundle size checks in a
single CI job. Emits GitHub Actions annotations for threshold
violations and near-threshold warnings. Uploads JSON reports as
artifacts for historical trend tracking.

Refs #872
Add performance-regression-gate job to ci-extended (triggered by
'performance' label or manual dispatch) and ci-nightly. Update
topology comment in ci-required.yml.

Refs #872
Add section describing k6 API thresholds, frontend bundle size
thresholds, override environment variables, and artifact outputs.

Refs #872

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5d5825e72f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +107 to +109
node scripts/ci/check-bundle-size.mjs \
--dist frontend/taskdeck-web/dist \
--output-json frontend/taskdeck-web/test-results/perf/bundle-size-report.json

P1: Fail bundle-size gate on budget breaches

This workflow step invokes check-bundle-size.mjs without --fail-on-error, so bundle budget violations only emit ::error annotations but do not fail the job. Because the script defaults to non-failing mode unless that flag is set, PRs and nightly runs can exceed the configured entry/single/total JS limits while the performance gate still passes.
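
The behavior described above can be sketched as follows. This is a minimal illustration, not the script's actual code: `exitCodeFor` and the sample finding are hypothetical names, and only the `level` field and the `--fail-on-error` semantics are taken from this thread.

```javascript
// Hypothetical helper: decide the process exit code from collected findings.
// `violations` mirrors the { level, ... } objects quoted elsewhere in this review.
function exitCodeFor(violations, failOnError) {
  const hasError = violations.some((v) => v.level === "error");
  // Without the flag, errors are only annotated and the step still succeeds.
  return hasError && failOnError ? 1 : 0;
}

// Hypothetical finding: a 160 KB entry chunk against the 150 KB limit.
const findings = [{ level: "error", metric: "entry_chunk_kb", value: 160, limit: 150 }];

console.log(exitCodeFor(findings, false)); // 0
console.log(exitCodeFor(findings, true)); // 1
```

In a real script the returned value would presumably be assigned to `process.exitCode` so that the workflow step fails.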


Comment thread tests/load/k6/board-heavy-load.js Outdated
  checks: ["rate>0.99"],
- http_req_duration: ["p(95)<1200", "p(99)<2500"],
+ // CI gate: p95 must stay below 2000ms (issue #872); aspirational target p95<1200ms
+ http_req_duration: ["p(95)<2000", "p(95)<1200", "p(99)<2500"],

P1: Remove hard p95<1200 threshold from CI gate

Adding both p(95)<2000 and p(95)<1200 makes k6 enforce the stricter 1200ms threshold as a hard failure, since k6 treats each threshold expression as pass/fail and a false expression fails the test. That means runs with p95 between 1200ms and 2000ms will still fail, which contradicts the stated intent that 2000ms is the gate and 1200ms is aspirational.
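
To make the all-must-pass rule concrete, here is a small sketch. It is an illustration only and not k6's implementation: `passes`, `runPasses`, and the sample measurements are hypothetical, and the tiny parser handles only expressions shaped like `p(95)<2000`.

```javascript
// Illustration only: evaluate expressions shaped like "p(95)<2000".
// This mimics the rule that every threshold expression must hold.
function passes(expression, stats) {
  const match = /^(p\(\d+\))\s*<\s*(\d+)$/.exec(expression);
  if (!match) throw new Error(`unsupported expression: ${expression}`);
  const [, aggregate, limit] = match;
  return stats[aggregate] < Number(limit);
}

function runPasses(expressions, stats) {
  // A single false expression fails the whole run.
  return expressions.every((expression) => passes(expression, stats));
}

// Hypothetical measurements in ms: p95 sits between 1200 and 2000.
const stats = { "p(95)": 1500, "p(99)": 2200 };

console.log(runPasses(["p(95)<2000", "p(95)<1200", "p(99)<2500"], stats)); // false
console.log(runPasses(["p(95)<2000", "p(99)<2500"], stats)); // true
```

With both p95 expressions present, a run at 1500 ms fails even though it is inside the 2000 ms gate; dropping `p(95)<1200` lets the same run pass.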


- Remove aspirational p95<1200ms from k6 thresholds (all k6 thresholds
  are enforced, so aspirational would cause false CI failures). Moved to
  check-k6-thresholds.mjs as a warning instead.
- Add --fail-on-error to bundle size check in CI workflow so error
  thresholds actually fail the step.
- Use path.dirname instead of string slicing for cross-platform path
  handling in both CI scripts.

Refs #872
@Chris0Jeky
Owner Author

Adversarial Self-Review

Issues Found and Fixed (b3ee9fe)

  1. k6 aspirational threshold would cause false failures -- All k6 thresholds are hard-enforced (exit non-zero on breach). The original diff had p(95)<1200 alongside p(95)<2000 in the k6 thresholds array. If p95 lands between 1200-2000ms, k6 would fail the CI even though it's within the issue's hard gate. Fix: Removed p(95)<1200 from k6 thresholds; the aspirational target is now emitted as a ::warning by check-k6-thresholds.mjs instead.

  2. Bundle size errors did not fail CI -- The check-bundle-size.mjs script only exits non-zero with --fail-on-error, which was not passed in the workflow. This meant error thresholds would only emit annotations without actually failing the step. Fix: Added --fail-on-error to the workflow invocation.

  3. String-based path splitting -- Both scripts used outputJson.substring(0, outputJson.lastIndexOf("/")) which would fail on Windows paths. Fix: Switched to path.dirname().

Remaining Considerations

  • Duplicate k6 run: The load-concurrency-harness and performance-regression-gate jobs both run k6 when both are triggered. This is intentional -- the load harness tests concurrency semantics (Playwright + k6 together) while the performance gate focuses on threshold enforcement with richer analysis. They use different database files and artifact names. If CI time becomes a concern, the performance gate's k6 step could be conditioned on whether load-concurrency-harness already ran.

  • Nightly redundancy: The nightly workflow now runs both load-concurrency-harness and performance-regression-gate. The latter provides threshold analysis that the former lacks, so both add value. The ~15 min additional CI time on nightly is acceptable.

  • Bundle size thresholds are generous: Current entry chunk is 88 KB vs 150 KB limit; largest chunk is 152 KB vs 250 KB limit. These provide growth headroom. The warning thresholds at 80% of limits will catch gradual creep before hitting hard gates.

  • Flakiness risk: k6 results can vary with CI runner load. The 2000ms p95 hard gate has ~70% headroom over typical values (~1200ms), which should absorb CI noise. The existing load harness has been running without flakiness issues at even stricter thresholds.
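
The headroom figures quoted in the bullets above can be checked with a short calculation. This is a sketch under stated assumptions: `budgetStatus` is a hypothetical helper, and the 80% warning ratio and the KB values are the ones mentioned in this comment.

```javascript
const WARN_PERCENT = 80; // warning threshold as a percentage of the hard limit

// Hypothetical helper: derive the warning threshold and growth headroom.
function budgetStatus(current, max) {
  const warnAt = (max * WARN_PERCENT) / 100;
  // Headroom: how much the current value can grow before hitting the hard limit.
  const headroomPct = Math.round(((max - current) / current) * 100);
  return { warnAt, headroomPct, warns: current > warnAt };
}

console.log(budgetStatus(88, 150)); // entry chunk: warns at 120 KB, 70% headroom
console.log(budgetStatus(152, 250)); // largest chunk: warns at 200 KB, 64% headroom
console.log(budgetStatus(1200, 2000)); // p95 in ms: 67% headroom over ~1200 ms
```

Neither bundle value crosses its warning threshold today, which matches the "generous" characterization above.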


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces performance regression gates in the CI pipeline by adding documentation and scripts to monitor frontend bundle sizes and k6 API performance thresholds. Key feedback includes correcting the reported limit in bundle size warnings, fixing fragile command-line argument parsing in the k6 threshold script, and addressing a conflict in k6 threshold definitions where multiple p95 targets would cause premature test failures. Additionally, it was suggested to dynamically derive near-threshold warnings from the k6 summary data rather than using hardcoded constants.

Comment thread scripts/ci/check-bundle-size.mjs Outdated
} else if (entryKB > WARN_ENTRY_KB) {
const msg = `Entry chunk ${entryChunk.name} is ${formatKB(entryChunk.sizeBytes)} KB, approaching limit of ${MAX_ENTRY_KB} KB (warn at ${WARN_ENTRY_KB} KB)`;
console.log(`::warning::${msg}`);
violations.push({ level: "warning", metric: "entry_chunk_kb", value: entryKB, limit: MAX_ENTRY_KB, message: msg });

medium

In the warning case, the limit field in the violation object is set to MAX_ENTRY_KB (the error threshold), but the warning was triggered by exceeding WARN_ENTRY_KB. This makes the JSON report misleading, as the value will appear to be below the limit despite being flagged as a violation. The limit field should reflect the threshold that was actually breached for that specific finding.

Suggested change
- violations.push({ level: "warning", metric: "entry_chunk_kb", value: entryKB, limit: MAX_ENTRY_KB, message: msg });
+ violations.push({ level: "warning", metric: "entry_chunk_kb", value: entryKB, limit: WARN_ENTRY_KB, message: msg });
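
One way to keep the reported limit tied to the threshold that was actually crossed is to build the finding in a single place. The sketch below is hypothetical: `classify` and its `{ warnKB, maxKB }` argument are illustrative names, not the script's real structure.

```javascript
// Hypothetical classifier: each finding records the threshold it crossed.
function classify(metric, valueKB, { warnKB, maxKB }) {
  if (valueKB > maxKB) {
    return { level: "error", metric, value: valueKB, limit: maxKB };
  }
  if (valueKB > warnKB) {
    // Report the warning threshold, since that is the one that was breached.
    return { level: "warning", metric, value: valueKB, limit: warnKB };
  }
  return null; // within budget, nothing to report
}

const entryBudget = { warnKB: 120, maxKB: 150 };
console.log(classify("entry_chunk_kb", 130, entryBudget)); // warning, limit 120
console.log(classify("entry_chunk_kb", 160, entryBudget)); // error, limit 150
console.log(classify("entry_chunk_kb", 88, entryBudget)); // null
```

With this shape, `value` is always above `limit` in the JSON report, so a reader never sees a flagged value that appears to be under its limit.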

import { readFileSync, writeFileSync, mkdirSync } from "node:fs";
import { dirname } from "node:path";

const args = process.argv.slice(2);

medium

The logic for finding summaryPath is fragile. Using args.find((a) => !a.startsWith("--")) will incorrectly identify the value of a flag (e.g., the path provided to --output-json) as the summary file path if that flag appears before the positional argument. This will cause the script to fail or read the wrong file.

const args = process.argv.slice(2);
const summaryPath = args.find((a) => !a.startsWith("--"));
const failOnBreach = args.includes("--fail-on-breach");
const outputJsonIdx = args.indexOf("--output-json");

medium

There is no check to ensure that an argument actually follows the --output-json flag. If the flag is provided as the last argument, args[outputJsonIdx + 1] will be undefined, which will cause mkdirSync or writeFileSync to throw an error later in the script.

Suggested change
- const outputJsonIdx = args.indexOf("--output-json");
+ const outputJson = (outputJsonIdx !== -1 && args[outputJsonIdx + 1]) ? args[outputJsonIdx + 1] : null;
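
Both argument-parsing concerns raised above (a flag value mistaken for the positional path, and a flag with no value) could be handled in one pass. The following is a sketch, not the script's code: `parseArgs` and `VALUE_FLAGS` are hypothetical, and only the flag names come from the quoted snippets.

```javascript
// Hypothetical parser; flag names follow the snippets quoted above.
const VALUE_FLAGS = new Set(["--output-json"]);

function parseArgs(argv) {
  const parsed = { summaryPath: null, outputJson: null, failOnBreach: false };
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (VALUE_FLAGS.has(arg)) {
      const value = argv[i + 1];
      if (value === undefined || value.startsWith("--")) {
        throw new Error(`${arg} requires a value`);
      }
      parsed.outputJson = value;
      i++; // consume the value so it is never mistaken for the positional
    } else if (arg === "--fail-on-breach") {
      parsed.failOnBreach = true;
    } else if (!arg.startsWith("--") && parsed.summaryPath === null) {
      parsed.summaryPath = arg; // first non-flag token is the summary path
    }
  }
  return parsed;
}

console.log(parseArgs(["--output-json", "report.json", "k6-summary.json"]));
```

Unlike the suggested change, this sketch throws on a missing value instead of silently falling back to `null`; that is a design choice, and Node's built-in `util.parseArgs` would be another option.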

Comment on lines +87 to +89
// Check for near-threshold conditions and aspirational targets
const p95Limit = 2000; // ms -- hard gate (issue #872)
const p95Aspirational = 1200; // ms -- aspirational target (warning only)

medium

Hardcoding these limits (2000ms, 1%) in the script creates a maintenance risk as they are likely to drift from the actual thresholds defined in the k6 test scripts. Since the k6 JSON summary already contains the threshold definitions and their results (which you are already parsing in lines 45-59), it would be more robust to derive the 'near-threshold' warnings dynamically from the limits found in the JSON data rather than using hardcoded constants.

  checks: ["rate>0.99"],
- http_req_duration: ["p(95)<1200", "p(99)<2500"],
+ // CI gate: p95 must stay below 2000ms (issue #872)
+ // Aspirational target p95<1200ms is enforced as a warning via check-k6-thresholds.mjs

medium

The inclusion of "p(95)<1200" as a threshold here conflicts with the intended CI gate of 2000ms defined in the documentation and PR description. In k6, if any threshold in the list is breached, the entire test run is marked as failed (non-zero exit code). This effectively makes 1200ms the hard gate, rendering the 2000ms threshold redundant and causing CI to fail on performance levels that were intended to only trigger warnings or be 'aspirational'.

Suggested change
// Aspirational target p95<1200ms is enforced as a warning via check-k6-thresholds.mjs
http_req_duration: ["p(95)<2000", "p(99)<2500"],


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b3ee9fe13a


Comment thread scripts/ci/check-bundle-size.mjs Outdated

function isEntryChunk(name) {
// Vite names the main entry chunk as index-<hash>.js
return /^index-[A-Za-z0-9]+\.js$/.test(name);

P1: Accept full Vite hash alphabet for entry chunk matching

The entry-chunk detector only matches index- names whose hash is strictly alphanumeric, but this repo’s Vite build already emits chunk hashes containing _ and - (for example OpsConsoleView-CgIU3_Wh.js and useWorkspaceOnboardingActions-sTMco-zd.js). If the index-* hash includes either character, entryChunk becomes null and the 150 KB entry budget is silently skipped, allowing regressions between the entry limit and the 250 KB single-chunk limit to pass.
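
The mismatch is easy to reproduce in isolation. In this sketch the `narrow` pattern is the one quoted above, `widened` is one possible fix that adds `_` and `-` to the character class, and the `index-*` file names are hypothetical (they reuse the hash strings cited in this comment).

```javascript
const narrow = /^index-[A-Za-z0-9]+\.js$/;
const widened = /^index-[A-Za-z0-9_-]+\.js$/;

// Hypothetical entry chunk names; the hashes reuse characters cited above.
const names = ["index-CgIU3Wh9.js", "index-CgIU3_Wh.js", "index-sTMco-zd.js"];

for (const name of names) {
  console.log(name, narrow.test(name), widened.test(name));
}
// index-CgIU3Wh9.js true true
// index-CgIU3_Wh.js false true
// index-sTMco-zd.js false true
```

A chunk name that the narrow pattern rejects would leave the entry chunk undetected, which is how the entry budget ends up skipped.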


Comment on lines +70 to +71
const assetsDir = join(distDir, "assets");
const jsFiles = collectJsFiles(assetsDir);

P2: Include root-level JS files in total bundle-size budget

The script computes total_js_kb by scanning only dist/assets, but this frontend build also generates JavaScript at the dist root (notably sw.js and workbox-*.js from the PWA pipeline). Those files are shipped to users yet excluded from the total budget and JSON report, so CI can report “total JS size” as healthy even when root-level JS regresses significantly.


@Chris0Jeky
Owner Author

Adversarial Review of PR #918 (Performance Regression Gate)

I have reviewed all 9 bot comments, all changed/new files at the latest commit (b3ee9fe), and performed my own adversarial analysis. Here are consolidated findings.


Bot Comments: Triage

Already fixed by self-review commit (b3ee9fe):

  • Codex: Missing --fail-on-error -- Fixed. The workflow now passes --fail-on-error to check-bundle-size.mjs.
  • Codex/Gemini: Dual p95 threshold -- Fixed. board-heavy-load.js now has only p(95)<2000 (the aspirational 1200ms target is handled as a warning via check-k6-thresholds.mjs).

Still open -- legitimate issues that need fixes:

  1. Gemini: limit field in warning violations uses wrong threshold (check-bundle-size.mjs line 122) -- In the warning branch, violations push limit: MAX_ENTRY_KB but the threshold that was actually breached is WARN_ENTRY_KB. This makes the JSON report misleading (value appears below the reported limit). The same pattern repeats for single-chunk and total-JS warnings at lines 133 and 143. Verdict: legitimate, should fix all three.

  2. Gemini: Fragile positional arg parsing (check-k6-thresholds.mjs line 18) -- args.find((a) => !a.startsWith("--")) would match the value of --output-json if it appears before the positional argument (e.g., --output-json report.json k6-summary.json). However, the CI invocation always puts the positional arg first, so this is low risk in practice. Verdict: legitimate but low priority. Worth a defensive fix.

  3. Gemini: Missing guard on --output-json value (check-k6-thresholds.mjs line 21) -- If --output-json is the last argument, args[outputJsonIdx + 1] is undefined, causing mkdirSync(undefined) to throw. Verdict: legitimate, should fix.

  4. Codex: Entry chunk regex too narrow (check-bundle-size.mjs line 61) -- The regex /^index-[A-Za-z0-9]+\.js$/ does not match Vite hashes containing _ or -. Vite's default hash uses Base64url encoding which includes _ and -. If the entry chunk hash contains these characters, entryChunk is null and the 150 KB entry budget check is silently skipped. Verdict: legitimate and important. The regex should be /^index-[A-Za-z0-9_-]+\.js$/.

Bot comments I disagree with / low priority:

  5. Gemini: Hardcoded thresholds in check-k6-thresholds.mjs (lines 87-89) -- Suggests deriving limits dynamically from k6 JSON summary. I disagree: the k6 summary only tells you whether thresholds passed/failed, not their numeric limits. You cannot extract "2000" from p(95)<2000 reliably without parsing threshold expression strings. The hardcoded constants match the k6 thresholds and the documented budgets. Verdict: not actionable; a code comment noting the coupling would suffice.

  6. Codex: Include root-level JS files (sw.js, workbox-*.js) in total -- These are service worker files generated by VitePWA, not part of the app bundle that gates initial page load performance. Mixing them in would inflate the total and provide misleading signals. Verdict: reasonable concern but intentionally out of scope. A code comment clarifying this would help.


My Own Findings (beyond bot comments)

7. ci-extended.yml: Missing needs: [backend-solution] -- The performance-regression-gate job in ci-extended has no needs dependency, unlike every other heavy job (load-concurrency-harness, visual-regression, container-integration). In ci-nightly it correctly has needs: [backend-solution]. Without this, the perf gate will run even if the basic build fails, wasting CI minutes on a doomed run.

8. check-k6-thresholds.mjs near-threshold/aspirational overlap -- If p95 is, for example, 1700ms (which is both > 1200ms aspirational AND > 1600ms = 80% of 2000ms), the script emits the aspirational warning but not the near-threshold warning. This is because the conditions use else if chains, and the aspirational warning fires first. This is acceptable behavior (you'd want the more specific warning), but worth noting that the 80% near-threshold window (1600-2000ms) lies entirely inside the aspirational window (1200-2000ms), so with the current constants the near-threshold warning for p95 is fully eclipsed.

9. No checks metric near-threshold analysis -- The k6 thresholds include checks: ["rate>0.99"], but check-k6-thresholds.mjs only does near-threshold analysis for http_req_duration and http_req_failed. If the check pass rate drops to 99.1% (dangerously close to the 99% gate), no warning is emitted. Low priority since the threshold parsing loop at lines 45-59 already catches actual breaches.

10. Workflow does dotnet restore then later dotnet build --no-restore -- This is correct, but note the workflow restores the full solution (backend/Taskdeck.sln) then only builds the Api project. This is fine because --no-restore on the API project will use the already-restored packages.
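
Finding 8 can be illustrated with a short sketch. The constants come from this thread; the function names and the exact branch order are assumptions made for illustration, not a copy of check-k6-thresholds.mjs.

```javascript
const P95_LIMIT = 2000; // ms, hard gate
const P95_ASPIRATIONAL = 1200; // ms, warning only
const P95_NEAR = (P95_LIMIT * 80) / 100; // 1600 ms, near-threshold window start

// Assumed else-if ordering, as described in finding 8.
function chainedWarning(p95) {
  if (p95 >= P95_LIMIT) return "breach";
  else if (p95 > P95_ASPIRATIONAL) return "aspirational";
  else if (p95 > P95_NEAR) return "near-threshold"; // unreachable: 1600 > 1200
  return "ok";
}

// Alternative: check each condition independently so both can be reported.
function independentWarnings(p95) {
  if (p95 >= P95_LIMIT) return ["breach"];
  const warnings = [];
  if (p95 > P95_ASPIRATIONAL) warnings.push("aspirational");
  if (p95 > P95_NEAR) warnings.push("near-threshold");
  return warnings;
}

console.log(chainedWarning(1700)); // aspirational
console.log(independentWarnings(1700)); // [ 'aspirational', 'near-threshold' ]
```

Under the chained ordering, any p95 above 1600 ms is already above 1200 ms, so the near-threshold branch never runs; the independent version reports both if that signal is wanted.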


Summary of required fixes (ordered by severity)

| # | Severity | File | Issue |
| --- | --- | --- | --- |
| 4 | High | check-bundle-size.mjs:61 | Entry chunk regex misses _ and - in hash, silently skipping entry budget |
| 7 | Medium | ci-extended.yml | Missing needs: [backend-solution] on performance-regression-gate job |
| 1 | Low | check-bundle-size.mjs:122,133,143 | Warning violations report limit: MAX_*_KB instead of limit: WARN_*_KB |
| 3 | Low | check-k6-thresholds.mjs:21 | No guard against undefined --output-json value |
| 2 | Low | check-k6-thresholds.mjs:18 | Fragile positional arg parsing |

Items 4 and 7 should be fixed before merge. Items 1, 2, 3 are worth fixing while we're here.

- Widen entry chunk regex to include _ and - (Base64url hash chars)
- Fix warning violations to report WARN_*_KB limit instead of MAX_*_KB
- Guard --output-json against missing value in k6 threshold script
- Fix fragile positional arg parsing to skip flag values correctly
- Add missing needs: [backend-solution] to ci-extended perf gate job

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 91a5d5e52a


Comment on lines +146 to +147
needs:
- backend-solution

P1: Remove unmet needs from performance-gate trigger path

In ci-extended, this job is intended to run for PRs labeled performance, but it also declares needs: backend-solution. Because backend-solution itself only runs for testing-labeled PRs, a PR with only the performance label will skip backend-solution and therefore skip this new gate entirely. That silently disables the performance regression checks in the exact trigger path documented for this workflow.


@Chris0Jeky Chris0Jeky merged commit c8925d4 into main Apr 22, 2026
14 checks passed
@github-project-automation github-project-automation Bot moved this from Pending to Done in Taskdeck Execution Apr 22, 2026
@Chris0Jeky Chris0Jeky deleted the feat/ci-03-performance-regression-gate branch April 23, 2026 22:54


Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

CI-03: Performance regression gate in CI

1 participant