test(app): add PR0.3 perf diagnostics by Astro-Han · Pull Request #609 · Astro-Han/pawwork

Astro-Han · 2026-05-13T15:03:26Z

Summary

Add PR0.3 perf diagnostics without changing Area A behavior: a terminal side-panel perf scenario, failure-only Playwright trace reruns for comparator failures, and an upserted PR perf delta comment.

Why

PR0.2 can block regressions, but when the perf gate fails it still takes too much manual artifact digging to explain why. PR0.3 is the diagnostics layer: make failures replayable, make terminal/panel cost visible, and surface perf deltas directly on the PR.

Related Issue

Part of #600. Task anchor: task #78.

Human Review Status

Pending. A human should make the final merge decision after reviewing the final diff and verification evidence.

Review Focus

- `packages/app/e2e/perf/perf-probe.spec.ts`: the new `terminal-side-panel-open` scenario measures the real terminal open path without expanding scope into terminal history interactions.
- `.github/workflows/perf-probe-baseline.yml` and `packages/app/playwright.config.ts`: trace is only enabled for diagnostic reruns after comparator failure, so normal measurement numbers stay uncontaminated.
- `packages/app/script/compare-perf.ts` and `packages/app/src/testing/perf-metrics.ts`: perf comment markdown is stable, marker-based, and upserts one PR comment instead of spamming new comments on every push.
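
The marker-based upsert described in the last item can be sketched as a pure decision function. This is a minimal illustration, not the workflow's actual script: `IssueComment`, `UpsertAction`, and `planPerfCommentUpsert` are hypothetical names, and the marker value is the one quoted in the review comments on this PR.

```typescript
// Marker value as quoted in the review thread; the exported constant in
// perf-metrics.ts is assumed to hold the same string.
const PERF_COMMENT_MARKER = "<!-- pawwork-perf-probe-baseline -->";

// Hypothetical minimal shape of a PR comment as returned by a list call.
interface IssueComment {
  id: number;
  body: string;
}

// Hypothetical result type: either create a new comment or update one.
type UpsertAction =
  | { kind: "create"; body: string }
  | { kind: "update"; commentId: number; body: string };

// Decide whether to create or update, so that at most one perf comment
// exists on the PR regardless of how many pushes happen.
function planPerfCommentUpsert(
  existing: IssueComment[],
  rendered: string,
): UpsertAction {
  // Make sure the body carries the marker so the next run can find it.
  const body = rendered.includes(PERF_COMMENT_MARKER)
    ? rendered
    : `${PERF_COMMENT_MARKER}\n${rendered}`;
  const previous = existing.find((comment) =>
    comment.body.includes(PERF_COMMENT_MARKER),
  );
  return previous
    ? { kind: "update", commentId: previous.id, body }
    : { kind: "create", body };
}
```

Keeping the decision separate from the API call makes the "one comment per PR" property testable without network access.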

Risk Notes

Low. This PR only changes perf harness, CI diagnostics, and PR reporting. It does not change product behavior, thresholds, or Area A render architecture. The main risk is workflow complexity; normal measurement runs still avoid trace overhead by design.
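
As a rough sketch of how normal runs can stay trace-free, the trace mode can be derived from the environment. `resolveTraceMode` is a hypothetical helper, and treating only the value `1` as "enabled" is an assumption based on the smoke-test command in this description; the real `playwright.config.ts` may check the variable differently.

```typescript
// The two Playwright trace modes the walkthrough mentions.
type TraceMode = "on" | "on-first-retry";

// Hypothetical helper: pick the trace mode from an environment map.
// Assumption: only PAWWORK_PERF_TRACE=1 enables always-on tracing.
function resolveTraceMode(
  env: Record<string, string | undefined>,
): TraceMode {
  return env.PAWWORK_PERF_TRACE === "1" ? "on" : "on-first-retry";
}
```

In a Playwright config this would be consumed as `use: { trace: resolveTraceMode(process.env) }`, so measurement runs only record a trace when a test is retried.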

How To Verify

- Typecheck: `bun run typecheck` -> ok
- Perf metrics tests: `bun test --preload ./happydom.ts ./src/testing/perf-metrics.test.ts` -> 7 passed
- Local perf suite: `bun --cwd packages/app test:e2e:local:perf` -> 5 passed, including `terminal-side-panel-open`
- Trace gate smoke test: `PAWWORK_PERF_TRACE=1 bun --cwd packages/app test:e2e:local:perf` -> ok
- Formatter output: `bun ./script/compare-perf.ts --base e2e/perf-results/pr0.1-baseline.json --head e2e/perf-results/pr0.1-baseline.json --output /tmp/pr0.3-compare.json --comment-output /tmp/pr0.3-comment.md` -> JSON + markdown generated
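
The formatter command above passes four flags. A minimal sketch of how such flags could be parsed follows; `CompareArgs` and `parseCompareArgs` are hypothetical names, and treating `--comment-output` as the only optional flag is an assumption, since the actual parsing in `compare-perf.ts` is not shown here.

```typescript
// Hypothetical parsed form of the flags used in the command above.
interface CompareArgs {
  base: string;
  head: string;
  output: string;
  commentOutput?: string; // assumed optional: only needed for PR comments
}

// Parse "--flag value" pairs; throws on malformed or missing input.
function parseCompareArgs(argv: string[]): CompareArgs {
  const values = new Map<string, string>();
  for (let i = 0; i < argv.length; i += 2) {
    const flag = argv[i];
    const value = argv[i + 1];
    if (flag === undefined || !flag.startsWith("--") || value === undefined) {
      throw new Error(`Expected a "--flag value" pair at position ${i}`);
    }
    values.set(flag.slice(2), value);
  }
  const required = (name: string): string => {
    const value = values.get(name);
    if (value === undefined) throw new Error(`Missing --${name}`);
    return value;
  };
  return {
    base: required("base"),
    head: required("head"),
    output: required("output"),
    commentOutput: values.get("comment-output"),
  };
}
```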

Screenshots or Recordings

None. No visible UI changes.

Checklist

Summary by CodeRabbit

New Features
- Automated perf-regression detection that posts a formatted perf-delta comment on PRs.
- New performance test scenario covering terminal side-panel interactions.
- CLI option to emit a rendered perf comment to a file for CI consumption.
Chores
- CI now runs a confirmed re-check before posting results, produces comparison artifacts, and emits diagnostic traces only when confirmed regressions occur; failing jobs exit accordingly.
- Playwright trace behavior made configurable via environment.
Tests
- Added tests to render and verify the perf comparison comment output.

coderabbitai · 2026-05-13T15:03:42Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 0ae90249-87e8-4b9d-aaf3-bc402aec3314

📥 Commits

Reviewing files that changed from the base of the PR and between bf4ac7f and a5a827e.

📒 Files selected for processing (1)

.github/workflows/perf-probe-baseline.yml

🚧 Files skipped from review as they are similar to previous changes (1)

.github/workflows/perf-probe-baseline.yml

📝 Walkthrough

Walkthrough

This PR adds perf baseline comment rendering and CLI output, wires comment creation/update into the perf comparator workflow (with conditional trace captures and job failure on regressions), makes Playwright trace mode env-configurable, and adds a terminal-side-panel baseline perf test.

Changes

Perf Regression Detection with PR Comments and Traces

| Layer / File(s) | Summary |
| --- | --- |
| Perf Baseline Comment Rendering `packages/app/src/testing/perf-metrics.ts`, `packages/app/src/testing/perf-metrics.test.ts` | Exports `PERF_COMMENT_MARKER` constant and `renderPerfBaselineComment()` function to generate Markdown comparison comments with delta formatting, scenario tables, and status labels. Tests verify marker presence, delta headers, and "warn"/"fail" status indicators. |
| Playwright Trace Configuration `packages/app/playwright.config.ts` | Adds conditional trace mode selection via `process.env.PAWWORK_PERF_TRACE` to toggle between always-on and on-first-retry Playwright trace capture. |
| Compare Script Comment Output `packages/app/script/compare-perf.ts` | Extends comparison script with `--comment-output` flag to write rendered baseline comments to file, integrating the comment rendering from the first layer. |
| Workflow Comment and Diagnostics `.github/workflows/perf-probe-baseline.yml` | Adds workflow permissions, expands PR trigger paths, copies head Playwright config into base, marks comparator step(s) with continue-on-error, writes a `perf-comment.md` artifact, posts/updates a PR comment matched by a fixed HTML marker, captures base/head perf traces when both comparisons fail, and exits with code 1 on confirmed regression. |
| Terminal Side Panel Baseline Test `packages/app/e2e/perf/perf-probe.spec.ts` | Adds new `terminal-side-panel-open` baseline test that runs three sessions, opens the terminal side panel, waits for focus idle, snapshots probe output per run, ensures the panel closes, and records summarized scenario results. |
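
The "captures traces when both comparisons fail, exits with code 1 on confirmed regression" flow can be read as a small decision table. The sketch below is one interpretation under stated assumptions: `ComparatorOutcome`, `GateDecision`, and `decideGate` are hypothetical names, and treating a skipped re-check as "not confirmed" is an assumption rather than something the workflow is known to do.

```typescript
type ComparatorOutcome = "pass" | "fail";

// Hypothetical summary of what the workflow does after the comparisons.
interface GateDecision {
  confirmedRegression: boolean;
  captureTraces: boolean;
  exitCode: 0 | 1;
}

// A regression counts as confirmed only when the first comparison and the
// re-check both fail; traces are captured only in that case.
function decideGate(
  first: ComparatorOutcome,
  recheck?: ComparatorOutcome,
): GateDecision {
  if (first === "pass") {
    return { confirmedRegression: false, captureTraces: false, exitCode: 0 };
  }
  // Assumption: a missing re-check is treated as unconfirmed.
  const confirmed = recheck === "fail";
  return {
    confirmedRegression: confirmed,
    captureTraces: confirmed,
    exitCode: confirmed ? 1 : 0,
  };
}
```

Gating traces on the second failure keeps a single noisy measurement from triggering the slower diagnostic rerun.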

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues

[Task] UI rewrite v2 PR0: perf-gated regression CI #600: The workflow/compare/comment/tracing changes align with the perf-gated CI objectives described in this issue.

Possibly related PRs

Astro-Han/pawwork#607: Related baseline workflow and perf-probe pipeline changes that this PR extends with comment rendering and PR comment integration.
Astro-Han/pawwork#608: Earlier perf comparator plumbing that this PR builds upon by adding --comment-output and rendered comments.

🐰 I hop through metrics, whiskers twitching fast,

I stamp a tiny marker so comparisons last,
I render markdown rows with a cheerful thump,
When traces show a lag I nudge the devs to jump,
I bound away with data crumbs and a happy drum.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'test(app): add PR0.3 perf diagnostics' directly and clearly describes the main change: adding PR0.3 perf diagnostics functionality to the app's test suite. |
| Description check | ✅ Passed | The description is comprehensive and follows the template structure with all key sections present: Summary, Why, Related Issue, Human Review Status, Review Focus, Risk Notes, How To Verify, Screenshots, and a completed Checklist. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


github-actions

Suggested priority: P2 (includes user-path files (packages/app/src/testing/perf-metrics.test.ts, packages/app/src/testing/perf-metrics.ts)).

P1/P0 are reserved for maintainer confirmation. Please relabel manually if this is a release blocker, security issue, data-loss risk, or updater/runtime failure.

coderabbitai

🧹 Nitpick comments (1)

.github/workflows/perf-probe-baseline.yml (1)
101-140: 💤 Low value

Consider extracting the marker constant to reduce duplication.

The marker string "<!-- pawwork-perf-probe-baseline -->" is hardcoded here and also defined as PERF_COMMENT_MARKER in packages/app/src/testing/perf-metrics.ts. While GitHub Actions can't directly import TypeScript constants, this duplication creates a maintenance risk if the marker changes.

Consider documenting the coupling in a comment, or extracting the marker to a shared location (e.g., a shell variable at the workflow level that could be sourced from package.json or a config file).
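
One way to read the "single source" suggestion is to let the script side accept the marker from the environment and fall back to a default. This is only a sketch of that idea: `resolvePerfCommentMarker` and `DEFAULT_PERF_COMMENT_MARKER` are hypothetical names, and the `PERF_COMMENT_MARKER` environment variable is the reviewer's proposal, not something the workflow is known to define.

```typescript
// Default mirrors the marker string quoted in this review.
const DEFAULT_PERF_COMMENT_MARKER = "<!-- pawwork-perf-probe-baseline -->";

// Hypothetical helper: prefer a workflow-provided marker, else the default.
function resolvePerfCommentMarker(
  env: Record<string, string | undefined>,
): string {
  const fromEnv = env.PERF_COMMENT_MARKER?.trim();
  return fromEnv ? fromEnv : DEFAULT_PERF_COMMENT_MARKER;
}
```

With this shape the workflow and the renderer would both read the same variable, and the default keeps local runs working when it is unset.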
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/perf-probe-baseline.yml around lines 101 - 140, The
hardcoded marker "<!-- pawwork-perf-probe-baseline -->" in the GitHub Action
(variable name marker in the script) duplicates PERF_COMMENT_MARKER from
packages/app/src/testing/perf-metrics.ts; to fix, expose the marker via a single
source consumed by the workflow (e.g., define PERF_COMMENT_MARKER as a
workflow-level env variable or read it from package.json/config and set env:
PERF_COMMENT_MARKER) and replace the inline string with
process.env.PERF_COMMENT_MARKER (or document the coupling in a clear comment
above the script if extraction is not feasible), ensuring you update references
to the script's marker variable accordingly.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In @.github/workflows/perf-probe-baseline.yml:
- Around line 101-140: The hardcoded marker "<!-- pawwork-perf-probe-baseline
-->" in the GitHub Action (variable name marker in the script) duplicates
PERF_COMMENT_MARKER from packages/app/src/testing/perf-metrics.ts; to fix,
expose the marker via a single source consumed by the workflow (e.g., define
PERF_COMMENT_MARKER as a workflow-level env variable or read it from
package.json/config and set env: PERF_COMMENT_MARKER) and replace the inline
string with process.env.PERF_COMMENT_MARKER (or document the coupling in a clear
comment above the script if extraction is not feasible), ensuring you update
references to the script's marker variable accordingly.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 7c0fd135-798f-4be6-90a0-e369058df44b

📥 Commits

Reviewing files that changed from the base of the PR and between d3e4e1b and bfe130a.

📒 Files selected for processing (6)

.github/workflows/perf-probe-baseline.yml
packages/app/e2e/perf/perf-probe.spec.ts
packages/app/playwright.config.ts
packages/app/script/compare-perf.ts
packages/app/src/testing/perf-metrics.test.ts
packages/app/src/testing/perf-metrics.ts

gemini-code-assist

Code Review

This pull request introduces a new performance probe scenario for terminal side panel interactions and adds functionality to generate a markdown-formatted performance delta summary comment. The changes include updates to the Playwright configuration to enable tracing, a new test case in the performance probe suite, and helper functions for rendering performance comparison reports. I have reviewed the code and identified a potential issue regarding the terminal state reset in the new test case, as well as a readability improvement for the markdown table generation logic.

github-actions · 2026-05-13T15:10:02Z

Perf delta summary

Comparator: pass

| Scenario | interaction median | interaction worst | long task max | tbt | frame gap p95 | frame gap max | jank count | cls | status |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| homepage-cold | 32 -> 24 (-8) | 56 -> 40 (-16) | 89 -> 70 (-19) | 39 -> 20 (-19) | 16.8 -> 16.8 (0) | 166.7 -> 150 (-16.7) | 4 -> 3 (-1) | 0 -> 0 (0) | pass |
| session-streaming-long | 40 -> 40 (0) | 56 -> 56 (0) | 79 -> 0 (-79) | 29 -> 0 (-29) | 33.3 -> 16.7 (-16.6) | 83.3 -> 16.8 (-66.5) | 1 -> 0 (-1) | 0 -> 0 (0) | pass |
| tool-call-expand | 16 -> 16 (0) | 24 -> 16 (-8) | 0 -> 0 (0) | 0 -> 0 (0) | 16.7 -> 16.8 (+0.1) | 16.7 -> 16.8 (+0.1) | 0 -> 0 (0) | 0 -> 0 (0) | pass |
| terminal-side-panel-open | 40 -> 40 (0) | 48 -> 48 (0) | 0 -> 0 (0) | 0 -> 0 (0) | 16.8 -> 16.8 (0) | 16.8 -> 16.8 (0) | 0 -> 0 (0) | 0 -> 0 (0) | pass |
| session-scroll-reading | 24 -> 32 (+8) | 32 -> 32 (0) | 0 -> 0 (0) | 0 -> 0 (0) | 16.8 -> 16.8 (0) | 16.8 -> 16.8 (0) | 0 -> 0 (0) | 0.505 -> 0.505 (0) | warn: cls |
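
Each cell in the summary above follows a `base -> head (signed delta)` pattern. A minimal sketch of that formatting is shown below; `formatDelta` is a hypothetical name, and rounding the delta to three decimals is an assumption made here to avoid floating-point noise, so the real `renderPerfBaselineComment()` may format values differently.

```typescript
// Format one metric cell as "base -> head (delta)", with "+" on increases.
// Rounding to three decimals is an assumption, chosen so that values such
// as 16.7 - 33.3 print as -16.6 rather than -16.599999999999998.
function formatDelta(base: number, head: number): string {
  const delta = Math.round((head - base) * 1000) / 1000;
  const sign = delta > 0 ? "+" : "";
  return `${base} -> ${head} (${sign}${delta})`;
}

console.log(formatDelta(32, 24)); // prints "32 -> 24 (-8)"
console.log(formatDelta(16.7, 16.8)); // prints "16.7 -> 16.8 (+0.1)"
```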

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/perf-probe-baseline.yml:
- Around line 7-8: The workflow's on.pull_request.paths is missing
packages/app/playwright.config.ts so changes to that config can bypass the
perf-probe-baseline job; update the paths list used by the workflow (the
on.pull_request.paths block) to include "packages/app/playwright.config.ts"
alongside the existing "packages/app/src/**" and "packages/ui/src/**" entries so
config-only PRs trigger this workflow.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 580d62fa-3bd0-4f90-be67-02e65c5600e2

📥 Commits

Reviewing files that changed from the base of the PR and between 0cd27c2 and bf4ac7f.

📒 Files selected for processing (1)

.github/workflows/perf-probe-baseline.yml

Astro-Han added 3 commits May 13, 2026 22:59

test(app): add terminal side-panel perf scenario

62a0fa9

ci(app): capture traces for perf comparator failures

5ae6efb

ci(app): comment perf deltas on pull requests

bfe130a

github-actions Bot reviewed May 13, 2026


github-actions Bot added the ci (Continuous integration / GitHub Actions) and app (Application behavior and product flows) labels May 13, 2026

Astro-Han added the P2 Medium priority label May 13, 2026

coderabbitai Bot reviewed May 13, 2026


ci(app): widen perf-probe-baseline path filter

dba9d4f

gemini-code-assist Bot reviewed May 13, 2026


Comment thread packages/app/e2e/perf/perf-probe.spec.ts

Comment thread packages/app/src/testing/perf-metrics.ts Outdated

Astro-Han added 4 commits May 13, 2026 23:10

ci(app): restore perf runner path trigger

2eddf3a

test(app): reset terminal state between perf scenario runs

52778c8

test(app): simplify perf comment row rendering

0cd27c2

ci(app): confirm perf regressions before trace fail

bf4ac7f

coderabbitai Bot reviewed May 13, 2026


Comment thread .github/workflows/perf-probe-baseline.yml

ci(app): run perf gate on playwright config changes

a5a827e

Astro-Han merged commit dc2ea6c into dev May 13, 2026
26 checks passed

Astro-Han deleted the codex/pr0-perf-gate branch May 13, 2026 16:13

This was referenced May 13, 2026

test(app): add PR0.4 low-end perf profile #610

Merged

test(app): add long-session input lag perf guard #624

Merged

Astro-Han merged 9 commits into dev from codex/pr0-perf-gate
