
test(app): add PR0.3 perf diagnostics #609

Merged
Astro-Han merged 9 commits into dev from codex/pr0-perf-gate on May 13, 2026

Conversation

@Astro-Han (Owner) commented May 13, 2026

Summary

Add PR0.3 perf diagnostics without changing Area A behavior: a terminal side-panel perf scenario, failure-only Playwright trace reruns for comparator failures, and an upserted PR perf delta comment.

Why

PR0.2 can block regressions, but when the perf gate fails it still takes too much manual artifact digging to explain why. PR0.3 is the diagnostics layer: make failures replayable, make terminal/panel cost visible, and surface perf deltas directly on the PR.

Related Issue

Part of #600. Task anchor: task #78.

Human Review Status

Pending. A human should make the final merge decision after reviewing the final diff and verification evidence.

Review Focus

  • packages/app/e2e/perf/perf-probe.spec.ts: the new terminal-side-panel-open scenario measures the real terminal open path without expanding scope into terminal history interactions.
  • .github/workflows/perf-probe-baseline.yml and packages/app/playwright.config.ts: trace is only enabled for diagnostic reruns after comparator failure, so normal measurement numbers stay uncontaminated.
  • packages/app/script/compare-perf.ts and packages/app/src/testing/perf-metrics.ts: perf comment markdown is stable, marker-based, and upserts one PR comment instead of spamming new comments on every push.
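
The marker-based upsert described in the last bullet can be sketched as a pure decision function. This is a minimal illustration, not the code in this PR: the `IssueComment` and `UpsertAction` shapes and the `planPerfCommentUpsert` name are hypothetical, and the real workflow would call the GitHub API with the result. The marker string is the one quoted in the review thread.

```typescript
// Hypothetical shapes for illustration; the real workflow reads comments from the GitHub API.
interface IssueComment {
  id: number;
  body: string;
}

type UpsertAction =
  | { kind: "update"; commentId: number; body: string }
  | { kind: "create"; body: string };

// Marker string as quoted in the review thread on this PR.
const PERF_COMMENT_MARKER = "<!-- pawwork-perf-probe-baseline -->";

// Decide whether to edit the existing perf comment or post a new one.
function planPerfCommentUpsert(existing: IssueComment[], renderedBody: string): UpsertAction {
  // Ensure the marker is present so the next run can find this comment again.
  const body = renderedBody.includes(PERF_COMMENT_MARKER)
    ? renderedBody
    : `${PERF_COMMENT_MARKER}\n${renderedBody}`;
  const previous = existing.find((comment) => comment.body.includes(PERF_COMMENT_MARKER));
  return previous
    ? { kind: "update", commentId: previous.id, body }
    : { kind: "create", body };
}
```

Because the lookup keys on the marker rather than on the comment author or position, repeated pushes keep rewriting the same comment.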

Risk Notes

Low. This PR only changes perf harness, CI diagnostics, and PR reporting. It does not change product behavior, thresholds, or Area A render architecture. The main risk is workflow complexity; normal measurement runs still avoid trace overhead by design.

How To Verify

Typecheck: bun run typecheck -> ok
Perf metrics tests: bun test --preload ./happydom.ts ./src/testing/perf-metrics.test.ts -> 7 passed
Local perf suite: bun --cwd packages/app test:e2e:local:perf -> 5 passed, including terminal-side-panel-open
Trace gate smoke test: PAWWORK_PERF_TRACE=1 bun --cwd packages/app test:e2e:local:perf -> ok
Formatter output: bun ./script/compare-perf.ts --base e2e/perf-results/pr0.1-baseline.json --head e2e/perf-results/pr0.1-baseline.json --output /tmp/pr0.3-compare.json --comment-output /tmp/pr0.3-comment.md -> JSON + markdown generated

Screenshots or Recordings

None. No visible UI changes.

Checklist

  • Human review status is stated above as pending, approved, or not required
  • I linked the related issue, or stated why there is no issue
  • This PR has type, primary area, and priority labels, or I requested maintainer labeling
  • I described the review focus and any meaningful risks
  • I listed the relevant verification steps and the key result for each
  • I did not introduce unrelated refactors, dependencies, generated files, or file changes beyond the stated scope
  • I manually checked visible UI or copy changes when needed, with screenshots or recordings
  • I considered macOS and Windows impact for platform, packaging, updater, signing, paths, shell, or permissions changes
  • I called out docs, release notes, dependencies, permissions, credentials, deletion behavior, generated content, or local file changes when relevant
  • I reviewed the final diff for unrelated changes and suspicious dependency changes
  • I am targeting dev, and my PR title and commit messages use Conventional Commits in English

Summary by CodeRabbit

  • New Features

    • Automated perf-regression detection that posts a formatted perf-delta comment on PRs.
    • New performance test scenario covering terminal side-panel interactions.
    • CLI option to emit a rendered perf comment to a file for CI consumption.
  • Chores

    • CI now runs a confirmed re-check before posting results, produces comparison artifacts, and emits diagnostic traces only when confirmed regressions occur; failing jobs exit accordingly.
    • Playwright trace behavior made configurable via environment.
  • Tests

    • Added tests to render and verify the perf comparison comment output.
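
The "confirmed re-check" flow in the Chores item can be sketched as a small decision function. This is an assumption-laden illustration: the `ComparatorOutcome` and `GatePlan` types and the `planPerfGate` name are hypothetical, and the actual logic lives in workflow steps rather than in TypeScript.

```typescript
// Hypothetical types; the real gate is expressed as GitHub Actions steps.
type ComparatorOutcome = "pass" | "fail";

interface GatePlan {
  confirmedRegression: boolean; // both comparator runs failed
  captureTraces: boolean; // run the diagnostic trace reruns
  exitCode: 0 | 1; // job exit status
}

// `second` is undefined when the first comparison passed and no re-check ran.
function planPerfGate(first: ComparatorOutcome, second?: ComparatorOutcome): GatePlan {
  const confirmedRegression = first === "fail" && second === "fail";
  return {
    confirmedRegression,
    // Traces are diagnostic only, so they run only after a confirmed failure.
    captureTraces: confirmedRegression,
    exitCode: confirmedRegression ? 1 : 0,
  };
}
```

Under this sketch, a single failing comparison that passes on re-check neither captures traces nor fails the job.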

Review Change Stack

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 13, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 0ae90249-87e8-4b9d-aaf3-bc402aec3314

📥 Commits

Reviewing files that changed from the base of the PR and between bf4ac7f and a5a827e.

📒 Files selected for processing (1)
  • .github/workflows/perf-probe-baseline.yml
🚧 Files skipped from review as they are similar to previous changes (1)
  • .github/workflows/perf-probe-baseline.yml

📝 Walkthrough

Walkthrough

This PR adds perf baseline comment rendering and CLI output, wires comment creation/update into the perf comparator workflow (with conditional trace captures and job failure on regressions), makes Playwright trace mode env-configurable, and adds a terminal-side-panel baseline perf test.
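
As a rough sketch of the env-configurable trace mode mentioned above: the helper name `resolveTraceMode` is hypothetical, and the exact comparison against `"1"` is an assumption based on the `PAWWORK_PERF_TRACE=1` smoke test in the PR description. Both returned strings are valid values for Playwright's `trace` option.

```typescript
// The two trace modes this PR toggles between; both are valid Playwright `trace` values.
type PerfTraceMode = "on" | "on-first-retry";

// Assumption: only the literal value "1" turns always-on tracing on.
function resolveTraceMode(env: Record<string, string | undefined>): PerfTraceMode {
  return env.PAWWORK_PERF_TRACE === "1" ? "on" : "on-first-retry";
}
```

In a Playwright config this would presumably be wired as `use: { trace: resolveTraceMode(process.env) }`, so normal measurement runs never pay the tracing cost unless the variable is set.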

Changes

Perf Regression Detection with PR Comments and Traces

| Layer | File(s) | Summary |
| --- | --- | --- |
| Perf Baseline Comment Rendering | packages/app/src/testing/perf-metrics.ts, packages/app/src/testing/perf-metrics.test.ts | Exports PERF_COMMENT_MARKER constant and renderPerfBaselineComment() function to generate Markdown comparison comments with delta formatting, scenario tables, and status labels. Tests verify marker presence, delta headers, and "warn"/"fail" status indicators. |
| Playwright Trace Configuration | packages/app/playwright.config.ts | Adds conditional trace mode selection via process.env.PAWWORK_PERF_TRACE to toggle between always-on and on-first-retry Playwright trace capture. |
| Compare Script Comment Output | packages/app/script/compare-perf.ts | Extends comparison script with --comment-output flag to write rendered baseline comments to file, integrating the comment rendering from the first layer. |
| Workflow Comment and Diagnostics | .github/workflows/perf-probe-baseline.yml | Adds workflow permissions, expands PR trigger paths, copies head Playwright config into base, marks comparator step(s) with continue-on-error, writes a perf-comment.md artifact, posts/updates a PR comment matched by a fixed HTML marker, captures base/head perf traces when both comparisons fail, and exits with code 1 on confirmed regression. |
| Terminal Side Panel Baseline Test | packages/app/e2e/perf/perf-probe.spec.ts | Adds new terminal-side-panel-open baseline test that runs three sessions, opens the terminal side panel, waits for focus idle, snapshots probe output per run, ensures the panel closes, and records summarized scenario results. |
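
The "records summarized scenario results" step for a multi-run scenario can be illustrated with a median/worst reduction over per-run samples. This is a sketch under assumptions: the function names are hypothetical, and the real probe may summarize more metrics or use a different statistic.

```typescript
// Median of a non-empty list of samples (average of the two middle values for even counts).
function median(values: number[]): number {
  if (values.length === 0) throw new Error("median requires at least one sample");
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Hypothetical summary: one interaction sample (ms) per run, reduced to median and worst case.
function summarizeInteraction(samplesMs: number[]): { median: number; worst: number } {
  return { median: median(samplesMs), worst: Math.max(...samplesMs) };
}
```

Reporting both values keeps a single slow run visible in the worst column without letting it skew the median.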

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • Astro-Han/pawwork#607: Related baseline workflow and perf-probe pipeline changes that this PR extends with comment rendering and PR comment integration.
  • Astro-Han/pawwork#608: Earlier perf comparator plumbing that this PR builds upon by adding --comment-output and rendered comments.

🐰 I hop through metrics, whiskers twitching fast,

I stamp a tiny marker so comparisons last,
I render markdown rows with a cheerful thump,
When traces show a lag I nudge the devs to jump,
I bound away with data crumbs and a happy drum.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'test(app): add PR0.3 perf diagnostics' directly and clearly describes the main change: adding PR0.3 perf diagnostics functionality to the app's test suite. |
| Description check | ✅ Passed | The description is comprehensive and follows the template structure with all key sections present: Summary, Why, Related Issue, Human Review Status, Review Focus, Risk Notes, How To Verify, Screenshots, and a completed Checklist. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |



@github-actions github-actions Bot left a comment


Suggested priority: P2 (includes user-path files (packages/app/src/testing/perf-metrics.test.ts, packages/app/src/testing/perf-metrics.ts)).

P1/P0 are reserved for maintainer confirmation. Please relabel manually if this is a release blocker, security issue, data-loss risk, or updater/runtime failure.

@github-actions (Bot) added the ci (Continuous integration / GitHub Actions) and app (Application behavior and product flows) labels May 13, 2026
@Astro-Han added the P2 (Medium priority) label May 13, 2026
@coderabbitai (Bot, Contributor) left a comment


🧹 Nitpick comments (1)
.github/workflows/perf-probe-baseline.yml (1)

101-140: 💤 Low value

Consider extracting the marker constant to reduce duplication.

The marker string "<!-- pawwork-perf-probe-baseline -->" is hardcoded here and also defined as PERF_COMMENT_MARKER in packages/app/src/testing/perf-metrics.ts. While GitHub Actions can't directly import TypeScript constants, this duplication creates a maintenance risk if the marker changes.

Consider documenting the coupling in a comment, or extracting the marker to a shared location (e.g., a shell variable at the workflow level that could be sourced from package.json or a config file).
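
One lightweight way to guard the coupling, short of extracting the marker, is a check that the workflow source still contains the exported constant. The sketch below is hypothetical: `findMarkerDrift` is not part of this PR, and reading the workflow file from disk is left to the caller.

```typescript
// Marker string as quoted in this review comment.
const PERF_COMMENT_MARKER = "<!-- pawwork-perf-probe-baseline -->";

// Returns null when the workflow text contains the marker, otherwise a description of the drift.
function findMarkerDrift(workflowSource: string, marker: string): string | null {
  return workflowSource.includes(marker)
    ? null
    : `workflow does not contain the expected marker: ${marker}`;
}
```

A unit test could pass the contents of the workflow file to this function and fail when it returns a message, which would surface a renamed marker before the workflow starts posting duplicate comments.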

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/perf-probe-baseline.yml around lines 101 - 140, The
hardcoded marker "<!-- pawwork-perf-probe-baseline -->" in the GitHub Action
(variable name marker in the script) duplicates PERF_COMMENT_MARKER from
packages/app/src/testing/perf-metrics.ts; to fix, expose the marker via a single
source consumed by the workflow (e.g., define PERF_COMMENT_MARKER as a
workflow-level env variable or read it from package.json/config and set env:
PERF_COMMENT_MARKER) and replace the inline string with
process.env.PERF_COMMENT_MARKER (or document the coupling in a clear comment
above the script if extraction is not feasible), ensuring you update references
to the script's marker variable accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 7c0fd135-798f-4be6-90a0-e369058df44b

📥 Commits

Reviewing files that changed from the base of the PR and between d3e4e1b and bfe130a.

📒 Files selected for processing (6)
  • .github/workflows/perf-probe-baseline.yml
  • packages/app/e2e/perf/perf-probe.spec.ts
  • packages/app/playwright.config.ts
  • packages/app/script/compare-perf.ts
  • packages/app/src/testing/perf-metrics.test.ts
  • packages/app/src/testing/perf-metrics.ts


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a new performance probe scenario for terminal side panel interactions and adds functionality to generate a markdown-formatted performance delta summary comment. The changes include updates to the Playwright configuration to enable tracing, a new test case in the performance probe suite, and helper functions for rendering performance comparison reports. I have reviewed the code and identified a potential issue regarding the terminal state reset in the new test case, as well as a readability improvement for the markdown table generation logic.

Comment thread packages/app/e2e/perf/perf-probe.spec.ts
Comment thread packages/app/src/testing/perf-metrics.ts Outdated
@github-actions (Bot) commented May 13, 2026

Perf delta summary

Comparator: pass

| Scenario | interaction median | interaction worst | long task max | tbt | frame gap p95 | frame gap max | jank count | cls | status |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| homepage-cold | 32 -> 24 (-8) | 56 -> 40 (-16) | 89 -> 70 (-19) | 39 -> 20 (-19) | 16.8 -> 16.8 (0) | 166.7 -> 150 (-16.7) | 4 -> 3 (-1) | 0 -> 0 (0) | pass |
| session-streaming-long | 40 -> 40 (0) | 56 -> 56 (0) | 79 -> 0 (-79) | 29 -> 0 (-29) | 33.3 -> 16.7 (-16.6) | 83.3 -> 16.8 (-66.5) | 1 -> 0 (-1) | 0 -> 0 (0) | pass |
| tool-call-expand | 16 -> 16 (0) | 24 -> 16 (-8) | 0 -> 0 (0) | 0 -> 0 (0) | 16.7 -> 16.8 (+0.1) | 16.7 -> 16.8 (+0.1) | 0 -> 0 (0) | 0 -> 0 (0) | pass |
| terminal-side-panel-open | 40 -> 40 (0) | 48 -> 48 (0) | 0 -> 0 (0) | 0 -> 0 (0) | 16.8 -> 16.8 (0) | 16.8 -> 16.8 (0) | 0 -> 0 (0) | 0 -> 0 (0) | pass |
| session-scroll-reading | 24 -> 32 (+8) | 32 -> 32 (0) | 0 -> 0 (0) | 0 -> 0 (0) | 16.8 -> 16.8 (0) | 16.8 -> 16.8 (0) | 0 -> 0 (0) | 0.505 -> 0.505 (0) | warn: cls |
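
Each cell above follows a `base -> head (delta)` pattern. A minimal sketch of a formatter that reproduces that shape is shown here; the function name and the three-decimal rounding are assumptions, and the real renderPerfBaselineComment() may format values differently.

```typescript
// Format one metric cell as "base -> head (delta)", with an explicit "+" on increases.
function formatDelta(base: number, head: number): string {
  // Round to three decimals so floating-point noise (e.g. 16.8 - 16.7) does not leak into the output.
  const delta = Math.round((head - base) * 1000) / 1000;
  const sign = delta > 0 ? "+" : "";
  return `${base} -> ${head} (${sign}${delta})`;
}

console.log(formatDelta(32, 24)); // 32 -> 24 (-8)
console.log(formatDelta(16.7, 16.8)); // 16.7 -> 16.8 (+0.1)
```

Negative deltas need no explicit sign handling because the minus comes from the number itself.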

@coderabbitai (Bot, Contributor) left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/perf-probe-baseline.yml:
- Around line 7-8: The workflow's on.pull_request.paths is missing
packages/app/playwright.config.ts so changes to that config can bypass the
perf-probe-baseline job; update the paths list used by the workflow (the
on.pull_request.paths block) to include "packages/app/playwright.config.ts"
alongside the existing "packages/app/src/**" and "packages/ui/src/**" entries so
config-only PRs trigger this workflow.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 580d62fa-3bd0-4f90-be67-02e65c5600e2

📥 Commits

Reviewing files that changed from the base of the PR and between 0cd27c2 and bf4ac7f.

📒 Files selected for processing (1)
  • .github/workflows/perf-probe-baseline.yml

Comment thread .github/workflows/perf-probe-baseline.yml
@Astro-Han Astro-Han merged commit dc2ea6c into dev May 13, 2026
26 checks passed
@Astro-Han Astro-Han deleted the codex/pr0-perf-gate branch May 13, 2026 16:13

Labels

app (Application behavior and product flows), ci (Continuous integration / GitHub Actions), P2 (Medium priority)

Projects

None yet


1 participant