ci(coverage): post sticky PR comment with vitest coverage delta#124
Conversation
Coverage floors in `vite.config.ts` (`statements: 74, branches: 64, functions: 83, lines: 75`) were a one-time baseline with no signal when actual coverage drifted above them — a 15pp regression could still pass the gate without the floor having drifted noticeably from reality. Daily review (#87 finding 42ede76.3) flagged the gap-grows-silently risk. Owner steered: rather than the bot's "advisory `::warning::` annotation", mirror the existing `size-limit-action` sticky-PR-comment pattern. A reviewable Markdown table on the PR is much higher signal than a buried CI annotation, and reuses infra reviewers already read. Adds: - `scripts/coverageThresholds.mjs` (+ `.d.mts` shim) — single source of truth for the floors and `DRIFT_WARN_PP = 5` envelope. Imported by `vite.config.ts` for the actual gate, by the comment script for rendering. The shim gives TS / ESLint type-checked references in `vite.config.ts` (the imported value otherwise resolves to `any`, tripping `@typescript-eslint/no-unsafe-assignment`). - `scripts/coverage-pr-comment.mjs` — reads `coverage-summary.json` for the PR (and optionally for base), renders a Markdown table with actual %, delta vs base, floor, and a status column (`✅` / `❌ below floor` / `⚠️ Npp above floor — consider re-baselining`). `--render-only` prints to stdout (used by `npm run coverage:report`); `--post` upserts a sticky comment via the GitHub REST API, identified by hidden marker `<!-- coverage-pr-comment -->`. - `npm run coverage:report` for local preview. CI wiring (`.github/workflows/ci.yml`): - `test-core` (ubuntu cell) uploads `coverage-summary.json` as artifact `coverage-summary` with `if: always()` — vitest still writes the summary on threshold failure, and that is exactly when the comment needs to surface which metric dropped. - New `coverage-pr-comment` job downloads the PR run's artifact, then the latest successful develop run's same artifact via `gh run download`. Permissions: `pull-requests: write` (post), `actions: read` (cross-run download). Not added to `ci-gate` needs — purely informational, never a blocker. - Adds `json-summary` to vitest's `coverage.reporter` list. Other: - `eslint.config.js`: `fetch` global for `scripts/**/*.mjs` (Node 22's built-in fetch); `scripts/*.d.mts` allowed in `parserOptions.projectService.allowDefaultProject` so the type shim parses. Refs #87 finding:42ede76.3 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
@codex review |
Coverage report
Base-branch coverage summary was not available, so deltas are omitted. Once Floors live in |
size-limit report 📦
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ae3919f579
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| BASE_RUN_ID="$(gh run list \ | ||
| --repo "${GITHUB_REPOSITORY}" \ | ||
| --branch "${{ github.event.pull_request.base.ref }}" \ | ||
| --workflow ci.yml \ | ||
| --status success \ | ||
| --limit 1 \ | ||
| --json databaseId \ | ||
| --jq '.[0].databaseId // empty')" |
There was a problem hiding this comment.
Skip docs-only base runs when selecting coverage baseline
This query selects only the latest successful base-branch run, but successful docs-only runs skip test-core (because that job is gated on needs.changes.outputs.code == 'true') and therefore have no coverage-summary artifact. In that case the next step just omits deltas, even when earlier successful runs do contain coverage data, so PRs can lose baseline comparison after docs-only merges until another code-changing run lands. Consider iterating over recent successful runs until one actually contains coverage-summary.
Useful? React with 👍 / 👎.
|
|
||
| async function upsertStickyComment({ repo, pr, body }) { | ||
| const existing = await ghJson(`/repos/${repo}/issues/${pr}/comments?per_page=100`); | ||
| const prior = existing.find((c) => typeof c.body === 'string' && c.body.includes(COMMENT_MARKER)); |
There was a problem hiding this comment.
Match sticky marker only on bot-authored comments
The upsert logic patches the first comment whose body contains <!-- coverage-pr-comment --> without verifying who authored it. Since issue comments are returned oldest-first, any user comment that includes that marker (accidentally or intentionally) can be selected and overwritten instead of the bot’s prior sticky comment, breaking the expected single-comment behavior. Filter matches by both marker and expected bot identity before issuing PATCH.
Useful? React with 👍 / 👎.
Two P2 follow-ups from Codex on PR #124: 1. The previous base-coverage selector used `gh run list --status success --limit 1` to pick the latest successful develop run. But docs-only runs succeed without producing the `coverage-summary` artifact (the `test-core` job is gated on `needs.changes.outputs.code == 'true'` and skips for docs-only diffs). After a docs merge, every subsequent PR would lose deltas until another code-changing run landed on develop. Switch to querying the artifacts API directly by name + base branch + non-expired. The result is by construction the most recent base-branch run that ACTUALLY has coverage data, regardless of how many docs-only runs landed in between. 2. The sticky-comment upsert previously matched the marker text alone, so any user comment that quoted the marker (e.g. while discussing the script's behavior in review) could be patched in place by the next coverage update — clobbering their reply. Filter matches additionally by `c.user?.type === 'Bot'` so only the bot-authored sticky comment is upserted; user replies that happen to mention the marker are left alone. Refs #87 finding:42ede76.3 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 205a2b2ecc
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| set -euo pipefail | ||
| mkdir -p base-coverage | ||
| BASE_RUN_ID="$(gh api \ | ||
| "repos/${GITHUB_REPOSITORY}/actions/artifacts?name=coverage-summary&per_page=50" \ |
There was a problem hiding this comment.
Paginate artifact lookup before choosing base coverage run
In the coverage-pr-comment job, the base lookup calls the artifacts API with per_page=50 and then filters client-side by workflow_run.head_branch == BASE_REF. If the base branch artifact is older than the newest 50 coverage-summary artifacts (for example after many PR pushes between base-branch merges), BASE_RUN_ID is empty even though a valid base artifact still exists, so the PR comment silently drops deltas. This should page through results (or switch to a branch-scoped run query) before concluding no baseline is available.
Useful? React with 👍 / 👎.
| } | ||
|
|
||
| async function upsertStickyComment({ repo, pr, body }) { | ||
| const existing = await ghJson(`/repos/${repo}/issues/${pr}/comments?per_page=100`); |
There was a problem hiding this comment.
Traverse comment pages before upserting sticky coverage comment
The upsert logic searches only the first page of issue comments (per_page=100) for the marker. On PRs that already have more than 100 comments when this workflow first posts, the existing sticky comment can be on a later page, so subsequent runs fail to find it and keep creating new comments instead of updating in place. Iterating through pages (or querying newest-first and continuing until found) is needed to preserve single-comment behavior.
Useful? React with 👍 / 👎.
Two more P2 follow-ups from Codex on PR #124: 1. The artifacts API call used `per_page=50` with no pagination, so a base-branch artifact older than the newest 50 global coverage-summary artifacts could fall off the page entirely even though it still exists, silently dropping deltas. Switch to a branch-scoped runs query (`/actions/runs?branch=…`) and probe per-run artifacts, capped at 100 base-branch runs (5 pages × 20). Also picks the right run more reliably than sorting global artifacts client-side. 2. The sticky-comment upsert read only the first page of issue comments (`per_page=100`). On long-running PRs with > 100 comments, the bot's prior sticky could be on a later page, so subsequent runs would `POST` a fresh comment instead of patching in place — leading to a wall of duplicate coverage tables. Page through comments (capped at 50 pages = 5000 comments — far beyond any realistic PR) before deciding to create. Refs #87 finding:42ede76.3 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ddcafc2766
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| "repos/${GITHUB_REPOSITORY}/actions/runs?branch=${BASE_REF}&per_page=20&page=${PAGE}" \ | ||
| --jq '.workflow_runs[].id')" | ||
| if [ -z "${RUNS}" ]; then break; fi | ||
| for RUN_ID in ${RUNS}; do | ||
| HAS_ART="$(gh api \ | ||
| "repos/${GITHUB_REPOSITORY}/actions/runs/${RUN_ID}/artifacts" \ | ||
| --jq '[.artifacts[] | select(.name == "coverage-summary" and .expired == false)] | length')" |
There was a problem hiding this comment.
Restrict base artifact search to successful CI runs
Fresh evidence: the new paging loop queries actions/runs?branch=... and accepts the first run with a coverage-summary artifact without checking run conclusion. Because this workflow uses cancel-in-progress: true, canceled or failed base-branch runs can still leave artifacts, so PR coverage deltas may be computed against an abandoned/invalid baseline instead of the last green base state.
Useful? React with 👍 / 👎.
| if (!res.ok) { | ||
| const body = await res.text(); | ||
| throw new Error(`GitHub API ${init.method ?? 'GET'} ${path} → ${res.status}: ${body}`); | ||
| } |
There was a problem hiding this comment.
Keep coverage comment posting best-effort on API errors
The script is documented as informational and "Always exits 0," but ghJson throws on any non-2xx response and the caller does not catch it. In fork PRs (read-only token) or transient GitHub API failures, this makes coverage-pr-comment fail hard instead of degrading gracefully, which produces noisy red checks for a non-gating report step.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ddcafc2766
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| "repos/${GITHUB_REPOSITORY}/actions/runs?branch=${BASE_REF}&per_page=20&page=${PAGE}" \ | ||
| --jq '.workflow_runs[].id')" |
There was a problem hiding this comment.
Filter base-run lookup to successful runs
The base coverage selector currently queries actions/runs by branch only, so it can choose a failed/cancelled run that still uploaded coverage-summary and then compute PR deltas against that non-baseline data. This makes the sticky comment inaccurate in the exact case the description says should use the latest successful base run; e.g., a flaky/failed develop run can skew or invert the reported delta. Add a success filter (and ideally keep the same workflow scope) before picking BASE_RUN_ID.
Useful? React with 👍 / 👎.
P2 follow-up from Codex on PR #124: the previous iteration walked recent base-branch runs of any conclusion. The artifact upload runs `if: always()` so a failing develop run can ship its summary — without a success filter we could pick that as the baseline and report a misleading delta against not-actually-baseline data. Switch the runs query from `/actions/runs?branch=…` to the workflow-scoped `/actions/workflows/ci.yml/runs?branch=…&status=success`. This: - Filters to fully-green develop runs only — the baseline a reviewer expects when comparing their PR. - Constrains by workflow file so unrelated scheduled or release workflows on the same branch can't match. - Still iterates per-run artifacts to walk past docs-only runs that succeeded without uploading the summary. Refs #87 finding:42ede76.3 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8b77361120
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| BASE_RUN_ID="" | ||
| for PAGE in 1 2 3 4 5; do | ||
| RUNS="$(gh api \ | ||
| "repos/${GITHUB_REPOSITORY}/actions/workflows/ci.yml/runs?branch=${BASE_REF}&status=success&per_page=20&page=${PAGE}" \ |
There was a problem hiding this comment.
Limit baseline run search to push-triggered CI runs
The base-coverage lookup currently queries actions/workflows/ci.yml/runs with only branch and status=success, which means it can pick successful runs from other events (not just branch pushes). Because this endpoint includes pull_request runs unless event is constrained, a newer PR run whose head branch matches BASE_REF can be selected and used as baseline even though its artifact comes from a PR merge ref rather than the base branch tip. That can skew or invert the reported PR coverage delta; add an event=push filter (or equivalent) before choosing BASE_RUN_ID.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8b77361120
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| RUNS="$(gh api \ | ||
| "repos/${GITHUB_REPOSITORY}/actions/workflows/ci.yml/runs?branch=${BASE_REF}&status=success&per_page=20&page=${PAGE}" \ | ||
| --jq '.workflow_runs[].id')" |
There was a problem hiding this comment.
Keep base-run discovery best-effort on API failures
In the coverage-pr-comment workflow step, this gh api call runs under set -euo pipefail and is not guarded, so any non-zero response (for example transient GitHub API errors or restricted token scopes in some PR contexts) aborts the step before the comment is posted. That breaks the intended "best-effort" behavior described in the job comments, where missing base data should only omit deltas rather than fail the whole comment job.
Useful? React with 👍 / 👎.
| if (!res.ok) { | ||
| const body = await res.text(); | ||
| throw new Error(`GitHub API ${init.method ?? 'GET'} ${path} → ${res.status}: ${body}`); |
There was a problem hiding this comment.
Handle GitHub API write failures without exiting non-zero
This throw is uncaught in --post mode, so any non-2xx API response (notably 403 when PR-comment writes are unavailable, or transient 5xx) crashes the script and fails the job, even though the file-level contract says this helper is reviewer signal and should always exit 0. Failing hard here makes the informational coverage-comment check noisy and brittle instead of degrading gracefully.
Useful? React with 👍 / 👎.
P2 follow-up from Codex on PR #124: the file-level docstring promises "Always exits 0 — this is advisory, not a gate", but the top-level `upsertStickyComment` call could throw on any non-2xx GitHub API response (transient 5xx, rate limit, 403 from a fork PR without `pull-requests: write`). That throw bubbled up unhandled and crashed the script with a non-zero exit, failing the informational-only CI job. Wrap the upsert in try/catch. On failure, log the error and the rendered comment body to stdout (preserving reviewer signal in the job log) and exit 0. The advisory-not-gate contract is intact again. Refs #87 finding:42ede76.3 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
@codex review |
Two more P2 follow-ups from Codex on PR #124: 1. The base-run query filtered on `branch=$BASE_REF&status=success` but not by event. The runs endpoint includes `pull_request`- triggered runs whose `head_branch` happens to equal ${BASE_REF}; their artifact reflects the PR's code, not the base tip, and could skew the reported delta. Add `event=push` so only direct merges to the base branch (the actual baseline) match. 2. Under `set -euo pipefail` an unguarded `gh api` failure aborts the whole step before the deltas-omitted fallback can run, breaking the documented best-effort contract for transient errors / restricted-token PR contexts. Wrap each `gh api` invocation with `|| RUNS=""` / `|| HAS_ART="0"` so transient failures degrade to "no baseline found" instead of crashing the comment job. Refs #87 finding:42ede76.3 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
@codex review |
CI's `Format (Prettier)` check failed on PR #124 because the local pre-commit hook was running prettier only on `*.{ts,tsx}` (plus the existing `json/md/yml/yaml` entry), so the new `scripts/coverage-pr-comment.mjs` skipped formatting on each commit and drifted from the prettier-canonical layout (find callback formatted multi-line vs prettier's single-line preference). - Run prettier over `coverage-pr-comment.mjs` to bring it back in line. - Add `mjs` + `d.mts` to the lint-staged file glob so future edits to ESM scripts go through eslint --fix + prettier --write at commit time, the same as the rest of the TypeScript surface. Refs #87 finding:42ede76.3 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
@codex review |
|
Codex Review: Didn't find any major issues. Keep them coming! ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
## Summary
Doc-audit pass over `docs/plans` + `docs/specs`. Three things land
together:
- **`docs/archive/{plans,specs}/`** — new home for plans whose roadmap
rows have all shipped (or whose goals were folded into a successor)
and specs whose design is now reflected in code. Includes a
`README.md` explaining the policy; `CLAUDE.md` documents the
convention.
- **`git mv` 23 plans + 3 specs into the archive.** The active live
set is now the comprehensive polish-and-harden plan plus three
specs (post-tfjs improvements, mvp-demo, vision), each with a
refreshed status banner.
- **Refresh the live comprehensive plan** against current `develop`:
- PR column updated for rows 16/19/20/3/4/22 (now shipped via
PRs #91 / #98 / #104 / #110 / #113 / #111).
- New "Post-roadmap follow-ups" section covers PRs #92 → #125
(review-bot infra, tracker findings, demo + tfjs hotfixes,
tooling).
- Stale prose-baked counts dropped (size budgets now reference
`package.json#size-limit` only).
- Coverage-thresholds section gains a pointer to the sticky PR
comment shipped in PR #124.
## Other doc fixes
- `README.md`: drop the unverifiable "Phase A milestones (M0–M15) are
all green" claim — the milestones don't exist as documented IDs
anywhere; replace with a pointer to the live polish plan.
- `vision.md`: refresh cadence note (was pinned to 2026-04-19 + "next
review at 1.0").
- `2026-04-24-post-tfjs-improvements.md`: mark recommended-order items
that have shipped (PRs #61, #76, #77, #83, #84, #91, #94, #96,
#104, #113), link the active roadmap as the heir.
- `mvp-demo.md`: status banner explaining where active polish work is
now tracked.
## Mechanical
- Update inline cross-refs in `CLAUDE.md`, `eslint.config.js`,
`src/agent/{Agent,AgentModule}.ts`, `tests/unit/exports.test.ts`,
and `docs/daily-reviews/2026-04-25.md` to point at the new
`docs/archive/` paths so links keep resolving.
No code change beyond comment-path updates.
## Test plan
- [x] `npm run verify` green (`format:check` + `lint` + `typecheck` +
`test` + `build` + `docs`). 523 tests pass; the 2 lint warnings
are pre-existing (`CognitionPipeline.invokeSkillAction` complexity
+ `scoreFailure` param count) and on the ratchet menu.
- [x] `git ls-files docs/archive/` shows the moved files; renames are
preserved (`git log --follow` works for any moved file).
- [ ] Codex review: clean, no blockers.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Luis Mendez <hallo@luis-mendez.de>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Coverage floors (
statements: 74, branches: 64, functions: 83, lines: 75) invite.config.tswere a one-time baseline. Dailyreview (#87 finding
42ede76.3) flagged the gap-grows-silently risk:when actual coverage drifts above a floor, no CI signal tells anyone
to re-baseline — so a 15pp regression could land without the floor
ever having moved meaningfully from reality.
Rather than the bot's suggested "advisory
::warning::annotation"(buried in CI logs), this PR mirrors the existing
size-limit-actionsticky-PR-comment pattern. One Markdown table per PR, edited in place
across pushes — high-signal, reviewable inline, reuses infra
reviewers already read.
What changed
Single source of truth. New
scripts/coverageThresholds.mjs(+.d.mtsshim) exportsCOVERAGE_THRESHOLDSandDRIFT_WARN_PP.vite.config.tsimports it for the gate; the comment script importsit for floor labels and drift warnings. Re-baselining now happens in
one place, with the baseline measurement + commit SHA cited in the
docstring.
Sticky PR comment. New
scripts/coverage-pr-comment.mjs:coverage/coverage-summary.jsonfor PR head, optionally forbase
(
✅/❌ below floor/⚠️ Npp above floor — consider re-baselining)--render-onlymode prints to stdout (npm run coverage:reportforlocal preview)
--postupserts a sticky comment via the GitHub REST API,identified by hidden marker
<!-- coverage-pr-comment -->CI wiring (
.github/workflows/ci.yml):test-core(ubuntu cell) uploadscoverage-summary.jsonasartifact
coverage-summarywithif: always()— vitest stillwrites the summary on threshold failure, and that is exactly when
reviewers need the comment to surface which metric dropped.
coverage-pr-commentjob: downloads PR run's artifact, thenthe latest successful develop-run's same artifact via
gh run download. Permissionspull-requests: write(post) +actions: read(cross-run download). NOT added toci-gateneeds— purely informational.
json-summaryto vitest'scoverage.reporterlist so thesummary file gets emitted.
eslint.config.js:
fetchglobal forscripts/**/*.mjs;scripts/*.d.mtsallowed inparserOptions.projectService.allowDefaultProject.Followup (out of scope)
Owner asked about complexity-metric tracking too — separate roadmap
row, separate PR. Tooling choice (file-level
code-complexity,function-level
eslint-plugin-sonarjs, dep-graphmadge) and signalcharacteristics (slow-moving, noisy per-PR) are different enough that
mixing them muddies both surfaces.
Refs #87 finding:42ede76.3
Test plan
npm run verify— 523 tests pass, lint + typecheck + build greennpm run coverage:reportrenders the same Markdown table thisPR's CI run will post (without the GitHub API call)
develop has not yet produced the artifact; expected and documented
in the comment body. Subsequent develop merges + re-pushes will
populate the deltas
@codex review