ci(fidelity): ratchet-down baseline + wire into lint.yml#72
patrick-chinchill merged 10 commits into main
Conversation
📝 Walkthrough

Adds baseline-driven test-fidelity verification: CI now clones a pinned vercel/chat checkout and runs the fidelity script against it.

Changes
Sequence Diagram(s)

sequenceDiagram
participant GH as GitHub Actions (lint)
participant Repo as Pinned vercel/chat
participant Script as verify_test_fidelity.py
participant Baseline as scripts/fidelity_baseline.json
participant Summary as GitHub Step Summary
GH->>Repo: git clone (pinned tag) into /tmp/vercel-chat
GH->>Script: invoke with TS_ROOT=/tmp/vercel-chat --strict
Script->>Baseline: load baseline (if present)
Script->>Repo: scan TS test files / it() entries
Script->>Script: map TS it() → Python test_*()
alt upstream TS file missing
Script->>GH: exit non-zero (missing upstream file)
else compare results
alt missing entries not in baseline (NEW)
Script->>GH: mark NEW missing (will fail under --strict)
else missing entries covered by baseline
Script->>GH: mark baselined (not failing)
end
end
Script->>Baseline: optionally update baseline if --update-baseline
Script->>Summary: emit "Test fidelity" row with outcomes
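The "map TS it() → Python test_*()" step in the diagram above can be illustrated with a minimal sketch. The function below is hypothetical — the real script's matching (including its fuzzy_match helper) is more involved — but it shows one plausible mapping rule:

```python
import re

def ts_name_to_python(ts_title: str) -> str:
    """Map a TS it() title like "streams partial output" to a Python
    test name like test_streams_partial_output (illustrative rule only)."""
    slug = re.sub(r"[^a-z0-9]+", "_", ts_title.lower()).strip("_")
    return f"test_{slug}"
```

A mapping like this lets the script compare the set of upstream it() entries against the Python test functions found in the mapped counterpart file.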
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related issues
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Code Review
This pull request implements a baseline-enforced ratchet mechanism for test fidelity checks, enabling CI to pass with known missing tests while preventing new regressions. It introduces scripts/fidelity_baseline.json to track missing tests and adds --strict and --update-baseline flags to the verification script. Review feedback recommends centralizing the hardcoded version string to improve maintainability and optimizing the reporting loop by removing redundant set conversions.
```python
        "it. To remove entries: port the TS test to its Python counterpart, "
        "then regenerate this file with --update-baseline."
    ),
    "ts_parity": "chat@4.26.0",
```
The upstream version 4.26.0 is hardcoded here. This value is currently duplicated in at least five places: src/chat_sdk/__init__.py (as UPSTREAM_PARITY), CLAUDE.md, docs/UPSTREAM_SYNC.md, this script, and the generated baseline JSON. To improve maintainability and prevent these from getting out of sync, consider extracting this version dynamically from the source of truth in src/chat_sdk/__init__.py using a regex or by importing it.
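One way to implement the suggested single-source-of-truth lookup is a regex read of the module source. The `UPSTREAM_PARITY` name comes from the review above; the exact assignment format in `src/chat_sdk/__init__.py` is an assumption, so treat this as a sketch:

```python
import re
from pathlib import Path

def read_upstream_parity(init_py: Path) -> str:
    # Pull UPSTREAM_PARITY out of the module source with a regex instead of
    # importing it, so the script works even without the package installed.
    text = init_py.read_text(encoding="utf-8")
    match = re.search(r'^UPSTREAM_PARITY\s*=\s*["\']([^"\']+)["\']', text, re.MULTILINE)
    if match is None:
        raise RuntimeError(f"UPSTREAM_PARITY not found in {init_py}")
    return match.group(1)
```

The script, the baseline writer, and the docs generator could all call this instead of repeating the literal tag.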
```python
if missing:
    for describe, ts_name, _py_name in missing[:5]:
        print(f"  MISSING: [{describe}] {ts_name}")
        marker = "NEW" if (describe, ts_name) in set(file_new) else "baselined"
```
Creating a new set from file_new inside the loop is inefficient as it happens for every missing test. Since baseline_keys is already a set containing the baselined keys for the current file, you can check for membership in it directly (negated) to determine if a miss is 'NEW'.
```diff
-        marker = "NEW" if (describe, ts_name) in set(file_new) else "baselined"
+        marker = "NEW" if (describe, ts_name) not in baseline_keys else "baselined"
```
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d0a545f922
```diff
 run: |
   failures=0
-  for var in RUFF_CHECK RUFF_FORMAT AUDIT PYREFLY; do
+  for var in RUFF_CHECK RUFF_FORMAT AUDIT FIDELITY PYREFLY; do
```
Include upstream clone outcome in failure gate
Add the clone step to the aggregate failure loop; otherwise a transient git clone failure (network outage, rate limit, tag fetch issue) does not fail this job. In that case TS_ROOT is missing, verify_test_fidelity.py skips all TS files and exits 0, so the new fidelity protection is silently bypassed while the workflow still reports success.
🧹 Nitpick comments (2)
src/chat_sdk/adapters/teams/format_converter.py (1)
191-200: Consider delegating full table rendering to `render_gfm_table`.

`_table_to_gfm` still duplicates table line assembly logic. Reusing `render_gfm_table` would keep output behavior centralized and reduce drift risk.

♻️ Proposed refactor
```diff
-from chat_sdk.shared.card_utils import escape_table_cell
+from chat_sdk.shared.card_utils import render_gfm_table
 ...
-    lines: list[str] = []
-    # Header row
-    lines.append(f"| {' | '.join(escape_table_cell(c) for c in rows[0])} |")
-    # Separator
-    separators = ["---"] * len(rows[0])
-    lines.append(f"| {' | '.join(separators)} |")
-    # Data rows
-    for row in rows[1:]:
-        lines.append(f"| {' | '.join(escape_table_cell(c) for c in row)} |")
-    return "\n".join(lines)
+    headers = rows[0]
+    data_rows = rows[1:]
+    return "\n".join(render_gfm_table(headers, data_rows))
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/chat_sdk/adapters/teams/format_converter.py` around lines 191 - 200: The function _table_to_gfm duplicates table assembly logic; replace its body to delegate to the central render_gfm_table renderer instead of rebuilding lines manually: locate _table_to_gfm and have it call and return render_gfm_table(rows) (or adapt rows to the expected input of render_gfm_table), removing the duplicated header/separator/data assembly so all table formatting is centralized in render_gfm_table and escape_table_cell behavior remains consistent.

tests/test_cards.py (1)

165-210: Add one Teams converter integration assertion for table escaping. Current additions validate shared helpers, but not the adapter wiring path. A small `TeamsFormatConverter._table_to_gfm`-level test would protect against future regressions where the converter stops using shared escaping.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_cards.py` around lines 165 - 210, Add a test that exercises TeamsFormatConverter._table_to_gfm to assert it uses the shared escaping/rendering helpers: call TeamsFormatConverter._table_to_gfm (or the public method that invokes it) with headers and cells containing pipes, backslashes, and newlines and assert the output contains the escaped pipe (r"\\|") and collapsed newlines as produced by escape_table_cell/render_gfm_table; this ensures the adapter wiring still delegates to shared.card_utils functions (reference TeamsFormatConverter._table_to_gfm, escape_table_cell, render_gfm_table).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@src/chat_sdk/adapters/teams/format_converter.py`:
- Around line 191-200: The function _table_to_gfm duplicates table assembly
logic; replace its body to delegate to the central render_gfm_table renderer
instead of rebuilding lines manually: locate _table_to_gfm and have it call and
return render_gfm_table(rows) (or adapt rows to the expected input of
render_gfm_table), removing the duplicated header/separator/data assembly so all
table formatting is centralized in render_gfm_table and escape_table_cell
behavior remains consistent.
In `@tests/test_cards.py`:
- Around line 165-210: Add a test that exercises
TeamsFormatConverter._table_to_gfm to assert it uses the shared
escaping/rendering helpers: call TeamsFormatConverter._table_to_gfm (or the
public method that invokes it) with headers and cells containing pipes,
backslashes, and newlines and assert the output contains the escaped pipe
(r"\\|") and collapsed newlines as produced by
escape_table_cell/render_gfm_table; this ensures the adapter wiring still
delegates to shared.card_utils functions (reference
TeamsFormatConverter._table_to_gfm, escape_table_cell, render_gfm_table).
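The escaping behavior these prompts rely on can be illustrated with a minimal stand-in. This is a hypothetical re-implementation for demonstration only — the real `escape_table_cell` lives in `shared/card_utils.py` and may differ in detail:

```python
import re

def escape_table_cell(text: str) -> str:
    # Collapse newlines (and surrounding whitespace) to single spaces, then
    # escape pipes so cell content cannot break the GFM table structure.
    collapsed = re.sub(r"\s*\n\s*", " ", text)
    return collapsed.replace("|", r"\|")
```

Under these assumptions, `"pipe|inside"` renders as `pipe\|inside` and `"has\nnewline"` as `has newline`, which is exactly what the suggested exact-line test assertions check.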
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 4cd4131b-d855-4cc0-bbd1-842eeaf6651e
📒 Files selected for processing (3)
- pyproject.toml
- src/chat_sdk/adapters/teams/format_converter.py
- tests/test_cards.py
✅ Files skipped from review due to trivial changes (1)
- pyproject.toml
Review verdict: needs changes. Refreshed and reviewed latest head c26cd97 against main. The baseline passes with a fresh upstream clone pinned to chat@4.26.0, but fidelity enforcement can silently pass when the upstream TS root is missing: .github/workflows/lint.yml lines 60-72 make the clone continue-on-error, and scripts/verify_test_fidelity.py lines 280-284 skips every mapped TS file and exits 0. Reproduced with TS_ROOT=/tmp/definitely-missing-vercel-chat python3 scripts/verify_test_fidelity.py, which reported all TS files skipped and exited 0. Make the clone required or make the script fail when mapped TS files are unavailable. |
Valid — reproduced locally.
Verify-test-fidelity ran out-of-band with no CI invocation, so missing upstream-TS-test translations drifted undetected (16 gaps at 4.26.0, 40 against upstream main). Now: - Default mode is baseline-enforced via `scripts/fidelity_baseline.json`. CI fails on new misses outside the baseline; accepts baselined misses as-is. Fixed misses are reported with a nudge to tighten the baseline. - `--strict` ignores the baseline (eventual target once baseline hits 0). - `--update-baseline` regenerates the baseline file. - Initial baseline captures the 16 `[post with Plan]` gaps tracked in #55, pinned to `chat@4.26.0`. - lint.yml clones `vercel/chat@4.26.0` and runs the check; aggregates into the existing "fail if any step failed" gate. CLAUDE.md + docs/UPSTREAM_SYNC.md document the workflow. https://claude.ai/code/session_01WhrgpELQJJSakBnwSNuwGJ
PyPI search has no signal today — keywords field was absent, classifiers covered only Chat + Typed. Added 12 keywords covering high-volume search shapes (generic, per-platform, asyncio, vercel) and 4 classifiers (Communications parent, Internet, Libraries :: Application Frameworks, Libraries :: Python Modules) matching patterns used by slack-sdk, python-telegram-bot, discord.py, errbot. No runtime change; hatchling build output is unaffected. https://claude.ai/code/session_01WhrgpELQJJSakBnwSNuwGJ
teams/format_converter.py had a byte-identical private copy of shared/card_utils.escape_table_cell. Removed the copy, imported the shared function, renamed the 2 call sites. Also added direct unit tests for shared.card_utils.escape_table_cell + render_gfm_table — both were previously covered only transitively through adapter tests. First sub-PR of #70 (shared-helper consolidation). Scope check: most of #70's five planned helpers already live in shared/card_utils.py — only escape_table_cell (this PR) and button-style mapping (Teams + Discord, remaining work) are still duplicated per-adapter. https://claude.ai/code/session_01WhrgpELQJJSakBnwSNuwGJ
Previously `verify_test_fidelity.py` printed "SKIPPED (file not found)" for any mapped TS test whose source didn't exist under `TS_ROOT`, then summed 0 matches + 0 missing and exited 0 with "All TS tests have Python equivalents." Combined with `continue-on-error: true` on the upstream-clone step, that made a silently-failing clone report "fidelity check passed" in CI. Now the script tracks missing-TS-file hits separately from real successes. If any mapped TS file is absent at end-of-run, the script prints a clear "upstream checkout missing — cannot verify fidelity" message naming every missing path, includes the clone command hint, and exits 1. This fires before strict/baseline/update-baseline success branches so no mode can accidentally mask it. Reproducer: TS_ROOT=/tmp/definitely-missing uv run python scripts/verify_test_fidelity.py Before: exit 0. After: exit 1 with infra-level error message. Closes self-review gap #1 on #72.
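The guardrail described above can be sketched roughly as follows. The function name `check_upstream_present` and its signature are illustrative, not the script's actual internals:

```python
import sys
from pathlib import Path

def check_upstream_present(ts_root: Path, mapped_files: list[str]) -> None:
    # Fail loudly if any mapped TS test file is absent, instead of letting
    # a missing checkout degrade into a silent 0/0 "all tests ported" pass.
    missing = [ts_root / rel for rel in mapped_files if not (ts_root / rel).is_file()]
    if missing:
        print("upstream checkout missing — cannot verify fidelity:")
        for path in missing:
            print(f"  {path}")
        sys.exit(1)
```

Running this before any strict/baseline/update-baseline success branch ensures no mode can mask a failed clone.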
…y baseline

Two infra-level changes to the fidelity check:

1. `.github/workflows/lint.yml` — the `Clone upstream vercel/chat at pinned parity tag` step no longer carries `continue-on-error: true`. The clone is infrastructure the fidelity check depends on; if it fails, the job should fail there rather than swallow the failure and hope a later step catches it. Combined with the script-level guard from the previous commit, this is defense in depth.

2. CI now runs `scripts/verify_test_fidelity.py --strict`. Every `[post with Plan]` test that was baselined is now ported (PR #75 and PR #74 in the 0.4.26.2 bundle), so the repo ships at 0 missing. `scripts/fidelity_baseline.json` is reduced to `{"missing": {}}` (metadata retained so `--update-baseline` and the documented workflow still function for future upstream syncs).

Closes self-review gap #2 on #72.
- CLAUDE.md fidelity section: document that CI runs `--strict` and that the script now fails loudly on a missing upstream checkout. Baseline mode is retained but described as the local-dev fallback rather than the CI default.
- docs/UPSTREAM_SYNC.md: same update, plus clarify the two infra guardrails (no `continue-on-error` on the clone, script-level missing-TS-file check) and rewrite the workflow table for strict mode as the default.
- CHANGELOG.md: add a `### CI / Internals` subsection under the unreleased 0.4.26.2 entry describing the fidelity-check wiring, the two infra guardrails, and the empty baseline.

Closes #53, #72.
Force-pushed c26cd97 to 404bf3e.
Actionable comments posted: 4
🧹 Nitpick comments (3)
src/chat_sdk/adapters/teams/format_converter.py (1)
191-200: Prefer shared table renderer to avoid duplicate formatting logic. Lines 191-200 still reimplement behavior already provided by `render_gfm_table` in `src/chat_sdk/shared/card_utils.py`.

♻️ Proposed refactor
```diff
-from chat_sdk.shared.card_utils import escape_table_cell
+from chat_sdk.shared.card_utils import render_gfm_table
@@
-    lines: list[str] = []
-    # Header row
-    lines.append(f"| {' | '.join(escape_table_cell(c) for c in rows[0])} |")
-    # Separator
-    separators = ["---"] * len(rows[0])
-    lines.append(f"| {' | '.join(separators)} |")
-    # Data rows
-    for row in rows[1:]:
-        lines.append(f"| {' | '.join(escape_table_cell(c) for c in row)} |")
-    return "\n".join(lines)
+    return "\n".join(render_gfm_table(rows[0], rows[1:]))
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/chat_sdk/adapters/teams/format_converter.py` around lines 191 - 200: This block duplicates table rendering logic; replace the manual construction using escape_table_cell and join with a call to the shared helper render_gfm_table from src/chat_sdk/shared/card_utils.py: import render_gfm_table at the top of src/chat_sdk/adapters/teams/format_converter.py and call render_gfm_table(rows) (or pass the same `rows` variable) instead of building `lines` and joining them; remove the duplicated separators/code around the for-loop and ensure any existing escape_table_cell usage is not redundantly applied before calling render_gfm_table.

tests/test_cards.py (1)
202-206: Consider exact-line assertions for stronger regression detection. Lines 204-205 currently use substring checks; exact row checks would also catch delimiter/spacing regressions.
🔍 Suggested test tightening

```diff
 def test_cells_are_escaped(self):
     lines = render_gfm_table(["col"], [["pipe|inside"], ["has\nnewline"]])
-    assert r"pipe\|inside" in lines[2]
-    assert "has newline" in lines[3]
+    assert lines[2] == r"| pipe\|inside |"
+    assert lines[3] == "| has newline |"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_cards.py` around lines 202 - 206: Update test_cells_are_escaped to assert exact rendered table rows rather than substrings: locate the test_cells_are_escaped function and change the two substring assertions against lines[2] and lines[3] to full-line equality checks that match the exact GFM table row formatting produced by render_gfm_table (including pipe delimiters, spacing, and the escaped pipe backslash for "pipe|inside" and the newline-normalized form for "has\nnewline"). Ensure the expected strings reflect the exact characters render_gfm_table outputs (e.g., escaped pipes and single-space cell separation) so the test will fail on delimiter/spacing regressions.

.github/workflows/lint.yml (1)
81-92: Surface `clone_upstream` failures explicitly in the summary. If the clone step fails, `fidelity` is skipped and the summary currently points at the downstream symptom instead of the root cause. Adding `steps.clone_upstream.outcome` to the table makes the new infra guardrail much easier to triage.

Suggested tweak

```diff
 echo "| Ruff check | ${{ steps.ruff_check.outcome }} |" >> $GITHUB_STEP_SUMMARY
 echo "| Ruff format | ${{ steps.ruff_format.outcome }} |" >> $GITHUB_STEP_SUMMARY
 echo "| Test audit | ${{ steps.audit.outcome }} |" >> $GITHUB_STEP_SUMMARY
+echo "| Clone upstream | ${{ steps.clone_upstream.outcome }} |" >> $GITHUB_STEP_SUMMARY
 echo "| Test fidelity | ${{ steps.fidelity.outcome }} |" >> $GITHUB_STEP_SUMMARY
 echo "| Pyrefly | ${{ steps.pyrefly.outcome }} |" >> $GITHUB_STEP_SUMMARY
```

Also applies to: 100-117
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/lint.yml around lines 81 - 92, Add an explicit row for the clone step outcome to the GitHub Actions summary so upstream clone failures are visible: in the "Report results" step (the block that writes to $GITHUB_STEP_SUMMARY and currently echoes rows for ruff_check, ruff_format, audit, fidelity, pyrefly) add an entry that echoes the value of steps.clone_upstream.outcome (e.g., echo "| Clone upstream | ${{ steps.clone_upstream.outcome }} |") and make the same change in the duplicate summary block later in the file (the second report block covering lines ~100-117) so both summary places show clone_upstream outcome for easier triage.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@CLAUDE.md`:
- Around line 107-114: Add an explicit local upstream checkout step to the
validation instructions so users know how to create /tmp/vercel-chat before
running scripts; update the CLAUDE.md validation command block that references
verify_test_fidelity.py to include a note and the checkout command (e.g., clone
or checkout the upstream repo into /tmp/vercel-chat), and mention that
verify_test_fidelity.py will fail if the mapped TS files under /tmp/vercel-chat
are missing so readers run the checkout first.
In `@scripts/verify_test_fidelity.py`:
- Around line 255-260: In main(), add an early mutual-exclusion check that
rejects using --strict together with --update-baseline: if both strict_mode and
update_baseline are true, print a clear error/banner and exit with non-zero
status; ensure this guard appears before baseline is loaded or possibly
rewritten (around where baseline is set) and similarly add the same
mutually-exclusive check before any later branches that handle update_baseline
(references: main, strict_mode, update_baseline, baseline) so the script never
proceeds to rewrite the baseline when strict mode was requested (also apply the
same guard to the other locations noted near the second and third branches).
- Around line 14-19: Update the module docstring at the top of
verify_test_fidelity.py and the generated baseline comment text (the string used
when writing scripts/fidelity_baseline.json; see the code that composes or
writes the baseline header around lines ~233-240) to reflect the current CI
contract: CI already runs with --strict, so the baseline is advisory/history
only and regenerating it does not change CI behavior; change wording to state
that default/baseline mode is for local tracking and that CI fails unless
--strict is explicitly relaxed locally. Ensure both the top-level docstring and
the generated baseline comment use the new phrasing consistently.
- Around line 218-227: The load_baseline function currently ignores the JSON's
ts_parity; update load_baseline to read data.get("ts_parity") and validate it
against the current expected parity constant (e.g., EXPECTED_TS_PARITY) before
using the "missing" entries; if the parity mismatches, fail fast by raising an
informative exception or returning an empty baseline and logging an error so old
baselines aren't applied to new upstream checkouts—apply the same parity check
in the other baseline-loading path referenced around the later block (the same
ts_parity key handling).
---
Nitpick comments:
In @.github/workflows/lint.yml:
- Around line 81-92: Add an explicit row for the clone step outcome to the
GitHub Actions summary so upstream clone failures are visible: in the "Report
results" step (the block that writes to $GITHUB_STEP_SUMMARY and currently
echoes rows for ruff_check, ruff_format, audit, fidelity, pyrefly) add an entry
that echoes the value of steps.clone_upstream.outcome (e.g., echo "| Clone
upstream | ${{ steps.clone_upstream.outcome }} |") and make the same change in
the duplicate summary block later in the file (the second report block covering
lines ~100-117) so both summary places show clone_upstream outcome for easier
triage.
In `@src/chat_sdk/adapters/teams/format_converter.py`:
- Around line 191-200: This block duplicates table rendering logic; replace the
manual construction using escape_table_cell and join with a call to the shared
helper render_gfm_table from src/chat_sdk/shared/card_utils.py: import
render_gfm_table at the top of src/chat_sdk/adapters/teams/format_converter.py
and call render_gfm_table(rows) (or pass the same `rows` variable) instead of
building `lines` and joining them; remove the duplicated separators/code around
the for-loop and ensure any existing escape_table_cell usage is not redundantly
applied before calling render_gfm_table.
In `@tests/test_cards.py`:
- Around line 202-206: Update the test_cells_are_escaped to assert exact
rendered table rows rather than substrings: locate the test_cells_are_escaped
function and change the two substring assertions against lines[2] and lines[3]
to full-line equality checks that match the exact GFM table row formatting
produced by render_gfm_table (including pipe delimiters, spacing, and the
escaped pipe backslash for "pipe|inside" and the newline-normalized form for
"has\nnewline"). Ensure the expected strings reflect the exact characters
render_gfm_table outputs (e.g., escaped pipes and single-space cell separation)
so the test will fail on delimiter/spacing regressions.
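The `ts_parity` validation requested in the inline comments above can be sketched as follows. The constant value and the baseline JSON shape are assumptions taken from the surrounding discussion, not the script's actual code:

```python
import json
from pathlib import Path

UPSTREAM_PARITY = "chat@4.26.0"  # assumed; normally sourced from src/chat_sdk/__init__.py

def load_baseline(path: Path) -> dict:
    # Refuse a baseline recorded against a different upstream tag, so a
    # stale file cannot silently mask drift after an upstream sync.
    data = json.loads(path.read_text(encoding="utf-8"))
    recorded = data.get("ts_parity")
    if recorded is not None and recorded != UPSTREAM_PARITY:
        raise RuntimeError(
            f"stale baseline: recorded {recorded!r}, current {UPSTREAM_PARITY!r}; "
            "regenerate with --update-baseline"
        )
    return data.get("missing", {})
```

Failing fast here means an old baseline is never applied to a new upstream checkout.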
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: b1c510c3-d12e-4f9f-9f6e-d51ed8f71fbe
📒 Files selected for processing (9)
- .github/workflows/lint.yml
- CHANGELOG.md
- CLAUDE.md
- docs/UPSTREAM_SYNC.md
- pyproject.toml
- scripts/fidelity_baseline.json
- scripts/verify_test_fidelity.py
- src/chat_sdk/adapters/teams/format_converter.py
- tests/test_cards.py
✅ Files skipped from review due to trivial changes (3)
- scripts/fidelity_baseline.json
- pyproject.toml
- docs/UPSTREAM_SYNC.md
- CLAUDE.md: add local clone command to the quick reference so the "fails if missing" rule is actionable
- verify_test_fidelity.py docstring/baseline-comment: reflect strict-mode CI contract (was still describing baseline mode)
- load_baseline: validate ts_parity against UPSTREAM_PARITY so a stale baseline can't silently mask upstream drift
- argparse: reject --strict + --update-baseline together (exit 2)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
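The `--strict` / `--update-baseline` rejection in the last bullet can be expressed with argparse's built-in mutually exclusive groups, which exit with status 2 on conflict. This is a sketch — the real script may implement the guard manually before the baseline is loaded:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="verify_test_fidelity.py")
    # argparse reports a usage error (exit code 2) if both flags are given.
    mode = parser.add_mutually_exclusive_group()
    mode.add_argument("--strict", action="store_true",
                      help="ignore the baseline; fail on any missing test")
    mode.add_argument("--update-baseline", action="store_true",
                      help="regenerate scripts/fidelity_baseline.json")
    return parser
```

With the group in place, the parser never lets the script proceed to rewrite the baseline when strict mode was requested.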
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/verify_test_fidelity.py`:
- Line 282: The baseline writer currently hardcodes the parity string
("chat@4.26.0") causing parity mismatches with load_baseline(); update
write_baseline() to call the same parity resolution used by load_baseline()—use
_current_parity_tag() or UPSTREAM_PARITY (the dynamic source) to populate the
"ts_parity" field so written baselines reflect the current parity tag rather
than a hardcoded version.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 213ee06a-615f-4fb4-a3fb-48939d721599
📒 Files selected for processing (2)
- CLAUDE.md
- scripts/verify_test_fidelity.py
… claim

Addresses 2 medium findings from local adversarial review on #72:

- write_baseline: preserve existing _comment rather than overwriting with boilerplate on each --update-baseline run.
- CHANGELOG / CLAUDE.md / fidelity_baseline.json: "strict fidelity" now qualified as "strict fidelity for mapped core files" since MAPPING covers 8 of 17 packages/chat/src/*.test.ts files.

Follow-ups filed for (a) MAPPING expansion to full core-package scope, (b) pinning upstream clone SHA in lint.yml, (c) tightening fuzzy_match against hyphen-stripping.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Actionable comments posted: 1
♻️ Duplicate comments (1)
scripts/verify_test_fidelity.py (1)
299-302: ⚠️ Potential issue | 🟠 Major: Stop hardcoding the upstream parity tag in runtime output.

`load_baseline()` now validates against `_current_parity_tag()`, but `write_baseline()` still emits `"chat@4.26.0"` and the recovery hint tells users to clone that same fixed tag. After the next `UPSTREAM_PARITY` bump, `--update-baseline` will write a baseline that immediately fails baseline mode, and the error output will point at the wrong checkout.

♻️ Proposed fix
```diff
 def write_baseline(path: Path, all_missing: dict[str, list], total_ts: int) -> None:
@@
-    payload = {
+    parity = _current_parity_tag()
+    if parity is None:
+        raise RuntimeError("Could not determine UPSTREAM_PARITY for baseline write.")
+
+    payload = {
         "_comment": existing_comment if existing_comment is not None else _DEFAULT_BASELINE_COMMENT,
-        "ts_parity": "chat@4.26.0",
+        "ts_parity": parity,
         "total_ts_tests": total_ts,
         "total_missing": sum(len(v) for v in all_missing.values()),
         "missing": {
@@
     if missing_ts_files:
+        parity = _current_parity_tag() or "chat@4.26.0"
         print(
             f"\nupstream checkout missing — cannot verify fidelity. "
             f"{len(missing_ts_files)} mapped TS file(s) not found under TS_ROOT={TS_ROOT!r}:"
@@
         print(
             "\nClone the upstream repo at the pinned parity tag, e.g.:\n"
-            "  git clone --depth 1 --branch chat@4.26.0 "
+            f"  git clone --depth 1 --branch {parity} "
             "https://github.com/vercel/chat.git /tmp/vercel-chat\n"
             "then re-run with TS_ROOT=/tmp/vercel-chat."
         )
```

Also applies to: 429-431
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/verify_test_fidelity.py` around lines 299 - 302, The payload currently hardcodes "chat@4.26.0" which will drift from the runtime check; replace the literal in the payload construction (and any recovery hint text emitted by write_baseline()) with the dynamic value returned by _current_parity_tag(), and ensure write_baseline() uses the same _current_parity_tag() call as load_baseline() so --update-baseline writes a baseline that validates against the current parity tag and the recovery hint points to the correct checkout.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/UPSTREAM_SYNC.md`:
- Around line 98-100: The documentation row implies a file-scoped escape hatch
that doesn't exist; update the text to state the real options: because
verify_test_fidelity.py and lint.yml only toggle fidelity repo-wide, you must
either keep CI strict (run with --strict and port the new TypeScript tests) or
intentionally relax the repo-wide fidelity invocation (e.g., change the lint.yml
call to remove --strict or run --update-baseline) in a separate commit; replace
“switch the workflow back to non-strict default for that file” with this
explicit two‑choice guidance referencing verify_test_fidelity.py, lint.yml,
--strict, and --update-baseline.
---
Duplicate comments:
In `@scripts/verify_test_fidelity.py`:
- Around line 299-302: The payload currently hardcodes "chat@4.26.0" which will
drift from the runtime check; replace the literal in the payload construction
(and any recovery hint text emitted by write_baseline()) with the dynamic value
returned by _current_parity_tag(), and ensure write_baseline() uses the same
_current_parity_tag() call as load_baseline() so --update-baseline writes a
baseline that validates against the current parity tag and the recovery hint
points to the correct checkout.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: f2eb2343-dd23-4fac-a577-1b8ed673f9a3
📒 Files selected for processing (5)
- CHANGELOG.md
- CLAUDE.md
- docs/UPSTREAM_SYNC.md
- scripts/fidelity_baseline.json
- scripts/verify_test_fidelity.py
✅ Files skipped from review due to trivial changes (1)
- scripts/fidelity_baseline.json
Round-2 local review caught 3 spots the round-1 sweep missed:

- script module docstring still said "zero missing against chat@4.26.0" unqualified (renders in --help and source)
- lint.yml step title "Test fidelity check (strict — zero missing)" renders unqualified in the Actions UI
- CHANGELOG entry had "Closes #53, #72"; a PR can't close itself

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… default comment

Round-3 local review caught two gaps:

- write_baseline hardcoded ts_parity="chat@4.26.0". Would self-trap on the next upstream sync (regenerated baseline carries a stale tag -> next run fails with the re-run-update-baseline advice the user just followed). Now reads _current_parity_tag().
- _DEFAULT_BASELINE_COMMENT didn't mention scope framing. Every other user-facing surface carries the "mapped core files / 8 of 17 / MAPPING" language; rm-ing the baseline and regenerating silently dropped it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes #53.
Summary
`scripts/verify_test_fidelity.py` already exited non-zero on missing tests, but nothing in CI invoked it. Upstream-TS-test translations drifted undetected: 16 gaps against the pinned `chat@4.26.0` tag at the time #53 was filed.

This PR wires the check into CI. Every one of those 16 gaps has since been ported by other PRs in the 0.4.26.2 bundle (#74, #75), so CI now runs in strict mode with an empty baseline.
What changed
Script (`scripts/verify_test_fidelity.py`)

Flags:
- Default (baseline) mode reads `scripts/fidelity_baseline.json`. Passes if the set of missing tests is a subset of the baseline. Fails on any new miss outside it.
- `--strict`: ignore the baseline, fail on any missing test. This is what CI runs.
- `--update-baseline`: regenerate the baseline file from the current state.

Infra guardrail: if any mapped TS test file is missing under `TS_ROOT` (e.g. upstream clone failed or wrong path), the script fails with exit 1 and a clear "upstream checkout missing — cannot verify fidelity" message naming every missing path. Before this guardrail, a missing checkout produced a silent 0/0 pass with "All TS tests have Python equivalents." Applies in every mode including `--strict` and `--update-baseline`, before any success branches.

Reproducer: `TS_ROOT=/tmp/definitely-missing uv run python scripts/verify_test_fidelity.py`
Baseline (`scripts/fidelity_baseline.json`)

Shipped empty —
`{"missing": {}}`. The 16 `[post with Plan]` gaps that originally populated it were ported in #75, and the 6 `[Streaming] StreamingPlan` tests in #74. The file is kept (not deleted) so `--update-baseline` and the baseline-mode workflow still function for future syncs.
CI (`.github/workflows/lint.yml`)

Two new steps:
- `Clone upstream vercel/chat at pinned parity tag` — shallow clone of `chat@4.26.0` to `/tmp/vercel-chat`. No `continue-on-error`: a failed clone aborts the job loudly at the infra layer.
- `Test fidelity check (strict — zero missing)` — runs `verify_test_fidelity.py --strict` with `TS_ROOT=/tmp/vercel-chat`. Still `continue-on-error: true` so its outcome flows through the existing aggregate-failure gate alongside ruff, audit, and pyrefly.

Defense in depth: if the clone ever regresses to
`continue-on-error: true` in a future edit, the script-level guardrail still fails the fidelity step.

Docs
- `docs/UPSTREAM_SYNC.md`: Test fidelity (strict mode) subsection rewritten — documents strict as the CI default, lists the two infra guardrails, and keeps the baseline-mode procedure for future upstream syncs.
- `CLAUDE.md`: fidelity paragraph rewritten accordingly.
- `CHANGELOG.md`: gains a `### CI / Internals` entry under `0.4.26.2` describing the wiring, the guardrails, and the empty baseline.

Self-review gaps closed
An earlier iteration of this PR (before rebasing onto the 0.4.26.2 bundle) shipped baseline mode with:
1. The fidelity script exited 0 when mapped TS test files were missing under `TS_ROOT`, reporting a silent 0/0 pass.
2. `continue-on-error: true` on the clone step combined with (1) → a failed clone silently passed CI. `continue-on-error` is now removed from the clone.

Both are fixed in this revision. See the
`ci(fidelity): fail script when mapped TS files are missing` and `ci(fidelity): drop continue-on-error on clone, run --strict with empty baseline` commits.

Verification
Test plan
- `Lint & Type Check` workflow: `Test fidelity` step shows `TOTAL: 588/588 matched (100%), 0 missing` in strict mode.
- With a bogus upstream path (`TS_ROOT=/tmp/bogus`): confirm CI's `Test fidelity` step fails with `exit 1` and the `upstream checkout missing` message, and that the aggregate-failure gate fails the job.

https://claude.ai/code/session_01WhrgpELQJJSakBnwSNuwGJ
Generated by Claude Code
Summary by CodeRabbit
Tests
CI / Workflows
Documentation
Packaging
Refactor