ci(fidelity): ratchet-down baseline + wire into lint.yml#72
patrick-chinchill merged 10 commits into main
Conversation
📝 Walkthrough

Adds baseline-driven test-fidelity verification: CI now clones a pinned vercel/chat checkout and runs the fidelity script against it.

Changes
Sequence Diagram(s)

sequenceDiagram
participant GH as GitHub Actions (lint)
participant Repo as Pinned vercel/chat
participant Script as verify_test_fidelity.py
participant Baseline as scripts/fidelity_baseline.json
participant Summary as GitHub Step Summary
GH->>Repo: git clone (pinned tag) into /tmp/vercel-chat
GH->>Script: invoke with TS_ROOT=/tmp/vercel-chat --strict
Script->>Baseline: load baseline (if present)
Script->>Repo: scan TS test files / it() entries
Script->>Script: map TS it() → Python test_*()
alt upstream TS file missing
Script->>GH: exit non-zero (missing upstream file)
else compare results
alt missing entries not in baseline (NEW)
Script->>GH: mark NEW missing (will fail under --strict)
else missing entries covered by baseline
Script->>GH: mark baselined (not failing)
end
end
Script->>Baseline: optionally update baseline if --update-baseline
Script->>Summary: emit "Test fidelity" row with outcomes
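The "map TS it() → Python test_*()" step in the diagram above can be illustrated with a minimal sketch. The function below is hypothetical — the real script's matching (including its fuzzy_match helper) is more involved — but it shows one plausible mapping rule:

```python
import re

def ts_name_to_python(ts_title: str) -> str:
    """Map a TS it() title like "streams partial output" to a Python
    test name like test_streams_partial_output (illustrative rule only)."""
    slug = re.sub(r"[^a-z0-9]+", "_", ts_title.lower()).strip("_")
    return f"test_{slug}"
```

A mapping like this lets the script compare the set of upstream it() entries against the Python test functions found in the mapped counterpart file.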
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related issues
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Code Review
This pull request implements a baseline-enforced ratchet mechanism for test fidelity checks, enabling CI to pass with known missing tests while preventing new regressions. It introduces scripts/fidelity_baseline.json to track missing tests and adds --strict and --update-baseline flags to the verification script. Review feedback recommends centralizing the hardcoded version string to improve maintainability and optimizing the reporting loop by removing redundant set conversions.
```python
        "it. To remove entries: port the TS test to its Python counterpart, "
        "then regenerate this file with --update-baseline."
    ),
    "ts_parity": "chat@4.26.0",
```
The upstream version 4.26.0 is hardcoded here. This value is currently duplicated in at least five places: src/chat_sdk/__init__.py (as UPSTREAM_PARITY), CLAUDE.md, docs/UPSTREAM_SYNC.md, this script, and the generated baseline JSON. To improve maintainability and prevent these from getting out of sync, consider extracting this version dynamically from the source of truth in src/chat_sdk/__init__.py using a regex or by importing it.
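One way to implement the suggested single-source-of-truth lookup is a regex read of the module source. The `UPSTREAM_PARITY` name comes from the review above; the exact assignment format in `src/chat_sdk/__init__.py` is an assumption, so treat this as a sketch:

```python
import re
from pathlib import Path

def read_upstream_parity(init_py: Path) -> str:
    # Pull UPSTREAM_PARITY out of the module source with a regex instead of
    # importing it, so the script works even without the package installed.
    text = init_py.read_text(encoding="utf-8")
    match = re.search(r'^UPSTREAM_PARITY\s*=\s*["\']([^"\']+)["\']', text, re.MULTILINE)
    if match is None:
        raise RuntimeError(f"UPSTREAM_PARITY not found in {init_py}")
    return match.group(1)
```

The script, the baseline writer, and the docs generator could all call this instead of repeating the literal tag.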
```python
if missing:
    for describe, ts_name, _py_name in missing[:5]:
        print(f"  MISSING: [{describe}] {ts_name}")
        marker = "NEW" if (describe, ts_name) in set(file_new) else "baselined"
```
Creating a new set from file_new inside the loop is inefficient as it happens for every missing test. Since baseline_keys is already a set containing the baselined keys for the current file, you can check for membership in it directly (negated) to determine if a miss is 'NEW'.
```diff
-        marker = "NEW" if (describe, ts_name) in set(file_new) else "baselined"
+        marker = "NEW" if (describe, ts_name) not in baseline_keys else "baselined"
```
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d0a545f922
```diff
 run: |
   failures=0
-  for var in RUFF_CHECK RUFF_FORMAT AUDIT PYREFLY; do
+  for var in RUFF_CHECK RUFF_FORMAT AUDIT FIDELITY PYREFLY; do
```
Include upstream clone outcome in failure gate
Add the clone step to the aggregate failure loop; otherwise a transient git clone failure (network outage, rate limit, tag fetch issue) does not fail this job. In that case TS_ROOT is missing, verify_test_fidelity.py skips all TS files and exits 0, so the new fidelity protection is silently bypassed while the workflow still reports success.
🧹 Nitpick comments (2)
src/chat_sdk/adapters/teams/format_converter.py (1)
191-200: Consider delegating full table rendering to `render_gfm_table`.

`_table_to_gfm` still duplicates table line assembly logic. Reusing `render_gfm_table` would keep output behavior centralized and reduce drift risk.

♻️ Proposed refactor
```diff
-from chat_sdk.shared.card_utils import escape_table_cell
+from chat_sdk.shared.card_utils import render_gfm_table
 ...
-    lines: list[str] = []
-    # Header row
-    lines.append(f"| {' | '.join(escape_table_cell(c) for c in rows[0])} |")
-    # Separator
-    separators = ["---"] * len(rows[0])
-    lines.append(f"| {' | '.join(separators)} |")
-    # Data rows
-    for row in rows[1:]:
-        lines.append(f"| {' | '.join(escape_table_cell(c) for c in row)} |")
-    return "\n".join(lines)
+    headers = rows[0]
+    data_rows = rows[1:]
+    return "\n".join(render_gfm_table(headers, data_rows))
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/chat_sdk/adapters/teams/format_converter.py` around lines 191 - 200: The function _table_to_gfm duplicates table assembly logic; replace its body to delegate to the central render_gfm_table renderer instead of rebuilding lines manually: locate _table_to_gfm and have it call and return render_gfm_table(rows) (or adapt rows to the expected input of render_gfm_table), removing the duplicated header/separator/data assembly so all table formatting is centralized in render_gfm_table and escape_table_cell behavior remains consistent.

tests/test_cards.py (1)

165-210: Add one Teams converter integration assertion for table escaping. Current additions validate shared helpers, but not the adapter wiring path. A small `TeamsFormatConverter._table_to_gfm`-level test would protect against future regressions where the converter stops using shared escaping.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_cards.py` around lines 165 - 210, Add a test that exercises TeamsFormatConverter._table_to_gfm to assert it uses the shared escaping/rendering helpers: call TeamsFormatConverter._table_to_gfm (or the public method that invokes it) with headers and cells containing pipes, backslashes, and newlines and assert the output contains the escaped pipe (r"\\|") and collapsed newlines as produced by escape_table_cell/render_gfm_table; this ensures the adapter wiring still delegates to shared.card_utils functions (reference TeamsFormatConverter._table_to_gfm, escape_table_cell, render_gfm_table).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@src/chat_sdk/adapters/teams/format_converter.py`:
- Around line 191-200: The function _table_to_gfm duplicates table assembly
logic; replace its body to delegate to the central render_gfm_table renderer
instead of rebuilding lines manually: locate _table_to_gfm and have it call and
return render_gfm_table(rows) (or adapt rows to the expected input of
render_gfm_table), removing the duplicated header/separator/data assembly so all
table formatting is centralized in render_gfm_table and escape_table_cell
behavior remains consistent.
In `@tests/test_cards.py`:
- Around line 165-210: Add a test that exercises
TeamsFormatConverter._table_to_gfm to assert it uses the shared
escaping/rendering helpers: call TeamsFormatConverter._table_to_gfm (or the
public method that invokes it) with headers and cells containing pipes,
backslashes, and newlines and assert the output contains the escaped pipe
(r"\\|") and collapsed newlines as produced by
escape_table_cell/render_gfm_table; this ensures the adapter wiring still
delegates to shared.card_utils functions (reference
TeamsFormatConverter._table_to_gfm, escape_table_cell, render_gfm_table).
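The escaping behavior these prompts rely on can be illustrated with a minimal stand-in. This is a hypothetical re-implementation for demonstration only — the real `escape_table_cell` lives in `shared/card_utils.py` and may differ in detail:

```python
import re

def escape_table_cell(text: str) -> str:
    # Collapse newlines (and surrounding whitespace) to single spaces, then
    # escape pipes so cell content cannot break the GFM table structure.
    collapsed = re.sub(r"\s*\n\s*", " ", text)
    return collapsed.replace("|", r"\|")
```

Under these assumptions, `"pipe|inside"` renders as `pipe\|inside` and `"has\nnewline"` as `has newline`, which is exactly what the suggested exact-line test assertions check.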
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 4cd4131b-d855-4cc0-bbd1-842eeaf6651e
📒 Files selected for processing (3)
- pyproject.toml
- src/chat_sdk/adapters/teams/format_converter.py
- tests/test_cards.py
✅ Files skipped from review due to trivial changes (1)
- pyproject.toml
Review verdict: needs changes. Refreshed and reviewed latest head c26cd97 against main. The baseline passes with a fresh upstream clone pinned to chat@4.26.0, but fidelity enforcement can silently pass when the upstream TS root is missing: .github/workflows/lint.yml lines 60-72 make the clone continue-on-error, and scripts/verify_test_fidelity.py lines 280-284 skips every mapped TS file and exits 0. Reproduced with TS_ROOT=/tmp/definitely-missing-vercel-chat python3 scripts/verify_test_fidelity.py, which reported all TS files skipped and exited 0. Make the clone required or make the script fail when mapped TS files are unavailable. |
Valid — reproduced locally.
Verify-test-fidelity ran out-of-band with no CI invocation, so missing upstream-TS-test translations drifted undetected (16 gaps at 4.26.0, 40 against upstream main). Now: - Default mode is baseline-enforced via `scripts/fidelity_baseline.json`. CI fails on new misses outside the baseline; accepts baselined misses as-is. Fixed misses are reported with a nudge to tighten the baseline. - `--strict` ignores the baseline (eventual target once baseline hits 0). - `--update-baseline` regenerates the baseline file. - Initial baseline captures the 16 `[post with Plan]` gaps tracked in #55, pinned to `chat@4.26.0`. - lint.yml clones `vercel/chat@4.26.0` and runs the check; aggregates into the existing "fail if any step failed" gate. CLAUDE.md + docs/UPSTREAM_SYNC.md document the workflow. https://claude.ai/code/session_01WhrgpELQJJSakBnwSNuwGJ
PyPI search has no signal today — keywords field was absent, classifiers covered only Chat + Typed. Added 12 keywords covering high-volume search shapes (generic, per-platform, asyncio, vercel) and 4 classifiers (Communications parent, Internet, Libraries :: Application Frameworks, Libraries :: Python Modules) matching patterns used by slack-sdk, python-telegram-bot, discord.py, errbot. No runtime change; hatchling build output is unaffected. https://claude.ai/code/session_01WhrgpELQJJSakBnwSNuwGJ
teams/format_converter.py had a byte-identical private copy of shared/card_utils.escape_table_cell. Removed the copy, imported the shared function, renamed the 2 call sites. Also added direct unit tests for shared.card_utils.escape_table_cell + render_gfm_table — both were previously covered only transitively through adapter tests. First sub-PR of #70 (shared-helper consolidation). Scope check: most of #70's five planned helpers already live in shared/card_utils.py — only escape_table_cell (this PR) and button-style mapping (Teams + Discord, remaining work) are still duplicated per-adapter. https://claude.ai/code/session_01WhrgpELQJJSakBnwSNuwGJ
Previously `verify_test_fidelity.py` printed "SKIPPED (file not found)" for any mapped TS test whose source didn't exist under `TS_ROOT`, then summed 0 matches + 0 missing and exited 0 with "All TS tests have Python equivalents." Combined with `continue-on-error: true` on the upstream-clone step, that made a silently-failing clone report "fidelity check passed" in CI. Now the script tracks missing-TS-file hits separately from real successes. If any mapped TS file is absent at end-of-run, the script prints a clear "upstream checkout missing — cannot verify fidelity" message naming every missing path, includes the clone command hint, and exits 1. This fires before strict/baseline/update-baseline success branches so no mode can accidentally mask it. Reproducer: TS_ROOT=/tmp/definitely-missing uv run python scripts/verify_test_fidelity.py Before: exit 0. After: exit 1 with infra-level error message. Closes self-review gap #1 on #72.
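The guardrail described above can be sketched roughly as follows. The function name `check_upstream_present` and its signature are illustrative, not the script's actual internals:

```python
import sys
from pathlib import Path

def check_upstream_present(ts_root: Path, mapped_files: list[str]) -> None:
    # Fail loudly if any mapped TS test file is absent, instead of letting
    # a missing checkout degrade into a silent 0/0 "all tests ported" pass.
    missing = [ts_root / rel for rel in mapped_files if not (ts_root / rel).is_file()]
    if missing:
        print("upstream checkout missing — cannot verify fidelity:")
        for path in missing:
            print(f"  {path}")
        sys.exit(1)
```

Running this before any strict/baseline/update-baseline success branch ensures no mode can mask a failed clone.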
…y baseline

Two infra-level changes to the fidelity check:

1. `.github/workflows/lint.yml` — the `Clone upstream vercel/chat at pinned parity tag` step no longer carries `continue-on-error: true`. The clone is infrastructure the fidelity check depends on; if it fails, the job should fail there rather than swallow the failure and hope a later step catches it. Combined with the script-level guard from the previous commit, this is defense in depth.

2. CI now runs `scripts/verify_test_fidelity.py --strict`. Every `[post with Plan]` test that was baselined is now ported (PR #75 and PR #74 in the 0.4.26.2 bundle), so the repo ships at 0 missing. `scripts/fidelity_baseline.json` is reduced to `{"missing": {}}` (metadata retained so `--update-baseline` and the documented workflow still function for future upstream syncs).

Closes self-review gap #2 on #72.
- CLAUDE.md fidelity section: document that CI runs `--strict` and that the script now fails loudly on a missing upstream checkout. Baseline mode is retained but described as the local-dev fallback rather than the CI default.
- docs/UPSTREAM_SYNC.md: same update, plus clarify the two infra guardrails (no `continue-on-error` on the clone, script-level missing-TS-file check) and rewrite the workflow table for strict mode as the default.
- CHANGELOG.md: add a `### CI / Internals` subsection under the unreleased 0.4.26.2 entry describing the fidelity-check wiring, the two infra guardrails, and the empty baseline.

Closes #53, #72.
Force-pushed c26cd97 to 404bf3e.
Actionable comments posted: 4
🧹 Nitpick comments (3)
src/chat_sdk/adapters/teams/format_converter.py (1)
191-200: Prefer shared table renderer to avoid duplicate formatting logic. Lines 191-200 still reimplement behavior already provided by `render_gfm_table` in `src/chat_sdk/shared/card_utils.py`.

♻️ Proposed refactor
```diff
-from chat_sdk.shared.card_utils import escape_table_cell
+from chat_sdk.shared.card_utils import render_gfm_table
@@
-    lines: list[str] = []
-    # Header row
-    lines.append(f"| {' | '.join(escape_table_cell(c) for c in rows[0])} |")
-    # Separator
-    separators = ["---"] * len(rows[0])
-    lines.append(f"| {' | '.join(separators)} |")
-    # Data rows
-    for row in rows[1:]:
-        lines.append(f"| {' | '.join(escape_table_cell(c) for c in row)} |")
-    return "\n".join(lines)
+    return "\n".join(render_gfm_table(rows[0], rows[1:]))
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/chat_sdk/adapters/teams/format_converter.py` around lines 191 - 200: This block duplicates table rendering logic; replace the manual construction using escape_table_cell and join with a call to the shared helper render_gfm_table from src/chat_sdk/shared/card_utils.py: import render_gfm_table at the top of src/chat_sdk/adapters/teams/format_converter.py and call render_gfm_table(rows) (or pass the same `rows` variable) instead of building `lines` and joining them; remove the duplicated separators/code around the for-loop and ensure any existing escape_table_cell usage is not redundantly applied before calling render_gfm_table.

tests/test_cards.py (1)
202-206: Consider exact-line assertions for stronger regression detection. Lines 204-205 currently use substring checks; exact row checks would also catch delimiter/spacing regressions.
🔍 Suggested test tightening

```diff
 def test_cells_are_escaped(self):
     lines = render_gfm_table(["col"], [["pipe|inside"], ["has\nnewline"]])
-    assert r"pipe\|inside" in lines[2]
-    assert "has newline" in lines[3]
+    assert lines[2] == r"| pipe\|inside |"
+    assert lines[3] == "| has newline |"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_cards.py` around lines 202 - 206: Update test_cells_are_escaped to assert exact rendered table rows rather than substrings: locate the test_cells_are_escaped function and change the two substring assertions against lines[2] and lines[3] to full-line equality checks that match the exact GFM table row formatting produced by render_gfm_table (including pipe delimiters, spacing, and the escaped pipe backslash for "pipe|inside" and the newline-normalized form for "has\nnewline"). Ensure the expected strings reflect the exact characters render_gfm_table outputs (e.g., escaped pipes and single-space cell separation) so the test will fail on delimiter/spacing regressions.

.github/workflows/lint.yml (1)
81-92: Surface `clone_upstream` failures explicitly in the summary. If the clone step fails, `fidelity` is skipped and the summary currently points at the downstream symptom instead of the root cause. Adding `steps.clone_upstream.outcome` to the table makes the new infra guardrail much easier to triage.

Suggested tweak

```diff
 echo "| Ruff check | ${{ steps.ruff_check.outcome }} |" >> $GITHUB_STEP_SUMMARY
 echo "| Ruff format | ${{ steps.ruff_format.outcome }} |" >> $GITHUB_STEP_SUMMARY
 echo "| Test audit | ${{ steps.audit.outcome }} |" >> $GITHUB_STEP_SUMMARY
+echo "| Clone upstream | ${{ steps.clone_upstream.outcome }} |" >> $GITHUB_STEP_SUMMARY
 echo "| Test fidelity | ${{ steps.fidelity.outcome }} |" >> $GITHUB_STEP_SUMMARY
 echo "| Pyrefly | ${{ steps.pyrefly.outcome }} |" >> $GITHUB_STEP_SUMMARY
```

Also applies to: 100-117
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/lint.yml around lines 81 - 92, Add an explicit row for the clone step outcome to the GitHub Actions summary so upstream clone failures are visible: in the "Report results" step (the block that writes to $GITHUB_STEP_SUMMARY and currently echoes rows for ruff_check, ruff_format, audit, fidelity, pyrefly) add an entry that echoes the value of steps.clone_upstream.outcome (e.g., echo "| Clone upstream | ${{ steps.clone_upstream.outcome }} |") and make the same change in the duplicate summary block later in the file (the second report block covering lines ~100-117) so both summary places show clone_upstream outcome for easier triage.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@CLAUDE.md`:
- Around line 107-114: Add an explicit local upstream checkout step to the
validation instructions so users know how to create /tmp/vercel-chat before
running scripts; update the CLAUDE.md validation command block that references
verify_test_fidelity.py to include a note and the checkout command (e.g., clone
or checkout the upstream repo into /tmp/vercel-chat), and mention that
verify_test_fidelity.py will fail if the mapped TS files under /tmp/vercel-chat
are missing so readers run the checkout first.
In `@scripts/verify_test_fidelity.py`:
- Around line 255-260: In main(), add an early mutual-exclusion check that
rejects using --strict together with --update-baseline: if both strict_mode and
update_baseline are true, print a clear error/banner and exit with non-zero
status; ensure this guard appears before baseline is loaded or possibly
rewritten (around where baseline is set) and similarly add the same
mutually-exclusive check before any later branches that handle update_baseline
(references: main, strict_mode, update_baseline, baseline) so the script never
proceeds to rewrite the baseline when strict mode was requested (also apply the
same guard to the other locations noted near the second and third branches).
- Around line 14-19: Update the module docstring at the top of
verify_test_fidelity.py and the generated baseline comment text (the string used
when writing scripts/fidelity_baseline.json; see the code that composes or
writes the baseline header around lines ~233-240) to reflect the current CI
contract: CI already runs with --strict, so the baseline is advisory/history
only and regenerating it does not change CI behavior; change wording to state
that default/baseline mode is for local tracking and that CI fails unless
--strict is explicitly relaxed locally. Ensure both the top-level docstring and
the generated baseline comment use the new phrasing consistently.
- Around line 218-227: The load_baseline function currently ignores the JSON's
ts_parity; update load_baseline to read data.get("ts_parity") and validate it
against the current expected parity constant (e.g., EXPECTED_TS_PARITY) before
using the "missing" entries; if the parity mismatches, fail fast by raising an
informative exception or returning an empty baseline and logging an error so old
baselines aren't applied to new upstream checkouts—apply the same parity check
in the other baseline-loading path referenced around the later block (the same
ts_parity key handling).
---
Nitpick comments:
In @.github/workflows/lint.yml:
- Around line 81-92: Add an explicit row for the clone step outcome to the
GitHub Actions summary so upstream clone failures are visible: in the "Report
results" step (the block that writes to $GITHUB_STEP_SUMMARY and currently
echoes rows for ruff_check, ruff_format, audit, fidelity, pyrefly) add an entry
that echoes the value of steps.clone_upstream.outcome (e.g., echo "| Clone
upstream | ${{ steps.clone_upstream.outcome }} |") and make the same change in
the duplicate summary block later in the file (the second report block covering
lines ~100-117) so both summary places show clone_upstream outcome for easier
triage.
In `@src/chat_sdk/adapters/teams/format_converter.py`:
- Around line 191-200: This block duplicates table rendering logic; replace the
manual construction using escape_table_cell and join with a call to the shared
helper render_gfm_table from src/chat_sdk/shared/card_utils.py: import
render_gfm_table at the top of src/chat_sdk/adapters/teams/format_converter.py
and call render_gfm_table(rows) (or pass the same `rows` variable) instead of
building `lines` and joining them; remove the duplicated separators/code around
the for-loop and ensure any existing escape_table_cell usage is not redundantly
applied before calling render_gfm_table.
In `@tests/test_cards.py`:
- Around line 202-206: Update the test_cells_are_escaped to assert exact
rendered table rows rather than substrings: locate the test_cells_are_escaped
function and change the two substring assertions against lines[2] and lines[3]
to full-line equality checks that match the exact GFM table row formatting
produced by render_gfm_table (including pipe delimiters, spacing, and the
escaped pipe backslash for "pipe|inside" and the newline-normalized form for
"has\nnewline"). Ensure the expected strings reflect the exact characters
render_gfm_table outputs (e.g., escaped pipes and single-space cell separation)
so the test will fail on delimiter/spacing regressions.
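The `ts_parity` validation requested in the inline comments above can be sketched as follows. The constant value and the baseline JSON shape are assumptions taken from the surrounding discussion, not the script's actual code:

```python
import json
from pathlib import Path

UPSTREAM_PARITY = "chat@4.26.0"  # assumed; normally sourced from src/chat_sdk/__init__.py

def load_baseline(path: Path) -> dict:
    # Refuse a baseline recorded against a different upstream tag, so a
    # stale file cannot silently mask drift after an upstream sync.
    data = json.loads(path.read_text(encoding="utf-8"))
    recorded = data.get("ts_parity")
    if recorded is not None and recorded != UPSTREAM_PARITY:
        raise RuntimeError(
            f"stale baseline: recorded {recorded!r}, current {UPSTREAM_PARITY!r}; "
            "regenerate with --update-baseline"
        )
    return data.get("missing", {})
```

Failing fast here means an old baseline is never applied to a new upstream checkout.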
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: b1c510c3-d12e-4f9f-9f6e-d51ed8f71fbe
📒 Files selected for processing (9)
- .github/workflows/lint.yml
- CHANGELOG.md
- CLAUDE.md
- docs/UPSTREAM_SYNC.md
- pyproject.toml
- scripts/fidelity_baseline.json
- scripts/verify_test_fidelity.py
- src/chat_sdk/adapters/teams/format_converter.py
- tests/test_cards.py
✅ Files skipped from review due to trivial changes (3)
- scripts/fidelity_baseline.json
- pyproject.toml
- docs/UPSTREAM_SYNC.md
- CLAUDE.md: add local clone command to the quick reference so the "fails if missing" rule is actionable
- verify_test_fidelity.py docstring/baseline-comment: reflect strict-mode CI contract (was still describing baseline mode)
- load_baseline: validate ts_parity against UPSTREAM_PARITY so a stale baseline can't silently mask upstream drift
- argparse: reject --strict + --update-baseline together (exit 2)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
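The `--strict` / `--update-baseline` rejection in the last bullet can be expressed with argparse's built-in mutually exclusive groups, which exit with status 2 on conflict. This is a sketch — the real script may implement the guard manually before the baseline is loaded:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="verify_test_fidelity.py")
    # argparse reports a usage error (exit code 2) if both flags are given.
    mode = parser.add_mutually_exclusive_group()
    mode.add_argument("--strict", action="store_true",
                      help="ignore the baseline; fail on any missing test")
    mode.add_argument("--update-baseline", action="store_true",
                      help="regenerate scripts/fidelity_baseline.json")
    return parser
```

With the group in place, the parser never lets the script proceed to rewrite the baseline when strict mode was requested.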
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/verify_test_fidelity.py`:
- Line 282: The baseline writer currently hardcodes the parity string
("chat@4.26.0") causing parity mismatches with load_baseline(); update
write_baseline() to call the same parity resolution used by load_baseline()—use
_current_parity_tag() or UPSTREAM_PARITY (the dynamic source) to populate the
"ts_parity" field so written baselines reflect the current parity tag rather
than a hardcoded version.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 213ee06a-615f-4fb4-a3fb-48939d721599
📒 Files selected for processing (2)
- CLAUDE.md
- scripts/verify_test_fidelity.py
… claim

Addresses 2 medium findings from local adversarial review on #72:

- write_baseline: preserve existing _comment rather than overwriting with boilerplate on each --update-baseline run.
- CHANGELOG / CLAUDE.md / fidelity_baseline.json: "strict fidelity" now qualified as "strict fidelity for mapped core files" since MAPPING covers 8 of 17 packages/chat/src/*.test.ts files.

Follow-ups filed for (a) MAPPING expansion to full core-package scope, (b) pinning upstream clone SHA in lint.yml, (c) tightening fuzzy_match against hyphen-stripping.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Actionable comments posted: 1
♻️ Duplicate comments (1)
scripts/verify_test_fidelity.py (1)
299-302: ⚠️ Potential issue | 🟠 Major: Stop hardcoding the upstream parity tag in runtime output.

`load_baseline()` now validates against `_current_parity_tag()`, but `write_baseline()` still emits `"chat@4.26.0"` and the recovery hint tells users to clone that same fixed tag. After the next `UPSTREAM_PARITY` bump, `--update-baseline` will write a baseline that immediately fails baseline mode, and the error output will point at the wrong checkout.

♻️ Proposed fix
```diff
 def write_baseline(path: Path, all_missing: dict[str, list], total_ts: int) -> None:
@@
-    payload = {
+    parity = _current_parity_tag()
+    if parity is None:
+        raise RuntimeError("Could not determine UPSTREAM_PARITY for baseline write.")
+
+    payload = {
         "_comment": existing_comment if existing_comment is not None else _DEFAULT_BASELINE_COMMENT,
-        "ts_parity": "chat@4.26.0",
+        "ts_parity": parity,
         "total_ts_tests": total_ts,
         "total_missing": sum(len(v) for v in all_missing.values()),
         "missing": {
@@
     if missing_ts_files:
+        parity = _current_parity_tag() or "chat@4.26.0"
         print(
             f"\nupstream checkout missing — cannot verify fidelity. "
             f"{len(missing_ts_files)} mapped TS file(s) not found under TS_ROOT={TS_ROOT!r}:"
@@
         print(
             "\nClone the upstream repo at the pinned parity tag, e.g.:\n"
-            "  git clone --depth 1 --branch chat@4.26.0 "
+            f"  git clone --depth 1 --branch {parity} "
             "https://github.com/vercel/chat.git /tmp/vercel-chat\n"
             "then re-run with TS_ROOT=/tmp/vercel-chat."
         )
```

Also applies to: 429-431
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/verify_test_fidelity.py` around lines 299 - 302, The payload currently hardcodes "chat@4.26.0" which will drift from the runtime check; replace the literal in the payload construction (and any recovery hint text emitted by write_baseline()) with the dynamic value returned by _current_parity_tag(), and ensure write_baseline() uses the same _current_parity_tag() call as load_baseline() so --update-baseline writes a baseline that validates against the current parity tag and the recovery hint points to the correct checkout.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/UPSTREAM_SYNC.md`:
- Around line 98-100: The documentation row implies a file-scoped escape hatch
that doesn't exist; update the text to state the real options: because
verify_test_fidelity.py and lint.yml only toggle fidelity repo-wide, you must
either keep CI strict (run with --strict and port the new TypeScript tests) or
intentionally relax the repo-wide fidelity invocation (e.g., change the lint.yml
call to remove --strict or run --update-baseline) in a separate commit; replace
“switch the workflow back to non-strict default for that file” with this
explicit two‑choice guidance referencing verify_test_fidelity.py, lint.yml,
--strict, and --update-baseline.
---
Duplicate comments:
In `@scripts/verify_test_fidelity.py`:
- Around line 299-302: The payload currently hardcodes "chat@4.26.0" which will
drift from the runtime check; replace the literal in the payload construction
(and any recovery hint text emitted by write_baseline()) with the dynamic value
returned by _current_parity_tag(), and ensure write_baseline() uses the same
_current_parity_tag() call as load_baseline() so --update-baseline writes a
baseline that validates against the current parity tag and the recovery hint
points to the correct checkout.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: f2eb2343-dd23-4fac-a577-1b8ed673f9a3
📒 Files selected for processing (5)
- CHANGELOG.md
- CLAUDE.md
- docs/UPSTREAM_SYNC.md
- scripts/fidelity_baseline.json
- scripts/verify_test_fidelity.py
✅ Files skipped from review due to trivial changes (1)
- scripts/fidelity_baseline.json
Round-2 local review caught 3 spots the round-1 sweep missed:

- script module docstring still said "zero missing against chat@4.26.0" unqualified (renders in --help and source)
- lint.yml step title "Test fidelity check (strict — zero missing)" renders unqualified in the Actions UI
- CHANGELOG entry had "Closes #53, #72"; a PR can't close itself

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… default comment

Round-3 local review caught two gaps:

- write_baseline hardcoded ts_parity="chat@4.26.0". Would self-trap on the next upstream sync (regenerated baseline carries a stale tag -> next run fails with the re-run-update-baseline advice the user just followed). Now reads _current_parity_tag().
- _DEFAULT_BASELINE_COMMENT didn't mention scope framing. Every other user-facing surface carries the "mapped core files / 8 of 17 / MAPPING" language; rm-ing the baseline and regenerating silently dropped it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes #53.
Summary
`scripts/verify_test_fidelity.py` already exited non-zero on missing tests, but nothing in CI invoked it. Upstream-TS-test translations drifted undetected: 16 gaps against the pinned `chat@4.26.0` tag at the time #53 was filed.

This PR wires the check into CI. Every one of those 16 gaps has since been ported by other PRs in the 0.4.26.2 bundle (#74, #75), so CI now runs in strict mode with an empty baseline.
What changed
Script (`scripts/verify_test_fidelity.py`)

Flags:
- Default (baseline) mode reads `scripts/fidelity_baseline.json`. Passes if the set of missing tests is a subset of the baseline. Fails on any new miss outside it.
- `--strict`: ignore the baseline, fail on any missing test. This is what CI runs.
- `--update-baseline`: regenerate the baseline file from the current state.

Infra guardrail: if any mapped TS test file is missing under `TS_ROOT` (e.g. upstream clone failed or wrong path), the script fails with exit 1 and a clear "upstream checkout missing — cannot verify fidelity" message naming every missing path. Before this guardrail, a missing checkout produced a silent 0/0 pass with "All TS tests have Python equivalents." Applies in every mode including `--strict` and `--update-baseline`, before any success branches.

Reproducer: `TS_ROOT=/tmp/definitely-missing uv run python scripts/verify_test_fidelity.py`
Baseline (`scripts/fidelity_baseline.json`)

Shipped empty —
`{"missing": {}}`. The 16 `[post with Plan]` gaps that originally populated it were ported in #75, and the 6 `[Streaming] StreamingPlan` tests in #74. The file is kept (not deleted) so `--update-baseline` and the baseline-mode workflow still function for future syncs.
CI (`.github/workflows/lint.yml`)

Two new steps:
- `Clone upstream vercel/chat at pinned parity tag` — shallow clone of `chat@4.26.0` to `/tmp/vercel-chat`. No `continue-on-error`: a failed clone aborts the job loudly at the infra layer.
- `Test fidelity check (strict — zero missing)` — runs `verify_test_fidelity.py --strict` with `TS_ROOT=/tmp/vercel-chat`. Still `continue-on-error: true` so its outcome flows through the existing aggregate-failure gate alongside ruff, audit, and pyrefly.

Defense in depth: if the clone ever regresses to
`continue-on-error: true` in a future edit, the script-level guardrail still fails the fidelity step.

Docs
- `docs/UPSTREAM_SYNC.md`: Test fidelity (strict mode) subsection rewritten — documents strict as the CI default, lists the two infra guardrails, and keeps the baseline-mode procedure for future upstream syncs.
- `CLAUDE.md`: fidelity paragraph rewritten accordingly.
- `CHANGELOG.md`: gains a `### CI / Internals` entry under `0.4.26.2` describing the wiring, the guardrails, and the empty baseline.

Self-review gaps closed
An earlier iteration of this PR (before rebasing onto the 0.4.26.2 bundle) shipped baseline mode with:
1. The fidelity script exited 0 when mapped TS test files were missing under `TS_ROOT`, reporting a silent 0/0 pass.
2. `continue-on-error: true` on the clone step combined with (1) → a failed clone silently passed CI. `continue-on-error` is now removed from the clone.

Both are fixed in this revision. See the
`ci(fidelity): fail script when mapped TS files are missing` and `ci(fidelity): drop continue-on-error on clone, run --strict with empty baseline` commits.

Verification
Test plan
- `Lint & Type Check` workflow: `Test fidelity` step shows `TOTAL: 588/588 matched (100%), 0 missing` in strict mode.
- With a bogus upstream path (`TS_ROOT=/tmp/bogus`): confirm CI's `Test fidelity` step fails with `exit 1` and the `upstream checkout missing` message, and that the aggregate-failure gate fails the job.

https://claude.ai/code/session_01WhrgpELQJJSakBnwSNuwGJ
Generated by Claude Code
Summary by CodeRabbit
Tests
CI / Workflows
Documentation
Packaging
Refactor