
ci(fidelity): ratchet-down baseline + wire into lint.yml#72

Merged
patrick-chinchill merged 10 commits into main from claude/compare-chat-py-port-eh67x
Apr 24, 2026

Conversation

@patrick-chinchill (Collaborator) commented Apr 24, 2026

Closes #53.

Summary

scripts/verify_test_fidelity.py already exited non-zero on missing tests, but nothing in CI invoked it, so translations of upstream TS tests drifted undetected: 16 gaps against the pinned chat@4.26.0 tag at the time #53 was filed.

This PR wires the check into CI. Every one of those 16 gaps has since been ported by other PRs in the 0.4.26.2 bundle (#74, #75), so CI now runs in strict mode with an empty baseline.

What changed

Script (scripts/verify_test_fidelity.py)

Flags:

  • Default mode: baseline-enforced (retained for local dev during upstream syncs). Reads scripts/fidelity_baseline.json. Passes if the set of missing tests is a subset of the baseline. Fails on any new miss outside it.
  • --strict: ignore baseline, fail on any missing. This is what CI runs.
  • --update-baseline: regenerate the baseline file from the current state.

Infra guardrail: if any mapped TS test file is missing under TS_ROOT (e.g. upstream clone failed or wrong path), the script fails with exit 1 and a clear "upstream checkout missing — cannot verify fidelity" message naming every missing path. Before this guardrail, a missing checkout produced a silent 0/0 pass with "All TS tests have Python equivalents." Applies in every mode including --strict and --update-baseline, before any success branches.
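The shape of that guardrail can be sketched roughly as follows. This is an illustrative sketch, not the script's actual internals: the function and variable names are invented, and only the error message and exit code come from this PR.

```python
import sys
from pathlib import Path


def check_upstream_checkout(ts_root: Path, mapped_ts_files: list[str]) -> None:
    """Fail fast if any mapped TS test file is absent under TS_ROOT.

    Hypothetical sketch: runs before any mode-specific success branch,
    so neither --strict nor --update-baseline can mask a missing clone.
    """
    missing = [p for p in mapped_ts_files if not (ts_root / p).is_file()]
    if missing:
        print("upstream checkout missing — cannot verify fidelity")
        for path in missing:
            print(f"  {ts_root / path}")
        sys.exit(1)
```

Ordering is the point: because the check precedes every success branch, a wiped or mis-pointed TS_ROOT cannot fall through to a 0/0 "all matched" summary.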

Reproducer:

$ TS_ROOT=/tmp/definitely-missing uv run python scripts/verify_test_fidelity.py; echo "exit=$?"
... exit=1 with the "upstream checkout missing" message

Baseline (scripts/fidelity_baseline.json)

Shipped empty — {"missing": {}}. The 16 [post with Plan] gaps that originally populated it were ported in #75, and the 6 [Streaming] StreamingPlan tests in #74. The file is kept (not deleted) so --update-baseline and the baseline-mode workflow still function for future syncs.
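The baseline subset semantics can be sketched in a few lines, assuming the file's shape is `{"missing": {describe: [test titles]}}` as shipped (the real file also carries metadata such as a comment and a ts_parity tag; these helper names are illustrative):

```python
import json
from pathlib import Path


def load_missing(baseline_path: Path) -> set[tuple[str, str]]:
    """Read {"missing": {describe: [titles]}} into a set of (describe, title) keys."""
    data = json.loads(baseline_path.read_text())
    return {
        (describe, title)
        for describe, titles in data.get("missing", {}).items()
        for title in titles
    }


def new_misses(
    current: set[tuple[str, str]],
    baseline: set[tuple[str, str]],
    strict: bool,
) -> set[tuple[str, str]]:
    """Baseline mode passes iff current is a subset of baseline; --strict ignores it."""
    return current if strict else current - baseline
```

With an empty baseline, baseline mode and --strict behave identically, which is why shipping `{"missing": {}}` makes the strict CI run the effective default.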

CI (.github/workflows/lint.yml)

Two new steps:

  1. Clone upstream vercel/chat at pinned parity tag — shallow clone of chat@4.26.0 to /tmp/vercel-chat. No continue-on-error — a failed clone aborts the job loudly at the infra layer.
  2. Test fidelity check (strict — zero missing) — runs verify_test_fidelity.py --strict with TS_ROOT=/tmp/vercel-chat. Still continue-on-error: true so its outcome flows through the existing aggregate-failure gate alongside ruff, audit, and pyrefly.

Defense in depth: if the clone ever regresses to continue-on-error: true in a future edit, the script-level guardrail still fails the fidelity step.

Docs

  • docs/UPSTREAM_SYNC.md Test fidelity (strict mode) subsection rewritten — documents strict as the CI default, lists the two infra guardrails, and keeps the baseline-mode procedure for future upstream syncs.
  • CLAUDE.md fidelity paragraph rewritten accordingly.
  • CHANGELOG.md gains a ### CI / Internals entry under 0.4.26.2 describing the wiring, the guardrails, and the empty baseline.

Self-review gaps closed

An earlier iteration of this PR (before rebasing onto the 0.4.26.2 bundle) shipped baseline mode with:

  1. The script silently skipping missing TS files and reporting 0/0 pass — now fails exit 1.
  2. continue-on-error: true on the clone step combined with (1) → failed clone silently passes CI — continue-on-error is now removed from the clone.

Both are fixed in this revision. See the "ci(fidelity): fail script when mapped TS files are missing" and "ci(fidelity): drop continue-on-error on clone, run --strict with empty baseline" commits.

Verification

$ TS_ROOT=/private/tmp/vercel-chat uv run python scripts/verify_test_fidelity.py --strict → exit 0, 588/588 matched, 0 missing
$ TS_ROOT=/tmp/definitely-missing-vercel-chat uv run python scripts/verify_test_fidelity.py → exit 1, "upstream checkout missing" message
$ uv run ruff check src/ tests/ scripts/ → All checks passed
$ uv run ruff format --check src/ tests/ scripts/ → 193 files already formatted
$ uv run pyrefly check → 0 errors
$ uv run python scripts/audit_test_quality.py → Hard failures: 0
$ uv run pytest tests/ --tb=short -q → 3640 passed, 2 skipped

Test plan

  • CI Lint & Type Check workflow: Test fidelity step shows TOTAL: 588/588 matched (100%), 0 missing in strict mode.
  • Simulate missing-clone in a scratch branch (TS_ROOT=/tmp/bogus): confirm CI's Test fidelity step fails with exit 1 and the upstream checkout missing message, and that the aggregate-failure gate fails the job.
  • Simulate failed clone (break the branch name on the clone step in a scratch branch): confirm the job aborts at the clone step, not silently continues.

https://claude.ai/code/session_01WhrgpELQJJSakBnwSNuwGJ


Generated by Claude Code

Summary by CodeRabbit

  • Tests

    • Baseline-driven test-fidelity check with strict mode and baseline update; CI fails on new missing translations by default.
    • Added unit tests for shared table-rendering utilities.
  • CI / Workflows

    • Fidelity check integrated into workflow reporting and final gating; upstream parity clone is required for strict validation.
  • Documentation

    • Expanded guidance for fidelity checks, baseline maintenance, and upstream sync procedures.
  • Packaging

    • Expanded package metadata keywords and classifiers.
  • Refactor

    • Unified table-cell escaping to a shared utility for consistent rendering.

@coderabbitai Bot commented Apr 24, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Adds baseline-driven test-fidelity verification: CI now clones a pinned vercel/chat tag and runs scripts/verify_test_fidelity.py --strict against a new baseline JSON; the verifier gained --strict and --update-baseline modes, baseline staleness guards, and reporting integrated into the workflow summary.
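The "map TS it() → Python test_*()" step in the walkthrough below is presumably a title normalization. A purely hypothetical sketch (the script's actual mapping rules are not shown in this PR):

```python
import re


def ts_title_to_python(it_title: str) -> str:
    """Hypothetical: normalize a TS it() title into a Python test name.

    Lowercases, collapses non-alphanumeric runs to underscores,
    and prefixes test_ so pytest collects it.
    """
    slug = re.sub(r"[^a-z0-9]+", "_", it_title.lower()).strip("_")
    return f"test_{slug}"
```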

Changes

  • Fidelity verifier & baseline (scripts/verify_test_fidelity.py, scripts/fidelity_baseline.json): Introduce baseline persistence and comparison, --strict (fail on new missing mappings) and --update-baseline modes, a stale-baseline guard via UPSTREAM_PARITY, and fail-on-missing-upstream-file behavior.
  • CI integration (.github/workflows/lint.yml): Clone a pinned vercel/chat checkout into /tmp/vercel-chat, run the verifier with TS_ROOT pointed at that clone and --strict, add "Test fidelity" to the GitHub step summary, and include the fidelity outcome in the overall step gate.
  • Docs / guidance (CLAUDE.md, docs/UPSTREAM_SYNC.md): Document strict CI fidelity, baseline semantics, the pinned parity tag (chat@4.26.0), the local non-strict workflow, commands to clone upstream and run the verifier, and baseline regeneration via --update-baseline.
  • Teams table rendering & tests (src/chat_sdk/adapters/teams/format_converter.py, tests/test_cards.py): Replace the local _escape_table_cell with the shared escape_table_cell utility; update call sites and add unit tests for escape_table_cell and render_gfm_table.
  • Packaging metadata (pyproject.toml): Add project.keywords and expand Trove classifiers.
  • Changelog & minor docs (CHANGELOG.md, CLAUDE.md): Record the CI/internals change: strict fidelity enforcement, removal of continue-on-error for the upstream clone, and an initial empty baseline; update fidelity docs.

Sequence Diagram(s)

sequenceDiagram
  participant GH as GitHub Actions (lint)
  participant Repo as Pinned vercel/chat
  participant Script as verify_test_fidelity.py
  participant Baseline as scripts/fidelity_baseline.json
  participant Summary as GitHub Step Summary

  GH->>Repo: git clone (pinned tag) into /tmp/vercel-chat
  GH->>Script: invoke with TS_ROOT=/tmp/vercel-chat --strict
  Script->>Baseline: load baseline (if present)
  Script->>Repo: scan TS test files / it() entries
  Script->>Script: map TS it() → Python test_*()
  alt upstream TS file missing
    Script->>GH: exit non-zero (missing upstream file)
  else compare results
    alt missing entries not in baseline (NEW)
      Script->>GH: mark NEW missing (will fail under --strict)
    else missing entries covered by baseline
      Script->>GH: mark baselined (not failing)
    end
  end
  Script->>Baseline: optionally update baseline if --update-baseline
  Script->>Summary: emit "Test fidelity" row with outcomes

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐇 I hopped through branches, cloned the stream,
Pinned a tag, checked tests, chased each missing dream.
New gaps flagged loud, old ones logged and filed,
CI stays strict while locals run more mild.
A carrot for parity — hop, reconcile!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 31.25%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.

✅ Passed checks (4 passed)

  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title clearly summarizes the main change: wiring a test-fidelity baseline and enforcement into CI via the lint workflow.
  • Linked Issues Check: ✅ Passed. The PR fully implements the acceptance criteria from #53: verify_test_fidelity.py --strict exits non-zero on missing tests, CI invokes strict mode via lint.yml, and existing gaps were handled by shipping an empty baseline after porting.
  • Out of Scope Changes Check: ✅ Passed. All changes are scoped to the test-fidelity implementation: CI workflow updates, script baseline logic, documentation, and supporting metadata updates. No unrelated refactoring or feature additions detected.



@gemini-code-assist Bot left a comment

Code Review

This pull request implements a baseline-enforced ratchet mechanism for test fidelity checks, enabling CI to pass with known missing tests while preventing new regressions. It introduces scripts/fidelity_baseline.json to track missing tests and adds --strict and --update-baseline flags to the verification script. Review feedback recommends centralizing the hardcoded version string to improve maintainability and optimizing the reporting loop by removing redundant set conversions.

Comment thread scripts/verify_test_fidelity.py Outdated
"it. To remove entries: port the TS test to its Python counterpart, "
"then regenerate this file with --update-baseline."
),
"ts_parity": "chat@4.26.0",

medium

The upstream version 4.26.0 is hardcoded here. This value is currently duplicated in at least five places: src/chat_sdk/__init__.py (as UPSTREAM_PARITY), CLAUDE.md, docs/UPSTREAM_SYNC.md, this script, and the generated baseline JSON. To improve maintainability and prevent these from getting out of sync, consider extracting this version dynamically from the source of truth in src/chat_sdk/__init__.py using a regex or by importing it.

if missing:
for describe, ts_name, _py_name in missing[:5]:
print(f" MISSING: [{describe}] {ts_name}")
marker = "NEW" if (describe, ts_name) in set(file_new) else "baselined"

medium

Creating a new set from file_new inside the loop is inefficient as it happens for every missing test. Since baseline_keys is already a set containing the baselined keys for the current file, you can check for membership in it directly (negated) to determine if a miss is 'NEW'.

Suggested change
marker = "NEW" if (describe, ts_name) in set(file_new) else "baselined"
marker = "NEW" if (describe, ts_name) not in baseline_keys else "baselined"

patrick-chinchill marked this pull request as ready for review on April 24, 2026 02:42

@chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d0a545f922

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

run: |
failures=0
for var in RUFF_CHECK RUFF_FORMAT AUDIT PYREFLY; do
for var in RUFF_CHECK RUFF_FORMAT AUDIT FIDELITY PYREFLY; do

P1: Include upstream clone outcome in failure gate

Add the clone step to the aggregate failure loop; otherwise a transient git clone failure (network outage, rate limit, tag fetch issue) does not fail this job. In that case TS_ROOT is missing, verify_test_fidelity.py skips all TS files and exits 0, so the new fidelity protection is silently bypassed while the workflow still reports success.


@coderabbitai Bot left a comment

🧹 Nitpick comments (2)
src/chat_sdk/adapters/teams/format_converter.py (1)

191-200: Consider delegating full table rendering to render_gfm_table.

_table_to_gfm still duplicates table line assembly logic. Reusing render_gfm_table would keep output behavior centralized and reduce drift risk.

♻️ Proposed refactor
-from chat_sdk.shared.card_utils import escape_table_cell
+from chat_sdk.shared.card_utils import render_gfm_table
...
-        lines: list[str] = []
-        # Header row
-        lines.append(f"| {' | '.join(escape_table_cell(c) for c in rows[0])} |")
-        # Separator
-        separators = ["---"] * len(rows[0])
-        lines.append(f"| {' | '.join(separators)} |")
-        # Data rows
-        for row in rows[1:]:
-            lines.append(f"| {' | '.join(escape_table_cell(c) for c in row)} |")
-        return "\n".join(lines)
+        headers = rows[0]
+        data_rows = rows[1:]
+        return "\n".join(render_gfm_table(headers, data_rows))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/chat_sdk/adapters/teams/format_converter.py` around lines 191 - 200, The
function _table_to_gfm duplicates table assembly logic; replace its body to
delegate to the central render_gfm_table renderer instead of rebuilding lines
manually: locate _table_to_gfm and have it call and return
render_gfm_table(rows) (or adapt rows to the expected input of
render_gfm_table), removing the duplicated header/separator/data assembly so all
table formatting is centralized in render_gfm_table and escape_table_cell
behavior remains consistent.
tests/test_cards.py (1)

165-210: Add one Teams converter integration assertion for table escaping.

Current additions validate shared helpers, but not the adapter wiring path. A small TeamsFormatConverter._table_to_gfm-level test would protect against future regressions where the converter stops using shared escaping.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_cards.py` around lines 165 - 210, Add a test that exercises
TeamsFormatConverter._table_to_gfm to assert it uses the shared
escaping/rendering helpers: call TeamsFormatConverter._table_to_gfm (or the
public method that invokes it) with headers and cells containing pipes,
backslashes, and newlines and assert the output contains the escaped pipe
(r"\\|") and collapsed newlines as produced by
escape_table_cell/render_gfm_table; this ensures the adapter wiring still
delegates to shared.card_utils functions (reference
TeamsFormatConverter._table_to_gfm, escape_table_cell, render_gfm_table).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 4cd4131b-d855-4cc0-bbd1-842eeaf6651e

📥 Commits

Reviewing files that changed from the base of the PR and between d0a545f and c26cd97.

📒 Files selected for processing (3)
  • pyproject.toml
  • src/chat_sdk/adapters/teams/format_converter.py
  • tests/test_cards.py
✅ Files skipped from review due to trivial changes (1)
  • pyproject.toml

@patrick-chinchill (Collaborator, Author)

Review verdict: needs changes. Refreshed and reviewed latest head c26cd97 against main. The baseline passes with a fresh upstream clone pinned to chat@4.26.0, but fidelity enforcement can silently pass when the upstream TS root is missing: .github/workflows/lint.yml lines 60-72 make the clone continue-on-error, and scripts/verify_test_fidelity.py lines 280-284 skips every mapped TS file and exits 0. Reproduced with TS_ROOT=/tmp/definitely-missing-vercel-chat python3 scripts/verify_test_fidelity.py, which reported all TS files skipped and exited 0. Make the clone required or make the script fail when mapped TS files are unavailable.

@patrick-chinchill (Collaborator, Author)

Valid — reproduced locally that TS_ROOT=/tmp/nonexistent exits 0 with "skipped" for every mapped file. The continue-on-error on the clone + the skip-on-missing in the script combine to a silent pass. Fix should either (a) fail the clone step with continue-on-error: false, (b) make the script fail when ≥N mapped TS files are missing, or (c) both. Not blocking the 0.4.26.2 release bundle but worth pinning down before we trust the baseline check.

claude and others added 6 commits April 24, 2026 03:24
Verify-test-fidelity ran out-of-band with no CI invocation, so missing
upstream-TS-test translations drifted undetected (16 gaps at 4.26.0,
40 against upstream main). Now:

- Default mode is baseline-enforced via `scripts/fidelity_baseline.json`.
  CI fails on new misses outside the baseline; accepts baselined misses
  as-is. Fixed misses are reported with a nudge to tighten the baseline.
- `--strict` ignores the baseline (eventual target once baseline hits 0).
- `--update-baseline` regenerates the baseline file.
- Initial baseline captures the 16 `[post with Plan]` gaps tracked in #55,
  pinned to `chat@4.26.0`.
- lint.yml clones `vercel/chat@4.26.0` and runs the check; aggregates into
  the existing "fail if any step failed" gate.

CLAUDE.md + docs/UPSTREAM_SYNC.md document the workflow.

https://claude.ai/code/session_01WhrgpELQJJSakBnwSNuwGJ
PyPI search has no signal today — keywords field was absent, classifiers
covered only Chat + Typed. Added 12 keywords covering high-volume search
shapes (generic, per-platform, asyncio, vercel) and 4 classifiers
(Communications parent, Internet, Libraries :: Application Frameworks,
Libraries :: Python Modules) matching patterns used by slack-sdk,
python-telegram-bot, discord.py, errbot.

No runtime change; hatchling build output is unaffected.

https://claude.ai/code/session_01WhrgpELQJJSakBnwSNuwGJ
teams/format_converter.py had a byte-identical private copy of
shared/card_utils.escape_table_cell. Removed the copy, imported the
shared function, renamed the 2 call sites. Also added direct unit
tests for shared.card_utils.escape_table_cell + render_gfm_table —
both were previously covered only transitively through adapter tests.

First sub-PR of #70 (shared-helper consolidation). Scope check: most
of #70's five planned helpers already live in shared/card_utils.py —
only escape_table_cell (this PR) and button-style mapping (Teams +
Discord, remaining work) are still duplicated per-adapter.

https://claude.ai/code/session_01WhrgpELQJJSakBnwSNuwGJ
Previously `verify_test_fidelity.py` printed "SKIPPED (file not found)"
for any mapped TS test whose source didn't exist under `TS_ROOT`, then
summed 0 matches + 0 missing and exited 0 with "All TS tests have
Python equivalents." Combined with `continue-on-error: true` on the
upstream-clone step, that made a silently-failing clone report
"fidelity check passed" in CI.

Now the script tracks missing-TS-file hits separately from real
successes. If any mapped TS file is absent at end-of-run, the script
prints a clear "upstream checkout missing — cannot verify fidelity"
message naming every missing path, includes the clone command hint,
and exits 1. This fires before strict/baseline/update-baseline
success branches so no mode can accidentally mask it.

Reproducer:
  TS_ROOT=/tmp/definitely-missing uv run python scripts/verify_test_fidelity.py
Before: exit 0.  After: exit 1 with infra-level error message.

Closes self-review gap #1 on #72.
…y baseline

Two infra-level changes to the fidelity check:

1. `.github/workflows/lint.yml` — the `Clone upstream vercel/chat at
   pinned parity tag` step no longer carries `continue-on-error: true`.
   The clone is infrastructure the fidelity check depends on; if it
   fails, the job should fail there rather than swallow the failure
   and hope a later step catches it. Combined with the script-level
   guard from the previous commit, this is defense in depth.

2. CI now runs `scripts/verify_test_fidelity.py --strict`. Every
   `[post with Plan]` test that was baselined is now ported (PR #75
   and PR #74 in the 0.4.26.2 bundle), so the repo ships at 0 missing.
   `scripts/fidelity_baseline.json` is reduced to `{"missing": {}}`
   (metadata retained so `--update-baseline` and the documented
   workflow still function for future upstream syncs).

Closes self-review gap #2 on #72.
- CLAUDE.md fidelity section: document that CI runs `--strict` and that
  the script now fails loudly on a missing upstream checkout. Baseline
  mode is retained but described as the local-dev fallback rather than
  the CI default.
- docs/UPSTREAM_SYNC.md: same update, plus clarify the two infra
  guardrails (no `continue-on-error` on the clone, script-level
  missing-TS-file check) and rewrite the workflow table for strict
  mode as the default.
- CHANGELOG.md: add a `### CI / Internals` subsection under the
  unreleased 0.4.26.2 entry describing the fidelity-check wiring,
  the two infra guardrails, and the empty baseline. Closes #53, #72.
patrick-chinchill force-pushed the claude/compare-chat-py-port-eh67x branch from c26cd97 to 404bf3e on April 24, 2026 10:28
@coderabbitai Bot left a comment

Actionable comments posted: 4

🧹 Nitpick comments (3)
src/chat_sdk/adapters/teams/format_converter.py (1)

191-200: Prefer shared table renderer to avoid duplicate formatting logic.

Lines 191-200 still reimplement behavior already provided by render_gfm_table in src/chat_sdk/shared/card_utils.py.

♻️ Proposed refactor
-from chat_sdk.shared.card_utils import escape_table_cell
+from chat_sdk.shared.card_utils import render_gfm_table
@@
-        lines: list[str] = []
-        # Header row
-        lines.append(f"| {' | '.join(escape_table_cell(c) for c in rows[0])} |")
-        # Separator
-        separators = ["---"] * len(rows[0])
-        lines.append(f"| {' | '.join(separators)} |")
-        # Data rows
-        for row in rows[1:]:
-            lines.append(f"| {' | '.join(escape_table_cell(c) for c in row)} |")
-        return "\n".join(lines)
+        return "\n".join(render_gfm_table(rows[0], rows[1:]))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/chat_sdk/adapters/teams/format_converter.py` around lines 191 - 200, This
block duplicates table rendering logic; replace the manual construction using
escape_table_cell and join with a call to the shared helper render_gfm_table
from src/chat_sdk/shared/card_utils.py: import render_gfm_table at the top of
src/chat_sdk/adapters/teams/format_converter.py and call render_gfm_table(rows)
(or pass the same `rows` variable) instead of building `lines` and joining them;
remove the duplicated separators/code around the for-loop and ensure any
existing escape_table_cell usage is not redundantly applied before calling
render_gfm_table.
tests/test_cards.py (1)

202-206: Consider exact-line assertions for stronger regression detection.

Lines 204-205 currently use substring checks; exact row checks would also catch delimiter/spacing regressions.

🔍 Suggested test tightening
     def test_cells_are_escaped(self):
         lines = render_gfm_table(["col"], [["pipe|inside"], ["has\nnewline"]])
-        assert r"pipe\|inside" in lines[2]
-        assert "has newline" in lines[3]
+        assert lines[2] == r"| pipe\|inside |"
+        assert lines[3] == "| has newline |"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_cards.py` around lines 202 - 206, Update the
test_cells_are_escaped to assert exact rendered table rows rather than
substrings: locate the test_cells_are_escaped function and change the two
substring assertions against lines[2] and lines[3] to full-line equality checks
that match the exact GFM table row formatting produced by render_gfm_table
(including pipe delimiters, spacing, and the escaped pipe backslash for
"pipe|inside" and the newline-normalized form for "has\nnewline"). Ensure the
expected strings reflect the exact characters render_gfm_table outputs (e.g.,
escaped pipes and single-space cell separation) so the test will fail on
delimiter/spacing regressions.
.github/workflows/lint.yml (1)

81-92: Surface clone_upstream failures explicitly in the summary.

If the clone step fails, fidelity is skipped and the summary currently points at the downstream symptom instead of the root cause. Adding steps.clone_upstream.outcome to the table makes the new infra guardrail much easier to triage.

Suggested tweak
           echo "| Ruff check | ${{ steps.ruff_check.outcome }} |" >> $GITHUB_STEP_SUMMARY
           echo "| Ruff format | ${{ steps.ruff_format.outcome }} |" >> $GITHUB_STEP_SUMMARY
           echo "| Test audit | ${{ steps.audit.outcome }} |" >> $GITHUB_STEP_SUMMARY
+          echo "| Clone upstream | ${{ steps.clone_upstream.outcome }} |" >> $GITHUB_STEP_SUMMARY
           echo "| Test fidelity | ${{ steps.fidelity.outcome }} |" >> $GITHUB_STEP_SUMMARY
           echo "| Pyrefly | ${{ steps.pyrefly.outcome }} |" >> $GITHUB_STEP_SUMMARY

Also applies to: 100-117

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/lint.yml around lines 81 - 92, Add an explicit row for the
clone step outcome to the GitHub Actions summary so upstream clone failures are
visible: in the "Report results" step (the block that writes to
$GITHUB_STEP_SUMMARY and currently echoes rows for ruff_check, ruff_format,
audit, fidelity, pyrefly) add an entry that echoes the value of
steps.clone_upstream.outcome (e.g., echo "| Clone upstream | ${{
steps.clone_upstream.outcome }} |") and make the same change in the duplicate
summary block later in the file (the second report block covering lines
~100-117) so both summary places show clone_upstream outcome for easier triage.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@CLAUDE.md`:
- Around line 107-114: Add an explicit local upstream checkout step to the
validation instructions so users know how to create /tmp/vercel-chat before
running scripts; update the CLAUDE.md validation command block that references
verify_test_fidelity.py to include a note and the checkout command (e.g., clone
or checkout the upstream repo into /tmp/vercel-chat), and mention that
verify_test_fidelity.py will fail if the mapped TS files under /tmp/vercel-chat
are missing so readers run the checkout first.

In `@scripts/verify_test_fidelity.py`:
- Around line 255-260: In main(), add an early mutual-exclusion check that
rejects using --strict together with --update-baseline: if both strict_mode and
update_baseline are true, print a clear error/banner and exit with non-zero
status; ensure this guard appears before baseline is loaded or possibly
rewritten (around where baseline is set) and similarly add the same
mutually-exclusive check before any later branches that handle update_baseline
(references: main, strict_mode, update_baseline, baseline) so the script never
proceeds to rewrite the baseline when strict mode was requested (also apply the
same guard to the other locations noted near the second and third branches).
- Around line 14-19: Update the module docstring at the top of
verify_test_fidelity.py and the generated baseline comment text (the string used
when writing scripts/fidelity_baseline.json; see the code that composes or
writes the baseline header around lines ~233-240) to reflect the current CI
contract: CI already runs with --strict, so the baseline is advisory/history
only and regenerating it does not change CI behavior; change wording to state
that default/baseline mode is for local tracking and that CI fails unless
--strict is explicitly relaxed locally. Ensure both the top-level docstring and
the generated baseline comment use the new phrasing consistently.
- Around line 218-227: load_baseline currently ignores the JSON's ts_parity
field. Read data.get("ts_parity") and validate it against the expected parity
constant (e.g., EXPECTED_TS_PARITY) before applying the "missing" entries; on
a mismatch, fail fast with an informative exception, or return an empty
baseline and log an error, so a stale baseline is never applied to a newer
upstream checkout. Apply the same parity check in the other baseline-loading
path referenced in the later block.

---

Nitpick comments:
In @.github/workflows/lint.yml:
- Around line 81-92: Add an explicit row for the clone step outcome to the
GitHub Actions summary so upstream clone failures are visible. In the "Report
results" step (the block that writes rows for ruff_check, ruff_format, audit,
fidelity, and pyrefly to $GITHUB_STEP_SUMMARY), add an entry that echoes
steps.clone_upstream.outcome (e.g., echo "| Clone upstream |
${{ steps.clone_upstream.outcome }} |"), and make the same change in the
duplicate summary block later in the file (the second report block covering
lines ~100-117) so both summaries show the clone outcome for easier triage.

In `@src/chat_sdk/adapters/teams/format_converter.py`:
- Around line 191-200: This block duplicates table-rendering logic. Import
render_gfm_table from src/chat_sdk/shared/card_utils.py at the top of
src/chat_sdk/adapters/teams/format_converter.py and call
render_gfm_table(rows) (passing the same `rows` variable) instead of building
`lines` manually with escape_table_cell and joining them; remove the
duplicated separator handling around the for-loop, and make sure
escape_table_cell is not applied redundantly before the call.
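A sketch of what the shared helper could look like. The names render_gfm_table and escape_table_cell come from the comment above; the exact escaping behavior shown here (backslash-escaped pipes, newlines collapsed to spaces) is an assumption, not the confirmed card_utils implementation.

```python
# Hypothetical reconstruction of the shared GFM-table helper.
def escape_table_cell(text: str) -> str:
    """Escape pipes and flatten newlines so a cell cannot break its row."""
    return text.replace("|", "\\|").replace("\n", " ")


def render_gfm_table(rows: list[list[str]]) -> str:
    """Render rows as a GFM table; the first row is treated as the header."""
    if not rows:
        return ""
    header, *body = rows
    lines = ["| " + " | ".join(escape_table_cell(c) for c in header) + " |"]
    # GFM requires a delimiter row between the header and the body.
    lines.append("|" + "|".join(" --- " for _ in header) + "|")
    for row in body:
        lines.append("| " + " | ".join(escape_table_cell(c) for c in row) + " |")
    return "\n".join(lines)
```

Centralizing the escaping inside the renderer is what makes the "not redundantly applied" caveat matter: callers should hand over raw cell text.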

In `@tests/test_cards.py`:
- Around line 202-206: Update test_cells_are_escaped to assert exact rendered
table rows rather than substrings: change the two substring assertions against
lines[2] and lines[3] to full-line equality checks that match the exact GFM
row formatting produced by render_gfm_table, including pipe delimiters,
spacing, the escaped pipe for "pipe|inside", and the newline-normalized form
of "has\nnewline". Make the expected strings reflect the exact characters
render_gfm_table outputs (e.g., escaped pipes and single-space cell
separation) so the test fails on delimiter or spacing regressions.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: b1c510c3-d12e-4f9f-9f6e-d51ed8f71fbe

📥 Commits

Reviewing files that changed from the base of the PR and between c26cd97 and 404bf3e.

📒 Files selected for processing (9)
  • .github/workflows/lint.yml
  • CHANGELOG.md
  • CLAUDE.md
  • docs/UPSTREAM_SYNC.md
  • pyproject.toml
  • scripts/fidelity_baseline.json
  • scripts/verify_test_fidelity.py
  • src/chat_sdk/adapters/teams/format_converter.py
  • tests/test_cards.py
✅ Files skipped from review due to trivial changes (3)
  • scripts/fidelity_baseline.json
  • pyproject.toml
  • docs/UPSTREAM_SYNC.md

Comment thread CLAUDE.md Outdated
Comment thread scripts/verify_test_fidelity.py Outdated
Comment thread scripts/verify_test_fidelity.py
Comment thread scripts/verify_test_fidelity.py
- CLAUDE.md: add local clone command to the quick reference so the
  "fails if missing" rule is actionable
- verify_test_fidelity.py docstring/baseline-comment: reflect strict-mode
  CI contract (was still describing baseline mode)
- load_baseline: validate ts_parity against UPSTREAM_PARITY so a stale
  baseline can't silently mask upstream drift
- argparse: reject --strict + --update-baseline together (exit 2)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/verify_test_fidelity.py`:
- Line 282: The baseline writer currently hardcodes the parity string
("chat@4.26.0"), causing parity mismatches with load_baseline(); update
write_baseline() to use the same parity resolution as load_baseline(), i.e.,
_current_parity_tag() or UPSTREAM_PARITY (the dynamic source), to populate the
"ts_parity" field so written baselines carry the current parity tag rather
than a hardcoded version.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 213ee06a-615f-4fb4-a3fb-48939d721599

📥 Commits

Reviewing files that changed from the base of the PR and between 404bf3e and 9400d31.

📒 Files selected for processing (2)
  • CLAUDE.md
  • scripts/verify_test_fidelity.py

Comment thread scripts/verify_test_fidelity.py Outdated
… claim

Addresses 2 medium findings from local adversarial review on #72:

- write_baseline: preserve existing _comment rather than overwriting
  with boilerplate on each --update-baseline run.
- CHANGELOG / CLAUDE.md / fidelity_baseline.json: "strict fidelity"
  now qualified as "strict fidelity for mapped core files" since
  MAPPING covers 8 of 17 packages/chat/src/*.test.ts files.

Follow-ups filed for (a) MAPPING expansion to full core-package scope,
(b) pinning upstream clone SHA in lint.yml, (c) tightening fuzzy_match
against hyphen-stripping.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
scripts/verify_test_fidelity.py (1)

299-302: ⚠️ Potential issue | 🟠 Major

Stop hardcoding the upstream parity tag in runtime output.

load_baseline() now validates against _current_parity_tag(), but write_baseline() still emits "chat@4.26.0" and the recovery hint tells users to clone that same fixed tag. After the next UPSTREAM_PARITY bump, --update-baseline will write a baseline that immediately fails baseline mode, and the error output will point at the wrong checkout.

♻️ Proposed fix
 def write_baseline(path: Path, all_missing: dict[str, list], total_ts: int) -> None:
@@
-    payload = {
+    parity = _current_parity_tag()
+    if parity is None:
+        raise RuntimeError("Could not determine UPSTREAM_PARITY for baseline write.")
+
+    payload = {
         "_comment": existing_comment if existing_comment is not None else _DEFAULT_BASELINE_COMMENT,
-        "ts_parity": "chat@4.26.0",
+        "ts_parity": parity,
         "total_ts_tests": total_ts,
         "total_missing": sum(len(v) for v in all_missing.values()),
         "missing": {
@@
     if missing_ts_files:
+        parity = _current_parity_tag() or "chat@4.26.0"
         print(
             f"\nupstream checkout missing — cannot verify fidelity. "
             f"{len(missing_ts_files)} mapped TS file(s) not found under TS_ROOT={TS_ROOT!r}:"
@@
         print(
             "\nClone the upstream repo at the pinned parity tag, e.g.:\n"
-            "  git clone --depth 1 --branch chat@4.26.0 "
+            f"  git clone --depth 1 --branch {parity} "
             "https://github.com/vercel/chat.git /tmp/vercel-chat\n"
             "then re-run with TS_ROOT=/tmp/vercel-chat."
         )

Also applies to: 429-431

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/verify_test_fidelity.py` around lines 299-302: the payload
currently hardcodes "chat@4.26.0", which will drift from the runtime check.
Replace the literal in the payload construction, and in any recovery hint text
emitted by write_baseline(), with the dynamic value returned by
_current_parity_tag(), and have write_baseline() use the same
_current_parity_tag() call as load_baseline() so --update-baseline writes a
baseline that validates against the current parity tag and the recovery hint
points at the correct checkout.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/UPSTREAM_SYNC.md`:
- Around line 98-100: The documentation row implies a file-scoped escape hatch
that doesn't exist; verify_test_fidelity.py and lint.yml only toggle fidelity
repo-wide. Replace "switch the workflow back to non-strict default for that
file" with the two real options: keep CI strict (run with --strict and port
the new TypeScript tests), or intentionally relax the repo-wide fidelity
invocation in a separate commit (e.g., drop --strict from the lint.yml call or
run --update-baseline).

---

Duplicate comments:
In `@scripts/verify_test_fidelity.py`:
- Around line 299-302: The payload currently hardcodes "chat@4.26.0", which
will drift from the runtime check. Replace the literal in the payload
construction, and in any recovery hint text emitted by write_baseline(), with
the dynamic value returned by _current_parity_tag(), and have write_baseline()
use the same _current_parity_tag() call as load_baseline() so
--update-baseline writes a baseline that validates against the current parity
tag and the recovery hint points at the correct checkout.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: f2eb2343-dd23-4fac-a577-1b8ed673f9a3

📥 Commits

Reviewing files that changed from the base of the PR and between 9400d31 and 7b27e2d.

📒 Files selected for processing (5)
  • CHANGELOG.md
  • CLAUDE.md
  • docs/UPSTREAM_SYNC.md
  • scripts/fidelity_baseline.json
  • scripts/verify_test_fidelity.py
✅ Files skipped from review due to trivial changes (1)
  • scripts/fidelity_baseline.json

Comment thread docs/UPSTREAM_SYNC.md
patrick-chinchill and others added 2 commits April 24, 2026 10:49
Round-2 local review caught 3 spots the round-1 sweep missed:
- script module docstring still said "zero missing against chat@4.26.0"
  unqualified (renders in --help and source)
- lint.yml step title "Test fidelity check (strict — zero missing)"
  renders unqualified in the Actions UI
- CHANGELOG entry had "Closes #53, #72"; a PR can't close itself

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… default comment

Round-3 local review caught two gaps:

- write_baseline hardcoded ts_parity="chat@4.26.0". Would self-trap on the
  next upstream sync (regenerated baseline carries stale tag -> next run
  fails with re-run-update-baseline advice user just ran). Now reads
  _current_parity_tag().
- _DEFAULT_BASELINE_COMMENT didn't mention scope framing. Every other
  user-facing surface carries the "mapped core files / 8 of 17 / MAPPING"
  language; rm-ing the baseline and regenerating silently dropped it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@patrick-chinchill patrick-chinchill merged commit c36c8aa into main Apr 24, 2026
10 checks passed
@patrick-chinchill patrick-chinchill deleted the claude/compare-chat-py-port-eh67x branch April 24, 2026 20:21

Development

Successfully merging this pull request may close these issues.

Test fidelity check should fail CI when upstream tests aren't ported