fix(adapters): bug-fix sweep from upstream 4.27.0 (slack/discord/telegram) #89
patrick-chinchill wants to merge 7 commits into main
Bundles 5 small upstream bug fixes into one PR. Each is independent and covered by a regression test.
- vercel/chat#394 (slack): preserve email addresses in @mention regex. ``user@domain.com`` no longer extracts ``@domain`` as a mention.
- vercel/chat#292 (slack): guard Slack API calls against empty ``thread_ts`` to fix ``invalid_thread_ts`` errors.
- vercel/chat#256 (discord): remove duplicate text when posting card messages. ``content`` is omitted on the create path and explicitly cleared on the edit path (Discord PATCH preserves omitted fields).
- vercel/chat#395 (slack): enrich link previews with unfurl metadata from attachments. Routes ``message_changed`` events through a new ``_handle_message_changed`` so the message handler sees unfurled links instead of bare URLs. New cache + poll window.
- vercel/chat#407 (telegram): rewrite format converter to emit MarkdownV2 (``*bold*`` etc.) instead of legacy ``Markdown``. Adds proper escaping for the 18 special characters MarkdownV2 reserves in normal text, the narrower set inside code blocks, and the parens/backslash escape inside link URLs.
https://claude.ai/code/session_01FyMxQn2BEAzmwKS1GZczKj
Address CodeQL alert on PR #89: ``"wikipedia.org" in result`` matches an arbitrary substring and tripped the URL-substring-sanitization heuristic. Replace with a longer anchored fragment ``"https://en.wikipedia.org/wiki/Foo_"`` which both passes CodeQL and is a stronger render assertion. This is test-only — no behavior change, no security boundary. https://claude.ai/code/session_01FyMxQn2BEAzmwKS1GZczKj
Code Review
This pull request introduces several enhancements and bug fixes across the Discord, Slack, and Telegram adapters. Key changes include preventing duplicate text in Discord card messages by omitting fallback content, implementing asynchronous link unfurl metadata enrichment for Slack by handling message_changed events, and migrating the Telegram adapter from legacy Markdown to MarkdownV2 with a new format converter and comprehensive escaping logic. Additionally, it fixes a Slack regex issue that incorrectly identified email addresses as mentions and adds validation for thread_ts in Slack streaming. Review feedback suggests adding logging for state backend failures during Slack link enrichment to improve debuggability.
    except Exception:
        return links
While returning the original links is a safe fallback, swallowing the exception silently can make it difficult to debug issues with the state backend. It would be beneficial to log the exception to provide visibility into why link enrichment might be failing.
-    except Exception:
-        return links
+    except Exception as exc:
+        self._logger.warn("Failed to get unfurl data from state", {"error": exc, "message_ts": message_ts})
+        return links
Address gemini-code-assist review on PR #89 (line 1844). The previous ``except Exception: return links`` swallowed state-backend errors silently, making it hard to debug why link enrichment isn't firing. Log a warning with the exception and message_ts before returning the fallback. Behavior unchanged otherwise — bare URLs still surface to handlers when state reads fail. https://claude.ai/code/session_01FyMxQn2BEAzmwKS1GZczKj
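A minimal sketch of the log-then-fall-back pattern this commit describes, assuming a standard-library `logging` logger and an async `state.get` call. The names `BrokenState`, `read_unfurls_or_fallback`, and the cache-key format are hypothetical and only illustrate the shape; they are not the adapter's actual code. The sketch calls `Logger.warning` because `Logger.warn` is a deprecated alias in the standard library (the SDK's own logger interface may differ).

```python
import asyncio
import logging

logger = logging.getLogger("slack.unfurl")


class BrokenState:
    """Hypothetical state backend whose reads always fail."""

    async def get(self, key):
        raise ConnectionError("state backend unavailable")


async def read_unfurls_or_fallback(state, message_ts, links):
    """Return cached unfurl data, or the original links if the read fails."""
    try:
        # The key format is an assumption made for this example.
        cached = await state.get(f"unfurl:{message_ts}")
    except Exception as exc:
        # Log before falling back so state-backend failures stay visible.
        logger.warning(
            "Failed to get unfurl data from state: %r (message_ts=%s)",
            exc,
            message_ts,
        )
        return links
    return cached if cached is not None else links


links = ["https://example.com"]
result = asyncio.run(
    read_unfurls_or_fallback(BrokenState(), "1700000000.000100", links)
)
```

The caller still receives the bare links when the read fails, so handler behaviour is unchanged; only the visibility of the failure improves.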
patrick-chinchill left a comment
Review of bug-fix sweep from upstream chat@4.27.0
Read PR + 2 follow-up commits (3bd5149 CodeQL test fix, 0de162c unfurl logging) against upstream f55378a (chat@4.27.0) commits c26ee6c, 53c6b68, 7e5b447, ded6f78, b9a1961. Applied porting hazards (docs/UPSTREAM_SYNC.md) + adversarial checks (docs/SELF_REVIEW.md). Ran the touched test files locally — all 443 pass.
1. Slack @mention regex (vercel/chat#394) — ✅ Looks good
Python regex (?<![<\w])@(\w+) is character-for-character identical to TS BARE_MENTION_REGEX. Hoisted to module-level constant matching upstream. Both call sites updated. Six regression tests cover email-not-mangled, mailto-preserved, leading @user still wrapped, @@user adversarial, dotted-subdomain emails. Test docstrings include "What to fix if this fails:" line per CLAUDE.md principle #8.
🔵 Nit: @user@example.com (mention immediately followed by an email-shaped tail) is not in the test sweep. Verified manually: produces <@user>@example.com (the @user is wrapped, the email tail is preserved). That's parity with upstream TS but worth pinning a test on.
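To make the lookbehind concrete, here is a small sketch built around the regex quoted above. The wrapper name `wrap_bare_mentions` is hypothetical; only the pattern itself comes from the review.

```python
import re

# Pattern quoted in the review: an "@" that is NOT preceded by "<" or a word
# character, followed by one or more word characters.
BARE_MENTION_REGEX = re.compile(r"(?<![<\w])@(\w+)")


def wrap_bare_mentions(text: str) -> str:
    """Wrap bare @mentions as <@name>, leaving emails and <@...> untouched."""
    return BARE_MENTION_REGEX.sub(r"<@\1>", text)


print(wrap_bare_mentions("mail user@domain.com"))   # email left alone
print(wrap_bare_mentions("@user hello"))            # leading mention wrapped
print(wrap_bare_mentions("@user@example.com"))      # mention wrapped, tail kept
```

The `@` in an email address is always preceded by a word character, so the negative lookbehind rejects it without needing any email-specific logic.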
2. Slack empty thread_ts guard (vercel/chat#292) — ✅ Looks good
thread_ts or None normalization at every Slack API call site (post_message, post_ephemeral, schedule_message, upload_files, start_typing). stream correctly raises ValidationError early. Adversarial thread_ts == "0" test is present and passes ("0" is truthy in Python so it survives the or None collapse — the test pins this behavior). Truthiness hazard #1 from UPSTREAM_SYNC.md correctly applied.
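The `thread_ts or None` collapse can be sketched as follows. The helper names `normalize_thread_ts` and `build_post_kwargs` are hypothetical, and omitting the key entirely when there is no thread is a choice made for this example rather than something the review confirms.

```python
from typing import Any, Dict, Optional


def normalize_thread_ts(thread_ts: Optional[str]) -> Optional[str]:
    """Collapse empty or missing thread_ts to None; keep any non-empty string."""
    return thread_ts or None


def build_post_kwargs(
    channel: str, text: str, thread_ts: Optional[str] = None
) -> Dict[str, Any]:
    """Build keyword arguments for a post call, omitting an empty thread_ts."""
    kwargs: Dict[str, Any] = {"channel": channel, "text": text}
    ts = normalize_thread_ts(thread_ts)
    if ts is not None:
        kwargs["thread_ts"] = ts
    return kwargs
```

Because `"0"` is a non-empty string, it is truthy and survives the collapse, which is the behaviour the adversarial test pins.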
3. Discord card text dedup (vercel/chat#256) — ✅ Looks good
Both Python sites updated:
- `post_message`: omits `content` when card present (parity with TS).
- `edit_message`: explicitly sets `content = ""` (parity with TS; comment correctly notes "Discord PATCH preserves omitted fields").
The third TS site (postChannelMessage) is not ported — confirmed the Python Discord adapter has no post_channel_message method (declared on Adapter protocol but unimplemented for Discord; only Google Chat / Telegram implement it). This is correct, not a port gap. Tests cover create, edit, and the "card with no text content" adversarial. The assert "content" in payload check on edit is the load-bearing assertion — correct.
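The create/edit asymmetry can be illustrated with two payload builders. This is a sketch under assumptions: the function names and the plain-dict payload shape are hypothetical, not the adapter's real helpers.

```python
from typing import Any, Dict, List, Optional


def build_create_payload(
    text: str, embeds: Optional[List[Dict[str, Any]]]
) -> Dict[str, Any]:
    """Create path: when a card is present, leave `content` out entirely."""
    if embeds:
        return {"embeds": embeds}
    return {"content": text}


def build_edit_payload(
    text: str, embeds: Optional[List[Dict[str, Any]]]
) -> Dict[str, Any]:
    """Edit path: PATCH keeps omitted fields, so clear `content` explicitly."""
    if embeds:
        return {"content": "", "embeds": embeds}
    return {"content": text}


card = [{"title": "Deploy finished"}]
created = build_create_payload("Deploy finished", card)
edited = build_edit_payload("Deploy finished", card)
```

On create, omitting the key is enough. On edit, the empty string is what removes text left over from an earlier version of the message, which is why the `"content" in payload` assertion matters there.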
4. Slack link unfurl enrichment (vercel/chat#395) — 🟡 Medium + ✅
Faithful port of _handle_message_changed, _extract_links (inline attachments), _enrich_links (poll loop with 2000ms / 150ms cadence), 1h cache TTL. Trailing-slash normalization in both directions. Async write in _handle_message_changed is correctly tracked via create_task + add_done_callback (hazard #5 applied). Follow-up commit 0de162c added the requested state-backend error logging in _enrich_links — verified at src/chat_sdk/adapters/slack/adapter.py:1843-1847.
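The `create_task` + `add_done_callback` tracking mentioned above usually looks something like the sketch below. The names `spawn_tracked` and `_background_tasks` are hypothetical; the point is only that the task is kept referenced until it finishes and that its exception is surfaced rather than dropped.

```python
import asyncio
import logging

logger = logging.getLogger("slack.adapter")

# Strong references keep pending tasks from being garbage-collected.
_background_tasks = set()


def _on_done(task: asyncio.Task) -> None:
    """Drop the reference and log any exception the task raised."""
    _background_tasks.discard(task)
    if task.cancelled():
        return
    exc = task.exception()
    if exc is not None:
        logger.warning("background cache write failed: %r", exc)


def spawn_tracked(coro) -> asyncio.Task:
    """Schedule a coroutine as a tracked, error-handled background task."""
    task = asyncio.create_task(coro)
    _background_tasks.add(task)
    task.add_done_callback(_on_done)
    return task


async def main():
    cache = {}

    async def write_unfurl():
        cache["unfurl"] = {"title": "Example"}

    task = spawn_tracked(write_unfurl())
    await task
    await asyncio.sleep(0)  # give done callbacks a chance to run
    return cache, len(_background_tasks)


cache, pending = asyncio.run(main())
```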
🟡 Medium — merge precedence diverges from TS: _merge_unfurl_into_preview (adapter.py:1748-1755) preserves preview.<field> if non-None, falling back to unfurl. TS does { ...link, ...unfurl } — unfurl OVERRIDES the link's existing fields (except title, which is short-circuited above). For description / image_url / site_name, behavior diverges when the preview already has a value: Python keeps the preview value, TS overwrites with the (possibly fresher) unfurl. In practice rare since _create_link_preview returns blank fields, but it's a silent semantic divergence not flagged in UPSTREAM_SYNC.md non-parity table. Either match TS spread-overwrite semantics, or document the divergence + add a regression test.
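The two precedence rules can be compared side by side. This is an illustrative sketch: the `LinkPreview` dataclass and both function names are hypothetical, and `title` is left out because the review says it is handled separately by a short-circuit.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class LinkPreview:
    """Hypothetical preview shape used only for this example."""

    url: str
    title: Optional[str] = None
    description: Optional[str] = None
    image_url: Optional[str] = None
    site_name: Optional[str] = None


_MERGED_FIELDS = ("description", "image_url", "site_name")


def merge_preview_wins(preview: LinkPreview, unfurl: LinkPreview) -> LinkPreview:
    """Keep preview values; use the unfurl only to fill fields that are None."""
    updates = {
        f: getattr(unfurl, f) for f in _MERGED_FIELDS if getattr(preview, f) is None
    }
    return replace(preview, **updates)


def merge_unfurl_wins(preview: LinkPreview, unfurl: LinkPreview) -> LinkPreview:
    """Closer to TS spread semantics: unfurl overrides whenever it has a value."""
    updates = {
        f: getattr(unfurl, f) for f in _MERGED_FIELDS if getattr(unfurl, f) is not None
    }
    return replace(preview, **updates)


preview = LinkPreview(url="https://example.com", description="old")
unfurl = LinkPreview(url="https://example.com", description="new", site_name="Example")
```

With these inputs the first function keeps `"old"` and the second returns `"new"`; both pick up `site_name` from the unfurl because the preview has none.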
🔵 Nit: TS url.replace(TRAILING_SLASH_PATTERN, "") strips a single trailing /; Python _TRAILING_SLASH_PATTERN.sub("", url) strips ALL (re.sub default count=0). For https://x.com// they differ. Use .sub("", url, count=1) for parity.
🔵 Nit: 1h cache TTL — if a message is edited multiple times, each message_changed overwrites prior unfurls (parity with TS state.set). No merge across edits, no reset. Worth one regression test pinning the overwrite semantics.
🔵 Nit: _enrich_links makes the message-parse path block up to 2000ms inside the chat factory. This serializes one webhook for 2s when no message_changed is in flight (e.g. message with no real unfurls). Parity with TS, but worth noting in a comment that the 2000ms is a per-message worst case.
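A rough sketch of a bounded poll loop with the 2000ms / 150ms cadence described in this section. The function name `wait_for_unfurl` and the `read` callback are hypothetical; the demo uses much shorter timings so it finishes quickly.

```python
import asyncio
import time

UNFURL_WAIT_MS = 2000  # worst-case wait per message, as noted in the review
UNFURL_POLL_MS = 150   # polling cadence


async def wait_for_unfurl(read, wait_ms=UNFURL_WAIT_MS, poll_ms=UNFURL_POLL_MS):
    """Poll `read()` until it returns a value or the wait window elapses."""
    deadline = time.monotonic() + wait_ms / 1000
    while True:
        value = await read()
        if value is not None:
            return value
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return None  # caller falls back to bare URLs
        await asyncio.sleep(min(poll_ms / 1000, remaining))


async def demo():
    store = {}

    async def read():
        return store.get("unfurl")

    async def late_write():
        await asyncio.sleep(0.02)  # simulates a delayed message_changed event
        store["unfurl"] = {"title": "Example"}

    writer = asyncio.create_task(late_write())
    found = await wait_for_unfurl(read, wait_ms=500, poll_ms=10)
    await writer

    store.clear()
    missing = await wait_for_unfurl(read, wait_ms=30, poll_ms=10)
    return found, missing


found, missing = asyncio.run(demo())
```

The loop returns as soon as data appears, so the full window is only paid when no unfurl ever arrives, which is the per-message worst case the nit asks to document.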
5. Telegram MarkdownV2 (vercel/chat#407) — 🔴 Critical
Renderer port is faithful. 18 special chars in normal text, 2 in code blocks, 2 in link URLs — all match TS regex character-for-character. from_markdown(card_to_fallback_text(card)) correctly handles the Python-specific **title** shape (Python card_to_fallback_text is hard-coded to **, no boldFormat arg needed). Headings → bold, lists with escaped \- and \1., blockquote > per line, table preprocessed → ASCII code block. resolve_parse_mode correctly returns MarkdownV2 for {ast}/{markdown}/cards/JSX, None for plain str + {raw}.
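The three escaping contexts can be sketched with one regex each. The character sets follow Telegram's published MarkdownV2 rules (18 reserved characters in normal text, backtick and backslash inside code, closing paren and backslash inside link URLs). `escape_markdown_v2` is a name that appears later in this thread; `escape_code` and `escape_link_url` are hypothetical names, and none of this is the actual port. Whether the port additionally escapes a literal backslash in normal text is not visible on this page, so the sketch does not.

```python
import re

# 18 characters reserved in normal MarkdownV2 text.
_TEXT_SPECIALS = re.compile(r"([_*\[\]()~`>#+\-=|{}.!])")
# Inside code spans and code blocks only ` and \ need escaping.
_CODE_SPECIALS = re.compile(r"([`\\])")
# Inside the (...) part of an inline link only ) and \ need escaping.
_URL_SPECIALS = re.compile(r"([)\\])")


def escape_markdown_v2(text: str) -> str:
    """Escape reserved characters in normal text."""
    return _TEXT_SPECIALS.sub(r"\\\1", text)


def escape_code(text: str) -> str:
    """Escape the narrower set used inside code entities."""
    return _CODE_SPECIALS.sub(r"\\\1", text)


def escape_link_url(url: str) -> str:
    """Escape the characters that would end or corrupt a link URL."""
    return _URL_SPECIALS.sub(r"\\\1", url)


def render_link(label: str, url: str) -> str:
    """Render an inline link with each part escaped for its own context."""
    return f"[{escape_markdown_v2(label)}]({escape_link_url(url)})"
```

Keeping the three sets separate matters because over-escaping inside code or URLs shows up as literal backslashes, while under-escaping in normal text is rejected by the parser.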
🔴 Critical: truncation produces invalid MarkdownV2. TS commit b9a1961 explicitly added truncateForTelegram + trimToMarkdownV2SafeBoundary because the naive slice + "..." fails on MarkdownV2:
- `.` is reserved → bare `"..."` is a parse error
- The slice can leave an orphan trailing `\` (escapes the appended ellipsis or nothing)
- The slice can cut through a paired entity (`*bold*`, `` `code` ``) leaving it unclosed
Python truncate_message (src/chat_sdk/adapters/telegram/adapter.py:1739-1741) still calls _truncate_to_utf16(text, TELEGRAM_MESSAGE_LIMIT) with default ellipsis="..." — none of the three failure modes are guarded. truncate_caption has the same gap. Any MarkdownV2 message that exceeds 4096 chars (or 1024 for captions) will trigger Bad Request: can't parse entities — exactly the bug class this PR claims to fix. Existing length tests only check len(result) <= 4096, not parser validity.
Port truncateForTelegram + trimToMarkdownV2SafeBoundary + findUnescapedPositions + endsWithOrphanBackslash. Add the 8 length-limit tests from upstream (* crosses 4096, ` crosses 4096, orphan \ at boundary).
🔵 Nit: TS renderer throws on unknown node types via node satisfies never. Python falls back to "render children else escape value else empty" — silently swallows new mdast node kinds instead of failing the build. Acceptable trade-off (Python has no exhaustiveness check), but worth a comment.
🔵 Nit: Test file uses generic class TestMarkdownV2Escaping for all 11 tests but doesn't cover URL with ( (escape rule says only ) and \ are escaped — a URL with unbalanced ( will likely break Telegram's parser).
Cross-cutting
- ✅ All 5 fixes have regression tests with `What to fix if this fails:` docstrings (CLAUDE.md principle #8).
- ✅ AsyncMock used correctly throughout (no MagicMock-where-AsyncMock bugs surfaced; principle #2).
- ✅ Truthiness hazard correctly applied (`thread_ts or None`, with `"0"` adversarial test).
- ✅ Spawned task in `_handle_message_changed` is tracked + error-handled (hazard #5).
- 🟡 The Telegram truncation gap is a regression-class issue carried over from the pre-MarkdownV2 code; the parse_mode change widens it (legacy Markdown was permissive; MarkdownV2 is strict). Recommend landing the truncation port in this PR or as an immediate follow-up before any 0.4.27 release.
Posted by an automated reviewer agent. https://claude.ai/code/session_01FyMxQn2BEAzmwKS1GZczKj
Generated by Claude Code
The previous truncation in ``TelegramAdapter.truncate_message`` / ``truncate_caption`` called ``_truncate_to_utf16`` with a literal ``...`` ellipsis, which Telegram's MarkdownV2 parser rejects: ``.`` is a reserved character; the slice can leave an orphan trailing ``\``; and the slice can cut through a paired entity (``*bold*``, `` `code` ``, ``[label](url)``) leaving it unclosed. Any MarkdownV2 message exceeding 4096 chars (or 1024 for captions) triggered ``Bad Request: can't parse entities``.
Port of upstream b9a1961 (chat@4.27.0) plus the streaming-chunk safety trim from f46a6fb (chat#446):
- ``truncate_for_telegram(text, limit, parse_mode)``: MarkdownV2 path uses an escaped ``\.\.\.`` ellipsis and walks back past unbalanced entity delimiters or orphan backslashes before appending. Plain text keeps the literal ``...``.
- ``find_unescaped_positions(text, marker)``: scans for unescaped occurrences of an entity marker, accounting for arbitrary runs of escape backslashes.
- ``ends_with_orphan_backslash(text)``: True when the trailing run of ``\`` has odd parity.
- ``_find_unescaped_positions_outside_code``: skips occurrences inside fenced and inline code spans (Telegram treats markers there as literal text).
- ``_trim_to_markdown_v2_safe_boundary``: best-effort backwards walk past unbalanced delimiters / orphan backslashes / unmatched ``[``.
``truncate_message`` / ``truncate_caption`` now accept an optional ``parse_mode``; ``post_message`` / ``edit_message`` / ``send_document`` plumb it through. Plain-mode behaviour is unchanged (still UTF-16 aware).
Tests: 8 length-limit tests on the MarkdownV2 path (escaped ellipsis, orphan backslash, unclosed bold / code / open bracket, all-special input, balanced no-op, plain passthrough); 4 streaming-chunk safety trims; 4 ``find_unescaped_positions`` tests; 5 ``ends_with_orphan_backslash`` tests; 3 adapter-level integration tests verifying ``truncate_message`` / ``truncate_caption`` dispatch on parse_mode. Each docstring includes "What to fix if this fails:" pointing at the relevant helper.
https://claude.ai/code/session_01FyMxQn2BEAzmwKS1GZczKj
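The helpers described in this commit can be approximated with the simplified sketch below. It reuses the names `find_unescaped_positions` and `ends_with_orphan_backslash` from the commit message, but the bodies are re-implementations written for illustration, not the ported code. `truncate_markdown_v2_sketch` is a hypothetical name. The sketch measures length in Python characters rather than UTF-16 units, does not skip markers inside code spans, and does not track `[` / `(`.

```python
ELLIPSIS_V2 = "\\.\\.\\."  # "..." with each dot escaped for MarkdownV2
PAIRED_MARKERS = ("*", "_", "~", "`")


def ends_with_orphan_backslash(text: str) -> bool:
    """True when the trailing run of backslashes has odd length."""
    run = len(text) - len(text.rstrip("\\"))
    return run % 2 == 1


def find_unescaped_positions(text: str, marker: str) -> list:
    """Indexes of `marker` occurrences not preceded by an escaping backslash."""
    if not marker:
        raise ValueError("marker must be non-empty")
    positions = []
    i = 0
    while i < len(text):
        if text[i] == "\\":
            i += 2  # skip the backslash and the character it escapes
        elif text.startswith(marker, i):
            positions.append(i)
            i += len(marker)
        else:
            i += 1
    return positions


def truncate_markdown_v2_sketch(text: str, limit: int) -> str:
    """Cut to `limit`, trim back to a safe boundary, append an escaped ellipsis."""
    if len(text) <= limit:
        return text
    cut = text[: max(limit - len(ELLIPSIS_V2), 0)]
    while True:
        if ends_with_orphan_backslash(cut):
            cut = cut[:-1]  # drop the dangling escape
            continue
        for marker in PAIRED_MARKERS:
            positions = find_unescaped_positions(cut, marker)
            if len(positions) % 2 == 1:
                cut = cut[: positions[-1]]  # drop the unclosed entity
                break
        else:
            break  # every marker is balanced
    return cut + ELLIPSIS_V2
```

Each pass through the loop shortens `cut`, so it terminates, and the result never exceeds `limit` for limits at least as long as the escaped ellipsis.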
… fixes
Address review findings on the chat@4.27.0 adapter bug-fix sweep:
1. Slack ``_merge_unfurl_into_preview``: TS does
``{ ...preview, ...unfurl }`` which lets unfurl OVERRIDE the
preview's existing fields (except ``title``, which is short-
circuited at the ``_enrich_links`` call site when the preview
already has one). The previous Python implementation preserved
non-None preview fields, so a preview that picked up a description
from one source could never be overwritten by a more authoritative
unfurl. Switch to "unfurl wins when present" semantics for
``description`` / ``image_url`` / ``site_name``; ``title`` matches
TS via the existing ``_enrich_links`` short-circuit. Add a
regression test (``preview.description = "old"``,
``unfurl.description = "new"`` -> output is ``"new"``).
2. Slack ``_TRAILING_SLASH_PATTERN.sub("", url)``: TS
``url.replace(TRAILING_SLASH_PATTERN, "")`` strips a single
trailing ``/``; Python's ``re.sub`` defaults to all matches. Pin
``count=1`` for parity (also locks down behaviour if the regex
ever loosens beyond an end-anchored match).
3. Slack: regression test that two ``message_changed`` events for the
same message ts overwrite the cached unfurl (rather than merging
in stale entries). Locks in the ``state.set(...)`` overwrite
semantics that the 1h cache TTL relies on.
4. Slack ``_extract_links``: regression test that a URL containing
an unbalanced ``(`` survives the angle-bracket regex (a tightening
of the URL pattern would silently drop wikipedia-style links).
5. Slack ``_enrich_links`` call site: comment noting that each
message containing a not-yet-unfurled link adds up to ~2s of
latency worst-case (``_UNFURL_WAIT_MS``).
6. Telegram MarkdownV2 renderer fallback: comment explaining the
trade-off vs upstream's ``node satisfies never`` exhaustive check
(Python silently degrades unknown nodes to escaped text rather
than raising, prioritising delivery over compile-time signal).
7. Slack ``test_slack_format.py``: add a missing test for the bare-
mention regex with an email tail (``@user@example.com`` ->
``<@user>@example.com``) — pins the boundary case between a real
mention and an email-style suffix glued to it.
https://claude.ai/code/session_01FyMxQn2BEAzmwKS1GZczKj
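Item 2 above pins `count=1` partly as a guard in case the pattern loosens. The sketch below shows when `count` actually changes the result. The real `_TRAILING_SLASH_PATTERN` is not shown on this page, so both patterns here are assumptions: `/$` stands in for an end-anchored shape and a bare `/` for a hypothetical loosened one.

```python
import re

END_ANCHORED = re.compile(r"/$")  # assumed end-anchored shape
LOOSENED = re.compile(r"/")       # hypothetical loosened pattern

url = "https://x.com//"

# An end-anchored pattern can match only once, so count makes no difference.
anchored_all = END_ANCHORED.sub("", url)
anchored_one = END_ANCHORED.sub("", url, count=1)

# An unanchored pattern matches every slash; count=1 limits it to the first.
loosened_all = LOOSENED.sub("", "a/b/")
loosened_one = LOOSENED.sub("", "a/b/", count=1)

print(anchored_all, anchored_one, loosened_all, loosened_one)
```

Under the end-anchored assumption, `count=1` is a no-op that documents intent; it only changes behaviour if the pattern is later able to match more than once.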
The new test_message_changed_overwrites_cached_unfurl_not_merge test asserts ``url_literal in cache_dict`` — a dict-key membership check — but CodeQL's incomplete-URL-substring-sanitization heuristic fires on the bare ``in`` syntax. Switch to ``cache.get(url) is not None`` / ``is None``; same semantics, no CodeQL false positive. https://claude.ai/code/session_01FyMxQn2BEAzmwKS1GZczKj
patrick-chinchill left a comment
Re-review (post-fix sweep)
Verified the two new commits (17379c7, f21f047) against upstream f55378a (chat@4.27.0) plus the post-4.27.0 streaming-safety patch (f46a6fb / chat#446). All 268 tests in the touched files pass; ruff + audit clean.
Critical fix verification — Telegram MarkdownV2 truncation
- `find_unescaped_positions`, `ends_with_orphan_backslash`, `_trim_to_markdown_v2_safe_boundary`, `truncate_for_telegram` match upstream line-by-line (loop bound `len+1`, marker set, ellipsis constants, slice-then-trim ordering).
- The Python port adds `_find_unescaped_positions_outside_code`; this is a forward-port of f46a6fb (chat#446) which is not yet in the 4.27.0 tag. Code, comments, and the commit message correctly attribute it. Acceptable port-ahead.
- `truncate_message` / `truncate_caption` plumb `parse_mode` through to `truncate_for_telegram` for `parse_mode == "MarkdownV2"`; plain falls back to UTF-16 with literal `...`. All 3 call sites in `adapter.py` (lines 1009, 1095, 1635: `post_message`, `edit_message`, `send_document` caption) pass `parse_mode`.
- Adversarial sweep (run live):
  - empty input -> `""` (no crash)
  - input shorter than ellipsis -> returned verbatim
  - code fence with `*args, **kwargs` under limit -> preserved (the `OutsideCode` helper correctly ignores asterisks inside ```)
  - unclosed code fence past cut -> trim back to before the opener; backticks even
  - `[label](http://x.co...` cut mid-URL -> brackets balanced; `(` / `)` not tracked (matches upstream, a known shared limitation)
  - all-special escaped (`escape_markdown_v2(".")*200`) -> length OK, no orphan trailing `\`
  - `\\\\` (escaped backslash run) -> trailing parity preserved, no orphan `\`
  - inline `` `code with * inside` `` cut after opener -> drops everything from the unpaired backtick onward (clean)
Other addressed items (verified)
- 🟡 `_merge_unfurl_into_preview`: now `unfurl wins` for description/image_url/site_name; title short-circuited at `_enrich_links` call site. Regression test `test_enrich_links_unfurl_overrides_existing_description` pins `old -> new`. Matches TS `{...preview, ...unfurl}` semantically (the only divergence, that Python keeps preview when the unfurl key is `None` rather than overwriting with `undefined`, is benign because `_create_link_preview` never sets these fields).
- 🔵 `@user@example.com` regex test added (`test_slack_format.py:571`).
- 🔵 `_TRAILING_SLASH_PATTERN.sub("", url, count=1)` pinned at both call sites.
- 🔵 Multi-edit cache overwrite test added (`test_message_changed_overwrites_cached_unfurl_not_merge`).
- 🔵 2000ms `_UNFURL_WAIT_MS` latency comment present.
- 🔵 `node satisfies never` divergence comment present in `format_converter.py:159`.
- 🔵 URL-with-`(` regression test added (`test_extract_links_url_with_open_paren_survives_parser`).
Upstream parity hunt (new findings)
- 🔵 Nit: Discord `post_channel_message` missing entirely. Upstream has 3 `cardToFallbackText` -> "don't include text" sites: `postMessage` (l.814), `editMessage` (l.1178), `postChannelMessage` (l.2398). Python only has 2 (`post_message`, `edit_message`); there is no `post_channel_message` method on the Discord adapter. Pre-existing gap, not introduced by this PR. Worth a follow-up issue.
- 🔵 Nit: `_enrich_links` short-circuit uses `is not None` vs TS truthy. Python `(link.title is not None) or (link.fetch_message is not None)` differs from TS `l.title || l.fetchMessage`: an empty-string title would short-circuit in Python but not TS. Pre-existing, vanishingly unlikely in practice (Slack doesn't emit `""` titles).
- ✅ Telegram MarkdownV2 renderer covers every node type upstream's `markdown.ts` switch handles (root/paragraph/text/strong/emphasis/delete/inlineCode/code/link/blockquote/list/listItem/heading/thematicBreak/break/image/html/linkReference/imageReference/definition/footnoteDefinition/yaml/footnoteReference/table/tableRow/tableCell). No drop-through gaps.
- ✅ `handle_message_changed`: matches upstream; only attachments-with-`from_url`/`original_url` flow; no other message_changed branches in upstream we missed.
Re-review verdict: PASS — Critical Telegram MarkdownV2 truncation fix is sufficient and faithful to upstream. The two follow-up nits (Discord post_channel_message, _enrich_links truthy) are pre-existing parity gaps unrelated to this sweep and should not block merge.
Posted by an automated re-reviewer agent. https://claude.ai/code/session_01FyMxQn2BEAzmwKS1GZczKj
Generated by Claude Code
Final upstream-coverage audit before merging the 7 sync PRs (#84-#90) identified one undocumented N/A item: vercel/chat#415 (Teams SDK 2.0.8 + User-Agent) is a JS-only botbuilder dependency bump. The Python Teams adapter uses raw aiohttp (no botbuilder), so there is no equivalent dependency to bump. The optional User-Agent: Vercel.ChatSDK header on the ~9 outbound aiohttp call sites is a defense-in-depth nice-to-have; deferred as a follow-up rather than landed in this sync.
Updates:
- CHANGELOG.md: tick all completed items and link them to their PRs (#84, #85, #86, #87, #88, #89, #90, plus already-merged PR #74). Document #415 inline as N/A.
- docs/UPSTREAM_SYNC.md non-parity table: add row for Teams User-Agent header divergence so future syncers don't try to "port" the JS bump.
Item #6 (concurrency.maxConcurrent) is already implementation-covered in the Python port (existing divergence row at L492). The 4 new TS concurrency tests in chat.test.ts have Python-specific equivalents at test_chat_faithful.py L2969-3055 that don't name-match; leaving as deferred fidelity-baseline polish since the behavior is verified.
Verdict from the coverage audit: all 18 substantive ports across PRs #84-#90 are upstream-verified. No commits in chat@4.26.0..f55378a were missed. Ready to start merging.
https://claude.ai/code/session_01FyMxQn2BEAzmwKS1GZczKj
Summary
Bundles 5 small upstream bug fixes from `vercel/chat@4.27.0` into one PR. Each is independent, covered by a regression test, and clearly scoped.
- Slack (vercel/chat#394): `@mention` regex eats email addresses (`@user@example.com` extracted `@example`)
- Slack (vercel/chat#292): … `thread_ts` passed to Slack API caused `invalid_thread_ts`
- Discord (vercel/chat#256): … `content` and the card embed)
- Slack (vercel/chat#395): … `message_changed` events were being filtered
- Telegram (vercel/chat#407): … `**bold**`) instead of MarkdownV2 (`*bold*`), causing `can't parse entities` errors
Per-fix porting notes
Slack `@mention` regex (vercel/chat#394)
Updated regex preserves `@user@example.com`; it does not extract the `@example` token. Regression test in `test_slack_format.py`.
Slack empty `thread_ts` (vercel/chat#292)
Guard `thread_ts` before passing to `client.chat_postMessage` etc. Empty string and missing key are both treated as "no thread". Regression test in `test_slack_api.py`.
Discord card text dedup (vercel/chat#256)
- … `content` when posting a card. Discord renders both the message content AND the card embed when `content` is set.
- … `content: ""` (PATCH preserves omitted fields). Without this, leftover text from a previous edit persists alongside the card.
- … `test_discord_extended.py` cover both paths.
Slack link unfurl enrichment (vercel/chat#395)
- `message_changed` is no longer in `_IGNORED_SUBTYPES`; it is routed to the new `_handle_message_changed`.
- … `message_changed` event 100-2000ms after the original. Added `_UNFURL_WAIT_MS = 2000` poll window with `_UNFURL_POLL_MS = 150` cadence so the message handler sees enriched links.
- … `test_slack_webhook.py`.
Telegram MarkdownV2 rewrite (vercel/chat#407)
- … `Markdown` parse_mode with `MarkdownV2`. Standard markdown emitted `**bold**`; Telegram MarkdownV2 wants `*bold*`.
- … _*[]()~\>#+-=|{}.!`), code blocks (only ` and \), inline-link URLs (only ) and \).
- … `test_telegram_format.py` cover headings, bold, italic, strikethrough, code blocks, inline code, links, lists, blockquotes, escape edge cases.
Test plan
- `uv run ruff check src/ tests/ scripts/`: clean
- `uv run ruff format --check src/ tests/ scripts/`: clean
- `uv run python scripts/audit_test_quality.py`: 0 hard failures (39 pre-existing warnings unchanged)
- `uv run pytest tests/ --tb=short -q`: 3702 pass, 2 skipped, 1 pre-existing failure (`tests/test_github_webhook.py::TestGitHubAdapterConstructor::test_throws_when_no_auth`, unrelated to this PR; fails on main today)
11 files changed, +963 / −54.
Upstream refs
- … (`c26ee6c`)
- … (`53c6b68`)
- … (`7e5b447`)
- … (`ded6f78`)
- … (`b9a1961`)
https://claude.ai/code/session_01FyMxQn2BEAzmwKS1GZczKj
Generated by Claude Code