feat(slack): dynamic bot_token resolver and custom webhook_verifier (vercel/chat#421)#87
feat(slack): dynamic bot_token resolver and custom webhook_verifier (vercel/chat#421)#87patrick-chinchill wants to merge 6 commits intomainfrom
Conversation
…ercel/chat#421) Allow ``bot_token`` to be a zero-arg callable returning ``str | Awaitable[str]`` so apps can rotate or lazily fetch tokens; the resolver is invoked per call (rotation-safe). Add ``webhook_verifier`` as an alternative to ``signing_secret`` for custom request verification — returns truthy/string to accept (string substitutes the body for downstream parsing), falsy or raises to reject with 401. Mirrors the upstream TS PR. Notable adaptations: - Per-request ``ContextVar`` (``_resolved_default_token``) caches the most recent resolver result so the existing sync ``_get_token()`` call sites see a primed value during dispatch without leaking across concurrent webhooks (hazard #6: ContextVar boundaries). - ``handle_webhook`` awaits the resolver once at the top after the signature/verifier check, so dispatch + sync API helpers downstream observe rotation. - ``schedule_message`` ``cancel()`` re-resolves in single-workspace mode and snapshots ``ctx.token`` in multi-workspace mode (matches upstream rotation-safe semantics for 12h-TTL Slack rotated tokens). - ``Attachment.fetchData`` snapshots ``ctx.token`` for multi-workspace and defers to the resolver in single-workspace. - An explicit ``webhook_verifier`` opts out of the ``SLACK_SIGNING_SECRET`` env fallback so a deployment-set env can't silently shadow the verifier. - ``current_token_async()`` added alongside the existing sync ``current_token`` property for callers that need to invoke the resolver. SECURITY: the default verifier path continues to use ``hmac.compare_digest`` and a 5-minute timestamp tolerance check. When a custom verifier is configured, both are bypassed — implementers MUST do constant-time comparison + timestamp/replay validation themselves. The ``SlackWebhookVerifier`` docstring captures the contract. https://claude.ai/code/session_01FyMxQn2BEAzmwKS1GZczKj
|
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Plus Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
|
||
| from __future__ import annotations | ||
|
|
||
| from collections.abc import Awaitable, Callable |
There was a problem hiding this comment.
Code Review
This pull request introduces dynamic bot token resolution and custom webhook verification to the Slack adapter, enabling token rotation and alternative authentication methods. It implements per-request token isolation using ContextVar and updates the configuration and initialization logic accordingly. The review feedback correctly identified a flaw in the error handling of asynchronous token resolvers, suggesting that the await call be moved inside the try-except block to ensure all exceptions are caught.
| try: | ||
| result = provider() | ||
| except Exception as exc: | ||
| self._logger.error("Bot token resolver raised", {"error": exc}) | ||
| raise | ||
| if inspect.isawaitable(result): | ||
| token = await result | ||
| else: | ||
| token = result |
There was a problem hiding this comment.
The try...except block currently only wraps the initial call to the provider. If the provider is an async function, calling it merely returns a coroutine object; the actual execution (and any potential exceptions) happens when it is awaited. Moving the await inside the try block ensures that exceptions from both synchronous and asynchronous resolvers are consistently logged and handled by the adapter's logger.
| try: | |
| result = provider() | |
| except Exception as exc: | |
| self._logger.error("Bot token resolver raised", {"error": exc}) | |
| raise | |
| if inspect.isawaitable(result): | |
| token = await result | |
| else: | |
| token = result | |
| try: | |
| result = provider() | |
| if inspect.isawaitable(result): | |
| token = await result | |
| else: | |
| token = result | |
| except Exception as exc: | |
| self._logger.error("Bot token resolver raised", {"error": exc}) | |
| raise |
Address gemini-code-assist review on #87 (line 469). When the resolver is an async function, calling provider() returns a coroutine; the actual exception is raised at the await, which was outside the try block. Move the await inside the try so async resolver failures are logged through self._logger.error and propagate consistently with sync resolver failures. Adds test_async_resolver_exception_is_logged_and_propagated regression test that fails before the fix. https://claude.ai/code/session_01FyMxQn2BEAzmwKS1GZczKj
patrick-chinchill
left a comment
There was a problem hiding this comment.
Reviewed against upstream vercel/chat@2531e9c and docs/UPSTREAM_SYNC.md hazards #3, #6, #11, #12.
Findings
🟡 Medium — test_default_verifier_uses_constant_time_compare is too weak (security regression test)
tests/test_slack_dynamic_token_and_verifier.py:670 does inspect.getsource(SlackAdapter._verify_signature) and asserts "compare_digest" in src. The function body in adapter.py:1042-1046 has the literal "compare_digest" in two comments in addition to the actual call on line 1046. A future maintainer who swaps the call to == while leaving the explanatory comment in place will pass this test. Tighten to e.g. "hmac.compare_digest(" (with the open paren) or re.search(r"\bhmac\.compare_digest\s*\(", src). The hazard #12 invariant deserves a stronger guard.
🟡 Medium — schedule_message().cancel() rotation safety has zero test coverage
The PR description and adapter.py:2728-2780 add the multi-workspace-snapshots / single-workspace-re-resolves split for cancel(), but tests/test_slack_dynamic_token_and_verifier.py contains no test that asserts: (a) cancel re-invokes the resolver in single-workspace mode (so a rotated token is observed), or (b) cancel uses the snapshotted ctx.token in multi-workspace mode (and does NOT touch the resolver/installation store). Same gap for Attachment.fetch_data (adapter.py:2076-2100 and 2199-2204) — the new "re-resolve at fetch time" behavior for single-workspace, and the snapshot for multi-workspace, are uncovered. These are exactly the rotation invariants the PR body advertises; without tests, a future refactor that re-introduces the captured bot_token = self._get_token() from the old code will pass CI silently.
🟡 Medium — SlackWebhookVerifier docstring is missing the third documented failure mode
The PR body's "Security model" section enumerates three failure modes (timing-side-channel, replay, body-substitution). The type alias docstring in src/chat_sdk/adapters/slack/types.py:34-39 only covers the first two. Body-substitution misuse is the most novel of the three (it is the new escape hatch surface this PR introduces) — please add a bullet warning that returning attacker-controlled bytes as the substituted body grants payload injection downstream. Otherwise an implementer reading only the docstring (not the PR description) won't know.
🔵 Nit — Resolver not isinstance(token, str) or not token rejection lacks coverage for non-string returns
_resolve_default_token (adapter.py:470) rejects empty string AND non-string values, but tests only exercise "" (test_resolver_returning_empty_string_raises). Add cases for None, int, dict returns so a refactor that drops the isinstance check fails.
🔵 Nit — current_token error-message regex is too loose
test_sync_current_token_with_resolver_before_resolution_raises matches "resolver has not been invoked" but does not assert the message points users at current_token_async(). The actual message does, but the test wouldn't catch a regression that drops the helpful pointer. Tighten to match="current_token_async".
🔵 Nit — SlackBotToken = "str | SlackBotTokenResolver" is a runtime string, not a type alias
types.py:23 assigns a string literal to SlackBotToken. With from __future__ import annotations the usages in annotations work, but typing.get_type_hints(SlackAdapterConfig) would fail to resolve it. Prefer from typing import TypeAlias; SlackBotToken: TypeAlias = str | SlackBotTokenResolver (or use a Union[str, SlackBotTokenResolver]).
✅ Looks good
- Hazard #6 ContextVar isolation:
_resolved_default_token.set(token)afterawait resultis correct — each asyncio Task gets its own context copy, so the post-await set lands in the right context. The concurrent isolation test (test_concurrent_resolver_invocations_do_not_leak_across_requests) actually exercises the race window viaasyncio.gather+ an event-gated interleave; it would fail with a shared instance attribute. Multi-workspace isolation test does the same for the InstallationStore path. - Hazard #3: resolver is invoked explicitly per request at the top of
handle_webhook(adapter.py:928-936); no implicit team_id pickup. Synccurrent_tokenraises with a pointer tocurrent_token_async()when the resolver hasn't fired. - Hazard #11: the
_get_clientLRU cache eviction-without-close behavior is pre-existing and documented in code comments — not introduced here. - Hazard #12:
signing_secretprecedence overwebhook_verifieris correct and tested (test_signing_secret_takes_precedence_over_verifier); env-fallback opt-out is tested (test_verifier_opts_out_of_env_signing_secret). The follow-up commit's async-resolver-exception logging is correctly tested bytest_async_resolver_exception_is_logged_and_propagated. - Single-workspace static-string fast path:
_default_bot_token_cacheprimed in the constructor preserves the pre-PR sync-only behavior for the common case.
Posted by an automated reviewer agent. https://claude.ai/code/session_01FyMxQn2BEAzmwKS1GZczKj
Generated by Claude Code
- Tighten ``test_default_verifier_uses_constant_time_compare`` to require an actual ``hmac.compare_digest(`` call via regex (substring "compare_digest" matches comments too). - Add 4 rotation/snapshot tests for ``schedule_message().cancel()`` and ``Attachment.fetch_data``: assert single-workspace mode re-resolves the token via the dynamic resolver, while multi-workspace mode uses the ctx_token snapshot captured at construction time and does NOT consult InstallationStore at call time. - Document body-substitution misuse in ``SlackWebhookVerifier`` docstring (returning attacker-controlled bytes grants payload injection — third documented failure mode alongside missing constant-time compare and replay protection). - Add resolver non-string return tests (``None``, ``int``, ``dict``) — only ``""`` was covered. - Tighten the ``current_token`` error-message regex to require ``current_token_async`` so callers know which async accessor to use. - Make ``SlackBotToken`` a real ``TypeAlias`` instead of a runtime string — the previous form was a plain ``str`` at runtime, defeating type-checker recognition. https://claude.ai/code/session_01FyMxQn2BEAzmwKS1GZczKj
patrick-chinchill
left a comment
There was a problem hiding this comment.
Re-review of fixes in 2ba5f77
Verified each item from the previous-round fix list against src/chat_sdk/adapters/slack/{adapter.py,types.py} at HEAD and against upstream 2531e9c (vercel/chat#421).
Verified fixed (from prior round)
test_default_verifier_uses_constant_time_comparenow doesre.search(r"\bhmac\.compare_digest\s*\(", src)againstinspect.getsource(SlackAdapter._verify_signature)— the call-form regex would catch a swap to==or a comment-only mention. Good.- Four rotation-safety tests are present and exercise the right invariants:
test_schedule_message_cancel_re_resolves_token_in_single_workspace_modeasserts the resolver counter goes from 1→2 acrossschedule_message+cancel, and thatchat_deleteScheduledMessageis invoked withxoxb-new.test_schedule_message_cancel_uses_snapshot_in_multi_workspace_modespies onget_installationand asserts cancel issues zero further calls — good rotation/snapshot delineation.test_attachment_fetch_data_re_resolves_token_in_single_workspace_modeasserts the resolver is not called at attachment-creation time and is re-invoked on eachfetch_data().test_attachment_fetch_data_uses_snapshot_in_multi_workspace_modeassertsget_installationis not consulted at fetch time when ctx_token was captured.
SlackWebhookVerifierdocstring (types.py:25-45) covers the three failure modes — constant-time comparison (timing channel), replay protection, and body-substitution safety with the specific warning that returning attacker-controlled bytes grants payload injection.- Resolver non-string return tests are present:
test_resolver_returning_none_raises,test_resolver_returning_int_raises,test_resolver_returning_dict_raises(in addition to empty-string). - Sync
current_tokenerror message now includes "Use the async API (handle_webhook / current_token_async)…" and the test regex isr"current_token_async", which would catch a regression to a generic message. SlackBotTokenis nowTypeAlias = str | SlackBotTokenResolver.
All 37 tests in tests/test_slack_dynamic_token_and_verifier.py pass locally.
Findings (new)
Medium — single-workspace resolver gates URL verification (divergence from upstream)
handle_webhook at src/chat_sdk/adapters/slack/adapter.py:928-936 awaits _resolve_default_token() before the url_verification branch at :978. Upstream only invokes the provider via withToken({...}) at the per-API-call site, so a URL verification challenge succeeds even when the resolver is broken (no API call needed). In Python, a flaky/down secret-manager would make Slack's initial URL verification ping return 500 instead of echoing the challenge. Either short-circuit URL verification before the resolver call (parse JSON → handle URL verification → then resolve), or wrap the resolver invocation so URL verification stays oblivious to it. Add a regression test: resolver that raises + URL verification body must still return 200.
Medium — direct API calls outside webhook flow with resolver bot_token will raise
By design, _get_token (sync) does not invoke the resolver and the per-request _resolved_default_token ContextVar is only primed inside handle_webhook. A consumer that calls adapter.post_message(...) from a cron job / background task with bot_token=callable hits AuthenticationError("Bot token resolver has not been invoked yet…") — the docstring even says use _resolve_token_async first, but post_message/add_reaction/upload_files/etc. don't. Upstream's getToken is async and resolves on every API call, so cron-style usage works there. Either (a) make the public adapter methods call _resolve_token_async() before _get_client(), or (b) document this as a Python divergence in docs/UPSTREAM_SYNC.md "Known Non-Parity" with a clear "wrap your call in current_token_async() first" recipe.
Nit — missing parity test: "treats a function botToken as single-workspace mode"
Upstream index.test.ts covers this (initialize() must call auth.test with the resolved token when bot_token is a function and set bot_user_id). The Python adapter does this at adapter.py:532-545, but no test asserts it. Easy add: configure resolver, mock auth_test, call initialize(), assert _bot_user_id populated and the resolver was awaited.
Nit — within-request token caching is a divergence worth a code breadcrumb
Upstream calls the provider on every API call within a single request. Python caches per-request via _resolved_default_token ContextVar (so the sync _get_token path works). Functionally equivalent in the rotation sense (TTL >> request lifetime), but the divergence isn't called out at the _resolve_default_token site or in docs/UPSTREAM_SYNC.md. Add a comment + a one-liner row in the non-parity table so a future syncer doesn't try to "fix" it back.
Nit — _FakeRequest constructor type leaks to the verifier in tests
test_verifier_receives_request_and_body correctly asserts captured[0][0] is request. Worth confirming request exposes a body-readable shape that real ASGI/WSGI request objects do — the test currently passes a _FakeRequest, so verifiers that probe request.headers/request.method aren't covered. Optional: add one test using a starlette.requests.Request (or a richer fake) to ensure the verifier path doesn't accidentally rely on _FakeRequest-only attributes.
Re-review verdict: FOLLOW-UP NEEDED
Two medium parity gaps that aren't blockers but should land before this is called "synced": the URL-verification gating regression and the cron-mode resolver gap. The two nits + 1 doc breadcrumb are quick adds.
Posted by an automated re-reviewer agent. https://claude.ai/code/session_01FyMxQn2BEAzmwKS1GZczKj
Generated by Claude Code
Address re-review on PR #87 (Medium #1). Slack's url_verification ping arrives at app-install / event-subscription time and only expects the challenge echo — no bot token / API call required. Previously the single-workspace resolver was invoked at handle_webhook entry, BEFORE the url_verification short-circuit, so a flaky/down secret manager would block app installation with a 500. Move the JSON peek for url_verification ahead of _resolve_default_token() and short-circuit there. Mirrors upstream where getToken() is only called at per-API-call sites, never at webhook entry. Adds test_url_verification_bypasses_broken_resolver: configures a resolver that raises and asserts URL verification still returns 200 with the challenge echo. Also documents two related divergences in docs/UPSTREAM_SYNC.md non-parity table (Medium #2 + Nit from re-review): - Slack bot_token resolver invocation site: TS resolves on every API call site (cron-mode works); Python resolves once at handle_webhook entry into a ContextVar (cron callers must await current_token_async() first). - Within-request resolver caching scope: TS calls per API call; Python caches per request to keep _get_token sync. https://claude.ai/code/session_01FyMxQn2BEAzmwKS1GZczKj
…n fix The previous version used a url_verification payload, which is now special-cased to short-circuit BEFORE the resolver runs (per the PR #87 follow-up fix). Switch to an event_callback payload — the actual dispatch path that DOES need a token — to exercise the invariant. https://claude.ai/code/session_01FyMxQn2BEAzmwKS1GZczKj
Final upstream-coverage audit before merging the 7 sync PRs (#84-#90) identified one undocumented N/A item: vercel/chat#415 (Teams SDK 2.0.8 + User-Agent) is a JS-only botbuilder dependency bump. The Python Teams adapter uses raw aiohttp (no botbuilder), so there is no equivalent dependency to bump. The optional User-Agent: Vercel.ChatSDK header on the ~9 outbound aiohttp call sites is a defense-in-depth nice-to-have; deferred as a follow-up rather than landed in this sync. Updates: - CHANGELOG.md: tick all completed items and link them to their PRs (#84, #85, #86, #87, #88, #89, #90, plus already-merged PR #74). Document #415 inline as N/A. - docs/UPSTREAM_SYNC.md non-parity table: add row for Teams User-Agent header divergence so future syncers don't try to "port" the JS bump. Item #6 (concurrency.maxConcurrent) is already implementation-covered in the Python port (existing divergence row at L492). The 4 new TS concurrency tests in chat.test.ts have Python-specific equivalents at test_chat_faithful.py L2969-3055 that don't name-match — leaving as deferred fidelity-baseline polish since the behavior is verified. Verdict from the coverage audit: all 18 substantive ports across PRs #84-#90 are upstream-verified. No commits in chat@4.26.0..f55378a were missed. Ready to start merging. https://claude.ai/code/session_01FyMxQn2BEAzmwKS1GZczKj
Summary
Ports upstream vercel/chat#421 (commit
2531e9c) —feat(slack): dynamic botToken resolver and custom webhookVerifier— into the Python SDK.SlackAdapterConfig.bot_tokennow acceptsstr | Callable[[], str | Awaitable[str]]. Existing static-string usage is unchanged. The resolver is invoked per call so token rotation and lazy retrieval from a secret manager just work.SlackAdapterConfig.webhook_verifieris a new escape hatch: a sync/async(request, body) -> bool | str | Nonethat replaces the built-in HMAC + timestamp verification. Returning a string substitutes the verified body for downstream parsing (matches upstream "verifier returns canonical body" semantics).signing_secretcontinues to take precedence overwebhook_verifierwhen both are set; passing an explicitwebhook_verifieropts out of theSLACK_SIGNING_SECRETenv fallback so a deployment-set env can't silently shadow the verifier.schedule_messagecancel()is now rotation-safe: multi-workspace snapshotsctx.token, single-workspace re-resolves at cancel time.Attachment.fetch_datamirrors this — multi-workspace snapshots, single-workspace defers to the resolver.SlackAdapter.current_token_async()companion to the existing synccurrent_tokenproperty for callers that need to invoke the resolver explicitly.Python-specific design notes
ContextVar(_resolved_default_token) caches the most recently resolved default token per request.handle_webhookawaits the resolver once at the top (after the verifier/signature check) so all downstream sync_get_token()call sites observe the freshly-resolved value without leaking across concurrent webhooks (hazard chore: bump to 0.0.1a3 #6: ContextVar boundaries).current_tokenproperty remains forcurrent_token/current_clientcallers that don't expect to await — for resolver-mode adapters accessed before any webhook has fired it raisesAuthenticationErrorwith a pointer tocurrent_token_async()(hazard fix: add contents:read permission for checkout #3: explicit context > globals)._get_client()cache is keyed by token, so rotating tokens warm a new entry per rotation and the LRU eviction reclaims old ones (hazard docs: Add comprehensive project documentation #11: session lifecycle). Pre-existing.Security model — custom
webhook_verifierThe default
_verify_signatureuseshmac.compare_digestand a 5-minutex-slack-request-timestamptolerance window. The custom verifier replaces both. Worst-case failure modes when an implementer gets it wrong:==on signature bytes leaks the expected MAC bit by bit. Implementers must usehmac.compare_digest(or platform equivalent).x-slack-request-timestamp(or an equivalent freshness signal) inside the verifier.strreturn value replaces the body for downstream parsing. A verifier that returns attacker-controlled bytes without validating them grants payload injection.The
SlackWebhookVerifiertype alias docstring captures the contract. The default path (no custom verifier) still useshmac.compare_digest— a regression test (TestSecurityProperties::test_default_verifier_uses_constant_time_compare) inspects the source so any future swap to==fails CI.Tests
29 new tests in
tests/test_slack_dynamic_token_and_verifier.py:str, sync callable, async callable, env-fallback opt-out, signing_secret precedence.current_token_async(), per-call rotation, error propagation, empty/non-string returns rejected, lazy (not invoked at construction)._resolve_default_tokencalls each see their own token (would fail with a shared instance attribute — verified by simulation). Multi-workspace concurrent webhooks for two teams each see their own InstallationStore token.handle_webhook; resolved value is visible to sync_get_token()from inside_process_event_payload.compare_digest; custom verifier can accept requests the default check would reject (documents the escape hatch).Test plan
uv run ruff check src/ tests/ scripts/uv run ruff format --check src/ tests/ scripts/uv run python scripts/audit_test_quality.py— 0 hard failuresuv run pytest tests/ --tb=short -q— 3697 passed, 1 pre-existing failure (tests/test_github_webhook.py::TestGitHubAdapterConstructor::test_throws_when_no_auth), 2 skippedtests/test_slack_dynamic_token_and_verifier.pypasshttps://claude.ai/code/session_01FyMxQn2BEAzmwKS1GZczKj
Generated by Claude Code