feat(slack): dynamic bot_token resolver and custom webhook_verifier (vercel/chat#421)#87

Draft
patrick-chinchill wants to merge 6 commits into main from
claude/port-slack-dynamic-bot-token-J7S7H

Conversation

@patrick-chinchill
Collaborator

Summary

Ports upstream vercel/chat#421 (commit 2531e9c) — feat(slack): dynamic botToken resolver and custom webhookVerifier — into the Python SDK.

  • SlackAdapterConfig.bot_token now accepts str | Callable[[], str | Awaitable[str]]. Existing static-string usage is unchanged. The resolver is invoked per call so token rotation and lazy retrieval from a secret manager just work.
  • SlackAdapterConfig.webhook_verifier is a new escape hatch: a sync/async (request, body) -> bool | str | None that replaces the built-in HMAC + timestamp verification. Returning a string substitutes the verified body for downstream parsing (matches upstream "verifier returns canonical body" semantics).
  • signing_secret continues to take precedence over webhook_verifier when both are set; passing an explicit webhook_verifier opts out of the SLACK_SIGNING_SECRET env fallback so a deployment-set env can't silently shadow the verifier.
  • schedule_message cancel() is now rotation-safe: multi-workspace snapshots ctx.token, single-workspace re-resolves at cancel time. Attachment.fetch_data mirrors this — multi-workspace snapshots, single-workspace defers to the resolver.
  • New SlackAdapter.current_token_async() companion to the existing sync current_token property for callers that need to invoke the resolver explicitly.
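
As a rough illustration of the first bullet, the sketch below shows one way a value typed `str | Callable[[], str | Awaitable[str]]` could be resolved on each call. The names `resolve_bot_token`, `TokenResolver`, and `BotToken` are hypothetical and are not taken from the adapter; the non-empty-string check mirrors the rejection behavior described under Tests.

```python
import asyncio
import inspect
from collections.abc import Awaitable, Callable
from typing import Union

# Hypothetical aliases mirroring the shape described in the summary.
TokenResolver = Callable[[], Union[str, Awaitable[str]]]
BotToken = Union[str, TokenResolver]


async def resolve_bot_token(bot_token: BotToken) -> str:
    """Return a token from a static string, a sync callable, or an async callable."""
    if isinstance(bot_token, str):
        return bot_token  # static-string usage is unchanged
    result = bot_token()  # invoked on every call, so rotation is observed
    if inspect.isawaitable(result):
        result = await result
    if not isinstance(result, str) or not result:
        raise ValueError("bot token resolver must return a non-empty string")
    return result


async def _demo() -> list:
    rotation = iter(["xoxb-old", "xoxb-new"])

    async def from_secret_manager() -> str:
        # Stand-in for a lazy secret-manager lookup; each call may see a rotated value.
        return next(rotation)

    return [
        await resolve_bot_token("xoxb-static"),
        await resolve_bot_token(lambda: "xoxb-sync"),
        await resolve_bot_token(from_secret_manager),
        await resolve_bot_token(from_secret_manager),
    ]


demo_tokens = asyncio.run(_demo())
```

Because the callable is invoked inside `resolve_bot_token` rather than at configuration time, the second lookup through `from_secret_manager` sees the rotated value.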

Python-specific design notes

  • A per-instance ContextVar (_resolved_default_token) caches the most recently resolved default token per request. handle_webhook awaits the resolver once at the top (after the verifier/signature check) so all downstream sync _get_token() call sites observe the freshly-resolved value without leaking across concurrent webhooks (hazard #6: ContextVar boundaries).
  • The sync current_token property remains for current_token / current_client callers that don't expect to await. For resolver-mode adapters accessed before any webhook has fired, it raises AuthenticationError with a pointer to current_token_async() (hazard #3: explicit context > globals).
  • _get_client() cache is keyed by token, so rotating tokens warm a new entry per rotation and the LRU eviction reclaims old ones (hazard #11: session lifecycle). Pre-existing.

Security model — custom webhook_verifier

The default _verify_signature uses hmac.compare_digest and a 5-minute x-slack-request-timestamp tolerance window. The custom verifier replaces both. Worst-case failure modes when an implementer gets it wrong:

  1. Timing-side-channel signature comparison — using == on signature bytes leaks the expected MAC bit by bit. Implementers must use hmac.compare_digest (or platform equivalent).
  2. Replay attack — without timestamp/freshness checks, an attacker who captured a single signed request can replay it indefinitely. Implementers must validate x-slack-request-timestamp (or an equivalent freshness signal) inside the verifier.
  3. Body-substitution misuse — the str return value replaces the body for downstream parsing. A verifier that returns attacker-controlled bytes without validating them grants payload injection.

The SlackWebhookVerifier type alias docstring captures the contract. The default path (no custom verifier) still uses hmac.compare_digest — a regression test (TestSecurityProperties::test_default_verifier_uses_constant_time_compare) inspects the source so any future swap to == fails CI.

Tests

29 new tests in tests/test_slack_dynamic_token_and_verifier.py:

  • Constructor: static str, sync callable, async callable, env-fallback opt-out, signing_secret precedence.
  • Resolver invocation: sync + async paths via current_token_async(), per-call rotation, error propagation, empty/non-string returns rejected, lazy (not invoked at construction).
  • Webhook verifier: truthy accept, falsy reject (False, None), throws → 401, async verifier awaited, body substitution via string return, default signature check skipped.
  • Cross-request isolation: forced-interleave concurrent _resolve_default_token calls each see their own token (would fail with a shared instance attribute — verified by simulation). Multi-workspace concurrent webhooks for two teams each see their own InstallationStore token.
  • Resolver integration with webhook flow: resolver is invoked at top of handle_webhook; resolved value is visible to sync _get_token() from inside _process_event_payload.
  • Security: default verifier uses compare_digest; custom verifier can accept requests the default check would reject (documents the escape hatch).

Test plan

  • uv run ruff check src/ tests/ scripts/
  • uv run ruff format --check src/ tests/ scripts/
  • uv run python scripts/audit_test_quality.py — 0 hard failures
  • uv run pytest tests/ --tb=short -q — 3697 passed, 1 pre-existing failure (tests/test_github_webhook.py::TestGitHubAdapterConstructor::test_throws_when_no_auth), 2 skipped
  • All 29 new tests in tests/test_slack_dynamic_token_and_verifier.py pass
  • All pre-existing Slack tests (496) still pass

https://claude.ai/code/session_01FyMxQn2BEAzmwKS1GZczKj


Generated by Claude Code

…ercel/chat#421)

Allow ``bot_token`` to be a zero-arg callable returning ``str | Awaitable[str]``
so apps can rotate or lazily fetch tokens; the resolver is invoked per call
(rotation-safe). Add ``webhook_verifier`` as an alternative to
``signing_secret`` for custom request verification — returns truthy/string
to accept (string substitutes the body for downstream parsing), falsy or
raises to reject with 401.

Mirrors the upstream TS PR. Notable adaptations:

- Per-request ``ContextVar`` (``_resolved_default_token``) caches the most
  recent resolver result so the existing sync ``_get_token()`` call sites
  see a primed value during dispatch without leaking across concurrent
  webhooks (hazard #6: ContextVar boundaries).
- ``handle_webhook`` awaits the resolver once at the top after the
  signature/verifier check, so dispatch + sync API helpers downstream
  observe rotation.
- ``schedule_message`` ``cancel()`` re-resolves in single-workspace mode
  and snapshots ``ctx.token`` in multi-workspace mode (matches upstream
  rotation-safe semantics for 12h-TTL Slack rotated tokens).
- ``Attachment.fetchData`` snapshots ``ctx.token`` for multi-workspace
  and defers to the resolver in single-workspace.
- An explicit ``webhook_verifier`` opts out of the
  ``SLACK_SIGNING_SECRET`` env fallback so a deployment-set env can't
  silently shadow the verifier.
- ``current_token_async()`` added alongside the existing sync
  ``current_token`` property for callers that need to invoke the resolver.

SECURITY: the default verifier path continues to use
``hmac.compare_digest`` and a 5-minute timestamp tolerance check. When a
custom verifier is configured, both are bypassed — implementers MUST do
constant-time comparison + timestamp/replay validation themselves. The
``SlackWebhookVerifier`` docstring captures the contract.

https://claude.ai/code/session_01FyMxQn2BEAzmwKS1GZczKj
@coderabbitai

coderabbitai Bot commented May 8, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: ca87cef6-e2c3-455e-a8d8-a2f48571286c

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.



from __future__ import annotations

from collections.abc import Awaitable, Callable

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces dynamic bot token resolution and custom webhook verification to the Slack adapter, enabling token rotation and alternative authentication methods. It implements per-request token isolation using ContextVar and updates the configuration and initialization logic accordingly. The review feedback correctly identified a flaw in the error handling of asynchronous token resolvers, suggesting that the await call be moved inside the try-except block to ensure all exceptions are caught.

Comment thread src/chat_sdk/adapters/slack/adapter.py Outdated
Comment on lines +461 to +469
try:
    result = provider()
except Exception as exc:
    self._logger.error("Bot token resolver raised", {"error": exc})
    raise
if inspect.isawaitable(result):
    token = await result
else:
    token = result

medium

The try...except block currently only wraps the initial call to the provider. If the provider is an async function, calling it merely returns a coroutine object; the actual execution (and any potential exceptions) happens when it is awaited. Moving the await inside the try block ensures that exceptions from both synchronous and asynchronous resolvers are consistently logged and handled by the adapter's logger.

Suggested change
- try:
-     result = provider()
- except Exception as exc:
-     self._logger.error("Bot token resolver raised", {"error": exc})
-     raise
- if inspect.isawaitable(result):
-     token = await result
- else:
-     token = result
+ try:
+     result = provider()
+     if inspect.isawaitable(result):
+         token = await result
+     else:
+         token = result
+ except Exception as exc:
+     self._logger.error("Bot token resolver raised", {"error": exc})
+     raise

Address gemini-code-assist review on #87 (line 469). When the resolver is
an async function, calling provider() returns a coroutine; the actual
exception is raised at the await, which was outside the try block. Move
the await inside the try so async resolver failures are logged through
self._logger.error and propagate consistently with sync resolver
failures.

Adds test_async_resolver_exception_is_logged_and_propagated regression
test that fails before the fix.
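
The behavior behind this fix can be reproduced in isolation. The sketch below is illustrative only: `resolve_narrow` and `resolve_wide` are hypothetical names for the before and after shapes, and a plain list stands in for the adapter's logger.

```python
import asyncio
import inspect

logged = []  # stand-in for self._logger.error calls


async def failing_resolver() -> str:
    raise RuntimeError("secret manager unavailable")


async def resolve_narrow(provider):
    # "Before" shape: only the call is guarded, so the await is outside the try.
    try:
        result = provider()
    except Exception as exc:
        logged.append(("narrow", str(exc)))
        raise
    if inspect.isawaitable(result):
        return await result
    return result


async def resolve_wide(provider):
    # "After" shape: the await is inside the try, so async failures are logged too.
    try:
        result = provider()
        if inspect.isawaitable(result):
            result = await result
        return result
    except Exception as exc:
        logged.append(("wide", str(exc)))
        raise


async def _demo() -> None:
    for resolve in (resolve_narrow, resolve_wide):
        try:
            await resolve(failing_resolver)
        except RuntimeError:
            pass  # both shapes propagate; only one of them logs


asyncio.run(_demo())
```

Calling an `async def` function only creates a coroutine object, so the narrow variant never reaches its `except` branch for this resolver.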

https://claude.ai/code/session_01FyMxQn2BEAzmwKS1GZczKj
Collaborator Author

@patrick-chinchill patrick-chinchill left a comment

Reviewed against upstream vercel/chat@2531e9c and docs/UPSTREAM_SYNC.md hazards #3, #6, #11, #12.

Findings

🟡 Medium — test_default_verifier_uses_constant_time_compare is too weak (security regression test)

tests/test_slack_dynamic_token_and_verifier.py:670 does inspect.getsource(SlackAdapter._verify_signature) and asserts "compare_digest" in src. The function body in adapter.py:1042-1046 has the literal "compare_digest" in two comments in addition to the actual call on line 1046. A future maintainer who swaps the call to == while leaving the explanatory comment in place will pass this test. Tighten to e.g. "hmac.compare_digest(" (with the open paren) or re.search(r"\bhmac\.compare_digest\s*\(", src). The hazard #12 invariant deserves a stronger guard.
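
To make the gap concrete, the sketch below compares the substring check with the suggested call-form regex on two hypothetical source strings; neither string is the real `_verify_signature` body.

```python
import re

# Hypothetical regressed source: the call was swapped to ==, but the comment survived.
regressed_src = '''
def _verify_signature(expected, received):
    # Use compare_digest to avoid timing side channels.
    return expected == received
'''

# Hypothetical correct source for comparison.
correct_src = '''
def _verify_signature(expected, received):
    # Use compare_digest to avoid timing side channels.
    return hmac.compare_digest(expected, received)
'''

CALL_PATTERN = re.compile(r"\bhmac\.compare_digest\s*\(")

weak_passes_on_regression = "compare_digest" in regressed_src
strict_passes_on_regression = CALL_PATTERN.search(regressed_src) is not None
strict_passes_on_correct = CALL_PATTERN.search(correct_src) is not None
```

The regex is still a textual check: a comment that happens to contain the full call form would satisfy it, so it narrows the gap rather than closing it.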

🟡 Medium — schedule_message().cancel() rotation safety has zero test coverage

The PR description and adapter.py:2728-2780 add the multi-workspace-snapshots / single-workspace-re-resolves split for cancel(), but tests/test_slack_dynamic_token_and_verifier.py contains no test that asserts: (a) cancel re-invokes the resolver in single-workspace mode (so a rotated token is observed), or (b) cancel uses the snapshotted ctx.token in multi-workspace mode (and does NOT touch the resolver/installation store). Same gap for Attachment.fetch_data (adapter.py:2076-2100 and 2199-2204) — the new "re-resolve at fetch time" behavior for single-workspace, and the snapshot for multi-workspace, are uncovered. These are exactly the rotation invariants the PR body advertises; without tests, a future refactor that re-introduces the captured bot_token = self._get_token() from the old code will pass CI silently.
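
A test for these invariants could take roughly the following shape. `FakeScheduler` is a hypothetical stand-in written for this sketch, not the adapter's implementation; it only models the snapshot versus re-resolve split so the assertions have something to exercise.

```python
import asyncio


class FakeScheduler:
    """Hypothetical stand-in for the adapter; records the token cancel() ends up using."""

    def __init__(self, resolver, ctx_token=None):
        self._resolver = resolver
        self._ctx_token = ctx_token  # set only in the multi-workspace case
        self.resolver_calls = 0
        self.cancel_tokens = []

    async def _resolve(self):
        self.resolver_calls += 1
        return await self._resolver()

    async def schedule_message(self):
        snapshot = self._ctx_token
        if snapshot is None:
            await self._resolve()  # single-workspace: token for the scheduling call

        async def cancel():
            # Multi-workspace reuses the snapshot; single-workspace re-resolves now.
            token = snapshot if snapshot is not None else await self._resolve()
            self.cancel_tokens.append(token)

        return cancel


async def _demo():
    rotation = iter(["xoxb-old", "xoxb-new"])

    async def rotating():
        return next(rotation)

    async def must_not_run():
        raise AssertionError("resolver consulted in multi-workspace mode")

    single = FakeScheduler(rotating)
    await (await single.schedule_message())()

    multi = FakeScheduler(must_not_run, ctx_token="xoxb-team-a")
    await (await multi.schedule_message())()
    return single, multi


single, multi = asyncio.run(_demo())
```

A regression that captures the token once at scheduling time would leave `single.cancel_tokens` holding the old token and the resolver count at 1.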

🟡 Medium — SlackWebhookVerifier docstring is missing the third documented failure mode

The PR body's "Security model" section enumerates three failure modes (timing-side-channel, replay, body-substitution). The type alias docstring in src/chat_sdk/adapters/slack/types.py:34-39 only covers the first two. Body-substitution misuse is the most novel of the three (it is the new escape hatch surface this PR introduces) — please add a bullet warning that returning attacker-controlled bytes as the substituted body grants payload injection downstream. Otherwise an implementer reading only the docstring (not the PR description) won't know.

🔵 Nit — Resolver not isinstance(token, str) or not token rejection lacks coverage for non-string returns

_resolve_default_token (adapter.py:470) rejects empty string AND non-string values, but tests only exercise "" (test_resolver_returning_empty_string_raises). Add cases for None, int, dict returns so a refactor that drops the isinstance check fails.

🔵 Nit — current_token error-message regex is too loose

test_sync_current_token_with_resolver_before_resolution_raises matches "resolver has not been invoked" but does not assert the message points users at current_token_async(). The actual message does, but the test wouldn't catch a regression that drops the helpful pointer. Tighten to match="current_token_async".

🔵 Nit — SlackBotToken = "str | SlackBotTokenResolver" is a runtime string, not a type alias

types.py:23 assigns a string literal to SlackBotToken. With from __future__ import annotations the usages in annotations work, but typing.get_type_hints(SlackAdapterConfig) would fail to resolve it. Prefer from typing import TypeAlias; SlackBotToken: TypeAlias = str | SlackBotTokenResolver (or use a Union[str, SlackBotTokenResolver]).

✅ Looks good

  • Hazard #6 ContextVar isolation: _resolved_default_token.set(token) after await result is correct — each asyncio Task gets its own context copy, so the post-await set lands in the right context. The concurrent isolation test (test_concurrent_resolver_invocations_do_not_leak_across_requests) actually exercises the race window via asyncio.gather + an event-gated interleave; it would fail with a shared instance attribute. Multi-workspace isolation test does the same for the InstallationStore path.
  • Hazard #3: resolver is invoked explicitly per request at the top of handle_webhook (adapter.py:928-936); no implicit team_id pickup. Sync current_token raises with a pointer to current_token_async() when the resolver hasn't fired.
  • Hazard #11: the _get_client LRU cache eviction-without-close behavior is pre-existing and documented in code comments — not introduced here.
  • Hazard #12: signing_secret precedence over webhook_verifier is correct and tested (test_signing_secret_takes_precedence_over_verifier); env-fallback opt-out is tested (test_verifier_opts_out_of_env_signing_secret). The follow-up commit's async-resolver-exception logging is correctly tested by test_async_resolver_exception_is_logged_and_propagated.
  • Single-workspace static-string fast path: _default_bot_token_cache primed in the constructor preserves the pre-PR sync-only behavior for the common case.

Posted by an automated reviewer agent. https://claude.ai/code/session_01FyMxQn2BEAzmwKS1GZczKj


Generated by Claude Code

- Tighten ``test_default_verifier_uses_constant_time_compare`` to require
  an actual ``hmac.compare_digest(`` call via regex (substring "compare_digest"
  matches comments too).
- Add 4 rotation/snapshot tests for ``schedule_message().cancel()`` and
  ``Attachment.fetch_data``: assert single-workspace mode re-resolves the
  token via the dynamic resolver, while multi-workspace mode uses the
  ctx_token snapshot captured at construction time and does NOT consult
  InstallationStore at call time.
- Document body-substitution misuse in ``SlackWebhookVerifier`` docstring
  (returning attacker-controlled bytes grants payload injection — third
  documented failure mode alongside missing constant-time compare and
  replay protection).
- Add resolver non-string return tests (``None``, ``int``, ``dict``) — only
  ``""`` was covered.
- Tighten the ``current_token`` error-message regex to require
  ``current_token_async`` so callers know which async accessor to use.
- Make ``SlackBotToken`` a real ``TypeAlias`` instead of a runtime string —
  the previous form was a plain ``str`` at runtime, defeating type-checker
  recognition.

https://claude.ai/code/session_01FyMxQn2BEAzmwKS1GZczKj
Collaborator Author

@patrick-chinchill patrick-chinchill left a comment

Re-review of fixes in 2ba5f77

Verified each item from the previous-round fix list against src/chat_sdk/adapters/slack/{adapter.py,types.py} at HEAD and against upstream 2531e9c (vercel/chat#421).

Verified fixed (from prior round)

  1. test_default_verifier_uses_constant_time_compare now does re.search(r"\bhmac\.compare_digest\s*\(", src) against inspect.getsource(SlackAdapter._verify_signature) — the call-form regex would catch a swap to == or a comment-only mention. Good.
  2. Four rotation-safety tests are present and exercise the right invariants:
    • test_schedule_message_cancel_re_resolves_token_in_single_workspace_mode asserts the resolver counter goes from 1→2 across schedule_message + cancel, and that chat_deleteScheduledMessage is invoked with xoxb-new.
    • test_schedule_message_cancel_uses_snapshot_in_multi_workspace_mode spies on get_installation and asserts cancel issues zero further calls — good rotation/snapshot delineation.
    • test_attachment_fetch_data_re_resolves_token_in_single_workspace_mode asserts the resolver is not called at attachment-creation time and is re-invoked on each fetch_data().
    • test_attachment_fetch_data_uses_snapshot_in_multi_workspace_mode asserts get_installation is not consulted at fetch time when ctx_token was captured.
  3. SlackWebhookVerifier docstring (types.py:25-45) covers the three failure modes — constant-time comparison (timing channel), replay protection, and body-substitution safety with the specific warning that returning attacker-controlled bytes grants payload injection.
  4. Resolver non-string return tests are present: test_resolver_returning_none_raises, test_resolver_returning_int_raises, test_resolver_returning_dict_raises (in addition to empty-string).
  5. Sync current_token error message now includes "Use the async API (handle_webhook / current_token_async)…" and the test regex is r"current_token_async", which would catch a regression to a generic message.
  6. SlackBotToken is now TypeAlias = str | SlackBotTokenResolver.

All 37 tests in tests/test_slack_dynamic_token_and_verifier.py pass locally.

Findings (new)

Medium — single-workspace resolver gates URL verification (divergence from upstream)
handle_webhook at src/chat_sdk/adapters/slack/adapter.py:928-936 awaits _resolve_default_token() before the url_verification branch at :978. Upstream only invokes the provider via withToken({...}) at the per-API-call site, so a URL verification challenge succeeds even when the resolver is broken (no API call needed). In Python, a flaky/down secret-manager would make Slack's initial URL verification ping return 500 instead of echoing the challenge. Either short-circuit URL verification before the resolver call (parse JSON → handle URL verification → then resolve), or wrap the resolver invocation so URL verification stays oblivious to it. Add a regression test: resolver that raises + URL verification body must still return 200.
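
The first suggested ordering (parse JSON, handle URL verification, then resolve) can be sketched as follows. This `handle_webhook` is a simplified, hypothetical handler that returns a `(status, body)` tuple and assumes signature verification has already happened; it is not the adapter's method.

```python
import asyncio
import json


async def handle_webhook(body: str, resolve_token):
    """Return (status, response_body); signature verification is assumed done earlier."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return 400, "invalid JSON"
    # Short-circuit: the challenge echo needs no bot token and no API call.
    if isinstance(payload, dict) and payload.get("type") == "url_verification":
        return 200, str(payload.get("challenge", ""))
    try:
        await resolve_token()  # only the dispatch path depends on the resolver
    except Exception:
        return 500, "token resolution failed"
    return 200, "ok"


async def broken_resolver() -> str:
    raise RuntimeError("secret manager unavailable")


challenge_result = asyncio.run(
    handle_webhook('{"type": "url_verification", "challenge": "abc123"}', broken_resolver)
)
event_result = asyncio.run(
    handle_webhook('{"type": "event_callback", "event": {}}', broken_resolver)
)
```

With this ordering a broken resolver still fails real event dispatch, but Slack's verification ping gets its challenge echoed.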

Medium — direct API calls outside webhook flow with resolver bot_token will raise
By design, _get_token (sync) does not invoke the resolver and the per-request _resolved_default_token ContextVar is only primed inside handle_webhook. A consumer that calls adapter.post_message(...) from a cron job / background task with bot_token=callable hits AuthenticationError("Bot token resolver has not been invoked yet…") — the docstring even says use _resolve_token_async first, but post_message/add_reaction/upload_files/etc. don't. Upstream's getToken is async and resolves on every API call, so cron-style usage works there. Either (a) make the public adapter methods call _resolve_token_async() before _get_client(), or (b) document this as a Python divergence in docs/UPSTREAM_SYNC.md "Known Non-Parity" with a clear "wrap your call in current_token_async() first" recipe.

Nit — missing parity test: "treats a function botToken as single-workspace mode"
Upstream index.test.ts covers this (initialize() must call auth.test with the resolved token when bot_token is a function and set bot_user_id). The Python adapter does this at adapter.py:532-545, but no test asserts it. Easy add: configure resolver, mock auth_test, call initialize(), assert _bot_user_id populated and the resolver was awaited.

Nit — within-request token caching is a divergence worth a code breadcrumb
Upstream calls the provider on every API call within a single request. Python caches per-request via _resolved_default_token ContextVar (so the sync _get_token path works). Functionally equivalent in the rotation sense (TTL >> request lifetime), but the divergence isn't called out at the _resolve_default_token site or in docs/UPSTREAM_SYNC.md. Add a comment + a one-liner row in the non-parity table so a future syncer doesn't try to "fix" it back.

Nit — _FakeRequest constructor type leaks to the verifier in tests
test_verifier_receives_request_and_body correctly asserts captured[0][0] is request. It is worth confirming that the request exposes the same body-readable shape as real ASGI/WSGI request objects: the test currently passes a _FakeRequest, so verifiers that probe request.headers/request.method aren't covered. Optional: add one test using a starlette.requests.Request (or a richer fake) to ensure the verifier path doesn't accidentally rely on _FakeRequest-only attributes.

Re-review verdict: FOLLOW-UP NEEDED

Two medium parity gaps that aren't blockers but should land before this is called "synced": the URL-verification gating regression and the cron-mode resolver gap. The two nits + 1 doc breadcrumb are quick adds.

Posted by an automated re-reviewer agent. https://claude.ai/code/session_01FyMxQn2BEAzmwKS1GZczKj


Generated by Claude Code

claude added 3 commits May 9, 2026 20:30
Address re-review on PR #87 (Medium #1). Slack's url_verification ping
arrives at app-install / event-subscription time and only expects the
challenge echo — no bot token / API call required. Previously the
single-workspace resolver was invoked at handle_webhook entry, BEFORE
the url_verification short-circuit, so a flaky/down secret manager
would block app installation with a 500. Move the JSON peek for
url_verification ahead of _resolve_default_token() and short-circuit
there. Mirrors upstream where getToken() is only called at per-API-call
sites, never at webhook entry.

Adds test_url_verification_bypasses_broken_resolver: configures a
resolver that raises and asserts URL verification still returns 200 with
the challenge echo.

Also documents two related divergences in docs/UPSTREAM_SYNC.md
non-parity table (Medium #2 + Nit from re-review):
- Slack bot_token resolver invocation site: TS resolves on every API
  call site (cron-mode works); Python resolves once at handle_webhook
  entry into a ContextVar (cron callers must await
  current_token_async() first).
- Within-request resolver caching scope: TS calls per API call; Python
  caches per request to keep _get_token sync.

https://claude.ai/code/session_01FyMxQn2BEAzmwKS1GZczKj
…n fix

The previous version used a url_verification payload, which is now
special-cased to short-circuit BEFORE the resolver runs (per the
PR #87 follow-up fix). Switch to an event_callback payload — the actual
dispatch path that DOES need a token — to exercise the invariant.

https://claude.ai/code/session_01FyMxQn2BEAzmwKS1GZczKj
patrick-chinchill pushed a commit that referenced this pull request May 10, 2026
Final upstream-coverage audit before merging the 7 sync PRs (#84-#90)
identified one undocumented N/A item:

vercel/chat#415 (Teams SDK 2.0.8 + User-Agent) is a JS-only botbuilder
dependency bump. The Python Teams adapter uses raw aiohttp (no
botbuilder), so there is no equivalent dependency to bump. The optional
User-Agent: Vercel.ChatSDK header on the ~9 outbound aiohttp call sites
is a defense-in-depth nice-to-have; deferred as a follow-up rather than
landed in this sync.

Updates:
- CHANGELOG.md: tick all completed items and link them to their PRs
  (#84, #85, #86, #87, #88, #89, #90, plus already-merged PR #74).
  Document #415 inline as N/A.
- docs/UPSTREAM_SYNC.md non-parity table: add row for Teams User-Agent
  header divergence so future syncers don't try to "port" the JS bump.

Item #6 (concurrency.maxConcurrent) is already implementation-covered
in the Python port (existing divergence row at L492). The 4 new TS
concurrency tests in chat.test.ts have Python-specific equivalents at
test_chat_faithful.py L2969-3055 that don't name-match — leaving as
deferred fidelity-baseline polish since the behavior is verified.

Verdict from the coverage audit: all 18 substantive ports across PRs
#84-#90 are upstream-verified. No commits in chat@4.26.0..f55378a were
missed. Ready to start merging.

https://claude.ai/code/session_01FyMxQn2BEAzmwKS1GZczKj

2 participants