security: Fix all critical and high findings from security audit#1
security: Fix all critical and high findings from security audit#1patrick-chinchill merged 4 commits intomainfrom
Conversation
Critical: - Teams: Add JWT verification for incoming webhooks (Bot Framework spec) - Teams: Validate service_url against Microsoft endpoint allowlist (SSRF fix) - Discord: Use hmac.compare_digest for gateway token (timing-safe) High: - Redis/Memory: Use secrets.token_hex for lock tokens (CSPRNG) - WhatsApp: Validate media download URLs + strip auth header from CDN - CI: Pin all GitHub Actions to commit SHAs Medium: - Google Chat: Cache JWKS client instance (was creating per-request) Also: - Fix README quick start (add missing imports) - Export error types from top-level __init__.py
There was a problem hiding this comment.
Code Review
This pull request introduces several security and performance enhancements across multiple chat adapters. Key improvements include the implementation of constant-time token comparisons for Discord, SSRF protection for Teams and WhatsApp by validating service and media download URLs, and the transition to cryptographically secure token generation using the secrets module for memory and Redis state adapters. Additionally, JWT verification was added for the Teams adapter, and JWKS client caching was implemented for both Google Chat and Teams to reduce per-request overhead. Feedback was provided regarding the Teams JWT verification process, specifically suggesting an explicit check of the HTTP response status when fetching the OpenID configuration to ensure more robust error handling.
| async with session.get(BOT_FRAMEWORK_OPENID_CONFIG_URL) as resp: | ||
| openid_config = await resp.json() |
There was a problem hiding this comment.
For improved robustness and clearer error logging, it's a good practice to explicitly check if the HTTP request to fetch the OpenID configuration was successful before attempting to parse the JSON response. This will provide more specific logs if the endpoint is unavailable or returns an error.
| async with session.get(BOT_FRAMEWORK_OPENID_CONFIG_URL) as resp: | |
| openid_config = await resp.json() | |
| async with session.get(BOT_FRAMEWORK_OPENID_CONFIG_URL) as resp: | |
| if not resp.ok: | |
| self._logger.error( | |
| "Failed to fetch Bot Framework OpenID config", | |
| {"status": resp.status, "url": BOT_FRAMEWORK_OPENID_CONFIG_URL}, | |
| ) | |
| return self._make_response("Unauthorized", 401) | |
| openid_config = await resp.json() |
There was a problem hiding this comment.
Already addressed in the existing code -- the resp.status != 200 check was present at line 1700 before this review comment was posted. No changes needed.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f37fa43fba
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
- WhatsApp: expand media URL allowlist to include *.fbsbx.com, *.whatsapp.net, *.whatsapp.com (real Meta CDN domains used by Cloud API) - Teams: expand service URL allowlist to include sovereign cloud endpoints (GCC, GCCH, DoD: *.teams.microsoft.com, *.teams.microsoft.us, smba.infra.*) - Teams: check OpenID config HTTP response status before parsing JSON Note: The upstream TS SDK does not validate service_url or media URLs at all (Teams delegates auth to @microsoft/teams.apps SDK, WhatsApp follows any URL). Our validation is additive security hardening not present in the original.
- Fix Slack rate limit detection: _handle_slack_error now correctly handles SlackResponse objects (not just dicts) by checking resp.data, and extracts the Retry-After header for rate limit errors - Fix Teams JWT verification blocking: wrap synchronous PyJWKClient.get_signing_key_from_jwt in asyncio.to_thread to avoid blocking the event loop during JWKS fetch - Make Slack client cache max configurable: add client_cache_max field to SlackAdapterConfig (defaults to 100) - Fix unsafe session cleanup on LRU eviction: remove create_task session close that could break concurrent requests using evicted clients; rely on garbage collection instead - Add tests for SlackResponse handling, rate limiting, and configurable cache max Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- process_options_load: switch to `is not None` check so handlers returning [] short-circuit correctly (Port Rule #1 truthiness trap) - Slack adapter: replace `or ""` fallbacks in OptionsLoadEvent construction with explicit None checks - test helper: URL-encode payload JSON in signed-request body - Narrow OptionsLoadHandler return type and typed Protocol signature for process_options_load (replace Any with OptionsLoadEvent) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- chat.py:2107 — type the rehydrate-attachment callable so the list comprehension narrows to list[Attachment]. Unblocks CI. - _coerce_attachments: replace `or` fallbacks with `is not None` (Port Rule #1 truthiness trap) - google_chat rehydrate_attachment: preserve resolved URL when reconstructing, drop truthiness fallback on meta["url"] - Harden telegram and whatsapp rehydrate tests to execute the async callback and verify download-method wiring (AsyncMock). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ness fallbacks Second review pass on PR #67 (rehydrate_attachment). The previous fixup addressed pyrefly only — this commit resolves the remaining review feedback. SSRF guards (3 adapters) - Slack, Teams, Google Chat all rebuild fetch_data closures from serialized fetch_metadata["url"] in rehydrate_attachment. A tampered URL in persisted queue state could exfiltrate the workspace bot/OAuth token to an attacker-controlled host. Each adapter now validates the URL's scheme (https only) and host against a platform-specific allowlist before forwarding the auth header. Upstream TS does not validate; this is a Python-first divergence documented in docs/UPSTREAM_SYNC.md. - Slack: files.slack.com, slack.com, *.slack.com, *.slack-edge.com - Teams: Microsoft-owned hosts (graph.microsoft.com, smba.trafficmanager.net, *.sharepoint.com, *.botframework.com, *.office.com, attachments.office.net, …) - Google Chat: chat.googleapis.com, *.googleapis.com, *.googleusercontent.com, *.google.com Message-instance rehydration (P1) - Chat._rehydrate_message used to early-return on Message inputs, matching upstream TS's `raw instanceof Message` shortcut. That shortcut is safe in upstream because its state adapters return raw JSON dicts from dequeue. Our RedisStateAdapter / PostgresStateAdapter both upgrade the dequeued dict to `Message.from_json(...)` before returning, so the early return would skip rehydrate_attachment for every persistent-backend dequeue and leave fetch_data stripped. We now fall through and apply the rehydrate pass on Message inputs too (already-hydrated attachments with fetch_data are filtered out). Truthiness fallbacks (Port Rule #1) - telegram, whatsapp rehydrate_attachment and types.py dual-key fetch_metadata lookup now use explicit `is not None` instead of `or`, so an empty-dict fetch_metadata is preserved. Teams connection pooling - _build_teams_fetch_data used httpx.AsyncClient as a throwaway context manager per download. Refactored to use the shared aiohttp session (_get_http_session) that the rest of the adapter already goes through. Test hardening - test_slack_webhook.py and test_teams_adapter.py now stub the fetch path with AsyncMock, await rehydrated.fetch_data(), and assert the URL + token that were forwarded. Previously the tests only checked that `fetch_data is not None` — they would have passed even if rehydration returned a dummy closure. - New tests per adapter verify the SSRF guard rejects untrusted hosts and the allowlist accepts the intended Slack/Teams/GCP hosts. - New regression test in test_chat_faithful.py drives a Message- instance dequeue through the chat queue and asserts rehydrate_attachment still fires. Slack adapter connection pooling (deferred) - _fetch_slack_file still uses httpx.AsyncClient per call. The Slack adapter has no pooled aiohttp helper (only slack_sdk.AsyncWebClient for Slack API calls), so adding one is a larger refactor left for a follow-up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- StreamingPlanOptions mapping: replace truthy checks with `is not None` so update_interval_ms=0 and end_with=[] propagate correctly (thread.py post dispatcher and _fallback_stream interval guard). Per CLAUDE.md Port Rule #1; diverges from upstream thread.ts, which has the same latent bug. - StreamingPlan: raise RuntimeError from is_supported() and get_fallback_text() so misroutes through post_postable_object / Channel.post fail loudly instead of posting "" or attempting a wrong-shape adapter.post_object("stream", ...). Also guard post_postable_object() with an early kind=="stream" check for a clearer error message. Diverges from upstream postable-object.ts, which silently posts the empty fallback string. - Fallback test: spy on _fallback_stream to capture the StreamOptions actually reaching the fallback path and assert task_display_mode, stop_blocks, and update_interval_ms all propagate -- previous test passed even when options were silently dropped. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Round-2 fixup over-applied Port Rule #1. Upstream JS uses || chain intentionally — empty-string username falls through to name/user_id. The is-not-None chain preserved "" and diverged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- process_options_load: switch to `is not None` check so handlers returning [] short-circuit correctly (Port Rule #1 truthiness trap) - Slack adapter: replace `or ""` fallbacks in OptionsLoadEvent construction with explicit None checks - test helper: URL-encode payload JSON in signed-request body - Narrow OptionsLoadHandler return type and typed Protocol signature for process_options_load (replace Any with OptionsLoadEvent) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Round-2 fixup over-applied Port Rule #1. Upstream JS uses || chain intentionally — empty-string username falls through to name/user_id. The is-not-None chain preserved "" and diverged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- StreamingPlanOptions mapping: replace truthy checks with `is not None` so update_interval_ms=0 and end_with=[] propagate correctly (thread.py post dispatcher and _fallback_stream interval guard). Per CLAUDE.md Port Rule #1; diverges from upstream thread.ts, which has the same latent bug. - StreamingPlan: raise RuntimeError from is_supported() and get_fallback_text() so misroutes through post_postable_object / Channel.post fail loudly instead of posting "" or attempting a wrong-shape adapter.post_object("stream", ...). Also guard post_postable_object() with an early kind=="stream" check for a clearer error message. Diverges from upstream postable-object.ts, which silently posts the empty fallback string. - Fallback test: spy on _fallback_stream to capture the StreamOptions actually reaching the fallback path and assert task_display_mode, stop_blocks, and update_interval_ms all propagate -- previous test passed even when options were silently dropped. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- chat.py:2107 — type the rehydrate-attachment callable so the list comprehension narrows to list[Attachment]. Unblocks CI. - _coerce_attachments: replace `or` fallbacks with `is not None` (Port Rule #1 truthiness trap) - google_chat rehydrate_attachment: preserve resolved URL when reconstructing, drop truthiness fallback on meta["url"] - Harden telegram and whatsapp rehydrate tests to execute the async callback and verify download-method wiring (AsyncMock). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ness fallbacks Second review pass on PR #67 (rehydrate_attachment). The previous fixup addressed pyrefly only — this commit resolves the remaining review feedback. SSRF guards (3 adapters) - Slack, Teams, Google Chat all rebuild fetch_data closures from serialized fetch_metadata["url"] in rehydrate_attachment. A tampered URL in persisted queue state could exfiltrate the workspace bot/OAuth token to an attacker-controlled host. Each adapter now validates the URL's scheme (https only) and host against a platform-specific allowlist before forwarding the auth header. Upstream TS does not validate; this is a Python-first divergence documented in docs/UPSTREAM_SYNC.md. - Slack: files.slack.com, slack.com, *.slack.com, *.slack-edge.com - Teams: Microsoft-owned hosts (graph.microsoft.com, smba.trafficmanager.net, *.sharepoint.com, *.botframework.com, *.office.com, attachments.office.net, …) - Google Chat: chat.googleapis.com, *.googleapis.com, *.googleusercontent.com, *.google.com Message-instance rehydration (P1) - Chat._rehydrate_message used to early-return on Message inputs, matching upstream TS's `raw instanceof Message` shortcut. That shortcut is safe in upstream because its state adapters return raw JSON dicts from dequeue. Our RedisStateAdapter / PostgresStateAdapter both upgrade the dequeued dict to `Message.from_json(...)` before returning, so the early return would skip rehydrate_attachment for every persistent-backend dequeue and leave fetch_data stripped. We now fall through and apply the rehydrate pass on Message inputs too (already-hydrated attachments with fetch_data are filtered out). Truthiness fallbacks (Port Rule #1) - telegram, whatsapp rehydrate_attachment and types.py dual-key fetch_metadata lookup now use explicit `is not None` instead of `or`, so an empty-dict fetch_metadata is preserved. Teams connection pooling - _build_teams_fetch_data used httpx.AsyncClient as a throwaway context manager per download. Refactored to use the shared aiohttp session (_get_http_session) that the rest of the adapter already goes through. Test hardening - test_slack_webhook.py and test_teams_adapter.py now stub the fetch path with AsyncMock, await rehydrated.fetch_data(), and assert the URL + token that were forwarded. Previously the tests only checked that `fetch_data is not None` — they would have passed even if rehydration returned a dummy closure. - New tests per adapter verify the SSRF guard rejects untrusted hosts and the allowlist accepts the intended Slack/Teams/GCP hosts. - New regression test in test_chat_faithful.py drives a Message- instance dequeue through the chat queue and asserts rehydrate_attachment still fires. Slack adapter connection pooling (deferred) - _fetch_slack_file still uses httpx.AsyncClient per call. The Slack adapter has no pooled aiohttp helper (only slack_sdk.AsyncWebClient for Slack API calls), so adding one is a larger refactor left for a follow-up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previously `verify_test_fidelity.py` printed "SKIPPED (file not found)" for any mapped TS test whose source didn't exist under `TS_ROOT`, then summed 0 matches + 0 missing and exited 0 with "All TS tests have Python equivalents." Combined with `continue-on-error: true` on the upstream-clone step, that made a silently-failing clone report "fidelity check passed" in CI. Now the script tracks missing-TS-file hits separately from real successes. If any mapped TS file is absent at end-of-run, the script prints a clear "upstream checkout missing — cannot verify fidelity" message naming every missing path, includes the clone command hint, and exits 1. This fires before strict/baseline/update-baseline success branches so no mode can accidentally mask it. Reproducer: TS_ROOT=/tmp/definitely-missing uv run python scripts/verify_test_fidelity.py Before: exit 0. After: exit 1 with infra-level error message. Closes self-review gap #1 on #72.
Three review-driven fixes: 1. Adversarial input sweep on _extract_slack_recipient_team_id (Medium): extend the parametrized test from 4 happy paths to 11 cases, covering the helper's fallthrough guards (team dict missing id, empty team_id/team strings, non-dict raw, non-string user.team_id, team.id non-string). Per docs/SELF_REVIEW.md principle #1. 2. Defense-in-depth GUID guard on aadObjectId before formatting it into the Graph chat ID (Nit). Bot Framework JWT verification authenticates the activity envelope but does not constrain from.aadObjectId; a malformed value containing /, ?, # could otherwise inject into the Graph URL path. New _AAD_OBJECT_ID_PATTERN reject malformed shapes; adversarial test covers 6 attack patterns. 3. End-to-end legacy-cache-shape test for #403 backwards-compat (Nit). Cached entries written before the discriminator landed lack a "type" key. The test asserts _get_graph_context loads them and _chat_id_from_context returns the raw conversation ID (channel semantics), not a misclassified Graph DM URL. 4. Wire-shape parity (Nit): the channel-context cache now omits the "type": "channel" key to match upstream TS, which only sets the discriminator on the DM branch. Functionally equivalent (absent type is treated as channel by the dispatcher), but the cross-language wire shape is now identical. All TestGraphDmConversationIdResolution tests pass (10 cases). https://claude.ai/code/session_01FyMxQn2BEAzmwKS1GZczKj
Address re-review on PR #87 (Medium #1). Slack's url_verification ping arrives at app-install / event-subscription time and only expects the challenge echo — no bot token / API call required. Previously the single-workspace resolver was invoked at handle_webhook entry, BEFORE the url_verification short-circuit, so a flaky/down secret manager would block app installation with a 500. Move the JSON peek for url_verification ahead of _resolve_default_token() and short-circuit there. Mirrors upstream where getToken() is only called at per-API-call sites, never at webhook entry. Adds test_url_verification_bypasses_broken_resolver: configures a resolver that raises and asserts URL verification still returns 200 with the challenge echo. Also documents two related divergences in docs/UPSTREAM_SYNC.md non-parity table (Medium #2 + Nit from re-review): - Slack bot_token resolver invocation site: TS resolves on every API call site (cron-mode works); Python resolves once at handle_webhook entry into a ContextVar (cron callers must await current_token_async() first). - Within-request resolver caching scope: TS calls per API call; Python caches per request to keep _get_token sync. https://claude.ai/code/session_01FyMxQn2BEAzmwKS1GZczKj
Security Fixes
From external security researcher review of the repo before PyPI publication.
Critical
service_urlin thread IDs was user-controlled and forwarded with Bearer token. Now validated against Microsoft endpoint allowlist.!=. Now useshmac.compare_digest.High
random.choices(non-CSPRNG). Now usessecrets.token_hex.Medium
PyJWKClientwas instantiated per-request. Now cached on adapter instance.Also
create_slack_adapterimport__init__.py