Skip to content

security: Fix all critical and high findings from security audit#1

Merged
patrick-chinchill merged 4 commits intomainfrom
security-fixes
Apr 6, 2026
Merged

security: Fix all critical and high findings from security audit#1
patrick-chinchill merged 4 commits intomainfrom
security-fixes

Conversation

@patrick-chinchill
Copy link
Copy Markdown
Collaborator

Security Fixes

From external security researcher review of the repo before PyPI publication.

Critical

  • Teams JWT verification — incoming webhooks were unauthenticated. Now validates Bot Framework JWT tokens (signature + audience).
  • Teams SSRFservice_url in thread IDs was user-controlled and forwarded with Bearer token. Now validated against Microsoft endpoint allowlist.
  • Discord timing attack — gateway token compared with !=. Now uses hmac.compare_digest.

High

  • Lock token predictability — Redis/Memory used random.choices (non-CSPRNG). Now uses secrets.token_hex.
  • WhatsApp SSRF — media download followed API-returned URL with auth header. Now validates domain + strips auth from CDN request.
  • CI supply chain — GitHub Actions pinned to commit SHAs (was using mutable tags).

Medium

  • GChat JWKS cachingPyJWKClient was instantiated per-request. Now cached on adapter instance.

Also

  • README quick start: added missing create_slack_adapter import
  • Error types exported from top-level __init__.py

Critical:
- Teams: Add JWT verification for incoming webhooks (Bot Framework spec)
- Teams: Validate service_url against Microsoft endpoint allowlist (SSRF fix)
- Discord: Use hmac.compare_digest for gateway token (timing-safe)

High:
- Redis/Memory: Use secrets.token_hex for lock tokens (CSPRNG)
- WhatsApp: Validate media download URLs + strip auth header from CDN
- CI: Pin all GitHub Actions to commit SHAs

Medium:
- Google Chat: Cache JWKS client instance (was creating per-request)

Also:
- Fix README quick start (add missing imports)
- Export error types from top-level __init__.py
Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces several security and performance enhancements across multiple chat adapters. Key improvements include the implementation of constant-time token comparisons for Discord, SSRF protection for Teams and WhatsApp by validating service and media download URLs, and the transition to cryptographically secure token generation using the secrets module for memory and Redis state adapters. Additionally, JWT verification was added for the Teams adapter, and JWKS client caching was implemented for both Google Chat and Teams to reduce per-request overhead. Feedback was provided regarding the Teams JWT verification process, specifically suggesting an explicit check of the HTTP response status when fetching the OpenID configuration to ensure more robust error handling.

Comment on lines +1673 to +1674
async with session.get(BOT_FRAMEWORK_OPENID_CONFIG_URL) as resp:
openid_config = await resp.json()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

For improved robustness and clearer error logging, it's a good practice to explicitly check if the HTTP request to fetch the OpenID configuration was successful before attempting to parse the JSON response. This will provide more specific logs if the endpoint is unavailable or returns an error.

Suggested change
async with session.get(BOT_FRAMEWORK_OPENID_CONFIG_URL) as resp:
openid_config = await resp.json()
async with session.get(BOT_FRAMEWORK_OPENID_CONFIG_URL) as resp:
if not resp.ok:
self._logger.error(
"Failed to fetch Bot Framework OpenID config",
{"status": resp.status, "url": BOT_FRAMEWORK_OPENID_CONFIG_URL},
)
return self._make_response("Unauthorized", 401)
openid_config = await resp.json()

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already addressed in the existing code -- the resp.status != 200 check was present at line 1700 before this review comment was posted. No changes needed.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f37fa43fba

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/chat_sdk/adapters/whatsapp/adapter.py Outdated
Comment thread src/chat_sdk/adapters/teams/adapter.py Outdated
- WhatsApp: expand media URL allowlist to include *.fbsbx.com, *.whatsapp.net,
  *.whatsapp.com (real Meta CDN domains used by Cloud API)
- Teams: expand service URL allowlist to include sovereign cloud endpoints
  (GCC, GCCH, DoD: *.teams.microsoft.com, *.teams.microsoft.us, smba.infra.*)
- Teams: check OpenID config HTTP response status before parsing JSON

Note: The upstream TS SDK does not validate service_url or media URLs at all
(Teams delegates auth to @microsoft/teams.apps SDK, WhatsApp follows any URL).
Our validation is additive security hardening not present in the original.
@patrick-chinchill patrick-chinchill merged commit 5b99696 into main Apr 6, 2026
0 of 3 checks passed
patrick-chinchill added a commit that referenced this pull request Apr 7, 2026
- Fix Slack rate limit detection: _handle_slack_error now correctly
  handles SlackResponse objects (not just dicts) by checking resp.data,
  and extracts the Retry-After header for rate limit errors
- Fix Teams JWT verification blocking: wrap synchronous
  PyJWKClient.get_signing_key_from_jwt in asyncio.to_thread to avoid
  blocking the event loop during JWKS fetch
- Make Slack client cache max configurable: add client_cache_max field
  to SlackAdapterConfig (defaults to 100)
- Fix unsafe session cleanup on LRU eviction: remove create_task session
  close that could break concurrent requests using evicted clients; rely
  on garbage collection instead
- Add tests for SlackResponse handling, rate limiting, and configurable
  cache max

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
patrick-chinchill added a commit that referenced this pull request Apr 7, 2026
fix: Address unresolved PR review comments from #1, #7, #8
patrick-chinchill added a commit that referenced this pull request Apr 24, 2026
- process_options_load: switch to `is not None` check so handlers
  returning [] short-circuit correctly (Port Rule #1 truthiness trap)
- Slack adapter: replace `or ""` fallbacks in OptionsLoadEvent
  construction with explicit None checks
- test helper: URL-encode payload JSON in signed-request body
- Narrow OptionsLoadHandler return type and typed Protocol signature
  for process_options_load (replace Any with OptionsLoadEvent)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
patrick-chinchill added a commit that referenced this pull request Apr 24, 2026
- chat.py:2107 — type the rehydrate-attachment callable so the list
  comprehension narrows to list[Attachment]. Unblocks CI.
- _coerce_attachments: replace `or` fallbacks with `is not None`
  (Port Rule #1 truthiness trap)
- google_chat rehydrate_attachment: preserve resolved URL when
  reconstructing, drop truthiness fallback on meta["url"]
- Harden telegram and whatsapp rehydrate tests to execute the async
  callback and verify download-method wiring (AsyncMock).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
patrick-chinchill added a commit that referenced this pull request Apr 24, 2026
…ness fallbacks

Second review pass on PR #67 (rehydrate_attachment). The previous fixup
addressed pyrefly only — this commit resolves the remaining review
feedback.

SSRF guards (3 adapters)
- Slack, Teams, Google Chat all rebuild fetch_data closures from
  serialized fetch_metadata["url"] in rehydrate_attachment. A tampered
  URL in persisted queue state could exfiltrate the workspace
  bot/OAuth token to an attacker-controlled host. Each adapter now
  validates the URL's scheme (https only) and host against a
  platform-specific allowlist before forwarding the auth header.
  Upstream TS does not validate; this is a Python-first divergence
  documented in docs/UPSTREAM_SYNC.md.
    - Slack: files.slack.com, slack.com, *.slack.com, *.slack-edge.com
    - Teams: Microsoft-owned hosts (graph.microsoft.com,
      smba.trafficmanager.net, *.sharepoint.com, *.botframework.com,
      *.office.com, attachments.office.net, …)
    - Google Chat: chat.googleapis.com, *.googleapis.com,
      *.googleusercontent.com, *.google.com

Message-instance rehydration (P1)
- Chat._rehydrate_message used to early-return on Message inputs,
  matching upstream TS's `raw instanceof Message` shortcut. That
  shortcut is safe in upstream because its state adapters return raw
  JSON dicts from dequeue. Our RedisStateAdapter / PostgresStateAdapter
  both upgrade the dequeued dict to `Message.from_json(...)` before
  returning, so the early return would skip rehydrate_attachment for
  every persistent-backend dequeue and leave fetch_data stripped. We
  now fall through and apply the rehydrate pass on Message inputs too
  (already-hydrated attachments with fetch_data are filtered out).

Truthiness fallbacks (Port Rule #1)
- telegram, whatsapp rehydrate_attachment and types.py dual-key
  fetch_metadata lookup now use explicit `is not None` instead of
  `or`, so an empty-dict fetch_metadata is preserved.

Teams connection pooling
- _build_teams_fetch_data used httpx.AsyncClient as a throwaway
  context manager per download. Refactored to use the shared
  aiohttp session (_get_http_session) that the rest of the adapter
  already goes through.

Test hardening
- test_slack_webhook.py and test_teams_adapter.py now stub the fetch
  path with AsyncMock, await rehydrated.fetch_data(), and assert the
  URL + token that were forwarded. Previously the tests only checked
  that `fetch_data is not None` — they would have passed even if
  rehydration returned a dummy closure.
- New tests per adapter verify the SSRF guard rejects untrusted hosts
  and the allowlist accepts the intended Slack/Teams/GCP hosts.
- New regression test in test_chat_faithful.py drives a Message-
  instance dequeue through the chat queue and asserts
  rehydrate_attachment still fires.

Slack adapter connection pooling (deferred)
- _fetch_slack_file still uses httpx.AsyncClient per call. The Slack
  adapter has no pooled aiohttp helper (only slack_sdk.AsyncWebClient
  for Slack API calls), so adding one is a larger refactor left for
  a follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
patrick-chinchill added a commit that referenced this pull request Apr 24, 2026
- StreamingPlanOptions mapping: replace truthy checks with `is not None`
  so update_interval_ms=0 and end_with=[] propagate correctly (thread.py
  post dispatcher and _fallback_stream interval guard). Per CLAUDE.md
  Port Rule #1; diverges from upstream thread.ts, which has the same
  latent bug.
- StreamingPlan: raise RuntimeError from is_supported() and
  get_fallback_text() so misroutes through post_postable_object /
  Channel.post fail loudly instead of posting "" or attempting a
  wrong-shape adapter.post_object("stream", ...). Also guard
  post_postable_object() with an early kind=="stream" check for a
  clearer error message. Diverges from upstream postable-object.ts,
  which silently posts the empty fallback string.
- Fallback test: spy on _fallback_stream to capture the StreamOptions
  actually reaching the fallback path and assert task_display_mode,
  stop_blocks, and update_interval_ms all propagate -- previous test
  passed even when options were silently dropped.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
patrick-chinchill added a commit that referenced this pull request Apr 24, 2026
Round-2 fixup over-applied Port Rule #1. Upstream JS uses || chain
intentionally — empty-string username falls through to name/user_id.
The is-not-None chain preserved "" and diverged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
patrick-chinchill added a commit that referenced this pull request Apr 24, 2026
- process_options_load: switch to `is not None` check so handlers
  returning [] short-circuit correctly (Port Rule #1 truthiness trap)
- Slack adapter: replace `or ""` fallbacks in OptionsLoadEvent
  construction with explicit None checks
- test helper: URL-encode payload JSON in signed-request body
- Narrow OptionsLoadHandler return type and typed Protocol signature
  for process_options_load (replace Any with OptionsLoadEvent)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
patrick-chinchill added a commit that referenced this pull request Apr 24, 2026
Round-2 fixup over-applied Port Rule #1. Upstream JS uses || chain
intentionally — empty-string username falls through to name/user_id.
The is-not-None chain preserved "" and diverged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
patrick-chinchill added a commit that referenced this pull request Apr 24, 2026
- StreamingPlanOptions mapping: replace truthy checks with `is not None`
  so update_interval_ms=0 and end_with=[] propagate correctly (thread.py
  post dispatcher and _fallback_stream interval guard). Per CLAUDE.md
  Port Rule #1; diverges from upstream thread.ts, which has the same
  latent bug.
- StreamingPlan: raise RuntimeError from is_supported() and
  get_fallback_text() so misroutes through post_postable_object /
  Channel.post fail loudly instead of posting "" or attempting a
  wrong-shape adapter.post_object("stream", ...). Also guard
  post_postable_object() with an early kind=="stream" check for a
  clearer error message. Diverges from upstream postable-object.ts,
  which silently posts the empty fallback string.
- Fallback test: spy on _fallback_stream to capture the StreamOptions
  actually reaching the fallback path and assert task_display_mode,
  stop_blocks, and update_interval_ms all propagate -- previous test
  passed even when options were silently dropped.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
patrick-chinchill added a commit that referenced this pull request Apr 24, 2026
- chat.py:2107 — type the rehydrate-attachment callable so the list
  comprehension narrows to list[Attachment]. Unblocks CI.
- _coerce_attachments: replace `or` fallbacks with `is not None`
  (Port Rule #1 truthiness trap)
- google_chat rehydrate_attachment: preserve resolved URL when
  reconstructing, drop truthiness fallback on meta["url"]
- Harden telegram and whatsapp rehydrate tests to execute the async
  callback and verify download-method wiring (AsyncMock).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
patrick-chinchill added a commit that referenced this pull request Apr 24, 2026
…ness fallbacks

Second review pass on PR #67 (rehydrate_attachment). The previous fixup
addressed pyrefly only — this commit resolves the remaining review
feedback.

SSRF guards (3 adapters)
- Slack, Teams, Google Chat all rebuild fetch_data closures from
  serialized fetch_metadata["url"] in rehydrate_attachment. A tampered
  URL in persisted queue state could exfiltrate the workspace
  bot/OAuth token to an attacker-controlled host. Each adapter now
  validates the URL's scheme (https only) and host against a
  platform-specific allowlist before forwarding the auth header.
  Upstream TS does not validate; this is a Python-first divergence
  documented in docs/UPSTREAM_SYNC.md.
    - Slack: files.slack.com, slack.com, *.slack.com, *.slack-edge.com
    - Teams: Microsoft-owned hosts (graph.microsoft.com,
      smba.trafficmanager.net, *.sharepoint.com, *.botframework.com,
      *.office.com, attachments.office.net, …)
    - Google Chat: chat.googleapis.com, *.googleapis.com,
      *.googleusercontent.com, *.google.com

Message-instance rehydration (P1)
- Chat._rehydrate_message used to early-return on Message inputs,
  matching upstream TS's `raw instanceof Message` shortcut. That
  shortcut is safe in upstream because its state adapters return raw
  JSON dicts from dequeue. Our RedisStateAdapter / PostgresStateAdapter
  both upgrade the dequeued dict to `Message.from_json(...)` before
  returning, so the early return would skip rehydrate_attachment for
  every persistent-backend dequeue and leave fetch_data stripped. We
  now fall through and apply the rehydrate pass on Message inputs too
  (already-hydrated attachments with fetch_data are filtered out).

Truthiness fallbacks (Port Rule #1)
- telegram, whatsapp rehydrate_attachment and types.py dual-key
  fetch_metadata lookup now use explicit `is not None` instead of
  `or`, so an empty-dict fetch_metadata is preserved.

Teams connection pooling
- _build_teams_fetch_data used httpx.AsyncClient as a throwaway
  context manager per download. Refactored to use the shared
  aiohttp session (_get_http_session) that the rest of the adapter
  already goes through.

Test hardening
- test_slack_webhook.py and test_teams_adapter.py now stub the fetch
  path with AsyncMock, await rehydrated.fetch_data(), and assert the
  URL + token that were forwarded. Previously the tests only checked
  that `fetch_data is not None` — they would have passed even if
  rehydration returned a dummy closure.
- New tests per adapter verify the SSRF guard rejects untrusted hosts
  and the allowlist accepts the intended Slack/Teams/GCP hosts.
- New regression test in test_chat_faithful.py drives a Message-
  instance dequeue through the chat queue and asserts
  rehydrate_attachment still fires.

Slack adapter connection pooling (deferred)
- _fetch_slack_file still uses httpx.AsyncClient per call. The Slack
  adapter has no pooled aiohttp helper (only slack_sdk.AsyncWebClient
  for Slack API calls), so adding one is a larger refactor left for
  a follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
patrick-chinchill added a commit that referenced this pull request Apr 24, 2026
Previously `verify_test_fidelity.py` printed "SKIPPED (file not found)"
for any mapped TS test whose source didn't exist under `TS_ROOT`, then
summed 0 matches + 0 missing and exited 0 with "All TS tests have
Python equivalents." Combined with `continue-on-error: true` on the
upstream-clone step, that made a silently-failing clone report
"fidelity check passed" in CI.

Now the script tracks missing-TS-file hits separately from real
successes. If any mapped TS file is absent at end-of-run, the script
prints a clear "upstream checkout missing — cannot verify fidelity"
message naming every missing path, includes the clone command hint,
and exits 1. This fires before strict/baseline/update-baseline
success branches so no mode can accidentally mask it.

Reproducer:
  TS_ROOT=/tmp/definitely-missing uv run python scripts/verify_test_fidelity.py
Before: exit 0.  After: exit 1 with infra-level error message.

Closes self-review gap #1 on #72.
patrick-chinchill pushed a commit that referenced this pull request May 9, 2026
Three review-driven fixes:

1. Adversarial input sweep on _extract_slack_recipient_team_id (Medium):
   extend the parametrized test from 4 happy paths to 11 cases, covering
   the helper's fallthrough guards (team dict missing id, empty
   team_id/team strings, non-dict raw, non-string user.team_id, team.id
   non-string). Per docs/SELF_REVIEW.md principle #1.

2. Defense-in-depth GUID guard on aadObjectId before formatting it into
   the Graph chat ID (Nit). Bot Framework JWT verification authenticates
   the activity envelope but does not constrain from.aadObjectId; a
   malformed value containing /, ?, # could otherwise inject into the
   Graph URL path. New _AAD_OBJECT_ID_PATTERN reject malformed shapes;
   adversarial test covers 6 attack patterns.

3. End-to-end legacy-cache-shape test for #403 backwards-compat (Nit).
   Cached entries written before the discriminator landed lack a "type"
   key. The test asserts _get_graph_context loads them and
   _chat_id_from_context returns the raw conversation ID (channel
   semantics), not a misclassified Graph DM URL.

4. Wire-shape parity (Nit): the channel-context cache now omits the
   "type": "channel" key to match upstream TS, which only sets the
   discriminator on the DM branch. Functionally equivalent (absent type
   is treated as channel by the dispatcher), but the cross-language wire
   shape is now identical.

All TestGraphDmConversationIdResolution tests pass (10 cases).

https://claude.ai/code/session_01FyMxQn2BEAzmwKS1GZczKj
patrick-chinchill pushed a commit that referenced this pull request May 9, 2026
Address re-review on PR #87 (Medium #1). Slack's url_verification ping
arrives at app-install / event-subscription time and only expects the
challenge echo — no bot token / API call required. Previously the
single-workspace resolver was invoked at handle_webhook entry, BEFORE
the url_verification short-circuit, so a flaky/down secret manager
would block app installation with a 500. Move the JSON peek for
url_verification ahead of _resolve_default_token() and short-circuit
there. Mirrors upstream where getToken() is only called at per-API-call
sites, never at webhook entry.

Adds test_url_verification_bypasses_broken_resolver: configures a
resolver that raises and asserts URL verification still returns 200 with
the challenge echo.

Also documents two related divergences in docs/UPSTREAM_SYNC.md
non-parity table (Medium #2 + Nit from re-review):
- Slack bot_token resolver invocation site: TS resolves on every API
  call site (cron-mode works); Python resolves once at handle_webhook
  entry into a ContextVar (cron callers must await
  current_token_async() first).
- Within-request resolver caching scope: TS calls per API call; Python
  caches per request to keep _get_token sync.

https://claude.ai/code/session_01FyMxQn2BEAzmwKS1GZczKj
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant