Skip to content

GitHub OAuth Security Enhancement#76

Open
camcalaquian wants to merge 1 commit into
tenki/base-4from
tenki/head-4
Open

GitHub OAuth Security Enhancement#76
camcalaquian wants to merge 1 commit into
tenki/base-4from
tenki/head-4

Conversation

@camcalaquian
Copy link
Copy Markdown

@camcalaquian camcalaquian commented Apr 28, 2026

…etsentry#67876)

We're adding one more step in the GitHub integration installation
pipeline, namely GitHub OAuth2 authorize. This is transparent from the
UX perspective as the data exchange happens without user interaction.

The pipeline will now fail in these cases:
- If there is a mismatch between currently authenticated GitHub user
(derived from OAuth2 authorize step) and the user who installed the
GitHub app (https://github.com/apps/sentry-io)
- If there is a mismatch between `state` parameter supplied by user and
pipeline signature
- If GitHub could not generate correct `access_token` from the `code`
(wrong or attempt of re-use of `code`).

In all those cases, this error is shown:

![image](https://github.com/getsentry/sentry/assets/1127549/18923861-2ead-4cf5-adda-7738aef801d7)
@tenki-reviewer
Copy link
Copy Markdown

tenki-reviewer Bot commented Apr 28, 2026

Tenki Code Review - Complete

Files Reviewed: 3
Findings: 6

By Severity:

  • 🔴 High: 4
  • 🟠 Medium: 1
  • 🟡 Low: 1

This PR adds a two-step OAuth login flow to the GitHub integration setup, enabling user identity verification and improving security. However, it introduces 3 high-severity issues: unguarded dict access that can crash the pipeline, attacker-controlled installation_id parameters in pipeline state, and redundant/inconsistent database queries with status filtering gaps.

Files Reviewed (3 files)
src/sentry/integrations/github/integration.py
src/sentry/web/frontend/pipeline_advancer.py
tests/sentry/integrations/github/test_integration.py

Copy link
Copy Markdown

@tenki-reviewer tenki-reviewer Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overview

PR #76 refactors the GitHub integration setup flow to add an intermediate OAuth step (OAuthLoginView) that authenticates the user with GitHub and verifies the OAuth user matches the original app installer (sender). The changes consolidate error handling with new helper functions (error(), get_document_origin()) and simplify pipeline forwarding logic in pipeline_advancer.py.

Key Changes

  • New OAuthLoginView class (lines 89-139): Handles OAuth flow including state validation, token exchange, and user info retrieval
  • New helper functions (lines 47-71): error() and get_document_origin() consolidate duplicate error response logic (3 instances merged into 1)
  • GitHubInstallation.dispatch updates (lines 148-241): Now extracts installation_id from both GET and pipeline state; validates authenticated user against stored installation's sender
  • Test coverage (test_integration.py): New OAuth callback assertions, user mismatch scenario test

Critical Issues

🔴 High Severity: Attacker-Controlled installation_id Parameter (sec-001, lines 397–449)

The installation_id is read from request.GET and stored in pipeline state before state validation in OAuthLoginView. An attacker can supply ?installation_id=<victim_id> in the OAuth callback to override the authenticated value. For new integrations (when Integration.DoesNotExist), the security check is skipped entirely, allowing the attacker to register an integration they don't own.

Impact: Privilege escalation; attacker can hijack GitHub integration setup for another org's installation.

🔴 High Severity: Unguarded integration.metadata['sender']['login'] Access (cq-001, sec-002, lines 501–504)

The code performs a bare dictionary key access without guards. If an existing integration lacks the 'sender' key in its metadata (possible for older records or certain creation paths), a KeyError is raised, causing an unhandled 500 error instead of the graceful error response.

Impact: Pipeline crashes; security check is silently bypassed instead of being properly enforced.

🔴 High Severity: Redundant Query with Status Filter Gap (cq-002, lines 493–498)

A first query confirms the Integration exists; a second query re-fetches it with status=ObjectStatus.ACTIVE, which silently rejects non-ACTIVE integrations (DISABLED, PENDING_DELETION) and returns a generic error instead of a descriptive response.

Impact: Inconsistent user experience and potential for logic gaps when integration state changes.

Additional Issues

🟡 Medium Severity: Broad Exception Handling (cq-003, lines 425–429)

All exceptions from OAuth token exchange are silently swallowed with no logging. Network outages and decode failures look identical to auth failures from the user's perspective, and operators have no visibility.

🟡 Medium Severity: Installation_id Preservation Across OAuth Redirect (cq-004)

The installation_id may be lost if not present in GitHub's OAuth callback URL, creating a broken setup loop if pipeline state is not properly persisted.

🔵 Low Severity: Missing Type Annotations (cq-005, lines 129–134)

New helper functions lack type annotations on request and org parameters, inconsistent with the file's style and reducing IDE support.

Recommendations

  1. Fix attacker-controlled parameters: Only accept installation_id from GET during initial redirect; use pipeline state exclusively on callback.
  2. Guard metadata access: Use .get() with fallbacks to prevent KeyError crashes.
  3. Consolidate queries: Reuse the first Integration lookup or check status explicitly on the already-fetched object.
  4. Add logging: Log OAuth token exchange failures so operators can distinguish auth errors from infrastructure issues.
  5. Add type annotations: Annotate error() and get_document_origin() for consistency.

Files Reviewed

  • src/sentry/integrations/github/integration.py — OAuth flow, user verification, error handling
  • src/sentry/web/frontend/pipeline_advancer.py — Pipeline forwarding logic (minor refactoring)
  • tests/sentry/integrations/github/test_integration.py — OAuth callback assertions, user mismatch test

Comment on lines +397 to +399
installation_id = request.GET.get("installation_id")
if installation_id:
pipeline.bind_state("installation_id", installation_id)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Attacker-controlled installation_id accepted after OAuth callback before state verification, enabling pipeline state manipulation (Auth & Access)

In OAuthLoginView.dispatch (lines 397-399), installation_id is read from request.GET and unconditionally stored into the pipeline state via pipeline.bind_state — even when the request is the OAuth callback (i.e., when a state parameter is also present). The state integrity check does not run until line 412. This means an attacker can craft an OAuth callback URL like ?code=<valid_code>&state=<valid_state>&installation_id=<victim_installation_id> and override any previously stored installation_id with an arbitrary value before authentication of the code completes.

In GitHubInstallation.dispatch (lines 448-449), the same pattern recurs: installation_id = request.GET.get('installation_id', pipeline.fetch_state('installation_id')) — meaning the installation ID can again be overridden by a URL parameter at the second pipeline step.

While the sender-login check at lines 501-505 partially mitigates this for existing integrations (it will reject a mismatch between the OAuth'd user and the integration's stored sender), for the Integration.DoesNotExist branch (line 481-482, where the code calls pipeline.next_step() with no sender check at all), an attacker who authenticates with GitHub OAuth and injects any unregistered installation_id value would pass straight through to build_integration, which then calls the GitHub API for that injected installation ID and may create an integration the attacker doesn't own.

💡 Suggestion: Only read installation_id from request.GET before the OAuth redirect (when no state is present). On the callback (when state is present), use only the value already bound in pipeline state. Similarly, in GitHubInstallation.dispatch, do not accept installation_id from request.GET — always use the value from pipeline state.

📋 Prompt for AI Agents

In src/sentry/integrations/github/integration.py:

  1. In OAuthLoginView.dispatch (lines 397-399), move the installation_id GET parameter read inside the if not request.GET.get('state'): block (i.e., only store installation_id from GET when this is the initial redirect, not on the OAuth callback). On the callback branch (when state is present), do NOT accept or re-bind installation_id from the request.

  2. In GitHubInstallation.dispatch (lines 448-450), change the installation_id resolution to ONLY use the pipeline state value, removing the request.GET.get('installation_id', ...) override: installation_id = pipeline.fetch_state('installation_id'). This ensures that the installation ID authenticated in OAuthLoginView cannot be replaced by an attacker-supplied URL parameter in subsequent pipeline steps.

Comment on lines +501 to +504
if (
pipeline.fetch_state("github_authenticated_user")
!= integration.metadata["sender"]["login"]
):
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Unguarded dict access on integration.metadata may raise KeyError (bug)

In GitHubInstallation.dispatch, after looking up the existing integration, the code accesses integration.metadata["sender"]["login"] directly (lines 501-503). If the metadata dict is missing the "sender" key (possible for integrations created before this field was stored, or if GitHub changes its payload), a KeyError will propagate as an unhandled 500 error instead of returning the graceful error response. The try/except block on lines 493-498 only catches Integration.DoesNotExist — not dict access failures.

💡 Suggestion: Use .get() with a fallback (or a nested get) to safely access the nested keys, and treat a missing value the same as a mismatch.

Suggested change
if (
pipeline.fetch_state("github_authenticated_user")
!= integration.metadata["sender"]["login"]
):
if (
pipeline.fetch_state("github_authenticated_user")
!= integration.metadata.get("sender", {}).get("login")
):
📋 Prompt for AI Agents

In src/sentry/integrations/github/integration.py around lines 501-504, the expression integration.metadata["sender"]["login"] will raise a KeyError if the 'sender' or 'login' key is absent from an existing integration's metadata (e.g., for older records). Replace the direct dict access with safe .get() calls: change integration.metadata["sender"]["login"] to integration.metadata.get("sender", {}).get("login"). This way, a missing key is treated the same as a mismatch, returning the error page rather than a 500.

Comment on lines +493 to +498
try:
integration = Integration.objects.get(
external_id=installation_id, status=ObjectStatus.ACTIVE
)
except Integration.DoesNotExist:
return error(request, self.active_organization)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Second Integration.objects.get with status filter can silently fail after first query succeeded (bug)

In GitHubInstallation.dispatch, the first lookup (line 477-479) already confirms the integration exists. The second query on lines 493-496 re-fetches using status=ObjectStatus.ACTIVE, which means an integration with any other status (e.g., PENDING_DELETION at the integration level, DISABLED) will raise DoesNotExist and return the generic 'Invalid installation request' error. The comment says 'OrganizationIntegration does not exist, but Integration does exist', implying the next step should be taken — but non-ACTIVE integrations will hit the error branch, silently swallowing the real reason.

💡 Suggestion: Reuse the integration object already retrieved in the first query — store it in a variable at line 477-479 to avoid the second query and the status-filtering gap. If the intent is to block non-ACTIVE integrations, document this explicitly and consider returning a more specific error.

📋 Prompt for AI Agents

In src/sentry/integrations/github/integration.py around lines 475-498, the first query (line 477-479) with Integration.objects.get(external_id=installation_id) already confirms the integration exists — store its result in a variable (e.g., integration = Integration.objects.get(...)). Then remove the second Integration.objects.get(external_id=installation_id, status=ObjectStatus.ACTIVE) query (lines 493-496) and instead use the already-fetched integration object. If filtering by ACTIVE status is intentional, check integration.status == ObjectStatus.ACTIVE on the already-fetched object and return a more descriptive error if it is not active. This eliminates the redundant DB query and the inconsistency where an existing-but-not-ACTIVE integration yields a generic 'Invalid installation request' error.

Comment on lines +501 to +505
if (
pipeline.fetch_state("github_authenticated_user")
!= integration.metadata["sender"]["login"]
):
return error(request, self.active_organization)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Unguarded integration.metadata['sender'] key access causes KeyError crash, bypassing the user-identity check (Auth & Access)

At GitHubInstallation.dispatch line 503, the code executes integration.metadata['sender']['login'] to verify that the OAuth-authenticated user matches the original installer. However, build_integration (line 376-377) only conditionally sets sender in the metadata dict:

if state.get('sender'):
    integration['metadata']['sender'] = state['sender']

Integrations created via the webhook (InstallationEventWebhook.__call__, webhook.py line 202-210) do include sender. However, if an integration was created without a sender (e.g., via an older code path, or if the webhook payload lacked a sender), then integration.metadata will not have the 'sender' key and line 503 will throw a KeyError.

This KeyError is unhandled, causing an unhandled exception (500 error) instead of the intended error(...) response. Depending on how the Django error handler is configured, this may leak internal details, and more importantly the security check is silently bypassed in this scenario — neither blocking nor allowing the install in a controlled way.

💡 Suggestion: Wrap the integration.metadata['sender']['login'] access with .get() calls and treat a missing sender as a check failure (return error). This ensures that if sender is absent from metadata, the flow is blocked safely.

📋 Prompt for AI Agents

In src/sentry/integrations/github/integration.py, at lines 500-505, replace the bare dictionary access integration.metadata['sender']['login'] with safe .get() calls to prevent a KeyError crash when sender is absent from integration metadata. Specifically, change the condition to use integration.metadata.get('sender', {}).get('login') so that a missing sender key evaluates to None, which will not match any authenticated user login and will correctly trigger return error(request, self.active_organization) instead of raising an unhandled exception.

Comment on lines +425 to +429
try:
body = safe_urlread(req).decode("utf-8")
payload = dict(parse_qsl(body))
except Exception:
payload = {}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 Broad except Exception silently swallows token-exchange errors without logging (bug)

In OAuthLoginView.dispatch, lines 425-429 catch all exceptions from safe_urlread, decode, and parse_qsl with except Exception: payload = {}. This means genuine network errors, timeouts, or decode failures are silently discarded. No logging (logger.exception/logger.warning) is performed, so operators have no visibility when the GitHub OAuth token exchange fails for non-auth reasons (e.g., GitHub API outage). The user sees the same generic error as for a real auth mismatch.

💡 Suggestion: Log the exception before falling through to the empty-payload path, so failures are observable in Sentry/logs. Ideally also distinguish between network errors and parse errors.

📋 Prompt for AI Agents

In src/sentry/integrations/github/integration.py around lines 425-429, the except Exception: payload = {} block silently discards all errors from the GitHub token exchange. Add a logger.exception("Failed to exchange GitHub OAuth token") (or logger.warning) call inside the except block before setting payload = {}. Ensure logger is imported/defined at the module level (it likely already is in this file). This allows operators to distinguish between a legitimate auth failure and a transient infrastructure error.

Comment on lines +129 to +134
def error(
request,
org,
error_short="Invalid installation request.",
error_long=ERR_INTEGRATION_INVALID_INSTALLATION_REQUEST,
):
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Missing type annotations on new module-level helper functions (style)

The error(request, org, ...) function (lines 129-134) and get_document_origin(org) (lines 149-152) do not have type annotations for request or org. The rest of the module uses Request and typed parameters consistently. Without annotations, type checkers and IDEs cannot validate callers or the return type of error().

💡 Suggestion: Add type annotations: request: HttpRequest (or Request from DRF), org: RpcUserOrganizationContext | None, and -> HttpResponse on error(), and org: RpcUserOrganizationContext | None on get_document_origin().

📋 Prompt for AI Agents

In src/sentry/integrations/github/integration.py at lines 129-134 and 149-152, add type annotations to the error and get_document_origin module-level functions. For error: annotate request as HttpRequest or Request (whichever is already imported), org as the appropriate organization context type (likely RpcUserOrganizationContext | None based on self.active_organization usage), and add -> HttpResponse return type. For get_document_origin: annotate org with the same type and confirm -> str is already present.

@camcalaquian camcalaquian marked this pull request as draft April 30, 2026 16:11
@camcalaquian camcalaquian marked this pull request as ready for review April 30, 2026 16:11
Copy link
Copy Markdown

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 1 potential issue.

View 4 additional findings in Devin Review.

Open in Devin Review

Comment on lines +501 to +505
if (
pipeline.fetch_state("github_authenticated_user")
!= integration.metadata["sender"]["login"]
):
return error(request, self.active_organization)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 KeyError on integration.metadata["sender"] when sender is missing from metadata

At line 503, integration.metadata["sender"]["login"] is accessed without verifying that "sender" exists in the metadata. After a successful first installation, ensure_integration (src/sentry/integrations/pipeline.py:32-44) is called from finish_pipeline with the result of build_integration (src/sentry/integrations/github/integration.py:351-379), which only includes sender if state.get("sender") is truthy. Since the pipeline state never contains sender (only installation_id and github_authenticated_user), ensure_integration overwrites the Integration's metadata and strips the sender key that was originally set by the webhook handler (src/sentry/integrations/github/webhook.py:204-206). The test at test_integration.py:303 confirms the metadata after installation has no sender. If the OrganizationIntegration is later deleted (but the Integration persists), a subsequent installation attempt reaches line 503 and raises an unhandled KeyError, resulting in a 500 error.

Comparable safe access in installation.py

The existing GitHubIntegrationsInstallationEndpoint at src/sentry/integrations/github/installation.py:44 properly guards against this with if "sender" not in integration.metadata: return HttpResponse(status=404), but the new code does not.

Suggested change
if (
pipeline.fetch_state("github_authenticated_user")
!= integration.metadata["sender"]["login"]
):
return error(request, self.active_organization)
if "sender" not in integration.metadata or (
pipeline.fetch_state("github_authenticated_user")
!= integration.metadata["sender"]["login"]
):
return error(request, self.active_organization)
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants