Skip to content

fix: multiple outbound http requests in superset/db_... in base.py#39646

Open
orbisai0security wants to merge 3 commits into
apache:masterfrom
orbisai0security:fix-ssrf-oauth2-token-uri-validation
Open

fix: multiple outbound http requests in superset/db_... in base.py#39646
orbisai0security wants to merge 3 commits into
apache:masterfrom
orbisai0security:fix-ssrf-oauth2-token-uri-validation

Conversation

@orbisai0security
Copy link
Copy Markdown

Summary

Fix high severity security issue in superset/db_engine_specs/base.py.

Vulnerability

Field Value
ID V-003
Severity HIGH
Scanner multi_agent_ai
Rule V-003
File superset/db_engine_specs/base.py:806

Description: Multiple outbound HTTP requests in superset/db_engine_specs/base.py (lines 806, 808, 831, 833) and superset/db_engine_specs/impala.py (line 213) use URLs that may be derived from user-supplied database connection parameters. No URL validation, IP allowlist, or private network blocking is evident. An authenticated user with database connection management permissions can supply internal network addresses as the database host, causing Superset to make HTTP requests to internal services on behalf of the attacker.

Changes

  • superset/db_engine_specs/base.py

Verification

  • Build passes
  • Scanner re-scan confirms fix
  • LLM code review passed

Automated security fix by OrbisAI Security

Automated security fix generated by Orbis Security AI
@bito-code-review
Copy link
Copy Markdown
Contributor

bito-code-review Bot commented Apr 25, 2026

Code Review Agent Run #99179e

Actionable Suggestions - 0
Review Details
  • Files reviewed - 1 · Commit Range: c1f06b1..c1f06b1
    • superset/db_engine_specs/base.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

Bito Usage Guide

Commands

Type the following command in the pull request comment and save the comment.

  • /review - Manually triggers a full AI review.

  • /pause - Pauses automatic reviews on this pull request.

  • /resume - Resumes automatic reviews.

  • /resolve - Marks all Bito-posted review comments as resolved.

  • /abort - Cancels all in-progress reviews.

Refer to the documentation for additional commands.

Configuration

This repository uses Superset You can customize the agent settings here or contact your Bito workspace admin at evan@preset.io.

Documentation & Help

AI Code Review powered by Bito Logo

@dosubot dosubot Bot added the change:backend Requires changing the backend label Apr 25, 2026
Comment thread superset/db_engine_specs/base.py Outdated
)
hostname = parsed.hostname or ""
try:
ip = ipaddress.ip_address(_socket.gethostbyname(hostname))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: The validation checks only one DNS result via gethostbyname, but hostnames can resolve to multiple IPs and the actual HTTP connection may use a different address than the one validated. This creates a DNS-based SSRF bypass where a hostname with mixed public/private records can pass validation yet still connect to an internal address. [security]

Severity Level: Critical 🚨
- ⚠️ SSRF possible via mixed public/private DNS records.
- ⚠️ Database OAuth2 token exchange may reach internal services.
Steps of Reproduction ✅
1. Configure a database OAuth2 client with a token endpoint whose hostname has mixed DNS
records by setting `encrypted_extra``oauth2_client_info.token_request_uri` to
`https://mixed.example.com/token` in `Database.get_oauth2_config`
(superset/models/core.py:26–39), where `mixed.example.com` is controlled by an attacker
and resolves to both a public IP (e.g. 198.51.100.10) and a private IP (e.g. 10.0.0.5).

2. Trigger the OAuth2 code exchange by completing the OAuth2 authorization flow for that
database so the provider redirects back to `DatabaseRestApi.oauth2`
(superset/databases/api.py:1448–1487), which constructs and runs `OAuth2StoreTokenCommand`
(superset/commands/database/oauth2.py:36–48).

3. `OAuth2StoreTokenCommand.run` calls
`self._database.db_engine_spec.get_oauth2_token(oauth2_config, code, code_verifier)` at
line 74, which in `BaseEngineSpec.get_oauth2_token`
(superset/db_engine_specs/base.py:60–73) sets `uri = config["token_request_uri"]` and then
calls `cls._validate_oauth2_token_uri(uri)` before issuing `requests.post(uri, ...)`.

4. In `_validate_oauth2_token_uri` (superset/db_engine_specs/base.py:56–75), `hostname =
parsed.hostname` and `ip = ipaddress.ip_address(_socket.gethostbyname(hostname))` (line
828) validate only a single resolved IPv4 address. If the resolver returns the public IP
first, `ip.is_private` is False and validation passes; however, when `requests.post` opens
the connection, the underlying `socket.getaddrinfo` may choose or fall back to the private
IP (e.g. 10.0.0.5), allowing the actual HTTP request to be sent to a private/internal
address even though only one (public) DNS result was checked, demonstrating a DNS-based
SSRF bypass via multi-record hosts.

Fix in Cursor | Fix in VSCode Claude

(Use Cmd/Ctrl + Click for best experience)

Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** superset/db_engine_specs/base.py
**Line:** 828:828
**Comment:**
	*Security: The validation checks only one DNS result via `gethostbyname`, but hostnames can resolve to multiple IPs and the actual HTTP connection may use a different address than the one validated. This creates a DNS-based SSRF bypass where a hostname with mixed public/private records can pass validation yet still connect to an internal address.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix
👍 | 👎

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@orbisai0security can you address code review comments?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes — this review comment is valid.

gethostbyname() only checks a single resolved IPv4 address, so a hostname with multiple DNS records can still slip through if requests.post() connects to a different IP. That means the current validation does not fully prevent DNS-based SSRF.

Recommended fix

Validate all resolved addresses for the hostname and reject the URI if any of them are private / loopback / link-local / reserved.

A concise improvement would be to switch from gethostbyname() to getaddrinfo() and validate every returned IP:

import ipaddress
import socket as _socket
from urllib.parse import urlparse

parsed = urlparse(uri)
hostname = parsed.hostname or ""

addresses = {
    info[4][0]
    for info in _socket.getaddrinfo(hostname, None)
}

for addr in addresses:
    ip = ipaddress.ip_address(addr)
    if ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved:
        raise ValueError(
            "OAuth2 token URI must not point to a private or reserved network address"
        )

Why this is better

  • Covers multiple DNS results
  • Prevents the “one public IP passes, one private IP is used” bypass
  • Keeps the change small and localized

So yes, I’d recommend updating this validation before merging. If you want, I can also review the other comments on this PR and help apply the remaining minimal fixes too.

@bito-code-review
Copy link
Copy Markdown
Contributor

The flagged security issue is correct: the validation uses gethostbyname which checks only one IP address, but DNS can return multiple IPs, allowing SSRF bypass if a hostname has both public and private records. To resolve, modify the validation to check all resolved IP addresses using getaddrinfo for comprehensive coverage. Here's a concise fix for the _validate_oauth2_token_uri method in superset/db_engine_specs/base.py. Since there are no other comments in the PR, no further fixes are needed.

superset/db_engine_specs/base.py

@classmethod
    def _validate_oauth2_token_uri(cls, uri: str) -> None:
        """Validate OAuth2 token URI to prevent SSRF attacks."""
        import ipaddress
        import socket as _socket
        from urllib.parse import urlparse

        parsed = urlparse(uri)
        if parsed.scheme not in ("https", "http"):
            raise ValueError(
                f"OAuth2 token URI must use http or https scheme, got: {parsed.scheme!r}"
            )
        hostname = parsed.hostname or ""
        try:
            addrinfo = _socket.getaddrinfo(hostname, None)
            for family, type_, proto, canonname, sockaddr in addrinfo:
                ip_str = sockaddr[0]
                ip = ipaddress.ip_address(ip_str)
                if ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved:
                    raise ValueError(
                        "OAuth2 token URI must not point to a private or reserved network address"
                    )
        except _socket.gaierror:
            pass

try:
ip = ipaddress.ip_address(_socket.gethostbyname(hostname))
if ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved:
raise ValueError(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This impedes devs from doing OAuth2 db connection works. I wholly disagree with this validation.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review. I agree the current mitigation is incomplete as written: gethostbyname() only validates a single IPv4 result, while the latter requests.post() may resolve/connect differently. I’ll update this to validate all resolved addresses via getaddrinfo() and reject if any address is loopback/private/link-local/reserved/etc.

That said, I also agree that a hard global private-network block may break legitimate internal OAuth2 providers. I’ll look at making this policy configurable rather than unconditional, with secure defaults.

…Auth2 token URI

Address PR review comments: replace gethostbyname() with getaddrinfo() to
validate all resolved addresses, and make the validation configurable via
DATABASE_OAUTH2_TOKEN_URI_SSRF_VALIDATION and
DATABASE_OAUTH2_TOKEN_URI_ALLOWED_HOSTS for deployments with internal
OAuth2 providers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@pull-request-size pull-request-size Bot added size/M and removed size/S labels May 9, 2026
@netlify
Copy link
Copy Markdown

netlify Bot commented May 9, 2026

Deploy Preview for superset-docs-preview ready!

Name Link
🔨 Latest commit dd59dc5
🔍 Latest deploy log https://app.netlify.com/projects/superset-docs-preview/deploys/69ff07a9051aca000816857e
😎 Deploy Preview https://deploy-preview-39646--superset-docs-preview.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.
🤖 Make changes Run an agent on this branch

To edit notification comments on pull requests, go to your Netlify project configuration.

@bito-code-review
Copy link
Copy Markdown
Contributor

bito-code-review Bot commented May 9, 2026

Code Review Agent Run #cb548f

Actionable Suggestions - 0
Review Details
  • Files reviewed - 2 · Commit Range: c1f06b1..dd59dc5
    • superset/config.py
    • superset/db_engine_specs/base.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

Bito Usage Guide

Commands

Type the following command in the pull request comment and save the comment.

  • /review - Manually triggers a full AI review.

  • /pause - Pauses automatic reviews on this pull request.

  • /resume - Resumes automatic reviews.

  • /resolve - Marks all Bito-posted review comments as resolved.

  • /abort - Cancels all in-progress reviews.

Refer to the documentation for additional commands.

Configuration

This repository uses Superset You can customize the agent settings here or contact your Bito workspace admin at evan@preset.io.

Documentation & Help

AI Code Review powered by Bito Logo

@rusackas
Copy link
Copy Markdown
Member

Chiming in to echo some findings/status on this one.

  1. The core concern raised by @hainenber is valid. Blocking private network addresses unconditionally breaks any Superset deployment where the OAuth2 provider is internal (e.g., enterprise SSO, self-hosted Keycloak, etc.). This is a real usability regression for many legitimate deployments. The ALLOWED_HOSTS escape hatch partially mitigates this, but operators would need to know to configure it.
  2. DNS TOCTOU / rebinding risk. The bot reviewers correctly identified that even the second commit's use of getaddrinfo() doesn't fully close the race: the validation resolves DNS at configuration time, but requests.post() does its own resolution later. A DNS rebinding attack can still slip through. This is a hard problem to solve purely at the application layer without using a hardened HTTP client (like ssrf-guard, or a network-level control).
  3. No tests. There are no unit tests covering the validation logic or the new config flags.
  4. The fix scope is narrow. The PR title mentions multiple outbound HTTP requests in base.py and impala.py, but only base.py is modified. The Impala path remains unaddressed.

Let us know if/how you want to carry this forward from here, and if you think there's a real security risk here, please follow the security submission process on the GitHub Issue template rather than publicly surfacing anything in an Issue or publishing exploits in a PR description. Just email security@superset.apache.org, basically.

Maybe as a half-measure, we can set DATABASE_OAUTH2_TOKEN_URI_SSRF_VALIDATION to False for now (opt-in rather than opt-out) until the UX impact is better understood, or require a more explicit allowlist model.

@codecov
Copy link
Copy Markdown

codecov Bot commented May 12, 2026

Codecov Report

❌ Patch coverage is 26.92308% with 19 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.26%. Comparing base (eb2645a) to head (dd59dc5).
⚠️ Report is 230 commits behind head on master.

Files with missing lines Patch % Lines
superset/db_engine_specs/base.py 20.83% 19 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #39646      +/-   ##
==========================================
- Coverage   64.47%   63.26%   -1.21%     
==========================================
  Files        2566     2584      +18     
  Lines      133910   136615    +2705     
  Branches    31090    31459     +369     
==========================================
+ Hits        86333    86425      +92     
- Misses      46082    48674    +2592     
- Partials     1495     1516      +21     
Flag Coverage Δ
hive 39.36% <26.92%> (-0.41%) ⬇️
mysql 59.00% <26.92%> (-1.22%) ⬇️
postgres 59.07% <26.92%> (-1.22%) ⬇️
presto 41.05% <26.92%> (-0.49%) ⬇️
python 59.32% <26.92%> (-2.54%) ⬇️
sqlite 58.71% <26.92%> (-1.22%) ⬇️
unit ?

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@orbisai0security
Copy link
Copy Markdown
Author

Thanks, I agree the current patch is too blunt for enterprise OAuth2 setups where the token endpoint may legitimately be internal.

I’ll rework this as configurable hardening rather than an unconditional block. The revised approach will add a small OAuth2 token URI validation hook/config option, with documentation explaining that deployments allowing non-admin users to configure database OAuth2 clients can use it to restrict token endpoints to approved hosts or block private/link-local/loopback ranges.

I’ll also avoid claiming this as a universal HIGH severity issue. The risk depends on who can set oauth2_client_info / database OAuth2 settings, so I’ll reframe this as defence-in-depth for user-configurable database OAuth2 endpoints.

Cover all scenarios from code review: multi-IP DNS bypass, private/loopback
rejection, configurable validation toggle, and hostname allowlist. Also mock
socket.getaddrinfo in existing OAuth2 tests for deterministic behavior.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@pull-request-size pull-request-size Bot added size/L and removed size/M labels May 13, 2026
Copy link
Copy Markdown
Contributor

@bito-code-review bito-code-review Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review Agent Run #33dcf6

Actionable Suggestions - 1
  • tests/unit_tests/db_engine_specs/test_base.py - 1
Review Details
  • Files reviewed - 1 · Commit Range: dd59dc5..7ad237e
    • tests/unit_tests/db_engine_specs/test_base.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

Bito Usage Guide

Commands

Type the following command in the pull request comment and save the comment.

  • /review - Manually triggers a full AI review.

  • /pause - Pauses automatic reviews on this pull request.

  • /resume - Resumes automatic reviews.

  • /resolve - Marks all Bito-posted review comments as resolved.

  • /abort - Cancels all in-progress reviews.

Refer to the documentation for additional commands.

Configuration

This repository uses Superset You can customize the agent settings here or contact your Bito workspace admin at evan@preset.io.

Documentation & Help

AI Code Review powered by Bito Logo

Comment on lines +972 to +975
mocker.patch(
"superset.db_engine_specs.base.socket.getaddrinfo",
return_value=[(2, 1, 6, "", ("93.184.215.14", 0))],
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicated mock setup

The same socket.getaddrinfo mock is duplicated across 6 test functions (test_get_oauth2_token_without_pkce, test_get_oauth2_token_with_pkce, test_get_oauth2_token_additional_params, test_get_oauth2_fresh_token_success, test_get_oauth2_fresh_token_raises_on_auth_error, test_get_oauth2_fresh_token_raises_on_server_error). This creates maintenance risk if the mock needs to change. Consider using a pytest fixture like @pytest.fixture def mock_getaddrinfo(mocker): return mocker.patch("superset.db_engine_specs.base.socket.getaddrinfo", return_value=[(2, 1, 6, "", ("93.184.215.14", 0))]) and applying it to the relevant tests.

Code Review Run #33dcf6


Should Bito avoid suggestions like this for future reviews? (Manage Rules)

  • Yes, avoid them

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@orbisai0security can you address code review comments?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

change:backend Requires changing the backend size/L

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants