fix: multiple outbound http requests in superset/db_... in base.py#39646
fix: multiple outbound http requests in superset/db_... in base.py#39646orbisai0security wants to merge 3 commits into
Conversation
Automated security fix generated by Orbis Security AI
Code Review Agent Run #99179eActionable Suggestions - 0Review Details
Bito Usage GuideCommands Type the following command in the pull request comment and save the comment.
Refer to the documentation for additional commands. Configuration This repository uses Documentation & Help |
| ) | ||
| hostname = parsed.hostname or "" | ||
| try: | ||
| ip = ipaddress.ip_address(_socket.gethostbyname(hostname)) |
There was a problem hiding this comment.
Suggestion: The validation checks only one DNS result via gethostbyname, but hostnames can resolve to multiple IPs and the actual HTTP connection may use a different address than the one validated. This creates a DNS-based SSRF bypass where a hostname with mixed public/private records can pass validation yet still connect to an internal address. [security]
Severity Level: Critical 🚨
- ⚠️ SSRF possible via mixed public/private DNS records.
- ⚠️ Database OAuth2 token exchange may reach internal services.Steps of Reproduction ✅
1. Configure a database OAuth2 client with a token endpoint whose hostname has mixed DNS
records by setting `encrypted_extra` → `oauth2_client_info.token_request_uri` to
`https://mixed.example.com/token` in `Database.get_oauth2_config`
(superset/models/core.py:26–39), where `mixed.example.com` is controlled by an attacker
and resolves to both a public IP (e.g. 198.51.100.10) and a private IP (e.g. 10.0.0.5).
2. Trigger the OAuth2 code exchange by completing the OAuth2 authorization flow for that
database so the provider redirects back to `DatabaseRestApi.oauth2`
(superset/databases/api.py:1448–1487), which constructs and runs `OAuth2StoreTokenCommand`
(superset/commands/database/oauth2.py:36–48).
3. `OAuth2StoreTokenCommand.run` calls
`self._database.db_engine_spec.get_oauth2_token(oauth2_config, code, code_verifier)` at
line 74, which in `BaseEngineSpec.get_oauth2_token`
(superset/db_engine_specs/base.py:60–73) sets `uri = config["token_request_uri"]` and then
calls `cls._validate_oauth2_token_uri(uri)` before issuing `requests.post(uri, ...)`.
4. In `_validate_oauth2_token_uri` (superset/db_engine_specs/base.py:56–75), `hostname =
parsed.hostname` and `ip = ipaddress.ip_address(_socket.gethostbyname(hostname))` (line
828) validate only a single resolved IPv4 address. If the resolver returns the public IP
first, `ip.is_private` is False and validation passes; however, when `requests.post` opens
the connection, the underlying `socket.getaddrinfo` may choose or fall back to the private
IP (e.g. 10.0.0.5), allowing the actual HTTP request to be sent to a private/internal
address even though only one (public) DNS result was checked, demonstrating a DNS-based
SSRF bypass via multi-record hosts.Fix in Cursor | Fix in VSCode Claude
(Use Cmd/Ctrl + Click for best experience)
Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** superset/db_engine_specs/base.py
**Line:** 828:828
**Comment:**
*Security: The validation checks only one DNS result via `gethostbyname`, but hostnames can resolve to multiple IPs and the actual HTTP connection may use a different address than the one validated. This creates a DNS-based SSRF bypass where a hostname with mixed public/private records can pass validation yet still connect to an internal address.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fixThere was a problem hiding this comment.
@orbisai0security can you address code review comments?
There was a problem hiding this comment.
Yes — this review comment is valid.
gethostbyname() only checks a single resolved IPv4 address, so a hostname with multiple DNS records can still slip through if requests.post() connects to a different IP. That means the current validation does not fully prevent DNS-based SSRF.
Recommended fix
Validate all resolved addresses for the hostname and reject the URI if any of them are private / loopback / link-local / reserved.
A concise improvement would be to switch from gethostbyname() to getaddrinfo() and validate every returned IP:
import ipaddress
import socket as _socket
from urllib.parse import urlparse
parsed = urlparse(uri)
hostname = parsed.hostname or ""
addresses = {
info[4][0]
for info in _socket.getaddrinfo(hostname, None)
}
for addr in addresses:
ip = ipaddress.ip_address(addr)
if ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved:
raise ValueError(
"OAuth2 token URI must not point to a private or reserved network address"
)Why this is better
- Covers multiple DNS results
- Prevents the “one public IP passes, one private IP is used” bypass
- Keeps the change small and localized
So yes, I’d recommend updating this validation before merging. If you want, I can also review the other comments on this PR and help apply the remaining minimal fixes too.
|
The flagged security issue is correct: the validation uses superset/db_engine_specs/base.py |
| try: | ||
| ip = ipaddress.ip_address(_socket.gethostbyname(hostname)) | ||
| if ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved: | ||
| raise ValueError( |
There was a problem hiding this comment.
This impedes devs from doing OAuth2 db connection works. I wholly disagree with this validation.
There was a problem hiding this comment.
Thanks for the review. I agree the current mitigation is incomplete as written: gethostbyname() only validates a single IPv4 result, while the latter requests.post() may resolve/connect differently. I’ll update this to validate all resolved addresses via getaddrinfo() and reject if any address is loopback/private/link-local/reserved/etc.
That said, I also agree that a hard global private-network block may break legitimate internal OAuth2 providers. I’ll look at making this policy configurable rather than unconditional, with secure defaults.
…Auth2 token URI Address PR review comments: replace gethostbyname() with getaddrinfo() to validate all resolved addresses, and make the validation configurable via DATABASE_OAUTH2_TOKEN_URI_SSRF_VALIDATION and DATABASE_OAUTH2_TOKEN_URI_ALLOWED_HOSTS for deployments with internal OAuth2 providers. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
✅ Deploy Preview for superset-docs-preview ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
Code Review Agent Run #cb548fActionable Suggestions - 0Review Details
Bito Usage GuideCommands Type the following command in the pull request comment and save the comment.
Refer to the documentation for additional commands. Configuration This repository uses Documentation & Help |
|
Chiming in to echo some findings/status on this one.
Let us know if/how you want to carry this forward from here, and if you think there's a real security risk here, please follow the security submission process on the GitHub Issue template rather than publicly surfacing anything in an Issue or publishing exploits in a PR description. Just email security@superset.apache.org, basically. Maybe as a half-measure, we can set |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #39646 +/- ##
==========================================
- Coverage 64.47% 63.26% -1.21%
==========================================
Files 2566 2584 +18
Lines 133910 136615 +2705
Branches 31090 31459 +369
==========================================
+ Hits 86333 86425 +92
- Misses 46082 48674 +2592
- Partials 1495 1516 +21
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
Thanks, I agree the current patch is too blunt for enterprise OAuth2 setups where the token endpoint may legitimately be internal. I’ll rework this as configurable hardening rather than an unconditional block. The revised approach will add a small OAuth2 token URI validation hook/config option, with documentation explaining that deployments allowing non-admin users to configure database OAuth2 clients can use it to restrict token endpoints to approved hosts or block private/link-local/loopback ranges. I’ll also avoid claiming this as a universal HIGH severity issue. The risk depends on who can set |
Cover all scenarios from code review: multi-IP DNS bypass, private/loopback rejection, configurable validation toggle, and hostname allowlist. Also mock socket.getaddrinfo in existing OAuth2 tests for deterministic behavior. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Code Review Agent Run #33dcf6
Actionable Suggestions - 1
-
tests/unit_tests/db_engine_specs/test_base.py - 1
- Duplicated mock setup · Line 972-975
Review Details
-
Files reviewed - 1 · Commit Range:
dd59dc5..7ad237e- tests/unit_tests/db_engine_specs/test_base.py
-
Files skipped - 0
-
Tools
- Whispers (Secret Scanner) - ✔︎ Successful
- Detect-secrets (Secret Scanner) - ✔︎ Successful
- MyPy (Static Code Analysis) - ✔︎ Successful
- Astral Ruff (Static Code Analysis) - ✔︎ Successful
Bito Usage Guide
Commands
Type the following command in the pull request comment and save the comment.
-
/review- Manually triggers a full AI review. -
/pause- Pauses automatic reviews on this pull request. -
/resume- Resumes automatic reviews. -
/resolve- Marks all Bito-posted review comments as resolved. -
/abort- Cancels all in-progress reviews.
Refer to the documentation for additional commands.
Configuration
This repository uses Superset You can customize the agent settings here or contact your Bito workspace admin at evan@preset.io.
Documentation & Help
| mocker.patch( | ||
| "superset.db_engine_specs.base.socket.getaddrinfo", | ||
| return_value=[(2, 1, 6, "", ("93.184.215.14", 0))], | ||
| ) |
There was a problem hiding this comment.
The same socket.getaddrinfo mock is duplicated across 6 test functions (test_get_oauth2_token_without_pkce, test_get_oauth2_token_with_pkce, test_get_oauth2_token_additional_params, test_get_oauth2_fresh_token_success, test_get_oauth2_fresh_token_raises_on_auth_error, test_get_oauth2_fresh_token_raises_on_server_error). This creates maintenance risk if the mock needs to change. Consider using a pytest fixture like @pytest.fixture def mock_getaddrinfo(mocker): return mocker.patch("superset.db_engine_specs.base.socket.getaddrinfo", return_value=[(2, 1, 6, "", ("93.184.215.14", 0))]) and applying it to the relevant tests.
Code Review Run #33dcf6
Should Bito avoid suggestions like this for future reviews? (Manage Rules)
- Yes, avoid them
There was a problem hiding this comment.
@orbisai0security can you address code review comments?
Summary
Fix high severity security issue in
superset/db_engine_specs/base.py.Vulnerability
V-003superset/db_engine_specs/base.py:806Description: Multiple outbound HTTP requests in superset/db_engine_specs/base.py (lines 806, 808, 831, 833) and superset/db_engine_specs/impala.py (line 213) use URLs that may be derived from user-supplied database connection parameters. No URL validation, IP allowlist, or private network blocking is evident. An authenticated user with database connection management permissions can supply internal network addresses as the database host, causing Superset to make HTTP requests to internal services on behalf of the attacker.
Changes
superset/db_engine_specs/base.pyVerification
Automated security fix by OrbisAI Security