fix(network): validate target hostname in outbound requests#39301
fix(network): validate target hostname in outbound requests#39301sha174n wants to merge 10 commits intoapache:masterfrom
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #39301 +/- ##
==========================================
- Coverage 64.45% 64.40% -0.05%
==========================================
Files 2555 2553 -2
Lines 132721 132600 -121
Branches 30802 30754 -48
==========================================
- Hits 85539 85404 -135
- Misses 45696 45710 +14
Partials 1486 1486
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
f5d39c1 to
669096a
Compare
- Add is_safe_host() utility in network.py for hostname validation - Update validate_data_uri() to validate resolved hostname after allowlist match - Add DATASET_IMPORT_ALLOW_INTERNAL_DATA_URLS config flag for on-premises deployments - Add unit tests for is_safe_host() and validate_data_uri() Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
669096a to
ec17725
Compare
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Code Review Agent Run #78c147
Actionable Suggestions - 1
-
tests/unit_tests/utils/test_network.py - 1
- Parametrize argument should be tuple not string · Line 24-28
Review Details
-
Files reviewed - 5 · Commit Range:
f5d39c1..f5d39c1- superset/commands/dataset/importers/v1/utils.py
- superset/config.py
- superset/utils/network.py
- tests/unit_tests/datasets/commands/importers/v1/import_test.py
- tests/unit_tests/utils/test_network.py
-
Files skipped - 0
-
Tools
- Whispers (Secret Scanner) - ✔︎ Successful
- Detect-secrets (Secret Scanner) - ✔︎ Successful
- MyPy (Static Code Analysis) - ✔︎ Successful
- Astral Ruff (Static Code Analysis) - ✔︎ Successful
Bito Usage Guide
Commands
Type the following command in the pull request comment and save the comment.
-
/review- Manually triggers a full AI review. -
/pause- Pauses automatic reviews on this pull request. -
/resume- Resumes automatic reviews. -
/resolve- Marks all Bito-posted review comments as resolved. -
/abort- Cancels all in-progress reviews.
Refer to the documentation for additional commands.
Configuration
This repository uses Superset You can customize the agent settings here or contact your Bito workspace admin at evan@preset.io.
Documentation & Help
…handling - Add 100.64.0.0/10 (RFC 6598 shared address space) to blocked ranges - Unwrap IPv4-mapped IPv6 addresses before network membership checks - Fix return type annotations on new test functions - Add tests for CGNAT range and IPv4-mapped IPv6 bypass cases Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…redirects - Add _ValidatingRedirectHandler to re-run URL validation on each redirect - Replace urlopen() with build_opener(_ValidatingRedirectHandler) in load_data() - Add test for redirect handler blocking disallowed redirect targets Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Code Review Agent Run #827327Actionable Suggestions - 0Review Details
Bito Usage GuideCommands Type the following command in the pull request comment and save the comment.
Refer to the documentation for additional commands. Configuration This repository uses Documentation & Help |
… change Update test mocks to match the request.build_opener() call pattern introduced in the previous commit, and apply ruff formatting fixes. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…onnectors and webhooks Extend host validation to database connector setup and webhook notification dispatch, using the same hostname check introduced for dataset import URL validation. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Code Review Agent Run #72b94cActionable Suggestions - 0Filtered by Review RulesBito filtered these suggestions based on rules created automatically for your feedback. Manage rules.
Review Details
Bito Usage GuideCommands Type the following command in the pull request comment and save the comment.
Refer to the documentation for additional commands. Configuration This repository uses Documentation & Help |
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Code Review Agent Run #9be3dcActionable Suggestions - 0Review Details
Bito Usage GuideCommands Type the following command in the pull request comment and save the comment.
Refer to the documentation for additional commands. Configuration This repository uses Documentation & Help |
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Code Review Agent Run #25279aActionable Suggestions - 0Review Details
Bito Usage GuideCommands Type the following command in the pull request comment and save the comment.
Refer to the documentation for additional commands. Configuration This repository uses Documentation & Help |
| headers: Any, | ||
| newurl: str, | ||
| ) -> request.Request | None: | ||
| validate_data_uri(newurl) |
There was a problem hiding this comment.
Suggestion: Add a docstring to the newly introduced method so all new function definitions are documented. [custom_rule]
Severity Level: Minor
| validate_data_uri(newurl) | |
| """Validate each redirect target before delegating to the parent handler.""" |
Why it matters? 🤔
The new method redirect_request is introduced without a docstring, which matches the stated documentation rule. The improved code adds a concise docstring without changing behavior, and the method remains syntactically valid and executable.
Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** superset/commands/dataset/importers/v1/utils.py
**Line:** 60:60
**Comment:**
*Custom Rule: Add a docstring to the newly introduced method so all new function definitions are documented.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.| :raises DatasetForbiddenDataURI: if the URI is not permitted | ||
| """ | ||
| if data_uri.startswith("file://"): | ||
| return |
There was a problem hiding this comment.
Suggestion: Allowing every file:// URI unconditionally lets dataset imports read arbitrary local files, bypassing the configured URL allowlist. Restrict local file access to the examples directory (or otherwise enforce an explicit allowlist check) before returning. [security]
Severity Level: Critical 🚨
- ❌ Dataset import API reads arbitrary local filesystem files.
- ❌ Attackers with import rights can exfiltrate sensitive host data.
- ⚠️ Violates expectation that file:// used only for examples.| return | |
| from pathlib import Path | |
| from superset.examples.helpers import get_examples_folder | |
| file_path = Path(urlparse(data_uri).path).resolve() | |
| examples_path = Path(get_examples_folder()).resolve() | |
| if file_path.is_relative_to(examples_path): | |
| return | |
| raise DatasetForbiddenDataURI() |
Steps of Reproduction ✅
1. As an authenticated user with dataset import permission, call the dataset import
endpoint implemented in `superset/datasets/api.py:1000-1048` by sending `POST
/api/v1/dataset/import/` with a ZIP file payload.
2. Inside the ZIP, include a dataset config file under `datasets/` (handled by
`ImportDatasetsCommand` in `superset/commands/dataset/importers/v1/__init__.py:32-72`)
whose YAML, after loading via `ImportV1DatasetSchema` in
`superset/datasets/schemas.py:56-106`, sets `data` to a local file URL such as
`file:///etc/passwd`.
3. When `ImportDatasetsCommand._import()` calls `import_dataset(config,
overwrite=overwrite)` (`superset/commands/dataset/importers/v1/__init__.py:65-72`),
`import_dataset()` in `superset/commands/dataset/importers/v1/utils.py:143-230` copies
`config`, extracts `data_uri = config.get("data")` at line 193, and, because the target
table does not yet exist, invokes `load_data(data_uri, dataset, dataset.database)` at line
224.
4. In `load_data()` (`superset/commands/dataset/importers/v1/utils.py:233-288`), the URL
is first passed through `normalize_example_data_url()` which returns `file://` URLs
unchanged (`superset/examples/helpers.py:93-115`), then `validate_data_uri()` is called at
line 245; `validate_data_uri()` immediately returns for any `file://` URI without
allowlist or host checks at lines 107-120, and `load_data()` then executes
`opener.open(data_uri)` at line 248 and `pd.read_csv(data, encoding="utf-8")` at line 251,
causing the contents of the arbitrary local file (e.g. `/etc/passwd`) to be read into a
dataset and exposed through Superset.Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** superset/commands/dataset/importers/v1/utils.py
**Line:** 120:120
**Comment:**
*Security: Allowing every `file://` URI unconditionally lets dataset imports read arbitrary local files, bypassing the configured URL allowlist. Restrict local file access to the examples directory (or otherwise enforce an explicit allowlist check) before returning.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.| # are checked against the IPv4 unsafe networks rather than bypassing. | ||
| if isinstance(ip, ipaddress.IPv6Address) and ip.ipv4_mapped: | ||
| ip = ip.ipv4_mapped | ||
| if any(ip in net for net in _SSRF_UNSAFE_NETWORKS): |
There was a problem hiding this comment.
Suggestion: The host safety check only blocks a hardcoded subset of non-public ranges, so other non-globally-routable addresses (for example benchmark/test/documentation ranges) are incorrectly treated as safe. Use ip.is_global so all non-routable targets are rejected consistently. [security]
Severity Level: Major ⚠️
- ❌ Dataset import can call hosts in documentation IP ranges.
- ❌ Connector host validation allows reserved or benchmark-only IP ranges.
- ⚠️ Webhook notifications may target non-global internal-only HTTP endpoints.
- ⚠️ SSRF guard weaker than `is_safe_host` documented behavior.| if any(ip in net for net in _SSRF_UNSAFE_NETWORKS): | |
| if not ip.is_global or any(ip in net for net in _SSRF_UNSAFE_NETWORKS): |
Steps of Reproduction ✅
1. Start Superset with this PR code and ensure dataset import is enabled. Confirm that
dataset imports use the v1 importer via the dispatcher in
`superset/commands/dataset/importers/dispatcher.py:32-38`, where `command_versions =
[v1.ImportDatasetsCommand, v0.ImportDatasetsCommand]`.
2. Configure dataset import to allow arbitrary HTTPS URLs but still enforce the "no
internal hosts" check: set `DATASET_IMPORT_ALLOWED_DATA_URLS = [r".*"]` and leave
`DATASET_IMPORT_ALLOW_INTERNAL_DATA_URLS = False` (default defined in
`superset/config.py:2283` and read in
`superset/commands/dataset/importers/v1/utils.py:122-135`).
3. Use the dataset import REST API implemented in `superset/datasets/api.py:1036-1047` by
POSTing a bundle to the import endpoint so that one dataset config contains a `data` field
pointing to `https://192.0.2.10/data.csv` (192.0.2.0/24 is an IANA documentation-only
range). The API constructs `ImportDatasetsCommand` (dispatcher), which instantiates
`v1.ImportDatasetsCommand` (`superset/commands/dataset/importers/v1/__init__.py:32-71`),
and that in turn calls `import_dataset(config, overwrite=overwrite)` with your config.
4. Inside `import_dataset` (`superset/commands/dataset/importers/v1/utils.py:143-230`),
when `config["data"]` is set, it eventually calls `load_data(data_uri, dataset,
dataset.database)` at line 224. `load_data` normalizes the URL then invokes
`validate_data_uri(data_uri)` at line 245, which resolves the hostname and calls
`is_safe_host(hostname)` at lines 136-138. For `hostname = "192.0.2.10"`,
`socket.getaddrinfo` returns that IP; `is_safe_host` (in
`superset/utils/network.py:43-69`) iterates the results and checks `if any(ip in net for
net in _SSRF_UNSAFE_NETWORKS): return False` (lines 67-68). Because 192.0.2.10 is not part
of any network in `_SSRF_UNSAFE_NETWORKS` (defined at lines 26-36), the condition never
triggers and the function returns `True`, treating this documentation-only,
non-globally-routable address as "safe". The dataset import proceeds to fetch from that
host, contradicting the docstring's promise to allow only "public, globally-routable IP
addresses". With the suggested change `if not ip.is_global or any(ip in net for net in
_SSRF_UNSAFE_NETWORKS): return False`, addresses like 192.0.2.10 (for which
`ipaddress.ip_address("192.0.2.10").is_global` is False) would be rejected and the import
would fail instead of reaching this non-global range.Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** superset/utils/network.py
**Line:** 67:67
**Comment:**
*Security: The host safety check only blocks a hardcoded subset of non-public ranges, so other non-globally-routable addresses (for example benchmark/test/documentation ranges) are incorrectly treated as safe. Use `ip.is_global` so all non-routable targets are rejected consistently.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
SUMMARY
Extends the hostname allowlist check introduced for dataset import URLs to cover additional outbound request paths in database connector setup and report webhook dispatch.
Changes:
superset/utils/network.py— addsis_safe_host()helper that resolves a hostname and returnsTrueonly when all resolved addresses fall outside loopback, private, link-local, and shared address rangessuperset/commands/dataset/importers/v1/utils.py— applies the host check when loading data from a configured URL; also validates redirect targets before following themsuperset/config.py— addsDATASET_IMPORT_ALLOW_INTERNAL_DATA_URLSflag (defaultFalse) for deployments where data URLs point to internal hostssuperset/db_engine_specs/couchbase.py,databend.py,databricks.py— replaces the existingis_hostname_valid()call withis_safe_host()invalidate_parameters()superset/db_engine_specs/impala.py— validates the database host before the cancel-query HTTP call incancel_query()superset/reports/notifications/webhook.py— validates the webhook target host before dispatching; extracts URL checks into_validate_webhook_url()to keepsend()within complexity limitsBEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
N/A — no UI changes
TESTING INSTRUCTIONS
All new behaviour is covered by unit tests. No integration environment required.
ADDITIONAL INFORMATION
is_hostname_valid()removed from three engine spec imports (replaced byis_safe_host())