Skip to content

fix(network): validate target hostname in outbound requests#39301

Open
sha174n wants to merge 10 commits intoapache:masterfrom
sha174n:fix/dataset-import-url-validation
Open

fix(network): validate target hostname in outbound requests#39301
sha174n wants to merge 10 commits intoapache:masterfrom
sha174n:fix/dataset-import-url-validation

Conversation

@sha174n
Copy link
Copy Markdown
Contributor

@sha174n sha174n commented Apr 12, 2026

SUMMARY

Extends the hostname allowlist check introduced for dataset import URLs to cover additional outbound request paths in database connector setup and report webhook dispatch.

Changes:

  • superset/utils/network.py — adds is_safe_host() helper that resolves a hostname and returns True only when all resolved addresses fall outside loopback, private, link-local, and shared address ranges
  • superset/commands/dataset/importers/v1/utils.py — applies the host check when loading data from a configured URL; also validates redirect targets before following them
  • superset/config.py — adds DATASET_IMPORT_ALLOW_INTERNAL_DATA_URLS flag (default False) for deployments where data URLs point to internal hosts
  • superset/db_engine_specs/couchbase.py, databend.py, databricks.py — replaces the existing is_hostname_valid() call with is_safe_host() in validate_parameters()
  • superset/db_engine_specs/impala.py — validates the database host before the cancel-query HTTP call in cancel_query()
  • superset/reports/notifications/webhook.py — validates the webhook target host before dispatching; extracts URL checks into _validate_webhook_url() to keep send() within complexity limits

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

N/A — no UI changes

TESTING INSTRUCTIONS

pytest tests/unit_tests/utils/test_network.py
pytest tests/unit_tests/datasets/commands/importers/v1/import_test.py

All new behaviour is covered by unit tests. No integration environment required.

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration
  • Introduces new feature or API
  • Removes existing feature or API — is_hostname_valid() removed from three engine spec imports (replaced by is_safe_host())

@dosubot dosubot bot added change:backend Requires changing the backend risk:breaking-change Issues or PRs that will introduce breaking changes labels Apr 12, 2026
Comment thread tests/unit_tests/datasets/commands/importers/v1/import_test.py Outdated
Comment thread tests/unit_tests/datasets/commands/importers/v1/import_test.py Outdated
@codecov
Copy link
Copy Markdown

codecov bot commented Apr 12, 2026

Codecov Report

❌ Patch coverage is 27.77778% with 39 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.40%. Comparing base (3e25f02) to head (5f87da9).

Files with missing lines Patch % Lines
superset/utils/network.py 15.78% 16 Missing ⚠️
superset/commands/dataset/importers/v1/utils.py 31.25% 11 Missing ⚠️
superset/reports/notifications/webhook.py 22.22% 7 Missing ⚠️
superset/db_engine_specs/impala.py 33.33% 2 Missing ⚠️
superset/db_engine_specs/couchbase.py 50.00% 1 Missing ⚠️
superset/db_engine_specs/databend.py 50.00% 1 Missing ⚠️
superset/db_engine_specs/databricks.py 50.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #39301      +/-   ##
==========================================
- Coverage   64.45%   64.40%   -0.05%     
==========================================
  Files        2555     2553       -2     
  Lines      132721   132600     -121     
  Branches    30802    30754      -48     
==========================================
- Hits        85539    85404     -135     
- Misses      45696    45710      +14     
  Partials     1486     1486              
Flag Coverage Δ
hive 39.95% <27.77%> (-0.01%) ⬇️
mysql 60.57% <27.77%> (-0.03%) ⬇️
postgres 60.66% <27.77%> (-0.03%) ⬇️
presto 41.74% <27.77%> (-0.02%) ⬇️
python 62.24% <27.77%> (-0.03%) ⬇️
sqlite 60.28% <27.77%> (-0.03%) ⬇️
unit 100.00% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Comment thread superset/utils/network.py
Comment thread superset/commands/dataset/importers/v1/utils.py
Comment thread superset/utils/network.py
Comment thread superset/utils/network.py
@sha174n sha174n force-pushed the fix/dataset-import-url-validation branch from f5d39c1 to 669096a Compare April 12, 2026 12:11
- Add is_safe_host() utility in network.py for hostname validation
- Update validate_data_uri() to validate resolved hostname after allowlist match
- Add DATASET_IMPORT_ALLOW_INTERNAL_DATA_URLS config flag for on-premises deployments
- Add unit tests for is_safe_host() and validate_data_uri()

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@sha174n sha174n force-pushed the fix/dataset-import-url-validation branch from 669096a to ec17725 Compare April 12, 2026 12:13
@sha174n sha174n changed the title fix(security): block SSRF in dataset import URL validation fix(dataset-import): validate hostname in import URL allowlist check Apr 12, 2026
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@bito-code-review bito-code-review bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review Agent Run #78c147

Actionable Suggestions - 1
  • tests/unit_tests/utils/test_network.py - 1
    • Parametrize argument should be tuple not string · Line 24-28
Review Details
  • Files reviewed - 5 · Commit Range: f5d39c1..f5d39c1
    • superset/commands/dataset/importers/v1/utils.py
    • superset/config.py
    • superset/utils/network.py
    • tests/unit_tests/datasets/commands/importers/v1/import_test.py
    • tests/unit_tests/utils/test_network.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

Bito Usage Guide

Commands

Type the following command in the pull request comment and save the comment.

  • /review - Manually triggers a full AI review.

  • /pause - Pauses automatic reviews on this pull request.

  • /resume - Resumes automatic reviews.

  • /resolve - Marks all Bito-posted review comments as resolved.

  • /abort - Cancels all in-progress reviews.

Refer to the documentation for additional commands.

Configuration

This repository uses Superset You can customize the agent settings here or contact your Bito workspace admin at evan@preset.io.

Documentation & Help

AI Code Review powered by Bito Logo

Comment thread tests/unit_tests/utils/test_network.py
sha174n and others added 2 commits April 12, 2026 13:20
…handling

- Add 100.64.0.0/10 (RFC 6598 shared address space) to blocked ranges
- Unwrap IPv4-mapped IPv6 addresses before network membership checks
- Fix return type annotations on new test functions
- Add tests for CGNAT range and IPv4-mapped IPv6 bypass cases

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…redirects

- Add _ValidatingRedirectHandler to re-run URL validation on each redirect
- Replace urlopen() with build_opener(_ValidatingRedirectHandler) in load_data()
- Add test for redirect handler blocking disallowed redirect targets

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@bito-code-review
Copy link
Copy Markdown
Contributor

bito-code-review bot commented Apr 12, 2026

Code Review Agent Run #827327

Actionable Suggestions - 0
Review Details
  • Files reviewed - 5 · Commit Range: ec17725..dfb304f
    • superset/commands/dataset/importers/v1/utils.py
    • superset/config.py
    • superset/utils/network.py
    • tests/unit_tests/datasets/commands/importers/v1/import_test.py
    • tests/unit_tests/utils/test_network.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

Bito Usage Guide

Commands

Type the following command in the pull request comment and save the comment.

  • /review - Manually triggers a full AI review.

  • /pause - Pauses automatic reviews on this pull request.

  • /resume - Resumes automatic reviews.

  • /resolve - Marks all Bito-posted review comments as resolved.

  • /abort - Cancels all in-progress reviews.

Refer to the documentation for additional commands.

Configuration

This repository uses Superset You can customize the agent settings here or contact your Bito workspace admin at evan@preset.io.

Documentation & Help

AI Code Review powered by Bito Logo

sha174n and others added 3 commits April 12, 2026 15:10
… change

Update test mocks to match the request.build_opener() call pattern
introduced in the previous commit, and apply ruff formatting fixes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…onnectors and webhooks

Extend host validation to database connector setup and webhook
notification dispatch, using the same hostname check introduced for
dataset import URL validation.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@sha174n sha174n changed the title fix(dataset-import): validate hostname in import URL allowlist check fix(network): validate target hostname in outbound requests Apr 12, 2026
@bito-code-review
Copy link
Copy Markdown
Contributor

bito-code-review bot commented Apr 12, 2026

Code Review Agent Run #72b94c

Actionable Suggestions - 0
Filtered by Review Rules

Bito filtered these suggestions based on rules created automatically for your feedback. Manage rules.

  • superset/db_engine_specs/databend.py - 1
Review Details
  • Files reviewed - 8 · Commit Range: dfb304f..8d120df
    • superset/commands/dataset/importers/v1/utils.py
    • tests/unit_tests/datasets/commands/importers/v1/import_test.py
    • tests/unit_tests/utils/test_network.py
    • superset/db_engine_specs/couchbase.py
    • superset/db_engine_specs/databend.py
    • superset/db_engine_specs/databricks.py
    • superset/db_engine_specs/impala.py
    • superset/reports/notifications/webhook.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

Bito Usage Guide

Commands

Type the following command in the pull request comment and save the comment.

  • /review - Manually triggers a full AI review.

  • /pause - Pauses automatic reviews on this pull request.

  • /resume - Resumes automatic reviews.

  • /resolve - Marks all Bito-posted review comments as resolved.

  • /abort - Cancels all in-progress reviews.

Refer to the documentation for additional commands.

Configuration

This repository uses Superset You can customize the agent settings here or contact your Bito workspace admin at evan@preset.io.

Documentation & Help

AI Code Review powered by Bito Logo

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@bito-code-review
Copy link
Copy Markdown
Contributor

bito-code-review bot commented Apr 12, 2026

Code Review Agent Run #9be3dc

Actionable Suggestions - 0
Review Details
  • Files reviewed - 1 · Commit Range: 8d120df..0c20ccf
    • tests/unit_tests/db_engine_specs/test_impala.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

Bito Usage Guide

Commands

Type the following command in the pull request comment and save the comment.

  • /review - Manually triggers a full AI review.

  • /pause - Pauses automatic reviews on this pull request.

  • /resume - Resumes automatic reviews.

  • /resolve - Marks all Bito-posted review comments as resolved.

  • /abort - Cancels all in-progress reviews.

Refer to the documentation for additional commands.

Configuration

This repository uses Superset You can customize the agent settings here or contact your Bito workspace admin at evan@preset.io.

Documentation & Help

AI Code Review powered by Bito Logo

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@bito-code-review
Copy link
Copy Markdown
Contributor

bito-code-review bot commented Apr 12, 2026

Code Review Agent Run #25279a

Actionable Suggestions - 0
Review Details
  • Files reviewed - 1 · Commit Range: 0c20ccf..de45255
    • tests/unit_tests/utils/test_network.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

Bito Usage Guide

Commands

Type the following command in the pull request comment and save the comment.

  • /review - Manually triggers a full AI review.

  • /pause - Pauses automatic reviews on this pull request.

  • /resume - Resumes automatic reviews.

  • /resolve - Marks all Bito-posted review comments as resolved.

  • /abort - Cancels all in-progress reviews.

Refer to the documentation for additional commands.

Configuration

This repository uses Superset You can customize the agent settings here or contact your Bito workspace admin at evan@preset.io.

Documentation & Help

AI Code Review powered by Bito Logo

headers: Any,
newurl: str,
) -> request.Request | None:
validate_data_uri(newurl)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: Add a docstring to the newly introduced method so all new function definitions are documented. [custom_rule]

Severity Level: Minor ⚠️

Suggested change
validate_data_uri(newurl)
"""Validate each redirect target before delegating to the parent handler."""
Why it matters? 🤔

The new method redirect_request is introduced without a docstring, which matches the stated documentation rule. The improved code adds a concise docstring without changing behavior, and the method remains syntactically valid and executable.

Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** superset/commands/dataset/importers/v1/utils.py
**Line:** 60:60
**Comment:**
	*Custom Rule: Add a docstring to the newly introduced method so all new function definitions are documented.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
👍 | 👎

:raises DatasetForbiddenDataURI: if the URI is not permitted
"""
if data_uri.startswith("file://"):
return
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: Allowing every file:// URI unconditionally lets dataset imports read arbitrary local files, bypassing the configured URL allowlist. Restrict local file access to the examples directory (or otherwise enforce an explicit allowlist check) before returning. [security]

Severity Level: Critical 🚨
- ❌ Dataset import API reads arbitrary local filesystem files.
- ❌ Attackers with import rights can exfiltrate sensitive host data.
- ⚠️ Violates expectation that file:// used only for examples.
Suggested change
return
from pathlib import Path
from superset.examples.helpers import get_examples_folder
file_path = Path(urlparse(data_uri).path).resolve()
examples_path = Path(get_examples_folder()).resolve()
if file_path.is_relative_to(examples_path):
return
raise DatasetForbiddenDataURI()
Steps of Reproduction ✅
1. As an authenticated user with dataset import permission, call the dataset import
endpoint implemented in `superset/datasets/api.py:1000-1048` by sending `POST
/api/v1/dataset/import/` with a ZIP file payload.

2. Inside the ZIP, include a dataset config file under `datasets/` (handled by
`ImportDatasetsCommand` in `superset/commands/dataset/importers/v1/__init__.py:32-72`)
whose YAML, after loading via `ImportV1DatasetSchema` in
`superset/datasets/schemas.py:56-106`, sets `data` to a local file URL such as
`file:///etc/passwd`.

3. When `ImportDatasetsCommand._import()` calls `import_dataset(config,
overwrite=overwrite)` (`superset/commands/dataset/importers/v1/__init__.py:65-72`),
`import_dataset()` in `superset/commands/dataset/importers/v1/utils.py:143-230` copies
`config`, extracts `data_uri = config.get("data")` at line 193, and, because the target
table does not yet exist, invokes `load_data(data_uri, dataset, dataset.database)` at line
224.

4. In `load_data()` (`superset/commands/dataset/importers/v1/utils.py:233-288`), the URL
is first passed through `normalize_example_data_url()` which returns `file://` URLs
unchanged (`superset/examples/helpers.py:93-115`), then `validate_data_uri()` is called at
line 245; `validate_data_uri()` immediately returns for any `file://` URI without
allowlist or host checks at lines 107-120, and `load_data()` then executes
`opener.open(data_uri)` at line 248 and `pd.read_csv(data, encoding="utf-8")` at line 251,
causing the contents of the arbitrary local file (e.g. `/etc/passwd`) to be read into a
dataset and exposed through Superset.
Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** superset/commands/dataset/importers/v1/utils.py
**Line:** 120:120
**Comment:**
	*Security: Allowing every `file://` URI unconditionally lets dataset imports read arbitrary local files, bypassing the configured URL allowlist. Restrict local file access to the examples directory (or otherwise enforce an explicit allowlist check) before returning.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
👍 | 👎

Comment thread superset/utils/network.py
# are checked against the IPv4 unsafe networks rather than bypassing.
if isinstance(ip, ipaddress.IPv6Address) and ip.ipv4_mapped:
ip = ip.ipv4_mapped
if any(ip in net for net in _SSRF_UNSAFE_NETWORKS):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: The host safety check only blocks a hardcoded subset of non-public ranges, so other non-globally-routable addresses (for example benchmark/test/documentation ranges) are incorrectly treated as safe. Use ip.is_global so all non-routable targets are rejected consistently. [security]

Severity Level: Major ⚠️
- ❌ Dataset import can call hosts in documentation IP ranges.
- ❌ Connector host validation allows reserved or benchmark-only IP ranges.
- ⚠️ Webhook notifications may target non-global internal-only HTTP endpoints.
- ⚠️ SSRF guard weaker than `is_safe_host` documented behavior.
Suggested change
if any(ip in net for net in _SSRF_UNSAFE_NETWORKS):
if not ip.is_global or any(ip in net for net in _SSRF_UNSAFE_NETWORKS):
Steps of Reproduction ✅
1. Start Superset with this PR code and ensure dataset import is enabled. Confirm that
dataset imports use the v1 importer via the dispatcher in
`superset/commands/dataset/importers/dispatcher.py:32-38`, where `command_versions =
[v1.ImportDatasetsCommand, v0.ImportDatasetsCommand]`.

2. Configure dataset import to allow arbitrary HTTPS URLs but still enforce the "no
internal hosts" check: set `DATASET_IMPORT_ALLOWED_DATA_URLS = [r".*"]` and leave
`DATASET_IMPORT_ALLOW_INTERNAL_DATA_URLS = False` (default defined in
`superset/config.py:2283` and read in
`superset/commands/dataset/importers/v1/utils.py:122-135`).

3. Use the dataset import REST API implemented in `superset/datasets/api.py:1036-1047` by
POSTing a bundle to the import endpoint so that one dataset config contains a `data` field
pointing to `https://192.0.2.10/data.csv` (192.0.2.0/24 is an IANA documentation-only
range). The API constructs `ImportDatasetsCommand` (dispatcher), which instantiates
`v1.ImportDatasetsCommand` (`superset/commands/dataset/importers/v1/__init__.py:32-71`),
and that in turn calls `import_dataset(config, overwrite=overwrite)` with your config.

4. Inside `import_dataset` (`superset/commands/dataset/importers/v1/utils.py:143-230`),
when `config["data"]` is set, it eventually calls `load_data(data_uri, dataset,
dataset.database)` at line 224. `load_data` normalizes the URL then invokes
`validate_data_uri(data_uri)` at line 245, which resolves the hostname and calls
`is_safe_host(hostname)` at lines 136-138. For `hostname = "192.0.2.10"`,
`socket.getaddrinfo` returns that IP; `is_safe_host` (in
`superset/utils/network.py:43-69`) iterates the results and checks `if any(ip in net for
net in _SSRF_UNSAFE_NETWORKS): return False` (lines 67-68). Because 192.0.2.10 is not part
of any network in `_SSRF_UNSAFE_NETWORKS` (defined at lines 26-36), the condition never
triggers and the function returns `True`, treating this documentation-only,
non-globally-routable address as "safe". The dataset import proceeds to fetch from that
host, contradicting the docstring's promise to allow only "public, globally-routable IP
addresses". With the suggested change `if not ip.is_global or any(ip in net for net in
_SSRF_UNSAFE_NETWORKS): return False`, addresses like 192.0.2.10 (for which
`ipaddress.ip_address("192.0.2.10").is_global` is False) would be rejected and the import
would fail instead of reaching this non-global range.
Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** superset/utils/network.py
**Line:** 67:67
**Comment:**
	*Security: The host safety check only blocks a hardcoded subset of non-public ranges, so other non-globally-routable addresses (for example benchmark/test/documentation ranges) are incorrectly treated as safe. Use `ip.is_global` so all non-routable targets are rejected consistently.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
👍 | 👎

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

change:backend Requires changing the backend risk:breaking-change Issues or PRs that will introduce breaking changes size/L

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant