
fix(dataset): validate datasource access during import #39998

Merged
dpgaspar merged 3 commits into apache:master from sha174n:fix/dataset-import-access-check
May 15, 2026

Conversation

@sha174n
Contributor

@sha174n sha174n commented May 10, 2026

SUMMARY

import_dataset() in the V1 importer had an ownership gate that prevented unauthorized overwrites of existing datasets, but lacked a call to security_manager.raise_for_access(datasource=dataset) after the dataset was created or updated.

Without this check, a user with can_write on the Dataset resource could import a YAML bundle that creates or overwrites a dataset pointing to a database/schema they do not have access to under the normal access rules.

This PR adds raise_for_access(datasource=dataset) after the dataset is persisted, mirroring the pattern used by UpdateDatasetCommand. The check is skipped when ignore_permissions=True (used by admin bulk-import flows).
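The control flow described above can be sketched as follows. This is a minimal illustration, not the Superset implementation: `Dataset`, `FakeSecurityManager`, `SecurityError`, and `import_dataset_sketch` are hypothetical stand-ins, and only the placement of the access check after the dataset is built and the `ignore_permissions` bypass are taken from the PR description.

```python
from dataclasses import dataclass, field


class SecurityError(Exception):
    """Hypothetical stand-in for the exception raised on denied access."""


@dataclass
class Dataset:
    """Hypothetical stand-in for the dataset model (only what the sketch needs)."""
    table_name: str
    schema: str


@dataclass
class FakeSecurityManager:
    """Hypothetical stand-in for security_manager: knows which schemas are allowed."""
    allowed_schemas: set = field(default_factory=set)

    def raise_for_access(self, datasource: Dataset) -> None:
        if datasource.schema not in self.allowed_schemas:
            raise SecurityError(f"no access to schema {datasource.schema!r}")


def import_dataset_sketch(config, security_manager, ignore_permissions=False):
    # Create (or update) the dataset from the imported config.
    dataset = Dataset(table_name=config["table_name"], schema=config["schema"])
    # Datasource-level check after the dataset exists; skipped for flows
    # that pass ignore_permissions=True.
    if not ignore_permissions:
        security_manager.raise_for_access(datasource=dataset)
    return dataset
```

With a manager that allows only `public`, importing a config for a `restricted` schema raises `SecurityError`, while the same call with `ignore_permissions=True` returns the dataset.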

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

N/A — backend-only change.

TESTING INSTRUCTIONS

  1. Run the unit tests:

    pytest tests/unit_tests/datasets/commands/importers/v1/import_test.py -v

    All 17 tests should pass, including the new test_import_dataset_access_check.

  2. Attempt to import a dataset YAML for a table in a schema the importing user does not have access to (e.g., a Gamma user importing a dataset on a restricted database) — should return a 422 error.
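The body of test_import_dataset_access_check is not shown in this thread. As a rough sketch of how such a test could be structured, the example below uses `unittest.mock.MagicMock` against a hypothetical stand-in function; `import_with_check` and `AccessDenied` are assumptions made for illustration and are not Superset names.

```python
from unittest.mock import MagicMock


class AccessDenied(Exception):
    """Hypothetical stand-in for the exception the real check raises."""


def import_with_check(dataset, security_manager, ignore_permissions=False):
    """Hypothetical stand-in for import_dataset(): only the access check."""
    if not ignore_permissions:
        security_manager.raise_for_access(datasource=dataset)
    return dataset


def test_access_check_denies():
    sm = MagicMock()
    # Make the mocked security manager reject every datasource.
    sm.raise_for_access.side_effect = AccessDenied("denied")
    try:
        import_with_check("ds", sm)
    except AccessDenied:
        denied = True
    else:
        denied = False
    assert denied
    sm.raise_for_access.assert_called_once_with(datasource="ds")


def test_access_check_skipped_when_ignoring_permissions():
    sm = MagicMock()
    sm.raise_for_access.side_effect = AccessDenied("denied")
    # With ignore_permissions=True the check must not be invoked at all.
    assert import_with_check("ds", sm, ignore_permissions=True) == "ds"
    sm.raise_for_access.assert_not_called()
```

Both functions follow pytest naming, so a file containing them can be collected by pytest or called directly.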

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration
  • Introduces new feature or API
  • Removes existing feature or API

Add a raise_for_access(datasource=dataset) call in import_dataset()
after the dataset is created or updated. This ensures the importing
user has the necessary datasource-level permissions in addition to the
existing can_write and ownership checks.

The check is skipped when ignore_permissions=True (used by admin
bulk-import workflows).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@bito-code-review
Contributor

bito-code-review Bot commented May 10, 2026

Code Review Agent Run #685e4b

Actionable Suggestions - 0
Review Details
  • Files reviewed - 2 · Commit Range: 8be0a5c..8be0a5c
    • superset/commands/dataset/importers/v1/utils.py
    • tests/unit_tests/datasets/commands/importers/v1/import_test.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

Bito Usage Guide

Commands

Type the following command in the pull request comment and save the comment.

  • /review - Manually triggers a full AI review.

  • /pause - Pauses automatic reviews on this pull request.

  • /resume - Resumes automatic reviews.

  • /resolve - Marks all Bito-posted review comments as resolved.

  • /abort - Cancels all in-progress reviews.

Refer to the documentation for additional commands.

Configuration

This repository uses Superset. You can customize the agent settings here or contact your Bito workspace admin at evan@preset.io.


@dosubot dosubot Bot added labels authentication:access-control (Related to access control), change:backend (Requires changing the backend), data:dataset (Related to dataset configurations) May 10, 2026
@codecov

codecov Bot commented May 10, 2026

Codecov Report

❌ Patch coverage is 57.14286% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.83%. Comparing base (f67dd4a) to head (77d257a).
⚠️ Report is 1 commit behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| superset/commands/dataset/importers/v1/utils.py | 57.14% | 2 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #39998      +/-   ##
==========================================
- Coverage   63.83%   63.83%   -0.01%     
==========================================
  Files        2589     2589              
  Lines      137821   137827       +6     
  Branches    31928    31929       +1     
==========================================
+ Hits        87978    87981       +3     
- Misses      48327    48329       +2     
- Partials     1516     1517       +1     
| Flag | Coverage Δ |
|---|---|
| hive | 39.36% <28.57%> (-0.01%) ⬇️ |
| mysql | 59.01% <57.14%> (-0.01%) ⬇️ |
| postgres | 59.09% <57.14%> (-0.01%) ⬇️ |
| presto | 41.05% <28.57%> (-0.01%) ⬇️ |
| python | 60.53% <57.14%> (-0.01%) ⬇️ |
| sqlite | 58.73% <57.14%> (-0.01%) ⬇️ |
| unit | 100.00% <ø> (ø) |
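The percentages in the report can be reproduced from the raw counts. The sketch below is a plain arithmetic check; `coverage_pct` is a hypothetical helper, and the patch size of 7 lines is an assumption inferred from the reported 57.14286% with 3 uncovered lines (2 missing and 1 partial), not a figure stated in the report.

```python
def coverage_pct(hits: int, total: int) -> float:
    """Return coverage as a percentage of covered lines."""
    return 100.0 * hits / total


# Assumption: 7 tracked patch lines, inferred from 57.14286% with 3 uncovered.
patch_total = 7
patch_uncovered = 3  # 2 missing + 1 partial
patch = coverage_pct(patch_total - patch_uncovered, patch_total)

# Hits and Lines taken from the coverage diff (master vs. this PR).
base = coverage_pct(87978, 137821)
head = coverage_pct(87981, 137827)

print(f"patch={patch:.5f}% base={base:.2f}% head={head:.2f}%")
```

Both project figures round to 63.83%, and the head value is marginally lower than the base, which is consistent with the small negative delta shown in the report.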


Comment on lines +179 to +182

    except SupersetSecurityException as ex:
        raise ImportFailedError(
            "User does not have access to the target dataset"
        ) from ex

Suggestion: This new access-denial path raises ImportFailedError, which is a 500-class command exception in this codebase. As a result, permission-denied imports are reported as internal server errors instead of client validation/authorization failures (the PR expectation says 422). Raise a 4xx-mapped exception type for this branch so unauthorized datasource imports return the correct HTTP status. [api mismatch]

Severity Level: Major ⚠️
- ❌ Dataset import endpoint returns 500 on permission-denied imports.
- ⚠️ Clients cannot distinguish auth failures from server crashes.
- ⚠️ API contract/docs advertising 4xx on import mismatched.
Steps of Reproduction ✅
1. Trigger the dataset import API `POST /api/v1/dataset/import/` implemented in
`superset/datasets/api.py:930-1032` by uploading a ZIP/YAML bundle that includes a dataset
whose `database_id` points to a database/schema the current user cannot access under
normal security rules.

2. In `DatasetRestApi.import_` (`superset/datasets/api.py:930-1032`), the request file is
parsed into `contents`, and an `ImportDatasetsCommand` is instantiated and executed at
`superset/datasets/api.py:1022-1032` via `command = ImportDatasetsCommand(contents, ...)`
followed by `command.run()`.

3. `ImportDatasetsCommand.run()` in
`superset/commands/dataset/importers/dispatcher.py:51-68` dispatches to
`v1.ImportDatasetsCommand`, whose `_import` method in
`superset/commands/dataset/importers/v1/__init__.py:36-43` iterates dataset configs and
calls `import_dataset(config, overwrite=overwrite)` from
`superset/commands/dataset/importers/v1/utils.py:106-112`.

4. Inside `import_dataset` in `superset/commands/dataset/importers/v1/utils.py:84-93`,
after the dataset is created/updated and flushed, the new block at lines 176-182 executes
`security_manager.raise_for_access(datasource=dataset)`; for a user without
datasource-level access this raises `SupersetSecurityException`, which is caught at line
179 and converted to `ImportFailedError` (lines 179-182). `ImportFailedError` is defined
in `superset/commands/exceptions.py:8-10` with `status = 500`, and the global
`CommandException` handler in `superset/views/error_handling.py:184-212` returns a JSON
error response with `status=ex.status`, i.e. HTTP 500, so the permission-denied dataset
import surfaces as a 500 Internal Server Error instead of a 4xx (e.g. 422)
client/authorization error.
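The status propagation described in step 4 can be illustrated with a small sketch. The classes and the handler below are simplified stand-ins written for this example, not the Superset source; the only details taken from the thread are that the exception class carries a `status` attribute, that `ImportFailedError` uses 500, that `ForbiddenError` maps to 403, and that the handler returns `status=ex.status`.

```python
from typing import Dict, Tuple


class CommandException(Exception):
    """Stand-in base class: each command exception carries an HTTP status."""
    status = 500


class ImportFailedError(CommandException):
    """Stand-in: generic import failure, reported as a server error."""
    status = 500


class ForbiddenError(CommandException):
    """Stand-in: authorization failure, reported as a client error."""
    status = 403


def handle_command_exception(ex: CommandException) -> Tuple[Dict[str, str], int]:
    """Mimic a global handler that returns (json_body, http_status)."""
    # The HTTP status comes straight from the exception class.
    return {"message": str(ex)}, ex.status
```

Passing an `ImportFailedError` through this handler yields status 500 regardless of the message, which is why wrapping a permission failure in that class hides the authorization cause from API clients.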


@bito-code-review
Contributor

The flagged issue is correct: the code catches SupersetSecurityException and raises ImportFailedError (status 500), but permission-denied imports should return a 4xx status like 422. To resolve, remove the try-except block and directly call security_manager.raise_for_access(datasource=dataset) when not ignoring permissions. This lets SupersetSecurityException propagate, which maps to a 4xx error. The added test in the PR will need updating to expect SupersetSecurityException instead of ImportFailedError. No other comments in the PR.

superset/commands/dataset/importers/v1/utils.py

    if not ignore_permissions:
        security_manager.raise_for_access(datasource=dataset)

…ailedError (500) on access check failure

Permission-denied imports should return 403, not 500. Use the existing
DatasetAccessDeniedError(ForbiddenError) which already maps to HTTP 403.
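The merged code is not shown in this thread, so the following is only a sketch of one way the commit message could be realized: translate the security exception into a 403-mapped error with `raise ... from`. All three classes are local stand-ins that reuse the names mentioned in the thread, and `check_access` is a hypothetical helper.

```python
class SupersetSecurityException(Exception):
    """Local stand-in for the exception raised by raise_for_access."""


class ForbiddenError(Exception):
    """Local stand-in for a command error mapped to HTTP 403."""
    status = 403


class DatasetAccessDeniedError(ForbiddenError):
    """Local stand-in: dataset-specific access denial, inherits status 403."""


def check_access(dataset, raise_for_access, ignore_permissions=False):
    """Hypothetical helper: run the access check and translate failures."""
    if ignore_permissions:
        return
    try:
        raise_for_access(datasource=dataset)
    except SupersetSecurityException as ex:
        # Keep the original exception as __cause__ for debugging,
        # but surface a 403-class error to the caller.
        raise DatasetAccessDeniedError() from ex
```

Chaining with `from ex` preserves the underlying security error in the traceback while the caller sees an exception whose `status` is 403.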

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@bito-code-review
Contributor

bito-code-review Bot commented May 10, 2026

Code Review Agent Run #e4df51

Actionable Suggestions - 0
Review Details
  • Files reviewed - 2 · Commit Range: 8be0a5c..e531877
    • superset/commands/dataset/importers/v1/utils.py
    • tests/unit_tests/datasets/commands/importers/v1/import_test.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful


Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@dpgaspar dpgaspar merged commit ee9eec2 into apache:master May 15, 2026
65 checks passed
sha174n added a commit to sha174n/superset that referenced this pull request May 15, 2026
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

Labels

authentication:access-control (Related to access control), change:backend (Requires changing the backend), data:dataset (Related to dataset configurations), size/M

3 participants