fix(sqllab): require dataset match for raw query access#40409
Draft
sha174n wants to merge 9 commits into
Draft
Conversation
In raise_for_access, the SQL Lab branch let catalog_access and schema_access short-circuit the per-table dataset-existence check. The chart-data branch right next door requires can_access_schema on a registered SqlaTable, so the SQL Lab path was strictly broader than the chart-data path for the same permission. Tighten the SQL Lab path so every referenced table must resolve to a Superset dataset the user holds datasource_access on (or owns). The table-only call sites (/table_metadata/, /select_star/, MetaDB) keep the historical catalog_access / schema_access fallthrough since they return schema metadata only, not row data. Two existing tests updated to mock query_datasources_by_name so they exercise the new dataset-match path; one regression test added. Backwards compatibility: deployments that used schema_access as a deliberate "full SQL Lab access to one schema" grant will start being denied. Mitigation: grant database_access, or register the relevant tables as Superset datasets. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #40409 +/- ##
==========================================
+ Coverage 64.20% 64.23% +0.03%
==========================================
Files 2592 2592
Lines 139226 139243 +17
Branches 32326 32332 +6
==========================================
+ Hits 89385 89439 +54
+ Misses 48306 48256 -50
- Partials 1535 1548 +13
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
…t_match)
Code review of the previous revision flagged that `query is not None`
was too broad as a discriminator: it caught chart-data on saved-query
datasources, dataset create/update from SQL, SQL Lab results/export,
and explore on saved queries, all of which were schema_access-by-design
flows. It also missed MetaDB, which uses the table-only call path but
actually returns row data.
Replace the inferred flag with an explicit kwarg
`force_dataset_match: bool = False` on `raise_for_access`. Default
(False) preserves the historical catalog_access / schema_access
fallthrough for chart-data, dataset CRUD, table_metadata, select_star,
explore, etc. The three call sites that actually execute or return raw
row data set it to True:
- superset/sqllab/validators.py (SQL Lab execute pre-flight)
- superset/models/sql_lab.py (Query.raise_for_access, used by
results.py / export.py /
streaming_export_command.py)
- superset/extensions/metadb.py (closes the same bypass on the
database+table path)
Tests:
- Reverts the broad updates to test_raise_for_access_query /
test_raise_for_access_sql so they exercise the default path again.
- Drops the vacuous regression test that never granted schema_access
in the first place.
- Adds three focused tests under force_dataset_match=True:
* allows when a dataset exists and datasource_access matches
* denies when only schema_access is held and no dataset exists
* (default path) still allows schema_access for non-dataset
tables, proving backwards compatibility.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Code review on the previous revision flagged two related ways an authenticated user could still defeat the new force_dataset_match gate. Both close the same authorization gap and are confirmed. 1. SqlaTable.query_datasources_by_name drops the schema filter when schema is None (see connectors/sqla/models.py:1938 `if schema:`). On a backend whose default_schema_for_query() returns None and a parser output that did not include a schema, the lookup matches SqlaTables across every schema. The engine then resolves the actual query against the wrong schema via search_path, while the dataset-match check passed against an unrelated row. Fix: under force_dataset_match=True, refuse tables whose qualified schema is None. Users must pick a schema in the SQL Lab schema picker or qualify the table explicitly. 2. sqlglot represents statements it cannot fully model as exp.Command. extract_tables_from_statement returns set() for an exp.Command with no parseable inner literal (e.g. CALL stored_proc() that builds dynamic SQL internally). The per-table check below then sees zero tables and the script proceeds, even though the procedure may read arbitrary tables at execution time. Fix: add SQLScript.has_unparseable_statement; under force_dataset_match=True, refuse any script that contains one. The default (chart-data / dataset CRUD / metadata previews) is unchanged. Two new regression tests cover both vectors. Existing tests for the non-strict path are untouched. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… + cover unparseable check - security/manager.py: apply default schema in the table branch of raise_for_access so MetaDB's 2-part URIs (db.table) keep working under force_dataset_match; the strict-deny still fires when no default can be resolved. - tests/integration_tests/security_tests.py: pin the new behavior. - tests/unit_tests/sql/parse_tests.py: cover has_unparseable_statement to reach 100% sql/ coverage (parameterized: SELECT, CALL, mixed, KQL). - tests/integration_tests/sqllab_tests.py: update test_sql_json_schema_access to reflect that schema_access alone no longer grants raw SQL Lab access (the legacy fallthrough this PR closes). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…' into fix/sqllab-dataset-scoped-access
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
SUMMARY
raise_for_accessinsuperset/security/manager.pypreviously had two related blind spots in its database/table/query branch:catalog_access/schema_accessas sufficient to authorise any table in that catalog/schema, even when no Superset dataset existed for the table. The chart-data branch right below requirescan_access_schema(datasource)on a registeredSqlaTable, so the SQL Lab path was strictly broader than the chart-data path for the same permission.superset/extensions/metadb.py) returns row data via thedatabase + tableform ofraise_for_accessand therefore inherited the sameschema_accessfallthrough.This PR introduces an explicit kwarg
force_dataset_match: bool = Falseonraise_for_access. When True, the historicalcatalog_access/schema_accessfallthroughs are skipped and every referenced table must resolve to a Superset dataset the user hasdatasource_accesson (or owns).The three call sites that actually execute or return raw row data pass
force_dataset_match=True:superset/sqllab/validators.py(SQL Lab execute pre-flight)superset/models/sql_lab.py::Query.raise_for_access(used by SQL Lab results, CSV export, and streaming export)superset/extensions/metadb.py(MetaDB autoload path)All other call sites keep the historical semantics unchanged. This preserves the documented
schema_accessbehaviour for chart-data, dataset CRUD from SQL,/table_metadata/,/select_star/, explore on saved queries, and the like.BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
N/A. Backend permission scoping, no UI surface.
TESTING INSTRUCTIONS
Three new tests under
force_dataset_match=True:test_raise_for_access_force_dataset_match_allows_dataset: a user withdatasource_accesson a registered SqlaTable can still query the table via the strict path.test_raise_for_access_force_dataset_match_denies_schema_only: schema_access alone no longer satisfies the strict path.test_raise_for_access_default_keeps_schema_access: the default (False) path still allows schema_access for non-dataset tables, preserving backwards compatibility for chart-data / dataset CRUD.ADDITIONAL INFORMATION
Backwards compatibility
Deployments that used
schema_accessas a deliberate "give Gamma full SQL Lab access to one schema" grant will start being denied at the SQL Lab execute path, results-fetch, CSV export, streaming export, and MetaDB autoload. Mitigation:database_access, ordatasource_accesscan be granted explicitly.All non-SQL-Lab call sites (chart-data, dataset CRUD, table_metadata, select_star, explore, viz, query_context) keep the historical behaviour and are unaffected.