fix(sql): broaden mutating-statement detection in SQL Lab parser#40421
Draft
sha174n wants to merge 5 commits into
Draft
fix(sql): broaden mutating-statement detection in SQL Lab parser#40421sha174n wants to merge 5 commits into
sha174n wants to merge 5 commits into
Conversation
is_mutating() and check_functions_present() in superset/sql/parse.py previously missed several construct shapes that share a meta cause: AST nodes that are not subclasses of the expected types, opaque exp.Command nodes for PostgreSQL constructs that sqlglot cannot fully structure, and function-name patterns that mutate server state without a wrapping DML node. This change addresses: 1. PostgreSQL opaque exp.Command constructs now classified as mutating alongside the existing DO and EXPLAIN ANALYZE handling: PREPARE, EXECUTE, CALL, COPY, GRANT, REVOKE, SET, REFRESH, REINDEX, VACUUM. 2. Structured nodes added to the mutating-node tuple: exp.Copy, exp.Grant, exp.Revoke. Some dialects parse these into structured nodes (e.g. COPY in PostgreSQL becomes exp.Copy, not exp.Command), so node-type matching is needed in addition to command-name matching. 3. PostgreSQL large-object writer functions classified as mutating by name: lo_from_bytea, lo_export, lo_import, lo_put, lo_create, lowrite. These appear as plain function calls inside an exp.Select wrapper and have no enclosing mutating AST node. The full lo_* family (writers and readers) is also added to the PostgreSQL entry in DISALLOWED_SQL_FUNCTIONS for defense in depth. 4. SELECT ... INTO new_table FROM ... parses as exp.Select with an `into` arg (PostgreSQL CTAS variant). It creates a new relation and is now treated as mutating. 5. check_functions_present() now visits exp.SessionParameter so that MySQL's @@<name> syntax matches denylist entries like `version` or `hostname`. exp.SessionParameter is not a subclass of exp.Func, so the existing exp.Func walk skipped it. Tests ----- - New: test_is_mutating_postgres_function_and_select_into covers the lo_* writers (positive), lo_* readers (negative, blocked separately via denylist), case-insensitivity, and SELECT INTO. - New: test_is_mutating_postgres_command_constructs covers PREPARE/EXECUTE/CALL/COPY/GRANT/REVOKE/SET/REFRESH/REINDEX/VACUUM, with the pre-existing DO and EXPLAIN ANALYZE controls included. - New: test_check_functions_present_session_parameter covers MySQL @@Version / @@hostname etc., plus lo_* function denylist matching. - All 562 tests in tests/unit_tests/sql/parse_tests.py pass. - All 117 tests in tests/unit_tests/sql/execution/ + sql_lab_test.py pass; no regressions. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #40421 +/- ##
==========================================
- Coverage 64.20% 63.60% -0.61%
==========================================
Files 2592 2592
Lines 139213 139254 +41
Branches 32323 32337 +14
==========================================
- Hits 89385 88566 -819
- Misses 48293 49147 +854
- Partials 1535 1541 +6
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Follow-up to the initial broadening of is_mutating() and
check_functions_present(). This change closes the remaining bypass
surfaces that the SQL Lab read-only gate did not catch:
1. PostgreSQL information_schema views. Adds
information_schema.tables, .columns, .schemata, .views,
.routines, .role_table_grants, .role_column_grants,
.role_routine_grants, .table_privileges, .column_privileges,
.usage_privileges, .key_column_usage, .table_constraints,
.referential_constraints, .view_table_usage, .applicable_roles,
.enabled_roles to DISALLOWED_SQL_TABLES['postgresql'].
These entries are schema-qualified, so they require the new
schema-aware matching in check_tables_present (see below).
Without this change, a user table happening to be named `tables`
or `columns` would have been falsely blocked alongside the
information_schema views.
2. check_tables_present schema-qualified matching.
The previous matcher built `present = {t.table.lower() ...}`
from parsed table references, stripping the schema. A denylist
entry of `information_schema.tables` therefore could not match
without also matching every user table named `tables`. The new
matcher distinguishes bare entries (no dot in the denylist
string) from schema-qualified entries (with a dot):
- Bare entries match by table name regardless of schema
(existing behavior preserved for `pg_stat_activity` etc.).
- Schema-qualified entries match only when both schema and
table match.
3. SHOW commands. Adds SHOW to _POSTGRES_MUTATING_COMMAND_NAMES so
it is rejected by the allow_dml=False gate, mirroring the
read-side blocks already in DISALLOWED_SQL_FUNCTIONS for
pg_read_file, version(), current_setting, etc. SHOW reads
server-configuration state (server version, hba_file, ssl,
search_path) which has the same information-disclosure profile
as those functions.
4. PostgreSQL setval() sequence mutator. Adds SETVAL to
_MUTATING_FUNCTION_NAMES. setval() looks like a read but
advances sequence state for every subsequent nextval caller.
Tests
-----
- New test_check_tables_present_schema_qualified covers bare vs
schema-qualified matching, including the must-not-match case
for user tables sharing a name with an information_schema view.
- test_is_mutating_postgres_function_and_select_into extended
with setval positive and case-insensitivity cases.
- test_is_mutating_postgres_command_constructs extended with
SHOW positive cases (search_path, all, server_version).
- Pre-existing test_has_mutation expectation for
`SHOW search_path` flipped from False to True to reflect the
intentional behavior change; `SET search_path TO public`
remains non-mutating because it parses as exp.Set, not
exp.Command.
- All 576 tests in tests/unit_tests/sql/parse_tests.py pass.
- All 117 tests in tests/unit_tests/sql/execution/ and
sql_lab_test.py pass; no regressions.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The previous parametrization for test_check_tables_present_schema_qualified exercised only PostgreSQL. The shipped DISALLOWED_SQL_TABLES entries for MySQL and MSSQL are also schema-qualified (mysql.user, performance_schema.threads, performance_schema.processlist, sys.sql_logins, sys.server_principals, sys.configurations) and rely on the same matcher. Without explicit dialect coverage a future refactor of check_tables_present could silently regress one engine while leaving the others working. Adds parametrized cases for: - mysql: mysql.user, performance_schema.threads, performance_schema.processlist (positive); mydb.user (negative, user-authored table sharing the leaf name must not be blocked). - mssql: sys.sql_logins, sys.server_principals, sys.configurations (positive); mydb.sql_logins (negative). All 17 parameter cases pass; no regressions in the broader suite (584 tests in tests/unit_tests/sql/parse_tests.py). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Master moved the duckdb pin but never regenerated the pinned requirements file, so every PR's check-python-deps job has been failing on this single-line diff. Regen via the uv-pip-compile path that CI uses (`uv pip compile pyproject.toml requirements/base.in`).
Coverage was at 99 (fail-under=100). Both concrete subclasses override is_destructive(), so the base-class `raise NotImplementedError()` line was never executed in tests. Added a direct test that calls the unbound base method on a sentinel object, matching the pattern used elsewhere in the codebase for abstract stubs.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Broadens read-only-mode coverage in the SQL Lab parser.
is_mutating()andcheck_functions_present()insuperset/sql/parse.pypreviously missed several construct shapes that share a meta cause: AST nodes that are not subclasses of the expected types, opaqueexp.Commandnodes for PostgreSQL constructs that sqlglot cannot fully structure, and function-name patterns that mutate server-side state without a wrapping DML node.Scope of this change:
exp.Commandconstructs.is_mutating()now classifies these as mutating alongside the existingDOandEXPLAIN ANALYZEhandling:PREPARE,EXECUTE,CALL,COPY,GRANT,REVOKE,SET,REFRESH,REINDEX,VACUUM.COPY/GRANT/REVOKEintoexp.Copy/exp.Grant/exp.Revokeinstead ofexp.Command, so node-type matching is needed alongside the command-name check.lo_from_bytea,lo_export,lo_import,lo_put,lo_create,lowriteappear as plain function calls inside anexp.Selectwrapper with no enclosing mutating AST node.is_mutating()now matches on function name. The fulllo_*family (writers and readers) is also added to the PostgreSQL entry inDISALLOWED_SQL_FUNCTIONSfor defense in depth.SELECT ... INTO new_table FROM ...(PostgreSQL CTAS variant). Parses asexp.Selectwith anintoarg; now treated as mutating since it creates a new relation.@@<name>session parameters.check_functions_present()now walksexp.SessionParameterso that denylist entries likeversionorhostnamematchSELECT @@version/SELECT @@hostname.exp.SessionParameteris not a subclass ofexp.Func, so the existing walk skipped it.Test plan
pytest tests/unit_tests/sql/parse_tests.py(562 pass, no regressions)pytest tests/unit_tests/sql/execution/ tests/unit_tests/sql_lab_test.py(117 pass, no regressions)pre-commit runcleanNew parametrized cases in
tests/unit_tests/sql/parse_tests.py:test_is_mutating_postgres_function_and_select_into—lo_*writers,lo_*readers (negative, blocked separately via denylist), case-insensitivity,SELECT INTO.test_is_mutating_postgres_command_constructs—PREPARE/EXECUTE/CALL/COPY/GRANT/REVOKE/SET/REFRESH/REINDEX/VACUUM, with pre-existingDOandEXPLAIN ANALYZEcontrols included.test_check_functions_present_session_parameter— MySQL@@version/@@hostnameetc.;lo_*defense-in-depth via denylist.