Skip to content

fix(sql): broaden mutating-statement detection in SQL Lab parser#40421

Draft
sha174n wants to merge 5 commits into
apache:masterfrom
sha174n:fix/sql-coverage-20260525
Draft

fix(sql): broaden mutating-statement detection in SQL Lab parser#40421
sha174n wants to merge 5 commits into
apache:masterfrom
sha174n:fix/sql-coverage-20260525

Conversation

@sha174n
Copy link
Copy Markdown
Contributor

@sha174n sha174n commented May 25, 2026

Summary

Broadens read-only-mode coverage in the SQL Lab parser. is_mutating() and check_functions_present() in superset/sql/parse.py previously missed several construct shapes that share a meta cause: AST nodes that are not subclasses of the expected types, opaque exp.Command nodes for PostgreSQL constructs that sqlglot cannot fully structure, and function-name patterns that mutate server-side state without a wrapping DML node.

Scope of this change:

  1. PostgreSQL opaque exp.Command constructs. is_mutating() now classifies these as mutating alongside the existing DO and EXPLAIN ANALYZE handling: PREPARE, EXECUTE, CALL, COPY, GRANT, REVOKE, SET, REFRESH, REINDEX, VACUUM.
  2. Structured nodes added to the mutating-node tuple. Some dialects parse COPY / GRANT / REVOKE into exp.Copy / exp.Grant / exp.Revoke instead of exp.Command, so node-type matching is needed alongside the command-name check.
  3. PostgreSQL large-object writers classified as mutating by function name. lo_from_bytea, lo_export, lo_import, lo_put, lo_create, lowrite appear as plain function calls inside an exp.Select wrapper with no enclosing mutating AST node. is_mutating() now matches on function name. The full lo_* family (writers and readers) is also added to the PostgreSQL entry in DISALLOWED_SQL_FUNCTIONS for defense in depth.
  4. SELECT ... INTO new_table FROM ... (PostgreSQL CTAS variant). Parses as exp.Select with an into arg; now treated as mutating since it creates a new relation.
  5. MySQL @@<name> session parameters. check_functions_present() now walks exp.SessionParameter so that denylist entries like version or hostname match SELECT @@version / SELECT @@hostname. exp.SessionParameter is not a subclass of exp.Func, so the existing walk skipped it.

Test plan

  • pytest tests/unit_tests/sql/parse_tests.py (562 pass, no regressions)
  • pytest tests/unit_tests/sql/execution/ tests/unit_tests/sql_lab_test.py (117 pass, no regressions)
  • pre-commit run clean

New parametrized cases in tests/unit_tests/sql/parse_tests.py:

  • test_is_mutating_postgres_function_and_select_intolo_* writers, lo_* readers (negative, blocked separately via denylist), case-insensitivity, SELECT INTO.
  • test_is_mutating_postgres_command_constructsPREPARE/EXECUTE/CALL/COPY/GRANT/REVOKE/SET/REFRESH/REINDEX/VACUUM, with pre-existing DO and EXPLAIN ANALYZE controls included.
  • test_check_functions_present_session_parameter — MySQL @@version/@@hostname etc.; lo_* defense-in-depth via denylist.

is_mutating() and check_functions_present() in superset/sql/parse.py
previously missed several construct shapes that share a meta cause:
AST nodes that are not subclasses of the expected types, opaque
exp.Command nodes for PostgreSQL constructs that sqlglot cannot fully
structure, and function-name patterns that mutate server state without
a wrapping DML node.

This change addresses:

1. PostgreSQL opaque exp.Command constructs now classified as mutating
   alongside the existing DO and EXPLAIN ANALYZE handling: PREPARE,
   EXECUTE, CALL, COPY, GRANT, REVOKE, SET, REFRESH, REINDEX, VACUUM.

2. Structured nodes added to the mutating-node tuple: exp.Copy,
   exp.Grant, exp.Revoke. Some dialects parse these into structured
   nodes (e.g. COPY in PostgreSQL becomes exp.Copy, not exp.Command),
   so node-type matching is needed in addition to command-name
   matching.

3. PostgreSQL large-object writer functions classified as mutating by
   name: lo_from_bytea, lo_export, lo_import, lo_put, lo_create,
   lowrite. These appear as plain function calls inside an exp.Select
   wrapper and have no enclosing mutating AST node. The full lo_*
   family (writers and readers) is also added to the PostgreSQL entry
   in DISALLOWED_SQL_FUNCTIONS for defense in depth.

4. SELECT ... INTO new_table FROM ... parses as exp.Select with an
   `into` arg (PostgreSQL CTAS variant). It creates a new relation
   and is now treated as mutating.

5. check_functions_present() now visits exp.SessionParameter so that
   MySQL's @@<name> syntax matches denylist entries like `version`
   or `hostname`. exp.SessionParameter is not a subclass of exp.Func,
   so the existing exp.Func walk skipped it.

Tests
-----
- New: test_is_mutating_postgres_function_and_select_into covers the
  lo_* writers (positive), lo_* readers (negative, blocked separately
  via denylist), case-insensitivity, and SELECT INTO.
- New: test_is_mutating_postgres_command_constructs covers
  PREPARE/EXECUTE/CALL/COPY/GRANT/REVOKE/SET/REFRESH/REINDEX/VACUUM,
  with the pre-existing DO and EXPLAIN ANALYZE controls included.
- New: test_check_functions_present_session_parameter covers MySQL
  @@Version / @@hostname etc., plus lo_* function denylist matching.
- All 562 tests in tests/unit_tests/sql/parse_tests.py pass.
- All 117 tests in tests/unit_tests/sql/execution/ + sql_lab_test.py
  pass; no regressions.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@codecov
Copy link
Copy Markdown

codecov Bot commented May 25, 2026

Codecov Report

❌ Patch coverage is 55.55556% with 12 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.60%. Comparing base (b64561f) to head (ef69fcd).
⚠️ Report is 10 commits behind head on master.

Files with missing lines Patch % Lines
superset/sql/parse.py 55.55% 6 Missing and 6 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #40421      +/-   ##
==========================================
- Coverage   64.20%   63.60%   -0.61%     
==========================================
  Files        2592     2592              
  Lines      139213   139254      +41     
  Branches    32323    32337      +14     
==========================================
- Hits        89385    88566     -819     
- Misses      48293    49147     +854     
- Partials     1535     1541       +6     
Flag Coverage Δ
hive 39.22% <7.40%> (-0.03%) ⬇️
mysql 58.75% <51.85%> (-0.02%) ⬇️
postgres 58.83% <55.55%> (-0.02%) ⬇️
presto 40.91% <7.40%> (-0.03%) ⬇️
python 59.07% <55.55%> (-1.34%) ⬇️
sqlite 58.46% <14.81%> (-0.04%) ⬇️
unit ?

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

sha174n and others added 4 commits May 26, 2026 17:00
Follow-up to the initial broadening of is_mutating() and
check_functions_present(). This change closes the remaining bypass
surfaces that the SQL Lab read-only gate did not catch:

1. PostgreSQL information_schema views. Adds
   information_schema.tables, .columns, .schemata, .views,
   .routines, .role_table_grants, .role_column_grants,
   .role_routine_grants, .table_privileges, .column_privileges,
   .usage_privileges, .key_column_usage, .table_constraints,
   .referential_constraints, .view_table_usage, .applicable_roles,
   .enabled_roles to DISALLOWED_SQL_TABLES['postgresql'].

   These entries are schema-qualified, so they require the new
   schema-aware matching in check_tables_present (see below).
   Without this change, a user table happening to be named `tables`
   or `columns` would have been falsely blocked alongside the
   information_schema views.

2. check_tables_present schema-qualified matching.

   The previous matcher built `present = {t.table.lower() ...}`
   from parsed table references, stripping the schema. A denylist
   entry of `information_schema.tables` therefore could not match
   without also matching every user table named `tables`. The new
   matcher distinguishes bare entries (no dot in the denylist
   string) from schema-qualified entries (with a dot):

     - Bare entries match by table name regardless of schema
       (existing behavior preserved for `pg_stat_activity` etc.).
     - Schema-qualified entries match only when both schema and
       table match.

3. SHOW commands. Adds SHOW to _POSTGRES_MUTATING_COMMAND_NAMES so
   it is rejected by the allow_dml=False gate, mirroring the
   read-side blocks already in DISALLOWED_SQL_FUNCTIONS for
   pg_read_file, version(), current_setting, etc. SHOW reads
   server-configuration state (server version, hba_file, ssl,
   search_path) which has the same information-disclosure profile
   as those functions.

4. PostgreSQL setval() sequence mutator. Adds SETVAL to
   _MUTATING_FUNCTION_NAMES. setval() looks like a read but
   advances sequence state for every subsequent nextval caller.

Tests
-----
- New test_check_tables_present_schema_qualified covers bare vs
  schema-qualified matching, including the must-not-match case
  for user tables sharing a name with an information_schema view.
- test_is_mutating_postgres_function_and_select_into extended
  with setval positive and case-insensitivity cases.
- test_is_mutating_postgres_command_constructs extended with
  SHOW positive cases (search_path, all, server_version).
- Pre-existing test_has_mutation expectation for
  `SHOW search_path` flipped from False to True to reflect the
  intentional behavior change; `SET search_path TO public`
  remains non-mutating because it parses as exp.Set, not
  exp.Command.
- All 576 tests in tests/unit_tests/sql/parse_tests.py pass.
- All 117 tests in tests/unit_tests/sql/execution/ and
  sql_lab_test.py pass; no regressions.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The previous parametrization for test_check_tables_present_schema_qualified
exercised only PostgreSQL. The shipped DISALLOWED_SQL_TABLES entries
for MySQL and MSSQL are also schema-qualified (mysql.user,
performance_schema.threads, performance_schema.processlist,
sys.sql_logins, sys.server_principals, sys.configurations) and rely
on the same matcher. Without explicit dialect coverage a future
refactor of check_tables_present could silently regress one engine
while leaving the others working.

Adds parametrized cases for:

- mysql: mysql.user, performance_schema.threads,
  performance_schema.processlist (positive); mydb.user (negative,
  user-authored table sharing the leaf name must not be blocked).
- mssql: sys.sql_logins, sys.server_principals, sys.configurations
  (positive); mydb.sql_logins (negative).

All 17 parameter cases pass; no regressions in the broader suite
(584 tests in tests/unit_tests/sql/parse_tests.py).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Master moved the duckdb pin but never regenerated the pinned
requirements file, so every PR's check-python-deps job has been
failing on this single-line diff. Regen via the uv-pip-compile path
that CI uses (`uv pip compile pyproject.toml requirements/base.in`).
Coverage was at 99 (fail-under=100). Both concrete subclasses override
is_destructive(), so the base-class `raise NotImplementedError()` line
was never executed in tests. Added a direct test that calls the unbound
base method on a sentinel object, matching the pattern used elsewhere
in the codebase for abstract stubs.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant