Skip to content

fix(charts): apply DISALLOWED_SQL_FUNCTIONS gate to adhoc expressions#40568

Merged
sha174n merged 9 commits into
apache:masterfrom
sha174n:fix/sqlexpression-deny-list
Jun 3, 2026
Merged

fix(charts): apply DISALLOWED_SQL_FUNCTIONS gate to adhoc expressions#40568
sha174n merged 9 commits into
apache:masterfrom
sha174n:fix/sqlexpression-deny-list

Conversation

@sha174n
Copy link
Copy Markdown
Contributor

@sha174n sha174n commented May 31, 2026

Summary

Adhoc column and metric expressions submitted in a chart-data request (the sqlExpression attribute) are user-controlled SQL that ends up inside a literal_column(...) call. Today's _process_sql_expression in superset/models/helpers.py runs validate_adhoc_subquery (which rejects sub-queries when ALLOW_ADHOC_SUBQUERY is off) and sanitize_clause (which rejects multi-statement / set-operation shapes), then returns. Neither step consults the operator's engine-keyed DISALLOWED_SQL_FUNCTIONS / DISALLOWED_SQL_TABLES configuration.

That means a user able to submit an adhoc expression can call functions the operator has explicitly denylisted (version(), pg_read_file(...), query_to_xml(...), etc.) and have them executed at chart-render time.

This change applies the same denylists immediately after sanitize_clause in _process_sql_expression. The gate fires at validation time, before the final SQL is rendered, and complements the equivalent gate at query execution.

The block detects whether the caller pre-wrapped the expression with SELECT ... (which _process_select_expression and _process_orderby_expression both do) and avoids the double-SELECT parse failure.

Tests

Added 4 cases in tests/unit_tests/models/helpers_test.py:

  • a denylisted function passed directly (version()) raises SupersetDisallowedSQLFunctionException
  • a denylisted function wrapped in an aggregate (MAX(version())) is still rejected
  • a benign aggregate over a regular column (SUM(amount)) passes through unchanged
  • when both denylists are empty for the engine, the extra parse step is skipped and any sanitize_clause-accepted SQL is allowed

All 60 tests in the file pass:

$ pytest tests/unit_tests/models/helpers_test.py -q
............................................................   [100%]
60 passed in 1.88s

Test plan

  • Maintainer review
  • CI green (pytest, mypy, pre-commit)

…ion validation

Adhoc column / metric expressions sent in a chart-data request (the
`sqlExpression` attribute) are user-controlled SQL that ends up
inside a `literal_column(...)` call. Today's `_process_sql_expression`
runs `validate_adhoc_subquery` (rejects sub-queries when
ALLOW_ADHOC_SUBQUERY is off) and `sanitize_clause` (rejects multi-
statement / set-operation shapes), then returns. Neither step consults
the operator's DISALLOWED_SQL_FUNCTIONS / DISALLOWED_SQL_TABLES
configuration, so a user could submit an adhoc expression like
`version()`, `pg_read_file(...)`, or `query_to_xml(...)` and have the
function executed when the chart renders.

This change applies the same engine-keyed denylists immediately after
`sanitize_clause` in `_process_sql_expression`. The check fires at
validation time (before the final SQL is rendered) and complements
the equivalent gate at query execution time, giving the adhoc-
expression path defense in depth.

The block detects whether the caller already pre-wrapped the
expression in `SELECT ...` (which `_process_select_expression` and
`_process_orderby_expression` both do) and avoids the double-SELECT
parse failure.

Tests added in `tests/unit_tests/models/helpers_test.py`:

- a denylisted function passed directly (`version()`) raises
  `SupersetDisallowedSQLFunctionException`.
- a denylisted function wrapped in an aggregate (`MAX(version())`)
  is still rejected, since the wrapper does not hide the inner call.
- a benign aggregate over a regular column (`SUM(amount)`) passes
  through the gate unchanged.
- when both denylists are empty for the engine, the extra parse
  step is skipped and any `sanitize_clause`-accepted SQL is allowed.

All 60 tests in the file pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@codecov
Copy link
Copy Markdown

codecov Bot commented May 31, 2026

Codecov Report

❌ Patch coverage is 36.36364% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.06%. Comparing base (c54990c) to head (3318092).
⚠️ Report is 4 commits behind head on master.

Files with missing lines Patch % Lines
superset/models/helpers.py 36.36% 4 Missing and 3 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #40568      +/-   ##
==========================================
- Coverage   64.08%   64.06%   -0.02%     
==========================================
  Files        2662     2662              
  Lines      143260   143264       +4     
  Branches    32943    32944       +1     
==========================================
- Hits        91806    91781      -25     
- Misses      49867    49894      +27     
- Partials     1587     1589       +2     
Flag Coverage Δ
hive 39.79% <0.00%> (-0.01%) ⬇️
mysql 58.48% <36.36%> (+<0.01%) ⬆️
postgres 58.56% <36.36%> (+<0.01%) ⬆️
presto 41.39% <0.00%> (-0.01%) ⬇️
python 60.04% <36.36%> (+<0.01%) ⬆️
sqlite 58.18% <36.36%> (+<0.01%) ⬆️
unit 100.00% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

sha174n added 6 commits May 31, 2026 18:22
Mirror the canonical pattern in superset/sql_lab.py:419-427 so the
adhoc-expression validation gate reports only the tables actually
found in the expression, not the operator's full denylist. Add a test
that exercises the table-denylist branch with ALLOW_ADHOC_SUBQUERY
enabled and asserts the narrowed payload.
@sha174n sha174n marked this pull request as ready for review June 2, 2026 20:48
@bito-code-review
Copy link
Copy Markdown
Contributor

bito-code-review Bot commented Jun 2, 2026

Code Review Agent Run #caa6c9

Actionable Suggestions - 0
Review Details
  • Files reviewed - 2 · Commit Range: 2fcafc6..6ae5dd9
    • superset/models/helpers.py
    • tests/unit_tests/models/helpers_test.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

Bito Usage Guide

Commands

Type the following command in the pull request comment and save the comment.

  • /review - Manually triggers a full AI review.

  • /pause - Pauses automatic reviews on this pull request.

  • /resume - Resumes automatic reviews.

  • /resolve - Marks all Bito-posted review comments as resolved.

  • /abort - Cancels all in-progress reviews.

Refer to the documentation for additional commands.

Configuration

This repository uses Superset You can customize the agent settings here or contact your Bito workspace admin at evan@preset.io.

Documentation & Help

AI Code Review powered by Bito Logo

Copy link
Copy Markdown
Member

@rusackas rusackas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Mirrors the canonical _validate_query logic; the SELECT double-wrap heuristic is safe (bare select_count-style identifiers and real subqueries are handled correctly), it catches the MAX(version()) wrapper-bypass, and it skips when the denylists are empty.

Comment on lines +1220 to +1241
if disallowed_functions and parsed.check_functions_present(
disallowed_functions
):
raise SupersetDisallowedSQLFunctionException(disallowed_functions)
if disallowed_tables and parsed.check_tables_present(disallowed_tables):
# Report only the tables actually found in the expression,
# mirroring the canonical execution-time gate in
# `superset.sql_lab._validate_query` so the user-facing
# error doesn't echo the operator's full denylist.
present_tables = {
table.table.lower()
for statement in parsed.statements
for table in statement.tables
}
found_tables = {
table
for table in disallowed_tables
if table.lower() in present_tables
}
raise SupersetDisallowedSQLTableException(
found_tables or disallowed_tables
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: Raising SupersetDisallowedSQLFunctionException/SupersetDisallowedSQLTableException directly from adhoc-expression validation changes this path from a query-object validation failure into a propagated SupersetErrorException with default HTTP status 500. In chart-data flows, validation errors are expected to be converted to QueryObjectValidationError (400) and handled as user input issues; this now surfaces as server errors instead of proper client validation responses. Wrap these denylist failures into QueryObjectValidationError (or use exception classes with a 4xx status) to preserve the existing API contract. [api mismatch]

Severity Level: Major ⚠️
- ❌ POST /api/v1/chart/data returns 500 for denylisted SQL.
- ⚠️ Dashboards with such adhoc metrics show server error pages.
- ⚠️ Breaks contract expecting 400 for invalid chart queries.
Steps of Reproduction ✅
1. Configure engine-specific denylists in `app.config` (e.g. in SUPERSET_CONFIG) so
`DISALLOWED_SQL_FUNCTIONS['postgresql'] = {'version'}` or
`DISALLOWED_SQL_TABLES['postgresql']` is non-empty, which `_process_sql_expression` reads
at `superset/models/helpers.py:56-59`.

2. Issue a chart-data request to `POST /api/v1/chart/data` handled by
`ChartDataRestApi.data` in `superset/charts/data/api.py:34-123`, with a query context
whose `form_data` contains an adhoc metric or column of `expressionType='SQL'` and
`"sqlExpression": "version()"` (or a query referencing a denylisted table) on a PostgreSQL
dataset.

3. The request is deserialized into a `QueryContext` by `_create_query_context_from_form`
(`superset/charts/data/api.py:134-147`), wrapped in `ChartDataCommand`
(`superset/commands/chart/data/get_data_command.py:34-39`), and executed via
`command.run()` inside `_get_data_response` (`superset/charts/data/api.py:65-83`), which
calls `QueryContext.get_payload()` -> `QueryContextProcessor.get_payload()` and ultimately
`QueryContextProcessor.get_query_result()`
(`superset/common/query_context_processor.py:248-256`) to fetch data from the datasource.

4. As the datasource builds SQL for the adhoc metric/column it calls
`_process_select_expression` in `superset/models/helpers.py:1244-1276`, which delegates to
`_process_sql_expression` at `superset/models/helpers.py:26-93`; there, when
`parsed.check_functions_present(disallowed_functions)` or
`parsed.check_tables_present(disallowed_tables)` is true, it raises
`SupersetDisallowedSQLFunctionException` or `SupersetDisallowedSQLTableException` directly
at lines 1223 and 1239–1241. These subclasses of `SupersetErrorException` do not override
`status` and therefore default to HTTP 500 (see `SupersetException.status = 500` and the
definitions of these classes in `superset/exceptions.py:29-37` and
`superset/exceptions.py:9-36`). Since `ChartDataRestApi.data` only catches
`QueryObjectValidationError` and `ValidationError` (`superset/charts/data/api.py:85-98`),
the denylist exceptions escape as uncaught server errors instead of being converted into a
400-level `QueryObjectValidationError`, changing the API contract for invalid user SQL in
chart-data flows.

Fix in Cursor | Fix in VSCode Claude

(Use Cmd/Ctrl + Click for best experience)

Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** superset/models/helpers.py
**Line:** 1220:1241
**Comment:**
	*Api Mismatch: Raising `SupersetDisallowedSQLFunctionException`/`SupersetDisallowedSQLTableException` directly from adhoc-expression validation changes this path from a query-object validation failure into a propagated `SupersetErrorException` with default HTTP status 500. In chart-data flows, validation errors are expected to be converted to `QueryObjectValidationError` (400) and handled as user input issues; this now surfaces as server errors instead of proper client validation responses. Wrap these denylist failures into `QueryObjectValidationError` (or use exception classes with a 4xx status) to preserve the existing API contract.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix
👍 | 👎

@bito-code-review
Copy link
Copy Markdown
Contributor

The flagged issue is correct. Raising SupersetDisallowedSQLFunctionException or SupersetDisallowedSQLTableException directly causes an HTTP 500 error because these exceptions do not specify a 4xx status code. To preserve the API contract, these should be wrapped in QueryObjectValidationError (which maps to 400) or a similar exception that triggers a client-side error response.

To resolve this, you can catch these exceptions in _process_sql_expression and re-raise them as QueryObjectValidationError:

                if disallowed_functions and parsed.check_functions_present(
                    disallowed_functions
                ):
                    raise QueryObjectValidationError(f"Disallowed functions: {disallowed_functions}")
                if disallowed_tables and parsed.check_tables_present(disallowed_tables):
                    # ... (table logic) ...
                    raise QueryObjectValidationError(f"Disallowed tables: {found_tables or disallowed_tables}")

There are no other comments in this PR to address.

superset/models/helpers.py

if disallowed_functions and parsed.check_functions_present(
                    disallowed_functions
                ):
                    raise QueryObjectValidationError(f"Disallowed functions: {disallowed_functions}")
                if disallowed_tables and parsed.check_tables_present(disallowed_tables):
                    # ... (table logic) ...
                    raise QueryObjectValidationError(f"Disallowed tables: {found_tables or disallowed_tables}")

@bito-code-review
Copy link
Copy Markdown
Contributor

bito-code-review Bot commented Jun 3, 2026

Code Review Agent Run #12e386

Actionable Suggestions - 0
Review Details
  • Files reviewed - 1 · Commit Range: 6ae5dd9..3318092
    • superset/models/helpers.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

Bito Usage Guide

Commands

Type the following command in the pull request comment and save the comment.

  • /review - Manually triggers a full AI review.

  • /pause - Pauses automatic reviews on this pull request.

  • /resume - Resumes automatic reviews.

  • /resolve - Marks all Bito-posted review comments as resolved.

  • /abort - Cancels all in-progress reviews.

Refer to the documentation for additional commands.

Configuration

This repository uses Superset You can customize the agent settings here or contact your Bito workspace admin at evan@preset.io.

Documentation & Help

AI Code Review powered by Bito Logo

@sha174n sha174n merged commit 8e4a460 into apache:master Jun 3, 2026
59 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants