fix(charts): apply DISALLOWED_SQL_FUNCTIONS gate to adhoc expressions#40568
Conversation
…ion validation Adhoc column / metric expressions sent in a chart-data request (the `sqlExpression` attribute) are user-controlled SQL that ends up inside a `literal_column(...)` call. Today's `_process_sql_expression` runs `validate_adhoc_subquery` (rejects sub-queries when ALLOW_ADHOC_SUBQUERY is off) and `sanitize_clause` (rejects multi- statement / set-operation shapes), then returns. Neither step consults the operator's DISALLOWED_SQL_FUNCTIONS / DISALLOWED_SQL_TABLES configuration, so a user could submit an adhoc expression like `version()`, `pg_read_file(...)`, or `query_to_xml(...)` and have the function executed when the chart renders. This change applies the same engine-keyed denylists immediately after `sanitize_clause` in `_process_sql_expression`. The check fires at validation time (before the final SQL is rendered) and complements the equivalent gate at query execution time, giving the adhoc- expression path defense in depth. The block detects whether the caller already pre-wrapped the expression in `SELECT ...` (which `_process_select_expression` and `_process_orderby_expression` both do) and avoids the double-SELECT parse failure. Tests added in `tests/unit_tests/models/helpers_test.py`: - a denylisted function passed directly (`version()`) raises `SupersetDisallowedSQLFunctionException`. - a denylisted function wrapped in an aggregate (`MAX(version())`) is still rejected, since the wrapper does not hide the inner call. - a benign aggregate over a regular column (`SUM(amount)`) passes through the gate unchanged. - when both denylists are empty for the engine, the extra parse step is skipped and any `sanitize_clause`-accepted SQL is allowed. All 60 tests in the file pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #40568 +/- ##
==========================================
- Coverage 64.08% 64.06% -0.02%
==========================================
Files 2662 2662
Lines 143260 143264 +4
Branches 32943 32944 +1
==========================================
- Hits 91806 91781 -25
- Misses 49867 49894 +27
- Partials 1587 1589 +2
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Mirror the canonical pattern in superset/sql_lab.py:419-427 so the adhoc-expression validation gate reports only the tables actually found in the expression, not the operator's full denylist. Add a test that exercises the table-denylist branch with ALLOW_ADHOC_SUBQUERY enabled and asserts the narrowed payload.
Code Review Agent Run #caa6c9Actionable Suggestions - 0Review Details
Bito Usage GuideCommands Type the following command in the pull request comment and save the comment.
Refer to the documentation for additional commands. Configuration This repository uses Documentation & Help |
rusackas
left a comment
There was a problem hiding this comment.
LGTM. Mirrors the canonical _validate_query logic; the SELECT double-wrap heuristic is safe (bare select_count-style identifiers and real subqueries are handled correctly), it catches the MAX(version()) wrapper-bypass, and it skips when the denylists are empty.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
| if disallowed_functions and parsed.check_functions_present( | ||
| disallowed_functions | ||
| ): | ||
| raise SupersetDisallowedSQLFunctionException(disallowed_functions) | ||
| if disallowed_tables and parsed.check_tables_present(disallowed_tables): | ||
| # Report only the tables actually found in the expression, | ||
| # mirroring the canonical execution-time gate in | ||
| # `superset.sql_lab._validate_query` so the user-facing | ||
| # error doesn't echo the operator's full denylist. | ||
| present_tables = { | ||
| table.table.lower() | ||
| for statement in parsed.statements | ||
| for table in statement.tables | ||
| } | ||
| found_tables = { | ||
| table | ||
| for table in disallowed_tables | ||
| if table.lower() in present_tables | ||
| } | ||
| raise SupersetDisallowedSQLTableException( | ||
| found_tables or disallowed_tables | ||
| ) |
There was a problem hiding this comment.
Suggestion: Raising SupersetDisallowedSQLFunctionException/SupersetDisallowedSQLTableException directly from adhoc-expression validation changes this path from a query-object validation failure into a propagated SupersetErrorException with default HTTP status 500. In chart-data flows, validation errors are expected to be converted to QueryObjectValidationError (400) and handled as user input issues; this now surfaces as server errors instead of proper client validation responses. Wrap these denylist failures into QueryObjectValidationError (or use exception classes with a 4xx status) to preserve the existing API contract. [api mismatch]
Severity Level: Major ⚠️
- ❌ POST /api/v1/chart/data returns 500 for denylisted SQL.
- ⚠️ Dashboards with such adhoc metrics show server error pages.
- ⚠️ Breaks contract expecting 400 for invalid chart queries.Steps of Reproduction ✅
1. Configure engine-specific denylists in `app.config` (e.g. in SUPERSET_CONFIG) so
`DISALLOWED_SQL_FUNCTIONS['postgresql'] = {'version'}` or
`DISALLOWED_SQL_TABLES['postgresql']` is non-empty, which `_process_sql_expression` reads
at `superset/models/helpers.py:56-59`.
2. Issue a chart-data request to `POST /api/v1/chart/data` handled by
`ChartDataRestApi.data` in `superset/charts/data/api.py:34-123`, with a query context
whose `form_data` contains an adhoc metric or column of `expressionType='SQL'` and
`"sqlExpression": "version()"` (or a query referencing a denylisted table) on a PostgreSQL
dataset.
3. The request is deserialized into a `QueryContext` by `_create_query_context_from_form`
(`superset/charts/data/api.py:134-147`), wrapped in `ChartDataCommand`
(`superset/commands/chart/data/get_data_command.py:34-39`), and executed via
`command.run()` inside `_get_data_response` (`superset/charts/data/api.py:65-83`), which
calls `QueryContext.get_payload()` -> `QueryContextProcessor.get_payload()` and ultimately
`QueryContextProcessor.get_query_result()`
(`superset/common/query_context_processor.py:248-256`) to fetch data from the datasource.
4. As the datasource builds SQL for the adhoc metric/column it calls
`_process_select_expression` in `superset/models/helpers.py:1244-1276`, which delegates to
`_process_sql_expression` at `superset/models/helpers.py:26-93`; there, when
`parsed.check_functions_present(disallowed_functions)` or
`parsed.check_tables_present(disallowed_tables)` is true, it raises
`SupersetDisallowedSQLFunctionException` or `SupersetDisallowedSQLTableException` directly
at lines 1223 and 1239–1241. These subclasses of `SupersetErrorException` do not override
`status` and therefore default to HTTP 500 (see `SupersetException.status = 500` and the
definitions of these classes in `superset/exceptions.py:29-37` and
`superset/exceptions.py:9-36`). Since `ChartDataRestApi.data` only catches
`QueryObjectValidationError` and `ValidationError` (`superset/charts/data/api.py:85-98`),
the denylist exceptions escape as uncaught server errors instead of being converted into a
400-level `QueryObjectValidationError`, changing the API contract for invalid user SQL in
chart-data flows.Fix in Cursor | Fix in VSCode Claude
(Use Cmd/Ctrl + Click for best experience)
Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** superset/models/helpers.py
**Line:** 1220:1241
**Comment:**
*Api Mismatch: Raising `SupersetDisallowedSQLFunctionException`/`SupersetDisallowedSQLTableException` directly from adhoc-expression validation changes this path from a query-object validation failure into a propagated `SupersetErrorException` with default HTTP status 500. In chart-data flows, validation errors are expected to be converted to `QueryObjectValidationError` (400) and handled as user input issues; this now surfaces as server errors instead of proper client validation responses. Wrap these denylist failures into `QueryObjectValidationError` (or use exception classes with a 4xx status) to preserve the existing API contract.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix|
The flagged issue is correct. Raising To resolve this, you can catch these exceptions in if disallowed_functions and parsed.check_functions_present(
disallowed_functions
):
raise QueryObjectValidationError(f"Disallowed functions: {disallowed_functions}")
if disallowed_tables and parsed.check_tables_present(disallowed_tables):
# ... (table logic) ...
raise QueryObjectValidationError(f"Disallowed tables: {found_tables or disallowed_tables}")There are no other comments in this PR to address. superset/models/helpers.py |
Code Review Agent Run #12e386Actionable Suggestions - 0Review Details
Bito Usage GuideCommands Type the following command in the pull request comment and save the comment.
Refer to the documentation for additional commands. Configuration This repository uses Documentation & Help |
Summary
Adhoc column and metric expressions submitted in a chart-data request (the
sqlExpressionattribute) are user-controlled SQL that ends up inside aliteral_column(...)call. Today's_process_sql_expressioninsuperset/models/helpers.pyrunsvalidate_adhoc_subquery(which rejects sub-queries whenALLOW_ADHOC_SUBQUERYis off) andsanitize_clause(which rejects multi-statement / set-operation shapes), then returns. Neither step consults the operator's engine-keyedDISALLOWED_SQL_FUNCTIONS/DISALLOWED_SQL_TABLESconfiguration.That means a user able to submit an adhoc expression can call functions the operator has explicitly denylisted (
version(),pg_read_file(...),query_to_xml(...), etc.) and have them executed at chart-render time.This change applies the same denylists immediately after
sanitize_clausein_process_sql_expression. The gate fires at validation time, before the final SQL is rendered, and complements the equivalent gate at query execution.The block detects whether the caller pre-wrapped the expression with
SELECT ...(which_process_select_expressionand_process_orderby_expressionboth do) and avoids the double-SELECTparse failure.Tests
Added 4 cases in
tests/unit_tests/models/helpers_test.py:version()) raisesSupersetDisallowedSQLFunctionExceptionMAX(version())) is still rejectedSUM(amount)) passes through unchangedsanitize_clause-accepted SQL is allowedAll 60 tests in the file pass:
Test plan