fix(sql): cap parser input length via SQL_MAX_PARSE_LENGTH config#40499
sha174n wants to merge 13 commits into
Adds a configurable upper bound on the size of SQL scripts accepted by the SQL parser. Scripts longer than SQL_MAX_PARSE_LENGTH (default 1,000,000 characters) are rejected before being passed to sqlglot. The check sits in SQLStatement._parse, so it applies to every code path that goes through SQLScript, including SQL Lab execute, format, RLS rewriting, dataset SQL, and database engine spec helpers. Set SQL_MAX_PARSE_LENGTH to None to disable.
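Overriding or disabling the cap from `superset_config.py` might look like the sketch below. The 5,000,000 value is only an illustration. Per the later commits in this PR, the value is compared against the script's UTF-8 byte length, not its character count.

```python
# superset_config.py (sketch): override the SQL parser input cap.
# Illustrative value only; later commits in this PR measure it in UTF-8 bytes.
SQL_MAX_PARSE_LENGTH = 5_000_000

# To disable the gate entirely, use None instead:
# SQL_MAX_PARSE_LENGTH = None
```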
Codecov Report: ❌ Patch coverage is …
Additional details and impacted files

@@            Coverage Diff             @@
##           master   #40499      +/-   ##
==========================================
- Coverage   63.97%   63.96%   -0.02%
==========================================
  Files        2654     2649       -5
  Lines      142753   142431     -322
  Branches    32833    32740      -93
==========================================
- Hits        91325    91099     -226
+ Misses      49870    49771      -99
- Partials     1558     1561       +3
Flags with carried forward coverage won't be shown.
Follow-up on the parse-length gate. Three blocking gaps:

1. The gate at the top of SQLStatement._parse missed three other call sites in the same module that hand strings directly to sqlglot.parse_one: SQLStatement.parse_predicate, the extract_tables_from_statement helper that builds a pseudo SELECT from an exp.Command literal, and the standalone transpile_to_dialect entry point. Any of these could be reached without going through SQLStatement._parse, so the previous single-site check was bypassable. Pulled the check out of SQLStatement into a module-level helper, then called it from all four sqlglot.parse/parse_one sites so a direct caller cannot bypass the bound.
2. The cap was measured in Unicode code points, not bytes. A 1M-code-point string of four-byte characters is up to 4 MB of payload that the parser still has to ingest. Switched to UTF-8 byte length so the bound directly reflects parser memory and CPU exposure.
3. The "current_app.config.get + except RuntimeError" pattern is not the codebase idiom for "config with a fallback outside an app context". Replaced it with `has_app_context()`, which matches the pattern already used in sql_lab.py, models/core.py, and elsewhere.

Tests added in tests/unit_tests/sql/parse_tests.py:
- accept exactly at the cap (boundary)
- reject one byte over the cap
- reject when the code-point count is under the cap but the byte count is over
- SQL_MAX_PARSE_LENGTH=None disables the gate
- app-config value overrides the module fallback
- SQLScript short-circuits sqlglot.parse on over-cap input (spy asserts zero calls, covers the MySQL-backtick double-parse path)
- SQLStatement.parse_predicate is gated
- transpile_to_dialect is gated

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
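The module-level helper described in this commit could be sketched as follows. It assumes Flask's `current_app` and `has_app_context` for the config lookup. The names `resolve_max_parse_length`, `check_parse_length`, and `DEFAULT_SQL_MAX_PARSE_LENGTH` are hypothetical, and `SupersetParseError` is a local stand-in, so this is not the PR's actual code.

```python
from typing import Optional

try:
    from flask import current_app, has_app_context
except ImportError:  # lets the sketch run without Flask installed
    current_app = None

    def has_app_context() -> bool:
        return False


# Hypothetical module-level fallback, used when no Flask app context is active.
DEFAULT_SQL_MAX_PARSE_LENGTH: Optional[int] = 1_000_000


class SupersetParseError(Exception):
    """Local stand-in for Superset's parse error."""


def resolve_max_parse_length() -> Optional[int]:
    """Read SQL_MAX_PARSE_LENGTH from app config, else use the module fallback."""
    if has_app_context():
        return current_app.config.get(
            "SQL_MAX_PARSE_LENGTH", DEFAULT_SQL_MAX_PARSE_LENGTH
        )
    return DEFAULT_SQL_MAX_PARSE_LENGTH


def check_parse_length(sql: str, max_length: Optional[int]) -> None:
    """Raise before parsing if the script's UTF-8 size exceeds max_length.

    max_length=None disables the gate; input exactly at the cap is accepted.
    """
    if max_length is None:
        return
    size = len(sql.encode("utf-8"))  # bytes, not code points
    if size > max_length:
        raise SupersetParseError(
            f"SQL is {size} bytes, which exceeds SQL_MAX_PARSE_LENGTH ({max_length})"
        )
```

In this sketch, each of the four sites would call `check_parse_length(sql, resolve_max_parse_length())` immediately before handing `sql` to `sqlglot.parse` or `sqlglot.parse_one`. Keeping the config lookup separate from the pure size check makes the boundary cases easy to unit-test without an app context.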
Code Review Agent Run #3c3c61: Actionable Suggestions - 0. Filtered by Review Rules: Bito filtered these suggestions based on rules created automatically for your feedback.
The provided …
… contract

Move the length gate inside the try/except wrappers in transpile_to_dialect and extract_tables_from_statement so the new SupersetParseError no longer escapes paths that previously normalised parse failures into QueryClauseValidationException (or swallowed them with `return set()`). Callers like transpile_virtual_dataset_sql only catch QueryClauseValidationException and would otherwise lose their graceful fallback on over-cap input.
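Why the gate's placement matters can be sketched with local stand-ins: `ParseError` for sqlglot's parse error, plus `SupersetParseError` and `QueryClauseValidationException`. The functions `transpile_sketch`, `extract_tables_sketch`, `caller_with_fallback`, and `fake_parse` are hypothetical simplifications of `transpile_to_dialect`, `extract_tables_from_statement`, a caller like `transpile_virtual_dataset_sql`, and `sqlglot.parse_one`. Because the gate runs inside each `try`, an over-cap script takes the same exit as any other parse failure.

```python
from typing import Optional, Set


class ParseError(Exception):
    """Stand-in for sqlglot's parse error."""


class SupersetParseError(Exception):
    """Stand-in for the error raised by the length gate."""


class QueryClauseValidationException(Exception):
    """Stand-in for the exception that existing callers already catch."""


def check_parse_length(sql: str, max_length: Optional[int]) -> None:
    if max_length is not None and len(sql.encode("utf-8")) > max_length:
        raise SupersetParseError("SQL exceeds SQL_MAX_PARSE_LENGTH")


def fake_parse(sql: str) -> str:
    """Placeholder for sqlglot.parse_one: rejects blank input, else echoes it."""
    if not sql.strip():
        raise ParseError("empty statement")
    return sql.strip()


def transpile_sketch(sql: str, max_length: Optional[int] = 1_000_000) -> str:
    try:
        check_parse_length(sql, max_length)  # inside the try, not before it
        return fake_parse(sql)
    except (ParseError, SupersetParseError) as ex:
        # Both failure kinds are normalised to the exception callers expect.
        raise QueryClauseValidationException(str(ex)) from ex


def extract_tables_sketch(sql: str, max_length: Optional[int] = 1_000_000) -> Set[str]:
    try:
        check_parse_length(sql, max_length)
        tokens = fake_parse(sql).split()
    except (ParseError, SupersetParseError):
        return set()  # existing contract: unparseable input yields no tables
    # Toy extraction: the identifier following each FROM keyword.
    return {tokens[i + 1] for i, tok in enumerate(tokens[:-1]) if tok.upper() == "FROM"}


def caller_with_fallback(sql: str, max_length: Optional[int]) -> str:
    """Mimics a caller that only catches QueryClauseValidationException."""
    try:
        return transpile_sketch(sql, max_length)
    except QueryClauseValidationException:
        return sql  # graceful fallback: use the original SQL unchanged
```

If `check_parse_length` were called before the `try` instead, `caller_with_fallback` would see a raw `SupersetParseError` and crash instead of falling back.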
Code Review Agent Run #0991ff: Actionable Suggestions - 0. Filtered by Review Rules: Bito filtered these suggestions based on rules created automatically for your feedback.
SUMMARY
Adds a configurable upper bound on the size of SQL scripts accepted by the SQL parser. Scripts whose UTF-8 byte length exceeds `SQL_MAX_PARSE_LENGTH` (default 1,000,000 bytes) are rejected before being passed to sqlglot, which bounds parser memory and CPU usage. Set `SQL_MAX_PARSE_LENGTH = None` to disable.

The check sits at every call site in `superset/sql/parse.py` that hands a string to `sqlglot.parse` or `sqlglot.parse_one`:

- `SQLStatement._parse`, which covers SQL Lab execute, format, RLS rewriting, dataset SQL, and engine spec helpers (including the MySQL-backtick fallback inside the same function)
- `SQLStatement.parse_predicate`
- the `extract_tables_from_statement` helper that builds a pseudo SELECT from a SQL `Command` literal
- the `transpile_to_dialect` entry point used by chart query rendering

Putting one helper at all four sites means a direct caller cannot bypass the bound. The cap is measured in UTF-8 bytes rather than code points, so a payload of multi-byte characters cannot exceed the intended memory cap.
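To make the unit difference concrete, this sketch compares Python's `len()`, which counts code points, with the UTF-8 encoded length for a few characters and for a hypothetical payload of one million four-byte emoji. `utf8_size` is an illustrative helper, not part of Superset.

```python
def utf8_size(text: str) -> int:
    """Size of text in UTF-8 bytes, the unit the cap is compared against."""
    return len(text.encode("utf-8"))


for ch in ["a", "\u00e9", "\u20ac", "\U0001F600"]:
    # len() counts code points; utf8_size() counts encoded bytes.
    print(f"U+{ord(ch):04X}", len(ch), utf8_size(ch))
# U+0061 1 1
# U+00E9 1 2
# U+20AC 1 3
# U+1F600 1 4

CAP = 1_000_000
payload = "\U0001F600" * 1_000_000  # hypothetical over-cap payload
print(len(payload), utf8_size(payload))  # 1000000 4000000
# A code-point cap would accept this payload; a byte cap rejects it.
print(len(payload) <= CAP, utf8_size(payload) <= CAP)  # True False
```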
BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
N/A. Backend-only change.
TESTING INSTRUCTIONS
- … `SupersetParseError` before reaching sqlglot. Verified with a `mocker.spy(sqlglot, "parse")` that asserts zero parse calls for over-cap input.
- Setting `SQL_MAX_PARSE_LENGTH = None` in `superset_config.py` disables the gate.

New regression tests in `tests/unit_tests/sql/parse_tests.py`:

- `SQL_MAX_PARSE_LENGTH = None` disables the gate
- `SQLScript` short-circuits `sqlglot.parse` on over-cap input (covers the MySQL-backtick double-parse path)
- `SQLStatement.parse_predicate` is gated
- `transpile_to_dialect` is gated

Run with:
ADDITIONAL INFORMATION