Skip to content

fix(sql): cap parser input length via SQL_MAX_PARSE_LENGTH config#40499

Open
sha174n wants to merge 13 commits into
apache:masterfrom
sha174n:fix/improve-validation-1779971563
Open

fix(sql): cap parser input length via SQL_MAX_PARSE_LENGTH config#40499
sha174n wants to merge 13 commits into
apache:masterfrom
sha174n:fix/improve-validation-1779971563

Conversation

@sha174n
Copy link
Copy Markdown
Contributor

@sha174n sha174n commented May 28, 2026

SUMMARY

Adds a configurable upper bound on the size of SQL scripts accepted by the SQL parser. Scripts whose UTF-8 byte length exceeds SQL_MAX_PARSE_LENGTH (default 1,000,000 bytes) are rejected before being passed to sqlglot, which bounds parser memory and CPU usage. Set SQL_MAX_PARSE_LENGTH = None to disable.

The check sits at every call site in superset/sql/parse.py that hands a string to sqlglot.parse or sqlglot.parse_one:

  • SQLStatement._parse, which covers SQL Lab execute, format, RLS rewriting, dataset SQL, and engine spec helpers (including the MySQL-backtick fallback inside the same function)
  • SQLStatement.parse_predicate
  • the extract_tables_from_statement helper that builds a pseudo SELECT from a SQL Command literal
  • the standalone transpile_to_dialect entry point used by chart query rendering

Putting one helper at all four sites means a direct caller cannot bypass the bound. The cap is in UTF-8 bytes rather than code points, so multi-byte payloads cannot exceed the intended memory cap.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

N/A. Backend-only change.

TESTING INSTRUCTIONS

  • Default config: SQL scripts up to 1,000,000 UTF-8 bytes parse as before.
  • Scripts longer than the configured cap raise SupersetParseError before reaching sqlglot. Verified with a mocker.spy(sqlglot, "parse") that asserts zero parse calls for over-cap input.
  • Setting SQL_MAX_PARSE_LENGTH = None in superset_config.py disables the gate.
  • Existing parser unit tests pass unchanged.

New regression tests in tests/unit_tests/sql/parse_tests.py:

  • accept exactly at the cap (boundary)
  • reject one byte over the cap
  • reject when code-point count is under the cap but byte count is over
  • SQL_MAX_PARSE_LENGTH = None disables the gate
  • app-config value overrides the module fallback
  • SQLScript short-circuits sqlglot.parse on over-cap input (covers the MySQL-backtick double-parse path)
  • SQLStatement.parse_predicate is gated
  • transpile_to_dialect is gated

Run with:

pytest tests/unit_tests/sql/parse_tests.py -k "check_script_length or parse_predicate_length or transpile_to_dialect_length or sqlscript_gate"

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

Adds a configurable upper bound on the size of SQL scripts accepted by
the SQL parser. Scripts longer than SQL_MAX_PARSE_LENGTH (default
1,000,000 characters) are rejected before being passed to sqlglot. The
check sits in SQLStatement._parse, so it applies to every code path that
goes through SQLScript, including SQL Lab execute, format, RLS rewriting,
dataset SQL, and database engine spec helpers. Set SQL_MAX_PARSE_LENGTH
to None to disable.
@codecov
Copy link
Copy Markdown

codecov Bot commented May 28, 2026

Codecov Report

❌ Patch coverage is 38.88889% with 11 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.96%. Comparing base (acc8024) to head (8381d47).

Files with missing lines Patch % Lines
superset/sql/parse.py 35.29% 8 Missing and 3 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #40499      +/-   ##
==========================================
- Coverage   63.97%   63.96%   -0.02%     
==========================================
  Files        2654     2649       -5     
  Lines      142753   142431     -322     
  Branches    32833    32740      -93     
==========================================
- Hits        91325    91099     -226     
+ Misses      49870    49771      -99     
- Partials     1558     1561       +3     
Flag Coverage Δ
hive 39.72% <33.33%> (-0.01%) ⬇️
mysql 58.42% <38.88%> (-0.01%) ⬇️
postgres 58.49% <38.88%> (-0.01%) ⬇️
presto 41.33% <33.33%> (-0.01%) ⬇️
python 59.98% <38.88%> (-0.01%) ⬇️
sqlite 58.15% <38.88%> (-0.01%) ⬇️
unit 100.00% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Follow-up on the parse-length gate. Three blocking gaps:

1. The gate at the top of SQLStatement._parse missed three other
   call sites in the same module that hand strings directly to
   sqlglot.parse_one: SQLStatement.parse_predicate, the
   extract_tables_from_statement helper that builds a pseudo SELECT
   from an exp.Command literal, and the standalone
   transpile_to_dialect entry point. Any of these could be reached
   without going through SQLStatement._parse, so the previous
   single-site check was bypassable.

   Pulled the check out of SQLStatement and into a module-level
   helper, then called it from all four sqlglot.parse/parse_one
   sites so the bound cannot be bypassed by a direct caller.

2. The cap was in Unicode code points, not bytes. A 1M-codepoint
   string of four-byte characters is up to 4MB of payload that the
   parser still has to ingest. Switched to UTF-8 byte length so the
   bound directly reflects parser memory and CPU exposure.

3. The "current_app.config.get + except RuntimeError" pattern is
   not the codebase idiom for "config-with-fallback-outside-app".
   Replaced with `has_app_context()`, which matches the pattern
   already used in sql_lab.py, models/core.py, and others.

Tests added in tests/unit_tests/sql/parse_tests.py:
- accept exactly at the cap (boundary)
- reject one byte over the cap
- reject when codepoint count is under the cap but byte count is over
- SQL_MAX_PARSE_LENGTH=None disables the gate
- app-config value overrides the module fallback
- SQLScript short-circuits sqlglot.parse on over-cap input (spy asserts
  zero calls, covers the MySQL-backtick double-parse path)
- SQLStatement.parse_predicate is gated
- transpile_to_dialect is gated

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@pull-request-size pull-request-size Bot added size/L and removed size/M labels May 28, 2026
@sha174n sha174n marked this pull request as ready for review May 28, 2026 12:51
@dosubot dosubot Bot added change:backend Requires changing the backend sqllab Namespace | Anything related to the SQL Lab labels May 28, 2026
@bito-code-review
Copy link
Copy Markdown
Contributor

bito-code-review Bot commented May 28, 2026

Code Review Agent Run #3c3c61

Actionable Suggestions - 0
Filtered by Review Rules

Bito filtered these suggestions based on rules created automatically for your feedback. Manage rules.

  • tests/unit_tests/sql/parse_tests.py - 1
Review Details
  • Files reviewed - 3 · Commit Range: 5a6f1d5..b66c7a7
    • superset/config.py
    • superset/sql/parse.py
    • tests/unit_tests/sql/parse_tests.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

Bito Usage Guide

Commands

Type the following command in the pull request comment and save the comment.

  • /review - Manually triggers a full AI review.

  • /pause - Pauses automatic reviews on this pull request.

  • /resume - Resumes automatic reviews.

  • /resolve - Marks all Bito-posted review comments as resolved.

  • /abort - Cancels all in-progress reviews.

Refer to the documentation for additional commands.

Configuration

This repository uses Superset You can customize the agent settings here or contact your Bito workspace admin at evan@preset.io.

Documentation & Help

AI Code Review powered by Bito Logo

Comment thread superset/sql/parse.py Outdated
@bito-code-review
Copy link
Copy Markdown
Contributor

The provided pr_comments.csv file appears to be empty or incomplete, as it only contains a header row and no actual comment data. Without the content of the file, I cannot validate the correctness of the flagged issue or provide a resolution. Please ensure the file contains the expected review comments and try again.

sha174n added 2 commits May 29, 2026 01:16
… contract

Move the length gate inside the try/except wrappers in transpile_to_dialect
and extract_tables_from_statement so the new SupersetParseError no longer
escapes paths that previously normalised parse failures into
QueryClauseValidationException (or swallowed them with `return set()`).
Callers like transpile_virtual_dataset_sql only catch the former and would
otherwise lose their graceful fallback on over-cap input.
@bito-code-review
Copy link
Copy Markdown
Contributor

bito-code-review Bot commented May 29, 2026

Code Review Agent Run #0991ff

Actionable Suggestions - 0
Filtered by Review Rules

Bito filtered these suggestions based on rules created automatically for your feedback. Manage rules.

  • superset/sql/parse.py - 1
Review Details
  • Files reviewed - 2 · Commit Range: b66c7a7..cad8d0b
    • superset/sql/parse.py
    • tests/unit_tests/sql/parse_tests.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

Bito Usage Guide

Commands

Type the following command in the pull request comment and save the comment.

  • /review - Manually triggers a full AI review.

  • /pause - Pauses automatic reviews on this pull request.

  • /resume - Resumes automatic reviews.

  • /resolve - Marks all Bito-posted review comments as resolved.

  • /abort - Cancels all in-progress reviews.

Refer to the documentation for additional commands.

Configuration

This repository uses Superset You can customize the agent settings here or contact your Bito workspace admin at evan@preset.io.

Documentation & Help

AI Code Review powered by Bito Logo

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

change:backend Requires changing the backend size/L sqllab Namespace | Anything related to the SQL Lab

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant