feat(mcp): support custom SQL metrics in generate_chart and update_chart#40448
Conversation
Code Review Agent Run #280c25Actionable Suggestions - 0Additional Suggestions - 5
Review Details
Bito Usage GuideCommands Type the following command in the pull request comment and save the comment.
Refer to the documentation for additional commands. Configuration This repository uses Documentation & Help |
| # ``metrics`` is in the bulk exclusion list (SIMPLE-metric content is | ||
| # bounded). SQL-metric adhoc dicts carry LLM-controlled strings that | ||
| # still need ``<UNTRUSTED-CONTENT>`` wrapping. | ||
| form_data = payload.get("form_data") | ||
| metrics = form_data.get("metrics") if isinstance(form_data, dict) else None | ||
| if isinstance(metrics, list): | ||
| for index, metric in enumerate(metrics): | ||
| if isinstance(metric, dict) and metric.get("expressionType") == "SQL": | ||
| for key in ("sqlExpression", "label"): | ||
| if isinstance(metric.get(key), str): | ||
| metric[key] = sanitize_for_llm_context( | ||
| metric[key], | ||
| field_path=( | ||
| "form_data", | ||
| "metrics", | ||
| str(index), | ||
| key, | ||
| ), | ||
| ) | ||
|
|
There was a problem hiding this comment.
🟠 Architect Review — HIGH
SQL adhoc metrics in ChartInfo form_data under the singular "metric" key (used by pie and big-number charts) are excluded from sanitize_for_llm_context and are not covered by the new SQL-metric wrapper, so their sqlExpression/label strings are returned to the LLM without wrapping.
Suggestion: Extend the SQL-metric wrapping in sanitize_chart_info_for_llm_context to also cover form_data["metric"] (and any other singular metric containers) in addition to the existing form_data["metrics"] handling, so that sqlExpression and label are consistently wrapped before being exposed to the LLM.
Fix in Cursor | Fix in VSCode Claude
(Use Cmd/Ctrl + Click for best experience)
Prompt for AI Agent 🤖
This is an **Architect / Logical Review** comment left during a code review. These reviews are first-class, important findings — not optional suggestions. Do NOT dismiss this as a 'big architectural change' just because the title says architect review; most of these can be resolved with a small, localized fix once the intent is understood.
**Path:** superset/mcp_service/chart/schemas.py
**Line:** 448:467
**Comment:**
*HIGH: SQL adhoc metrics in ChartInfo form_data under the singular "metric" key (used by pie and big-number charts) are excluded from sanitize_for_llm_context and are not covered by the new SQL-metric wrapper, so their sqlExpression/label strings are returned to the LLM without <UNTRUSTED-CONTENT> wrapping.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
If a suggested approach is provided above, use it as the authoritative instruction. If no explicit code suggestion is given, you MUST still draft and apply your own minimal, localized fix — do not punt back with 'no suggestion provided, review manually'. Keep the change as small as possible: add a guard clause, gate on a loading state, reorder an await, wrap in a conditional, etc. Do not refactor surrounding code or expand scope beyond the finding.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix|
The PR comments file contains only the header row and no actual comment data. I cannot analyze or provide information about specific review comments or suggestions since there are no entries beyond the header. |
| raw = "SUM(amount)<script>alert(1)</script>" | ||
| cleaned = sanitize_sql_expression(raw, "sql_expression") | ||
| assert "<script" not in cleaned.lower() | ||
| assert "alert(1)" not in cleaned |
There was a problem hiding this comment.
Suggestion: This test asserts that alert(1) is removed from the sanitized SQL expression, but sanitize_sql_expression only strips HTML tags and does not remove safe text content inside those tags. With current sanitizer behavior, <script> tags are removed while alert(1) remains, so this assertion will fail even when sanitization is working as designed. Update the expectation to validate tag stripping only (or change sanitizer behavior explicitly if full script-content removal is actually required). [incorrect condition logic]
Severity Level: Major ⚠️
- ❌ MCP SQL expression sanitizer tests fail in CI.
- ⚠️ XSS test expectation misaligned with sanitizer design.Steps of Reproduction ✅
1. Open `tests/unit_tests/mcp_service/utils/test_sanitization.py` and locate
`test_sanitize_sql_expression_strips_script_tags` (around lines 88–95 in the current
file), where `raw = "SUM(amount)<script>alert(1)</script>"`, `cleaned =
sanitize_sql_expression(raw, "sql_expression")`, and the test asserts both `"<script" not
in cleaned.lower()` and `"alert(1)" not in cleaned` (lines 92–95 from the tool output).
2. Note that `_sanitize_sql()` in the same file imports `sanitize_sql_expression` from
`superset.mcp_service.utils.sanitization` (lines 3–7), so this test directly exercises the
production sanitizer implementation.
3. In `superset/mcp_service/utils/sanitization.py`, locate `sanitize_sql_expression` at
line 474 (from Grep output) and see that it calls `_strip_html_tags(value)` before other
checks (lines 49–52 of the snippet at offset 460). `_strip_html_tags` is defined at line
174 and uses `nh3.clean(decoded, tags=set(), url_schemes=set())` (lines 35–37 of that
snippet) to strip all HTML tags but preserve the inner text content.
4. Run `pytest
tests/unit_tests/mcp_service/utils/test_sanitization.py::test_sanitize_sql_expression_strips_script_tags`.
The call `sanitize_sql_expression("SUM(amount)<script>alert(1)</script>",
"sql_expression")` returns a string where `<script>`/`</script>` have been removed but
`alert(1)` remains (e.g. `SUM(amount)alert(1)`), so the first assertion about `<script`
passes, but the second assertion `assert "alert(1)" not in cleaned` fails, even though the
sanitizer is behaving as implemented (stripping tags only, not their text content).Fix in Cursor | Fix in VSCode Claude
(Use Cmd/Ctrl + Click for best experience)
Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** tests/unit_tests/mcp_service/utils/test_sanitization.py
**Line:** 931:934
**Comment:**
*Incorrect Condition Logic: This test asserts that `alert(1)` is removed from the sanitized SQL expression, but `sanitize_sql_expression` only strips HTML tags and does not remove safe text content inside those tags. With current sanitizer behavior, `<script>` tags are removed while `alert(1)` remains, so this assertion will fail even when sanitization is working as designed. Update the expectation to validate tag stripping only (or change sanitizer behavior explicitly if full script-content removal is actually required).
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #40448 +/- ##
==========================================
- Coverage 64.20% 64.13% -0.08%
==========================================
Files 2592 2592
Lines 139226 139423 +197
Branches 32326 32394 +68
==========================================
+ Hits 89385 89413 +28
- Misses 48306 48475 +169
Partials 1535 1535
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
aminghadersohi
left a comment
There was a problem hiding this comment.
Thanks for this PR! The overall approach is well-designed — sanitizer ordering (unicode stripping before keyword regexes), mutual exclusion validators, per-chart-type dimension guards, and the test suite depth are all solid.
One gap needs a fix: sanitize_chart_info_for_llm_context() correctly wraps form_data["metrics"] (plural list — used by XY/line/bar/table charts) but misses form_data["metric"] (singular dict — used by BigNumber and Pie charts). Both keys are in CHART_FORM_DATA_EXCLUDED_FIELD_NAMES, so neither is recursively sanitized, and only the plural case has the special SQL rewrap handler. A SQL metric on a BigNumber or Pie chart will return its sqlExpression to the LLM without <UNTRUSTED-CONTENT> wrapping (see inline comment).
A couple of smaller notes below as well.
| str(index), | ||
| key, | ||
| ), | ||
| ) |
There was a problem hiding this comment.
form_data["metric"] (singular dict, used by BigNumber and Pie charts) is not handled here. Both "metric" and "metrics" are in CHART_FORM_DATA_EXCLUDED_FIELD_NAMES and excluded from the recursive sanitize_for_llm_context() call above, but only the metrics plural list gets the special SQL rewrap. A SQL metric on a BigNumber or Pie chart returns its sqlExpression to the LLM unwrapped.
Suggested addition after this block:
metric_singular = form_data.get("metric") if isinstance(form_data, dict) else None
if isinstance(metric_singular, dict) and metric_singular.get("expressionType") == "SQL":
for key in ("sqlExpression", "label"):
if isinstance(metric_singular.get(key), str):
metric_singular[key] = sanitize_for_llm_context(
metric_singular[key],
field_path=("form_data", "metric", key),
)Also worth adding a test for BigNumber/Pie SQL metric LLM context wrapping analogous to TestSqlMetricLlmContextWrapping.
There was a problem hiding this comment.
Hey @aminghadersohi Good catch, thanks! I've fixed this by applying your suggested patch verbatim, plus a TestSqlMetricLlmContextWrapping.test_singular_sql_metric_is_wrapped that mirrors the existing plural-metrics test.
72c6368 to
a3285d7
Compare
| def _check_dangerous_url_scheme(value: str, field_name: str) -> None: | ||
| """Raise if ``value`` contains a ``javascript:`` / ``vbscript:`` / ``data:`` | ||
| URL scheme.""" | ||
| if _DANGEROUS_URL_SCHEME_RE.search(value): | ||
| raise ValueError(f"{field_name} contains potentially malicious URL scheme") |
There was a problem hiding this comment.
Suggestion: URL-scheme detection runs on the raw string without first stripping zero-width/control Unicode, so payloads like java\u200Bscript: can evade the regex and then be normalized into javascript: later in the sanitization pipeline. Normalize with _remove_dangerous_unicode before checking _DANGEROUS_URL_SCHEME_RE (or do the normalization inside _check_dangerous_url_scheme) so obfuscated schemes are blocked reliably. [security]
Severity Level: Critical 🚨
- ❌ Sanitized chart labels can contain `javascript:` URL schemes.
- ❌ Dangerous URL text may be echoed in MCP/LLM responses.
- ⚠️ Obfuscated schemes in filter values bypass URL checks.Steps of Reproduction ✅
1. Call the MCP chart creation tool `generate_chart` at
`superset/mcp_service/chart/tool/generate_chart.py:95` with a config whose metric label
contains an obfuscated URL scheme, e.g. `"label": "java\u200Bscript:alert(1)"` (zero-width
space between `a` and `s`).
2. The request JSON is validated into a `GenerateChartRequest` instance by
`SchemaValidator.validate_request()` at
`superset/mcp_service/chart/validation/schema_validator.py:41-56`, which triggers Pydantic
validators on `ColumnRef` fields in `superset/mcp_service/chart/schemas.py`.
3. During validation of the metric `label`, `ColumnRef.sanitize_label()` at
`superset/mcp_service/chart/schemas.py:811-815` calls `sanitize_user_input(v, "Label",
max_length=500, allow_empty=True)` from
`superset/mcp_service/utils/sanitization.py:327-332`, which first strips HTML tags via
`_strip_html_tags()` at `174-212` and then calls `_check_dangerous_patterns(value,
field_name)` at `233-252`.
4. `_check_dangerous_patterns()` calls `_check_dangerous_url_scheme(value, field_name)` at
`247`, which uses `_DANGEROUS_URL_SCHEME_RE = re.compile(r"\b(javascript|vbscript|data):",
re.IGNORECASE)` at `215` to scan the raw string; because the input is
`java\u200Bscript:alert(1)`, the regex at `218-222` does not match and no `ValueError` is
raised, after which `sanitize_user_input` removes the zero-width character via
`_remove_dangerous_unicode()` at `283-37` and its call site `389`, returning the
normalized string `javascript:alert(1)` as a "sanitized" label that can be echoed back in
chart metadata and MCP/LLM responses with the dangerous scheme now visible.Fix in Cursor | Fix in VSCode Claude
(Use Cmd/Ctrl + Click for best experience)
Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** superset/mcp_service/utils/sanitization.py
**Line:** 218:222
**Comment:**
*Security: URL-scheme detection runs on the raw string without first stripping zero-width/control Unicode, so payloads like `java\u200Bscript:` can evade the regex and then be normalized into `javascript:` later in the sanitization pipeline. Normalize with `_remove_dangerous_unicode` before checking `_DANGEROUS_URL_SCHEME_RE` (or do the normalization inside `_check_dangerous_url_scheme`) so obfuscated schemes are blocked reliably.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix| ], | ||
| error_code="MISSING_BIG_NUMBER_AGGREGATE", | ||
| ) | ||
| if metric.get("sql_expression") and not (metric.get("label") or "").strip(): |
There was a problem hiding this comment.
Suggestion: The new SQL-metric label check assumes label is always a string and calls .strip() directly. If a client sends a non-string label (for example a number), this raises AttributeError during pre-validation and can bubble into a 500 instead of returning a structured validation error. Guard the type first (or coerce safely) before calling string methods. [type error]
Severity Level: Major ⚠️
- ❌ SchemaValidator crashes on non-string SQL metric labels.
- ⚠️ Big number configs miss targeted label-required validation error.
- ⚠️ Pipeline may return generic system error instead of guidance.Steps of Reproduction ✅
1. Follow the existing pattern in
`tests/unit_tests/mcp_service/chart/test_big_number_chart.py:4-19`, but construct a raw
request dict with a non-string label, e.g.:
`request_data = {"dataset_id": 1, "config": {"chart_type": "big_number", "metric":
{"sql_expression": "SUM(a)/SUM(b)", "label": 123}}}`.
2. Call `SchemaValidator.validate_request(request_data)` as done in
`TestSchemaValidatorBigNumber.test_valid_big_number_request` at
`tests/unit_tests/mcp_service/chart/test_big_number_chart.py:4-15`; this invokes
`_pre_validate` in `superset/mcp_service/chart/validation/schema_validator.py:28-31`.
3. Inside `_pre_validate`, `chart_type` is read from `config` and
`_pre_validate_chart_type` is called (lines 110-120 in `schema_validator.py`), which
dispatches `"big_number"` configs to `_pre_validate_big_number_config(config)` at
`schema_validator.py:39`.
4. In `_pre_validate_big_number_config` (lines 1-71 in `schema_validator.py`),
`metric.get("sql_expression")` is truthy and `metric.get("label")` returns the integer
`123`, so the condition at line 58 / diff line 417 (`if metric.get("sql_expression") and
not (metric.get("label") or "").strip():`) evaluates `(123 or "")` and then calls
`123.strip()`, raising `AttributeError` instead of returning a `ChartGenerationError` with
error_code `"MISSING_SQL_METRIC_LABEL"`.Fix in Cursor | Fix in VSCode Claude
(Use Cmd/Ctrl + Click for best experience)
Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** superset/mcp_service/chart/validation/schema_validator.py
**Line:** 417:417
**Comment:**
*Type Error: The new SQL-metric label check assumes `label` is always a string and calls `.strip()` directly. If a client sends a non-string label (for example a number), this raises `AttributeError` during pre-validation and can bubble into a 500 instead of returning a structured validation error. Guard the type first (or coerce safely) before calling string methods.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fixThere was a problem hiding this comment.
The fix for form_data["metric"] (singular) LLM-context wrapping looks correct — the handler mirrors the plural-list logic exactly, and test_singular_sql_metric_is_wrapped verifies both sqlExpression and label are wrapped for BigNumber charts. ✓
On the two codeant-ai suggestions:
#1 (URL scheme ZWSP ordering in sanitize_string): Real ordering issue — _check_dangerous_patterns (which calls _check_dangerous_url_scheme) runs before _remove_dangerous_unicode in sanitize_string, so javascript: (zero-width space) could bypass the URL scheme regex. However, this ordering pre-dates this PR; the new sanitize_sql_expression correctly runs _remove_dangerous_unicode first. Worth fixing in a follow-up — not blocking this PR.
#2 (label type safety in schema_validator.py): New code from this PR, see inline comment for a one-line fix.
| ], | ||
| error_code="MISSING_BIG_NUMBER_AGGREGATE", | ||
| ) | ||
| if metric.get("sql_expression") and not (metric.get("label") or "").strip(): |
There was a problem hiding this comment.
(metric.get("label") or "").strip() raises AttributeError if a client sends "label": 123 (non-string). In schema_validator.py this check runs on raw dict input before Pydantic coercion, so non-string types can arrive here.
Suggested fix:
label = metric.get("label")
if metric.get("sql_expression") and not (isinstance(label, str) and label.strip()):There was a problem hiding this comment.
Good catch... Again! This has been fixed. The (metric.get("label") or "").strip() pattern looks fine when label is None/empty/string but raises AttributeError on any other truthy type (123, True, [], etc.), which a buggy or hostile client could absolutely send through pre-validation.
Applied your suggested isinstance(label, str) and label.strip() form verbatim, plus a regression test test_sql_expression_with_non_string_label_fails_cleanly that sends "label": 123 and asserts the validator returns MISSING_SQL_METRIC_LABEL cleanly.
a3285d7 to
c18a6c3
Compare
aminghadersohi
left a comment
There was a problem hiding this comment.
Both issues addressed:
form_data["metric"](singular) handler: confirmed intact, test covers BigNumber with injection strings for bothsqlExpressionandlabel. ✓- Label type guard in
schema_validator.py: theisinstance(label, str) and label.strip()fix is in place with a dedicatedtest_sql_expression_with_non_string_label_fails_cleanlytest. ✓
The pre-existing sanitize_string URL-scheme / unicode ordering issue is noted but out of scope for this PR.
Nice work on this one.
c18a6c3 to
b5c91e9
Compare
b5c91e9 to
2227898
Compare
Code Review Agent Run #8de722Actionable Suggestions - 0Additional Suggestions - 6
Filtered by Review RulesBito filtered these suggestions based on rules created automatically for your feedback. Manage rules.
Review Details
Bito Usage GuideCommands Type the following command in the pull request comment and save the comment.
Refer to the documentation for additional commands. Configuration This repository uses Documentation & Help |
SUMMARY
The MCP
generate_chartandupdate_charttools only accepted column-based metrics (SUM(amount),COUNT(*), or a named saved metric). Anything else, ratios, conditional aggregations,ABSwrappers, unit conversions, required the LLM to ask the user to follow up in the Explore UI.This PR adds an optional
sql_expressionfield to the sharedColumnRefmetric model so the LLM can express the SQL adhoc metric directly:{ "y": [{ "sql_expression": "COUNT(CASE WHEN closed_won THEN 1 END)::numeric / NULLIF(COUNT(*), 0)", "label": "Win Rate" }] }It maps to Superset's existing
AdhocMetricshape with expressionType: "SQL". No DB migration, no frontend work, no API contract break, the field is additive and optional.BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
N/A - This is an API-level change (no UI)
TESTING INSTRUCTIONS
generate_chartwith a y metric carryingsql_expression+label(omitname/aggregate), e.g.{"sql_expression": "COUNT(CASE WHEN gender = 'boy' THEN 1 END)::numeric / NULLIF(COUNT(*), 0)", "label": "Boy Ratio"}against thebirth_namesexample dataset.form_data.metrics[0]hasexpressionType: "SQL",sqlExpressionmatches the input, andlabelis preserved.explore_url— the chart should render real ratio values (~0.47-0.59 per year). The metric editor's "Custom SQL" tab should show the expression pre-filled.update_charton a SIMPLE-metric chart with asql_expressiony — verify the chart's metric is replaced with the SQL adhoc dict.sql_expressionwithoutlabel;sql_expressioncombined withname/aggregate/saved_metric;sql_expressionon x-axis,group_by, piedimension, pivotrows, orTableChartConfigwithquery_mode='raw'; bad SQL syntax (e.g.SUM(num); subquery whileALLOW_ADHOC_SUBQUERY=False.ADDITIONAL INFORMATION