
fix(mcp): apply cached adhoc filters to chart retrieval#40099

Merged
richardfogaca merged 7 commits into
apache:master from
richardfogaca:richardfogaca/cached-adhoc-filters
May 14, 2026

Conversation

@richardfogaca
Contributor

@richardfogaca richardfogaca commented May 13, 2026

SUMMARY

generate_chart can return a form_data_key for an unsaved chart. When that chart is filtered (for example, gender = boy on the birth_names dataset), the generated preview correctly applies the filter.

The bug was in the follow-up MCP retrieval path:

  1. get_chart_info showed the cached form_data_key still had the expected adhoc_filters.
  2. get_chart_data and get_chart_preview used the same key but returned unfiltered data.
  3. A gender = boy chart could include girl-only names such as Karen or Sharon.

This happened because get_chart_data and get_chart_preview built the QueryObject payload directly from cached form data. They copied concrete fields like filters, where, and having, but did not first normalize cached adhoc_filters the way the Explore/viz query path does.
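To make the missing step concrete, here is a minimal Python sketch of turning cached adhoc_filters into the concrete filters / where / having fields that a QueryObject payload uses. The function name split_adhoc_filters and the filter-dict shapes are assumptions for illustration, loosely modeled on Superset's adhoc filter format; this is not the PR's helper, and it skips SQL sanitization.

```python
from typing import Any

FormData = dict[str, Any]


def split_adhoc_filters(form_data: FormData) -> FormData:
    """Return a copy of form_data with adhoc_filters turned into concrete fields.

    Hypothetical, simplified sketch: no SQL sanitization, SIMPLE filters with a
    HAVING clause are ignored, and any pre-existing legacy filters/where/having
    values are overwritten rather than preserved.
    """
    simple_where: list[dict[str, Any]] = []
    sql_where: list[str] = []
    sql_having: list[str] = []
    for adhoc in form_data.get("adhoc_filters") or []:
        clause = adhoc.get("clause", "WHERE")
        if adhoc.get("expressionType") == "SIMPLE" and clause == "WHERE":
            # SIMPLE filters become structured {col, op, val} query filters.
            simple_where.append(
                {"col": adhoc["subject"], "op": adhoc["operator"], "val": adhoc.get("comparator")}
            )
        elif adhoc.get("expressionType") == "SQL":
            # Free-form SQL snippets are ANDed into WHERE or HAVING.
            target = sql_having if clause == "HAVING" else sql_where
            target.append(adhoc["sqlExpression"])
    normalized = dict(form_data)
    normalized["filters"] = simple_where
    normalized["where"] = " AND ".join(f"({sql})" for sql in sql_where)
    normalized["having"] = " AND ".join(f"({sql})" for sql in sql_having)
    return normalized


cached_form_data = {
    "viz_type": "table",
    "adhoc_filters": [
        {"expressionType": "SIMPLE", "clause": "WHERE",
         "subject": "gender", "operator": "==", "comparator": "boy"},
        {"expressionType": "SQL", "clause": "WHERE", "sqlExpression": "num > 100"},
    ],
}
normalized = split_adhoc_filters(cached_form_data)
print(normalized["filters"])  # [{'col': 'gender', 'op': '==', 'val': 'boy'}]
print(normalized["where"])    # (num > 100)
```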

What changed

This PR adds a shared MCP chart normalization step before QueryContext construction. It converts cached adhoc_filters into concrete query fields, preserves existing legacy filter fields, merges extra_form_data additively, and reuses the same normalized payload for data, preview, and SQL.

Reviewer focus

The important boundary is superset/mcp_service/chart/chart_helpers.py: retrieval tools should normalize form data once there, then build data, preview, and SQL queries from the same normalized fields.
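A sketch of the "normalize once, reuse everywhere" boundary, assuming a hypothetical apply_normalized_filters helper and an illustrative field list (the PR's apply_form_data_filters_to_query may copy a different set of fields). Each retrieval path starts from its own fresh query dict but takes its predicates from the same normalized payload:

```python
import copy
from typing import Any

Query = dict[str, Any]

# Illustrative field list; the PR's helper may copy a different set of fields.
FILTER_FIELDS = ("filters", "where", "having", "time_range")


def apply_normalized_filters(query: Query, normalized: dict[str, Any]) -> Query:
    """Copy concrete filter fields from normalized form data onto a fresh query dict."""
    for field in FILTER_FIELDS:
        if field in normalized:
            query[field] = copy.deepcopy(normalized[field])
    return query


# Normalized once (for example by an adhoc-filter split step)...
normalized = {
    "filters": [{"col": "gender", "op": "==", "val": "boy"}],
    "where": "",
    "having": "",
}
# ...then reused by every retrieval path, so all three share the same predicates.
data_query = apply_normalized_filters({"columns": ["name"], "metrics": ["sum__num"]}, normalized)
preview_query = apply_normalized_filters({"columns": ["name"], "row_limit": 10}, normalized)
sql_query = apply_normalized_filters({"columns": ["name"]}, normalized)
```

Deep-copying each field keeps one path from accidentally mutating the predicates another path reads.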

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

N/A - backend MCP chart retrieval change.

Concrete example from birth_names: Karen exists only as gender = girl.

SELECT name, gender, SUM(num)
FROM birth_names
WHERE name = 'Karen'
GROUP BY name, gender;

Result: Karen | girl | 257961, with no boy entry.

Before this fix, get_chart_data could return Karen for a chart filtered to gender = boy. After this fix, the same cached form-data request returns only boy rows, and table preview excludes girl-only rows.
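The effect of the missing predicate can be reproduced with Python's built-in sqlite3 module. The rows below are invented toy data, not the real birth_names counts; the unfiltered query stands in for what the retrieval path effectively ran before the fix, and the filtered one adds the concrete predicate that gender = boy should produce.

```python
import sqlite3

# Toy rows only; the real birth_names dataset is far larger.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE birth_names (name TEXT, gender TEXT, num INTEGER)")
conn.executemany(
    "INSERT INTO birth_names VALUES (?, ?, ?)",
    [("Karen", "girl", 100), ("Sharon", "girl", 50), ("James", "boy", 200)],
)

# No predicate: girl-only names leak into a chart meant to show boys.
unfiltered_sql = "SELECT name, SUM(num) FROM birth_names GROUP BY name ORDER BY name"
# The concrete predicate derived from the adhoc filter gender = boy.
filtered_sql = (
    "SELECT name, SUM(num) FROM birth_names WHERE gender = ? GROUP BY name ORDER BY name"
)

before = conn.execute(unfiltered_sql).fetchall()
after = conn.execute(filtered_sql, ("boy",)).fetchall()
print(before)  # [('James', 200), ('Karen', 100), ('Sharon', 50)]
print(after)   # [('James', 200)]
```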

TESTING INSTRUCTIONS

Validation

  • Live MCP/API validation in a local Superset compose runtime: cached table form data for birth_names with adhoc_filters: gender == boy, then called get_chart_info, get_chart_data, and get_chart_preview with the same form_data_key. get_chart_info showed the cached adhoc filter, get_chart_data returned only boy rows, and the preview contained no girl rows.

  • Live MCP/API adversarial validation in the same runtime: combined cached gender == boy with request extra_form_data gender == girl. The contradictory filters returned zero rows, confirming that the filter sources are merged rather than one replacing the other.

  • Live MCP/API SQL parity validation in the same runtime: cached legacy filters: gender == boy plus cached adhoc_filters: gender == girl produced zero preview rows, and get_chart_sql generated SQL containing both predicates.
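The adversarial check relies on filter sources being concatenated and ANDed rather than one list replacing another. A toy sketch of that semantics, with hypothetical helper names and an equality-only filter evaluator:

```python
from typing import Any

Row = dict[str, Any]
Filter = dict[str, Any]


def merge_filter_sources(*sources: list[Filter]) -> list[Filter]:
    """Concatenate filter lists so that no source replaces another."""
    merged: list[Filter] = []
    for source in sources:
        merged.extend(source)
    return merged


def apply_filters(rows: list[Row], filters: list[Filter]) -> list[Row]:
    """Keep rows that satisfy every filter (ANDed).

    Every filter is treated as an equality check; 'op' is not inspected.
    """
    return [row for row in rows if all(row[f["col"]] == f["val"] for f in filters)]


rows = [{"name": "Karen", "gender": "girl"}, {"name": "James", "gender": "boy"}]
cached_filters = [{"col": "gender", "op": "==", "val": "boy"}]
request_filters = [{"col": "gender", "op": "==", "val": "girl"}]

print(apply_filters(rows, merge_filter_sources(cached_filters)))
# [{'name': 'James', 'gender': 'boy'}]
print(apply_filters(rows, merge_filter_sources(cached_filters, request_filters)))
# []
```

If the request filters had replaced the cached ones instead, the second call would have returned Karen, which is exactly what the zero-row result rules out.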

ADDITIONAL INFORMATION

  • Has associated issue: no public issue
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@netlify

netlify Bot commented May 13, 2026

Deploy Preview for superset-docs-preview ready!

Name Link
🔨 Latest commit 1a58c1d
🔍 Latest deploy log https://app.netlify.com/projects/superset-docs-preview/deploys/6a0512b4b04ff20008fcb9bf
😎 Deploy Preview https://deploy-preview-40099--superset-docs-preview.netlify.app

@richardfogaca richardfogaca self-assigned this May 14, 2026
@richardfogaca richardfogaca force-pushed the richardfogaca/cached-adhoc-filters branch from afb879f to 1a58c1d Compare May 14, 2026 00:09
@codecov

codecov Bot commented May 14, 2026

Codecov Report

❌ Patch coverage is 11.73469% with 173 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.17%. Comparing base (6769796) to head (ab1fbad).
⚠️ Report is 11 commits behind head on master.

Files with missing lines | Patch % | Lines
superset/mcp_service/chart/chart_helpers.py | 11.36% | 156 Missing ⚠️
superset/mcp_service/chart/tool/get_chart_data.py | 0.00% | 9 Missing ⚠️
superset/mcp_service/chart/tool/get_chart_sql.py | 28.57% | 5 Missing ⚠️
...perset/mcp_service/chart/tool/get_chart_preview.py | 25.00% | 3 Missing ⚠️
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##           master   #40099      +/-   ##
==========================================
+ Coverage   64.14%   64.17%   +0.02%     
==========================================
  Files        2590     2590              
  Lines      138030   138022       -8     
  Branches    32019    32014       -5     
==========================================
+ Hits        88544    88575      +31     
+ Misses      47967    47926      -41     
- Partials     1519     1521       +2     
```
Flag | Coverage Δ
hive | 39.49% <11.73%> (+0.04%) ⬆️
mysql | 59.20% <11.73%> (+0.06%) ⬆️
postgres | 59.28% <11.73%> (+0.06%) ⬆️
presto | 41.18% <11.73%> (+0.05%) ⬆️
python | 60.72% <11.73%> (+0.06%) ⬆️
sqlite | 58.92% <11.73%> (+0.06%) ⬆️
unit | 100.00% <ø> (ø)

Flags with carried forward coverage won't be shown.


@richardfogaca richardfogaca marked this pull request as ready for review May 14, 2026 00:25
Copilot AI review requested due to automatic review settings May 14, 2026 00:25
@dosubot dosubot Bot added the change:backend Requires changing the backend label May 14, 2026
Contributor

Copilot AI left a comment


Pull request overview

This PR fixes an MCP chart retrieval inconsistency where cached adhoc_filters (via form_data_key) were not being normalized into concrete query fields (filters/where/having) for the follow-up retrieval tools, causing unfiltered results in get_chart_data / get_chart_preview (and related SQL paths). It introduces a shared normalization step in superset/mcp_service/chart/chart_helpers.py and reuses it across MCP chart tools before QueryContext construction.

Changes:

  • Added shared form-data normalization helpers (prepare_form_data_for_query, apply_form_data_filters_to_query) to convert/merge adhoc + legacy filters and apply them to query payloads.
  • Updated MCP tools (get_chart_data, get_chart_preview, get_chart_sql) to normalize cached/saved form data before building QueryContext queries.
  • Added unit tests asserting cached adhoc_filters are converted into filters for preview/data query construction.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

File | Description
superset/mcp_service/chart/chart_helpers.py | Adds shared normalization + query-application helpers for filters/where/having/time_range.
superset/mcp_service/chart/tool/get_chart_data.py | Normalizes cached/saved form_data and applies normalized filters to built queries.
superset/mcp_service/chart/tool/get_chart_preview.py | Normalizes preview query payloads so adhoc filters constrain ASCII/table/vega previews.
superset/mcp_service/chart/tool/get_chart_sql.py | Uses shared normalization before SQL query-context construction and applies normalized fields to query dicts.
tests/unit_tests/mcp_service/chart/test_chart_helpers.py | Adds test coverage for preserving/combining legacy filters with adhoc_filters.
tests/unit_tests/mcp_service/chart/tool/test_get_chart_data.py | Adds test asserting adhoc filters become concrete filters for unsaved chart queries.
tests/unit_tests/mcp_service/chart/tool/test_get_chart_preview.py | Adds test asserting table preview converts cached adhoc filters into query filters.

Comment thread superset/mcp_service/chart/tool/get_chart_data.py Outdated
Comment thread superset/mcp_service/chart/chart_helpers.py
Comment thread superset/mcp_service/chart/tool/get_chart_sql.py Outdated
@richardfogaca
Contributor Author

Code review — on behalf of Amin Ghadersohi (routed by EngCodeReviewBot)

Overall the approach is correct and the DRY refactor is clean. The shared normalization pipeline mirrors what Explore/viz.py does, and the test coverage is solid. A few things to verify or address:


1. apply_form_data_filters_to_query now propagates time_range — is that intentional?

The previous code only forwarded filters and adhoc_filters into the query dict. The new apply_form_data_filters_to_query also copies time_range. That's a silent behavior change for every call site. If time_range was already handled elsewhere in QueryContextFactory (e.g. via form_data=), this could double-apply it. Worth adding a note or test confirming the expected behavior.

2. Import inside function body in get_chart_sql.py

```python
# _build_single_query_dict, near the end
from superset.mcp_service.chart.chart_helpers import (
    apply_form_data_filters_to_query,
)
apply_form_data_filters_to_query(qd, form_data)
```

This import is inside the function, inconsistent with the rest of the file. Move it to the top-level imports with the other chart_helpers imports.

3. Potential redundancy in _build_single_query_dict

_build_single_query_dict already writes qd["filters"] from its own filters variable, and then apply_form_data_filters_to_query overwrites qd["filters"] with form_data.get("filters"). If both come from the same normalized form_data, the second write is redundant but harmless. If they can diverge (e.g. filters variable comes from somewhere else), the second write silently wins. Worth a quick check.

4. prepare_form_data_for_query mutates input — document it

The function modifies form_data in-place. Current callers are all fine (fresh dicts per path), but the docstring doesn't mention mutation. Future callers could get surprised. Either add # mutates form_data in-place to the docstring or rename to normalize_form_data_for_query to make it obvious.
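A third option, besides documenting the mutation or renaming the function, is to run the in-place normalizer on a deep copy. This is a hypothetical sketch: normalized_copy and toy_normalizer are illustrative names, not functions from the PR.

```python
import copy
from typing import Any, Callable

FormData = dict[str, Any]


def normalized_copy(
    form_data: FormData, normalize_in_place: Callable[[FormData], None]
) -> FormData:
    """Run an in-place normalizer on a deep copy; the caller's dict is untouched."""
    working = copy.deepcopy(form_data)
    normalize_in_place(working)
    return working


def toy_normalizer(form_data: FormData) -> None:
    """Stand-in for an in-place normalizer: it mutates its argument."""
    form_data["filters"] = form_data.pop("adhoc_filters", [])


cached = {"adhoc_filters": [{"col": "gender", "op": "==", "val": "boy"}]}
result = normalized_copy(cached, toy_normalizer)
print("adhoc_filters" in cached, "adhoc_filters" in result)  # True False
```

The trade-off is an extra deep copy per request, which is usually negligible next to the datasource lookup and query execution.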

5. get_chart_preview paths don't pass extra_form_data

prepare_form_data_for_query is called without extra_form_data in all preview strategies. This is pre-existing (not a regression from this PR), but worth a follow-up issue if preview tools should also support dashboard native filters.


The core fix in chart_helpers.py and the refactor of the cached/fallback/saved paths in get_chart_data.py are all correct. Items 1–3 above are the ones most worth resolving before merge.

Contributor Author

@richardfogaca richardfogaca left a comment


Posting on Richard's behalf — this is his PR reviewer agent. Forward any pushback to him and he'll loop me back in.

Left a few notes below. The two functional items at the top look worth a second look before merge; everything else is non-blocking. All line numbers verified against HEAD 1a58c1d4.

Functional — worth investigating before merge

  • superset/mcp_service/chart/tool/get_chart_sql.py:197 + :227-233

    _build_single_query_dict now calls apply_form_data_filters_to_query(qd, form_data), which copies form_data["where"] and form_data["having"] onto every query. By the time _build_mixed_timeseries_secondary invokes _build_single_query_dict (line 219), prepare_form_data_for_query at line 278 has already written the primary's SQL adhoc clauses into form_data["where"] / form_data["having"], so the secondary qd inherits the primary's SQL filters. The adhoc_filters_b handler at lines 227-233 then only overrides qd["filters"], leaving the leaked where / having in place.

    Concrete repro: a mixed_timeseries chart whose primary has an SQL adhoc_filters entry (or whose secondary's adhoc_filters_b contains an SQL clause). The generated Query 2 will include primary's SQL predicate even when the user replaced it with adhoc_filters_b.

    Would it be worth building the secondary qd from a secondary-only form_data snapshot (so apply_form_data_filters_to_query only sees secondary clauses), or having the adhoc_filters_b block explicitly overwrite qd["where"] / qd["having"] alongside qd["filters"]?

  • superset/mcp_service/chart/tool/get_chart_data.py:564-572

    When request.extra_form_data is provided and the chart has a saved query_context, this branch normalizes qc_form_data and then runs apply_form_data_filters_to_query(query, qc_form_data) for every saved query. Because qc_form_data["filters"] is rebuilt by split_adhoc_filters_into_base_filters purely from qc_form_data["adhoc_filters"], any pre-existing queries[i]["filters"] in the saved query_context that wasn't derivable from the form_data's adhoc filters gets silently overwritten.

    This is mostly idempotent for normal saves where the chart_data API generated queries[i]["filters"] from the same adhoc list, but it's a real hazard for imported / hand-edited / migrated query_contexts. WDYT — guard the overwrite ("only assign if query has no filters yet, otherwise merge"), or add a regression test that pins a saved query_context whose queries[i]["filters"] is non-derivable and asserts those filters survive when extra_form_data is present?
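A sketch of the "merge, don't overwrite" guard suggested above, assuming saved and derived filters share the same {col, op, val} dict shape. merge_query_filters is a hypothetical name; deduplication uses plain dict equality, so re-deriving the same filter does not double-apply it.

```python
from typing import Any

Filter = dict[str, Any]


def merge_query_filters(existing: list[Filter], derived: list[Filter]) -> list[Filter]:
    """Keep every existing filter and append derived ones not already present."""
    merged = list(existing)  # shallow copy: the saved list itself is not mutated
    for flt in derived:
        if flt not in merged:  # dict equality: same col, op and val
            merged.append(flt)
    return merged


saved = [{"col": "state", "op": "IN", "val": ["CA"]}]  # e.g. hand-edited, not derivable
derived = [{"col": "gender", "op": "==", "val": "boy"}]
merged = merge_query_filters(saved, derived)
print(merged)
# [{'col': 'state', 'op': 'IN', 'val': ['CA']}, {'col': 'gender', 'op': '==', 'val': 'boy'}]
```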

Test correctness

  • tests/unit_tests/mcp_service/chart/tool/test_get_chart_preview.py:286-363

    test_table_preview_converts_cached_adhoc_filters_to_query_filters passes form_data_key="cached-key" in the request, but TablePreviewStrategy.generate (get_chart_preview.py:312-355) reads only self.chart.params. Nothing in this test actually exercises the form_data_key cache path — the test name, docstring, and the headline regression in the PR description suggest it does.

    Two options: (a) rename + redocstring as "saved-chart adhoc filters become query filters in table preview", or (b) extend the test to monkey-patch get_cached_form_data and assert the strategy reads the cache when chart.params is empty / contradicts the cached state. (b) is the higher-value path because it would also pin the actual bug this PR set out to fix.
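Option (b) could look roughly like the test below. It does not import Superset: PreviewFormDataSource is a hypothetical stand-in for the strategy, and the cache lookup is injected instead of monkey-patching get_cached_form_data, so the sketch runs on its own. The property it pins is the one the review asks for: with empty chart params, the filter must come from the form_data_key cache.

```python
from typing import Any, Callable, Optional
from unittest import mock

FormData = dict[str, Any]


class PreviewFormDataSource:
    """Hypothetical: prefers cached form data (by key) over saved chart params."""

    def __init__(
        self,
        chart_params: FormData,
        form_data_key: Optional[str],
        get_cached_form_data: Callable[[str], Optional[FormData]],
    ) -> None:
        self.chart_params = chart_params
        self.form_data_key = form_data_key
        self.get_cached_form_data = get_cached_form_data

    def form_data(self) -> FormData:
        if self.form_data_key:
            cached = self.get_cached_form_data(self.form_data_key)
            if cached is not None:
                return cached
        return self.chart_params


def test_preview_reads_cached_form_data() -> None:
    cached = {
        "adhoc_filters": [
            {"expressionType": "SIMPLE", "clause": "WHERE",
             "subject": "gender", "operator": "==", "comparator": "boy"}
        ]
    }
    fake_cache = mock.Mock(return_value=cached)
    # chart.params is empty, so only the cache can supply the filter.
    source = PreviewFormDataSource({}, "cached-key", fake_cache)
    assert source.form_data() is cached
    fake_cache.assert_called_once_with("cached-key")


def test_preview_falls_back_to_chart_params() -> None:
    params: FormData = {"adhoc_filters": []}
    fake_cache = mock.Mock(return_value=None)
    assert PreviewFormDataSource(params, "missing-key", fake_cache).form_data() is params
```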

Other suggestions (optional)

  • superset/mcp_service/chart/chart_helpers.py:103-109, tool/get_chart_sql.py:193-195, :228, :272

    Several inline imports introduced by the PR (superset.utils.core helpers, apply_form_data_filters_to_query, prepare_form_data_for_query, split_adhoc_filters_into_base_filters) don't have circular-import concerns — the modules they pull from are already imported at top level elsewhere in mcp_service. Could we move these to module-level imports? Small readability + perf win, and matches the rest of the file.

  • superset/mcp_service/chart/chart_helpers.py:72-87 vs tool/get_chart_sql.py:159-177

    chart_helpers.resolve_datasource_engine and get_chart_sql._resolve_engine are byte-for-byte equivalent. Since this PR introduces the public helper, would it be worth deleting the local copy and importing the shared one? Also worth noting: prepare_form_data_for_query already resolves the engine internally (chart_helpers.py:131-133), so the explicit _resolve_engine call at get_chart_sql.py:277 runs the same datasource DAO lookup a second time per request. Threading the engine through, or dropping the duplicate, would save a lookup.

  • superset/mcp_service/chart/chart_helpers.py:127-128

    form_data["extra_form_data"] = extra_form_data unconditionally replaces an existing extra_form_data already in the cached form data. Cached Explore state can carry a dashboard-supplied extra_form_data; this silently drops it. Small suggestion — either merge dicts (with caller-supplied taking precedence) or add a short comment justifying the replace semantics.

  • superset/mcp_service/chart/tool/get_chart_sql.py:186-198

    _build_single_query_dict now sets time_range (line 188) and filters (line 190) manually and then calls apply_form_data_filters_to_query (line 197), which sets the same two fields again. Lines 187-190 are now redundant once the helper runs (keep row_limit since the helper doesn't handle it). Happy to keep as-is if you'd rather not churn this.

  • Test coverage gap

    Two scenarios that aren't exercised: (1) extra_form_data merged with a saved query_context (the branch in the second functional bullet above), and (2) the mixed_timeseries secondary adhoc_filters_b path in get_chart_sql.py:227-233. Both are first-touched by this PR.
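For the extra_form_data bullet above, a merge in which caller-supplied values take precedence could be sketched as follows. merge_extra_form_data_sketch is hypothetical; its list-concatenation / last-write-wins split mirrors the behavior described elsewhere in this thread for the PR's merge_extra_form_data, but the real helper may treat individual keys differently.

```python
from typing import Any

ExtraFormData = dict[str, Any]


def merge_extra_form_data_sketch(cached: ExtraFormData, request: ExtraFormData) -> ExtraFormData:
    """Merge request-level extra_form_data into the cached one without dropping it.

    List values (e.g. filters, adhoc_filters) are concatenated; any other value
    from the request replaces the cached one (last write wins).
    """
    merged = dict(cached)
    for key, value in request.items():
        if isinstance(value, list) and isinstance(merged.get(key), list):
            merged[key] = merged[key] + value  # new list; cached list is untouched
        else:
            merged[key] = value
    return merged


cached_extra = {
    "filters": [{"col": "gender", "op": "IN", "val": ["boy"]}],  # e.g. a dashboard native filter
    "time_range": "Last week",
}
request_extra = {
    "filters": [{"col": "state", "op": "IN", "val": ["CA"]}],
    "time_range": "Last year",
}
merged = merge_extra_form_data_sketch(cached_extra, request_extra)
print(len(merged["filters"]), merged["time_range"])  # 2 Last year
```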

Praise

  • superset/mcp_service/chart/chart_helpers.py:90-134

    The helper correctly handles the case convert_legacy_filters_into_adhoc doesn't: when both adhoc_filters and legacy filters / having / where are present, the upstream util would silently drop the legacy fields (its if not form_data.get("adhoc_filters") guard) and then del them. Pre-merging at lines 111-125 before calling the upstream chain preserves both filter sources — which is exactly the bug class the PR description describes (Karen showing up under gender = boy). Nice fix at the right level.
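A toy model of the bug class described in this comment. The functions are hypothetical and handle only the legacy filters key (not where / having); toy_convert_legacy_filters models the guard-then-delete behavior attributed to convert_legacy_filters_into_adhoc rather than reproducing it.

```python
from typing import Any

FormData = dict[str, Any]


def to_adhoc(legacy: dict[str, Any]) -> dict[str, Any]:
    """Convert a legacy {col, op, val} filter into a SIMPLE adhoc filter."""
    return {"expressionType": "SIMPLE", "clause": "WHERE",
            "subject": legacy["col"], "operator": legacy["op"], "comparator": legacy["val"]}


def toy_convert_legacy_filters(form_data: FormData) -> None:
    """Toy model of the guard described above (legacy 'filters' only).

    Legacy filters are converted only when adhoc_filters is empty, but the
    legacy key is deleted either way.
    """
    if not form_data.get("adhoc_filters"):
        form_data["adhoc_filters"] = [to_adhoc(f) for f in form_data.get("filters", [])]
    form_data.pop("filters", None)


def premerge_legacy_filters(form_data: FormData) -> None:
    """Fold legacy filters into adhoc_filters first so the converter cannot drop them."""
    legacy = form_data.pop("filters", [])
    form_data["adhoc_filters"] = list(form_data.get("adhoc_filters") or []) + [
        to_adhoc(f) for f in legacy
    ]


def both_sources() -> FormData:
    return {
        "filters": [{"col": "gender", "op": "==", "val": "boy"}],
        "adhoc_filters": [to_adhoc({"col": "gender", "op": "==", "val": "girl"})],
    }


naive = both_sources()
toy_convert_legacy_filters(naive)  # legacy gender == boy is silently lost

premerged = both_sources()
premerge_legacy_filters(premerged)
toy_convert_legacy_filters(premerged)  # both predicates survive

print(len(naive["adhoc_filters"]), len(premerged["adhoc_filters"]))  # 1 2
```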

Comment thread superset/mcp_service/chart/chart_helpers.py Outdated
Comment thread superset/mcp_service/chart/tool/get_chart_data.py Outdated
Comment thread superset/mcp_service/chart/tool/get_chart_preview.py Outdated
@richardfogaca
Contributor Author

Richard's agent here. I pushed 85ea985 to address the latest review notes:

  • Preserved cached dashboard/native extra_form_data when request-level extra_form_data is also provided, so cached filters are merged instead of overwritten.
  • Moved chart-type-aware query construction into shared MCP chart helpers and reused it from data, preview, and SQL paths.
  • Updated unsaved cached chart data queries to preserve mixed-timeseries secondary query layers, x-axis columns, singular metrics, bubble metrics, and fallback dimension fields.
  • Updated table/Vega-Lite preview query construction to use the same shared query builder instead of handcrafting only columns + plural metrics.
  • Kept apply_form_data_filters_to_query() scoped to fresh query payloads and retained merge-only helpers for saved/existing query payloads.

Validation: focused MCP chart tests pass locally (149 passed), ruff format/check pass on touched files, and staged pre-commit passes aside from an existing local pylint hook issue unrelated to these changes. CI is running on the pushed commit.

@richardfogaca
Contributor Author

Richard's agent here. I pushed 02b5b27 to address the follow-up review notes:

  • Updated saved query_context merging so that regular overrides in the request's extra_form_data are applied to saved query fields, including time_range, granularity, time_grain, and time_grain_sqla.
  • Kept saved-query predicate merging additive while avoiding duplicate synthetic temporal filters when temporal overrides are already applied as query fields.
  • Updated the helper regression test to assert request-side time_range override behavior.
  • Switched ASCII preview to the shared build_query_context_from_form_data path so preview query construction matches table/Vega/data/SQL.

Verification: focused MCP chart tests passed locally: 150 passed.

Contributor

@aminghadersohi aminghadersohi left a comment


Nice fix — the root cause is correctly identified and the normalization pipeline mirrors what Superset's Explore path already does.

What I verified:

  • Filter normalization pipeline is correct: convert_legacy_filters_into_adhoc guards with if not form_data.get("adhoc_filters"): before processing legacy fields, then deletes them — no double-application risk
  • merge_extra_form_data is additive (list concat for adhoc_filters, last-write-wins for scalars) — request-level extra_form_data cannot clear cached filters
  • Mixed-timeseries secondary query correctly replaces (not inherits) primary filters when adhoc_filters_b is present via qd.pop(clause, None)
  • No cross-user filter leakage: form_data_key cache access control is unchanged
  • viz_type at _query_from_form_data:749 is used at lines 809/818 — not dead code
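A sketch of the "replace, don't inherit" behavior for the secondary layer, assuming a simplified adhoc split (SIMPLE filters and WHERE-clause SQL only). build_secondary_query and toy_split are hypothetical names; the pop-then-update pattern mirrors the qd.pop(clause, None) approach noted above.

```python
import copy
from typing import Any

Query = dict[str, Any]


def toy_split(adhoc_filters: list[dict[str, Any]]) -> Query:
    """Simplified split of adhoc filters into query predicates (WHERE clause only)."""
    filters = [
        {"col": flt["subject"], "op": flt["operator"], "val": flt["comparator"]}
        for flt in adhoc_filters
        if flt["expressionType"] == "SIMPLE"
    ]
    where = " AND ".join(
        f"({flt['sqlExpression']})" for flt in adhoc_filters if flt["expressionType"] == "SQL"
    )
    return {"filters": filters, "where": where, "having": ""}


def build_secondary_query(primary: Query, form_data: dict[str, Any]) -> Query:
    """Hypothetical: derive the secondary query without inheriting primary predicates."""
    qd = copy.deepcopy(primary)
    adhoc_b = form_data.get("adhoc_filters_b")
    if adhoc_b is not None:
        # Drop every predicate inherited from the primary layer...
        for clause in ("filters", "where", "having"):
            qd.pop(clause, None)
        # ...then apply only the secondary layer's own predicates.
        qd.update(toy_split(adhoc_b))
    return qd


primary = {"metrics": ["sum__num"], "filters": [], "where": "(num > 100)", "having": ""}
form_data = {
    "adhoc_filters_b": [
        {"expressionType": "SIMPLE", "subject": "gender", "operator": "==", "comparator": "girl"}
    ]
}
secondary = build_secondary_query(primary, form_data)
print(secondary["where"] == "", secondary["filters"])
# True [{'col': 'gender', 'op': '==', 'val': 'girl'}]
```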

One thing worth a follow-up: The new inline imports in chart_helpers.py (resolve_datasource_engine lines 86–87, prepare_form_data_for_query lines 115–121, _build_mixed_timeseries_secondary line 355, build_query_context_from_form_data line 481) don't have the # avoid circular import comment that the existing functions in the same file already use. Not a blocker — the code is correct — but it would keep the convention consistent for future readers.

Minor: except Exception: # noqa: BLE001 at line 94 could use a one-line comment explaining why the broad catch is appropriate here.

Otherwise this is clean work — good test coverage including the adversarial case (contradictory filter sources → zero rows), and a significant DRY improvement across three tools.

@richardfogaca richardfogaca merged commit 8fa5a75 into apache:master May 14, 2026
65 checks passed
@richardfogaca richardfogaca deleted the richardfogaca/cached-adhoc-filters branch May 14, 2026 17:21
@bito-code-review
Contributor

Bito Automatic Review Skipped – PR Already Merged

Bito scheduled an automatic review for this pull request, but the review was skipped because this PR was merged before the review could be run.
No action is needed if you didn't intend to review it. To get a review, you can type /review in a comment and save it.

sha174n pushed a commit to sha174n/superset that referenced this pull request May 15, 2026

Labels

change:backend Requires changing the backend size/XXL


3 participants