fix(mcp): strip json_metadata and position_json from get_dashboard_info response#39101
Conversation
…fo response The get_dashboard_info MCP tool returned excessively large responses (up to 2.4MB) due to raw json_metadata (~0.5MB of color schemes, cross-filter scopes, etc.) and position_json (internal layout tree). Changes: - Remove json_metadata and position_json raw fields from DashboardInfo - Extract only useful filter info into structured native_filters field - Add cross_filters_enabled boolean field - Replace full ChartInfo with lightweight DashboardChartSummary in dashboard context (id, name, viz_type, datasource_name, url, description) - Update dashboard_serializer, serialize_dashboard_object, and generate_dashboard/add_chart_to_existing_dashboard to use new models - Add comprehensive tests for filter extraction and response slimming
Code Review Agent Run #7a2e42Actionable Suggestions - 0Additional Suggestions - 1
Filtered by Review RulesBito filtered these suggestions based on rules created automatically for your feedback. Manage rules.
Review Details
Bito Usage GuideCommands Type the following command in the pull request comment and save the comment.
Refer to the documentation for additional commands. Configuration This repository uses Documentation & Help |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #39101 +/- ##
==========================================
- Coverage 64.52% 64.42% -0.10%
==========================================
Files 2536 2543 +7
Lines 131208 131987 +779
Branches 30457 30572 +115
==========================================
+ Hits 84661 85039 +378
- Misses 45084 45463 +379
- Partials 1463 1485 +22
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Address PR review comments: - Add isinstance(metadata, dict) guard in _extract_native_filters and _extract_cross_filters_enabled to handle non-object JSON (e.g. "[]", "123") gracefully instead of raising AttributeError - Add isinstance(raw_targets, list) guard for corrupted filter targets to prevent Pydantic ValidationError - Fix stale ChartInfo reference in module docstring - Add tests for non-dict top-level JSON edge cases
Refactor both _extract_native_filters and _extract_cross_filters_enabled to use a shared _parse_json_metadata helper that safely handles None, invalid JSON, and non-dict JSON values (e.g. "[]", "123"). This addresses review feedback about isinstance(dict) guards by centralizing the validation in one place.
Add parentheses to @pytest.mark.asyncio and @pytest.fixture decorators to match CI pre-commit ruff version.
Code Review Agent Run #3628d0Actionable Suggestions - 0Review Details
Bito Usage GuideCommands Type the following command in the pull request comment and save the comment.
Refer to the documentation for additional commands. Configuration This repository uses Documentation & Help |
|
Claude commenting from a session with @mistercrunch 👋 Nice cleanup — dropping 2.4MB responses is clearly the right call. One concern worth considering: Silent omission vs. explicit omission Right now Here's an idea: instead of silently omitting, surface a stub that tells the agent what it's missing and how to get it: {
"native_filters": [...],
"cross_filters_enabled": true,
"_omitted": {
"position_json": "omitted (~42kb layout tree)",
"json_metadata": "omitted (~18kb raw blob) — native_filters and cross_filters_enabled extracted above"
}
}Or even simpler — keep the field in the schema but return a sentinel string: position_json: str | None = "[omitted: layout tree too large for context — use get_dashboard_raw if needed]"The distinction matters because:
The |
Address reviewer feedback: instead of silently dropping json_metadata and position_json, include an omitted_fields dict that tells the LLM agent what was stripped, approximate sizes, and why. Introduces a reusable OmittedFieldsBuilder utility in response_utils.py that any MCP tool serializer can use for consistent omission metadata. Pattern follows industry best practices (mcp-git-polite, Axiom, Blockscout) for explicit omission signaling over silent drops.
✅ Deploy Preview for superset-docs-preview ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
This PR reduces the payload size of the get_dashboard_info MCP tool by removing large raw dashboard fields and replacing them with lightweight, structured summaries plus explicit omission metadata.
Changes:
- Removed
json_metadataandposition_jsonfromDashboardInfo, addingnative_filters,cross_filters_enabled, andomitted_fieldsinstead. - Replaced full chart serialization with lightweight
DashboardChartSummaryin dashboard context. - Added a reusable
OmittedFieldsBuilderutility and expanded unit tests around extraction/omission behavior.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
tests/unit_tests/mcp_service/dashboard/tool/test_dashboard_tools.py |
Updates tool tests to reflect new default columns and removed heavy fields. |
tests/unit_tests/mcp_service/dashboard/test_dashboard_schemas.py |
Adds tests for omission behavior and json_metadata extraction helpers. |
superset/mcp_service/utils/response_utils.py |
Introduces OmittedFieldsBuilder for consistent omission metadata. |
superset/mcp_service/utils/permissions_utils.py |
Removes permissions handling for fields no longer exposed in MCP dashboard responses. |
superset/mcp_service/dashboard/tool/generate_dashboard.py |
Switches dashboard chart serialization to lightweight summaries. |
superset/mcp_service/dashboard/tool/add_chart_to_existing_dashboard.py |
Switches updated dashboard response chart serialization to lightweight summaries. |
superset/mcp_service/dashboard/schemas.py |
Implements new schema fields + extraction helpers + chart summary serializer. |
superset/mcp_service/common/schema_discovery.py |
Updates dashboard field descriptions to match new response shape. |
|
Great feedback. Implemented explicit omission metadata — the response now includes an "omitted_fields": {
"position_json": "Omitted (~42 KB) — Internal layout tree with component positions/hierarchy. Not useful for analysis or LLM context.",
"json_metadata": "Omitted (~18 KB), useful parts extracted — native_filters and cross_filters_enabled extracted into dedicated fields above."
}This follows the pattern from mcp-git-polite (truncated + reason), Axiom (result provenance), and Blockscout (explicit indicators). The For the companion tool idea ( See commit f62af42. |
Code Review Agent Run #0b84e8Actionable Suggestions - 0Additional Suggestions - 1
Review Details
Bito Usage GuideCommands Type the following command in the pull request comment and save the comment.
Refer to the documentation for additional commands. Configuration This repository uses Documentation & Help |
…load guard - Filter non-dict entries from native filter targets to prevent Pydantic ValidationError on corrupted json_metadata - Rename _serialize_chart_summary to serialize_chart_summary (public API since it's imported cross-module) - Use `if chart_id is not None:` instead of `if chart_id:` to handle falsy-but-valid IDs - Use getattr() for json_metadata/position_json in dashboard_serializer to avoid triggering SQLAlchemy lazy-loads on deferred columns
Code Review Agent Run #542f75Actionable Suggestions - 0Review Details
Bito Usage GuideCommands Type the following command in the pull request comment and save the comment.
Refer to the documentation for additional commands. Configuration This repository uses Documentation & Help |
Summary
The
get_dashboard_infoMCP tool returned excessively large responses (up to 2.4MB for dashboards with many charts and filters). The main culprits were:json_metadata(~0.5MB on large dashboards): Raw JSON blob containing color schemes, cross-filter scopes, shared_label_colors, and other internal configuration not useful for LLM consumptionposition_json(grows with chart count): Full internal Superset dashboard layout tree with every node's children, parents, height, width, etc.Changes
json_metadataandposition_jsonraw fields fromDashboardInforesponse schemanative_filtersfield — extracts only filter name, type, and targets fromjson_metadata(the useful part for LLMs)cross_filters_enabledboolean field extracted fromjson_metadataChartInfowithDashboardChartSummaryin dashboard context — lightweight model with only: id, slice_name, viz_type, datasource_name, url, descriptiondashboard_serializer,serialize_dashboard_object,generate_dashboard, andadd_chart_to_existing_dashboardto use new modelsImpact
This significantly reduces
get_dashboard_inforesponse size (from potentially 2.4MB to a small fraction) while retaining all information useful for LLM workflows: dashboard metadata, filter configuration, and chart summaries.Test plan
_extract_native_filters(7 cases: None, empty, invalid JSON, no config, non-list, valid, skip non-dict)_extract_cross_filters_enabled(5 cases: None, empty, true, false, non-bool)DashboardInfono longer hasjson_metadata/position_jsonDashboardChartSummary(no form_data, tags, owners)