feat(mcp): chart type plugin registry for extensible generate_chart by aminghadersohi · Pull Request #39922 · apache/superset

aminghadersohi · 2026-05-06T22:58:52Z

SUMMARY

Introduces a ChartTypePlugin protocol and central registry that consolidates all chart-type-specific logic into a single plugin class per chart type. Previously, adding a new chart type required synchronised edits to four separate dispatch locations. With this change it requires one new file.

The problem: 4 dispatch points that had to stay in sync

Before this PR, generate_chart logic was split across:

Location	Responsibility
`schema_validator.py`	7 `_pre_validate_*` static methods, one per type
`dataset_validator.py`	`isinstance` branches in `_extract_column_references` and `normalize_column_names`
`chart_utils.py`	`if/elif` chain in `map_config_to_form_data`, `generate_chart_name`, `_resolve_viz_type`
`runtime/__init__.py`	`isinstance(config, XYChartConfig)` hardcoded branch

Result: 5 of 7 chart types (pie, pivot_table, mixed_timeseries, handlebars, big_number) silently skipped column validation — a bug discovered only when charts failed at query time.

Architecture

New file layout

superset/mcp_service/chart/
├── plugin.py          # ChartTypePlugin Protocol + BaseChartPlugin base class
├── registry.py        # Central registry with lazy bootstrap
└── plugins/
    ├── __init__.py    # Imports and registers all 7 built-in plugins
    ├── xy.py
    ├── table.py
    ├── pie.py
    ├── pivot_table.py
    ├── mixed_timeseries.py
    ├── handlebars.py
    └── big_number.py

Request flow (before → after)

flowchart LR
    subgraph BEFORE["Before: 4 separate dispatch points"]
        direction TB
        A1[schema_validator.py\n7 pre_validate methods]
        A2[dataset_validator.py\nisinstance branches]
        A3[chart_utils.py\nif/elif chains]
        A4[runtime/__init__.py\nXY hardcoding]
    end

    subgraph AFTER["After: single registry lookup"]
        direction TB
        B1[registry.get chart_type] --> B2[plugin.pre_validate]
        B1 --> B3[plugin.extract_column_refs]
        B1 --> B4[plugin.to_form_data]
        B1 --> B5[plugin.post_map_validate]
        B1 --> B6[plugin.normalize_column_refs]
        B1 --> B7[plugin.get_runtime_warnings]
        B1 --> B8[plugin.generate_name]
        B1 --> B9[plugin.resolve_viz_type]
    end

    BEFORE -.->|refactored to| AFTER

Plugin protocol

classDiagram
    class ChartTypePlugin {
        <<Protocol>>
        +str chart_type
        +str display_name
        +dict native_viz_types
        +pre_validate(config) ChartGenerationError|None
        +extract_column_refs(config) list[ColumnRef]
        +to_form_data(config, dataset_id) dict
        +post_map_validate(config, form_data, dataset_id) ChartGenerationError|None
        +normalize_column_refs(config, dataset_context) Any
        +get_runtime_warnings(config, dataset_id) list[str]
        +generate_name(config, dataset_name) str
        +resolve_viz_type(config) str
    }

    class BaseChartPlugin {
        +pre_validate() None
        +extract_column_refs() []
        +to_form_data() NotImplementedError
        +post_map_validate() None
        +normalize_column_refs() config unchanged
        +get_runtime_warnings() []
        +generate_name() "Chart"
        +resolve_viz_type() "unknown"
        +_with_context(what, context)$ str
    }

    class XYChartPlugin
    class PieChartPlugin
    class TableChartPlugin
    class BigNumberChartPlugin
    class PivotTableChartPlugin
    class MixedTimeseriesChartPlugin
    class HandlebarsChartPlugin

    ChartTypePlugin <|.. BaseChartPlugin
    BaseChartPlugin <|-- XYChartPlugin
    BaseChartPlugin <|-- PieChartPlugin
    BaseChartPlugin <|-- TableChartPlugin
    BaseChartPlugin <|-- BigNumberChartPlugin
    BaseChartPlugin <|-- PivotTableChartPlugin
    BaseChartPlugin <|-- MixedTimeseriesChartPlugin
    BaseChartPlugin <|-- HandlebarsChartPlugin

Registry bootstrap (lazy, no circular imports)

sequenceDiagram
    participant Caller as Caller (test / tool)
    participant Registry as registry.py
    participant Plugins as plugins/__init__.py

    Caller->>Registry: get("xy")
    Registry->>Registry: _ensure_plugins_loaded()
    alt first call
        Registry->>Plugins: import (triggers register() calls)
        Plugins-->>Registry: 7 plugins registered
    end
    Registry-->>Caller: XYChartPlugin instance

The registry self-bootstraps on first lookup — no dependency on app.py being imported first, so isolated unit tests work without the full Flask app.

How to add a new chart type

Everything goes in one new file. Here is a minimal working example for a hypothetical funnel chart:

Step 1 — Add the Pydantic schema (chart/schemas.py):

class FunnelChartConfig(ChartConfig):
    chart_type: Literal["funnel"] = "funnel"
    metric: ColumnRef
    groupby: ColumnRef
    filters: list[FilterConfig] | None = None

Step 2 — Create the plugin (chart/plugins/funnel.py):

from superset.mcp_service.chart.plugin import BaseChartPlugin
from superset.mcp_service.chart.schemas import ColumnRef
from superset.mcp_service.common.error_schemas import ChartGenerationError


class FunnelChartPlugin(BaseChartPlugin):
    chart_type = "funnel"
    display_name = "Funnel Chart"
    native_viz_types = {"funnel": "Funnel Chart"}

    def pre_validate(self, config: dict) -> ChartGenerationError | None:
        if "metric" not in config or "groupby" not in config:
            return ChartGenerationError(
                error_type="missing_funnel_fields",
                message="Funnel chart requires 'metric' and 'groupby'",
                suggestions=["Add metric: {'name': 'col', 'aggregate': 'SUM'}"],
                error_code="MISSING_FUNNEL_FIELDS",
            )
        return None

    def extract_column_refs(self, config) -> list[ColumnRef]:
        from superset.mcp_service.chart.schemas import FunnelChartConfig
        if not isinstance(config, FunnelChartConfig):
            return []
        refs = [config.metric, config.groupby]
        if config.filters:
            refs.extend(ColumnRef(name=f.column) for f in config.filters)
        return refs

    def to_form_data(self, config, dataset_id=None) -> dict:
        return {
            "viz_type": "funnel",
            "metric": {"column": {"column_name": config.metric.name},
                       "aggregate": config.metric.aggregate},
            "groupby": [config.groupby.name],
        }

    def resolve_viz_type(self, config) -> str:
        return "funnel"

    def generate_name(self, config, dataset_name=None) -> str:
        return self._with_context(
            f"Funnel: {config.metric.aggregate}({config.metric.name})",
            dataset_name,
        )

    def normalize_column_refs(self, config, dataset_context):
        from superset.mcp_service.chart.schemas import FunnelChartConfig
        from superset.mcp_service.chart.validation.dataset_validator import DatasetValidator
        d = config.model_dump()
        get = DatasetValidator._get_canonical_column_name
        d["metric"]["name"] = get(d["metric"]["name"], dataset_context)
        d["groupby"]["name"] = get(d["groupby"]["name"], dataset_context)
        DatasetValidator._normalize_filters(d, dataset_context)
        return FunnelChartConfig.model_validate(d)

Step 3 — Register it (chart/plugins/__init__.py):

from superset.mcp_service.chart.plugins.funnel import FunnelChartPlugin
# ...existing imports...
register(FunnelChartPlugin())

Step 4 — Add a Pydantic error hint (chart/validation/schema_validator.py):

# In _CHART_TYPE_ERROR_HINTS — keep in sync with plugins/
"funnel": {
    "metric": "Add metric: {'name': 'col', 'aggregate': 'SUM'}",
    "groupby": "Add groupby: {'name': 'stage_column'}",
},

That is the complete change set — no edits to chart_utils.py, dataset_validator.py, or runtime/__init__.py dispatch logic.

Bug fixes included

5-type column validation gap: pie, pivot_table, mixed_timeseries, handlebars, and big_number charts previously skipped dataset column validation entirely (silent false-pass). All 7 types now run column existence and aggregation checks.
BigNumber trendline check: Moved out of the monolithic dispatcher into BigNumberChartPlugin.post_map_validate().
Runtime validator decoupling: Removed isinstance(config, XYChartConfig) hardcoding from RuntimeValidator; XY-specific format/cardinality checks now live in XYChartPlugin.get_runtime_warnings().
Stale docstring: generate_chart.py listed only 'xy' and 'table' as valid chart types; updated to list all 7.
Pydantic error handling: Added missing pie, pivot_table, mixed_timeseries cases to _enhance_validation_error; refactored to a data-driven lookup table to reduce cyclomatic complexity.
Registry bootstrap: Registry self-bootstraps on first access so unit tests running without app.py get a fully populated registry.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

N/A — backend-only change, no UI modifications.

TESTING INSTRUCTIONS

# Run the MCP unit tests for chart functionality
pytest tests/unit_tests/mcp_service/chart/ -x -q

# Spot-check column validation now works for previously-unvalidated types
pytest tests/unit_tests/mcp_service/chart/validation/ -x -q

Manual verification:

Start the MCP service: python -m superset.mcp_service.server
Create a pie chart with a non-existent dimension column → should get a column_not_found error with fuzzy suggestions (previously: silent pass, runtime failure)
Create a big_number chart with show_trendline=true but no temporal_column → should error at validation
Run generate_chart for all 7 chart types to confirm no regressions

ADDITIONAL INFORMATION

netlify · 2026-05-06T23:01:17Z

✅ Deploy Preview for superset-docs-preview ready!

Name	Link
🔨 Latest commit	`7dc3563`
🔍 Latest deploy log	https://app.netlify.com/projects/superset-docs-preview/deploys/6a0e25c10bff6d0008bd723e
😎 Deploy Preview	https://deploy-preview-39922--superset-docs-preview.netlify.app
📱 Preview on mobile	Toggle QR Code... Use your smartphone camera to open QR code link.
🤖 Make changes	Run an agent on this branch

To edit notification comments on pull requests, go to your Netlify project configuration.

bito-code-review · 2026-05-07T15:26:13Z

The inline context is insufficient to determine what 'this' refers to in the question. No diff hunk or suggestion snippet is provided for the thread in superset/mcp_service/chart/chart_utils.py.

Copilot

Pull request overview

This PR introduces a chart-type plugin architecture for the MCP chart pipeline, centralizing per-chart-type behavior (pre-validation, column extraction/normalization, form_data mapping, runtime warnings, naming, viz_type resolution) into one plugin per supported chart_type. This reduces the previous need to update multiple dispatch sites when adding or modifying chart types.

Changes:

Added a ChartTypePlugin protocol + BaseChartPlugin, a global plugin registry, and 7 built-in chart plugins (xy, table, pie, pivot_table, mixed_timeseries, handlebars, big_number).
Refactored schema/dataset/runtime validation and form_data mapping to dispatch via the registry instead of isinstance / if/elif chains.
Enhanced tool/docs output by expanding generate_chart docstring chart types and adding chart_type_display_name to ChartInfo.

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 9 comments.

Show a summary per file

File	Description
superset/mcp_service/chart/validation/schema_validator.py	Switch pre-validation dispatch to registry/plugins; refactor Pydantic error enhancement.
superset/mcp_service/chart/validation/runtime/init.py	Replace XY-only runtime checks with per-plugin runtime warning dispatch.
superset/mcp_service/chart/validation/dataset_validator.py	Delegate column extraction/normalization to plugins; apply aggregation validation for all chart types.
superset/mcp_service/chart/tool/generate_chart.py	Update tool docstring to list all supported `chart_type` values.
superset/mcp_service/chart/schemas.py	Add `chart_type_display_name` to `ChartInfo`; resolve display name via registry mapping.
superset/mcp_service/chart/registry.py	Introduce global registry for chart plugins + viz_type display-name lookup.
superset/mcp_service/chart/plugin.py	Add plugin protocol and base class defining the plugin responsibilities.
superset/mcp_service/chart/plugins/init.py	Register the built-in plugins on import.
superset/mcp_service/chart/plugins/xy.py	Implement XY plugin (pre-validate, refs, mapping, naming, viz_type, runtime warnings).
superset/mcp_service/chart/plugins/table.py	Implement Table plugin.
superset/mcp_service/chart/plugins/pie.py	Implement Pie plugin.
superset/mcp_service/chart/plugins/pivot_table.py	Implement Pivot Table plugin.
superset/mcp_service/chart/plugins/mixed_timeseries.py	Implement Mixed Timeseries plugin.
superset/mcp_service/chart/plugins/handlebars.py	Implement Handlebars plugin.
superset/mcp_service/chart/plugins/big_number.py	Implement Big Number plugin including post-map trendline temporal validation.
superset/mcp_service/chart/chart_utils.py	Replace mapping/name/viz_type dispatch with registry/plugin calls.
superset/mcp_service/app.py	Import plugins at MCP app startup; update default instructions about chart type naming.

codecov · 2026-05-07T22:31:19Z

Codecov Report

❌ Patch coverage is 34.33476% with 459 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.12%. Comparing base (a183582) to head (42a293a).
⚠️ Report is 46 commits behind head on master.

Files with missing lines	Patch %	Lines
superset/mcp_service/chart/plugins/xy.py	28.75%	57 Missing ⚠️
...rset/mcp_service/chart/plugins/mixed_timeseries.py	26.86%	49 Missing ⚠️
superset/mcp_service/chart/registry.py	42.35%	44 Missing and 5 partials ⚠️
superset/mcp_service/chart/plugins/big_number.py	28.78%	47 Missing ⚠️
superset/mcp_service/chart/plugins/handlebars.py	29.03%	44 Missing ⚠️
superset/mcp_service/chart/plugins/pivot_table.py	31.03%	40 Missing ⚠️
superset/mcp_service/chart/plugins/pie.py	37.50%	30 Missing ⚠️
superset/mcp_service/chart/plugins/table.py	40.00%	27 Missing ⚠️
.../mcp_service/chart/validation/dataset_validator.py	14.81%	23 Missing ⚠️
superset/mcp_service/chart/plugin.py	64.70%	18 Missing ⚠️
... and 6 more

Additional details and impacted files

@@            Coverage Diff             @@
##           master   #39922      +/-   ##
==========================================
- Coverage   64.18%   64.12%   -0.07%     
==========================================
  Files        2591     2602      +11     
  Lines      138471   139388     +917     
  Branches    32120    32254     +134     
==========================================
+ Hits        88883    89381     +498     
- Misses      48056    48469     +413     
- Partials     1532     1538       +6

Flag	Coverage Δ
hive	`39.47% <34.33%> (+0.06%)`	⬆️
mysql	`58.84% <34.33%> (-0.21%)`	⬇️
postgres	`58.92% <34.33%> (-0.21%)`	⬇️
presto	`41.13% <34.33%> (+0.04%)`	⬆️
python	`60.47% <34.33%> (-0.10%)`	⬇️
sqlite	`58.57% <34.33%> (-0.21%)`	⬇️
unit	`100.00% <ø> (ø)`

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

aminghadersohi · 2026-05-13T17:41:27Z

Internal architecture review — findings & resolutions

Ran an independent codex review of the plugin registry design. Summary of findings and their status:

Finding	Severity	Status
`_ensure_plugins_loaded()` flag not lock-protected — race condition on concurrent first-calls	MEDIUM	✅ Fixed in `eface3b` — added `threading.Lock` with double-checked locking
Duplicate `register()` call silently overwrites	MEDIUM	Already handled — `registry.py` lines 68-71 emit `logger.warning` on overwrite, no silent overwrite
`extract_column_refs()` returns `[]` for unrecognised configs	MEDIUM	Accepted — correct for the 7 known types (plugins always registered); unknown types have no column schema to validate against
`BaseChartPlugin.extract_column_refs()` no-op default	NIT	Accepted — base class provides a safe default; concrete plugins override for their types
Local imports in `chart_utils.py` / `dataset_validator.py`	NIT	Already addressed in previous commit with explanatory comments on why they must stay local (circular dependency prevention)

The thread-safety fix (eface3bf) is the only code change — the others are either already handled or accepted design decisions.

…generation Replaces four scattered dispatch locations (schema_validator, dataset_validator, chart_utils, runtime validator) with a central ChartTypePlugin registry. Each of the 7 supported chart types (xy, table, pie, pivot_table, mixed_timeseries, handlebars, big_number) now owns its pre-validation, column extraction, form_data mapping, post-map validation, column normalization, and runtime warnings in a single plugin class. Key changes: - Add ChartTypePlugin protocol and BaseChartPlugin base class (plugin.py) - Add ChartTypeRegistry with register/get/all_types helpers (registry.py) - Add 7 chart type plugins under chart/plugins/ with full coverage - Fix 5-type column validation gap: pie, pivot_table, mixed_timeseries, handlebars, and big_number now participate in dataset column validation (previously silently skipped) - Move BigNumber trendline temporal check to BigNumberChartPlugin.post_map_validate() - Add get_runtime_warnings() to plugin protocol; XYChartPlugin implements format/cardinality checks, removing isinstance(config, XYChartConfig) from RuntimeValidator - Fix stale generate_chart.py docstring listing only 'xy' and 'table' chart types - Add missing pie, pivot_table, mixed_timeseries handlers to _enhance_validation_error; refactor into a data-driven lookup table to stay within complexity limits - Fix empty details fallback in Pydantic error handler

Each ChartTypePlugin now declares: - display_name: human-readable label for the chart_type discriminator (e.g. "Line / Bar / Area / Scatter Chart", "Pivot Table") - native_viz_types: dict mapping every Superset-internal viz_type the plugin produces to a user-friendly name (e.g. {"echarts_timeseries_line": "Line Chart", "echarts_area": "Area Chart"}) The registry gains display_name_for_viz_type(viz_type) which searches all plugins' native_viz_types maps, replacing the need for a separate viz_type_display_names.json or viz_type_names.py module. ChartInfo gains a chart_type_display_name field populated via the registry, so list_charts / get_chart_info return human-readable chart type names. The MCP system instructions now reference display names rather than internal viz_type identifiers.

H1: Delete 7 dead _pre_validate_* static methods from SchemaValidator — exact duplicates of plugin pre_validate() methods, never called after _pre_validate_chart_type() was updated to delegate to plugin. H2: Inline DatasetValidator._normalize_xy_config/_normalize_table_config into XYChartPlugin/TableChartPlugin.normalize_column_refs() and delete both DatasetValidator helper methods. The 5 other plugins already called _get_canonical_column_name directly; XY and Table now match. H3: Add generate_name()/resolve_viz_type() to ChartTypePlugin protocol and BaseChartPlugin, implement in all 7 plugins. Replace the 7-arm isinstance chain in generate_chart_name() and the 7-arm elif chain in _resolve_viz_type() with single-line registry dispatch. H4: Add a sync comment above _CHART_TYPE_ERROR_HINTS to document that it must stay in sync with the plugin registry. M4: Move logger=logging.getLogger(__name__) from inside XYChartPlugin.get_runtime_warnings() to module scope.

…xes, test repairs On top of the dead-code elimination in the previous commit: - Add lazy _ensure_plugins_loaded() bootstrap to ChartTypeRegistry so the registry is populated even without importing app.py (fixes isolated test runs) - Delegate _RegistryProxy methods to module-level functions so bootstrap runs - Guard register() against empty chart_type strings - Add generate_name + resolve_viz_type to ChartTypePlugin Protocol and BaseChartPlugin; delegate generate_chart_name/_resolve_viz_type in chart_utils to the plugin registry - Add _with_context static helper to BaseChartPlugin (shared by all plugins) - Fix stale 'five methods' → 'eight methods' docstring in plugin.py - Add TypeVar _C to normalize_column_names so mypy infers correct return type - Fix broken tests: update _pre_validate_big_number_config → _pre_validate_chart_type, remove deleted TestNormalizeXYConfig/TestNormalizeTableConfig classes, update runtime validator tests for removed _validate_format_compatibility / _validate_cardinality methods, add x is not None narrowing guards Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…nal corrections, cardinality suggestions - Remove redundant local imports from BigNumberChartPlugin.post_map_validate() now that BigNumberChartConfig and is_column_truly_temporal are at top level - Add explanatory comments on the two remaining local get_registry imports in chart_utils.py and dataset_validator.py (circular import prevention) - Fix schema_validator.py and generate_chart.py docstring: XY 'x' field is optional (defaults to dataset primary datetime column), not required - Propagate cardinality suggestions alongside warnings in XYChartPlugin - Clarify app.py instructions: chart_type_display_name is null for viz_types outside the 7 generate_chart-supported types Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

All per-method local imports in the 7 chart plugins were moved to module-level. None of them create circular imports: schemas.py, chart_utils.py, and dataset_validator.py are safe to import at plugin load time because those modules guard their own registry lookups with local imports. - big_number: add map_big_number_config, _big_number_chart_what, _summarize_filters, DatasetValidator to top-level imports - pie: add map_pie_config, _pie_chart_what, _summarize_filters, PieChartConfig, DatasetValidator to top-level imports - xy: add map_xy_config, _xy_chart_what/context, XYChartConfig, DatasetValidator, FormatTypeValidator, CardinalityValidator to top-level imports - table: add map_table_config, _table_chart_what, _summarize_filters, TableChartConfig, DatasetValidator to top-level imports - pivot_table: add map_pivot_table_config, _pivot_table_what, _summarize_filters, PivotTableChartConfig, DatasetValidator to top-level imports - mixed_timeseries: add map_mixed_timeseries_config, _mixed_timeseries_what, _summarize_filters, MixedTimeseriesChartConfig, DatasetValidator to top-level - handlebars: add map_handlebars_config, _handlebars_chart_what, _summarize_filters, HandlebarsChartConfig, DatasetValidator to top-level imports Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Split long string literal in schema_validator.py line 202 (E501, 94 > 88 chars) - Apply ruff format auto-fixes to big_number.py, handlebars.py, and test_get_chart_data.py

- Move error_schemas import above _C TypeVar definition (E402) - Split two over-length comment lines to ≤88 chars (E501, lines 268 and 380)

_ensure_plugins_loaded() used an unprotected boolean flag, making it unsafe under concurrent first-call scenarios (e.g. gunicorn multi-thread workers). Double-checked locking with threading.Lock eliminates the race. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

update_chart was only running SchemaValidator + Tier-2 compile check, silently skipping DatasetValidator's column-existence + fuzzy-match and column-name normalisation layers that generate_chart runs. A typo like {name: "reveneu"} would save the broken chart and only surface as a render-time failure in the browser. Now matches generate_chart pipeline: - Layer 2: DatasetValidator.validate_against_dataset() — column existence check with fuzzy-match "did you mean?" suggestions returned to the LLM before any DB write occurs - Layer 4: DatasetValidator.normalize_column_names() — case normalisation so "order_date" resolves to "OrderDate" if that is the canonical dataset name Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…nts, remove column regex P1.1 registry.py: move _plugins_loaded=True to after successful import so a failed load doesn't permanently poison the registry. P1.3 schemas.py: remove overly restrictive ColumnRef.name / FilterClause.column / BigNumberChartConfig.temporal_column regex that blocked valid column names containing parentheses, slashes, and other SQL-common characters. P2.3 (DRY): eliminate _CHART_TYPE_ERROR_HINTS second-registry in schema_validator.py by adding schema_error_hint() to ChartTypePlugin protocol, BaseChartPlugin default, and all 7 plugin classes. SchemaValidator now delegates to the plugin registry instead of maintaining a parallel dict. P3.3 test_registry.py: add full registry unit-test coverage (register, get, all_types, is_registered, display_name_for_viz_type, proxy methods, duplicate warning, empty chart_type validation, insertion-order guarantee). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

… column validation Split an 89-char comment line and an over-limit condition in update_chart.py to satisfy the ruff E501 rule. Also applied ruff format. Two TestUpdateChartValidationGate tests expected CHART_VALIDATION_FAILED but received CHART_DATASET_NOT_FOUND because _validate_update_against_dataset calls DatasetValidator.validate_against_dataset before validate_and_compile, and the existing mocks provided a Mock() object for chart.datasource whose .id attribute is an auto-generated MagicMock (not a real int). Added a patch for DatasetValidator.validate_against_dataset returning (True, None) so the column-validation tier is bypassed and the test reaches the mocked validate_and_compile response as intended. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…onfig Introduces a dynamic filter layer in the chart type registry so operators can disable individual plugins (e.g. `handlebars`) without a code deploy: - `MCP_DISABLED_CHART_PLUGINS: frozenset[str]` — static deny-list in mcp_config.py - `MCP_CHART_PLUGIN_ENABLED_FUNC: Callable[[str], bool] | None` — dynamic hook for Harness/Split/per-user targeting; takes precedence over the deny-list - Both keys are propagated through `get_mcp_config()` defaults registry.py changes: - `_PluginFilterConfig` frozen dataclass replaces two bare globals so configure() replaces them atomically (no torn reads under concurrency) - `configure(disabled, enabled_func)` — called at app init; accepts any iterable for `disabled`; validates `enabled_func` is callable - `_is_plugin_enabled()` — reads config once, fails closed on callable exception - `get()` / `all_types()` / `is_enabled()` apply the filter at lookup time; `is_registered()` and `display_name_for_viz_type()` intentionally bypass it so callers can distinguish "unknown" vs "disabled" and existing charts still resolve display names for disabled viz types schema_validator.py: two-step pre-check — `is_registered()` for unknown types, `is_enabled()` for disabled ones, with distinct `DISABLED_CHART_TYPE` error code. Wiring: - `SupersetAppInitializer.configure_mcp_chart_registry()` called after `configure_feature_flags()` in `init_app()` - `flask_singleton.py` re-calls `registry.configure()` after the MCP config overlay so MCP-specific overrides in `superset_config.py` take effect in standalone MCP mode Tests: 28 cases in test_registry_filters.py covering deny-list, callable hook, fail-closed on exception, all_types() filtering, display_name bypass, atomic reconfigure, and configure() with list/tuple/frozenset inputs. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

codeant-ai-for-open-source · 2026-05-21T10:22:40Z

+        if config_dict.get("metric") and not config_dict["metric"].get("saved_metric"):
+            config_dict["metric"]["name"] = DatasetValidator._get_canonical_column_name(
+                config_dict["metric"]["name"], dataset_context
+            )


Suggestion: Saved metrics are skipped during name normalization, so case-insensitive validation can pass but the emitted metric string can still use the wrong casing and fail later in Superset's exact-name metric lookup. Normalize saved metric names against dataset metrics (not just raw columns) so the final form_data always uses canonical metric names. [api mismatch]

Severity Level: Critical 🚨

- ❌ Big Number charts with mis-cased saved metrics fail. - ⚠️ Validation passes but Superset query errors at runtime. - ⚠️ LLM clients see confusing success-then-fail behavior.

Steps of Reproduction ✅

1. Define a dataset in Superset with a saved metric whose name differs in case from the desired client input, e.g. metric_name="Revenue" (saved metric) on a dataset that ValidationPipeline will load via DatasetValidator._get_dataset_context at `superset/mcp_service/chart/validation/dataset_validator.py:197-252`. 2. Call the MCP tool `generate_chart` at `superset/mcp_service/chart/tool/generate_chart.py:95` with a Big Number config using the saved metric but with different casing, e.g. `"metric": {"name": "revenue", "saved_metric": true}` in `GenerateChartRequest.config` (chart_type="big_number"). 3. ValidationPipeline.validate_request_with_warnings at `superset/mcp_service/chart/validation/pipeline.py:90-55` invokes DatasetValidator.validate_against_dataset at `dataset_validator.py:55-72`, which calls `_validate_saved_metrics` at `dataset_validator.py:485-46`; this checks `col_ref.name.lower()` against `available_metrics` lowercased, so `"revenue"` is accepted as a valid saved metric even though its canonical name is `"Revenue"`. 4. The same pipeline then normalizes column names via DatasetValidator.normalize_column_names at `dataset_validator.py:94-141`, which delegates to BigNumberChartPlugin.normalize_column_refs at `superset/mcp_service/chart/plugins/big_number.py:189-203`; because the metric has `"saved_metric": true`, the guard at lines 192-195 skips `_get_canonical_column_name`, so the metric name remains `"revenue"` and is never canonicalized to `"Revenue"`. 5. After validation, `generate_chart` calls `map_config_to_form_data` at `superset/mcp_service/chart/chart_utils.py:93-131`, which routes to BigNumberChartPlugin.to_form_data at `big_number.py:132-135`; `map_big_number_config` at `chart_utils.py:763-799` builds form_data using `create_metric_object(config.metric)` at `chart_utils.py:11-51`, which returns the raw `col.name` string `"revenue"` for saved metrics. 6. When Superset executes the query, its SQLA connector resolves metrics via a `metrics_by_name` dict (see docstring and usage at `chart_utils.py:491-499` and `superset/connectors/sqla/models.py:1758-38`); because the dict keys are the canonical metric names (e.g. `"Revenue"`), the mis-cased `"revenue"` key is missing and Superset raises `QueryObjectValidationError("Metric '%(metric)s' does not exist")`, causing the Big Number chart generation to fail at runtime even though validation passed.

Fix in Cursor | Fix in VSCode Claude

(Use Cmd/Ctrl + Click for best experience)

Prompt for AI Agent 🤖

This is a comment left during a code review. **Path:** superset/mcp_service/chart/plugins/big_number.py **Line:** 192:195 **Comment:** *Api Mismatch: Saved metrics are skipped during name normalization, so case-insensitive validation can pass but the emitted metric string can still use the wrong casing and fail later in Superset's exact-name metric lookup. Normalize saved metric names against dataset metrics (not just raw columns) so the final form_data always uses canonical metric names. Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise. Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix

👍 | 👎

The normalization is already in place. Both big_number.py and pie.py branch on saved_metric in normalize_column_refs: when saved_metric=True, the name is resolved through DatasetValidator._get_canonical_metric_name(), which performs a case-insensitive lookup exclusively against available_metrics and returns the canonical dataset name. Non-saved metrics go through _get_canonical_column_name instead. The described bug doesn't exist in the current code.

codeant-ai-for-open-source · 2026-05-21T10:22:40Z

+        def _norm_col_list(key: str) -> None:
+            if config_dict.get(key):
+                for col in config_dict[key]:
+                    col["name"] = DatasetValidator._get_canonical_column_name(
+                        col["name"], dataset_context
+                    )
+
+        _norm_col_list("rows")
+        _norm_col_list("metrics")
+        _norm_col_list("columns")


Suggestion: Metrics are normalized with a helper that prefers matching dataset columns before metrics, so a saved metric that case-insensitively collides with a column name can be rewritten to the column identifier and stop resolving as a saved metric. Handle metric normalization with metric-aware precedence (or branch on saved_metric) to preserve metric identity. [logic error]

Severity Level: Critical 🚨

- ❌ Pivot table charts can fail on valid saved metrics. - ⚠️ Column-first canonicalization corrupts saved metric identity. - ⚠️ Users get runtime metric-not-found errors after validation.

Steps of Reproduction ✅

1. Consider a dataset where a physical column and a saved metric share the same case-insensitive name but differ in canonical casing, e.g. available_columns includes {"name": "Revenue"} and available_metrics includes {"name": "REVENUE"} as built in DatasetValidator._get_dataset_context at `superset/mcp_service/chart/validation/dataset_validator.py:197-252`. 2. A client calls the MCP `generate_chart` tool at `superset/mcp_service/chart/tool/generate_chart.py:95` with a pivot_table config referencing the saved metric using a lowercased name, e.g. `"metrics": [{"name": "revenue", "saved_metric": true}]` in `PivotTableChartConfig` (`superset/mcp_service/chart/schemas.py:243-261`). 3. ValidationPipeline.validate_request_with_warnings at `superset/mcp_service/chart/validation/pipeline.py:90-55` runs DatasetValidator.validate_against_dataset at `dataset_validator.py:55-72`, which calls `_validate_saved_metrics` at `dataset_validator.py:485-46`; this checks `col_ref.name.lower()` against `available_metrics` lowercased, so `"revenue"` is accepted as a valid saved metric in spite of the ambiguous column name. 4. In the column-normalization layer, ValidationPipeline._normalize_column_names at `validation/pipeline.py:137-181` calls DatasetValidator.normalize_column_names at `dataset_validator.py:94-141`, which looks up the PivotTableChartPlugin and invokes its normalize_column_refs at `superset/mcp_service/chart/plugins/pivot_table.py:120-134`; `_norm_col_list("metrics")` at lines 123-131 runs `_get_canonical_column_name` at `dataset_validator.py:46-77` for each metric, and because that helper checks `available_columns` before `available_metrics`, the shared lowercase name `"revenue"` is canonicalized to the column's name `"Revenue"` instead of the saved metric's canonical name `"REVENUE"`. 5. The now-normalized config is used in `generate_chart` to build Superset form_data via map_config_to_form_data at `chart_utils.py:93-131`, which routes to PivotTableChartPlugin.to_form_data at `pivot_table.py:107-110`; `map_pivot_table_config` at `superset/mcp_service/chart/chart_utils.py:841-871` builds the `"metrics"` list by calling `create_metric_object(col)` at `chart_utils.py:11-51`, which returns the string `"Revenue"` for the saved metric because `col.saved_metric` is still True. 6. At query execution time, Superset resolves metric strings via `metrics_by_name` (see docstring at `chart_utils.py:491-499` and access pattern at `superset/connectors/sqla/models.py:1758-38`); this dict is keyed by the canonical saved metric name `"REVENUE"`, so the rewritten `"Revenue"` key is missing, causing `QueryObjectValidationError("Metric '%(metric)s' does not exist")` and breaking pivot table chart execution for datasets where saved metric names collide case-insensitively with column names.

Fix in Cursor | Fix in VSCode Claude

(Use Cmd/Ctrl + Click for best experience)

Prompt for AI Agent 🤖

This is a comment left during a code review. **Path:** superset/mcp_service/chart/plugins/pivot_table.py **Line:** 123:132 **Comment:** *Logic Error: Metrics are normalized with a helper that prefers matching dataset columns before metrics, so a saved metric that case-insensitively collides with a column name can be rewritten to the column identifier and stop resolving as a saved metric. Handle metric normalization with metric-aware precedence (or branch on `saved_metric`) to preserve metric identity. Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise. Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix

👍 | 👎

codeant-ai-for-open-source · 2026-05-21T10:23:34Z

+        if "x" not in config:
+            missing_fields.append("'x' (X-axis temporal column)")
+        if "y" not in config:
+            missing_fields.append("'y' (primary Y-axis metrics)")
+        if "y_secondary" not in config:
+            missing_fields.append("'y_secondary' (secondary Y-axis metrics)")


Suggestion: The pre-validation only checks for x, y, and y_secondary keys, but the schema explicitly accepts aliases (x_axis, metrics, metrics_b). Valid payloads that use those aliases will be rejected before Pydantic parsing, causing false validation failures. Make pre-validation honor the same aliases as the schema. [api mismatch]

Severity Level: Major ⚠️

- ❌ SchemaValidator rejects mixed_timeseries configs using alias keys. - ⚠️ Direct callers cannot rely on x_axis/metrics/metrics_b aliases.

Steps of Reproduction ✅

1. Use the schema-level entrypoint `SchemaValidator.validate_request` at `superset/mcp_service/chart/validation/schema_validator.py:40-57` with a raw request dict shaped like: `{"dataset_id": 1, "config": {"chart_type": "mixed_timeseries", "x_axis": {"name": "ds"}, "metrics": [{"name": "revenue", "aggregate": "SUM"}], "metrics_b": [{"name": "orders", "aggregate": "COUNT"}]}}`. 2. Inside `_pre_validate` at `schema_validator.py:64-143`, the function extracts `config = data["config"]` and dispatches to `_pre_validate_chart_type(chart_type, config)` at `schema_validator.py:145-173`, passing this raw config dict (still using the alias keys `x_axis`, `metrics`, and `metrics_b`). 3. `_pre_validate_chart_type` looks up the plugin from the registry and calls `plugin.pre_validate(config)` at `schema_validator.py:190-201`; for `chart_type="mixed_timeseries"` this resolves to `MixedTimeseriesChartPlugin.pre_validate` defined in `superset/mcp_service/chart/plugins/mixed_timeseries.py:44-55`. 4. `MixedTimeseriesChartPlugin.pre_validate` only checks for `"x"`, `"y"`, and `"y_secondary"` keys (lines 50-55) and does not recognize the schema-supported aliases `x_axis`, `metrics`, and `metrics_b` defined on `MixedTimeseriesChartConfig` in `superset/mcp_service/chart/schemas.py:18-51`, so `missing_fields` becomes non-empty, a `ChartGenerationError` is returned, and `SchemaValidator.validate_request` exits early at line 51, rejecting a payload that would successfully parse via Pydantic if alias keys were honored.

Fix in Cursor | Fix in VSCode Claude

(Use Cmd/Ctrl + Click for best experience)

Prompt for AI Agent 🤖

This is a comment left during a code review. **Path:** superset/mcp_service/chart/plugins/mixed_timeseries.py **Line:** 50:55 **Comment:** *Api Mismatch: The pre-validation only checks for `x`, `y`, and `y_secondary` keys, but the schema explicitly accepts aliases (`x_axis`, `metrics`, `metrics_b`). Valid payloads that use those aliases will be rejected before Pydantic parsing, causing false validation failures. Make pre-validation honor the same aliases as the schema. Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise. Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix

👍 | 👎

codeant-ai-for-open-source · 2026-05-21T10:23:34Z

+        return None
+
+    def extract_column_refs(self, config: Any) -> list[ColumnRef]:
+        if not isinstance(config, MixedTimeseriesChartConfig):
+            return []
+        refs: list[ColumnRef] = [config.x]


Suggestion: Metric name normalization rewrites every entry in y/y_secondary without checking saved_metric, so saved metrics can be incorrectly remapped to similarly named dataset columns (column lookup is prioritized), breaking saved-metric resolution later. Skip canonical-column normalization for items marked as saved_metric. [logic error]

Severity Level: Critical 🚨

- ❌ Mixed_timeseries charts using saved metrics may fail at runtime. - ⚠️ Saved metrics shadowed by columns resolve to wrong identifiers.

Steps of Reproduction ✅

1. Call the MCP tool `generate_chart` defined in `superset/mcp_service/chart/tool/generate_chart.py:95-132` with a `GenerateChartRequest` whose `config.chart_type` is `"mixed_timeseries"` and whose `y`/`y_secondary` lists contain saved metrics, e.g. `{"name": "count", "saved_metric": true}` as described by `ColumnRef.saved_metric` in `superset/mcp_service/chart/schemas.py:27-58`. 2. Inside `generate_chart`, the code invokes `ValidationPipeline.validate_request_with_warnings(request.model_dump())` at `generate_chart.py:205-210`; in `ValidationPipeline.validate_request_with_warnings` (`superset/mcp_service/chart/validation/pipeline.py:90-145`), after schema and dataset validation, it calls `_normalize_column_names` at `pipeline.py:227-260`. 3. `_normalize_column_names` delegates to `DatasetValidator.normalize_column_names(config, dataset_id, dataset_context)` at `superset/mcp_service/chart/validation/dataset_validator.py:351-398`; this looks up the `mixed_timeseries` plugin from the registry and calls `plugin.normalize_column_refs(config, dataset_context)`, which is `MixedTimeseriesChartPlugin.normalize_column_refs` at `superset/mcp_service/chart/plugins/mixed_timeseries.py:84-106`. 4. In `normalize_column_refs`, helper `_norm_list` (lines 93-98) iterates over every entry in `y` and `y_secondary` and unconditionally rewrites `col["name"]` using `DatasetValidator._get_canonical_column_name(col["name"], dataset_context)`, ignoring `col.get("saved_metric")`; `_get_canonical_column_name` at `dataset_validator.py:303-334` searches `dataset_context.available_columns` first and only then `available_metrics`, so in a dataset that defines both a column `"COUNT"` and a saved metric `"count"`, a saved-metric reference `{"name": "count", "saved_metric": true}` in `y` is normalized to `"COUNT"`. Later, `map_mixed_timeseries_config` in `superset/mcp_service/chart/chart_utils.py:21-46` calls `create_metric_object` on these `ColumnRef` objects; for saved metrics it returns `col.name` (now `"COUNT"`) as the metric identifier. Because Superset's metric lookup expects the original metric name (e.g. `"count"`), the chart's metrics no longer resolve correctly, causing mixed_timeseries charts using such saved metrics to fail or behave incorrectly at query time.

Fix in Cursor | Fix in VSCode Claude

(Use Cmd/Ctrl + Click for best experience)

Prompt for AI Agent 🤖

This is a comment left during a code review. **Path:** superset/mcp_service/chart/plugins/mixed_timeseries.py **Line:** 93:98 **Comment:** *Logic Error: Metric name normalization rewrites every entry in `y`/`y_secondary` without checking `saved_metric`, so saved metrics can be incorrectly remapped to similarly named dataset columns (column lookup is prioritized), breaking saved-metric resolution later. Skip canonical-column normalization for items marked as `saved_metric`. Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise. Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix

👍 | 👎

codeant-ai-for-open-source · 2026-05-21T10:25:32Z

+    is_col_valid, col_error = DatasetValidator.validate_against_dataset(
+        parsed_config, dataset.id
+    )
+    if not is_col_valid and col_error is not None:
+        logger.warning(
+            "update_chart column validation failed for chart %s: %s",
+            getattr(chart, "id", None),
+            col_error,
+        )
+        return GenerateChartResponse.model_validate(
+            {
+                "chart": None,
+                "error": col_error.model_dump(),
+                "success": False,
+                "schema_version": "2.0",
+                "api_version": "v1",
+            }
+        )


Suggestion: This adds a full dataset validation call before validate_and_compile, but validate_and_compile already performs the same dataset validation internally using the already-loaded dataset object. The extra call triggers redundant dataset context fetching and duplicate validation work on every update request, increasing latency and database load. [performance]

Severity Level: Major ⚠️

- ⚠️ update_chart validation calls DatasetValidator twice per request. - ⚠️ Extra dataset queries increase chart-update latency and load.

Steps of Reproduction ✅

1. Call the MCP `update_chart` tool defined at `superset/mcp_service/chart/tool/update_chart.py:304-595` with `generate_preview=False` and a non-null `config` so that `parsed_config` is set (`update_chart.py:413-415`) and a full visualization update runs. 2. After building the update payload via `_build_update_payload` (`update_chart.py:87-121`), `update_chart` extracts `new_form_data` and, before persisting, invokes `_validate_update_against_dataset(parsed_config, new_form_data, chart)` inside the `mcp.update_chart.validation` log context at `update_chart.py:450-453`. 3. Inside `_validate_update_against_dataset` (`update_chart.py:167-251`), once the dataset ORM object is resolved (`dataset` from `DatasetDAO.find_by_id` at 178-180), the new code block at lines 199-205 calls `DatasetValidator.validate_against_dataset(parsed_config, dataset.id)` without passing a `dataset_context`, causing `DatasetValidator` to fetch dataset context from the database via `_get_dataset_context` (`superset/mcp_service/chart/validation/dataset_validator.py:197-253`) and run full column and aggregation validation once. 4. Immediately afterwards, `_validate_update_against_dataset` calls `validate_and_compile(parsed_config, form_data, dataset, run_compile_check=True)` at `update_chart.py:222-224`, and `validate_and_compile` in `superset/mcp_service/chart/compile.py:19-84` builds a new `dataset_context` from the same ORM dataset and calls `DatasetValidator.validate_against_dataset(config, dataset.id, dataset_context=dataset_context)` again (compile.py:48-52), resulting in two complete Tier-1 dataset validation passes and two dataset-context constructions for every update_chart request.

Fix in Cursor | Fix in VSCode Claude

(Use Cmd/Ctrl + Click for best experience)

Prompt for AI Agent 🤖

This is a comment left during a code review. **Path:** superset/mcp_service/chart/tool/update_chart.py **Line:** 203:220 **Comment:** *Performance: This adds a full dataset validation call before `validate_and_compile`, but `validate_and_compile` already performs the same dataset validation internally using the already-loaded dataset object. The extra call triggers redundant dataset context fetching and duplicate validation work on every update request, increasing latency and database load. Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise. Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix

👍 | 👎

Add _get_canonical_metric_name() to DatasetValidator that searches only available_metrics, preventing a column with matching case-insensitive name from shadowing a saved metric's canonical casing. Update all 7 chart plugins (xy, table, pie, big_number, handlebars, mixed_timeseries, pivot_table) to branch on saved_metric flag: saved metrics now go through _get_canonical_metric_name while regular column refs continue to use _get_canonical_column_name. Fix pre_validate alias handling in xy and mixed_timeseries plugins to accept Pydantic AliasChoices keys (metrics/x_axis/metrics_b) so payloads using canonical Superset field names are not incorrectly rejected. Add TestGetCanonicalMetricName, TestSavedMetricNormalizationCorrectness, and TestPreValidateAliasHandling test classes covering the collision case and alias acceptance. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

A saved metric and a regular column with the same input name resolve to different display labels after normalization (saved metrics use the dataset's actual casing). Using a plain string key incorrectly flags them as duplicates; keying on (saved_metric, label) avoids the false collision. Fixes test_xy_saved_metric_uses_metric_casing.

bito-code-review

Code Review Agent Run #8680f9

Actionable Suggestions - 7

superset/mcp_service/chart/schemas.py - 1
- CWE-390: Bare Exception Handler · Line 491-496
superset/mcp_service/chart/plugin.py - 1
- Invalid type annotation string · Line 175-175
superset/mcp_service/chart/tool/update_chart.py - 2
- Redundant DB query in validation · Line 203-204
- Missing test coverage for normalization · Line 416-432
superset/mcp_service/chart/plugins/handlebars.py - 1
- Duplicate validation with schema · Line 44-124
tests/unit_tests/mcp_service/chart/validation/test_runtime_validator.py - 1
- Test coverage regression · Line 213-234
superset/mcp_service/chart/validation/runtime/__init__.py - 1
- Avoid catching blind Exception · Line 107-107

Additional Suggestions - 6

superset/initialization/__init__.py - 1
- Missing error handling for MCP init · Line 750-750
  
  Wrap the call to `configure_mcp_chart_registry()` in a try-except block to catch any exceptions from `registry.configure()` and log them so that `init_app()` can continue even if MCP chart registry configuration fails.

superset/mcp_service/chart/validation/schema_validator.py - 2

Import inside function body · Line 151-152

The `from superset.mcp_service.chart.registry import get_registry` import is placed inside the function body at lines 151-152. This causes the import to execute on every call to `_pre_validate_chart_type`. Move it to the module-level imports for better performance.

Code suggestion

--- superset/mcp_service/chart/validation/schema_validator.py
+++ superset/mcp_service/chart/validation/schema_validator.py
 @@ -29,6 +29,7 @@ from superset.mcp_service.chart.schemas import (
  )
  from superset.mcp_service.common.error_schemas import ChartGenerationError
+from superset.mcp_service.chart.registry import get_registry
 
  logger = logging.getLogger(__name__)
 
 @@ -148,9 +149,6 @@ class SchemaValidator:
          config: Dict[str, Any],
      ) -> Tuple[bool, ChartGenerationError | None]:
          """Validate chart type and dispatch to plugin pre-validation."""
-        from superset.mcp_service.chart.registry import get_registry
-
          registry = get_registry()

Misleading variable name · Line 176-176

Variable `valid_types` on line 176 is misleading — `registry.all_types()` returns only enabled chart types (per registry.py:168), so the name suggests all registered types when it actually contains only enabled ones. Rename to `enabled_types` for clarity.

Code suggestion

--- superset/mcp_service/chart/validation/schema_validator.py (lines 175-182)
+++ superset/mcp_service/chart/validation/schema_validator.py
 @@ -173,12 +173,12 @@ class SchemaValidator:
              )
 
          if not registry.is_enabled(chart_type):
-            valid_types = ", ".join(registry.all_types())
+            enabled_types = ", ".join(registry.all_types())
              return False, ChartGenerationError(
                  error_type="disabled_chart_type",
                  message=f"Chart type '{chart_type}' is not enabled on this instance",
                  details=f"Chart type '{chart_type}' is registered but has been "
                  f"disabled by the operator. "
-                f"Enabled chart types: {valid_types}",
+                f"Enabled chart types: {enabled_types}",
                  suggestions=[
-                    f"Use one of the enabled chart types: {valid_types}",
+                    f"Use one of the enabled chart types: {enabled_types}",
                      "Contact your administrator if you believe this is an error",
                  ],
                  error_code="DISABLED_CHART_TYPE",
              )

superset/mcp_service/chart/plugins/xy.py - 3
- Private member access in DatasetValidator · Line 114-114
  
  Accessing private method `_get_canonical_column_name`. Consider using a public API or adding a public wrapper method.
- Private member access in DatasetValidator · Line 115-115
  
  Accessing private method `_get_canonical_metric_name`. Consider using a public API or adding a public wrapper method.
- Private member access in DatasetValidator · Line 129-129
  
  Accessing private method `_normalize_filters`. Consider using a public API or adding a public wrapper method.

Filtered by Review Rules

Bito filtered these suggestions based on rules created automatically for your feedback. Manage rules.

superset/mcp_service/chart/registry.py - 1
- Broad exception catch · Line 87-87

Review Details

Files reviewed - 27 · Commit Range: d3af275..f90f1cd
- superset/initialization/__init__.py
- superset/mcp_service/app.py
- superset/mcp_service/chart/chart_utils.py
- superset/mcp_service/chart/plugin.py
- superset/mcp_service/chart/plugins/__init__.py
- superset/mcp_service/chart/plugins/big_number.py
- superset/mcp_service/chart/plugins/handlebars.py
- superset/mcp_service/chart/plugins/mixed_timeseries.py
- superset/mcp_service/chart/plugins/pie.py
- superset/mcp_service/chart/plugins/pivot_table.py
- superset/mcp_service/chart/plugins/table.py
- superset/mcp_service/chart/plugins/xy.py
- superset/mcp_service/chart/registry.py
- superset/mcp_service/chart/schemas.py
- superset/mcp_service/chart/tool/generate_chart.py
- superset/mcp_service/chart/tool/update_chart.py
- superset/mcp_service/chart/validation/dataset_validator.py
- superset/mcp_service/chart/validation/runtime/__init__.py
- superset/mcp_service/chart/validation/schema_validator.py
- superset/mcp_service/flask_singleton.py
- superset/mcp_service/mcp_config.py
- tests/unit_tests/mcp_service/chart/test_big_number_chart.py
- tests/unit_tests/mcp_service/chart/test_registry.py
- tests/unit_tests/mcp_service/chart/test_registry_filters.py
- tests/unit_tests/mcp_service/chart/tool/test_update_chart.py
- tests/unit_tests/mcp_service/chart/validation/test_column_name_normalization.py
- tests/unit_tests/mcp_service/chart/validation/test_runtime_validator.py
Files skipped - 0
Tools
- Whispers (Secret Scanner) - ✔︎ Successful
- Detect-secrets (Secret Scanner) - ✔︎ Successful
- MyPy (Static Code Analysis) - ✔︎ Successful
- Astral Ruff (Static Code Analysis) - ✔︎ Successful

Bito Usage Guide

Commands

Type the following command in the pull request comment and save the comment.

/review - Manually triggers a full AI review.
/pause - Pauses automatic reviews on this pull request.
/resume - Resumes automatic reviews.
/resolve - Marks all Bito-posted review comments as resolved.
/abort - Cancels all in-progress reviews.

Refer to the documentation for additional commands.

Configuration

This repository uses Superset You can customize the agent settings here or contact your Bito workspace admin at evan@preset.io.

Documentation & Help

AI Code Review powered by

bito-code-review · 2026-05-22T08:19:16Z

+    try:
+        from superset.mcp_service.chart.registry import display_name_for_viz_type
+
+        _display_name = display_name_for_viz_type(_viz_type) if _viz_type else None
+    except Exception:
+        _display_name = None


CWE-390: Bare Exception Handler

Replace the broad except Exception: with except (ImportError, AttributeError): to avoid masking unexpected errors while still handling expected import or attribute issues. (See also: CWE-390)

Code Review Run #8680f9

Should Bito avoid suggestions like this for future reviews? (Manage Rules)

Yes, avoid them

bito-code-review · 2026-05-22T08:19:16Z

+        """
+        ...
+
+    def schema_error_hint(self) -> "ChartGenerationError | None":


Invalid type annotation string

Remove the quotes around the return type annotation in the ChartTypePlugin protocol for schema_error_hint so it reads: def schema_error_hint(self) -> ChartGenerationError | None:.

Code Review Run #8680f9

Should Bito avoid suggestions like this for future reviews? (Manage Rules)

Yes, avoid them

bito-code-review · 2026-05-22T08:19:16Z

+    is_col_valid, col_error = DatasetValidator.validate_against_dataset(
+        parsed_config, dataset.id


Redundant DB query in validation

Lines 203-204 call DatasetValidator.validate_against_dataset(parsed_config, dataset.id) which internally invokes _get_dataset_context(dataset_id) to fetch the dataset from the database. However, _validate_update_against_dataset already has the dataset available from lines 178-180. This causes a redundant database query per update operation. Pass the pre-fetched dataset via the dataset_context parameter to eliminate the duplicate lookup.

Code suggestion

Check the AI-generated fix before applying

--- superset/mcp_service/chart/tool/update_chart.py +++ superset/mcp_service/chart/tool/update_chart.py @@ -200,8 +200,14 @@ def _validate_update_against_dataset( # Column existence + fuzzy-match validation # (mirrors generate_chart pipeline layer 2) from superset.mcp_service.chart.validation.dataset_validator import DatasetValidator + from superset.mcp_service.chart.compile import build_dataset_context_from_orm + + # Reuse the already-fetched dataset to avoid a redundant DB query + dataset_context = build_dataset_context_from_orm(dataset) - is_col_valid, col_error = DatasetValidator.validate_against_dataset( - parsed_config, dataset.id + is_col_valid, col_error = DatasetValidator.validate_against_dataset( + parsed_config, dataset.id, dataset_context=dataset_context )

Code Review Run #8680f9

Should Bito avoid suggestions like this for future reviews? (Manage Rules)

Yes, avoid them

bito-code-review · 2026-05-22T08:19:16Z

+        # Normalize column case to match dataset canonical names
+        # (mirrors generate_chart pipeline layer 4)
+        chart_datasource_id = getattr(chart, "datasource_id", None)
+        if parsed_config is not None and chart_datasource_id is not None:
+            from superset.mcp_service.chart.validation.dataset_validator import (
+                DatasetValidator,
+                NORMALIZATION_EXCEPTIONS,
+            )
+
+            try:
+                parsed_config = DatasetValidator.normalize_column_names(
+                    parsed_config, chart.datasource_id
+                )
+            except NORMALIZATION_EXCEPTIONS as e:
+                logger.warning(
+                    "Column normalization failed for chart %s: %s", chart.id, e
+                )


Missing test coverage for normalization

The new column normalization feature (lines 416-432) has no test coverage. Tests should verify that column names are properly normalized to match dataset canonical names and that exceptions are handled gracefully without crashing the update operation.

Code Review Run #8680f9

Should Bito avoid suggestions like this for future reviews? (Manage Rules)

Yes, avoid them

bito-code-review · 2026-05-22T08:19:17Z

+    def pre_validate(
+        self,
+        config: dict[str, Any],
+    ) -> ChartGenerationError | None:
+        if "handlebars_template" not in config:
+            return ChartGenerationError(
+                error_type="missing_handlebars_template",
+                message="Handlebars chart missing required field: handlebars_template",
+                details=(
+                    "Handlebars charts require a 'handlebars_template' string "
+                    "containing Handlebars HTML template markup"
+                ),
+                suggestions=[
+                    "Add 'handlebars_template' with a Handlebars HTML template",
+                    "Data is available as {{data}} array in the template",
+                    "Example: '<ul>{{#each data}}<li>{{this.name}}: "
+                    "{{this.value}}</li>{{/each}}</ul>'",
+                ],
+                error_code="MISSING_HANDLEBARS_TEMPLATE",
+            )
+
+        template = config.get("handlebars_template")
+        if not isinstance(template, str) or not template.strip():
+            return ChartGenerationError(
+                error_type="invalid_handlebars_template",
+                message="Handlebars template must be a non-empty string",
+                details=(
+                    "The 'handlebars_template' field must be a non-empty string "
+                    "containing valid Handlebars HTML template markup"
+                ),
+                suggestions=[
+                    "Ensure handlebars_template is a non-empty string",
+                    "Example: '<ul>{{#each data}}<li>{{this.name}}</li>{{/each}}</ul>'",
+                ],
+                error_code="INVALID_HANDLEBARS_TEMPLATE",
+            )
+
+        query_mode = config.get("query_mode", "aggregate")
+        if query_mode not in ("aggregate", "raw"):
+            return ChartGenerationError(
+                error_type="invalid_query_mode",
+                message="Invalid query_mode for handlebars chart",
+                details="query_mode must be either 'aggregate' or 'raw'",
+                suggestions=[
+                    "Use 'aggregate' for aggregated data (default)",
+                    "Use 'raw' for individual rows",
+                ],
+                error_code="INVALID_QUERY_MODE",
+            )
+
+        if query_mode == "raw" and not config.get("columns"):
+            return ChartGenerationError(
+                error_type="missing_raw_columns",
+                message="Handlebars chart in 'raw' mode requires 'columns'",
+                details=(
+                    "When query_mode is 'raw', you must specify which columns "
+                    "to include in the query results"
+                ),
+                suggestions=[
+                    "Add 'columns': [{'name': 'column_name'}] for raw mode",
+                    "Or use query_mode='aggregate' with 'metrics' and optional 'groupby'",  # noqa: E501
+                ],
+                error_code="MISSING_RAW_COLUMNS",
+            )
+
+        if query_mode == "aggregate" and not config.get("metrics"):
+            return ChartGenerationError(
+                error_type="missing_aggregate_metrics",
+                message="Handlebars chart in 'aggregate' mode requires 'metrics'",
+                details=(
+                    "When query_mode is 'aggregate' (default), you must specify "
+                    "at least one metric with an aggregate function"
+                ),
+                suggestions=[
+                    "Add 'metrics': [{'name': 'column', 'aggregate': 'SUM'}]",
+                    "Or use query_mode='raw' with 'columns' for individual rows",
+                ],
+                error_code="MISSING_AGGREGATE_METRICS",
+            )
+
+        return None


Duplicate validation with schema

The pre_validate method duplicates validation logic already implemented in HandlebarsChartConfig.validate_query_fields() (schemas.py:1055-1087). This includes checking for missing metrics in aggregate mode (line 109) and missing columns in raw mode (line 94). The schema validator also performs additional checks (mutual exclusivity, aggregate function presence) that pre_validate lacks. Having both creates maintenance divergence risk.

Code Review Run #8680f9

Should Bito avoid suggestions like this for future reviews? (Manage Rules)

Yes, avoid them

bito-code-review · 2026-05-22T08:19:17Z

@@ -226,28 +220,15 @@ def test_validate_table_chart_skips_xy_validations(self):
            ],
        )

-        # These should not be called for table charts
-        with (
-            patch(
-                "superset.mcp_service.chart.validation.runtime.RuntimeValidator."
-                "_validate_format_compatibility"
-            ) as mock_format,
-            patch(
-                "superset.mcp_service.chart.validation.runtime.RuntimeValidator."
-                "_validate_cardinality"
-            ) as mock_cardinality,
-            patch(
-                "superset.mcp_service.chart.validation.runtime.RuntimeValidator."
-                "_validate_chart_type"
-            ) as mock_chart_type,
-        ):
-            # Mock chart type validator to return no warnings
+        # Plugin runtime dispatches to TableChartPlugin which returns no warnings.
+        # Chart type suggester is also stubbed to return no warnings.
+        with patch(
+            "superset.mcp_service.chart.validation.runtime.RuntimeValidator."
+            "_validate_chart_type"
+        ) as mock_chart_type:
            mock_chart_type.return_value = ([], [])

            is_valid, error = RuntimeValidator.validate_runtime_issues(config, 1)

-            # Format and cardinality validation should not be called for table charts
-            mock_format.assert_not_called()
-            mock_cardinality.assert_not_called()
            assert is_valid is True
            assert error is None


Test coverage regression

Test test_validate_table_chart_skips_xy_validations no longer verifies that XY-specific validations are skipped. Original assertions mock_format.assert_not_called() and mock_cardinality.assert_not_called() were removed, weakening test coverage for the skip behavior.

Code Review Run #8680f9

Should Bito avoid suggestions like this for future reviews? (Manage Rules)

Yes, avoid them

bito-code-review · 2026-05-22T08:19:17Z

+            if plugin is None:
+                return []
+            return plugin.get_runtime_warnings(config, dataset_id)
+        except Exception as exc:


Avoid catching blind Exception

Replace bare Exception catch with specific exception types (e.g., ImportError, AttributeError, RuntimeError) to avoid masking unexpected errors.

Code suggestion

Check the AI-generated fix before applying

Suggested change

except Exception as exc:

except (ImportError, AttributeError, RuntimeError) as exc:

Code Review Run #8680f9

Should Bito avoid suggestions like this for future reviews? (Manage Rules)

Yes, avoid them

… redundant annotation quotes - schemas.py: CWE-390 bare except → add `as exc` + debug log so display-name lookup failures are observable (BLE001 suppressed: intentional fallback) - plugin.py: remove redundant forward-reference quotes from schema_error_hint return type (from __future__ import annotations already makes all annotations lazy strings) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

bito-code-review · 2026-05-23T15:47:15Z

Code Review Agent Run #40e077

Actionable Suggestions - 0