
feat(mcp): add create_dataset tool to register physical tables as datasets#40340

Draft
aminghadersohi wants to merge 3 commits into apache:master from aminghadersohi:aminghadersohi/mcp-create-dataset

@aminghadersohi
Contributor

Summary

Adds a create_dataset MCP tool that lets callers register an existing physical
database table as a Superset dataset — the programmatic equivalent of
Data → Datasets → + Dataset in the UI.

  • CreateDatasetRequest schema: database_id, schema (an alias for schema_name that avoids the
    clash with Pydantic v2's BaseModel.schema()), table_name, and optional catalog and owners;
    whitespace-only schema/catalog values are normalised to None
  • Pre-checks — verifies the database exists and the caller has table-level access
    before calling the command
  • create_dataset tool — wraps CreateDatasetCommand, uses
    DatasetInvalidError.get_list_classnames() (public API) to classify wrapped validation
    errors; returns DatasetInfo (same shape as get_dataset_info) so the id feeds
    directly into generate_chart or generate_explore_link
  • Typed error handling — distinct error_type values for each failure class:
    DatabaseNotFoundError, AccessDeniedError, DatasetExistsError,
    TableNotFoundError, ValidationError, CreateFailedError, InternalError
  • 11 unit tests — success, owners forwarding, exists/not-found/validation/internal
    errors, missing required fields, full DatasetInfo shape, database not found,
    access denied, no-schema, with-catalog
  • DEFAULT_INSTRUCTIONS updated to list the new tool
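The typed error handling described above can be sketched roughly as follows. This is a minimal, stdlib-only sketch and not the PR's code: the exception classes are local stand-ins whose names are assumed to mirror Superset's dataset command exceptions, and `classify` and `ToolError` are hypothetical helpers. It covers only errors raised by the create command; DatabaseNotFoundError and AccessDeniedError come from the pre-checks and are not modelled here.

```python
from dataclasses import dataclass
from typing import Optional


class CommandInvalidError(Exception):
    """Stand-in for a validation error that wraps several field-level errors."""

    def __init__(self, message: str = "", exceptions: Optional[list[Exception]] = None) -> None:
        super().__init__(message)
        self._exceptions = exceptions or []

    def get_list_classnames(self) -> list[str]:
        # Assumed to behave like Superset's helper: sorted, de-duplicated
        # class names of the wrapped errors.
        return sorted({type(ex).__name__ for ex in self._exceptions})


# Stand-ins; names assumed to mirror Superset's dataset command exceptions.
class DatasetInvalidError(CommandInvalidError):
    pass


class DatasetExistsValidationError(Exception):
    pass


class TableNotFoundValidationError(Exception):
    pass


class DatasetCreateFailedError(Exception):
    pass


@dataclass
class ToolError:
    """Hypothetical error payload returned to the MCP caller."""

    error_type: str
    message: str


def classify(exc: Exception) -> ToolError:
    """Map an exception from the create command to a typed ToolError."""
    if isinstance(exc, DatasetInvalidError):
        names = exc.get_list_classnames()
        if "DatasetExistsValidationError" in names:
            return ToolError("DatasetExistsError", str(exc))
        if "TableNotFoundValidationError" in names:
            return ToolError("TableNotFoundError", str(exc))
        # Any other wrapped validation failure.
        return ToolError("ValidationError", str(exc))
    if isinstance(exc, DatasetCreateFailedError):
        return ToolError("CreateFailedError", str(exc))
    # Unexpected errors get a generic message rather than internal details.
    return ToolError("InternalError", "Unexpected error while creating the dataset")
```

For example, `classify(DatasetInvalidError("invalid", [DatasetExistsValidationError()])).error_type` evaluates to `"DatasetExistsError"`. Matching on the wrapped class names, rather than on the outer exception type, is what lets one validation exception fan out into several distinct error_type values.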

Motivation

The MCP service already exposes create_virtual_dataset for SQL-based datasets. This
PR adds the physical-table counterpart so agents can complete the full
"find DB → register table → chart it" workflow without manual UI steps.
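For the "register table" step of that workflow, an MCP client sends a JSON-RPC 2.0 `tools/call` request that names the tool and its arguments. The sketch below only builds that request. `tools_call` is a hypothetical helper, the argument values are illustrative, and nesting the fields under a `request` key assumes the tool is declared with a single parameter named `request`; the PR text does not show the tool signature.

```python
import json
from itertools import count
from typing import Any

# Monotonic JSON-RPC request ids for this client session.
_request_ids = count(1)


def tools_call(name: str, arguments: dict[str, Any]) -> dict[str, Any]:
    """Build an MCP JSON-RPC 2.0 `tools/call` request (hypothetical helper)."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Register an existing physical table. The key is "schema" (the alias),
# not "schema_name". Nesting under "request" is an assumption about the
# tool's parameter name; the values are illustrative only.
create_call = tools_call(
    "create_dataset",
    {"request": {"database_id": 1, "schema": "public", "table_name": "orders"}},
)
print(json.dumps(create_call, indent=2))
```

The DatasetInfo returned by the tool carries the new dataset's `id`, which the agent would then pass to generate_chart or generate_explore_link in its next `tools/call`.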

Test plan

  • pytest tests/unit_tests/mcp_service/dataset/tool/test_create_dataset.py -x
  • Existing dataset tool tests still pass
  • pre-commit run --files superset/mcp_service/dataset/tool/create_dataset.py superset/mcp_service/dataset/schemas.py


Adds a create_dataset MCP tool that wraps POST /api/v1/dataset/ so skills and
agents can register an existing physical table as a Superset dataset without
manual UI interaction. Returns DatasetInfo (same shape as get_dataset_info)
so the resulting dataset_id feeds directly into generate_chart.

- CreateDatasetRequest schema (database_id, schema, table_name, owners?)
- Tool file with typed error handling (exists/not-found/validation/internal)
- Registered in dataset/tool/__init__.py and app.py
- DEFAULT_INSTRUCTIONS updated to list create_dataset
- Unit tests covering success, owners, error cases, and full DatasetInfo shape
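The request normalisation listed above can be approximated without Pydantic. The PR's CreateDatasetRequest is a Pydantic v2 model whose schema_name field accepts the input key `schema` (in Pydantic v2 this is typically declared with `Field(alias="schema")`). The dataclass `CreateDatasetRequestSketch` and the `_blank_to_none` helper below are hypothetical stand-ins that reproduce that behaviour with the standard library only: `schema` maps to `schema_name`, and blank or whitespace-only `schema`/`catalog` values become None.

```python
from dataclasses import dataclass
from typing import Any, Optional


def _blank_to_none(value: Optional[str]) -> Optional[str]:
    """Return None for None, '' or whitespace-only strings; else the value as-is."""
    if value is None or not value.strip():
        return None
    return value


@dataclass
class CreateDatasetRequestSketch:
    """Hypothetical stdlib stand-in for the PR's Pydantic CreateDatasetRequest."""

    database_id: int
    table_name: str
    schema_name: Optional[str] = None
    catalog: Optional[str] = None
    owners: Optional[list[int]] = None

    def __post_init__(self) -> None:
        # Normalise blank optional strings so downstream code sees None.
        self.schema_name = _blank_to_none(self.schema_name)
        self.catalog = _blank_to_none(self.catalog)

    @classmethod
    def from_payload(cls, payload: dict[str, Any]) -> "CreateDatasetRequestSketch":
        # Callers send "schema"; it is stored as schema_name, mirroring the
        # alias used to avoid shadowing Pydantic's BaseModel.schema().
        # Missing required keys raise KeyError in this sketch.
        return cls(
            database_id=payload["database_id"],
            table_name=payload["table_name"],
            schema_name=payload.get("schema"),
            catalog=payload.get("catalog"),
            owners=payload.get("owners"),
        )


req = CreateDatasetRequestSketch.from_payload(
    {"database_id": 1, "table_name": "orders", "schema": "   "}
)
print(req.schema_name)  # None
```

A missing `database_id` or `table_name` raises `KeyError` in this sketch, whereas the Pydantic model would report a validation error; either way the call is rejected before any database lookup.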
@codecov

codecov Bot commented May 22, 2026

Codecov Report

❌ Patch coverage is 40.00000% with 24 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.48%. Comparing base (5966bb1) to head (e220068).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...uperset/mcp_service/dataset/tool/create_dataset.py | 29.41% | 24 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #40340      +/-   ##
==========================================
- Coverage   64.20%   63.48%   -0.72%     
==========================================
  Files        2592     2593       +1     
  Lines      139004   139036      +32     
  Branches    32273    32275       +2     
==========================================
- Hits        89241    88268     -973     
- Misses      48231    49237    +1006     
+ Partials     1532     1531       -1     
| Flag | Coverage Δ |
|---|---|
| hive | 39.05% <40.00%> (-0.25%) ⬇️ |
| mysql | 58.56% <40.00%> (-0.26%) ⬇️ |
| postgres | 58.64% <40.00%> (-0.26%) ⬇️ |
| presto | 40.73% <40.00%> (-0.25%) ⬇️ |
| python | 58.88% <40.00%> (-1.58%) ⬇️ |
| sqlite | 58.28% <40.00%> (-0.26%) ⬇️ |
| unit | ? |

Flags with carried forward coverage won't be shown.

@aminghadersohi force-pushed the aminghadersohi/mcp-create-dataset branch from a7720dd to caa3d97 on May 22, 2026 01:43
- schemas.py: restore full apache/master version and add CreateDatasetRequest
  (the previous cherry-pick used an older, shorter version missing the helper functions
  _sanitize_dataset_info_for_llm_context, _humanize_timestamp, etc.)
- create_dataset.py: remove parse_request decorator (not in apache/master yet)
@aminghadersohi force-pushed the aminghadersohi/mcp-create-dataset branch from caa3d97 to e220068 on May 22, 2026 01:44
@netlify

netlify Bot commented May 22, 2026

Deploy Preview for superset-docs-preview ready!

| Name | Link |
|---|---|
| 🔨 Latest commit | caa3d97 |
| 🔍 Latest deploy log | https://app.netlify.com/projects/superset-docs-preview/deploys/6a0fb4ddfb79ae0008c93160 |
| 😎 Deploy Preview | https://deploy-preview-40340--superset-docs-preview.netlify.app |