feat(mcp): add create_dataset tool to register physical tables as datasets #40340
Draft
aminghadersohi wants to merge 3 commits into
…asets
Adds create_dataset MCP tool that wraps POST /api/v1/dataset/ so skills and agents can register an existing physical table as a Superset dataset without manual UI interaction. Returns DatasetInfo (same shape as get_dataset_info) so the resulting dataset_id feeds directly into generate_chart.
- CreateDatasetRequest schema (database_id, schema, table_name, owners?)
- Tool file with typed error handling (exists/not-found/validation/internal)
- Registered in dataset/tool/__init__.py and app.py
- DEFAULT_INSTRUCTIONS updated to list create_dataset
- Unit tests covering success, owners, error cases, and full DatasetInfo shape
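The typed error handling mentioned in this commit could look roughly like the sketch below. It is not the PR's code: FakeDatasetInvalidError is a stand-in that models only get_list_classnames(), the three wrapped validation error classes are hypothetical names, and the mapping order is an assumption. The error_type strings are the ones listed in the PR summary.

```python
from typing import List, Tuple


class FakeDatasetInvalidError(Exception):
    """Stand-in for DatasetInvalidError; models only get_list_classnames()."""

    def __init__(self, wrapped: List[Exception]) -> None:
        super().__init__("dataset parameters are invalid")
        self._wrapped = wrapped

    def get_list_classnames(self) -> List[str]:
        # Unique class names of the wrapped validation errors, sorted.
        return sorted({type(exc).__name__ for exc in self._wrapped})


# Hypothetical wrapped validation errors; the real class names may differ.
class DatabaseNotFoundValidationError(Exception):
    """Assumed: the referenced database does not exist."""


class DatasetExistsValidationError(Exception):
    """Assumed: a dataset for this table is already registered."""


class TableNotFoundValidationError(Exception):
    """Assumed: the physical table cannot be found."""


# Assumed mapping from wrapped class name to error_type, checked in order.
_CLASSNAME_TO_ERROR_TYPE: List[Tuple[str, str]] = [
    ("DatabaseNotFoundValidationError", "DatabaseNotFoundError"),
    ("DatasetExistsValidationError", "DatasetExistsError"),
    ("TableNotFoundValidationError", "TableNotFoundError"),
]


def classify_invalid_error(exc: FakeDatasetInvalidError) -> str:
    """Return an error_type string for a wrapped validation failure."""
    names = set(exc.get_list_classnames())
    for classname, error_type in _CLASSNAME_TO_ERROR_TYPE:
        if classname in names:
            return error_type
    # Anything unrecognised falls back to the generic validation bucket.
    return "ValidationError"
```

Classifying by class name rather than by message text keeps the mapping stable if the user-facing error wording changes.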
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #40340 +/- ##
==========================================
- Coverage 64.20% 63.48% -0.72%
==========================================
Files 2592 2593 +1
Lines 139004 139036 +32
Branches 32273 32275 +2
==========================================
- Hits 89241 88268 -973
- Misses 48231 49237 +1006
+ Partials 1532 1531 -1
a7720dd to caa3d97
- schemas.py: restore full apache/master version and add CreateDatasetRequest (previous cherry-pick used an older, shorter version missing helper functions _sanitize_dataset_info_for_llm_context, _humanize_timestamp, etc.)
- create_dataset.py: remove parse_request decorator (not in apache/master yet)
caa3d97 to e220068
✅ Deploy Preview for superset-docs-preview ready!
Summary
Adds a create_dataset MCP tool that lets callers register an existing physical database table as a Superset dataset: the programmatic equivalent of Data → Datasets → + Dataset in the UI.
- CreateDatasetRequest schema: database_id, schema (alias for schema_name to avoid the Pydantic v2 BaseModel.schema() clash), table_name, optional catalog and owners; whitespace-only values for schema/catalog are normalised to None before calling the command
- create_dataset tool: wraps CreateDatasetCommand, uses DatasetInvalidError.get_list_classnames() (public API) to classify wrapped validation errors; returns DatasetInfo (same shape as get_dataset_info) so the id feeds directly into generate_chart or generate_explore_link
- error_type values for each failure class: DatabaseNotFoundError, AccessDeniedError, DatasetExistsError, TableNotFoundError, ValidationError, CreateFailedError, InternalError
- Unit tests covering … errors, missing required fields, full DatasetInfo shape, database not found, access denied, no-schema, with-catalog
- DEFAULT_INSTRUCTIONS updated to list the new tool
Motivation
The MCP service already exposes create_virtual_dataset for SQL-based datasets. This PR adds the physical-table counterpart so agents can complete the full "find DB → register table → chart it" workflow without manual UI steps.
Test plan
pytest tests/unit_tests/mcp_service/dataset/tool/test_create_dataset.py -x
pre-commit run --files superset/mcp_service/dataset/tool/create_dataset.py superset/mcp_service/dataset/schemas.py