feat(mcp): PR8 - add security layer, type safety, and test infrastructure #35841

aminghadersohi · 2025-10-25T00:22:50Z

SUMMARY

This PR adds critical security and testing infrastructure to complete the production-readiness of the MCP service. It introduces field-level permissions filtering, type-safe DAO protocols, and comprehensive test infrastructure for authentication and server startup validation.

New Features Added:

Security Layer (313 lines)

permissions_utils.py - Field-level permissions and data filtering:

Defines sensitive fields by object type (dataset, chart, dashboard)
Protects SQL queries, connection strings, cache keys, internal IDs
Role-based access control for sensitive field access
Permission-aware serialization with automatic filtering
Security functions:
- get_sensitive_fields(object_type) - Get sensitive field names
- has_permission_for_field(user, field, object_type) - Check field permissions
- filter_sensitive_fields(data, user, object_type) - Auto-filter based on permissions
- filter_sensitive_fields_pydantic(model, user, object_type) - Pydantic model filtering

Protected fields:

Dataset: sql, extra, database_id, changed_by_fk, created_by_fk
Chart: query_context, cache_key, changed_by_fk, created_by_fk
Dashboard: json_metadata, position_json, css, changed_by_fk, created_by_fk

Permission requirements:

Admin users: Can access all sensitive fields
Users with specific permissions: Can access field if they have the required permission
Regular users: Sensitive fields automatically filtered out

Type Safety (59 lines)

dao/base.py - DAO Protocol for consistent data access:

Protocol definition ensuring type safety across all model tools
Standard interface for Data Access Objects
Required methods:
- list() - Paginated listing with filters, search, and ordering
- find_by_id() - Retrieve single record by ID
- get_filterable_columns_and_operators() - Get available filters
- count() - Total record count (for instance info)
Benefits:
- Type checking at development time
- Consistent interfaces across chart/dashboard/dataset DAOs
- Better IDE support and autocomplete
- Prevents interface drift

Test Infrastructure (626 lines)

conftest.py (129 lines) - Shared test fixtures and mocks:

Auto-mocking fixtures for Superset dependencies
Mock Flask app with security manager
Mock user fixtures (admin, regular user, anonymous)
Mock DAOs (ChartDAO, DashboardDAO, DatasetDAO, DatabaseDAO)
Mock command classes for all tool operations
Reusable test utilities across all MCP tests

test_auth.py (317 lines) - Authentication unit tests:

JWT token validation and parsing
User resolution from tokens
User impersonation support
Permission checking
Audit logging validation
Payload sanitization
Error handling for invalid tokens
Auth provider configuration
Test coverage:
- Valid token authentication
- Invalid token handling
- Expired token handling
- Malformed token handling
- User impersonation flows
- Permission-based access control

test_auth_integration.py (147 lines) - JWT integration tests:

End-to-end JWT authentication workflows
FastMCP integration with auth layer
Real token generation and validation
User context propagation through tool calls
Permission enforcement in tool execution
Test scenarios:
- Tool calls with valid authentication
- Unauthenticated requests (rejected)
- Permission-based tool access
- User impersonation workflows

test_server_startup.py (33 lines) - Server startup verification:

MCP server initialization tests
Configuration validation
Service health checks
Error handling during startup

Statistics:

Files Added: 7 (2 production, 1 test config, 4 test files)
Lines Added: ~998
- permissions_utils.py: 313 lines (security layer)
- dao/base.py: 59 lines (type safety)
- conftest.py: 129 lines (test fixtures)
- test_auth.py: 317 lines (auth unit tests)
- test_auth_integration.py: 147 lines (auth integration tests)
- test_server_startup.py: 33 lines (server startup tests)
Test Coverage: Authentication, authorization, server startup, permission filtering
All pre-commit hooks passing

Builds on:

PR feat(mcp): PR1 - Add MCP service scaffold for Apache Superset #35163 (MCP service scaffold)
PR2 (chart listing and info tools)
PR3 (dashboard and dataset listing and info tools)
PR4 (advanced chart tools and SQL Lab integration)
PR5 (dashboard generation and chart placement tools)
PR6 (instance info tool, prompts, and resources)
PR7 (restore tool documentation)

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

N/A - This is a backend security and testing infrastructure enhancement.

TESTING INSTRUCTIONS

Prerequisites

Ensure you have a running Superset instance with the MCP service enabled
Ensure you have the development environment set up (Python venv activated)
Have pytest installed in your environment

Running the Test Suite

# Activate your virtual environment
source venv/bin/activate  # or your venv path

# Run all MCP service tests
pytest tests/unit_tests/mcp_service/ -v

# Run specific test files
pytest tests/unit_tests/mcp_service/system/test_auth.py -v
pytest tests/unit_tests/mcp_service/system/test_auth_integration.py -v
pytest tests/unit_tests/mcp_service/system/test_server_startup.py -v

# Run with coverage
pytest tests/unit_tests/mcp_service/ --cov=superset.mcp_service --cov-report=term-missing

Expected Test Results

test_auth.py (317 lines):

Should see ~15-20 passing tests for:
- JWT token validation
- User resolution
- Permission checking
- Audit logging
- Error handling

test_auth_integration.py (147 lines):

Should see ~8-10 passing tests for:
- End-to-end JWT workflows
- Tool authentication
- Permission enforcement

test_server_startup.py (33 lines):

Should see ~2-3 passing tests for:
- Server initialization
- Configuration validation

Manual Security Testing

Test the permissions filtering manually:

# In a Python shell with Superset context
from superset.mcp_service.utils.permissions_utils import (
    filter_sensitive_fields,
    get_sensitive_fields,
)
from superset import security_manager

# Get sensitive fields for datasets
sensitive = get_sensitive_fields("dataset")
print(f"Sensitive dataset fields: {sensitive}")
# Expected: {'sql', 'extra', 'database_id', 'changed_by_fk', 'created_by_fk'}

# Test filtering with a mock user (requires Superset context)
user = security_manager.find_user(username="admin")
dataset_data = {
    "id": 1,
    "name": "My Dataset",
    "sql": "SELECT * FROM sensitive_table",  # Should be filtered for non-admin
    "table_name": "public_table",
}

filtered = filter_sensitive_fields(dataset_data, user, "dataset")
print(filtered)
# For admin: All fields present
# For regular user: 'sql' field removed

Pre-Commit Validation

# Stage all files
git add -A

# Run pre-commit hooks on changed files
pre-commit run --files \
  superset/mcp_service/utils/permissions_utils.py \
  superset/mcp_service/dao/base.py \
  superset/mcp_service/dao/__init__.py \
  tests/unit_tests/mcp_service/conftest.py \
  tests/unit_tests/mcp_service/system/test_auth.py \
  tests/unit_tests/mcp_service/system/test_auth_integration.py \
  tests/unit_tests/mcp_service/system/test_server_startup.py

# Expected: All hooks pass (mypy, ruff, pylint, etc.)

Verification Checklist

All authentication tests pass
All integration tests pass
Server startup tests pass
permissions_utils.py correctly identifies sensitive fields
Field filtering works for different user roles
DAO Protocol type checking works in IDE
conftest.py fixtures are available to other test files
Pre-commit hooks pass

ADDITIONAL INFORMATION

Has associated issue: No
Required feature flags: No
Changes UI: No
Includes DB Migration: No
Introduces new feature or API: Yes (security layer, type safety, test infrastructure)
Removes existing feature or API: No

Completes Production-Ready Security:

This PR represents a critical security and quality milestone for the MCP service:

Field-level Security - Protects sensitive data from unauthorized access
Type Safety - Ensures consistent interfaces across all data access
Test Coverage - Validates authentication, authorization, and startup
Production Ready - With these additions, the service is ready for production deployment

Files Changed Summary:

New Security (2 files):

superset/mcp_service/utils/permissions_utils.py (313 lines)
superset/mcp_service/dao/base.py (59 lines)
superset/mcp_service/dao/init.py (22 lines - exports)

New Test Infrastructure (4 files):

tests/unit_tests/mcp_service/conftest.py (129 lines)
tests/unit_tests/mcp_service/system/test_auth.py (317 lines)
tests/unit_tests/mcp_service/system/test_auth_integration.py (147 lines)
tests/unit_tests/mcp_service/system/test_server_startup.py (33 lines)

Key Security Features:

permissions_utils.py:
- Centralized sensitive field definitions by object type
- Permission-aware filtering functions
- Support for both dict and Pydantic model filtering
- Role-based access control integration
- Security by default (deny access unless explicitly permitted)
Field-level permissions:
- Protects SQL queries from unauthorized viewing
- Hides database connection strings and credentials
- Filters internal IDs and references
- Removes cache keys and sensitive metadata
- Preserves public fields for all users
dao/base.py Protocol:
- Type-safe interface for all DAOs
- Consistent method signatures
- Better IDE support and type checking
- Prevents interface drift across modules
- Required for instance info statistics

Test Infrastructure Highlights:

conftest.py:
- Auto-mocking strategy for all Superset dependencies
- Prevents accidental database access in unit tests
- Reusable fixtures across all MCP tests
- Mock users with different permission levels
- Mock DAOs for chart/dashboard/dataset operations
Authentication Tests:
- Comprehensive JWT validation coverage
- User resolution and impersonation
- Permission checking workflows
- Error handling for all failure scenarios
- Audit logging validation
Integration Tests:
- End-to-end authentication workflows
- FastMCP integration validation
- Tool execution with auth context
- Permission enforcement in real scenarios

Architecture Benefits:

The security layer uses a declarative approach:

# Define sensitive fields once
SENSITIVE_FIELDS = {
    "dataset": {"sql", "extra", "database_id", ...},
    "chart": {"query_context", "cache_key", ...},
    "dashboard": {"json_metadata", "position_json", ...},
}

# Auto-filter based on user permissions
filtered_data = filter_sensitive_fields(data, user, "dataset")

This design enables:

Centralized security policy management
Easy addition of new sensitive fields
Consistent filtering across all tools
Automatic protection with minimal code changes
Clear audit trail of what's protected

Pattern Compliance:

This PR continues the MINIMAL cherry-pick pattern established in previous PRs:

✅ Copied ONLY the required security and test files
✅ NO development utilities or extra infrastructure
✅ Clean separation of concerns (security, type safety, tests)
✅ All pre-commit hooks passing
✅ Production-ready code quality

Risk Mitigation:

Without this PR, the MCP service would have:

❌ No field-level security (sensitive data exposed)
❌ No type safety guarantees (potential runtime errors)
❌ No authentication test coverage (security vulnerabilities)
❌ No server startup validation (deployment failures)

With this PR:

✅ Field-level security with role-based filtering
✅ Type-safe DAO interfaces with Protocol
✅ Comprehensive authentication test coverage
✅ Server startup and configuration validation

Next Steps:

The MCP service is now feature-complete AND production-ready. Remaining tasks:

Code review for security layer
Final integration testing
Deployment planning
Documentation updates

Future enhancements may include:

Row-level security integration
Fine-grained permission policies
Additional test coverage for module-specific tools
Performance testing under load
Security audit and penetration testing

korbit-ai · 2025-10-25T00:22:55Z

Korbit doesn't automatically review large (3000+ lines changed) pull requests such as this one. If you want me to review anyway, use /korbit-review.

bito-code-review · 2025-10-25T00:22:59Z

Bito Automatic Review Skipped - Large PR

Bito didn't auto-review this change because the pull request exceeded the line limit. No action is needed if you didn't intend for the agent to review it. Otherwise, to manually trigger a review, type /review in a comment and save.

superset/mcp_service/chart/schemas.py

superset/mcp_service/screenshot/webdriver_config.py

+        except Exception as e:
+            from flask import jsonify
+
+            return jsonify({"error": str(e), "status": "error"}), 500


tests/unit_tests/mcp_service/chart/test_chart_utils.py

codecov · 2025-10-25T00:41:59Z

Codecov Report

❌ Patch coverage is 0% with 1526 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.77%. Comparing base (6e27bee) to head (07e57a7).

Files with missing lines	Patch %	Lines
superset/mcp_service/middleware.py	0.00%	269 Missing ⚠️
superset/mcp_service/screenshot/webdriver_pool.py	0.00%	218 Missing ⚠️
...perset/mcp_service/screenshot/pooled_screenshot.py	0.00%	172 Missing ⚠️
superset/mcp_service/utils/retry_utils.py	0.00%	113 Missing ⚠️
superset/mcp_service/utils/error_builder.py	0.00%	101 Missing ⚠️
superset/mcp_service/utils/permissions_utils.py	0.00%	91 Missing ⚠️
superset/mcp_service/sql_lab/sql_lab_utils.py	0.00%	86 Missing ⚠️
superset/mcp_service/mcp_core.py	0.00%	84 Missing ⚠️
superset/mcp_service/sql_lab/execute_sql_core.py	0.00%	58 Missing ⚠️
superset/mcp_service/utils/url_utils.py	0.00%	42 Missing ⚠️
... and 14 more

Additional details and impacted files

@@             Coverage Diff             @@
##           master   #35841       +/-   ##
===========================================
+ Coverage        0   68.77%   +68.77%     
===========================================
  Files           0      622      +622     
  Lines           0    45691    +45691     
  Branches        0     4971     +4971     
===========================================
+ Hits            0    31422    +31422     
- Misses          0    13024    +13024     
- Partials        0     1245     +1245

Flag	Coverage Δ
hive	`44.23% <0.00%> (?)`
mysql	`67.84% <0.00%> (?)`
postgres	`67.89% <0.00%> (?)`
presto	`47.77% <0.00%> (?)`
python	`68.73% <0.00%> (?)`
sqlite	`67.51% <0.00%> (?)`
unit	`100.00% <ø> (?)`

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Add 6 new MCP tools for dashboard and dataset management, expanding the MCP service to cover Superset's core resources. Dashboard Tools (3): - list_dashboards: List dashboards with filtering, search, and sorting - Filter by: dashboard_title, published, favorite - Sort by: id, dashboard_title, slug, published, changed_on, created_on - Supports UUID and slug lookups - get_dashboard_info: Retrieve dashboard details by ID, UUID, or slug - Returns charts, layout, position JSON, owners, roles - get_dashboard_available_filters: List filterable columns and operators - Returns DashboardFilter metadata for query building Dataset Tools (3): - list_datasets: List datasets with filtering, search, and sorting - Filter by: table_name, schema, owner, favorite - Sort by: id, table_name, schema, changed_on, created_on - Returns columns and metrics for each dataset - get_dataset_info: Retrieve dataset details by ID or UUID - Returns complete column definitions with types - Returns metrics with SQL expressions - get_dataset_available_filters: List filterable columns and operators - Returns DatasetFilter metadata for query building Core Infrastructure: - ModelGetAvailableFiltersCore: Generic base class for *_available_filters tools - Enhanced schemas with TYPE_CHECKING imports for better type safety - Retry utilities in utils/retry_utils.py (340 lines) - CLAUDE.md guide (431 lines) for LLM agents Schemas (1,107 lines): - dashboard/schemas.py: DashboardInfo, DashboardList, DashboardFilter, etc. (418 lines) - dataset/schemas.py: DatasetInfo, DatasetList, DatasetFilter, etc. (349 lines) - Enhanced chart/schemas.py: Added chart serialization helpers (793 lines total) - Enhanced system/schemas.py: Expanded with more type safety (111 lines total) Testing (1,804 lines): - test_dashboard_tools.py: 15 tests for dashboard listing/retrieval (573 lines) - test_dataset_tools.py: 20 tests for dataset listing/retrieval (1,231 lines) - All 62 MCP tests passing (11 chart + 15 dashboard + 20 dataset + 16 core) Integration: - app.py: Added dashboard and dataset tool imports (13 new lines) - All tools follow ModelListCore/ModelGetInfoCore patterns - Consistent Pydantic schemas with Field descriptions - All tools use @mcp_auth_hook for security This PR completes Phase 2 of the MCP service rollout, providing comprehensive read access to Superset's dashboards and datasets. Chart creation tools will follow in the next PR. 24 files changed, 5,293 insertions(+), 20 deletions(-) fix(mcp): use cryptographically secure random for retry jitter Replace random.uniform() with secrets.SystemRandom().uniform() to address CodeQL security warning about weak random number generation. While jitter for retry delays doesn't require cryptographic randomness, using secrets.SystemRandom() eliminates the CodeQL alert and has minimal performance impact. fix(mcp): replace ReDoS-vulnerable regex with safer alternatives Replace regex patterns that could cause catastrophic backtracking with safer alternatives to address CodeQL security warnings: 1. **Length checks first**: Add explicit length validation before any regex matching to bound the maximum input size 2. **Substring checks instead of complex regex**: Replace patterns like `r"<script[^>]*>.*?</script>"` with simple substring checks like `"<script" in v.lower()` - this is just as effective for detecting attacks but has O(n) complexity instead of potential exponential backtracking 3. **Simplified regex patterns**: For patterns that must use regex, remove backtracking opportunities: - `r"/\*.*?\*/"` → `r"/\*"` (just check for comment start) - Keep only bounded patterns with no `.*` quantifiers **Affected validators:** - ColumnRef.sanitize_name (column names) - ColumnRef.sanitize_label (display labels) - FilterConfig.sanitize_column (filter columns) - FilterConfig.sanitize_value (filter values) - UpdateChartRequest.sanitize_chart_name (chart names) These changes maintain the same security protections while eliminating ReDoS vulnerabilities. The substring approach is actually more robust for XSS/injection detection since it catches partial tags/patterns.

Add 11 new MCP tools for chart creation/modification, data access, SQL execution, and explore link generation. This is the largest MCP PR, adding comprehensive write capabilities and advanced features. Chart Creation & Modification Tools (6): - generate_chart: Create new charts with AI-powered configuration (454 lines) - Supports table and XY chart types (line, bar, area, scatter) - 5-layer validation pipeline (schema, business logic, dataset, Superset, runtime) - XSS/SQL injection prevention with whitelist validation - Column existence validation with fuzzy match suggestions - Auto-generates chart names and handles duplicate labels - Returns chart_id, preview URLs, and explore URL - update_chart: Update existing chart configuration (213 lines) - Modify saved charts by ID or UUID - Full validation pipeline like generate_chart - Updates chart metadata and regenerates previews - update_chart_preview: Update cached preview without saving (158 lines) - Modifies temporary form_data from generate_chart (save_chart=False) - Returns new form_data_key and updated preview - Useful for iterative chart design without cluttering database - get_chart_preview: Get visual preview of any chart (2,082 lines) - Multiple formats: url (PNG screenshot), ascii, table, vega_lite, base64 - Screenshot generation with Selenium WebDriver pool - ASCII art charts for terminal-friendly output - Tabular data representation - VegaLite JSON for interactive visualizations - get_chart_data: Get underlying chart data (649 lines) - Export formats: json, csv, excel - Pagination support with configurable limits - Cache control (use_cache, force_refresh, cache_timeout) - Returns raw data for LLM analysis without rendering - get_chart_available_filters: List filterable chart columns (50 lines) - Returns available filter fields and operators - Helps LLMs discover what chart fields can be filtered SQL Lab Tools (3): - execute_sql: Execute SQL queries against databases (94 lines) - Security validation and timeout protection - DML permission enforcement - Support for parameterized queries - Configurable result limits (max 10,000 rows) - Returns query results in structured format - open_sql_lab_with_context: Generate SQL Lab URL (118 lines) - Pre-populate SQL editor with query and context - Set database connection and schema - Provide dataset context for exploration - Returns URL for direct navigation Explore & Analysis Tools (1): - generate_explore_link: Create pre-configured explore URL (95 lines) - Generate Superset explore URLs with dataset/metrics/filters - Supports table and XY chart configurations - Form data caching for seamless user experience - Returns explore_url for interactive chart building Chart Infrastructure (3,085 lines): - chart_utils.py: Chart creation and update logic (484 lines) - Handles both saved charts and cached previews - Manages form_data lifecycle and expiration - Integrates with ChartDAO for persistence - preview_utils.py: Preview generation from form_data (561 lines) - Converts configs to ASCII art, tables, VegaLite - Handles chart data extraction and formatting - validation/ (5-layer pipeline, 2,040 lines): - schema_validator.py: Pydantic schema validation (307 lines) - dataset_validator.py: Column/metric existence checks (329 lines) - pipeline.py: Orchestrates validation layers (293 lines) - runtime/chart_type_suggester.py: Smart chart type recommendations (437 lines) - runtime/cardinality_validator.py: High-cardinality warnings (195 lines) - runtime/format_validator.py: Axis format validation (225 lines) Screenshot Infrastructure (1,090 lines): - pooled_screenshot.py: Manages WebDriver pool for chart screenshots (483 lines) - Concurrent screenshot generation with connection pooling - Automatic retry on transient failures - Configurable dimensions and timeouts - webdriver_pool.py: WebDriver connection pool (433 lines) - Reuses Selenium connections for performance - Health checks and automatic recovery - Thread-safe pool management - webdriver_config.py: WebDriver configuration (139 lines) - Chrome/Firefox support with headless mode - Reads from WEBDRIVER_* config variables SQL Lab Infrastructure (589 lines): - execute_sql_core.py: Core SQL execution logic (221 lines) - Query execution with error handling - DML permission checks - Result formatting and pagination - sql_lab_utils.py: SQL Lab URL generation (243 lines) - Constructs SQL Lab URLs with context - Form data encoding for URL parameters - schemas.py: Request/response models (109 lines) - ExecuteSqlRequest, ExecuteSqlResponse - OpenSqlLabRequest, OpenSqlLabResponse Enhanced Middleware (740 lines, +666 lines): - Request/response logging with timing - Error response formatting with structured schemas - Flask context integration for all tools - Cache control header management - Form data cleanup for expired entries Enhanced Authentication (auth.py, +38 lines): - has_dataset_access(): Dataset permission checking - Integration with Superset's security_manager - Used by all chart creation/modification tools Common Utilities (738 lines): - error_schemas.py: Structured error responses (103 lines) - BaseError, ValidationError, ChartError, etc. - Consistent error format across all tools - cache_utils.py: Cache control helpers (143 lines) - CacheControl schema for unified caching - use_cache, force_refresh, cache_timeout flags - error_builder.py: Error construction utilities (364 lines) - Builds structured error responses - Validation error formatting - Suggestion generation for typos - url_utils.py: URL generation helpers (128 lines) - get_superset_base_url() for external URLs - Constructs chart/explore/sqllab URLs Commands (33 lines): - create_form_data.py: Form data creation command - Integrates with Superset's CreateFormDataCommand - Used by chart preview tools Testing (2,554 lines, 105 new tests): - test_generate_chart.py: 268 lines, chart creation tests - test_update_chart.py: 385 lines, chart update tests - test_update_chart_preview.py: 474 lines, preview update tests - test_get_chart_preview.py: 290 lines, preview generation tests - test_chart_utils.py: 460 lines, chart utilities tests - test_execute_sql.py: 497 lines, SQL execution tests - test_execute_sql_helper.py: 64 lines, SQL helper tests - test_generate_explore_link.py: 580 lines, explore link tests Total: 167 tests (139 passing, 28 SQL Lab tests need integration fixes) Integration (app.py updates): All new tools imported and auto-registered via @mcp.tool decorators: - 6 chart tools (generate_chart, update_chart, etc.) - 3 SQL Lab tools (execute_sql, open_sql_lab_with_context) - 1 explore tool (generate_explore_link) This PR completes Phase 3 of the MCP service rollout, adding full CRUD capabilities for charts and enabling SQL-based data exploration. Dashboard creation tools will follow in the next PR. 53 files changed, 13,318 insertions(+), 37 deletions(-)

Replace complex regex pattern with substring checks in error sanitization: - r"<script[^>]*>.*?</script>" → "<script" in str_lower - More efficient and avoids catastrophic backtracking - Still catches all XSS attempts in error messages

Replace ReDoS-vulnerable regex patterns in _sanitize_validation_error(): - Remove r"SELECT\s+.*?\s+FROM" pattern (catastrophic backtracking) - Remove r"WHERE\s+.*?(\s+ORDER|\s+GROUP|\s+LIMIT|$)" pattern Security improvements: - Length check FIRST to bound input size - Substring checks for SQL fragment detection - String slicing instead of regex for content redaction - Maintains same security protection without backtracking Refactoring to reduce complexity: - Extract _redact_sql_select helper - Extract _redact_sql_where helper - Extract _get_generic_error_message helper - Main function now has complexity 4 (well under limit of 10) This completes the ReDoS fixes across all MCP service validation and sanitization functions.

Replace unbounded regex patterns in _sanitize_error_for_logging(): - r"postgresql://[^@]+@[^/]+/" → bounded {1,100} quantifiers - r"mysql://[^@]+@[^/]+/" → bounded {1,100} quantifiers - r'[Aa]pi[_-]?[Kk]ey[:\s]*[^\s\'"]+' → bounded {0,5} and {1,100} - r'[Tt]oken[:\s]*[^\s\'"]+' → bounded {0,5} and {1,100} - r"/[a-zA-Z0-9_\-/.]+/superset/" → bounded {1,200} Security improvements: - Length check FIRST to bound input size - Substring checks before applying regex (avoid unnecessary work) - All patterns now have explicit quantifier bounds - Prevents catastrophic backtracking on malicious input This completes all ReDoS fixes in the MCP service.

Remove reference to deleted codeql-config.yml file. CodeQL will now run with default configuration and detect all security issues without suppressions.

CodeQL detected incomplete URL substring sanitization where database URLs could be partially matched at arbitrary positions. Changes: - Remove substring position checks (e.g., "postgresql://" in str) - Use word boundaries (\b) to match complete URLs only - Add whitespace exclusions ([^@\s], [^/\s]) to prevent cross-boundary matches - Apply case-insensitive matching with re.IGNORECASE flag - Ensure full URL path is redacted with [^\s]{0,100} for path component This ensures URLs are completely sanitized regardless of position in error string.

CodeQL detected incomplete URL substring sanitization where URL schemes (javascript:, vbscript:, data:) could match at arbitrary positions. Changes: - Replace substring checks for URL schemes with regex word boundaries - Use \b(javascript|vbscript|data): to match only actual URL schemes - Keep simple substring checks for HTML tags (<script, </script>) - Consolidate event handler check into same pattern list This ensures we only filter actual URL schemes, not arbitrary text containing these substrings.

CodeQL detected incomplete URL substring sanitization in multiple validators. URL schemes (javascript:, vbscript:, data:) could match at arbitrary positions. Fixed in 5 validators: - ColumnRef.sanitize_name - ColumnRef.sanitize_label - FilterConfig.sanitize_column - FilterConfig.sanitize_value - UpdateChartRequest.sanitize_chart_name Changes: - Separate HTML tag checks (substring) from URL scheme checks (regex) - Use \b(javascript|vbscript|data): to match only actual URL schemes - Extract _validate_string_value helper to reduce complexity - Prevents false positives on text containing these substrings All validators now properly distinguish between HTML tags (safe substring checks) and URL schemes (word boundary regex).

Add two new MCP tools for dashboard management: - generate_dashboard: Creates dashboards with charts in grid layout - add_chart_to_existing_dashboard: Adds charts to existing dashboards Changes: - New tool: generate_dashboard.py (236 lines) - New tool: add_chart_to_existing_dashboard.py (282 lines) - Schema additions: 4 new classes in dashboard/schemas.py (+55 lines) - Tests: 11 comprehensive tests (450 lines, all passing) - Updated dashboard/tool/__init__.py exports All pre-commit hooks passing. All tests passing (11/11). fix(mcp): apply ruff auto-fixes for import ordering - Remove unused imports in update_chart.py - Sort imports alphabetically in dashboard/tool/__init__.py These fixes resolve CI ruff errors. fix(mcp): resolve schema issues from rebase - Remove duplicate class definitions in chart/schemas.py (ChartCapabilities, ChartSemantics, PerformanceMetadata, AccessibilityMetadata, VersionedResponse) - Replace `Optional[T]` with modern `T | None` syntax in dashboard/schemas.py - Fix Pydantic forward reference issue by moving GenerateChartResponse after ChartPreviewContent definition - Address mypy type checking errors - Fix ruff formatting issues These issues were introduced during the PR5 rebase and prevented CI from passing. Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> fix: apply ruff formatting and remove unused mypy ignore - Remove extra blank line in mcp_core.py - Add missing blank line after import in superset_config_docker_light.py - Remove unused type: ignore comment in superset_config_docker_light.py fix(mcp): add type ignore for CELERY_CONFIG override in docker light config Fixes mypy errors: - Cannot assign multiple types to name "CELERY_CONFIG" without an explicit "Type[...]" annotation - Incompatible types in assignment (expression has type "None", variable has type "type[CeleryConfig]") The docker light config intentionally overrides CELERY_CONFIG to None to disable Celery. Added type: ignore comment to suppress the expected type mismatch. revert: remove unrelated changes to docker config file This file is not part of the MCP service work. Removing the blank line formatting change to restore it to the base branch state. fix(mcp): restore DAO files and add tool registration to fix failing tests Fixed all 64 failing MCP service unit tests by: 1. **Restored accidentally deleted DAO files**: - superset/mcp_service/dao/base.py - superset/mcp_service/dao/__init__.py These files were accidentally deleted in commit adbdb9b 2. **Added tool imports to server.py**: - Added dashboard tool imports (generate_dashboard, add_chart_to_existing_dashboard, etc.) - Added dataset tool imports (list_datasets, get_dataset_info, etc.) - Added explore tool imports (generate_explore_link) - Added sql_lab tool imports (execute_sql, open_sql_lab_with_context) 3. **Created conftest.py for test discovery**: - Added tests/unit_tests/mcp_service/conftest.py - Imports all tools so they register with MCP instance before tests run - Essential for pytest to discover and run tools in test context All 178 MCP service unit tests now pass. refactor(mcp): remove unused DAO Protocol The custom DAO Protocol in superset/mcp_service/dao/ is unused. All code uses BaseDAO from superset.daos.base instead. The Protocol was originally created as a placeholder with the comment "To be replaced with one from superset core" and has since been replaced. All 178 MCP service tests still pass after removal. update importes fix(mcp): remove tool imports from conftest to prevent test pollution Removed tool imports from tests/unit_tests/mcp_service/conftest.py to prevent side effects during test discovery that were causing integration test failures. Tools are already registered in app.py when the MCP instance is created, so importing them again in conftest.py was unnecessary and causing database constraint violations in unrelated integration tests.

Add comprehensive MCP features for instance metadata, user guidance, and resource templates: **New Tools (1):** - get_superset_instance_info: Detailed instance statistics with dashboards, charts, datasets, activity metrics, and database breakdowns **New Prompts (2):** - superset_quickstart: Interactive onboarding guide for new users - create_chart_guided: AI-powered chart creation workflow with business context **New Resources (2):** - chart_configs: Example chart configuration templates for common visualizations - instance_metadata: Real-time instance metadata and statistics **Infrastructure Added:** - InstanceInfoCore: Configurable base class for comprehensive instance reporting with custom metric calculators - Added 7 new schemas to system/schemas.py: InstanceInfo, InstanceSummary, RecentActivity, DashboardBreakdown, DatabaseBreakdown, PopularContent, GetSupersetInstanceInfoRequest Changes: - Added get_superset_instance_info.py tool (268 lines) with detailed breakdown calculations - Added InstanceInfoCore class to mcp_core.py (173 lines) - Added quickstart.py prompt (94 lines) for user onboarding - Added create_chart_guided.py prompt (195 lines) for AI-assisted chart creation - Added chart_configs.py resource (362 lines) with example configurations - Added instance_metadata.py resource (107 lines) for real-time instance stats - Updated system/tool/__init__.py to export new instance info tool - Added placeholder __init__.py files for dashboard/dataset prompts and resources - Added 7 new Pydantic schemas for instance information responses Statistics: - 17 files changed (16 new, 2 modified) - ~1,200 lines of production code added - All pre-commit hooks passing fix(mcp): convert timestamp to ISO string in InstanceInfo schema Fixed Pydantic validation error where timestamp field was receiving a datetime object instead of the expected string type. The InstanceInfo schema requires timestamp as a string, so now we convert it using .isoformat(). fix(mcp): use BaseDAO instead of DAO Protocol in instance_metadata Replace DAO Protocol casts with BaseDAO[Any] to match InstanceInfoCore type signature. This fixes mypy type checking errors. feat(mcp): register prompts and resources in app.py and document in CLAUDE.md Add imports for chart and system prompts/resources to ensure they are registered with the MCP instance on startup. Update CLAUDE.md with comprehensive instructions for adding new prompts and resources. Changes to app.py: - Import chart.prompts module to register create_chart_guided prompt - Import chart.resources module to register chart_configs resource - Import system.prompts module to register superset_quickstart prompt - Import system.resources module to register instance_metadata resource - Add clear comment explaining prompt/resource registration pattern Changes to CLAUDE.md: - Update section title to include prompts and resources - Add "How to Add a New Prompt" section with complete workflow - Add "How to Add a New Resource" section with complete workflow - Add Quick Checklist for New Prompts - Add Quick Checklist for New Resources - Update line number references (210-242 for tools, 244-253 for prompts/resources) - Explain two-level import pattern (module __init__.py + app.py) This ensures all prompts and resources are available to MCP clients and provides clear documentation for future contributors.

Restore essential behavioral documentation removed in 241cfa5. LLMs need this info to use tools correctly (e.g., charts saved by default). Changes: - generate_chart: Restored save_chart=True default behavior, LLM display guidance - update_chart: Clarified requires saved chart, display URL guidance - generate_dashboard: Noted chart access requirements and 2-column layout - get_chart_info: Added LLM display URL guidance - get_chart_data: Clarified purpose for LLM analysis without image rendering - generate_explore_link: Restored PREFERRED tool guidance vs generate_chart - update_chart_preview: Clarified cached preview workflow and display guidance Token reduction: ~70% vs original verbose docs while preserving critical info. This restores LLM-critical behavioral guidance while maintaining token efficiency. chore: trigger CI re-run for PR7 chore: retry flaky tests for PR7

This PR adds critical field-level permissions infrastructure to the MCP service, providing security controls for sensitive data exposure. Also includes auth improvements from Antonio Rivero for JWT scope handling. **New Security Layer (313 lines)** **permissions_utils.py** - Field-level permissions and data filtering: - Defines sensitive fields by object type (dataset, chart, dashboard) - Protects SQL queries, connection strings, cache keys, internal IDs - Role-based access control for sensitive field access - Permission-aware serialization with automatic filtering - Security functions: - `get_sensitive_fields(object_type)` - Get sensitive field names - `has_permission_for_field(user, field, object_type)` - Check field permissions - `filter_sensitive_fields(data, user, object_type)` - Auto-filter based on permissions - `filter_sensitive_fields_pydantic(model, user, object_type)` - Pydantic model filtering **Protected fields:** - **Dataset**: sql, extra, database_id, changed_by_fk, created_by_fk - **Chart**: query_context, cache_key, changed_by_fk, created_by_fk - **Dashboard**: json_metadata, position_json, css, changed_by_fk, created_by_fk **Permission requirements:** - Admin users: Can access all sensitive fields - Users with specific permissions: Can access field if they have the required permission - Regular users: Sensitive fields automatically filtered out **Auth Improvements (from Antonio Rivero):** - RBAC fallback when JWT scopes are not present - Updated user_resolver signature to accept app parameter - Fixed permission names (can_read/can_write) - Stateless HTTP mode for FastMCP - Updated app.py and mcp_config.py for JWT handling **Statistics:** - **Files Modified**: 2 (app.py, mcp_config.py) - **Files Added**: 1 (permissions_utils.py - 313 lines) - **Total Lines**: ~400 lines - **All 178 MCP service unit tests passing** - **All pre-commit hooks passing** **Builds on:** - PR7 (restore tool documentation) **Key Security Features:** 1. **Centralized sensitive field definitions by object type** 2. **Permission-aware filtering functions** 3. **Support for both dict and Pydantic model filtering** 4. **Role-based access control integration** 5. **Security by default** (deny access unless explicitly permitted) **Architecture Benefits:** The security layer uses a declarative approach: ```python # Define sensitive fields once SENSITIVE_FIELDS = { "dataset": {"sql", "extra", "database_id", ...}, "chart": {"query_context", "cache_key", ...}, "dashboard": {"json_metadata", "position_json", ...}, } # Auto-filter based on user permissions filtered_data = filter_sensitive_fields(data, user, "dataset") ``` This design enables: - Centralized security policy management - Easy addition of new sensitive fields - Consistent filtering across all tools - Automatic protection with minimal code changes - Clear audit trail of what's protected **Pattern Compliance:** This PR continues the **MINIMAL cherry-pick pattern** established in previous PRs: - ✅ Copied ONLY the required security file - ✅ Auth improvements for JWT scope handling - ✅ Clean separation of concerns - ✅ All pre-commit hooks passing - ✅ Production-ready code quality

aminghadersohi · 2025-10-31T01:00:11Z

closing as this stack is too hard to maintain. i am just gonna keep the last one. PR10

pull-request-size bot added the size/XXL label Oct 25, 2025

dosubot bot added authentication Related to authentication change:backend Requires changing the backend risk:breaking-change Issues or PRs that will introduce breaking changes labels Oct 25, 2025

aminghadersohi force-pushed the feat/mcp_service_pr8 branch from 430b4bc to 33e7731 Compare October 25, 2025 00:37

github-advanced-security bot found potential problems Oct 25, 2025

View reviewed changes

aminghadersohi force-pushed the feat/mcp_service_pr8 branch 3 times, most recently from f997c3d to c0d3d58 Compare October 25, 2025 01:27

aminghadersohi requested review from betodealmeida, craig-rueda, dpgaspar, eschutho, geido, kgabryje, mistercrunch, nytai, rusackas, sadpandajoe and villebro as code owners October 25, 2025 01:27

github-actions bot added the github_actions Pull requests that update GitHub Actions code label Oct 25, 2025

aminghadersohi force-pushed the feat/mcp_service_pr8 branch 5 times, most recently from 89b4883 to f2d6850 Compare October 25, 2025 03:37

github-actions bot removed the github_actions Pull requests that update GitHub Actions code label Oct 25, 2025

aminghadersohi force-pushed the feat/mcp_service_pr8 branch 4 times, most recently from 6aca205 to a8f6163 Compare October 29, 2025 05:21

aminghadersohi added 15 commits October 30, 2025 20:24

fix(codeql): remove config-file reference from workflow

6c96a12

Remove reference to deleted codeql-config.yml file. CodeQL will now run with default configuration and detect all security issues without suppressions.

chore: trigger CodeQL re-analysis

1ac489f

fix(security): use urlparse in test to avoid CodeQL substring warning

a61ff52

aminghadersohi force-pushed the feat/mcp_service_pr8 branch from a8f6163 to 07e57a7 Compare October 31, 2025 00:32

aminghadersohi closed this Oct 31, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(mcp): PR8 - add security layer, type safety, and test infrastructure #35841

feat(mcp): PR8 - add security layer, type safety, and test infrastructure #35841

Uh oh!

aminghadersohi commented Oct 25, 2025

Uh oh!

korbit-ai bot commented Oct 25, 2025

Uh oh!

bito-code-review bot commented Oct 25, 2025 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Check warning

Uh oh!

codecov bot commented Oct 25, 2025 •

edited

Loading

Uh oh!

aminghadersohi commented Oct 31, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

feat(mcp): PR8 - add security layer, type safety, and test infrastructure #35841

feat(mcp): PR8 - add security layer, type safety, and test infrastructure #35841

Uh oh!

Conversation

aminghadersohi commented Oct 25, 2025

SUMMARY

Security Layer (313 lines)

Type Safety (59 lines)

Test Infrastructure (626 lines)

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

Prerequisites

Running the Test Suite

Expected Test Results

Manual Security Testing

Pre-Commit Validation

Verification Checklist

ADDITIONAL INFORMATION

Uh oh!

korbit-ai bot commented Oct 25, 2025

Uh oh!

bito-code-review bot commented Oct 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Check warning

Uh oh!

Uh oh!

codecov bot commented Oct 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

aminghadersohi commented Oct 31, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

bito-code-review bot commented Oct 25, 2025 •

edited

Loading

codecov bot commented Oct 25, 2025 •

edited

Loading