fix(mcp): relax column name regex, improve generate_chart validation errors and examples #39915
Draft
aminghadersohi wants to merge 6 commits into apache:master
…errors and examples

- Remove overly strict regex pattern from ColumnRef.name, FilterConfig.column, and BigNumberChartConfig.temporal_column; sanitize_name/sanitize_column already handle XSS/SQL injection, and the pattern rejected valid column names like "1Q_revenue" (digit-prefixed) or "order-date" (hyphenated)
- Extend generate_chart docstring with usage examples for all supported chart types: pie, big_number (with/without trendline), pivot_table, mixed_timeseries, handlebars
- Improve _enhance_validation_error fallback in SchemaValidator to produce type-specific, actionable messages instead of raw pydantic error strings (extract _format_single_error helper to reduce cyclomatic complexity)
- Add tests verifying digit-prefixed/hyphenated column names now pass, and that XSS/SQL injection is still blocked by sanitize_name()
- FilterConfig.column: add check_sql_keywords=True to sanitize_column (Copilot review: sanitize_column was missing SQL keyword checking)
- BigNumberChartConfig.temporal_column: add sanitize_temporal_column field_validator using sanitize_user_input with check_sql_keywords=True (Copilot review: no validator after regex removal left field unprotected)
- generate_chart docstring IMPORTANT: list all chart types, not just xy/table (Copilot review: IMPORTANT section was misleading after adding more examples)
- Fix test_xss_attempt_blocked: nh3 strips HTML tags instead of rejecting, so rename to test_xss_tags_are_stripped (asserts tag is removed) and add test_event_handler_injection_blocked (on...= patterns ARE rejected)
- Fix _format_single_error literal_error: preserve pydantic 'Input should be' message instead of replacing with custom format (broke existing test test_non_value_error_pydantic_body_is_surfaced)
- Add test_sql_injection_in_filter_column_blocked to verify FilterConfig now rejects SQL injection column names
- Remove unused 'type: ignore[return-value]' from sanitize_temporal_column (mypy correctly infers the return type; comment was unnecessary)
- Fix test_xss_tags_are_stripped → test_script_tag_blocked: nh3 strips the entire script element including its content, leaving an empty string that the allow_empty=False guard then rejects with ValidationError
…ner text

nh3.clean() removes HTML tag delimiters but preserves the text content between them, so '<script>alert(1)</script>' becomes 'alert(1)' rather than an empty string. Update the test to assert the tag is stripped (not that a ValidationError is raised).
Codecov Report

@@ Coverage Diff @@
## master #39915 +/- ##
==========================================
- Coverage 64.37% 63.87% -0.50%
==========================================
Files 2569 2583 +14
Lines 134745 136692 +1947
Branches 31278 31519 +241
==========================================
+ Hits 86739 87313 +574
- Misses 46508 47863 +1355
- Partials 1498 1516 +18
nh3 behavior for '<script>alert(1)</script>' varies by version:

- some versions strip entire element (empty → ValidationError)
- others strip only tag delimiters (preserving 'alert(1)')

Accept both outcomes: no ValidationError means no <script> tag stored.
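The "accept both outcomes" approach described in this commit can be sketched with the standard library alone. This is a minimal illustration, not the PR's test code: `ValidationError`, `strip_whole_element`, `strip_tag_delimiters`, `validate_column_name`, and `script_tag_not_stored` are all hypothetical stand-ins, and the two regex-based sanitizers only imitate the two behaviors the commit message attributes to nh3.

```python
import re


class ValidationError(ValueError):
    """Hypothetical stand-in for pydantic's ValidationError."""


def strip_whole_element(value: str) -> str:
    # Imitates a sanitizer that drops the script element and its content.
    return re.sub(r"<script\b[^>]*>.*?</script>", "", value, flags=re.I | re.S)


def strip_tag_delimiters(value: str) -> str:
    # Imitates a sanitizer that drops only the tags and keeps the inner text.
    return re.sub(r"</?script\b[^>]*>", "", value, flags=re.I)


def validate_column_name(value: str, sanitizer) -> str:
    # Hypothetical validator: sanitize first, then reject empty results.
    cleaned = sanitizer(value).strip()
    if not cleaned:
        raise ValidationError("column name cannot be empty")
    return cleaned


def script_tag_not_stored(value: str, sanitizer) -> bool:
    # Either outcome is acceptable: rejection, or a stored value without the tag.
    try:
        stored = validate_column_name(value, sanitizer)
    except ValidationError:
        return True
    return "<script" not in stored.lower()


payload = "<script>alert(1)</script>"
for sanitizer in (strip_whole_element, strip_tag_delimiters):
    assert script_tag_not_stored(payload, sanitizer)
```

Asserting on the property that matters (no `<script>` tag reaches storage) keeps the test stable when the sanitizer's exact output differs between versions.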
Summary

Addresses validation rigidity in the `generate_chart` MCP tool that caused unnecessary failures when using valid but unconventionally-named columns.

Changes:

- Relax column name regex: Remove the `pattern=r"^[a-zA-Z0-9_][a-zA-Z0-9_\s\-\.]*$"` constraint from `ColumnRef.name`, `FilterConfig.column`, and `BigNumberChartConfig.temporal_column`. Many real-world column names (digit-prefixed like `1Q_revenue`, hyphenated like `order-date`) were silently rejected with cryptic pydantic errors. The existing `sanitize_name()`/`sanitize_column()` validators already block XSS and SQL injection; the regex added no security value and only hurt usability.
- Add `sanitize_temporal_column` validator: `BigNumberChartConfig.temporal_column` now has a field_validator using `sanitize_user_input` with `check_sql_keywords=True`, matching the protection level of `ColumnRef.sanitize_name`.
- Add `check_sql_keywords=True` to `FilterConfig.sanitize_column`: ensures SQL injection patterns are blocked for filter column names.
- Extend docstring examples: Add `generate_chart` usage examples for all supported chart types: `pie`, `big_number` (with and without trendline), `pivot_table`, `mixed_timeseries`, `handlebars`. Update IMPORTANT section to list all 7 supported chart types.
- Improve validation error messages: Extract `_format_single_error` helper from `_enhance_validation_error` (reduces cyclomatic complexity) and make the fallback produce type-specific, actionable messages for `string_pattern_mismatch`, `missing`, and `value_error` pydantic error types. `literal_error` preserves the original pydantic "Input should be ..." message.
- Tests: New `TestColumnRefNameRelaxedPattern` and `TestFilterConfigColumnRelaxedPattern` classes verify: digit-prefixed and hyphenated column names now pass; script-tag XSS is blocked (nh3 strips to empty, empty-value guard rejects); event-handler injection is blocked; SQL injection is blocked; `FilterConfig` SQL injection is blocked.

Testing

- `pytest tests/unit_tests/mcp_service/chart/test_chart_schemas.py -x`
- `generate_chart` with a column named `1Q_revenue` or `order-date` succeeds
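For readers who want a concrete picture of the type-specific error formatting described under "Improve validation error messages", here is a rough sketch. It is not the PR's `_format_single_error`: the function name `format_single_error` and the message wording are assumptions made for illustration. It relies only on the shape of pydantic v2 error dicts (`type`, `loc`, `msg`, and an optional `ctx`), passed in as plain dictionaries so the example runs without pydantic installed.

```python
def format_single_error(error: dict) -> str:
    """Turn one pydantic-style error dict into a short, actionable message.

    Hypothetical helper; the message texts are illustrative assumptions.
    """
    # Join the location path, e.g. ("config", "x", "name") -> "config.x.name".
    field = ".".join(str(part) for part in error.get("loc", ())) or "<root>"
    err_type = error.get("type", "")
    msg = error.get("msg", "")

    if err_type == "missing":
        return f"Missing required field '{field}'."
    if err_type == "string_pattern_mismatch":
        pattern = error.get("ctx", {}).get("pattern", "")
        return f"Field '{field}' does not match the required pattern {pattern!r}."
    if err_type == "value_error":
        return f"Invalid value for '{field}': {msg}"
    # literal_error and any unknown type: keep the original message intact.
    return f"{field}: {msg}"


errors = [
    {"type": "missing", "loc": ("config", "x"), "msg": "Field required"},
    {
        "type": "literal_error",
        "loc": ("config", "chart_type"),
        "msg": "Input should be 'pie' or 'table'",
    },
]
for err in errors:
    print(format_single_error(err))
```

Keeping the per-error branching in one small helper leaves the calling function as a simple loop over the error list, which is the complexity reduction the summary mentions.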