Bugfix: Ensure valid regex patterns in tool schemas#169
Merged
Conversation
…ibility Wrap schema retrieval in a new toolSchema helper that converts POSIX character classes and (?flags)pattern syntax into JavaScript-compatible forms, and ensures all strings are valid UTF-8. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Rewrite toJSRegex as a pipeline over the patterns StringPattern`PatternConvert
produces: strip leading (?flags), drop scope-less inner (?-m-s) modifiers,
map all 12 POSIX character classes (including the nested [[:a:][:b:]] form),
convert \A/\z/\Z anchors, normalize \x{HEX} escapes to \xNN/\uNNNN, and
rewrite unescaped dots to [\s\S] when dotall was set.
Fixes the previous behavior where the common "(?ms).*" pattern emitted for
every plain-string tool parameter was wrapped in literal-regex delimiters
("/.*/ms"), which JSON Schema validators treat as literal forward slashes
and reject.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Move toJSRegex and its helper functions into Kernel/Utilities.wl and expose it through the shared Common context, since it's a general-purpose regex utility rather than MCP-server-specific. Tests are relocated to a new Tests/Utilities.wlt file accordingly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wrap the conversion pipeline in Enclose/Confirm to surface any unexpected step failures, expand the header comment with the accepted non-goals (multiline anchors, PCRE-only constructs, missing u-flag), and record a follow-up to warn on unhandled patterns during server creation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The `(?ms).*` pattern matches any string, so it adds no validation constraint. Drop it entirely instead of routing through toJSRegex, which can't meaningfully convert inline-flag syntax to JavaScript. Also corrects the comment describing `safeString` — it filters private-use characters, not UTF-8 validity. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wrap the body in Enclose/throwInternalFailure and ConfirmBy the toJSRegex result, so a non-string return surfaces as an internal failure rather than silently producing an invalid schema. Add inline comments explaining why the `(?ms).*` pattern is safe to drop and what the remaining branch handles. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cover the regression from the recent regex-sanitization bugfix plus behavior for plain String, Number/Integer/Boolean, optional parameters, Help-as-description, enumerations, and PUA-character escaping. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add `Kernel/Utilities.wl` to the project structure listing in AGENTS.md and to the Related Files section of docs/tools.md, noting that `toolSchema` routes tool schema "pattern" fields through `toJSRegex` for JSON Schema / JavaScript compatibility. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor
There was a problem hiding this comment.
Pull request overview
This PR fixes invalid JSON Schema "pattern" strings emitted for MCP tool input schemas by adding a conversion pipeline that rewrites ICU/PCRE-style regex output into a JS/ECMA-262-compatible form, and by routing tool schema generation through that sanitizer.
Changes:
- Added
toJSRegex(Common-exported, implemented inKernel/Utilities.wl) to convert common ICU/PCRE constructs into JS-compatible regex strings. - Added
toolSchemainKernel/StartMCPServer.wland updated tool listing to sanitize"pattern"fields (and drop the redundant(?ms).*match-any pattern). - Added unit tests for both utilities and server behavior, plus documentation and spelling dictionary updates.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| TODO.md | Added a follow-up TODO about warning on unhandled regex constructs. |
| Tests/Utilities.wlt | New test coverage for toJSRegex conversions (POSIX classes, anchors, escapes, dotall rewriting). |
| Tests/StartMCPServer.wlt | New regression/behavior tests for toolSchema output patterns and schema structure. |
| Kernel/Utilities.wl | Implements toJSRegex and helper conversion routines. |
| Kernel/StartMCPServer.wl | Uses toolSchema for MCP tool inputSchema, sanitizing "pattern" values. |
| Kernel/CommonSymbols.wl | Exports toJSRegex from the Common context. |
| docs/tools.md | Documents the new sanitizer path (toolSchema) and toJSRegex. |
| AGENTS.md | Adds Utilities module mention including toJSRegex. |
| .cspell.json | Adds regex-related words used in code/tests/docs. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Match only the "(?:(?-...)" wrapper form emitted by PatternConvert around RegularExpression[] contents. The previous pattern would strip any "(?-m-s)"-style token anywhere in the regex, which could silently change semantics of user-supplied patterns that intentionally use inline modifiers mid-pattern. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the "\u{NNNNN}" escape (which is only valid under the JS regex "u"
flag) with a UTF-16 surrogate pair of "\uXXXX" escapes. JSON Schema
validators that do not set the "u" flag still match the composed
supplementary character via the surrogate pair, so converted patterns
stay valid without requiring any flag.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
convertHexEscape routed zero-padded escapes like "\x{0000A0}" into the
surrogate-pair path because it branched on the hex string's length. The
SupplementaryRange assert in supplementaryToSurrogatePair then failed.
Parse the code point once and branch on its numeric value so any
length/zero-padding form of \x{...} is handled correctly.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
"pattern"fields were emitted in ICU/PCRE forms (POSIX character classes,(?flags)pattern,\A/\z/\Z,\x{HEX}, etc.) that JSON Schema / ECMA 262 validators reject — most visibly, every plain-string parameter shipped(?ms).*, which was wrapped as/.*/msand treated as literal forward slashes.toJSRegexutility inKernel/Utilities.wlthat pipelines the output ofStringPattern`PatternConvertinto JS-compatible regex (strips leading(?flags), drops scope-less(?-m-s)modifiers, maps all 12 POSIX classes including nested[[:a:][:b:]], converts\A/\z/\Z, normalizes\x{HEX}to\xNN/\uNNNN, and rewrites unescaped.under dotall), and atoolSchemahelper inStartMCPServer.wlthat routes pattern fields through it and drops the redundant(?ms).*entirely.Enclose/Confirmso unexpected inputs surface as internal failures instead of silently producing invalid schemas. Added unit tests inTests/Utilities.wltandTests/StartMCPServer.wlt, and referenced the new module inAGENTS.mdanddocs/tools.md.Test plan
TestReportonTests/Utilities.wltpasses (coverstoJSRegexacross POSIX classes, flag stripping, anchor conversion, hex escapes, dotall rewriting)TestReportonTests/StartMCPServer.wltpasses (coverstoolSchemafor plain String, Number/Integer/Boolean, optional params, Help-as-description, enumerations, PUA escaping, and the(?ms).*regression)CodeInspectorclean onKernel/Utilities.wl,Kernel/StartMCPServer.wl, and the test files🤖 Generated with Claude Code