fix(toolcall): eliminate strings.ToLower panics from Unicode case folding #460
Avoid lowercasing ignored XML tails in toolcall
…ding

Replace all strings.ToLower usage with ASCII case-insensitive matching (hasASCIIPrefixFoldAt, indexASCIIFold, hasDSMLPrefix) to prevent slice bounds errors when Unicode characters change byte length after case folding (e.g., Turkish İ U+0130 → i + combining dot: 2 bytes → 3 bytes).

Root cause: code created a strings.ToLower(text) copy, found byte positions in that copy, then used those positions to slice the original text. Byte offsets that were valid in the lowercased copy became out-of-bounds in the original when case folding changed byte lengths.

Files changed:
- toolcalls_scan.go: remove 5 lower usages, add hasDSMLPrefix
- toolcalls_parse_markup.go: remove 3 lower usages, add indexASCIIFold
- toolcalls_markup.go: SanitizeLooseCDATA lower removal
- toolcalls_parse.go: updateCDATAStateForStrip lower removal
- tool_prompt.go: align DSML pipe characters with tool call spec
- tool_prompt_test.go: fix pre-existing test character mismatch
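The root cause described above can be reproduced in isolation. The following is a minimal sketch, not code from this PR: the helper names lowerChangesByteLen and offsetInLowerCopy are hypothetical, and the sample input uses the Kelvin sign (U+212A) rather than the character named in the commit message. The sketch only checks that the byte length changes, not what the new length is.

```go
package main

import (
	"fmt"
	"strings"
)

// lowerChangesByteLen (hypothetical helper) reports whether strings.ToLower
// alters the byte length of s. When it does, byte offsets found in the
// lowercased copy no longer line up with the original string.
func lowerChangesByteLen(s string) bool {
	return len(strings.ToLower(s)) != len(s)
}

// offsetInLowerCopy (hypothetical helper) mimics the buggy pattern: it
// searches the lowercased copy and returns that offset. The second result
// says whether the offset can be trusted against the original text.
func offsetInLowerCopy(text, marker string) (int, bool) {
	lower := strings.ToLower(text)
	idx := strings.Index(lower, marker)
	return idx, idx >= 0 && len(lower) == len(text)
}

func main() {
	// U+212A KELVIN SIGN takes 3 bytes in UTF-8; its lowercase form is ASCII 'k'.
	text := "\u212A<tool_calls>"

	idx, ok := offsetInLowerCopy(text, "<tool_calls>")
	fmt.Println("offset in lowercased copy:", idx)
	fmt.Println("offset in original text:  ", strings.Index(text, "<tool_calls>"))
	fmt.Println("offset transferable:      ", ok)
}
```

Mixing an offset taken from one string with bounds taken from the other is what makes the slice expression fail, which is why the fix avoids creating a lowercased copy at all.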
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1e00e482a6
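Thank you for your project! This PR fixes … If it is convenient, I would appreciate push access to the dev branch, so that contributing future fixes and features is more efficient and does not require going through the fork PR flow every time. Thank you very much!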
The closing tag format was <｜/DSML｜tag> but must be </｜DSML｜tag>. The scanner's closing-tag detection checks text[1] == '/', so the slash must come immediately after '<', before the first full-width pipe (U+FF5C). Tags like <｜/DSML｜tool_calls> would not set closing=true and would not match any tool markup name.

Files fixed:
- internal/toolcall/tool_prompt.go: all closing tags
- internal/promptcompat/prompt_build_test.go: 1 test expectation
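The ordering constraint in this commit message can be illustrated with a small sketch. The function parseDSMLTagHead below is a hypothetical stand-in for the scanner's check, not the project's actual code; it assumes a tag head of the form '<', an optional '/', then the DSML marker wrapped in full-width pipes.

```go
package main

import (
	"fmt"
	"strings"
)

// fullwidthPipe is U+FF5C, written as an escape so the source stays ASCII.
const fullwidthPipe = "\uFF5C"

// parseDSMLTagHead (hypothetical) reports whether text starts a DSML tag and
// whether that tag is a closing tag. The slash is only recognized directly
// after '<', which mirrors the text[1] == '/' check described above.
func parseDSMLTagHead(text string) (isTag, closing bool) {
	if len(text) < 2 || text[0] != '<' {
		return false, false
	}
	rest := text[1:]
	if rest[0] == '/' { // equivalent to text[1] == '/'
		closing = true
		rest = rest[1:]
	}
	if !strings.HasPrefix(rest, fullwidthPipe+"DSML"+fullwidthPipe) {
		return false, false
	}
	return true, closing
}

func main() {
	good := "</" + fullwidthPipe + "DSML" + fullwidthPipe + "tool_calls>"
	bad := "<" + fullwidthPipe + "/DSML" + fullwidthPipe + "tool_calls>"

	fmt.Println(parseDSMLTagHead(good)) // slash right after '<'
	fmt.Println(parseDSMLTagHead(bad))  // slash after the first pipe
}
```

Under these assumptions the second form is rejected outright, because the byte after '<' is the first byte of the full-width pipe rather than a slash.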
Thank you for the PR. As the review above shows, mistaken changes to the tag format can be caught early.
The hasASCIIPrefixFoldAt and hasDSMLPrefix functions duplicate each other; they could be factored out and merged.
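One way to read this suggestion is to keep a single general prefix matcher and reduce the DSML check to a thin wrapper. The sketch below is only an illustration: the function names come from the PR, but their signatures and bodies here are assumptions, and lowerASCII is a hypothetical helper.

```go
package main

import "fmt"

// lowerASCII (hypothetical) lowercases A-Z only; every other byte, including
// all bytes of multi-byte UTF-8 sequences, passes through unchanged.
func lowerASCII(b byte) byte {
	if b >= 'A' && b <= 'Z' {
		return b + ('a' - 'A')
	}
	return b
}

// hasASCIIPrefixFoldAt reports whether text[at:] starts with prefix, ignoring
// ASCII case. It never allocates a copy, so offsets always refer to text.
// The signature is assumed for this sketch.
func hasASCIIPrefixFoldAt(text string, at int, prefix string) bool {
	if at < 0 || at > len(text) || len(text)-at < len(prefix) {
		return false
	}
	for i := 0; i < len(prefix); i++ {
		if lowerASCII(text[at+i]) != lowerASCII(prefix[i]) {
			return false
		}
	}
	return true
}

// hasDSMLPrefix is expressed through the general helper instead of carrying
// its own comparison loop. The signature is assumed for this sketch.
func hasDSMLPrefix(text string, at int) bool {
	return hasASCIIPrefixFoldAt(text, at, "dsml")
}

func main() {
	fmt.Println(hasDSMLPrefix("<DsMl|invoke>", 1))
	fmt.Println(hasDSMLPrefix("<tool_calls>", 1))
}
```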
PR #460 introduced fullwidth pipe characters (｜) in DSML tool call formatting to improve parsing robustness, but models exposed to these fullwidth pipes in system prompts exhibit significantly higher rates of tool output hallucinations. Reverting to halfwidth pipes (|) drastically reduces tokenizer/perplexity-driven hallucinations while retaining the existing confusable-hardening in the parser.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Summary
Replace all strings.ToLower(text) usage in the toolcall scanner with ASCII case-insensitive matching helpers (hasASCIIPrefixFoldAt, indexASCIIFold, hasDSMLPrefix) to prevent slice bounds out of range panics when Unicode characters change byte length after case folding.

Root cause: Code created a strings.ToLower(text) copy, found byte positions in that copy, then used those positions to slice the original text. When case folding changed byte lengths (e.g., Turkish İ U+0130 → i + combining dot: 2 bytes → 3 bytes), offsets valid in the lowercased copy became out-of-bounds in the original.

Impact: Panic at consumeToolMarkupNamePrefixOnce (toolcalls_scan.go:240) when processing tool markup containing certain Unicode characters.

Changes
- internal/toolcall/toolcalls_scan.go: remove 5 lower usages, add hasDSMLPrefix helper
- internal/toolcall/toolcalls_parse_markup.go: remove 3 lower usages, add indexASCIIFold helper
- internal/toolcall/toolcalls_markup.go: SanitizeLooseCDATA lower removal
- internal/toolcall/toolcalls_parse.go: updateCDATAStateForStrip lower removal
- internal/toolcall/tool_prompt.go
- internal/toolcall/tool_prompt_test.go
- internal/promptcompat/prompt_build_test.go
- internal/httpapi/claude/handler_util_test.go

Test Plan
- go test ./internal/toolcall/...: all 55+ tests pass
- go test ./...: full suite passes
- bash scripts/lint.sh: 0 issues
- npm run build --prefix webui: builds successfully
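For readers unfamiliar with the approach in the summary, an ASCII-only case-insensitive search might look like the following. This is a sketch under assumptions: indexASCIIFold is named in the PR, but the signature and body here are guesses, and equalASCIIFold is a hypothetical helper. The point is that the returned offset is always a byte offset into the original string, so slicing with it is safe.

```go
package main

import "fmt"

// equalASCIIFold (hypothetical) compares two equal-length strings byte by
// byte, treating A-Z and a-z as equal and leaving all other bytes untouched.
func equalASCIIFold(a, b string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := 0; i < len(a); i++ {
		x, y := a[i], b[i]
		if x >= 'A' && x <= 'Z' {
			x += 'a' - 'A'
		}
		if y >= 'A' && y <= 'Z' {
			y += 'a' - 'A'
		}
		if x != y {
			return false
		}
	}
	return true
}

// indexASCIIFold returns the byte offset of the first ASCII case-insensitive
// match of needle in text, or -1 if there is none. The signature is assumed.
func indexASCIIFold(text, needle string) int {
	if len(needle) == 0 {
		return 0
	}
	for i := 0; i+len(needle) <= len(text); i++ {
		if equalASCIIFold(text[i:i+len(needle)], needle) {
			return i
		}
	}
	return -1
}

func main() {
	// U+0130 occupies 2 bytes, so the CDATA opener starts at byte offset 2.
	text := "\u0130<![CDATA[payload]]>"
	idx := indexASCIIFold(text, "<![cdata[")
	fmt.Println(idx, text[idx:])
}
```

The quadratic scan is acceptable for short markup needles; a production helper could first look for the needle's leading byte before comparing the rest.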