
fix: sanitize unpaired Unicode surrogates in recall and temporal storage#68

Merged
BYK merged 1 commit into main from fix/sanitize-surrogates
Apr 14, 2026


@BYK BYK commented Apr 14, 2026

Summary

Fixes the "The request body is not valid JSON: no low surrogate in string" error when using the recall tool.

  • Root cause: Tool outputs (bash, grep, read) can contain unpaired Unicode surrogates from binary file content. These survive through partsToText() into the temporal_messages DB, and when the recall tool includes them in its response, JSON serialization for the LLM API fails.
  • Fix: New sanitizeSurrogates() in src/markdown.ts replaces unpaired surrogates (a high surrogate with no following low surrogate, or a low surrogate with no preceding high surrogate) with U+FFFD. Applied at two layers:
    • Ingestion: partsToText() in src/temporal.ts — prevents bad data from entering the DB
    • Output: inline() in src/markdown.ts — sanitizes any pre-existing bad data when formatting recall results
  • 8 new tests covering surrogate sanitization edge cases and JSON round-trip safety
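
For readers unfamiliar with the technique, the regex-based replacement described above could look roughly like the sketch below. This is an illustration only: the pattern, the constant name UNPAIRED_SURROGATE, and the function body are assumptions, and the actual sanitizeSurrogates() in src/markdown.ts may be written differently.

```typescript
// Hypothetical sketch; not copied from src/markdown.ts.
// The pattern has no `u` flag, so it matches individual UTF-16 code units:
//   - a high surrogate (U+D800..U+DBFF) not followed by a low surrogate, or
//   - a low surrogate (U+DC00..U+DFFF) not preceded by a high surrogate.
const UNPAIRED_SURROGATE =
  /[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]/g;

function sanitizeSurrogates(text: string): string {
  // Replace each unpaired surrogate with U+FFFD (REPLACEMENT CHARACTER).
  return text.replace(UNPAIRED_SURROGATE, "\uFFFD");
}

// Usage: valid pairs (such as an emoji) pass through unchanged,
// while lone halves are replaced.
const dirty = "ok \uD83D\uDE00 bad \uD800 tail \uDC00";
const clean = sanitizeSurrogates(dirty);

// After sanitizing, the string survives a JSON round trip unchanged.
const roundTripped: string = JSON.parse(JSON.stringify(clean));
console.log(roundTripped === clean);
```

Sanitizing at both ingestion and output is redundant for new data, but the output-side pass is what covers rows already stored before the fix. On runtimes that support it, the built-in String.prototype.toWellFormed() performs the same replacement and could serve as an alternative.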

…all output

Tool outputs (bash, grep, read) can contain unpaired surrogates from binary
file content. These survive into the DB and break JSON serialization when
included in recall tool responses ('no low surrogate in string').

Fix: sanitize at ingestion (partsToText) and output (inline) using a regex
that replaces unpaired surrogates with U+FFFD.
@BYK BYK enabled auto-merge (squash) April 14, 2026 14:22
@BYK BYK mentioned this pull request Apr 14, 2026
@BYK BYK merged commit 8990b55 into main Apr 14, 2026
1 check passed
@BYK BYK deleted the fix/sanitize-surrogates branch April 14, 2026 14:23
@craft-deployer craft-deployer bot mentioned this pull request Apr 14, 2026
2 tasks