fix(opencode): repair common tool-input shape failures before retry#29412
Open
paymog wants to merge 1 commit into
Open
fix(opencode): repair common tool-input shape failures before retry#29412paymog wants to merge 1 commit into
paymog wants to merge 1 commit into
Conversation
Contributor
|
The following comment was made by an LLM, it may be inaccurate: Related PR Found:
Note on #29361:
No duplicate PRs found with the same repair functionality. |
Open-weight models (deepseek, glm, qwen, ...) emit a small, repeatable set of shape mistakes in tool-call arguments: null at optional fields, stringified JSON arrays, empty-object placeholders, and bare scalars in array positions. The current "Invalid tool input" prose error is rarely enough for the model to find the fix on its own; it loops on the same malformed call. Adds a validate-then-repair layer: schema decode runs unchanged, valid inputs are never touched. On failure, the parse error's own issue tree localizes the bug, four targeted shape repairs run at the failing paths, and we re-decode. Loops until the input parses, no further repair applies, or the round bound is hit; on terminal failure the original schema-level error is what reaches the model. Adds FilePathInput for fields that flow to fopen/stat. A small fraction of model emissions wrap paths in markdown auto-links like "[notes.md](http://notes.md)" - a post-training distribution leak from chat output applied where it makes no sense. FilePathInput unwraps only the degenerate case where link text equals the URL with the protocol stripped; real markdown passes through. read tool: surfaces the offset/limit pairing decision back to the model when only one was provided. The runtime already defaults the missing side; previously the model had no signal that this happened and could not self-correct on the next turn. Closes anomalyco#26498
eb2d3cc to
f8df27a
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Issue for this PR
Closes #26498
Type of change
What does this PR do?
Adds a validate-then-repair layer for tool-call arguments. When the schema decode of an LLM-emitted tool call fails, the parse error's own issue tree is walked to locate the failing paths, four targeted shape repairs are tried at those paths, and the input is re-decoded. Successful inputs are never touched.
The four repairs cover the failure modes that repeatedly come up against open-weight models in opencode issues (#26498, #26913, #24566, #22329, #24722) and similar codebases:
nullat an optional field → drop the key'["a","b"]'→ parse to a real array{}at an optional field → drop the keyInvalidType/AnyOfleaf) → wrap as[scalar]Ordering is load-bearing: (2) must precede (4), or a stringified array would be double-wrapped into
['["a","b"]']. The order is encoded in a singlerepairAtswitch so it stays a single source of truth.Effect Schema short-circuits at the first failing element of an array or struct, so a single repair pass can only fix one path.
recover()loops decode → repair → decode up to 6 rounds; if no repair applies or the bound is hit, it surfaces the original schema-level error to the model, never a repair-induced cascade.On success,
Effect.annotateCurrentSpan("tool.input_repaired", ...)records which repairs fired at which paths, so per-(model, tool) repair rates can be watched in telemetry and regressions on a specific contract surface before users notice.Two adjacent fixes for related failure modes:
FilePathInput(inpackages/core/src/schema.ts): a schema for fields that flow to filesystem operations. Some open-weight models occasionally emit paths wrapped in markdown auto-links like"[notes.md](http://notes.md)"— a post-training distribution leak from chat-style output applied where it makes no sense.FilePathInputunwraps only the degenerate case (link text equals URL with protocol stripped); real markdown such as[click](https://example.com)passes through untouched. Used byread,write,edit, andlsptools. Encoding the intent at the schema level plugs the leak for every path field at once.readtool pairing hint: the runtime already defaultsoffset=1andlimit=2000when one is omitted, but previously the model had no signal that this happened and could not self-correct on the next turn. Adds an informational note (noError:prefix, so the TUI does not paint it red) inside the<content>block when only one of the pair was provided, telling the model what default was applied.How this differs from #26496
#26496 closes the same issue with a DeepSeek-specific prompt-section that tells the model the expected JSON shape. That helps but it depends on the model following the instruction, scales linearly with the number of models, and pays a token cost on every turn. This PR fixes the contract at the parse boundary instead — the schema is the prior, repairs run only at paths the schema explicitly disagreed with, valid inputs pay zero overhead, and the same code path covers any model whose failures fall in the same shape catalogue. The two approaches are not mutually exclusive; the prompt-side change could still ship alongside this for the cases where prevention is cheaper than repair.
How did you verify your code works?
Tests (all in this PR):
packages/opencode/test/tool/repair.test.ts— 13 tests covering each repair in isolation, the load-bearing ordering case ('["a","b"]'must parse, not wrap), therecover()loop converging on a multi-path bad payload, and the unrepairable-failure case surfacing the original error.packages/core/test/schema.test.ts— 7 tests forFilePathInputcovering: plain paths pass through, degenerate auto-link unwraps in three URL-shape variants, real markdown links preserved, embedded auto-links inside longer paths handled correctly, the description annotation reaching the JSON Schema output.Existing tests:
bun test test/tool/— full opencode tool test suite (303 tests including JSON Schema snapshot tests for every tool, parameter accept/reject, and the existingTool.defineregression forInvalidArgumentsError) passes unchanged.bun typecheckclean across bothpackages/opencodeandpackages/core.The
FilePathInputdescription annotation is placed on the encoded side of thedecodeTotransformation so the JSON Schema emitted to the LLM still carries it — the snapshot tests forread,write,edit, andlspconfirm this.Screenshots / recordings
N/A — no UI change.
Checklist