fix: bring Claude Code tool-call flow back to Anthropic-equivalent behaviour by aict666 · Pull Request #54 · dwgx/WindsurfAPI

aict666 · 2026-04-24T22:36:02Z

Summary

Three independent regressions were stacking on top of each other and collectively prevented Opus-class clients (Claude Code v2.1.x in particular) from completing tool-emulated turns through the proxy. Bundled into one PR because they all surface as the same user-visible symptom ("Opus refuses to call tools / wastes 3 turns on broken paths") and the regression tests for each one only make sense in the presence of the others.

Reproducer: from Claude Code v2.1.114 (Opus 4.7) routed through the proxy, send `检查一下，所有分支里。有没有按照压缩包的方案处理过一版` ("Take a look across all branches: has a version ever been done following the approach in the zip archive?"). Before this patch, Opus replies with "The user's message appears to contain a pasted system prompt for a different agent (Claude Code) along with what looks like a stray tool error result." After this patch, Opus calls `pwd` and then `git branch -a` directly.

1. Tool preamble was shaped like a Claude Code system prompt

`buildToolPreamble` emitted ~1600 chars of `### ToolName` / `parameters schema:` / ```` ```json {…}``` ```` blocks prepended to the user turn. That is the exact signature of Claude Code's own system prompt, so when the same shape appeared in a **user** slot, Opus 4.7 reliably flagged it as "a pasted Claude Code system prompt in the user turn" and refused to emit any `<tool_call>` blocks.

Fix: rewrite the user-message fallback to a single ~330-char line: protocol description plus tool name list, no schemas. Full schemas still go through the authoritative proto-level `tool_calling_section` override (`buildToolPreambleForProto`, unchanged). NO_TOOL-mode models (#22) that ignore SectionOverride still see the tool names and the emission protocol, so the fallback continues to do its job.

The new regression test asserts that the preamble:

- has no `^###` headers, no `parameters schema:`, and no fenced ```` ```json ```` blocks
- stays under 512 chars even with many tools
- still names every tool and still describes the `<tool_call>` protocol
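A minimal sketch of the compact fallback and of the three shape checks listed above. `buildCompactToolPreamble` and `checkPreambleShape` are hypothetical names, and the banner wording is illustrative rather than copied from `buildToolPreamble`; only the constraints (no `###` headers, no `parameters schema:`, no fenced JSON, under 512 chars, every tool named, `<tool_call>` protocol described) come from this PR.

```javascript
// Hypothetical compact fallback: protocol description plus tool names only.
// Per-tool schemas are deliberately absent; in this PR they travel via the
// proto-level tool_calling_section override instead.
function buildCompactToolPreamble(tools) {
  const names = tools.map((t) => t.name).join(", ");
  return (
    `Tools available this turn: ${names}. ` +
    'To call one, emit <tool_call>{"name":"...","arguments":{...}}</tool_call>. ' +
    "After the last <tool_call>, stop generating; results come back next turn as " +
    '<tool_result tool_call_id="...">...</tool_result>.'
  );
}

// Shape checks mirroring the regression assertions; returns a list of problems.
function checkPreambleShape(preamble, tools) {
  const problems = [];
  if (/^###/m.test(preamble)) problems.push("markdown ### header");
  if (preamble.includes("parameters schema:")) problems.push("schema label");
  if (/`{3}json/.test(preamble)) problems.push("fenced json block");
  if (preamble.length >= 512) problems.push(`too long (${preamble.length} chars)`);
  for (const { name } of tools) {
    if (!preamble.includes(name)) problems.push(`missing tool name ${name}`);
  }
  if (!preamble.includes("<tool_call>")) problems.push("no <tool_call> protocol");
  return problems;
}

const tools = ["Bash", "Read", "Edit", "Grep"].map((name) => ({ name }));
console.log(checkPreambleShape(buildCompactToolPreamble(tools), tools)); // []
```

Keeping the schemas out of this string is the point of the change: the fallback only has to tell NO_TOOL-mode models which names exist and how to emit a call.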

2. Redaction marker kept getting reused by the model as a real path

Every prose-shaped REDACTED_PATH marker we have ever shipped has regressed:

| Marker | Failure mode |
|---|---|
| `./tail` | LLM Reads `./src/main.py` → ENOENT → loop |
| `[internal]` | LLM runs `ls [internal]` → ENOENT → loop |
| `<redacted-path>` | LLM passes it to Read/Bash → ENOENT (Linux) / Errno 22 (Windows) → loop |
| `(internal path redacted)` | zsh parses `cd (…)` as glob-qualifier syntax → cryptic `unknown file attribute: i` → Opus gets confused, stops calling tools |
| `redacted internal path` | Opus echoes it verbatim into bash: `cd redacted internal path && git…` → `cd: too many arguments` → wastes 2-3 turns before finally calling `pwd` |

Common cause: any marker that contains ASCII word characters reads as a plausible path or identifier to the model, and the model puts it back into the next tool_call.

Fix: switch REDACTED_PATH to a single Unicode ellipsis (U+2026, …).

- Zero ASCII word chars
- No shell metacharacters
- Three UTF-8 bytes
- Universally read by humans and LLMs as "content omitted"
- Essentially no training data of `cd …` or `Read("…")` as a real operation, so the reuse-as-path loop never starts
- If the model still echoes it into a shell, the failure mode is a single clean ENOENT instead of a cryptic glob-qualifier error
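A minimal sketch of applying the new marker. `REDACTED_PATH` is the constant named above; `INTERNAL_PATH_RE` and `redactInternalPaths` are hypothetical, and scrubbing only `/tmp/windsurf-workspace` (the upstream scratch workspace named later in this PR) is an assumption about what the proxy actually redacts.

```javascript
// U+2026 HORIZONTAL ELLIPSIS: no ASCII word characters, no shell metacharacters.
const REDACTED_PATH = "\u2026";

// Hypothetical pattern: the upstream scratch workspace plus any sub-path.
// The proxy's real list of scrubbed paths is not shown in this PR.
const INTERNAL_PATH_RE = /\/tmp\/windsurf-workspace(?:\/[^\s"'`]*)?/g;

function redactInternalPaths(text) {
  return text.replace(INTERNAL_PATH_RE, REDACTED_PATH);
}

const out = redactInternalPaths("Listing /tmp/windsurf-workspace/src shows 4 files");
console.log(out); // Listing … shows 4 files
console.log(Buffer.byteLength(REDACTED_PATH, "utf8")); // 3
```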

The new marker-shape regression suite guards against this whole class of regression by asserting that the marker contains:

- no shell metacharacters from ``()[]{}<>|&;$`"'\*?``
- no `/` and no `\`
- zero ASCII word characters (`[A-Za-z0-9_]`), which closes the door on every prose-shaped marker we have already regressed through
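The three shape rules translate directly into a small checker. This is a sketch assuming the marker is a plain JavaScript string; `markerShapeViolations` is a hypothetical name, not the test helper in the repo.

```javascript
// Marker-shape rules from the regression suite: shell metacharacters,
// path separators, and ASCII word characters are all disallowed.
const SHELL_METACHARS = "()[]{}<>|&;$`\"'\\*?";

function markerShapeViolations(marker) {
  const problems = [];
  for (const ch of SHELL_METACHARS) {
    if (marker.includes(ch)) problems.push(`shell metacharacter ${ch}`);
  }
  if (marker.includes("/") || marker.includes("\\")) problems.push("path separator");
  if (/[A-Za-z0-9_]/.test(marker)) problems.push("ASCII word character");
  return problems;
}

// Every previously shipped marker is rejected; U+2026 passes.
const candidates = ["./tail", "[internal]", "<redacted-path>", "(internal path redacted)", "redacted internal path", "\u2026"];
for (const marker of candidates) {
  const verdict = markerShapeViolations(marker).length === 0 ? "ok" : "rejected";
  console.log(`${JSON.stringify(marker)}: ${verdict}`);
}
```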

3. `neutralizeCascadeIdentity` missed common Cascade-isms

Existing patterns covered "I am Cascade" / "I'm Cascade" / "developed by Codeium" but missed other common third-person leaks that the upstream planner produces. Added:

- `Cascade is an? (?:AI )?(?:coding )?assistant`
- `(?:As|Acting as) Cascade`
- `(?:Codeium|Windsurf)('s)? Cascade`
- `built by (?:Codeium|Windsurf)`
- `(?:the )?Cascade('s)? workspace` → "the workspace" (the leading article is consumed by the same regex, so we never produce the "the the workspace" double-article artefact)

Exported neutralizeCascadeIdentity from chat.js (matches the existing shouldUseCascadeReuse export pattern) and added 7 unit tests covering first-person, third-person, provider attribution, workspace narration, the dictionary-word negative case (a cascade of rocks), unknown-provider passthrough, and falsy inputs.
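A hedged sketch of how the added patterns can be applied as chained `String.prototype.replace` calls. Only the workspace rewrite (to "the workspace", with the article consumed) is specified above; the `name` / `provider` replacement strings are placeholders, and `neutralizeIdentitySketch` is not the exported `neutralizeCascadeIdentity`, whose signature is not shown in this PR.

```javascript
// Hypothetical scrubber in the spirit of neutralizeCascadeIdentity. The PR
// only specifies the workspace rewrite ("the workspace"); the name/provider
// replacements below are illustrative placeholders. Matching is
// case-sensitive, so dictionary uses like "a cascade of rocks" pass through.
function neutralizeIdentitySketch(text, name = "Claude", provider = "Anthropic") {
  if (!text) return text; // falsy inputs are returned unchanged
  return text
    // Consume an optional leading article so "the Cascade workspace" becomes
    // "the workspace" rather than "the the workspace".
    .replace(/\b(?:(the|The) )?Cascade(?:'s)? workspace\b/g, (_, article) =>
      article === "The" ? "The workspace" : "the workspace")
    // First-person identity ("I am Cascade" / "I'm Cascade").
    .replace(/\bI(?: am|'m) Cascade\b/g, `I am ${name}`)
    // Third-person self-description.
    .replace(/\bCascade is an? (?:AI )?(?:coding )?assistant\b/g, `${name} is an AI assistant`)
    .replace(/\b(As|Acting as) Cascade\b/g, `$1 ${name}`)
    // Provider attribution.
    .replace(/\b(?:Codeium|Windsurf)(?:'s)? Cascade\b/g, name)
    .replace(/\b(?:built|developed) by (?:Codeium|Windsurf)\b/g, `built by ${provider}`);
}

console.log(neutralizeIdentitySketch("As Cascade, I can see the Cascade workspace."));
// As Claude, I can see the workspace.
```

Because the patterns are case-sensitive on `Cascade`, the dictionary-word case passes through untouched, mirroring the negative test listed above.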

Tests

npm test — 80/80 pass (72 existing + 7 identity + 1 marker-shape).

Risk

Low. All three changes are purely cosmetic on the wire — the protocol contract, parser, streaming pipeline, tool-result round-trip, and proto-level injection are untouched. Behaviour change is strictly less aggressive: the preamble is gentler, the marker is shorter, the identity rewrites are more thorough. No new dependencies, no API changes.


After the marker/preamble/identity fixes earlier in this PR, Claude Code v2.1.114 still confidently narrated the wrong workspace contents. When the user was sitting in `~/IdeaProjects/flux-panel` (12+ files including `flux-panel.zip`), Opus would issue an LS, get the real listing back, and then describe the workspace as containing exactly `src/index.js`, `package.json`, `README.md`, `.gitignore`: the canonical empty `npm init` shape, which is what `/tmp/windsurf-workspace` looks like.

Confirmed root cause from the proxy probe logs:

    Probe[56jqpl] turns=16 chars=37283
      msg[0] role=system    len=28775  ← Claude Code system prompt
      msg[1] role=user      len=7762   ← <system-reminder> + <env> + question
      msg[2] role=assistant len=44     ← "I need to understand the workspace"

The caller's `<env>` block (containing `Working directory: /Users/.../flux-panel`) IS forwarded to Cascade: client.js prepends the role=system text to the user-message text. But Cascade's planner system prompt is structurally more authoritative to the upstream model than user-message text, and the planner's prompt asserts that the workspace is `/tmp/windsurf-workspace`. The model picks Cascade's "authoritative" workspace over the merely-prepended-as-text caller env, and the LS / Read tool calls land on the wrong path.

Telltale: the screenshot of Opus' reply showed `当前 … 只有` ("currently … contains only"). The ellipsis is OUR redaction marker (PR dwgx#54), proof that Cascade narrated `/tmp/windsurf-workspace` and we scrubbed it, leaving the model reasoning about its prior belief about that scratch dir.

Fix: lift the canonical Claude Code `<env>` keys (Working directory, Is directory a git repo, Platform, OS Version) out of the caller's messages and re-emit them as an "Authoritative environment for this session" block at the very top of the proto-level `tool_calling_section` override. `tool_calling_section` IS the authoritative system slot the upstream model reads, so the env block sits in a slot that Cascade's planner prior cannot override.
The surrounding wording in `buildToolPreambleForProto` explicitly tells the model to prefer this environment over any prior workspace assumption.

Implementation:

- `extractCallerEnvironment(messages)` in chat.js. Lenient regex parser, anchored to start-of-line so it never false-positives on prose mentions of "working directory". Matches Claude Code's `<env>` block (in role=system or in `<system-reminder>` user blocks), the loose `cwd:` form some agents emit, and content-block array text. Rejects empty / control-char / U+2026-only values so a stale redacted marker can never re-enter the system prompt.
- `buildToolPreambleForProto` now takes an optional `environment` third argument. When non-empty, the preamble starts with:

      ## Authoritative environment for this session
      The facts below are provided by the calling agent and describe the REAL execution context. Tool calls MUST operate on these paths. Ignore any workspace path you may have inferred from earlier instructions or training priors.
      - Working directory: …
      - Is the directory a git repo: …
      - Platform: …
      - OS version: …
      ---

  followed by the existing protocol header / behaviour suffix / `### ToolName` schema blocks. When no env is provided, the output shape is byte-identical to PR dwgx#54.
- chat.js wires `extractCallerEnvironment` into the existing `buildToolPreambleForProto` call site behind the same `emulateTools` guard, so non-tool-emulated requests are unaffected.

Tests: 92/92 pass (80 from earlier in this PR + 8 `extractCallerEnvironment` parser tests + 4 `buildToolPreambleForProto`-with-env tests asserting ordering, opt-in shape, and back-compat).
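A minimal sketch of the line-anchored extraction and the env block described above, assuming OpenAI-style messages whose `content` is either a string or an array of `{ type: "text", text }` blocks. `extractCallerEnvironmentSketch` and `renderEnvironmentBlock` are hypothetical stand-ins for `extractCallerEnvironment` and the env branch of `buildToolPreambleForProto`; the regexes are illustrative, not the ones in chat.js.

```javascript
// Illustrative line-anchored patterns for Claude Code's <env> keys.
// [ \t]* (not \s*) keeps an empty value from spilling onto the next line.
const ENV_KEYS = [
  ["cwd", /^[ \t]*Working directory[ \t]*[:=][ \t]*(.+)$/im],
  ["isGitRepo", /^[ \t]*Is directory a git repo[ \t]*[:=][ \t]*(.+)$/im],
  ["platform", /^[ \t]*Platform[ \t]*[:=][ \t]*(.+)$/im],
  ["osVersion", /^[ \t]*OS Version[ \t]*[:=][ \t]*(.+)$/im],
];

// Flatten string content or an array of { type: "text", text } blocks.
function messageText(msg) {
  if (typeof msg.content === "string") return msg.content;
  if (Array.isArray(msg.content)) {
    return msg.content
      .filter((b) => b && b.type === "text" && typeof b.text === "string")
      .map((b) => b.text)
      .join("\n");
  }
  return "";
}

// Reject empty values, control characters, and bare U+2026 redaction markers.
function isUsableValue(v) {
  return v.length > 0 && !/[\u0000-\u001f\u007f]/.test(v) && !/^\u2026+$/.test(v);
}

function extractCallerEnvironmentSketch(messages) {
  const env = {};
  for (const msg of messages) {
    const text = messageText(msg);
    for (const [key, re] of ENV_KEYS) {
      if (env[key]) continue; // first usable occurrence wins
      const m = text.match(re);
      const value = m ? m[1].trim() : "";
      if (isUsableValue(value)) env[key] = value;
    }
  }
  return env;
}

// Render the env block in the shape quoted above; "" when nothing was lifted.
function renderEnvironmentBlock(env) {
  const rows = [
    ["Working directory", env.cwd],
    ["Is the directory a git repo", env.isGitRepo],
    ["Platform", env.platform],
    ["OS version", env.osVersion],
  ].filter(([, value]) => value);
  if (rows.length === 0) return "";
  return [
    "## Authoritative environment for this session",
    "The facts below are provided by the calling agent and describe the REAL " +
      "execution context. Tool calls MUST operate on these paths. Ignore any " +
      "workspace path you may have inferred from earlier instructions or training priors.",
    ...rows.map(([label, value]) => `- ${label}: ${value}`),
    "---",
  ].join("\n");
}

const messages = [
  { role: "system", content: "<env>\nWorking directory: /Users/me/flux-panel\nIs directory a git repo: Yes\nPlatform: darwin\nOS Version: Darwin 25.4.0\n</env>" },
  { role: "user", content: [{ type: "text", text: "what is in this repo?" }] },
];
console.log(renderEnvironmentBlock(extractCallerEnvironmentSketch(messages)));
```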

The earlier `extractCallerEnvironment` regex required the canonical `<env>` block format (`Working directory: /path` on its own line with a `:` or `=` separator). Claude Code v2.1.114 dropped that block in favour of inline prose: "You are an interactive agent that helps users with software engineering tasks and the current working directory is /Users/..."

Diagnosed via a one-shot probe diagnostic: when `extractCallerEnvironment` returns empty under `emulateTools`, log the nearest env-shaped substring in the request so we can immediately see what wording the caller used. The probe surfaced:

    env NOT lifted (extractor returned empty); nearest env-shaped substring: tware engineering tasks and the current working directory

Fix: accept TWO forms in the cwd regex:

- (a) line-anchored key/value (older Claude Code, opencode, Cline): `Working directory: /path`, `cwd: /path`, `<cwd> /path` (now also tolerates a `- ` / `* ` bullet prefix and backticks around the path)
- (b) prose form (Claude Code 2.1+): `...current working directory is /path` (no separator, no newline anchor; the path simply trails the phrase)

The captured slot is locked to `[/~]...`, so abstract prose mentions ("the working directory you choose", "the working directory in the docs") never produce a match: the captured group must look like an actual filesystem path. The other env keys (Is directory a git repo, Platform, OS Version) keep the line-anchored form but pick up the same bullet-prefix tolerance for consistency. Capture-group handling in the loop is updated to read whichever alternative captured (`match[1] || match[2]`), so both forms work.

The probe diagnostic itself is kept in the commit because it remains useful: the next time a client changes its env wording, the log line will surface the new form within seconds instead of hours of guesswork.

Tests: 94/94 pass. New cases:

- cwd from Claude Code 2.1+ prose form
- cwd from prose form with backticks around the path
- "abstract prose without an actual path" (negative case)
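The two-alternative cwd pattern can be sketched as a single regex whose groups are read with `match[1] || match[2]`. This is an illustrative reconstruction, not the regex from chat.js: `CWD_RE` and `extractCwdSketch` are hypothetical names, the sketch assumes paths contain no whitespace, and stripping trailing sentence punctuation from the prose form is an assumption the commit message does not state.

```javascript
// Illustrative two-alternative cwd matcher (not the regex from chat.js):
//  (a) line-anchored key/value with optional "- " / "* " bullet and backticks:
//      "Working directory: /path", "cwd: /path", "<cwd> /path"
//  (b) prose form: "...current working directory is /path"
// Both capture groups must start with "/" or "~", so abstract prose such as
// "the working directory you choose" never matches.
const CWD_RE =
  /^[ \t]*(?:[-*][ \t]+)?(?:Working directory|cwd|<cwd>)[ \t]*[:=]?[ \t]*`?([\/~][^\s`]*)`?|current working directory is[ \t]+`?([\/~][^\s`]*)`?/im;

function extractCwdSketch(text) {
  const match = text.match(CWD_RE);
  if (!match) return "";
  const raw = match[1] || match[2]; // whichever alternative captured
  // Assumption: trailing "." / "," / ";" / ":" after a path is sentence punctuation.
  return raw.replace(/[.,;:]+$/, "");
}

console.log(extractCwdSketch("...software engineering tasks and the current working directory is /Users/me/flux-panel."));
// /Users/me/flux-panel
```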

Live regression on Claude Code v2.1.114: the new prose-form regex correctly rejects the boilerplate "...the current working directory." mention (no path follows), so cwd extraction returns empty. Platform and OS Version lifting still succeed, so the env block was rendered with ONLY:

    ## Authoritative environment for this session
    Tool calls MUST operate on these paths. Ignore any workspace path you may have inferred from earlier instructions or training priors.
    - Platform: darwin
    - OS version: Darwin 25.4.0

Opus 4.7 saw an authoritative-shaped block whose body is operationally useless (no path to operate on) and pattern-matched it as injection noise: "It looks like the conversation context got mixed up — the message I received is a system prompt for Claude Code along with truncated tool output from a previous session, rather than an actual question or task from you." The model then refused to proceed.

Fix: gate the entire env block on cwd presence. Platform/OS/git status without cwd are useless for the original goal (telling the model where to run tools); emitting them anyway turns the env block into noise that trips Opus' injection guard. The rule "no cwd → no block" both removes the noise and lets the model learn cwd via its own `pwd` tool call, which already works on every Anthropic-format client we have tested (verified live: `Bash(pwd && ls -la)` returned the real `/Users/jaxyu/IdeaProjects/flux-panel` listing on the same trace).

Tests: 94/94 pass.
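The "no cwd → no block" rule amounts to an early return. A sketch, assuming the lifted environment is a plain object with optional `cwd`, `isGitRepo`, `platform` and `osVersion` fields (hypothetical names); the labels and intro sentence follow the env block quoted in the earlier commit message, and `renderEnvironmentBlockGated` is not a function from the repo.

```javascript
// Sketch of the gate: Platform / OS version / git status are only emitted
// alongside a working directory. Without a path the block is pure noise.
function renderEnvironmentBlockGated(env) {
  if (!env || !env.cwd) return ""; // no cwd → no block
  const rows = [
    ["Working directory", env.cwd],
    ["Is the directory a git repo", env.isGitRepo],
    ["Platform", env.platform],
    ["OS version", env.osVersion],
  ].filter(([, value]) => value);
  return [
    "## Authoritative environment for this session",
    "The facts below are provided by the calling agent and describe the REAL " +
      "execution context. Tool calls MUST operate on these paths. Ignore any " +
      "workspace path you may have inferred from earlier instructions or training priors.",
    ...rows.map(([label, value]) => `- ${label}: ${value}`),
    "---",
  ].join("\n");
}

// The failing trace: Platform and OS version lifted, cwd missing → nothing emitted.
console.log(JSON.stringify(renderEnvironmentBlockGated({ platform: "darwin", osVersion: "Darwin 25.4.0" }))); // ""
```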

This is the actual root cause of the "context got truncated", 40KB-of-rambling, max_wait=180s, toolCalls=0 failure mode that has been chasing us across Claude Code v2.1.114 / Opus 4.7 sessions.

Background. We emit OpenAI-style `tools[]` to Cascade by:

1. proto-level: `tool_calling_section` + `additional_instructions_section` overrides carrying the full protocol + per-tool schemas (authoritative).
2. user-message fallback: a compact `Tools available this turn: A, B, … To call one, emit <tool_call>…</tool_call>` banner prepended to the LAST user message, for models that ignore proto overrides on turn 1.

`normalizeMessagesForCascade` also rewrites `role:'tool'` messages into `role:'user'` wrappers shaped `<tool_result tool_call_id="…">…</tool_result>`, so the model sees its tool history in a single conversational format.

The bug. In multi-turn tool-using conversations, by turn 3+ the LAST user message is *always* a synthetic `<tool_result>` wrapper, never the caller's real user input. The fallback loop happily prepends the "Tools available this turn: …" banner to that wrapper anyway, so the model sees:

    Tools available this turn: Bash, Read, Edit, … To call one, emit <tool_call>{"name":"…","arguments":{…}}</tool_call>. After the last <tool_call>, stop generating; the caller returns results in the next turn as <tool_result tool_call_id="…">…</tool_result>.

    <tool_result tool_call_id="call_42">[real tool output]</tool_result>

Opus 4.7 reliably pattern-matches this as either prompt injection or a truncated conversation: the banner has the SHAPE of an opening "here are your tools" announcement, but it appears in the middle of an active tool-using session, immediately in front of a `tool_result`. The model concludes the conversation is corrupted and refuses to keep calling tools.
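A sketch of the `role:'tool'` rewrite described above, showing why the last user message ends up being a synthetic wrapper once a tool has run. `normalizeToolMessagesSketch` is a hypothetical name and the OpenAI-style `tool_call_id` field is an assumption; only the `<tool_result tool_call_id="…">…</tool_result>` shape comes from the commit message, and details such as escaping are not covered.

```javascript
// Fold role:"tool" messages into role:"user" wrappers shaped
// <tool_result tool_call_id="...">...</tool_result>; other messages pass through.
function normalizeToolMessagesSketch(messages) {
  return messages.map((msg) =>
    msg.role === "tool"
      ? {
          role: "user",
          content: `<tool_result tool_call_id="${msg.tool_call_id}">${msg.content}</tool_result>`,
        }
      : msg,
  );
}

// After one tool round-trip, the last "user" message is already synthetic.
const history = normalizeToolMessagesSketch([
  { role: "user", content: "check all branches" },
  { role: "assistant", content: '<tool_call>{"name":"Bash","arguments":{"command":"pwd"}}</tool_call>' },
  { role: "tool", tool_call_id: "call_42", content: "/Users/me/flux-panel" },
]);
const lastUser = history.filter((m) => m.role === "user").at(-1);
console.log(lastUser.content);
// <tool_result tool_call_id="call_42">/Users/me/flux-panel</tool_result>
```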
Live trace (Claude Code v2.1.114, Opus 4.7, 26 tools, real session debugging the flux-panel zip):

    turn= 2 Cascade done {ms= 5051 toolCalls=0 textLen=147}
    turn= 4 Cascade done {ms= 5045 toolCalls=0 textLen=147}
    turn= 6 Cascade done {ms= 5041 toolCalls=0 textLen=147}
    turn= 8 Cascade done {ms= 5545 toolCalls=0 textLen=147}
    …
    turn=22 Cascade done {ms=180163 toolCalls=0 textLen=42474 reason=max_wait polls=357}

Three minutes of confused prose, zero tool calls, then max_wait. The visible output to the user was, in exactly these words: "It looks like the conversation got truncated and I'm seeing fragments of tool output without a clear current request from you."

Fix. Skip the user-message preamble injection when the target user message starts with `<tool_result`. The proto-level overrides already carry the authoritative protocol; the fallback only exists to bootstrap the very first turn for models that ignore proto. Replaying it on every later turn is pure noise and (now confirmed) actively harmful.

Tests: 97/97 pass, including 3 new regression cases:

- injects preamble on a first-turn real user message
- does NOT inject preamble when the last user message is a synthetic tool_result
- still injects on the latest real user turn even when older turns contain tool_results
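A sketch of the injection guard, assuming OpenAI-style messages with string `content`. `withFallbackPreamble` is a hypothetical name; the three regression cases listed above map onto a first-turn message (banner added), a trailing `<tool_result>` wrapper (no banner), and a later real user turn after older tool results (banner added).

```javascript
// Prepend the compact fallback banner only when the most recent user message
// is real caller input. A message starting with "<tool_result" is a synthetic
// wrapper, so it is left untouched and the proto-level overrides carry the
// protocol on their own. This sketch only handles string content.
function withFallbackPreamble(messages, banner) {
  let idx = -1;
  for (let i = messages.length - 1; i >= 0; i--) {
    if (messages[i].role === "user") {
      idx = i;
      break;
    }
  }
  if (idx === -1) return messages;
  const target = messages[idx];
  if (typeof target.content !== "string" || target.content.startsWith("<tool_result")) {
    return messages;
  }
  const copy = messages.slice(); // do not mutate the caller's array
  copy[idx] = { ...target, content: `${banner}\n\n${target.content}` };
  return copy;
}
```

Returning the original array unchanged on the `<tool_result` path keeps later turns byte-for-byte identical to what the normalizer produced.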

Stop guessing. When the polling loop times out instead of seeing a clean idle_done, the model has been generating tokens for ~3 minutes straight without yielding a stop signal. We have no idea what those tokens look like — the existing summary just shows textLen=34201 and moves on. Dumping a bounded head + tail of the accumulated text gives us the shape of the failure (generation loop? rejected tool-call format? mid-thought truncation?) in a single log line, which is what we need to actually fix the underlying cause instead of chasing preamble shapes.
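A sketch of such a bounded dump. The character budget, the `headTailPreview` name, and the log format are assumptions, not values from this commit. JSON-encoding each slice keeps newlines escaped, so the preview stays on a single log line.

```javascript
// Bounded head + tail preview for the max_wait log line.
// JSON.stringify escapes newlines and quotes, so the result is one line.
function headTailPreview(text, budget = 600) {
  if (text.length <= budget) return JSON.stringify(text);
  const half = Math.floor(budget / 2);
  const omitted = text.length - 2 * half;
  const head = JSON.stringify(text.slice(0, half));
  const tail = JSON.stringify(text.slice(text.length - half));
  return `${head} ...[${omitted} chars omitted]... ${tail}`;
}

const accumulated = "Let me look at the workspace again.\n".repeat(1000);
console.log(`Cascade max_wait textLen=${accumulated.length} preview=${headTailPreview(accumulated, 120)}`);
```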

…ly-zhang to S+

- baily-zhang PR #61 (Opus 4.7 multimodal context bloat): third major contribution after #36 and #45, now de-facto maintainer of the reuse-fingerprint / trajectory-offset machinery
- abwuge PR #58 (docker/nginx deploy fix): first-time contributor, +3/-2 surgical, unblocked the docker-compose Restart loop
- aict666 PR #54 (tool preamble slimming + redact marker 6th-gen U+2026 ellipsis + identity coverage extension): fourth major contribution
- aict666 PR #53 (redact marker shell-safety regression): second contribution, was missing from the prior credits update
- baily-zhang upgraded from S to S+ (parity with aict666)

The Pages site at dwgx.github.io/WindsurfAPI/ had only 4 names listed in the footer (dd373156, colin1112a, motto1, youfak). 8 contributors were missing from the public site even though most of them landed S+/S level fixes (aict666 #44/#51/#53/#54, baily-zhang #36/#45/#61, smeinecke #43, abwuge #58).

Adds a dedicated `#contributors` section before the footer with one card per contributor: avatar, GitHub link, weight badge (S+/S/A+/A/B+), PR list, and a one-paragraph Traditional Chinese (繁體中文) description of what each fix actually solved. Cards reuse the existing panel-card warm/coral palette to fit the site's aesthetic. The footer one-liner is also expanded to all 8 names ordered by weight, with a "完整名單 ↑" ("full list ↑") anchor back to the new section.

CSS additions: contrib-grid, contrib-card, contrib-avatar, contrib-weight + 5 weight-tier classes (-S-plus, -S, -A-plus, -A, -B-plus). All gradient/hover behaviour matches the existing panel-card styling.

baily-zhang's fourth landed PR (#36 / #45 / #61 / #62): adding the PR #62 entry to the dashboard credits panel as a separate card so the inline-script regression-test win is visible alongside the cascade-reuse machinery work.

v2.0.5 covers everything since 2.0.4:

- aict666 #54 tool preamble slim + redact U+2026
- abwuge #58 docker/nginx deploy fix
- baily #61 Opus 4.7 multimodal context bloat
- baily #62 dashboard escape regression
- own commits: empty-message validator, internal_error backoff, upstream_transient_error category, Opus 4.6 reuse widening, /v1/responses endpoint for Codex CLI compatibility (#56, #63)

aict666 added 6 commits April 25, 2026 06:35

dwgx merged commit 9a390c1 into dwgx:master Apr 25, 2026
2 checks passed

dwgx mentioned this pull request Apr 25, 2026

[Bug] 客户端是Openclaw时报错 ("Error when the client is Openclaw") #28

Closed

dwgx mentioned this pull request Apr 25, 2026

[Bug] Claude Code 工具调用失败 ("Claude Code tool calls fail") #47

Closed

aict666 deleted the fix/anthropic-equivalence-batch branch April 26, 2026 00:10
