fix: bring Claude Code tool-call flow back to Anthropic-equivalent behaviour #54
dwgx merged 6 commits into dwgx:master on Apr 25, 2026
Three independent regressions were stacking on top of each other and
collectively prevented Opus-class clients (Claude Code v2.1.x in
particular) from completing tool-emulated turns through the proxy. This
patch fixes all three together because they all surfaced as the same
user-visible symptom — "Opus refuses to call tools / wastes 3 turns on
broken paths" — and the regression tests for each one only make sense
in the presence of the others.
1. tool preamble was shaped like a Claude Code system prompt
─────────────────────────────────────────────────────────────
buildToolPreamble emitted ~1600 chars of `### ToolName` /
`parameters schema:` / ```json {…}``` blocks prepended to the user turn. That is
the exact signature of Claude Code's own system prompt, so when it
appeared in a user slot Opus 4.7 reliably flagged it as
"a pasted Claude Code system prompt in the user turn" and refused to
emit any <tool_call> blocks. Verified end-to-end with the user's
reproducer.
Fix: rewrite the user-message fallback to a single ~330-char line —
protocol description plus tool name list, no schemas. Full schemas
still go through the authoritative proto-level tool_calling_section
override (buildToolPreambleForProto, unchanged). dwgx#22 NO_TOOL-mode
models that ignore SectionOverride still see the tool names and the
emission protocol, so the fallback continues to do its job.
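Below is a minimal sketch of the compact fallback. It assumes tools arrive in the OpenAI `tools[]` shape. `buildCompactToolPreamble` is a hypothetical name, and the banner wording is borrowed from the later trace in this PR rather than copied from the shipped code. Only tool names are emitted, and the schemas stay in the proto-level override.

```javascript
// Hypothetical helper: the one-line user-message fallback.
// Lists tool names plus the emission protocol; never emits schemas.
function buildCompactToolPreamble(tools) {
  const names = tools
    .map((t) => (t && t.function ? t.function.name : t && t.name))
    .filter((n) => typeof n === "string" && n.length > 0);
  if (names.length === 0) return "";
  return (
    `Tools available this turn: ${names.join(", ")}. ` +
    'To call one, emit <tool_call>{"name":"…","arguments":{…}}</tool_call>. ' +
    "After the last <tool_call>, stop generating; the caller returns results " +
    'in the next turn as <tool_result tool_call_id="…">…</tool_result>.'
  );
}

const preamble = buildCompactToolPreamble([
  { type: "function", function: { name: "Bash", parameters: { type: "object" } } },
  { type: "function", function: { name: "Read", parameters: { type: "object" } } },
]);
console.log(preamble);
```

The parameter objects are deliberately ignored. Nothing resembling `### ToolName` or a fenced JSON schema can reach the user slot.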
2. redaction marker was being re-used by the model as a real path
─────────────────────────────────────────────────────────────────
Every prose-shaped REDACTED_PATH marker we have ever shipped has
regressed in some way:
  ./tail                    LLM Reads ./src/main.py → ENOENT loop
  [internal]                LLM runs `ls [internal]` → ENOENT loop
  <redacted-path>           LLM passes it to Read/Bash → ENOENT (Linux)
                            / Errno 22 (Windows) → loop
  (internal path redacted)  zsh parses `cd (…)` as glob-qualifier syntax
                            → "unknown file attribute: i" → Opus gets
                            confused, stops calling tools
  redacted internal path    Opus echoes it verbatim into bash:
                            `cd redacted internal path && git…` →
                            `cd: too many arguments` → wastes 2-3 turns
                            before finally calling pwd
Common cause: every marker that contains ASCII word characters reads
as a plausible path or identifier to the model, and the model puts
it back into the next tool_call.
Fix: switch REDACTED_PATH to a single Unicode ellipsis (U+2026, '…').
Zero ASCII word chars, no shell metacharacters, three UTF-8 bytes,
universally read by humans and LLMs as "content omitted". There is
essentially no training data of `cd …` or `Read("…")` as a real
operation, so the reuse-as-path loop never starts. If the model
still echoes it into shell, the failure mode is a single clean
ENOENT instead of a cryptic glob-qualifier error.
Add a marker-shape regression suite that asserts:
(a) no shell metacharacters in `()[]{}<>|&;$\`"'\\*?`
(b) no `/` and no `\\`
(c) zero ASCII word characters [A-Za-z0-9_] — closes the door on
every prose-shaped marker we have already regressed through.
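The three rules can be restated as plain checks. This is a sketch under stated assumptions: `markerShapeViolations` and the regex constants are hypothetical, not the project's actual test file. Every marker from the regression history fails at least one rule, and U+2026 passes all three.

```javascript
// Hypothetical re-statement of rules (a)-(c); not the project's test suite.
const SHELL_META = /[()[\]{}<>|&;$`"'\\*?]/; // (a) shell metacharacters
const PATH_SEP = /[\/\\]/; // (b) forward slash or backslash
const ASCII_WORD = /[A-Za-z0-9_]/; // (c) any ASCII word character

function markerShapeViolations(marker) {
  const problems = [];
  if (SHELL_META.test(marker)) problems.push("shell metacharacter");
  if (PATH_SEP.test(marker)) problems.push("path separator");
  if (ASCII_WORD.test(marker)) problems.push("ASCII word character");
  return problems;
}

// Every previously shipped marker trips at least one rule.
const history = [
  "./tail",
  "[internal]",
  "<redacted-path>",
  "(internal path redacted)",
  "redacted internal path",
];
for (const marker of history) {
  console.log(JSON.stringify(marker), markerShapeViolations(marker));
}

const REDACTED_PATH = "\u2026"; // "…"
console.log(markerShapeViolations(REDACTED_PATH)); // []
console.log(new TextEncoder().encode(REDACTED_PATH).length); // 3
```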
3. neutralizeCascadeIdentity missed common Cascade-isms
───────────────────────────────────────────────────────
Existing patterns covered "I am Cascade" / "I'm Cascade" / "developed
by Codeium" but missed other common third-person leaks the upstream
planner produces. Added:
- `Cascade is an? (?:AI )?(?:coding )?assistant`
- `(?:As|Acting as) Cascade`
- `(?:Codeium|Windsurf)('s)? Cascade`
- `built by (?:Codeium|Windsurf)`
- `(?:the )?Cascade('s)? workspace` → "the workspace"
(the leading article is consumed by the same regex so we never
produce the "the the workspace" double-article artefact)
Export neutralizeCascadeIdentity from chat.js (matches the existing
shouldUseCascadeReuse export pattern) and add 7 unit tests covering
first-person, third-person, provider attribution, workspace
narration, the dictionary-word negative case ("a cascade of rocks"),
unknown-provider passthrough, and falsy inputs.
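The double-article trick is easiest to see on the workspace rule alone. The sketch below uses the pattern listed above. `neutralizeWorkspaceNarration` is a hypothetical name, and the replacement targets of the other rules are not shown because this PR does not state them. The match is case-sensitive on "Cascade", which is what keeps "a cascade of rocks" untouched.

```javascript
// Hypothetical sketch of the workspace rule only.
// Because "(?:the )?" is consumed by the same match, "the Cascade workspace"
// becomes "the workspace" rather than "the the workspace".
const WORKSPACE_RULE = /(?:the )?Cascade('s)? workspace/g;

function neutralizeWorkspaceNarration(text) {
  if (!text) return text; // falsy inputs pass through unchanged
  return text.replace(WORKSPACE_RULE, "the workspace");
}

console.log(neutralizeWorkspaceNarration("I listed the Cascade workspace.")); // I listed the workspace.
console.log(neutralizeWorkspaceNarration("Cascade's workspace has 3 files.")); // the workspace has 3 files.
console.log(neutralizeWorkspaceNarration("a cascade of rocks")); // a cascade of rocks
```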
Tests: 80/80 pass (72 existing + 7 identity + 1 marker-shape).
After the marker/preamble/identity fixes earlier in this PR, Claude
Code v2.1.114 still confidently narrated the wrong workspace contents:
when the user was sitting in `~/IdeaProjects/flux-panel` (12+ files
including `flux-panel.zip`) Opus would issue an LS, get the real
listing back, and then describe the workspace as containing exactly
`src/index.js`, `package.json`, `README.md`, `.gitignore` — the
canonical empty-npm-init shape, which is what `/tmp/windsurf-workspace`
looks like.
Confirmed root cause from the proxy probe logs:
Probe[56jqpl] turns=16 chars=37283
msg[0] role=system len=28775 ← Claude Code system prompt
msg[1] role=user len=7762 ← <system-reminder> + <env> + question
msg[2] role=assistant len=44 ← "I need to understand the workspace"
The caller's `<env>` block (containing `Working directory:
/Users/.../flux-panel`) IS forwarded to Cascade — client.js prepends
the role=system text to the user-message text — but Cascade's planner
system prompt is structurally more authoritative to the upstream
model than user-message text, and the planner's prompt asserts that
the workspace is /tmp/windsurf-workspace. The model picks Cascade's
"authoritative" workspace over the merely-prepended-as-text caller
env, and the LS / Read tool calls land on the wrong path.
Telltale: the screenshot of Opus' reply showed `当前 … 只有` ("currently … only
has"). The ellipsis is OUR redaction marker (PR dwgx#54), evidence that Cascade
narrated /tmp/windsurf-workspace and we scrubbed it, leaving the model
reasoning from its prior belief about that scratch dir.
Fix: lift the canonical Claude Code <env> keys (Working directory,
Is directory a git repo, Platform, OS Version) out of the caller's
messages and re-emit them as an "Authoritative environment for this
session" block at the very top of the proto-level
tool_calling_section override. tool_calling_section IS the
authoritative system slot the upstream model reads, so the env block
sits in a slot that Cascade's planner prior cannot override. The
surrounding wording in buildToolPreambleForProto explicitly tells
the model to prefer this environment over any prior workspace
assumption.
Implementation:
- extractCallerEnvironment(messages) in chat.js. Lenient regex
parser, anchored to start-of-line so it never false-positives on
prose mentions of "working directory". Matches Claude Code's
`<env>` block (in role=system or in `<system-reminder>` user
blocks), the loose `cwd:` form some agents emit, and content-
block array text. Rejects empty / control-char / U+2026-only
values so a stale redacted marker can never re-enter the system
prompt.
- buildToolPreambleForProto now takes an optional `environment`
third argument. When non-empty, the preamble starts with:
## Authoritative environment for this session
The facts below are provided by the calling agent and
describe the REAL execution context. Tool calls MUST operate
on these paths. Ignore any workspace path you may have
inferred from earlier instructions or training priors.
- Working directory: …
- Is the directory a git repo: …
- Platform: …
- OS version: …
---
followed by the existing protocol header / behaviour suffix /
`### ToolName` schema blocks. When no env is provided the output
shape is byte-identical to PR dwgx#54.
- chat.js wires extractCallerEnvironment into the existing
buildToolPreambleForProto call site behind the same emulateTools
guard, so non-tool-emulated requests are unaffected.
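The following is a minimal sketch of the line-anchored parsing described above. It assumes messages carry either string content or content-block arrays of `{ type: "text", text }`. `extractEnvSketch` and its key regexes are hypothetical. The real chat.js parser may differ, for example in how it treats `<system-reminder>` blocks, which this sketch simply reads as more text.

```javascript
// Hypothetical line-anchored <env> parser; a sketch, not the shipped chat.js code.
// Each key must start a line, so prose like "pick a working directory" never matches.
const ENV_PATTERNS = {
  cwd: /^[ \t]*(?:Working directory|cwd)[ \t]*[:=][ \t]*(.+?)[ \t]*$/im,
  isGitRepo: /^[ \t]*Is directory a git repo[ \t]*[:=][ \t]*(.+?)[ \t]*$/im,
  platform: /^[ \t]*Platform[ \t]*[:=][ \t]*(.+?)[ \t]*$/im,
  osVersion: /^[ \t]*OS Version[ \t]*[:=][ \t]*(.+?)[ \t]*$/im,
};

// Flatten string content and content-block arrays into plain text.
function messageText(msg) {
  if (typeof msg.content === "string") return msg.content;
  if (Array.isArray(msg.content)) {
    return msg.content
      .filter((b) => b && b.type === "text" && typeof b.text === "string")
      .map((b) => b.text)
      .join("\n");
  }
  return "";
}

// Reject empty values, control characters, and values made only of the U+2026 marker.
function isUsableValue(value) {
  if (!value) return false;
  if (/[\u0000-\u001f\u007f]/.test(value)) return false;
  if (/^[\s\u2026]+$/.test(value)) return false;
  return true;
}

function extractEnvSketch(messages) {
  const text = messages.map(messageText).join("\n");
  const env = {};
  for (const [key, pattern] of Object.entries(ENV_PATTERNS)) {
    const match = pattern.exec(text);
    if (match && isUsableValue(match[1])) env[key] = match[1];
  }
  return env;
}
```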
Tests: 92/92 pass (80 from earlier in this PR + 8 extractCallerEnvironment
parser tests + 4 buildToolPreambleForProto-with-env tests asserting
ordering, opt-in shape, and back-compat).
The earlier extractCallerEnvironment regex required the canonical
<env> block format (Working directory: /path on its own line with
: or = separator). Claude Code v2.1.114 dropped that block in
favour of inline prose:
"You are an interactive agent that helps users with software
engineering tasks and the current working directory is /Users/..."
Diagnosed via a one-shot probe diagnostic: when extractCallerEnvironment
returns empty under emulateTools, log the nearest env-shaped substring
in the request so we can immediately see what wording the caller used.
The probe surfaced:
env NOT lifted (extractor returned empty); nearest env-shaped
substring: tware engineering tasks and the current working directory
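Here is a sketch of the probe diagnostic. `nearestEnvSnippet` is hypothetical, and the 40-character look-back window is an assumption. It was chosen only because it reproduces the logged substring above from the quoted prose; the real probe's window size is not stated in this PR.

```javascript
// Hypothetical probe diagnostic: when no env was lifted, show the text just
// before the first env-shaped keyword so a new client wording is visible at once.
function nearestEnvSnippet(text, before = 40) {
  const match = /working directory|\bcwd\b/i.exec(text);
  if (!match) return null;
  const start = Math.max(0, match.index - before);
  return text.slice(start, match.index + match[0].length);
}

const systemText =
  "You are an interactive agent that helps users with software " +
  "engineering tasks and the current working directory is /Users/...";
const lifted = {}; // pretend the extractor returned nothing
if (Object.keys(lifted).length === 0) {
  console.warn(
    "env NOT lifted (extractor returned empty); nearest env-shaped substring: " +
      nearestEnvSnippet(systemText)
  );
}
```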
Fix: accept TWO forms in the cwd regex —
(a) line-anchored key/value (older Claude Code, opencode, Cline):
Working directory: /path
cwd: /path
<cwd> /path
(now also tolerates "- "/"* " bullet prefix and backticks
around the path)
(b) prose form (Claude Code 2.1+):
...current working directory is /path
(no separator, no newline anchor — the path simply trails the
phrase)
The captured slot must begin with `/` or `~` ([/~]...), so abstract prose mentions
("the working directory you choose", "the working directory in the
docs") never produce a match — the captured group must look like an
actual filesystem path.
The other env keys (Is directory a git repo, Platform, OS Version)
keep the line-anchored form but pick up the same bullet-prefix
tolerance for consistency.
Capture-group handling in the loop is updated to read whichever
alternative captured (match[1] || match[2]) so both forms work.
The probe diagnostic itself is kept in the commit because it remains
useful — the next time a client changes its env wording, the log line
will surface the new form within seconds instead of hours of
guesswork.
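The two-form match and the `match[1] || match[2]` read can be sketched as follows. `CWD_RE` and `extractCwd` are hypothetical, not the shipped regex. Stripping trailing sentence punctuation from prose-form paths is an added assumption, made because the prose form usually ends the sentence right after the path.

```javascript
// Hypothetical two-form cwd matcher; a sketch of the idea, not the shipped regex.
//  (a) line-anchored key/value, optional "- "/"* " bullet and backticks:
//      Working directory: /path    cwd = /path    <cwd> /path
//  (b) Claude Code 2.1+ prose: "...current working directory is /path"
// Both capture groups must start with "/" or "~", so abstract prose never matches.
const CWD_RE =
  /^[ \t]*(?:[-*][ \t]+)?(?:(?:Working directory|cwd)[ \t]*[:=]|<cwd>)[ \t]*`?([\/~][^\s`<]*)|current working directory is `?([\/~][^\s`]*)/im;

function extractCwd(text) {
  const match = CWD_RE.exec(text);
  if (!match) return "";
  const raw = match[1] || match[2]; // whichever alternative captured
  return raw.replace(/[.,;:)]+$/, ""); // drop sentence punctuation after prose paths
}

console.log(extractCwd("* Working directory: `/Users/dev/flux-panel`"));
console.log(extractCwd("...and the current working directory is /Users/dev/flux-panel."));
console.log(JSON.stringify(extractCwd("...the current working directory.")));
```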
Tests: 94/94 pass. New cases:
- cwd from Claude Code 2.1+ prose form
- cwd from prose form with backticks around the path
- "abstract prose without an actual path" (negative case)
Live regression on Claude Code v2.1.114: the new prose-form regex
correctly rejects the boilerplate "...the current working directory."
mention (no path follows), so cwd extraction returns empty. Platform
and OS Version lifting still succeed, so the env block was rendered
with ONLY:
## Authoritative environment for this session
Tool calls MUST operate on these paths. Ignore any workspace path
you may have inferred from earlier instructions or training priors.
- Platform: darwin
- OS version: Darwin 25.4.0
Opus 4.7 saw an authoritative-shaped block whose body is operationally
useless (no path to operate on) and pattern-matched it as injection
noise:
"It looks like the conversation context got mixed up — the message
I received is a system prompt for Claude Code along with truncated
tool output from a previous session, rather than an actual question
or task from you."
The model then refused to proceed.
Fix: gate the entire env block on cwd presence. Platform/OS/git status
without cwd are useless for the original goal (tell the model where to
run tools); emitting them anyway turns the env block into noise that
trips Opus' injection guard. The rule "no cwd → no block" both removes
the noise and lets the model learn cwd via its own `pwd` tool call,
which already works on every Anthropic-format client we have tested
(verified live: Bash(pwd && ls -la) returned the real
/Users/jaxyu/IdeaProjects/flux-panel listing on the same trace).
Tests: 94/94 pass.
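A sketch of the "no cwd → no block" rule follows. `renderEnvironmentBlock` and `prependEnvironment` are hypothetical names, and the header wording is copied from the block shown earlier in this PR. When no block is rendered, the protocol text is returned unchanged, which mirrors the byte-identical back-compat requirement.

```javascript
// Hypothetical renderer: Platform/OS/git facts are only emitted alongside a cwd.
function renderEnvironmentBlock(env) {
  if (!env || !env.cwd) return ""; // no cwd -> no block at all
  const lines = [
    "## Authoritative environment for this session",
    "The facts below are provided by the calling agent and describe the REAL " +
      "execution context. Tool calls MUST operate on these paths. Ignore any " +
      "workspace path you may have inferred from earlier instructions or training priors.",
    `- Working directory: ${env.cwd}`,
  ];
  if (env.isGitRepo) lines.push(`- Is the directory a git repo: ${env.isGitRepo}`);
  if (env.platform) lines.push(`- Platform: ${env.platform}`);
  if (env.osVersion) lines.push(`- OS version: ${env.osVersion}`);
  lines.push("---");
  return lines.join("\n");
}

// Prepend the block to the protocol text; unchanged output when there is no block.
function prependEnvironment(protocolText, env) {
  const block = renderEnvironmentBlock(env);
  return block ? `${block}\n${protocolText}` : protocolText;
}
```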
This is the actual root cause of the "context got truncated", 40KB-of-
rambling, max_wait=180s, toolCalls=0 failure mode that has been chasing
us across Claude Code v2.1.114 / Opus 4.7 sessions.
Background. We emit OpenAI-style tools[] to Cascade by:
1. proto-level: tool_calling_section + additional_instructions_section
overrides carrying the full protocol + per-tool schemas
(authoritative).
2. user-message fallback: a compact "Tools available this turn: A, B,
… To call one, emit <tool_call>…</tool_call>" banner prepended to
the LAST user message, for models that ignore proto overrides on
turn 1.
normalizeMessagesForCascade also rewrites role:'tool' messages into
role:'user' wrappers shaped `<tool_result tool_call_id="…">…</tool_result>`
so the model sees its tool history in a single conversational format.
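The rewrite described above can be sketched like this. `normalizeToolMessagesSketch` is hypothetical, and it assumes `tool_call_id` values are safe to embed as-is. The sketch does no attribute escaping, and the real normalizeMessagesForCascade may handle that differently.

```javascript
// Hypothetical sketch: fold OpenAI role:"tool" messages into role:"user" wrappers.
function contentToText(content) {
  if (typeof content === "string") return content;
  if (Array.isArray(content)) {
    return content
      .filter((b) => b && b.type === "text" && typeof b.text === "string")
      .map((b) => b.text)
      .join("\n");
  }
  return "";
}

function normalizeToolMessagesSketch(messages) {
  return messages.map((m) =>
    m.role === "tool"
      ? {
          role: "user",
          content: `<tool_result tool_call_id="${m.tool_call_id}">${contentToText(m.content)}</tool_result>`,
        }
      : m // every other message passes through untouched
  );
}
```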
The bug. In multi-turn tool-using conversations, by turn 3+ the LAST
user message is *always* a synthetic <tool_result> wrapper, never the
caller's real user input. The fallback loop happily prepends the
"Tools available this turn: …" banner to that wrapper anyway, so the
model sees:
Tools available this turn: Bash, Read, Edit, … To call one, emit
<tool_call>{"name":"…","arguments":{…}}</tool_call>. After the last
<tool_call>, stop generating; the caller returns results in the
next turn as <tool_result tool_call_id="…">…</tool_result>.
<tool_result tool_call_id="call_42">[real tool output]</tool_result>
Opus 4.7 reliably pattern-matches this as either prompt injection or
a truncated conversation: the banner has the SHAPE of an opening
"here are your tools" announcement but it's appearing in the middle
of an active tool-using session, immediately in front of a tool_result.
The model concludes the conversation is corrupted and refuses to keep
calling tools.
Live trace (Claude Code v2.1.114, Opus 4.7, 26 tools, real session
debugging the flux-panel zip):
turn= 2 Cascade done {ms= 5051 toolCalls=0 textLen=147}
turn= 4 Cascade done {ms= 5045 toolCalls=0 textLen=147}
turn= 6 Cascade done {ms= 5041 toolCalls=0 textLen=147}
turn= 8 Cascade done {ms= 5545 toolCalls=0 textLen=147}
…
turn=22 Cascade done {ms=180163 toolCalls=0 textLen=42474
reason=max_wait polls=357}
Three minutes of confused prose, zero tool calls, then max_wait. The
visible output to the user was "It looks like the conversation got
truncated and I'm seeing fragments of tool output without a clear
current request from you." — exact words.
Fix. Skip the user-message preamble injection when the target user
message starts with `<tool_result`. The proto-level overrides already
carry the authoritative protocol; the fallback only exists to bootstrap
the very first turn for models that ignore proto. Replaying it on every
later turn is pure noise, and (now confirmed) actively harmful.
Tests: 97/97 pass (94 from earlier in this PR + 3 new regression cases):
- injects preamble on a first-turn real user message
- does NOT inject preamble when the last user message is a synthetic
tool_result
- still injects on the latest real user turn even when older turns
contain tool_results
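A minimal sketch of the guard, covering the three regression cases above. `injectFallbackPreamble` is hypothetical and assumes message content is a plain string after normalization. Non-string content is left untouched rather than guessed at.

```javascript
// Hypothetical sketch of the fixed fallback injection (not the shipped chat.js code).
function injectFallbackPreamble(messages, preamble) {
  if (!preamble) return messages;
  let idx = -1;
  for (let i = messages.length - 1; i >= 0; i--) {
    if (messages[i].role === "user") {
      idx = i;
      break;
    }
  }
  if (idx === -1 || typeof messages[idx].content !== "string") return messages;
  const text = messages[idx].content;
  // A synthetic <tool_result> wrapper means we are mid-session: skip the banner.
  if (text.startsWith("<tool_result")) return messages;
  const out = messages.slice();
  out[idx] = { ...messages[idx], content: `${preamble}\n\n${text}` };
  return out;
}
```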
Stop guessing. When the polling loop times out instead of seeing a clean idle_done, the model has been generating tokens for ~3 minutes straight without yielding a stop signal. We have no idea what those tokens look like — the existing summary just shows textLen=34201 and moves on. Dumping a bounded head + tail of the accumulated text gives us the shape of the failure (generation loop? rejected tool-call format? mid-thought truncation?) in a single log line, which is what we need to actually fix the underlying cause instead of chasing preamble shapes.
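Here is a sketch of a bounded head + tail dump. `headTail` and `logMaxWait` are hypothetical names, and the default of 400 characters per side is an assumption. `JSON.stringify` escapes embedded newlines, so the dump stays on a single log line.

```javascript
// Hypothetical helper: keep the first and last n chars of a long generation.
function headTail(text, n = 400) {
  if (typeof text !== "string") return "";
  if (text.length <= 2 * n) return text;
  const omitted = text.length - 2 * n;
  return `${text.slice(0, n)} [... ${omitted} chars omitted ...] ${text.slice(-n)}`;
}

function logMaxWait(turn, text) {
  // JSON.stringify escapes newlines, so the whole dump is one log line.
  console.warn(
    `turn=${turn} reason=max_wait textLen=${text.length} dump=${JSON.stringify(headTail(text))}`
  );
}

logMaxWait(22, "thinking...\n".repeat(3000));
```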
dwgx added a commit that referenced this pull request on Apr 25, 2026:
…ly-zhang to S+

- baily-zhang PR #61 (Opus 4.7 multimodal context bloat): third major contribution after #36 and #45, now de-facto maintainer of the reuse-fingerprint / trajectory-offset machinery
- abwuge PR #58 (docker/nginx deploy fix): first-time contributor, +3/-2 surgical, unblocked the docker-compose Restart loop
- aict666 PR #54 (tool preamble slimming + redact marker 6th-gen U+2026 ellipsis + identity coverage extension): fourth major contribution
- aict666 PR #53 (redact marker shell-safety regression): second contribution, was missing from the prior credits update
- baily-zhang upgraded from S to S+ (parity with aict666)
dwgx added a commit that referenced this pull request on Apr 25, 2026:
The Pages site at dwgx.github.io/WindsurfAPI/ had only 4 names listed in the footer (dd373156, colin1112a, motto1, youfak). 8 contributors were missing from the public site even though most of them landed S+/S level fixes (aict666 #44/#51/#53/#54, baily-zhang #36/#45/#61, smeinecke #43, abwuge #58). Adds a dedicated `#contributors` section before the footer with one card per contributor: avatar, GitHub link, weight badge (S+/S/A+/A/B+), PR list, and a one-paragraph Traditional Chinese (繁體中文) description of what each fix actually solved. Cards reuse the existing panel-card warm/coral palette to fit the site's aesthetic. The footer one-liner is also expanded to all 8 names ordered by weight, with a "完整名單 ↑" ("full list ↑") anchor back to the new section. CSS additions: contrib-grid, contrib-card, contrib-avatar, contrib-weight + 5 weight-tier classes (-S-plus, -S, -A-plus, -A, -B-plus). All gradient/hover behaviour matches the existing panel-card styling.
dwgx added a commit that referenced this pull request on Apr 25, 2026:
baily-zhang's fourth landed PR (#36 / #45 / #61 / #62): adding the PR #62 entry to the dashboard credits panel as a separate card so the inline-script regression-test win is visible alongside the cascade-reuse machinery work.

v2.0.5 covers everything since 2.0.4:
- aict666 #54 tool preamble slim + redact U+2026
- abwuge #58 docker/nginx deploy fix
- baily #61 Opus 4.7 multimodal context bloat
- baily #62 dashboard escape regression
- own commits: empty-message validator, internal_error backoff, upstream_transient_error category, Opus 4.6 reuse widening, /v1/responses endpoint for Codex CLI compatibility (#56, #63)
Summary
Three independent regressions were stacking on top of each other and collectively prevented Opus-class clients (Claude Code v2.1.x in particular) from completing tool-emulated turns through the proxy. Bundled into one PR because they all surface as the same user-visible symptom ("Opus refuses to call tools / wastes 3 turns on broken paths") and the regression tests for each one only make sense in the presence of the others.
Reproducer: send

    检查一下,所有分支里。有没有按照 压缩包的方案 处理过一版

("Check all the branches: was there ever a version done following the zip-archive approach?") from Claude Code v2.1.114 (Opus 4.7) routed through the proxy. Before this patch Opus replies with

    The user's message appears to contain a pasted system prompt for a different agent (Claude Code) along with what looks like a stray tool error result.

After this patch Opus calls `pwd` then `git branch -a` directly.

1. Tool preamble was shaped like a Claude Code system prompt
`buildToolPreamble` emitted ~1600 chars of `### ToolName` / `parameters schema:` / ```json {…}``` blocks prepended to the user turn. That is the exact signature of Claude Code's own system prompt, so when the same shape appeared in a **user** slot Opus 4.7 reliably flagged it as "a pasted Claude Code system prompt in the user turn" and refused to emit any `<tool_call>` blocks.

Fix: rewrite the user-message fallback to a single ~330-char line: protocol description plus tool name list, no schemas. Full schemas still go through the authoritative proto-level `tool_calling_section` override (`buildToolPreambleForProto`, unchanged). #22 NO_TOOL-mode models that ignore `SectionOverride` still see the tool names and the emission protocol, so the fallback continues to do its job.

New regression asserts the preamble:
- has no `^###` headers, no `parameters schema:`, and no fenced ```json blocks
- still carries the `<tool_call>` protocol

2. Redaction marker kept getting reused by the model as a real path
Every prose-shaped `REDACTED_PATH` marker we have ever shipped has regressed:

| Marker | What happened |
|---|---|
| `./tail` | reads `./src/main.py` → ENOENT → loop |
| `[internal]` | `ls [internal]` → ENOENT → loop |
| `<redacted-path>` | passed to Read/Bash → ENOENT (Linux) / Errno 22 (Windows) → loop |
| `(internal path redacted)` | zsh parses `cd (…)` as glob-qualifier syntax → cryptic `unknown file attribute: i` → Opus gets confused, stops calling tools |
| `redacted internal path` | `cd redacted internal path && git…` → `cd: too many arguments` → wastes 2-3 turns before finally calling `pwd` |

Common cause: any marker that contains ASCII word characters reads as a plausible path or identifier to the model, and the model puts it back into the next `tool_call`.

Fix: switch `REDACTED_PATH` to a single Unicode ellipsis (U+2026, `…`). There is essentially no training data of `cd …` or `Read("…")` as a real operation, so the reuse-as-path loop never starts.

The new marker-shape regression suite guards against future regressions by asserting:
- no shell metacharacters from `()[]{}<>|&;$\`"'\\*?`
- no `/` and no `\\`
- zero ASCII word characters `[A-Za-z0-9_]`, which closes the door on every prose-shaped marker we have already regressed through

3. `neutralizeCascadeIdentity` missed common Cascade-isms

Existing patterns covered `I am Cascade` / `I'm Cascade` / `developed by Codeium` but missed other common third-person leaks the upstream planner produces. Added:
- `Cascade is an? (?:AI )?(?:coding )?assistant`
- `(?:As|Acting as) Cascade`
- `(?:Codeium|Windsurf)('s)? Cascade`
- `built by (?:Codeium|Windsurf)`
- `(?:the )?Cascade('s)? workspace` → `the workspace` (the leading article is consumed by the same regex so we never produce the `the the workspace` double-article artefact)

Exported `neutralizeCascadeIdentity` from `chat.js` (matches the existing `shouldUseCascadeReuse` export pattern) and added 7 unit tests covering first-person, third-person, provider attribution, workspace narration, the dictionary-word negative case (`a cascade of rocks`), unknown-provider passthrough, and falsy inputs.

Tests

`npm test`: 80/80 pass (72 existing + 7 identity + 1 marker-shape).

Risk
Low. All three changes are purely cosmetic on the wire — the protocol contract, parser, streaming pipeline, tool-result round-trip, and proto-level injection are untouched. Behaviour change is strictly less aggressive: the preamble is gentler, the marker is shorter, the identity rewrites are more thorough. No new dependencies, no API changes.