persona/cognition: migrate tool agent loop to Rust (replaces XML-strip workaround) #969

@joelteply

Context

PR #950 issue #75 (d): non-Anthropic-protocol inference paths (cloud routing, DMR, llama.cpp without structured tool_calls support) cause models to emit raw <tool_use>...</tool_use> XML directly in user-visible chat. Anvil + bigmama-wsl confirmed this across Mac/MPS + Linux/CUDA. Bonus failure: the model also fabricates <tool_result>...</tool_result> envelopes inline, with hallucinated content.

Ship-now mitigation landed in b7383d4: PersonaResponseGenerator strips both XML envelopes before posting and logs whenever stripping fires. This keeps chat clean but doesn't actually invoke the tools the model was trying to call. The user-visible symptom also persists: fabricated tool results stay in the surrounding prose (e.g. "I read the file. It contains foo bar." when the model never read anything), because the strip only removes the structured envelopes.
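For illustration, here is a minimal sketch of what a strip of this kind looks like and why it can't catch the fabricated prose. `stripToolXml` and `StripResult` are hypothetical names invented for this example; this is not the code from b7383d4, and the real envelope matching may differ.

```typescript
// Hypothetical helper for illustration only; not the PersonaResponseGenerator code.
interface StripResult {
  cleaned: string;       // message text with the XML envelopes removed
  strippedCount: number; // how many envelopes were removed (drives the log line)
}

function stripToolXml(raw: string): StripResult {
  // Non-greedy body match; the backreference \1 pairs each opening tag
  // with a closing tag of the same name.
  const envelope = /<(tool_use|tool_result)>[\s\S]*?<\/\1>/g;
  let strippedCount = 0;
  const cleaned = raw
    .replace(envelope, () => {
      strippedCount += 1;
      return "";
    })
    .replace(/\n{3,}/g, "\n\n") // collapse the blank lines the removals leave behind
    .trim();
  return { cleaned, strippedCount };
}

const leaked =
  'Let me check.\n<tool_use>{"tool":"read_file"}</tool_use>\n' +
  "<tool_result>foo bar</tool_result>\nI read the file. It contains foo bar.";

const stripped = stripToolXml(leaked);
// stripped.strippedCount is 2, but stripped.cleaned still ends with the
// fabricated sentence "I read the file. It contains foo bar."
```

The envelopes are gone, yet the claim about the file contents survives, which is the residual symptom described above.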

What's needed

The proper fix is the Rust cognition tool-loop migration:

  1. Detect tool calls in the model's emitted text using existing AgentToolExecutor.parseResponse() (Rust IPC, sub-microsecond — RustCoreIPCClient.toolParsingParse). This already exists.
  2. Actually invoke each detected tool through the existing PersonaToolExecutor.executeTools().
  3. Feed real tool results back to the model in a second-pass inference (the "tool agent loop" — currently lives in AgentToolExecutor but isn't wired into the cognition path).
  4. Final response is the post-loop text → already clean of tool_use/tool_result XML by virtue of structured execution.
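The four steps above can be sketched roughly as follows. This is a sketch under stated assumptions, not the proposed implementation: `LoopDeps`, `runToolLoop`, `maxPasses`, and the shapes of `ToolCall`, `ToolResult`, and `ParsedResponse` are all hypothetical stand-ins. The real `parseResponse()` and `executeTools()` signatures may differ, and the issue proposes that the loop ultimately live on the Rust side.

```typescript
// All names and shapes here are hypothetical stand-ins for the real APIs.
interface ToolCall { toolName: string; parameters: Record<string, string>; }
interface ToolResult { toolName: string; content: string; error?: string; }
interface ParsedResponse { toolCalls: ToolCall[]; cleanedText: string; }

// Dependencies are injected so the loop can be exercised without a model.
interface LoopDeps {
  parse(text: string): Promise<ParsedResponse>;           // step 1: detect tool calls
  executeTools(calls: ToolCall[]): Promise<ToolResult[]>; // step 2: invoke them
  infer(messages: string[]): Promise<string>;             // model inference
}

async function runToolLoop(
  deps: LoopDeps,
  prompt: string,
  maxPasses = 3, // assumed safety bound so a misbehaving model cannot loop forever
): Promise<string> {
  const messages: string[] = [prompt];
  let output = await deps.infer(messages);

  for (let pass = 0; pass < maxPasses; pass++) {
    const parsed = await deps.parse(output);
    // No tool calls left: the post-loop text is the final response (step 4).
    if (parsed.toolCalls.length === 0) return parsed.cleanedText;

    const results = await deps.executeTools(parsed.toolCalls);
    // Step 3: feed the real results back for another inference pass.
    messages.push(output);
    messages.push(
      results.map((r) => `[${r.toolName}] ${r.error ?? r.content}`).join("\n"),
    );
    output = await deps.infer(messages);
  }

  // Pass budget exhausted: return the text with any remaining tool blocks removed.
  return (await deps.parse(output)).cleanedText;
}
```

Bounding the loop with `maxPasses` is a design choice made for this sketch; the existing AgentToolExecutor loop may enforce its own limit differently.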

Why deferred from #950

  • Tool agent loop migration is non-trivial scope (touches PersonaResponseGenerator, RustCognitionBridge, and the Rust cognition crate's response::respond entry).
  • Risk-cost on a near-merge PR is high.
  • The strip workaround removes the visible XML noise without behavior change.

Files involved

  • src/system/user/server/modules/PersonaResponseGenerator.ts:639-688 — current strip-then-post path. Replace strip block with: parse → execute → second-pass inference → post.
  • src/system/user/server/modules/PersonaToolExecutor.ts:485-502 — existing parseResponse + stripToolBlocks are the right primitives; just need to wire executeTools into the response path.
  • src/system/tools/server/AgentToolExecutor.ts:300-358 — has parseToolCalls, stripToolBlocks, parseResponse (Rust-accelerated). Reuse.
  • workers/continuum-core/src/persona/response.rs (or equivalent) — Rust side: pull tool-execution into personaRespond so the cognition crate owns the full loop.

Acceptance

  • Persona using a non-tool-call-native model (e.g. qwen3.5 via DMR) emits a tool call, the system invokes the tool, real result returns, model second-pass uses the result, posted message reflects real data.
  • The strip-fallback log (✂️ ${persona}: stripped N leaked tool-call XML block(s) ...) effectively never fires for properly-routed personas.
  • The strip remains as a belt for unexpected leak modes (defense-in-depth), but its counter stays near zero.
