[BUG] DeepSeek Web Cookie Provider fails to translate Tool Calls (Function Calling) back to OpenAI format #2794

xspylol · 2026-05-27T15:49:47Z

When using the DeepSeek Web Cookie provider as a custom endpoint for an autonomous agent (Hermes Agent), the model does not execute tools. Instead, it outputs the tool calls as raw markdown text (e.g., it types out a fenced bash ... block).

Expected Behavior:

When the client sends an OpenAI-compatible request with a tools array, the OmniRoute adapter should:
1. Inject the tool schemas into the DeepSeek Web UI's system prompt (since the web UI doesn't natively support the tools JSON parameter).
2. Parse the model's text output (which likely uses XML tags or specific markdown blocks for tool use).
3. Translate that text back into a standard OpenAI tool_calls JSON array before returning the response to the client.

Actual Behavior:

OmniRoute appears to be stripping the tools array from the request and returning the model's raw text response. Because the client (Hermes) never receives a structured tool_calls object, it treats the response as a standard conversational message and prints the bash command to the screen instead of executing it.

Environment:

Client: Hermes Agent (sending standard OpenAI tools array)
OmniRoute Provider: DeepSeek Web Cookie
Symptom: Tool calls returned as raw text instead of finish_reason: "tool_calls".
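
To make the symptom concrete, here is a minimal sketch of how an OpenAI-compatible client typically decides between executing and printing a reply. The function `classify_response` and both sample payloads are illustrative assumptions, not code taken from Hermes or OmniRoute.

```python
import json

FENCE = "`" * 3  # markdown code fence, built this way to keep the example readable


def classify_response(response: dict) -> str:
    """Return "execute" if the reply carries structured tool calls, else "print"."""
    choice = response["choices"][0]
    message = choice.get("message", {})
    if choice.get("finish_reason") == "tool_calls" and message.get("tool_calls"):
        return "execute"
    return "print"


# Shape of the reply described in this report: the command arrives as plain text.
raw_text_reply = {
    "choices": [{
        "index": 0,
        "message": {"role": "assistant", "content": f"{FENCE}bash\nls -la\n{FENCE}"},
        "finish_reason": "stop",
    }]
}

# Shape the client needs in order to run the command.
structured_reply = {
    "choices": [{
        "index": 0,
        "message": {
            "role": "assistant",
            "content": None,
            "tool_calls": [{
                "id": "call_example",
                "type": "function",
                "function": {"name": "terminal", "arguments": json.dumps({"command": "ls -la"})},
            }],
        },
        "finish_reason": "tool_calls",
    }]
}

print(classify_response(raw_text_reply))    # print
print(classify_response(structured_reply))  # execute
```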

diegosouzapw (Maintainer) · 2026-05-27T20:27:46Z

Hey @xspylol! Confirmed -- the web-cookie variants do NOT carry the tool-call translation layer. They were built as conversational proxies (text-in / text-out against the public web chat), and the upstream UI does not natively accept the OpenAI tools array. Today the adapter strips it before forwarding, which is why your agent only sees raw markdown.

The proper fix is what you described: prompt-inject the tool schemas into the system message, then parse the model's output (DeepSeek's web frontend uses a <tool>...</tool> block on V4) back into OpenAI tool_calls JSON before returning. I'll open a tracking issue for this and link it here once filed -- the same approach will apply to the other web-cookie providers once one lands.

Workaround for now: use the API-based deepseek provider for any agentic/tool workflow (it has native function calling); reserve deepseek-web for chat-only flows. Will update here once the tracking issue is created and we have a target version.

xspylol (Author) · 2026-05-27T22:43:11Z

Awesome, thanks for the confirmation and the detailed breakdown! That makes total sense. I'll use the official API for my Hermes agent workflows and keep the web-cookie for regular chat until the translation layer is built. Really appreciate you looking into this and opening a tracking issue for it!

diegosouzapw (Maintainer) · 2026-05-28T02:38:04Z

Tracking issue is up: #2820 — covers the bidirectional translation layer (request side: serialize OpenAI tools into a system-prompt block per upstream format; response side: parse <tool>...</tool> blocks back into OpenAI tool_calls JSON). Plan is to land deepseek-web first since its block format is the most stable, then generalize to the other 6 web-cookie providers + the upcoming Qwen/Kimi/GLM/MiMo/MiniMax web variants (#2725) automatically. Will ping here when it ships. Thanks again for the clean repro!
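
As a rough illustration of the request side, the sketch below serializes an OpenAI tools array into a system-prompt block and removes the fields the web upstream rejects. The helper names `build_tool_prompt` and `inject_tools` and the exact prompt wording are assumptions for illustration only; they are not OmniRoute's implementation.

```python
import json


def build_tool_prompt(tools: list) -> str:
    """Render an OpenAI `tools` array as a plain-text contract for the system prompt."""
    lines = [
        "You can call the following tools.",
        'To call one, reply with exactly: <tool>{"name": "<tool name>", "arguments": {...}}</tool>',
        "",
        "Available tools:",
    ]
    for tool in tools:
        fn = tool.get("function", {})
        lines.append(f"- {fn.get('name')}: {fn.get('description', '')}")
        lines.append(f"  parameters (JSON Schema): {json.dumps(fn.get('parameters', {}))}")
    return "\n".join(lines)


def inject_tools(body: dict) -> dict:
    """Return a copy of the request with `tools` moved into the system message."""
    tools = body.get("tools") or []
    if not tools:
        return body
    out = {k: v for k, v in body.items() if k not in ("tools", "tool_choice")}
    messages = [dict(m) for m in body.get("messages", [])]  # copy so the caller's request is untouched
    prompt = build_tool_prompt(tools)
    if messages and messages[0].get("role") == "system":
        messages[0]["content"] = f"{messages[0]['content']}\n\n{prompt}"
    else:
        messages.insert(0, {"role": "system", "content": prompt})
    out["messages"] = messages
    return out


request = {
    "model": "deepseek-web",
    "tool_choice": "auto",
    "messages": [{"role": "user", "content": "List the files here"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "terminal",
            "description": "Run a shell command",
            "parameters": {"type": "object", "properties": {"command": {"type": "string"}}, "required": ["command"]},
        },
    }],
}
rewritten = inject_tools(request)
print(rewritten["messages"][0]["content"])
```

Unlike a fixed instruction string, this carries each tool's name, description, and parameter schema, so the model can see which tools exist and what arguments they take.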

1 reply

xspylol May 29, 2026
Author

Hey! Following up on our conversation about adding tool-call translation to the web-cookie providers.
I was really eager to get my agent (Hermes) working with deepseek-web today, so I went ahead and built a local Python "Man-in-the-Middle" proxy to test the exact architecture you mentioned. I can confirm it works perfectly!
Here is the exact flow my proxy uses to bypass the HTTP 400 block and force tool-calling:
1. Intercept the client's request and strip the tools array (so the upstream web scraper accepts it).
2. Inject a short system prompt instructing DeepSeek to output an XML-style block, <tool>{"name": "...", "arguments": {...}}</tool>, or a standard markdown bash block.
3. Buffer the response (forcing non-streaming upstream if necessary) to capture the full text.
4. Parse the text with a regex, extract the command, and reformat it into a standard OpenAI tool_calls JSON object.
5. Return the JSON (or fake a streaming delta) to the client.
Hermes instantly recognizes the JSON and executes the terminal commands flawlessly.
I've attached the ~150 line Python FastAPI script below as a working reference implementation. When you have time to tackle the umbrella issue (#2725) and build the native translation layer into OmniRoute, hopefully, this logic helps speed things up!
Thanks again for the awesome project.

The Code

from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse, Response, StreamingResponse
import httpx
import json
import re
import time
import uuid

app = FastAPI()
OMNIROUTE_BASE = "http://localhost:20128"

SYSTEM_INJECTION = """
You are an autonomous agent with access to terminal and file tools.
To execute a tool, you MUST output the following XML format exactly, and nothing else:

<tool>{"name": "tool_name", "arguments": {"arg1": "value1"}}</tool>

If you are not executing a tool, just respond normally.
"""


def make_chunk(model, delta, finish_reason=None):
    """Format one OpenAI-style chat.completion.chunk as an SSE data line."""
    payload = {
        "id": "chatcmpl-proxy", "object": "chat.completion.chunk",
        "created": int(time.time()), "model": model,
        "choices": [{"index": 0, "delta": delta, "finish_reason": finish_reason}],
    }
    return f"data: {json.dumps(payload)}\n\n"


@app.post("/v1/chat/completions")
async def proxy_request(request: Request):
    body = await request.json()
    has_tools = bool(body.get("tools"))
    client_wants_stream = body.get("stream", False)
    model = body.get("model", "deepseek-web")

    if has_tools:
        # Append the tool contract to an existing system message, or add one
        messages = body.setdefault("messages", [])
        if messages and messages[0]["role"] == "system":
            messages[0]["content"] += "\n\n" + SYSTEM_INJECTION
        else:
            messages.insert(0, {"role": "system", "content": SYSTEM_INJECTION})

        # The web-cookie upstream rejects these fields, so strip them
        del body["tools"]
        body.pop("tool_choice", None)

        # CRITICAL FIX: Force upstream to be non-streaming so we can parse the whole text at once
        body["stream"] = False

    async with httpx.AsyncClient() as client:
        resp = await client.post(f"{OMNIROUTE_BASE}/v1/chat/completions", json=body, timeout=180.0)

    if resp.status_code >= 400:
        # Surface upstream errors instead of wrapping them in a fake 200 completion
        return Response(content=resp.content, status_code=resp.status_code, media_type=resp.headers.get("content-type"))

    # Buffer the full response text regardless of what OmniRoute sent
    full_content = ""
    content_type = resp.headers.get("content-type", "")

    if "text/event-stream" in content_type:
        for line in resp.text.splitlines():
            if not line.startswith("data: "):
                continue
            chunk_data = line[6:]
            if chunk_data == "[DONE]":
                break
            try:
                delta = json.loads(chunk_data).get("choices", [{}])[0].get("delta", {})
            except (json.JSONDecodeError, IndexError, AttributeError):
                continue  # skip malformed or empty chunks
            if delta.get("content"):
                full_content += delta["content"]
    else:
        try:
            full_content = resp.json().get("choices", [{}])[0].get("message", {}).get("content", "") or ""
        except (ValueError, IndexError, AttributeError):
            full_content = resp.text

    # Parse the tool: prefer an explicit <tool> block, fall back to a bash fence
    tool_json = None
    match = re.search(r'<tool>\s*(\{.*?\})\s*</tool>', full_content, re.DOTALL)
    if match:
        try:
            tool_json = json.loads(match.group(1))
        except json.JSONDecodeError:
            pass

    if not tool_json:
        bash_match = re.search(r'```(?:bash|sh|shell)\n(.*?)```', full_content, re.DOTALL)
        if bash_match:
            tool_json = {"name": "terminal", "arguments": {"command": bash_match.group(1).strip()}}

    tool_call = None
    if tool_json:
        tool_call = {
            # Unique per call so later tool results can be matched to the right call
            "id": f"call_{uuid.uuid4().hex[:24]}", "type": "function",
            "function": {"name": tool_json.get("name", "terminal"), "arguments": json.dumps(tool_json.get("arguments", {}))},
        }

    # Format response for Hermes
    if client_wants_stream:
        async def fake_stream():
            if tool_call:
                # Fake a streaming tool call delta (streamed tool calls carry an index)
                yield make_chunk(model, {"tool_calls": [{"index": 0, **tool_call}]})
                yield make_chunk(model, {}, "tool_calls")
            else:
                # No tool call, just stream the text back in chunks
                chunk_size = 20
                for i in range(0, len(full_content), chunk_size):
                    yield make_chunk(model, {"content": full_content[i:i + chunk_size]})
                yield make_chunk(model, {}, "stop")
            yield "data: [DONE]\n\n"

        return StreamingResponse(fake_stream(), media_type="text/event-stream")

    # Client wanted standard JSON
    message = {"role": "assistant", "content": None if tool_call else full_content}
    if tool_call:
        message["tool_calls"] = [tool_call]
    data = {
        "id": "chatcmpl-proxy", "object": "chat.completion", "created": int(time.time()),
        "model": model,
        "choices": [{"index": 0, "message": message, "finish_reason": "tool_calls" if tool_call else "stop"}],
        "usage": {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0},
    }
    return JSONResponse(content=data)


@app.api_route("/{path:path}", methods=["GET", "POST", "PUT", "DELETE", "PATCH", "OPTIONS", "HEAD"])
async def catch_all(request: Request, path: str):
    # Pass every other route straight through to OmniRoute
    target_url = f"{OMNIROUTE_BASE}/{path}"
    if request.url.query:
        target_url += f"?{request.url.query}"
    headers = dict(request.headers)
    headers.pop("host", None)
    headers.pop("content-length", None)
    async with httpx.AsyncClient() as client:
        try:
            resp = await client.request(request.method, target_url, headers=headers, content=await request.body(), timeout=60.0)
            excluded_headers = {"content-encoding", "content-length", "transfer-encoding", "connection"}
            response_headers = {k: v for k, v in resp.headers.items() if k.lower() not in excluded_headers}
            return Response(content=resp.content, status_code=resp.status_code, headers=response_headers, media_type=resp.headers.get("content-type"))
        except Exception as e:
            return JSONResponse(content={"error": str(e)}, status_code=500)


if __name__ == "__main__":
    import uvicorn
    print("Starting Tool-Call Translation Proxy on port 8000...")
    uvicorn.run(app, host="127.0.0.1", port=8000)
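
The proxy above returns at most one tool call per reply, because it only looks at the first match. If the model emits several <tool> blocks in one message, a variant along the following lines could return all of them. `parse_tool_blocks` is a hypothetical helper written for this sketch and is not part of the script.

```python
import json
import re
import uuid

TOOL_BLOCK = re.compile(r"<tool>\s*(\{.*?\})\s*</tool>", re.DOTALL)


def parse_tool_blocks(text: str) -> list:
    """Convert every well-formed <tool>{...}</tool> block into an OpenAI tool_calls entry."""
    calls = []
    for match in TOOL_BLOCK.finditer(text):
        try:
            payload = json.loads(match.group(1))
        except json.JSONDecodeError:
            continue  # ignore blocks whose body is not valid JSON
        if not isinstance(payload, dict) or "name" not in payload:
            continue
        calls.append({
            "id": f"call_{uuid.uuid4().hex[:24]}",
            "type": "function",
            "function": {
                "name": payload["name"],
                # OpenAI clients expect arguments as a JSON-encoded string
                "arguments": json.dumps(payload.get("arguments", {})),
            },
        })
    return calls


sample = (
    '<tool>{"name": "terminal", "arguments": {"command": "ls"}}</tool>\n'
    "<tool>{not valid json}</tool>\n"
    '<tool>{"name": "read_file", "arguments": {"path": "README.md"}}</tool>'
)
calls = parse_tool_blocks(sample)
print([c["function"]["name"] for c in calls])  # ['terminal', 'read_file']
```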

diegosouzapw (Maintainer) · 2026-05-30T12:33:33Z

Thanks for the working repro and the reference code, @xspylol -- I've moved the substantive discussion to your fuller write-up in #2925 so it all lives in one place. Short version: you're right that nothing shipped yet -- #2820 was closed prematurely and deepseek-web still 400s on tools, so I've reopened it. I also split out the persistent-session / rolling-window memory half (the other limitation you hit) into #2942, since it's a distinct trade-off from the current fresh-session-per-request behavior. Both credit your design + reference impl; I'll ping you on #2925 as they progress.
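
For readers unfamiliar with the term, one common reading of "rolling-window memory" is sketched below: keep the system messages and only the most recent turns. This is an assumption about the general technique, not a description of what #2942 implements; `apply_history_window` and its `window` parameter are hypothetical.

```python
def apply_history_window(messages: list, window: int) -> list:
    """Keep all system messages plus the last `window` non-system messages."""
    system = [m for m in messages if m.get("role") == "system"]
    rest = [m for m in messages if m.get("role") != "system"]
    # rest[-0:] would return everything, so handle window <= 0 explicitly
    recent = rest[-window:] if window > 0 else []
    return system + recent


history = [
    {"role": "system", "content": "You are an agent."},
    {"role": "user", "content": "step 1"},
    {"role": "assistant", "content": "did step 1"},
    {"role": "user", "content": "step 2"},
    {"role": "assistant", "content": "did step 2"},
    {"role": "user", "content": "step 3"},
]
trimmed = apply_history_window(history, window=2)
print([m["content"] for m in trimmed])
# ['You are an agent.', 'did step 2', 'step 3']
```

A real implementation would also need to avoid cutting between a tool call and its tool result, which this sketch does not attempt.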

diegosouzapw (Maintainer) · 2026-06-10T13:27:00Z

Closing the loop, @xspylol -- the tool-call translation you specced (and built a reference MITM for) shipped in v3.8.9, and #2820 is now closed.

What landed, matching your design:

Request side: OpenAI tools are serialized into a <tool>...</tool> system-prompt contract instead of being stripped (open-sse/executors/deepseek-web.ts, open-sse/translator/webTools.ts).
Response side: the model's text reply is buffered and <tool> blocks are parsed back into OpenAI tool_calls JSON (parseToolCallsFromText, deepseek-web.ts:1032).
The two follow-ups you flagged also landed: persistent session + rolling-window memory ([feature] Persistent session + rolling-window memory for deepseek-web (agentic multi-turn) #2942), the API-only persistSession/historyWindow knobs got a dashboard toggle ([feature] Dashboard toggle for deepseek-web persistSession / historyWindow (v3.8.9 shipped them API-only) #3153), and the parser was hardened for bare-JSON / fuzzy tool-name matching beyond just <tool>-wrapped output ([feature] Harden web-cookie tool-call parser: bare-JSON detection + fuzzy tool-name matching #3154).
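
To illustrate what bare-JSON detection and fuzzy tool-name matching can look like, here is a minimal sketch. It is not the code in `parseToolCallsFromText`; the helpers `find_tool_payload` and `resolve_tool_name` and the 0.6 similarity cutoff are assumptions chosen for the example.

```python
import difflib
import json
import re


def find_tool_payload(text: str):
    """Return the first JSON object with a string "name" key, wrapped in <tool> or bare."""
    decoder = json.JSONDecoder()
    for match in re.finditer(r"\{", text):
        try:
            # raw_decode parses one JSON value starting at the given index
            obj, _end = decoder.raw_decode(text, match.start())
        except json.JSONDecodeError:
            continue
        if isinstance(obj, dict) and isinstance(obj.get("name"), str):
            return obj
    return None


def resolve_tool_name(name: str, known_names: list, cutoff: float = 0.6):
    """Map a possibly misspelled tool name onto a declared one, or return None."""
    if name in known_names:
        return name
    lowered = {n.lower(): n for n in known_names}
    close = difflib.get_close_matches(name.lower(), list(lowered), n=1, cutoff=cutoff)
    return lowered[close[0]] if close else None


declared = ["terminal", "read_file"]
reply = 'Sure, I will use {braces} first: {"name": "readfile", "arguments": {"path": "a.txt"}}'
payload = find_tool_payload(reply)
print(payload["name"], "->", resolve_tool_name(payload["name"], declared))
# readfile -> read_file
```

Returning None for names that match nothing lets the caller fall back to treating the reply as plain text rather than invoking the wrong tool.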

Your repro and reference impl drove all of this -- thank you. If your Hermes agent still hits anything off on deepseek-web after updating to v3.8.9+, the best place is #2925 where the broader agentic work continues.

richardchen874-sys · 2026-06-14T07:41:19Z

This is a very useful repro.

For agentic workflows, I think the important distinction is chat compatibility vs workflow compatibility. A web-cookie provider may be fine for normal chat, but once the client expects OpenAI-style tools, tool_calls, streaming deltas, and multi-step execution, the adapter has to behave much closer to a real API provider.

Your FastAPI proxy is a good proof of the architecture:

strip unsupported tools before sending upstream
inject a compact tool-use contract
buffer or normalize the model output
parse <tool> / command blocks
return OpenAI-compatible tool_calls
preserve streaming behavior for clients like Hermes

For production agent use, I’d still separate this from the official DeepSeek API path. Web-cookie translation is useful for experimentation, but long coding-agent loops expose stability issues very quickly: tool-call consistency, stale sessions, rate limits, fallback behavior, and response-format drift.

This is exactly why OpenAI-compatible providers need to be tested at the workflow level, not just with a single chat completion request.
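
A workflow-level check could look roughly like the sketch below, which drives a full tool loop (request, tool call, tool result, final answer) instead of a single completion. The names `run_agent_loop` and `fake_provider` are hypothetical; in a real test, `complete` would call the provider under test rather than a stub.

```python
import json


def run_agent_loop(complete, messages, tools, execute, max_steps=5):
    """Drive a tool loop against `complete`, an OpenAI-style chat function."""
    messages = list(messages)
    for _ in range(max_steps):
        choice = complete(messages=messages, tools=tools)["choices"][0]
        message = choice["message"]
        messages.append(message)
        if choice["finish_reason"] != "tool_calls":
            return messages  # the model produced a final answer
        for call in message["tool_calls"]:
            args = json.loads(call["function"]["arguments"])
            result = execute(call["function"]["name"], args)
            messages.append({"role": "tool", "tool_call_id": call["id"], "content": result})
    raise RuntimeError("agent did not finish within max_steps")


def fake_provider(messages, tools):
    """Stub standing in for the provider: one tool call, then a final answer."""
    if messages[-1]["role"] == "tool":
        final = {"role": "assistant", "content": "Done: " + messages[-1]["content"]}
        return {"choices": [{"index": 0, "message": final, "finish_reason": "stop"}]}
    call = {
        "id": "call_1",
        "type": "function",
        "function": {"name": "terminal", "arguments": json.dumps({"command": "echo hi"})},
    }
    reply = {"role": "assistant", "content": None, "tool_calls": [call]}
    return {"choices": [{"index": 0, "message": reply, "finish_reason": "tool_calls"}]}


history = run_agent_loop(
    fake_provider,
    [{"role": "user", "content": "say hi"}],
    tools=[],
    execute=lambda name, args: "hi",
)
print([m["role"] for m in history])  # ['user', 'assistant', 'tool', 'assistant']
```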
