Parallel tool calls: model mis-attributes (swaps) tool results in the final response #1

@MagicCube

Summary

When the model issues multiple tool calls in parallel and then composes its final answer, it can bind each tool's result to the wrong call. In the reproduction below, two parallel weather_report calls each return correctly, but the model swaps the two results in its final summary — reporting City A's weather for City B and vice versa. The error is silent: the output is well-formed and internally self-consistent, just factually wrong.

Environment

  • Model: MiniMax-M3 (tested in the Playground)
  • Decoding: temperature=1, top_p=1, max_tokens=2048, response_format=text
  • Tools: single function weather_report({ city: string }), served over MCP
  • Tool-calling mode: parallel

System prompt

<identity>
You're Tara, a helpful assistant that can utilize the tools provided to answer questions and help with tasks.
</identity>

Context

{
  "model": "MiniMax-M3",
  "messages": [
    {
      "role": "system",
      "content": "<identity>\nYou're Tara, a helpful assistant that can utilize the tools provided to answer questions and help with tasks.\n</identity>"
    },
    {
      "role": "user",
      "content": "What's the weather like in Shanghai and Beijing?"
    },
    {
      "role": "assistant",
      "content": "<think>\nThe user wants weather for two cities. I can call both weather tools in parallel since they are independent.\n</think>\n",
      "name": "MiniMax AIMiniMax AIMiniMax AIMiniMax AIMiniMax AI",
      "audio_content": "",
      "tool_calls": [
        {
          "id": "call_function_r5r5j66p96o7_1",
          "type": "function",
          "function": {
            "name": "weather_report",
            "arguments": "{\"city\": \"Shanghai\"}"
          }
        },
        {
          "id": "call_function_r5r5j66p96o7_2",
          "type": "function",
          "function": {
            "name": "weather_report",
            "arguments": "{\"city\": \"Beijing\"}"
          }
        }
      ]
    },
    {
      "role": "tool",
      "tool_call_id": "call_function_r5r5j66p96o7_2",
      "content": "Sunny, 21-29 degrees"
    },
    {
      "role": "tool",
      "tool_call_id": "call_function_r5r5j66p96o7_1",
      "content": "Cloudy, 17-22 degrees"
    },
    {
      "role": "assistant",
      "content": "Here's the current weather in both cities:\n\n- **Shanghai**: ☀️ Sunny, with temperatures ranging from **21–29°C**\n- **Beijing**: ☁️ Cloudy, with temperatures ranging from **17–22°C**\n\nShanghai is warmer and sunny, while Beijing is cooler and overcast. Let me know if you'd like more details or a forecast!",
      "name": "MiniMax AIMiniMax AIMiniMax AI",
      "audio_content": ""
    }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "weather_report",
        "description": "Retrieves a weather report for a specified city. Use this when you need current weather conditions or a concise weather summary for a known city name. Do not use this when geographic coordinates, historical weather data, weather alerts, or multi-day forecasts are required unless those capabilities are explicitly supported elsewhere.",
        "strict": true,
        "parameters": {
          "type": "object",
          "required": [
            "city"
          ],
          "properties": {
            "city": {
              "type": "string",
              "description": "The name of the city for which to retrieve the weather report"
            }
          },
          "additionalProperties": false
        }
      }
    }
  ]
}

Steps to reproduce

  1. Send user message: What's the weather like in Shanghai and Beijing?
  2. The model emits a <think> block: "The user wants weather for two cities. I can call both weather tools in parallel since they are independent."
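  3. The model issues two parallel weather_report calls and receives both results. In the captured Context, the results come back in the reverse of issue order: call _2 (Beijing) first, then call _1 (Shanghai).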
  4. The model composes a final natural-language answer.
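
The steps above can be replayed by sending a conversation that ends with the two tool results and letting the model generate the final turn. The sketch below only builds that payload as a Python dict. It does not send it, since the issue was observed in the Playground and no endpoint or client library is given here. `build_repro_payload` and its `results_in_call_order` flag are hypothetical names for illustration, the assistant turn's `<think>` content is left empty for brevity, and the `tools` array from the Context section still has to be added before sending.

```python
import json

SYSTEM_PROMPT = (
    "<identity>\nYou're Tara, a helpful assistant that can utilize the tools "
    "provided to answer questions and help with tasks.\n</identity>"
)

# (tool_call_id, city argument, tool result) as captured in the Context section.
CALLS = [
    ("call_function_r5r5j66p96o7_1", "Shanghai", "Cloudy, 17-22 degrees"),
    ("call_function_r5r5j66p96o7_2", "Beijing", "Sunny, 21-29 degrees"),
]


def build_repro_payload(results_in_call_order=False):
    """Build the conversation up to (not including) the final assistant turn."""
    tool_calls = [
        {
            "id": call_id,
            "type": "function",
            "function": {
                "name": "weather_report",
                "arguments": json.dumps({"city": city}),
            },
        }
        for call_id, city, _ in CALLS
    ]
    tool_messages = [
        {"role": "tool", "tool_call_id": call_id, "content": result}
        for call_id, _, result in CALLS
    ]
    if not results_in_call_order:
        # Match the captured trace: call _2's result arrives before call _1's.
        tool_messages.reverse()
    return {
        "model": "MiniMax-M3",
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": "What's the weather like in Shanghai and Beijing?"},
            {"role": "assistant", "content": "", "tool_calls": tool_calls},
            *tool_messages,
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_repro_payload(), indent=2))
```

Running the same payload with `results_in_call_order=True` gives a control case: if the positional hypothesis holds, the swap should not appear when the tool messages follow issue order.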

Trace

Tool calls and results (ground truth from the tool):

| # | Tool call | Result returned |
|---|-----------|-----------------|
| 1 | weather_report({ "city": "Shanghai" }) | Cloudy, 17–22 degrees |
| 2 | weather_report({ "city": "Beijing" }) | Sunny, 21–29 degrees |

Correct mapping:

  • Shanghai → Cloudy, 17–22°C
  • Beijing → Sunny, 21–29°C

Model's final answer:

  • Shanghai: ☀️ Sunny, with temperatures ranging from 21–29°C
  • Beijing: ☁️ Cloudy, with temperatures ranging from 17–22°C

Shanghai is warmer and sunny, while Beijing is cooler and overcast.

Expected behavior

The final answer should reflect each tool result bound to the call that produced it:

  • Shanghai → Cloudy, 17–22°C
  • Beijing → Sunny, 21–29°C

Actual behavior

The two results are swapped. Shanghai is reported with Beijing's result and Beijing with Shanghai's result. The model then adds a (wrong) summarizing sentence that is internally consistent with the swapped data, so nothing in the output flags the error.

Behaviors in other models

  • DeepSeek v3.2 has the same problem; it has been fixed in both DeepSeek v4 Flash and DeepSeek v4 Pro.
  • Doubao v1.8 has this problem too; it has been fixed in the v2.x models.

Likely root cause

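The model appears to bind parallel tool results to calls by position, that is, by the order in which the tool messages arrive, rather than by tool_call_id. When parallel calls resolve out of issue order, which is common over MCP, each result is attached to the wrong originating call. In the Context above, the result for call _2 (Beijing) arrives before the result for call _1 (Shanghai), and the final answer matches a purely positional pairing. The distinguishing city argument of each call is not being used to re-anchor the corresponding result.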
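
To make the two binding strategies concrete, here is a minimal Python sketch using the IDs and results from the Context section. `bind_by_position` and `bind_by_id` are illustrative names, not part of any API, and the positional variant is only a guess at what the model effectively does. It is not a description of its internals.

```python
import json

# Tool calls in the order the model issued them.
tool_calls = [
    {"id": "call_function_r5r5j66p96o7_1", "arguments": '{"city": "Shanghai"}'},
    {"id": "call_function_r5r5j66p96o7_2", "arguments": '{"city": "Beijing"}'},
]

# Tool messages in the order they appear in the conversation (call _2 first).
tool_messages = [
    {"tool_call_id": "call_function_r5r5j66p96o7_2", "content": "Sunny, 21-29 degrees"},
    {"tool_call_id": "call_function_r5r5j66p96o7_1", "content": "Cloudy, 17-22 degrees"},
]


def bind_by_position(calls, messages):
    """Hypothesized faulty binding: the i-th result goes to the i-th call."""
    return {
        json.loads(call["arguments"])["city"]: msg["content"]
        for call, msg in zip(calls, messages)
    }


def bind_by_id(calls, messages):
    """Expected binding: join on tool_call_id, ignoring message order."""
    by_id = {msg["tool_call_id"]: msg["content"] for msg in messages}
    return {
        json.loads(call["arguments"])["city"]: by_id[call["id"]]
        for call in calls
    }


if __name__ == "__main__":
    print("by position:", bind_by_position(tool_calls, tool_messages))
    print("by id:      ", bind_by_id(tool_calls, tool_messages))
```

The positional pairing reproduces the swapped answer (Shanghai sunny, Beijing cloudy), while the join on tool_call_id yields the correct mapping regardless of message order.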

Impact

  • High severity for any agent that uses parallel tool calls: results are confidently mis-attributed with no error signal.
  • The failure is data-level (valid format, wrong content), so it passes schema/format validation and can ship to production unnoticed.
  • Affects exactly the case parallel calling is meant to optimize: multi-entity queries (multiple cities, users, tickets, files, rows, etc.).
  • Particularly risky for single-tool / multi-call fan-out (same tool name, different arguments), where the shared tool name likely increases the chance of confusion.

Suggested checks

  • Confirm the model binds each tool result to its originating tool_call_id, not to position or arrival order.
  • Add an eval that fires N parallel calls with distinguishable results and asserts the correct argument→result mapping, including when results are returned in shuffled order.
  • Specifically cover the same-tool / different-args fan-out pattern (N parallel calls to one tool name).
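
One possible shape for such an eval, as a rough Python sketch: generate N same-tool calls with unique result tokens, shuffle the tool messages, and check that the final answer pairs each city with its own token. Everything here is an assumption for illustration. The synthetic city names and `code-…` tokens, the helper names, and the two `stub_model_*` functions all stand in for a real model call that would receive the messages and return answer text. The line-based check is deliberately crude and assumes the answer puts one city per line.

```python
import json
import random


def make_case(n, seed):
    """Build n same-tool calls and their tool messages in shuffled order."""
    expected = {f"City-{i:02d}": f"code-{1000 + i}" for i in range(n)}
    calls = [
        {
            "id": f"call_{i:02d}",
            "type": "function",
            "function": {
                "name": "weather_report",
                "arguments": json.dumps({"city": city}),
            },
        }
        for i, city in enumerate(expected)
    ]
    tool_messages = [
        {"role": "tool", "tool_call_id": call["id"], "content": result}
        for call, result in zip(calls, expected.values())
    ]
    random.Random(seed).shuffle(tool_messages)
    return calls, tool_messages, expected


def misattributed(answer_text, expected):
    """Cities for which no answer line pairs the city with its own result."""
    wrong = []
    for city, result in expected.items():
        lines = [ln for ln in answer_text.splitlines() if city in ln]
        if not any(result in ln for ln in lines):
            wrong.append(city)
    return wrong


def stub_model_by_id(calls, tool_messages):
    """Stand-in for a real model call; joins results on tool_call_id."""
    by_id = {m["tool_call_id"]: m["content"] for m in tool_messages}
    return "\n".join(
        f'{json.loads(c["function"]["arguments"])["city"]}: {by_id[c["id"]]}'
        for c in calls
    )


def stub_model_by_position(calls, tool_messages):
    """Stand-in that mimics the suspected bug: i-th result to i-th call."""
    return "\n".join(
        f'{json.loads(c["function"]["arguments"])["city"]}: {m["content"]}'
        for c, m in zip(calls, tool_messages)
    )


if __name__ == "__main__":
    for seed in range(20):
        calls, tool_messages, expected = make_case(n=5, seed=seed)
        assert misattributed(stub_model_by_id(calls, tool_messages), expected) == []
        swapped = misattributed(stub_model_by_position(calls, tool_messages), expected)
        print(seed, "positional stub mis-attributes:", swapped)
```

In a real eval the stubs would be replaced by the model under test, and a pass would require `misattributed(...)` to be empty for every shuffled case.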

Workarounds (for users, until fixed)

  • Have tools echo their key arguments in the result payload, e.g. {"city": "Shanghai", "weather": "Cloudy, 17–22°C"}, so the model has an in-band anchor instead of relying on call ordering.
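
A minimal sketch of that workaround on the tool side, assuming the tool is a plain Python function that takes keyword arguments and returns a string. `echo_arguments` and `raw_weather_report` are hypothetical names, and the canned results are just the values from this issue, not output of the real tool.

```python
import json


def echo_arguments(tool_fn, result_key="result"):
    """Wrap a tool so its result payload carries the arguments it was called with."""
    def wrapper(**kwargs):
        payload = dict(kwargs)  # echo the call arguments as an in-band anchor
        payload[result_key] = tool_fn(**kwargs)
        return json.dumps(payload, ensure_ascii=False)
    return wrapper


def raw_weather_report(city):
    # Stand-in for the real tool, using the values reported in this issue.
    canned = {
        "Shanghai": "Cloudy, 17-22 degrees",
        "Beijing": "Sunny, 21-29 degrees",
    }
    return canned[city]


weather_report = echo_arguments(raw_weather_report, result_key="weather")

if __name__ == "__main__":
    print(weather_report(city="Shanghai"))
    print(weather_report(city="Beijing"))
```

Each tool message then names its own city, so the answer no longer depends on message order. Note that `result_key` must not collide with an argument name, or the echoed argument would be overwritten.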
