Summary
When the model issues multiple tool calls in parallel and then composes its final answer, it can bind each tool's result to the wrong call. In the reproduction below, two parallel weather_report calls each return correctly, but the model swaps the two results in its final summary — reporting City A's weather for City B and vice versa. The error is silent: the output is well-formed and internally self-consistent, just factually wrong.
Environment
- Model: MiniMax-M3 (tested in the Playground)
- Decoding: temperature=1, top_p=1, max_tokens=2048, response_format=text
- Tools: single function weather_report({ city: string }), served over MCP
- Tool-calling mode: parallel
System prompt
<identity>
You're Tara, a helpful assistant that can utilize the tools provided to answer questions and help with tasks.
</identity>
Context
{
  "model": "MiniMax-M3",
  "messages": [
    {
      "role": "system",
      "content": "<identity>\nYou're Tara, a helpful assistant that can utilize the tools provided to answer questions and help with tasks.\n</identity>"
    },
    {
      "role": "user",
      "content": "What's the weather like in Shanghai and Beijing?"
    },
    {
      "role": "assistant",
      "content": "<think>\nThe user wants weather for two cities. I can call both weather tools in parallel since they are independent.\n</think>\n",
      "name": "MiniMax AIMiniMax AIMiniMax AIMiniMax AIMiniMax AI",
      "audio_content": "",
      "tool_calls": [
        {
          "id": "call_function_r5r5j66p96o7_1",
          "type": "function",
          "function": {
            "name": "weather_report",
            "arguments": "{\"city\": \"Shanghai\"}"
          }
        },
        {
          "id": "call_function_r5r5j66p96o7_2",
          "type": "function",
          "function": {
            "name": "weather_report",
            "arguments": "{\"city\": \"Beijing\"}"
          }
        }
      ]
    },
    {
      "role": "tool",
      "tool_call_id": "call_function_r5r5j66p96o7_2",
      "content": "Sunny, 21-29 degrees"
    },
    {
      "role": "tool",
      "tool_call_id": "call_function_r5r5j66p96o7_1",
      "content": "Cloudy, 17-22 degrees"
    },
    {
      "role": "assistant",
      "content": "Here's the current weather in both cities:\n\n- **Shanghai**: ☀️ Sunny, with temperatures ranging from **21–29°C**\n- **Beijing**: ☁️ Cloudy, with temperatures ranging from **17–22°C**\n\nShanghai is warmer and sunny, while Beijing is cooler and overcast. Let me know if you'd like more details or a forecast!",
      "name": "MiniMax AIMiniMax AIMiniMax AI",
      "audio_content": ""
    }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "weather_report",
        "description": "Retrieves a weather report for a specified city. Use this when you need current weather conditions or a concise weather summary for a known city name. Do not use this when geographic coordinates, historical weather data, weather alerts, or multi-day forecasts are required unless those capabilities are explicitly supported elsewhere.",
        "strict": true,
        "parameters": {
          "type": "object",
          "required": [
            "city"
          ],
          "properties": {
            "city": {
              "type": "string",
              "description": "The name of the city for which to retrieve the weather report"
            }
          },
          "additionalProperties": false
        }
      }
    }
  ]
}
Steps to reproduce
- Send user message: What's the weather like in Shanghai and Beijing?
- The model emits a <think> block: "The user wants weather for two cities. I can call both weather tools in parallel since they are independent."
- The model issues two parallel weather_report calls and receives both results. In the Context trace, the result for call _2 (Beijing) is returned before the result for call _1 (Shanghai).
- The model composes a final natural-language answer.
Trace
Tool calls and results (ground truth from the tool):
| # | Tool call | Result returned |
|---|---|---|
| 1 | weather_report({ "city": "Shanghai" }) | Cloudy, 17–22 degrees |
| 2 | weather_report({ "city": "Beijing" }) | Sunny, 21–29 degrees |
Correct mapping:
- Shanghai → Cloudy, 17–22°C
- Beijing → Sunny, 21–29°C
Model's final answer:
- Shanghai: ☀️ Sunny, with temperatures ranging from 21–29°C
- Beijing: ☁️ Cloudy, with temperatures ranging from 17–22°C
Shanghai is warmer and sunny, while Beijing is cooler and overcast.
Expected behavior
The final answer should reflect each tool result bound to the call that produced it:
- Shanghai → Cloudy, 17–22°C
- Beijing → Sunny, 21–29°C
Actual behavior
The two results are swapped. Shanghai is reported with Beijing's result and Beijing with Shanghai's result. The model then adds a (wrong) summarizing sentence that is internally consistent with the swapped data, so nothing in the output flags the error.
Behaviors in other models
- DeepSeek v3.2 shows the same problem; it has been fixed in both DeepSeek v4 Flash and Pro.
- Doubao v1.8 shows the same problem; it has been fixed in the v2.x models.
Likely root cause
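The model appears to associate parallel tool results by position or by completion (arrival) order rather than by tool_call_id. When parallel calls resolve out of issue order, which is common over MCP, the result blocks get attached to the wrong originating call. In the trace above, the tool message for call _2 (Beijing) arrives before the one for call _1 (Shanghai), and the final answer matches a pairing of the first result with the first call. The distinguishing city argument from each call is not being used to re-anchor the corresponding result.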
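To make the hypothesis concrete, here is a minimal Python sketch that replays a trimmed copy of the Context messages through two binding strategies. It only illustrates the suspected failure mode and says nothing definitive about the model's internals; the helper names (index_calls, bind_by_id, bind_by_position) are hypothetical and not part of any API.

```python
import json

def index_calls(messages):
    """Collect every assistant tool call as {tool_call_id: parsed arguments}."""
    calls = {}
    for msg in messages:
        if msg.get("role") == "assistant":
            for call in msg.get("tool_calls", []):
                calls[call["id"]] = json.loads(call["function"]["arguments"])
    return calls

def bind_by_id(messages):
    """Intended binding: look up each tool message's call via tool_call_id."""
    calls = index_calls(messages)
    return {calls[m["tool_call_id"]]["city"]: m["content"]
            for m in messages if m.get("role") == "tool"}

def bind_by_position(messages):
    """Hypothesized faulty binding: i-th issued call paired with i-th tool message."""
    cities = [args["city"] for args in index_calls(messages).values()]
    results = [m["content"] for m in messages if m.get("role") == "tool"]
    return dict(zip(cities, results))

# Trimmed copy of the Context trace: results arrive as _2, then _1.
messages = [
    {"role": "assistant", "tool_calls": [
        {"id": "call_function_r5r5j66p96o7_1", "type": "function",
         "function": {"name": "weather_report", "arguments": "{\"city\": \"Shanghai\"}"}},
        {"id": "call_function_r5r5j66p96o7_2", "type": "function",
         "function": {"name": "weather_report", "arguments": "{\"city\": \"Beijing\"}"}},
    ]},
    {"role": "tool", "tool_call_id": "call_function_r5r5j66p96o7_2",
     "content": "Sunny, 21-29 degrees"},
    {"role": "tool", "tool_call_id": "call_function_r5r5j66p96o7_1",
     "content": "Cloudy, 17-22 degrees"},
]

print(bind_by_id(messages))        # Shanghai -> Cloudy, Beijing -> Sunny
print(bind_by_position(messages))  # Shanghai -> Sunny, Beijing -> Cloudy
```

The positional pairing reproduces exactly the swap seen in the model's final answer, which is consistent with (though not proof of) the explanation above.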
Impact
- High severity for any agent that uses parallel tool calls: results are confidently mis-attributed with no error signal.
- The failure is data-level (valid format, wrong content), so it passes schema/format validation and can ship to production unnoticed.
- Affects exactly the case parallel calling is meant to optimize: multi-entity queries (multiple cities, users, tickets, files, rows, etc.).
- Particularly risky for single-tool / multi-call fan-out (same tool name, different arguments), where the shared tool name likely increases the chance of confusion.
Suggested checks
- Confirm the model binds each tool result to its originating tool_call_id, not to position or arrival order.
- Add an eval that fires N parallel calls with distinguishable results and asserts the correct argument→result mapping, including when results are returned in shuffled order.
- Specifically cover the same-tool / different-args fan-out pattern (N parallel calls to one tool name).
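The checks above could be prototyped along the following lines. This is a rough sketch under stated assumptions: shuffled_tool_messages and misattributed are hypothetical helpers, each expected result is reduced to one distinctive marker word per entity, and the line-based matching is a simple heuristic rather than a robust grader.

```python
import random

def shuffled_tool_messages(results_by_id, seed):
    """Build tool messages in a seeded random order; each ID keeps its own content."""
    items = list(results_by_id.items())
    random.Random(seed).shuffle(items)
    return [{"role": "tool", "tool_call_id": call_id, "content": content}
            for call_id, content in items]

def misattributed(answer, expected):
    """Return entities whose answer line carries another entity's marker.

    expected maps entity name -> a marker word unique to its result.
    Lines naming zero or several entities are skipped (heuristic).
    """
    wrong = set()
    for line in answer.splitlines():
        mentioned = [name for name in expected if name in line]
        if len(mentioned) != 1:
            continue
        entity = mentioned[0]
        for other, marker in expected.items():
            if other != entity and marker in line:
                wrong.add(entity)
    return sorted(wrong)

# Ground truth from the trace, reduced to one marker word per city.
expected = {"Shanghai": "Cloudy", "Beijing": "Sunny"}

model_answer = (
    "Here's the current weather in both cities:\n\n"
    "- **Shanghai**: ☀️ Sunny, with temperatures ranging from **21–29°C**\n"
    "- **Beijing**: ☁️ Cloudy, with temperatures ranging from **17–22°C**\n\n"
    "Shanghai is warmer and sunny, while Beijing is cooler and overcast."
)
correct_answer = "- Shanghai: Cloudy, 17–22°C\n- Beijing: Sunny, 21–29°C"

print(misattributed(model_answer, expected))    # both cities flagged
print(misattributed(correct_answer, expected))  # nothing flagged
```

In a real eval, the shuffled tool messages would be sent back to the model for several seeds and values of N, and the run would fail whenever misattributed returns a non-empty list.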
Workarounds (for users, until fixed)
- Have tools echo their key arguments in the result payload, e.g. {"city": "Shanghai", "weather": "Cloudy, 17–22°C"}, so the model has an in-band anchor instead of relying on call ordering.
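One possible way to apply this workaround without touching each tool's body is a small wrapper on the tool-serving side. The sketch below is illustrative only: echo_arguments is a hypothetical helper, and the weather_report stub stands in for the real MCP-served tool, returning the values from the trace.

```python
import json

def echo_arguments(tool, result_key="result"):
    """Wrap a tool so its JSON output repeats the call's keyword arguments."""
    def wrapped(**kwargs):
        if result_key in kwargs:
            raise ValueError(f"argument name collides with result key: {result_key}")
        payload = dict(kwargs)                 # echo the arguments first
        payload[result_key] = tool(**kwargs)   # then attach the raw result
        return json.dumps(payload, ensure_ascii=False)
    return wrapped

def weather_report(city):
    """Stub for the real tool (assumption): returns the values seen in the trace."""
    return {"Shanghai": "Cloudy, 17-22 degrees",
            "Beijing": "Sunny, 21-29 degrees"}[city]

anchored_weather_report = echo_arguments(weather_report, result_key="weather")

print(anchored_weather_report(city="Shanghai"))
# {"city": "Shanghai", "weather": "Cloudy, 17-22 degrees"}
```

Because every result now names its own city, a correct answer no longer depends on the model matching tool_call_id values or on the order in which results arrive.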