Parallel tool calls: model mis-attributes (swaps) tool results in the final response

## Summary

When the model issues **multiple tool calls in parallel** and then composes its final answer, it can bind each tool's result to the **wrong call**. In the reproduction below, two parallel `weather_report` calls each return correctly, but the model **swaps the two results** in its final summary — reporting City A's weather for City B and vice versa. The error is silent: the output is well-formed and internally self-consistent, just factually wrong.

## Environment

- **Model:** `MiniMax-M3` (tested in the Playground)
- **Decoding:** `temperature=1`, `top_p=1`, `max_tokens=2048`, `response_format=text`
- **Tools:** single function `weather_report({ city: string })`, served over MCP
- **Tool-calling mode:** parallel

## System prompt

```
<identity>
You're Tara, a helpful assistant that can utilize the tools provided to answer questions and help with tasks.
</identity>
```

## Context

```json
{
  "model": "MiniMax-M3",
  "messages": [
    {
      "role": "system",
      "content": "<identity>\nYou're Tara, a helpful assistant that can utilize the tools provided to answer questions and help with tasks.\n</identity>"
    },
    {
      "role": "user",
      "content": "What's the weather like in Shanghai and Beijing?"
    },
    {
      "role": "assistant",
      "content": "<think>\nThe user wants weather for two cities. I can call both weather tools in parallel since they are independent.\n</think>\n",
      "name": "MiniMax AIMiniMax AIMiniMax AIMiniMax AIMiniMax AI",
      "audio_content": "",
      "tool_calls": [
        {
          "id": "call_function_r5r5j66p96o7_1",
          "type": "function",
          "function": {
            "name": "weather_report",
            "arguments": "{\"city\": \"Shanghai\"}"
          }
        },
        {
          "id": "call_function_r5r5j66p96o7_2",
          "type": "function",
          "function": {
            "name": "weather_report",
            "arguments": "{\"city\": \"Beijing\"}"
          }
        }
      ]
    },
    {
      "role": "tool",
      "tool_call_id": "call_function_r5r5j66p96o7_2",
      "content": "Sunny, 21-29 degrees"
    },
    {
      "role": "tool",
      "tool_call_id": "call_function_r5r5j66p96o7_1",
      "content": "Cloudy, 17-22 degrees"
    },
    {
      "role": "assistant",
      "content": "Here's the current weather in both cities:\n\n- **Shanghai**: ☀️ Sunny, with temperatures ranging from **21–29°C**\n- **Beijing**: ☁️ Cloudy, with temperatures ranging from **17–22°C**\n\nShanghai is warmer and sunny, while Beijing is cooler and overcast. Let me know if you'd like more details or a forecast!",
      "name": "MiniMax AIMiniMax AIMiniMax AI",
      "audio_content": ""
    }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "weather_report",
        "description": "Retrieves a weather report for a specified city. Use this when you need current weather conditions or a concise weather summary for a known city name. Do not use this when geographic coordinates, historical weather data, weather alerts, or multi-day forecasts are required unless those capabilities are explicitly supported elsewhere.",
        "strict": true,
        "parameters": {
          "type": "object",
          "required": [
            "city"
          ],
          "properties": {
            "city": {
              "type": "string",
              "description": "The name of the city for which to retrieve the weather report"
            }
          },
          "additionalProperties": false
        }
      }
    }
  ]
}
```

## Screenshot

<img width="1494" height="910" alt="Image" src="https://github.com/user-attachments/assets/db4e58df-ba37-4fed-a283-515cf3224bd2" />

## Steps to reproduce

1. Send user message: `What's the weather like in Shanghai and Beijing?`
2. The model emits a `<think>` block: *"The user wants weather for two cities. I can call both weather tools in parallel since they are independent."*
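3. The model issues two parallel `weather_report` calls and receives both results. In this run the results come back in reverse order: `call_function_r5r5j66p96o7_2` (Beijing) first, then `call_function_r5r5j66p96o7_1` (Shanghai).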
4. The model composes a final natural-language answer.

## Trace

**Tool calls and results (ground truth from the tool):**

| # | Tool call | Result returned |
|---|-----------|-----------------|
| 1 | `weather_report({ "city": "Shanghai" })` | `Cloudy, 17-22 degrees` |
| 2 | `weather_report({ "city": "Beijing" })`  | `Sunny, 21-29 degrees`  |

Correct mapping:

- **Shanghai → Cloudy, 17–22°C**
- **Beijing → Sunny, 21–29°C**

**Model's final answer:**

> - **Shanghai**: ☀️ Sunny, with temperatures ranging from **21–29°C**
> - **Beijing**: ☁️ Cloudy, with temperatures ranging from **17–22°C**
>
> Shanghai is warmer and sunny, while Beijing is cooler and overcast.

## Expected behavior

The final answer should reflect each tool result bound to the call that produced it:

- Shanghai → Cloudy, 17–22°C
- Beijing → Sunny, 21–29°C

## Actual behavior

The two results are **swapped**: Shanghai is reported with Beijing's result and Beijing with Shanghai's. The model then adds a summarizing sentence that is also wrong but consistent with the swapped data, so nothing in the output flags the error.

## Behaviors in other models

- DeepSeek v3.2 shows the same problem; it is fixed in both DeepSeek v4 Flash and DeepSeek v4 Pro.
- Doubao v1.8 shows the same problem; it is fixed in the v2.x models.

## Likely root cause

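The model appears to associate parallel tool results **by position or arrival order** rather than by `tool_call_id`. In the trace above, the result for `call_function_r5r5j66p96o7_2` (Beijing) arrives before the result for `call_function_r5r5j66p96o7_1` (Shanghai), and the final answer pairs the first-arriving result with the first-issued call. When parallel calls resolve out of issue order, which is common over MCP, each result is attached to the wrong originating call. The distinguishing `city` argument of each call does not appear to be used to re-anchor its result.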
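To make the hypothesis concrete, here is a minimal Python sketch that contrasts the two ways of pairing calls with results on the message list from this issue (trimmed to the relevant fields). The helper names `bind_results` and `bind_positionally` are hypothetical and exist only for illustration; this shows the suspected failure pattern, not how the model works internally.

```python
import json

# Trimmed copy of the conversation from the "Context" section.
messages = [
    {"role": "assistant", "tool_calls": [
        {"id": "call_function_r5r5j66p96o7_1", "type": "function",
         "function": {"name": "weather_report", "arguments": "{\"city\": \"Shanghai\"}"}},
        {"id": "call_function_r5r5j66p96o7_2", "type": "function",
         "function": {"name": "weather_report", "arguments": "{\"city\": \"Beijing\"}"}},
    ]},
    # Results arrive in reverse order relative to the calls.
    {"role": "tool", "tool_call_id": "call_function_r5r5j66p96o7_2",
     "content": "Sunny, 21-29 degrees"},
    {"role": "tool", "tool_call_id": "call_function_r5r5j66p96o7_1",
     "content": "Cloudy, 17-22 degrees"},
]


def bind_results(messages):
    """Correct pairing: look up each result's call by tool_call_id."""
    calls = {}
    for msg in messages:
        if msg.get("role") == "assistant":
            for call in msg.get("tool_calls", []):
                calls[call["id"]] = json.loads(call["function"]["arguments"])
    return [(calls[m["tool_call_id"]], m["content"])
            for m in messages if m.get("role") == "tool"]


def bind_positionally(messages):
    """Suspected faulty pairing: i-th call with i-th result to arrive."""
    args = [json.loads(c["function"]["arguments"])
            for m in messages if m.get("role") == "assistant"
            for c in m.get("tool_calls", [])]
    results = [m["content"] for m in messages if m.get("role") == "tool"]
    return list(zip(args, results))


by_id = {a["city"]: r for a, r in bind_results(messages)}
by_position = {a["city"]: r for a, r in bind_positionally(messages)}
```

Here `by_id` maps Shanghai to `Cloudy, 17-22 degrees`, which matches the tool's ground truth, while `by_position` maps Shanghai to `Sunny, 21-29 degrees`, which is exactly the swap seen in the model's final answer.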

## Impact

- **High severity for any agent that uses parallel tool calls:** results are confidently mis-attributed with no error signal.
- The failure is **data-level** (valid format, wrong content), so it passes schema/format validation and can ship to production unnoticed.
- Affects exactly the case parallel calling is meant to optimize: **multi-entity queries** (multiple cities, users, tickets, files, rows, etc.).
- Particularly risky for **single-tool / multi-call** fan-out (same tool name, different arguments), where the shared tool name likely increases the chance of confusion.

## Suggested checks

- Confirm the model binds each tool result to its originating **`tool_call_id`**, not to position or arrival order.
- Add an eval that fires N parallel calls with **distinguishable** results and asserts the correct argument→result mapping, including when results are returned in **shuffled order**.
- Specifically cover the same-tool / different-args fan-out pattern (N parallel calls to one tool name).
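The checks above could be turned into an eval along the following lines. This is a rough Python sketch under several assumptions: the function names (`build_fanout_case`, `misattributions`) are hypothetical, the model is prompted to answer with one entity per line, entity names are not substrings of one another, and sending `messages` to the model is left to whatever harness is in use. The grader is a simple line-based heuristic, not a robust parser.

```python
import json
import random


def build_fanout_case(entities, tool_name="weather_report", arg_name="city", seed=0):
    """Build one same-tool fan-out case with shuffled tool results.

    Returns (messages, expected), where expected maps entity -> unique token.
    """
    rng = random.Random(seed)
    # Distinct, easily searchable result tokens, one per entity.
    numbers = rng.sample(range(10**6), len(entities))
    expected = {e: f"RESULT-{n:06d}" for e, n in zip(entities, numbers)}

    calls = [
        {"id": f"call_{i}", "type": "function",
         "function": {"name": tool_name, "arguments": json.dumps({arg_name: e})}}
        for i, e in enumerate(entities)
    ]
    tool_msgs = [
        {"role": "tool", "tool_call_id": f"call_{i}", "content": expected[e]}
        for i, e in enumerate(entities)
    ]
    rng.shuffle(tool_msgs)  # simulate out-of-order completion

    messages = [
        {"role": "user",
         "content": "Report the result for each of: " + ", ".join(entities)
                    + ". Use one line per item."},
        {"role": "assistant", "content": "", "tool_calls": calls},
        *tool_msgs,
    ]
    return messages, expected


def misattributions(answer, expected):
    """Return entities whose answer line lacks the right token or has a wrong one."""
    bad = []
    for entity, token in expected.items():
        lines = [ln for ln in answer.splitlines() if entity in ln]
        others = [t for e, t in expected.items() if e != entity]
        has_right = any(token in ln for ln in lines)
        has_wrong = any(t in ln for ln in lines for t in others)
        if not has_right or has_wrong:
            bad.append(entity)
    return bad
```

A harness would call `build_fanout_case(...)` across many seeds and values of N, send `messages` (plus the tool schema) to the model, and fail the case whenever `misattributions(final_answer, expected)` is non-empty. Opaque tokens are used instead of realistic weather strings so the model cannot fall back on prior knowledge about an entity.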

## Workarounds (for users, until fixed)

- Have tools **echo their key arguments** in the result payload, e.g. `{"city": "Shanghai", "weather": "Cloudy, 17–22°C"}`, so the model has an in-band anchor instead of relying on call ordering.
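
One way to apply this workaround on the tool side without touching each tool's body is a small wrapper. The sketch below is in Python and is only illustrative: the decorator name `echo_arguments` and the `result` key are assumptions (the example above uses `weather` as the key), and the `weather_report` body is a stand-in that returns the canned values from this issue's trace.

```python
import json
from functools import wraps


def echo_arguments(tool_fn):
    """Wrap a tool so its output carries the arguments it was called with."""
    @wraps(tool_fn)
    def wrapper(**kwargs):
        # Assumes no tool argument is itself named "result".
        payload = {**kwargs, "result": tool_fn(**kwargs)}
        return json.dumps(payload, ensure_ascii=False)
    return wrapper


@echo_arguments
def weather_report(city: str) -> str:
    # Stand-in data from the trace; a real tool would query a weather service.
    canned = {
        "Shanghai": "Cloudy, 17-22 degrees",
        "Beijing": "Sunny, 21-29 degrees",
    }
    return canned[city]
```

With this wrapper, `weather_report(city="Shanghai")` returns a JSON string containing both the `city` and the `result`, so each tool message is self-describing regardless of the order in which results arrive.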
