Skip to content

fix: dedupe duplicated tool call fields#7765

Open
bugkeep wants to merge 1 commit intoAstrBotDevs:masterfrom
bugkeep:bugfix/7694-dedupe-toolcall
Open

fix: dedupe duplicated tool call fields#7765
bugkeep wants to merge 1 commit intoAstrBotDevs:masterfrom
bugkeep:bugfix/7694-dedupe-toolcall

Conversation

@bugkeep
Copy link
Copy Markdown
Contributor

@bugkeep bugkeep commented Apr 24, 2026

Fixes #7694.

Some OpenAI-compatible proxies can duplicate streaming chunks; when that happens, tool_call.id and tool_call.function.name can end up as a self-concatenated string (s + s). We now defensively de-duplicate those fields during completion parsing, and add a unit test covering the regression.

Summary by Sourcery

Handle duplicated tool call metadata from OpenAI-compatible proxies during completion parsing and add coverage for this regression.

Bug Fixes:

  • Normalize self-concatenated tool call IDs and function names when parsing OpenAI-style completions to avoid duplicated metadata from streaming proxies.

Tests:

  • Add an async unit test verifying that self-concatenated tool call IDs and names are de-duplicated when parsing OpenAI completions.

@dosubot dosubot Bot added size:S This PR changes 10-29 lines, ignoring generated files. area:provider The bug / feature is about AI Provider, Models, LLM Agent, LLM Agent Runner. labels Apr 24, 2026
Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 1 issue, and left some high level feedback:

  • Consider moving _dedupe_self_concatenated to a module-level helper (or shared utility) so it can be reused and more easily unit-tested in isolation rather than as an inner function.
  • The min_len thresholds (16 for IDs and 8 for names) are embedded as magic numbers; it would be clearer to lift these into named constants or document why these particular values are appropriate for the expected formats.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- Consider moving `_dedupe_self_concatenated` to a module-level helper (or shared utility) so it can be reused and more easily unit-tested in isolation rather than as an inner function.
- The `min_len` thresholds (16 for IDs and 8 for names) are embedded as magic numbers; it would be clearer to lift these into named constants or document why these particular values are appropriate for the expected formats.

## Individual Comments

### Comment 1
<location path="tests/test_openai_source.py" line_range="1183-1185" />
<code_context>
+async def test_parse_openai_completion_dedupes_self_concatenated_tool_call_fields():
+    provider = _make_provider()
+    try:
+        tool_call_id = "call_95fae017db5b4a91b1259aba"
+        tool_name = "astr_kb_search"
+        completion = ChatCompletion.model_validate(
+            {
+                "id": "chatcmpl-toolcall-dup",
</code_context>
<issue_to_address>
**suggestion (testing):** Add a test case where only one of `tool_call.id` or `tool_call.function.name` is duplicated to verify they are handled independently.

The current test only covers the case where both ID and function name are duplicated together. Please add two more cases: (a) duplicated ID with a normal function name, and (b) duplicated function name with a normal ID, to confirm each field’s deduping behavior is independent.

Suggested implementation:

```python
@pytest.mark.asyncio
async def test_parse_openai_completion_dedupes_self_concatenated_tool_call_id_only():
    provider = _make_provider()
    try:
        tool_call_id = "call_95fae017db5b4a91b1259aba"
        tool_name = "astr_kb_search"
        completion = ChatCompletion.model_validate(
            {
                "id": "chatcmpl-toolcall-dup-id-only",
                "object": "chat.completion",
                "created": 0,
                "model": "gpt-4o-mini",
                "choices": [
                    {
                        "index": 0,
                        "message": {
                            "role": "assistant",
                            "content": None,
                            "refusal": None,
                            "tool_calls": [
                                {
                                    "id": f"{tool_call_id}{tool_call_id}",
                                    "type": "function",
                                    "function": {
                                        "name": tool_name,
                                        "arguments": '{"query": "test"}',
                                    },
                                }
                            ],
                        },
                        "logprobs": None,
                        "finish_reason": "tool_calls",
                    }
                ],
                "usage": {
                    "prompt_tokens": 0,
                    "completion_tokens": 0,
                    "total_tokens": 0,
                },
            }
        )

        events = [e async for e in provider._parse_openai_completion(completion)]
        assert len(events) == 1

        output_message = events[0].output_message
        assert output_message is not None
        assert output_message.tool_calls is not None
        assert len(output_message.tool_calls) == 1

        tool_call = output_message.tool_calls[0]
        assert tool_call.id == tool_call_id
        assert tool_call.function.name == tool_name
    finally:
        await provider.terminate()


@pytest.mark.asyncio
async def test_parse_openai_completion_dedupes_self_concatenated_tool_call_function_name_only():
    provider = _make_provider()
    try:
        tool_call_id = "call_95fae017db5b4a91b1259aba"
        tool_name = "astr_kb_search"
        completion = ChatCompletion.model_validate(
            {
                "id": "chatcmpl-toolcall-dup-name-only",
                "object": "chat.completion",
                "created": 0,
                "model": "gpt-4o-mini",
                "choices": [
                    {
                        "index": 0,
                        "message": {
                            "role": "assistant",
                            "content": None,
                            "refusal": None,
                            "tool_calls": [
                                {
                                    "id": tool_call_id,
                                    "type": "function",
                                    "function": {
                                        "name": f"{tool_name}{tool_name}",
                                        "arguments": '{"query": "test"}',
                                    },
                                }
                            ],
                        },
                        "logprobs": None,
                        "finish_reason": "tool_calls",
                    }
                ],
                "usage": {
                    "prompt_tokens": 0,
                    "completion_tokens": 0,
                    "total_tokens": 0,
                },
            }
        )

        events = [e async for e in provider._parse_openai_completion(completion)]
        assert len(events) == 1

        output_message = events[0].output_message
        assert output_message is not None
        assert output_message.tool_calls is not None
        assert len(output_message.tool_calls) == 1

        tool_call = output_message.tool_calls[0]
        assert tool_call.id == tool_call_id
        assert tool_call.function.name == tool_name
    finally:
        await provider.terminate()


@pytest.mark.asyncio
async def test_parse_openai_completion_dedupes_self_concatenated_tool_call_fields():
    provider = _make_provider()
    try:
        tool_call_id = "call_95fae017db5b4a91b1259aba"
        tool_name = "astr_kb_search"
        completion = ChatCompletion.model_validate(
            {
                "id": "chatcmpl-toolcall-dup",
                "object": "chat.completion",
                "created": 0,
                "model": "gpt-4o-mini",
                "choices": [
                    {
                        "index": 0,
                        "message": {
                            "role": "assistant",
                            "content": None,
                            "refusal": None,

```

The new tests assume:
1. The helper under test is `provider._parse_openai_completion` and that it yields events with an `output_message.tool_calls` structure identical to the existing test.
2. The existing test constructs `tool_calls` as shown (a list with `id`, `type: "function"`, and `function: {name, arguments}`).

Please:
- Ensure the signature and usage of `provider._parse_openai_completion` and the event/`output_message` shape match those used in `test_parse_openai_completion_dedupes_self_concatenated_tool_call_fields`. If the existing test uses a different helper or field path, mirror that in the two new tests.
- Align the `tool_calls` payload shape (keys like `"tool_calls"`, `"type"`, `"function"`, `"arguments"`) with whatever is used in the existing dedupe test; update field names if they differ.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment on lines +1183 to +1185
tool_call_id = "call_95fae017db5b4a91b1259aba"
tool_name = "astr_kb_search"
completion = ChatCompletion.model_validate(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (testing): Add a test case where only one of tool_call.id or tool_call.function.name is duplicated to verify they are handled independently.

The current test only covers the case where both ID and function name are duplicated together. Please add two more cases: (a) duplicated ID with a normal function name, and (b) duplicated function name with a normal ID, to confirm each field’s deduping behavior is independent.

Suggested implementation:

@pytest.mark.asyncio
async def test_parse_openai_completion_dedupes_self_concatenated_tool_call_id_only():
    provider = _make_provider()
    try:
        tool_call_id = "call_95fae017db5b4a91b1259aba"
        tool_name = "astr_kb_search"
        completion = ChatCompletion.model_validate(
            {
                "id": "chatcmpl-toolcall-dup-id-only",
                "object": "chat.completion",
                "created": 0,
                "model": "gpt-4o-mini",
                "choices": [
                    {
                        "index": 0,
                        "message": {
                            "role": "assistant",
                            "content": None,
                            "refusal": None,
                            "tool_calls": [
                                {
                                    "id": f"{tool_call_id}{tool_call_id}",
                                    "type": "function",
                                    "function": {
                                        "name": tool_name,
                                        "arguments": '{"query": "test"}',
                                    },
                                }
                            ],
                        },
                        "logprobs": None,
                        "finish_reason": "tool_calls",
                    }
                ],
                "usage": {
                    "prompt_tokens": 0,
                    "completion_tokens": 0,
                    "total_tokens": 0,
                },
            }
        )

        events = [e async for e in provider._parse_openai_completion(completion)]
        assert len(events) == 1

        output_message = events[0].output_message
        assert output_message is not None
        assert output_message.tool_calls is not None
        assert len(output_message.tool_calls) == 1

        tool_call = output_message.tool_calls[0]
        assert tool_call.id == tool_call_id
        assert tool_call.function.name == tool_name
    finally:
        await provider.terminate()


@pytest.mark.asyncio
async def test_parse_openai_completion_dedupes_self_concatenated_tool_call_function_name_only():
    provider = _make_provider()
    try:
        tool_call_id = "call_95fae017db5b4a91b1259aba"
        tool_name = "astr_kb_search"
        completion = ChatCompletion.model_validate(
            {
                "id": "chatcmpl-toolcall-dup-name-only",
                "object": "chat.completion",
                "created": 0,
                "model": "gpt-4o-mini",
                "choices": [
                    {
                        "index": 0,
                        "message": {
                            "role": "assistant",
                            "content": None,
                            "refusal": None,
                            "tool_calls": [
                                {
                                    "id": tool_call_id,
                                    "type": "function",
                                    "function": {
                                        "name": f"{tool_name}{tool_name}",
                                        "arguments": '{"query": "test"}',
                                    },
                                }
                            ],
                        },
                        "logprobs": None,
                        "finish_reason": "tool_calls",
                    }
                ],
                "usage": {
                    "prompt_tokens": 0,
                    "completion_tokens": 0,
                    "total_tokens": 0,
                },
            }
        )

        events = [e async for e in provider._parse_openai_completion(completion)]
        assert len(events) == 1

        output_message = events[0].output_message
        assert output_message is not None
        assert output_message.tool_calls is not None
        assert len(output_message.tool_calls) == 1

        tool_call = output_message.tool_calls[0]
        assert tool_call.id == tool_call_id
        assert tool_call.function.name == tool_name
    finally:
        await provider.terminate()


@pytest.mark.asyncio
async def test_parse_openai_completion_dedupes_self_concatenated_tool_call_fields():
    provider = _make_provider()
    try:
        tool_call_id = "call_95fae017db5b4a91b1259aba"
        tool_name = "astr_kb_search"
        completion = ChatCompletion.model_validate(
            {
                "id": "chatcmpl-toolcall-dup",
                "object": "chat.completion",
                "created": 0,
                "model": "gpt-4o-mini",
                "choices": [
                    {
                        "index": 0,
                        "message": {
                            "role": "assistant",
                            "content": None,
                            "refusal": None,

The new tests assume:

  1. The helper under test is provider._parse_openai_completion and that it yields events with an output_message.tool_calls structure identical to the existing test.
  2. The existing test constructs tool_calls as shown (a list with id, type: "function", and function: {name, arguments}).

Please:

  • Ensure the signature and usage of provider._parse_openai_completion and the event/output_message shape match those used in test_parse_openai_completion_dedupes_self_concatenated_tool_call_fields. If the existing test uses a different helper or field path, mirror that in the two new tests.
  • Align the tool_calls payload shape (keys like "tool_calls", "type", "function", "arguments") with whatever is used in the existing dedupe test; update field names if they differ.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a defensive de-duplication mechanism for tool call fields to handle issues where certain OpenAI-compatible proxies duplicate streaming chunks. It adds a helper function to detect and fix self-concatenated strings in tool IDs and function names, along with a corresponding test case. The review feedback suggests refactoring the nested helper function into a static method on the class to maintain consistency with the existing codebase structure and improve organization.

Comment on lines +848 to +852
def _dedupe_self_concatenated(value: str, *, min_len: int) -> str:
if not value or len(value) < min_len or (len(value) % 2) != 0:
return value
half = len(value) // 2
return value[:half] if value[:half] == value[half:] else value
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

For better code organization and consistency with other helper methods in this class (like _safe_json_dump), consider moving this nested function out of _parse_openai_completion and defining it as a staticmethod on the ProviderOpenAIOfficial class. This improves discoverability and aligns with the existing structure of the file.

After moving it, you would update the calls to self._dedupe_self_concatenated(...).

References
  1. Refactor logic into shared helper functions to improve code organization and reusability.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:provider The bug / feature is about AI Provider, Models, LLM Agent, LLM Agent Runner. size:S This PR changes 10-29 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]在查询的时候提示:未找到指定的工具: astr_kb_searchastr_kb_search,将跳过。

1 participant