fix: dedupe duplicated tool call fields#7765
fix: dedupe duplicated tool call fields#7765bugkeep wants to merge 1 commit intoAstrBotDevs:masterfrom
Conversation
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- Consider moving
_dedupe_self_concatenatedto a module-level helper (or shared utility) so it can be reused and more easily unit-tested in isolation rather than as an inner function. - The
min_lenthresholds (16 for IDs and 8 for names) are embedded as magic numbers; it would be clearer to lift these into named constants or document why these particular values are appropriate for the expected formats.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Consider moving `_dedupe_self_concatenated` to a module-level helper (or shared utility) so it can be reused and more easily unit-tested in isolation rather than as an inner function.
- The `min_len` thresholds (16 for IDs and 8 for names) are embedded as magic numbers; it would be clearer to lift these into named constants or document why these particular values are appropriate for the expected formats.
## Individual Comments
### Comment 1
<location path="tests/test_openai_source.py" line_range="1183-1185" />
<code_context>
+async def test_parse_openai_completion_dedupes_self_concatenated_tool_call_fields():
+ provider = _make_provider()
+ try:
+ tool_call_id = "call_95fae017db5b4a91b1259aba"
+ tool_name = "astr_kb_search"
+ completion = ChatCompletion.model_validate(
+ {
+ "id": "chatcmpl-toolcall-dup",
</code_context>
<issue_to_address>
**suggestion (testing):** Add a test case where only one of `tool_call.id` or `tool_call.function.name` is duplicated to verify they are handled independently.
The current test only covers the case where both ID and function name are duplicated together. Please add two more cases: (a) duplicated ID with a normal function name, and (b) duplicated function name with a normal ID, to confirm each field’s deduping behavior is independent.
Suggested implementation:
```python
@pytest.mark.asyncio
async def test_parse_openai_completion_dedupes_self_concatenated_tool_call_id_only():
provider = _make_provider()
try:
tool_call_id = "call_95fae017db5b4a91b1259aba"
tool_name = "astr_kb_search"
completion = ChatCompletion.model_validate(
{
"id": "chatcmpl-toolcall-dup-id-only",
"object": "chat.completion",
"created": 0,
"model": "gpt-4o-mini",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": None,
"refusal": None,
"tool_calls": [
{
"id": f"{tool_call_id}{tool_call_id}",
"type": "function",
"function": {
"name": tool_name,
"arguments": '{"query": "test"}',
},
}
],
},
"logprobs": None,
"finish_reason": "tool_calls",
}
],
"usage": {
"prompt_tokens": 0,
"completion_tokens": 0,
"total_tokens": 0,
},
}
)
events = [e async for e in provider._parse_openai_completion(completion)]
assert len(events) == 1
output_message = events[0].output_message
assert output_message is not None
assert output_message.tool_calls is not None
assert len(output_message.tool_calls) == 1
tool_call = output_message.tool_calls[0]
assert tool_call.id == tool_call_id
assert tool_call.function.name == tool_name
finally:
await provider.terminate()
@pytest.mark.asyncio
async def test_parse_openai_completion_dedupes_self_concatenated_tool_call_function_name_only():
provider = _make_provider()
try:
tool_call_id = "call_95fae017db5b4a91b1259aba"
tool_name = "astr_kb_search"
completion = ChatCompletion.model_validate(
{
"id": "chatcmpl-toolcall-dup-name-only",
"object": "chat.completion",
"created": 0,
"model": "gpt-4o-mini",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": None,
"refusal": None,
"tool_calls": [
{
"id": tool_call_id,
"type": "function",
"function": {
"name": f"{tool_name}{tool_name}",
"arguments": '{"query": "test"}',
},
}
],
},
"logprobs": None,
"finish_reason": "tool_calls",
}
],
"usage": {
"prompt_tokens": 0,
"completion_tokens": 0,
"total_tokens": 0,
},
}
)
events = [e async for e in provider._parse_openai_completion(completion)]
assert len(events) == 1
output_message = events[0].output_message
assert output_message is not None
assert output_message.tool_calls is not None
assert len(output_message.tool_calls) == 1
tool_call = output_message.tool_calls[0]
assert tool_call.id == tool_call_id
assert tool_call.function.name == tool_name
finally:
await provider.terminate()
@pytest.mark.asyncio
async def test_parse_openai_completion_dedupes_self_concatenated_tool_call_fields():
provider = _make_provider()
try:
tool_call_id = "call_95fae017db5b4a91b1259aba"
tool_name = "astr_kb_search"
completion = ChatCompletion.model_validate(
{
"id": "chatcmpl-toolcall-dup",
"object": "chat.completion",
"created": 0,
"model": "gpt-4o-mini",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": None,
"refusal": None,
```
The new tests assume:
1. The helper under test is `provider._parse_openai_completion` and that it yields events with an `output_message.tool_calls` structure identical to the existing test.
2. The existing test constructs `tool_calls` as shown (a list with `id`, `type: "function"`, and `function: {name, arguments}`).
Please:
- Ensure the signature and usage of `provider._parse_openai_completion` and the event/`output_message` shape match those used in `test_parse_openai_completion_dedupes_self_concatenated_tool_call_fields`. If the existing test uses a different helper or field path, mirror that in the two new tests.
- Align the `tool_calls` payload shape (keys like `"tool_calls"`, `"type"`, `"function"`, `"arguments"`) with whatever is used in the existing dedupe test; update field names if they differ.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| tool_call_id = "call_95fae017db5b4a91b1259aba" | ||
| tool_name = "astr_kb_search" | ||
| completion = ChatCompletion.model_validate( |
There was a problem hiding this comment.
suggestion (testing): Add a test case where only one of tool_call.id or tool_call.function.name is duplicated to verify they are handled independently.
The current test only covers the case where both ID and function name are duplicated together. Please add two more cases: (a) duplicated ID with a normal function name, and (b) duplicated function name with a normal ID, to confirm each field’s deduping behavior is independent.
Suggested implementation:
@pytest.mark.asyncio
async def test_parse_openai_completion_dedupes_self_concatenated_tool_call_id_only():
provider = _make_provider()
try:
tool_call_id = "call_95fae017db5b4a91b1259aba"
tool_name = "astr_kb_search"
completion = ChatCompletion.model_validate(
{
"id": "chatcmpl-toolcall-dup-id-only",
"object": "chat.completion",
"created": 0,
"model": "gpt-4o-mini",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": None,
"refusal": None,
"tool_calls": [
{
"id": f"{tool_call_id}{tool_call_id}",
"type": "function",
"function": {
"name": tool_name,
"arguments": '{"query": "test"}',
},
}
],
},
"logprobs": None,
"finish_reason": "tool_calls",
}
],
"usage": {
"prompt_tokens": 0,
"completion_tokens": 0,
"total_tokens": 0,
},
}
)
events = [e async for e in provider._parse_openai_completion(completion)]
assert len(events) == 1
output_message = events[0].output_message
assert output_message is not None
assert output_message.tool_calls is not None
assert len(output_message.tool_calls) == 1
tool_call = output_message.tool_calls[0]
assert tool_call.id == tool_call_id
assert tool_call.function.name == tool_name
finally:
await provider.terminate()
@pytest.mark.asyncio
async def test_parse_openai_completion_dedupes_self_concatenated_tool_call_function_name_only():
provider = _make_provider()
try:
tool_call_id = "call_95fae017db5b4a91b1259aba"
tool_name = "astr_kb_search"
completion = ChatCompletion.model_validate(
{
"id": "chatcmpl-toolcall-dup-name-only",
"object": "chat.completion",
"created": 0,
"model": "gpt-4o-mini",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": None,
"refusal": None,
"tool_calls": [
{
"id": tool_call_id,
"type": "function",
"function": {
"name": f"{tool_name}{tool_name}",
"arguments": '{"query": "test"}',
},
}
],
},
"logprobs": None,
"finish_reason": "tool_calls",
}
],
"usage": {
"prompt_tokens": 0,
"completion_tokens": 0,
"total_tokens": 0,
},
}
)
events = [e async for e in provider._parse_openai_completion(completion)]
assert len(events) == 1
output_message = events[0].output_message
assert output_message is not None
assert output_message.tool_calls is not None
assert len(output_message.tool_calls) == 1
tool_call = output_message.tool_calls[0]
assert tool_call.id == tool_call_id
assert tool_call.function.name == tool_name
finally:
await provider.terminate()
@pytest.mark.asyncio
async def test_parse_openai_completion_dedupes_self_concatenated_tool_call_fields():
provider = _make_provider()
try:
tool_call_id = "call_95fae017db5b4a91b1259aba"
tool_name = "astr_kb_search"
completion = ChatCompletion.model_validate(
{
"id": "chatcmpl-toolcall-dup",
"object": "chat.completion",
"created": 0,
"model": "gpt-4o-mini",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": None,
"refusal": None,The new tests assume:
- The helper under test is
provider._parse_openai_completionand that it yields events with anoutput_message.tool_callsstructure identical to the existing test. - The existing test constructs
tool_callsas shown (a list withid,type: "function", andfunction: {name, arguments}).
Please:
- Ensure the signature and usage of
provider._parse_openai_completionand the event/output_messageshape match those used intest_parse_openai_completion_dedupes_self_concatenated_tool_call_fields. If the existing test uses a different helper or field path, mirror that in the two new tests. - Align the
tool_callspayload shape (keys like"tool_calls","type","function","arguments") with whatever is used in the existing dedupe test; update field names if they differ.
There was a problem hiding this comment.
Code Review
This pull request introduces a defensive de-duplication mechanism for tool call fields to handle issues where certain OpenAI-compatible proxies duplicate streaming chunks. It adds a helper function to detect and fix self-concatenated strings in tool IDs and function names, along with a corresponding test case. The review feedback suggests refactoring the nested helper function into a static method on the class to maintain consistency with the existing codebase structure and improve organization.
| def _dedupe_self_concatenated(value: str, *, min_len: int) -> str: | ||
| if not value or len(value) < min_len or (len(value) % 2) != 0: | ||
| return value | ||
| half = len(value) // 2 | ||
| return value[:half] if value[:half] == value[half:] else value |
There was a problem hiding this comment.
For better code organization and consistency with other helper methods in this class (like _safe_json_dump), consider moving this nested function out of _parse_openai_completion and defining it as a staticmethod on the ProviderOpenAIOfficial class. This improves discoverability and aligns with the existing structure of the file.
After moving it, you would update the calls to self._dedupe_self_concatenated(...).
References
- Refactor logic into shared helper functions to improve code organization and reusability.
Fixes #7694.
Some OpenAI-compatible proxies can duplicate streaming chunks; when that happens, tool_call.id and tool_call.function.name can end up as a self-concatenated string (s + s). We now defensively de-duplicate those fields during completion parsing, and add a unit test covering the regression.
Summary by Sourcery
Handle duplicated tool call metadata from OpenAI-compatible proxies during completion parsing and add coverage for this regression.
Bug Fixes:
Tests: