We are trying to use tool_calls with vLLM running Llama 3.1 or 3.2. We found that the tool_calls data returned from vLLM does not match what OpenAI demonstrates, so the OpenAI adapters are not working as expected (the function name is concatenated into a very long string, so the function cannot be found).
We are using the Elastic Observability AI Assistant connected to vLLM running Llama 3.2 to identify the problem. We don't have reproduction code for this issue. A sample trace is below:
Function title_conversationtitle_conversationtitle_conversationtitle_conversationtitle_conversationtitle_conversationtitle_conversationtitle_conversationtitle_conversationtitle_conversationtitle_conversationtitle_conversationtitle_conversationtitle_conversationtitle_conversationtitle_conversationtitle_conversationtitle_conversationtitle_conversationtitle_conversationtitle_conversationtitle_conversationtitle_conversationtitle_conversationtitle_conversationtitle_conversationtitle_conversationtitle_conversation called but was not available
at createFunctionNotFoundError (/usr/share/kibana/node_modules/@kbn/observability-ai-assistant-plugin/common/conversation_complete.js:71:10)
at Object.next (/usr/share/kibana/node_modules/@kbn/observability-ai-assistant-plugin/server/service/client/adapters/fail_on_non_existing_function_call.js:25:55)
at /usr/share/kibana/node_modules/rxjs/dist/cjs/internal/operators/tap.js:20:81
at OperatorSubscriber._this._next (/usr/share/kibana/node_modules/rxjs/dist/cjs/internal/operators/OperatorSubscriber.js:33:21)
at OperatorSubscriber.Subscriber.next (/usr/share/kibana/node_modules/rxjs/dist/cjs/internal/Subscriber.js:51:18)
...
Tool structuredOutputstructuredOutputstructuredOutputstructuredOutputstructuredOutputstructuredOutputstructuredOutputstructuredOutputstructuredOutputstructuredOutputstructuredOutputstructuredOutputstructuredOutputstructuredOutputstructuredOutputstructuredOutputstructuredOutputstructuredOutputstructuredOutputstructuredOutputstructuredOutputstructuredOutputstructuredOutputstructuredOutputstructuredOutputstructuredOutputstructuredOutput called but was not available
at createToolNotFoundError (/usr/share/kibana/node_modules/@kbn/inference-plugin/server/chat_complete/errors.js:32:10)
at /usr/share/kibana/node_modules/@kbn/inference-plugin/server/util/validate_tool_calls.js:33:49
at Array.map (<anonymous>)
at validateToolCalls (/usr/share/kibana/node_modules/@kbn/inference-plugin/server/util/validate_tool_calls.js:29:20)
at /usr/share/kibana/node_modules/@kbn/inference-plugin/server/chat_complete/utils/chunks_into_message.js:48:77
at /usr/share/kibana/node_modules/rxjs/dist/cjs/internal/operators/map.js:10:37
...
After checking the network dump, we realized the problem is a mismatch between vLLM's streamed tool_calls messages and OpenAI's (we are using the OpenAI-compatible protocol). Please let me know if you need more information.
Your current environment: vLLM 0.7.3 (latest)
As per the OpenAI API documentation (OpenAI Document for streaming function calling):
[Screenshot: OpenAI streams]
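For reference, here is a rough sketch (as Python dicts, with an illustrative id and argument fragment) of the delta sequence the OpenAI API emits when streaming a single tool call: the id, type, and full function name appear only in the first chunk, and later chunks carry only argument fragments.

# Sketch of OpenAI-style streamed tool_call deltas (values are illustrative).
# First chunk: carries the tool call id, type, and the full function name once.
openai_chunk_1 = {
    "choices": [{
        "index": 0,
        "delta": {
            "tool_calls": [{
                "index": 0,
                "id": "call_abc123",  # illustrative id
                "type": "function",
                "function": {"name": "title_conversation", "arguments": ""},
            }]
        },
        "finish_reason": None,
    }]
}

# Later chunks: no name, only argument fragments for the client to concatenate.
openai_chunk_2 = {
    "choices": [{
        "index": 0,
        "delta": {
            "tool_calls": [{
                "index": 0,
                "function": {"arguments": "{\"title\": \"Net"},
            }]
        },
        "finish_reason": None,
    }]
}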
However, what we get from vLLM is:
[Screenshot: vLLM streams]
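By contrast, here is a sketch of the chunks we observed from vLLM, reconstructed from the network dump described above (abbreviated, not a verbatim capture): every delta repeats the function name.

# Sketch of the observed vLLM deltas: the function name appears in every chunk
# instead of only in the first one.
vllm_chunk_1 = {
    "choices": [{
        "index": 0,
        "delta": {
            "tool_calls": [{
                "index": 0,
                "function": {"name": "title_conversation", "arguments": ""},
            }]
        },
    }]
}

vllm_chunk_2 = {
    "choices": [{
        "index": 0,
        "delta": {
            "tool_calls": [{
                "index": 0,
                "function": {"name": "title_conversation",  # repeated name
                             "arguments": "{\"title\": \"Net"},
            }]
        },
    }]
}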
The processor then handles the stream the same way it handles data from OpenAI, which causes the confusion over the function name.
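Clients that follow the OpenAI convention merge the streamed deltas roughly like this (a simplified sketch, not the actual Elastic adapter code), which is why a name repeated in every chunk ends up concatenated into the long string seen in the trace above.

def accumulate_tool_call(chunks):
    """Merge streamed tool_call deltas the way OpenAI-style clients do."""
    name, arguments = "", ""
    for chunk in chunks:
        delta = chunk["choices"][0]["delta"]
        for tool_call in delta.get("tool_calls", []):
            fn = tool_call.get("function", {})
            # OpenAI sends the name only once, so appending deltas is safe there.
            name += fn.get("name") or ""
            arguments += fn.get("arguments") or ""
    return name, arguments

# With the OpenAI-style chunks above: name == "title_conversation".
# With the vLLM chunks above: name == "title_conversationtitle_conversation...",
# so the assistant cannot find a function with that name.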
I found this is caused by the code in vllm/entrypoints/openai/serving_chat.py.
In first_iteration, it sends a simple choice:
and in each subsequent chunk it sends the function name repeatedly:
So I would like to suggest the following change:
and also:
I have attached the complete modified file:
serving_chat.py
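In spirit, the suggested change makes the streamed deltas include the function name only in the first chunk of a tool call; a rough sketch of the intended shape (an illustrative helper, not the attached patch):

def make_tool_call_delta(tool_call_index, name, arguments_fragment, is_first_chunk):
    """Build one streamed tool_call delta in the OpenAI-compatible shape."""
    delta = {"index": tool_call_index,
             "function": {"arguments": arguments_fragment}}
    if is_first_chunk:
        # Only the first chunk of a tool call should carry its type and name.
        delta["type"] = "function"
        delta["function"]["name"] = name
    return delta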
Thanks for looking into this.