[Bug]: vLLM response on tool_calls does not align with OpenAI standard #14951

Open

mshensg opened this issue Mar 17, 2025 · 0 comments
Labels
bug Something isn't working

Comments

mshensg commented Mar 17, 2025

Your current environment

vLLM 0.7.3 (latest)

We are trying to use tool_calls with vLLM running Llama 3.1 or 3.2. We found that the tool_calls data returned by vLLM does not match what OpenAI demonstrates, so OpenAI adapters do not work as expected (the function name is concatenated into one very long string, so it cannot be found).

As per the OpenAI API documentation: OpenAI documentation for streaming function calling

OpenAI streams
  • [{"index": 0, "id": "call_DdmO9pD3xa9XTPNJ32zg2hcA", "function": {"arguments": "", "name": "get_weather"}, "type": "function"}]
  • [{"index": 0, "id": null, "function": {"arguments": "{\"", "name": null}, "type": null}]
  • [{"index": 0, "id": null, "function": {"arguments": "location", "name": null}, "type": null}]
  • [{"index": 0, "id": null, "function": {"arguments": "\":\"", "name": null}, "type": null}]
  • [{"index": 0, "id": null, "function": {"arguments": "Paris", "name": null}, "type": null}]
  • [{"index": 0, "id": null, "function": {"arguments": ",", "name": null}, "type": null}]
  • [{"index": 0, "id": null, "function": {"arguments": " France", "name": null}, "type": null}]
  • [{"index": 0, "id": null, "function": {"arguments": "\"}", "name": null}, "type": null}]

However, what we get from vLLM is:

vLLM streams
  • {"id":"chatcmpl-dcd3e8852e0f4562a3a43a9dc7a61fbd","object":"chat.completion.chunk","created":1741766314,"model":"model","choices":[{"index":0,"delta":{"role":"assistant","content":""},"logprobs":null,"finish_reason":null}]}
  • {"id":"chatcmpl-dcd3e8852e0f4562a3a43a9dc7a61fbd","object":"chat.completion.chunk","created":1741766314,"model":"model","choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"name":"get_weather","arguments":"{\""}}]},"logprobs":null,"finish_reason":null}]}
  • {"id":"chatcmpl-dcd3e8852e0f4562a3a43a9dc7a61fbd","object":"chat.completion.chunk","created":1741766314,"model":"model","choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"name":"get_weather","arguments":"location"}}]},"logprobs":null,"finish_reason":null}]}
  • {"id":"chatcmpl-dcd3e8852e0f4562a3a43a9dc7a61fbd","object":"chat.completion.chunk","created":1741766314,"model":"model","choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"name":"get_weather","arguments":"\":\""}}]},"logprobs":null,"finish_reason":null}]}
  • {"id":"chatcmpl-dcd3e8852e0f4562a3a43a9dc7a61fbd","object":"chat.completion.chunk","created":1741766314,"model":"model","choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"name":"get_weather","arguments":"Paris"}}]},"logprobs":null,"finish_reason":null}]}
  • {"id":"chatcmpl-dcd3e8852e0f4562a3a43a9dc7a61fbd","object":"chat.completion.chunk","created":1741766314,"model":"model","choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"name":"get_weather","arguments":","}}]},"logprobs":null,"finish_reason":null}]}
  • {"id":"chatcmpl-dcd3e8852e0f4562a3a43a9dc7a61fbd","object":"chat.completion.chunk","created":1741766314,"model":"model","choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"name":"get_weather","arguments":" France"}}]},"logprobs":null,"finish_reason":null}]}
  • {"id":"chatcmpl-dcd3e8852e0f4562a3a43a9dc7a61fbd","object":"chat.completion.chunk","created":1741766314,"model":"model","choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"name":"get_weather","arguments":"\"}"}}]},"logprobs":null,"finish_reason":null}]}
  • {"id":"chatcmpl-dcd3e8852e0f4562a3a43a9dc7a61fbd","object":"chat.completion.chunk","created":1741766314,"model":"model","choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"name":"get_weather","arguments":""}}]},"logprobs":null,"finish_reason":"stop","stop_reason":null}]}

Note that vLLM repeats the function name in every delta and never sends an id or type for the tool call. The client-side processor then handles these chunks the same way it handles OpenAI data, which causes the confusion around the function name: concatenating the name field of every delta yields the long repeated string, as sketched below.
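For illustration, here is a minimal sketch of an OpenAI-style accumulator (assumed client logic, not Elastic's actual code). Against OpenAI's stream it recovers the name once, because later chunks carry name: null; against vLLM's stream the concatenation multiplies the name:

```python
# Minimal sketch of how an OpenAI-style client accumulates streamed
# tool_call deltas (illustration only; not Elastic's actual code).
tool_calls = {}

def accumulate(delta_tool_calls):
    for tc in delta_tool_calls:
        entry = tool_calls.setdefault(tc["index"], {"name": "", "arguments": ""})
        fn = tc.get("function") or {}
        # OpenAI sends the name once and null afterwards, so plain
        # concatenation is safe against an OpenAI stream...
        if fn.get("name"):
            entry["name"] += fn["name"]
        if fn.get("arguments"):
            entry["arguments"] += fn["arguments"]

# Against OpenAI's stream: entry["name"] == "get_weather".
# Against vLLM's stream, which repeats the name in every chunk:
# entry["name"] == "get_weatherget_weatherget_weather..." -> lookup fails.
```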

I found this is due to the code in vllm/entrypoints/openai/serving_chat.py.

In first_iteration, it sends a simple choice:

[screenshot: the first_iteration code in serving_chat.py]

and then every subsequent chunk repeats the function name:

[screenshot: the per-chunk delta code in serving_chat.py]

So I would like to suggest the following change:

[screenshot: first proposed change]

and also:

[screenshot: second proposed change]
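In other words, the intended stream would look like this (a hedged sketch of the desired deltas following the OpenAI chunk schema; the id value is hypothetical and this is not the exact patch):

```python
# First delta for tool call 0: carries id, type, and the function name.
first_delta = {
    "tool_calls": [{
        "index": 0,
        "id": "call_abc123",  # hypothetical id
        "type": "function",
        "function": {"name": "get_weather", "arguments": ""},
    }]
}

# Every subsequent delta: an arguments fragment only; name omitted/null.
subsequent_delta = {
    "tool_calls": [{
        "index": 0,
        "function": {"arguments": "{\""},
    }]
}
```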

I have attached the complete modified file:

serving_chat.py

Thanks for looking into this.

🐛 Describe the bug

We are using the Elastic Observability AI Assistant connected to vLLM running Llama 3.2, which is how we identified the problem. We don't have a standalone reproduction script for this issue. A sample trace is below:

Function title_conversationtitle_conversationtitle_conversationtitle_conversationtitle_conversationtitle_conversationtitle_conversationtitle_conversationtitle_conversationtitle_conversationtitle_conversationtitle_conversationtitle_conversationtitle_conversationtitle_conversationtitle_conversationtitle_conversationtitle_conversationtitle_conversationtitle_conversationtitle_conversationtitle_conversationtitle_conversationtitle_conversationtitle_conversationtitle_conversationtitle_conversationtitle_conversation called but was not available
at createFunctionNotFoundError (/usr/share/kibana/node_modules/@kbn/observability-ai-assistant-plugin/common/conversation_complete.js:71:10)
at Object.next (/usr/share/kibana/node_modules/@kbn/observability-ai-assistant-plugin/server/service/client/adapters/fail_on_non_existing_function_call.js:25:55)
at /usr/share/kibana/node_modules/rxjs/dist/cjs/internal/operators/tap.js:20:81
at OperatorSubscriber._this._next (/usr/share/kibana/node_modules/rxjs/dist/cjs/internal/operators/OperatorSubscriber.js:33:21)
at OperatorSubscriber.Subscriber.next (/usr/share/kibana/node_modules/rxjs/dist/cjs/internal/Subscriber.js:51:18)
...

Tool structuredOutputstructuredOutputstructuredOutputstructuredOutputstructuredOutputstructuredOutputstructuredOutputstructuredOutputstructuredOutputstructuredOutputstructuredOutputstructuredOutputstructuredOutputstructuredOutputstructuredOutputstructuredOutputstructuredOutputstructuredOutputstructuredOutputstructuredOutputstructuredOutputstructuredOutputstructuredOutputstructuredOutputstructuredOutputstructuredOutputstructuredOutput called but was not available
at createToolNotFoundError (/usr/share/kibana/node_modules/@kbn/inference-plugin/server/chat_complete/errors.js:32:10)
at /usr/share/kibana/node_modules/@kbn/inference-plugin/server/util/validate_tool_calls.js:33:49
at Array.map ()
at validateToolCalls (/usr/share/kibana/node_modules/@kbn/inference-plugin/server/util/validate_tool_calls.js:29:20)
at /usr/share/kibana/node_modules/@kbn/inference-plugin/server/chat_complete/utils/chunks_into_message.js:48:77
at /usr/share/kibana/node_modules/rxjs/dist/cjs/internal/operators/map.js:10:37
...

After checking the network dump, we realized the problem is the misalignment between vLLM's streamed tool_calls messages and OpenAI's (we are using the OpenAI-compatible protocol). Please let me know if you need more information.
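As a stopgap until this is fixed server-side, a client can defend against the repetition by taking the function name only from the first chunk that carries one. A minimal sketch (hypothetical workaround, not part of Elastic or vLLM):

```python
def merge_tool_call_delta(entry, fn):
    """Merge one streamed function delta into the accumulated entry,
    tolerating servers that repeat the name in every chunk."""
    name = fn.get("name")
    if name and not entry["name"]:
        entry["name"] = name  # take the name once, ignore repeats
    entry["arguments"] += fn.get("arguments") or ""
```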

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
@mshensg mshensg added the bug Something isn't working label Mar 17, 2025