We are trying to use tool_calls with vLLM running Llama 3.1 or 3.2. We found that the tool_calls data returned from vLLM does not match what OpenAI demonstrates, so the OpenAI adapters are not working as expected (the function name is concatenated into a very long string, so the function cannot be found).
We are using the Elastic Observability AI Assistant connected to vLLM running Llama 3.2 to identify the problem. We don't have reproduction code for this issue. A sample trace is below:
Function title_conversationtitle_conversationtitle_conversationtitle_conversationtitle_conversationtitle_conversationtitle_conversationtitle_conversationtitle_conversationtitle_conversationtitle_conversationtitle_conversationtitle_conversationtitle_conversationtitle_conversationtitle_conversationtitle_conversationtitle_conversationtitle_conversationtitle_conversationtitle_conversationtitle_conversationtitle_conversationtitle_conversationtitle_conversationtitle_conversationtitle_conversationtitle_conversation called but was not available
at createFunctionNotFoundError (/usr/share/kibana/node_modules/@kbn/observability-ai-assistant-plugin/common/conversation_complete.js:71:10)
at Object.next (/usr/share/kibana/node_modules/@kbn/observability-ai-assistant-plugin/server/service/client/adapters/fail_on_non_existing_function_call.js:25:55)
at /usr/share/kibana/node_modules/rxjs/dist/cjs/internal/operators/tap.js:20:81
at OperatorSubscriber._this._next (/usr/share/kibana/node_modules/rxjs/dist/cjs/internal/operators/OperatorSubscriber.js:33:21)
at OperatorSubscriber.Subscriber.next (/usr/share/kibana/node_modules/rxjs/dist/cjs/internal/Subscriber.js:51:18)
...
Tool structuredOutputstructuredOutputstructuredOutputstructuredOutputstructuredOutputstructuredOutputstructuredOutputstructuredOutputstructuredOutputstructuredOutputstructuredOutputstructuredOutputstructuredOutputstructuredOutputstructuredOutputstructuredOutputstructuredOutputstructuredOutputstructuredOutputstructuredOutputstructuredOutputstructuredOutputstructuredOutputstructuredOutputstructuredOutputstructuredOutputstructuredOutput called but was not available
at createToolNotFoundError (/usr/share/kibana/node_modules/@kbn/inference-plugin/server/chat_complete/errors.js:32:10)
at /usr/share/kibana/node_modules/@kbn/inference-plugin/server/util/validate_tool_calls.js:33:49
at Array.map (<anonymous>)
at validateToolCalls (/usr/share/kibana/node_modules/@kbn/inference-plugin/server/util/validate_tool_calls.js:29:20)
at /usr/share/kibana/node_modules/@kbn/inference-plugin/server/chat_complete/utils/chunks_into_message.js:48:77
at /usr/share/kibana/node_modules/rxjs/dist/cjs/internal/operators/map.js:10:37
...
After checking the network dump, we realized the problem is a mismatch between vLLM's streamed tool_calls messages and OpenAI's (we are using the OpenAI-compatible protocol). Please let me know if you need more information.
Your current environment: vLLM 0.7.3 (latest)
As per the OpenAI API documentation (OpenAI Document for streaming function calling):
[Screenshot: OpenAI streams]
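For reference, here is a rough sketch (as Python dicts, with an illustrative id and argument fragment) of the delta sequence the OpenAI API emits when streaming a single tool call: the id, type, and full function name appear only in the first chunk, and later chunks carry only argument fragments.

# Sketch of OpenAI-style streamed tool_call deltas (values are illustrative).
# First chunk: carries the tool call id, type, and the full function name once.
openai_chunk_1 = {
    "choices": [{
        "index": 0,
        "delta": {
            "tool_calls": [{
                "index": 0,
                "id": "call_abc123",  # illustrative id
                "type": "function",
                "function": {"name": "title_conversation", "arguments": ""},
            }]
        },
        "finish_reason": None,
    }]
}

# Later chunks: no name, only argument fragments for the client to concatenate.
openai_chunk_2 = {
    "choices": [{
        "index": 0,
        "delta": {
            "tool_calls": [{
                "index": 0,
                "function": {"arguments": "{\"title\": \"Net"},
            }]
        },
        "finish_reason": None,
    }]
}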
However, what we get from vLLM is:
[Screenshot: vLLM streams]
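By contrast, here is a sketch of the chunks we observed from vLLM, reconstructed from the network dump described above (abbreviated, not a verbatim capture): every delta repeats the function name.

# Sketch of the observed vLLM deltas: the function name appears in every chunk
# instead of only in the first one.
vllm_chunk_1 = {
    "choices": [{
        "index": 0,
        "delta": {
            "tool_calls": [{
                "index": 0,
                "function": {"name": "title_conversation", "arguments": ""},
            }]
        },
    }]
}

vllm_chunk_2 = {
    "choices": [{
        "index": 0,
        "delta": {
            "tool_calls": [{
                "index": 0,
                "function": {"name": "title_conversation",  # repeated name
                             "arguments": "{\"title\": \"Net"},
            }]
        },
    }]
}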
The processor then handles the stream the same way it handles data from OpenAI, which causes the confusion over the function name.
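Clients that follow the OpenAI convention merge the streamed deltas roughly like this (a simplified sketch, not the actual Elastic adapter code), which is why a name repeated in every chunk ends up concatenated into the long string seen in the trace above.

def accumulate_tool_call(chunks):
    """Merge streamed tool_call deltas the way OpenAI-style clients do."""
    name, arguments = "", ""
    for chunk in chunks:
        delta = chunk["choices"][0]["delta"]
        for tool_call in delta.get("tool_calls", []):
            fn = tool_call.get("function", {})
            # OpenAI sends the name only once, so appending deltas is safe there.
            name += fn.get("name") or ""
            arguments += fn.get("arguments") or ""
    return name, arguments

# With the OpenAI-style chunks above: name == "title_conversation".
# With the vLLM chunks above: name == "title_conversationtitle_conversation...",
# so the assistant cannot find a function with that name.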
I found this is caused by the code in vllm/entrypoints/openai/serving_chat.py.
In first_iteration, it sends a simple choice:
and in each subsequent chunk it sends the function name repeatedly:
So I would like to suggest the following change:
and also:
I have attached the complete modified file:
serving_chat.py
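In spirit, the suggested change makes the streamed deltas include the function name only in the first chunk of a tool call; a rough sketch of the intended shape (an illustrative helper, not the attached patch):

def make_tool_call_delta(tool_call_index, name, arguments_fragment, is_first_chunk):
    """Build one streamed tool_call delta in the OpenAI-compatible shape."""
    delta = {"index": tool_call_index,
             "function": {"arguments": arguments_fragment}}
    if is_first_chunk:
        # Only the first chunk of a tool call should carry its type and name.
        delta["type"] = "function"
        delta["function"]["name"] = name
    return delta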
Thanks for looking into this.