
Selfhosted vLLM Server (Qwen2.5-VL-32B-Instruct) #118

@plitc

Description


@atupem
Since Ollama doesn't support function calling ("supports_function_calling"), we've switched to vLLM. However, our current parameters/configuration don't work with ByteBot. Could you help us?

vLLM Server (Docker / Config)

  • Proxmox VM with 4x NVIDIA RTX 6000A

#!/bin/sh
# Launch vLLM's OpenAI-compatible API server with Qwen2.5-VL-32B-Instruct,
# sharded across all four GPUs via tensor parallelism.
export HUGGING_FACE_HUB_TOKEN=hf_XXX-XXX-XXX
export CUDA_VISIBLE_DEVICES="0,1,2,3"
docker run \
--name vllm-qwen-vl \
--network vllm-qwen-vl \
--gpus all \
--runtime=nvidia \
--ipc=host \
--rm --init \
-p 8000:8000 \
-v /opt/vllm:/root/.cache/huggingface \
vllm/vllm-openai:latest \
--model Qwen/Qwen2.5-VL-32B-Instruct \
--served-model-name "Qwen2.5-VL-32B-Instruct" \
--tensor-parallel-size 4 \
--max-model-len 32768 \
--enable-auto-tool-choice \
--tool-call-parser hermes \
--chat-template-content-format openai \
--chat-template /root/.cache/huggingface/chat_template.json
#
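A quick sanity check that the server is up and that the served model name matches what the LiteLLM config references (assuming the container is reachable on localhost:8000 from where you test):

curl -s http://localhost:8000/v1/models

The response should list "Qwen2.5-VL-32B-Instruct", i.e. the exact name that appears after the openai/ provider prefix in the config below.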

ByteBot (Config)

root@bytebot-1:/opt/bytebot# egrep -A 6 -B 2 "Qwen2.5-VL" packages/bytebot-llm-proxy/litellm-config.yaml
model_list:
  - model_name: VM426:Qwen2.5-VL-32B-Instruct
    litellm_params:
      model: openai/Qwen2.5-VL-32B-Instruct
      api_base: https://XXX-XXX-XXX-XXX/v1
      supports_function_calling: true
      drop_params: true
  - model_name: VM426:OpenGVLab/InternVL3_5-38B
    litellm_params:
      ...
root@bytebot-1:/opt/bytebot#
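One thing that stands out: in LiteLLM's proxy config, supports_function_calling is documented as a model_info capability flag, not a litellm_params entry. Under litellm_params it gets forwarded to the backend as an extra request field, which would explain the "ignored" warning in the vLLM log below. A variant worth trying (a sketch based on the LiteLLM docs, not tested against this setup):

model_list:
  - model_name: VM426:Qwen2.5-VL-32B-Instruct
    litellm_params:
      model: openai/Qwen2.5-VL-32B-Instruct
      api_base: https://XXX-XXX-XXX-XXX/v1
      drop_params: true
    model_info:
      supports_function_calling: true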


Errors (vLLM)

(APIServer pid=7) WARNING 09-12 16:12:18 [protocol.py:81] The following fields were present in the request but ignored: {'supports_function_calling'}
(APIServer pid=7) WARNING 09-12 16:12:18 [sampling_params.py:311] temperature 1e-06 is less than 0.01, which may cause numerical errors nan or inf in tensors. We have maxed it out to 0.01.
(APIServer pid=7) INFO: 172.21.0.1:37970 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=7) WARNING 09-12 16:12:19 [protocol.py:81] The following fields were present in the request but ignored: {'supports_function_calling'}
(APIServer pid=7) WARNING 09-12 16:12:19 [sampling_params.py:311] temperature 1e-06 is less than 0.01, which may cause numerical errors nan or inf in tensors. We have maxed it out to 0.01.
(APIServer pid=7) INFO: 172.21.0.1:37970 - "POST /v1/chat/completions HTTP/1.1" 200 OK
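The warnings above show the requests reach vLLM and return 200, so it may help to test tool calling against vLLM directly, bypassing ByteBot and LiteLLM. A minimal hand-rolled request (get_weather is a made-up example tool; localhost:8000 assumes you test from the Docker host):

curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Qwen2.5-VL-32B-Instruct",
        "messages": [{"role": "user", "content": "What is the weather in Berlin?"}],
        "tools": [{
          "type": "function",
          "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
              "type": "object",
              "properties": {"city": {"type": "string"}},
              "required": ["city"]
            }
          }
        }]
      }'

If the hermes tool-call parser is working, the response should contain a tool_calls entry instead of plain text content.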

Errors (ByteBot)

[Nest] 18 - 09/12/2025, 10:26:20 PM LOG [TasksService] Found existing task with ID: be692d7e-d4cd-42b8-99ca-6057beacd509, and status PENDING. Resuming.
[Nest] 18 - 09/12/2025, 10:26:20 PM LOG [TasksService] Updating task with ID: be692d7e-d4cd-42b8-99ca-6057beacd509
[Nest] 18 - 09/12/2025, 10:26:20 PM DEBUG [TasksService] Update data: {"status":"RUNNING","executedAt":"2025-09-12T22:26:20.005Z"}
[Nest] 18 - 09/12/2025, 10:26:20 PM LOG [TasksService] Retrieving task by ID: be692d7e-d4cd-42b8-99ca-6057beacd509
[Nest] 18 - 09/12/2025, 10:26:20 PM DEBUG [TasksService] Retrieved task with ID: be692d7e-d4cd-42b8-99ca-6057beacd509
[Nest] 18 - 09/12/2025, 10:26:20 PM LOG [TasksService] Successfully updated task ID: be692d7e-d4cd-42b8-99ca-6057beacd509
[Nest] 18 - 09/12/2025, 10:26:20 PM DEBUG [TasksService] Updated task: {"id":"be692d7e-d4cd-42b8-99ca-6057beacd509","description":"Open Firefox Browser","type":"IMMEDIATE","status":"RUNNING","priority":"MEDIUM","control":"ASSISTANT","createdAt":"2025-09-12T22:26:16.493Z","createdBy":"USER","scheduledFor":null,"updatedAt":"2025-09-12T22:26:20.008Z","executedAt":"2025-09-12T22:26:20.005Z","completedAt":null,"queuedAt":null,"error":null,"result":null,"model":{"name":"openai/Qwen2.5-VL-32B-Instruct","title":"VM426:Qwen2.5-VL-32B-Instruct","provider":"proxy","contextWindow":128000}}
[Nest] 18 - 09/12/2025, 10:26:20 PM DEBUG [AgentScheduler] Processing task ID: be692d7e-d4cd-42b8-99ca-6057beacd509
[Nest] 18 - 09/12/2025, 10:26:20 PM LOG [AgentProcessor] Starting processing for task ID: be692d7e-d4cd-42b8-99ca-6057beacd509
[Nest] 18 - 09/12/2025, 10:26:20 PM LOG [TasksService] Retrieving task by ID: be692d7e-d4cd-42b8-99ca-6057beacd509
[Nest] 18 - 09/12/2025, 10:26:20 PM DEBUG [TasksService] Retrieved task with ID: be692d7e-d4cd-42b8-99ca-6057beacd509
[Nest] 18 - 09/12/2025, 10:26:20 PM LOG [AgentProcessor] Processing iteration for task ID: be692d7e-d4cd-42b8-99ca-6057beacd509
[Nest] 18 - 09/12/2025, 10:26:20 PM DEBUG [AgentProcessor] Sending 1 messages to LLM for processing
[Nest] 18 - 09/12/2025, 10:26:20 PM DEBUG [AgentProcessor] Received 0 content blocks from LLM
[Nest] 18 - 09/12/2025, 10:26:20 PM WARN [AgentProcessor] Task ID: be692d7e-d4cd-42b8-99ca-6057beacd509 received no content blocks from LLM, marking as failed
[Nest] 18 - 09/12/2025, 10:26:20 PM LOG [TasksService] Updating task with ID: be692d7e-d4cd-42b8-99ca-6057beacd509
[Nest] 18 - 09/12/2025, 10:26:20 PM DEBUG [TasksService] Update data: {"status":"FAILED"}
[Nest] 18 - 09/12/2025, 10:26:20 PM LOG [TasksService] Retrieving task by ID: be692d7e-d4cd-42b8-99ca-6057beacd509
[Nest] 18 - 09/12/2025, 10:26:20 PM DEBUG [TasksService] Retrieved task with ID: be692d7e-d4cd-42b8-99ca-6057beacd509
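"Received 0 content blocks from LLM" combined with a 200 from vLLM suggests the response is lost or mis-parsed somewhere between the proxy and the agent. Querying the LiteLLM proxy directly could narrow it down (a sketch; 4000 is LiteLLM's default port, your bytebot-llm-proxy may listen elsewhere):

curl -s http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "VM426:Qwen2.5-VL-32B-Instruct",
        "messages": [{"role": "user", "content": "Say hello."}]
      }'

A normal text answer here would point at ByteBot's handling of the proxy response rather than at vLLM or the proxy itself.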
