
Selfhosted vLLM Server (Qwen2.5-VL-32B-Instruct) #118

@plitc

Description


@atupem
Since Ollama doesn't support function calling ("supports_function_calling"), we've switched to vLLM. However, our current parameters/configuration don't work with ByteBot. Could you help us?

vLLM Server (Docker / Config)

  • Proxmox VM with 4x NVIDIA RTX 6000A

#!/bin/sh
# Launch vLLM's OpenAI-compatible API server with Qwen2.5-VL-32B-Instruct,
# sharded across all four GPUs via tensor parallelism.
export HUGGING_FACE_HUB_TOKEN=hf_XXX-XXX-XXX
export CUDA_VISIBLE_DEVICES="0,1,2,3"
docker run \
--name vllm-qwen-vl \
--network vllm-qwen-vl \
--gpus all \
--runtime=nvidia \
--ipc=host \
--rm --init \
-p 8000:8000 \
-v /opt/vllm:/root/.cache/huggingface \
vllm/vllm-openai:latest \
--model Qwen/Qwen2.5-VL-32B-Instruct \
--served-model-name "Qwen2.5-VL-32B-Instruct" \
--tensor-parallel-size 4 \
--max-model-len 32768 \
--enable-auto-tool-choice \
--tool-call-parser hermes \
--chat-template-content-format openai \
--chat-template /root/.cache/huggingface/chat_template.json
#
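A quick sanity check that the server is up and that the served model name matches what the LiteLLM config references (assuming the container is reachable on localhost:8000 from where you test):

curl -s http://localhost:8000/v1/models

The response should list "Qwen2.5-VL-32B-Instruct", i.e. the exact name that appears after the openai/ provider prefix in the config below.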

ByteBot (Config)

root@bytebot-1:/opt/bytebot# egrep -A 6 -B 2 "Qwen2.5-VL" packages/bytebot-llm-proxy/litellm-config.yaml
model_list:
  - model_name: VM426:Qwen2.5-VL-32B-Instruct
    litellm_params:
      model: openai/Qwen2.5-VL-32B-Instruct
      api_base: https://XXX-XXX-XXX-XXX/v1
      supports_function_calling: true
      drop_params: true
  - model_name: VM426:OpenGVLab/InternVL3_5-38B
    litellm_params:
      ...
root@bytebot-1:/opt/bytebot#
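One thing that stands out: in LiteLLM's proxy config, supports_function_calling is documented as a model_info capability flag, not a litellm_params entry. Under litellm_params it gets forwarded to the backend as an extra request field, which would explain the "ignored" warning in the vLLM log below. A variant worth trying (a sketch based on the LiteLLM docs, not tested against this setup):

model_list:
  - model_name: VM426:Qwen2.5-VL-32B-Instruct
    litellm_params:
      model: openai/Qwen2.5-VL-32B-Instruct
      api_base: https://XXX-XXX-XXX-XXX/v1
      drop_params: true
    model_info:
      supports_function_calling: true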


Errors (vLLM)

(APIServer pid=7) WARNING 09-12 16:12:18 [protocol.py:81] The following fields were present in the request but ignored: {'supports_function_calling'}
(APIServer pid=7) WARNING 09-12 16:12:18 [sampling_params.py:311] temperature 1e-06 is less than 0.01, which may cause numerical errors nan or inf in tensors. We have maxed it out to 0.01.
(APIServer pid=7) INFO: 172.21.0.1:37970 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=7) WARNING 09-12 16:12:19 [protocol.py:81] The following fields were present in the request but ignored: {'supports_function_calling'}
(APIServer pid=7) WARNING 09-12 16:12:19 [sampling_params.py:311] temperature 1e-06 is less than 0.01, which may cause numerical errors nan or inf in tensors. We have maxed it out to 0.01.
(APIServer pid=7) INFO: 172.21.0.1:37970 - "POST /v1/chat/completions HTTP/1.1" 200 OK
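The warnings above show the requests reach vLLM and return 200, so it may help to test tool calling against vLLM directly, bypassing ByteBot and LiteLLM. A minimal hand-rolled request (get_weather is a made-up example tool; localhost:8000 assumes you test from the Docker host):

curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Qwen2.5-VL-32B-Instruct",
        "messages": [{"role": "user", "content": "What is the weather in Berlin?"}],
        "tools": [{
          "type": "function",
          "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
              "type": "object",
              "properties": {"city": {"type": "string"}},
              "required": ["city"]
            }
          }
        }]
      }'

If the hermes tool-call parser is working, the response should contain a tool_calls entry instead of plain text content.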

Errors (ByteBot)

[Nest] 18 - 09/12/2025, 10:26:20 PM LOG [TasksService] Found existing task with ID: be692d7e-d4cd-42b8-99ca-6057beacd509, and status PENDING. Resuming.
[Nest] 18 - 09/12/2025, 10:26:20 PM LOG [TasksService] Updating task with ID: be692d7e-d4cd-42b8-99ca-6057beacd509
[Nest] 18 - 09/12/2025, 10:26:20 PM DEBUG [TasksService] Update data: {"status":"RUNNING","executedAt":"2025-09-12T22:26:20.005Z"}
[Nest] 18 - 09/12/2025, 10:26:20 PM LOG [TasksService] Retrieving task by ID: be692d7e-d4cd-42b8-99ca-6057beacd509
[Nest] 18 - 09/12/2025, 10:26:20 PM DEBUG [TasksService] Retrieved task with ID: be692d7e-d4cd-42b8-99ca-6057beacd509
[Nest] 18 - 09/12/2025, 10:26:20 PM LOG [TasksService] Successfully updated task ID: be692d7e-d4cd-42b8-99ca-6057beacd509
[Nest] 18 - 09/12/2025, 10:26:20 PM DEBUG [TasksService] Updated task: {"id":"be692d7e-d4cd-42b8-99ca-6057beacd509","description":"Open Firefox Browser","type":"IMMEDIATE","status":"RUNNING","priority":"MEDIUM","control":"ASSISTANT","createdAt":"2025-09-12T22:26:16.493Z","createdBy":"USER","scheduledFor":null,"updatedAt":"2025-09-12T22:26:20.008Z","executedAt":"2025-09-12T22:26:20.005Z","completedAt":null,"queuedAt":null,"error":null,"result":null,"model":{"name":"openai/Qwen2.5-VL-32B-Instruct","title":"VM426:Qwen2.5-VL-32B-Instruct","provider":"proxy","contextWindow":128000}}
[Nest] 18 - 09/12/2025, 10:26:20 PM DEBUG [AgentScheduler] Processing task ID: be692d7e-d4cd-42b8-99ca-6057beacd509
[Nest] 18 - 09/12/2025, 10:26:20 PM LOG [AgentProcessor] Starting processing for task ID: be692d7e-d4cd-42b8-99ca-6057beacd509
[Nest] 18 - 09/12/2025, 10:26:20 PM LOG [TasksService] Retrieving task by ID: be692d7e-d4cd-42b8-99ca-6057beacd509
[Nest] 18 - 09/12/2025, 10:26:20 PM DEBUG [TasksService] Retrieved task with ID: be692d7e-d4cd-42b8-99ca-6057beacd509
[Nest] 18 - 09/12/2025, 10:26:20 PM LOG [AgentProcessor] Processing iteration for task ID: be692d7e-d4cd-42b8-99ca-6057beacd509
[Nest] 18 - 09/12/2025, 10:26:20 PM DEBUG [AgentProcessor] Sending 1 messages to LLM for processing
[Nest] 18 - 09/12/2025, 10:26:20 PM DEBUG [AgentProcessor] Received 0 content blocks from LLM
[Nest] 18 - 09/12/2025, 10:26:20 PM WARN [AgentProcessor] Task ID: be692d7e-d4cd-42b8-99ca-6057beacd509 received no content blocks from LLM, marking as failed
[Nest] 18 - 09/12/2025, 10:26:20 PM LOG [TasksService] Updating task with ID: be692d7e-d4cd-42b8-99ca-6057beacd509
[Nest] 18 - 09/12/2025, 10:26:20 PM DEBUG [TasksService] Update data: {"status":"FAILED"}
[Nest] 18 - 09/12/2025, 10:26:20 PM LOG [TasksService] Retrieving task by ID: be692d7e-d4cd-42b8-99ca-6057beacd509
[Nest] 18 - 09/12/2025, 10:26:20 PM DEBUG [TasksService] Retrieved task with ID: be692d7e-d4cd-42b8-99ca-6057beacd509
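"Received 0 content blocks from LLM" combined with a 200 from vLLM suggests the response is lost or mis-parsed somewhere between the proxy and the agent. Querying the LiteLLM proxy directly could narrow it down (a sketch; 4000 is LiteLLM's default port, your bytebot-llm-proxy may listen elsewhere):

curl -s http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "VM426:Qwen2.5-VL-32B-Instruct",
        "messages": [{"role": "user", "content": "Say hello."}]
      }'

A normal text answer here would point at ByteBot's handling of the proxy response rather than at vLLM or the proxy itself.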
