Description
Your current environment
vllm 0.7.2
torch 2.4
cuda 12.1
🐛 Describe the bug
When I call the OpenAI-compatible server's chat completions API through base_url="http://localhost:8000/v1", the first request returns 200 OK, but every subsequent request returns 400 Bad Request. Why?
The log is as follows:
INFO: 127.0.0.1:59042 - "POST /v1/chat/completions HTTP/1.1" 200 OK
INFO 02-15 22:09:59 engine.py:275] Added request chatcmpl-803293759b1e415caefd7845b3fa8352.
INFO 02-15 22:10:03 metrics.py:455] Avg prompt throughput: 33.4 tokens/s, Avg generation throughput: 37.8 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.1%, CPU KV cache usage: 0.0%.
INFO 02-15 22:10:08 metrics.py:455] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 43.2 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.1%, CPU KV cache usage: 0.0%.
INFO: 127.0.0.1:59042 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
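A minimal stdlib-only sketch to reproduce and inspect the pattern above. The model name and prompt are placeholders (not from the issue); the key point is that an HTTP 400 from vLLM carries a JSON error body explaining the rejection, so catching it instead of letting it raise makes the reason visible:

```python
import json
import urllib.request
import urllib.error

def build_payload(model, messages):
    """Serialize a /v1/chat/completions request body."""
    return json.dumps({"model": model, "messages": messages}).encode("utf-8")

def chat_turn(base_url, model, messages):
    """Send one chat request; return (status_code, response_text).

    A 400 is caught and returned rather than raised, so vLLM's JSON
    error body (which states why the request was rejected) is visible.
    """
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=build_payload(model, messages),
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status, resp.read().decode("utf-8")
    except urllib.error.HTTPError as exc:
        return exc.code, exc.read().decode("utf-8")

# Example (against a live server):
#   status, body = chat_turn("http://localhost:8000/v1", "your-model-name",
#                            [{"role": "user", "content": "Hello"}])
#   print(status, body)
```

Calling `chat_turn` repeatedly with the accumulated message history and printing the body of the 400 response should show the server's stated reason for rejecting the follow-up requests.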