Your current environment
```
INFO 06-16 10:03:14 [loggers.py:87] Engine 000: Avg prompt throughput: 1254.4 tokens/s, Avg generation throughput: 29.2 tokens/s, Running: 10 reqs, Waiting: 4 reqs, GPU KV cache usage: 57.9%, Prefix cache hit rate: 40.5%
```
Does "Avg generation throughput" mean the overall token throughput? Does it include the tokens processed during prefill? Can I use it as a metric to measure throughput performance, or is it only the decoding throughput?
Thank you!
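For context, here is how I am measuring throughput myself right now: a minimal sketch using the offline `LLM` API (the model name and prompts are just placeholders), counting prompt tokens and generated tokens separately over wall-clock time:

```python
# Minimal sketch: measuring prompt vs. generation token throughput
# end to end with the offline API. Model and prompts are placeholders.
import time

from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # placeholder model
prompts = ["Hello, my name is"] * 32
params = SamplingParams(max_tokens=128)

start = time.perf_counter()
outputs = llm.generate(prompts, params)
elapsed = time.perf_counter() - start

# Tokens consumed in prefill vs. tokens produced in decode.
prompt_tokens = sum(len(o.prompt_token_ids) for o in outputs)
gen_tokens = sum(len(c.token_ids) for o in outputs for c in o.outputs)

print(f"prompt tokens/s:     {prompt_tokens / elapsed:.1f}")
print(f"generation tokens/s: {gen_tokens / elapsed:.1f}")
print(f"total tokens/s:      {(prompt_tokens + gen_tokens) / elapsed:.1f}")
```

This gives me an end-to-end number, but I am not sure whether it should line up with the logged "Avg generation throughput", which is why I am asking what exactly that average counts.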
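For the online server, I also tried computing throughput over a fixed window from the Prometheus counters. I am assuming the `/metrics` endpoint and the `vllm:generation_tokens_total` counter name here; please correct me if these differ in your version:

```python
# Sketch: throughput over an arbitrary window, sampled from the server's
# Prometheus counters. Endpoint URL and metric name are assumptions.
import time
import urllib.request

URL = "http://localhost:8000/metrics"  # assumes a running `vllm serve`

def read_counter(name: str) -> float:
    body = urllib.request.urlopen(URL).read().decode()
    total = 0.0
    for line in body.splitlines():
        # Match "name{labels} value" or "name value"; skip HELP/TYPE lines.
        if line.startswith(name + "{") or line.startswith(name + " "):
            total += float(line.rsplit(" ", 1)[1])
    return total

start = time.perf_counter()
gen_before = read_counter("vllm:generation_tokens_total")
time.sleep(10)  # measurement window
gen_after = read_counter("vllm:generation_tokens_total")
elapsed = time.perf_counter() - start

print(f"generation tokens/s over window: {(gen_after - gen_before) / elapsed:.1f}")
```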