Load testing: Qwen2.5-72B-Instruct-GPTQ-Int4 shows lower RPS and higher average latency than Qwen2-72B-Instruct-GPTQ-Int4 #1019
kartikzheng started this conversation in General
Replies: 1 comment
The output lengths may differ between the two models.
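If one model tends to generate longer answers, its average latency will be higher and its RPS lower even when per-token generation speed is the same. A minimal sketch for checking this, assuming you log each request's end-to-end latency and its completion token count (for example from the `usage.completion_tokens` field of an OpenAI-compatible response). `RequestSample` and `summarize` are hypothetical helpers, not part of Locust or vLLM:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class RequestSample:
    """One completed request from a load test (hypothetical record format)."""
    latency_ms: float   # end-to-end response time in milliseconds
    output_tokens: int  # number of completion tokens the model returned


def summarize(samples: list[RequestSample]) -> dict[str, float]:
    """Return average latency, average output length, and latency per output token."""
    if not samples:
        raise ValueError("no samples")
    total_latency = sum(s.latency_ms for s in samples)
    total_tokens = sum(s.output_tokens for s in samples)
    return {
        "avg_latency_ms": total_latency / len(samples),
        "avg_output_tokens": total_tokens / len(samples),
        # Normalizing by output length separates "slower model" from "longer answers".
        "ms_per_output_token": total_latency / total_tokens if total_tokens else float("inf"),
    }


# Made-up numbers purely for illustration: model B is slower per request
# only because it writes longer answers; its per-token cost is identical.
model_a = summarize([RequestSample(2000, 200), RequestSample(2200, 220)])
model_b = summarize([RequestSample(3000, 300), RequestSample(3300, 330)])
print(model_a["ms_per_output_token"], model_b["ms_per_output_token"])  # 10.0 10.0
```

Capping or fixing the generation length (e.g. the same `max_tokens` and prompts for both models) makes the raw latency and RPS numbers directly comparable.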
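I load-tested the endpoints served by vLLM using Locust and found that Qwen2.5-72B-Instruct-GPTQ-Int4 has lower RPS and higher average latency than Qwen2-72B-Instruct-GPTQ-Int4. The detailed results are as follows: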
Test conditions: 5-minute test duration; spawn rate of 5 users per second.
Test environment: H800, two 40G cards.
Unit: milliseconds.
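In a closed-loop tool like Locust, each simulated user sends a request, waits for the response, optionally idles for its `wait_time`, and then repeats. Under that model, lower RPS and higher average latency are usually the same symptom: by the interactive response time law (a form of Little's law), steady-state throughput is roughly the number of users divided by the per-user cycle time. A minimal sketch, assuming steady state after ramp-up; the post does not state the total user count, so the inputs below are hypothetical, and `closed_loop_rps` is an illustrative helper, not a Locust API:

```python
def closed_loop_rps(users: int, avg_latency_ms: float, avg_wait_ms: float = 0.0) -> float:
    """Estimate steady-state requests/second for a closed-loop load test.

    Each of `users` simulated users sends one request, waits for the response
    (avg_latency_ms), then idles for avg_wait_ms before sending the next one.
    Throughput = users / (response time + think time).
    """
    cycle_s = (avg_latency_ms + avg_wait_ms) / 1000.0
    if users <= 0 or cycle_s <= 0:
        raise ValueError("users and cycle time must be positive")
    return users / cycle_s


# Hypothetical: with the same 50 users, a 25% longer average response time
# lowers RPS by 20%, without any change in server capacity being implied.
print(closed_loop_rps(50, 2000))  # 25.0
print(closed_loop_rps(50, 2500))  # 20.0
```

So before concluding that one model serves requests less efficiently, it is worth checking whether its higher latency comes from longer outputs.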