
[Usage]: gpu memory usage when using tensor parallel #4880

Open
DaiJianghai opened this issue May 17, 2024 · 1 comment
Labels
usage How to use vllm

Comments

@DaiJianghai

Your current environment

The output of `python collect_env.py`

How would you like to use vllm

I am trying to use vLLM to serve Qwen-32B-chat-AWQ on two RTX 3090s (24 GB each).
I expected 24 GB to be enough on a single GPU, so I tried one GPU first, but it failed.
I then served the model with tensor parallelism, which works, but the memory usage is higher than I expected: about 18 GB on each GPU, 36 GB in total, which is much more than I anticipated. Is this normal?

In my estimate, 13-14 GB per GPU should be enough.
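
For reference, a minimal sketch of the tensor-parallel setup described above, using vLLM's offline `LLM` API (the model path is an assumption based on the description; adjust it to the actual AWQ checkpoint):

```python
from vllm import LLM, SamplingParams

# Minimal sketch of the setup described above.
# The model path is an assumption; point it at your local checkpoint
# or the corresponding Hugging Face repo.
llm = LLM(
    model="Qwen/Qwen-32B-Chat-AWQ",   # hypothetical path for the AWQ checkpoint
    quantization="awq",               # load the AWQ-quantized weights
    tensor_parallel_size=2,           # shard the model across both 3090s
)

outputs = llm.generate(
    ["Hello, how are you?"],
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)
```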

DaiJianghai added the usage (How to use vllm) label on May 17, 2024
@DarkLight1337
Collaborator

You can set the `--gpu-memory-utilization` parameter to a smaller value (the default is 0.9, i.e. 90% of each GPU).
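
For example, with the offline `LLM` API the same setting is exposed as `gpu_memory_utilization` (a sketch; the 0.6 value is only illustrative, and the model path is the same assumption as above):

```python
from vllm import LLM

# vLLM reserves a fraction of each GPU's memory for the model weights plus the
# KV cache. Lowering gpu_memory_utilization (the Python equivalent of
# --gpu-memory-utilization when launching the API server) shrinks that reservation.
llm = LLM(
    model="Qwen/Qwen-32B-Chat-AWQ",   # hypothetical AWQ checkpoint path, as above
    quantization="awq",
    tensor_parallel_size=2,
    gpu_memory_utilization=0.6,       # default is 0.9 (90% of each GPU)
)
```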
