Ubuntu 22, RTX 3090.
I ran vLLM 0.8.1 with a very small model, https://huggingface.co/Qwen/Qwen2.5-3B-Instruct-AWQ, using the command below:
vllm serve Qwen/Qwen2.5-3B-Instruct-AWQ
INFO 03-20 17:28:12 [__init__.py:256] Automatically detected platform cuda.
INFO 03-20 17:28:13 [api_server.py:977] vLLM API server version 0.8.1
It works well, but when I check nvidia-smi, it takes almost 16 GB:
anaconda3/envs/vllm/bin/python 15982MiB
I then switched to a bigger model, https://huggingface.co/Qwen/Qwen2.5-7B-Instruct-AWQ, but got the same GPU memory usage.
Questions:
- Why does 3B-Instruct-AWQ take 16 GB?
- Why does 7B-Instruct-AWQ take the same amount of GPU memory?
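For context: vLLM reserves a fixed fraction of total GPU memory up front for the model weights plus the KV cache (the --gpu-memory-utilization option, which defaults to 0.9), regardless of how small the model is, which would explain both observations. Assuming that is what is happening here, lowering the cap and the maximum context length should shrink the footprint; the values below are illustrative, not recommendations:
vllm serve Qwen/Qwen2.5-3B-Instruct-AWQ --gpu-memory-utilization 0.5 --max-model-len 4096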