Hardware Requirement for Running Llama-2 inferences #61

Open
shang-zhu opened this issue Jan 19, 2024 · 2 comments

Comments

@shang-zhu

Hi, I successfully ran inference with Llama-2-7b and Unlimiformer, but ran into memory errors when I jumped to larger models. What are the minimum GPU memory requirements for running the 13b and 70b models? Thank you!

@abertsch72
Owner

Thanks for your interest in our work!

The memory required depends on two things:

  1. The base memory needed for that model (as you'd expect!). I haven't personally tried the 70b model, but this NVIDIA guide gives numbers that look pretty reasonable to me:

     The file size of the model varies depending on how large the model is.
     Llama2-7B-Chat requires about 30GB of storage.
     Llama2-13B-Chat requires about 50GB of storage.
     Llama2-70B-Chat requires about 150GB of storage.

  2. The number of layers you apply Unlimiformer at. The good news here is that the additional cost from Unlimiformer doesn't depend on the model size (since we're only saving hidden states, and the models all have the same hidden dimension). You can calculate this for your input/use case by looking at the difference in GPU memory used between your 7b Llama+Unlimiformer setup and the base 7b model (see the rough sketch after this list).
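
If it helps, here's a rough, untested sketch of how I'd measure that difference; `base_7b_generate` and `unlimiformer_7b_generate` are placeholders for however you already run generation on the same long input, so adapt them to your setup:

```python
# Sketch: measure the extra GPU memory Unlimiformer adds on top of the base model.
# `base_7b_generate` and `unlimiformer_7b_generate` are hypothetical placeholders
# for your own generation calls on the same input.
import torch

def peak_gpu_gb(run_fn):
    """Run `run_fn` once and return the peak GPU memory it used, in GB."""
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()
    run_fn()
    return torch.cuda.max_memory_allocated() / 1e9

overhead_gb = peak_gpu_gb(unlimiformer_7b_generate) - peak_gpu_gb(base_7b_generate)
print(f"Unlimiformer overhead on this input: {overhead_gb:.1f} GB")
```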

As a general recipe, I'd guess (amount of memory for the model) + (2-3 GB per layer you'd like to apply Unlimiformer at) will get you pretty close to the amount needed, but this depends on how long your inputs are and whether you choose flat or trained indices.
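
As a concrete back-of-the-envelope example (treat these numbers as rough assumptions, not measurements):

```python
# Rough estimate only: uses the storage figures quoted above as a stand-in for base
# model memory, plus ~2-3 GB of Unlimiformer overhead per layer. The real cost
# depends on input length and on whether you use flat or trained indices.
def estimate_total_gb(base_model_gb, num_unlimiformer_layers, per_layer_gb=2.5):
    return base_model_gb + num_unlimiformer_layers * per_layer_gb

# e.g. Llama2-13B-Chat (~50 GB above) with Unlimiformer applied at 4 layers:
print(estimate_total_gb(50, 4))  # ~60 GB
```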

@zhangwenhao666

I would like to ask about an input of roughly 100,000 tokens: using the llama2-13b model, how long would inference take on an H100 GPU?
