Hi @BlinkDL! First off, this is amazing and seems very promising for scaling down large Transformers to be more production-friendly.
I'm wondering if you have any benchmarks regarding VRAM usage? Specifically, I've got three questions:
1 - How much VRAM does this model (or rather, the CUDA version) need for training? Are we talking 1060 size (6 GB), 3090 size (24 GB), or A6000+ size (40+ GB)?
2 - Same question as 1, but for inference?
3 - Can this run on a CPU reasonably?
1 - Similar to a usual GPT of the same size, because we use parallelization to increase training speed. However, you can definitely train it like an RNN to save VRAM (though that will be much slower).
2 - More friendly than a usual GPT, because you don't need to keep a huge context (or kv cache); you just need the hidden state of the last token.
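To illustrate the memory difference (this is a toy sketch, not RWKV's actual API or model: `rnn_step`, `HIDDEN`, and the recurrence are made up for the example), note that RNN-style decoding carries only a fixed-size state from step to step, while a Transformer's kv cache grows with every token:

```python
import math
import random

HIDDEN = 4  # toy hidden size, not RWKV's real dimension

def rnn_step(x, h):
    # Toy recurrence: next state depends only on the current input
    # and the previous state, so memory use is constant.
    return [math.tanh(xi + hi) for xi, hi in zip(x, h)]

random.seed(0)
state = [0.0] * HIDDEN   # O(1) memory: only the last token's state
kv_cache = []            # Transformer-style cache, shown for contrast

for _ in range(1000):
    x = [random.uniform(-1, 1) for _ in range(HIDDEN)]  # stand-in token embedding
    state = rnn_step(x, state)
    kv_cache.append(x)   # grows linearly with context length

print(len(state))     # stays at HIDDEN no matter how long the sequence is
print(len(kv_cache))  # grows with the number of tokens generated
```

The point of the sketch is the asymmetry: `state` stays the same size after 1,000 steps, while `kv_cache` holds 1,000 entries, which is why RNN-mode inference needs so little memory.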
3 - Yes! Inference is very fast even on a CPU. Please try run.py.