
VRAM performance #3

Closed
cuuupid opened this issue Apr 8, 2022 · 1 comment

cuuupid commented Apr 8, 2022

Hi @BlinkDL! First off, this is amazing and seems very promising for scaling large Transformers down to be more production-friendly.

I'm wondering if you have any benchmarks regarding VRAM usage? Specifically, I've got three questions:

1 - How much VRAM does this model (or rather, the CUDA version) need for training? Are we talking 1060 size (6 GB), 3090 size (20 GB), or A6000+ size (40+ GB)?
2 - Same question as 1, but for inference?
3 - Can this run reasonably on a CPU?

BlinkDL (Owner) commented Apr 15, 2022

  1. For training, similar to a usual GPT of the same size, because we use parallelization to increase training speed. However, you can definitely train it like an RNN to save VRAM (but that will be much slower).
  2. For inference, more friendly than a usual GPT, because you don't need to keep a huge context (or KV cache); you only need the hidden state of the last token (see the sketch below).
  3. YES! Inference is very fast even on a CPU. Please try run.py.
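For point 2, here is a rough sketch of why RNN-style inference keeps memory constant. This is not RWKV's actual update rule; the shapes, the decay constant, and the placeholder inputs are made up purely to contrast a KV cache that grows with sequence length against a fixed-size recurrent state.

```python
# Simplified sketch (not RWKV's real code): memory of a growing Transformer
# KV cache vs. the constant-size state carried by an RNN-mode model.
import numpy as np

d_model, n_layers, seq_len = 768, 12, 1024  # assumed toy dimensions

# Transformer-style inference: keys and values for every past token are kept,
# so memory grows linearly with the number of tokens processed.
kv_cache = []  # one (key, value) pair per token, across all layers
for t in range(seq_len):
    k = np.zeros((n_layers, d_model), dtype=np.float32)  # placeholder key
    v = np.zeros((n_layers, d_model), dtype=np.float32)  # placeholder value
    kv_cache.append((k, v))
kv_bytes = sum(k.nbytes + v.nbytes for k, v in kv_cache)

# RNN-style inference: only a fixed-size state is carried from token to token,
# so memory stays constant no matter how long the sequence gets.
state = np.zeros((n_layers, d_model), dtype=np.float32)
for t in range(seq_len):
    token_embedding = np.zeros(d_model, dtype=np.float32)  # placeholder input
    state = 0.9 * state + 0.1 * token_embedding            # placeholder update
state_bytes = state.nbytes

print(f"KV cache after {seq_len} tokens: {kv_bytes / 1e6:.1f} MB")
print(f"Recurrent state:                 {state_bytes / 1e6:.3f} MB")
```

With these toy numbers the KV cache is roughly 75 MB and keeps growing with context length, while the recurrent state stays at a few tens of kilobytes, which is the VRAM advantage described above.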

@BlinkDL BlinkDL closed this as completed Feb 7, 2023