Hi @BlinkDL! First off, this is amazing and seems very promising for scaling down large Transformers to be more production-friendly.
I'm wondering if you have any benchmarks regarding VRAM usage? Specifically, I've got three questions:
1 - How much VRAM does this model (or rather, the CUDA version) need for training? Are we talking 1060 size (6 GB), 3090 size (24 GB), or A6000+ size (40+ GB)?
2 - Same question as 1, but for inference?
3 - Can this run on a CPU reasonably?
1 - Similar to a usual GPT of the same size, because we use parallelization to increase training speed. However, you can definitely train it like an RNN to save VRAM (though that will be much slower).
2 - More friendly than a usual GPT, because you don't need to keep a huge context (or kv cache); you just need the hidden state of the last token.
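To illustrate the memory difference (this is a toy sketch, not RWKV's actual API or model: `rnn_step`, `HIDDEN`, and the recurrence are made up for the example), note that RNN-style decoding carries only a fixed-size state from step to step, while a Transformer's kv cache grows with every token:

```python
import math
import random

HIDDEN = 4  # toy hidden size, not RWKV's real dimension

def rnn_step(x, h):
    # Toy recurrence: next state depends only on the current input
    # and the previous state, so memory use is constant.
    return [math.tanh(xi + hi) for xi, hi in zip(x, h)]

random.seed(0)
state = [0.0] * HIDDEN   # O(1) memory: only the last token's state
kv_cache = []            # Transformer-style cache, shown for contrast

for _ in range(1000):
    x = [random.uniform(-1, 1) for _ in range(HIDDEN)]  # stand-in token embedding
    state = rnn_step(x, state)
    kv_cache.append(x)   # grows linearly with context length

print(len(state))     # stays at HIDDEN no matter how long the sequence is
print(len(kv_cache))  # grows with the number of tokens generated
```

The point of the sketch is the asymmetry: `state` stays the same size after 1,000 steps, while `kv_cache` holds 1,000 entries, which is why RNN-mode inference needs so little memory.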
3 - Yes! Inference is very fast even on a CPU. Please try run.py.