When running Gemini 34B on an A100 I get only 10 tokens/sec, whereas I can get 40 tokens/sec with Llama-70B running on the same GPU.
Are there any changes I can make to increase the inference speed?
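For reference, a minimal sketch of the loading settings that usually dominate single-GPU tokens/sec. This assumes the model is being served through Hugging Face transformers (the serving stack is not stated in this issue), and the model id below is a hypothetical placeholder rather than the checkpoint being discussed:

```python
# Sketch only: assumes Hugging Face transformers; "some-org/some-34b-model" is a
# placeholder id, not the model from this issue.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "some-org/some-34b-model"  # hypothetical placeholder

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half-precision weights; fp32 roughly halves throughput
    device_map="auto",          # place layers on the available GPU(s)
)

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
# use_cache=True keeps the KV cache so each new token reuses past attention states
out = model.generate(**inputs, max_new_tokens=64, use_cache=True)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

If the slow path turns out to be fp32 loading or a disabled KV cache, either of those alone can explain a 3-4x throughput gap.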