Bug: RuntimeError: "topk_cpu" not implemented for 'Half' #42
Comments
This error happens when the model is too large to fit entirely on the GPU, so part of it is offloaded to the CPU, and the CPU does not support the topk operation in fp16. The solution should be to increase the |
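To illustrate the explanation above, here is a minimal sketch of a workaround that upcasts an fp16 CPU tensor to fp32 before calling `torch.topk`. The helper name `safe_topk` is hypothetical and is not part of PyTorch or transformers. Whether plain `torch.topk` fails on a half-precision CPU tensor may depend on the installed PyTorch version, so treat this as an assumption-laden example rather than the fix adopted in this thread.

```python
import torch


def safe_topk(logits: torch.Tensor, k: int):
    """Hypothetical helper: top-k that avoids running the fp16 kernel on CPU.

    If the tensor lives on the CPU in float16, it is upcast to float32 for
    the top-k call and the values are cast back to the original dtype.
    """
    if logits.device.type == "cpu" and logits.dtype == torch.float16:
        values, indices = torch.topk(logits.float(), k)
        return values.to(logits.dtype), indices
    # On GPU, or for other dtypes, call topk directly.
    return torch.topk(logits, k)


# Example: a small fp16 tensor on the CPU, standing in for offloaded logits.
logits = torch.randn(2, 10).half()
values, indices = safe_topk(logits, k=3)
```

The upcast costs extra memory only for the tensor passed to top-k, which is usually small compared with the model weights.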
That's what I was thinking too, but I was on an A100 on GCP (so 40GB of memory). And usually, when it's a memory issue, I get the |
Hey @lorr1! |
When using a bloom model with generate, I get RuntimeError: "topk_cpu" not implemented for 'Half' when do_sample=True. I.e.