Is it possible to speed up polyglot-12.8b-koalpaca-v1.1b? #34
Comments
To clarify, could you give me some more information so I can diagnose this?

I would appreciate any advice on how to fix this issue.
Have you tried this?

```python
import torch
from transformers import AutoModelForCausalLM

MODEL = 'beomi/KoAlpaca-Polyglot-12.8B'

# Loading the weights in half precision (fp16) halves memory use
# and speeds up inference on GPU.
model = AutoModelForCausalLM.from_pretrained(
    MODEL,
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
).to(device="cuda", non_blocking=True)
model.eval()
```
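For completeness, a minimal generation sketch continuing from the snippet above; the tokenizer ID (reusing MODEL) and the prompt are illustrative assumptions, not taken from the repo:

```python
import torch
from transformers import AutoTokenizer

# Assumption: the tokenizer is published under the same model ID as the weights.
tokenizer = AutoTokenizer.from_pretrained(MODEL)

inputs = tokenizer("Hello!", return_tensors="pt").to("cuda")

# inference_mode skips autograd bookkeeping, saving memory and a bit of time.
with torch.inference_mode():
    output_ids = model.generate(**inputs, max_new_tokens=64)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```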
Oh, thank you!
fp16 uses exactly half the bits of fp32, but for a generation model it does not noticeably harm output quality.
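If fp16 alone is not enough, another option worth mentioning (my suggestion, not something the maintainer recommends here) is loading the weights in 8-bit via the bitsandbytes integration in transformers. This mainly cuts memory roughly in half again relative to fp16, which helps when a 12.8B model would otherwise not fit on a single GPU; raw token throughput may not improve. A minimal sketch, assuming bitsandbytes and accelerate are installed:

```python
from transformers import AutoModelForCausalLM

MODEL = 'beomi/KoAlpaca-Polyglot-12.8B'

# load_in_8bit quantizes the linear-layer weights to int8;
# device_map="auto" lets accelerate place layers on the available GPU.
model = AutoModelForCausalLM.from_pretrained(
    MODEL,
    device_map="auto",
    load_in_8bit=True,
)
model.eval()
```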
Hi there,
I tried your newly released model (polyglot-12.8b-koalpaca-v1.1b) on my local system (with a single GPU), but it is quite slow. Is there any way to speed up inference?
Thank you!