
Slow performance on Core Ultra 7 CPU (Meteor Lake) #255

Closed
sebastienbo opened this issue Feb 11, 2024 · 4 comments

sebastienbo commented Feb 11, 2024

Hi,

I have a new Meteor Lake laptop (without an NVIDIA GPU) and llamafile responds really slowly.
I think the process is being scheduled onto the efficiency cores for some reason instead of the performance cores.
I noticed in the README that there are options to enable NVIDIA or AMD GPUs, but I don't see how to enable the NPU or the Arc iGPU.

Can someone point me to the documentation?

thank you

jart (Collaborator) commented Feb 12, 2024

We haven't implemented support yet for managing performance vs. efficiency cores. I have a Raptor Lake CPU; the OS reports 32 CPUs, and I know 8 of them are efficiency cores. Right now llamafile chooses 16 threads by default, but on this CPU it actually goes faster if I pass -t 8 to limit it to just the performance cores. At the same time, it goes even faster if I pass -t 31 to use nearly all the CPUs. Picking a good default for something like that would require a nonobvious heuristic.
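
For example, here's a rough sketch of how you might compare thread counts on your own machine (the model path and prompt are just placeholders; -t is the flag discussed above, and -m / -p follow the llama.cpp-style CLI that llamafile ships):

```sh
# Default thread choice (16 on the Raptor Lake machine described above):
./llamafile -m model.gguf -p "Why is the sky blue?"

# Limit work to just the performance cores:
./llamafile -m model.gguf -p "Why is the sky blue?" -t 8

# Use nearly all 32 hardware threads, which was faster still on that machine:
./llamafile -m model.gguf -p "Why is the sky blue?" -t 31
```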

There's no support yet for Intel GPUs. That might happen in the future if it's possible to support them through Vulkan, which is being worked on upstream. For now, assume you need either an NVIDIA or AMD GPU to get GPU performance.
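
If you do have an NVIDIA or AMD card, here's a minimal sketch of enabling GPU offload, assuming the llama.cpp-style -ngl / --n-gpu-layers option that llamafile exposes (model path and prompt are placeholders):

```sh
# Offload as many layers as will fit onto the GPU; only NVIDIA and AMD are supported today.
./llamafile -m model.gguf -p "Why is the sky blue?" -ngl 999
```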

Thanks for using llamafile. I hope this information helps!

jart closed this as completed Feb 12, 2024
jart added the question label Feb 12, 2024
sebastienbo (Author) commented Feb 23, 2024 via email

jart (Collaborator) commented Feb 23, 2024

GGML usually needs to be rewritten from scratch for each computing platform. If someone ends up contributing that to llama.cpp, then we might be able to support it, but we'd need to possess the hardware in order to do the development work it would require. Right now the only computing platforms we support are x86, aarch64, NVIDIA CUDA, AMD HIP, and Apple Metal. I think that covers the lion's share of the computing community.

sebastienbo (Author) commented Mar 3, 2024 via email
