Slow performance on Core Ultra 7 CPU (Meteor Lake) #255
Comments
We haven't implemented support yet for managing performance vs. efficiency cores. I have a Raptor Lake CPU. The OS reports that it has 32 CPUs. I know it has 8 efficiency cores. Right now llamafile chooses 16 as the default, but on this CPU it actually goes faster if I pass -t 8 to limit it to just the performance cores. However, at the same time, it'll go even faster than that if I say -t 31 to use all the CPUs. In order to make a default choice for something like that, we'd need some kind of nonobvious approach.
There's no support yet for Intel GPUs. That might happen in the future if it's possible to support them through Vulkan, which is being worked on upstream. For now I think you should assume that you need either an NVIDIA or AMD GPU to get GPU perf.
Thanks for using llamafile. I hope this information helps!
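For anyone who wants to reproduce this on their own machine, here is a minimal benchmark sketch. The model filename is a placeholder; -m, -t, -p, and -n are the llama.cpp-style flags llamafile accepts, and timing stats print to stderr at the end of each run:

```sh
# Compare generation speed at a few thread counts.
# model.gguf is a placeholder; substitute your own weights.
for t in 8 16 31; do
  echo "--- threads: $t ---"
  ./llamafile -m model.gguf -t "$t" -p "The quick brown fox" -n 64 >/dev/null
done
```

Compare the "eval time" tokens-per-second figures between runs to find the sweet spot for your CPU.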
Thank you very much for your answer.
And what about NPU support?
As you probably know, NPUs are far more performant than GPUs for LLMs, and these NPUs are now part of the new Intel and AMD CPUs. They will allow lower-end laptops to run LLMs even without a GPU.
Have you considered supporting those?
https://medium.com/openvino-toolkit/how-to-run-and-develop-your-ai-app-on-intel-npu-intel-ai-boost-76f3efade169
Best regards,
Sébastien
GGML usually needs to be rewritten from scratch for each computing platform. If someone ends up contributing that to llama.cpp then we might be able to support it. But we'd need to possess the hardware in order to do the development work that it would require. Right now the only computing platforms we support are x86, aarch64, NVIDIA CUDA, AMD HIP, and Apple Metal. That represents the lion's share of the computing community, I think.
It has just gone open source:
https://www.tomshardware.com/pc-components/cpus/intels-npu-acceleration-library-goes-open-source
The results of NPU vs. GPU are huge: the NPU is much faster than using the tensor cores of a GPU.
Hi,
I have a new Meteor Lake laptop (without an NVIDIA GPU) and llamafile is responding really slowly.
I think the process is being scheduled on the efficiency cores for some reason instead of the performance cores.
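One way to test that hypothesis on Linux is a sketch like the following. The sysfs paths below exist only on hybrid Intel CPUs, and the core ID ranges shown are examples; yours will differ:

```sh
# On hybrid Intel CPUs, the kernel lists P-core and E-core IDs here:
cat /sys/devices/cpu_core/cpus   # performance cores, e.g. 0-11
cat /sys/devices/cpu_atom/cpus   # efficiency cores, e.g. 12-21

# Pin llamafile to the performance cores only and see if speed improves
# (model.gguf and the core list are placeholders):
taskset -c 0-11 ./llamafile -m model.gguf -t 6 -p "Hello" -n 64
```

If the pinned run is noticeably faster, the scheduler was likely placing threads on the E-cores.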
I noticed in the README that there are options to enable NVIDIA or AMD GPUs, but I don't see how I can enable the NPU or the Intel Arc iGPU.
Can someone point me to the documentation?
Thank you.