Slow performance on Core Ultra 7 CPU (Meteor Lake) #255
Comments
We haven't implemented support yet for managing performance vs. efficiency cores. I have a Raptor Lake CPU. The OS reports that it has 32 CPUs. I know it has 8 efficiency cores. Right now llamafile chooses 16 as the default, but on this CPU it actually goes faster if I pass -t 8 to limit it to just the performance cores. However, at the same time, it'll go even faster than that if I say -t 31 to use all the CPUs. In order to make a default choice for something like that, we'd need some kind of nonobvious approach.
There's no support yet for Intel GPUs. That might happen in the future if it's possible to support them through Vulkan, which is being worked on upstream. For now I think you should assume that you need either an NVIDIA or AMD GPU to get GPU perf.
Thanks for using llamafile. I hope this information helps!
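For anyone who wants to reproduce this on their own machine, here is a minimal benchmark sketch. The model filename is a placeholder; -m, -t, -p, and -n are the llama.cpp-style flags llamafile accepts, and timing stats print to stderr at the end of each run:

```sh
# Compare generation speed at a few thread counts.
# model.gguf is a placeholder; substitute your own weights.
for t in 8 16 31; do
  echo "--- threads: $t ---"
  ./llamafile -m model.gguf -t "$t" -p "The quick brown fox" -n 64 >/dev/null
done
```

Compare the "eval time" tokens-per-second figures between runs to find the sweet spot for your CPU.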
Thank you very much for your answer.
And what about NPU support?
As you probably know, NPUs are far more performant than GPUs for LLMs, and these NPUs are now part of the new Intel and AMD CPUs. They will allow lower-end laptops to run LLMs even without a GPU.
Have you considered supporting those?
https://medium.com/openvino-toolkit/how-to-run-and-develop-your-ai-app-on-intel-npu-intel-ai-boost-76f3efade169
Best regards,
Sébastien
GGML usually needs to be rewritten from scratch for each computing platform. If someone ends up contributing that to llama.cpp then we might be able to support it. But we'd need to possess the hardware in order to do the development work that it would require. Right now the only computing platforms we support are x86, aarch64, NVIDIA CUDA, AMD HIP, and Apple Metal. That represents the lion's share of the computing community, I think.
It has just gone open source:
https://www.tomshardware.com/pc-components/cpus/intels-npu-acceleration-library-goes-open-source
The results of NPU vs. GPU are huge: the NPU is much faster than using the tensor cores of a GPU.
Hi,
I have a new Meteor Lake laptop (without an NVIDIA GPU) and llamafile is responding really slowly.
I think the process is being scheduled on the efficiency cores for some reason instead of the performance cores.
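One way to test that hypothesis on Linux is a sketch like the following. The sysfs paths below exist only on hybrid Intel CPUs, and the core ID ranges shown are examples; yours will differ:

```sh
# On hybrid Intel CPUs, the kernel lists P-core and E-core IDs here:
cat /sys/devices/cpu_core/cpus   # performance cores, e.g. 0-11
cat /sys/devices/cpu_atom/cpus   # efficiency cores, e.g. 12-21

# Pin llamafile to the performance cores only and see if speed improves
# (model.gguf and the core list are placeholders):
taskset -c 0-11 ./llamafile -m model.gguf -t 6 -p "Hello" -n 64
```

If the pinned run is noticeably faster, the scheduler was likely placing threads on the E-cores.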
I noticed in the README that there are options to enable NVIDIA or AMD GPUs, but I don't see how I can enable the NPU or the Intel Arc iGPU.
Can someone point me to the documentation?
Thank you.