using CUDA when both CPU and Cuda12 back-ends are present. #456
It seems that the CPU back-end and the CUDA back-ends can't be installed at the same time. If this is by design, the issue can be closed, but since I don't know whether it is by design, I'll leave the issue open for others to comment.
Originally they weren't meant to be installed together, since it then wasn't clear which binaries should be used. However we now have the runtime detection which should probe your system and load the best binaries possible. In your case that looks like it is working, since the logs say:
But then for some reason it isn't actually using your GPU! I think this is probably a real bug.
Hi @vvdb-architecture, if it is not using your GPU even with the CUDA backend, do you have GpuLayerCount in your ModelParams set to -1, or to a value in 1-33? If it is not set, or is set to 0, it will default to CPU-only, even with just the CUDA backend installed. Sorry if I've misunderstood your problem, but this may help other users if they hit that issue.
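A minimal sketch of the setting described above (assuming a recent LLamaSharp API; the model path is a placeholder):

```csharp
using LLama;
using LLama.Common;

// GpuLayerCount controls how many layers are offloaded to the GPU:
// 0 = CPU-only (the default behaviour described above),
// -1 or a value >= the model's layer count (e.g. 33) = offload everything.
var parameters = new ModelParams("model.gguf") // placeholder path
{
    GpuLayerCount = 33
};

using var weights = LLamaWeights.LoadFromFile(parameters);
```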
It's set to 33.
I think this issue can be closed, since the docs explicitly state you can only install one of the back-ends.
@vvdb-architecture Sorry for seeing this issue late. It should be my duty to resolve this problem, because I wrote the main part of the dynamic loading of the native library. #588 is also a duplicate of this issue.
Yes, but the documentation has been outdated for a long time. It still reflects v0.5.0, while we are already proceeding to v0.11.0. I documented that restriction because dynamic loading was not supported in v0.5.0. LLamaSharp is expected to work with multiple backend packages in the current version, so I'll re-open this issue and dig into it. Thank you for your reminder in #589!
Hello, is there any news on this issue? I'm encountering a similar issue. I have installed both the CPU and Cuda12 backend packages.
When I load a model on the CPU with …
When I only install …
That's how it's meant to work: if the CUDA binaries are available and compatible with your system, they will be used unless you explicitly disable CUDA at load time with … Changing …
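The exact call referenced above was lost in extraction; a minimal sketch of explicitly disabling CUDA, using the `NativeLibraryConfig.All.WithCuda(...)` call that appears later in this thread (API details vary between LLamaSharp versions):

```csharp
using LLama.Native;

// Must run before the native library is loaded (i.e. before any
// model is created); afterwards the configuration is fixed.
// Passing false skips the CUDA backend so the CPU binaries are used.
NativeLibraryConfig.All.WithCuda(false);
```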
I'm wondering if this bug is still manifesting. Using the latest 0.18.0 with the CPU and Cuda12 backends installed, it defaults to CPU no matter what settings I specify. Here is what I am trying:
The only way I can get the Cuda12 backend to work is by removing CPU, which is contrary to the documentation ("Please note that before LLamaSharp v0.10.0, only one backend package should be installed at a time.").
What does the log callback show for you (see this comment just above for how to add the callback)?
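For reference, a minimal sketch of wiring up that callback so the native-library selection is visible (assuming the delegate-based `WithLogCallback` overload; signature details vary between LLamaSharp versions):

```csharp
using LLama.Native;

// Route native-library loading logs to the console so we can see
// which backend (CPU / CUDA / Vulkan) is actually probed and selected.
// Must be configured before the first model load.
NativeLibraryConfig.All.WithLogCallback((level, message) =>
    Console.WriteLine($"[{level}] {message}"));
```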
Certainly! Output: I tested with NativeLibraryConfig.All.WithCuda(true) and with just the default settings, the same settings which work if only the Cuda12 backend is installed.
There's something a bit odd going on here: your log shows that it tried to load things in this order:
So it tried to load a GPU backend and then tried to load a CPU backend when it couldn't do that, which is what we'd expect, except that it was trying to load Vulkan! I'm at a bit of a loss as to what's going on here. @m0nsky any ideas?
Here's the associated code which manifests the buggy behavior.
Thank you for your response @martindevans! I have a follow-up question: when I set … Is this considered normal when …?
I'm not sure about that; ideally there probably shouldn't be any memory used, but I can easily imagine some resources being created even when they're not technically needed. I'd recommend asking upstream in the llama.cpp repo; they'll know more about the details.
I'm using Kernel Memory with LLamaSharp. Despite having an RTX 3080 and the latest CUDA drivers installed, CUDA is not used.
Not sure if this is a bug or I'm missing something, so here's a question instead:
The LlamaSharp.csproj contains:
I found out that if both the Cpu and Cuda12 back-ends are referenced, only the CPU is used, even if the CUDA DLL is loaded.
Interestingly, the logs do say that the CUDA back-end is loaded, but CUDA is not used.
If I remove the reference to LLamaSharp.Backend.Cpu, then the CUDA back-end will start to be used. The logs show:
I've reported this to the kernel memory project, but was advised to report this here.