Ability to set the number of layers to offload to GPU #51

Closed
rahulvk007 opened this issue Dec 17, 2023 · 5 comments
Labels
enhancement New feature or request

Comments

@rahulvk007

Since this is using llama.cpp in the backend, is there any way to customise the number of layers to offload to the GPU?

Right now I am using localGPT, and I can get great performance by offloading 17/35 layers to the GPU without any crashes caused by CUDA running out of memory.

But here I can see that it offloads 21 layers automatically, which causes it to crash with CUDA out-of-memory errors and fall back to the CPU, resulting in extremely slow performance.

Nevertheless, this is a great product with a very friendly UI.

@JayNakrani
Contributor

TIL that we can do that in llama.cpp haha. Off the top of my head, I don't know how to do this, but let's look into it together.

Can you point me to some docs that explain how to do it in llama.cpp? I also encourage you to dig through the code and docs of Ollama if you're up for it (and then show us lol).

@JayNakrani
Contributor

Oh wait, I just saw the -ngl parameter in llama.cpp.
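
For reference, llama.cpp exposes this through the `-ngl` / `--n-gpu-layers` flag on its CLI; a quick sketch (the model path and layer count here are just example values, adjust for your model and VRAM):

```sh
# Offload 17 of the model's layers to the GPU; the remaining layers run on the CPU.
# Requires a GPU-enabled build of llama.cpp (e.g. CUDA/cuBLAS).
./main -m ./models/llama-2-7b-chat.Q4_K_M.gguf -p "Hello" -ngl 17
```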

@JayNakrani
Contributor

Ollama allows us to customize this through the num_gpu option, both at the API request level and at the Modelfile level.

We would need to pass this value to Ollama when making API requests.
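
For example, per Ollama's API and Modelfile docs (17 is just an illustrative layer count):

```sh
# Per-request: pass num_gpu in the options object of a generate call.
curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?",
  "options": { "num_gpu": 17 }
}'
```

```
# Modelfile: bake the setting into a custom model.
FROM llama2
PARAMETER num_gpu 17
```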

JayNakrani added the enhancement label Dec 17, 2023
@JayNakrani
Contributor

Probably the best way to do this would be to let the system admin specify these via the MODEL_PROVIDER_CONFIGS env var.

Documenting the steps in case you (or someone else) want to contribute this feature:

  1. Add the Ollama options to ModelProviderConfig so that they can be specified via the MODEL_PROVIDER_CONFIGS environment variable (rough sketch after these steps).
  2. Send them with each Ollama request (through ModelProviderService).
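
A hypothetical sketch of what the admin-facing side of step 1 could look like; the actual JSON shape of MODEL_PROVIDER_CONFIGS and the option names are up to whoever implements this:

```sh
# Hypothetical example: each provider config carries provider-specific
# options that get forwarded to Ollama with every request.
export MODEL_PROVIDER_CONFIGS='[
  {
    "type": "OLLAMA",
    "apiBaseUrl": "http://localhost:11434/",
    "options": { "num_gpu": 17 }
  }
]'
```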

JayNakrani added a commit that referenced this issue Dec 20, 2023
@JayNakrani
Contributor

@rahulvk007 With the latest release (v0.0.3), you can now set the number of layers to offload: https://github.com/SecureAI-Tools/SecureAI-Tools#customize-llm-provider-specific-options

Please try it out and let me know if you run into any issues.
