Ability to set the number of layers to offload to GPU #51
Comments
TIL that we can do that in llama.cpp, haha. So off the top of my head, I don't know how to do this. But let's look into it together. Can you point me to the llama.cpp docs that explain how to do it? I also encourage you to dig through the code and docs of Ollama if you're up for it (and then show us, lol).
Oh wait, I just saw the relevant option. Ollama allows us to customize this through its model options. We would need to pass this value to Ollama when making API requests.
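For illustration, here is a minimal sketch of what forwarding that value to Ollama's HTTP API could look like. It assumes Ollama's `/api/generate` endpoint and its `num_gpu` option (the number of layers to offload to the GPU); the surrounding function and model names are illustrative, not code from this repository.

```typescript
// Minimal sketch: forward a configured GPU layer count to Ollama.
// Assumes Ollama is listening on localhost:11434 and that the `num_gpu`
// option controls how many layers are offloaded to the GPU.

interface GenerateRequest {
  model: string;
  prompt: string;
  stream: boolean;
  options?: { num_gpu?: number };
}

async function generate(prompt: string, numGpuLayers?: number): Promise<string> {
  const body: GenerateRequest = {
    model: "llama2",
    prompt,
    stream: false,
    // Only send the option when it is explicitly configured, so Ollama
    // keeps its default auto-detection otherwise.
    ...(numGpuLayers !== undefined ? { options: { num_gpu: numGpuLayers } } : {}),
  };

  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  const json = await res.json();
  return json.response as string;
}

// Example: offload 17 of the model's layers to the GPU.
generate("Hello!", 17).then(console.log);
```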
Probably the best way to do this would be to let the system admin specify these as part of the instance configuration. I'll document the steps in case you (or someone else) want to contribute this feature.
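A rough sketch of what the app-side wiring could look like, assuming the value comes from an admin-supplied environment variable; `OLLAMA_NUM_GPU_LAYERS` below is a hypothetical name, not an actual SecureAI-Tools setting:

```typescript
// Hypothetical wiring: read an admin-supplied setting and translate it into
// Ollama request options. OLLAMA_NUM_GPU_LAYERS is an illustrative name,
// not an actual SecureAI-Tools environment variable.

function getOllamaOptions(): { num_gpu?: number } {
  const raw = process.env.OLLAMA_NUM_GPU_LAYERS;
  if (!raw) {
    // Not configured: let Ollama decide how many layers to offload.
    return {};
  }
  const numGpu = Number.parseInt(raw, 10);
  if (Number.isNaN(numGpu) || numGpu < 0) {
    throw new Error(`Invalid OLLAMA_NUM_GPU_LAYERS value: ${raw}`);
  }
  return { num_gpu: numGpu };
}

// These options would then be merged into each Ollama API call,
// e.g. the `options` field of the /api/generate request shown above.
```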
@rahulvk007 With the latest release (v0.0.3), you can now set the number of layers to offload: https://github.com/SecureAI-Tools/SecureAI-Tools#customize-llm-provider-specific-options
Please try it out and let me know if you run into any issues.
Since this is using llama.cpp in the backend, is there any way to customise the number of layers to offload to the GPU?
Right now I am using localGPT, and I get great performance by offloading 17/35 layers to the GPU without any CUDA out-of-memory crashes.
But here I can see that it automatically offloads 21 layers, which causes it to crash with a CUDA out-of-memory error and fall back to the CPU, resulting in extremely slow performance.
Nevertheless, this is a great product with a very friendly UI.