Ok, forget about it. From the Ollama docs I can see it uses llama.cpp in the background, and it supports manually adding custom GGUF models with various parameters.
Are there any plans for llama.cpp integration?
It uses quantized models, which allows running bigger models with less VRAM (https://github.com/ggerganov/llama.cpp#quantization), and you can put part of the computation on the CPU and part on the GPU if the whole model cannot fit into GPU memory.
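As a minimal sketch of the CPU/GPU split: llama.cpp's `-ngl` (`--n-gpu-layers`) flag offloads a given number of transformer layers to the GPU while the rest run on the CPU. The model path and layer count below are placeholders, adjust them for your hardware and model:

```shell
# Run a quantized GGUF model with partial GPU offload.
# -m   path to the quantized model file (placeholder name here)
# -ngl number of layers to offload to the GPU; remaining layers run on the CPU
# -p   prompt, -n max tokens to generate
./main \
  -m ./models/llama-2-7b.Q4_K_M.gguf \
  -ngl 20 \
  -p "Hello, world" \
  -n 64
```

Raising `-ngl` until you run out of VRAM is the usual way to find the best split for a given card.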
It supports a server mode too: https://github.com/ggerganov/llama.cpp/blob/master/examples/server/README.md#testing-with-curl
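A quick sketch of server mode, following the linked README (the model path, context size, and port are placeholders): start the server, then send a completion request with curl.

```shell
# Start the llama.cpp server on port 8080 (placeholder model path and context size).
./server -m ./models/llama-2-7b.Q4_K_M.gguf -c 2048 --port 8080 &

# Query the /completion endpoint with a JSON payload, as shown in the server README.
curl --request POST \
  --url http://localhost:8080/completion \
  --header "Content-Type: application/json" \
  --data '{"prompt": "Building a website can be done in 10 simple steps:", "n_predict": 128}'
```

The response comes back as JSON with the generated text in the `content` field.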