
Using GGUF models efficiently #473

Answered by N1h1lv5
N1h1lv5 asked this question in Q&A
Sep 13, 2023 · 1 comment · 2 replies

This was the solution for me; I am running Windows with a conda env:

  1. Set the environment variables properly:

$Env:CMAKE_ARGS="-DLLAMA_CUBLAS=on"
$Env:FORCE_CMAKE="1"

  2. Check that the variables are set:

echo $Env:CMAKE_ARGS

  3. Uninstall the previous version of llama-cpp-python:

pip uninstall llama-cpp-python

  4. Install the proper version:

pip install llama-cpp-python==0.1.83 --no-cache-dir

Now GPU + CPU work together. Thanks @PromtEngineer
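
If the build picked up cuBLAS, loading a model should show BLAS = 1 in the system info printed at load time. As a minimal sketch of using the GPU offload from Python (the model path and layer count below are placeholders, not from this thread):

from llama_cpp import Llama

# Load a GGUF model and offload some layers to the GPU; the rest stay on the CPU.
# The model path is a placeholder - point it at your own GGUF file.
llm = Llama(
    model_path="./models/llama-2-7b.Q4_K_M.gguf",
    n_gpu_layers=35,  # 0 = CPU only; raise until you run out of VRAM
    n_ctx=2048,       # context window size
)

output = llm("Q: What is a GGUF file? A:", max_tokens=64)
print(output["choices"][0]["text"])
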

Replies: 1 comment 2 replies

Answer selected by N1h1lv5