Configuring Code-Llama to use an NVIDIA GPU on Windows #168
Comments
@jordanbtucker thanks first. I ran:

CUDA_PATH=/usr/local/cuda FORCE_CMAKE=1 CMAKE_ARGS='-DLLAMA_CUBLAS=on' pip3 install llama-cpp-python --force-reinstall --upgrade --no-cache-dir -vv

and it failed with:

error: subprocess-exited-with-error
× Building wheel for llama-cpp-python (pyproject.toml) did not run successfully.
note: This error originates from a subprocess, and is likely not a problem with pip.
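That wheel-build failure usually means the build toolchain could not find a CUDA compiler. A minimal sketch of checking the toolchain before retrying, assuming a default CUDA install path (adjust CUDA_PATH to your system):

```shell
# Verify the CUDA compiler is on PATH before building the wheel;
# a missing nvcc is a common cause of this exact failure.
nvcc --version || echo "nvcc not found - install the CUDA Toolkit first"

# Then retry the build with verbose output to surface the real compiler error:
CUDA_PATH=/usr/local/cuda FORCE_CMAKE=1 CMAKE_ARGS='-DLLAMA_CUBLAS=on' \
  pip3 install llama-cpp-python --force-reinstall --upgrade --no-cache-dir -vv
```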
Hi - thanks for all your work on this, very exciting! The link in your OP is 404 - I think it should go here instead (it's missing the docs path): https://github.com/KillianLucas/open-interpreter/blob/main/docs/GPU.md
I installed CUDA and verified that GPU support was set up properly using all the instructions from https://github.com/KillianLucas/open-interpreter/blob/main/docs/GPU.md. But the CPU is still pegged, the GPU is barely used, and open-interpreter is too slow to use. Am I still missing a step, or is my Windows laptop just too slow? The GPU is an RTX 3050. Thank you.
@raptor-bot Thanks for reporting. Please make sure you are on the latest version of open-interpreter by running the following.
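The exact command wasn't captured in the thread; assuming the package was installed with pip, the standard way to upgrade it would be:

```shell
# Upgrade the pip package to the latest release
pip install --upgrade open-interpreter

# Confirm which version is now installed
pip show open-interpreter
```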
Unfortunately, running a local model is going to be a degraded experience at this early stage. Ensuring you are on the latest version of open-interpreter will help to a degree, but unless you run a 30B model on an RTX 3090 or 4090, you are unlikely to get anything close to GPT-3.5, at least for now. We are of course working on improving this.
@raptor-bot It will likely be the same or slower, depending on your Mac's GPU.
@raptor-bot You are constrained by your GPU's memory. If the model is larger than the onboard memory (keeping at least 1 GB of headroom for stability), it will be shifted to CPU and RAM. Your best bet is probably to use smaller, heavily quantized models. Of course, this has direct negative implications for performance, but that's the cost of being on the bleeding edge of tech. Financially, you are probably better off subscribing to OpenAI and using the API if you are looking for performance. Otherwise you are looking at dropping a couple stacks on a new rig. Honestly, given the pace of things, the best bet for most people is probably a combination of GPU rental (Lambda, Vast.ai, etc.) and the OpenAI API. That gives the best performance with very little upfront cost.
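To see whether a given model will actually fit in VRAM with that headroom, you can query the card's total and used memory with NVIDIA's driver tooling (the query fields below are standard `nvidia-smi` options):

```shell
# Report the GPU name plus total and currently used memory.
# Keep roughly 1 GB of headroom below the total when choosing
# a quantized model size.
nvidia-smi --query-gpu=name,memory.total,memory.used --format=csv
```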
Open Interpreter can be used with local language models; however, these can be rather taxing on your computer's resources. If you have an NVIDIA GPU, you may benefit from offloading some of the work to your GPU.
To set this up, follow the steps in Local Language Models with GPU Support.
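As a rough sketch of what those steps amount to on Windows (the CUBLAS flag matches the commands seen elsewhere in this thread; the PowerShell env-var syntax and the `--local` flag are assumptions about your shell and installed version):

```shell
# In PowerShell, set the build flags before reinstalling llama-cpp-python:
#   $env:FORCE_CMAKE = "1"
#   $env:CMAKE_ARGS  = "-DLLAMA_CUBLAS=on"
#   pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir
#
# Then launch Open Interpreter in local-model mode:
interpreter --local
```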
If you still run into problems, leave a comment here or ask on the Discord server.