"Illegal instruction" when trying to run the server using a precompiled docker image #272
Comments
MLP models aren't supported by
Wow, fast reply. I am using alpaca-lora-65B.ggml.q5_1.bin with llama-cpp-python directly inside a Python app and it works well. I tried to load this model using the docker server path and it gave me the same error: "Illegal instruction". So the incompatibility is not with llama.cpp.
EDIT:
Standalone:
I just removed the OpenBLAS version and reinstalled the vanilla version and the error persists.
The standalone server is running well, so the error has something to do with either the docker image itself, or with what it is trying to do and what my environment (WSL2) is letting it do:
On further thought, "illegal instruction" could be due to the fact that the Docker image was compiled on another computer that supports vector instructions not supported by your current hardware. Did you build the Docker image locally?
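For reference, one way to check which vector extensions the CPU actually exposes inside WSL2 (a rough sketch; the list of flags filtered for below is just the common SIMD-related ones):

# Print the SIMD-related CPU feature flags the kernel reports inside WSL2
grep -m1 '^flags' /proc/cpuinfo | tr ' ' '\n' | grep -E '^(sse4_1|sse4_2|avx|avx2|avx512[a-z]*|fma|f16c)$' | sort -u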
Hm, that is entirely possible. I did not; Docker just pulled it as it was not available on my system. Is there a Dockerfile somewhere?
Entirely likely, then. There are sample Dockerfiles in the root dir, or I just created pull request #270 that needs some smoke testing, if you have the time 😉
EDIT: Getting the CUDA Docker builds to run on Windows would be highly valuable to the community, as a lot of people are struggling with CUDA, especially on Windows.
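A rough sketch of building the image locally instead of pulling the prebuilt one (the Dockerfile name and image tag below are placeholders; use one of the sample Dockerfiles from the repository):

# Build the server image on the machine that will run it, so llama.cpp is
# compiled for the CPU features actually present on that host.
docker build -t llama-cpp-python-local -f Dockerfile .

# Run it with the same arguments as the prebuilt image, just using the local tag.
docker run --rm -it -p 8000:8000 -v /home/xxxx/models:/models -e MODEL=/models/gpt4-alpaca-lora_mlp-65B/gpt4-alpaca-lora_mlp-65B.ggmlv3.q5_1.bin llama-cpp-python-local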
I'm getting this issue as well. Has anyone else had this issue too?
On the same hardware without any virtualisation?
Yes, I'm using virtualization; I'm using a Dell R820 with Proxmox 7. The point is, I'm having a
I have no idea what logs to even collect. I checked
Any ideas?
You're going to need to compare the AVX1 / AVX2 / AVX512* compilation options between the working build and the one throwing the illegal opcode. If you're compiling in a VM, sometimes the virtualisation "lies" about what Intel extensions are available to the VM, when in fact they're only available on the host. Someone logged a ticket a while back where Hyper-V security settings were preventing the use of AVX512 by programs running in the VM.
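As an illustration, the LLAMA_AVX* options that appear later in this thread can be switched off explicitly when configuring a from-source llama.cpp build, which makes it easy to test whether one of those extensions is the culprit (a sketch, assuming the CMake options used by llama.cpp at the time of this thread):

# Configure a llama.cpp build with the suspect extensions disabled, then build it.
# If this binary runs while the default build crashes, the crash was an unsupported instruction.
cmake -B build -DLLAMA_AVX512=OFF -DLLAMA_AVX2=OFF -DLLAMA_FMA=OFF -DLLAMA_F16C=OFF
cmake --build build --config Release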
@gjmulder, that's interesting: why would it fail within the Python binding vs. from the vendor source? Do you know whether, when we build the Python binding, it builds the llama-cpp application with other CPU instructions that are hard-coded? There has to be something different from
Mind you, llama-cpp works fine with cuBLAS; it just doesn't work for me in llama-cpp-python.
Maybe try building
Next, compare the output of a test program that uses the source build
There's a significant amount of change occurring in
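A sketch of that kind of comparison, run against the same model file (the paths and model name below are placeholders; the example binary is the one produced by a from-source llama.cpp build):

# Run a short prompt through the locally built llama.cpp example binary...
./main -m /models/alpaca-lora-65B.ggml.q5_1.bin -p "Hello" -n 16

# ...then through the Python binding; if only the binding dies with an
# illegal instruction, the difference is in how the binding was compiled.
python -c "from llama_cpp import Llama; llm = Llama(model_path='/models/alpaca-lora-65B.ggml.q5_1.bin'); print(llm('Hello', max_tokens=16))"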
@gjmulder I tried building with verbose output and there isn't a significant log of value. I suspect that building "llama.cpp" directly works fine for me since I didn't use
I think I would be able to replicate this issue directly on
@gjmulder and @vmajor Once that was done, I was able to build "llama-cpp-python" from source, install it with pip natively, and import it in Python with no issues.
@vmajor I think if you were to make the same modification you should be able to build llama-cpp-python with the correct CPU instructions. Either way, the issue posted upstream should resolve this downstream in "llama-cpp-python"; this is a workaround for the moment.
Thanks for the hard work. I can in fact run llama-cpp-python locally without any issue. It was just the docker image that would not run. I had not spent time trying to make a new docker image as I changed my workflow to use llama-cpp-python locally. There are many other toolchains that are more broken for me so I need to pick and choose what I focus on.
I'm a Collaborator to
Closing; please reopen if the problem is reproducible with the latest
What update are you referring to?
@chen369's solution works, but for some environments, you may also have to set
I am running on old E5645 (Westmere) Xeons that do not support AVX at all. I also ran into "Illegal instruction". But I can confirm that the below command works for me:
CMAKE_ARGS="-DLLAMA_CUBLAS=1 -DLLAMA_AVX=OFF -DLLAMA_AVX2=OFF -DLLAMA_F16C=OFF -DLLAMA_FMA=OFF" FORCE_CMAKE=1 pip install --upgrade --force-reinstall llama-cpp-python --no-cache-dir
Not a problem with CMAKE_ARGS.
At first, I used this command to compile the wheel. However, after completion, when I tried to load the model, it returned None. I meticulously checked multiple times and noticed that its version had changed from 0.1.77 to 0.1.83. To address this, I specified the version by appending ==0.1.77 to the package name, recompiled, and reinstalled. After this adjustment, everything worked as expected. Thanks.
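Spelled out, the pinned variant described here combines the flags from the previous comment with the version pin, roughly like this:

# Same rebuild as above, but pinning llama-cpp-python to 0.1.77 as described.
CMAKE_ARGS="-DLLAMA_CUBLAS=1 -DLLAMA_AVX=OFF -DLLAMA_AVX2=OFF -DLLAMA_F16C=OFF -DLLAMA_FMA=OFF" FORCE_CMAKE=1 pip install --upgrade --force-reinstall llama-cpp-python==0.1.77 --no-cache-dir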
Expected Behavior
I am trying to execute this:
docker run --rm -it -p 8000:8000 -v /home/xxxx/models:/models -e MODEL=/models/gpt4-alpaca-lora_mlp-65B/gpt4-alpaca-lora_mlp-65B.ggmlv3.q5_1.bin ghcr.io/abetlen/llama-cpp-python:latest
and I expect the model to load and the server to start. I am using the model quantized by TheBloke according to the current latest specs of the llama.cpp ggml implementation.
Current Behavior
Environment and Context
Linux DESKTOP-xxx 5.15.68.1-microsoft-standard-WSL2+ #2 SMP
Steps to Reproduce
Please provide detailed steps for reproducing the issue. We are not sitting in front of your screen, so the more detail the better.
docker run --rm -it -p 8000:8000 -v /home/xxxx/models:/models -e MODEL=/models/gpt4-alpaca-lora_mlp-65B/gpt4-alpaca-lora_mlp-65B.ggmlv3.q5_1.bin ghcr.io/abetlen/llama-cpp-python:latest