"Illegal instruction" when trying to run the server using a precompiled docker image #272

Closed
vmajor opened this issue May 25, 2023 · 22 comments
Labels
build, hardware (Hardware specific issue)

Comments

@vmajor

vmajor commented May 25, 2023

Expected Behavior

I am trying to execute this:

docker run --rm -it -p 8000:8000 -v /home/xxxx/models:/models -e MODEL=/models/gpt4-alpaca-lora_mlp-65B/gpt4-alpaca-lora_mlp-65B.ggmlv3.q5_1.bin ghcr.io/abetlen/llama-cpp-python:latest

and I expect the model to load and the server to start. I am using a model quantized by TheBloke according to the current latest specs of the llama.cpp GGML implementation.

Current Behavior

llama.cpp: loading model from /models/gpt4-alpaca-lora_mlp-65B/gpt4-alpaca-lora_mlp-65B.ggmlv3.q5_1.bin
Illegal instruction

Environment and Context

Linux DESKTOP-xxx 5.15.68.1-microsoft-standard-WSL2+ #2 SMP

$ python3 --version
Python 3.10.9
$ make --version
GNU Make 4.3
$ g++ --version
g++ (Ubuntu 11.3.0-1ubuntu1~22.04.1) 11.3.0

Steps to Reproduce


docker run --rm -it -p 8000:8000 -v /home/xxxx/models:/models -e MODEL=/models/gpt4-alpaca-lora_mlp-65B/gpt4-alpaca-lora_mlp-65B.ggmlv3.q5_1.bin ghcr.io/abetlen/llama-cpp-python:latest

@gjmulder
Contributor

MLP models aren't supported by llama.cpp AFAIK

@vmajor
Author

vmajor commented May 25, 2023

Wow, fast reply. I am using alpaca-lora-65B.ggml.q5_1.bin with llama-cpp-python directly inside a Python app and it works well. I tried to load this model using the Docker server path and it gave me the same error: "Illegal instruction".

So the incompatibility is not with llama.cpp itself.

EDIT:

llama.cpp: loading model from /models/alpaca-lora-65B-GGML/alpaca-lora-65B.ggml.q5_1.bin
Illegal instruction

Standalone:

models/alpaca-lora-65B-GGML/alpaca-lora-65B.ggml.q5_1.bin
llama_model_load_internal: format     = ggjt v1 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 2048
llama_model_load_internal: n_embd     = 8192
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 64
llama_model_load_internal: n_layer    = 80
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 9 (mostly Q5_1)
llama_model_load_internal: n_ff       = 22016
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 65B
llama_model_load_internal: ggml ctx size = 146.86 KB
llama_model_load_internal: mem required  = 50284.17 MB (+ 5120.00 MB per state)
llama_init_from_file: kv self size  = 5120.00 MB
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |

@vmajor changed the title from 'Illegal instruction with OpenBLAS build' to '"Illegal instruction" when trying to run the server using docker' May 25, 2023
@vmajor
Author

vmajor commented May 25, 2023

I just removed the OpenBLAS version and reinstalled the vanilla version and the error persists.

@vmajor
Author

vmajor commented May 25, 2023

The standalone server is running well, so the error has something to do either with the Docker image itself or with what it tries to do that my environment (WSL2) won't let it do:

gpt4-alpaca-lora_mlp-65B/gpt4-alpaca-lora_mlp-65B.ggmlv3.q5_1.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 2048
llama_model_load_internal: n_embd     = 8192
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 64
llama_model_load_internal: n_layer    = 80
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 9 (mostly Q5_1)
llama_model_load_internal: n_ff       = 22016
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 65B
llama_model_load_internal: ggml ctx size =    0.18 MB
llama_model_load_internal: mem required  = 50284.21 MB (+ 5120.00 MB per state)
warning: failed to mlock 196608000-byte buffer (after previously locking 0 bytes): Cannot allocate memory
Try increasing RLIMIT_MLOCK ('ulimit -l' as root).
....................................................................................................
llama_init_from_file: kv self size  = 5120.00 MB
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
INFO:     Started server process [2244388]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://localhost:8000 (Press CTRL+C to quit)

@gjmulder
Contributor

On further thought, "Illegal instruction" could be because the Docker image was compiled on another machine that supports vector instructions your current hardware does not. Did you build the Docker image locally?
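One quick check along those lines (a sketch, not from the thread: llama.cpp prints the instruction sets it was compiled with on the "AVX = ... | AVX2 = ..." line shown above, which you can compare against what the CPU inside the WSL2 VM actually advertises):

# list the relevant SIMD flags the guest CPU exposes
grep -oEw 'avx|avx2|avx512f|fma|f16c' /proc/cpuinfo | sort -u

If the compiled-in list includes an extension that is missing from /proc/cpuinfo, the process dies with SIGILL on the first such instruction it executes.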

@vmajor
Author

vmajor commented May 25, 2023

Hm, that is entirely possible. I did not; Docker just pulled it since it was not available on my system. Is there a Dockerfile somewhere?

@gjmulder added the build and hardware (Hardware specific issue) labels May 25, 2023
@gjmulder
Contributor

gjmulder commented May 25, 2023

Entirely likely then.

There are sample Dockerfiles in the root dir, or I just created pull request #270 that needs some smoke testing, if you have the time 😉

EDIT: Getting the CUDA Docker builds to run on Windows would be highly valuable to the community, as a lot of people are struggling with CUDA, especially on Windows.
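A hedged sketch of a local build (assuming a Dockerfile in the repository root as described above; the image tag is illustrative, and --recurse-submodules is needed because llama.cpp is vendored as a git submodule):

git clone --recurse-submodules https://github.com/abetlen/llama-cpp-python
cd llama-cpp-python
docker build -t llama-cpp-python:local .
docker run --rm -it -p 8000:8000 -v /home/xxxx/models:/models -e MODEL=/models/gpt4-alpaca-lora_mlp-65B/gpt4-alpaca-lora_mlp-65B.ggmlv3.q5_1.bin llama-cpp-python:local

Building on the target machine lets the compiler detect the host's actual instruction sets instead of inheriting whatever the CI machine supported.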

@gjmulder changed the title from '"Illegal instruction" when trying to run the server using docker' to '"Illegal instruction" when trying to run the server using a precompiled docker image' May 25, 2023
@real-limitless

I'm getting this issue as well.
I have manually built 'llama-cpp-python' and it fails similarly with Illegal instruction (core dumped) as soon as you run from llama_cpp import Llama.

Has anyone else had this issue too?
I'm on AlmaLinux 9 with Python 3.11.

@gjmulder
Contributor

I have manually built 'llama-cpp-python'

On the same hardware without any virtualisation?

@real-limitless

real-limitless commented May 28, 2023

I have manually built 'llama-cpp-python'

On the same hardware without any virtualisation?

Yes, I'm using virtualization: a Dell R820 with Proxmox 7, and a VM running AlmaLinux 9.

The point is, I get Illegal instruction (core dumped) errors every time I try to use llama-cpp-python. However, if I use llama.cpp directly I have no issues. It only happens when I use this Python binding with cuBLAS built in, whether via CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python --no-cache-dir or by building it manually from source.

I have no idea what logs to even collect. I checked journalctl and found nothing of interest other than:

May 27 15:29:46 localhost.localdomain kernel: traps: python3.9[84283] trap invalid opcode ip:7f3cec006c2a sp:7ffdbd2a9920 error:0 in libllama.so[7f3cebff2000+45000]
May 27 15:29:46 localhost.localdomain systemd[1]: Started Process Core Dump (PID 84284/UID 0).
May 27 15:29:47 localhost.localdomain systemd-coredump[84285]: Resource limits disable core dumping for process 84283 (python3.9).
May 27 15:29:47 localhost.localdomain systemd-coredump[84285]: [🡕] Process 84283 (python3.9) of user 1000 dumped core.
May 27 15:29:47 localhost.localdomain systemd[1]: systemd-coredump@13-84284-0.service: Deactivated successfully.
(.venv) [ai00@localhost ai]$ python -W all -d -v -X dev
[... verbose Python interpreter startup and import trace trimmed ...]
Python 3.11.2 (main, Feb 16 2023, 00:00:00) [GCC 11.3.1 20221121 (Red Hat 11.3.1-4)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from llama_cpp import Llama
# /home/ai00/ai/.venv/lib64/python3.11/site-packages/llama_cpp/__pycache__/__init__.cpython-311.pyc matches /home/ai00/ai/.venv/lib64/python3.11/site-packages/llama_cpp/__init__.py
# /home/ai00/ai/.venv/lib64/python3.11/site-packages/llama_cpp/__pycache__/llama_cpp.cpython-311.pyc matches /home/ai00/ai/.venv/lib64/python3.11/site-packages/llama_cpp/llama_cpp.py
[... ctypes and standard-library import trace trimmed ...]
Fatal Python error: Illegal instruction

Current thread 0x00007f8e6ee0a740 (most recent call first):
  File "/home/ai00/ai/.venv/lib64/python3.11/site-packages/llama_cpp/llama_cpp.py", line 237 in llama_init_backend
  File "/home/ai00/ai/.venv/lib64/python3.11/site-packages/llama_cpp/llama_cpp.py", line 859 in <module>
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 940 in exec_module
  File "<frozen importlib._bootstrap>", line 690 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1149 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1178 in _find_and_load
  File "/home/ai00/ai/.venv/lib64/python3.11/site-packages/llama_cpp/__init__.py", line 1 in <module>
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 940 in exec_module
  File "<frozen importlib._bootstrap>", line 690 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1149 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1178 in _find_and_load
  File "<stdin>", line 1 in <module>
Illegal instruction (core dumped)

Any ideas?

@gjmulder
Contributor

You're going to need to compare the AVX1 / AVX2 / AVX512* compilation options between the working build and the one throwing the illegal opcode.

If you're compiling in a VM, sometimes the virtualisation "lies" about which Intel extensions are available to the VM, when in fact they're only available on the host. Someone logged a ticket a while back where Hyper-V security settings were preventing the use of AVX512 by programs running in the VM.
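One way to test that theory (a sketch; the library path is an assumption based on the traceback above, and vpbroadcast* encodings are AVX2-only, so any match means AVX2 code was compiled in):

# does the built shared library contain AVX2 instructions? (nonzero count means yes)
objdump -d .venv/lib64/python3.11/site-packages/llama_cpp/libllama.so | grep -c 'vpbroadcast'
# does the VM's CPU actually expose AVX2?
grep -ow 'avx2' /proc/cpuinfo | sort -u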

@real-limitless

@gjmulder, that's interesting. Why would it fail within the Python binding vs. from the vendor source?

Do you know whether building the Python binding compiles the llama.cpp application with other, hard-coded CPU instructions?

There has to be something different between installing llama-cpp-python normally and installing it from source.

Mind you, llama.cpp works fine with cuBLAS; it just doesn't work for me in llama-cpp-python.

@gjmulder
Contributor

Maybe try building llama-cpp-python from source with the --verbose option? It will then pull in a specific commit of llama.cpp.

Next, compare the output of a test program that uses the source-built llama-cpp-python package with the output of ./main. That ensures the same llama.cpp commit is being used.

There's a significant amount of change occurring in llama.cpp, to the point where new features are being introduced faster than I can keep track of them. Best to assume nothing at this point. 🤷‍♂️
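A hedged sketch of that kind of diagnostic build (the CMAKE_ARGS flags are the same ones used elsewhere in this thread; --verbose is pip's own flag and echoes the CMake configure output, including which LLAMA_* options ended up enabled):

CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python --no-cache-dir --verbose
python -c "from llama_cpp import *; print(llama_print_system_info())"

The second line prints the same "AVX = ... | AVX2 = ..." capability string that ./main prints at startup, so the two builds can be compared directly.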

@real-limitless

real-limitless commented May 28, 2023

@gjmulder I tried building with verbose and there isn't much of value in the log.
I tried building both with and without -DLLAMA_AVX2=off for fun; both gave me the same result.
However, after doing some more digging, I think this issue opened on llama.cpp might be ours:

ggerganov/llama.cpp#1583

I suspect that building llama.cpp directly works fine for me because I didn't use CMake to build it. Since llama-cpp-python uses CMake, and according to that issue (ggerganov/llama.cpp#1583) CMake builds with AVX2 enabled by default, my compiled library more than likely contains support for instructions that aren't supported here, in this case AVX2.

I think I could replicate this issue directly on llama.cpp if I built it from source using CMake, along the lines of the sketch below.
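A hedged sketch of that experiment (option names per the llama.cpp CMake of the time, as referenced in ggerganov/llama.cpp#1583; paths are illustrative):

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && mkdir build && cd build
# default configure: AVX2 is ON by default, which should reproduce the crash on a non-AVX2 CPU
cmake .. && cmake --build . --config Release
# control build: explicitly disable AVX2
cmake .. -DLLAMA_AVX2=off && cmake --build . --config Release

If ./bin/main from the first build dies with Illegal instruction while the second runs, the CMake defaults are the culprit.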

@real-limitless

real-limitless commented May 28, 2023

@gjmulder and @vmajor
Great news I was able to fix this issue.
It required me to modify vendor/llama.cpp/CMakeLists.txt: on lines 56 and 70 I had to turn LLAMA_AVX2 off and turn LLAMA_CUBLAS on.

Once that was done, I was able to build llama-cpp-python from source, install it natively with pip, and import it in Python with no issues.

(.venv) [ai00@localhost llama-cpp-python]$ python -c "from llama_cpp import *; print(llama_print_system_info())"
b'AVX = 1 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 | '

@vmajor I think if you were to make the same modification you should be able to build llama-cpp-python with the correct CPU instructions.

Either way, the issue posted upstream should eventually resolve this downstream in llama-cpp-python; this is a workaround for the moment.
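If you'd rather not edit the vendored CMakeLists.txt, the same options can be passed at install time (a sketch combining the two flags discussed above, both of which appear elsewhere in this thread):

CMAKE_ARGS="-DLLAMA_AVX2=off -DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python --no-cache-dir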

@vmajor
Author

vmajor commented May 29, 2023

Thanks for the hard work. I can in fact run llama-cpp-python locally without any issue; it was just the Docker image that would not run. I had not spent time trying to make a new Docker image, as I changed my workflow to use llama-cpp-python locally. There are many other toolchains that are more broken for me, so I need to pick and choose what I focus on.

@gjmulder
Contributor

ggerganov/llama.cpp#1583

I'm a collaborator on llama.cpp, so I labeled it build + bug + high priority, given the llama-cpp-python dependency.

@gjmulder
Contributor

Closing. Please reopen if the problem is reproducible with the latest llama-cpp-python, which includes an updated llama.cpp.

@decadance-dance

Closing. Please reopen if the problem is reproducible with the latest llama-cpp-python, which includes an updated llama.cpp.

What update are you referring to?
I upgraded to a new version of llama-cpp-python and still keep getting this error.
Maybe there are some flags or presets I have to set when I install llama-cpp-python?

@Mwni

Mwni commented Aug 2, 2023

@chen369's solution works, but in some environments you may also have to set LLAMA_FMA=off.

@kkaarrss

kkaarrss commented Aug 5, 2023

I am running on old E5645 (Westmere) Xeons that do not support AVX at all. I also ran into "Illegal instruction". But I can confirm that the command below works for me:

CMAKE_ARGS="-DLLAMA_CUBLAS=1 -DLLAMA_AVX=OFF -DLLAMA_AVX2=OFF -DLLAMA_F16C=OFF -DLLAMA_FMA=OFF" FORCE_CMAKE=1 pip install --upgrade --force-reinstall llama-cpp-python --no-cache-dir

Not a problem with CMAKE_ARGS.

ggerganov/llama.cpp#1027

@BobCN2017

I am running on old E5645 (Westmere) Xeons that do not support AVX at all. I also ran into "Illegal instruction". But I can confirm that the command below works for me:

CMAKE_ARGS="-DLLAMA_CUBLAS=1 -DLLAMA_AVX=OFF -DLLAMA_AVX2=OFF -DLLAMA_F16C=OFF -DLLAMA_FMA=OFF" FORCE_CMAKE=1 pip install --upgrade --force-reinstall llama-cpp-python --no-cache-dir

Not a problem with CMAKE_ARGS.

ggerganov/llama.cpp#1027

At first, I used this command to compile the wheel. However, after completion, when I tried to load the model, it returned None. I meticulously checked multiple times and noticed that its version had changed from 0.1.77 to 0.1.83. To address this, I specified the version by appending ==0.1.77 to the package name, recompiled, and reinstalled. After this adjustment, everything worked as expected. Thanks.
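For reference, a pinned variant of the quoted command (a sketch; 0.1.77 is the version that worked in the report above, so adjust to whatever matches your setup):

CMAKE_ARGS="-DLLAMA_CUBLAS=1 -DLLAMA_AVX=OFF -DLLAMA_AVX2=OFF -DLLAMA_F16C=OFF -DLLAMA_FMA=OFF" FORCE_CMAKE=1 pip install --upgrade --force-reinstall llama-cpp-python==0.1.77 --no-cache-dir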
