Install with CLBlast requires CLBlast libraries #306

@fgdfgfthgr-fox

Expected Behavior

With the command "CMAKE_ARGS="-DLLAMA_CLBLAST=on" FORCE_CMAKE=1 pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir", I would expect llama.cpp to be built and installed with CLBlast support. When a model is loaded, I would then expect to see BLAS = 1 and my graphics card being used during inference.

Current Behavior

llama.cpp shows BLAS = 0, and my GPU shows no activity at all during inference.
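
Going by the issue title, one possible cause is that the CLBlast libraries are not installed on the system, so the build falls back to a plain CPU build. Here is a minimal sketch to check whether the dynamic loader can see them. The library names "clblast" and "OpenCL" are my assumption about what a CLBlast build needs, and a hit here only proves the runtime library is present, not the development files that CMake looks for.

```python
from ctypes.util import find_library


def find_clblast_libs(names=("clblast", "OpenCL")):
    """Map each library name to what the system loader resolves it to.

    The value is None when the library cannot be found.
    The default names are an assumption, not taken from the build scripts.
    """
    return {name: find_library(name) for name in names}


if __name__ == "__main__":
    for name, resolved in find_clblast_libs().items():
        print(f"{name}: {resolved if resolved else 'NOT FOUND'}")
```

If either entry prints NOT FOUND, installing the matching distribution packages before re-running the pip command would be the first thing to try.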

Environment and Context

Linux Mint 21.1 Cinnamon (which is basically Ubuntu)
My GPU: Radeon VII
$ lscpu

Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         48 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  12
  On-line CPU(s) list:   0-11
Vendor ID:               AuthenticAMD
  Model name:            AMD Ryzen 5 5500
    CPU family:          25
    Model:               80
    Thread(s) per core:  2
    Core(s) per socket:  6
    Socket(s):           1
    Stepping:            0
    Frequency boost:     enabled
    CPU max MHz:         3600.0000
    CPU min MHz:         1400.0000
    BogoMIPS:            7186.38
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mc
                         a cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall n
                         x mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_go
                         od nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl p
                         ni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2api
                         c movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_le
                         gacy svm extapic cr8_legacy abm sse4a misalignsse 3dnow
                         prefetch osvw ibs skinit wdt tce topoext perfctr_core p
                         erfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw
                         _pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 
                         avx2 smep bmi2 erms invpcid cqm rdt_a rdseed adx smap c
                         lflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cq
                         m_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero 
                         irperf xsaveerptr rdpru wbnoinvd cppc arat npt lbrv svm
                         _lock nrip_save tsc_scale vmcb_clean flushbyasid decode
                         assists pausefilter pfthreshold avic v_vmsave_vmload vg
                         if v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid ove
                         rflow_recov succor smca fsrm
Virtualization features: 
  Virtualization:        AMD-V
Caches (sum of all):     
  L1d:                   192 KiB (6 instances)
  L1i:                   192 KiB (6 instances)
  L2:                    3 MiB (6 instances)
  L3:                    16 MiB (1 instance)
NUMA:                    
  NUMA node(s):          1
  NUMA node0 CPU(s):     0-11
Vulnerabilities:         
  Itlb multihit:         Not affected
  L1tf:                  Not affected
  Mds:                   Not affected
  Meltdown:              Not affected
  Mmio stale data:       Not affected
  Retbleed:              Not affected
  Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl
                          and seccomp
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer
                          sanitization
  Spectre v2:            Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIB
                         P always-on, RSB filling, PBRSB-eIBRS Not affected
  Srbds:                 Not affected
  Tsx async abort:       Not affected

$ uname -a
Linux fgdfgfthgr-MS-7C95 5.15.0-73-generic #80-Ubuntu SMP Mon May 15 15:18:26 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

$ python3 --version
Python 3.10.11
$ make --version
GNU Make 4.3
Built for x86_64-pc-linux-gnu
Copyright (C) 1988-2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
$ g++ --version
g++ (Ubuntu 11.3.0-1ubuntu1~22.04.1) 11.3.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ pip list

Package             Version
------------------- ----------------
accelerate          0.20.0.dev0
aiofiles            23.1.0
aiohttp             3.8.4
aiosignal           1.3.1
altair              5.0.1
anyio               3.7.0
async-timeout       4.0.2
attrs               23.1.0
bitsandbytes        0.37.2
certifi             2022.12.7
charset-normalizer  2.1.1
click               8.1.3
cmake               3.25.0
colorama            0.4.6
contourpy           1.0.7
cycler              0.11.0
datasets            2.12.0
dill                0.3.6
einops              0.6.1
exceptiongroup      1.1.1
fastapi             0.95.2
ffmpy               0.3.0
filelock            3.9.0
flexgen             0.1.7
fonttools           4.39.4
frozenlist          1.3.3
fsspec              2023.5.0
gradio              3.31.0
gradio_client       0.2.5
h11                 0.14.0
httpcore            0.17.2
httpx               0.24.1
huggingface-hub     0.14.1
idna                3.4
Jinja2              3.1.2
jsonschema          4.17.3
kiwisolver          1.4.4
linkify-it-py       2.0.2
lit                 15.0.7
llama-cpp-python    0.1.57
Markdown            3.4.3
markdown-it-py      2.2.0
MarkupSafe          2.1.2
matplotlib          3.7.1
mdit-py-plugins     0.3.3
mdurl               0.1.2
mpmath              1.2.1
multidict           6.0.4
multiprocess        0.70.14
networkx            3.0
numpy               1.24.3
orjson              3.8.14
packaging           23.1
pandas              2.0.2
peft                0.4.0.dev0
Pillow              9.5.0
pip                 23.0.1
psutil              5.9.5
PuLP                2.7.0
pyarrow             12.0.0
pydantic            1.10.8
pydub               0.25.1
Pygments            2.15.1
pyparsing           3.0.9
pyrsistent          0.19.3
python-dateutil     2.8.2
python-multipart    0.0.6
pytorch-triton-rocm 2.0.1
pytz                2023.3
PyYAML              6.0
quant-cuda          0.0.0
regex               2023.5.5
requests            2.28.1
responses           0.18.0
safetensors         0.3.1
scipy               1.10.1
semantic-version    2.10.0
sentencepiece       0.1.99
setuptools          67.8.0
six                 1.16.0
sniffio             1.3.0
starlette           0.27.0
sympy               1.11.1
tokenizers          0.13.3
toolz               0.12.0
torch               2.0.1+rocm5.4.2
torchaudio          2.0.2+rocm5.4.2
torchvision         0.15.2+rocm5.4.2
tqdm                4.65.0
transformers        4.30.0.dev0
typing_extensions   4.6.2
tzdata              2023.3
uc-micro-py         1.0.2
urllib3             1.26.13
uvicorn             0.22.0
websockets          11.0.3
wheel               0.38.4
xxhash              3.2.0
yarl                1.9.2

Failure Logs

$ CMAKE_ARGS="-DLLAMA_CLBLAST=on" FORCE_CMAKE=1 pip install llama-cpp-python  --force-reinstall --upgrade --no-cache-dir
Collecting llama-cpp-python
  Downloading llama_cpp_python-0.1.57.tar.gz (1.4 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.4/1.4 MB 6.6 MB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting typing-extensions>=4.5.0
  Downloading typing_extensions-4.6.2-py3-none-any.whl (31 kB)
Collecting numpy>=1.20.0
  Downloading numpy-1.24.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17.3/17.3 MB 8.5 MB/s eta 0:00:00
Building wheels for collected packages: llama-cpp-python
  Building wheel for llama-cpp-python (pyproject.toml) ... done
  Created wheel for llama-cpp-python: filename=llama_cpp_python-0.1.57-cp310-cp310-linux_x86_64.whl size=212476 sha256=9c83c78ef597f128db8bfc5ad6ceac144a5eb63a122952e021c47ca3bbb242dc
  Stored in directory: /tmp/pip-ephem-wheel-cache-6ct6wx1l/wheels/87/9b/55/22559358a9af8074053b6c39406f5c06f0d391c052f77ca0c2
Successfully built llama-cpp-python
Installing collected packages: typing-extensions, numpy, llama-cpp-python
  Attempting uninstall: typing-extensions
    Found existing installation: typing_extensions 4.4.0
    Uninstalling typing_extensions-4.4.0:
      Successfully uninstalled typing_extensions-4.4.0
  Attempting uninstall: numpy
    Found existing installation: numpy 1.24.1
    Uninstalling numpy-1.24.1:
      Successfully uninstalled numpy-1.24.1
Successfully installed llama-cpp-python-0.1.57 numpy-1.24.3 typing-extensions-4.6.2

INFO:Loading llamacpp_30b...
INFO:llama.cpp weights detected: models/llamacpp_30b/ggml-model-q5_0.bin

INFO:Cache capacity is 0 bytes
llama.cpp: loading model from models/llamacpp_30b/ggml-model-q5_0.bin
llama_model_load_internal: format     = ggjt v2 (pre #1508)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 2048
llama_model_load_internal: n_embd     = 6656
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 52
llama_model_load_internal: n_layer    = 60
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 8 (mostly Q5_0)
llama_model_load_internal: n_ff       = 17920
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 30B
llama_model_load_internal: ggml ctx size =    0.13 MB
llama_model_load_internal: mem required  = 23634.31 MB (+ 3124.00 MB per state)
.
llama_init_from_file: kv self size  = 3120.00 MB
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 | 
INFO:Loaded the model in 4.37 seconds.


llama_print_timings:        load time =  1245.53 ms
llama_print_timings:      sample time =    43.12 ms /    19 runs   (    2.27 ms per token)
llama_print_timings: prompt eval time =  1245.47 ms /     3 tokens (  415.16 ms per token)
llama_print_timings:        eval time = 11416.53 ms /    18 runs   (  634.25 ms per token)
llama_print_timings:       total time = 12735.25 ms
Output generated in 13.04 seconds (1.46 tokens/s, 19 tokens, context 3, seed 1337143409)
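
To confirm the result without reading the whole log, the system-info line that llama.cpp prints at load time can be parsed for the BLAS flag. This is only a sketch: the helper name parse_system_info is made up for illustration, and it assumes the line keeps the "NAME = 0 | NAME = 1 | ..." layout shown in the log above.

```python
import re


def parse_system_info(line):
    """Turn 'NAME = 0 | NAME = 1 | ...' into a dict of name -> bool."""
    return {name: value == "1" for name, value in re.findall(r"(\w+) = ([01])", line)}


# System-info line copied from the failure log above.
SYSTEM_INFO = (
    "AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | "
    "FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | "
    "WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 | "
)

flags = parse_system_info(SYSTEM_INFO)
print("BLAS enabled:", flags["BLAS"])  # prints "BLAS enabled: False"
```

A build that actually picked up CLBlast should flip this to True.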

Labels: build, documentation
