
pip install llama-cpp-python with CMAKE_ARGS="-DLLAMA_CUBLAS=on" on Ubuntu 20.04 fails with an error #862

Description

Prerequisites

  • I am installing llama_cpp_python 0.2.12 with pip

Expected Behavior

llama-cpp-python installs successfully with CUDA (cuBLAS) support.
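
For reference, the install command I expect to succeed is sketched below (FORCE_CMAKE=1 and --no-cache-dir are optional, but they avoid reusing a previously built CPU-only wheel from pip's cache; this assumes a working CUDA toolkit whose nvcc is on PATH):

$ CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python --no-cache-dir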

Current Behavior

The installation fails: the wheel for llama-cpp-python cannot be built.

Environment and Context

Please provide detailed information about your computer setup. This is important in case the issue is not reproducible except under certain specific conditions.

  • Physical (or virtual) hardware you are using, e.g. for Linux:

$ lscpu

Architecture: x86_64

CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 46 bits physical, 48 bits virtual
CPU(s): 20
On-line CPU(s) list: 0-19
Thread(s) per core: 1
Core(s) per socket: 14
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 183
Model name: 13th Gen Intel(R) Core(TM) i5-13600K
Stepping: 1
CPU MHz: 3500.000
CPU max MHz: 5100.0000
CPU min MHz: 800.0000
BogoMIPS: 6988.80
Virtualization: VT-x
L1d cache: 336 KiB
L1i cache: 224 KiB
L2 cache: 14 MiB
NUMA node0 CPU(s): 0-19
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Enhanced IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS SW sequence
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb intel_pt sha_ni xsaveopt xsavec xgetbv1 xsaves split_lock_detect avx_vnni dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp hwp_pkg_req umip pku ospke waitpkg gfni vaes vpclmulqdq tme rdpid movdiri movdir64b fsrm md_clear serialize pconfig arch_lbr flush_l1d arch_capabilities

$ uname -a
Linux myUbuntu64 5.15.0-87-generic #97~20.04.1-Ubuntu SMP Thu Oct 5 08:25:28 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

  • SDK version, e.g. for Linux:
$ python3 --version
Python 3.8.18

$ make --version
GNU Make 4.2.1
Built for x86_64-pc-linux-gnu

$ g++ --version
g++ (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
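
(Possibly relevant, since the build invokes nvcc: the installed CUDA toolkit and driver versions can be checked with the commands below. As the configure log further down shows, CMake picks up CUDA 10.1 here.)

$ nvcc --version
$ nvidia-smi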

Failure Information (for bugs)

When I run CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python, it fails with the following error:

Collecting llama-cpp-python

Using cached llama_cpp_python-0.2.12.tar.gz (7.6 MB)
Installing build dependencies ... done
Getting requirements to build wheel ... done
Installing backend dependencies ... done
Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: typing-extensions>=4.5.0 in ./.local/lib/python3.8/site-packages (from llama-cpp-python) (4.8.0)
Requirement already satisfied: numpy>=1.20.0 in ./anaconda3/envs/llamaIndex/lib/python3.8/site-packages (from llama-cpp-python) (1.24.4)
Requirement already satisfied: diskcache>=5.6.1 in ./anaconda3/envs/llamaIndex/lib/python3.8/site-packages (from llama-cpp-python) (5.6.3)
Building wheels for collected packages: llama-cpp-python
Building wheel for llama-cpp-python (pyproject.toml) ... error
error: subprocess-exited-with-error

× Building wheel for llama-cpp-python (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [81 lines of output]
*** scikit-build-core 0.6.0 using CMake 3.27.7 (wheel)
*** Configuring CMake...
loading initial cache file /tmp/tmpn2rxt21u/build/CMakeInit.txt
-- The C compiler identification is GNU 9.4.0
-- The CXX compiler identification is GNU 9.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Could NOT find Git (missing: GIT_EXECUTABLE)
CMake Warning at vendor/llama.cpp/scripts/build-info.cmake:16 (message):
Git not found. Build info will not be accurate.
Call Stack (most recent call first):
vendor/llama.cpp/CMakeLists.txt:108 (include)

  -- Performing Test CMAKE_HAVE_LIBC_PTHREAD
  -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
  -- Check if compiler accepts -pthread
  -- Check if compiler accepts -pthread - yes
  -- Found Threads: TRUE
  -- Found CUDAToolkit: /usr/include (found version "10.1.243")
  -- cuBLAS found
  -- The CUDA compiler identification is NVIDIA 10.1.243
  -- Detecting CUDA compiler ABI info
  -- Detecting CUDA compiler ABI info - done
  -- Check for working CUDA compiler: /usr/bin/nvcc - skipped
  -- Detecting CUDA compile features
  -- Detecting CUDA compile features - done
  -- Using CUDA architectures: 52;61;70
  -- CMAKE_SYSTEM_PROCESSOR: x86_64
  -- x86 detected
  CMake Warning (dev) at CMakeLists.txt:18 (install):
    Target llama has PUBLIC_HEADER files but no PUBLIC_HEADER DESTINATION.
  This warning is for project developers.  Use -Wno-dev to suppress it.

  CMake Warning (dev) at CMakeLists.txt:27 (install):
    Target llama has PUBLIC_HEADER files but no PUBLIC_HEADER DESTINATION.
  This warning is for project developers.  Use -Wno-dev to suppress it.

  -- Configuring done (0.9s)
  -- Generating done (0.0s)
  -- Build files have been written to: /tmp/tmpn2rxt21u/build
  *** Building project with Ninja...
  Change Dir: '/tmp/tmpn2rxt21u/build'

  Run Build Command(s): /tmp/pip-build-env-kxwznvv9/normal/lib/python3.8/site-packages/ninja/data/bin/ninja -v
  [1/15] /usr/bin/nvcc  -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUBLAS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-48r6m3ny/llama-cpp-python_a4f4a9081c0c4c2a9407154bf7336c12/vendor/llama.cpp/. -O3 -DNDEBUG -std=c++11 "--generate-code=arch=compute_52,code=[compute_52,sm_52]" "--generate-code=arch=compute_61,code=[compute_61,sm_61]" "--generate-code=arch=compute_70,code=[compute_70,sm_70]" -Xcompiler=-fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -use_fast_math -Wno-pedantic -Xcompiler "-Wno-array-bounds -Wno-format-truncation -Wextra-semi" -march=native -Xcompiler -pthread -x cu -c /tmp/pip-install-48r6m3ny/llama-cpp-python_a4f4a9081c0c4c2a9407154bf7336c12/vendor/llama.cpp/ggml-cuda.cu -o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda.cu.o && /usr/bin/nvcc  -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUBLAS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-48r6m3ny/llama-cpp-python_a4f4a9081c0c4c2a9407154bf7336c12/vendor/llama.cpp/. -O3 -DNDEBUG -std=c++11 "--generate-code=arch=compute_52,code=[compute_52,sm_52]" "--generate-code=arch=compute_61,code=[compute_61,sm_61]" "--generate-code=arch=compute_70,code=[compute_70,sm_70]" -Xcompiler=-fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -use_fast_math -Wno-pedantic -Xcompiler "-Wno-array-bounds -Wno-format-truncation -Wextra-semi" -march=native -Xcompiler -pthread -x cu -M /tmp/pip-install-48r6m3ny/llama-cpp-python_a4f4a9081c0c4c2a9407154bf7336c12/vendor/llama.cpp/ggml-cuda.cu -MT vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda.cu.o -o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda.cu.o.d
  FAILED: vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda.cu.o
  /usr/bin/nvcc  -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUBLAS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-48r6m3ny/llama-cpp-python_a4f4a9081c0c4c2a9407154bf7336c12/vendor/llama.cpp/. -O3 -DNDEBUG -std=c++11 "--generate-code=arch=compute_52,code=[compute_52,sm_52]" "--generate-code=arch=compute_61,code=[compute_61,sm_61]" "--generate-code=arch=compute_70,code=[compute_70,sm_70]" -Xcompiler=-fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -use_fast_math -Wno-pedantic -Xcompiler "-Wno-array-bounds -Wno-format-truncation -Wextra-semi" -march=native -Xcompiler -pthread -x cu -c /tmp/pip-install-48r6m3ny/llama-cpp-python_a4f4a9081c0c4c2a9407154bf7336c12/vendor/llama.cpp/ggml-cuda.cu -o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda.cu.o && /usr/bin/nvcc  -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUBLAS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-48r6m3ny/llama-cpp-python_a4f4a9081c0c4c2a9407154bf7336c12/vendor/llama.cpp/. -O3 -DNDEBUG -std=c++11 "--generate-code=arch=compute_52,code=[compute_52,sm_52]" "--generate-code=arch=compute_61,code=[compute_61,sm_61]" "--generate-code=arch=compute_70,code=[compute_70,sm_70]" -Xcompiler=-fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -use_fast_math -Wno-pedantic -Xcompiler "-Wno-array-bounds -Wno-format-truncation -Wextra-semi" -march=native -Xcompiler -pthread -x cu -M /tmp/pip-install-48r6m3ny/llama-cpp-python_a4f4a9081c0c4c2a9407154bf7336c12/vendor/llama.cpp/ggml-cuda.cu -MT vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda.cu.o -o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda.cu.o.d
  nvcc fatal   : Unknown option 'Wmissing-declarations'
  [2/15] cd /tmp/pip-install-48r6m3ny/llama-cpp-python_a4f4a9081c0c4c2a9407154bf7336c12/vendor/llama.cpp && /tmp/pip-build-env-kxwznvv9/normal/lib/python3.8/site-packages/cmake/data/bin/cmake -DMSVC= -DCMAKE_C_COMPILER_VERSION=9.4.0 -DCMAKE_C_COMPILER_ID=GNU -DCMAKE_VS_PLATFORM_NAME= -DCMAKE_C_COMPILER=/usr/bin/cc -P /tmp/pip-install-48r6m3ny/llama-cpp-python_a4f4a9081c0c4c2a9407154bf7336c12/vendor/llama.cpp/scripts/build-info.cmake
  -- Could NOT find Git (missing: GIT_EXECUTABLE)
  CMake Warning at scripts/build-info.cmake:16 (message):
    Git not found.  Build info will not be accurate.


  [3/15] /usr/bin/cc -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUBLAS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-48r6m3ny/llama-cpp-python_a4f4a9081c0c4c2a9407154bf7336c12/vendor/llama.cpp/. -O3 -DNDEBUG -std=gnu11 -fPIC -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wdouble-promotion -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -march=native -pthread -MD -MT vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-backend.c.o -MF vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-backend.c.o.d -o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-backend.c.o -c /tmp/pip-install-48r6m3ny/llama-cpp-python_a4f4a9081c0c4c2a9407154bf7336c12/vendor/llama.cpp/ggml-backend.c
  [4/15] /usr/bin/cc -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUBLAS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-48r6m3ny/llama-cpp-python_a4f4a9081c0c4c2a9407154bf7336c12/vendor/llama.cpp/. -O3 -DNDEBUG -std=gnu11 -fPIC -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wdouble-promotion -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -march=native -pthread -MD -MT vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-alloc.c.o -MF vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-alloc.c.o.d -o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-alloc.c.o -c /tmp/pip-install-48r6m3ny/llama-cpp-python_a4f4a9081c0c4c2a9407154bf7336c12/vendor/llama.cpp/ggml-alloc.c
  [5/15] /usr/bin/c++ -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUBLAS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-48r6m3ny/llama-cpp-python_a4f4a9081c0c4c2a9407154bf7336c12/vendor/llama.cpp/common/. -I/tmp/pip-install-48r6m3ny/llama-cpp-python_a4f4a9081c0c4c2a9407154bf7336c12/vendor/llama.cpp/. -O3 -DNDEBUG -std=gnu++11 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wno-format-truncation -Wextra-semi -march=native -MD -MT vendor/llama.cpp/common/CMakeFiles/common.dir/console.cpp.o -MF vendor/llama.cpp/common/CMakeFiles/common.dir/console.cpp.o.d -o vendor/llama.cpp/common/CMakeFiles/common.dir/console.cpp.o -c /tmp/pip-install-48r6m3ny/llama-cpp-python_a4f4a9081c0c4c2a9407154bf7336c12/vendor/llama.cpp/common/console.cpp
  [6/15] /usr/bin/c++ -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUBLAS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-48r6m3ny/llama-cpp-python_a4f4a9081c0c4c2a9407154bf7336c12/vendor/llama.cpp/common/. -I/tmp/pip-install-48r6m3ny/llama-cpp-python_a4f4a9081c0c4c2a9407154bf7336c12/vendor/llama.cpp/. -O3 -DNDEBUG -std=gnu++11 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wno-format-truncation -Wextra-semi -march=native -MD -MT vendor/llama.cpp/common/CMakeFiles/common.dir/grammar-parser.cpp.o -MF vendor/llama.cpp/common/CMakeFiles/common.dir/grammar-parser.cpp.o.d -o vendor/llama.cpp/common/CMakeFiles/common.dir/grammar-parser.cpp.o -c /tmp/pip-install-48r6m3ny/llama-cpp-python_a4f4a9081c0c4c2a9407154bf7336c12/vendor/llama.cpp/common/grammar-parser.cpp
  [7/15] /usr/bin/c++ -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUBLAS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-48r6m3ny/llama-cpp-python_a4f4a9081c0c4c2a9407154bf7336c12/vendor/llama.cpp/common/. -I/tmp/pip-install-48r6m3ny/llama-cpp-python_a4f4a9081c0c4c2a9407154bf7336c12/vendor/llama.cpp/. -O3 -DNDEBUG -std=gnu++11 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wno-format-truncation -Wextra-semi -march=native -MD -MT vendor/llama.cpp/common/CMakeFiles/common.dir/sampling.cpp.o -MF vendor/llama.cpp/common/CMakeFiles/common.dir/sampling.cpp.o.d -o vendor/llama.cpp/common/CMakeFiles/common.dir/sampling.cpp.o -c /tmp/pip-install-48r6m3ny/llama-cpp-python_a4f4a9081c0c4c2a9407154bf7336c12/vendor/llama.cpp/common/sampling.cpp
  [8/15] /usr/bin/cc -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUBLAS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-48r6m3ny/llama-cpp-python_a4f4a9081c0c4c2a9407154bf7336c12/vendor/llama.cpp/. -O3 -DNDEBUG -std=gnu11 -fPIC -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wdouble-promotion -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -march=native -pthread -MD -MT vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-quants.c.o -MF vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-quants.c.o.d -o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-quants.c.o -c /tmp/pip-install-48r6m3ny/llama-cpp-python_a4f4a9081c0c4c2a9407154bf7336c12/vendor/llama.cpp/ggml-quants.c
  [9/15] /usr/bin/c++ -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUBLAS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-48r6m3ny/llama-cpp-python_a4f4a9081c0c4c2a9407154bf7336c12/vendor/llama.cpp/common/. -I/tmp/pip-install-48r6m3ny/llama-cpp-python_a4f4a9081c0c4c2a9407154bf7336c12/vendor/llama.cpp/. -O3 -DNDEBUG -std=gnu++11 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wno-format-truncation -Wextra-semi -march=native -MD -MT vendor/llama.cpp/common/CMakeFiles/common.dir/train.cpp.o -MF vendor/llama.cpp/common/CMakeFiles/common.dir/train.cpp.o.d -o vendor/llama.cpp/common/CMakeFiles/common.dir/train.cpp.o -c /tmp/pip-install-48r6m3ny/llama-cpp-python_a4f4a9081c0c4c2a9407154bf7336c12/vendor/llama.cpp/common/train.cpp
  [10/15] /usr/bin/c++ -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUBLAS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-48r6m3ny/llama-cpp-python_a4f4a9081c0c4c2a9407154bf7336c12/vendor/llama.cpp/common/. -I/tmp/pip-install-48r6m3ny/llama-cpp-python_a4f4a9081c0c4c2a9407154bf7336c12/vendor/llama.cpp/. -O3 -DNDEBUG -std=gnu++11 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wno-format-truncation -Wextra-semi -march=native -MD -MT vendor/llama.cpp/common/CMakeFiles/common.dir/common.cpp.o -MF vendor/llama.cpp/common/CMakeFiles/common.dir/common.cpp.o.d -o vendor/llama.cpp/common/CMakeFiles/common.dir/common.cpp.o -c /tmp/pip-install-48r6m3ny/llama-cpp-python_a4f4a9081c0c4c2a9407154bf7336c12/vendor/llama.cpp/common/common.cpp
  /tmp/pip-install-48r6m3ny/llama-cpp-python_a4f4a9081c0c4c2a9407154bf7336c12/vendor/llama.cpp/common/common.cpp: In function ‘bool gpt_params_parse(int, char**, gpt_params&)’:
  /tmp/pip-install-48r6m3ny/llama-cpp-python_a4f4a9081c0c4c2a9407154bf7336c12/vendor/llama.cpp/common/common.cpp:114:34: warning: format not a string literal and no format arguments [-Wformat-security]
    114 |         fprintf(stderr, ex.what());
        |                                  ^
  [11/15] /usr/bin/cc -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUBLAS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-48r6m3ny/llama-cpp-python_a4f4a9081c0c4c2a9407154bf7336c12/vendor/llama.cpp/. -O3 -DNDEBUG -std=gnu11 -fPIC -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wdouble-promotion -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -march=native -pthread -MD -MT vendor/llama.cpp/CMakeFiles/ggml.dir/ggml.c.o -MF vendor/llama.cpp/CMakeFiles/ggml.dir/ggml.c.o.d -o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml.c.o -c /tmp/pip-install-48r6m3ny/llama-cpp-python_a4f4a9081c0c4c2a9407154bf7336c12/vendor/llama.cpp/ggml.c
  [12/15] /usr/bin/c++ -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUBLAS -DK_QUANTS_PER_ITERATION=2 -DLLAMA_BUILD -DLLAMA_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dllama_EXPORTS -I/tmp/pip-install-48r6m3ny/llama-cpp-python_a4f4a9081c0c4c2a9407154bf7336c12/vendor/llama.cpp/. -O3 -DNDEBUG -std=gnu++11 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wno-format-truncation -Wextra-semi -march=native -pthread -MD -MT vendor/llama.cpp/CMakeFiles/llama.dir/llama.cpp.o -MF vendor/llama.cpp/CMakeFiles/llama.dir/llama.cpp.o.d -o vendor/llama.cpp/CMakeFiles/llama.dir/llama.cpp.o -c /tmp/pip-install-48r6m3ny/llama-cpp-python_a4f4a9081c0c4c2a9407154bf7336c12/vendor/llama.cpp/llama.cpp
  ninja: build stopped: subcommand failed.


  *** CMake build failed
  [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for llama-cpp-python
Failed to build llama-cpp-python
ERROR: Could not build wheels for llama-cpp-python, which is required to install pyproject.toml-based projects
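
The key failure is "nvcc fatal : Unknown option 'Wmissing-declarations'": CMake detects the Ubuntu-packaged CUDA 10.1 toolkit (/usr/bin/nvcc, headers under /usr/include), and that nvcc rejects the warning flags llama.cpp passes to the CUDA compiler. A possible workaround, sketched below, is to install a newer CUDA toolkit from NVIDIA (e.g. 11.x or 12.x) and point the build at it; the /usr/local/cuda path is an assumption and depends on how the toolkit was installed:

$ export PATH=/usr/local/cuda/bin:$PATH
$ export CUDACXX=/usr/local/cuda/bin/nvcc
$ CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python --no-cache-dir --force-reinstall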
