Description
When loading the model, I get the following error message:
```
llm_load_tensors: ggml ctx size = 0.16 MB
llm_load_tensors: using CUDA for GPU acceleration
llm_load_tensors: mem required = 5734.11 MB
llm_load_tensors: offloading 20 repeating layers to GPU
llm_load_tensors: offloaded 20/43 layers to GPU
llm_load_tensors: VRAM used: 5266.66 MB
................................................GGML_ASSERT: D:\a\llama-cpp-python-cuBLAS-wheels\llama-cpp-python-cuBLAS-wheels\vendor\llama.cpp\ggml-cuda.cu:5925: false
```
My llama-cpp-python version is 0.2.11+cu117.
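
For context, the model is loaded roughly like this (a minimal sketch, not my exact script: the model path is a placeholder, and `n_gpu_layers=20` matches the "offloaded 20/43 layers" line in the log above):

```python
# Minimal reproduction sketch -- assumptions: the model path is a placeholder,
# and n_gpu_layers=20 mirrors the "offloaded 20/43 layers" line in the log.
from llama_cpp import Llama

# The GGML_ASSERT fires while the model is loading, i.e. inside this constructor call.
llm = Llama(
    model_path="./models/model.gguf",  # hypothetical path to the GGUF model
    n_gpu_layers=20,                   # partial GPU offload, as in the log
)
```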