
Conversation

@pradhyumna85
Contributor

PR for the issue: #597

Previously models produced garbage output when running on GPU with layers offloaded.

Similar to related fix on another repo: bartowski1182/koboldcpp-docker@331326a
@asteinba

Hey @pradhyumna85,

I think I just ran into this. What do you mean by garbage? Do you mean it still produces valid sentences but the answers don't make sense?

And could you also explain how the fix works / what the reason was? :)

Thanks! Understanding this would help me a lot 😁

@pradhyumna85
Contributor Author

Hi @asteinba,
By garbage I mean absolute garbage: symbols, special characters, Unicode characters, etc., not English or any other language.

The most relevant change in the Dockerfile is the environment variable CUDA_DOCKER_ARCH=all.
It is mentioned in the readme of the official llama.cpp repo (https://github.com/ggerganov/llama.cpp) in the "Docker with CUDA" section.

It basically makes the llama.cpp Makefile (https://github.com/ggerganov/llama.cpp/blob/master/Makefile) pass -arch=$(CUDA_DOCKER_ARCH), i.e. -arch=all, along with -Wno-deprecated-gpu-targets to NVCCFLAGS, instead of defaulting to -arch=native. The CUDA kernels are then built for all supported GPU architectures rather than only the architecture of whatever GPU happens to be visible at build time.

.
.
.
ifdef LLAMA_CUBLAS
	CFLAGS    += -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I$(CUDA_PATH)/targets/x86_64-linux/include
	CXXFLAGS  += -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I$(CUDA_PATH)/targets/x86_64-linux/include
	LDFLAGS   += -lcublas -lculibos -lcudart -lcublasLt -lpthread -ldl -lrt -L/usr/local/cuda/lib64 -L/opt/cuda/lib64 -L$(CUDA_PATH)/targets/x86_64-linux/lib
	OBJS      += ggml-cuda.o
	NVCCFLAGS = --forward-unknown-to-host-compiler -use_fast_math
ifdef LLAMA_CUDA_NVCC
	NVCC = $(LLAMA_CUDA_NVCC)
else
	NVCC = nvcc
endif #LLAMA_CUDA_NVCC
ifdef CUDA_DOCKER_ARCH
	NVCCFLAGS += -Wno-deprecated-gpu-targets -arch=$(CUDA_DOCKER_ARCH)
else
	NVCCFLAGS += -arch=native
endif # CUDA_DOCKER_ARCH
.
.
.

The CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python command I just copied from the readme (https://github.com/abetlen/llama-cpp-python) instructions for the cuBLAS installation.
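
For reference, here is a minimal sketch of how the two pieces fit together in a CUDA Dockerfile. The base image tag and package setup are illustrative, not the exact contents of the Dockerfile changed in this PR:

# Illustrative base image; the actual Dockerfile may pin a different CUDA version.
FROM nvidia/cuda:12.1.1-devel-ubuntu22.04

# Build the CUDA kernels for all supported GPU architectures instead of only
# the architecture detected at build time (-arch=native), which is what led
# to garbage output when the runtime GPU differed from the build environment.
ENV CUDA_DOCKER_ARCH=all

RUN apt-get update && apt-get install -y python3 python3-pip

# Build llama-cpp-python from source with cuBLAS enabled.
RUN CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python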

@asteinba

Thank you very much for the explanation! I really appreciate that :)

@abetlen
Owner

abetlen commented Aug 18, 2023

@pradhyumna85 thank you for the fix, lgtm
