Current Behavior
When an image built from the CUDA Dockerfile (https://github.com/abetlen/llama-cpp-python/blob/main/docker/cuda_simple/Dockerfile) is used with any GGML model file and layers are offloaded to the GPU, the model outputs garbage text.
By garbage I mean complete gibberish: random symbols, special characters, Unicode characters, etc., with no recognizable English or any other language.
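For reference, a minimal reproduction sketch. The image tag, model path, and model filename below are illustrative, and the environment variables assume the image runs the llama-cpp-python server, which reads its settings from the environment:

```shell
# Build the image from the linked cuda_simple Dockerfile (run from the repo root)
docker build -t llama-cpp-python-cuda -f docker/cuda_simple/Dockerfile .

# Run with GPU access; a non-zero N_GPU_LAYERS (i.e. offloading layers
# to the GPU) is what triggers the garbage output described above.
# /path/to/models and the model filename are placeholders.
docker run --rm --gpus all -p 8000:8000 \
  -v /path/to/models:/models \
  -e MODEL=/models/your-model.ggmlv3.q4_0.bin \
  -e N_GPU_LAYERS=40 \
  llama-cpp-python-cuda
```

With `N_GPU_LAYERS=0` (pure CPU inference) the same image produces normal output, which points at the GPU build configuration rather than the model file.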
Environment and Context
Running on CentOS 7 with Docker and the NVIDIA Container Toolkit installed
Instance: AWS g4dn.12xlarge (4x NVIDIA T4, 64 GB total VRAM)
The exact same issue was seen in a cuBLAS-based Docker setup for koboldcpp, which is also built on llama.cpp:
issue: bartowski1182/koboldcpp-docker#1
fix commit: bartowski1182/koboldcpp-docker@331326a
Fix
I'll raise a pull request with the updated Dockerfile.
Cheers! 🥂