Current Behavior
When an image built from the CUDA Dockerfile (https://github.com/abetlen/llama-cpp-python/blob/main/docker/cuda_simple/Dockerfile) is used with any GGML model file and layers are offloaded to the GPU, the model outputs garbage text.
By garbage I mean complete gibberish: random symbols, special characters, Unicode characters, etc., with no recognizable English or any other language.
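For reference, a minimal reproduction sketch. The image tag, model path, and model filename below are illustrative, and the environment variables assume the image runs the llama-cpp-python server, which reads its settings from the environment:

```shell
# Build the image from the linked cuda_simple Dockerfile (run from the repo root)
docker build -t llama-cpp-python-cuda -f docker/cuda_simple/Dockerfile .

# Run with GPU access; a non-zero N_GPU_LAYERS (i.e. offloading layers
# to the GPU) is what triggers the garbage output described above.
# /path/to/models and the model filename are placeholders.
docker run --rm --gpus all -p 8000:8000 \
  -v /path/to/models:/models \
  -e MODEL=/models/your-model.ggmlv3.q4_0.bin \
  -e N_GPU_LAYERS=40 \
  llama-cpp-python-cuda
```

With `N_GPU_LAYERS=0` (pure CPU inference) the same image produces normal output, which points at the GPU build configuration rather than the model file.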
Environment and Context
Running on CentOS 7 with Docker and the NVIDIA Container Toolkit installed
Instance: AWS g4dn.12xlarge (4x NVIDIA T4, 64 GB total VRAM)
The exact same issue was seen in a cuBLAS-based Docker setup for koboldcpp, which is also built on llama.cpp:
issue: bartowski1182/koboldcpp-docker#1
fix commit: bartowski1182/koboldcpp-docker@331326a
Fix
I'll raise a pull request with the updated Dockerfile.
Cheers! 🥂