This is a Go binding for llama.cpp, implemented with cgo. It provides both the low-level llama.cpp API and a high-level, ChatGPT-style API.

See `pkg/llm`, which implements the ChatGPT completion and chat completion APIs. For more details, see https://platform.openai.com/docs/api-reference/chat/create.
The following walks through running the server, using Vicuna as an example.
- First, download the Vicuna model:

```shell
wget -c https://huggingface.co/eachadea/ggml-vicuna-7b-1.1/resolve/main/ggml-vic7b-q4_2.bin
```
- Then edit `models/vicuna.yaml` and update its `path` field to point to the model you just downloaded, as sketched below.
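The exact schema of `models/vicuna.yaml` is defined by this repository; apart from `path`, the fields below are illustrative assumptions only:

```yaml
# Hypothetical sketch -- check models/vicuna.yaml in the repo for the actual schema.
name: vicuna                        # assumed field: the model name used in API requests
path: ./models/ggml-vic7b-q4_2.bin  # path to the model file you just downloaded
```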
- Run `make run`. You should see output like the following:

```
[GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production.
- using env: export GIN_MODE=release
- using code: gin.SetMode(gin.ReleaseMode)
[GIN-debug] GET /v1/models --> github.com/bdqfork/go-llama.cpp/pkg/server.(*Server).listModels-fm (3 handlers)
[GIN-debug] GET /v1/models/:model --> github.com/bdqfork/go-llama.cpp/pkg/server.(*Server).retreiveModel-fm (3 handlers)
[GIN-debug] POST /v1/completions --> github.com/bdqfork/go-llama.cpp/pkg/server.(*Server).completion-fm (3 handlers)
[GIN-debug] POST /v1/chat/completions --> github.com/bdqfork/go-llama.cpp/pkg/server.(*Server).chatCompletion-fm (3 handlers)
[GIN-debug] [WARNING] You trusted all proxies, this is NOT safe. We recommend you to set a value.
Please check https://pkg.go.dev/github.com/gin-gonic/gin#readme-don-t-trust-all-proxies for details.
[GIN-debug] Listening and serving HTTP on 0.0.0.0:8000
```
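Once the server is up, you can sanity-check it by listing the registered models through the `GET /v1/models` route shown in the log:

```shell
curl http://localhost:8000/v1/models
```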
- Call the completion or chat completion API at http://localhost:8000, for example:
```shell
curl --location 'http://localhost:8000/v1/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
    "model": "vicuna",
    "messages": [
        {
            "role": "system",
            "content": "You are assistant for user. Have a chat with user, using markdown!"
        },
        {
            "role": "user",
            "content": "Hello!"
        }
    ],
    "max_tokens": 20,
    "presence_penalty": 1,
    "stream": false
}'
```
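Since the server exposes an OpenAI-compatible schema, you can also call it from Go using only the standard library. This is a minimal sketch, not a client shipped by this repo; the request fields simply mirror the curl payload above:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

// Message and ChatRequest mirror the JSON body used in the curl example above.
type Message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type ChatRequest struct {
	Model           string    `json:"model"`
	Messages        []Message `json:"messages"`
	MaxTokens       int       `json:"max_tokens"`
	PresencePenalty float64   `json:"presence_penalty"`
	Stream          bool      `json:"stream"`
}

func main() {
	req := ChatRequest{
		Model: "vicuna",
		Messages: []Message{
			{Role: "system", Content: "You are assistant for user. Have a chat with user, using markdown!"},
			{Role: "user", Content: "Hello!"},
		},
		MaxTokens:       20,
		PresencePenalty: 1,
		Stream:          false,
	}

	body, err := json.Marshal(req)
	if err != nil {
		panic(err)
	}

	// POST to the chat completion endpoint started by `make run`.
	resp, err := http.Post("http://localhost:8000/v1/chat/completions", "application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	out, err := io.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out)) // raw JSON response in the OpenAI-compatible shape
}
```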
To enable GPU acceleration with cuBLAS (requires the CUDA toolkit), build and run with:

```shell
LLAMA_CUBLAS=1 make run
```