codesphere-cloud/llama2-endpoint

This template will install the Python wrapper for llama.cpp from llama-cpp-python and run a FastAPI-based Python server that can be used as a drop-in replacement for the OpenAI API. This means code that is compatible with the ChatGPT API will also work with this self-hosted version.
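
For example, any client built on the openai Python package can be pointed at this server by overriding the base URL. This is only a sketch: the deployment URL, API key, and model name below are placeholders, not values defined by this template.

from openai import OpenAI

# Placeholder URL; the server typically does not validate the API key,
# so any non-empty string works.
client = OpenAI(base_url="https://your-deployment-url/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="local-model",  # single-model servers generally ignore this field
    messages=[{"role": "user", "content": "Write a hello world in Python."}],
)
print(response.choices[0].message.content)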

The CI pipeline is configured to fetch a pre-converted and quantized Code Llama Instruct model from TheBloke on Hugging Face, but you can edit it to use other compatible models.
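
For illustration, such a model file could be fetched with the huggingface_hub package; the repository and file names below are examples of TheBloke's quantized Code Llama builds and may differ from what the pipeline actually downloads.

from huggingface_hub import hf_hub_download

# Example only: choose a repo and quantization that fit your hardware
# and the installed llama.cpp version.
model_path = hf_hub_download(
    repo_id="TheBloke/CodeLlama-7B-Instruct-GGUF",
    filename="codellama-7b-instruct.Q4_K_M.gguf",
)
print(model_path)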

Once the server has started, open the deployment URL and append /docs to see Swagger-style API documentation where you can view and test all available endpoints.
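
You can also sanity-check the server programmatically. This sketch assumes the OpenAI-style /v1/models route that llama-cpp-python's server exposes and a placeholder deployment URL.

import requests

base_url = "https://your-deployment-url"  # placeholder

# A 200 response listing the served model means the server is up.
print(requests.get(f"{base_url}/v1/models").json())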

By default, the max_tokens setting of this model is very restrictive (to support less capable hardware). If you want to send or receive longer queries, navigate to /.venv/lib/python3.8/site-packages/llama_cpp/server/app.py in the installed folder and edit line 412:

max_tokens_field = Field(
    default=400, ge=1, description="The maximum number of tokens to generate."
)
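
This Field sets the server-side default used when a request omits max_tokens. Depending on the installed version, a client may also be able to override it per request, as in this sketch (URL and model name are placeholders):

from openai import OpenAI

client = OpenAI(base_url="https://your-deployment-url/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="local-model",
    messages=[{"role": "user", "content": "Explain quicksort step by step."}],
    max_tokens=1500,  # request a longer completion than the 400-token default
)
print(response.choices[0].message.content)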
