A simple guide to get the llama.cpp chat API up and running using Python.
First, clone the llama.cpp repository from GitHub:
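git clone https://github.com/ggerganov/llama.cpp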
After cloning, navigate into the llama.cpp folder:
cd llama.cpp
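One step worth calling out: the server binary has to be built before it can be run. Depending on the llama.cpp version you cloned, running make (or make server) in the repository root should produce the ./server executable used in the next step.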
In the llama.cpp folder, start the server with the command below. Make sure to edit the path so it points to your ggml model file:
./server -m models/13b-chat/ggml-model-q4_0.bin -c 2048
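Before moving on to the Python wrapper, you can sanity-check the server with a few lines of Python. This is a minimal sketch, assuming the server is listening on llama.cpp's default address (127.0.0.1:8080) and exposing its /completion endpoint; the prompt and token count are just placeholders:

import requests  # third-party package: pip install requests

# Assumption: the server started above is on its default host and port.
response = requests.post(
    "http://127.0.0.1:8080/completion",
    json={"prompt": "Hello! How are you?", "n_predict": 64},
)
response.raise_for_status()
print(response.json()["content"])  # the generated text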
With the server running, open a new terminal and clone the Llama.cpp API Python repository:
git clone https://github.com/avinrique/Llama.cpp-api-python-
cd Llama.cpp-api-python-
# copy the helper script one level up, into the directory you cloned from
cp fetch_chatapi.py ./../
# start the Python app
python app.py
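For a sense of what a chat client on top of this looks like, here is a minimal sketch of a chat loop in plain Python. It is not the code from the repository above, only an illustration of the idea, and it assumes the same default /completion endpoint; real chat models usually expect a model-specific prompt template:

import requests

SERVER_URL = "http://127.0.0.1:8080/completion"  # assumed default server address

history = ""
while True:
    user_input = input("You: ")
    if user_input.strip().lower() in ("exit", "quit"):
        break
    # Build a simple transcript-style prompt; adapt the template to your model.
    history += f"User: {user_input}\nAssistant:"
    reply = requests.post(
        SERVER_URL,
        json={"prompt": history, "n_predict": 128, "stop": ["User:"]},
    ).json()["content"]
    history += reply + "\n"
    print("Assistant:" + reply)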