
LLaMA

Run LLM apps hyper-fast on your local machine, for fun. This repository shows how to create a local OpenAI-compatible server and make API calls to it just as you would with OpenAI's models.

Startup 🚀

  1. git clone https://github.com/ggerganov/llama.cpp
  2. Run the make commands:
  • Mac: cd llama.cpp && make
  • Windows (via w64devkit):
    1. Download the latest Fortran version of w64devkit.
    2. Extract w64devkit on your PC.
    3. Run w64devkit.exe.
    4. Use the cd command to reach the llama.cpp folder.
    5. From here you can run:
      make
  3. pip install openai 'llama-cpp-python[server]' pydantic instructor streamlit
  4. Start the server (a Python sketch for calling it follows this list):
  • Single Model Chat
    python -m llama_cpp.server --model models/mistral-7b-instruct-v0.1.Q4_0.gguf
  • Single Model Chat with GPU Offload
    python -m llama_cpp.server --model models/mistral-7b-instruct-v0.1.Q4_0.gguf --n_gpu_layers -1
  • Single Model Function Calling with GPU Offload
    python -m llama_cpp.server --model models/mistral-7b-instruct-v0.1.Q4_0.gguf --n_gpu_layers -1 --chat_format functionary
  • Multiple Model Load with Config (a sample config.json is sketched after this list)
    python -m llama_cpp.server --config_file config.json
  • Multi Modal Models
    python -m llama_cpp.server --model models/llava-v1.5-7b-Q4_K.gguf --clip_model_path models/llava-v1.5-7b-mmproj-Q4_0.gguf --n_gpu_layers -1 --chat_format llava-1-5
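
For the multi-model option, llama-cpp-python reads the server settings and a list of models from a JSON file. Below is a minimal sketch of what config.json can look like, reusing the two model files from the commands above; the aliases and chat formats here are assumptions, so adjust them to your setup:

    {
      "host": "0.0.0.0",
      "port": 8000,
      "models": [
        {
          "model": "models/mistral-7b-instruct-v0.1.Q4_0.gguf",
          "model_alias": "mistral-instruct",
          "chat_format": "mistral-instruct",
          "n_gpu_layers": -1
        },
        {
          "model": "models/llava-v1.5-7b-Q4_K.gguf",
          "model_alias": "llava",
          "chat_format": "llava-1-5",
          "clip_model_path": "models/llava-v1.5-7b-mmproj-Q4_0.gguf",
          "n_gpu_layers": -1
        }
      ]
    }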

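Once the server is running (it listens on http://localhost:8000 by default), you can call it with the regular openai Python client by pointing base_url at the local endpoint. A minimal sketch; the model name is a placeholder, and the API key can be any string since the local server does not check it:

    # Minimal sketch: chat with the local llama.cpp server via the openai client.
    from openai import OpenAI

    # The key is required by the client but ignored by the local server.
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="sk-not-needed")

    response = client.chat.completions.create(
        model="mistral-instruct",  # placeholder; use a loaded model or alias
        messages=[{"role": "user", "content": "Say hello from my local server."}],
    )
    print(response.choices[0].message.content)

The same client works against the function-calling and multimodal setups above. A quick sanity check is curl http://localhost:8000/v1/models, which lists the models the server has loaded.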
Models Used 🤖

The example commands above use:

  • mistral-7b-instruct-v0.1.Q4_0.gguf (Mistral 7B Instruct v0.1, 4-bit GGUF quantization)
  • llava-v1.5-7b-Q4_K.gguf plus llava-v1.5-7b-mmproj-Q4_0.gguf (LLaVA 1.5 7B, a multimodal model with its CLIP projector)

Who, When, Why?

👨🏾‍💻 Author: Tom Odhiambo
📅 Version: 1.x
📜 License: This project is licensed under the MIT License.
