A private, local chat interface for running GGUF models (like Gemma) using FastAPI and llama-cpp-python.
1. Install requirements: `pip install -r requirements.txt`
2. Download model: Ensure you have your GGUF model file.
   - Default path configured in `server.py`: `/Users/user_name/ml_models/gemma_models/gemma3_4b_it/gemma-3-4b-it-Q4_K_M.gguf`
   - Edit `MODEL_PATH` in `server.py` if your model is elsewhere.
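The path setup can be sketched like this (an illustration, not the exact contents of `server.py`; `Path.home()` and the `model_available` helper are assumptions, matching the default path above):

```python
# Sketch of the MODEL_PATH configuration near the top of server.py.
# The directory layout mirrors the default path above; adjust to your machine.
from pathlib import Path

MODEL_PATH = (
    Path.home()
    / "ml_models" / "gemma_models" / "gemma3_4b_it"
    / "gemma-3-4b-it-Q4_K_M.gguf"
)

def model_available(path: Path) -> bool:
    """Check the GGUF file exists and has the expected extension."""
    return path.is_file() and path.suffix == ".gguf"
```

Checking the path at startup gives a clearer error than a crash deep inside the model loader.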
3. Start the chatbot server: `python3 server.py`
4. Access the Chat UI: http://localhost:8080
   - API Endpoint: http://localhost:8080/api/chat
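A minimal Python client for the endpoint might look like the following. The JSON body shape (`{"message": ...}`) is an assumption; check the request model in `server.py` for the actual schema.

```python
# Hypothetical client for POST /api/chat. The field name "message" is an
# assumption -- verify it against the request model in server.py.
import json
import urllib.request

API_URL = "http://localhost:8080/api/chat"

def build_chat_request(message: str, url: str = API_URL) -> urllib.request.Request:
    """Build a POST request carrying the user message as JSON."""
    return urllib.request.Request(
        url,
        data=json.dumps({"message": message}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def chat(message: str) -> dict:
    """Send the message and decode the JSON reply (server must be running)."""
    with urllib.request.urlopen(build_chat_request(message)) as resp:
        return json.loads(resp.read())
```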
If you are using Brave Browser or have strict ad-blockers:
- Disable Shields: Click the Brave Lion icon (or ad-blocker icon) in your address bar.
- Allow Connection: Toggle shields/blocking DOWN for `localhost`.
- Reload: Press `Cmd+Shift+R` to hard refresh.
Reason: Privacy browsers often block connections to local ports (like 8080) if they suspect it's a tracking script.
If you see server errors about roles:
- The server automatically merges "system" prompts into the first user message (Gemma's chat template does not accept a standalone "system" role). No action needed.
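The merging behavior can be sketched as follows. This is an illustration of the idea rather than the server's exact code, and the function name is made up:

```python
# Sketch: fold "system" messages into the first "user" message, since
# Gemma-style chat templates only accept alternating user/model turns.
def merge_system_into_user(messages: list[dict]) -> list[dict]:
    system_text = "\n\n".join(
        m["content"] for m in messages if m["role"] == "system"
    )
    rest = [dict(m) for m in messages if m["role"] != "system"]
    if system_text:
        if rest and rest[0]["role"] == "user":
            # Prepend the system instructions to the opening user message.
            rest[0]["content"] = system_text + "\n\n" + rest[0]["content"]
        else:
            # No user turn to merge into: emit the prompt as a user message.
            rest.insert(0, {"role": "user", "content": system_text})
    return rest
```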