Stop paying per token. Get a local, OpenAI-compatible LLM running on your own machine in a
couple of minutes, then point your existing OpenAI SDK code at localhost. Private, no per-token
fees, works offline.
These are the free, no-nonsense setup scripts. They install Ollama, pull a model sized to your GPU, and kill the cold-start lag.
Windows (PowerShell):
./scripts/setup-ollama.ps1macOS / Linux:
./scripts/setup-ollama.sh # or: ./scripts/setup-ollama.sh qwen2.5:14bThen verify it behaves like the OpenAI API:
pip install openai
python scripts/openai-proxy-test.pyPoint your app at it — no code rewrite needed:
from openai import OpenAI
client = OpenAI(base_url="http://localhost:11434/v1", api_key="local") # any key| VRAM | General | Coding |
|---|---|---|
| 8 GB | llama3.1:8b |
qwen2.5-coder:7b |
| 12–16 GB | qwen2.5:14b |
qwen2.5-coder:14b |
| 24 GB+ | qwen2.5:32b |
qwen2.5-coder:32b |
Rule of thumb: model size on disk ≈ params(B) × 0.6 at Q4. Leave ~2GB headroom.
export OLLAMA_KEEP_ALIVE=-1 # keep the model resident in VRAM- 💸 Cost — a $400/mo API bill becomes ~$15 of electricity on a GPU you may already own.
- 🔒 Privacy/compliance — data never leaves your machine.
- ⚡ No rate limits — it's your hardware.
This repo gets you running. The Local AI Cost-Killer Kit adds the parts that take trial-and-error to figure out: local image generation setup, a hardware decision tree (what to buy, what not to), latency tuning, a compliance checklist, a savings calculator, a boot-time pre-warm script, and free lifetime updates.
📖 Background read: How I cut my $400/mo AI bill to ~$15
MIT — use it anywhere. Star the repo if it saved you money ⭐