LLM Cost Optimizer — Reduce AI spending by up to 70%
Pennywise analyzes your LLM API usage and intelligently routes, caches, and optimizes requests to cut costs without sacrificing output quality. Built with a Python backend and a Next.js frontend dashboard.
⚠️ Work in progress — core optimization engine is functional, UI and docs are evolving.
LLM API costs add up fast. Most applications send every request to the same expensive model, even when a cheaper one would produce comparable results. Prompts go uncompressed, duplicate queries aren't cached, and there's no visibility into what's actually driving the bill.
- Smart model routing — Classifies incoming requests by complexity and routes simple queries to cheaper models (e.g., GPT-4o-mini, Haiku) while reserving expensive models for tasks that need them.
- Response caching — Caches semantically similar queries to avoid redundant API calls.
- Prompt optimization — Analyzes and compresses prompts to reduce token count without losing intent.
- Cost dashboard — Tracks spending per model, per endpoint, and over time so you can see exactly where your budget goes.
- Usage analytics — Identifies patterns in your API usage: which prompts are expensive, which are redundant, and where optimization has the highest ROI.
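
The routing idea from the list above can be sketched as follows. This is a minimal illustration, not Pennywise's actual classifier: the keyword list, the word-count threshold, the `route_request` function, and the `gpt-4o` name for the expensive tier are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical model tiers; placeholders, not Pennywise's configuration.
CHEAP_MODEL = "gpt-4o-mini"
EXPENSIVE_MODEL = "gpt-4o"

# Illustrative keywords that hint at multi-step reasoning (assumed, not from the project).
COMPLEX_HINTS = ("prove", "refactor", "analyze", "step by step", "architecture")


@dataclass
class RoutingDecision:
    model: str
    reason: str


def route_request(prompt: str, max_simple_words: int = 60) -> RoutingDecision:
    """Pick a model tier from a crude complexity estimate of the prompt."""
    text = prompt.lower()
    # Any complexity keyword sends the request to the expensive tier.
    if any(hint in text for hint in COMPLEX_HINTS):
        return RoutingDecision(EXPENSIVE_MODEL, "complexity keyword found")
    # Long prompts are treated as complex as well.
    if len(text.split()) > max_simple_words:
        return RoutingDecision(EXPENSIVE_MODEL, "long prompt")
    return RoutingDecision(CHEAP_MODEL, "short prompt without complexity hints")
```

A real classifier would likely use a trained model or richer signals; the point here is only that the routing decision is made before any provider call, so a misclassified request costs nothing extra to inspect.
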
┌─────────────────────────────────────────────────┐
│               Frontend (Next.js)                │
│        Cost dashboard · Usage analytics         │
└──────────────────────┬──────────────────────────┘
                       │ API
┌──────────────────────┴──────────────────────────┐
│                Backend (Python)                 │
│                                                 │
│  ┌────────────┐  ┌───────────┐  ┌────────────┐  │
│  │  Request   │  │  Prompt   │  │  Response  │  │
│  │   Router   │  │ Optimizer │  │   Cache    │  │
│  └────────────┘  └───────────┘  └────────────┘  │
│                                                 │
│  ┌────────────┐  ┌───────────────────────────┐  │
│  │    Cost    │  │ LLM Provider Integrations │  │
│  │  Tracker   │  │ (OpenAI, Anthropic, etc.) │  │
│  └────────────┘  └───────────────────────────┘  │
└─────────────────────────────────────────────────┘
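
As a rough idea of what the Cost Tracker box in the diagram might do, the sketch below accumulates spend per model and endpoint from token counts. The `CostTracker` class, the model names, and the prices are hypothetical placeholders, not real provider rates or the project's actual code.

```python
from collections import defaultdict

# Placeholder prices in USD per 1M tokens as (input, output); not real provider rates.
PRICES_PER_MILLION = {
    "cheap-model": (0.50, 1.50),
    "premium-model": (5.00, 15.00),
}


class CostTracker:
    """Accumulates spend per (model, endpoint) pair."""

    def __init__(self, prices):
        self.prices = prices
        self.spend = defaultdict(float)

    def record(self, model, endpoint, input_tokens, output_tokens):
        """Add one request's cost to the running totals and return that cost."""
        in_price, out_price = self.prices[model]
        cost = (input_tokens * in_price + output_tokens * out_price) / 1_000_000
        self.spend[(model, endpoint)] += cost
        return cost

    def total_by_model(self):
        """Collapse the per-endpoint totals into one figure per model."""
        totals = defaultdict(float)
        for (model, _endpoint), cost in self.spend.items():
            totals[model] += cost
        return dict(totals)
```

Keying the totals by `(model, endpoint)` keeps both dashboard views (per model and per endpoint) derivable from a single store.
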
| Layer | Technology |
|---|---|
| Backend | Python, FastAPI |
| Frontend | Next.js, TypeScript |
| Database | SQLite (dev) / PostgreSQL (prod) |
| Caching | Semantic similarity-based dedup |
| Deployment | Docker-ready |
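
To illustrate what semantic similarity-based dedup could look like, here is a small self-contained sketch. It assumes a toy bag-of-words vector in place of a real embedding model, and the `SemanticCache` class, the `embed` helper, and the 0.8 threshold are all hypothetical choices for the example rather than the project's implementation.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, threshold=0.8):
        self.threshold = threshold  # minimum similarity that counts as a hit
        self.entries = []  # list of (vector, response) pairs

    def get(self, prompt):
        """Return the cached response of the most similar prompt, or None."""
        query = embed(prompt)
        best_score, best_response = 0.0, None
        for vector, response in self.entries:
            score = cosine(query, vector)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None

    def put(self, prompt, response):
        self.entries.append((embed(prompt), response))
```

The linear scan is fine for a sketch; a production cache would more plausibly use real embeddings and a vector index, and the threshold would need tuning to avoid returning answers to questions that only look similar.
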
# Clone
git clone https://github.com/Subh24ai/pennywise.git
cd pennywise
# Backend
cd backend
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
cp .env.example .env # Add your API keys
python main.py
# Frontend (separate terminal)
cd frontend
npm install
npm run dev

Open http://localhost:3000 to access the dashboard.
Create a .env file in the backend/ directory:
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
DATABASE_URL=sqlite:///pennywise.db

| Strategy | Typical Savings | How |
|---|---|---|
| Model downgrading | 30–50% | Route simple tasks to cheaper models |
| Response caching | 15–25% | Skip API calls for semantically similar queries |
| Prompt compression | 10–20% | Reduce token count per request |
| Combined | Up to 70% | All strategies applied together |
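
The combined figure is consistent with the three strategies compounding rather than adding: each one reduces whatever cost remains after the others. Below is a minimal sketch of that arithmetic, assuming the strategies act independently (an assumption for illustration, not a measured result); `combined_savings` is a hypothetical helper.

```python
def combined_savings(*rates):
    """Compound independent savings rates: each applies to the remaining cost."""
    remaining = 1.0
    for rate in rates:
        remaining *= 1.0 - rate
    return 1.0 - remaining


# Lower and upper ends of the ranges in the table above.
low = combined_savings(0.30, 0.15, 0.10)
high = combined_savings(0.50, 0.25, 0.20)
print(f"{low:.0%} to {high:.0%}")
```

With the table's range endpoints this works out to roughly 46% at the low end and 70% at the high end, which matches the "up to 70%" ceiling.
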
pennywise/
├── backend/ # Python API server + optimization engine
├── frontend/ # Next.js cost dashboard
├── .gitignore
└── README.md
- Multi-provider cost comparison (OpenAI vs Anthropic vs local)
- Prompt A/B testing with cost tracking
- Webhook alerts for budget thresholds
- Export usage reports (CSV/PDF)
- Team/org-level usage tracking
MIT
Built by Subhash Gupta