A full-stack LLM chatbot with a lightweight inference logging SDK, near-real-time ingestion API, and PostgreSQL storage for messages and inference metadata.
| Requirement | Implementation |
|---|---|
| Multi-turn chatbot | Conversation history (last 20 messages) sent to the model |
| Simple UI | React app — list, resume, cancel conversations |
| Inference SDK | @ollive/inference-sdk wraps LLM calls, captures metadata, POSTs to ingest |
| Ingestion pipeline | POST /api/ingest — Zod validation, persistence |
| Database | PostgreSQL — conversations, messages, inference_logs |
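
The SDK row above describes a wrap, capture, POST flow. A minimal sketch of that idea follows. The names `InferenceEvent`, `Transport`, `httpTransport`, and `withInferenceLogging` are hypothetical, as are the event fields; the real `@ollive/inference-sdk` API is not shown in this README and may differ.

```typescript
// Hypothetical event shape; the real SDK schema is not shown in this README.
interface InferenceEvent {
  conversationId: string;
  provider: string;
  model: string;
  status: "success" | "error";
  latencyMs: number;
  outputPreview: string;
  errorMessage?: string;
}

// The transport is injectable so the wrapper can be exercised without a server.
type Transport = (event: InferenceEvent) => Promise<void>;

// Transport that POSTs events to the ingest endpoint (the base URL is an assumption).
function httpTransport(baseUrl: string): Transport {
  return async (event) => {
    const res = await fetch(`${baseUrl}/api/ingest`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(event),
    });
    if (!res.ok) throw new Error(`ingest responded ${res.status}`);
  };
}

// Wraps one LLM call: measures latency, reports success or failure, returns the output.
async function withInferenceLogging(
  meta: Pick<InferenceEvent, "conversationId" | "provider" | "model">,
  call: () => Promise<string>,
  send: Transport,
): Promise<string> {
  // A failed ingest is warned about, not retried, and never breaks the chat call.
  const report = async (event: InferenceEvent): Promise<void> => {
    try {
      await send(event);
    } catch (err) {
      console.warn("ingest failed:", err);
    }
  };

  const started = Date.now();
  try {
    const output = await call();
    await report({
      ...meta,
      status: "success",
      latencyMs: Date.now() - started,
      outputPreview: output.slice(0, 500), // previews only, capped at 500 chars
    });
    return output;
  } catch (err) {
    await report({
      ...meta,
      status: "error",
      latencyMs: Date.now() - started,
      outputPreview: "",
      errorMessage: err instanceof Error ? err.message : String(err),
    });
    throw err; // the caller still sees the original provider error
  }
}
```

In a local setup the third argument could be `httpTransport("http://localhost:3001")`; passing the transport in keeps the wrapper independent of where ingestion runs.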
Bonus
- Multi-provider: Google Gemini (default), OpenAI, Anthropic
- Streaming responses (SSE)
- Latency / throughput / errors dashboard (live panel)
- Docker Compose one-command setup
- PII redaction in log previews (email, phone, SSN, cards, API keys)
- Event-style decoupling: SDK → HTTP ingest (same process locally; separable in production)
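
As an illustration of the PII redaction bullet, here is a minimal regex-based sketch. The patterns, the `[REDACTED_*]` placeholders, and the `redactPreview` name are assumptions made for this example; the project's actual rules may be stricter or differ in format. Redaction runs before truncation so a value cut off at the 500-character boundary cannot leak in part.

```typescript
// Illustrative patterns only; real-world PII detection needs broader coverage.
// Order matters: SSN and card numbers are replaced before the phone pattern runs.
const PII_PATTERNS: Array<[string, RegExp]> = [
  ["EMAIL", /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g],
  ["API_KEY", /\bsk-[A-Za-z0-9_-]{8,}/g], // assumes "sk-" prefixed keys
  ["SSN", /\b\d{3}-\d{2}-\d{4}\b/g],
  ["CARD", /\b\d(?:[ -]?\d){12,15}\b/g], // 13 to 16 digits, optional separators
  ["PHONE", /\b\d{3}[ .-]\d{3}[ .-]\d{4}\b/g], // assumes 3-3-4 digit grouping
];

// Replaces matches with a labeled placeholder, then caps the preview length.
function redactPreview(text: string, maxChars = 500): string {
  let out = text;
  for (const [label, pattern] of PII_PATTERNS) {
    out = out.replace(pattern, `[REDACTED_${label}]`);
  }
  return out.slice(0, maxChars);
}
```

For example, `redactPreview("Mail jane@example.com or call 555-123-4567")` yields `Mail [REDACTED_EMAIL] or call [REDACTED_PHONE]`.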
cp .env.example .env
# Add at least one key:
# OPENAI_API_KEY=sk-...
# ANTHROPIC_API_KEY=sk-ant-...
# GEMINI_API_KEY=your-gemini-api-key-here
docker compose up --build

Prerequisites: Node 20+, PostgreSQL 16 (or use Docker for Postgres only).
npm install
cp .env.example .env
# Start Postgres (or: docker compose up postgres -d)
npm run db:migrate
npm run dev

- Web: http://localhost:5173 (proxies /api → :3001)
- API: http://localhost:3001
├── packages/inference-sdk/ # Logging wrapper (publishable SDK)
├── apps/api/ # Chat API + ingestion + dashboard
├── apps/web/ # React UI
├── docker-compose.yml
├── ARCHITECTURE.md # Design notes
└── README.md
| Endpoint | Description |
|---|---|
| GET /api/conversations | List conversations |
| POST /api/conversations | Create conversation |
| GET /api/conversations/:id | Resume: messages + metadata |
| POST /api/conversations/:id/cancel | Cancel conversation |
| POST /api/chat/message | Send message (stream: true for SSE) |
| POST /api/chat/cancel-stream | Abort in-flight stream |
| GET /api/chat/providers | Available providers/models |
| POST /api/ingest | Inference log ingestion (SDK target) |
| GET /api/dashboard/metrics | Latency, throughput, errors |
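
When `POST /api/chat/message` is called with `stream: true`, the client has to reassemble Server-Sent Events from arbitrary network chunks. The sketch below shows one way to do that, assuming the API emits standard `data:` lines separated by blank lines with LF line endings; the payload carried inside each `data:` line is not documented here, so it is passed through as an opaque string. `createSseParser` is a hypothetical helper name.

```typescript
// Incremental SSE parser: feed it raw text chunks, receive completed `data:` payloads.
// Assumes LF line endings; the SSE format also permits CRLF, which this sketch ignores.
function createSseParser(onData: (data: string) => void): (chunk: string) => void {
  let buffer = "";
  return (chunk: string): void => {
    buffer += chunk;
    // An event ends at the first blank line; anything after it stays buffered.
    let boundary = buffer.indexOf("\n\n");
    while (boundary !== -1) {
      const rawEvent = buffer.slice(0, boundary);
      buffer = buffer.slice(boundary + 2);
      const dataLines = rawEvent
        .split("\n")
        .filter((line) => line.startsWith("data:"))
        .map((line) => line.slice(5).replace(/^ /, "")); // drop "data:" and one optional space
      if (dataLines.length > 0) onData(dataLines.join("\n"));
      boundary = buffer.indexOf("\n\n");
    }
  };
}
```

A client would typically read the response body with a stream reader, decode each chunk to text, and pass it to the returned function; buffering across chunks is what allows a token split over two network reads to arrive intact.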
conversations — session container; status (active | cancelled), optional provider/model defaults.
messages — append-only chat history; FK to conversation with ON DELETE CASCADE.
inference_logs — one row per inference attempt; previews only (not full payloads) to limit storage and PII exposure. Indexed by conversation_id, provider, status for dashboard queries.
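
To make the dashboard use of inference_logs concrete, the sketch below aggregates rows into per-provider latency percentiles and error rates. It works in memory over a hypothetical `InferenceLogRow` shape; the column names are assumptions, and the actual API may well compute these figures in SQL instead.

```typescript
// Hypothetical subset of an inference_logs row; real column names may differ.
interface InferenceLogRow {
  provider: string;
  status: "success" | "error";
  latencyMs: number;
}

interface ProviderStats {
  count: number;
  errorRate: number;
  p50LatencyMs: number;
  p95LatencyMs: number;
}

// Nearest-rank percentile over an ascending-sorted array.
function percentile(sorted: number[], p: number): number {
  if (sorted.length === 0) return 0;
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

// Groups rows by provider, then derives count, error rate, and latency percentiles.
function summarizeByProvider(rows: InferenceLogRow[]): Map<string, ProviderStats> {
  const grouped = new Map<string, InferenceLogRow[]>();
  for (const row of rows) {
    const bucket = grouped.get(row.provider) ?? [];
    bucket.push(row);
    grouped.set(row.provider, bucket);
  }

  const stats = new Map<string, ProviderStats>();
  grouped.forEach((bucket, provider) => {
    const latencies = bucket.map((r) => r.latencyMs).sort((a, b) => a - b);
    const errors = bucket.filter((r) => r.status === "error").length;
    stats.set(provider, {
      count: bucket.length,
      errorRate: errors / bucket.length,
      p50LatencyMs: percentile(latencies, 50),
      p95LatencyMs: percentile(latencies, 95),
    });
  });
  return stats;
}
```

Grouping by provider and filtering by status are the same access paths the indexes on provider and status are meant to serve.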
Tradeoffs
- Previews capped at 500 chars in the SDK; full messages live in the messages table only.
- Ingestion is synchronous HTTP (202 Accepted); failed ingest logs are warned, not retried (see ARCHITECTURE.md).
- Context window fixed at 20 messages — simple and predictable; not token-aware.
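
The fixed context window from the last tradeoff can be sketched in a few lines. `ChatMessage` and `buildContext` are hypothetical names for this example; the point is that trimming counts messages, not tokens, so 20 long messages can still exceed a provider's token limit.

```typescript
// Hypothetical message shape for illustration.
interface ChatMessage {
  role: "user" | "assistant" | "system";
  content: string;
}

const CONTEXT_WINDOW = 20;

// Keeps only the most recent `limit` messages, preserving their order.
function buildContext(history: ChatMessage[], limit: number = CONTEXT_WINDOW): ChatMessage[] {
  if (limit <= 0) return []; // slice(-0) would return the whole array
  return history.slice(-limit);
}
```

With 25 stored messages, `buildContext` returns messages 6 through 25 and drops the five oldest.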
See .env.example.
- Retry queue (Redis/SQS) for failed ingest events
- Token-based context trimming
- Auth + multi-tenant conversation isolation
- Grafana dashboards from inference_logs
- Separate ingest worker service and read replicas
- Kubernetes manifests (Helm) for self-hosted deploy
Screenshots from a local run (Gemini gemini-2.0-flash, inference logging + live metrics).
Multi-turn chat with provider/model selection and streaming support.
Switch between Gemini models (gemini-2.0-flash, gemini-2.5-flash, etc.) from the header.
Token-by-token streaming while the assistant generates a reply.
Live 24h panel: latency, throughput, error breakdown, and per-provider stats (fed by the ingestion pipeline).
docker compose up postgres -d
npm run dev

Open http://localhost:5173; see Quick start for full setup.
Architecture notes: ARCHITECTURE.md



