Ollive Inference Chatbot

A full-stack LLM chatbot with a lightweight inference logging SDK, near-real-time ingestion API, and PostgreSQL storage for messages and inference metadata.

Features

Requirement        | Implementation
Multi-turn chatbot | Conversation history (last 20 messages) sent to the model
Simple UI          | React app: list, resume, cancel conversations
Inference SDK      | @ollive/inference-sdk wraps LLM calls, captures metadata, POSTs to ingest
Ingestion pipeline | POST /api/ingest: Zod validation, persistence
Database           | PostgreSQL: conversations, messages, inference_logs
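
The Inference SDK row describes a wrapper that runs an LLM call, captures metadata, and posts it to the ingest endpoint. A minimal sketch of that pattern follows. The names `withInferenceLogging`, `InferenceEvent`, `CallInfo`, and `Transport` are hypothetical and are not taken from @ollive/inference-sdk; the transport is injected so the sketch does not assume any particular HTTP client.

```typescript
// Hypothetical event shape; the real SDK's field names may differ.
interface InferenceEvent {
  conversationId: string;
  provider: string;
  model: string;
  status: "success" | "error";
  latencyMs: number;
  promptPreview: string;
  responsePreview: string;
  errorMessage?: string;
}

// Injected sender, e.g. a function that POSTs the event to /api/ingest.
type Transport = (event: InferenceEvent) => Promise<void>;

interface CallInfo {
  conversationId: string;
  provider: string;
  model: string;
  prompt: string;
}

const PREVIEW_LIMIT = 500; // previews are capped at 500 chars per the README

async function withInferenceLogging(
  info: CallInfo,
  call: () => Promise<string>,
  send: Transport,
): Promise<string> {
  const startedAt = Date.now();
  const base = {
    conversationId: info.conversationId,
    provider: info.provider,
    model: info.model,
    promptPreview: info.prompt.slice(0, PREVIEW_LIMIT),
  };
  // Logging must not break the chat path: a failed send is warned, not retried.
  const emit = (event: InferenceEvent): void => {
    send(event).catch((err) => console.warn("ingest failed:", err));
  };
  try {
    const text = await call();
    emit({
      ...base,
      status: "success",
      latencyMs: Date.now() - startedAt,
      responsePreview: text.slice(0, PREVIEW_LIMIT),
    });
    return text;
  } catch (err) {
    emit({
      ...base,
      status: "error",
      latencyMs: Date.now() - startedAt,
      responsePreview: "",
      errorMessage: err instanceof Error ? err.message : String(err),
    });
    throw err; // the caller still sees the original failure
  }
}
```

The wrapper returns the model's text unchanged and rethrows provider errors, so call sites behave the same with or without logging.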

Bonus

  • Multi-provider: Google Gemini (default), OpenAI, Anthropic
  • Streaming responses (SSE)
  • Latency / throughput / errors dashboard (live panel)
  • Docker Compose one-command setup
  • PII redaction in log previews (email, phone, SSN, cards, API keys)
  • Event-style decoupling: SDK → HTTP ingest (same process locally; separable in production)
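
PII redaction in log previews can be approximated with ordered regular-expression rules. The sketch below is illustrative only: the patterns, placeholder labels, and the `redactPii` name are assumptions, not the repository's actual rules, and the phone and SSN patterns cover US-style formats only.

```typescript
// Ordered rules: API keys go first so their digits are not mistaken for cards.
// All patterns are simplified assumptions for illustration.
const REDACTION_RULES: ReadonlyArray<[RegExp, string]> = [
  [/\b(?:sk-[A-Za-z0-9_-]{16,}|AIza[A-Za-z0-9_-]{35})/g, "[API_KEY]"],
  [/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g, "[EMAIL]"],
  [/\b\d{3}-\d{2}-\d{4}\b/g, "[SSN]"],
  // 13 to 16 digits, optionally separated by single spaces or hyphens.
  [/\b\d(?:[ -]?\d){12,15}\b/g, "[CARD]"],
  [/(?:\(\d{3}\)\s?|\b\d{3}[ .-]?)\d{3}[ .-]?\d{4}\b/g, "[PHONE]"],
];

function redactPii(text: string): string {
  let result = text;
  for (const [pattern, label] of REDACTION_RULES) {
    result = result.replace(pattern, label);
  }
  return result;
}

console.log(redactPii("Mail jane.doe@example.com or call 415-555-0132"));
// prints "Mail [EMAIL] or call [PHONE]"
```

Rule order matters here: a more specific pattern has to run before a broader one that could match part of the same text.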

Quick start (Docker)

cp .env.example .env
# Add at least one key:
# OPENAI_API_KEY=sk-...
# ANTHROPIC_API_KEY=sk-ant-...
# GEMINI_API_KEY=your-gemini-api-key-here

docker compose up --build

Local development

Prerequisites: Node 20+, PostgreSQL 16 (or use Docker for Postgres only).

npm install
cp .env.example .env
# Start Postgres (or: docker compose up postgres -d)

npm run db:migrate
npm run dev

Project structure

├── packages/inference-sdk/   # Logging wrapper (publishable SDK)
├── apps/api/                 # Chat API + ingestion + dashboard
├── apps/web/                 # React UI
├── docker-compose.yml
├── ARCHITECTURE.md           # Design notes
└── README.md

API overview

Endpoint                           | Description
GET /api/conversations             | List conversations
POST /api/conversations            | Create conversation
GET /api/conversations/:id         | Resume: messages + metadata
POST /api/conversations/:id/cancel | Cancel conversation
POST /api/chat/message             | Send message (stream: true for SSE)
POST /api/chat/cancel-stream       | Abort in-flight stream
GET /api/chat/providers            | Available providers/models
POST /api/ingest                   | Inference log ingestion (SDK target)
GET /api/dashboard/metrics         | Latency, throughput, errors
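
When POST /api/chat/message is called with stream: true, the reply arrives as server-sent events. This README does not document the payload inside each event, so the sketch below only shows generic SSE framing: it splits buffered text into complete `data:` frames and returns any partial frame for the next read. `parseSse` and `SseParseResult` are hypothetical names.

```typescript
interface SseParseResult {
  events: string[]; // data payload of each complete frame
  rest: string;     // trailing partial frame to prepend to the next chunk
}

function parseSse(buffer: string): SseParseResult {
  const events: string[] = [];
  // Normalise CRLF so both line-ending styles split the same way.
  const frames = buffer.replace(/\r\n/g, "\n").split("\n\n");
  // The last piece is "" if the buffer ended on a frame boundary,
  // otherwise it is an incomplete frame.
  const rest = frames.pop() ?? "";
  for (const frame of frames) {
    const dataLines: string[] = [];
    for (const line of frame.split("\n")) {
      // Lines starting with ":" are comments (keep-alives) and are skipped.
      if (line.startsWith("data:")) {
        const value = line.slice(5);
        // The SSE format strips one leading space after the colon.
        dataLines.push(value.startsWith(" ") ? value.slice(1) : value);
      }
    }
    if (dataLines.length > 0) {
      events.push(dataLines.join("\n"));
    }
  }
  return { events, rest };
}
```

A caller would keep `rest`, append the next network chunk to it, and call `parseSse` again until the stream ends.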

Schema design

conversations — session container; status (active | cancelled), optional provider/model defaults.

messages — append-only chat history; FK to conversation with ON DELETE CASCADE.

inference_logs — one row per inference attempt; previews only (not full payloads) to limit storage and PII exposure. Indexed by conversation_id, provider, status for dashboard queries.
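
To illustrate the kind of dashboard query these indexes serve, the sketch below aggregates in-memory rows by provider. It is not the repository's query code: the row fields (`latencyMs`, `status`) and the names `InferenceLogRow`, `ProviderStats`, and `summarizeByProvider` are assumptions, and a real implementation would more likely aggregate in SQL.

```typescript
// Assumed subset of an inference_logs row; actual column names may differ.
interface InferenceLogRow {
  conversationId: string;
  provider: string;
  status: "success" | "error";
  latencyMs: number;
}

interface ProviderStats {
  count: number;
  errorRate: number;
  p50LatencyMs: number;
  p95LatencyMs: number;
}

// Nearest-rank percentile over an ascending-sorted array.
function percentile(sortedAsc: number[], p: number): number {
  if (sortedAsc.length === 0) return 0;
  const rank = Math.ceil((p / 100) * sortedAsc.length);
  const index = Math.min(sortedAsc.length, Math.max(1, rank)) - 1;
  return sortedAsc[index] ?? 0;
}

function summarizeByProvider(
  rows: InferenceLogRow[],
): Record<string, ProviderStats> {
  // Group rows by provider.
  const grouped: Record<string, InferenceLogRow[]> = {};
  for (const row of rows) {
    (grouped[row.provider] ??= []).push(row);
  }
  // Compute per-provider counts, error rate, and latency percentiles.
  const stats: Record<string, ProviderStats> = {};
  for (const provider of Object.keys(grouped)) {
    const bucket = grouped[provider] ?? [];
    const latencies = bucket.map((r) => r.latencyMs).sort((a, b) => a - b);
    const errors = bucket.filter((r) => r.status === "error").length;
    stats[provider] = {
      count: bucket.length,
      errorRate: errors / bucket.length,
      p50LatencyMs: percentile(latencies, 50),
      p95LatencyMs: percentile(latencies, 95),
    };
  }
  return stats;
}
```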

Tradeoffs

  • Previews capped at 500 chars in the SDK; full messages live in messages only.
  • Ingestion is a synchronous HTTP call that returns 202 Accepted; failed ingest requests are logged as warnings and not retried (see ARCHITECTURE.md).
  • Context window fixed at 20 messages — simple and predictable; not token-aware.
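
The fixed context window amounts to keeping the most recent messages regardless of their length. A minimal sketch, assuming a hypothetical `ChatMessage` shape and `trimContext` helper (neither name comes from the repository):

```typescript
// Assumed message shape; the stored rows may carry more fields.
interface ChatMessage {
  role: "user" | "assistant" | "system";
  content: string;
}

const MAX_CONTEXT_MESSAGES = 20; // the fixed window described above

// Keep only the newest `limit` messages, preserving their order.
function trimContext(
  history: ChatMessage[],
  limit: number = MAX_CONTEXT_MESSAGES,
): ChatMessage[] {
  return limit > 0 ? history.slice(-limit) : [];
}
```

Because the cut is by message count, twenty very long messages can still exceed a model's token limit, which is the gap a token-aware trimmer would close.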

Environment variables

See .env.example.

What we'd improve with more time

  • Retry queue (Redis/SQS) for failed ingest events
  • Token-based context trimming
  • Auth + multi-tenant conversation isolation
  • Grafana dashboards from inference_logs
  • Separate ingest worker service and read replicas
  • Kubernetes manifests (Helm) for self-hosted deploy

Demo

Screenshots from a local run (Gemini gemini-2.0-flash, inference logging + live metrics).

Chat UI — multi-turn conversation

Multi-turn chat with provider/model selection and streaming support.

[Screenshot: Multi-turn chat with RAG response]

Model selection (multi-provider)

Switch between Gemini models (gemini-2.0-flash, gemini-2.5-flash, etc.) from the header.

[Screenshot: Gemini model selector]

Streaming response

Token-by-token streaming while the assistant generates a reply.

[Screenshot: Streaming inference]

Inference metrics dashboard

Live 24h panel: latency, throughput, error breakdown, and per-provider stats (fed by the ingestion pipeline).

[Screenshot: Dashboard and conversation list]

Run it yourself

docker compose up postgres -d
npm run dev

Open http://localhost:5173 — see Quick start for full setup.

Architecture notes: ARCHITECTURE.md
