A self-hosted, production-grade AI chatbot platform that runs entirely on your machine. No API keys required. No cloud dependency. Full control.
Multimodal input · Real-time voice · WebSocket streaming · Agentic reasoning · MCP tools · Observability
Most chatbot tutorials stop at "hello world." ProdBot is a complete, battle-tested system, from LLM inference to database persistence and from tool orchestration to production observability.
- Enterprise-ready: Multi-tenant architecture, Redis caching, PostgreSQL persistence, Prometheus + Grafana monitoring
- Beginner-friendly: 10+ phases of detailed documentation guide you from core concepts to production deployment
- Fully self-hosted: Runs on Ollama with open-source models. Your data never leaves your infrastructure
- Production-hardened: Red-team tested, behavioral benchmarks, deep audit with 12 critical fixes applied
Tip: New to building chatbots? Follow the phase-by-phase documentation. Each phase builds on the previous one, taking you from a simple chat loop to a fully orchestrated, multimodal AI system.
- 9-Phase Chat Orchestrator: Intent classification, context injection, hybrid memory, tool routing, agentic reasoning, and response synthesis, all in a single pipeline
- Agentic Engine: Plan + ReAct loop with cycle detection, circuit breaker, tool retry, and safety guardrails for complex multi-step tasks
- Adaptive Routing: Trivial bypass (<1ms), fast path (~5-8s), tool path (~20-40s), and full agentic path (60s+) based on query complexity
- Multimodal Input: Upload images, audio, and video. Auto-switches to a vision model (LLaVA) for image understanding. Keyframe extraction for video
- Real-Time Voice: Full-duplex WebSocket voice conversation with Whisper STT and multi-backend TTS (piper/macOS say/espeak)
- MCP Tool Ecosystem: 15+ Model Context Protocol servers, including filesystem, Git, GitHub, Brave Search, Docker, Slack, Google Maps, sequential thinking, and more
- Hybrid Memory System: Hot (sliding window), warm (summarization), and cold (pgvector semantic search) memory tiers for intelligent context management
- Multi-Provider LLM: Ollama (default), OpenAI, Anthropic, and Gemini support with runtime model switching from the UI
- WebSocket Streaming: Token-by-token response streaming with real-time status indicators (thinking, tool execution, synthesis)
- Full Observability: Prometheus metrics, Grafana dashboards, node exporter, and health checks; production monitoring out of the box
- Multi-Tenant Ready: User isolation, conversation threading, session management, role-based access architecture
- Extensively Tested: Red-team security suite, behavioral benchmarks, trajectory evaluation, integration tests, and live pipeline audits
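The three memory tiers named in the feature list can be pictured with a small toy. This is a sketch under stated assumptions, not ProdBot's implementation: the `HybridMemory` class, the `summarize` callback, and `naive_summarize` are hypothetical names, and plain word overlap stands in for pgvector similarity search.

```python
from collections import deque
from typing import Callable


class HybridMemory:
    """Toy three-tier memory: hot window, warm summary, cold archive (hypothetical API)."""

    def __init__(self, window_size: int, summarize: Callable[[str, str], str]):
        self.hot: deque[str] = deque()   # most recent messages, kept verbatim
        self.warm: str = ""              # rolling summary of evicted messages
        self.cold: list[str] = []        # full archive, searched on demand
        self.window_size = window_size
        self.summarize = summarize       # stand-in for an LLM summarization call

    def add(self, message: str) -> None:
        self.hot.append(message)
        self.cold.append(message)
        # When the window overflows, fold the oldest message into the summary.
        while len(self.hot) > self.window_size:
            evicted = self.hot.popleft()
            self.warm = self.summarize(self.warm, evicted)

    def search_cold(self, query: str, top_k: int = 1) -> list[str]:
        # Assumption: word overlap replaces embedding similarity (pgvector).
        words = set(query.lower().split())
        scored = [(len(words & set(m.lower().split())), m) for m in self.cold]
        scored = [pair for pair in scored if pair[0] > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [message for _, message in scored[:top_k]]

    def build_context(self, query: str) -> dict[str, object]:
        # Combine all three tiers into one context payload for the LLM.
        return {
            "hot": list(self.hot),
            "warm": self.warm,
            "cold": self.search_cold(query),
        }


def naive_summarize(summary: str, message: str) -> str:
    # Placeholder summarizer: keep the first five words of each evicted message.
    snippet = " ".join(message.split()[:5])
    return f"{summary} | {snippet}".strip(" |")
```

In a real deployment the summarizer would presumably call the text model and the cold tier would query an embedding index; the point here is only how a message moves from hot to warm while remaining searchable in cold.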
| Tool | Purpose |
|---|---|
| Python 3.11+ & Poetry | Backend runtime |
| Node.js 20+ & npm | Frontend runtime |
| Docker & Docker Compose | PostgreSQL, Redis, monitoring |
| Ollama | Local LLM inference (install separately) |
| FFmpeg | Audio/video processing (for example, brew install ffmpeg) |
git clone https://github.com/your-username/chatbot-ai-systems-production.git
cd chatbot-ai-systems-production
cp .env.example .env
cp frontend/.env.example frontend/.env.local

Important: To enable MCP tools (web search, GitHub, Slack, etc.), add your API keys to .env. See docs/MCP_SETUP.md for the full guide.

ollama serve # Start Ollama
ollama pull qwen2.5:14b-instruct # Text model
ollama pull llava:7b # Vision model
ollama pull nomic-embed-text # Embedding model
docker-compose up -d postgres redis # Start databases
poetry install # Install dependencies
poetry run alembic upgrade head # Run migrations
poetry run uvicorn chatbot_ai_system.server.main:app --reload --host 0.0.0.0 --port 8000
cd frontend
npm install
npm run dev

| Service | URL |
|---|---|
| Chat UI | http://localhost:3000 |
| API Docs | http://localhost:8000/docs |
| Health Check | http://localhost:8000/health |
| Grafana | http://localhost:3001 (admin/admin) |
| Prometheus | http://localhost:9090 |
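Once everything is started, a quick probe of the endpoints in the table can confirm the stack is up. The following is a minimal sketch using only the standard library; the `SERVICES` keys and the `is_up` and `report` helpers are hypothetical names, and it assumes each service answers a plain GET with a 2xx status.

```python
from typing import Callable
from urllib.request import urlopen

# Local endpoints copied from the table above.
SERVICES = {
    "chat_ui": "http://localhost:3000",
    "api_docs": "http://localhost:8000/docs",
    "health": "http://localhost:8000/health",
    "grafana": "http://localhost:3001",
    "prometheus": "http://localhost:9090",
}


def is_up(url: str, opener: Callable = urlopen, timeout: float = 3.0) -> bool:
    """Return True if the URL answers with a 2xx status, False on any error."""
    try:
        with opener(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # URLError, HTTPError, refused connections, and timeouts derive from OSError.
        return False


def report(opener: Callable = urlopen) -> dict[str, bool]:
    """Probe every service once and map its name to an up/down flag."""
    return {name: is_up(url, opener) for name, url in SERVICES.items()}
```

With the backend, frontend, and monitoring containers running, calling `report()` should map every name to `True`; a `False` entry points at the service to investigate. The `opener` parameter exists only so the probe can be exercised without a network.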
flowchart TB
subgraph Client["Client Layer"]
Browser["Browser"]
UI["Next.js Frontend<br/>localhost:3000"]
Mic["Microphone"]
end
subgraph API["API Layer"]
FastAPI["FastAPI Server<br/>localhost:8000"]
REST["/api/chat<br/>REST Endpoint"]
WS["/api/chat/stream<br/>WebSocket"]
Upload["/api/upload<br/>Media Upload"]
VoiceWS["/api/voice/stream<br/>Voice WebSocket"]
Health["/health"]
end
subgraph Core["Core Layer"]
Orchestrator["Chat Orchestrator<br/>(9-Phase Pipeline)"]
AgenticEngine["Agentic Engine<br/>(Plan+ReAct + Safety)"]
Provider["LLM Provider"]
Registry["Tool Registry"]
MCPClient["MCP Client Layer"]
MediaPipe["Media Pipeline"]
end
subgraph Multimodal["Multimodal Layer"]
ImgProc["Image Processor<br/>(Pillow)"]
STT["STT Engine<br/>(Whisper)"]
TTS["TTS Engine<br/>(say/piper/espeak)"]
VidProc["Video Processor<br/>(OpenCV)"]
end
subgraph Data["Data Layer (Hybrid Memory)"]
DB[(PostgreSQL\nDatabase)]
Vector["pgvector\n(Cold Memory)"]
Summary["Summarization\n(Warm Memory)"]
Window["Sliding Window\n(Hot Memory)"]
MediaDB["Media Attachments\nTable"]
end
subgraph Tools["Tool Layer (MCP)"]
FS["Filesystem"]
Git["Git & GitHub"]
Web["Brave Search & Fetch"]
Brain["Sequential Thinking\n& SQLite"]
Time["Time & Memory"]
end
subgraph Cache["Cache Layer"]
Redis[(Redis\nCache)]
end
subgraph LLM["LLM Layer"]
Ollama["Ollama Server<br/>localhost:11434"]
TextModel["qwen2.5:14b<br/>Text Model"]
VisionModel["llava:7b<br/>Vision Model"]
Embed["nomic-embed-text<br/>Embedding Model"]
end
Browser --> UI
Mic --> UI
UI -->|HTTP/REST| REST
UI -.->|WebSocket| WS
UI -->|File Upload| Upload
UI -.->|Voice| VoiceWS
REST --> Orchestrator
WS --> Orchestrator
Upload --> MediaPipe
VoiceWS --> STT & TTS
MediaPipe --> ImgProc & STT & VidProc
Orchestrator --> AgenticEngine --> Provider
Orchestrator --> Provider
Orchestrator --> Registry
Orchestrator --> Redis
Provider --> Ollama
Ollama --> TextModel & VisionModel & Embed
Redis --> DB & Vector & Summary & Window
Registry --> MCPClient --> Tools
MediaPipe --> MediaDB
ProdBot automatically classifies every query and routes it through the optimal execution path:
flowchart TD
User["User Query"] --> Classify{Intent Classifier}
Classify -->|Simple Info| FastPath["FAST PATH<br/>(Direct Response)"]
Classify -->|Simple Tool| MedPath["TOOL PATH<br/>(One-shot Execution)"]
Classify -->|Complex/Reasoning| SlowPath["AGENTIC PATH<br/>(Plan + ReAct Loop)"]
FastPath --> LLM[LLM Response]
MedPath --> Registry[Tool Registry] --> LLM
SlowPath --> Planner[Sequential Thinking]
Planner --> ReAct[ReAct Loop]
ReAct --> Registry
ReAct --> ReAct
ReAct --> LLM
| Path | When | Latency |
|---|---|---|
| Trivial Bypass | Greetings, acknowledgments ("hi", "thanks"); skips the LLM entirely | <1ms |
| Fast Path | Facts, definitions; direct LLM response, tools disabled | ~5-8s |
| Tool Path | Single-step tasks ("list files", "search web"); one-shot tool call | ~20-40s |
| Agentic Path | Complex reasoning; full Plan+ReAct loop with safety guardrails | 60s+ |
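The four paths in the table can be illustrated with a small rule-based router. This is only a sketch of the idea: the keyword lists, the `Path` enum, and the `classify` function are assumptions made for illustration, and the actual intent classifier may well use an LLM or other signals.

```python
import re
from enum import Enum


class Path(Enum):
    TRIVIAL = "trivial_bypass"
    FAST = "fast_path"
    TOOL = "tool_path"
    AGENTIC = "agentic_path"


# All phrase lists below are illustrative assumptions.
TRIVIAL_PHRASES = {"hi", "hello", "hey", "thanks", "thank you", "ok"}
TOOL_HINTS = ("list files", "search web", "search the web", "read file")
AGENTIC_HINTS = ("and then", "step by step", "compare")


def classify(query: str) -> Path:
    """Pick the cheapest execution path that can plausibly handle the query."""
    # Drop punctuation so "Thanks!" matches "thanks".
    normalized = re.sub(r"[^\w\s]", "", query).strip().lower()
    if normalized in TRIVIAL_PHRASES:
        return Path.TRIVIAL   # canned reply, no LLM call
    if any(hint in normalized for hint in AGENTIC_HINTS):
        return Path.AGENTIC   # multi-step work: Plan + ReAct loop
    if any(hint in normalized for hint in TOOL_HINTS):
        return Path.TOOL      # one-shot tool call
    return Path.FAST          # direct LLM answer, tools disabled
```

Checking the agentic hints before the tool hints means a query that mentions a tool but also chains several steps is escalated rather than handled one-shot, which matches the intent of the table.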
ProdBot dynamically loads Model Context Protocol servers based on your .env configuration:
| Category | Tools | Env Key Required |
|---|---|---|
| Core | Filesystem, Time, Memory (Knowledge Graph), PostgreSQL | None (built-in) |
| Research | Brave Search, Puppeteer, Fetch (HTTP) | BRAVE_API_KEY |
| Developer | Git, GitHub, Docker, E2B Code Interpreter | GITHUB_TOKEN, E2B_API_KEY |
| Brain | Sequential Thinking, SQLite | None (built-in) |
| Connectors | Slack, Google Maps, Sentry | SLACK_BOT_TOKEN, GOOGLE_MAPS_API_KEY, SENTRY_AUTH_TOKEN |
How to Enable / Update MCP Tools
- Open your .env file and add the API key for the tool you want to enable:

      # Example: Enable web search
      BRAVE_API_KEY=your-brave-api-key
      # Example: Enable GitHub integration
      GITHUB_TOKEN=ghp_your-github-token

- Restart the backend. ProdBot auto-discovers available servers on startup
- The tool is now available to the chatbot's orchestrator and will be used when relevant
Tools without required API keys (Filesystem, Time, Git, Sequential Thinking, SQLite) work out of the box with zero configuration.
See docs/MCP_SETUP.md for the complete setup guide with all available servers.
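Discovery driven by environment keys could look roughly like the sketch below. It is not ProdBot's loader: the server identifiers, the per-server key mapping (the table above lists keys per category, not per tool), and the `discover_servers` function are assumptions made for illustration.

```python
import os
from typing import Mapping

# Server -> required environment keys. Only a subset of servers is listed,
# and the per-server mapping is an assumption; an empty tuple means no key.
REQUIRED_KEYS: dict[str, tuple[str, ...]] = {
    "filesystem": (),
    "time": (),
    "sequential_thinking": (),
    "sqlite": (),
    "brave_search": ("BRAVE_API_KEY",),
    "github": ("GITHUB_TOKEN",),
    "e2b_code_interpreter": ("E2B_API_KEY",),
    "slack": ("SLACK_BOT_TOKEN",),
    "google_maps": ("GOOGLE_MAPS_API_KEY",),
    "sentry": ("SENTRY_AUTH_TOKEN",),
}


def discover_servers(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the servers whose required keys are all present and non-empty."""
    return [
        name
        for name, keys in REQUIRED_KEYS.items()
        if all(env.get(key, "").strip() for key in keys)
    ]
```

Passing the environment as a parameter keeps the function easy to test; at startup it would simply be called with no arguments, so adding a key to .env and restarting is enough to change the result.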
| Layer | Technologies |
|---|---|
| Backend | FastAPI · SQLAlchemy (async) · Pydantic · WebSockets · Alembic |
| LLM | Ollama · OpenAI · Anthropic · Gemini · MCP Protocol |
| Multimodal | faster-whisper (STT) · piper-tts / macOS say (TTS) · Pillow · OpenCV · LLaVA |
| Data | PostgreSQL · pgvector · Redis · Hybrid 3-tier memory |
| Frontend | Next.js 14 · TypeScript · Tailwind CSS |
| DevOps | Docker Compose · Prometheus · Grafana · Node Exporter |
ProdBot goes beyond generic chatbots: it can connect to your personal communication platforms and act as a context-aware assistant across your digital life.
| Platform | Capability | How It Works |
|---|---|---|
| Gmail | Read, search, draft emails | OAuth-based via MCP server; drafts land in your Gmail Drafts folder |
| Telegram | Read chats, send messages | Local MTProto client (Telethon): your session, your machine |
| LinkedIn | Read inbox messages | Headless browser automation (Playwright) |
| Slack | Send messages, read channels | Official Slack Bot Token via MCP |
|  | Read/send messages | Planned: Phase 2 rollout |
| Line | Read/send messages | Planned: Phase 2 rollout |
Key design principles:
- Local-first: All credentials stay on your machine. No cloud auth, no data leaves your infrastructure
- Human-in-the-loop: ProdBot never sends messages without your explicit approval. Every outgoing message goes through a Draft Card UI where you can edit, regenerate, or cancel
- Granular permissions: Control Read / Draft / Send permissions per platform from the Plugins dashboard
- On-demand retrieval: ProdBot doesn't pre-index your messages. It searches your platforms in real time when asked
Tip: All personal integrations are gated behind feature flags (off by default). Enable them individually when ready. See docs/personal_platform_integration.md for the full specification.
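The combination of feature flags, per-platform Read / Draft / Send permissions, and explicit approval can be sketched as a single gate. Everything here is hypothetical (`PlatformGate`, `Draft`, and `Permission` are illustrative names, not ProdBot classes); the sketch only shows how the three checks might compose.

```python
from dataclasses import dataclass, field
from enum import Enum


class Permission(Enum):
    READ = "read"
    DRAFT = "draft"
    SEND = "send"


@dataclass
class Draft:
    platform: str
    body: str
    approved: bool = False  # set to True only by an explicit user action


@dataclass
class PlatformGate:
    """Hypothetical gate: feature flag + per-platform permissions + approval."""
    enabled: dict[str, bool] = field(default_factory=dict)
    grants: dict[str, set[Permission]] = field(default_factory=dict)

    def allows(self, platform: str, permission: Permission) -> bool:
        # Off by default: unknown or disabled platforms allow nothing.
        if not self.enabled.get(platform, False):
            return False
        return permission in self.grants.get(platform, set())

    def create_draft(self, platform: str, body: str) -> Draft:
        if not self.allows(platform, Permission.DRAFT):
            raise PermissionError(f"drafting is not permitted on {platform}")
        return Draft(platform, body)

    def send(self, draft: Draft) -> str:
        if not self.allows(draft.platform, Permission.SEND):
            raise PermissionError(f"sending is not permitted on {draft.platform}")
        if not draft.approved:
            raise PermissionError("draft has not been approved by the user")
        # A real integration would call the platform client here.
        return f"sent to {draft.platform}: {draft.body}"
```

Keeping approval on the draft object rather than on the gate means an edited or regenerated draft naturally starts unapproved again.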
# Red-team security regression (no external APIs needed)
PYTHONPATH=src .venv/bin/pytest tests/redteam -q
# Behavioral benchmark suite
PYTHONPATH=src .venv/bin/python tests/evals/run_benchmarks.py
# Full pipeline integration test (requires running backend + Ollama)
PYTHONPATH=src .venv/bin/python tests/test_all_pipelines.py
# Multimodal pipeline test (requires backend + Ollama + FFmpeg)
PYTHONPATH=src .venv/bin/python tests/test_media_pipeline.py

Tip: New here? ProdBot was built incrementally across 20+ phases, each with detailed documentation explaining what was built, why it was designed that way, and how it works under the hood. Start from Phase 1 and build your understanding of production AI systems step by step.
Full phase documentation:
| Phase | What You'll Learn | Docs |
|---|---|---|
| 1.0 | Core chatbot with open-source LLM | - |
| 1.1 | MCP tool support & streaming execution | Phase 1.1 |
| 1.2 | Decision discipline: smart routing & planning | Phase 1.2 |
| 1.3 | Chat orchestrator: 9-phase architecture | Phase 1.3 |
| 2.0 | Data persistence & user memory (PostgreSQL) | Phase 2.0 |
| 2.2 | Embedding & semantic search (pgvector) | Phase 2.2 |
| 2.5 | Observability & schema scaling | Phase 2.5 |
| 2.6-2.7 | Sliding window (hot memory) & summarization (warm memory) | - |
| 3.0 | Redis caching & performance optimization | Phase 3.0 |
| 4.0-4.1 | Prometheus & Grafana observability (setup + hardening) | Phase 4.0 · 4.1 |
| 5.0 | Multimodal input & voice conversation | Phase 5.0 |
| 5.5 | Performance optimization & adaptive routing | Phase 5.5 |
| 6.0 | Multi-provider LLM orchestration (OpenAI, Anthropic, Gemini) | Phase 6.0 |
| 6.5 | Free tool integration (web search & coding) | Phase 6.5 |
| 7.0-7.1 | System hardening: deep audit & 12 critical fixes | Phase 7.0 · 7.1 |
| 8.0-8.1 | Red-team testing & behavioral benchmarks | Phase 8.0 |
| 9.0 | Personal platform integration (Gmail/Telegram/LinkedIn) | Phase 9.0 |
| 9.1 | Stabilization & functional evaluation | Testing |
| 10.0 | Orchestrator & routing reliability upgrade | Phase 10.0 |
| 10.1 | State-machine graph engine & multi-agent handoff | Phase 10.1 |
- Core chat engine with open-source LLM (Ollama)
- 9-phase chat orchestrator with intent classification & context injection (Phase 1.3)
- MCP tool support & streaming execution (Phase 1.1)
- Decision discipline: smart routing & planning (Phase 1.2)
- Agentic engine: Plan + ReAct loop with cycle detection & circuit breaker (Phase 5.5)
- Adaptive routing: trivial, fast, tool, and agentic execution paths (Phase 5.5)
- Data persistence & user memory (PostgreSQL) (Phase 2.0)
- Embedding & semantic search (pgvector) (Phase 2.2)
- Hybrid 3-tier memory: hot (sliding window), warm (summarization), cold (pgvector) (Phase 2.5)
- Redis caching layer for context, sessions, and tool reliability scores (Phase 3.0)
- Full observability: Prometheus, Grafana dashboards, health checks (Phase 4.0 · Phase 4.1)
- Multimodal input: image (LLaVA), audio (Whisper), video (OpenCV keyframes) (Phase 5.0)
- Real-time voice conversation: full-duplex WebSocket with STT + TTS (Phase 5.0)
- Multi-provider LLM: Ollama, OpenAI, Anthropic, Gemini with runtime switching (Phase 6.0)
- Free tool integration: web search & code interpreter (Phase 6.5)
- 15+ MCP tool servers: filesystem, Git, GitHub, web search, Docker, Slack, and more (MCP Setup)
- Production hardening: deep audit, 12 critical fixes, red-team tested (Phase 7.0 · Phase 7.1)
- Behavioral evaluation: benchmark suite with trajectory tracking (Phase 8.0)
- Personal platform integration: Gmail, Telegram, LinkedIn (local-first, human-in-the-loop) (Phase 9.0)
- Orchestrator & routing reliability upgrade: deterministic routing, tool reliability ranking (Phase 10.0)
- Graph-based state-machine orchestrator with multi-agent handoff (Phase 10.1)
- Redis checkpointing for crash-resilient execution (Phase 10.1)
- Reflection engine: automatic LLM self-correction on tool failures (Phase 10.1)
- Authentication & Multi-Tenancy: JWT/OAuth2 auth, user isolation, org-level RBAC, session token rotation
- Infrastructure as Code (Terraform): Reproducible cloud provisioning (AWS/GCP), environment parity (dev/staging/prod), Git-managed infra with automated plan/apply
- Container Orchestration (Kubernetes): Helm charts, horizontal pod autoscaling, rolling updates, production k8s manifests with resource limits & liveness probes
- Message Queue & Async Processing: RabbitMQ/Kafka/SQS for decoupled request handling, background task workers, retry queues with dead-letter handling for high-throughput traffic
- Rate Limiting & Throttling: Per-user and per-endpoint rate limits, token bucket / sliding window algorithms, LLM API quota management with exponential backoff
- Load Balancing & Auto-Scaling: L7 load balancing with health-check routing, model-aware request distribution, geo-distributed deployment for latency reduction
- Advanced Circuit Breaker & Fault Tolerance: Service-level circuit breakers (closed/open/half-open) for LLM providers and MCP tools, cascading failure prevention, graceful degradation under partial outages
- CI/CD Pipeline: Automated test gates (unit → red-team → behavioral), blue-green / canary deployments, zero-downtime release strategy
- Secrets Management & Security Hardening: Vault/AWS SSM for credential rotation, network segmentation, container image scanning, OWASP compliance audit
- Chaos Engineering & Load Testing: Synthetic traffic generation (Locust/k6), fault injection testing, latency/throughput benchmarks under realistic concurrent load
- Cost Optimization & Token Budget Control: Per-request cost tracking, smart model routing (route simple queries to cheaper/smaller models), token consumption dashboards and alerting
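As an illustration of the closed/open/half-open circuit breaker mentioned in the list above, here is one possible shape. It is a generic sketch of the pattern, not planned ProdBot code: the `CircuitBreaker` class, its thresholds, and the injectable `clock` are assumptions.

```python
import time
from typing import Callable


class CircuitBreaker:
    """Minimal closed/open/half-open breaker around a zero-argument callable."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock                      # injectable for testing
        self.failures = 0
        self.opened_at: float | None = None     # None means the circuit is closed

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "half-open"                  # allow one trial call
        return "open"

    def call(self, func: Callable[[], object]) -> object:
        if self.state == "open":
            raise RuntimeError("circuit open: call rejected")
        try:
            result = func()
        except Exception:
            self.failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Any success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

One breaker instance per LLM provider or MCP server would keep a failing dependency from being hammered while the others stay available.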
chatbot-ai-systems-production/
├── src/chatbot_ai_system/
│   ├── config/          # Settings, MCP server config
│   ├── database/        # SQLAlchemy models, session, Redis
│   ├── models/          # Pydantic schemas
│   ├── observability/   # Prometheus metrics
│   ├── orchestrator.py  # 9-phase chat orchestrator
│   ├── providers/       # LLM providers (Ollama, OpenAI, Anthropic, Gemini)
│   ├── repositories/    # DB repositories (conversation, memory)
│   ├── server/          # FastAPI routes, media routes, voice routes
│   ├── services/        # Media pipeline, STT, TTS, embedding
│   └── tools/           # MCP tool registry and client
├── frontend/            # Next.js 14 frontend
├── tests/               # Red-team, evals, integration, pipeline tests
├── docs/                # Phase-by-phase documentation
├── docker/              # Prometheus, Grafana config
├── alembic/             # Database migrations
└── scripts/             # Utility scripts
Contributions are welcome! Feel free to open issues and submit pull requests.
License: MIT
Built with ❤️, from first commit to production