Manthya/ProdBot

🤖 ProdBot

From Zero (Beginner) to Production AI Chatbot

License: MIT · Python 3.11+ · Node.js 20+ · Docker · FastAPI · Ollama

A self-hosted, production-grade AI chatbot platform that runs entirely on your machine. No API keys required. No cloud dependency. Full control.

Multimodal input · Real-time voice · WebSocket streaming · Agentic reasoning · MCP tools · Observability

Quick Start · Features · Architecture · Documentation · Tech Stack


Why ProdBot?

Most chatbot tutorials stop at "hello world." ProdBot is a complete, battle-tested system, covering everything from LLM inference to database persistence and from tool orchestration to production observability.

  • 🏢 Enterprise-ready: Multi-tenant architecture, Redis caching, PostgreSQL persistence, Prometheus + Grafana monitoring
  • 🎓 Beginner-friendly: 10+ phases of detailed documentation guide you from core concepts to production deployment
  • 🔒 Fully self-hosted: Runs on Ollama with open-source models. Your data never leaves your infrastructure
  • ⚡ Production-hardened: Red-team tested, behavioral benchmarks, deep audit with 12 critical fixes applied

Tip

New to building chatbots? Follow the phase-by-phase documentation. Each phase builds on the previous one, taking you from a simple chat loop to a fully orchestrated, multimodal AI system.


✨ Features

  • 🧠 9-Phase Chat Orchestrator: Intent classification, context injection, hybrid memory, tool routing, agentic reasoning, and response synthesis, all in a single pipeline
  • 🤖 Agentic Engine: Plan + ReAct loop with cycle detection, circuit breaker, tool retry, and safety guardrails for complex multi-step tasks
  • ⚡ Adaptive Routing: Trivial bypass (<1ms), fast path (~5-8s), tool path (~20-40s), and full agentic path (60s+) based on query complexity
  • 🖼️ Multimodal Input: Upload images, audio, and video. Auto-switches to a vision model (LLaVA) for image understanding, with keyframe extraction for video
  • 🎤 Real-Time Voice: Full-duplex WebSocket voice conversation with Whisper STT and multi-backend TTS (piper/macOS say/espeak)
  • 🔌 MCP Tool Ecosystem: 15+ Model Context Protocol servers, including filesystem, Git, GitHub, Brave Search, Docker, Slack, Google Maps, sequential thinking, and more
  • 🧲 Hybrid Memory System: Hot (sliding window), warm (summarization), and cold (pgvector semantic search) memory tiers for context management
  • 🌐 Multi-Provider LLM: Ollama (default), OpenAI, Anthropic, and Gemini support with runtime model switching from the UI
  • 📡 WebSocket Streaming: Token-by-token response streaming with real-time status indicators (thinking, tool execution, synthesis)
  • 📊 Full Observability: Prometheus metrics, Grafana dashboards, node exporter, and health checks for production monitoring out of the box
  • 🔐 Multi-Tenant Ready: User isolation, conversation threading, session management, role-based access architecture
  • 🧪 Extensively Tested: Red-team security suite, behavioral benchmarks, trajectory evaluation, integration tests, and live pipeline audits
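
The hybrid memory tiers listed above can be illustrated with a small sketch. This is not ProdBot's implementation: `HybridMemory`, `join_summary`, and `search_cold` are hypothetical names, the summarizer is a plain string join where a real warm tier would presumably call an LLM, and the cold tier is a callable standing in for a pgvector similarity query.

```python
from collections import deque
from typing import Callable, Deque, List, Optional


def join_summary(summary: str, message: str) -> str:
    """Placeholder summarizer: a real warm tier would call an LLM here."""
    return message if not summary else f"{summary}; {message}"


class HybridMemory:
    """Illustrative three-tier memory. All names are hypothetical."""

    def __init__(
        self,
        window_size: int = 4,
        summarize: Callable[[str, str], str] = join_summary,
        search_cold: Optional[Callable[[str], List[str]]] = None,
    ) -> None:
        self.window_size = window_size
        self.summarize = summarize
        # Stand-in for a pgvector similarity query over older messages.
        self.search_cold = search_cold or (lambda query: [])
        self.hot: Deque[str] = deque()
        self.warm_summary = ""

    def add(self, message: str) -> None:
        """Append to the hot window; fold evicted messages into the summary."""
        self.hot.append(message)
        while len(self.hot) > self.window_size:
            evicted = self.hot.popleft()
            self.warm_summary = self.summarize(self.warm_summary, evicted)

    def build_context(self, query: str) -> List[str]:
        """Order context as cold hits, then warm summary, then hot window."""
        context = list(self.search_cold(query))
        if self.warm_summary:
            context.append(f"Summary: {self.warm_summary}")
        context.extend(self.hot)
        return context
```

The ordering in `build_context` (oldest and most compressed first, most recent last) is one reasonable choice for prompt assembly, not something this README specifies.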

🚀 Quick Start

Prerequisites

| Tool | Purpose |
| --- | --- |
| Python 3.11+ & Poetry | Backend runtime |
| Node.js 20+ & npm | Frontend runtime |
| Docker & Docker Compose | PostgreSQL, Redis, monitoring |
| Ollama | Local LLM inference (Install →) |
| FFmpeg | Audio/video processing (brew install ffmpeg) |

1️⃣ Clone & Configure

git clone https://github.com/your-username/chatbot-ai-systems-production.git
cd chatbot-ai-systems-production

cp .env.example .env
cp frontend/.env.example frontend/.env.local

Important

To enable MCP tools (web search, GitHub, Slack, etc.), add your API keys to .env. See docs/MCP_SETUP.md for the full guide.

2️⃣ Pull Models

ollama serve                        # Start Ollama
ollama pull qwen2.5:14b-instruct   # Text model
ollama pull llava:7b                # Vision model
ollama pull nomic-embed-text        # Embedding model

3️⃣ Start Backend

docker-compose up -d postgres redis          # Start databases
poetry install                                # Install dependencies
poetry run alembic upgrade head               # Run migrations
poetry run uvicorn chatbot_ai_system.server.main:app --reload --host 0.0.0.0 --port 8000

4️⃣ Start Frontend

cd frontend
npm install
npm run dev

🎉 You're Live!

| Service | URL |
| --- | --- |
| Chat UI | http://localhost:3000 |
| API Docs | http://localhost:8000/docs |
| Health Check | http://localhost:8000/health |
| Grafana | http://localhost:3001 (admin/admin) |
| Prometheus | http://localhost:9090 |
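
To confirm the stack is up, a small script can probe the URLs listed above. This is a sketch under assumptions: it treats an HTTP 200 as "healthy" (the README does not document the response of `/health`), and `check_services` and `http_status` are illustrative names, not part of ProdBot.

```python
import urllib.request
from typing import Callable, Dict, Mapping

# URLs taken from the service table; only three are checked here for brevity.
SERVICES = {
    "Chat UI": "http://localhost:3000",
    "API Docs": "http://localhost:8000/docs",
    "Health Check": "http://localhost:8000/health",
}


def http_status(url: str, timeout: float = 3.0) -> int:
    """Return the HTTP status code for a GET request to url."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status


def check_services(
    services: Mapping[str, str] = SERVICES,
    fetch: Callable[[str], int] = http_status,
) -> Dict[str, bool]:
    """Map each service name to True if it answered with HTTP 200."""
    results = {}
    for name, url in services.items():
        try:
            results[name] = fetch(url) == 200
        except OSError:  # URLError, HTTPError, and timeouts all subclass OSError
            results[name] = False
    return results


if __name__ == "__main__":
    for name, ok in check_services().items():
        print(f"{name}: {'up' if ok else 'down'}")
```

The `fetch` parameter exists so the check can be exercised without a running server, for example by passing a stub that returns fixed status codes.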

🏗️ Architecture

flowchart TB
    subgraph Client["🖥️ Client Layer"]
        Browser["Browser"]
        UI["Next.js Frontend<br/>localhost:3000"]
        Mic["🎤 Microphone"]
    end
    
    subgraph API["⚡ API Layer"]
        FastAPI["FastAPI Server<br/>localhost:8000"]
        REST["/api/chat<br/>REST Endpoint"]
        WS["/api/chat/stream<br/>WebSocket"]
        Upload["/api/upload<br/>Media Upload"]
        VoiceWS["/api/voice/stream<br/>Voice WebSocket"]
        Health["/health"]
    end
    
    subgraph Core["🧠 Core Layer"]
        Orchestrator["Chat Orchestrator<br/>(9-Phase Pipeline)"]
        AgenticEngine["Agentic Engine<br/>(Plan+ReAct + Safety)"]
        Provider["LLM Provider"]
        Registry["Tool Registry"]
        MCPClient["MCP Client Layer"]
        MediaPipe["Media Pipeline"]
    end

    subgraph Multimodal["🖼️ Multimodal Layer"]
        ImgProc["Image Processor<br/>(Pillow)"]
        STT["STT Engine<br/>(Whisper)"]
        TTS["TTS Engine<br/>(say/piper/espeak)"]
        VidProc["Video Processor<br/>(OpenCV)"]
    end

    subgraph Data["💾 Data Layer (Hybrid Memory)"]
        DB[(PostgreSQL\nDatabase)]
        Vector["pgvector\n(Cold Memory)"]
        Summary["Summarization\n(Warm Memory)"]
        Window["Sliding Window\n(Hot Memory)"]
        MediaDB["Media Attachments\nTable"]
    end
    
    subgraph Tools["🛠️ Tool Layer (MCP)"]
        FS["Filesystem"]
        Git["Git & GitHub"]
        Web["Brave Search & Fetch"]
        Brain["Sequential Thinking\n& SQLite"]
        Time["Time & Memory"]
    end

    subgraph Cache["⚡ Cache Layer"]
        Redis[(Redis\nCache)]
    end
    
    subgraph LLM["🤖 LLM Layer"]
        Ollama["Ollama Server<br/>localhost:11434"]
        TextModel["qwen2.5:14b<br/>Text Model"]
        VisionModel["llava:7b<br/>Vision Model"]
        Embed["nomic-embed-text<br/>Embedding Model"]
    end
    
    Browser --> UI
    Mic --> UI
    UI -->|HTTP/REST| REST
    UI -.->|WebSocket| WS
    UI -->|File Upload| Upload
    UI -.->|Voice| VoiceWS
    REST --> Orchestrator
    WS --> Orchestrator
    Upload --> MediaPipe
    VoiceWS --> STT & TTS
    MediaPipe --> ImgProc & STT & VidProc
    Orchestrator --> AgenticEngine --> Provider
    Orchestrator --> Provider
    Orchestrator --> Registry
    Orchestrator --> Redis
    Provider --> Ollama
    Ollama --> TextModel & VisionModel & Embed
    Redis --> DB & Vector & Summary & Window
    Registry --> MCPClient --> Tools
    MediaPipe --> MediaDB

⚡ How It Works: Adaptive Execution

ProdBot automatically classifies every query and routes it through the optimal execution path:

flowchart TD
    User["User Query"] --> Classify{Intent Classifier}
    
    Classify -->|Simple Info| FastPath["🚀 FAST PATH<br/>(Direct Response)"]
    Classify -->|Simple Tool| MedPath["🛠️ TOOL PATH<br/>(One-shot Execution)"]
    Classify -->|Complex/Reasoning| SlowPath["🧠 AGENTIC PATH<br/>(Plan + ReAct Loop)"]
    
    FastPath --> LLM[LLM Response]
    MedPath --> Registry[Tool Registry] --> LLM
    
    SlowPath --> Planner[Sequential Thinking]
    Planner --> ReAct[ReAct Loop]
    ReAct --> Registry
    ReAct --> ReAct
    ReAct --> LLM

| Path | When | Latency |
| --- | --- | --- |
| Trivial Bypass | Greetings, acknowledgments ("hi", "thanks"); skips the LLM entirely | <1ms |
| Fast Path | Facts, definitions; direct LLM response, tools disabled | ~5-8s |
| Tool Path | Single-step tasks ("list files", "search web"); one-shot tool call | ~20-40s |
| Agentic Path | Complex reasoning; full Plan+ReAct loop with safety guardrails | 60s+ |
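
The four paths in the table can be pictured as a small dispatch function. The sketch below is only an illustration: the README does not say how ProdBot's intent classifier works, so the keyword lists and the names `Path` and `route` are hypothetical stand-ins for whatever classifier the orchestrator actually uses.

```python
from enum import Enum


class Path(Enum):
    TRIVIAL = "trivial_bypass"
    FAST = "fast_path"
    TOOL = "tool_path"
    AGENTIC = "agentic_path"


# Hypothetical keyword heuristics; a real classifier is likely more robust.
TRIVIAL_PHRASES = {"hi", "hello", "thanks", "thank you", "ok"}
TOOL_HINTS = ("list files", "search web", "search the web")
AGENTIC_HINTS = (" and then ", "step by step", "compare")


def route(query: str) -> Path:
    """Pick an execution path, checking the cheapest match first."""
    text = query.strip().lower().rstrip("!.?")
    if text in TRIVIAL_PHRASES:
        return Path.TRIVIAL  # answered without calling the LLM
    if any(hint in text for hint in AGENTIC_HINTS):
        return Path.AGENTIC  # multi-step request: plan, then ReAct loop
    if any(hint in text for hint in TOOL_HINTS):
        return Path.TOOL  # single tool call, then LLM response
    return Path.FAST  # direct LLM response, tools disabled


if __name__ == "__main__":
    for q in ("Hi!", "What is pgvector?", "List files in src"):
        print(q, "->", route(q).value)
```

Agentic hints are checked before tool hints so that a multi-step request that happens to mention a tool is not downgraded to a one-shot call.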

🔌 MCP Tool Ecosystem

ProdBot dynamically loads Model Context Protocol servers based on your .env configuration:

| Category | Tools | Env Key Required |
| --- | --- | --- |
| Core | Filesystem, Time, Memory (Knowledge Graph), PostgreSQL | None (built-in) |
| Research | Brave Search, Puppeteer, Fetch (HTTP) | BRAVE_API_KEY |
| Developer | Git, GitHub, Docker, E2B Code Interpreter | GITHUB_TOKEN, E2B_API_KEY |
| Brain | Sequential Thinking, SQLite | None (built-in) |
| Connectors | Slack, Google Maps, Sentry | SLACK_BOT_TOKEN, GOOGLE_MAPS_API_KEY, SENTRY_AUTH_TOKEN |

⚙️ How to Enable / Update MCP Tools
  1. Open your .env file and add the API key for the tool you want to enable:
    # Example: Enable web search
    BRAVE_API_KEY=your-brave-api-key
    
    # Example: Enable GitHub integration
    GITHUB_TOKEN=ghp_your-github-token
  2. Restart the backend. ProdBot auto-discovers available servers on startup
  3. The tool is now available to the chatbot's orchestrator and will be used when relevant

Tools without required API keys (Filesystem, Time, Git, Sequential Thinking, SQLite) work out of the box with zero configuration.

See docs/MCP_SETUP.md for the complete setup guide with all available servers.
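
The env-driven discovery described above can be sketched as a lookup from tool to required key. This is an assumption-laden illustration, not ProdBot's loader: the tool identifiers and the function `enabled_tools` are hypothetical, and the pairing of each tool with a single key is inferred from the table (only tools whose requirement is unambiguous there are included).

```python
import os
from typing import Dict, List, Mapping, Optional

# Tool name -> required env key (None means no key is needed).
# Names are illustrative; the key pairings are inferred from the README table.
TOOL_ENV_KEYS: Dict[str, Optional[str]] = {
    "filesystem": None,
    "time": None,
    "git": None,
    "sequential-thinking": None,
    "sqlite": None,
    "brave-search": "BRAVE_API_KEY",
    "github": "GITHUB_TOKEN",
    "e2b": "E2B_API_KEY",
    "slack": "SLACK_BOT_TOKEN",
    "google-maps": "GOOGLE_MAPS_API_KEY",
    "sentry": "SENTRY_AUTH_TOKEN",
}


def enabled_tools(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return tools that need no key or whose key is set to a non-blank value."""
    return [
        name
        for name, key in TOOL_ENV_KEYS.items()
        if key is None or env.get(key, "").strip()
    ]


if __name__ == "__main__":
    print(enabled_tools())
```

Treating a blank value as "not set" avoids enabling a server when `.env` contains an empty placeholder such as `BRAVE_API_KEY=`.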


🛠️ Tech Stack

| Layer | Technologies |
| --- | --- |
| Backend | FastAPI · SQLAlchemy (async) · Pydantic · WebSockets · Alembic |
| LLM | Ollama · OpenAI · Anthropic · Gemini · MCP Protocol |
| Multimodal | faster-whisper (STT) · piper-tts / macOS say (TTS) · Pillow · OpenCV · LLaVA |
| Data | PostgreSQL · pgvector · Redis · Hybrid 3-tier memory |
| Frontend | Next.js 14 · TypeScript · Tailwind CSS |
| DevOps | Docker Compose · Prometheus · Grafana · Node Exporter |

💬 Personal Assistant Mode

ProdBot goes beyond generic chatbots: it can connect to your personal communication platforms and act as a context-aware assistant across your digital life.

| Platform | Capability | How It Works |
| --- | --- | --- |
| Gmail | Read, search, draft emails | OAuth-based via MCP server; drafts land in your Gmail Drafts folder |
| Telegram | Read chats, send messages | Local MTProto client (Telethon): your session, your machine |
| LinkedIn | Read inbox messages | Headless browser automation (Playwright) |
| Slack | Send messages, read channels | Official Slack Bot Token via MCP |
| WhatsApp | Read/send messages | Planned for the Phase 2 rollout |
| Line | Read/send messages | Planned for the Phase 2 rollout |

Key design principles:

  • 🔒 Local-first: All credentials stay on your machine. No cloud auth, no data leaves your infrastructure
  • ✋ Human-in-the-loop: ProdBot never sends messages without your explicit approval. Every outgoing message goes through a Draft Card UI where you can edit, regenerate, or cancel
  • 🎛️ Granular permissions: Control Read / Draft / Send permissions per platform from the Plugins dashboard
  • 🔍 On-demand retrieval: ProdBot doesn't pre-index your messages. It searches your platforms in real time when asked
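
The human-in-the-loop and granular-permission principles above amount to two checks before anything is sent. The following is a minimal sketch of that gate, assuming hypothetical names (`Permission`, `Draft`, `send`); the README does not show ProdBot's real permission model or Draft Card API.

```python
from dataclasses import dataclass
from enum import Flag, auto
from typing import Dict


class Permission(Flag):
    NONE = 0
    READ = auto()
    DRAFT = auto()
    SEND = auto()


@dataclass
class Draft:
    """An outgoing message awaiting user review (hypothetical structure)."""
    platform: str
    body: str
    approved: bool = False  # set to True only by an explicit user action


def send(draft: Draft, granted: Dict[str, Permission]) -> str:
    """Send a draft only if the platform allows SEND and the user approved it."""
    perms = granted.get(draft.platform, Permission.NONE)
    if Permission.SEND not in perms:
        raise PermissionError(f"send is not permitted on {draft.platform}")
    if not draft.approved:
        raise PermissionError("draft has not been approved by the user")
    # A real integration would call the platform client here.
    return f"sent to {draft.platform}: {draft.body}"
```

Platforms missing from `granted` fall back to `Permission.NONE`, which mirrors the off-by-default behavior described for the feature flags.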

Tip

All personal integrations are gated behind feature flags (off by default). Enable them individually when ready. See docs/personal_platform_integration.md for the full specification.


🧪 Testing

# Red-team security regression (no external APIs needed)
PYTHONPATH=src .venv/bin/pytest tests/redteam -q

# Behavioral benchmark suite
PYTHONPATH=src .venv/bin/python tests/evals/run_benchmarks.py

# Full pipeline integration test (requires running backend + Ollama)
PYTHONPATH=src .venv/bin/python tests/test_all_pipelines.py

# Multimodal pipeline test (requires backend + Ollama + FFmpeg)
PYTHONPATH=src .venv/bin/python tests/test_media_pipeline.py

📖 Learn & Build: Phase by Phase

Tip

New here? ProdBot was built incrementally across 20+ phases, each with detailed documentation explaining what was built, why it was designed that way, and how it works under the hood. Start from Phase 1 and build your understanding of production AI systems step by step.

📚 Full Phase Documentation

| Phase | What You'll Learn | Docs |
| --- | --- | --- |
| 1.0 | Core chatbot with open-source LLM | - |
| 1.1 | MCP tool support & streaming execution | Phase 1.1 |
| 1.2 | Decision discipline: smart routing & planning | Phase 1.2 |
| 1.3 | Chat orchestrator: 9-phase architecture | Phase 1.3 |
| 2.0 | Data persistence & user memory (PostgreSQL) | Phase 2.0 |
| 2.2 | Embedding & semantic search (pgvector) | Phase 2.2 |
| 2.5 | Observability & schema scaling | Phase 2.5 |
| 2.6–2.7 | Sliding window (hot memory) & summarization (warm memory) | - |
| 3.0 | Redis caching & performance optimization | Phase 3.0 |
| 4.0–4.1 | Prometheus & Grafana observability (setup + hardening) | Phase 4.0 · 4.1 |
| 5.0 | Multimodal input & voice conversation | Phase 5.0 |
| 5.5 | Performance optimization & adaptive routing | Phase 5.5 |
| 6.0 | Multi-provider LLM orchestration (OpenAI, Anthropic, Gemini) | Phase 6.0 |
| 6.5 | Free tool integration (web search & coding) | Phase 6.5 |
| 7.0–7.1 | System hardening: deep audit & 12 critical fixes | Phase 7.0 · 7.1 |
| 8.0–8.1 | Red-team testing & behavioral benchmarks | Phase 8.0 |
| 9.0 | Personal platform integration (Gmail/Telegram/LinkedIn) | Phase 9.0 |
| 9.1 | Stabilization & functional evaluation | Testing |
| 10.0 | Orchestrator & routing reliability upgrade | Phase 10.0 |
| 10.1 | State-machine graph engine & multi-agent handoff | Phase 10.1 |

🗺️ Roadmap

✅ What's Built (Infrastructure Complete)

  • Core chat engine with open-source LLM (Ollama)
  • 9-phase chat orchestrator with intent classification & context injection (Phase 1.3)
  • MCP tool support & streaming execution (Phase 1.1)
  • Decision discipline: smart routing & planning (Phase 1.2)
  • Agentic engine: Plan + ReAct loop with cycle detection & circuit breaker (Phase 5.5)
  • Adaptive routing: trivial, fast, tool, and agentic execution paths (Phase 5.5)
  • Data persistence & user memory with PostgreSQL (Phase 2.0)
  • Embedding & semantic search with pgvector (Phase 2.2)
  • Hybrid 3-tier memory: hot (sliding window), warm (summarization), cold (pgvector); see Phase 2.5
  • Redis caching layer for context, sessions, and tool reliability scores (Phase 3.0)
  • Full observability: Prometheus, Grafana dashboards, health checks (Phase 4.0 · Phase 4.1)
  • Multimodal input: image (LLaVA), audio (Whisper), video (OpenCV keyframes); see Phase 5.0
  • Real-time voice conversation: full-duplex WebSocket with STT + TTS (Phase 5.0)
  • Multi-provider LLM: Ollama, OpenAI, Anthropic, Gemini with runtime switching (Phase 6.0)
  • Free tool integration: web search & code interpreter (Phase 6.5)
  • 15+ MCP tool servers: filesystem, Git, GitHub, web search, Docker, Slack, and more (MCP Setup)
  • Production hardening: deep audit, 12 critical fixes, red-team tested (Phase 7.0 · Phase 7.1)
  • Behavioral evaluation: benchmark suite with trajectory tracking (Phase 8.0)
  • Personal platform integration: Gmail, Telegram, LinkedIn, local-first and human-in-the-loop (Phase 9.0)
  • Orchestrator & routing reliability upgrade: deterministic routing, tool reliability ranking (Phase 10.0)
  • Graph-based state-machine orchestrator with multi-agent handoff (Phase 10.1)
  • Redis checkpointing for crash-resilient execution (Phase 10.1)
  • Reflection engine: automatic LLM self-correction on tool failures (Phase 10.1)

🔮 What's Next (Production Readiness for Real-World Traffic)

  • Authentication & Multi-Tenancy: JWT/OAuth2 auth, user isolation, org-level RBAC, session token rotation
  • Infrastructure as Code (Terraform): Reproducible cloud provisioning (AWS/GCP), environment parity (dev/staging/prod), Git-managed infra with automated plan/apply
  • Container Orchestration (Kubernetes): Helm charts, horizontal pod autoscaling, rolling updates, production k8s manifests with resource limits & liveness probes
  • Message Queue & Async Processing: RabbitMQ/Kafka/SQS for decoupled request handling, background task workers, retry queues with dead-letter handling for high-throughput traffic
  • Rate Limiting & Throttling: Per-user and per-endpoint rate limits, token bucket / sliding window algorithms, LLM API quota management with exponential backoff
  • Load Balancing & Auto-Scaling: L7 load balancing with health-check routing, model-aware request distribution, geo-distributed deployment for latency reduction
  • Advanced Circuit Breaker & Fault Tolerance: Service-level circuit breakers (closed/open/half-open) for LLM providers and MCP tools, cascading failure prevention, graceful degradation under partial outages
  • CI/CD Pipeline: Automated test gates (unit → red-team → behavioral), blue-green / canary deployments, zero-downtime release strategy
  • Secrets Management & Security Hardening: Vault/AWS SSM for credential rotation, network segmentation, container image scanning, OWASP compliance audit
  • Chaos Engineering & Load Testing: Synthetic traffic generation (Locust/k6), fault injection testing, latency/throughput benchmarks under realistic concurrent load
  • Cost Optimization & Token Budget Control: Per-request cost tracking, smart model routing (route simple queries to cheaper/smaller models), token consumption dashboards and alerting
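
For readers unfamiliar with the closed/open/half-open model named in the circuit-breaker item, here is a generic sketch of the pattern. It is not taken from ProdBot (this item is still on the roadmap); the class name, thresholds, and injectable `clock` are all illustrative assumptions.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Generic three-state breaker: closed, open, half-open."""

    def __init__(
        self,
        failure_threshold: int = 3,
        reset_timeout: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "half-open"  # timeout elapsed: allow one trial call
        return "open"

    def call(self, fn: Callable[[], T]) -> T:
        if self.state == "open":
            raise RuntimeError("circuit open: call rejected")
        try:
            result = fn()
        except Exception:
            self.failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Any success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Wrapping each LLM provider or MCP tool call in its own breaker instance is what keeps one failing dependency from stalling the others, which is the "cascading failure prevention" the item refers to.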

📂 Project Structure

chatbot-ai-systems-production/
├── src/chatbot_ai_system/
│   ├── config/              # Settings, MCP server config
│   ├── database/            # SQLAlchemy models, session, Redis
│   ├── models/              # Pydantic schemas
│   ├── observability/       # Prometheus metrics
│   ├── orchestrator.py      # 9-phase chat orchestrator
│   ├── providers/           # LLM providers (Ollama, OpenAI, Anthropic, Gemini)
│   ├── repositories/        # DB repositories (conversation, memory)
│   ├── server/              # FastAPI routes, media routes, voice routes
│   ├── services/            # Media pipeline, STT, TTS, embedding
│   └── tools/               # MCP tool registry and client
├── frontend/                # Next.js 14 frontend
├── tests/                   # Red-team, evals, integration, pipeline tests
├── docs/                    # Phase-by-phase documentation
├── docker/                  # Prometheus, Grafana config
├── alembic/                 # Database migrations
└── scripts/                 # Utility scripts

🤝 Contributing

Contributions are welcome! Feel free to open issues and submit pull requests.

License: MIT


Built with ❤️, from first commit to production
