A production-shaped customer support platform where seven specialized AI agents — triage, knowledge base (RAG), order/billing, troubleshooting, escalation, quality review, and analytics — collaborate through a LangGraph workflow instead of one monolithic chatbot prompt.
Built as a portfolio project to demonstrate multi-agent orchestration, tool-calling, RAG, guardrails, and observability end to end — while staying free/cheap to run and easy to demo. See
docs/DEMO_SCRIPT.md for a guided walkthrough and docs/ARCHITECTURE.md for the full system design.
Most "AI customer support" demos are a single LLM call wrapped in a chat UI: one prompt tries to classify intent, look up data, answer policy questions, write code-adjacent troubleshooting steps, decide when to involve a human, and self-police its own tone — all at once. That works for toy demos and breaks down in ways that matter for a real support team: it hallucinates order statuses because nothing forces it to check a database; it can't explain why it routed a ticket a certain way; and there's no natural place to enforce "never promise a refund you haven't issued" without it competing with every other instruction in the prompt.
Splitting the work across specialized agents addresses these failure modes:
- Narrower tool surface per agent: the Knowledge Base Agent can't call issue_refund; the Order/Billing Agent has no access to KB retrieval. Fewer tools per call means fewer chances to pick the wrong one.
- A dedicated checkpoint before anything ships: the Response Quality Agent runs as a separate, final pass so guardrail enforcement isn't just "one more instruction" buried in a giant system prompt.
- Escalation runs after every specialist, not just at triage: a conversation that starts as a routine billing question but turns hostile mid-resolution still gets caught, because the Sentiment & Escalation Agent re-evaluates every turn.
- Auditability: every node logs its input, output, tool calls, and guardrail flags to Postgres (lib/logging/logger.ts), viewable per-conversation in the admin dashboard's Agent Trace. You can answer "why did the bot do that?" by reading a table, not by guessing at a prompt.
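The narrower tool surface described above can be pictured as a per-agent allowlist checked before any tool call is dispatched. This is a minimal sketch, not the repository's code: only issue_refund is named in this README, while the agent keys, the other tool names, and both helper functions are hypothetical.

```typescript
// Hypothetical per-agent tool allowlist. Only `issue_refund` appears in this
// README; `search_kb`, `lookup_order`, and the agent keys are assumptions.
type AgentName = "knowledge_base" | "order_billing";
type ToolName = "search_kb" | "lookup_order" | "issue_refund";

const TOOL_ALLOWLIST: Record<AgentName, readonly ToolName[]> = {
  knowledge_base: ["search_kb"],
  order_billing: ["lookup_order", "issue_refund"],
};

// Returns true only when the agent's allowlist contains the tool.
function isToolAllowed(agent: AgentName, tool: ToolName): boolean {
  return TOOL_ALLOWLIST[agent].includes(tool);
}

// Throws before dispatch, so a mis-selected tool never reaches the backend.
function assertToolAllowed(agent: AgentName, tool: ToolName): void {
  if (!isToolAllowed(agent, tool)) {
    throw new Error(`Agent "${agent}" may not call tool "${tool}"`);
  }
}
```

With a check like this, a Knowledge Base Agent that tries to issue a refund fails loudly instead of silently performing a billing action.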
flowchart TD
START([customer message]) --> Triage[Triage Agent]
Triage -->|knowledge_base| KB[Knowledge Base Agent]
Triage -->|order_billing| Billing[Order/Billing Agent]
Triage -->|troubleshooting| Trouble[Troubleshooting Agent]
Triage -->|escalation| Escalation
KB --> Escalation[Sentiment & Escalation Agent]
Billing --> Escalation
Trouble --> Escalation
Escalation --> Quality[Response Quality Agent]
Quality -->|guardrail violation, 1st time| Quality
Quality -->|clean or 2nd pass| END([response])
Full system diagram, sequencing, and design-decision rationale: docs/ARCHITECTURE.md.
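The routing in the flowchart can be read as a small transition function. The sketch below is plain TypeScript rather than the project's LangGraph wiring; FlowState, nextNode, and the qualityPasses counter are hypothetical names chosen for illustration.

```typescript
// Node names mirror the flowchart; the state shape is an assumption.
type Route = "knowledge_base" | "order_billing" | "troubleshooting" | "escalation";
type NodeName = "triage" | Route | "quality" | "end";

interface FlowState {
  route: Route;                // chosen by the Triage Agent
  guardrailViolation: boolean; // set by the most recent quality pass
  qualityPasses: number;       // quality passes completed so far
}

// Returns the node that should run after `current`.
function nextNode(current: NodeName, state: FlowState): NodeName {
  switch (current) {
    case "triage":
      return state.route;
    case "knowledge_base":
    case "order_billing":
    case "troubleshooting":
      // Every specialist hands off to the Sentiment & Escalation Agent.
      return "escalation";
    case "escalation":
      return "quality";
    case "quality":
      // Regenerate once on a first violation; ship after a clean pass
      // or after the second pass, whichever comes first.
      return state.guardrailViolation && state.qualityPasses < 2 ? "quality" : "end";
    case "end":
      return "end";
  }
}
```

Bounding the retry in the routing logic, rather than in a prompt, is what guarantees the self-loop on the Quality node terminates.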
| Agent | File | Responsibility |
|---|---|---|
| Triage | lib/agents/triage.ts | Classifies intent, category, urgency, sentiment; always runs first |
| Knowledge Base | lib/agents/knowledgeBase.ts | RAG over kb-docs/*.md via pgvector; cites sources; says "I don't know" below a similarity threshold |
| Order/Billing | lib/agents/orderBilling.ts | Tool-calls into mock Postgres CRM/billing data; never free-types an order/invoice fact |
| Troubleshooting | lib/agents/troubleshooting.ts | Step-by-step technical fixes; tracks attempted fixes; asks clarifying questions only when needed |
| Sentiment & Escalation | lib/agents/escalation.ts | Detects anger/legal/VIP/repeat-contact signals; creates a ticket + Slack notification; writes the handoff summary |
| Response Quality | lib/agents/qualityReview.ts | Runs guardrails (lib/guardrails/policies.ts), polishes tone, regenerates once on a violation |
| Analytics | lib/agents/analytics.ts | Batch summary over tickets: category trends, escalation rate, LLM-suggested new FAQ articles |
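The Response Quality Agent's "regenerate once on a violation" behavior can be sketched as a regex check plus a single bounded retry. The pattern, the DraftReply shape, and the function names below are hypothetical; the project's actual rules live in lib/guardrails/policies.ts and may look quite different.

```typescript
// Hypothetical draft shape: the reply text plus the names of tools already called.
interface DraftReply {
  text: string;
  toolCalls: string[];
}

// Illustrative pattern for "we'll issue a refund"-style promises (an assumption).
const REFUND_PROMISE =
  /\b(?:i|we)(?:'ll| will) (?:issue|process|send) (?:you )?(?:a |your )?(?:full )?refund\b/i;

// A refund promise is a violation only when no refund tool call backs it.
function findViolations(draft: DraftReply): string[] {
  const violations: string[] = [];
  const refundIssued = draft.toolCalls.includes("issue_refund");
  if (REFUND_PROMISE.test(draft.text) && !refundIssued) {
    violations.push("unsupported_refund_promise");
  }
  return violations;
}

// Regenerates at most once; the regenerated draft ships without a further loop.
function review(
  draft: DraftReply,
  regenerate: (rejected: DraftReply, violations: string[]) => DraftReply,
): DraftReply {
  const violations = findViolations(draft);
  return violations.length > 0 ? regenerate(draft, violations) : draft;
}
```

Checking the promise against recorded tool calls, rather than against the text alone, lets a legitimate "we'll issue a refund" through when the refund was actually issued.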
- Frontend/Backend: Next.js 14 (App Router) + TypeScript + Tailwind CSS, Next.js API routes as the backend
- Agent orchestration: LangGraph (@langchain/langgraph) with a typed shared state (lib/agents/state.ts)
- LLM: OpenAI gpt-4o-mini (agents) + text-embedding-3-small (RAG)
- Database + vector store: Supabase (Postgres + pgvector)
- Auth: Clerk, protecting /dashboard/* and the admin API routes
- Integrations: Slack Incoming Webhook (escalations), Gmail API via OAuth2 (email-based support, polled), mocked CRM/billing (seeded Postgres tables)
- Observability: structured agent_logs table + console output, viewable per-conversation as an Agent Trace
- Deployment: Vercel (app) + Supabase (DB); see docs/DEPLOYMENT.md
npm install
cp .env.example .env.local   # fill in keys; see docs/DEPLOYMENT.md for where to get each one

Run lib/db/schema.sql in your Supabase SQL editor, then:

npm run seed          # demo customers, orders, invoices, subscriptions
npm run ingest-docs   # embeds kb-docs/*.md into pgvector
npm run dev           # http://localhost:3000

/chat is the customer-facing widget (pick a seeded customer or stay "Guest"). /dashboard is the Clerk-protected admin side (tickets, documents, analytics).
Full step-by-step for every integration (Supabase, OpenAI, Clerk, Slack, Gmail OAuth) and production deployment:
docs/DEPLOYMENT.md.
See .env.example for the full annotated list — OpenAI, Supabase, Clerk, Slack, Gmail/Google OAuth, and app-level
vars (NEXT_PUBLIC_APP_URL, CRON_SECRET).
Full DDL in lib/db/schema.sql: customers, orders, invoices, subscriptions (mock CRM/billing),
conversations, messages (chat history + citations + agent path), tickets (escalation queue), kb_documents /
kb_chunks (RAG, with a match_kb_chunks pgvector similarity-search function), feedback (👍/👎), and
agent_logs (observability).
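The Knowledge Base Agent's "I don't know" behavior pairs naturally with the similarity search over kb_chunks. The sketch below assumes the search returns rows carrying a numeric similarity score; the KbMatch shape, the 0.75 cutoff, and decideFromMatches are illustrative assumptions, not the signature of match_kb_chunks.

```typescript
// Hypothetical row shape for a similarity-search result.
interface KbMatch {
  source: string;     // e.g. the document a chunk came from
  content: string;    // chunk text passed to the LLM as context
  similarity: number; // higher means closer to the query
}

type KbDecision =
  | { kind: "answer"; context: string[]; citations: string[] }
  | { kind: "unknown" };

// Keeps only matches at or above the cutoff; with none left, the agent declines.
function decideFromMatches(matches: KbMatch[], minSimilarity = 0.75): KbDecision {
  const usable = matches
    .filter((m) => m.similarity >= minSimilarity)
    .sort((a, b) => b.similarity - a.similarity);
  if (usable.length === 0) {
    return { kind: "unknown" };
  }
  // De-duplicate sources so each document is cited once.
  const citations = Array.from(new Set(usable.map((m) => m.source)));
  return { kind: "answer", context: usable.map((m) => m.content), citations };
}
```

Making the decline path an explicit return value keeps the "I don't know" decision in code, where it can be tested, instead of leaving it to the model's judgment.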
Seven scenarios from triage through resolution or escalation, narrated step by step in docs/DEMO_SCRIPT.md:
order status, duplicate-charge refund (real tool-backed action), a known troubleshooting fix, subscription
cancellation, a cited KB policy answer, an anger/repeat-contact escalation with a live Slack notification, and a
guardrail catch on an unsupported refund promise.
Add screenshots of /chat and /dashboard here after running the demo locally — see docs/DEMO_SCRIPT.md for
which screens to capture.
eval/fixtures/cases.ts defines 14 cases (7 demo scenarios + 7 adversarial/boundary cases). npm run eval scores
routing accuracy, escalation precision/recall, a hallucination spot-check on KB answers, and an LLM-judge
helpfulness score, then writes docs/EVALUATION.md. See that file for the full methodology and how to regenerate
it with live results.
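Escalation precision and recall follow the standard definitions: precision is the share of predicted escalations that were expected, and recall is the share of expected escalations that were predicted. The sketch below is one way to compute them; the EscalationResult shape and the choice to report 1 for an empty denominator are assumptions, not necessarily what npm run eval does.

```typescript
// Hypothetical per-case outcome: what the fixture expected vs. what the graph did.
interface EscalationResult {
  expected: boolean;
  predicted: boolean;
}

function escalationPrecisionRecall(
  results: EscalationResult[],
): { precision: number; recall: number } {
  let truePositives = 0;
  let falsePositives = 0;
  let falseNegatives = 0;
  for (const r of results) {
    if (r.predicted && r.expected) truePositives++;
    else if (r.predicted && !r.expected) falsePositives++;
    else if (!r.predicted && r.expected) falseNegatives++;
  }
  // Assumed convention: an empty denominator is reported as 1.
  const predictedTotal = truePositives + falsePositives;
  const expectedTotal = truePositives + falseNegatives;
  return {
    precision: predictedTotal === 0 ? 1 : truePositives / predictedTotal,
    recall: expectedTotal === 0 ? 1 : truePositives / expectedTotal,
  };
}
```

Reporting the two numbers separately matters for support: low precision floods human agents with unnecessary tickets, while low recall leaves angry customers with the bot.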
- Supplement the regex-based guardrails with an additional LLM-judge pass to catch paraphrased policy violations.
- Move Gmail from polling to real-time Pub/Sub push once the demo needs lower email-reply latency.
- Add streaming responses in the chat UI (currently request/response) to reduce perceived latency.
- Add per-customer rate limiting and remove/protect the demo-only /api/customers mock-login endpoint.
- Expand the eval suite with a held-out set scored by a human reviewer to calibrate the LLM-judge against.
- Add LangSmith tracing as an optional, richer alternative to the custom agent_logs table for deeper debugging.
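As one possible shape for the per-customer rate limiting item above, here is a minimal in-memory fixed-window limiter. It is a sketch under stated assumptions: the class name, limits, and in-process Map are hypothetical, and a serverless deployment would likely need a shared store instead of process memory.

```typescript
// Hypothetical fixed-window limiter keyed by customer ID.
class FixedWindowRateLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly windows = new Map<string, { windowStart: number; count: number }>();

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request is allowed; `now` is injectable for testing.
  allow(customerId: string, now: number = Date.now()): boolean {
    const current = this.windows.get(customerId);
    if (!current || now - current.windowStart >= this.windowMs) {
      // First request, or the previous window has expired: start a new one.
      this.windows.set(customerId, { windowStart: now, count: 1 });
      return true;
    }
    if (current.count < this.limit) {
      current.count++;
      return true;
    }
    return false;
  }
}
```

A chat route could call allow(customerId) before invoking the agent graph and return HTTP 429 when it yields false.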