Limeload/customer-support-agent

Multi-Agent AI Customer Support System

A production-shaped customer support platform where seven specialized AI agents — triage, knowledge base (RAG), order/billing, troubleshooting, escalation, quality review, and analytics — collaborate through a LangGraph workflow instead of one monolithic chatbot prompt.

Built as a portfolio project to demonstrate multi-agent orchestration, tool-calling, RAG, guardrails, and observability end to end — while staying free/cheap to run and easy to demo. See docs/DEMO_SCRIPT.md for a guided walkthrough and docs/ARCHITECTURE.md for the full system design.

Problem statement

Most "AI customer support" demos are a single LLM call wrapped in a chat UI: one prompt tries to classify intent, look up data, answer policy questions, write code-adjacent troubleshooting steps, decide when to involve a human, and self-police its own tone — all at once. That works for toy demos and breaks down in ways that matter for a real support team: it hallucinates order statuses because nothing forces it to check a database; it can't explain why it routed a ticket a certain way; and there's no natural place to enforce "never promise a refund you haven't issued" without it competing with every other instruction in the prompt.

Why multi-agent beats a single chatbot here

  • Narrower tool surface per agent — the Knowledge Base Agent can't call issue_refund; the Order/Billing Agent has no access to KB retrieval. Fewer tools per call means fewer chances to pick the wrong one.
  • A dedicated checkpoint before anything ships — the Response Quality Agent runs as a separate, final pass so guardrail enforcement isn't just "one more instruction" buried in a giant system prompt.
  • Escalation runs after every specialist, not just at triage — a conversation that starts as a routine billing question but turns hostile mid-resolution still gets caught, because the Sentiment & Escalation Agent re-evaluates every turn.
  • Auditability — every node logs its input/output/tool-calls/guardrail-flags to Postgres (lib/logging/logger.ts), viewable per-conversation in the admin dashboard's Agent Trace. You can answer "why did the bot do that?" by reading a table, not by guessing at a prompt.
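The narrower tool surface described above can be sketched as a per-agent allowlist. This is a minimal illustration, not the repository's implementation: only `issue_refund` is named in this README, while `search_kb`, `get_order_status`, `toolAllowlist`, and `callTool` are hypothetical names chosen for the example.

```typescript
// Hypothetical sketch of per-agent tool scoping. Only `issue_refund` appears
// in the README; every other name here is an illustrative assumption.
type AgentName = "knowledge_base" | "order_billing";

type Tool = { name: string; run: (args: Record<string, string>) => string };

const allTools: Record<string, Tool> = {
  search_kb: { name: "search_kb", run: (a) => `kb results for ${a.query}` },
  get_order_status: { name: "get_order_status", run: (a) => `status of ${a.orderId}` },
  issue_refund: { name: "issue_refund", run: (a) => `refund issued for ${a.invoiceId}` },
};

// Each agent sees only the tools listed for it.
const toolAllowlist: Record<AgentName, readonly string[]> = {
  knowledge_base: ["search_kb"],
  order_billing: ["get_order_status", "issue_refund"],
};

function toolsFor(agent: AgentName): Tool[] {
  return toolAllowlist[agent].map((name) => allTools[name]);
}

// Rejects any call to a tool outside the calling agent's allowlist.
function callTool(agent: AgentName, toolName: string, args: Record<string, string>): string {
  const tool = toolsFor(agent).find((t) => t.name === toolName);
  if (!tool) {
    throw new Error(`${agent} is not allowed to call ${toolName}`);
  }
  return tool.run(args);
}
```

With this shape, a refund request routed to the wrong agent fails loudly at the tool boundary instead of depending on prompt wording.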

Architecture at a glance

```mermaid
flowchart TD
    START([customer message]) --> Triage[Triage Agent]
    Triage -->|knowledge_base| KB[Knowledge Base Agent]
    Triage -->|order_billing| Billing[Order/Billing Agent]
    Triage -->|troubleshooting| Trouble[Troubleshooting Agent]
    Triage -->|escalation| Escalation
    KB --> Escalation[Sentiment & Escalation Agent]
    Billing --> Escalation
    Trouble --> Escalation
    Escalation --> Quality[Response Quality Agent]
    Quality -->|guardrail violation, 1st time| Quality
    Quality -->|clean or 2nd pass| END([response])
```

Full system diagram, sequencing, and design-decision rationale: docs/ARCHITECTURE.md.
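The routing in the flowchart can be read as a small transition function. The sketch below models it in plain TypeScript without importing LangGraph, so it is an approximation of the control flow rather than the project's graph definition. `GraphState`, `nextNode`, and `tracePath` are hypothetical names; the route labels come from the diagram.

```typescript
// Plain-TypeScript model of the flowchart's edges (assumption: the real graph
// is built with @langchain/langgraph; this sketch only mirrors its routing).
type Route = "knowledge_base" | "order_billing" | "troubleshooting" | "escalation";
type NodeName = "triage" | Route | "quality" | "end";

interface GraphState {
  route: Route;          // chosen by the Triage Agent
  violation: boolean;    // result of the latest guardrail check
  qualityPasses: number; // completed Response Quality passes
}

function nextNode(current: NodeName, state: GraphState): NodeName {
  switch (current) {
    case "triage":
      return state.route;
    case "knowledge_base":
    case "order_billing":
    case "troubleshooting":
      return "escalation"; // every specialist is followed by escalation
    case "escalation":
      return "quality";
    case "quality":
      // Regenerate once on a violation; a clean result or a second pass ends the run.
      return state.violation && state.qualityPasses < 2 ? "quality" : "end";
    case "end":
      return "end";
  }
}

// Walks the graph and returns the visited nodes. `violatesOnPass` stands in
// for the guardrail check on each quality pass (1-based).
function tracePath(route: Route, violatesOnPass: (pass: number) => boolean): NodeName[] {
  const state: GraphState = { route, violation: false, qualityPasses: 0 };
  const path: NodeName[] = [];
  let current: NodeName = "triage";
  while (current !== "end") {
    path.push(current);
    if (current === "quality") {
      state.qualityPasses += 1;
      state.violation = violatesOnPass(state.qualityPasses);
    }
    current = nextNode(current, state);
  }
  return path;
}
```

For example, `tracePath("knowledge_base", () => true)` visits the quality node exactly twice before ending, which matches the "1st time" and "2nd pass" edge labels.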

Agent responsibilities

| Agent | File | Responsibility |
| --- | --- | --- |
| Triage | lib/agents/triage.ts | Classifies intent, category, urgency, and sentiment; always runs first |
| Knowledge Base | lib/agents/knowledgeBase.ts | RAG over kb-docs/*.md via pgvector; cites sources; says "I don't know" below a similarity threshold |
| Order/Billing | lib/agents/orderBilling.ts | Tool-calls into mock Postgres CRM/billing data; never free-types an order/invoice fact |
| Troubleshooting | lib/agents/troubleshooting.ts | Step-by-step technical fixes; tracks attempted fixes; asks clarifying questions only when needed |
| Sentiment & Escalation | lib/agents/escalation.ts | Detects anger/legal/VIP/repeat-contact signals; creates a ticket + Slack notification; writes the handoff summary |
| Response Quality | lib/agents/qualityReview.ts | Runs guardrails (lib/guardrails/policies.ts), polishes tone, regenerates once on a violation |
| Analytics | lib/agents/analytics.ts | Batch summary over tickets: category trends, escalation rate, LLM-suggested new FAQ articles |
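The Knowledge Base Agent's "I don't know" behavior can be illustrated with an in-memory stand-in. In the project the similarity search runs in Postgres through pgvector, so the sketch below only mirrors the idea. The threshold value `0.75`, the `Chunk` shape, and the `retrieve` helper are assumptions made for this example.

```typescript
// In-memory illustration of threshold-gated retrieval. Assumptions: the 0.75
// cutoff, the Chunk shape, and the function names are hypothetical.
interface Chunk {
  source: string;      // e.g. a file under kb-docs/
  text: string;
  embedding: number[];
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

const SIMILARITY_THRESHOLD = 0.75; // assumed value, not taken from the repo

type KbAnswer = { kind: "answer"; citations: string[] } | { kind: "unknown" };

// Keeps only chunks at or above the threshold; with none left, the agent
// should decline to answer instead of guessing.
function retrieve(queryEmbedding: number[], chunks: Chunk[], topK = 3): KbAnswer {
  const scored = chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(queryEmbedding, chunk.embedding) }))
    .filter((s) => s.score >= SIMILARITY_THRESHOLD)
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
  if (scored.length === 0) return { kind: "unknown" };
  return { kind: "answer", citations: scored.map((s) => s.chunk.source) };
}
```

Returning a tagged `unknown` result keeps the refusal decision in code, so the LLM is never asked to answer from weak matches.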

Tech stack

  • Frontend/Backend: Next.js 14 (App Router) + TypeScript + Tailwind CSS, with Next.js API routes serving as the backend
  • Agent orchestration: LangGraph (@langchain/langgraph) with a typed shared state (lib/agents/state.ts)
  • LLM: OpenAI gpt-4o-mini (agents) + text-embedding-3-small (RAG)
  • Database + vector store: Supabase (Postgres + pgvector)
  • Auth: Clerk, protecting /dashboard/* and the admin API routes
  • Integrations: Slack Incoming Webhook (escalations), Gmail API via OAuth2 (email-based support, polled), mocked CRM/billing (seeded Postgres tables)
  • Observability: structured agent_logs table + console output, viewable per-conversation as an Agent Trace
  • Deployment: Vercel (app) + Supabase (DB) — see docs/DEPLOYMENT.md
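To make the observability item concrete, here is a rough sketch of how structured log rows could be turned into a per-conversation Agent Trace. The field names are guesses based on what this README says each node logs (input, output, tool calls, guardrail flags); the actual `agent_logs` columns are defined in lib/db/schema.sql and may differ. `AgentLogEntry` and `agentTrace` are hypothetical names.

```typescript
// Hypothetical row shape; the real agent_logs columns may be named differently.
interface AgentLogEntry {
  conversationId: string;
  agent: string;
  input: string;
  output: string;
  toolCalls: string[];
  guardrailFlags: string[];
  createdAt: string; // ISO 8601 timestamp in UTC, so string order equals time order
}

// Builds a chronological, human-readable trace for one conversation.
function agentTrace(logs: AgentLogEntry[], conversationId: string): string[] {
  return logs
    .filter((l) => l.conversationId === conversationId) // filter() copies, so sort() leaves `logs` untouched
    .sort((a, b) => (a.createdAt < b.createdAt ? -1 : a.createdAt > b.createdAt ? 1 : 0))
    .map((l) => {
      const tools = l.toolCalls.length > 0 ? ` tools=[${l.toolCalls.join(", ")}]` : "";
      const flags = l.guardrailFlags.length > 0 ? ` flags=[${l.guardrailFlags.join(", ")}]` : "";
      return `${l.createdAt} ${l.agent}${tools}${flags}`;
    });
}
```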

Setup

```sh
npm install
cp .env.example .env.local      # fill in keys; see docs/DEPLOYMENT.md for where to get each one
```

Run lib/db/schema.sql in your Supabase SQL editor, then:

```sh
npm run seed          # demo customers, orders, invoices, subscriptions
npm run ingest-docs   # embeds kb-docs/*.md into pgvector
npm run dev           # http://localhost:3000
```

/chat is the customer-facing widget (pick a seeded customer or stay "Guest"). /dashboard is the Clerk-protected admin side (tickets, documents, analytics).

Full step-by-step for every integration (Supabase, OpenAI, Clerk, Slack, Gmail OAuth) and production deployment: docs/DEPLOYMENT.md.

Environment variables

See .env.example for the full annotated list — OpenAI, Supabase, Clerk, Slack, Gmail/Google OAuth, and app-level vars (NEXT_PUBLIC_APP_URL, CRON_SECRET).

Database schema

Full DDL in lib/db/schema.sql: customers, orders, invoices, subscriptions (mock CRM/billing), conversations, messages (chat history + citations + agent path), tickets (escalation queue), kb_documents / kb_chunks (RAG, with a match_kb_chunks pgvector similarity-search function), feedback (👍/👎), and agent_logs (observability).

Demo flow

Seven scenarios from triage through resolution or escalation, narrated step by step in docs/DEMO_SCRIPT.md: order status, duplicate-charge refund (real tool-backed action), a known troubleshooting fix, subscription cancellation, a cited KB policy answer, an anger/repeat-contact escalation with a live Slack notification, and a guardrail catch on an unsupported refund promise.

Screenshots

Add screenshots of /chat and /dashboard here after running the demo locally — see docs/DEMO_SCRIPT.md for which screens to capture.

Evaluation

eval/fixtures/cases.ts defines 14 cases (7 demo scenarios + 7 adversarial/boundary cases). npm run eval scores routing accuracy, escalation precision/recall, a hallucination spot-check on KB answers, and an LLM-judge helpfulness score, then writes docs/EVALUATION.md. See that file for the full methodology and how to regenerate it with live results.
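For readers unfamiliar with the metrics, routing accuracy and escalation precision/recall can be computed roughly as follows. This is a generic sketch of the standard definitions, not the code behind `npm run eval`; the `EvalResult` shape and function names are hypothetical.

```typescript
// Generic metric definitions; the EvalResult shape is an assumption.
interface EvalResult {
  expectedRoute: string;
  actualRoute: string;
  shouldEscalate: boolean;
  didEscalate: boolean;
}

// Fraction of cases routed to the expected agent.
function routingAccuracy(results: EvalResult[]): number {
  if (results.length === 0) return 0;
  const correct = results.filter((r) => r.expectedRoute === r.actualRoute).length;
  return correct / results.length;
}

// Precision: of the escalations made, how many were warranted.
// Recall: of the warranted escalations, how many were made.
function escalationPrecisionRecall(results: EvalResult[]): { precision: number; recall: number } {
  let truePositives = 0;
  let falsePositives = 0;
  let falseNegatives = 0;
  for (const r of results) {
    if (r.didEscalate && r.shouldEscalate) truePositives += 1;
    else if (r.didEscalate && !r.shouldEscalate) falsePositives += 1;
    else if (!r.didEscalate && r.shouldEscalate) falseNegatives += 1;
  }
  // Report 0 when a denominator is empty (a convention chosen for this sketch).
  const precision = truePositives + falsePositives === 0 ? 0 : truePositives / (truePositives + falsePositives);
  const recall = truePositives + falseNegatives === 0 ? 0 : truePositives / (truePositives + falseNegatives);
  return { precision, recall };
}
```

Low precision means the system escalates too eagerly; low recall means it misses customers who needed a human.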

Future improvements

  • Replace the regex-based guardrails with an additional LLM-judge pass to catch paraphrased policy violations.
  • Move Gmail from polling to real-time Pub/Sub push once the demo needs lower email-reply latency.
  • Add streaming responses in the chat UI (currently request/response) to reduce perceived latency.
  • Add per-customer rate limiting and remove/protect the demo-only /api/customers mock-login endpoint.
  • Expand the eval suite with a held-out set scored by a human reviewer to calibrate the LLM-judge against.
  • Add LangSmith tracing as an optional, richer alternative to the custom agent_logs table for deeper debugging.
