An orchestrated FastAPI + LangGraph system for automated inbox triage, contextual drafting, and learning-based response improvement.
This system automates email classification, draft generation, and response workflows through a multi-agent orchestration pipeline. The backend processes incoming messages through sequential agents that classify urgency and intent, gather contextual information, generate response drafts, and learn from human feedback. A React dashboard provides operational visibility into system metrics, processing status, and workflow state.
The project is currently in Phase 4 (Production Polish), with core orchestration, persistence, and analytics implemented. Remaining work focuses on classifier refinement, deployment infrastructure, and extended testing coverage.
The system follows an ingestion-orchestration-persistence-analytics pattern:
- Ingestion: Gmail API utilities fetch messages and parse them into structured records stored in SQLite.
- Orchestration: LangGraph state machine coordinates seven specialized agents:
  - Classifier: Determines urgency, intent, sentiment, and response requirements
  - Context Gatherer: Retrieves thread history, calendar availability, and related messages
  - Research Agent: Performs optional web searches when external information is needed
  - Draft Generator: Produces contextual response drafts
  - Human Review: Interrupts the workflow for approval, editing, or rejection
  - Follow-up Scheduler: Extracts and schedules tasks and reminders
  - Learning Agent: Captures feedback signals for system improvement
- Persistence: SQLAlchemy models store emails, classifications, drafts, feedback, learning signals, research results, tasks, and follow-ups. LangGraph checkpoints maintain execution state for resumability.
- Analytics: Aggregation service computes totals, acceptance rates, latency percentiles, distribution metrics, and recent activity timelines.
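The sequential hand-off between agents can be pictured as a plain-Python sketch. This is an illustration of the ordering only, not the actual LangGraph graph; the function bodies and state keys are invented stand-ins for the real agents:

```python
# Illustrative sketch of the agent pipeline ordering (plain Python, not the
# real LangGraph graph). Each agent reads and extends a shared state dict.

def classifier(state: dict) -> dict:
    # The real agent calls an LLM; here we hard-code a plausible result.
    state["classification"] = {"urgency": "high", "intent": "question"}
    return state

def context_gatherer(state: dict) -> dict:
    state["context"] = {"thread_history": [], "calendar": None}
    return state

def draft_generator(state: dict) -> dict:
    state["draft"] = f"Re: {state['subject']} (auto-draft)"
    return state

PIPELINE = [classifier, context_gatherer, draft_generator]

def run(email: dict) -> dict:
    state = dict(email)
    for agent in PIPELINE:
        state = agent(state)
    return state

result = run({"subject": "Invoice overdue", "body": "..."})
```

In the real system, LangGraph adds conditional routing (e.g., skipping the research agent) and the human-review interrupt on top of this basic sequencing.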
A Vite + React + Tailwind CSS + ShadCN UI dashboard provides:
- Metrics Overview: High-level cards showing inbox volume, human response mix, latency percentiles, and task load
- Activity Timeline: Daily counts of emails received, drafts generated, and approvals over a configurable window
- Workflow Radar: Scrollable table of emails with classification metadata, task badges, latency indicators, and status labels
- Detail Panel: Full context view including classification history, draft versions with user decisions, extracted tasks, follow-ups, research summaries, and live processing status via Server-Sent Events
- Runbook Panel: Manual validation controls for Gmail authorization, batch processing, and email reprocessing
The dashboard proxies API requests to the FastAPI backend and consumes Server-Sent Events streams for real-time workflow updates.
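Server-Sent Events arrive as plain-text frames separated by blank lines. A minimal parser for such frames might look like the following; the event name (`status`) and payload shape are illustrative assumptions, not the backend's actual schema:

```python
# Minimal parser for Server-Sent Events text frames, of the kind streamed by
# GET /api/emails/{email_id}/stream. Event names and payload fields here are
# hypothetical examples.
import json

def parse_sse(raw: str) -> list[dict]:
    events = []
    for frame in raw.strip().split("\n\n"):  # a blank line separates events
        event = {"event": "message", "data": ""}
        for line in frame.splitlines():
            field, _, value = line.partition(":")
            value = value.lstrip(" ")
            if field == "event":
                event["event"] = value
            elif field == "data":
                event["data"] += value
        event["data"] = json.loads(event["data"]) if event["data"] else None
        events.append(event)
    return events

raw = 'event: status\ndata: {"step": "draft_generator", "state": "running"}\n\n'
events = parse_sse(raw)
```

In the dashboard itself the browser's built-in `EventSource` handles this parsing; the sketch only shows what travels over the wire.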
- LangGraph orchestration with seven-agent workflow (classifier, context, research, draft, review, follow-up, learning)
- FastAPI REST endpoints for inbox processing, review loop, and analytics
- SQLite persistence layer with SQLAlchemy models for all workflow entities
- Gmail API integration for email ingestion and thread hydration
- Calendar client for availability context (optional)
- Research agent with Tavily integration for web search
- Draft generation with tone and key points extraction
- Human-in-the-loop review endpoints (approve, edit, reject)
- Learning agent that captures feedback signals and edit diffs
- Analytics service with metrics aggregation (totals, acceptance, latency, distribution, activity)
- React dashboard with metrics overview, activity timeline, email table, and detail panel
- Server-Sent Events streaming for real-time workflow status
- Pytest test suite covering agents, API contracts, workflow execution, and dataset validation
- 50-email test dataset with expected classification labels
- Rule-based classifier fallback enhancement for improved dataset alignment
- Extended analytics (per-sender trends, error rates, agent retry statistics)
- Dashboard filtering and search capabilities
- Production deployment configuration (Docker, CI/CD)
- PostgreSQL migration for production-grade persistence
- SQLite-based LangGraph checkpointing (currently in-memory)
- Authentication and authorization layer
- Secrets management integration (vault-based)
- Enhanced calendar and research resiliency (retries, backoffs, alerts)
- Automated UI testing for dashboard components
- Extended test dataset coverage and validation
- API rate limiting and request throttling
- Structured logging and error monitoring enhancements
- Framework: FastAPI 0.115+
- Orchestration: LangGraph 0.2+, LangChain 0.3+
- Database: SQLite (SQLAlchemy 2.0+)
- Model Providers: Groq (Llama 3 70B), Google Gemini (2.0 Pro, Flash)
- Integrations: Gmail API, Google Calendar API, Tavily Search API
- Utilities: Pydantic 2.9+, Loguru, BeautifulSoup4, html2text
- Build Tool: Vite 7.2+
- Framework: React 18.2+
- Styling: Tailwind CSS 3.4+, ShadCN UI components
- HTTP Client: Axios
- Icons: Lucide React
- Framework: Pytest 7.4+
- Coverage: Unit tests (agents, utilities, database), integration tests (API, workflow), dataset validation
- Python 3.11+ (validated on 3.13)
- Node.js 18+
- API keys for Groq, Gemini, Tavily, and Gmail (optional for local development with stubs)
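For local development, credentials typically live in a `.env` file. A hypothetical layout is shown below; the variable names are assumptions for illustration, not necessarily the keys the project reads:

```shell
# Hypothetical .env layout for local development.
# Variable names are illustrative; check the project's settings module.
GROQ_API_KEY=your-groq-key
GEMINI_API_KEY=your-gemini-key
TAVILY_API_KEY=your-tavily-key
GMAIL_CREDENTIALS_PATH=credentials.json
DATABASE_URL=sqlite:///./app.db
```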
Inbox and Drafting:
- `GET /api/emails`: Paginated list of emails with classification metadata, draft status, latency, and task/follow-up counts
- `GET /api/emails/{email_id}`: Full email detail including body, classification history, draft trail, tasks, follow-ups, and research context
- `POST /api/emails/process`: Trigger ingestion and LangGraph execution (supports `max_results`, `hours`, and `enable_interrupt` parameters)
- `GET /api/emails/{email_id}/stream`: Server-Sent Events stream for real-time workflow progress updates
- `POST /api/emails/{email_id}/reprocess`: Rerun the LangGraph pipeline from a stored email record
Human-in-the-Loop Review:
- `POST /api/review/{thread_id}/approve`: Approve the current draft and resume the workflow
- `POST /api/review/{thread_id}/edit`: Submit edited draft text with optional notes (the learning agent captures the differences)
- `POST /api/review/{thread_id}/reject`: Reject the draft and capture feedback for learning
- `GET /api/review/{thread_id}/status`: Check whether review is pending or completed
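When an edited draft is submitted, the learning agent captures the differences between the generated and the human-edited text. One natural way to represent such a signal is a unified diff via the standard library; this is a sketch of the idea, not necessarily how the project stores it:

```python
# Sketch of capturing an edit diff as a learning signal, using stdlib difflib.
# The real learning agent's storage format is not shown here; this only
# illustrates the "edit differences" idea behind the review edit endpoint.
import difflib

def edit_diff(original: str, edited: str) -> list[str]:
    return list(difflib.unified_diff(
        original.splitlines(), edited.splitlines(),
        fromfile="draft", tofile="edited", lineterm="",
    ))

diff = edit_diff(
    "Hi team,\nThe report is attached.\nBest,",
    "Hi team,\nThe final report is attached.\nThanks,",
)
```

Storing diffs rather than full before/after texts keeps the signal compact while preserving exactly which phrasing the reviewer changed.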
Analytics:
- `GET /api/analytics/summary`: System-wide metrics including totals, acceptance/edit/reject breakdown, latency percentiles (average, p50, p95), open tasks and follow-ups, distribution by urgency/intent/sentiment, and a recent activity timeline (configurable `days` parameter)
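The latency figures in the summary are simple order statistics over per-email processing times. A minimal nearest-rank computation is sketched below; the analytics service may use a different interpolation method, so treat the numbers as illustrative:

```python
# Nearest-rank percentile over per-email processing latencies (seconds).
# Sketches how avg/p50/p95 figures like those in the analytics summary are
# derived; the actual service may interpolate differently.
import math

def percentile(values: list[float], p: float) -> float:
    ordered = sorted(values)
    rank = math.ceil(p / 100 * len(ordered))  # nearest-rank method
    return ordered[max(rank - 1, 0)]

latencies = [1.2, 0.8, 3.5, 2.1, 0.9, 7.4, 1.6, 2.8, 1.1, 4.0]
summary = {
    "avg": sum(latencies) / len(latencies),
    "p50": percentile(latencies, 50),
    "p95": percentile(latencies, 95),
}
```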
Gmail Integration:
- `GET /api/emails/gmail/status`: Check Gmail OAuth token presence and expiry
- `POST /api/emails/gmail/authorize`: Initiate the Gmail OAuth authorization flow
The dashboard provides an operational control room interface for monitoring system behavior:
- Metrics Overview: Four-card layout showing inbox volume (emails, classifications, drafts), human response mix (approval/edit/reject rates and counts), latency percentiles (average, p50, p95), and task load (open tasks, scheduled follow-ups, learning signals)
- Recent Activity Timeline: Sparkline-style visualization of daily email receipts, draft generations, and approvals over a 7-day window (configurable)
- Workflow Radar: Sortable table displaying email summaries with urgency badges, classification confidence, task/follow-up indicators, processing latency, and status labels (`awaiting_draft`, `awaiting_review`, `review_approve`, etc.). Row selection opens the detail panel.
- Detail Panel: Comprehensive view of the selected email including classification snapshot with reasoning, draft history with user actions and timestamps, extracted tasks with due dates and priorities, scheduled follow-ups with trigger dates, research summary and metadata, audit metrics (processing start, agent path, errors), and a live SSE status indicator
- Runbook Panel: Manual validation controls for Gmail authorization status checks, batch processing with presets (single recent email, small batch, etc.), and email reprocessing without re-fetching from Gmail
The test suite includes:
- Unit Tests: Agent logic (`tests/agents/`), utility functions (`tests/utils/`), database models (`tests/database/`)
- Integration Tests: Workflow-level graph execution and routing (`tests/workflow/test_graph_flow.py`), API endpoints (`tests/api/`), end-to-end scenarios (`tests/e2e/`)
- Dataset Validation: 50-email test dataset (`tests/data/test_dataset_emails.json`) with expected classification labels covering all urgency levels, intent categories, and sentiment classes
Known Test Status: `test_rule_based_classifier_alignment` is marked `xfail` (`strict=False`) because the rule-based fallback requires enhanced heuristics to match the gold labels. This will be addressed in the classifier upgrade phase.
Dashboard Testing: Currently exercised manually. Automated UI tests can be added as a follow-up task.
- Rule-Based Classifier Enhancement: Upgrade `app/agents/classifier_fallback.py` with richer keyword matching, weighted heuristics, and improved alignment with the test dataset. Remove the `xfail` marker after completion.
- Analytics Enrichment: Extend the metrics service to track per-sender trends, error rates by agent, retry statistics, and commitment vs. completion timelines.
- Dashboard Improvements: Add filtering by urgency, intent, pending review status, and sender. Improve SSE failure handling and implement proper API authentication and pagination.
- Containerization: Create Dockerfile and docker-compose configuration for local development and deployment environments.
- CI/CD Pipeline: Configure GitHub Actions for automated testing (backend `pytest` and dashboard `npm run build`), linting, and type checking.
- Database Migration: Migrate from SQLite to PostgreSQL for production-grade persistence, concurrent access, and advanced query capabilities.
- Checkpoint Persistence: Implement SQLite-based LangGraph checkpointing to replace in-memory storage, enabling workflow resumability across restarts.
- Authentication Layer: Introduce API key or OAuth-based authentication for production deployments.
- Secrets Management: Integrate with a secrets vault (e.g., HashiCorp Vault, AWS Secrets Manager) for secure credential storage.
- Resiliency Improvements: Add retry logic with exponential backoff for calendar and research API calls, implement alerting for persistent failures, and enhance error recovery mechanisms.
- Extended Dataset: Expand the test dataset beyond 50 emails to cover edge cases, domain-specific scenarios, and multi-language support.
- Learning System Evaluation: Implement metrics to measure learning agent effectiveness over time, track improvement in draft acceptance rates, and validate feedback signal quality.
- Performance Optimization: Profile agent execution times, optimize prompt templates, implement caching strategies for repeated context queries, and reduce latency through parallel agent execution where possible.
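The classifier enhancement item above calls for weighted keyword heuristics. One possible starting shape is sketched here; the keywords, weights, and thresholds are invented for illustration and are not taken from `app/agents/classifier_fallback.py`:

```python
# Hedged sketch of a weighted-keyword urgency heuristic, in the spirit of the
# planned classifier fallback upgrade. All keywords, weights, and thresholds
# are illustrative assumptions.
URGENCY_WEIGHTS = {
    "urgent": 3.0, "asap": 3.0, "immediately": 2.5,
    "deadline": 2.0, "overdue": 2.0, "today": 1.5,
    "reminder": 0.5, "newsletter": -2.0,  # negative weight suppresses bulk mail
}

def score_urgency(subject: str, body: str) -> str:
    text = f"{subject} {body}".lower()
    score = sum(w for kw, w in URGENCY_WEIGHTS.items() if kw in text)
    if score >= 3.0:
        return "high"
    if score >= 1.5:
        return "medium"
    return "low"

label = score_urgency("Invoice overdue", "Please pay immediately.")
```

Weights and thresholds like these could then be tuned directly against the 50-email gold dataset until the `xfail`-marked alignment test passes.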
- Rule-Based Classifier: The current fallback implementation uses basic keyword matching and does not align with the test dataset expectations. Marked `xfail` in validation tests until enhanced.
- Security and Authentication: Development mode exposes APIs without authentication. An authentication layer is required before production deployment.
- Secrets Management: Environment variables stored in a `.env` file are suitable for local development only. Production requires integration with a secrets management system.
- Calendar and Research Resiliency: Failures in calendar or research API calls log warnings and silently disable the affected feature. Consider implementing retries, backoffs, and alerting for persistent failures.
- Checkpoint Storage: LangGraph checkpoints currently use in-memory storage, so workflow state is not persisted across application restarts.
- Dashboard Testing: Frontend components are validated manually; an automated UI testing framework is not yet implemented.
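The resiliency limitation above suggests retries with exponential backoff for flaky calendar and research calls. A minimal sketch of that pattern follows; the attempt counts and delays are illustrative defaults, not project settings:

```python
# Sketch of retry-with-exponential-backoff for transient API failures.
# Attempt counts and delays are illustrative, not the project's settings.
import time

def with_retries(fn, attempts: int = 4, base_delay: float = 0.5):
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # exhausted: surface the error for alerting
            time.sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...

# Simulated flaky dependency: fails twice, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("transient")
    return "ok"

result = with_retries(flaky, base_delay=0.01)
```

Re-raising on the final attempt (rather than swallowing the error) is what makes the persistent-failure alerting the roadmap describes possible.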
- Backend & Orchestration: `app/agents/`, `app/services/`, `app/api/`, `app/database/`
- Dashboard: `dashboard/` (Vite/React, Tailwind CSS, ShadCN UI)
- Testing & Dataset: `tests/` and `tests/data/test_dataset_emails.json`
- Architecture Documentation: `claude_response.md` (long-form design blueprint)
This project is open to collaboration and feedback. For issues, questions, or contributions, please open a GitHub issue or contact the maintainers. The README serves as the primary reference for architecture, setup, testing, and outstanding work items.