An orchestrated FastAPI + LangGraph system for automated inbox triage, contextual drafting, and learning-based response improvement.
This system automates email classification, draft generation, and response workflows through a multi-agent orchestration pipeline. The backend processes incoming messages through sequential agents that classify urgency and intent, gather contextual information, generate response drafts, and learn from human feedback. A React dashboard provides operational visibility into system metrics, processing status, and workflow state.
The project is currently in Phase 4 (Production Polish), with core orchestration, persistence, and analytics implemented. Remaining work focuses on classifier refinement, deployment infrastructure, and extended testing coverage.
The system follows an ingestion-orchestration-persistence-analytics pattern:
- Ingestion: Gmail API utilities fetch messages and parse them into structured records stored in SQLite.
- Orchestration: LangGraph state machine coordinates seven specialized agents:
  - Classifier: Determines urgency, intent, sentiment, and response requirements
  - Context Gatherer: Retrieves thread history, calendar availability, and related messages
  - Research Agent: Performs optional web searches when external information is needed
  - Draft Generator: Produces contextual response drafts
  - Human Review: Interrupts the workflow for approval, editing, or rejection
  - Follow-up Scheduler: Extracts and schedules tasks and reminders
  - Learning Agent: Captures feedback signals for system improvement
- Persistence: SQLAlchemy models store emails, classifications, drafts, feedback, learning signals, research results, tasks, and follow-ups. LangGraph checkpoints maintain execution state for resumability.
- Analytics: Aggregation service computes totals, acceptance rates, latency percentiles, distribution metrics, and recent activity timelines.
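The sequential hand-off between agents can be pictured as a plain-Python sketch. This is an illustration of the ordering only, not the actual LangGraph graph; the function bodies and state keys are invented stand-ins for the real agents:

```python
# Illustrative sketch of the agent pipeline ordering (plain Python, not the
# real LangGraph graph). Each agent reads and extends a shared state dict.

def classifier(state: dict) -> dict:
    # The real agent calls an LLM; here we hard-code a plausible result.
    state["classification"] = {"urgency": "high", "intent": "question"}
    return state

def context_gatherer(state: dict) -> dict:
    state["context"] = {"thread_history": [], "calendar": None}
    return state

def draft_generator(state: dict) -> dict:
    state["draft"] = f"Re: {state['subject']} (auto-draft)"
    return state

PIPELINE = [classifier, context_gatherer, draft_generator]

def run(email: dict) -> dict:
    state = dict(email)
    for agent in PIPELINE:
        state = agent(state)
    return state

result = run({"subject": "Invoice overdue", "body": "..."})
```

In the real system, LangGraph adds conditional routing (e.g., skipping the research agent) and the human-review interrupt on top of this basic sequencing.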
A Vite + React + Tailwind CSS + ShadCN UI dashboard provides:
- Metrics Overview: High-level cards showing inbox volume, human response mix, latency percentiles, and task load
- Activity Timeline: Daily counts of emails received, drafts generated, and approvals over a configurable window
- Workflow Radar: Scrollable table of emails with classification metadata, task badges, latency indicators, and status labels
- Detail Panel: Full context view including classification history, draft versions with user decisions, extracted tasks, follow-ups, research summaries, and live processing status via Server-Sent Events
- Runbook Panel: Manual validation controls for Gmail authorization, batch processing, and email reprocessing
The dashboard proxies API requests to the FastAPI backend and consumes Server-Sent Events streams for real-time workflow updates.
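Server-Sent Events arrive as plain-text frames separated by blank lines. A minimal parser for such frames might look like the following; the event name (`status`) and payload shape are illustrative assumptions, not the backend's actual schema:

```python
# Minimal parser for Server-Sent Events text frames, of the kind streamed by
# GET /api/emails/{email_id}/stream. Event names and payload fields here are
# hypothetical examples.
import json

def parse_sse(raw: str) -> list[dict]:
    events = []
    for frame in raw.strip().split("\n\n"):  # a blank line separates events
        event = {"event": "message", "data": ""}
        for line in frame.splitlines():
            field, _, value = line.partition(":")
            value = value.lstrip(" ")
            if field == "event":
                event["event"] = value
            elif field == "data":
                event["data"] += value
        event["data"] = json.loads(event["data"]) if event["data"] else None
        events.append(event)
    return events

raw = 'event: status\ndata: {"step": "draft_generator", "state": "running"}\n\n'
events = parse_sse(raw)
```

In the dashboard itself the browser's built-in `EventSource` handles this parsing; the sketch only shows what travels over the wire.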
- LangGraph orchestration with seven-agent workflow (classifier, context, research, draft, review, follow-up, learning)
- FastAPI REST endpoints for inbox processing, review loop, and analytics
- SQLite persistence layer with SQLAlchemy models for all workflow entities
- Gmail API integration for email ingestion and thread hydration
- Calendar client for availability context (optional)
- Research agent with Tavily integration for web search
- Draft generation with tone and key points extraction
- Human-in-the-loop review endpoints (approve, edit, reject)
- Learning agent that captures feedback signals and edit diffs
- Analytics service with metrics aggregation (totals, acceptance, latency, distribution, activity)
- React dashboard with metrics overview, activity timeline, email table, and detail panel
- Server-Sent Events streaming for real-time workflow status
- Pytest test suite covering agents, API contracts, workflow execution, and dataset validation
- 50-email test dataset with expected classification labels
- Rule-based classifier fallback enhancement for improved dataset alignment
- Extended analytics (per-sender trends, error rates, agent retry statistics)
- Dashboard filtering and search capabilities
- Production deployment configuration (Docker, CI/CD)
- PostgreSQL migration for production-grade persistence
- SQLite-based LangGraph checkpointing (currently in-memory)
- Authentication and authorization layer
- Secrets management integration (vault-based)
- Enhanced calendar and research resiliency (retries, backoffs, alerts)
- Automated UI testing for dashboard components
- Extended test dataset coverage and validation
- API rate limiting and request throttling
- Structured logging and error monitoring enhancements
- Framework: FastAPI 0.115+
- Orchestration: LangGraph 0.2+, LangChain 0.3+
- Database: SQLite (SQLAlchemy 2.0+)
- Model Providers: Groq (Llama 3 70B), Google Gemini (2.0 Pro, Flash)
- Integrations: Gmail API, Google Calendar API, Tavily Search API
- Utilities: Pydantic 2.9+, Loguru, BeautifulSoup4, html2text
- Build Tool: Vite 7.2+
- Framework: React 18.2+
- Styling: Tailwind CSS 3.4+, ShadCN UI components
- HTTP Client: Axios
- Icons: Lucide React
- Framework: Pytest 7.4+
- Coverage: Unit tests (agents, utilities, database), integration tests (API, workflow), dataset validation
- Python 3.11+ (validated on 3.13)
- Node.js 18+
- API keys for Groq, Gemini, Tavily, and Gmail (optional for local development with stubs)
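For local development, credentials typically live in a `.env` file. A hypothetical layout is shown below; the variable names are assumptions for illustration, not necessarily the keys the project reads:

```shell
# Hypothetical .env layout for local development.
# Variable names are illustrative; check the project's settings module.
GROQ_API_KEY=your-groq-key
GEMINI_API_KEY=your-gemini-key
TAVILY_API_KEY=your-tavily-key
GMAIL_CREDENTIALS_PATH=credentials.json
DATABASE_URL=sqlite:///./app.db
```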
Inbox and Drafting:
- `GET /api/emails`: Paginated list of emails with classification metadata, draft status, latency, and task/follow-up counts
- `GET /api/emails/{email_id}`: Full email detail including body, classification history, draft trail, tasks, follow-ups, and research context
- `POST /api/emails/process`: Trigger ingestion and LangGraph execution (supports `max_results`, `hours`, and `enable_interrupt` parameters)
- `GET /api/emails/{email_id}/stream`: Server-Sent Events stream for real-time workflow progress updates
- `POST /api/emails/{email_id}/reprocess`: Rerun the LangGraph pipeline from a stored email record
Human-in-the-Loop Review:
- `POST /api/review/{thread_id}/approve`: Approve the current draft and resume the workflow
- `POST /api/review/{thread_id}/edit`: Submit edited draft text with optional notes (the learning agent captures the differences)
- `POST /api/review/{thread_id}/reject`: Reject the draft and capture feedback for learning
- `GET /api/review/{thread_id}/status`: Check whether review is pending or completed
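When an edited draft is submitted, the learning agent captures the differences between the generated and the human-edited text. One natural way to represent such a signal is a unified diff via the standard library; this is a sketch of the idea, not necessarily how the project stores it:

```python
# Sketch of capturing an edit diff as a learning signal, using stdlib difflib.
# The real learning agent's storage format is not shown here; this only
# illustrates the "edit differences" idea behind the review edit endpoint.
import difflib

def edit_diff(original: str, edited: str) -> list[str]:
    return list(difflib.unified_diff(
        original.splitlines(), edited.splitlines(),
        fromfile="draft", tofile="edited", lineterm="",
    ))

diff = edit_diff(
    "Hi team,\nThe report is attached.\nBest,",
    "Hi team,\nThe final report is attached.\nThanks,",
)
```

Storing diffs rather than full before/after texts keeps the signal compact while preserving exactly which phrasing the reviewer changed.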
Analytics:
- `GET /api/analytics/summary`: System-wide metrics including totals, acceptance/edit/reject breakdown, latency percentiles (average, p50, p95), open tasks and follow-ups, distribution by urgency/intent/sentiment, and a recent activity timeline (configurable `days` parameter)
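The latency figures in the summary are simple order statistics over per-email processing times. A minimal nearest-rank computation is sketched below; the analytics service may use a different interpolation method, so treat the numbers as illustrative:

```python
# Nearest-rank percentile over per-email processing latencies (seconds).
# Sketches how avg/p50/p95 figures like those in the analytics summary are
# derived; the actual service may interpolate differently.
import math

def percentile(values: list[float], p: float) -> float:
    ordered = sorted(values)
    rank = math.ceil(p / 100 * len(ordered))  # nearest-rank method
    return ordered[max(rank - 1, 0)]

latencies = [1.2, 0.8, 3.5, 2.1, 0.9, 7.4, 1.6, 2.8, 1.1, 4.0]
summary = {
    "avg": sum(latencies) / len(latencies),
    "p50": percentile(latencies, 50),
    "p95": percentile(latencies, 95),
}
```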
Gmail Integration:
- `GET /api/emails/gmail/status`: Check Gmail OAuth token presence and expiry
- `POST /api/emails/gmail/authorize`: Initiate the Gmail OAuth authorization flow
The dashboard provides an operational control room interface for monitoring system behavior:
- Metrics Overview: Four-card layout showing inbox volume (emails, classifications, drafts), human response mix (approval/edit/reject rates and counts), latency percentiles (average, p50, p95), and task load (open tasks, scheduled follow-ups, learning signals)
- Recent Activity Timeline: Sparkline-style visualization of daily email receipts, draft generations, and approvals over a 7-day window (configurable)
- Workflow Radar: Sortable table displaying email summaries with urgency badges, classification confidence, task/follow-up indicators, processing latency, and status labels (`awaiting_draft`, `awaiting_review`, `review_approve`, etc.). Row selection opens the detail panel.
- Detail Panel: Comprehensive view of the selected email including classification snapshot with reasoning, draft history with user actions and timestamps, extracted tasks with due dates and priorities, scheduled follow-ups with trigger dates, research summary and metadata, audit metrics (processing start, agent path, errors), and a live SSE status indicator
- Runbook Panel: Manual validation controls for Gmail authorization status checks, batch processing with presets (single recent email, small batch, etc.), and email reprocessing without re-fetching from Gmail
The test suite includes:
- Unit Tests: Agent logic (`tests/agents/`), utility functions (`tests/utils/`), database models (`tests/database/`)
- Integration Tests: Workflow-level graph execution and routing (`tests/workflow/test_graph_flow.py`), API endpoints (`tests/api/`), end-to-end scenarios (`tests/e2e/`)
- Dataset Validation: 50-email test dataset (`tests/data/test_dataset_emails.json`) with expected classification labels covering all urgency levels, intent categories, and sentiment classes
Known Test Status: `test_rule_based_classifier_alignment` is marked `xfail` (`strict=False`) because the rule-based fallback requires enhanced heuristics to match the gold labels. This will be addressed in the classifier upgrade phase.
Dashboard Testing: Currently exercised manually. Automated UI tests can be added as a follow-up task.
- Rule-Based Classifier Enhancement: Upgrade `app/agents/classifier_fallback.py` with richer keyword matching, weighted heuristics, and improved alignment with the test dataset. Remove the `xfail` marker after completion.
- Analytics Enrichment: Extend the metrics service to track per-sender trends, error rates by agent, retry statistics, and commitment vs. completion timelines.
- Dashboard Improvements: Add filtering by urgency, intent, pending review status, and sender. Improve SSE failure handling and implement proper API authentication and pagination.
- Containerization: Create Dockerfile and docker-compose configuration for local development and deployment environments.
- CI/CD Pipeline: Configure GitHub Actions for automated testing (backend `pytest` and dashboard `npm run build`), linting, and type checking.
- Database Migration: Migrate from SQLite to PostgreSQL for production-grade persistence, concurrent access, and advanced query capabilities.
- Checkpoint Persistence: Implement SQLite-based LangGraph checkpointing to replace in-memory storage, enabling workflow resumability across restarts.
- Authentication Layer: Introduce API key or OAuth-based authentication for production deployments.
- Secrets Management: Integrate with a secrets vault (e.g., HashiCorp Vault, AWS Secrets Manager) for secure credential storage.
- Resiliency Improvements: Add retry logic with exponential backoff for calendar and research API calls, implement alerting for persistent failures, and enhance error recovery mechanisms.
- Extended Dataset: Expand the test dataset beyond 50 emails to cover edge cases, domain-specific scenarios, and multi-language support.
- Learning System Evaluation: Implement metrics to measure learning agent effectiveness over time, track improvement in draft acceptance rates, and validate feedback signal quality.
- Performance Optimization: Profile agent execution times, optimize prompt templates, implement caching strategies for repeated context queries, and reduce latency through parallel agent execution where possible.
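The classifier enhancement item above calls for weighted keyword heuristics. One possible starting shape is sketched here; the keywords, weights, and thresholds are invented for illustration and are not taken from `app/agents/classifier_fallback.py`:

```python
# Hedged sketch of a weighted-keyword urgency heuristic, in the spirit of the
# planned classifier fallback upgrade. All keywords, weights, and thresholds
# are illustrative assumptions.
URGENCY_WEIGHTS = {
    "urgent": 3.0, "asap": 3.0, "immediately": 2.5,
    "deadline": 2.0, "overdue": 2.0, "today": 1.5,
    "reminder": 0.5, "newsletter": -2.0,  # negative weight suppresses bulk mail
}

def score_urgency(subject: str, body: str) -> str:
    text = f"{subject} {body}".lower()
    score = sum(w for kw, w in URGENCY_WEIGHTS.items() if kw in text)
    if score >= 3.0:
        return "high"
    if score >= 1.5:
        return "medium"
    return "low"

label = score_urgency("Invoice overdue", "Please pay immediately.")
```

Weights and thresholds like these could then be tuned directly against the 50-email gold dataset until the `xfail`-marked alignment test passes.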
- Rule-Based Classifier: The current fallback implementation uses basic keyword matching and does not align with the test dataset expectations. Marked `xfail` in validation tests until enhanced.
- Security and Authentication: Development mode exposes APIs without authentication. An authentication layer is required before production deployment.
- Secrets Management: Environment variables stored in a `.env` file are suitable for local development only. Production requires integration with a secrets management system.
- Calendar and Research Resiliency: Failures in calendar or research API calls log warnings and silently disable the affected feature. Consider implementing retries, backoffs, and alerting for persistent failures.
- Checkpoint Storage: LangGraph checkpoints currently use in-memory storage, so workflow state is not persisted across application restarts.
- Dashboard Testing: Frontend components are validated manually; an automated UI testing framework is not yet implemented.
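The resiliency limitation above suggests retries with exponential backoff for flaky calendar and research calls. A minimal sketch of that pattern follows; the attempt counts and delays are illustrative defaults, not project settings:

```python
# Sketch of retry-with-exponential-backoff for transient API failures.
# Attempt counts and delays are illustrative, not the project's settings.
import time

def with_retries(fn, attempts: int = 4, base_delay: float = 0.5):
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # exhausted: surface the error for alerting
            time.sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...

# Simulated flaky dependency: fails twice, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("transient")
    return "ok"

result = with_retries(flaky, base_delay=0.01)
```

Re-raising on the final attempt (rather than swallowing the error) is what makes the persistent-failure alerting the roadmap describes possible.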
- Backend & Orchestration: `app/agents/`, `app/services/`, `app/api/`, `app/database/`
- Dashboard: `dashboard/` (Vite/React, Tailwind CSS, ShadCN UI)
- Testing & Dataset: `tests/` and `tests/data/test_dataset_emails.json`
- Architecture Documentation: `claude_response.md` (long-form design blueprint)
This project is open to collaboration and feedback. For issues, questions, or contributions, please open a GitHub issue or contact the maintainers. The README serves as the primary reference for architecture, setup, testing, and outstanding work items.