Satvik24511/Stratum

Intelligent Email Triage & Response System

An orchestrated FastAPI + LangGraph system for automated inbox triage, contextual drafting, and learning-based response improvement.



Overview

This system automates email classification, draft generation, and response workflows through a multi-agent orchestration pipeline. The backend processes incoming messages through sequential agents that classify urgency and intent, gather contextual information, generate response drafts, and learn from human feedback. A React dashboard provides operational visibility into system metrics, processing status, and workflow state.

The project is currently in Phase 4 (Production Polish), with core orchestration, persistence, and analytics implemented. Remaining work focuses on classifier refinement, deployment infrastructure, and extended testing coverage.


Architecture Summary

Backend Pipeline

The system follows an ingestion-orchestration-persistence-analytics pattern:

  1. Ingestion: Gmail API utilities fetch messages and parse them into structured records stored in SQLite.
  2. Orchestration: LangGraph state machine coordinates seven specialized agents:
    • Classifier: Determines urgency, intent, sentiment, and response requirements
    • Context Gatherer: Retrieves thread history, calendar availability, and related messages
    • Research Agent: Performs optional web searches when external information is needed
    • Draft Generator: Produces contextual response drafts
    • Human Review: Interrupts workflow for approval, editing, or rejection
    • Follow-up Scheduler: Extracts and schedules tasks and reminders
    • Learning Agent: Captures feedback signals for system improvement
  3. Persistence: SQLAlchemy models store emails, classifications, drafts, feedback, learning signals, research results, tasks, and follow-ups. LangGraph checkpoints maintain execution state for resumability.
  4. Analytics: Aggregation service computes totals, acceptance rates, latency percentiles, distribution metrics, and recent activity timelines.
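The orchestration step above can be sketched in plain Python as a shared state dict passed through agent functions. This is a simplified stand-in for the actual LangGraph wiring (no checkpointing, interrupts, or conditional routing), and all names and stub logic here are illustrative, not taken from the repository:

```python
# Simplified sketch of the agent pipeline: each agent reads and updates a
# shared state dict, mirroring how LangGraph nodes pass workflow state.
# Agent bodies are keyword stubs standing in for LLM calls.

def classifier(state: dict) -> dict:
    # The real classifier determines urgency, intent, and sentiment via an LLM.
    state["urgency"] = "high" if "asap" in state["body"].lower() else "normal"
    return state

def context_gatherer(state: dict) -> dict:
    # The real agent pulls thread history and calendar availability.
    state["context"] = f"thread history for {state['sender']}"
    return state

def draft_generator(state: dict) -> dict:
    # The real agent produces a contextual draft from classification + context.
    state["draft"] = f"Hi {state['sender']}, thanks for your note."
    return state

# Three of the seven agents, run in sequence for illustration.
PIPELINE = [classifier, context_gatherer, draft_generator]

def run_pipeline(state: dict) -> dict:
    for agent in PIPELINE:
        state = agent(state)
    return state

result = run_pipeline({"sender": "alice", "body": "Need this ASAP"})
```

In the real system, LangGraph additionally handles the human-review interrupt and persists state between nodes via checkpoints, which a plain loop cannot express.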

Frontend Dashboard

A Vite + React + Tailwind CSS + ShadCN UI dashboard provides:

  • Metrics Overview: High-level cards showing inbox volume, human response mix, latency percentiles, and task load
  • Activity Timeline: Daily counts of emails received, drafts generated, and approvals over a configurable window
  • Workflow Radar: Scrollable table of emails with classification metadata, task badges, latency indicators, and status labels
  • Detail Panel: Full context view including classification history, draft versions with user decisions, extracted tasks, follow-ups, research summaries, and live processing status via Server-Sent Events
  • Runbook Panel: Manual validation controls for Gmail authorization, batch processing, and email reprocessing

The dashboard proxies API requests to the FastAPI backend and consumes Server-Sent Events streams for real-time workflow updates.
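On the wire, each Server-Sent Events update is a small text frame. A minimal helper for formatting one frame per the SSE format might look like this (the event name and payload fields are illustrative, not the repository's actual schema):

```python
import json

def sse_event(event: str, data: dict) -> str:
    """Format one Server-Sent Events frame: an 'event:' line, a 'data:' line
    with a JSON payload, and the blank line that terminates the frame."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"

frame = sse_event("status", {"email_id": 42, "stage": "awaiting_review"})
```

The backend would yield such frames from a streaming response, and the dashboard's EventSource listener parses the `data:` payload on each update.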


Current Status

Completed

  • LangGraph orchestration with seven-agent workflow (classifier, context, research, draft, review, follow-up, learning)
  • FastAPI REST endpoints for inbox processing, review loop, and analytics
  • SQLite persistence layer with SQLAlchemy models for all workflow entities
  • Gmail API integration for email ingestion and thread hydration
  • Calendar client for availability context (optional)
  • Research agent with Tavily integration for web search
  • Draft generation with tone and key points extraction
  • Human-in-the-loop review endpoints (approve, edit, reject)
  • Learning agent that captures feedback signals and edit diffs
  • Analytics service with metrics aggregation (totals, acceptance, latency, distribution, activity)
  • React dashboard with metrics overview, activity timeline, email table, and detail panel
  • Server-Sent Events streaming for real-time workflow status
  • Pytest test suite covering agents, API contracts, workflow execution, and dataset validation
  • 50-email test dataset with expected classification labels

In Progress

  • Rule-based classifier fallback enhancement for improved dataset alignment
  • Extended analytics (per-sender trends, error rates, agent retry statistics)
  • Dashboard filtering and search capabilities
  • Production deployment configuration (Docker, CI/CD)

Planned / Future Improvements

  • PostgreSQL migration for production-grade persistence
  • SQLite-based LangGraph checkpointing (currently in-memory)
  • Authentication and authorization layer
  • Secrets management integration (vault-based)
  • Enhanced calendar and research resiliency (retries, backoffs, alerts)
  • Automated UI testing for dashboard components
  • Extended test dataset coverage and validation
  • API rate limiting and request throttling
  • Structured logging and error monitoring enhancements

Technology Stack

Backend

  • Framework: FastAPI 0.115+
  • Orchestration: LangGraph 0.2+, LangChain 0.3+
  • Database: SQLite (SQLAlchemy 2.0+)
  • Model Providers: Groq (Llama 3 70B), Google Gemini (2.0 Pro, Flash)
  • Integrations: Gmail API, Google Calendar API, Tavily Search API
  • Utilities: Pydantic 2.9+, Loguru, BeautifulSoup4, html2text

Frontend

  • Build Tool: Vite 7.2+
  • Framework: React 18.2+
  • Styling: Tailwind CSS 3.4+, ShadCN UI components
  • HTTP Client: Axios
  • Icons: Lucide React

Testing

  • Framework: Pytest 7.4+
  • Coverage: Unit tests (agents, utilities, database), integration tests (API, workflow), dataset validation

Prerequisites

  • Python 3.11+ (validated on 3.13)
  • Node.js 18+
  • API keys for Groq, Gemini, Tavily, and Gmail (optional for local development with stubs)

API and Dashboard Description

API Endpoints

Inbox and Drafting:

  • GET /api/emails: Paginated list of emails with classification metadata, draft status, latency, and task/follow-up counts
  • GET /api/emails/{email_id}: Full email detail including body, classification history, draft trail, tasks, follow-ups, and research context
  • POST /api/emails/process: Trigger ingestion and LangGraph execution (supports max_results, hours, enable_interrupt parameters)
  • GET /api/emails/{email_id}/stream: Server-Sent Events stream for real-time workflow progress updates
  • POST /api/emails/{email_id}/reprocess: Rerun LangGraph pipeline from stored email record
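The pagination behind a list endpoint like GET /api/emails can be sketched as a pure function; the response shape here (items plus page metadata) is an assumption, not the endpoint's documented schema:

```python
def paginate(items: list, page: int = 1, page_size: int = 20) -> dict:
    """Slice a full result set into one page plus the metadata a paginated
    list endpoint typically returns (field names are illustrative)."""
    total = len(items)
    start = (page - 1) * page_size
    return {
        "items": items[start:start + page_size],
        "page": page,
        "page_size": page_size,
        "total": total,
        "pages": (total + page_size - 1) // page_size,  # ceiling division
    }

# 45 results at 20 per page -> page 3 holds the last 5 items.
last_page = paginate(list(range(45)), page=3, page_size=20)
```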

Human-in-the-Loop Review:

  • POST /api/review/{thread_id}/approve: Approve current draft and resume workflow
  • POST /api/review/{thread_id}/edit: Submit edited draft text with optional notes (learning agent captures differences)
  • POST /api/review/{thread_id}/reject: Reject draft and capture feedback for learning
  • GET /api/review/{thread_id}/status: Check review pending or completed status
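When an edited draft is submitted, the learning agent captures the difference between the generated and human-edited text. One way to produce such a signal is a unified diff via the standard library's difflib; the function name and storage format here are assumptions for illustration:

```python
import difflib

def capture_edit_diff(original: str, edited: str) -> list[str]:
    """Unified diff between the generated draft and the human edit — the
    kind of feedback signal a learning agent could persist."""
    return list(difflib.unified_diff(
        original.splitlines(),
        edited.splitlines(),
        fromfile="draft",
        tofile="edited",
        lineterm="",
    ))

diff = capture_edit_diff(
    "Hi team,\nSee attached.",
    "Hi team,\nPlease see the attached report.",
)
```

Storing line-level diffs rather than full before/after texts keeps the signal compact and makes systematic edit patterns (e.g. tone softening) easier to mine later.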

Analytics:

  • GET /api/analytics/summary: System-wide metrics including totals, acceptance/edit/reject breakdown, latency percentiles (average, p50, p95), open tasks and follow-ups, distribution by urgency/intent/sentiment, and recent activity timeline (configurable days parameter)
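The latency figures in the summary can be computed with a nearest-rank percentile over recorded processing times. This is a sketch of the aggregation pattern; the service's exact percentile method and field names may differ:

```python
import math

def latency_summary(latencies_ms: list[float]) -> dict:
    """Aggregate processing latencies into average, p50, and p95 using
    nearest-rank percentiles (illustrative of the analytics service)."""
    if not latencies_ms:
        return {"avg": 0.0, "p50": 0.0, "p95": 0.0}
    ordered = sorted(latencies_ms)

    def pct(p: float) -> float:
        # Nearest-rank: smallest value covering at least p% of the samples.
        k = max(math.ceil(len(ordered) * p / 100) - 1, 0)
        return ordered[k]

    return {"avg": sum(ordered) / len(ordered), "p50": pct(50), "p95": pct(95)}

summary = latency_summary([100, 200, 300, 400, 500, 600, 700, 800, 900, 1000])
```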

Gmail Integration:

  • GET /api/emails/gmail/status: Check Gmail OAuth token presence and expiry
  • POST /api/emails/gmail/authorize: Initiate Gmail OAuth authorization flow

Dashboard Features

The dashboard provides an operational control room interface for monitoring system behavior:

  • Metrics Overview: Four-card layout showing inbox volume (emails, classifications, drafts), human response mix (approval/edit/reject rates and counts), latency percentiles (average, p50, p95), and task load (open tasks, scheduled follow-ups, learning signals)
  • Recent Activity Timeline: Sparkline-style visualization of daily email receipts, draft generations, and approvals over a 7-day window (configurable)
  • Workflow Radar: Sortable table displaying email summaries with urgency badges, classification confidence, task/follow-up indicators, processing latency, and status labels (awaiting_draft, awaiting_review, review_approve, etc.). Row selection opens the detail panel.
  • Detail Panel: Comprehensive view of selected email including classification snapshot with reasoning, draft history with user actions and timestamps, extracted tasks with due dates and priorities, scheduled follow-ups with trigger dates, research summary and metadata, audit metrics (processing start, agent path, errors), and live SSE status indicator
  • Runbook Panel: Manual validation controls for Gmail authorization status checks, batch processing with presets (single recent email, small batch, etc.), and email reprocessing without re-fetching from Gmail

Testing

Test Coverage

The test suite includes:

  • Unit Tests: Agent logic (tests/agents/), utility functions (tests/utils/), database models (tests/database/)
  • Integration Tests: Workflow-level graph execution and routing (tests/workflow/test_graph_flow.py), API endpoints (tests/api/), end-to-end scenarios (tests/e2e/)
  • Dataset Validation: 50-email test dataset (tests/data/test_dataset_emails.json) with expected classification labels covering all urgency levels, intent categories, and sentiment classes

Known Test Status: test_rule_based_classifier_alignment is marked xfail (strict=False) because the rule-based fallback requires enhanced heuristics to match gold labels. This will be addressed in the classifier upgrade phase.
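For reference, a non-strict xfail marker of the kind described looks like this (the test body here is a placeholder, not the actual dataset comparison):

```python
import pytest

# strict=False means an unexpected pass (XPASS) will not fail the suite,
# so the marker can remain in place until the fallback heuristics are upgraded.
@pytest.mark.xfail(
    strict=False,
    reason="rule-based fallback heuristics do not yet match gold labels",
)
def test_rule_based_classifier_alignment():
    # Placeholder: the real test compares fallback output against the
    # 50-email gold-label dataset.
    assert False
```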

Dashboard Testing: Currently exercised manually. Automated UI tests can be added as a follow-up task.


Roadmap and Future Work

Immediate Priorities

  1. Rule-Based Classifier Enhancement: Upgrade app/agents/classifier_fallback.py with richer keyword matching, weighted heuristics, and improved alignment with the test dataset. Remove xfail marker after completion.

  2. Analytics Enrichment: Extend metrics service to track per-sender trends, error rates by agent, retry statistics, and commitment vs. completion timelines.

  3. Dashboard Improvements: Add filtering by urgency, intent, pending review status, and sender. Improve SSE failure handling and implement proper API authentication and pagination.
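A weighted keyword heuristic of the kind the classifier enhancement (item 1) describes might look like the following sketch. The phrases, weights, and thresholds are illustrative, not the repository's actual fallback rules:

```python
# Illustrative phrase weights: positive values push toward urgency,
# negative values push away from it.
URGENCY_WEIGHTS = {
    "asap": 3, "urgent": 3, "immediately": 3,
    "deadline": 2, "today": 2, "eod": 2,
    "when you can": -2, "no rush": -3,
}

def score_urgency(subject: str, body: str) -> str:
    """Sum the weights of matched phrases and bucket the total into an
    urgency label (thresholds are assumptions for illustration)."""
    text = f"{subject} {body}".lower()
    score = sum(w for phrase, w in URGENCY_WEIGHTS.items() if phrase in text)
    if score >= 3:
        return "high"
    if score >= 1:
        return "medium"
    return "low"
```

Compared with flat keyword matching, weighted scoring lets de-escalating phrases ("no rush") cancel out incidental urgency words, which is one route toward better alignment with the gold labels.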

Infrastructure and Deployment

  1. Containerization: Create Dockerfile and docker-compose configuration for local development and deployment environments.

  2. CI/CD Pipeline: Configure GitHub Actions for automated testing (backend pytest and dashboard npm run build), linting, and type checking.

  3. Database Migration: Migrate from SQLite to PostgreSQL for production-grade persistence, concurrent access, and advanced query capabilities.

System Enhancements

  1. Checkpoint Persistence: Implement SQLite-based LangGraph checkpointing to replace in-memory storage, enabling workflow resumability across restarts.

  2. Authentication Layer: Introduce API key or OAuth-based authentication for production deployments.

  3. Secrets Management: Integrate with a secrets vault (e.g., HashiCorp Vault, AWS Secrets Manager) for secure credential storage.

  4. Resiliency Improvements: Add retry logic with exponential backoff for calendar and research API calls, implement alerting for persistent failures, and enhance error recovery mechanisms.
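The retry-with-backoff pattern in item 4 can be sketched as a decorator. This is a minimal version of the pattern (production code would add jitter, selective exception handling, and alerting); the decorated function below is a hypothetical stand-in for a calendar API call:

```python
import time
from functools import wraps

def with_retries(max_attempts: int = 3, base_delay: float = 0.5):
    """Retry a flaky call with exponential backoff (delays of base_delay,
    2*base_delay, 4*base_delay, ...), re-raising after the final attempt."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(max_attempts):
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    if attempt == max_attempts - 1:
                        raise
                    time.sleep(base_delay * (2 ** attempt))
        return wrapper
    return decorator

calls = {"n": 0}

@with_retries(max_attempts=3, base_delay=0.01)
def flaky_calendar_lookup():
    # Hypothetical external call that fails twice before succeeding.
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("calendar API unavailable")
    return "free at 3pm"
```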

Research and Evaluation

  1. Extended Dataset: Expand test dataset beyond 50 emails to cover edge cases, domain-specific scenarios, and multi-language support.

  2. Learning System Evaluation: Implement metrics to measure learning agent effectiveness over time, track improvement in draft acceptance rates, and validate feedback signal quality.

  3. Performance Optimization: Profile agent execution times, optimize prompt templates, implement caching strategies for repeated context queries, and reduce latency through parallel agent execution where possible.


Known Limitations

  • Rule-Based Classifier: Current fallback implementation uses basic keyword matching and does not align with the test dataset expectations. Marked xfail in validation tests until enhanced.

  • Security and Authentication: Development mode exposes APIs without authentication. Authentication layer required before production deployment.

  • Secrets Management: Environment variables stored in .env file suitable for local development only. Production requires integration with a secrets management system.

  • Calendar and Research Resiliency: Failures in calendar or research API calls log warnings and silently disable the affected feature for that run. Retries, backoffs, and alerting for persistent failures are not yet implemented.

  • Checkpoint Storage: LangGraph checkpoints currently use in-memory storage. Workflow state is not persisted across application restarts.

  • Dashboard Testing: Frontend components are validated manually. Automated UI testing framework not yet implemented.


Contact and Ownership

Component Ownership

  • Backend & Orchestration: app/agents/, app/services/, app/api/, app/database/
  • Dashboard: dashboard/ (Vite/React, Tailwind CSS, ShadCN UI)
  • Testing & Dataset: tests/ and tests/data/test_dataset_emails.json
  • Architecture Documentation: claude_response.md (long-form design blueprint)

Collaboration

This project is open to collaboration and feedback. For issues, questions, or contributions, please open a GitHub issue or contact the maintainers. The README serves as the primary reference for architecture, setup, testing, and outstanding work items.

About

Production-grade multi-agent email triage system built with LangGraph. Features seven specialized agents for classification, context gathering, research, drafting, human review, follow-up scheduling, and adaptive learning. Includes FastAPI backend and React analytics dashboard.
