Chaitanya-784/AutoStream

🚀 AutoStream AI Agent - Social-to-Lead Conversion System

An intelligent conversational AI agent that converts social media conversations into qualified business leads using state-of-the-art LLM technology and agentic workflows.




🎯 Overview

This project implements a production-ready conversational AI agent for AutoStream, a fictional SaaS company offering AI-powered video editing tools. The agent demonstrates advanced capabilities in:

  1. Intent Recognition - Classifying user intent with context-awareness
  2. RAG-Powered Knowledge Retrieval - Answering questions using a local knowledge base
  3. Lead Qualification - Detecting high-intent users and capturing their information
  4. Tool Execution - Triggering backend actions through validated workflows

Tech Stack: Python, LangChain, LangGraph, Google Gemini 2.5 Flash


✨ Key Features

🧠 Intelligent Intent Classification

  • Multi-turn context awareness
  • Distinguishes between greetings, inquiries, and high-intent signals
  • Fallback logic for robustness
  • Handles conversation state transitions

📚 Enhanced RAG Implementation

  • Direct context injection for accurate responses
  • Structured knowledge base with comprehensive product information
  • Fallback knowledge base if file is missing
  • Optimized prompts for concise, helpful answers

🎯 Smart Lead Capture

  • Progressive information gathering (Name β†’ Email β†’ Platform)
  • Multi-strategy extraction:
    • Regex patterns for email/platform
    • LLM-assisted name extraction
    • Heuristic validation
  • Prevents infinite loops with attempt tracking
  • Professional confirmation messages
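
A minimal sketch of the multi-strategy extraction described above (the function names, regex, and platform list are illustrative, not the repository's exact code):

```python
import re
from typing import Optional

# Illustrative patterns: the repository's actual regexes may differ.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PLATFORMS = ("youtube", "instagram", "tiktok", "twitch", "linkedin")

def extract_email(text: str) -> Optional[str]:
    """First-pass regex extraction for the email field."""
    match = EMAIL_RE.search(text)
    return match.group(0) if match else None

def extract_platform(text: str) -> Optional[str]:
    """Keyword match against known platforms (case-insensitive)."""
    lowered = text.lower()
    for platform in PLATFORMS:
        if platform in lowered:
            return platform.capitalize()
    return None
```

Regex handles the unambiguous fields cheaply; only the name, which arrives as free-form natural language, warrants an LLM call.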

🛠️ Production-Ready Features

  • Comprehensive error handling
  • Graceful degradation on failures
  • Extensive logging for debugging
  • Type hints and documentation
  • State management with validation

πŸ—οΈ Architecture

Why LangGraph?

I chose LangGraph over AutoGen for this implementation because:

  1. Explicit State Management: LangGraph provides a clear, type-safe state graph that makes conversation flow predictable and debuggable. The AgentState TypedDict explicitly defines all state variables, preventing bugs from implicit state changes.

  2. Deterministic Flow Control: Unlike AutoGen's agent-to-agent communication which can be non-deterministic, LangGraph uses explicit edges and conditional routing. This ensures:

    • Predictable conversation paths
    • Easier testing and validation
    • Clear separation of concerns
  3. Production Scalability: LangGraph's architecture naturally extends to:

    • Persistent state stores (Redis, PostgreSQL)
    • Streaming responses for real-time UX
    • Checkpointing for conversation resume
    • Easy integration with FastAPI/Flask
  4. Tool Calling Precision: The conditional routing in LangGraph ensures tools are called only when conditions are met (all 3 fields collected), preventing premature API calls.

State Management Strategy

The agent maintains state through a strongly-typed AgentState dictionary:

import operator
from typing import Annotated, List

from langchain_core.messages import BaseMessage
from typing_extensions import TypedDict

class AgentState(TypedDict):
    messages: Annotated[List[BaseMessage], operator.add]  # Conversation history, accumulated across turns
    lead_info: dict                  # {name, email, platform, validated}
    intent: str                      # Current intent (GREETING/INQUIRY/HIGH_INTENT)
    conversation_stage: str          # Workflow stage
    missing_fields: List[str]        # Tracks incomplete data
    attempts: int                    # Prevents infinite loops

State persistence across turns is achieved through:

  • operator.add annotation for message accumulation
  • Explicit state updates in each node's return value
  • Conditional routing based on state inspection

Memory window: Last 3 messages used for intent classification context while maintaining full conversation history for reference.

Graph Flow Diagram

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   START     β”‚
β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜
       β”‚
       β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Intent Classifier β”‚
β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
       β”‚
       β”œβ”€β”€[GREETING]──────────────┐
       β”‚                          β–Ό
       β”œβ”€β”€[INQUIRY]───────┐    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
       β”‚                  β”‚    β”‚ Greeting β”‚
       β”‚                  β–Ό    β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”˜
       β”‚            β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”‚
       β”‚            β”‚   RAG   β”‚     β”‚
       β”‚            β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”˜     β”‚
       β”‚                 β”‚          β”‚
       └──[HIGH_INTENT]──┼───────────
                         β”‚          β”‚
                         β–Ό          β”‚
                  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚
                  β”‚ Lead Capture β”‚  β”‚
                  β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚
                         β”‚          β”‚
                         β–Ό          β–Ό
                      β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                      β”‚     END      β”‚
                      β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Routing Logic:

  • If in QUALIFYING stage with missing fields β†’ Stay in Lead Capture
  • Otherwise, route based on latest intent classification
  • Lead Capture loops until all fields validated
  • All nodes eventually reach END state
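
Those routing rules can be sketched as a single conditional-edge function (node and field names are illustrative; in LangGraph this would be registered via add_conditional_edges):

```python
def route_next_node(state: dict) -> str:
    """Pick the next node from the current state (illustrative sketch).

    Mirrors the rules above: an in-progress qualification takes
    priority over fresh intent classification.
    """
    # Rule 1: mid-qualification with missing fields -> keep capturing.
    if state.get("conversation_stage") == "QUALIFYING" and state.get("missing_fields"):
        return "lead_capture"

    # Rule 2: otherwise route on the latest classified intent.
    intent = state.get("intent", "INQUIRY")
    return {
        "GREETING": "greeting",
        "INQUIRY": "rag",
        "HIGH_INTENT": "lead_capture",
    }.get(intent, "rag")  # Unknown intents fall back to the RAG node
```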

🚀 Installation

Prerequisites

  • Python 3.9 or higher
  • Google API Key (Gemini)

Step-by-Step Setup

  1. Clone the repository

git clone https://github.com/yourusername/autostream-ai-agent.git
cd autostream-ai-agent

  2. Create a virtual environment (recommended)

# Windows
python -m venv autoenv
autoenv\Scripts\activate

# macOS/Linux
python -m venv autoenv
source autoenv/bin/activate

  3. Install dependencies

pip install -r requirements.txt

  4. Set up environment variables

Create a .env file in the project root:

GOOGLE_API_KEY=your_gemini_api_key_here

To get a Gemini API key, generate one for free in Google AI Studio.

  5. Verify the knowledge base

Ensure knowledge_base.md is in the project root. A built-in fallback knowledge base is used if the file is missing.


💻 Usage

Running the Agent

python agent.py

Example Conversation Flow

You: Hi there!

Agent: Hello! 👋 Welcome to AutoStream.

I'm here to help you streamline your video editing workflow. I can:
• Explain our pricing and features
• Answer questions about our AI-powered tools
• Help you get started with a 14-day free trial

What would you like to know?

You: What's the difference between Basic and Pro?

Agent: The key differences between our plans:

Basic Plan ($29/month):
- 10 videos/month, 720p resolution
- Email support during business hours

Pro Plan ($79/month):
- Unlimited videos, 4K resolution
- AI captions in 40+ languages
- 24/7 premium support
- Priority processing (2x faster)
- Advanced AI features like smart scene detection

Plus, you can try Pro free for 14 days with no credit card needed!

You: That sounds great! I want to try the Pro plan for my YouTube channel.

Agent: Great! Let's get you set up. What's your name?

You: John Doe

Agent: Thanks, John! What email address should we use to contact you?

You: john@example.com

Agent: Perfect! Which platform do you primarily create content for? (YouTube, Instagram, TikTok, etc.)

You: YouTube

Agent: Perfect! I've got everything I need, John. ✅

Your information has been captured:
• Name: John Doe
• Email: john@example.com
• Platform: YouTube

Our team will reach out within 24 hours to get you started with AutoStream Pro...

📱 WhatsApp Integration

Webhook-Based Architecture

To integrate this agent with WhatsApp, I would use the WhatsApp Business API with a webhook-based architecture:

High-Level Integration Flow

WhatsApp User β†’ WhatsApp Cloud API β†’ Webhook Server β†’ AI Agent β†’ Response β†’ WhatsApp
1. Infrastructure Setup
To establish the connection, the system uses a FastAPI web server that acts as a secure gateway to listen for real-time events from WhatsApp. This server exposes two critical endpoints: a GET route that handles the initial verification "handshake" by checking a security token to confirm the server's legitimacy with Meta, and a POST route that actively captures incoming message payloads. When a user sends a message, WhatsApp pushes the data to this listener, where the system extracts the sender's phone number and the text body to pass them downstream for immediate processing.
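
As a sketch, the extraction step can be a pure helper that walks the Cloud API payload shape (entry → changes → value → messages, per Meta's documented webhook format), keeping the FastAPI route itself thin:

```python
from typing import Optional, Tuple

def extract_incoming_message(payload: dict) -> Optional[Tuple[str, str]]:
    """Return (sender_phone, text_body) from a WhatsApp Cloud API webhook
    payload, or None for non-message events such as delivery statuses."""
    try:
        value = payload["entry"][0]["changes"][0]["value"]
        message = value["messages"][0]
        if message.get("type") != "text":
            return None  # Media/interactive messages need separate handling
        return message["from"], message["text"]["body"]
    except (KeyError, IndexError):
        return None  # Status callbacks carry no "messages" key
```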

2. Session Management
Because WhatsApp communication is asynchronous and stateless, the system utilizes Redis as a high-speed database to maintain the conversational memory for each unique user. When a message arrives, the code uses the sender's phone number as a key to look up their specific conversation history; if a session is found, the previous context is loaded, but if not, a fresh state is initialized. After the AI generates a response, the updated state is serialized and saved back to Redis with a set expiration time (e.g., 24 hours), ensuring the agent remembers the context of the chat while automatically clearing out old, inactive data.
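
That lookup-or-initialize pattern might look like the following sketch, where store is any client with Redis-style get/setex methods; the key scheme and TTL are illustrative choices:

```python
import json

SESSION_TTL_SECONDS = 24 * 60 * 60  # Expire inactive sessions after 24 hours

def load_session(store, phone_number: str) -> dict:
    """Fetch the serialized state for this user, or start a fresh one."""
    raw = store.get(f"session:{phone_number}")
    if raw is None:
        return {"messages": [], "lead_info": {}, "conversation_stage": "NEW"}
    return json.loads(raw)

def save_session(store, phone_number: str, state: dict) -> None:
    """Persist the updated state with a sliding 24-hour expiry."""
    store.setex(f"session:{phone_number}", SESSION_TTL_SECONDS, json.dumps(state))
```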

3. WhatsApp-Specific Adaptations
The final stage involves formatting and transmitting the AI's output back to the user through the official WhatsApp Cloud API. The system includes a safety utility that automatically splits any AI response exceeding 4,096 characters into smaller chunks to comply with WhatsApp's strict message limits. Once the text is prepared, the system authenticates using a secure access token and dispatches an HTTP POST request to the Meta Graph API, routing the reply specifically to the user's phone number to complete the interaction loop.
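
The splitting utility could be sketched as a greedy chunker that prefers breaking on whitespace (the constant reflects WhatsApp's documented 4,096-character text limit; the function name is ours):

```python
from typing import List

WHATSAPP_MAX_CHARS = 4096

def split_message(text: str, limit: int = WHATSAPP_MAX_CHARS) -> List[str]:
    """Split text into chunks that each fit WhatsApp's per-message limit,
    breaking on the last newline or space in the window when possible."""
    chunks = []
    while len(text) > limit:
        # Prefer a natural break point inside the window.
        cut = max(text.rfind("\n", 0, limit), text.rfind(" ", 0, limit))
        if cut <= 0:
            cut = limit  # No break point found: hard-split.
        chunks.append(text[:cut].rstrip())
        text = text[cut:].lstrip()
    if text:
        chunks.append(text)
    return chunks
```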

4. Deployment Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  WhatsApp User  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜
         β”‚
         β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ WhatsApp Cloud  β”‚
β”‚      API        β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜
         β”‚
         β–Ό HTTPS Webhook
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   Load Balancer β”‚
β”‚   (nginx/AWS)   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜
         β”‚
    β”Œβ”€β”€β”€β”€β”΄β”€β”€β”€β”€β”
    β”‚         β”‚
    β–Ό         β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”
β”‚FastAPI β”‚FastAPIβ”‚  (Horizontal scaling)
β”‚Server 1β”‚Server 2β”‚
β””β”€β”€β”€β”¬β”€β”€β”€β”€β”˜β””β”€β”€β”€β”¬β”€β”€β”˜
    β”‚         β”‚
    β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”˜
         β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Redis Cache    β”‚  (Session storage)
β”‚  + Agent State  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
         β”‚
         β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   PostgreSQL    β”‚  (Lead storage)
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

5. Required Components

  • Webhook Server: FastAPI/Flask hosted on cloud (AWS, GCP, Railway)
  • Public HTTPS URL: Required by WhatsApp (use ngrok for testing)
  • Session Store: Redis for conversation state
  • Database: PostgreSQL for captured leads
  • Message Queue: Celery/RabbitMQ for async processing (optional but recommended)

6. Key Considerations

  • Rate Limiting: WhatsApp enforces tiered messaging and conversation limits; check current Cloud API quotas and design for throttling
  • Message Templates: Pre-approved templates needed for proactive messages
  • Media Handling: Support images/videos users might send
  • Latency: Target <3 second response time (use streaming if longer)
  • Error Handling: Implement retry logic for failed webhooks
  • Security: Validate webhook signatures to prevent spoofing
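
On the last point: Meta signs each webhook delivery with an X-Hub-Signature-256 header, an HMAC-SHA256 of the raw request body keyed with the app secret, which can be verified along these lines:

```python
import hashlib
import hmac

def verify_webhook_signature(app_secret: str, raw_body: bytes, signature_header: str) -> bool:
    """Return True if the X-Hub-Signature-256 header matches the body HMAC.

    Meta sends the header as 'sha256=<hex digest>'; the comparison uses
    hmac.compare_digest to resist timing attacks.
    """
    if not signature_header or not signature_header.startswith("sha256="):
        return False
    expected = hmac.new(app_secret.encode(), raw_body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_header.split("=", 1)[1])
```

Requests that fail this check should be rejected with a 403 before any agent processing happens.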

🔍 Technical Deep Dive

RAG Implementation

Unlike traditional vector database approaches, this implementation uses direct context injection for RAG:

# Load the entire knowledge_base.md file into the prompt context:
# no vector store or embedding step is required at this scale.
prompt = f"""You are AutoStream's AI assistant.

Knowledge Base:
{KNOWLEDGE_BASE_TEXT}

User Query: {query}
"""

Advantages:

  • No embedding API costs or complexity
  • Perfect for small knowledge bases (<10k tokens)
  • No retrieval misses, since the entire knowledge base is always in the model's context
  • Simpler to debug and maintain

When to use vector DB instead:

  • Knowledge base >20k tokens
  • Need semantic search across many documents
  • Frequent updates requiring re-indexing

Intent Classification Logic

Three-layer approach for robustness:

  1. LLM Classification: Primary method using Gemini with structured prompt
  2. Keyword Fallback: If LLM returns invalid category, use keyword matching
  3. Context Awareness: Uses last 3 messages to understand conversation flow
# Example: Detecting HIGH_INTENT after pricing question
History: 
  User: "What's your pricing?"
  Agent: [explains plans]
  User: "That sounds perfect!"  # Context makes this HIGH_INTENT
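
The keyword fallback (layer 2) might be sketched like this, with illustrative keyword lists:

```python
HIGH_INTENT_KEYWORDS = ("sign up", "buy", "trial", "get started", "subscribe", "i want")
GREETING_KEYWORDS = ("hi", "hello", "hey", "good morning")

def classify_by_keywords(message: str) -> str:
    """Fallback when the LLM returns an invalid category: simple keyword
    matching, defaulting to INQUIRY so the RAG node handles unknowns."""
    lowered = message.lower()
    if any(kw in lowered for kw in HIGH_INTENT_KEYWORDS):
        return "HIGH_INTENT"
    if any(lowered.startswith(kw) for kw in GREETING_KEYWORDS):
        return "GREETING"
    return "INQUIRY"
```

Defaulting to INQUIRY is the safe choice: a misrouted greeting still gets a helpful RAG answer, whereas a misrouted HIGH_INTENT would start lead capture prematurely.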

Lead Capture State Machine

Progressive disclosure pattern:

Start β†’ Ask Name β†’ Ask Email β†’ Ask Platform β†’ Execute Tool β†’ End
                ↑_______|________|___________|
                    (Repeat if missing)

Validation layers:

  1. Regex extraction (email, platform keywords)
  2. LLM-assisted extraction (name from natural language)
  3. User confirmation before tool execution

📊 Demo & Examples

Example 1: Pricing Inquiry

User: How much does AutoStream cost?

Agent: AutoStream offers two plans:

1. Basic Plan - $29/month
   - 10 videos/month
   - 720p resolution
   - Email support

2. Pro Plan - $79/month (Most Popular)
   - Unlimited videos
   - 4K resolution
   - AI captions
   - 24/7 support

Both plans come with a 14-day free trial. Which features interest you most?

Example 2: Policy Question

User: What's your refund policy?

Agent: We offer a 7-day money-back guarantee on all plans. If you're not satisfied 
within the first week, we'll refund your payment, no questions asked. After 7 days, 
refunds are not available, but you can cancel anytime to avoid future charges.

Is there anything specific about the policy you'd like to know?

Example 3: High-Intent Lead Capture

User: I want to sign up for Pro plan

Agent: Great! Let's get you set up. What's your name?

User: Sarah Johnson

Agent: Thanks, Sarah! What email address should we use to contact you?

User: sarah.j@gmail.com

Agent: Perfect! Which platform do you primarily create content for?

User: I create content for Instagram and TikTok

Agent: Perfect! I've got everything I need, Sarah. ✅
[Lead Captured Successfully - see console output]

📁 Project Structure

autostream-ai-agent/
│
├── agent.py                 # Main agent implementation
├── knowledge_base.md        # RAG knowledge base
├── requirements.txt         # Python dependencies
├── .env                     # Environment variables (create this)
├── .env.example             # Template for environment variables
│
├── README.md                # This file
├── DEMO.md                  # Detailed demo scenarios
│
└── tests/                   # Unit tests (future)
    ├── test_intent.py
    ├── test_rag.py
    └── test_lead_capture.py

🎓 Learning Outcomes

This project demonstrates proficiency in:

  1. LLM Application Development

    • Prompt engineering for classification and extraction
    • Context window management
    • Temperature tuning for consistency
  2. State Machine Design

    • Conditional routing based on conversation stage
    • Loop prevention and error recovery
    • State persistence strategies
  3. Production Engineering

    • Comprehensive error handling
    • Logging and debugging
    • Type safety with TypedDict
    • Graceful degradation
  4. AI Agent Patterns

    • Tool calling workflows
    • Progressive information gathering
    • Multi-turn context tracking

🚀 Future Enhancements

Phase 1: Robustness

  • Add unit tests (pytest)
  • Implement conversation logging to database
  • Add monitoring/metrics (response time, success rate)
  • Support multiple LLM providers (OpenAI, Anthropic)

Phase 2: Features

  • Multi-language support
  • Voice input/output integration
  • Sentiment analysis for better intent detection
  • A/B testing framework for prompt variations

Phase 3: Scale

  • Deploy as REST API (FastAPI)
  • Add Redis for distributed state
  • Implement conversation replay/debugging UI
  • Real WhatsApp integration
  • Slack/Discord bot versions

📄 License

MIT License - feel free to use this project as a learning resource or foundation for your own agents.


🙏 Acknowledgments

  • LangChain Team - For the excellent LangGraph framework
  • Google DeepMind - For Gemini API access
  • ServiceHive - For the challenging and educational assignment

⭐ If you found this helpful, please consider starring the repository!

Last Updated: January 2026
