An intelligent conversational AI agent that converts social media conversations into qualified business leads using state-of-the-art LLM technology and agentic workflows.
- Overview
- Key Features
- Architecture
- Installation
- Usage
- WhatsApp Integration
- Technical Deep Dive
- Demo & Examples
- Project Structure
- Future Enhancements
This project implements a production-ready conversational AI agent for AutoStream, a fictional SaaS company offering AI-powered video editing tools. The agent demonstrates advanced capabilities in:
- Intent Recognition - Classifying user intent with context-awareness
- RAG-Powered Knowledge Retrieval - Answering questions using a local knowledge base
- Lead Qualification - Detecting high-intent users and capturing their information
- Tool Execution - Triggering backend actions through validated workflows
Tech Stack: Python, LangChain, LangGraph, Google Gemini 2.5 Flash
- Multi-turn context awareness
- Distinguishes between greetings, inquiries, and high-intent signals
- Fallback logic for robustness
- Handles conversation state transitions
- Direct context injection for accurate responses
- Structured knowledge base with comprehensive product information
- Fallback knowledge base if file is missing
- Optimized prompts for concise, helpful answers
- Progressive information gathering (Name → Email → Platform)
- Multi-strategy extraction:
- Regex patterns for email/platform
- LLM-assisted name extraction
- Heuristic validation
- Prevents infinite loops with attempt tracking
- Professional confirmation messages
- Comprehensive error handling
- Graceful degradation on failures
- Extensive logging for debugging
- Type hints and documentation
- State management with validation
I chose LangGraph over AutoGen for this implementation because:
1. Explicit State Management: LangGraph provides a clear, type-safe state graph that makes conversation flow predictable and debuggable. The `AgentState` TypedDict explicitly defines all state variables, preventing bugs from implicit state changes.

2. Deterministic Flow Control: Unlike AutoGen's agent-to-agent communication, which can be non-deterministic, LangGraph uses explicit edges and conditional routing. This ensures:
   - Predictable conversation paths
   - Easier testing and validation
   - Clear separation of concerns

3. Production Scalability: LangGraph's architecture naturally extends to:
   - Persistent state stores (Redis, PostgreSQL)
   - Streaming responses for real-time UX
   - Checkpointing for conversation resume
   - Easy integration with FastAPI/Flask

4. Tool Calling Precision: LangGraph's conditional routing ensures tools are called only once their preconditions are met (all three lead fields collected), preventing premature API calls.
The agent maintains state through a strongly typed `AgentState` dictionary:

```python
class AgentState(TypedDict):
    messages: List[BaseMessage]    # Conversation history
    lead_info: dict                # {name, email, platform, validated}
    intent: str                    # Current intent (GREETING/INQUIRY/HIGH_INTENT)
    conversation_stage: str       # Workflow stage
    missing_fields: List[str]      # Tracks incomplete data
    attempts: int                  # Prevents infinite loops
```

State persistence across turns is achieved through:
- The `operator.add` annotation for message accumulation
- Explicit state updates in each node's return value
- Conditional routing based on state inspection

Memory window: the last 3 messages provide context for intent classification, while the full conversation history is retained for reference.
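As a minimal, self-contained sketch of this pattern (with a stand-in `BaseMessage`, since the real class comes from `langchain_core`), the `operator.add` annotation and the 3-message window look like this:

```python
import operator
from typing import Annotated, List, TypedDict

# Stand-in for langchain_core.messages.BaseMessage, so the sketch runs alone
class BaseMessage:
    def __init__(self, content: str):
        self.content = content

class AgentState(TypedDict):
    # Annotated with operator.add: LangGraph appends each node's returned
    # messages to the running history instead of overwriting it
    messages: Annotated[List[BaseMessage], operator.add]
    lead_info: dict
    intent: str
    conversation_stage: str
    missing_fields: List[str]
    attempts: int

def classification_context(state: AgentState) -> List[BaseMessage]:
    # Only the last 3 messages feed the intent classifier
    return state["messages"][-3:]
```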
```
                  ┌───────┐
                  │ START │
                  └───┬───┘
                      │
                      ▼
            ┌───────────────────┐
            │ Intent Classifier │
            └─────────┬─────────┘
                      │
       ┌──────────────┼──────────────────┐
       │ [GREETING]   │ [INQUIRY]        │ [HIGH_INTENT]
       ▼              ▼                  ▼
  ┌──────────┐     ┌─────┐        ┌──────────────┐
  │ Greeting │     │ RAG │        │ Lead Capture │◄─┐
  └────┬─────┘     └──┬──┘        └──────┬───────┘  │
       │              │                  ├──────────┘
       │              │                  │  (loops while fields missing)
       └──────────────┴──────────────────┘
                      │
                      ▼
                 ┌─────────┐
                 │   END   │
                 └─────────┘
```
Routing Logic:
- If in the QUALIFYING stage with missing fields → stay in Lead Capture
- Otherwise, route based on the latest intent classification
- Lead Capture loops until all fields validated
- All nodes eventually reach END state
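The routing rules above can be sketched as a single router function; in LangGraph this is the callable handed to `add_conditional_edges` (node names here are illustrative, not the project's actual identifiers):

```python
def route_after_classification(state: dict) -> str:
    """Decide the next node, mirroring the routing rules above."""
    # Mid-qualification with incomplete data: keep collecting lead fields
    if state.get("conversation_stage") == "QUALIFYING" and state.get("missing_fields"):
        return "lead_capture"
    # Otherwise route on the latest intent classification
    return {
        "GREETING": "greeting",
        "INQUIRY": "rag",
        "HIGH_INTENT": "lead_capture",
    }.get(state.get("intent", ""), "rag")  # fall back to RAG for unknown intents
```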
- Python 3.9 or higher
- Google API Key (Gemini)
1. Clone the repository

```bash
git clone https://github.com/yourusername/autostream-ai-agent.git
cd autostream-ai-agent
```

2. Create a virtual environment (recommended)

```bash
# Windows
python -m venv autoenv
autoenv\Scripts\activate
```

3. Install dependencies

```bash
pip install -r requirements.txt
```

4. Set up environment variables

Create a `.env` file in the project root:

```
GOOGLE_API_KEY=your_gemini_api_key_here
```

To get your Gemini API key:
- Visit Google AI Studio
- Create a new API key
- Copy it into your `.env` file

5. Verify the knowledge base

Ensure `knowledge_base.md` is in the project root. A fallback is included if the file is missing.
Run the agent:

```bash
python agent.py
```

You: Hi there!
Agent: Hello! 👋 Welcome to AutoStream.
I'm here to help you streamline your video editing workflow. I can:
• Explain our pricing and features
• Answer questions about our AI-powered tools
• Help you get started with a 14-day free trial
What would you like to know?
You: What's the difference between Basic and Pro?
Agent: The key differences between our plans:
Basic Plan ($29/month):
- 10 videos/month, 720p resolution
- Email support during business hours
Pro Plan ($79/month):
- Unlimited videos, 4K resolution
- AI captions in 40+ languages
- 24/7 premium support
- Priority processing (2x faster)
- Advanced AI features like smart scene detection
Plus, you can try Pro free for 14 days with no credit card needed!
You: That sounds great! I want to try the Pro plan for my YouTube channel.
Agent: Great! Let's get you set up. What's your name?
You: John Doe
Agent: Thanks, John! What email address should we use to contact you?
You: john@example.com
Agent: Perfect! Which platform do you primarily create content for? (YouTube, Instagram, TikTok, etc.)
You: YouTube
Agent: Perfect! I've got everything I need, John. ✅
Your information has been captured:
• Name: John Doe
• Email: john@example.com
• Platform: YouTube
Our team will reach out within 24 hours to get you started with AutoStream Pro...
To integrate this agent with WhatsApp, I would use the WhatsApp Business API with a webhook-based architecture:
WhatsApp User → WhatsApp Cloud API → Webhook Server → AI Agent → Response → WhatsApp
1. Infrastructure Setup
To establish the connection, the system uses a FastAPI web server that acts as a secure gateway to listen for real-time events from WhatsApp. This server exposes two critical endpoints: a GET route that handles the initial verification "handshake" by checking a security token to confirm the server's legitimacy with Meta, and a POST route that actively captures incoming message payloads. When a user sends a message, WhatsApp pushes the data to this listener, where the system extracts the sender's phone number and the text body to pass them downstream for immediate processing.
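A hedged sketch of this handling, reduced to pure functions so it stands alone: `extract_incoming_messages` and `verify_handshake` are hypothetical helpers that the POST and GET routes would call, and the `entry → changes → value → messages` nesting follows the Cloud API's documented webhook shape.

```python
from typing import List, Optional, Tuple

def extract_incoming_messages(payload: dict) -> List[Tuple[str, str]]:
    """Pull (sender_phone, text_body) pairs out of a webhook POST payload."""
    results = []
    for entry in payload.get("entry", []):
        for change in entry.get("changes", []):
            for msg in change.get("value", {}).get("messages", []):
                sender = msg.get("from", "")            # sender's phone number
                text = msg.get("text", {}).get("body", "")
                if sender and text:
                    results.append((sender, text))
    return results

def verify_handshake(mode: str, token: str, challenge: str,
                     expected_token: str) -> Optional[str]:
    """GET-route check: echo the challenge only when the verify token matches."""
    if mode == "subscribe" and token == expected_token:
        return challenge
    return None  # the route would answer HTTP 403 in this case
```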
2. Session Management
Because WhatsApp communication is asynchronous and stateless, the system utilizes Redis as a high-speed database to maintain the conversational memory for each unique user. When a message arrives, the code uses the sender's phone number as a key to look up their specific conversation history; if a session is found, the previous context is loaded, but if not, a fresh state is initialized. After the AI generates a response, the updated state is serialized and saved back to Redis with a set expiration time (e.g., 24 hours), ensuring the agent remembers the context of the chat while automatically clearing out old, inactive data.
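A minimal sketch of this pattern, assuming the agent state is already JSON-serializable (real message objects would need conversion first); `redis_client` stands for a `redis.Redis` instance, and `get`/`setex` are standard redis-py calls:

```python
import json

SESSION_TTL_SECONDS = 24 * 60 * 60  # clear inactive sessions after 24 hours

def fresh_state() -> dict:
    # Same shape as the agent's AgentState, initialized empty
    return {"messages": [], "lead_info": {}, "intent": "",
            "conversation_stage": "START", "missing_fields": [], "attempts": 0}

def load_session(redis_client, phone: str) -> dict:
    # The sender's phone number keys the session
    raw = redis_client.get(f"session:{phone}")
    return json.loads(raw) if raw is not None else fresh_state()

def save_session(redis_client, phone: str, state: dict) -> None:
    # setex writes the value and (re)sets the 24h expiry in one call
    redis_client.setex(f"session:{phone}", SESSION_TTL_SECONDS, json.dumps(state))
```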
3. WhatsApp-Specific Adaptations
The final stage involves formatting and transmitting the AI's output back to the user through the official WhatsApp Cloud API. The system includes a safety utility that automatically splits any AI response exceeding 4,096 characters into smaller chunks to comply with WhatsApp's strict message limits. Once the text is prepared, the system authenticates using a secure access token and dispatches an HTTP POST request to the Meta Graph API, routing the reply specifically to the user's phone number to complete the interaction loop.
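A sketch of the chunking and dispatch step: `split_message` is a hypothetical helper, and the endpoint and body shape follow the Cloud API's documented text-message format.

```python
import json
import urllib.request

WHATSAPP_TEXT_LIMIT = 4096  # Cloud API cap on a text message body

def split_message(text: str, limit: int = WHATSAPP_TEXT_LIMIT) -> list:
    """Split long agent replies into chunks, preferring line-break boundaries."""
    chunks, current = [], ""
    for line in text.splitlines(keepends=True):
        # Hard-split any single line that exceeds the limit on its own
        while len(line) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(line[:limit])
            line = line[limit:]
        if current and len(current) + len(line) > limit:
            chunks.append(current)
            current = ""
        current += line
    if current:
        chunks.append(current)
    return chunks or [text]

def send_whatsapp_text(phone_number_id: str, to: str, text: str, token: str) -> None:
    """Dispatch each chunk to the Meta Graph API, addressed to the user's number."""
    url = f"https://graph.facebook.com/v18.0/{phone_number_id}/messages"
    for chunk in split_message(text):
        body = json.dumps({"messaging_product": "whatsapp", "to": to,
                           "type": "text", "text": {"body": chunk}}).encode()
        req = urllib.request.Request(url, data=body, method="POST", headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json"})
        urllib.request.urlopen(req)
```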
4. Deployment Architecture
```
┌─────────────────┐
│  WhatsApp User  │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  WhatsApp Cloud │
│       API       │
└────────┬────────┘
         │
         ▼ HTTPS Webhook
┌─────────────────┐
│  Load Balancer  │
│   (nginx/AWS)   │
└────────┬────────┘
         │
    ┌────┴─────┐
    │          │
    ▼          ▼
┌────────┐ ┌────────┐
│FastAPI │ │FastAPI │  (Horizontal scaling)
│Server 1│ │Server 2│
└───┬────┘ └───┬────┘
    │          │
    └────┬─────┘
         │
         ▼
┌─────────────────┐
│   Redis Cache   │  (Session storage)
│  + Agent State  │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│   PostgreSQL    │  (Lead storage)
└─────────────────┘
```
5. Required Components
- Webhook Server: FastAPI/Flask hosted on cloud (AWS, GCP, Railway)
- Public HTTPS URL: Required by WhatsApp (use ngrok for testing)
- Session Store: Redis for conversation state
- Database: PostgreSQL for captured leads
- Message Queue: Celery/RabbitMQ for async processing (optional but recommended)
6. Key Considerations
- Rate Limiting: WhatsApp has rate limits (1000 messages/user/day for free tier)
- Message Templates: Pre-approved templates needed for proactive messages
- Media Handling: Support images/videos users might send
- Latency: Target <3 second response time (use streaming if longer)
- Error Handling: Implement retry logic for failed webhooks
- Security: Validate webhook signatures to prevent spoofing
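The signature check from the last bullet can be sketched as follows; Meta signs each webhook POST body with the app secret and sends the digest in the `X-Hub-Signature-256` header:

```python
import hashlib
import hmac

def valid_webhook_signature(app_secret: str, raw_body: bytes, header: str) -> bool:
    """Verify Meta's X-Hub-Signature-256 header ("sha256=<hex>") for a webhook POST."""
    if not header.startswith("sha256="):
        return False
    expected = hmac.new(app_secret.encode(), raw_body, hashlib.sha256).hexdigest()
    # Constant-time comparison guards against timing attacks
    return hmac.compare_digest(expected, header[len("sha256="):])
```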
Unlike traditional vector database approaches, this implementation uses direct context injection for RAG:
```python
# The entire markdown knowledge base is loaded into the context window
prompt = f"""You are AutoStream's AI assistant.

Knowledge Base:
{KNOWLEDGE_BASE_TEXT}

User Query: {query}
"""
```

Advantages:
- No embedding API costs or complexity
- Perfect for small knowledge bases (<10k tokens)
- 100% retrieval accuracy (entire context available)
- Simpler to debug and maintain
When to use vector DB instead:
- Knowledge base >20k tokens
- Need semantic search across many documents
- Frequent updates requiring re-indexing
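A minimal sketch of the load-with-fallback and prompt assembly described above (`FALLBACK_KB` here is a hypothetical one-liner; the project ships a fuller fallback):

```python
from pathlib import Path

# Hypothetical trimmed fallback used when knowledge_base.md is absent
FALLBACK_KB = "AutoStream: Basic $29/month, Pro $79/month, 14-day free trial."

def load_knowledge_base(path: str = "knowledge_base.md") -> str:
    """Direct context injection: the whole markdown file rides in every prompt."""
    p = Path(path)
    return p.read_text(encoding="utf-8") if p.exists() else FALLBACK_KB

def build_rag_prompt(query: str, kb_text: str) -> str:
    return ("You are AutoStream's AI assistant.\n\n"
            f"Knowledge Base:\n{kb_text}\n\n"
            f"User Query: {query}\n")
```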
Three-layer approach for robustness:
- LLM Classification: Primary method using Gemini with structured prompt
- Keyword Fallback: If LLM returns invalid category, use keyword matching
- Context Awareness: Uses last 3 messages to understand conversation flow
```
# Example: Detecting HIGH_INTENT after a pricing question
History:
  User: "What's your pricing?"
  Agent: [explains plans]
  User: "That sounds perfect!"   # Context makes this HIGH_INTENT
```

Progressive disclosure pattern:

```
Start → Ask Name → Ask Email → Ask Platform → Execute Tool → End
              ↑________|___________|____________|
                       (Repeat if missing)
```
Validation layers:
- Regex extraction (email, platform keywords)
- LLM-assisted extraction (name from natural language)
- User confirmation before tool execution
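The regex layer can be sketched as below; the platform keyword table is illustrative, not the project's actual list:

```python
import re
from typing import Optional

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Keyword -> canonical label; first match wins (hypothetical table)
PLATFORMS = {"youtube": "YouTube", "instagram": "Instagram",
             "tiktok": "TikTok", "twitch": "Twitch", "linkedin": "LinkedIn"}

def extract_email(text: str) -> Optional[str]:
    m = EMAIL_RE.search(text)
    return m.group(0) if m else None

def extract_platform(text: str) -> Optional[str]:
    lowered = text.lower()
    for keyword, label in PLATFORMS.items():
        if keyword in lowered:
            return label
    return None
```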
User: How much does AutoStream cost?
Agent: AutoStream offers two plans:
1. Basic Plan - $29/month
- 10 videos/month
- 720p resolution
- Email support
2. Pro Plan - $79/month (Most Popular)
- Unlimited videos
- 4K resolution
- AI captions
- 24/7 support
Both plans come with a 14-day free trial. Which features interest you most?
User: What's your refund policy?
Agent: We offer a 7-day money-back guarantee on all plans. If you're not satisfied
within the first week, we'll refund your payment, no questions asked. After 7 days,
refunds are not available, but you can cancel anytime to avoid future charges.
Is there anything specific about the policy you'd like to know?
User: I want to sign up for Pro plan
Agent: Great! Let's get you set up. What's your name?
User: Sarah Johnson
Agent: Thanks, Sarah! What email address should we use to contact you?
User: sarah.j@gmail.com
Agent: Perfect! Which platform do you primarily create content for?
User: I create content for Instagram and TikTok
Agent: Perfect! I've got everything I need, Sarah. ✅
[Lead Captured Successfully - see console output]
```
autostream-ai-agent/
│
├── agent.py              # Main agent implementation
├── knowledge_base.md     # RAG knowledge base
├── requirements.txt      # Python dependencies
├── .env                  # Environment variables (create this)
├── .env.example          # Template for environment variables
│
├── README.md             # This file
├── DEMO.md               # Detailed demo scenarios
│
└── tests/                # Unit tests (future)
    ├── test_intent.py
    ├── test_rag.py
    └── test_lead_capture.py
```
This project demonstrates proficiency in:
1. LLM Application Development
   - Prompt engineering for classification and extraction
   - Context window management
   - Temperature tuning for consistency

2. State Machine Design
   - Conditional routing based on conversation stage
   - Loop prevention and error recovery
   - State persistence strategies

3. Production Engineering
   - Comprehensive error handling
   - Logging and debugging
   - Type safety with TypedDict
   - Graceful degradation

4. AI Agent Patterns
   - Tool calling workflows
   - Progressive information gathering
   - Multi-turn context tracking
- Add unit tests (pytest)
- Implement conversation logging to database
- Add monitoring/metrics (response time, success rate)
- Support multiple LLM providers (OpenAI, Anthropic)
- Multi-language support
- Voice input/output integration
- Sentiment analysis for better intent detection
- A/B testing framework for prompt variations
- Deploy as REST API (FastAPI)
- Add Redis for distributed state
- Implement conversation replay/debugging UI
- Real WhatsApp integration
- Slack/Discord bot versions
MIT License - feel free to use this project as a learning resource or foundation for your own agents.
- LangChain Team - For the excellent LangGraph framework
- Google DeepMind - For Gemini API access
- ServiceHive - For the challenging and educational assignment
⭐ If you found this helpful, please consider starring the repository!
Last Updated: January 2026