A production-ready conversational AI agent for AutoStream, a fictional SaaS platform providing automated video editing tools for content creators. Built as part of the ServiceHive ML Intern assignment.
- Python 3.9+
- A free Groq API key from console.groq.com
# 1. Clone the repository
git clone https://github.com/ashish-doing/autostream-agent
cd autostream-agent
# 2. Install dependencies
pip install -r requirements.txt
# 3. Create a .env file in the root directory
echo "GROQ_API_KEY=your_groq_api_key_here" > .env
# 4. Run the agent
python main.py
You: Hi
Agent: Hello! Welcome to AutoStream...
You: What are your pricing plans?
Agent: We have two plans...
You: I want to try the Pro plan for my YouTube channel
Agent: That's great! Could I get your name first?
You: Ashish
You: ashish@gmail.com
You: YouTube
Agent: Lead captured successfully!
LangGraph was chosen over AutoGen because it provides explicit, deterministic state management through a typed state graph. For a lead capture agent, predictability is critical — we need guaranteed control over when the lead capture tool fires, and LangGraph's node-based architecture makes this straightforward. AutoGen's multi-agent conversation model introduces unnecessary complexity for a single-agent workflow.
The agent uses a TypedDict-based AgentState that persists across all conversation turns within a session. The state tracks:
- messages: full conversation history, using LangGraph's add_messages reducer
- intent: classified intent of the latest user message
- lead_name, lead_email, lead_platform: collected incrementally across turns
- lead_captured: boolean flag to prevent duplicate tool calls
- awaiting: tracks which lead field the agent is currently collecting
This design ensures the agent never triggers mock_lead_capture() until all three fields are collected, and never prematurely exits the collection flow regardless of what the user says mid-collection.
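The state shape and the capture guard described above might look roughly like the following. This is a minimal sketch, not the code in agent.py: the helper names next_missing_field and should_capture_lead are hypothetical, and messages is a plain list here so the example runs without LangGraph installed (the real agent would annotate it with the add_messages reducer).

```python
from typing import List, Optional, TypedDict


class AgentState(TypedDict):
    # Plain list for this sketch; the real agent uses LangGraph's add_messages reducer.
    messages: List[str]
    intent: Optional[str]
    lead_name: Optional[str]
    lead_email: Optional[str]
    lead_platform: Optional[str]
    lead_captured: bool
    awaiting: Optional[str]


# Order in which the lead fields are requested (assumed from the sample conversation).
LEAD_FIELDS = ("lead_name", "lead_email", "lead_platform")


def next_missing_field(state: AgentState) -> Optional[str]:
    """Return the first lead field that is still empty, or None if all are set."""
    for field in LEAD_FIELDS:
        if not state.get(field):
            return field
    return None


def should_capture_lead(state: AgentState) -> bool:
    """Allow the tool call only when every field is filled and no lead was captured yet."""
    return next_missing_field(state) is None and not state["lead_captured"]
```

A routing node could set awaiting to next_missing_field(state) on every turn and call mock_lead_capture() only when should_capture_lead(state) is true, which covers both the "all three fields" and the "no duplicate calls" requirements.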
Product knowledge (pricing, features, policies) is stored in a local knowledge_base.json file. The rag.py module performs keyword-based retrieval to find the most relevant context, which is then injected into the LLM prompt. This keeps responses grounded and prevents hallucination of non-existent plans or features.
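As an illustration of keyword-based retrieval, the sketch below scores each knowledge base entry by how many query words it shares and returns the best matches. The layout of knowledge_base.json is not shown in this README, so SAMPLE_KB (a flat topic-to-text mapping with placeholder text) and the function names are assumptions, not the contents of rag.py.

```python
import re
from typing import Dict, List, Set

# Hypothetical stand-in for knowledge_base.json; the real file's structure may differ.
SAMPLE_KB: Dict[str, str] = {
    "pricing": "Placeholder text describing the pricing plans.",
    "features": "Placeholder text describing product features.",
    "policies": "Placeholder text describing refund and support policies.",
}


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, kb: Dict[str, str], top_k: int = 1) -> List[str]:
    """Return up to top_k entries ranked by word overlap with the query."""
    query_tokens = tokenize(query)
    scored = []
    for topic, text in kb.items():
        overlap = len(query_tokens & (tokenize(topic) | tokenize(text)))
        if overlap:  # skip entries that share no words with the query
            scored.append((overlap, topic))
    # Highest overlap first; ties are broken alphabetically by topic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [kb[topic] for _, topic in scored[:top_k]]
```

The returned text would then be placed in the LLM prompt as context. An empty result means nothing matched, in which case the prompt can tell the model to say it does not know rather than guess.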
To deploy this agent on WhatsApp, the following architecture would be used:
- WhatsApp Business API: Register a WhatsApp Business account and obtain API credentials via Meta's developer portal.
- Webhook Endpoint: Deploy a FastAPI or Flask server that exposes a POST /webhook endpoint. WhatsApp sends incoming messages to this URL.
- Message Handling: On each incoming webhook event, extract the user's phone number and message text, then pass it into the LangGraph agent's invoke() method with the appropriate session state.
- Session Persistence: Store per-user AgentState in a Redis or PostgreSQL database, keyed by phone number, so state persists across messages.
- Reply: After the agent responds, use the WhatsApp Business API to send the response back to the user's phone number.
WhatsApp User
↓ message
WhatsApp Business API
↓ webhook POST
FastAPI Server (/webhook)
↓ invoke
LangGraph Agent
↓ response
WhatsApp Business API
↓ reply
WhatsApp User
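The message-handling and session-persistence steps in this flow can be sketched without any web framework. Everything here is illustrative: SESSIONS is an in-memory dict standing in for Redis or PostgreSQL, handle_incoming is a hypothetical function the /webhook route would call after extracting the phone number and text, and counting_agent is a stub in place of the LangGraph agent's invoke().

```python
from typing import Any, Callable, Dict

State = Dict[str, Any]

# In-memory stand-in for the Redis/PostgreSQL session store, keyed by phone number.
SESSIONS: Dict[str, State] = {}


def handle_incoming(phone: str, text: str, run_agent: Callable[[State], State]) -> str:
    """Load the sender's state, run one agent turn, save the state, return the reply."""
    state = dict(SESSIONS.get(phone, {"messages": []}))
    state["messages"] = list(state["messages"]) + [("user", text)]
    new_state = run_agent(state)  # with LangGraph this would be the compiled graph's invoke()
    SESSIONS[phone] = new_state
    return new_state["messages"][-1][1]  # text of the agent's latest message


def counting_agent(state: State) -> State:
    """Stub agent that reports how many user messages the session has seen."""
    user_turns = sum(1 for role, _ in state["messages"] if role == "user")
    reply = f"Received message {user_turns} in this session."
    return {**state, "messages": state["messages"] + [("agent", reply)]}
```

Calling handle_incoming twice with the same phone number continues one session, while a different number starts a fresh one. The string it returns is what the server would send back through the WhatsApp Business API.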
autostream-agent/
├── main.py # Entry point, conversation loop
├── agent.py # LangGraph agent, state management, intent detection
├── rag.py # Knowledge base loader and retrieval
├── tools.py # mock_lead_capture tool
├── knowledge_base.json # AutoStream pricing, features, policies
├── .env # API keys (not committed)
├── .gitignore
├── requirements.txt
└── README.md
- Language: Python 3.9+
- Framework: LangGraph + LangChain
- LLM: Llama 3.1 8B via Groq API (free tier)
- State Management: LangGraph TypedDict state graph
- Knowledge Base: Local JSON file with keyword-based RAG