# 🤖 Intelligent Web Automation Agent with LangGraph & Playwright MCP

This notebook demonstrates a comprehensive web automation agent that can perform dynamic web tasks using:
- **LangGraph**: For intelligent agent orchestration and state management
- **Playwright MCP**: For browser automation and web interaction
- **Google Gemini**: For LLM-driven planning and progress evaluation

## 🚀 Features

✅ **Dynamic Task Planning**: Analyzes user queries and plans appropriate web actions  
✅ **Error Handling**: Catches planning, execution, and evaluation failures and caps each run at a fixed number of steps  
✅ **State Management**: Tracks browser state and task progress  
✅ **Verbose Logging**: Detailed logging for debugging and monitoring  
✅ **Tool Integration**: Full access to all Playwright MCP tools  
✅ **Flexible Architecture**: Planning, execution, and evaluation run as separate LangGraph nodes  

## 📋 Requirements

Before running this agent, ensure you have:

1. **Node.js installed** (for Playwright MCP server)
2. **Required Python packages** (install with pip)
3. **Google API key** (set `GOOGLE_API_KEY`; used for Gemini-powered planning and evaluation)
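
A quick preflight check along the following lines can confirm these prerequisites before the agent is launched. This is a minimal sketch: `check_prerequisites` is a hypothetical helper that is not part of the notebook, and it only tests that the executables are on `PATH` and that the variable is set, not that they actually work.

```python
import os
import shutil
from typing import Dict, Mapping, Optional


def check_prerequisites(env: Optional[Mapping[str, str]] = None) -> Dict[str, bool]:
    """Report which prerequisites appear to be available (hypothetical helper)."""
    env = os.environ if env is None else env
    return {
        # npx ships with Node.js and is what launches the Playwright MCP server
        "node": shutil.which("node") is not None,
        "npx": shutil.which("npx") is not None,
        # The agent falls back to this variable when no key is passed explicitly
        "google_api_key": bool(env.get("GOOGLE_API_KEY")),
    }


for name, ok in check_prerequisites().items():
    print(f"{'OK' if ok else 'MISSING':8} {name}")
```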

## 🛠️ Installation

Run the following commands to install dependencies:

In [None]:
# Install required packages
import subprocess
import sys
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

def install_packages():
    """Install all required packages for the Web Automation Agent"""
    
    packages = [
        "mcp",                    # Model Context Protocol
        "langchain",              # LangChain core
        "langchain-google-genai", # Google Generative AI integration
        "langgraph",              # LangGraph for agent orchestration
        "nest-asyncio",           # For Jupyter async compatibility
        "google-generativeai",    # Google Gemini API
        "python-dotenv",          # Provides load_dotenv for reading .env files
    ]
    
    print("📦 Installing required packages...")
    for package in packages:
        try:
            print(f"Installing {package}...")
            subprocess.check_call([sys.executable, "-m", "pip", "install", package])
            print(f"✅ {package} installed successfully")
        except subprocess.CalledProcessError as e:
            print(f"❌ Failed to install {package}: {e}")
    
    print("\n🎉 Package installation completed!")
    print("\n📝 Note: Make sure you have Node.js installed for Playwright MCP server")
    print("📝 Set GOOGLE_API_KEY environment variable for Gemini LLM features")

# Uncomment the line below to install packages
# install_packages()

In [8]:
import asyncio
import json
import os
from typing import Any, Dict, List, Optional, TypedDict
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_core.messages import HumanMessage, AIMessage, SystemMessage
from langgraph.graph import StateGraph, END
import nest_asyncio

# Apply nest_asyncio for Jupyter compatibility
nest_asyncio.apply()

# Agent State Definition
class WebAgentState(TypedDict):
    query: str
    messages: List[Any]
    next_action: Dict[str, Any]
    browser_context: str
    task_complete: bool
    step_count: int

class SimpleWebAgent:
    def __init__(self, google_api_key: str = None):
        self.mcp_client = None
        self.session = None
        self.available_tools = []
        
        # Initialize Gemini LLM
        self.llm = ChatGoogleGenerativeAI(
            model="gemini-pro",
            google_api_key=google_api_key or os.getenv("GOOGLE_API_KEY"),
            temperature=0.1,
            verbose=True
        )
        
        # LangGraph workflow
        self.workflow = None
        self.app = None

    async def connect_mcp(self):
        """Connect to Playwright MCP server"""
        print("🔌 Connecting to Playwright MCP server...")
        
        server_params = StdioServerParameters(
            command="npx",
            args=["-y", "@playwright/mcp@latest"]
        )
        
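        # Enter both context managers through one exit stack: this starts the
        # session's background receive loop and lets cleanup close everything later.
        from contextlib import AsyncExitStack
        self.exit_stack = AsyncExitStack()
        self.read, self.write = await self.exit_stack.enter_async_context(stdio_client(server_params))
        self.session = await self.exit_stack.enter_async_context(ClientSession(self.read, self.write))
        await self.session.initialize()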
        
        # Get available tools
        tools_response = await self.session.list_tools()
        self.available_tools = [tool.name for tool in tools_response.tools]
        
        print(f"✅ Connected! Available tools: {len(self.available_tools)}")
        return True

    async def call_mcp_tool(self, tool_name: str, args: Dict[str, Any]) -> str:
        """Call Playwright MCP tool"""
        print(f"🔧 Calling {tool_name} with args: {args}")
        
        result = await self.session.call_tool(tool_name, args)
        
        if result.content and hasattr(result.content[0], "text"):
            return result.content[0].text
        return str(result)

    def build_workflow(self):
        """Build simple LangGraph workflow"""
        workflow = StateGraph(WebAgentState)
        
        # Add nodes
        workflow.add_node("planner", self.plan_action)
        workflow.add_node("executor", self.execute_action)
        workflow.add_node("evaluator", self.evaluate_progress)
        
        # Set entry point
        workflow.set_entry_point("planner")
        
        # Add edges
        workflow.add_conditional_edges(
            "planner",
            lambda state: "execute" if state["next_action"] else "end",
            {"execute": "executor", "end": END}
        )
        
        workflow.add_conditional_edges(
            "executor", 
            lambda state: "evaluate" if not state["task_complete"] else "end",
            {"evaluate": "evaluator", "end": END}
        )
        
        workflow.add_conditional_edges(
            "evaluator",
            lambda state: "end" if state["task_complete"] or state["step_count"] > 10 else "planner",
            {"end": END, "planner": "planner"}
        )
        
        self.app = workflow.compile()

    async def plan_action(self, state: WebAgentState) -> WebAgentState:
        """LLM plans next action based on query and browser context"""
        print(f"\n🧠 PLANNING (Step {state['step_count'] + 1})...")
        
        system_prompt = f"""You are a web automation agent with access to Playwright browser tools.

Available tools: {', '.join(self.available_tools)}

Current browser context: {state.get('browser_context', 'No browser open yet')}

User wants: {state['query']}

Plan the next single action. Respond ONLY with JSON:
{{
    "tool": "tool_name",
    "args": {{"param": "value"}},
    "reasoning": "why this action"
}}

Common tools (use the exact names from the available tools list above):
- browser_navigate: Navigate to URL
- browser_click: Click elements
- browser_type: Type text
- browser_snapshot: Get page state
- browser_take_screenshot: Take screenshot
"""

        try:
            response = await self.llm.ainvoke([SystemMessage(content=system_prompt)])
            plan = json.loads(response.content)
            
            state["next_action"] = plan
            state["step_count"] = state.get("step_count", 0) + 1
            
            print(f"📋 PLAN: {plan['reasoning']}")
            print(f"🎯 ACTION: {plan['tool']} {plan['args']}")
            
        except Exception as e:
            print(f"❌ Planning failed: {e}")
            state["next_action"] = {}
            
        return state

    async def execute_action(self, state: WebAgentState) -> WebAgentState:
        """Execute the planned action"""
        action = state["next_action"]
        
        if not action:
            state["task_complete"] = True
            return state
            
        print(f"\n⚡ EXECUTING: {action['tool']}")
        
        try:
            result = await self.call_mcp_tool(action["tool"], action["args"])
            
            # Update browser context if we got page info
            if "Page state" in result or "Page URL" in result:
                state["browser_context"] = result[:1000]  # Keep context manageable
                
            state["messages"].append(f"Executed {action['tool']}: Success")
            print("✅ Action completed")
            
        except Exception as e:
            print(f"❌ Execution failed: {e}")
            state["messages"].append(f"Failed {action['tool']}: {e}")
            
        return state

    async def evaluate_progress(self, state: WebAgentState) -> WebAgentState:
        """LLM evaluates if task is complete"""
        print(f"\n🔍 EVALUATING progress...")
        
        eval_prompt = f"""Based on the user query and current browser state, is the task complete?

User query: {state['query']}
Current browser context: {state.get('browser_context', 'No context')}
Steps taken: {state['step_count']}

Respond ONLY with JSON:
{{"complete": true/false, "reason": "explanation"}}
"""

        try:
            response = await self.llm.ainvoke([SystemMessage(content=eval_prompt)])
            evaluation = json.loads(response.content)
            
            state["task_complete"] = evaluation["complete"]
            print(f"📊 EVALUATION: {evaluation['reason']}")
            
        except Exception as e:
            print(f"❌ Evaluation failed: {e}")
            # Fallback: complete after many steps
            state["task_complete"] = state["step_count"] > 8
            
        return state

    async def run_task(self, query: str) -> Dict[str, Any]:
        """Run a web automation task"""
        print(f"\n🚀 STARTING TASK: {query}")
        print("=" * 60)
        
        # Initialize state
        initial_state = WebAgentState(
            query=query,
            messages=[],
            next_action={},
            browser_context="",
            task_complete=False,
            step_count=0
        )
        
        # Run workflow
        final_state = await self.app.ainvoke(initial_state)
        
        print(f"\n🎯 TASK COMPLETED in {final_state['step_count']} steps")
        
        return {
            "success": final_state["task_complete"],
            "query": query,
            "steps": final_state["step_count"],
            "messages": final_state["messages"],
            "browser_context": final_state.get("browser_context", "")
        }

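    async def cleanup(self):
        """Cleanup MCP connection"""
        # Closing the exit stack (if connect_mcp created one) shuts down
        # both the client session and the stdio transport.
        exit_stack = getattr(self, "exit_stack", None)
        if exit_stack is not None:
            await exit_stack.aclose()
        self.session = None
        print("🧹 Cleaned up MCP connection")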

# Initialize the simple agent
async def create_web_agent(google_api_key: str = None) -> SimpleWebAgent:
    """Create and initialize the web automation agent"""
    agent = SimpleWebAgent(google_api_key)
    await agent.connect_mcp()
    agent.build_workflow()
    return agent



In [None]:
# Simple Web Automation Usage Examples
import os

async def demo_web_automation():
    """Demo the simple web automation agent"""
    
    # Get API key (set your Google API key here or in environment)
    google_api_key = os.getenv("GOOGLE_API_KEY")  # or "your-api-key-here"
    
    if not google_api_key:
        print("⚠️  No Google API key found. Set GOOGLE_API_KEY environment variable or pass directly.")
        print("🔧 The planner and evaluator both call Gemini, so the demo cannot run without a key.")
        return
    
    # Create agent
    agent = await create_web_agent(google_api_key)
    
    try:
        # Test different web automation tasks
        test_queries = [
            "Go to Google and search for latest AI news",
            "Navigate to YouTube and search for Python tutorials", 
            "Visit GitHub and search for LangGraph repositories",
            "Go to Wikipedia and search for artificial intelligence",
            "Open Reddit and browse the programming subreddit"
        ]
        
        for i, query in enumerate(test_queries, 1):
            print(f"\n{'='*60}")
            print(f"🎯 TEST {i}: {query}")
            print('='*60)
            
            result = await agent.run_task(query)
            
            if result["success"]:
                print(f"✅ SUCCESS: Completed in {result['steps']} steps")
            else:
                print(f"⚠️  PARTIAL: Completed {result['steps']} steps")
                
            print(f"📝 Final context: {result['browser_context'][:200]}...")
            
            # Small delay between tasks
            await asyncio.sleep(2)
            
    finally:
        await agent.cleanup()

# Quick single task function
async def run_single_task(query: str, google_api_key: str = None):
    """Run a single web automation task"""
    agent = await create_web_agent(google_api_key)
    
    try:
        result = await agent.run_task(query)
        return result
    finally:
        await agent.cleanup()

# Interactive function for custom queries
async def interactive_web_automation():
    """Interactive web automation session"""
    google_api_key = os.getenv("GOOGLE_API_KEY")
    agent = await create_web_agent(google_api_key)
    
    print("🤖 Interactive Web Automation Agent")
    print("Type your web automation requests. Type 'quit' to exit.")
    print("Examples:")
    print("  - 'Search Google for latest news'")  
    print("  - 'Go to YouTube and find cooking videos'")
    print("  - 'Navigate to Stack Overflow and search for Python help'")
    
    try:
        while True:
            print("\n" + "-"*50)
            query = input("🎯 Enter your task: ").strip()
            
            if query.lower() in ['quit', 'exit', 'stop']:
                break
                
            if query:
                result = await agent.run_task(query)
                print(f"\n📊 Result: {'Success' if result['success'] else 'Partial'}")
                
    except KeyboardInterrupt:
        print("\n👋 Goodbye!")
    finally:
        await agent.cleanup()

print("🎯 Available functions:")
print("  - demo_web_automation(): Run predefined test cases")
print("  - run_single_task('your query'): Run one task")
print("  - interactive_web_automation(): Interactive session")

🎯 Available functions:
  - demo_web_automation(): Run predefined test cases
  - run_single_task('your query'): Run one task
  - interactive_web_automation(): Interactive session


In [None]:
# Quick Demo - Simple Web Automation Agent
async def quick_demo():
    """Quick demonstration of the simple web automation agent"""
    
    print("🚀 SIMPLE WEB AUTOMATION AGENT DEMO")
    print("=" * 50)
    
    # Read the key from the environment (load_dotenv() has already loaded any .env file);
    # replace with "your-google-api-key" to pass it directly
    google_api_key = os.getenv("GOOGLE_API_KEY")

    # Create and test the agent
    agent = await create_web_agent(google_api_key)
    
    try:
        # Demo task 1: Google search (like we did manually before)
        print("\n🎯 DEMO 1: Google AI News Search")
        result1 = await agent.run_task("Go to Google and search for latest AI news")
        
        # Demo task 2: YouTube search  
        print("\n🎯 DEMO 2: YouTube Search")
        result2 = await agent.run_task("Navigate to YouTube and search for LangGraph tutorials")
        
        # Demo task 3: GitHub search
        print("\n🎯 DEMO 3: GitHub Search") 
        result3 = await agent.run_task("Go to GitHub and search for web automation projects")
        
        print("\n📊 DEMO SUMMARY:")
        print(f"✅ Task 1 Success: {result1['success']} ({result1['steps']} steps)")
        print(f"✅ Task 2 Success: {result2['success']} ({result2['steps']} steps)")  
        print(f"✅ Task 3 Success: {result3['success']} ({result3['steps']} steps)")
        
    finally:
        await agent.cleanup()
        print("\n🎉 Demo completed!")

# Run the quick demo
await quick_demo()

# For custom tasks, uncomment and modify:
# await run_single_task("Your custom web automation task here")

# For interactive session, uncomment:
# await interactive_web_automation()


## 🧠 How This Intelligent Web Automation Works

### **Key Improvements Over Manual Approach:**

1. **🤖 LLM-Driven Planning**: Gemini LLM analyzes user queries and plans appropriate browser actions
2. **🔄 Dynamic Adaptation**: Agent adapts to any website and task, not hardcoded for specific sites  
3. **📊 Smart Evaluation**: LLM evaluates progress and determines when tasks are complete
4. **🛠️ Full Tool Access**: Direct access to all Playwright MCP tools (navigate, click, type, snapshot, etc.)

### **Architecture:**
```
User Query → LLM Planner → Tool Executor → LLM Evaluator → Results
     ↑                                                       ↓
     └─────────────── LangGraph State Management ────────────┘
```
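
The arrows in the diagram correspond to the conditional edges registered in `build_workflow`. As a rough illustration, the same routing rules can be written as plain functions. The function names and the `MAX_STEPS` constant are made up for this sketch; the notebook uses inline lambdas and a literal `10`.

```python
from typing import Any, Dict

MAX_STEPS = 10  # the evaluator edge stops once step_count exceeds this


def after_planner(state: Dict[str, Any]) -> str:
    # An empty plan (for example, an unparseable LLM reply) ends the run
    return "executor" if state["next_action"] else "end"


def after_executor(state: Dict[str, Any]) -> str:
    return "end" if state["task_complete"] else "evaluator"


def after_evaluator(state: Dict[str, Any]) -> str:
    if state["task_complete"] or state["step_count"] > MAX_STEPS:
        return "end"
    return "planner"  # loop back for another plan/execute/evaluate round


state = {"next_action": {"tool": "browser_navigate"}, "task_complete": False, "step_count": 1}
print(after_planner(state), after_executor(state), after_evaluator(state))
```

Keeping the routing in named functions like these makes the stop conditions easy to unit test without starting a browser or calling the LLM.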

### **How It Handles Any Web Automation Task:**

1. **Planning Phase**: 
   - LLM receives user query + current browser state
   - Plans next logical action (navigate, click, type, etc.)
   - Chooses appropriate tool and parameters

2. **Execution Phase**:
   - Calls Playwright MCP tool with planned parameters
   - Updates browser context with results

3. **Evaluation Phase**:
   - LLM checks if task objective is met
   - Decides to continue or complete
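
The planning phase depends on the LLM replying with bare JSON, but models often wrap JSON in a Markdown code fence, which makes a direct `json.loads` call fail. The sketch below shows one way the reply could be cleaned and validated before it becomes `next_action`. `parse_plan` is a hypothetical helper, not something the notebook defines, and it assumes an empty dict should signal "no usable plan", as the planner edge already does.

```python
import json
from typing import Any, Dict, List

FENCE = "`" * 3  # built indirectly so this example can sit inside a fenced block


def parse_plan(reply: str, available_tools: List[str]) -> Dict[str, Any]:
    """Extract a {"tool", "args", "reasoning"} plan from an LLM reply.

    Returns {} when the reply is unusable, which the planner edge treats as "end".
    """
    text = reply.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line (with optional language tag) and the closing fence
        text = text.split("\n", 1)[1] if "\n" in text else ""
        text = text.rsplit(FENCE, 1)[0]
    try:
        plan = json.loads(text)
    except json.JSONDecodeError:
        return {}
    # Reject anything that is not an object naming a tool the server actually offers
    if not isinstance(plan, dict) or plan.get("tool") not in available_tools:
        return {}
    plan.setdefault("args", {})
    return plan


reply = FENCE + 'json\n{"tool": "browser_navigate", "args": {"url": "https://example.com"}}\n' + FENCE
print(parse_plan(reply, ["browser_navigate"]))
```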

### **Example Task Flow:**
**Query**: "Go to GitHub and search for AI projects"

1. **Plan**: Navigate to GitHub → **Execute**: `browser_navigate(github.com)`
2. **Plan**: Find search box → **Execute**: `browser_click(search_element)` 
3. **Plan**: Type search query → **Execute**: `browser_type("AI projects")`
4. **Plan**: Submit search → **Execute**: `browser_type(submit=True)`
5. **Evaluate**: Search results visible → **Complete**: ✅
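
Expressed in the JSON shape the planner emits, the flow above might look like the list below. This is only an illustration: the argument names (`url`, `element`, `ref`, `text`, `submit`) and the `ref` value are assumptions, real element refs come from a page snapshot, and typing and submitting are merged into one call here. `fake_call_tool` is a stand-in for `call_mcp_tool` so the sketch runs without a browser.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List

# Assumed argument names and a placeholder ref; check the server's tool schemas
GITHUB_SEARCH_PLAN: List[Dict[str, Any]] = [
    {"tool": "browser_navigate", "args": {"url": "https://github.com"}},
    {"tool": "browser_click", "args": {"element": "Search box", "ref": "e1"}},
    {"tool": "browser_type",
     "args": {"element": "Search box", "ref": "e1", "text": "AI projects", "submit": True}},
]

ToolCaller = Callable[[str, Dict[str, Any]], Awaitable[str]]


async def replay(plan: List[Dict[str, Any]], call_tool: ToolCaller) -> List[str]:
    """Run each planned step in order and collect the tool results."""
    results = []
    for step in plan:
        results.append(await call_tool(step["tool"], step["args"]))
    return results


async def fake_call_tool(tool: str, args: Dict[str, Any]) -> str:
    # Stand-in for SimpleWebAgent.call_mcp_tool; no browser is involved
    return f"{tool} ok"


print(asyncio.run(replay(GITHUB_SEARCH_PLAN, fake_call_tool)))
```

In the real agent the plan is not known up front: each step is chosen by the LLM after it sees the browser context produced by the previous one.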

### **Advantages:**
- ✅ **Zero Hardcoding**: Works with any website
- ✅ **Natural Language**: Plain English instructions  
- ✅ **Self-Adapting**: Handles different page layouts
- ✅ **Error Recovery**: LLM can replan if actions fail
- ✅ **Verbose Logging**: See exactly what agent is thinking
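
For error recovery to work well, failures have to be retried or surfaced to the planner. In the notebook, `execute_action` records a failure in `state["messages"]` and moves on. A possible extension, sketched below under the assumption that a transient failure is worth one more attempt, retries the tool call and logs every failed attempt so the messages could later be fed into the planning prompt. `execute_with_retry` and its parameters are hypothetical.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List

ToolCaller = Callable[[str, Dict[str, Any]], Awaitable[str]]


async def execute_with_retry(
    call_tool: ToolCaller,
    action: Dict[str, Any],
    messages: List[str],
    max_attempts: int = 2,
    delay: float = 0.0,
) -> bool:
    """Try an action up to max_attempts times, recording each outcome in messages."""
    for attempt in range(1, max_attempts + 1):
        try:
            await call_tool(action["tool"], action["args"])
            messages.append(f"Executed {action['tool']}: Success")
            return True
        except Exception as exc:
            messages.append(f"Failed {action['tool']} (attempt {attempt}): {exc}")
            await asyncio.sleep(delay)  # optional pause before the next attempt
    return False


async def flaky_tool(tool: str, args: Dict[str, Any]) -> str:
    # Fails on the first call only, to simulate a transient error
    flaky_tool.calls = getattr(flaky_tool, "calls", 0) + 1
    if flaky_tool.calls == 1:
        raise RuntimeError("element not ready")
    return "ok"


log: List[str] = []
print(asyncio.run(execute_with_retry(flaky_tool, {"tool": "browser_click", "args": {}}, log)))
print(log)
```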