This project is a modular automation agent built using:
- Gemini (Google Generative AI) for reasoning and step planning
- Playwright MCP for real browser automation
- LangGraph for orchestration between planning and execution
The agent reads a goal (like “Login to Facebook”), uses Gemini to plan browser actions, and executes them in sequence through the Playwright MCP server.
✅ LLM-driven step planning using Gemini
✅ Real browser automation via Playwright MCP
✅ Modular project structure (nodes, state, workflow, config, utils)
✅ Fully environment-driven configuration using .env
✅ Works with any MCP-compatible automation server
AiFinalProject/
├── main.py
├── config.py
├── graph/
│ ├── nodes.py
│ ├── state.py
│ └── workflow.py
├── utils/
│ └── helper.py
├── .env
└── README.md
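The tree doesn't show what `graph/state.py` contains. Here is a minimal sketch, assuming the state is a `TypedDict` (a common choice for LangGraph state). It carries the goal, the steps executed so far, the last tool result, and a done flag. `AgentState`, `initial_state`, and every field name are hypothetical and are not taken from the project.

```python
from typing import Any, TypedDict


class AgentState(TypedDict, total=False):
    """Hypothetical shape of the state passed between the planner and executor nodes."""
    goal: str                    # natural-language goal, e.g. "Login to Facebook"
    steps: list[dict[str, Any]]  # JSON steps already executed
    last_result: str             # output of the most recent MCP tool call
    done: bool                   # True once Gemini reports "done": true


def initial_state(goal: str) -> AgentState:
    """Build the starting state for a new run (hypothetical helper)."""
    return {"goal": goal, "steps": [], "last_result": "", "done": False}


state = initial_state("Go to saucedemo.com and login as standard_user")
print(state["done"])  # False
```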
Make sure you have:
- Python 3.12+
- Node.js 18+
- Playwright MCP installed globally or locally
Start the Playwright MCP server:
npx @playwright/mcp@latest --port 8931 --shared-browser-context
📝 Notes:
- You can replace 8931 with any other port you prefer.
- Keep this terminal open; the Python agent connects to this MCP server through that port.
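The agent only works while this server is reachable, so a quick pre-flight check can turn a confusing failure into a clear one. This is a hypothetical helper (`mcp_port_open` is not part of the project). It only confirms that something is listening on the host and port taken from the MCP URL. It does not check that the listener actually speaks MCP.

```python
import socket
from urllib.parse import urlparse


def mcp_port_open(url: str = "http://localhost:8931/mcp", timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the host/port in `url` succeeds."""
    parsed = urlparse(url)
    host = parsed.hostname or "localhost"
    # Fall back to the scheme's default port if the URL omits one.
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timeout, unknown host, ...
        return False
```

Calling `mcp_port_open()` before starting the agent should return `True` while the `npx` terminal is running.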
In your project root, create a .env file and add the following:
# Google Gemini API
GEMINI_API_KEY=YOUR_GEMINI_API_KEY
GEMINI_MODEL_NAME=gemini-2.5-flash
# MCP Server Configuration
MCP_PLAYWRIGHT_URL=http://localhost:8931/mcp
MCP_PLAYWRIGHT_TRANSPORT=streamable_http
💡 If you started MCP on a different port, change the port in MCP_PLAYWRIGHT_URL to match.
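The sketch below shows one way `config.py` might read these variables. It assumes python-dotenv's `load_dotenv()` is used, which is in the dependency list, and it falls back to plain environment variables if that package is missing. `Settings` and `load_settings` are hypothetical names. The defaults simply mirror the values shown above.

```python
import os
from dataclasses import dataclass

try:
    from dotenv import load_dotenv  # python-dotenv; reads .env into os.environ

    load_dotenv()  # does not override variables that are already set
except ImportError:
    pass  # fall back to variables already present in the environment


@dataclass(frozen=True)
class Settings:
    gemini_api_key: str
    gemini_model_name: str
    mcp_url: str
    mcp_transport: str


def load_settings() -> Settings:
    """Read configuration from the environment (hypothetical helper)."""
    api_key = os.getenv("GEMINI_API_KEY")
    if not api_key:
        # Mirrors the "Missing GEMINI_API_KEY" row in the troubleshooting table.
        raise RuntimeError("❌ Missing GEMINI_API_KEY")
    return Settings(
        gemini_api_key=api_key,
        gemini_model_name=os.getenv("GEMINI_MODEL_NAME", "gemini-2.5-flash"),
        mcp_url=os.getenv("MCP_PLAYWRIGHT_URL", "http://localhost:8931/mcp"),
        mcp_transport=os.getenv("MCP_PLAYWRIGHT_TRANSPORT", "streamable_http"),
    )
```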
Create a virtual environment and install dependencies:
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt
If you don’t have a requirements.txt, create one with:
pip install langchain-google-genai langchain-core langchain-mcp-adapters langgraph python-dotenv
pip freeze > requirements.txt
With the MCP server running, start your agent:
python3 main.py
You should see output similar to:
🚀 Starting LangGraph MCP Test Agent...
🧠 LLM NEXT STEP RAW:
{
"tool": "browser_navigate",
"args": {"url": "https://www.facebook.com"},
"reason": "Navigating to Facebook...",
"done": false
}
⚙️ Executing: browser_navigate with args: {'url': 'https://www.facebook.com'}
✅ Tool 'browser_navigate' executed successfully.
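The "LLM NEXT STEP RAW" text has to become a Python dict before it can be executed. Here is a minimal sketch, assuming the reply contains a single JSON object, possibly wrapped in a Markdown fence or extra prose. It slices from the first `{` to the last `}`. `parse_step` is a hypothetical helper, not the project's actual code.

```python
import json
from typing import Any


def parse_step(raw: str) -> dict[str, Any]:
    """Extract the JSON step object from Gemini's raw reply (hypothetical helper)."""
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        raise ValueError("No JSON object found in LLM output")
    step = json.loads(raw[start : end + 1])
    step.setdefault("args", {})    # tolerate replies without args
    step.setdefault("done", False)  # treat a missing flag as "not done"
    return step


raw = (
    '{"tool": "browser_navigate", "args": {"url": "https://www.facebook.com"}, '
    '"reason": "Navigating to Facebook...", "done": false}'
)
step = parse_step(raw)
print(f"⚙️ Executing: {step['tool']} with args: {step['args']}")
# ⚙️ Executing: browser_navigate with args: {'url': 'https://www.facebook.com'}
```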
- Gemini generates a JSON plan describing the next action:
{
  "tool": "browser_fill_form",
  "args": {
    "fields": [
      {"selector": "#user-name", "value": "standard_user"},
      {"selector": "#password", "value": "secret_sauce"}
    ]
  },
  "reason": "Filling login fields.",
  "done": false
}
- LangGraph sends this step to the Playwright MCP server.
- MCP executes the browser action and returns HTML, and the cycle repeats.
- When Gemini marks "done": true, the process ends.
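The plan/execute cycle above can be sketched as a plain Python loop. In the project, LangGraph wires this cycle between nodes. Here, `plan` stands in for the Gemini call and `execute` stands in for the MCP tool call, so the control flow runs without either service. `run_agent`, its parameters, and the `max_steps` guard are hypothetical and are not the project's code.

```python
from typing import Any, Callable

Step = dict[str, Any]


def run_agent(
    goal: str,
    plan: Callable[[str, list[Step], str], Step],
    execute: Callable[[str, dict[str, Any]], str],
    max_steps: int = 20,
) -> list[Step]:
    """Plan one step, execute it, feed the result back, until Gemini says done."""
    history: list[Step] = []
    last_result = ""
    for _ in range(max_steps):  # guard against a planner that never finishes
        step = plan(goal, history, last_result)  # stand-in for the Gemini call
        if step.get("done"):
            break
        tool = step.get("tool")
        if not tool:
            print("⚠️ No tool found, stopping execution.")
            break
        last_result = execute(tool, step.get("args", {}))  # stand-in for MCP
        history.append(step)
    return history


# Usage with a scripted planner instead of Gemini:
script = [
    {"tool": "browser_navigate", "args": {"url": "https://www.saucedemo.com"}, "done": False},
    {"tool": "browser_fill_form", "args": {"fields": []}, "done": False},
    {"done": True},
]
steps = run_agent("login", lambda g, h, r: script[len(h)], lambda t, a: f"ran {t}")
print([s["tool"] for s in steps])  # ['browser_navigate', 'browser_fill_form']
```

Passing the planner and executor in as functions keeps the loop testable offline. It also matches the split between the planning and execution nodes.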
| Issue | Cause | Fix |
|---|---|---|
| ⚠️ No tool found, stopping execution. | Gemini returned "step" instead of "tool" | Make sure you are using the latest prompt from the fixed nodes.py |
| ❌ Missing GEMINI_API_KEY | .env not configured properly | Add GEMINI_API_KEY to .env |
| MCP connection error | MCP not running, or running on a different port | Run MCP with the correct --port, or update MCP_PLAYWRIGHT_URL in .env |
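The documented fix for "No tool found" is the updated prompt. As an extra defensive measure, a step can also be normalized before execution so that a reply using "step" in place of "tool" still runs. This is a hypothetical helper and a complement to the prompt fix, not part of the project.

```python
from typing import Any


def normalize_step(step: dict[str, Any]) -> dict[str, Any]:
    """Rename a stray "step" key to "tool" (hypothetical defensive fix)."""
    fixed = dict(step)  # copy so the caller's dict is left unchanged
    if "tool" not in fixed and isinstance(fixed.get("step"), str):
        fixed["tool"] = fixed.pop("step")
    return fixed


print(normalize_step({"step": "browser_navigate", "args": {"url": "https://www.saucedemo.com"}}))
# {'args': {'url': 'https://www.saucedemo.com'}, 'tool': 'browser_navigate'}
```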
You can change the goal in main.py:
goal = "Go to facebook.com and log/signin/signup in with standard_user/secret_sauce"
Try experimenting with other goals, such as:
- "Go to saucedemo.com and login as standard_user"
- "Visit twitter.com and find the login button"
| Component | Purpose |
|---|---|
| LangGraph | State-based orchestration |
| LangChain Google GenAI | Gemini API integration |
| Playwright MCP | Browser automation |
| dotenv | Configuration management |
This project is for learning and testing automation frameworks using LangGraph, Playwright MCP, and Gemini.
Use responsibly and follow all API and site terms of service.