An AI agent that executes a workflow in a web browser from natural language instructions and captures screenshots of the UI states along the way. The agent interprets the user's intent, drives the browser through the Playwright MCP Server, and saves a screenshot of each relevant UI state.
- Natural Language Processing: Understands user queries in plain English
- Automated Web Interactions: Performs actions like clicking, typing, and navigating
- Screenshot Capture: Takes screenshots of relevant UI elements or pages
- Human-in-the-Loop: Optional approval before executing automation steps
- Integration with AI Models: Uses Google's Generative AI for processing queries
- Node.js (v16 or higher)
- pnpm package manager
- Google Gemini API key
- Brave Search API key (for web searches)
- Playwright for browser automation
- Clone the repository:
    git clone https://github.com/abhignakumar/browser-automation-ai-agent
    cd browser-automation-ai-agent
- Install dependencies:
    pnpm install
- Set up environment variables:
  - Copy .env.example to .env
  - Fill in the required API keys and configuration
The .env file should define the following variables:
    MODEL_NAME=your_model_name # e.g. gemini-2.5-flash
    GEMINI_API_KEY=your_gemini_api_key
    BRAVE_API_KEY=your_brave_api_key
    SCREENSHOT_DIR=/path/to/save/screenshots
    PLAYWRIGHT_MCP_OUTPUT_DIR=/path/to/playwright/output # on macOS: /private/tmp/playwright-mcp-output
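A missing or empty variable tends to surface late, as an opaque API or file-system error. The following is a minimal sketch of a startup check for the variables listed above. The helper name `loadConfig` and the assumption that all five variables are mandatory are illustrative and not taken from the repository.

```typescript
// Hypothetical helper (not part of the repository): validates the .env variables.
// Assumption: every variable listed in the README is required.
const REQUIRED_VARS = [
  "MODEL_NAME",
  "GEMINI_API_KEY",
  "BRAVE_API_KEY",
  "SCREENSHOT_DIR",
  "PLAYWRIGHT_MCP_OUTPUT_DIR",
] as const;

type RequiredVar = (typeof REQUIRED_VARS)[number];
type AppConfig = Record<RequiredVar, string>;

// Takes a plain key/value map so it can be exercised without the real environment.
function loadConfig(env: Record<string, string | undefined>): AppConfig {
  // A variable counts as missing when it is unset or contains only whitespace.
  const missing = REQUIRED_VARS.filter((name) => !env[name]?.trim());
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }

  const config = {} as AppConfig;
  for (const name of REQUIRED_VARS) {
    config[name] = env[name]!.trim();
  }
  return config;
}
```

In the application this could be called as `loadConfig(process.env)` once the .env file has been loaded, so that the agent fails fast with a message naming every missing variable.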
- Update the query in src/index.ts with your desired action:
    const query = `How to create a task with a due date in Asana?`;
- Run the application:
    pnpm dev
  Or build and run:
    pnpm build
    pnpm start
- The agent processes the natural language query
- Determines the necessary web interactions
- Optionally requests human approval before execution
- Performs the web automation using Playwright
- Captures and saves relevant screenshots
- Returns the path to the saved screenshots
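The steps above can be sketched as a small pipeline with an optional approval gate. This is an illustration under stated assumptions rather than the code in src/agent.ts: the type names, the `runWorkflow` function, and the injected `plan`, `approve`, and `execute` callbacks are all hypothetical stand-ins for the model call, the human prompt, and the Playwright MCP tool invocation.

```typescript
// Illustrative types and names; none of them are taken from the repository.
interface PlannedAction {
  tool: string; // name of a browser tool exposed by the MCP server
  args: Record<string, unknown>;
}

interface WorkflowDeps {
  // The model turns the query into an ordered list of browser actions.
  plan: (query: string) => Promise<PlannedAction[]>;
  // Optional human gate; when omitted, actions run without approval.
  approve?: (actions: PlannedAction[]) => Promise<boolean>;
  // Executes one action and returns a screenshot path, or null if none was taken.
  execute: (action: PlannedAction) => Promise<string | null>;
}

interface WorkflowResult {
  approved: boolean;
  screenshots: string[];
}

async function runWorkflow(query: string, deps: WorkflowDeps): Promise<WorkflowResult> {
  const actions = await deps.plan(query);

  // Human-in-the-loop: stop before touching the browser if the plan is rejected.
  if (deps.approve && !(await deps.approve(actions))) {
    return { approved: false, screenshots: [] };
  }

  const screenshots: string[] = [];
  for (const action of actions) {
    // Steps run sequentially because each depends on the page state left by the previous one.
    const path = await deps.execute(action);
    if (path !== null) screenshots.push(path);
  }
  return { approved: true, screenshots };
}
```

Injecting the three callbacks keeps the control flow testable without a browser or an API key; real implementations would wrap the Gemini call and the MCP client behind the same signatures.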
src/
- agent.ts - Main agent implementation
- index.ts - Entry point and example usage
- mcp-client.ts - MCP (Model Context Protocol) client
- prompts.ts - System prompts for the AI
- types.ts - TypeScript type definitions
- utils.ts - Utility functions
Dependencies:
- @google/genai: Google's Generative AI SDK
- playwright: Browser automation
- @modelcontextprotocol/sdk: Model Context Protocol SDK
