Browser Automation AI Agent

An AI agent that carries out a workflow in a web browser from natural language instructions and captures screenshots of the UI states along the way. The agent interprets the user's intent and drives the browser through the Playwright MCP Server.

Features

- Natural Language Processing: Understands user queries in plain English
- Automated Web Interactions: Performs actions like clicking, typing, and navigating
- Screenshot Capture: Takes screenshots of relevant UI elements or pages
- Human-in-the-Loop: Optional approval before executing automation steps
- Integration with AI Models: Uses Google's Generative AI for processing queries

AI Agent Architecture

Prerequisites

- Node.js (v16 or higher)
- pnpm package manager
- Google Gemini API key
- Brave Search API key (for web searches)
- Playwright for browser automation

Installation

Clone the repository:

```
git clone https://github.com/abhignakumar/browser-automation-ai-agent
cd browser-automation-ai-agent
```

Install dependencies:
```
pnpm install
```
Set up environment variables:
- Copy .env.example to .env
- Fill in the required API keys and configuration

Configuration

Create a .env file with the following variables:

```
MODEL_NAME=your_model_name      # e.g. gemini-2.5-flash
GEMINI_API_KEY=your_gemini_api_key
BRAVE_API_KEY=your_brave_api_key
SCREENSHOT_DIR=/path/to/save/screenshots
PLAYWRIGHT_MCP_OUTPUT_DIR=/path/to/playwright/output       # On macOS: /private/tmp/playwright-mcp-output
```
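
Since every variable above is needed before the agent can do anything useful, it can help to fail fast when one is missing. The following is a minimal sketch of such a check, not the project's actual code: `AgentConfig`, `REQUIRED_VARS`, and `loadConfig` are hypothetical names introduced here for illustration, and the repository may load its configuration differently.

```typescript
// Hypothetical helper (not taken from the repository): validates the
// environment variables listed in this section and returns a typed config.
interface AgentConfig {
  modelName: string;
  geminiApiKey: string;
  braveApiKey: string;
  screenshotDir: string;
  playwrightMcpOutputDir: string;
}

const REQUIRED_VARS = [
  "MODEL_NAME",
  "GEMINI_API_KEY",
  "BRAVE_API_KEY",
  "SCREENSHOT_DIR",
  "PLAYWRIGHT_MCP_OUTPUT_DIR",
] as const;

type RequiredVar = (typeof REQUIRED_VARS)[number];
type EnvSource = Record<string, string | undefined>;

function loadConfig(env: EnvSource): AgentConfig {
  // Treat unset and whitespace-only values as missing.
  const missing = REQUIRED_VARS.filter((name) => !(env[name] ?? "").trim());
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }

  // Safe after the check above: every required name has a non-empty value.
  const get = (name: RequiredVar): string => (env[name] as string).trim();

  return {
    modelName: get("MODEL_NAME"),
    geminiApiKey: get("GEMINI_API_KEY"),
    braveApiKey: get("BRAVE_API_KEY"),
    screenshotDir: get("SCREENSHOT_DIR"),
    playwrightMcpOutputDir: get("PLAYWRIGHT_MCP_OUTPUT_DIR"),
  };
}
```

In a Node.js entry point this would typically be called once at startup as `loadConfig(process.env)`, after the .env file has been loaded, so that a missing key produces one clear error instead of a failure midway through a browser session.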

Usage

Update the query in src/index.ts with your desired action:

```
const query = `How to create a task with a due date in Asana?`;
```

Run the application:
```
pnpm dev
```
Or build and run:
```
pnpm build
pnpm start
```

How It Works

- The agent processes the natural language query.
- It determines the web interactions needed to answer it.
- If human-in-the-loop approval is enabled, it requests approval before executing anything.
- It performs the web automation using Playwright.
- It captures and saves the relevant screenshots.
- It returns the path to the saved screenshots.
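
The steps above can be sketched as a small control loop. This is an illustration under stated assumptions rather than the code in src/agent.ts: `BrowserAction`, `AgentDeps`, `RunResult`, and `runQuery` are hypothetical names, and the model call, the approval prompt, and the Playwright MCP tool calls are passed in as plain functions so the flow stays visible.

```typescript
// Hypothetical types and function names; they illustrate the flow only.
type BrowserAction =
  | { kind: "navigate"; url: string }
  | { kind: "click"; target: string }
  | { kind: "type"; target: string; text: string }
  | { kind: "screenshot"; name: string };

interface AgentDeps {
  // Stands in for the model call that turns a query into browser steps.
  plan(query: string): Promise<BrowserAction[]>;
  // Stands in for the optional human approval prompt.
  approve(actions: BrowserAction[]): Promise<boolean>;
  // Stands in for the browser tool call; resolves to a saved screenshot
  // path when the action produced one, otherwise null.
  execute(action: BrowserAction): Promise<string | null>;
}

interface RunResult {
  approved: boolean;
  screenshots: string[];
}

async function runQuery(
  query: string,
  deps: AgentDeps,
  requireApproval = true,
): Promise<RunResult> {
  const actions = await deps.plan(query);

  // Nothing touches the browser until the plan has been approved.
  if (requireApproval && !(await deps.approve(actions))) {
    return { approved: false, screenshots: [] };
  }

  // Run the actions in order and collect any screenshot paths.
  const screenshots: string[] = [];
  for (const action of actions) {
    const path = await deps.execute(action);
    if (path !== null) {
      screenshots.push(path);
    }
  }
  return { approved: true, screenshots };
}
```

Injecting the three dependencies keeps the approval gate easy to test with stubs, and passing `requireApproval = false` models the case where the human-in-the-loop step is switched off.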

Project Structure

src/
- agent.ts - Main agent implementation
- index.ts - Entry point and example usage
- mcp-client.ts - MCP (Model Context Protocol) client
- prompts.ts - System prompts for the AI
- types.ts - TypeScript type definitions
- utils.ts - Utility functions

Dependencies

- @google/genai: Google's Generative AI SDK
- playwright: Browser automation
- @modelcontextprotocol/sdk: Model Context Protocol SDK
