🤖 AI Web Browser

A comprehensive AI-powered web browser with autonomous navigation, conversational interface, vision capabilities, and programmatic API control. Built with Claude AI, Electron, and Playwright.

✨ Features

1. Autonomous AI Browser Agent

Navigate websites automatically
Click buttons, fill forms, and complete complex tasks
Execute multi-step workflows autonomously
Smart decision-making based on page content

2. AI-Enhanced Traditional Browser

Content summarization
Intelligent information extraction
Visual page analysis with AI vision
Context-aware assistance

3. Conversational Web Browsing

Chat with AI about current page
Natural language navigation commands
Ask questions and get instant answers
Context-aware conversations

4. Programmatic API Control

RESTful API for browser automation
Headless browsing capabilities
Session management
Perfect for automation and testing

🚀 Quick Start

Prerequisites

Node.js 18+ and npm
Anthropic API key (create one in the Anthropic Console)

Installation

# Clone the repository
git clone <your-repo-url>
cd ai-web-browser

# Install dependencies
npm install

# Install Playwright browsers
npx playwright install chromium

# Set up environment variables
cp .env.example .env
# Edit .env and add your ANTHROPIC_API_KEY

Configuration

Edit the .env file:

ANTHROPIC_API_KEY=your_api_key_here
API_PORT=3000
API_HOST=localhost
HEADLESS=false
DEFAULT_VIEWPORT_WIDTH=1280
DEFAULT_VIEWPORT_HEIGHT=720
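The loading code itself isn't shown in this README. As a rough sketch, assuming the variables arrive as plain strings (for example via `process.env` once a loader such as dotenv has populated it), they could be validated into a typed object that falls back to the example values above. `loadConfig` and `BrowserConfig` are hypothetical names, not the project's actual code.

```typescript
// Hypothetical typed view of the .env settings above.
interface BrowserConfig {
  anthropicApiKey: string;
  apiPort: number;
  apiHost: string;
  headless: boolean;
  viewport: { width: number; height: number };
}

type Env = Record<string, string | undefined>;

// Parse a positive integer setting, falling back to a default when absent.
function intSetting(env: Env, key: string, fallback: number): number {
  const raw = env[key];
  if (raw === undefined || raw.trim() === '') return fallback;
  const value = Number(raw);
  if (!Number.isInteger(value) || value <= 0) {
    throw new Error(`${key} must be a positive integer, got "${raw}"`);
  }
  return value;
}

// Build the config from an environment map; defaults mirror the .env example.
function loadConfig(env: Env): BrowserConfig {
  const apiKey = env.ANTHROPIC_API_KEY;
  if (!apiKey) {
    throw new Error('API Key not found: set ANTHROPIC_API_KEY in .env');
  }
  return {
    anthropicApiKey: apiKey,
    apiPort: intSetting(env, 'API_PORT', 3000),
    apiHost: env.API_HOST || 'localhost',
    headless: (env.HEADLESS ?? 'false').toLowerCase() === 'true',
    viewport: {
      width: intSetting(env, 'DEFAULT_VIEWPORT_WIDTH', 1280),
      height: intSetting(env, 'DEFAULT_VIEWPORT_HEIGHT', 720),
    },
  };
}
```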

💻 Usage

Desktop Application

# Build the application
npm run build

# Start the Electron app
npm start

Features in the Desktop App:

Visual browser with AI sidebar
Chat with AI about any webpage
Execute autonomous tasks
Summarize, analyze, and extract information
Real-time screenshot updates

API Server

# Start the API server
npm run server

By default the API is available at http://localhost:3000 (set by API_HOST and API_PORT in .env).

Programmatic Usage

import { AutonomousAgent } from './src/ai/autonomous-agent';

async function example() {
  // Fail fast if the key is missing (see Configuration above)
  const apiKey = process.env.ANTHROPIC_API_KEY;
  if (!apiKey) {
    throw new Error('ANTHROPIC_API_KEY is not set');
  }

  // Initialize the agent
  const agent = new AutonomousAgent(apiKey);
  await agent.initialize();

  try {
    // Execute an autonomous task
    const result = await agent.executeTask({
      task: 'Search for the latest AI news on Google and summarize the top 3 articles',
      url: 'https://google.com'
    });

    console.log(result.finalResult);
    console.log('Steps taken:', result.steps);

    // Chat about a page
    await agent.getBrowserController().navigate('https://example.com');
    const response = await agent.chat('What is this page about?');
    console.log(response);

    // Summarize content
    const summary = await agent.summarizeCurrentPage();
    console.log(summary);

    // Extract specific information
    const info = await agent.extractInformation('What are the main contact details?');
    console.log(info);

    // Analyze with vision
    const analysis = await agent.analyzeCurrentPage('What elements are visible on this page?');
    console.log(analysis);
  } finally {
    // Release the browser even if a step above throws
    await agent.close();
  }
}

example().catch(console.error);

📡 API Documentation

Create Session

POST /api/sessions

Response:

{
  "success": true,
  "sessionId": "session_123abc",
  "message": "Browser session created"
}
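The README doesn't say how the server keeps track of sessions. One plausible design, sketched here purely as an assumption, is an in-memory map keyed by session ID that backs the create, close, and list endpoints. `SessionRegistry` and its methods are hypothetical, and the generic `resource` slot stands in for whatever per-session browser or agent object the server actually holds.

```typescript
// Hypothetical per-session record; `resource` would hold the browser/agent.
interface Session<T> {
  id: string;
  createdAt: Date;
  resource: T;
}

class SessionRegistry<T> {
  private sessions = new Map<string, Session<T>>();

  // Non-cryptographic ID in the same "session_xxx" shape as the example response.
  private newId(): string {
    let id: string;
    do {
      id = 'session_' + Math.random().toString(36).slice(2, 8);
    } while (this.sessions.has(id));
    return id;
  }

  create(resource: T): Session<T> {
    const session: Session<T> = { id: this.newId(), createdAt: new Date(), resource };
    this.sessions.set(session.id, session);
    return session;
  }

  get(id: string): Session<T> {
    const session = this.sessions.get(id);
    if (!session) throw new Error(`Unknown session: ${id}`);
    return session;
  }

  // Remove the session, then run the caller's cleanup (e.g. closing the browser).
  async close(id: string, cleanup: (resource: T) => Promise<void>): Promise<void> {
    const session = this.get(id);
    this.sessions.delete(id);
    await cleanup(session.resource);
  }

  list(): string[] {
    return Array.from(this.sessions.keys());
  }
}
```

Two caveats of this design: the `Math.random` IDs are not cryptographically strong, so a public-facing server would want a proper random source, and an in-memory map loses every session on restart.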

Navigate

POST /api/sessions/:sessionId/navigate
Content-Type: application/json

{
  "url": "https://example.com"
}

Execute Autonomous Task

POST /api/sessions/:sessionId/task
Content-Type: application/json

{
  "task": "Search for information about AI",
  "url": "https://google.com",
  "maxSteps": 20
}

Chat

POST /api/sessions/:sessionId/chat
Content-Type: application/json

{
  "message": "What is this page about?"
}

Summarize Page

GET /api/sessions/:sessionId/summarize

Extract Information

POST /api/sessions/:sessionId/extract
Content-Type: application/json

{
  "query": "Extract all email addresses"
}

Analyze with Vision

POST /api/sessions/:sessionId/analyze
Content-Type: application/json

{
  "query": "What products are shown on this page?"
}

Get Page Content

GET /api/sessions/:sessionId/content

Take Screenshot

GET /api/sessions/:sessionId/screenshot

Returns a base64-encoded PNG image.
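The exact JSON field that carries the image isn't documented here, so this sketch assumes you have already pulled the base64 string out of the response. It decodes the string (tolerating an optional `data:image/png;base64,` prefix) and checks the 8-byte signature that every PNG file begins with. `decodePngBase64` is a hypothetical helper; it relies on the global `atob`, available in browsers and Node.js 16+.

```typescript
// The 8-byte signature at the start of every PNG file.
const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];

// Decode base64 (optionally a data: URL) into bytes and verify it is a PNG.
function decodePngBase64(input: string): Uint8Array {
  const base64 = input.replace(/^data:image\/png;base64,/, '').trim();
  const binary = atob(base64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  const isPng = PNG_SIGNATURE.every((b, i) => bytes[i] === b);
  if (!isPng) throw new Error('Decoded data is not a PNG image');
  return bytes;
}
```

In Node.js the returned bytes can then be saved with `fs.writeFileSync('page.png', bytes)`.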

Close Session

DELETE /api/sessions/:sessionId

List Sessions

GET /api/sessions
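For calling these endpoints from TypeScript, a thin client might look like the following. Only the paths, methods, and request bodies come from the documentation above; the assumption that every endpoint answers with JSON, and the names `BrowserApiClient` and `FetchLike`, are this sketch's own. The fetch function is injected so the client can be exercised without a running server.

```typescript
// Structural type for a fetch-like function (Node 18+ fetch or a test double).
type FetchLike = (
  url: string,
  init: { method: string; headers?: Record<string, string>; body?: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

class BrowserApiClient {
  private readonly baseUrl: string;
  private readonly fetchImpl: FetchLike;

  constructor(baseUrl: string, fetchImpl: FetchLike) {
    this.baseUrl = baseUrl.replace(/\/+$/, ''); // drop trailing slashes
    this.fetchImpl = fetchImpl;
  }

  // Send a request, JSON-encoding the body if present; assumes JSON responses.
  private async request(method: string, path: string, body?: unknown): Promise<any> {
    const init: { method: string; headers?: Record<string, string>; body?: string } = { method };
    if (body !== undefined) {
      init.headers = { 'Content-Type': 'application/json' };
      init.body = JSON.stringify(body);
    }
    const res = await this.fetchImpl(this.baseUrl + path, init);
    if (!res.ok) throw new Error(`${method} ${path} failed with HTTP ${res.status}`);
    return res.json();
  }

  private sessionPath(id: string, action = ''): string {
    return `/api/sessions/${encodeURIComponent(id)}${action}`;
  }

  async createSession(): Promise<string> {
    const data = await this.request('POST', '/api/sessions');
    return data.sessionId;
  }
  navigate(id: string, url: string) {
    return this.request('POST', this.sessionPath(id, '/navigate'), { url });
  }
  runTask(id: string, task: string, url?: string, maxSteps?: number) {
    return this.request('POST', this.sessionPath(id, '/task'), { task, url, maxSteps });
  }
  chat(id: string, message: string) {
    return this.request('POST', this.sessionPath(id, '/chat'), { message });
  }
  summarize(id: string) {
    return this.request('GET', this.sessionPath(id, '/summarize'));
  }
  extract(id: string, query: string) {
    return this.request('POST', this.sessionPath(id, '/extract'), { query });
  }
  analyze(id: string, query: string) {
    return this.request('POST', this.sessionPath(id, '/analyze'), { query });
  }
  closeSession(id: string) {
    return this.request('DELETE', this.sessionPath(id));
  }
  listSessions() {
    return this.request('GET', '/api/sessions');
  }
}
```

On Node.js 18+ the built-in fetch can be passed directly: `new BrowserApiClient('http://localhost:3000', fetch)`.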

🎯 Use Cases

1. Web Automation

const task = {
  task: 'Book a flight from NYC to LAX on October 15th',
  url: 'https://airline-website.com'
};
const result = await agent.executeTask(task);

2. Research Assistant

const task = {
  task: 'Research the top 5 competitors in the AI browser space and create a summary',
  url: 'https://google.com'
};
const result = await agent.executeTask(task);

3. Content Extraction

await agent.getBrowserController().navigate('https://news-site.com');
const articles = await agent.extractInformation(
  'Extract all article titles and their summaries'
);

4. Price Monitoring

const task = {
  task: 'Find the price of iPhone 15 Pro on this e-commerce site',
  url: 'https://shop.example.com'
};
const result = await agent.executeTask(task);

5. Form Filling

const task = {
  task: 'Fill out the contact form with: Name: John Doe, Email: john@example.com, Message: Hello!',
  url: 'https://example.com/contact'
};
const result = await agent.executeTask(task);
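When the form data comes from code rather than being typed by hand, the task string above can be assembled from a map of fields. `formFillTask` is a hypothetical helper that reproduces that exact wording; it is not part of the project.

```typescript
// Hypothetical helper: build a natural-language form-filling task from fields.
function formFillTask(formName: string, fields: Record<string, string>): string {
  const parts = Object.entries(fields).map(([label, value]) => `${label}: ${value}`);
  return `Fill out the ${formName} with: ${parts.join(', ')}`;
}
```

Values that themselves contain commas or colons could make the instruction ambiguous to the model; quoting each value is one way around that, at the cost of no longer matching the example string.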

🏗️ Architecture

┌─────────────────────────────────────────────────────────┐
│                  Electron Desktop UI                     │
│  ┌────────────┐  ┌──────────────┐  ┌─────────────────┐ │
│  │  Browser   │  │   AI Chat    │  │  Task Executor  │ │
│  │   View     │  │  Interface   │  │                 │ │
│  └────────────┘  └──────────────┘  └─────────────────┘ │
└─────────────────────────────────────────────────────────┘
                          │
                          ▼
┌─────────────────────────────────────────────────────────┐
│                  Autonomous Agent Layer                  │
│  ┌────────────────────────────────────────────────────┐ │
│  │  • Task Planning      • Conversation Management    │ │
│  │  • Vision Analysis    • Content Extraction         │ │
│  └────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘
           │                              │
           ▼                              ▼
┌──────────────────────┐      ┌──────────────────────────┐
│   Claude AI Engine   │      │  Browser Controller      │
│                      │      │  (Playwright)            │
│  • Claude Vision     │      │                          │
│  • Task Planning     │      │  • Navigation            │
│  • Content Analysis  │      │  • DOM Interaction       │
│  • Summarization     │      │  • Screenshots           │
└──────────────────────┘      └──────────────────────────┘
           │                              │
           └──────────────┬───────────────┘
                          ▼
┌─────────────────────────────────────────────────────────┐
│                   REST API Server                        │
│          (Express.js - Port 3000)                        │
└─────────────────────────────────────────────────────────┘
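The diagram implies that the agent layer alternates between the browser controller and the AI engine. The project's actual planning code isn't shown, so what follows is only a minimal sketch of such an observe, decide, act loop under that assumption. `AgentAction`, `Planner`, `BrowserPort`, and `runTask` are hypothetical names; the real engine presumably also sends screenshots to the model. The `finalResult` and `steps` fields mirror the result used in Programmatic Usage, and the default of 20 steps is borrowed from the REST example.

```typescript
// Hypothetical action vocabulary the model can choose from.
type AgentAction =
  | { kind: 'navigate'; url: string }
  | { kind: 'click'; selector: string }
  | { kind: 'type'; selector: string; text: string }
  | { kind: 'done'; result: string };

interface PageObservation { url: string; text: string; }

// Stands in for the Claude AI engine: picks the next action.
interface Planner {
  nextAction(task: string, page: PageObservation, history: AgentAction[]): Promise<AgentAction>;
}

// Stands in for the Playwright browser controller.
interface BrowserPort {
  observe(): Promise<PageObservation>;
  perform(action: AgentAction): Promise<void>;
}

interface TaskOutcome { success: boolean; finalResult?: string; steps: AgentAction[]; }

// Observe the page, ask for one action, execute it; stop on "done" or maxSteps.
async function runTask(
  task: string,
  planner: Planner,
  browser: BrowserPort,
  maxSteps = 20,
): Promise<TaskOutcome> {
  const steps: AgentAction[] = [];
  for (let i = 0; i < maxSteps; i++) {
    const page = await browser.observe();
    const action = await planner.nextAction(task, page, steps);
    steps.push(action);
    if (action.kind === 'done') {
      return { success: true, finalResult: action.result, steps };
    }
    await browser.perform(action);
  }
  return { success: false, steps }; // step budget exhausted
}
```

Because the loop stops after `maxSteps` iterations whether or not the model has declared the task done, a long task can end with `success: false`, which is why raising `maxSteps` is the first suggestion under Troubleshooting.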

📁 Project Structure

ai-web-browser/
├── src/
│   ├── ai/
│   │   ├── claude-engine.ts      # AI engine (Claude API)
│   │   ├── browser-controller.ts # Playwright browser automation
│   │   └── autonomous-agent.ts   # Main agent orchestrator
│   ├── api/
│   │   └── server.ts             # REST API server
│   ├── main/
│   │   └── main.ts               # Electron main process
│   ├── renderer/
│   │   ├── index.html            # UI
│   │   └── renderer.ts           # UI logic
│   └── shared/
│       └── types.ts              # Shared TypeScript types
├── dist/                         # Compiled JavaScript
├── package.json
├── tsconfig.json
└── README.md

🛠️ Development

# Install dependencies
npm install

# Development mode (watch mode)
npm run dev

# Build
npm run build

# Run tests
npm test

# Start API server
npm run server

# Start Electron app
npm start

🔧 Advanced Configuration

Custom Browser Configuration

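import { AutonomousAgent } from './src/ai/autonomous-agent';
import { BrowserController } from './src/ai/browser-controller';

const agent = new AutonomousAgent(apiKey);

// Stand-alone controller with custom Playwright settings.
// Note: this snippet does not connect it to `agent`; check the
// AutonomousAgent constructor in src/ai/autonomous-agent.ts for how
// (or whether) a custom controller can be supplied.
const browserController = new BrowserController({
  headless: true,
  viewport: { width: 1920, height: 1080 },
  userAgent: 'Custom User Agent'
});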
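This README doesn't show whether `BrowserController` forwards these fields to Playwright unchanged. Assuming it does, they map onto the options of Playwright's `chromium.launch()` (`headless`) and `browser.newContext()` (`viewport`, `userAgent`). The sketch below declares local subsets of those option types so it compiles without Playwright installed; `toPlaywrightOptions` is hypothetical, and its fallbacks are the values from the .env example.

```typescript
// Config shape from the example above.
interface BrowserControllerConfig {
  headless?: boolean;
  viewport?: { width: number; height: number };
  userAgent?: string;
}

// Local subsets of Playwright's launch and browser-context option types.
interface LaunchOptionsSubset { headless: boolean; }
interface ContextOptionsSubset {
  viewport: { width: number; height: number };
  userAgent?: string;
}

// Split the controller config into launch-time and context-time options.
function toPlaywrightOptions(cfg: BrowserControllerConfig) {
  const launch: LaunchOptionsSubset = { headless: cfg.headless ?? false };
  const context: ContextOptionsSubset = {
    viewport: cfg.viewport ?? { width: 1280, height: 720 },
  };
  if (cfg.userAgent) context.userAgent = cfg.userAgent;
  return { launch, context };
}
```

With Playwright installed, the result would be used as `const browser = await chromium.launch(launch); const page = await (await browser.newContext(context)).newPage();`.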

Task Execution Options

const result = await agent.executeTask({
  task: 'Your task description',
  url: 'https://starting-url.com',
  context: 'Additional context for the AI',
  maxSteps: 30  // Maximum number of automation steps
});
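The same option names appear in the REST task endpoint, so it can help to validate them in one place before a task starts. This is a sketch under stated assumptions: the rules (non-empty task, http(s)-only URL, positive integer step budget) and the default of 20 steps are illustrative choices, not documented behavior, and `normalizeTaskOptions` is a hypothetical name.

```typescript
// Option names from the examples above.
interface TaskOptions {
  task: string;
  url?: string;
  context?: string;
  maxSteps?: number;
}

interface NormalizedTask {
  task: string;
  url?: string;
  context?: string;
  maxSteps: number;
}

const DEFAULT_MAX_STEPS = 20; // hypothetical default, borrowed from the REST example

// Validate task options and fill in defaults; throws on invalid input.
function normalizeTaskOptions(opts: TaskOptions): NormalizedTask {
  const task = opts.task.trim();
  if (task === '') throw new Error('task must be a non-empty description');

  let url: string | undefined;
  if (opts.url !== undefined) {
    const parsed = new URL(opts.url); // throws on malformed URLs
    if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
      throw new Error(`Unsupported URL scheme: ${parsed.protocol}`);
    }
    url = parsed.href;
  }

  const maxSteps = opts.maxSteps ?? DEFAULT_MAX_STEPS;
  if (!Number.isInteger(maxSteps) || maxSteps < 1) {
    throw new Error('maxSteps must be a positive integer');
  }
  return { task, url, context: opts.context, maxSteps };
}
```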

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

📝 License

MIT

⚠️ Disclaimer

This is an AI-powered browser automation tool. When using it, please:

Comply with websites' Terms of Service
Respect robots.txt files
Use responsibly and ethically
Don't use for malicious purposes
Be mindful of rate limiting
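The robots.txt point can be partly automated. The function below is a deliberately simplified check, not a complete implementation of the Robots Exclusion Protocol (RFC 9309): it reads only the `User-agent: *` group, treats rules as plain path prefixes (no `*` or `$` wildcards), and applies the longest matching rule, preferring Allow on ties. Fetching the file from `https://<host>/robots.txt` is left to the caller, and `isAllowedByRobots` is a hypothetical helper.

```typescript
// Simplified robots.txt check for the "*" group with plain prefix rules.
function isAllowedByRobots(robotsTxt: string, path: string): boolean {
  type Rule = { allow: boolean; prefix: string };
  const rules: Rule[] = [];
  let inStarGroup = false;
  let lastWasAgent = false;

  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*/, '').trim(); // strip comments
    const sep = line.indexOf(':');
    if (sep === -1) continue;
    const key = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();

    if (key === 'user-agent') {
      // Consecutive User-agent lines share one group.
      inStarGroup = (lastWasAgent && inStarGroup) || value === '*';
      lastWasAgent = true;
      continue;
    }
    lastWasAgent = false;
    // An empty Disallow value means "allow everything", so it adds no rule.
    if (inStarGroup && (key === 'allow' || key === 'disallow') && value !== '') {
      rules.push({ allow: key === 'allow', prefix: value });
    }
  }

  // Longest matching prefix wins; Allow wins ties.
  let best: Rule | undefined;
  for (const rule of rules) {
    if (!path.startsWith(rule.prefix)) continue;
    if (
      !best ||
      rule.prefix.length > best.prefix.length ||
      (rule.prefix.length === best.prefix.length && rule.allow)
    ) {
      best = rule;
    }
  }
  return best ? best.allow : true;
}
```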

🐛 Troubleshooting

Common Issues

Issue: "Browser not initialized"

Ensure Playwright browsers are installed: npx playwright install chromium

Issue: "API Key not found"

Make sure you've set ANTHROPIC_API_KEY in your .env file

Issue: "Screenshot not loading"

Check if the page has fully loaded
Try increasing wait times in browser configuration
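Playwright already provides built-in waits such as `page.waitForLoadState()`, which are usually the first thing to reach for. For custom conditions (say, waiting until a screenshot buffer is non-empty), a generic polling helper like the hypothetical `waitUntil` below is one option; the default timeout and interval are arbitrary.

```typescript
// Poll a (possibly async) condition until it is true or the timeout elapses.
async function waitUntil(
  condition: () => Promise<boolean> | boolean,
  { timeoutMs = 10_000, intervalMs = 250 }: { timeoutMs?: number; intervalMs?: number } = {},
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    if (await condition()) return;
    if (Date.now() >= deadline) {
      throw new Error(`Condition not met within ${timeoutMs} ms`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```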

Issue: "Task execution timeout"

Increase maxSteps in task configuration
Check your internet connection
Verify the website is accessible

📚 Additional Resources

🎉 Examples

Check out the examples/ directory for more usage examples:

Web scraping
Automated testing
Data extraction
Form automation
Research workflows

Built with ❤️ using Claude AI, Electron, and Playwright
