🌐 Web Browser Automation Agent

An interactive Streamlit dashboard allowing users to pilot a headless browser instance using natural language instructions. The backend hooks into the Model Context Protocol (MCP) and Playwright.

Developed by Coder Coder. For more projects, interviews, and OA preparation, please visit CodinzHub.

Features

Action Mapping: Use normal language instructions to direct the browser.
Full Navigation: Visit websites, click links, scroll pages, and submit forms.
Visual Capture: Retrieve snapshots and screenshots of loaded elements.
Information Extraction: Automatically extract and summarize site content into markdown format.
Complex Workflows: Execute sequences of instructions across multiple web pages.

Setup Guide

Core Prerequisites

Python 3.8+
Node.js & npm: Required to run the Playwright browser driver. Download from nodejs.org.
OpenAI API Credentials

Setup Steps

Navigate to this agent's folder:

cd mcp_integrations_agents/browser_mcp_agent

Install python packages:
```
pip install -r requirements.txt
```
Confirm Node.js is present:
```
node --version
npm --version
```
Configure your OpenAI access token: Set the environment variable:
```
export OPENAI_API_KEY=your-openai-api-key
```
Or set it within mcp_agent.secrets.yaml (copy it from mcp_agent.secrets.yaml.example if needed).

Alternative: Local Execution via Ollama

This agent supports local LLM execution using Ollama:

Launch Ollama and pull a tool-compatible model:
```
ollama pull llama3.2
ollama serve
```

Modify mcp_agent.config.yaml to point to Ollama's endpoint:

openai:
  base_url: "http://localhost:11434/v1"
  default_model: "llama3.2"

Add a placeholder token in mcp_agent.secrets.yaml:
```
openai:
  api_key: "ollama"
```

Starting the Agent

Run the Streamlit console:
```
streamlit run main.py
```
Open the page in your browser, specify your instructions, and click Execute Command.

Sample Commands

Basic Operations

Go to codinzhub.com
Scroll down to view more content

Actions & Forms

Click on the search input
Click on the login button

Content Extraction

Summarize the main content of this page
Take a screenshot of the main header

System Architecture

This assistant runs on:

Streamlit for the web interface.
MCP (Model Context Protocol) for linking the LLM with system tools.
Playwright for browser steering.
OpenAI / Ollama for reasoning and command parsing.

the tech stack consists of:

Frontend / UI: Streamlit (for building the web-based interactive console dashboard) Agent Framework / Integration: mcp-agent (linking the LLM with system tools using the Model Context Protocol) Browser Automation: Playwright (driven via Node.js/npm dependencies on the backend) LLM / Inference: OpenAI (via openai Python SDK) OR Ollama (for local execution using models like llama3.2) Language: Python 3.8+

Credits & More

Developed by Coder Coder. For more projects, interviews, and OA preparation, please visit CodinzHub.

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

🌐 Web Browser Automation Agent

Features

Setup Guide

Core Prerequisites

Setup Steps

Alternative: Local Execution via Ollama

Starting the Agent

Sample Commands

Basic Operations

Actions & Forms

Content Extraction

System Architecture

Credits & More

About

Uh oh!

Releases 1

Packages

Uh oh!

Contributors

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

🌐 Web Browser Automation Agent

Features

Setup Guide

Core Prerequisites

Setup Steps

Alternative: Local Execution via Ollama

Starting the Agent

Sample Commands

Basic Operations

Actions & Forms

Content Extraction

System Architecture

Credits & More

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors

Uh oh!

Packages