An interactive Streamlit dashboard allowing users to pilot a headless browser instance using natural language instructions. The backend hooks into the Model Context Protocol (MCP) and Playwright.
Developed by Coder Coder. For more projects, interviews, and OA preparation, please visit CodinzHub.
- Action Mapping: Use normal language instructions to direct the browser.
- Full Navigation: Visit websites, click links, scroll pages, and submit forms.
- Visual Capture: Retrieve snapshots and screenshots of loaded elements.
- Information Extraction: Automatically extract and summarize site content into markdown format.
- Complex Workflows: Execute sequences of instructions across multiple web pages.
- Python 3.8+
- Node.js & npm: Required to run the Playwright browser driver. Download from nodejs.org.
- OpenAI API Credentials
-
Navigate to this agent's folder:
cd mcp_integrations_agents/browser_mcp_agent -
Install python packages:
pip install -r requirements.txt
-
Confirm Node.js is present:
node --version npm --version
-
Configure your OpenAI access token: Set the environment variable:
export OPENAI_API_KEY=your-openai-api-keyOr set it within
mcp_agent.secrets.yaml(copy it frommcp_agent.secrets.yaml.exampleif needed).
This agent supports local LLM execution using Ollama:
-
Launch Ollama and pull a tool-compatible model:
ollama pull llama3.2 ollama serve
-
Modify
mcp_agent.config.yamlto point to Ollama's endpoint:openai: base_url: "http://localhost:11434/v1" default_model: "llama3.2"
-
Add a placeholder token in
mcp_agent.secrets.yaml:openai: api_key: "ollama"
-
Run the Streamlit console:
streamlit run main.py
-
Open the page in your browser, specify your instructions, and click Execute Command.
Go to codinzhub.comScroll down to view more content
Click on the search inputClick on the login button
Summarize the main content of this pageTake a screenshot of the main header
This assistant runs on:
- Streamlit for the web interface.
- MCP (Model Context Protocol) for linking the LLM with system tools.
- Playwright for browser steering.
- OpenAI / Ollama for reasoning and command parsing.
the tech stack consists of:
Frontend / UI: Streamlit (for building the web-based interactive console dashboard) Agent Framework / Integration: mcp-agent (linking the LLM with system tools using the Model Context Protocol) Browser Automation: Playwright (driven via Node.js/npm dependencies on the backend) LLM / Inference: OpenAI (via openai Python SDK) OR Ollama (for local execution using models like llama3.2) Language: Python 3.8+
Developed by Coder Coder. For more projects, interviews, and OA preparation, please visit CodinzHub.