# Browser Agents

A long-standing goal in web automation is to build tools that keep working when websites change.

Foundational frameworks, with Selenium being the most notable, work by following direct commands. A developer writes a script with explicit, step-by-step instructions for the browser to follow. For example, a command might be, "find an element using its specific XPath or CSS selector," followed by another command to "click on it" or "type into it."

The fundamental weakness of this method is its brittleness. Websites are dynamic; their underlying code structure, the Document Object Model (DOM), is frequently updated by developers. Even a minor code change, like altering an element's ID or adjusting the page layout, can cause the script's selector to fail.

Once a selector breaks, the script cannot find its target element, and the entire automated process grinds to a halt. This creates a high maintenance burden, as scripts must be constantly fixed to keep up with website updates.
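The failure mode can be reproduced without a browser. The sketch below is an illustration only: it uses Python's standard-library `xml.etree.ElementTree` as a stand-in for a real DOM, and the markup, the element IDs, and the `find_submit_button` helper are all made up for the example. It shows a hard-coded selector that matches one version of a page and silently stops matching after the button's ID is renamed.

```python
import xml.etree.ElementTree as ET

# Two versions of the same (hypothetical) page.
# The second version only renames the button's id attribute.
PAGE_V1 = "<html><body><form><button id='submit-btn'>Send</button></form></body></html>"
PAGE_V2 = "<html><body><form><button id='send-button'>Send</button></form></body></html>"

# Hard-coded selector, stored the way a traditional script would store it.
SUBMIT_SELECTOR = ".//button[@id='submit-btn']"


def find_submit_button(page_source: str):
    """Return the element matched by the hard-coded selector, or None."""
    root = ET.fromstring(page_source)
    return root.find(SUBMIT_SELECTOR)


before = find_submit_button(PAGE_V1)
after = find_submit_button(PAGE_V2)

print("v1 match:", None if before is None else before.text)
print("v2 match:", None if after is None else after.text)
```

The button still exists and still says "Send" in the second version, yet the script can no longer find it. A real automation script would raise an error or stall at this step until someone updates the selector.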

An AI browser agent is a software program that uses artificial intelligence to perform web-based tasks autonomously, much like a human user would. These agents operate within a web browser, using large language models (LLMs) and computer vision to interact with websites in a way that mimics human behavior.

Instead of following a rigid, pre-written script like traditional automation tools, an AI browser agent is given a high-level goal in natural language, such as "Find the top tech articles today" or "Book the cheapest flight to New York". The agent then independently plans and executes the steps needed to achieve that goal.
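The difference in control flow can be sketched as an observe, decide, act loop. Everything below is hypothetical and for illustration only: `fake_planner` stands in for the LLM call, `FakeBrowser` for a real browser, and the action names and URL are invented. It is not how any particular agent library is implemented.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FakeBrowser:
    """Hypothetical stand-in for a browser: tracks the current URL and actions taken."""
    url: str = "about:blank"
    log: list = field(default_factory=list)

    def execute(self, action: dict) -> None:
        # Record every action; only "navigate" changes the page state here.
        self.log.append(action["name"])
        if action["name"] == "navigate":
            self.url = action["url"]


def fake_planner(goal: str, url: str) -> dict:
    """Stand-in for the LLM: picks the next action from the goal and the current page."""
    if url == "about:blank":
        return {"name": "navigate", "url": "https://example.com/search"}
    return {"name": "done", "result": f"Finished: {goal}"}


def run_agent(goal: str, browser: FakeBrowser, max_steps: int = 5) -> Optional[str]:
    """Observe the page, ask the planner for one action, execute it, and repeat."""
    for _ in range(max_steps):
        action = fake_planner(goal, browser.url)  # observe + decide
        browser.execute(action)                   # act
        if action["name"] == "done":
            return action["result"]
    return None  # step budget exhausted without finishing


browser = FakeBrowser()
result = run_agent("Find the top tech articles today", browser)
print(result)
print(browser.log)
```

The script never names a selector or a fixed sequence of steps. The next action is chosen at run time from the goal and the current page state, which is why a layout change does not automatically break the flow.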



## Installation
```bash
pip install playwright && playwright install chromium --with-deps
pip install browser-use
```

## Basic Search

```python
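import asyncio

from dotenv import load_dotenv

# Load environment variables (for example the Gemini API key, GOOGLE_API_KEY)
# before importing the libraries that read them.
load_dotenv('./web-ui/.env')

from browser_use import Agent
from langchain_google_genai import ChatGoogleGenerativeAI


async def main():
    # Instantiate the Google Gemini model
    llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash")
    # Describe the goal in natural language; the agent plans the steps itself
    task = (
        "Search Google for 'what is browser automation' and tell me the top 3 results"
    )
    agent = Agent(task=task, llm=llm)
    # Run the agent to completion and collect its step history
    history = await agent.run()
    # Print the URLs the agent visited
    print(history.urls())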


if __name__ == "__main__":
    asyncio.run(main())
```

## Form Filling

# Run the Basic Search example on your local machine and provide `history.urls()`

## Output from Web UI

![ss](./Screenshot%202025-09-28%20at%208.42.10 PM.png)

![gif](./41e0d75f-4c0e-423c-8b03-70f06f39b40f.gif)

## Output from script

[script main.py](./main.py)

```python
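import asyncio

from dotenv import load_dotenv

# Load environment variables (for example the Gemini API key, GOOGLE_API_KEY)
# before importing the libraries that read them.
load_dotenv('./web-ui/.env')

from browser_use import Agent
from langchain_google_genai import ChatGoogleGenerativeAI


async def main():
    # Instantiate the Google Gemini model
    llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash")
    # Describe the goal in natural language; the agent plans the steps itself
    task = (
        "Search Google for 'what is browser automation' and tell me the top 3 results"
    )
    agent = Agent(task=task, llm=llm)
    # Run the agent to completion and collect its step history
    history = await agent.run()
    # Print the URLs the agent visited
    print(history.urls())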


if __name__ == "__main__":
    asyncio.run(main())
```

![main](./Screenshot%202025-09-28%20at%208.43.17 PM.png)

History URLs:

* 'about:blank'
* 'https://www.google.com/search?q=what%20is%20browser%20automation&udm=14&sei=83_ZaMSlI5CP7NYPj-aTyQo'
* 'https://www.google.com/search?q=what%20is%20browser%20automation&udm=14&sei=83_ZaMSlI5CP7NYPj-aTyQo'