A production-grade AI agent that automates repetitive browser workflows on tools like Linear and Notion using natural language commands.
Instead of clicking through the same menus every day, you type a plain English instruction and the agent opens the browser, navigates to the right page, fills every field, generates relevant content using AI, and submits — all on its own.
Example: "Create a project named Backend API with status Planned, priority Urgent, and write a relevant description"
The agent:
- Navigates to the Projects page
- Opens the New Project modal
- Types the project name
- Sets status and priority via dropdowns
- Writes a meaningful description generated by Gemini
- Submits the form
| Layer | Technology |
|---|---|
| Language | Python 3.12 |
| Orchestration | LangGraph (state machine) |
| Vision + Reasoning | Google Gemini 2.5 Flash / Pro |
| Browser Automation | Playwright (Chrome) |
| Vector Store | ChromaDB |
| Embeddings | Gemini Embedding API |
| LLM Wrapper | LangChain |
| Observability | LangSmith |
| Reasoning Pattern | ReAct (Reason + Act) |
User Prompt
    │
    ▼
┌─────────────────────────────────────────────────────┐
│                   LangGraph Agent                   │
│                                                     │
│  parse_task → decompose_goals → loop:               │
│      extract_page_state                             │
│          │  (screenshot + accessibility tree)       │
│          ▼                                          │
│      decide_action ←── RAG workflow hints           │
│          │  (Gemini vision + ReAct reasoning)       │
│          ▼                                          │
│      match_element                                  │
│          │  (primary-keyword element detection)     │
│          ▼                                          │
│      execute_action                                 │
│          │  (Playwright click / type / navigate)    │
│          ▼                                          │
│      validate_action ──► error_recovery             │
│          │  (6-check system, confidence gate)       │
│          ▼                                          │
│      [advance sub-goal or retry]                    │
│                                                     │
│  task_complete → RAG update → teardown              │
└─────────────────────────────────────────────────────┘
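
The loop in the diagram can be sketched in plain Python, without LangGraph. This is an illustration only, not the project's code: `AgentState`, `run_loop`, and `max_retries` are hypothetical names, and the `decide`, `execute`, and `validate` callables are stubs standing in for the `decide_action`, `execute_action`, and `validate_action` nodes.

```python
from dataclasses import dataclass, field

CONFIDENCE_GATE = 0.6  # threshold quoted in this README


@dataclass
class AgentState:
    """Hypothetical state object; the real agent's state is not shown here."""
    sub_goals: list[str]
    current: int = 0
    retries: int = 0
    log: list[str] = field(default_factory=list)


def run_loop(state, decide, execute, validate, max_retries=3):
    """Run decide -> execute -> validate until sub-goals are exhausted.

    A sub-goal advances only when validation confidence exceeds the gate.
    After max_retries failed attempts (an assumed limit) the loop hands off
    to error recovery, represented here by a log entry.
    """
    while state.current < len(state.sub_goals):
        goal = state.sub_goals[state.current]
        action = decide(goal)
        result = execute(action)
        confidence = validate(action, result)
        if confidence > CONFIDENCE_GATE:
            state.log.append(f"done: {goal}")
            state.current += 1
            state.retries = 0
        else:
            state.retries += 1
            state.log.append(f"retry: {goal}")
            if state.retries >= max_retries:
                state.log.append("error_recovery")
                break
    return state
```

In a LangGraph implementation, the `if`/`else` branch would typically become a conditional edge out of the validation node.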
git clone https://github.com/Atharva9281/BrowserAgent.git
cd BrowserAgent
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
playwright install chromium
cp .env.example .env
# Add your API key:
# GEMINI_API_KEY=your_key_here

Get a free Gemini API key at Google AI Studio.

python3 src/setup_auth.py
python3 src/agent.py

Then type any task:
📋 Enter task: Create a project named Q3 Planning with status Planned in Linear
📋 Enter task: Filter projects by status In Progress in Linear
📋 Enter task: Create an issue titled Fix login bug, set priority to High in Linear
The agent breaks every task into sub-goals. A sub-goal advances only when the action's validation confidence exceeds 0.6, measured across 6 checks (including URL changed, modal opened, expected keywords present, no errors, and click had effect). A Playwright click that does not throw an exception is not sufficient evidence of success.
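
A minimal sketch of the confidence gate follows. It assumes confidence is the fraction of checks that pass, which this README does not state; the real scoring may weight checks differently. The check names are taken from the list above, and `validation_confidence` and `should_advance` are hypothetical helper names.

```python
CONFIDENCE_GATE = 0.6  # a sub-goal advances only above this value


def validation_confidence(checks: dict[str, bool]) -> float:
    """Assumed scoring: share of checks that passed (unweighted)."""
    if not checks:
        return 0.0
    return sum(checks.values()) / len(checks)


def should_advance(checks: dict[str, bool]) -> bool:
    """Strictly greater than the gate, so a score of exactly 0.6 does not pass."""
    return validation_confidence(checks) > CONFIDENCE_GATE


# The click raised no exception, but most checks saw no effect on the page.
silent_failure = {
    "url_changed": False,
    "modal_opened": False,
    "expected_keywords_present": False,
    "no_errors": True,
    "click_had_effect": False,
}
print(should_advance(silent_failure))  # False
```

The example shows why a click that merely "did not throw" scores too low to advance: only one of the listed checks passes.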
Gemini describes elements with trailing context: "Projects link in the left sidebar under the TrialAgent team section". A naive keyword extractor picks up "TrialAgent team" as a 2-word phrase and clicks the wrong button. The detector extracts the primary name — words before the first positional preposition — and tries that first.
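
The primary-name rule can be sketched as below. The preposition list and the function name `primary_name` are assumptions for illustration; the project's actual list and any extra cleanup it performs are not documented here.

```python
# Assumed set of positional prepositions; the real detector may use another list.
POSITIONAL_PREPOSITIONS = {
    "in", "on", "under", "at", "inside", "within", "above", "below", "beside", "near",
}


def primary_name(description: str) -> str:
    """Return the words that precede the first positional preposition.

    Falls back to the full description when no preposition is found or
    when the description starts with one.
    """
    kept = []
    for word in description.split():
        if word.lower().strip(",.") in POSITIONAL_PREPOSITIONS:
            break
        kept.append(word)
    return " ".join(kept) if kept else description


print(primary_name(
    "Projects link in the left sidebar under the TrialAgent team section"
))  # Projects link
```

Trailing context such as "TrialAgent team" never reaches the matcher on the first attempt, because everything after "in" is cut off.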
Every successful run is embedded and stored in ChromaDB. On similar future tasks, the agent retrieves past workflows as hints. Bad runs (wrong page, failed validation) are detected and excluded before they can corrupt the knowledge base.
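
The store-and-retrieve behaviour can be illustrated with an in-memory stand-in. This sketch deliberately swaps ChromaDB and the Gemini Embedding API for a toy bag-of-words vector and cosine similarity so that it runs without external services; `WorkflowMemory`, `add_run`, and `hints` are hypothetical names, and the two rejection flags are assumed from the "wrong page, failed validation" criteria above.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class WorkflowMemory:
    """In-memory stand-in for the vector store of past workflows."""

    def __init__(self):
        self._runs = []  # (embedding, task, steps)

    def add_run(self, task, steps, validated, on_expected_page) -> bool:
        # Bad runs are rejected before they reach the store.
        if not (validated and on_expected_page):
            return False
        self._runs.append((embed(task), task, steps))
        return True

    def hints(self, task, k=1):
        """Return the steps of the k most similar stored runs."""
        query = embed(task)
        ranked = sorted(self._runs, key=lambda r: cosine(query, r[0]), reverse=True)
        return [steps for _, _, steps in ranked[:k]]
```

Filtering at write time keeps retrieval simple: anything returned by `hints` has already passed validation.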
The agent signals completion with task_complete. This always halts execution immediately — it does not advance a sub-goal. This prevents the agent from reopening modals or restarting workflows after a task is already done.
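
As a rough sketch, the halting rule amounts to a routing decision made right after the agent picks an action. The function name `next_step` and the stage labels are illustrative, borrowed from the architecture diagram rather than from the project's code.

```python
def next_step(action_type: str, current: int, total: int) -> str:
    """Pick the next stage for a decided action.

    task_complete routes straight to teardown regardless of how many
    sub-goals remain, and is never counted as progress on a sub-goal.
    """
    if action_type == "task_complete":
        return "teardown"
    if current >= total:
        return "teardown"
    return "execute_action"


print(next_step("task_complete", current=1, total=5))  # teardown
print(next_step("click", current=1, total=5))          # execute_action
```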
| App | Status |
|---|---|
| Linear | ✅ Full support |
| Notion | 🔧 In progress |
Latest benchmark (Linear, 5 actions):
| Phase | Time | Share |
|---|---|---|
| LLM decision | ~36s | 28% |
| Page state extraction | ~13s | 10% |
| Action execution | ~2s | 2% |
| Element finding | ~0.5s | <1% |
| Total | ~128s | — |
License: MIT