Swift Browser Agent UI

A voice-controlled web interface for interacting with a browser automation agent. This project combines in-browser voice activity detection and speech-to-text transcription with browser automation to provide hands-free control of web browsing.

🚀 Features

Voice Input: Uses @ricky0123/vad-react for browser-based Voice Activity Detection (VAD)
Transcription: Configurable to use different Speech-to-Text services (Groq Whisper, OpenAI Whisper API)
Agent Backend: Communicates with the Python backend API in the backend/ directory
Browser Control: Uses the browser-use library to automate browser actions based on voice commands
Real-time Updates: WebSockets for live agent status, goals, actions, and browser screenshots
Modern Stack: Built with Next.js (React) and Tailwind CSS, deployable on Vercel or similar platforms

📋 Project Structure

This repository contains both the frontend UI and the Python backend API in a monorepo structure:

swift-browser-use/           (repo root)
├── app/                     <-- Next.js frontend code
├── backend/                 <-- Python FastAPI backend code
│   ├── .venv/               <-- Python virtual environment (created by uv/venv)
│   ├── pyproject.toml       <-- Python project config
│   ├── requirements.txt     <-- Python dependencies (for pip)
│   └── .env                 <-- Backend-specific env vars
├── public/                  <-- Frontend public assets (VAD files)
├── node_modules/            <-- Installed Node.js packages
├── package.json             <-- Node.js dependency manifest
├── .env.local               <-- Frontend & shared env vars (root)
└── ... (other config files)

🏗️ Architecture

The system works through these components:

Frontend (Browser):
- Captures voice using VAD to detect speech
- Sends the audio blob to the Next.js API route (POST /api)
- Displays agent progress via WebSocket updates
Next.js API Route:
- Receives audio and sends to configured STT service
- Gets transcription and forwards to Python backend
Python Backend:
- Manages browser-use Agent instance using configured LLM
- Executes browser actions based on commands
- Sends status updates and screenshots to frontend

+------------------------+   POST /api    +------------------------+  POST /agent/task  +----------------------+     +-------------+
| Frontend (Browser)     |--------------->| Next.js API Route      |------------------->| Python Backend       |---->| browser-use |
| - VAD (detects speech) |  (audio blob)  | - STT transcription    |    (task text)     | - FastAPI/WebSocket  |     | Agent       |
| - WebSocket client     |                | - Calls Python backend |                    | - Manages agent      |     |             |
| - Displays agent view  |                |                        |<-------------------|                      |<----|             |
+------------------------+                +------------------------+   JSON response    +----------------------+     +-------------+
            ^                                                   (task accepted + session ID)       |
            |                       WebSocket JSON (status, screenshots)                           |
            +--------------------------------------------------------------------------------------+
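
The flow above can be sketched with plain data types. This is a minimal, standard-library-only illustration of the three messages involved (task request, "task accepted" response, WebSocket update). The field names (`task`, `session_id`, `type`, `payload`) and the helper names are assumptions made for this example, not the backend's actual schema.

```python
import json
import uuid
from dataclasses import asdict, dataclass


@dataclass
class TaskRequest:
    """Body the Next.js route might POST to /agent/task (field name assumed)."""
    task: str


@dataclass
class TaskAccepted:
    """JSON response acknowledging the task (field names assumed)."""
    session_id: str
    status: str = "accepted"


@dataclass
class AgentUpdate:
    """One WebSocket message pushed to the browser (shape assumed)."""
    session_id: str
    type: str     # e.g. "status" or "screenshot"
    payload: str  # status text, or base64-encoded image data


def accept_task(request: TaskRequest) -> TaskAccepted:
    """Reject empty transcriptions and hand back a fresh session ID."""
    if not request.task.strip():
        raise ValueError("task text must not be empty")
    return TaskAccepted(session_id=uuid.uuid4().hex)


def encode_update(update: AgentUpdate) -> str:
    """Serialize an update the way it could travel over the WebSocket."""
    return json.dumps(asdict(update))
```

For example, `accept_task(TaskRequest(task="Open Google"))` returns a `TaskAccepted` whose `session_id` the frontend could use to match later `AgentUpdate` messages to the task it submitted.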

📋 Prerequisites

Node.js: v18 or later recommended
pnpm: For frontend dependencies (install with npm install -g pnpm)
Python: v3.11 or later
uv (Recommended) or pip: For Python dependencies
- Install uv: curl -LsSf https://astral.sh/uv/install.sh | sh or pip install uv
API Keys:
- STT provider (Groq or OpenAI) for frontend
- LLM provider (OpenAI, Anthropic, Google, Azure) for backend
Ollama (Optional): For local models
Microphone: A working microphone for your browser
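
A quick way to check the command-line prerequisites is a small script like the one below. It is only a sketch: the function name `missing_prerequisites` is made up for this example, it checks that the tools are on PATH but not their versions (so the Node.js v18 recommendation is not verified), and `uv` can be dropped from the list if you use pip instead.

```python
import shutil
import sys


def missing_prerequisites(tools=("node", "pnpm", "uv"), min_python=(3, 11)):
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    # Compare the running interpreter against the minimum Python version.
    if sys.version_info < min_python:
        found = ".".join(str(part) for part in sys.version_info[:3])
        wanted = ".".join(str(part) for part in min_python)
        problems.append(f"Python {wanted}+ required, found {found}")
    # Check that each command-line tool can be located on PATH.
    for tool in tools:
        if shutil.which(tool) is None:
            problems.append(f"'{tool}' not found on PATH")
    return problems
```

Calling `missing_prerequisites()` before setup and printing the returned list shows what still needs to be installed.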

🔧 Setup & Development

1. Clone Your Fork

# Replace with your fork's URL
git clone https://github.com/e8Complete/swift-browser-use.git
cd swift-browser-use

2. Install Frontend Dependencies

From the root directory:

pnpm install
pnpm add openai # Required if using OpenAI for STT

3. Install Backend Dependencies

Navigate to the backend directory:

cd backend

Option A - Recommended with uv

# Create backend/.venv, then install the project from pyproject.toml into it
uv venv
uv pip install -e .

Option B - Using pip with requirements.txt

# Generate requirements.txt if needed
uv pip freeze > requirements.txt

# Create a virtual environment and install
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install -r requirements.txt

4. Install Playwright Browser

With the virtual environment activated (or, with uv, by prefixing the command with uv run):

playwright install chromium

5. Return to Root

cd ..

6. Set Up Environment Variables

Frontend (.env.local at the root)

Copy .env.example to .env.local: cp .env.example .env.local

Edit .env.local:

# ./.env.local

# --- STT Configuration for Next.js API Route ---
# Choose one: 'groq' or 'openai'
STT_PROVIDER=openai

# Required if STT_PROVIDER=groq
GROQ_API_KEY=gr_...

# Required if STT_PROVIDER=openai
OPENAI_API_KEY=sk_...

# --- Backend URLs ---
# URL for Next.js API route to call Python backend
PYTHON_BACKEND_URL=http://localhost:8000

# WebSocket URL for the browser client (JS) to connect
# Use ws:// locally, wss:// when deployed with SSL
# Needs NEXT_PUBLIC_ prefix!
NEXT_PUBLIC_PYTHON_WS_URL=ws://localhost:8000

Backend (backend/.env)

Create this file inside the backend/ directory:

# ./backend/.env

# --- LLM Configuration for Python Agent ---
# Choose provider: 'openai', 'azure', 'anthropic', 'google', 'ollama'
AGENT_LLM_PROVIDER=ollama
# Model name appropriate for the provider
AGENT_LLM_MODEL=llama3

# --- Provider Specific Settings (only need those for selected provider) ---
# OPENAI_API_KEY=sk_... # If using openai LLM
# ANTHROPIC_API_KEY=...
# GEMINI_API_KEY=...
# AZURE_ENDPOINT=...
# AZURE_OPENAI_API_KEY=...
# AZURE_OPENAI_API_VERSION=...
OLLAMA_BASE_URL=http://localhost:11434 # Default if using ollama

# Optional backend logging level
# BROWSER_USE_LOGGING_LEVEL=debug
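
To illustrate how these settings fit together, here is a minimal sketch of a startup check that fails early when the selected provider is missing its variables. The variable names come from the example file above; the `load_agent_config` function, the `AgentConfig` type, and the exact provider-to-variable mapping are assumptions for this example and may not match what the backend actually requires.

```python
import os
from dataclasses import dataclass

# Variables each provider is assumed to need, based on the example .env above.
REQUIRED_VARS = {
    "openai": ["OPENAI_API_KEY"],
    "anthropic": ["ANTHROPIC_API_KEY"],
    "google": ["GEMINI_API_KEY"],
    "azure": ["AZURE_ENDPOINT", "AZURE_OPENAI_API_KEY", "AZURE_OPENAI_API_VERSION"],
    "ollama": [],  # OLLAMA_BASE_URL falls back to a default below
}


@dataclass
class AgentConfig:
    provider: str
    model: str
    ollama_base_url: str


def load_agent_config(env=None) -> AgentConfig:
    """Read and validate the agent settings from a mapping (default: os.environ)."""
    env = os.environ if env is None else env
    provider = env.get("AGENT_LLM_PROVIDER", "").strip().lower()
    if provider not in REQUIRED_VARS:
        raise ValueError(f"unsupported AGENT_LLM_PROVIDER: {provider!r}")
    model = env.get("AGENT_LLM_MODEL", "").strip()
    if not model:
        raise ValueError("AGENT_LLM_MODEL must be set")
    missing = [name for name in REQUIRED_VARS[provider] if not env.get(name)]
    if missing:
        raise ValueError(f"{provider} requires: {', '.join(missing)}")
    return AgentConfig(
        provider=provider,
        model=model,
        ollama_base_url=env.get("OLLAMA_BASE_URL", "http://localhost:11434"),
    )
```

With the example file as shown (`AGENT_LLM_PROVIDER=ollama`, `AGENT_LLM_MODEL=llama3`), the check passes without any API key; switching the provider to `openai` without setting `OPENAI_API_KEY` raises a `ValueError` naming the missing variable.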

7. Run the Development Servers

Terminal 1 (Frontend)

From the root directory:

pnpm dev

Access at http://localhost:3000

Terminal 2 (Backend)

From the backend/ directory:

cd backend
# If using pip/venv, activate it: source .venv/bin/activate
uv run uvicorn main:app --host 0.0.0.0 --port 8000 --reload

API runs at http://localhost:8000
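
Once the backend is up, you can exercise it without the frontend by posting a task directly. The sketch below uses only the standard library. The `/agent/task` path comes from the architecture diagram, but the JSON body shape (`{"task": ...}`) and the function names are assumptions; check the backend code for the real request model.

```python
import json
import urllib.request


def build_task_request(base_url: str, task: str) -> urllib.request.Request:
    """Build (but do not send) a POST to the backend's /agent/task endpoint."""
    body = json.dumps({"task": task}).encode("utf-8")  # "task" key is an assumption
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/agent/task",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_task(base_url: str, task: str, timeout: float = 10.0) -> str:
    """Send the request and return the raw response body as text."""
    request = build_task_request(base_url, task)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For example, `send_task("http://localhost:8000", "Open example.com")` should return the backend's JSON acknowledgement if the assumed body shape is correct.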

8. Test

Open http://localhost:3000
Grant microphone permissions
Speak a command (e.g., "Open Google and search for cute cat videos")
Watch the browser UI for transcription, status updates, and screenshots

🚀 Deployment

Deploying a monorepo with mixed languages requires separate configurations:

Frontend (Root /)

Deploy the Next.js app to Vercel, Netlify, or a similar platform:

Configure platform to use Node.js with pnpm
Set environment variables in the platform settings:
- STT_PROVIDER
- GROQ_API_KEY or OPENAI_API_KEY
- PYTHON_BACKEND_URL (must point to deployed backend)
- NEXT_PUBLIC_PYTHON_WS_URL (must point to deployed backend)
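
A common deployment mistake is leaving one of the two backend URLs on localhost or using ws:// against an HTTPS backend. The sketch below derives the WebSocket URL that would match a given backend URL. It assumes both variables point at the same host, which holds for the local example but may not hold if your platform exposes separate internal and public addresses; the function names are made up for this example.

```python
from urllib.parse import urlsplit, urlunsplit


def expected_ws_url(backend_url: str) -> str:
    """Map http -> ws and https -> wss, keeping host, port, and path."""
    parts = urlsplit(backend_url)
    scheme = {"http": "ws", "https": "wss"}.get(parts.scheme)
    if scheme is None:
        raise ValueError(f"expected an http(s) URL, got: {backend_url!r}")
    return urlunsplit((scheme, parts.netloc, parts.path.rstrip("/"), "", ""))


def urls_consistent(backend_url: str, ws_url: str) -> bool:
    """True if ws_url is the WebSocket counterpart of backend_url."""
    return expected_ws_url(backend_url) == ws_url.rstrip("/")
```

Running `urls_consistent` over the values of `PYTHON_BACKEND_URL` and `NEXT_PUBLIC_PYTHON_WS_URL` before deploying gives a cheap sanity check.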

Backend (/backend)

Deploy to a platform suitable for long-running Python processes:

Fly.io, Render, Cloud Run, Railway, DigitalOcean Apps, AWS EC2/ECS
Often deployed via Docker

Dockerfile Example (place in backend/)

# Use an official Python base image
FROM python:3.11-slim

WORKDIR /app

# Install system dependencies needed by Playwright/Browsers
RUN apt-get update && apt-get install -y --no-install-recommends \
    # Playwright dependencies
    libnss3 libnspr4 libdbus-1-3 libatk1.0-0 libatk-bridge2.0-0 \
    libcups2 libdrm2 libexpat1 libgbm1 libgcc1 libglib2.0-0 \
    libpango-1.0-0 libx11-6 libx11-xcb1 libxcb1 libxcomposite1 \
    libxdamage1 libxext6 libxfixes3 libxrandr2 libxtst6 \
    ca-certificates fonts-liberation \
    libasound2 libatspi2.0-0 libcairo2 libfontconfig1 \
    libgtk-3-0 libpangoft2-1.0-0 libstdc++6 \
    lsb-release wget xdg-utils \
    # Clean up
    && rm -rf /var/lib/apt/lists/*

# Install uv
RUN pip install uv

# Copy only dependency definition files first for caching
COPY pyproject.toml ./
# Optional: If using requirements.txt
# COPY requirements.txt ./

# Install Python dependencies
RUN uv pip install --system --no-cache -e .
# Or if using requirements.txt:
# RUN uv pip install --system --no-cache -r requirements.txt

# Install Playwright browsers
RUN playwright install chromium --with-deps

# Copy the rest of the application
COPY . .

# Expose the port the app runs on
EXPOSE 8000

# Command to run the application
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]

📜 License

MIT License

🙏 Acknowledgements

browser-use for browser automation
@ricky0123/vad-react for voice activity detection
Next.js and FastAPI frameworks
