Merged
37 commits
5eaeac3
small updates
miguelg719 Sep 19, 2025
a8fe6fb
changes
miguelg719 Sep 19, 2025
5b455df
render last reasoning
miguelg719 Sep 19, 2025
ff3ba00
add sdk to repo
miguelg719 Sep 26, 2025
bf699bf
fix build errors:
miguelg719 Sep 26, 2025
0ce9fec
error description
miguelg719 Sep 26, 2025
a82f7ef
adding stagehand package to next config
Kylejeong2 Sep 28, 2025
a079f57
update tailwind config ts ignore and exclude sdk from build process
Kylejeong2 Sep 28, 2025
2c165ea
updated sdk
miguelg719 Oct 6, 2025
1ea7c9c
probability based region routing (#13)
sameelarif Oct 6, 2025
993dc58
Revert "probability based region routing (#13)"
sameelarif Oct 6, 2025
fd6bf8b
redo region routing
sameelarif Oct 6, 2025
958d8c8
small updates
miguelg719 Sep 19, 2025
1fae745
changes
miguelg719 Sep 19, 2025
dc7bd03
render last reasoning
miguelg719 Sep 19, 2025
c76cd19
add sdk to repo
miguelg719 Sep 26, 2025
3200b83
adding stagehand package to next config
Kylejeong2 Sep 28, 2025
14f392f
update tailwind config ts ignore and exclude sdk from build process
Kylejeong2 Sep 28, 2025
cf2346a
updated sdk
miguelg719 Oct 6, 2025
96c4acf
probability based region routing (#13)
sameelarif Oct 6, 2025
face517
Revert "probability based region routing (#13)"
sameelarif Oct 6, 2025
1820eef
no inline import
sameelarif Oct 6, 2025
4620227
redo region routing
sameelarif Oct 6, 2025
11cc5b1
add suspense wrapper to home component for search params
Kylejeong2 Oct 6, 2025
f3d9381
pull from edge config
sameelarif Oct 6, 2025
9da2e38
Merge branch 'sameel/flags-new' into sameel/edge-config
sameelarif Oct 6, 2025
880651a
Merge branch 'main' into sameel/edge-config
sameelarif Oct 6, 2025
381115f
fix imports
sameelarif Oct 6, 2025
b45306d
Merge branch 'sameel/flags-new' of https://github.com/browserbase/pri…
sameelarif Oct 7, 2025
968f315
Merge branch 'sameel/flags-new' into sameel/edge-config
sameelarif Oct 7, 2025
33a83eb
Merge pull request #15 from browserbase/sameel/edge-config
sameelarif Oct 7, 2025
3cab49b
pull region dist. from edge config
sameelarif Oct 7, 2025
60bb639
kj/final
Kylejeong2 Oct 7, 2025
d990b28
url names
Kylejeong2 Oct 7, 2025
0364810
remove animation cursorrules
Kylejeong2 Oct 7, 2025
735d6ed
parse final message into markdown
Kylejeong2 Oct 7, 2025
0795339
new modelname + og image + readme
Kylejeong2 Oct 7, 2025
62 changes: 0 additions & 62 deletions .cursorrules

This file was deleted.

4 changes: 3 additions & 1 deletion .env.example
@@ -8,4 +8,6 @@ BROWSERBASE_PROJECT_ID=your_browserbase_project_id_here
 NEXT_PUBLIC_POSTHOG_HOST=https://us.i.posthog.com
 NEXT_PUBLIC_POSTHOG_KEY=your_public_posthog_key_here
 
-NEXT_PUBLIC_SITE_URL=http://localhost:3000
+NEXT_PUBLIC_SITE_URL=http://localhost:3000
+
+EDGE_CONFIG=your_edge_config_url
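
The variables in `.env.example` could be checked once at startup so that a missing key fails early instead of partway through an agent run. The following is a minimal sketch, not code from this PR: `missingEnvVars` and `REQUIRED_ENV` are hypothetical names, and treating `EDGE_CONFIG`, the PostHog keys, and the site URL as optional follows the README's "Optional" comments.

```typescript
// Hypothetical helper (not part of this PR): list required variables that are unset or blank.
// The three names below are the non-optional entries in the README's .env.local example.
const REQUIRED_ENV = [
  "GOOGLE_API_KEY",
  "BROWSERBASE_API_KEY",
  "BROWSERBASE_PROJECT_ID",
] as const;

type Env = Record<string, string | undefined>;

function missingEnvVars(env: Env): string[] {
  return REQUIRED_ENV.filter((name) => {
    const value = env[name];
    // Treat whitespace-only values the same as unset ones.
    return value === undefined || value.trim() === "";
  });
}
```

In a Node.js process this might be called as `missingEnvVars(process.env)`, with the caller deciding whether a non-empty result should abort startup.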
2 changes: 2 additions & 0 deletions .gitignore
@@ -83,3 +83,5 @@ target/
 # pnpm
 
 test/
+.vercel
+.env*.local
204 changes: 88 additions & 116 deletions README.md
@@ -1,113 +1,108 @@
# Google CUA Browser
# Gemini CUA Browser

A powerful browser automation playground powered by Google's new Computer Use Agent and Browserbase. This free demo showcases the capabilities of AI-driven browser automation using Stagehand and Google's computer-use capabilities.
[Demo](https://gemini.browserbase.com)

A powerful browser automation playground powered by Gemini's new Computer Use Agent and Browserbase. This free demo showcases the capabilities of AI-driven browser automation using Stagehand and Gemini's computer-use capabilities.

## Features

- 🤖 **AI-Powered Browser Control**: Uses Google's Computer Use model to interact with web pages naturally
- 🌐 **Real Browser Environment**: Runs on actual Chrome browsers via Browserbase
- 🎯 **Natural Language Commands**: Simply describe what you want to do in plain English
- 📊 **Real-time Feedback**: Watch the AI navigate, click, type, and interact with websites
- 📝 **Rich Markdown Support**: AI responses rendered with proper formatting, code blocks, and typography
- 🔄 **Session Management**: Persistent browser sessions with tab management
- 🖼️ **Non-Interactive Preview**: View-only browser iframe prevents accidental user interference
- 🤖 **Gemini Computer Use Agent**: Leverages Gemini's `computer-use-preview-10-2025` model for intelligent web interactions
- 🌐 **Real Browser Control**: Runs on browsers via Browserbase's infrastructure
- 🎯 **Natural Language Commands**: Describe tasks in plain English and watch the AI execute them
- 📊 **Real-time Streaming**: Server-Sent Events (SSE) for live agent feedback and progress updates
- 🔄 **Session Management**: Persistent browser sessions with automatic viewport management

## Tech Stack

- **Frontend**: Next.js 15 with TypeScript, React 19, and Tailwind CSS
- **AI Model**: Google Computer Use
### Frontend
- **Framework**: Next.js 15 with React 19 and TypeScript
- **Styling**: Tailwind CSS with custom fonts (PP Neue, PP Supply)
- **Animation**: Framer Motion for smooth transitions
- **Icons**: Lucide React
- **Markdown**: ReactMarkdown with GitHub Flavored Markdown (remark-gfm)

### Backend
- **AI Model**: Gemini Computer Use (`computer-use-preview-10-2025`)
- **Browser Automation**: Browserbase + Stagehand
- **Streaming**: Server-Sent Events (SSE) for real-time updates
- **UI Components**: Framer Motion animations, Lucide React icons
- **Markdown Rendering**: ReactMarkdown with GitHub Flavored Markdown support
- **Agent Framework**: Stagehand with Playwright Core
- **Streaming**: Server-Sent Events (SSE)
- **Runtime**: Node.js with Next.js API routes

### Infrastructure
- **Analytics**: PostHog for user tracking
- **Configuration**: Vercel Edge Config for region distribution
- **Deployment**: Optimized for Vercel with 600s max duration

## Prerequisites

- Node.js 18.x or later
- pnpm (recommended) or npm
- API keys for Google AI Studio and Browserbase
- pnpm 10.x or later (recommended)
- API keys:
- [Google AI Studio](https://aistudio.google.com/apikey) - for Computer Use Agent
- [Browserbase](https://www.browserbase.com) - for browser infrastructure

## Getting Started

1. **Clone the repository:**
```bash
git clone https://github.com/browserbase/google-cua-browser.git
cd -browser
```

2. **Install dependencies:**
```bash
pnpm install
# or
npm install
```

3. **Set up environment variables:**
```bash
cp .env.example .env.local
```

Then edit `.env.local` with your API keys:
```env
# Google API Configuration
GOOGLE_API_KEY=your_google_api_key_here

# Browserbase Configuration
BROWSERBASE_API_KEY=your_browserbase_api_key_here
BROWSERBASE_PROJECT_ID=your_browserbase_project_id_here

# Optional: Analytics and monitoring
NEXT_PUBLIC_POSTHOG_KEY=your_posthog_key
NEXT_PUBLIC_POSTHOG_HOST=https://us.i.posthog.com

# Site URL (for local development)
NEXT_PUBLIC_SITE_URL=http://localhost:3000
```

**Get your API keys:**
- Google API: [Google AI Studio](https://aistudio.google.com/apikey)
- Browserbase: [Browserbase Dashboard](https://www.browserbase.com)

4. **Start the development server:**
```bash
pnpm dev
# or
npm run dev
```

5. **Open your browser:**
Navigate to [http://localhost:3000](http://localhost:3000)
### 1. Clone the repository
```bash
git clone https://github.com/browserbase/gemini-cua-browser.git
cd gemini-cua-browser
```

## Usage
### 2. Install dependencies
```bash
pnpm install
```

1. **Start a Session**: Click "New Session" to initialize a browser instance
2. **Enter Commands**: Type natural language instructions like:
- "Go to google.com and search for AI news"
- "Navigate to GitHub and explore trending repositories"
- "Fill out the contact form on this page"
3. **Watch the Magic**: The AI will interpret your request and perform the actions
4. **View Results**: See real-time updates with rich markdown formatting including code blocks, lists, and formatted text
### 3. Configure environment variables
```bash
cp .env.example .env.local
```

## Key Components
Edit `.env.local` with your credentials:
```env
# Google AI Studio API Key
GOOGLE_API_KEY=your_google_api_key

- **Stream API** (`/api/agent/stream`): Handles real-time agent execution with SSE
- **Session Management** (`/api/session`): Creates and manages Browserbase sessions
- **Agent Integration**: Uses Stagehand with Google's Computer Use for browser automation
- **Markdown Chat**: AI responses support rich text formatting with code syntax highlighting
- **Browser Preview**: Non-interactive iframe for viewing agent actions without user interference
- **UI Components**: Modern, animated interface with real-time updates
# Browserbase Configuration
BROWSERBASE_API_KEY=your_browserbase_api_key
BROWSERBASE_PROJECT_ID=your_browserbase_project_id

## Codebase Optimization
# Optional: Analytics
NEXT_PUBLIC_POSTHOG_HOST=https://us.i.posthog.com
NEXT_PUBLIC_POSTHOG_KEY=your_posthog_key

This project has been optimized for production deployment:
# Optional: Site URL
NEXT_PUBLIC_SITE_URL=http://localhost:3000

- **Clean Dependencies**: Removed all unused npm packages and dev dependencies
- **Asset Optimization**: Eliminated unused images, fonts, and static files
- **Type Safety**: Cleaned up unused TypeScript types and interfaces with proper ReactMarkdown component typing
- **Bundle Size**: Reduced bundle size by removing dead code and unused imports
- **UI Components**: Modular markdown rendering components for consistent styling
- **Performance**: Optimized for Vercel deployment with proper runtime configuration
# Optional: Vercel Edge Config
EDGE_CONFIG=your_edge_config_url
```

### 4. Start the development server
```bash
pnpm dev
```

### 5. Open your browser
Navigate to [http://localhost:3000](http://localhost:3000)

## Usage

1. **Enter a Command**: Type a natural language instruction or select a preset example:
- "What's the price of NVIDIA stock?"
- "Review a pull request on Github"
- "Browse Hacker News for trending debates"
- "Play a game of 2048"

2. **Watch the Agent**: The AI will:
- Create a browser session
- Navigate to relevant websites
- Interact with page elements (click, type, scroll)
- Take screenshots to verify actions
- Stream real-time progress updates

3. **View Results**: See the agent's reasoning, actions, and final response in rich markdown format

## Available Scripts

@@ -121,44 +116,21 @@ pnpm build
 # Start production server
 pnpm start
 
-# Run linting
+# Lint code
 pnpm lint
 ```
 
-## Configuration
-
-The agent is configured with specific behaviors:
-- Works in atomic steps (one action at a time)
-- Prefers direct navigation over search
-- Avoids risky actions unless necessary
-- Fixed viewport at 1024x768 pixels
-- Automatic screenshot capture after actions
-
-## Limitations
-
-- Maximum session duration: 10 minutes (Vercel timeout)
-- Viewport locked at 1024x768 pixels
-- No keyboard shortcuts support (uses click + type instead)
-- Browser sessions are temporary and will expire
-
-## Troubleshooting
-
-- **Session fails to start**: Check your Browserbase API credentials
-- **Agent not responding**: Verify your Google API key has access to Google Computer Use
-- **Timeout errors**: Complex tasks may exceed the 10-minute limit
-- **Connection issues**: Ensure stable internet connection for browser streaming
-
 ## Contributing
 
-This is a demo playground project. Feel free to fork and experiment!
+This is a demo project showcasing Gemini Computer Use Agent capabilities. Feel free to fork and experiment!
 
 ## License
 
 MIT
 
 ## Acknowledgments
 
-- [Browserbase](https://browserbase.com) for browser infrastructure
-- [Stagehand](https://github.com/browserbasehq/stagehand) for automation framework
-- [Google AI Studio](https://aistudio.google.com/) for AI capabilities
-- [Vercel](https://vercel.com) for hosting and edge functions
+- [Browserbase](https://browserbase.com) - Browser infrastructure and remote browser sessions
+- [Stagehand](https://github.com/browserbasehq/stagehand) - Browser automation framework with AI capabilities
+- [Google AI Studio](https://aistudio.google.com/) - Computer Use Agent API
+- [Vercel](https://vercel.com) - Hosting, edge functions, and edge config
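
The README names Vercel Edge Config as the source of the region distribution, and the commit list mentions probability-based region routing, but the routing code itself is not part of the diff shown here. The following is only a sketch of how a weighted pick could work, assuming the distribution is stored as a map from region name to a non-negative weight. `pickRegion` and the `RegionWeights` shape are hypothetical and are not taken from this PR.

```typescript
// Hypothetical shape: region name -> relative weight (weights need not sum to 1).
type RegionWeights = Record<string, number>;

// Pick a region with probability proportional to its weight.
// `rand` is injectable so the choice can be made deterministic in tests.
function pickRegion(
  weights: RegionWeights,
  rand: () => number = Math.random,
): string {
  // Ignore zero, negative, and NaN weights.
  const entries = Object.entries(weights).filter(([, weight]) => weight > 0);
  if (entries.length === 0) {
    throw new Error("no region has a positive weight");
  }

  const total = entries.reduce((sum, [, weight]) => sum + weight, 0);
  let threshold = rand() * total;
  let last = "";
  for (const [region, weight] of entries) {
    last = region;
    threshold -= weight;
    if (threshold < 0) return region;
  }
  // Only reachable through floating-point rounding at the upper edge.
  return last;
}
```

A caller would load the weight map from wherever it is stored and pass it in, for example `pickRegion({ "region-a": 3, "region-b": 1 })`, where the region names are placeholders. Keeping the random source as a parameter separates the selection logic from the configuration lookup.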
10 changes: 4 additions & 6 deletions app/api/agent/stream/route.ts
@@ -18,10 +18,9 @@ function sseComment(comment: string): Uint8Array {
 
 export async function GET(request: Request) {
   const { searchParams } = new URL(request.url);
-  const [sessionId, goal, fromChat] = [
+  const [sessionId, goal] = [
     searchParams.get("sessionId"),
     searchParams.get("goal"),
-    searchParams.get("fromChat") === "true"
   ];
 
   if (!sessionId || !goal) {
@@ -122,7 +121,6 @@ export async function GET(request: Request) {
           width: 1288,
           height: 711,
         },
-        solveCaptchas: !fromChat, // false if session is from a search param, true otherwise
       },
     },
     useAPI: false,
@@ -139,14 +137,14 @@ export async function GET(request: Request) {
   send("start", {
     sessionId,
     goal,
-    model: "computer-use-preview-10-2025",
+    model: "gemini-2.5-computer-use-preview-10-2025",
     init,
     startedAt: new Date().toISOString(),
   });
 
   const agent = stagehand.agent({
     provider: "google",
-    model: "computer-use-preview-10-2025",
+    model: "gemini-2.5-computer-use-preview-10-2025",
     options: {
       apiKey: process.env.GOOGLE_API_KEY,
     },
@@ -200,4 +198,4 @@ export async function GET(request: Request) {
       "X-Accel-Buffering": "no",
     },
   });
-}
+}
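
The route streams agent progress as Server-Sent Events: the first hunk header shows an `sseComment(comment: string): Uint8Array` helper, and the handler calls `send("start", {...})`. Neither body appears in this diff, so the following is a sketch of the standard SSE wire format rather than the route's actual code. `encodeSseEvent` and `encodeSseComment` are hypothetical names, and the payload is assumed to be JSON-serializable.

```typescript
const encoder = new TextEncoder();

// Encode one named event. JSON.stringify escapes newlines inside strings,
// so the payload always fits on a single `data:` line.
function encodeSseEvent(event: string, data: unknown): Uint8Array {
  return encoder.encode(`event: ${event}\ndata: ${JSON.stringify(data)}\n\n`);
}

// Encode a comment frame. Lines starting with ":" are ignored by SSE clients,
// which makes them usable as keep-alive pings.
function encodeSseComment(comment: string): Uint8Array {
  // Collapse line breaks so the comment cannot terminate the frame early.
  const singleLine = comment.replace(/\r?\n/g, " ");
  return encoder.encode(`: ${singleLine}\n\n`);
}
```

Each frame ends with a blank line, which is what tells the client that the event is complete. Returning `Uint8Array` matches the return type in the hunk header and lets the chunks be enqueued directly on a byte stream.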