💻 Open Computer Use - Autonomous Computer-Using Agents at Scale

Landing Page

Your AI Agent That Actually Uses Computers Like Humans Do

Open Computer Use is an open-source platform that gives AI agents real computer control through browser automation, terminal access, and desktop interaction. Built for developers who want to create truly autonomous AI workflows.

Website • Discord • X

License: Apache 2.0 • Next.js • FastAPI • Docker • PRs Welcome

Preview

Main Agent Animation


✨ What Makes This Special?

Unlike traditional AI assistants that only talk about tasks, Open Computer Use enables AI agents to actually perform them by:

  • 🌐 Browsing the web like a human (search, click, fill forms, extract data)
  • 💻 Running terminal commands and managing files
  • 🖱️ Controlling desktop applications with full UI automation
  • 🤖 Multi-agent orchestration that breaks down complex tasks
  • 🔄 Streaming execution with real-time feedback
  • 🎯 100% open-source and self-hostable

Open Computer Use provides "computer use" capabilities similar to Anthropic's Claude Computer Use, but is fully open-source and extensible.


🎬 See It In Action

Browser Automation

AI agent searching, navigating, and interacting with websites autonomously

Browser Automation Demo

▶️ Watch: AI Agent Browsing and Playing

Terminal Operations & Development

Executing commands, managing files, and running complex workflows

Terminal Operations Demo

▶️ Watch: Quant Trading & Research on QuantConnect

Multi-Agent Orchestration

Complex tasks broken down and executed by specialized agents

Multi-Agent Demo

▶️ Watch: Building Nvidia Options Dashboard

Advanced Features

Human-in-the-loop control and intelligent collaboration

Human Control Demo

▶️ Watch: AI Agent with Human Intervention


🎯 Core Capabilities

🌐 Browser Agent

  • Search-first strategy using Google Search API
  • Smart web navigation with automatic form filling
  • Element detection and intelligent clicking
  • Multi-tab management for parallel workflows
  • Page context extraction for AI understanding
  • Screenshot capture for visual verification

💻 Terminal Agent

  • Command execution in isolated environments
  • File operations (read, write, edit, delete)
  • Directory management with full control
  • Script execution (Python, Node.js, bash)
  • Package installation and environment setup
  • Output streaming with real-time feedback
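The Terminal Agent's internals aren't documented here. As a rough illustration of output streaming, the sketch below runs a command with Python's standard `subprocess` module and hands each output line to a callback as it is read. The function name `run_streaming` is hypothetical, and the isolation mentioned above comes from the Docker VM, not from this code.

```python
import subprocess
import sys
from typing import Callable, List


def run_streaming(cmd: List[str], on_line: Callable[[str], None]) -> int:
    """Run cmd and pass each output line to on_line as it is read; return the exit code."""
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout so errors stream too
        text=True,
        bufsize=1,  # line-buffered reads on our side of the pipe
    )
    assert proc.stdout is not None
    with proc.stdout:
        for line in proc.stdout:
            on_line(line.rstrip("\n"))
    return proc.wait()


if __name__ == "__main__":
    code = run_streaming([sys.executable, "-c", "print('step 1'); print('step 2')"], print)
    print("exit code:", code)
```

Many programs buffer their output when writing to a pipe rather than a terminal, so lines may arrive in bursts unless the child process flushes.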

🖱️ Desktop Agent

  • UI element detection using computer vision
  • Mouse and keyboard control for any application
  • Window management (focus, resize, arrange)
  • Screenshot analysis for context awareness
  • OCR capabilities for text extraction
  • Linux desktop support (Windows and macOS VMs are on the roadmap)

🤖 Multi-Agent System

  • Task decomposition by an AI planner
  • Sequential execution with context passing
  • Specialized agents for different capabilities
  • Error handling with automatic retries
  • User interaction when clarification is needed
  • Execution reports with detailed summaries
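The orchestration code itself isn't described here. As a minimal sketch of three ideas from this list (sequential execution, context passing, and automatic retries), the code below runs an ordered plan, gives each agent the results of earlier steps, and stops with an error report when a step keeps failing. `Subtask`, `Report`, `execute_plan`, and the agent signature are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical agent signature: (instruction, results of earlier steps) -> result text
Agent = Callable[[str, Dict[str, str]], str]


@dataclass
class Subtask:
    id: str
    agent: str  # hypothetical agent label, e.g. "browser" or "terminal"
    instruction: str


@dataclass
class Report:
    results: Dict[str, str] = field(default_factory=dict)
    errors: Dict[str, str] = field(default_factory=dict)


def execute_plan(plan: List[Subtask], agents: Dict[str, Agent], max_retries: int = 2) -> Report:
    """Run subtasks in order, feeding earlier results forward and retrying failed steps."""
    report = Report()
    context: Dict[str, str] = {}
    for task in plan:
        agent = agents[task.agent]
        last_error: Optional[Exception] = None
        for _ in range(max_retries + 1):
            try:
                # pass a copy so an agent cannot mutate the shared context
                result = agent(task.instruction, dict(context))
            except Exception as exc:  # any failure triggers another attempt
                last_error = exc
                continue
            context[task.id] = result
            report.results[task.id] = result
            break
        else:
            # every attempt failed: record it and stop, since later steps may depend on it
            report.errors[task.id] = repr(last_error)
            break
    return report
```

Stopping at the first unrecoverable step is one design choice among several; a planner could instead re-plan or ask the user for clarification at that point.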

🏗️ Architecture

┌─────────────────────────────────────────────────────────────────┐
│                         Frontend (Next.js 15)                   │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐           │
│  │  Chat UI     │  │  Model       │  │  VM          │           │
│  │  Components  │  │  Selection   │  │  Management  │           │
│  └──────────────┘  └──────────────┘  └──────────────┘           │
└─────────────────────────────────────────────────────────────────┘
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                      Backend API (FastAPI)                      │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │           Multi-Agent Executor Service                   │   │
│  │  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐       │   │
│  │  │   Planner   │→ │   Browser   │→ │   Terminal  │       │   │
│  │  │    Agent    │  │    Agent    │  │    Agent    │       │   │
│  │  └─────────────┘  └─────────────┘  └─────────────┘       │   │
│  └──────────────────────────────────────────────────────────┘   │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐           │
│  │   WebSocket  │  │   Database   │  │   Billing    │           │
│  │   VM Control │  │   Service    │  │   Service    │           │
│  └──────────────┘  └──────────────┘  └──────────────┘           │
└─────────────────────────────────────────────────────────────────┘
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│               Docker VM (Ubuntu 22.04 + XFCE)                   │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │  Chrome Browser  │  Terminal  │  Desktop Apps  │  Tools  │   │
│  └──────────────────────────────────────────────────────────┘   │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │         WebSocket Agent Server (Port 8080)               │   │
│  │         VNC Server (Port 5900)                           │   │
│  └──────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────┘

🚀 Quick Start

Prerequisites

  • Node.js 20+ and npm
  • Python 3.10+ and pip
  • Docker and Docker Compose
  • Supabase account (free tier works)
  • API keys for AI providers (OpenAI, Anthropic, etc.)

1. Clone the Repository

git clone https://github.com/LLmHub-dev/open-computer-use.git
cd open-computer-use

2. Set Up Supabase Database

Create Supabase Project

  1. Go to Supabase and create a new project
  2. Wait for the project to finish setting up
  3. Go to Project Settings → API to get your keys

Run Database Schema

Execute the schema to create all required tables:

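# Option A: Using the Supabase Dashboard
# 1. Go to the SQL Editor in your Supabase dashboard
# 2. Copy the contents of supabase/schema.sql
# 3. Paste and run the SQL

# Option B: Using the Supabase CLI (recommended)
# The CLI can't be installed as a global npm module; add it to the project and run it via npx
npm install supabase --save-dev
npx supabase login
npx supabase link --project-ref your-project-ref
# db push applies the migration files in supabase/migrations/ to the linked project
npx supabase db push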

Or manually run the schema file:

psql -h db.your-project.supabase.co -U postgres -d postgres -f supabase/schema.sql

This creates all necessary tables:

  • 👤 Users & Auth: users, user_preferences, user_keys
  • 💬 Chat System: chats, messages, chat_participants, chat_attachments
  • 🤖 AI Agents: machine_sessions, machine_usage, machine_ai_actions
  • 💳 Billing: user_credits, credit_transactions, stripe_customers, subscription_plans
  • 📊 Projects: projects, user_machines, machine_snapshots
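After applying the schema, you can confirm that the tables exist. This sketch assumes they are created in Postgres's `public` schema (an assumption, since the schema file isn't reproduced here); it holds the table names listed above and reports any that are missing from the result of the query in `LIST_TABLES_SQL`. The helper names are hypothetical.

```python
from typing import Iterable, List

# Table names as listed in this README; assumed to live in the "public" schema.
EXPECTED_TABLES = {
    "users", "user_preferences", "user_keys",
    "chats", "messages", "chat_participants", "chat_attachments",
    "machine_sessions", "machine_usage", "machine_ai_actions",
    "user_credits", "credit_transactions", "stripe_customers", "subscription_plans",
    "projects", "user_machines", "machine_snapshots",
}

LIST_TABLES_SQL = (
    "SELECT table_name FROM information_schema.tables WHERE table_schema = 'public';"
)


def missing_tables(existing: Iterable[str]) -> List[str]:
    """Return the expected tables absent from `existing`, sorted by name."""
    present = {name.strip() for name in existing if name.strip()}
    return sorted(EXPECTED_TABLES - present)
```

One way to obtain `existing` is to run the query with `psql` using the `-A` (unaligned) and `-t` (tuples only) flags and split the output into lines.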

3. Set Up Environment Variables

# Frontend
cp .env.example .env
# Edit .env with your configuration

# Backend
cp backend/.env.example backend/.env
# Edit backend/.env with your configuration

Required Variables

Supabase (Required)

NEXT_PUBLIC_SUPABASE_URL=https://your-project.supabase.co
NEXT_PUBLIC_SUPABASE_ANON_KEY=your-anon-key-from-supabase-dashboard
SUPABASE_SERVICE_ROLE=your-service-role-key-from-supabase-dashboard

Security Keys (Required)

# Generate with: openssl rand -hex 32
ENCRYPTION_KEY=your-generated-32-byte-hex-string
CSRF_SECRET=your-generated-32-byte-hex-string
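If `openssl` isn't available, Python's standard `secrets` module produces the same format: `secrets.token_hex(32)` returns 32 random bytes as 64 hexadecimal characters. The helper names below are hypothetical, and the check only confirms the format of a value, not how the application uses it.

```python
import secrets


def generate_hex_key(num_bytes: int = 32) -> str:
    """Same format as `openssl rand -hex 32`: num_bytes random bytes as lowercase hex."""
    return secrets.token_hex(num_bytes)


def is_hex_key(value: str, num_bytes: int = 32) -> bool:
    """True if value is exactly num_bytes bytes written as 2 * num_bytes hex characters."""
    try:
        raw = bytes.fromhex(value)  # fromhex also tolerates spaces, hence both length checks
    except ValueError:
        return False
    return len(value) == 2 * num_bytes and len(raw) == num_bytes


if __name__ == "__main__":
    print(f"ENCRYPTION_KEY={generate_hex_key()}")
    print(f"CSRF_SECRET={generate_hex_key()}")
```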

Google Search API (Required for web search)

GOOGLE_SEARCH_KEY=your-google-api-key
GOOGLE_SEARCH_CX=your-custom-search-engine-id

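Get the API key from Google Cloud Console and the search engine ID from Programmable Search Engine:

  1. Enable the Custom Search API in Google Cloud Console
  2. Create an API key (GOOGLE_SEARCH_KEY)
  3. Create a search engine at programmablesearchengine.google.com and copy its Search engine ID (GOOGLE_SEARCH_CX)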
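Once both values are set, one way to check them is to build a request for the Custom Search JSON API (`https://www.googleapis.com/customsearch/v1`), which takes the API key as `key`, the engine ID as `cx`, and the query as `q`. This sketch only builds the URL; how the backend actually calls the API isn't shown here, and `build_search_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

CUSTOM_SEARCH_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_search_url(api_key: str, cx: str, query: str, num: int = 10) -> str:
    """Build a request URL: GOOGLE_SEARCH_KEY goes in `key`, GOOGLE_SEARCH_CX in `cx`."""
    if not 1 <= num <= 10:
        raise ValueError("the API returns at most 10 results per request")
    params = {"key": api_key, "cx": cx, "q": query, "num": num}
    return f"{CUSTOM_SEARCH_ENDPOINT}?{urlencode(params)}"
```

Fetching the resulting URL (for example with `urllib.request.urlopen`) should return JSON whose `items` list holds each result's `title`, `link`, and `snippet`.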

AI Provider Keys (Choose at least one)

# OpenAI
OPENAI_API_KEY=sk-...

# Anthropic
ANTHROPIC_API_KEY=sk-ant-...

# Azure OpenAI (Optional)
AZURE_OPENAI_ENDPOINT=https://your-endpoint.openai.azure.com/
AZURE_OPENAI_API_KEY=your-key
AZURE_OPENAI_DEPLOYMENT=your-deployment-name
AZURE_OPENAI_API_VERSION=2024-02-15-preview

Azure Container Instances (Optional - for cloud VM deployment)

AZURE_SUBSCRIPTION_ID=your-subscription-id
AZURE_RESOURCE_GROUP=your-resource-group
AZURE_TENANT_ID=your-tenant-id
AZURE_CLIENT_ID=your-client-id
AZURE_CLIENT_SECRET=your-client-secret
AZURE_CONTAINER_REGISTRY=your-registry.azurecr.io
AZURE_DESKTOP_IMAGE=your-registry.azurecr.io/ai-desktop:latest

Stripe (Optional - for billing)

STRIPE_API_KEY=sk_test_...
STRIPE_WEBHOOK_SECRET=whsec_...
NEXT_PUBLIC_STRIPE_PUBLISHABLE_KEY=pk_test_...
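Before starting the servers, it can help to confirm the variables above are set. This sketch reads the current process environment (it does not parse `.env` files itself), treats the Supabase, security, and Google Search variables as required, and accepts any one of the listed provider keys. Which file each variable belongs in (frontend `.env` or `backend/.env`) isn't spelled out here, so adjust the lists to your setup; `check_env` is a hypothetical helper.

```python
import os
from typing import List, Mapping

# Variable names taken from the sections above.
REQUIRED = [
    "NEXT_PUBLIC_SUPABASE_URL",
    "NEXT_PUBLIC_SUPABASE_ANON_KEY",
    "SUPABASE_SERVICE_ROLE",
    "ENCRYPTION_KEY",
    "CSRF_SECRET",
    "GOOGLE_SEARCH_KEY",
    "GOOGLE_SEARCH_CX",
]
PROVIDER_KEYS = ["OPENAI_API_KEY", "ANTHROPIC_API_KEY", "AZURE_OPENAI_API_KEY"]


def check_env(env: Mapping[str, str]) -> List[str]:
    """Return human-readable problems; an empty list means every checked variable is set."""
    problems = [f"missing {name}" for name in REQUIRED if not env.get(name)]
    if not any(env.get(name) for name in PROVIDER_KEYS):
        problems.append("set at least one of: " + ", ".join(PROVIDER_KEYS))
    return problems


if __name__ == "__main__":
    for problem in check_env(os.environ):
        print(problem)
```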

4. Install Dependencies

# Frontend
npm install

# Backend
cd backend
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
cd ..

5. Start Development Servers

Option A: Using Docker (Recommended)

# Start all services
docker-compose up --build

# Access the application
# Frontend: http://localhost:3000
# Backend: http://localhost:8001

Option B: Manual Start

# Terminal 1: Frontend
npm run dev

# Terminal 2: Backend
cd backend
source venv/bin/activate  # On Windows: venv\Scripts\activate
python main.py

# Terminal 3: AI Desktop (if needed)
docker-compose -f docker-compose.ai-desktop.yml up --build
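To confirm the services came up, you can probe their ports. The port numbers come from this README (frontend 3000, backend 8001, and the VM's agent WebSocket server on 8080 and VNC server on 5900); whether 8080 and 5900 are published to the host depends on the Compose files, so treat those two as assumptions. The check only tests that something accepts TCP connections on each port, and the helper names are hypothetical.

```python
import socket
from typing import Dict

# Ports mentioned in this README; 8080 and 5900 live inside the AI desktop container.
SERVICES = {"frontend": 3000, "backend": 8001, "agent websocket": 8080, "vnc": 5900}


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def check_services(host: str = "localhost", services: Dict[str, int] = SERVICES) -> Dict[str, bool]:
    return {name: is_listening(host, port) for name, port in services.items()}


if __name__ == "__main__":
    for name, ok in check_services().items():
        print(f"{name:16} {'up' if ok else 'not reachable'}")
```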

6. Create Your First Agent Session

  1. Open http://localhost:3000
  2. Sign up / Log in with Supabase Auth
  3. Start a new chat
  4. Try a command: "Search for the latest AI news and summarize the top 3 articles"
  5. Watch your AI agent work! 🎉

🎨 Features

Multi-Provider AI Support

Connect your own API keys and switch between providers mid-conversation:

  • ✅ OpenAI (GPT-4, GPT-4 Turbo, GPT-3.5)
  • ✅ Anthropic (Claude 3.5 Sonnet, Claude 3 Opus)
  • ✅ Google (Gemini Pro, Gemini 1.5)
  • ✅ Azure OpenAI (Enterprise deployments)
  • ✅ xAI (Grok models)
  • ✅ Mistral AI (Mistral Large, Mixtral)
  • ✅ Perplexity (Online models)
  • ✅ OpenRouter (Access to 100+ models)

Bring Your Own Keys (BYOK)

All API keys are encrypted and stored securely. You maintain full control over your AI costs and usage.

Real-Time Streaming

Watch your agents work in real time with:

  • 📊 Task progress indicators
  • 🛠️ Tool call visualization
  • 📸 Live screenshots from the VM
  • 💬 Streaming responses
  • 📋 Detailed execution logs

Advanced Task Planning

The AI automatically:

  1. Analyzes your request
  2. Breaks it down into subtasks
  3. Assigns each subtask to a specialized agent
  4. Executes the subtasks with full context
  5. Reports detailed results
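The planner's output format isn't documented here. Assuming, for illustration only, that it returns JSON with an ordered `subtasks` list in which each entry names an `agent` and an `instruction`, a validation step before execution could look like this sketch. The JSON shape, the agent labels, and `parse_plan` are all hypothetical.

```python
import json
from typing import Any, Dict, List

KNOWN_AGENTS = {"browser", "terminal", "desktop"}  # hypothetical labels for the three agent types


def parse_plan(raw: str) -> List[Dict[str, Any]]:
    """Parse a reply shaped like {"subtasks": [{"agent": ..., "instruction": ...}, ...]}."""
    data = json.loads(raw)  # raises json.JSONDecodeError, a ValueError subclass, on bad JSON
    steps = data.get("subtasks") if isinstance(data, dict) else None
    if not isinstance(steps, list) or not steps:
        raise ValueError("plan must contain a non-empty 'subtasks' list")
    for i, step in enumerate(steps):
        if not isinstance(step, dict):
            raise ValueError(f"subtask {i} is not an object")
        agent = step.get("agent")
        if not isinstance(agent, str) or agent not in KNOWN_AGENTS:
            raise ValueError(f"subtask {i} names an unknown agent: {agent!r}")
        if not str(step.get("instruction", "")).strip():
            raise ValueError(f"subtask {i} has an empty instruction")
    return steps
```

Rejecting a malformed plan up front gives the planner a chance to retry before any agent touches the VM.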

Secure VM Isolation

Each agent session runs in an isolated Docker container:

  • 🔒 Sandboxed execution environment
  • 🔄 Ephemeral containers (no data persistence)
  • 🌐 Network isolation options
  • 📊 Resource limits and monitoring
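This README doesn't show how session containers are launched. As a sketch of how per-session isolation and resource limits can be expressed with standard `docker run` flags (`--rm` for an ephemeral container, `--memory` and `--cpus` for limits, `--network=none` to disable networking), the function below assembles such a command. The image name, the container naming scheme, and the default limits are assumptions; the 2 GB default mirrors the per-session memory figure in the benchmarks section. Networking is left on by default because a browser agent needs it.

```python
import shlex
from typing import List, Optional


def docker_run_args(
    image: str,
    session_id: str,
    memory: str = "2g",
    cpus: float = 1.0,
    network: Optional[str] = None,
) -> List[str]:
    """Assemble a `docker run` command for one isolated, ephemeral agent session."""
    args = [
        "docker", "run",
        "--rm",  # delete the container and its writable layer when it stops
        "--detach",  # run in the background
        "--name", f"agent-session-{session_id}",  # hypothetical naming scheme
        f"--memory={memory}",  # hard memory limit
        f"--cpus={cpus}",  # CPU quota
    ]
    if network is not None:
        args.append(f"--network={network}")  # e.g. "none" to disable networking entirely
    args.append(image)
    return args


if __name__ == "__main__":
    print(shlex.join(docker_run_args("ai-desktop:latest", "demo")))
```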

📚 Use Cases

🔍 Research & Data Gathering

  • Web scraping and data extraction
  • Competitive analysis
  • Market research automation
  • Academic paper collection

🧪 Testing & QA

  • Automated UI testing
  • Cross-browser testing
  • E2E test generation
  • Regression testing

📝 Content Creation

  • Screenshot and documentation
  • Tutorial generation
  • Workflow recording
  • Demo creation

🔧 DevOps & Automation

  • Server configuration
  • Deployment automation
  • Log analysis
  • System monitoring

🛒 E-commerce Operations

  • Price monitoring
  • Product research
  • Order management
  • Inventory tracking

📊 Business Intelligence

  • Report generation
  • Dashboard monitoring
  • Data analysis workflows
  • KPI tracking

🛠️ Technology Stack

Frontend

  • Framework: Next.js 15 (App Router, React 19)
  • Language: TypeScript
  • Styling: Tailwind CSS 4
  • UI Components: Radix UI, shadcn/ui
  • State Management: Zustand
  • AI SDK: Vercel AI SDK
  • Database: Supabase (Auth + Postgres)
  • Payments: Stripe

Backend

  • Framework: FastAPI (Python 3.10+)
  • Async Runtime: asyncio, uvicorn
  • WebSocket: websockets library
  • AI Providers: openai, anthropic, google-generativeai
  • Search: Google Custom Search API
  • Caching: Redis (optional)
  • Image Processing: Pillow, ImageMagick

Infrastructure

  • Containerization: Docker, Docker Compose
  • VM Environment: Ubuntu 22.04 LTS + XFCE
  • Browser: Google Chrome (with remote debugging)
  • Automation: Selenium, Playwright, PyAutoGUI
  • Cloud: Azure Container Instances (optional)

🀝 Contributing

We love contributions! Here's how you can help:

🐛 Found a Bug?

Open an issue with:

  • Clear description of the bug
  • Steps to reproduce
  • Expected vs actual behavior
  • Screenshots or logs

💡 Have a Feature Idea?

  1. Check if it's already requested
  2. Open a new issue with the enhancement label
  3. Describe your use case and proposed solution

🔧 Want to Contribute Code?

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/amazing-feature
  3. Make your changes
  4. Write tests if applicable
  5. Commit: git commit -m 'Add amazing feature'
  6. Push: git push origin feature/amazing-feature
  7. Open a Pull Request

Please read our Contributing Guide for detailed guidelines.


📖 Documentation


🗺️ Roadmap

Q1 2026

  • Multi-VM orchestration (parallel agents)
  • Advanced workflow builder (visual programming)
  • Marketplace for custom agents
  • Windows and macOS VM support
  • Mobile app (iOS/Android)

Q2 2026

  • Plugin system for custom tools
  • Collaborative agent sessions
  • Advanced analytics dashboard
  • Enterprise SSO support
  • Self-hosted cloud deployment guides

Future

  • Voice control integration
  • Video understanding capabilities
  • Agent memory and learning
  • Multi-modal agent interactions
  • Community agent templates

Vote on features: Feature Requests


📊 Performance & Benchmarks

Metric                     Value
Average Task Completion    ~45 seconds
Concurrent Sessions        50+ (per server)
Browser Navigation         ~2 seconds per page
Tool Call Latency          <500 ms
VM Startup Time            ~15 seconds
Memory per Session         ~2 GB

Benchmarks measured on 4 CPU cores, 8 GB RAM, SSD storage.
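As a back-of-the-envelope check on these figures, memory alone bounds how many sessions fit on one host: divide the RAM available for sessions by the per-session footprint. This ignores CPU, disk, and host overhead (set `reserved_gb` to keep some RAM for the host), so treat the result as an upper bound; `max_sessions` is a hypothetical helper.

```python
def max_sessions(host_ram_gb: float, per_session_gb: float, reserved_gb: float = 0.0) -> int:
    """Upper bound on concurrent sessions if memory is the only constraint."""
    if per_session_gb <= 0:
        raise ValueError("per_session_gb must be positive")
    usable_gb = host_ram_gb - reserved_gb
    return max(0, int(usable_gb // per_session_gb))


if __name__ == "__main__":
    print(max_sessions(8, 2))     # 8 GB machine, ~2 GB per session -> 4
    print(max_sessions(8, 2, 1))  # keeping 1 GB for the host -> 3
    print(50 * 2)                 # GB of RAM for 50 sessions at ~2 GB each -> 100
```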


⚠️ Responsible AI Use

Open Computer Use gives AI agents significant autonomy. Please use responsibly:

  • ✅ Do: Automate repetitive tasks, research, testing, content creation
  • ❌ Don't: Violate terms of service, spam, scrape without permission
  • 🔒 Security: Never share credentials, use isolated environments
  • 📋 Compliance: Follow data protection laws (GDPR, CCPA, etc.)
  • 🤝 Ethics: Respect website robots.txt and rate limits

Read our Responsible Use Guidelines for more details.
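Python's standard library can help with respecting robots.txt and rate limits: `urllib.robotparser` evaluates robots.txt rules, including `Crawl-delay`, and a small limiter can space out requests. This is a generic sketch of polite fetching, not a feature this README attributes to Open Computer Use; `load_robots`, `MinIntervalLimiter`, and the user-agent string are hypothetical.

```python
import time
from typing import Optional
from urllib.robotparser import RobotFileParser


def load_robots(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content that has already been fetched."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


class MinIntervalLimiter:
    """Block so that consecutive calls to wait() are at least min_interval_s seconds apart."""

    def __init__(self, min_interval_s: float) -> None:
        self.min_interval_s = min_interval_s
        self._last: Optional[float] = None

    def wait(self) -> None:
        now = time.monotonic()
        if self._last is not None:
            delay = self._last + self.min_interval_s - now
            if delay > 0:
                time.sleep(delay)
        self._last = time.monotonic()


if __name__ == "__main__":
    rules = load_robots("User-agent: *\nDisallow: /private/\nCrawl-delay: 2\n")
    agent = "OpenComputerUseBot"  # hypothetical user-agent string
    print(rules.can_fetch(agent, "https://example.com/private/report"))  # False
    print(rules.can_fetch(agent, "https://example.com/news"))  # True
    limiter = MinIntervalLimiter(rules.crawl_delay(agent) or 1.0)  # call limiter.wait() before each request
```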


📄 License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

Apache License 2.0

Copyright (c) 2025 Open Computer Use Contributors

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

🙏 Acknowledgments

Built with amazing open-source projects:

Special thanks to all our contributors! 💙


🌟 Star History

Star History Chart


💬 Community & Support


⭐ Star us on GitHub if you find this useful!

Made with ❤️ by the Open Computer Use community

Star on GitHub • Join Discord