auto-use/auto_use

Auto Use

One Click. Millions of Possibilities.

NOTE: Download the latest setup from the release section and run the installer.

Features • Agents • Models • Requirements


Auto Use Demo

👇 Click here to watch full video demos

OS + Coding Demo • Coding Task Demo • OS Based Web Scraping


Benchmark Results


✨ Features

🕷️ Undetectable Web Scraping

Scrape websites that traditional CDP-based tools can't touch. Auto Use drives a real browser through pure vision and UI scanning, with no Chrome DevTools Protocol, no debugging ports, and no injected scripts. The browser runs exactly as it would under a human user, which makes the automation very hard to detect while keeping your security intact.

🔍 Human-Like Screen Perception

Auto Use sees your screen the way you do. It captures screenshots, maps the depth and layering of every window, and identifies which icons, folders, options, and text are visible, and how much of each is visible. This awareness lets the agent make precise, context-driven decisions about where to click, scroll, or type to complete your task.
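To make "how much of each is visible" concrete, here is a minimal sketch of one way to compute the visible fraction of a window from the rectangles of the windows stacked above it. It is an illustration only, not Auto Use's actual implementation; the `Rect` layout and the `visible_fraction` helper are hypothetical names introduced for this example.

```python
from typing import List, Tuple

# Hypothetical rectangle type: (left, top, right, bottom), right/bottom exclusive.
Rect = Tuple[int, int, int, int]


def visible_fraction(target: Rect, occluders: List[Rect]) -> float:
    """Return the share of `target` not covered by any rectangle in `occluders`.

    `occluders` are assumed to be the windows that sit above `target` in z-order.
    """
    left, top, right, bottom = target
    total = (right - left) * (bottom - top)
    if total <= 0:
        return 0.0

    # Clip every occluder to the target and drop the ones that do not overlap it.
    clipped = []
    for l, t, r, b in occluders:
        l, t, r, b = max(l, left), max(t, top), min(r, right), min(b, bottom)
        if l < r and t < b:
            clipped.append((l, t, r, b))

    # Coordinate compression: split the target into grid cells along every
    # occluder edge, so each cell is either fully covered or fully visible.
    xs = sorted({left, right, *(v for c in clipped for v in (c[0], c[2]))})
    ys = sorted({top, bottom, *(v for c in clipped for v in (c[1], c[3]))})

    covered = 0
    for x0, x1 in zip(xs, xs[1:]):
        for y0, y1 in zip(ys, ys[1:]):
            if any(l <= x0 and x1 <= r and t <= y0 and y1 <= b
                   for l, t, r, b in clipped):
                covered += (x1 - x0) * (y1 - y0)

    return (total - covered) / total


if __name__ == "__main__":
    window = (0, 0, 100, 100)
    # One window covering the right half, another covering the bottom half.
    print(visible_fraction(window, [(50, 0, 150, 100), (0, 50, 100, 150)]))
```

An agent could use a value like this to decide whether an element is safe to click directly or whether the window needs to be brought to the front first.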

🧠 Collaborative Multi-Agent Framework

Multiple specialized agents operate independently yet coordinate seamlessly when the task demands it, sharing context in real time. The framework intelligently decides which combination of agents can accomplish a task fastest: a GUI click here, a PowerShell command there, a quick web lookup in between, all orchestrated automatically.
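The coordination idea can be sketched as a small dispatcher that routes each step of a plan to a GUI, CLI, or web agent while all of them write to one shared context. This is a simplified illustration under assumed names (`SharedContext`, `AGENTS`, `run_plan`); it does not reflect Auto Use's internal API, and the agents here only record what they would do.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class SharedContext:
    """Hypothetical shared state that every agent can read and append to."""
    notes: List[str] = field(default_factory=list)


Agent = Callable[[str, SharedContext], None]


def gui_agent(step: str, ctx: SharedContext) -> None:
    ctx.notes.append(f"gui: {step}")   # stand-in for a click/type action


def cli_agent(step: str, ctx: SharedContext) -> None:
    ctx.notes.append(f"cli: {step}")   # stand-in for a PowerShell command


def web_agent(step: str, ctx: SharedContext) -> None:
    ctx.notes.append(f"web: {step}")   # stand-in for a web lookup


# Registry mapping a step kind to the agent that handles it.
AGENTS: Dict[str, Agent] = {"gui": gui_agent, "cli": cli_agent, "web": web_agent}


def run_plan(plan: List[Tuple[str, str]]) -> SharedContext:
    """Run each (kind, step) pair with the matching agent, sharing one context."""
    ctx = SharedContext()
    for kind, step in plan:
        AGENTS[kind](step, ctx)
    return ctx


if __name__ == "__main__":
    result = run_plan([
        ("gui", "open Chrome"),
        ("web", "look up the latest Python release"),
        ("cli", "python --version"),
    ])
    print(result.notes)
```

A real framework would also have to choose the `kind` for each step itself; here the plan is supplied by hand to keep the sketch short.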

📚 Adaptive Context Intelligence

Agents are environment-aware. They detect which application or workflow they're operating in and pull relevant efficiency guidelines on the fly. Inject your own expertise (app-specific shortcuts, internal processes, or operational playbooks) and the system absorbs it immediately, adjusting its behavior to complete tasks faster.

🔒 Sandboxed Execution

The CLI agent is confined to an isolated sandbox: all coding and shell tasks run strictly inside it and cannot touch critical system paths like C:\Windows. Your OS stays protected while the agent builds, tests, and executes code freely within its boundaries.
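One building block of such confinement is a check that every requested path stays under the sandbox root. The sketch below shows a lexical version of that check for Windows-style paths; it is an assumption about how such a guard could look, not Auto Use's actual mechanism. The function name `is_inside_sandbox` and the `C:\sandbox` root are hypothetical, and a production guard would also need to resolve symlinks and junctions, which this example does not do.

```python
import ntpath  # Windows path rules, usable on any platform


def is_inside_sandbox(sandbox_root: str, candidate: str) -> bool:
    """Return True if `candidate` stays under `sandbox_root` after normalization.

    Relative candidates are interpreted relative to the sandbox root.
    """
    root = ntpath.normcase(ntpath.normpath(sandbox_root))
    # join() keeps an absolute candidate as-is and anchors a relative one
    # at the root; normpath() then collapses any ".." segments.
    target = ntpath.normcase(ntpath.normpath(ntpath.join(sandbox_root, candidate)))
    try:
        return ntpath.commonpath([root, target]) == root
    except ValueError:
        # Raised when the two paths are on different drives.
        return False


if __name__ == "__main__":
    root = r"C:\sandbox"
    for path in (r"src\app.py", r"..\Windows\System32", r"C:\Windows", r"D:\data"):
        print(path, is_inside_sandbox(root, path))
```

Comparing whole path components with `commonpath` avoids the classic prefix bug where `C:\sandbox2` would pass a naive `startswith("C:\\sandbox")` test.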

💾 3-Stage Memory Management

A three-stage memory system lets agents carry context well beyond a single context window. Long-running, multi-step sessions stay on track without information loss: intelligent chunking, real-time context optimization, and priority-based compression all happen in the background without interrupting the task, so the agent always knows where it is and what's next.
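As a rough illustration of priority-based compression, the sketch below trims a session history to a token budget by keeping the highest-priority entries first and preferring recent entries among equals. Everything here is an assumption made for the example: the `Entry` type, the word-count token estimate, and the greedy selection are not taken from Auto Use's code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entry:
    text: str
    priority: int  # higher means more important to keep


def estimate_tokens(text: str) -> int:
    """Crude token estimate (assumption): one token per whitespace-separated word."""
    return len(text.split())


def compress(history: List[Entry], budget: int) -> List[Entry]:
    """Keep the most important entries that fit in `budget`, in original order."""
    # Rank by priority (descending), then by recency (later entries first).
    ranked = sorted(range(len(history)),
                    key=lambda i: (-history[i].priority, -i))
    kept = set()
    used = 0
    for i in ranked:
        cost = estimate_tokens(history[i].text)
        if used + cost <= budget:
            kept.add(i)
            used += cost
    # Restore chronological order so the agent still reads a coherent timeline.
    return [history[i] for i in sorted(kept)]


if __name__ == "__main__":
    history = [
        Entry("open chrome and go to youtube", 1),
        Entry("user goal: search python tutorials", 3),
        Entry("clicked search box", 1),
        Entry("typed query", 2),
    ]
    for entry in compress(history, budget=10):
        print(entry.text)
```

With a budget of 10, the oldest low-priority step is dropped while the user's goal and the most recent actions survive.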

⚡ Kernel-Level Interaction

The GUI agent interfaces at the OS kernel level using low-level input drivers, enabling it to operate smoothly even in restricted scenarios like User Account Control (UAC) dialogs and elevated prompts that block conventional automation tools.

🎛️ Multi-Provider Support

Choose from 20+ AI models across OpenRouter, Groq, OpenAI, and Anthropic. Switch providers based on speed, cost, or capability needs.


🤖 What You Can Ask

Just tell Auto Use what you need, and it figures out the rest.

🖥️ Desktop Automation

"Open Chrome, go to YouTube, and search for Python tutorials"

Interacts with any Windows application through vision: it clicks, types, scrolls, navigates menus, and verifies every step before moving on.

💻 Terminal & System Tasks

"Check disk space and clean up temp files"

Executes PowerShell commands, navigates file systems, manages processes, and handles system operations, all inside a secure sandbox.

👨‍💻 Code Generation & Editing

"Create a Python Flask API with user authentication"

Writes new files, edits existing code with precision, debugs errors, runs tests, and cleans up, all without leaving the sandbox.

🌐 Real-Time Web Lookup

"Find the latest NVIDIA stock price and quarterly revenue"

Searches multiple sources, extracts and summarizes data in real time, and feeds findings directly into the ongoing task.


🎯 What Can Auto Use Do?

| Category | Examples |
|---|---|
| Browser | Fill forms, extract data, navigate sites, download files |
| Productivity | Create documents, manage spreadsheets, organize files |
| Development | Write code, debug errors, run tests, manage git |
| System | Install software, configure settings, manage processes |
| Research | Search web, compile information, generate reports |

🧠 Supported Models

Auto Use supports 20+ vision-language models across 4 providers.

OpenRouter

Access multiple AI providers through a single API.

| Model | API Name / Short Name | Reasoning |
|---|---|---|
| Gemini 2.5 Pro | google/gemini-2.5-pro | ✅ |
| Gemini 2.5 Flash | google/gemini-2.5-flash | ✅ |
| Gemini 2.5 Flash Lite | google/gemini-2.5-flash-lite | ✅ |
| Gemini 3 Pro Preview | google/gemini-3-pro-preview | ✅ |
| Gemini 3 Flash Preview | google/gemini-3-flash-preview | ✅ |
| Gemini 3.1 Pro | google/gemini-3.1-pro | ✅ |
| GPT-5.1 | openai/gpt-5.1 | ✅ |
| GPT-5.2 | openai/gpt-5.2 | ✅ |
| GPT-5 Pro | openai/gpt-5-pro | ❌ |
| Claude Sonnet 4.5 | anthropic/claude-sonnet-4.5 | ✅ |
| Claude Sonnet 4.6 | anthropic/claude-sonnet-4.6 | ✅ |
| Grok 4 Fast | x-ai/grok-4-fast | ✅ |
| Grok 4.1 Fast | x-ai/grok-4.1-fast | ✅ |
| Kimi K2.5 | moonshotai/kimi-k2.5 | ✅ |

🔗 Get API Key: openrouter.ai/keys


Groq

Ultra-fast inference with open-source models.

| Model | API Name / Short Name | Vision | Notes |
|---|---|---|---|
| GPT-OSS 120B | openai/gpt-oss-120b | - | Coding agent only |
| Llama 4 Scout 17B | meta-llama/llama-4-scout-17b-16e-instruct | ✅ | |

🔗 Get API Key: console.groq.com/keys


OpenAI Direct

Direct access to OpenAI's latest models.

| Model | API Name | Reasoning |
|---|---|---|
| GPT-5.1 | gpt-5.1 | ✅ |
| GPT-5.2 | gpt-5.2 | ✅ |

🔗 Get API Key: platform.openai.com/api-keys


Anthropic Direct

Native access to Anthropic's Claude models.

| Model | Model Name | API Name |
|---|---|---|
| Claude Sonnet 4.6 | claude-sonnet-4.6 | claude-sonnet-4-6 |
| Claude Sonnet 4.5 | claude-sonnet-4.5 | claude-sonnet-4-5-20250929 |
| Claude Haiku 4.5 | claude-haiku-4.5 | claude-haiku-4-5-20251001 |
| Claude Opus 4.5 | claude-opus-4.5 | claude-opus-4-5-20251101 |

🔗 Get API Key: console.anthropic.com


🎮 Model Selection Guide

| Use Case | Recommended Model | Why |
|---|---|---|
| Fast & Cheap | gemini-3-flash | Great balance of speed and capability |
| Most Capable | claude-sonnet-4.5 / claude-sonnet-4.6 / gemini-3.1-pro | Best reasoning for complex tasks |
| Ultra-Fast | llama-4-scout (Groq) | Lowest latency |
| Coding agent | gpt-oss-120b (Groq) | Coding agent only |
| Best Vision | claude-sonnet-4.5 / claude-sonnet-4.6 (Anthropic) | Excellent UI understanding |
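The guide above can be expressed as a small lookup that returns a provider and API name for a given use case. This is only a sketch: the use-case keys and the `pick_model` helper are hypothetical, each row is reduced to a single pick even where the guide lists alternatives, and mapping the guide's `gemini-3-flash` to OpenRouter's `google/gemini-3-flash-preview` is an assumption based on the model tables in this README.

```python
from typing import Dict, Tuple

# (provider, API name) pairs copied from the model tables in this README.
# The dictionary keys are hypothetical labels chosen for this sketch.
RECOMMENDED: Dict[str, Tuple[str, str]] = {
    # Assumption: "gemini-3-flash" refers to the OpenRouter preview model.
    "fast_cheap": ("openrouter", "google/gemini-3-flash-preview"),
    "most_capable": ("anthropic", "claude-sonnet-4-5-20250929"),
    "ultra_fast": ("groq", "meta-llama/llama-4-scout-17b-16e-instruct"),
    "coding_agent": ("groq", "openai/gpt-oss-120b"),
    "best_vision": ("anthropic", "claude-sonnet-4-6"),
}


def pick_model(use_case: str) -> Tuple[str, str]:
    """Return (provider, api_name) for a use case, or raise ValueError."""
    try:
        return RECOMMENDED[use_case]
    except KeyError:
        options = ", ".join(sorted(RECOMMENDED))
        raise ValueError(
            f"unknown use case {use_case!r}; expected one of: {options}"
        ) from None


if __name__ == "__main__":
    provider, api_name = pick_model("ultra_fast")
    print(provider, api_name)
```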

📋 Requirements

  • Windows 10/11 (64-bit)
  • API Key from any supported provider

🛡️ Safety

  • Sandbox Isolation: Code runs in a protected environment
  • No System Modification: Won't delete files or run destructive commands without permission
  • UAC Awareness: Asks for confirmation before accepting elevation prompts
  • Path Protection: Blocks access to critical system folders

🌟 Why Auto Use?

| Feature | Auto Use | Others |
|---|---|---|
| Multi-agent system | ✅ | ❌ |
| Knowledge system | ✅ | ❌ |
| 20+ model support | ✅ | Limited |
| Vision-based automation | ✅ | ✅ |
| Coding agent | ✅ | ❌ |
| Web search integration | ✅ | ❌ |
| Secure sandbox | ✅ | ❌ |

💻 OS Support

| Operating System | Status |
|---|---|
| Windows | ✅ Supported |
| macOS | 🚧 Coming Soon |
| Linux | 🚧 Coming Soon |

About

One framework. One click. Unlimited automation.
