A robust framework enabling autonomous Android and computer control using any LLM (local or remote)
- Multi-platform support: Android devices and macOS computers
- Multiple LLM providers: OpenAI, Anthropic Claude, Google Gemini, and local Ollama models
- Flexible interfaces: CLI, API, and web-based Gradio interface
- Visual automation: Screenshot-based element detection and interaction
- Configurable execution: Customizable timeouts, delays, and coordinate settings
- Quick Start
- Prerequisites
- Installation
- Usage
- Configuration
- Model Recommendations
- Examples
- Troubleshooting
- Contributing
- License
- Install the package:
pip install git+https://github.com/BandarLabs/clickclickclick.git
- Set up API keys (choose one):
export OPENAI_API_KEY="your-openai-key"
# OR
export ANTHROPIC_API_KEY="your-anthropic-key"
# OR
export GEMINI_API_KEY="your-gemini-key"
- Run a simple task:
click3 run "open Gmail and check for new messages"
- ADB (Android Debug Bridge): Install Android SDK Platform Tools
- USB Debugging: Enable on your Android device
- USB Connection: Connect device to computer
- Python 3.11+: Required for all functionality
- Accessibility Permissions: Grant to Terminal/IDE when prompted
- Python 3.11 or higher
- 4GB+ RAM recommended
- Internet connection for cloud LLM providers
pip install git+https://github.com/BandarLabs/clickclickclick.git
git clone https://github.com/BandarLabs/clickclickclick
cd clickclickclick
pip install -e .
click3 --help
Launch the interactive web interface:
click3 gradio
Features:
- Visual task input and monitoring
- Real-time screenshot feedback
- Model selection and configuration
- Task history and logs
Basic Usage:
click3 run "your task description"
Advanced Options:
click3 run "open calculator and compute 25 * 47" \
--platform=android \
--planner-model=openai \
--finder-model=gemini
Available Options:
- --platform: Target platform (android or osx)
- --planner-model: Planning LLM (openai, anthropic, gemini, ollama)
- --finder-model: Element detection LLM (openai, anthropic, gemini, ollama)
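To run several tasks in a row, the CLI can be driven from a small Python script. The sketch below is illustrative only: `build_click3_command` and `run_tasks` are hypothetical helper names, the default option values are assumptions, and it is not confirmed here that `click3` reports task failure through its exit code.

```python
import shlex
import subprocess

# Values taken from the option list above; anything else is rejected early.
PLATFORMS = {"android", "osx"}
MODELS = {"openai", "anthropic", "gemini", "ollama"}


def build_click3_command(task, platform="android", planner="openai", finder="gemini"):
    """Return the argv list for one `click3 run` invocation (defaults are assumptions)."""
    if platform not in PLATFORMS:
        raise ValueError(f"unknown platform: {platform!r}")
    for name in (planner, finder):
        if name not in MODELS:
            raise ValueError(f"unknown model provider: {name!r}")
    return [
        "click3", "run", task,
        f"--platform={platform}",
        f"--planner-model={planner}",
        f"--finder-model={finder}",
    ]


def run_tasks(tasks, **options):
    """Run tasks one after another; return a list of (task, exit_code) pairs."""
    results = []
    for task in tasks:
        cmd = build_click3_command(task, **options)
        print("running:", shlex.join(cmd))
        # The meaning of the exit code is an assumption; inspect the logs as well.
        results.append((task, subprocess.run(cmd).returncode))
    return results
```

For example, `run_tasks(["open calculator", "open the weather app"], platform="android")` would invoke `click3 run` twice with the same platform and model settings.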
from clickclickclick.config import get_config
from clickclickclick.planner.task import execute_task
from clickclickclick.utils import get_executor, get_planner, get_finder
# Configure execution
config = get_config("android", "openai", "gemini")
executor = get_executor("android")
planner = get_planner("openai", config, executor)
finder = get_finder("gemini", config, executor)
# Execute task
success = execute_task(
    "open the weather app",
    executor, planner, finder, config
)
Start the API server:
uvicorn api:app --host 0.0.0.0 --port 8000
Execute tasks via HTTP:
curl -X POST "http://localhost:8000/execute" \
-H "Content-Type: application/json" \
  -d '{
    "task_prompt": "open calculator",
    "platform": "android",
    "planner_model": "openai",
    "finder_model": "gemini"
  }'
Response:
{"result": true}
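The same request can be sent from Python using only the standard library. This is a minimal sketch that mirrors the curl call above; the helper names, the default values, and the 300-second timeout are assumptions, and the only response field it relies on is the `result` key shown in the example response.

```python
import json
import urllib.request


def build_execute_request(task_prompt, platform="android", planner_model="openai",
                          finder_model="gemini", base_url="http://localhost:8000"):
    """Build (but do not send) a POST request for the /execute endpoint."""
    payload = {
        "task_prompt": task_prompt,
        "platform": platform,
        "planner_model": planner_model,
        "finder_model": finder_model,
    }
    return urllib.request.Request(
        f"{base_url}/execute",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def execute_task_over_http(task_prompt, timeout=300, **kwargs):
    """Send the request and return True if the server reports success."""
    request = build_execute_request(task_prompt, **kwargs)
    # Tasks can take a while, so the timeout is generous (an assumed value).
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return bool(body.get("result"))
```

Splitting request construction from sending keeps the payload easy to inspect or log before any task is actually started.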
Configuration is managed through config/models.yaml. Key settings include:
openai:
  api_key: !ENV OPENAI_API_KEY
  model_name: gpt-4o-mini
  image_width: 512
  image_height: 512
gemini:
  api_key: !ENV GEMINI_API_KEY
  model_name: gemini-1.5-flash
  image_width: 768
  image_height: 768
executor:
  android:
    screen_center_x: 500
    screen_center_y: 1000
    scroll_distance: 1000
    swipe_distance: 600
    long_press_duration: 1000
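To give a feel for what the Android executor values could mean in practice, the sketch below turns them into `adb shell input swipe` commands. How the framework itself uses these settings is not documented here, so the geometry (a vertical swipe centered on the screen center, and a long press expressed as a zero-length swipe) is an assumption, and both function names are hypothetical.

```python
def scroll_down_command(center_x=500, center_y=1000, distance=1000, duration_ms=300):
    """Build an adb swipe that drags upward by `distance` pixels around the center.

    Assumption: scrolling down means swiping from below the center to above it.
    """
    half = distance // 2
    start_y = center_y + half
    end_y = max(center_y - half, 0)  # keep the end point on screen
    return ["adb", "shell", "input", "swipe",
            str(center_x), str(start_y), str(center_x), str(end_y), str(duration_ms)]


def long_press_command(x, y, duration_ms=1000):
    """Build an adb long press: a swipe whose start and end points are identical."""
    return ["adb", "shell", "input", "swipe",
            str(x), str(y), str(x), str(y), str(duration_ms)]
```

With the values from the configuration above, the scroll command swipes from y=1500 to y=500 at x=500. If scrolling overshoots or undershoots on a particular device, `scroll_distance` and the center coordinates are the settings to adjust.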
Required API keys (set one or more):
- OPENAI_API_KEY: OpenAI GPT models
- ANTHROPIC_API_KEY: Anthropic Claude models
- GEMINI_API_KEY: Google Gemini models
- OLLAMA_MODEL_NAME: Local Ollama model name
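Before starting a task, it can help to check which providers are actually configured. The following sketch only inspects the environment variables listed above; the function name and the mapping constant are hypothetical and not part of the package.

```python
import os

# Provider name -> environment variable, as listed above.
PROVIDER_ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "ollama": "OLLAMA_MODEL_NAME",
}


def configured_providers(env=None):
    """Return the providers whose variable is set to a non-blank value."""
    env = os.environ if env is None else env
    return [
        provider
        for provider, variable in PROVIDER_ENV_VARS.items()
        if env.get(variable, "").strip()
    ]


if __name__ == "__main__":
    found = configured_providers()
    print("configured providers:", ", ".join(found) if found else "none")
```

A provider that is missing from the output cannot be used as `--planner-model` or `--finder-model` until its variable is exported.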
Based on performance testing:
| Use Case | Recommended Setup | Performance |
|---|---|---|
| Best Overall | Planner: GPT-4o, Finder: Gemini Flash | ⭐⭐⭐⭐⭐ |
| Cost Effective | Planner: GPT-4o-mini, Finder: Gemini Flash | ⭐⭐⭐⭐ |
| Privacy Focused | Planner: Ollama, Finder: Ollama | ⭐⭐⭐ |
| Speed Optimized | Planner: Gemini Flash, Finder: Gemini Flash | ⭐⭐⭐⭐ |
Notes:
- Gemini Flash offers 15 free API calls daily
- GPT-4o provides the most reliable planning
- Ollama enables fully offline operation
- Anthropic Claude offers balanced performance
Gmail Task:
click3 run "create a draft email to someone@gmail.com asking about lunch plans for Saturday at 1PM"
Navigation:
click3 run "open Google Maps and find bus stops in Alanson, MI"
Gaming:
click3 run "start a 3+2 chess game on lichess"
Web Browsing:
click3 run "open Safari, go to news.ycombinator.com and read the top story" --platform=osx
System Tasks:
click3 run "open System Preferences and check the current display resolution" --platform=osx
ADB Connection Problems:
# Check device connection
adb devices
# Restart ADB server
adb kill-server
adb start-server
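The output of `adb devices` can also be checked from a script, which is handy for failing fast before a task starts. This is a small sketch with hypothetical function names; it relies only on the standard `serial<TAB>state` output format, where a usable device reports the state `device` and a device awaiting the USB debugging prompt reports `unauthorized`.

```python
import subprocess


def parse_adb_devices(output):
    """Map each device serial to its state from `adb devices` output."""
    devices = {}
    for line in output.splitlines():
        line = line.strip()
        # Skip blanks, the header line, and "* daemon ..." status messages.
        if not line or line.startswith("List of devices") or line.startswith("*"):
            continue
        parts = line.split()
        if len(parts) >= 2:
            devices[parts[0]] = parts[1]
    return devices


def ready_devices():
    """Return serials of devices that are connected and authorized."""
    result = subprocess.run(
        ["adb", "devices"], capture_output=True, text=True, check=True
    )
    return [
        serial
        for serial, state in parse_adb_devices(result.stdout).items()
        if state == "device"
    ]
```

An empty list from `ready_devices()` usually points to a cable, driver, or USB debugging authorization problem rather than an issue with the framework.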
API Key Issues:
# Verify environment variables
echo $OPENAI_API_KEY
echo $GEMINI_API_KEY
# Set keys temporarily
export OPENAI_API_KEY="your-key-here"
Permission Errors (macOS):
- Grant Accessibility permissions in System Settings > Privacy & Security > Accessibility (System Preferences > Security & Privacy on macOS 12 and earlier)
- Allow Terminal or your IDE to control other applications
Model-Specific Issues:
- Ollama: Ensure the model is downloaded (ollama pull llama3.2-vision)
- Gemini: Check API quota at Google AI Studio
- OpenAI: Verify billing and usage limits
Enable detailed logging:
import logging
logging.basicConfig(level=logging.DEBUG)
- Reduce image resolution in config/models.yaml
- Increase TASK_DELAY for slower devices
- Use smaller models for faster response times
We welcome contributions! Please:
- Open an issue to discuss your idea
- Fork the repository
- Create a feature branch
- Make your changes with tests
- Submit a pull request
git clone https://github.com/BandarLabs/clickclickclick
cd clickclickclick
pip install -e ".[test]"
pytest
- iOS support via WebDriverAgent
- Windows support with Win32 APIs
- Voice command integration
- Multi-device orchestration
- Enhanced error recovery
- Plugin system for custom actions
This project is licensed under the MIT License. See the LICENSE file for details.
- 📖 Documentation: Check the examples and configuration sections
- 🐛 Bug Reports: Create an issue
- 💬 Discussions: GitHub Discussions
- ⭐ Star the repo if you find it useful!