A desktop application that uses Gemini 3 Flash to see your screen and perform actions based on your instructions.
- Python 3.10+
- A Google Gemini API Key
- Install dependencies: `pip install -r requirements.txt`
- Set your Google API key: create a `.env` file in the root directory and add `GOOGLE_API_KEY=your_api_key_here`.
- Run the application: `python main.py`

Enter an instruction (e.g., "Find the latest news on Google") and click **Run Agent**.
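The `.env` step exists to put `GOOGLE_API_KEY` into the process environment before the Gemini client is created. The project most likely does this with a library such as `python-dotenv`. The sketch below is a hypothetical stdlib-only equivalent: `load_env_file` and `require_api_key` are illustrative names, not part of this project, and the parser only handles simple `KEY=value` lines.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env", override: bool = False) -> dict[str, str]:
    """Hypothetical minimal .env loader: copies KEY=value lines into os.environ.

    Blank lines, comment lines starting with '#', and lines without '=' are
    skipped. Quotes around the value are stripped. Variables that already
    exist in the environment are kept unless override=True.
    """
    loaded: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.is_file():
        return loaded  # no .env file: nothing to load
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        value = value.strip().strip('"').strip("'")
        if override or key not in os.environ:
            os.environ[key] = value
        loaded[key] = value  # report what the file contained
    return loaded


def require_api_key() -> str:
    """Load .env from the working directory and fail early if the key is missing."""
    load_env_file()
    key = os.environ.get("GOOGLE_API_KEY")
    if not key:
        raise RuntimeError("GOOGLE_API_KEY is not set; add it to .env in the project root.")
    return key
```

Failing early in `require_api_key` gives a readable error at startup instead of an authentication failure on the first request to Gemini.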
- Vision: Uses Gemini 3 Flash to interpret screenshots.
- Automation: Can click, double-click, type, scroll, and drag.
- Safety: the `pyautogui` fail-safe is enabled. Move your mouse to any corner of the screen to stop the agent.
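This README does not document how Gemini's reply is turned into mouse and keyboard input. As a hypothetical sketch, suppose the model is asked to answer with one JSON object such as `{"action": "click", "x": 640, "y": 360}`. The code below parses such a reply and maps each action from the feature list onto real `pyautogui` functions (`click`, `doubleClick`, `write`, `scroll`, `moveTo`, `dragTo`). The JSON schema and the names `parse_action`, `execute_action`, and `run_step` are assumptions for illustration. The GUI backend is passed in as a parameter so the logic can be exercised with a stand-in object. With the real library, `pyautogui.FAILSAFE = True` (its default) makes `pyautogui` calls raise `pyautogui.FailSafeException` while the mouse is in a screen corner, which is what makes the corner-to-stop behavior work.

```python
import json
import re
from typing import Any

# Hypothetical action schema: the model is assumed to reply with one JSON object.
SUPPORTED_ACTIONS = {"click", "double_click", "type", "scroll", "drag", "done"}


def parse_action(reply_text: str) -> dict[str, Any]:
    """Take the text from the first '{' to the last '}' (tolerates ```json fences)."""
    match = re.search(r"\{.*\}", reply_text, re.DOTALL)
    if match is None:
        raise ValueError(f"No JSON object found in model reply: {reply_text!r}")
    action = json.loads(match.group(0))
    if action.get("action") not in SUPPORTED_ACTIONS:
        raise ValueError(f"Unsupported action: {action.get('action')!r}")
    return action


def execute_action(action: dict[str, Any], gui: Any) -> bool:
    """Perform one action through a pyautogui-like object.

    Returns False when the model signals it is finished, True otherwise.
    """
    kind = action["action"]
    if kind == "click":
        gui.click(action["x"], action["y"])
    elif kind == "double_click":
        gui.doubleClick(action["x"], action["y"])
    elif kind == "type":
        gui.write(action["text"], interval=0.02)  # small delay between keystrokes
    elif kind == "scroll":
        gui.scroll(action["amount"])  # positive scrolls up, negative scrolls down
    elif kind == "drag":
        gui.moveTo(action["x"], action["y"])
        gui.dragTo(action["to_x"], action["to_y"], duration=0.5)
    elif kind == "done":
        return False
    return True


def run_step(reply_text: str) -> bool:
    """Hypothetical single agent step against the real pyautogui module."""
    import pyautogui  # imported lazily; needs a desktop session

    pyautogui.FAILSAFE = True  # already the default; a corner position aborts calls
    try:
        return execute_action(parse_action(reply_text), pyautogui)
    except pyautogui.FailSafeException:
        print("Fail-safe triggered: mouse moved to a screen corner, stopping.")
        return False
```

Keeping parsing separate from execution means a malformed model reply raises `ValueError` before any input is sent to the desktop.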
This agent has full control over your computer. Use it with caution and never leave it unattended while it is running.