Gemini Mac Pilot

Voice-controlled macOS agent powered by Gemini. Control your entire Mac just by talking — 24 tools across native apps, browser, and Google Workspace.

What It Does

Speak naturally and Mac Pilot executes actions on your Mac:

"Read my last 3 emails" — reads Gmail via Workspace API, summarizes
"Search for flights to London" — opens Google in your Chrome, searches
"Open WhatsApp and message Daniel" — opens app, finds contact, types message
"Organize my desktop by file type" — creates folders, moves files
"What's on my calendar this week?" — reads Google Calendar
"Create a Google Doc with meeting notes" — creates doc, returns URL
"Check my LinkedIn messages" — reads your real Chrome session via CDP

Architecture

                       USER'S MAC

  +--------------------------------------------+
  |       Floating UI Overlay (PyWebView)      |
  |  Mic waveform + status                     |
  |  Action steps with timing                  |
  |  Markdown result + stats                   |
  +---------------------+----------------------+
                        | WebSocket
              +---------+---------+
              |  Python Backend   |
              |                   |
              |  Gemini Live  <-- Voice I/O (bidirectional audio)
              |       |          |
              |  execute_task    |
              |       v          |
              |  Gemini Flash -- Brain (native function calling)
              |       |          |
              |  24 Tools        |
              |  - Accessibility | macOS AX API (any native app)
              |  - Keyboard      | type_text, press_keys
              |  - Browser (CDP) | Chrome via DevTools Protocol
              |  - Shell         | system commands
              |  - Workspace     | Gmail, Calendar, Drive, Docs
              +------------------+

Voice layer: Gemini Live API (native audio) handles bidirectional speech. When the user asks to do something, it calls execute_task.

Brain layer: Gemini 3 Flash Preview with native function calling. Reads the macOS accessibility tree, decides what tools to call, and executes multi-step workflows autonomously. Supports parallel function calls.

Tools (24 total):

Native macOS (8): open_app, find_app, click, set_value, focus, type_text, press_keys, shell
Browser (8): browse, read_page, get_links, click_text, browser_click, browser_type, search, chrome_js
Google Workspace (8): gmail_read, gmail_read_message, gmail_send, calendar_read, calendar_create, drive_list, drive_upload, docs_create

Tech Stack

Gemini Live API — native audio, bidirectional voice
Gemini 3 Flash Preview — native function calling, decision-making
Vertex AI — GCP-managed API access (billed to your project credits)
macOS Accessibility API — read and control any native app
Chrome DevTools Protocol — control user's real Chrome (Chrome 146+)
Google Workspace CLI — Gmail, Calendar, Drive, Docs without browser
PyWebView — lightweight native overlay window
WebSockets — real-time UI updates

Setup

1. Install dependencies

git clone https://github.com/cloudstudio/gemini-mac-pilot.git
cd gemini-mac-pilot

chmod +x setup.sh && ./setup.sh
playwright install chromium

2. Google Cloud (required)

# Install gcloud CLI
brew install google-cloud-sdk

# Authenticate
gcloud auth application-default login

# Configure project
cp .env.example .env
# Edit .env and set GCP_PROJECT to your project ID

Your GCP project needs the Vertex AI API enabled. New accounts get $300 free credits for 90 days.

3. Google Workspace (optional, for Gmail/Calendar/Drive)

brew install googleworkspace-cli
gws auth login

4. Chrome Remote Debugging (optional, for real browser sessions)

Open Chrome and go to chrome://inspect/#remote-debugging → enable the toggle. This lets Mac Pilot use your real Chrome with all your sessions/cookies instead of a standalone Chromium.

5. Accessibility Permissions

Go to System Settings > Privacy & Security > Accessibility and enable your terminal app.

Usage

# Voice + UI (full experience)
python main.py

# CLI mode (text input only, no voice)
python main.py cli

Cloud Deployment

Deploy the brain to Google Cloud Run:

chmod +x deploy.sh && ./deploy.sh

Requirements

macOS 13+
Python 3.11+
Google Cloud project with Vertex AI API enabled
gcloud CLI installed and authenticated
Accessibility permissions enabled
Google Chrome 146+ (for CDP browser control)
PortAudio (brew install portaudio)

Project Structure

gemini-mac-pilot/
├── mac_pilot/
│   ├── brain.py          # Gemini Flash brain loop
│   ├── voice.py          # Gemini Live API voice I/O
│   ├── prompts.py        # System prompts
│   ├── events.py         # Event bus (brain/voice → UI)
│   ├── config.py         # GCP project, model names
│   ├── tools/
│   │   ├── accessibility.py  # macOS AX API
│   │   ├── keyboard.py       # type_text, press_keys
│   │   ├── apps.py           # open_app, find_app
│   │   ├── browser.py        # Chrome CDP + Playwright
│   │   ├── shell.py          # shell commands
│   │   ├── workspace.py      # Gmail, Calendar, Drive, Docs
│   │   └── schema.py         # 24 tool declarations
│   └── ui/
│       ├── app.py            # PyWebView overlay
│       ├── server.py         # WebSocket server
│       └── static/           # HTML/CSS/JS (Google-style bar)
├── main.py                   # Entry point
├── cloud_api.py              # Cloud Run REST API
├── Dockerfile                # Cloud deployment
├── deploy.sh                 # One-command deploy
├── requirements.txt
└── setup.sh

Troubleshooting

"Not authorized" or accessibility errors: Enable your terminal in System Settings > Privacy & Security > Accessibility.

PortAudio errors: brew install portaudio, then re-run pip install pyaudio.

Chrome CDP not connecting: Go to chrome://inspect/#remote-debugging and enable the toggle. Click "Permitir" on the popup.

Workspace tools fail: Make sure gws is installed (brew install googleworkspace-cli) and authenticated (gws auth login).

GCP errors: Run gcloud auth application-default login and ensure Vertex AI API is enabled on your project.

Security

Mac Pilot has full access to your system via shell, accessibility API, and browser.
Commands are filtered for dangerous patterns but this is not a security sandbox.
Do not use with untrusted AI models or in production environments without review.

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
mac_pilot		mac_pilot
.dockerignore		.dockerignore
.env.example		.env.example
.gitignore		.gitignore
Dockerfile		Dockerfile
README.md		README.md
architecture.png		architecture.png
cloud_api.py		cloud_api.py
deploy.sh		deploy.sh
icon.icns		icon.icns
icon.png		icon.png
icon.svg		icon.svg
main.py		main.py
requirements-cloud.txt		requirements-cloud.txt
requirements.txt		requirements.txt
setup.sh		setup.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Gemini Mac Pilot

What It Does

Architecture

Tech Stack

Setup

1. Install dependencies

2. Google Cloud (required)

3. Google Workspace (optional, for Gmail/Calendar/Drive)

4. Chrome Remote Debugging (optional, for real browser sessions)

5. Accessibility Permissions

Usage

Cloud Deployment

Requirements

Project Structure

Troubleshooting

Security

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Languages

Folders and files

Latest commit

History

Repository files navigation

Gemini Mac Pilot

What It Does

Architecture

Tech Stack

Setup

1. Install dependencies

2. Google Cloud (required)

3. Google Workspace (optional, for Gmail/Calendar/Drive)

4. Chrome Remote Debugging (optional, for real browser sessions)

5. Accessibility Permissions

Usage

Cloud Deployment

Requirements

Project Structure

Troubleshooting

Security

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 0

Languages

Packages

Contributors