Commander

Commander is a conversational AI agent that handles the routine tasks every enterprise employee deals with: dictating documents, drafting emails and prompts, taking and summarizing meeting notes, and planning multistep workflows for complex tasks. By combining API calls, desktop automation, and built-in filesystem tools, it can turn one-off tasks into repeatable systems, freeing users to focus on the core problems of their jobs.

Commander is a background voice agent for Windows. Double-press a function key to activate one of three modes — dictation, meeting recorder, or computer control — without leaving whatever you are doing.

Modes

Mode	Hotkey	What it does
Dictation	double F12	Types transcribed speech at the cursor in the foreground window. Works in any app with a focused text field (browsers, editors, chat apps).
Meeting	double F11	Records a continuous timestamped transcript of a meeting to `~/meeting_YYYY-MM-DD_HH-MM.txt`.
Command	double F10	Desktop automation via voice (in development).

Pressing the hotkey a second time while a mode is active deactivates it. Switching directly from one mode to another is supported — the current mode stops first.
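
The toggle and switch behaviour described above can be sketched as a small state machine. This is an illustration only, not the code in tray_app.py: the `AppMode` values match the Architecture section, while `ModeController` and its `events` log are hypothetical names introduced for the example.

```python
from enum import Enum, auto


class AppMode(Enum):
    IDLE = auto()
    DICTATION = auto()
    MEETING = auto()
    COMMAND = auto()


class ModeController:
    """Hypothetical helper that tracks the active mode."""

    def __init__(self):
        self.mode = AppMode.IDLE
        self.events = []  # transition log, kept only for illustration

    def on_hotkey(self, requested: AppMode) -> AppMode:
        # Any active mode is stopped first, whether toggling off or switching.
        if self.mode is not AppMode.IDLE:
            self.events.append(("stop", self.mode))
        if self.mode is requested:
            # Same hotkey again: deactivate.
            self.mode = AppMode.IDLE
        else:
            # Different mode (or idle): start the requested one.
            self.events.append(("start", requested))
            self.mode = requested
        return self.mode
```

Calling `on_hotkey(AppMode.DICTATION)` and then `on_hotkey(AppMode.MEETING)` records a stop for Dictation before the start for Meeting, which mirrors the "current mode stops first" rule.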

Quick Start

1. Install dependencies

pip install -r requirements.txt

2. Set your Speechmatics API key

setx SPEECHMATICS_API_KEY "your_api_key_here"
# Restart your terminal after setx so the variable is picked up

3. Run

python main.py

The app starts silently in the system tray. Right-click the tray icon to access the menu.

Tray Menu

✓ Dictation   (double F12)
  Meeting     (double F11)
  Command     (double F10)
  ──────────────────────────
  Input Device ▶
    • Microphone Array (Realtek)   ← radio-checked = active device
      Headset Microphone
  ──────────────────────────
  Quit

Checkmarks on the three mode items reflect the currently active mode. The Input Device submenu lists physical microphones only (loopback and virtual devices are filtered out). Switching device while a mode is active briefly pauses and resumes recording on the new device.
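
As a rough sketch of how the device filtering might work, the function below takes device-info dictionaries shaped like those PyAudio returns (`index`, `name`, `maxInputChannels`) and keeps only entries that look like physical microphones. The keyword list and the name `filter_physical_inputs` are assumptions made for this example; the real rules in utils.py may differ.

```python
# Assumed substrings that mark loopback or virtual devices (illustrative only).
EXCLUDED_KEYWORDS = ("loopback", "stereo mix", "virtual")


def filter_physical_inputs(devices):
    """Return (index, name) pairs for devices that look like real microphones."""
    result = []
    for info in devices:
        name = info.get("name", "")
        # Output-only devices report zero input channels.
        if info.get("maxInputChannels", 0) <= 0:
            continue
        # Drop names that match a known loopback/virtual pattern.
        if any(keyword in name.lower() for keyword in EXCLUDED_KEYWORDS):
            continue
        result.append((info["index"], name))
    return result


sample = [
    {"index": 0, "name": "Microphone Array (Realtek)", "maxInputChannels": 2},
    {"index": 1, "name": "Speakers (Realtek)", "maxInputChannels": 0},
    {"index": 2, "name": "Stereo Mix (Realtek)", "maxInputChannels": 2},
    {"index": 3, "name": "Headset Microphone", "maxInputChannels": 1},
]
physical = filter_physical_inputs(sample)
```

With PyAudio installed, the input list could be built from `get_device_info_by_index(i)` for each `i` up to `get_device_count()`.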

Dictation Mode

Activating Dictation streams microphone audio to the Speechmatics real-time API. Each finalised segment is typed at the cursor using pyautogui.

Text is typed only when the foreground window has a focused text input, detected through a classic Win32 caret or, for browsers and Electron apps, through hwndFocus.
If no focused text field is detected, the segment is printed to the console and discarded.
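
The focus check can be approximated with ctypes and the Win32 `GetGUIThreadInfo` call that the Notes mention. This is a minimal sketch, not the project's `caret_available()`: in particular, the rule "caret handle or focus handle is non-null" is an assumption drawn from the description above, and the function simply returns False on non-Windows platforms.

```python
import ctypes
import sys
from ctypes import wintypes


class GUITHREADINFO(ctypes.Structure):
    # Field layout follows the Win32 GUITHREADINFO structure.
    _fields_ = [
        ("cbSize", wintypes.DWORD),
        ("flags", wintypes.DWORD),
        ("hwndActive", wintypes.HWND),
        ("hwndFocus", wintypes.HWND),
        ("hwndCapture", wintypes.HWND),
        ("hwndMenuOwner", wintypes.HWND),
        ("hwndMoveSize", wintypes.HWND),
        ("hwndCaret", wintypes.HWND),
        ("rcCaret", wintypes.RECT),
    ]


def caret_available() -> bool:
    """Best-effort check that the foreground window can receive typed text."""
    if sys.platform != "win32":
        return False  # the heuristic is Windows-specific

    user32 = ctypes.WinDLL("user32")
    user32.GetForegroundWindow.restype = wintypes.HWND
    user32.GetWindowThreadProcessId.argtypes = [
        wintypes.HWND, ctypes.POINTER(wintypes.DWORD)]
    user32.GetWindowThreadProcessId.restype = wintypes.DWORD
    user32.GetGUIThreadInfo.argtypes = [
        wintypes.DWORD, ctypes.POINTER(GUITHREADINFO)]
    user32.GetGUIThreadInfo.restype = wintypes.BOOL

    hwnd = user32.GetForegroundWindow()
    if not hwnd:
        return False
    thread_id = user32.GetWindowThreadProcessId(hwnd, None)

    info = GUITHREADINFO()
    info.cbSize = ctypes.sizeof(GUITHREADINFO)
    if not user32.GetGUIThreadInfo(thread_id, ctypes.byref(info)):
        return False
    # Assumed rule: a classic caret, or any focused child window, counts.
    return bool(info.hwndCaret or info.hwndFocus)
```

Treating a non-null `hwndFocus` as sufficient is permissive: it can report True for windows that hold focus but accept no text, which is one reason this remains a heuristic.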

Meeting Mode

Activating Meeting opens a new transcript file:

~/meeting_2026-05-18_14-30.txt

Every finalised segment is appended on its own line. The file is flushed after each write so it is readable in real time. A closing timestamp is written when the mode is deactivated or the app quits.
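
A minimal sketch of this append-and-flush behaviour is shown below. The class name `MeetingTranscript`, the `[HH:MM:SS]` line prefix, and the wording of the closing line are assumptions for illustration; only the file-name pattern comes from this README.

```python
from datetime import datetime
from pathlib import Path


class MeetingTranscript:
    """Hypothetical writer: one line per segment, flushed after each write."""

    def __init__(self, directory=None, now=None):
        started = now or datetime.now()
        directory = Path(directory) if directory else Path.home()
        self.path = directory / f"meeting_{started:%Y-%m-%d_%H-%M}.txt"
        self._file = open(self.path, "a", encoding="utf-8")

    def _write(self, line):
        self._file.write(line + "\n")
        self._file.flush()  # keeps the file readable while the meeting runs

    def add_segment(self, text, now=None):
        stamp = (now or datetime.now()).strftime("%H:%M:%S")
        self._write(f"[{stamp}] {text}")  # assumed line format

    def close(self, now=None):
        if self._file.closed:
            return  # safe to call from both deactivate and quit paths
        ended = now or datetime.now()
        self._write(f"Meeting ended {ended:%Y-%m-%d %H:%M:%S}")
        self._file.close()
```

Making `close()` idempotent means the same call can be wired to both mode deactivation and application exit without writing the closing timestamp twice.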

Command Mode

Status: placeholder. Voice input is received and printed to the console. A desktop automation agent will be wired in here.

Planned capabilities:

Open, close, and switch applications by name.
Dictate shell commands.
Control the mouse and keyboard via voice instructions.

Finding Your Microphone

If the default device does not work, run:

python find_mic.py

This lists all usable physical input devices with their PyAudio index. Pass the correct index when constructing TrayApp:

# main.py
app = TrayApp(device_index=3)

Architecture

main.py
└── TrayApp                  tray_app.py
    ├── AppMode (enum)        IDLE / DICTATION / MEETING / COMMAND
    ├── DoubleKeyListener     hotkey.py   — pynput-based double-press detector
    ├── SpeechmaticsAgent     speech_client.py
    │   └── asyncio event loop in daemon thread
    │       ├── VoiceAgentClient  (speechmatics-voice)
    │       └── Microphone        (speechmatics-rt / pyaudio)
    └── utils.py
        ├── caret_available()    — foreground-window focus check
        ├── get_input_devices()  — filtered PyAudio device list
        └── get_default_input_device_index()
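
To illustrate the double-press detection that `DoubleKeyListener` is responsible for, here is a timing-only sketch. It is not the code in hotkey.py: the class name `DoublePressDetector`, the 0.4 second window, and the injectable clock are assumptions chosen to keep the example testable without a keyboard.

```python
import time


class DoublePressDetector:
    """Reports True when the same key is pressed twice within max_interval."""

    def __init__(self, max_interval=0.4, clock=time.monotonic):
        self.max_interval = max_interval  # assumed window, in seconds
        self._clock = clock
        self._last = {}  # key -> time of the last unpaired press

    def press(self, key) -> bool:
        now = self._clock()
        previous = self._last.get(key)
        if previous is not None and now - previous <= self.max_interval:
            # Consume the pair so a third quick press starts a new sequence.
            del self._last[key]
            return True
        self._last[key] = now
        return False
```

In a real listener, `press()` would be called from a key-press callback such as the `on_press` handler of a pynput `keyboard.Listener`. Holding a key down produces repeated press events on most systems, so a production version would likely also track key releases.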

Environment Variables

Variable	Required	Description
`SPEECHMATICS_API_KEY`	Yes	API key from the Speechmatics dashboard
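
One way the startup check for this variable could look is sketched below. The function names `load_api_key` and `missing_key_message` are hypothetical, and the message text is an assumption modelled on the Quick Start instructions.

```python
import os


def load_api_key(environ=os.environ):
    """Return the Speechmatics API key, or None if it is unset or blank."""
    key = environ.get("SPEECHMATICS_API_KEY", "").strip()
    return key or None


def missing_key_message():
    """Setup hint shown when the key is absent (wording is illustrative)."""
    return (
        "SPEECHMATICS_API_KEY is not set.\n"
        'Run: setx SPEECHMATICS_API_KEY "your_api_key_here"\n'
        "Then restart your terminal so the variable is picked up."
    )
```

Accepting the environment as a parameter keeps the check easy to exercise in tests without modifying the real process environment.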

License

MIT — see LICENSE.

Double-press F12 to enable/disable listening. When enabled, finished segments will be typed into the focused text input if a caret exists.

Notes

The implementation uses the official Speechmatics Python packages. If they are not installed or the API key is missing, the agent does not start and prints setup instructions instead.
To enable ML turn detection features, install the smart extras: pip install "speechmatics-voice[smart]".
The caret detection is a Windows-specific heuristic (uses GetGUIThreadInfo).
