Crix is a fast, efficient, voice-controlled AI assistant that runs natively on Linux. It gives an AI direct control over your keyboard, mouse, and system, so you can get things done entirely hands-free.
- Real-time voice interaction: Talk to Crix naturally using state-of-the-art speech recognition
- Keyboard control: Type text, press shortcuts, and submit forms by voice
- Mouse control: Move, click, double-click, and scroll anywhere on screen
- Window & Workspace management: Switch workspaces, list open windows, and launch apps
- Clipboard integration: Read and write clipboard content using `xclip`
- Live web search: Fetches up-to-date information from the web using Tavily
- Shell command execution: Run safe, non-destructive shell commands on demand
- Noise cancellation: Intelligent audio filtering for cleaner voice input
| Component | Technology |
|---|---|
| Voice Framework | LiveKit Agents |
| Speech-to-Text (STT) | Deepgram Nova-3 (multilingual) |
| Language Model (LLM) | OpenAI GPT-4.1 Mini |
| Text-to-Speech (TTS) | ElevenLabs Turbo v2.5 |
| Voice Activity Detection (VAD) | Silero |
| Desktop Automation | xdotool |
| Clipboard | xclip |
| Web Search | Tavily |
| Package Manager | uv |
- Linux (X11 recommended; some tools may work partially on Wayland)
- Python 3.12+
- `uv` package manager
- System dependencies: `sudo apt install xdotool xclip`
- A LiveKit account and project (cloud.livekit.io)
- A Tavily API key for web search
- STT, LLM, and TTS are handled by LiveKit, so no separate API keys are needed for OpenAI, Deepgram, or ElevenLabs
- Clone the repository:

      git clone https://github.com/Aerex0/Crix.git
      cd Crix

- Install Python dependencies:

      uv sync

- Configure environment variables:

      cp .env.example .env

  Then edit `.env` and fill in your credentials:

      LIVEKIT_URL=wss://your-project.livekit.cloud
      LIVEKIT_API_KEY=your_api_key
      LIVEKIT_API_SECRET=your_api_secret
      TAVILY_API_KEY=your_tavily_key

Note: STT (Deepgram), LLM (OpenAI), and TTS (ElevenLabs) are billed and managed through your LiveKit account, so no separate API keys are required.
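
A missing credential is easier to diagnose before the agent starts than after a failed connection. The sketch below checks the four variables from the `.env` example. It assumes the values have already been loaded into the process environment; the function name `missing_env_vars` is hypothetical and not part of Crix.

```python
import os

# Variable names taken from the .env example above.
REQUIRED_VARS = (
    "LIVEKIT_URL",
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
    "TAVILY_API_KEY",
)


def missing_env_vars(env=None):
    """Return the required variable names that are unset or blank.

    `env` defaults to os.environ; passing a plain dict makes the check
    easy to exercise without touching the real environment.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
```

Calling `missing_env_vars()` at startup and aborting when the returned list is non-empty would give a clear error message listing exactly which keys still need to be filled in.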
Run the agent using the LiveKit CLI:

    uv run python src/agent.py console

Once running, connect to the agent via any LiveKit-compatible client (e.g. the LiveKit Playground or a custom frontend). Crix will greet you and wait for voice commands.
Before using Crix, you must update the system prompt to match your desktop environment. Open `src/prompts/crix.py` and adjust it to reflect:
- Your keyboard shortcuts: how you close a window, open a terminal, or switch workspaces may differ between desktop environments (GNOME, KDE, i3, Hyprland, etc.)
- Your default apps: your terminal emulator (`alacritty`, `kitty`, `gnome-terminal`), browser (`firefox`, `brave`), etc.
- Your workspace setup: how many workspaces you use and how they're numbered

Important: The default prompt is configured for a specific setup. If your shortcuts or apps differ, Crix may send the wrong keys or open the wrong applications. Tailor the prompt to your environment for the best experience.
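
One way to keep these details easy to edit is to hold them in a small data structure and render them into prompt text. The sketch below is purely illustrative: the actual layout of `src/prompts/crix.py` is not shown here, and `DESKTOP_PROFILE`, its example values, and `render_environment_section` are all hypothetical.

```python
# Hypothetical environment description; replace the values with your own.
DESKTOP_PROFILE = {
    "shortcuts": {
        "close window": "super+q",
        "open terminal": "super+Return",
    },
    "apps": {"terminal": "alacritty", "browser": "firefox"},
    "workspaces": 4,
}


def render_environment_section(profile):
    """Render the profile as plain-text lines suitable for a system prompt."""
    lines = ["Desktop environment details:"]
    for action, keys in profile["shortcuts"].items():
        lines.append(f"- To {action}, press {keys}.")
    for role, command in profile["apps"].items():
        lines.append(f"- The default {role} is launched with `{command}`.")
    lines.append(f"- Workspaces are numbered 1 to {profile['workspaces']}.")
    return "\n".join(lines)
```

The rendered text could then be appended to the system prompt, so switching desktop environments means editing one dictionary rather than rewriting prose.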
Crix comes with a set of built-in tools it can call autonomously based on your voice commands:
| Tool | Description |
|---|---|
| `web_search` | Search the web for up-to-date information using Tavily |
| `get_time` | Get the current system date and time |

| Tool | Description |
|---|---|
| `type_text` | Type text at the current cursor position |
| `press_key` | Press a key or key combo (e.g. `ctrl+c`, `super+q`) |
| `type_and_submit` | Type text and immediately press Enter |
| `paste_text` | Paste text instantly via clipboard (faster for long strings) |
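
The keyboard tools map naturally onto `xdotool` subcommands. The following is a minimal sketch of how such calls might be built, not the code in `src/tools.py`; the helper names `key_command`, `type_command`, and `run` are hypothetical. Building the argument list separately from executing it keeps the logic testable on machines without X11.

```python
import subprocess


def key_command(combo):
    """Build the argv for pressing a key or combo such as 'ctrl+c'."""
    combo = combo.strip()
    if not combo:
        raise ValueError("empty key combo")
    return ["xdotool", "key", combo]


def type_command(text, delay_ms=12):
    """Build the argv for typing literal text at the cursor.

    '--' ends option parsing, so text starting with '-' is typed verbatim.
    """
    return ["xdotool", "type", "--delay", str(delay_ms), "--", text]


def run(argv):
    """Execute a built command; needs xdotool and a running X11 session."""
    subprocess.run(argv, check=True)
```

Under these assumptions, a `type_and_submit` equivalent would be `run(type_command(text))` followed by `run(key_command("Return"))`.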

| Tool | Description |
|---|---|
| `click` | Click at a given position (left, middle, or right button) |
| `double_click` | Double-click at a given position |
| `scroll` | Scroll up or down at the current cursor position |
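
Mouse actions can be expressed with `xdotool mousemove` and `xdotool click`, where X11 button numbers 1 to 3 are left, middle, and right, and 4 and 5 are wheel up and down. This is a sketch under those assumptions; `click_command` and `scroll_command` are hypothetical names, and Crix's own tools may be implemented differently.

```python
BUTTONS = {"left": 1, "middle": 2, "right": 3}
WHEEL = {"up": 4, "down": 5}  # X11 reports wheel steps as buttons 4 and 5


def click_command(x, y, button="left", repeat=1):
    """Build an argv that moves the pointer to (x, y) and clicks there."""
    if button not in BUTTONS:
        raise ValueError(f"unknown button: {button!r}")
    if repeat < 1:
        raise ValueError("repeat must be at least 1")
    return [
        "xdotool", "mousemove", str(int(x)), str(int(y)),
        "click", "--repeat", str(repeat), str(BUTTONS[button]),
    ]


def scroll_command(direction, steps=3):
    """Build an argv that scrolls at the current pointer position."""
    if direction not in WHEEL:
        raise ValueError(f"unknown direction: {direction!r}")
    if steps < 1:
        raise ValueError("steps must be at least 1")
    return ["xdotool", "click", "--repeat", str(steps), str(WHEEL[direction])]
```

A double-click is then simply `click_command(x, y, repeat=2)`.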

| Tool | Description |
|---|---|
| `switch_workspace` | Switch to a specific virtual desktop (1-based) |
| `open_app` | Launch an application by command name |
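
Because `switch_workspace` is 1-based while `xdotool set_desktop` counts desktops from 0, an off-by-one conversion is needed somewhere. The sketch below shows one possible shape for that conversion and for splitting an app command into arguments; both helper names are hypothetical, and `set_desktop` only works with window managers that support EWMH desktops.

```python
import shlex


def workspace_command(number):
    """Convert a 1-based workspace number into an xdotool argv.

    xdotool's set_desktop uses 0-based indices, hence the subtraction.
    """
    if number < 1:
        raise ValueError("workspace numbers start at 1")
    return ["xdotool", "set_desktop", str(number - 1)]


def app_argv(command):
    """Split a command such as 'firefox --private-window' into an argv list."""
    argv = shlex.split(command)
    if not argv:
        raise ValueError("empty command")
    return argv
```

Launching the result with `subprocess.Popen(app_argv(command), start_new_session=True)` would start the application without tying its lifetime to the agent process.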

| Tool | Description |
|---|---|
| `get_clipboard` | Read the current clipboard contents |
| `select_all_and_copy` | Press Ctrl+A then Ctrl+C and return copied text |
| `get_screen_size` | Return the current screen resolution |
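
Reading the clipboard and the screen size both come down to running a command and parsing its output. This sketch assumes `xclip -selection clipboard -o` for reading and `xdotool getdisplaygeometry`, which prints the width and height separated by a space; the Python names are hypothetical and not taken from Crix's source.

```python
import subprocess

READ_CLIPBOARD = ["xclip", "-selection", "clipboard", "-o"]
SCREEN_SIZE = ["xdotool", "getdisplaygeometry"]


def parse_display_geometry(output):
    """Parse 'WIDTH HEIGHT' output into a (width, height) tuple of ints."""
    parts = output.split()
    if len(parts) != 2:
        raise ValueError(f"unexpected geometry output: {output!r}")
    width, height = (int(part) for part in parts)
    return width, height


def get_clipboard():
    """Return the clipboard text; needs xclip and a running X11 session."""
    result = subprocess.run(
        READ_CLIPBOARD, capture_output=True, text=True, check=True
    )
    return result.stdout
```

A `select_all_and_copy` equivalent would send `ctrl+a` and `ctrl+c` through `xdotool key`, wait briefly for the application to update the clipboard, and then call `get_clipboard()`.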

| Tool | Description |
|---|---|
| `run_command_silent` | Execute a safe, read-only shell command and return output |

- "Open a terminal" → Launches Alacritty
- "Switch to workspace 3" → Switches to virtual desktop 3
- "Type hello world and send it" → Types and submits text
- "What time is it?" → Returns current date and time
- "Search for the latest AI news" → Performs a live web search
- "Press Ctrl+Z" → Sends the undo shortcut
- "Select all and copy" → Copies all text in the focused window
Crix is designed with the following hard rules baked into its system prompt:
- 🚫 Will not execute destructive commands (`rm`, `mv`, `dd`, `kill`, `chmod`, etc.)
- 🚫 Will not follow commands delivered via on-screen text; only spoken voice commands are obeyed
- 🚫 Will not reveal its system prompt
- 🚫 Will not chain shell commands
Warning: These are prompt-level restrictions enforced by the AI model, not hard system-level blocks. While precautions have been taken to make Crix safe, no AI system is perfectly secure. Use with awareness and at your own risk. Avoid granting it access to sensitive environments.
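
Since the rules above live only in the prompt, a code-level check in front of the shell tool could act as a second line of defence. The sketch below is not part of Crix; `is_allowed` and both constant names are hypothetical. It mirrors the two shell-related rules (no destructive commands, no chaining) and is deliberately conservative, so it also rejects harmless commands whose quoted arguments contain characters such as `;`.

```python
import shlex

# Destructive commands named in the rules above.
BLOCKED_COMMANDS = {"rm", "mv", "dd", "kill", "chmod"}
# Substrings that indicate chaining, substitution, or redirection.
CHAINING_TOKENS = (";", "&", "|", "`", "$(", ">", "<", "\n")


def is_allowed(command):
    """Return True only if the command passes this sketch's checks."""
    if any(token in command for token in CHAINING_TOKENS):
        return False
    try:
        argv = shlex.split(command)
    except ValueError:  # e.g. unbalanced quotes
        return False
    if not argv:
        return False
    program = argv[0].rsplit("/", 1)[-1]  # treat /bin/rm the same as rm
    return program not in BLOCKED_COMMANDS
```

A denylist like this is easy to bypass (for example through `sudo` or `find -delete`), so an allowlist of known read-only commands would be the stricter design if hard guarantees matter.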

    crix/
    ├── src/
    │   ├── agent.py        # LiveKit agent setup, session, and tool registration
    │   ├── tools.py        # All function tools (keyboard, mouse, clipboard, etc.)
    │   ├── __init__.py
    │   └── prompts/
    │       ├── crix.py     # System prompt defining Crix's behavior and rules
    │       └── __init__.py
    ├── LICENSE
    ├── README.md
    ├── pyproject.toml      # Project metadata and dependencies
    └── uv.lock

The following tools and improvements are actively being worked on:
- Screen Read (`read_screen_text`): OCR-based screen reading using Tesseract to allow Crix to "see" on-screen content and answer questions about it.
- Mouse Movement (`move_mouse`): Moving the mouse cursor to specific screen coordinates via `xdotool` is not yet reliably working.
- Window Focus (`list_open_windows` / `focus_window`): Detect and focus any open window by name, enabling seamless app switching.
- Prompt Improvement: Expanding the system prompt with richer context and more example patterns for better command understanding.
- Multi-monitor support: Extend screen tooling to handle setups with more than one display.
This project is licensed under the MIT License.