Control your Linux desktop from Telegram. Send a message, get a screenshot back. Type text, click coordinates, open apps, transfer files, schedule tasks — all through your phone.
Nullhand is a Telegram bot that runs as a background process on your Linux machine and gives you full desktop control over chat. You send natural language or slash commands; the bot acts on your screen in real time.
It ships with an OTP session gate, a structured audit log, a built-in scheduler, bidirectional file transfer, and a pluggable AI backend (local rule-based parser included — no API key required to get started).
While Nullhand is your ultimate command center for the desktop, we've built a specialized, enterprise-grade tool specifically for servers.
AXGhost is a zero-cloud, zero-AI Go daemon that transforms Telegram into a highly secure, natural-language administration console for your server infrastructure.
✨ Why AXGhost?
- 🌍 Bilingual Brilliance: Seamlessly execute commands using natural language in both Arabic and English.
- 🛡️ Ironclad Security: Strict whitelisted execution ensures only explicitly approved, safe operations can run.
- 🔐 100% Private (Zero-Cloud / Zero-AI): A completely self-contained daemon. No third-party APIs, no AI token limits, and absolutely no data leakage.
- 📜 Full Audit Trail: Comprehensive logging provides total visibility over every system action you take.
👉 Discover AXGhost on the AevonX Marketplace
💻 View the Open Source Repository on GitHub
- Natural language control — "open Firefox", "take a screenshot", "type Hello World"
- Bilingual parser (English + Arabic) — "افتح فايرفوكس وروح إلى github.com", "ابحث في الإعدادات عن WiFi", "اضغط زر إرسال"
- Smart recipes with state verification — recipes wait for windows to appear, OCR-click search results, and clear fields before typing instead of relying on fixed sleep timers
- Multi-step app workflows out of the box — WhatsApp send (with contact-picker selection via OCR), browser URL navigation, Settings search, generic button click — all callable from one phrase
- Author your own recipes from Telegram — "save this as recipe morning_routine: open Firefox, go to news.com" / "احفظ هذا كروتين الصباح: …" — parsed and persisted to ~/.nullhand/recipes.json
- Voice notes → text — record a Telegram voice note in any language; Nullhand transcribes it via whisper.cpp (Arabic + English bilingual support) and runs it as if you typed it
- Conversation memory — follow-up commands fall back to recently-used entities ("open Firefox" then "go to github.com" remembers Firefox; "send hi" remembers the last contact)
- Preview / dry-run mode — see exactly what a command will do before running it: preview: … / معاينة: … returns a numbered execution plan with no side effects
- Slash commands — explicit commands with arguments for scripting workflows
- Inline quick-action menu — one tap for the most common actions
- Screenshot & bilingual OCR — capture the screen, extract visible text in English or Arabic, or locate a specific phrase's pixel coordinates via Tesseract HOCR (auto-uses ara+eng when the Arabic language pack is installed)
- Mouse & keyboard automation — click, double-click, right-click, drag, scroll, type, key shortcuts, clear field
- Accessibility-aware element control — click UI elements by exact label, fuzzy substring match, or OCR fallback for Electron apps
- App launcher — open GNOME/GTK/Snap applications by name (Linux) or .app bundles (macOS)
- File transfer (bidirectional) — send files from your desktop to Telegram; receive files from Telegram to disk
- Persistent scheduled tasks (cron-like) — set recurring screenshots, shell commands, or system info reports; supports daily, weekday, weekend, specific days (Mon/Wed/Fri), and multiple fire times per day; survives bot restarts via ~/.nullhand/schedule.json
- Audit log — every action appended to ~/.nullhand/audit.log
- OTP session lock — cryptographically random 6-digit code, auto-rotates every 2 minutes
- Multiple AI backends — Claude, OpenAI, Gemini, DeepSeek, Grok, Ollama, or offline local mode
- Interactive file browser — browse directories with inline keyboard navigation
You (Telegram)
│
▼
OTP Gate ──── locked? → ignore message
│
▼
Message Router (in order)
│
├── File received? ──────────────→ Destination picker → Save to disk
│
├── OCR trigger? ────────────────→ scrot → tesseract → reply text
│
├── Schedule command? ───────────→ Create/list/cancel task
│
├── File send request? ──────────→ Read file → zip if needed → send
│
├── Slash command? ──────────────→ Execute directly → reply
│
└── Everything else ─────────────→ AI Agent Loop
│
Take screenshot
│
Send to AI model
│
AI picks a tool
(click/type/shell/open)
│
Execute on desktop
│
Take new screenshot
│
Done? → reply result
Not done? → repeat
Every action is logged to ~/.nullhand/audit.log with timestamp and user ID.
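The routing order in the diagram can be expressed as a first-match dispatcher. The sketch below is illustrative only: the Message type, the keyword checks, and the route names are assumptions rather than Nullhand's actual code, and the OTP entry itself is assumed to be handled by the gate before routing.

```go
package main

import (
	"fmt"
	"strings"
)

// Message is a hypothetical, simplified view of an incoming Telegram update.
type Message struct {
	HasFile bool
	Text    string
}

// route returns the name of the first handler that claims the message,
// mirroring the top-to-bottom order of the router diagram.
func route(unlocked bool, m Message) string {
	if !unlocked {
		return "ignore" // OTP gate: a locked session drops every command
	}
	lower := strings.ToLower(m.Text)
	switch {
	case m.HasFile:
		return "file_receive"
	case lower == "ocr" || strings.Contains(lower, "read the screen"):
		return "ocr"
	case strings.Contains(lower, "every ") || strings.HasPrefix(lower, "schedule"):
		return "schedule"
	case strings.HasPrefix(lower, "send me /") || strings.HasPrefix(lower, "upload /"):
		return "file_send"
	case strings.HasPrefix(m.Text, "/"):
		return "slash_command"
	default:
		return "agent_loop" // everything else goes to the AI agent loop
	}
}

func main() {
	fmt.Println(route(true, Message{Text: "send me /home/user/report.pdf"}))
	fmt.Println(route(true, Message{Text: "open Firefox"}))
	fmt.Println(route(false, Message{Text: "/screenshot"}))
}
```

Because the checks run in order, "send me /home/user/backup.tar.gz every day at 2am" is claimed by the schedule branch before the file-send branch sees it.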
- Linux with an X11 session (Wayland is not supported in this version)
- Log in with "Ubuntu on Xorg" (or equivalent) at the display manager
- $DISPLAY must be set; $WAYLAND_DISPLAY must be unset
- GNOME desktop recommended (some launcher entries are GNOME-specific)
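A minimal sketch of the environment rule above, assuming the check only inspects the two variables; checkSession is a hypothetical name, and the environment lookup is injected so the logic can be tried without a desktop session.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// checkSession applies the two documented rules: $WAYLAND_DISPLAY must be
// unset and $DISPLAY must be set.
func checkSession(getenv func(string) string) error {
	if getenv("WAYLAND_DISPLAY") != "" {
		return errors.New("wayland session detected: log in with \"Ubuntu on Xorg\" instead")
	}
	if getenv("DISPLAY") == "" {
		return errors.New("$DISPLAY is not set: no X11 session (headless SSH?)")
	}
	return nil
}

func main() {
	if err := checkSession(os.Getenv); err != nil {
		fmt.Println("startup check failed:", err)
	} else {
		fmt.Println("X11 session looks usable")
	}
}
```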
Install all required tools in one command:
sudo apt install \
xdotool \
scrot \
wmctrl \
xclip \
imagemagick \
python3 \
x11-xserver-utils \
libgtk-3-bin \
python3-pyatspi \
at-spi2-core
For OCR support (optional but recommended):
sudo apt install tesseract-ocr # English OCR
sudo apt install tesseract-ocr-ara # Arabic UI text (recommended for Arabic users)
If both packages are present, Nullhand auto-uses ara+eng so click_text("إرسال") and wait_for_text("بحث") work alongside English. With only tesseract-ocr installed, OCR falls back to English-only and prints a one-line install hint at startup.
For voice-note transcription (optional but useful for mobile users):
# Linux: install ffmpeg + whisper.cpp manually
sudo apt install ffmpeg
git clone https://github.com/ggerganov/whisper.cpp && cd whisper.cpp && make
sudo cp build/bin/whisper-cli /usr/local/bin/
bash models/download-ggml-model.sh base # downloads ~150 MB model
# macOS: Homebrew has both
brew install ffmpeg whisper-cpp
Verify with /health after starting the bot: look for the "Voice transcription" line.
| Tool | Package | Purpose |
|---|---|---|
| xdotool | xdotool | Key presses, active window query |
| scrot | scrot | Screenshots |
| wmctrl | wmctrl | App listing and window focus |
| xclip | xclip | Clipboard read/write |
| convert | imagemagick | Screenshot resizing for HiDPI |
| python3 | python3 | Accessibility scripting |
| xrandr | x11-xserver-utils | Screen resolution detection |
| gtk-launch | libgtk-3-bin | App launcher via .desktop files |
| python3-pyatspi | python3-pyatspi | AT-SPI accessibility tree access |
| at-spi2-core | at-spi2-core | AT-SPI2 daemon |
| tesseract | tesseract-ocr | OCR — read text from screen |
| Arabic OCR pack | tesseract-ocr-ara | Recognise Arabic UI labels (auto-detected) |
Go 1.21 or later
- Open Telegram and message @BotFather
- Send /newbot and follow the prompts
- Copy the bot token (format: 123456789:ABCdef...)
- Get your Telegram user ID: message @userinfobot to find it
- Start a private chat with your new bot before running Nullhand
# 1. Clone the repository
git clone https://github.com/AzozzALFiras/Nullhand
cd Nullhand
# 2. Install system dependencies (see Requirements above)
sudo apt install xdotool scrot wmctrl xclip imagemagick python3 \
x11-xserver-utils libgtk-3-bin python3-pyatspi at-spi2-core
# 3. Build for Linux
GOOS=linux go build -o nullhand ./cmd/nullhand
# 4. Run
./nullhand
On first run, a setup wizard will prompt you for your Telegram bot token, your Telegram user ID, and your preferred AI provider. Configuration is saved to ~/.nullhand/config.json.
When Nullhand starts (and on every restart), it prints a one-time password to the terminal:
╔══════════════════════════════╗
║ OTP CODE: 482917 ║
║ Expires in 2 minutes ║
╚══════════════════════════════╝
Enter this code in Telegram to unlock the bot.
You must send this exact 6-digit code to the bot in Telegram before any command is accepted. The code:
- Is generated with crypto/rand (cryptographically random)
- Expires after 2 minutes and is automatically replaced with a new one (printed to terminal again)
- Once entered correctly, unlocks the session until you restart or use the Lock Bot button in /menu
- Is stored in memory only, never written to disk
To re-lock the session manually, tap Lock Bot in /menu or press the menu:lock inline button.
Just send a message in plain English or Arabic. The local rule-based parser handles both languages without an API key.
Basics
take a screenshot
what's my CPU usage
open Firefox
type Hello World
click at 960 540
press ctrl+t
read the screen
run git status in terminal
send me /home/user/report.pdf
Browser navigation — opens the browser, waits for the window, clears the address bar, types the URL, and hits Enter
open firefox and go to github.com
افتح فايرفوكس وروح إلى github.com
type google.com in the address bar
اكتب google.com في شريط العنوان
search for "go programming"
ابحث عن golang
new tab / علامة تبويب جديدة
back / ارجع
refresh / تحديث
close tab / أغلق التبويب
WhatsApp messaging — opens WhatsApp, opens new-chat, types the contact name, OCR-clicks the matching contact in the autocomplete list, then types and sends the message
open whatsapp and send azozz a message hello
ارسل لعزوز في الواتساب: مرحبا
واتساب عزوز: مرحبا
افتح واتساب وأرسل لعزوز رسالة مرحبا
Settings (GNOME / Cinnamon / KDE) — opens Settings, focuses the integrated search bar, types the query
search settings for wifi
ابحث في الإعدادات عن WiFi
open WiFi settings
افتح إعدادات WiFi
Click any visible button — tries AT-SPI fuzzy match first, falls back to OCR-locate-and-click for Electron apps
click the Send button
press OK
اضغط زر إرسال
انقر على زر حفظ
click Send in WhatsApp
Schedule recurring tasks (persisted across restarts, cron-like)
schedule a screenshot every day at 9am
remind me to run sysinfo every day at 14:00
schedule a screenshot every weekday at 9am
schedule a screenshot every Monday at 8:30am
schedule a screenshot every Mon and Wed and Fri at 9am
schedule a screenshot every weekend at 10am
schedule a screenshot every day at 9am and 5pm ← multiple times
Voice notes — record a Telegram voice note in Arabic or English; Nullhand transcribes it with whisper.cpp and runs it as a normal command. The bot replies "🎙️ Heard: " then executes. Works with any of the natural-language patterns above.
Save your own recipes — the bot turns each step into a reusable recipe and writes it to ~/.nullhand/recipes.json
save this as recipe morning_routine: open Firefox, go to news.com, take a screenshot
احفظ هذا كروتين الصباح: افتح Firefox، روح إلى news.com، خذ لقطة شاشة
Run a saved recipe later:
recipe morning_routine
recipe الصباح
Conversation memory — follow-up commands fall back to recently-used entities
open Firefox
go to github.com ← uses Firefox automatically
search for golang ← still Firefox
ارسل لعزوز في الواتساب: مرحبا
ارسل: كيف الحال ← understands contact = عزوز
Preview / dry-run — see what a command will do before actually doing it
preview: open whatsapp and send azozz hi
dry-run: open firefox and go to github.com
معاينة: افتح فايرفوكس وروح إلى github.com
جرب: ابحث في الإعدادات عن WiFi
The bot replies with a numbered plan of every tool call (and recipe step) that would run, expanding run_recipe(...) calls so you see the full sequence. Nothing is executed.
| Command | Arguments | Description |
|---|---|---|
| /start | — | Welcome message and command list |
| /help | — | Show all available commands |
| /screenshot | — | Capture the full screen and send as photo |
| /status | — | CPU, memory, and active application info |
| /apps | — | List currently open windows |
| /open | <app name> | Open an application by name |
| /ls | [path] | List directory contents |
| /read | <path> | Read a file and return its contents |
| /shell | <command> | Run a whitelisted shell command |
| /click | <x> <y> | Click at the given screen coordinates |
| /type | <text> | Type text into the active window |
| /key | <shortcut> | Press a key or modifier combination |
| /paste | — | Get current clipboard contents |
| /stop | — | Cancel the currently running AI task |
| /diag | — | Show diagnostic info (frontmost app, screen size) |
| /inspect | — | Dump accessibility tree of the frontmost window |
| /ocr | — | Extract visible text from the screen |
| /schedule | list \| cancel <id> \| clear | Manage scheduled tasks |
| /recipes | — \| <name> \| show <name> \| run <name> [k=v ...] \| preview <name> [k=v ...] \| delete <name> \| rename <old> <new> | Browse and manage built-in + user-saved recipes |
| /health | — | System health: OS, AI provider, OCR languages, permissions, scheduled tasks count, recipes count |
| /menu | — | Open the inline quick-action toolbar |
Keyboard shortcut examples for /key:
/key enter
/key ctrl+t
/key ctrl+shift+5
/key escape
/key f5
/key super
Modifier aliases: cmd and command map to ctrl; option maps to alt.
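A minimal sketch of the alias rewrite, assuming shortcuts are "+"-separated as in the /key examples; normalizeShortcut is a hypothetical name, and translating the result into xdotool key names is not shown.

```go
package main

import (
	"fmt"
	"strings"
)

// modifierAliases mirrors the documented aliases: cmd/command -> ctrl, option -> alt.
var modifierAliases = map[string]string{
	"cmd":     "ctrl",
	"command": "ctrl",
	"option":  "alt",
}

// normalizeShortcut lower-cases a "+"-separated shortcut and rewrites any
// aliased modifier, leaving all other key names untouched.
func normalizeShortcut(s string) string {
	parts := strings.Split(strings.ToLower(strings.TrimSpace(s)), "+")
	for i, p := range parts {
		p = strings.TrimSpace(p)
		if alias, ok := modifierAliases[p]; ok {
			p = alias
		}
		parts[i] = p
	}
	return strings.Join(parts, "+")
}

func main() {
	fmt.Println(normalizeShortcut("Cmd+Shift+5")) // ctrl+shift+5
	fmt.Println(normalizeShortcut("option+f4"))   // alt+f4
}
```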
Send /menu to get the quick-action toolbar with inline keyboard buttons:
| Button | Action |
|---|---|
| 📸 Screenshot | Capture and send the current screen |
| 💻 System Info | Show CPU, memory, active app |
| 📋 Clipboard | Read and return clipboard contents |
| 🐚 Run Command | Prompt for a shell command, execute it |
| 📤 Send File | Prompt for a file path, upload to Telegram |
| 📥 Downloads | List ~/Downloads directory |
| 🔍 Read Screen | OCR — extract text from the current screen |
| 🔒 Lock Bot | Lock the session; new OTP printed to terminal |
| ❓ Help | Show natural language usage examples |
Recipes are pre-built multi-step workflows that the bot can run by name. Unlike a blind keystroke macro, every recipe step verifies state before the next step fires — windows must appear, fields must be empty before typing, and contact pickers are selected via OCR rather than guessed Enter presses.
A flow like "open WhatsApp, search for Azozz, send 'hi'" fails with naïve automation because:
- The new-chat search box appears asynchronously after Ctrl+N
- The autocomplete dropdown takes a variable amount of time to populate
- Pressing Return on a typed name often jumps to the wrong contact
- WhatsApp on Linux is Electron, so AT-SPI cannot see the contact list
Nullhand's recipe engine solves this by combining six step kinds:
| Step kind | What it does | Used to fix |
|---|---|---|
| wait_for_window | Polls every 200 ms until a window with a matching title is active | App-launch race conditions |
| wait_for_text | Polls OCR every 400 ms until the requested phrase is visible on screen | Slow-loading dialogs and dropdowns |
| wait_for_element | Polls AT-SPI every 250 ms for an element matching a label substring | Native GTK/Qt apps |
| click_text | Locates a text region via OCR HOCR and clicks its bounding-box center | Electron apps where AT-SPI is blind (WhatsApp, Slack, VS Code, Discord) |
| click_fuzzy | AT-SPI substring match; falls back to click_text automatically | Buttons whose accessible name differs slightly from the visible label |
| clear_field | Ctrl+A then Delete | Replacing existing text in an address bar or search box |
| Recipe | Parameters | What it does |
|---|---|---|
| whatsapp_send_message | contact, message | Open WhatsApp → wait for window → Ctrl+N → wait for "Search" → type contact → wait for autocomplete → OCR-click matching row → type message → Enter |
| whatsapp_new_message | contact | Same as above without sending; opens the chat ready for follow-up |
| browser_open_url | browser, url | Open browser → wait for window → Ctrl+L → clear field → type URL → Enter |
| browser_google_search | browser, query | Same flow but submits to Google |
| browser_new_tab_and_search | browser, query | Ctrl+T → clear → query → Enter |
| browser_click_link | text | OCR-click any visible link or button on the current page |
| browser_back / browser_forward / browser_reload | browser | Standard navigation shortcuts |
| settings_open | — | Open the system Settings app and wait for it |
| settings_search | query | Open Settings → Ctrl+F → clear → type query |
| settings_open_panel | panel | Open Settings → fuzzy-click the named panel (WiFi, Bluetooth, Display, ...) |
| click_button | label | Fuzzy-click a button in the frontmost app, OCR fallback included |
| press_button_in_app | app, label | Open app, wait, then fuzzy-click the labelled button inside it |
The full list is available at runtime via the list_recipes tool or by reading internal/service/recipe/defaults.go. User-defined recipes can be added in ~/.nullhand/recipes.json to override or extend the defaults.
Most natural-language phrases route to a recipe automatically (see the examples above). To call one explicitly:
recipe whatsapp_send_message {"contact":"Azozz","message":"hi"}
recipe settings_search {"query":"WiFi"}
recipe click_button {"label":"إرسال"}
Or via the AI agent's tool call (when using a cloud provider):
run_recipe(name="browser_open_url", params_json='{"browser":"Firefox","url":"github.com"}')
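The explicit form `recipe <name> {json}` splits cleanly into a name and a parameter object. A minimal parsing sketch, assuming every parameter value is a string (as in all the examples above); parseRecipeCommand is a hypothetical name.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// parseRecipeCommand splits "recipe <name> {json}" into the recipe name and
// its parameters. The JSON object is optional.
func parseRecipeCommand(msg string) (string, map[string]string, error) {
	rest, ok := strings.CutPrefix(strings.TrimSpace(msg), "recipe ")
	if !ok {
		return "", nil, errors.New("not a recipe command")
	}
	name, jsonPart, _ := strings.Cut(strings.TrimSpace(rest), " ")
	params := map[string]string{}
	if jsonPart = strings.TrimSpace(jsonPart); jsonPart != "" {
		if err := json.Unmarshal([]byte(jsonPart), &params); err != nil {
			return "", nil, fmt.Errorf("bad params for %q: %w", name, err)
		}
	}
	return name, params, nil
}

func main() {
	name, params, err := parseRecipeCommand(`recipe settings_search {"query":"WiFi"}`)
	fmt.Println(name, params, err)
}
```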
You don't need to edit JSON by hand. Send a single message that names the recipe and lists its steps separated by commas (or ;, then, and, ثم, or newlines). Each step is a normal natural-language phrase that the bot already understands.
English
save this as recipe morning_routine: open Firefox, go to news.com, take a screenshot
remember as routine slack_focus: open Slack, click the Channels button
Arabic
احفظ هذا كروتين الصباح: افتح Firefox، روح إلى news.com، خذ لقطة شاشة
احفظ روتين العمل: افتح Slack، انقر على زر Channels
The bot replies with ✅ Saved recipe "morning_routine" (3 step(s)) and writes the recipe to ~/.nullhand/recipes.json. Names are normalised to snake_case automatically. Run them by name later:
recipe morning_routine
recipe الصباح
Supported step types (any phrase that maps to one of these tools is allowed): open_app, type_text, press_key, wait, click_text, click_ui_element_fuzzy, wait_for_text, wait_for_window, wait_for_element, clear_field, focus_via_palette, focus_text_field. Coordinate-based clicks and run_recipe (nesting) are intentionally rejected so saved recipes stay portable across screens.
Browse and curate the recipe library through /recipes:
/recipes # full list (built-in + your own)
/recipes show whatsapp_send_message # see steps of one recipe
/recipes run morning_routine # execute a recipe by name
/recipes run browser_open_url browser=Firefox url=github.com
/recipes preview morning_routine # dry-run; show steps without executing
/recipes delete old_routine # remove a user-saved recipe
/recipes rename old_name new_name # rename a user recipe
Built-in recipes are protected — you can't delete or rename them, but you can shadow any default by saving a new recipe with the same name (the user file overrides the default).
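The shadowing rule amounts to a layered map lookup where the user layer wins. A minimal sketch under assumptions: recipes are reduced to lists of step names, and mergeRecipes and deleteUserRecipe are hypothetical helpers.

```go
package main

import (
	"errors"
	"fmt"
)

// mergeRecipes layers user recipes over the built-ins. A user recipe with the
// same name shadows the default; the built-in map is never modified.
func mergeRecipes(builtin, user map[string][]string) map[string][]string {
	merged := make(map[string][]string, len(builtin)+len(user))
	for name, steps := range builtin {
		merged[name] = steps
	}
	for name, steps := range user {
		merged[name] = steps // user file wins on name collisions
	}
	return merged
}

// deleteUserRecipe removes only user-saved recipes. Deleting a user recipe
// that shadowed a built-in makes the built-in visible again.
func deleteUserRecipe(builtin, user map[string][]string, name string) error {
	if _, ok := user[name]; !ok {
		if _, isBuiltin := builtin[name]; isBuiltin {
			return errors.New("built-in recipes cannot be deleted; save a recipe with the same name to shadow it")
		}
		return fmt.Errorf("no user recipe named %q", name)
	}
	delete(user, name)
	return nil
}

func main() {
	builtin := map[string][]string{"click_button": {"click_fuzzy"}}
	user := map[string][]string{"click_button": {"click_text"}}
	fmt.Println(mergeRecipes(builtin, user)["click_button"])
}
```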
The local parser remembers the most recent browser, contact, URL, and search query per chat (10-minute window). Follow-up commands without an explicit subject fall back to the remembered entity:
open Firefox → opens Firefox
go to github.com → opens github.com in Firefox (not the default)
ابحث عن golang → searches in Firefox
ارسل لعزوز في الواتساب: hi → contact=عزوز saved to context
ارسل: كيف الحال → contact=عزوز implied
Memory is per-chat and not persisted across bot restarts.
Natural language:
send me /home/user/documents/report.pdf
Upload keyword with path:
upload /var/log/syslog
Slash command via menu button:
Tap Send File in /menu, then enter the path when prompted.
How it works:
- Files under 50 MB are sent directly
- Files over 50 MB and entire directories are automatically zipped before sending
- The file type determines the Telegram method: images use sendPhoto, everything else uses sendDocument
- Temporary zip files are always cleaned up after sending
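The delivery rules above reduce to a small decision function. A sketch under assumptions: 50 MB is taken as 50 × 2^20 bytes, a file of exactly 50 MB is zipped, and only .jpg, .jpeg, and .png count as images; sendPlan is a hypothetical name.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// maxDirectUpload is the 50 MB threshold (assumed to mean MiB here).
const maxDirectUpload = 50 << 20

// sendPlan decides how a path is delivered: directories and files of 50 MB or
// more are zipped first; images go via sendPhoto; everything else, including
// every zip archive, goes via sendDocument.
func sendPlan(path string, size int64, isDir bool) (zip bool, method string) {
	if isDir || size >= maxDirectUpload {
		return true, "sendDocument"
	}
	switch strings.ToLower(filepath.Ext(path)) {
	case ".jpg", ".jpeg", ".png":
		return false, "sendPhoto"
	}
	return false, "sendDocument"
}

func main() {
	fmt.Println(sendPlan("/home/user/report.pdf", 2<<20, false))
	fmt.Println(sendPlan("/home/user/projects", 0, true))
}
```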
Simply send or forward any file (document, photo, video, audio) to the bot. You will be asked where to save it:
📥 Where should I save "report.pdf"?
[ 🏠 Home ] [ 🖥️ Desktop ]
[ 📥 Downloads ] [ ✏️ Custom path ]
Tap a button to save to that location, or tap Custom path and type a full directory path (e.g. /home/user/projects/).
If a file with the same name already exists, a timestamp is appended automatically (report_20260417_153012.pdf).
Nullhand can read text visible on screen using Tesseract OCR. Both English and Arabic UI text are supported when the matching language pack is installed.
Requires:
# Linux
sudo apt install tesseract-ocr # English (always required)
sudo apt install tesseract-ocr-ara # Arabic UI text (recommended)
# macOS
brew install tesseract # core
brew install tesseract-lang # all languages including Arabic
Trigger via natural language:
read the screen
what does the screen say
read text on screen
ocr
extract text from screen
what's written on screen
اقرأ الشاشة / لقطة شاشة مع نص
Trigger via slash command:
/ocr
Trigger via menu button: tap Read Screen in /menu.
How it works:
- Full screenshot is captured via scrot (Linux) or screencapture (macOS)
- Screenshot is written to a temp file
- tesseract <file> stdout -l <langs> is executed; <langs> is auto-detected: ara+eng if the Arabic pack is installed, otherwise eng
- Output is trimmed and truncated to 4096 characters (Telegram message limit)
- Temp file is deleted immediately after
The same auto-detected language list also drives click_text(...) and wait_for_text(...) — meaning Arabic-labelled buttons like إرسال can be located on screen without any configuration once the Arabic pack is installed.
If Tesseract is missing entirely, the bot responds with the install command rather than crashing. If only English is installed, you'll see a one-line hint at startup suggesting how to add Arabic.
Schedule recurring tasks using natural language or slash commands. Tasks are persisted to ~/.nullhand/schedule.json and automatically reloaded on bot restart.
Beyond the basic "every day at 9am" form, the parser understands:
| Pattern | Example | Fires |
|---|---|---|
| Daily (default) | every day at 9am | Every day at 09:00 |
| Specific weekday | every Monday at 8am | Mondays at 08:00 |
| Multiple weekdays | every Mon and Wed and Fri at 9am | M/W/F at 09:00 |
| Weekday group | every weekday at 9am | Mon-Fri at 09:00 |
| Weekend group | every weekend at 10am | Sat+Sun at 10:00 |
| Multiple times | every day at 9am and 5pm | Twice daily |
| Combined | every weekday at 9am and 1pm and 5pm | Mon-Fri × 3 times |
| Arabic weekdays | every الإثنين at 9am | Same as every Monday at 9am |
The bot detects schedule intent when your message contains phrases like "every", "schedule", or "remind me to" and at least one time token.
schedule a screenshot every day at 9am
remind me to run sysinfo every day at 8:30am
run git status every day at 14:00
send me /home/user/backup.tar.gz every day at 2am
read screen every day at 9pm
Supported time formats: 8am, 8:30am, 14:00, 9pm
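The intent rule (a trigger phrase plus at least one time token) and the four time formats can be sketched as follows. The function names and the regular expression are assumptions for illustration; they accept exactly the documented shapes (8am, 8:30am, 14:00, 9pm) and treat 12am as midnight and 12pm as noon.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// clockRe finds a time token: 9am / 8:30am style, or 24-hour 14:00 style.
var clockRe = regexp.MustCompile(`(?i)\b\d{1,2}(?::\d{2})?\s*(?:am|pm)\b|\b\d{1,2}:\d{2}\b`)

// isScheduleIntent requires a trigger phrase and at least one time token.
func isScheduleIntent(msg string) bool {
	lower := strings.ToLower(msg)
	hasTrigger := strings.Contains(lower, "every") || strings.Contains(lower, "schedule") ||
		strings.Contains(lower, "remind me to")
	return hasTrigger && clockRe.MatchString(msg)
}

// parseClock converts one time token to a 24-hour hour and minute.
func parseClock(tok string) (hour, minute int, err error) {
	s := strings.ToLower(strings.TrimSpace(tok))
	suffix := ""
	if strings.HasSuffix(s, "am") || strings.HasSuffix(s, "pm") {
		suffix, s = s[len(s)-2:], strings.TrimSpace(s[:len(s)-2])
	}
	hStr, mStr, hasColon := strings.Cut(s, ":")
	if !hasColon && suffix == "" {
		return 0, 0, fmt.Errorf("%q: expected am/pm or HH:MM", tok)
	}
	if hour, err = strconv.Atoi(hStr); err != nil {
		return 0, 0, fmt.Errorf("%q: bad hour", tok)
	}
	if hasColon {
		if minute, err = strconv.Atoi(mStr); err != nil || minute < 0 || minute > 59 {
			return 0, 0, fmt.Errorf("%q: bad minute", tok)
		}
	}
	switch {
	case suffix == "" && hour >= 0 && hour <= 23:
		// 24-hour form such as 14:00
	case suffix != "" && hour >= 1 && hour <= 12:
		hour %= 12 // 12am -> 0
		if suffix == "pm" {
			hour += 12 // 9pm -> 21, 12pm -> 12
		}
	default:
		return 0, 0, fmt.Errorf("%q: hour out of range", tok)
	}
	return hour, minute, nil
}

func main() {
	for _, tk := range []string{"8am", "8:30am", "14:00", "9pm"} {
		h, m, e := parseClock(tk)
		fmt.Printf("%-7s -> %02d:%02d %v\n", tk, h, m, e)
	}
	fmt.Println(isScheduleIntent("remind me to run sysinfo every day at 8:30am"))
}
```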
Supported actions:
| Phrase contains | Scheduled action |
|---|---|
| screenshot | Capture screen, send as photo |
| sysinfo, cpu, status, system info | Send system status report |
| read screen, ocr | Run OCR and send text |
| run <cmd> or shell <cmd> | Run shell command, send output |
| send + a /path | Send file to Telegram |
/schedule list
/schedule cancel task_001
/schedule clear
Example output of /schedule list:
📋 Active scheduled tasks:
🆔 task_001 — screenshot — every day at 09:00
🆔 task_002 — sysinfo — every day at 14:00
Use /schedule cancel <id> to remove a task.
Implementation detail: the scheduler aligns to the next whole minute on start, then checks every minute. Panics in task callbacks are recovered and logged.
Every action is appended to ~/.nullhand/audit.log.
Log format:
[2026-04-17 09:31:05] user=123456789 action=screenshot
[2026-04-17 09:32:11] user=123456789 action=shell cmd="git status"
[2026-04-17 09:33:00] user=123456789 action=file_send path="/home/user/report.pdf"
[2026-04-17 09:34:45] user=123456789 action=otp_unlock
[2026-04-17 09:35:00] user=123456789 action=schedule_create id="task_001"
[2026-04-17 09:40:00] user=123456789 action=scheduled_task id="task_001"
[2026-04-17 09:41:10] user=123456789 action=natural_language input="open Firefox and go to..."
Actions logged:
| Action | Triggered by |
|---|---|
| otp_unlock | Successful OTP entry |
| otp_lock | Lock Bot button |
| screenshot | /screenshot, menu button, or AI tool |
| shell | /shell, menu button, or AI tool |
| app_open | /open command |
| clipboard | /paste, menu button |
| sysinfo | /status, menu button |
| ocr | /ocr, natural language, menu button |
| file_send | File send trigger |
| file_receive | File received from Telegram |
| downloads | Downloads menu button |
| natural_language | Free-form AI task (first 80 chars logged) |
| recipe_save | User-authored recipe saved to ~/.nullhand/recipes.json |
| recipe_save_failed | Recipe parsed but disk write failed |
| recipe_run | /recipes run … invocation |
| recipe_preview | /recipes preview … dry-run |
| recipe_delete / recipe_rename | /recipes delete … or /recipes rename … invocation |
| voice_received | Voice note arrived (duration + size logged) |
| voice_transcribed | Whisper produced a transcript (first 80 chars logged) |
| health | /health invocation |
| preview | "preview: …" / "dry-run: …" inline preview |
| schedule_create | New scheduled task |
| schedule_cancel | Task cancelled |
| scheduled_task | Scheduled task fired |
The log directory (~/.nullhand/) is created with mode 0700. The log file has mode 0600. Logging failures are silently swallowed so a disk error never crashes the bot.
Read the log:
cat ~/.nullhand/audit.log
Tail it live:
tail -f ~/.nullhand/audit.log
Record a voice note inside Telegram in any language and Nullhand will transcribe it via whisper.cpp, then run the resulting text through the normal command pipeline. Useful when typing on a phone is awkward, especially for Arabic.
Requires:
# Linux
sudo apt install ffmpeg
# whisper.cpp: build from https://github.com/ggerganov/whisper.cpp
# and put the resulting `whisper-cli` binary on PATH
# Download a model file too (~150 MB for ggml-base, supports Arabic):
# bash whisper.cpp/models/download-ggml-model.sh base
# Then point whisper-cli to it via -m or the WHISPER_MODEL env var.
# macOS
brew install ffmpeg
brew install whisper-cpp
# Models go under /opt/homebrew/share/whisper-cpp/ or similar; whisper-cli
# auto-discovers if WHISPER_MODEL is unset.
Pipeline:
- Bot sees the voice field on the Telegram update
- Downloads the .ogg via getFile/download
- ffmpeg converts it to 16 kHz mono WAV
- whisper-cli <wav> -otxt -nt -l ar produces a .txt transcript
- Bot replies "🎙️ Heard: " then re-routes the transcript through the normal handler
The default language hint is Arabic because, with -l ar, the multilingual Whisper model still copes well with English code-switching, while forcing English handles Arabic speech poorly. Send preview: … first if you want to check the transcription before it executes.
If whisper or ffmpeg is missing, the bot replies with a clear error and the install command — it never crashes silently. Verify install with /health (look for the "Voice transcription" line).
/health returns a single-message snapshot of the bot's runtime state — useful for triaging "why doesn't X work?" questions without leaving Telegram.
Sample output:
🩺 Nullhand health report
Platform: linux/amd64
AI provider: local
OCR languages: ara+eng
Voice transcription: ✅ whisper-cli + ffmpeg
Screen Recording: ✅ ok
Accessibility: ✅ ok
Scheduled tasks (3):
• task_001 — screenshot — Every weekday at 09:00
• task_002 — sysinfo — Every day at 09:00 and 17:00
• task_003 — backup — Every Saturday at 02:00
Recipes: 27 total (24 built-in, 3 user-defined)
Allowed Telegram user: 123456789
Session unlocked: true
The OCR languages line reflects what tesseract --list-langs returned at startup. If it shows eng only, install the Arabic pack to enable bilingual screen reading.
Single-user only. The bot accepts messages from exactly one Telegram user ID (set during first-run setup). Messages from any other account are silently dropped.
OTP session gate. Before any command is processed, the session must be unlocked with the current OTP code. The code is:
- Generated with Go's crypto/rand
- A 6-digit number in the range 100000–999999
- Stored in memory only, never written to disk or logged
- Automatically replaced every 2 minutes (new code printed to terminal)
- Invalidated on successful entry (cannot be reused within the same session)
X11-only. The startup check rejects runs under Wayland ($WAYLAND_DISPLAY set) and headless SSH sessions ($DISPLAY unset).
Capability checks. Before starting, Nullhand verifies that scrot can actually take a screenshot and that xdotool can query the active window. If either check fails, the process exits with a clear message.
No inbound network ports. Nullhand uses Telegram long-polling outbound only — there is no listening server or open port.
Configure the provider during first-run setup or edit ~/.nullhand/config.json.
| Provider | ai_provider value | Requires API key | Vision | Notes |
|---|---|---|---|---|
| Anthropic Claude | claude | Yes | Yes | Set ai_api_key |
| OpenAI | openai | Yes | Yes | Set ai_api_key; optional ai_base_url for proxies |
| Google Gemini | gemini | Yes | Yes | Set ai_api_key |
| DeepSeek | deepseek | Yes | No | Set ai_api_key |
| Grok (xAI) | grok | Yes | No | Set ai_api_key |
| Ollama (local LLM) | ollama | No | Model-dependent | Set ai_base_url and ai_model; use a vision model for screenshot analysis |
| Built-in rule-based | local | No | No | Zero cost, zero external dependency. Bilingual (English + Arabic). Routes to smart recipes for messaging, browser, settings, and button clicks |
Privacy note: Cloud providers (Claude, OpenAI, Gemini, DeepSeek, Grok) receive your commands, plus screenshots whenever the AI agent calls analyze_screenshot. If privacy matters, use Ollama or local.
| | Local AI (Ollama) | Cloud AI (Claude, GPT, etc.) |
|---|---|---|
| Privacy | 100% local | Data sent to provider servers |
| Cost | Free | Requires paid API key |
| Vision | Supported (vision models) | Supported |
| Internet | Only for Telegram | Required for AI + Telegram |
The local provider requires no API key, no network, and no external process. Use it to get started immediately or in air-gapped environments.
{
"ai_provider": "local"
}
What local understands out of the box:
- All basic primitives: open/close apps, click coordinates, type, press key, screenshot, paste, run shell, list/read files, scroll, wait
- WhatsApp / Slack / Discord / Messages send-to-contact flows (calls into smart recipes that wait for windows and OCR-click contact rows)
- Browser navigation: open URL, search, address-bar typing, back/forward/refresh, new/close tab
- System Settings: search inside settings, open named panel (WiFi, Bluetooth, Display, ...)
- Button click: "click the X button" / "اضغط زر X" — uses fuzzy AT-SPI match with OCR fallback
- Terminal commands, file browsing, git operations, VS Code/Cursor command-palette flows
Both English and Arabic phrasings are supported for every flow. See the Natural Language Examples section above for representative phrases.
Smart-pattern matching is priority-ordered: highly specific patterns (settings search, button click, app-specific messaging) are tried before generic ones (bare "search X" → Google) to avoid misclassification.
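Priority ordering is simply a first-match scan over an ordered list. The sketch below shows why order matters: with the generic pattern first, "search settings for wifi" would be misread as a Google search. The intent names and expressions are illustrative assumptions, not the parser's real tables.

```go
package main

import (
	"fmt"
	"regexp"
)

// pattern pairs an intent name with the expression that recognises it.
type pattern struct {
	intent string
	re     *regexp.Regexp
}

// patterns is tried top to bottom: specific intents before the generic
// "search X" catch-all.
var patterns = []pattern{
	{"settings_search", regexp.MustCompile(`(?i)^search settings for (.+)$`)},
	{"settings_search", regexp.MustCompile(`^ابحث في الإعدادات عن (.+)$`)},
	{"click_button", regexp.MustCompile(`(?i)^click the (.+) button$`)},
	{"google_search", regexp.MustCompile(`(?i)^search (?:for )?(.+)$`)},
}

// classify returns the first matching intent and its captured argument.
func classify(msg string) (intent, arg string) {
	for _, p := range patterns {
		if m := p.re.FindStringSubmatch(msg); m != nil {
			return p.intent, m[1]
		}
	}
	return "unknown", ""
}

func main() {
	fmt.Println(classify("search settings for wifi"))
	fmt.Println(classify("search for golang"))
}
```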
The local parser does not support vision (screenshot analysis by an LLM) or open-ended multi-step planning — for those, use Claude/OpenAI/Gemini/Ollama.
Ollama runs open-source LLMs locally. For full screenshot analysis support, use a vision model.
# 1. Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# 2. Pull a vision model (recommended — supports analyze_screenshot tool)
ollama pull qwen3-vl:8b # 6.1 GB download, needs ~8 GB RAM
# Or a smaller vision model if RAM is limited
ollama pull qwen3-vl:2b # 1.9 GB download, needs ~3-4 GB RAM
# 3. Start Ollama (if not already running as a service)
ollama serve
RAM requirements:
| Model | Download size | RAM needed | Quality |
|---|---|---|---|
| qwen3-vl:2b | 1.9 GB | ~3–4 GB | Good |
| qwen3-vl:8b | 6.1 GB | ~8 GB | Excellent |
Configure Nullhand to use Ollama:
```json
{
  "ai_provider": "ollama",
  "ai_model": "qwen3-vl:8b",
  "ai_base_url": "http://localhost:11434"
}
```
If you don't need screenshot analysis and want a lighter model:
```bash
ollama pull llama3
```

```json
{
  "ai_provider": "ollama",
  "ai_model": "llama3",
  "ai_base_url": "http://localhost:11434"
}
```
Symptom: `dial tcp: lookup api.telegram.org: server misbehaving` or connection timeout errors in the terminal.
Fix: Your DNS may not be resolving correctly. Run:
```bash
echo "nameserver 8.8.8.8" | sudo tee /etc/resolv.conf
```
Then restart the bot.
If on a VM, also try:
```bash
sudo systemctl restart NetworkManager
```
Symptom: `/screenshot` returns nothing, or `scrot` produces an empty file.
Fix 1: Set the `DISPLAY` variable:
```bash
export DISPLAY=:0
```
Add this to your `~/.bashrc` to make it permanent:
```bash
echo 'export DISPLAY=:0' >> ~/.bashrc && source ~/.bashrc
```
Fix 2: Allow local X11 connections:
```bash
xhost +local:
```
Run this after every login, or add it to your startup applications.

Fix 3: Verify scrot works manually:
```bash
DISPLAY=:0 scrot /tmp/test.png && echo "works" && ls -la /tmp/test.png
```
If the file is 0 bytes, your X11 session may not be properly initialized.
Symptom: $DISPLAY not set error on startup, or xdotool/scrot failing completely.
Cause: Nullhand requires an X11 session. Wayland is not supported in v1.
Fix: At the login screen, click the gear icon ⚙️ and select "Ubuntu on Xorg" (or your distro's equivalent X11 session) before logging in.
To verify you're on X11:
```bash
echo $XDG_SESSION_TYPE
```
This should output `x11`. If it outputs `wayland`, log out and select the Xorg session.
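A Go sketch of the same check, combined with the `DISPLAY` variable that the screenshot fixes rely on. `sessionProblems` is hypothetical; it takes an environment lookup function so it can be exercised without a real desktop session.

```go
package main

import (
	"fmt"
	"os"
)

// sessionProblems lists issues found via the supplied environment lookup.
func sessionProblems(getenv func(string) string) []string {
	var problems []string
	switch t := getenv("XDG_SESSION_TYPE"); t {
	case "x11":
		// Expected session type; nothing to report.
	case "":
		problems = append(problems, "XDG_SESSION_TYPE is unset; cannot confirm an X11 session")
	default:
		problems = append(problems, fmt.Sprintf("session type is %q; Nullhand v1 needs x11 (choose the Xorg session at login)", t))
	}
	if getenv("DISPLAY") == "" {
		problems = append(problems, "DISPLAY is not set; try export DISPLAY=:0")
	}
	return problems
}

func main() {
	problems := sessionProblems(os.Getenv)
	if len(problems) == 0 {
		fmt.Println("X11 session looks usable")
		return
	}
	for _, p := range problems {
		fmt.Println("problem:", p)
	}
}
```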
Symptom: /click, /type, /key return ✓ but nothing happens on screen.
Fix: Ensure DISPLAY is set and xdotool can reach the display:
```bash
export DISPLAY=:0
xdotool getactivewindow
```
If `getactivewindow` returns a window ID, xdotool is working correctly.
If it errors, your X11 session needs the local connection fix:
```bash
xhost +local:
```
Symptom: `/ocr` returns no text or random characters.
Cause: Tesseract may not be installed, or the screen content is purely graphical.
Fix:
```bash
sudo apt install tesseract-ocr
tesseract --version
```
Note: OCR works best on text-heavy screens. Purely graphical content (icons, images) returns little or no text; this is expected behavior.
Symptom: /paste returns empty or fails silently.
Fix: Ensure xclip is installed and DISPLAY is set:
```bash
sudo apt install xclip
export DISPLAY=:0
xclip -selection clipboard -o
```
If xclip errors with "Can't open display", run `xhost +local:` first.
Symptom: Natural language commands return `AI call failed: empty choices` or a similar error.
Cause: Your AI provider's API is unavailable or the API key has no credits.
Fix options:
- Switch to the built-in `local` provider (no API key needed): edit `~/.nullhand/config.json` and set `"ai_provider": "local"`.
- Check that your API key has credits in your provider's dashboard.
- Try a different AI provider.

Note: The `local` provider handles the built-in command and recipe flows (opening apps, screenshots, status, messaging, browser, and Settings flows) but does not support vision or open-ended multi-step tasks.
Symptom: Bot starts but no OTP box appears, or bot exits immediately.
Fix: Check the terminal output for error messages. Common causes:
- Missing dependencies → run the full `apt install` command from the Requirements section
- Wrong display session → ensure you're on X11, not Wayland
- Config file corrupted → delete `~/.nullhand/config.json` and run setup again:

```bash
rm ~/.nullhand/config.json && ./nullhand
```
Clipboard sharing between host and VM:
```bash
sudo apt install virtualbox-guest-x11
sudo reboot
```
Then in the VirtualBox menu: Devices → Shared Clipboard → Bidirectional.
No internet in VM:
- In VirtualBox Settings → Network → change to Bridged Adapter
- Select your active network adapter (WiFi or Ethernet) from the Name dropdown
- Start the VM and run:

```bash
sudo systemctl restart NetworkManager
echo "nameserver 8.8.8.8" | sudo tee /etc/resolv.conf
ping -c 4 google.com
```
Slow performance: Allocate more resources in VirtualBox Settings:
- RAM: 4096 MB recommended
- CPUs: 2 minimum
- Video Memory: 128 MB (Display settings)
Symptom: Scheduled tasks were set but never executed at the expected time.
Diagnosis:
- Confirm the tasks are loaded: `/schedule list` should show them.
- Verify the bot is actually running at the scheduled minute (it polls every 60s).
- Check that `~/.nullhand/schedule.json` exists and contains the expected entries.
- Tail `~/.nullhand/audit.log` and look for `scheduled_task` entries firing at the right hour:minute.
Persistence note: tasks are saved to ~/.nullhand/schedule.json whenever you add, cancel, or clear them, and reloaded automatically at startup. If the file is corrupted or unreadable, the bot logs a warning and starts with an empty schedule rather than failing to boot.
Run this to verify all required tools are installed:
```bash
which git go xdotool scrot wmctrl xclip convert tesseract && echo "✅ All dependencies found"
```
If any are missing:
```bash
sudo apt install -y git golang xdotool scrot wmctrl xclip imagemagick python3-pyatspi at-spi2-core desktop-file-utils tesseract-ocr
```
This is a Linux port of the original Nullhand by AzozzALFiras. Original repo: https://github.com/AzozzALFiras/Nullhand

To contribute to this Linux port, fork https://github.com/AzozzALFiras/Nullhand and open a pull request.
See LICENSE in the repository root.