Nullhand Linux

Control your Linux desktop from Telegram. Send a message, get a screenshot back. Type text, click coordinates, open apps, transfer files, schedule tasks — all through your phone.

What is this?

Nullhand is a Telegram bot that runs as a background process on your Linux machine and gives you full desktop control over chat. You send natural language or slash commands; the bot acts on your screen in real time.

It ships with an OTP session gate, a structured audit log, a built-in scheduler, bidirectional file transfer, and a pluggable AI backend (local rule-based parser included — no API key required to get started).

🚀 Looking for a Server Edition? Meet AXGhost For Your Servers

While Nullhand is your ultimate command center for desktop, we've built a specialized, enterprise-grade tool specifically for servers.

AXGhost is a zero-cloud, zero-AI Go daemon that transforms Telegram into a highly secure, natural-language administration console for your server infrastructure.

✨ Why AXGhost?

🌍 Bilingual Brilliance: Seamlessly execute commands using natural language in both Arabic and English.
🛡️ Ironclad Security: Strict whitelisted execution ensures only explicitly approved, safe operations can run.
🔐 100% Private (Zero-Cloud / Zero-AI): A completely self-contained daemon. No third-party APIs, no AI token limits, and absolutely no data leakage.
📜 Full Audit Trail: Comprehensive logging provides total visibility over every system action you take.

👉 Discover AXGhost on the AevonX Marketplace
💻 View the Open Source Repository on GitHub

Features

Natural language control — "open Firefox", "take a screenshot", "type Hello World"
Bilingual parser (English + Arabic) — "افتح فايرفوكس وروح إلى github.com", "ابحث في الإعدادات عن WiFi", "اضغط زر إرسال"
Smart recipes with state verification — recipes wait for windows to appear, OCR-click search results, and clear fields before typing instead of relying on fixed sleep timers
Multi-step app workflows out of the box — WhatsApp send (with contact-picker selection via OCR), browser URL navigation, Settings search, generic button click — all callable from one phrase
Author your own recipes from Telegram — "save this as recipe morning_routine: open Firefox, go to news.com" / "احفظ هذا كروتين الصباح: …" — parsed and persisted to ~/.nullhand/recipes.json
Voice notes → text — record a Telegram voice note in any language; Nullhand transcribes it via whisper.cpp (Arabic + English bilingual support) and runs it as if you typed it
Conversation memory — follow-up commands fall back to recently-used entities ("open Firefox" then "go to github.com" remembers Firefox; "send hi" remembers the last contact)
Preview / dry-run mode — see exactly what a command will do before running it: preview: … / معاينة: … returns a numbered execution plan with no side effects
Slash commands — explicit commands with arguments for scripting workflows
Inline quick-action menu — one tap for the most common actions
Screenshot & bilingual OCR — capture the screen, extract visible text in English or Arabic, or locate a specific phrase's pixel coordinates via Tesseract HOCR (auto-uses ara+eng when the Arabic language pack is installed)
Mouse & keyboard automation — click, double-click, right-click, drag, scroll, type, key shortcuts, clear field
Accessibility-aware element control — click UI elements by exact label, fuzzy substring match, or OCR fallback for Electron apps
App launcher — open GNOME/GTK/Snap applications by name (Linux) or .app bundles (macOS)
File transfer (bidirectional) — send files from your desktop to Telegram; receive files from Telegram to disk
Persistent scheduled tasks (cron-like) — set recurring screenshots, shell commands, or system info reports; supports daily, weekday, weekend, specific days (Mon/Wed/Fri), and multiple fire times per day; survives bot restarts via ~/.nullhand/schedule.json
Audit log — every action appended to ~/.nullhand/audit.log
OTP session lock — cryptographically random 6-digit code, auto-rotates every 2 minutes
Multiple AI backends — Claude, OpenAI, Gemini, DeepSeek, Grok, Ollama, or offline local mode
Interactive file browser — browse directories with inline keyboard navigation

How it works

You (Telegram)
│
▼
OTP Gate ──── locked? → ignore message
│
▼
Message Router (in order)
│
├── File received? ──────────────→ Destination picker → Save to disk
│
├── OCR trigger? ────────────────→ scrot → tesseract → reply text
│
├── Schedule command? ───────────→ Create/list/cancel task
│
├── File send request? ──────────→ Read file → zip if needed → send
│
├── Slash command? ──────────────→ Execute directly → reply
│
└── Everything else ─────────────→ AI Agent Loop
                                        │
                                   Take screenshot
                                        │
                                   Send to AI model
                                        │
                                   AI picks a tool
                                   (click/type/shell/open)
                                        │
                                   Execute on desktop
                                        │
                                   Take new screenshot
                                        │
                                   Done? → reply result
                                   Not done? → repeat

Every action is logged to ~/.nullhand/audit.log with timestamp and user ID.
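The routing order in the diagram can be sketched as an ordered list of predicates where the first match wins. This is a minimal illustration, not Nullhand's actual router: the `message` type, the route names, and every predicate below are assumptions made up for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// message is a simplified, hypothetical stand-in for an incoming Telegram update.
type message struct {
	Text    string
	HasFile bool
}

// route pairs a handler name with the predicate that selects it.
type route struct {
	name  string
	match func(m message) bool
}

// routes are checked top to bottom, mirroring the diagram's order.
// The predicates are illustrative guesses, not the real trigger rules.
var routes = []route{
	{"file_receive", func(m message) bool { return m.HasFile }},
	{"ocr", func(m message) bool {
		return m.Text == "ocr" || m.Text == "/ocr" || strings.Contains(m.Text, "read the screen")
	}},
	{"schedule", func(m message) bool {
		return strings.HasPrefix(m.Text, "/schedule") ||
			(strings.Contains(m.Text, "every ") && strings.Contains(m.Text, " at "))
	}},
	{"file_send", func(m message) bool {
		return strings.HasPrefix(m.Text, "send me /") || strings.HasPrefix(m.Text, "upload /")
	}},
	{"slash", func(m message) bool { return strings.HasPrefix(m.Text, "/") }},
}

// dispatch returns the name of the first matching route, or the agent loop.
func dispatch(m message) string {
	m.Text = strings.ToLower(strings.TrimSpace(m.Text))
	for _, r := range routes {
		if r.match(m) {
			return r.name
		}
	}
	return "agent_loop"
}

func main() {
	for _, text := range []string{
		"send me /home/user/backup.tar.gz every day at 2am",
		"send me /home/user/report.pdf",
		"/screenshot",
		"open Firefox",
	} {
		fmt.Printf("%-52q -> %s\n", text, dispatch(message{Text: text}))
	}
}
```

Order matters here: the first sample contains both a file path and a schedule phrase, and it lands on the schedule route only because that check runs before the file-send check.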

Requirements

System

Linux with an X11 session (Wayland is not supported in this version)
Log in with "Ubuntu on Xorg" (or equivalent) at the display manager
$DISPLAY must be set; $WAYLAND_DISPLAY must be unset
GNOME desktop recommended (some launcher entries are GNOME-specific)

Dependencies

Install all required tools in one command:

sudo apt install \
  xdotool \
  scrot \
  wmctrl \
  xclip \
  imagemagick \
  python3 \
  x11-xserver-utils \
  libgtk-3-bin \
  python3-pyatspi \
  at-spi2-core

For OCR support (optional but recommended):

sudo apt install tesseract-ocr            # English OCR
sudo apt install tesseract-ocr-ara        # Arabic UI text (recommended for Arabic users)

If both packages are present, Nullhand auto-uses ara+eng so click_text("إرسال") and wait_for_text("بحث") work alongside English. With only tesseract-ocr installed, OCR falls back to English-only and prints a one-line install hint at startup.

For voice-note transcription (optional but useful for mobile users):

# Linux: install ffmpeg + whisper.cpp manually
sudo apt install ffmpeg
git clone https://github.com/ggerganov/whisper.cpp && cd whisper.cpp && make
sudo cp build/bin/whisper-cli /usr/local/bin/
bash models/download-ggml-model.sh base    # downloads ~150 MB model

# macOS: Homebrew has both
brew install ffmpeg whisper-cpp

Verify with /health after starting the bot — look for the "Voice transcription" line.

Tool	Package	Purpose
`xdotool`	xdotool	Key presses, active window query
`scrot`	scrot	Screenshots
`wmctrl`	wmctrl	App listing and window focus
`xclip`	xclip	Clipboard read/write
`convert`	imagemagick	Screenshot resizing for HiDPI
`python3`	python3	Accessibility scripting
`xrandr`	x11-xserver-utils	Screen resolution detection
`gtk-launch`	libgtk-3-bin	App launcher via .desktop files
`python3-pyatspi`	python3-pyatspi	AT-SPI accessibility tree access
`at-spi2-core`	at-spi2-core	AT-SPI2 daemon
`tesseract`	tesseract-ocr	OCR — read text from screen
Arabic OCR pack	tesseract-ocr-ara	Recognise Arabic UI labels (auto-detected)

Go version

Go 1.21 or later

Telegram setup

Open Telegram and message @BotFather
Send /newbot and follow the prompts
Copy the bot token (format: 123456789:ABCdef...)
Get your Telegram user ID — message @userinfobot to find it
Start a private chat with your new bot before running Nullhand

Installation

# 1. Clone the repository
git clone https://github.com/AzozzALFiras/Nullhand
cd Nullhand

# 2. Install system dependencies (see Requirements above)
sudo apt install xdotool scrot wmctrl xclip imagemagick python3 \
  x11-xserver-utils libgtk-3-bin python3-pyatspi at-spi2-core

# 3. Build for Linux
GOOS=linux go build -o nullhand ./cmd/nullhand

# 4. Run
./nullhand

On first run, a setup wizard will prompt you for your Telegram bot token, your Telegram user ID, and your preferred AI provider. Configuration is saved to ~/.nullhand/config.json.

First Run & OTP

When Nullhand starts (and on every restart), it prints a one-time password to the terminal:

╔══════════════════════════════╗
║  OTP CODE: 482917            ║
║  Expires in 2 minutes        ║
╚══════════════════════════════╝

Enter this code in Telegram to unlock the bot.

You must send this exact 6-digit code to the bot in Telegram before any command is accepted. The code:

Is generated with crypto/rand — cryptographically random
Expires after 2 minutes and is automatically replaced with a new one (printed to terminal again)
Unlocks the session once entered correctly; the session then stays unlocked until you restart or use the Lock Bot button in /menu
Is stored in memory only — never written to disk

To re-lock the session manually, tap Lock Bot in /menu or press the menu:lock inline button.

Commands & Usage

Natural Language Examples

Just send a message in plain English or Arabic. The local rule-based parser handles both languages without an API key.

Basics

take a screenshot
what's my CPU usage
open Firefox
type Hello World
click at 960 540
press ctrl+t
read the screen
run git status in terminal
send me /home/user/report.pdf

Browser navigation — opens the browser, waits for the window, clears the address bar, types the URL, and hits Enter

open firefox and go to github.com
افتح فايرفوكس وروح إلى github.com
type google.com in the address bar
اكتب google.com في شريط العنوان
search for "go programming"
ابحث عن golang
new tab / علامة تبويب جديدة
back / ارجع
refresh / تحديث
close tab / أغلق التبويب

WhatsApp messaging — opens WhatsApp, opens new-chat, types the contact name, OCR-clicks the matching contact in the autocomplete list, then types and sends the message

open whatsapp and send azozz a message hello
ارسل لعزوز في الواتساب: مرحبا
واتساب عزوز: مرحبا
افتح واتساب وأرسل لعزوز رسالة مرحبا

Settings (GNOME / Cinnamon / KDE) — opens Settings, focuses the integrated search bar, types the query

search settings for wifi
ابحث في الإعدادات عن WiFi
open WiFi settings
افتح إعدادات WiFi

Click any visible button — tries AT-SPI fuzzy match first, falls back to OCR-locate-and-click for Electron apps

click the Send button
press OK
اضغط زر إرسال
انقر على زر حفظ
click Send in WhatsApp

Schedule recurring tasks (persisted across restarts, cron-like)

schedule a screenshot every day at 9am
remind me to run sysinfo every day at 14:00
schedule a screenshot every weekday at 9am
schedule a screenshot every Monday at 8:30am
schedule a screenshot every Mon and Wed and Fri at 9am
schedule a screenshot every weekend at 10am
schedule a screenshot every day at 9am and 5pm    ← multiple times

Voice notes — record a Telegram voice note in Arabic or English; Nullhand transcribes it with whisper.cpp and runs it as a normal command. The bot replies "🎙️ Heard: " then executes. Works with any of the natural-language patterns above.

Save your own recipes — the bot turns each step into a reusable recipe and writes it to ~/.nullhand/recipes.json

save this as recipe morning_routine: open Firefox, go to news.com, take a screenshot
احفظ هذا كروتين الصباح: افتح Firefox، روح إلى news.com، خذ لقطة شاشة

Run a saved recipe later:

recipe morning_routine
recipe الصباح

Conversation memory — follow-up commands fall back to recently-used entities

open Firefox
go to github.com           ← uses Firefox automatically
search for golang          ← still Firefox

ارسل لعزوز في الواتساب: مرحبا
ارسل: كيف الحال           ← resolves contact = عزوز

Preview / dry-run — see what a command will do before actually doing it

preview: open whatsapp and send azozz hi
dry-run: open firefox and go to github.com
معاينة: افتح فايرفوكس وروح إلى github.com
جرب: ابحث في الإعدادات عن WiFi

The bot replies with a numbered plan of every tool call (and recipe step) that would run, expanding run_recipe(...) calls so you see the full sequence. Nothing is executed.

Slash Commands

Command	Arguments	Description
`/start`	—	Welcome message and command list
`/help`	—	Show all available commands
`/screenshot`	—	Capture the full screen and send as photo
`/status`	—	CPU, memory, and active application info
`/apps`	—	List currently open windows
`/open`	`<app name>`	Open an application by name
`/ls`	`[path]`	List directory contents
`/read`	`<path>`	Read a file and return its contents
`/shell`	`<command>`	Run a whitelisted shell command
`/click`	`<x> <y>`	Click at the given screen coordinates
`/type`	`<text>`	Type text into the active window
`/key`	`<shortcut>`	Press a key or modifier combination
`/paste`	—	Get current clipboard contents
`/stop`	—	Cancel the currently running AI task
`/diag`	—	Show diagnostic info (frontmost app, screen size)
`/inspect`	—	Dump accessibility tree of the frontmost window
`/ocr`	—	Extract visible text from the screen
`/schedule`	`list` | `cancel <id>` | `clear`	Manage scheduled tasks
`/recipes`	(no args) | `<name>` | `show <name>` | `run <name> [k=v ...]` | `preview <name> [k=v ...]` | `delete <name>` | `rename <old> <new>`	Browse and manage built-in + user-saved recipes
`/health`	—	System health: OS, AI provider, OCR languages, permissions, scheduled tasks count, recipes count
`/menu`	—	Open the inline quick-action toolbar

Keyboard shortcut examples for /key:

/key enter
/key ctrl+t
/key ctrl+shift+5
/key escape
/key f5
/key super

Modifier aliases: cmd and command map to ctrl; option maps to alt.
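The alias rule can be pictured as a small lookup applied to each part of the shortcut. The sketch below is an assumption about how such normalisation might look; `normalizeShortcut` and `modifierAliases` are hypothetical names, and only the three aliases listed above are taken from this document.

```go
package main

import (
	"fmt"
	"strings"
)

// modifierAliases maps macOS-style modifier names to their Linux equivalents.
var modifierAliases = map[string]string{
	"cmd":     "ctrl",
	"command": "ctrl",
	"option":  "alt",
}

// normalizeShortcut lowercases a shortcut such as "Cmd+T" and rewrites aliases.
func normalizeShortcut(s string) string {
	parts := strings.Split(strings.ToLower(strings.TrimSpace(s)), "+")
	for i, p := range parts {
		p = strings.TrimSpace(p)
		if canonical, ok := modifierAliases[p]; ok {
			p = canonical
		}
		parts[i] = p
	}
	return strings.Join(parts, "+")
}

func main() {
	for _, s := range []string{"cmd+t", "Option+Shift+5", "escape"} {
		fmt.Printf("%s -> %s\n", s, normalizeShortcut(s))
	}
}
```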

Inline Menu (/menu)

Send /menu to get the quick-action toolbar with inline keyboard buttons:

Button	Action
📸 Screenshot	Capture and send the current screen
💻 System Info	Show CPU, memory, active app
📋 Clipboard	Read and return clipboard contents
🐚 Run Command	Prompt for a shell command, execute it
📤 Send File	Prompt for a file path, upload to Telegram
📥 Downloads	List `~/Downloads` directory
🔍 Read Screen	OCR — extract text from the current screen
🔒 Lock Bot	Lock the session; new OTP printed to terminal
❓ Help	Show natural language usage examples

Smart Recipes

Recipes are pre-built multi-step workflows that the bot can run by name. Unlike a blind keystroke macro, every recipe step verifies state before the next step fires — windows must appear, fields must be empty before typing, and contact pickers are selected via OCR rather than guessed Enter presses.

Why recipes (and not just raw click/type)?

A flow like "open WhatsApp, search for Azozz, send 'hi'" fails with naïve automation because:

The new-chat search box appears asynchronously after Ctrl+N
The autocomplete dropdown takes a variable amount of time to populate
Pressing Return on a typed name often jumps to the wrong contact
WhatsApp on Linux is Electron, so AT-SPI cannot see the contact list

Nullhand's recipe engine solves this by combining six step kinds:

Step kind	What it does	Used to fix
`wait_for_window`	Polls every 200 ms until a window with a matching title is active	App-launch race conditions
`wait_for_text`	Polls OCR every 400 ms until the requested phrase is visible on screen	Slow-loading dialogs and dropdowns
`wait_for_element`	Polls AT-SPI every 250 ms for an element matching a label substring	Native GTK/Qt apps
`click_text`	Locates a text region via OCR HOCR and clicks its bounding-box center	Electron apps where AT-SPI is blind (WhatsApp, Slack, VS Code, Discord)
`click_fuzzy`	AT-SPI substring match; falls back to `click_text` automatically	Buttons whose accessible name differs slightly from the visible label
`clear_field`	`Ctrl+A` then `Delete`	Replacing existing text in an address bar or search box
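The three `wait_for_*` steps share one idea: poll a condition at a fixed interval until it holds or a deadline passes. A minimal sketch of that loop follows, assuming a hypothetical `pollUntil` helper and a caller-chosen timeout (the timeout values are not stated in this document); the intervals are the ones from the table.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// Poll intervals taken from the step-kind table.
const (
	windowPoll  = 200 * time.Millisecond
	textPoll    = 400 * time.Millisecond
	elementPoll = 250 * time.Millisecond
)

var errTimeout = errors.New("condition not met before timeout")

// pollUntil calls check immediately and then once per interval until it
// returns true or the timeout elapses.
func pollUntil(timeout, interval time.Duration, check func() bool) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		if check() {
			return nil
		}
		select {
		case <-ctx.Done():
			return errTimeout
		case <-ticker.C:
		}
	}
}

func main() {
	// Simulated window that becomes active on the third poll.
	polls := 0
	windowReady := func() bool { polls++; return polls >= 3 }
	err := pollUntil(2*time.Second, windowPoll, windowReady)
	fmt.Println("polls:", polls, "err:", err)
}
```

Compared with a fixed sleep, the loop returns as soon as the condition holds and fails explicitly when it never does.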

Built-in recipes (selected)

Recipe	Parameters	What it does
`whatsapp_send_message`	`contact`, `message`	Open WhatsApp → wait for window → Ctrl+N → wait for "Search" → type contact → wait for autocomplete → OCR-click matching row → type message → Enter
`whatsapp_new_message`	`contact`	Same as above without sending — opens the chat ready for follow-up
`browser_open_url`	`browser`, `url`	Open browser → wait for window → Ctrl+L → clear field → type URL → Enter
`browser_google_search`	`browser`, `query`	Same flow but submits to Google
`browser_new_tab_and_search`	`browser`, `query`	Ctrl+T → clear → query → Enter
`browser_click_link`	`text`	OCR-click any visible link or button on the current page
`browser_back` / `browser_forward` / `browser_reload`	`browser`	Standard navigation shortcuts
`settings_open`	—	Open the system Settings app and wait for it
`settings_search`	`query`	Open Settings → Ctrl+F → clear → type query
`settings_open_panel`	`panel`	Open Settings → fuzzy-click the named panel (WiFi, Bluetooth, Display, ...)
`click_button`	`label`	Fuzzy-click a button in the frontmost app, OCR fallback included
`press_button_in_app`	`app`, `label`	Open `app`, wait, then fuzzy-click the labelled button inside it

The full list is available at runtime via the list_recipes tool or by reading internal/service/recipe/defaults.go. User-defined recipes can be added in ~/.nullhand/recipes.json to override or extend the defaults.

Calling a recipe

Most natural-language phrases route to a recipe automatically (see the examples above). To call one explicitly:

recipe whatsapp_send_message {"contact":"Azozz","message":"hi"}
recipe settings_search {"query":"WiFi"}
recipe click_button {"label":"إرسال"}

Or via the AI agent's tool call (when using a cloud provider):

run_recipe(name="browser_open_url", params_json='{"browser":"Firefox","url":"github.com"}')

Authoring your own recipes from Telegram

You don't need to edit JSON by hand. Send a single message that names the recipe and lists its steps separated by commas (or ;, then, and, ثم, or newlines). Each step is a normal natural-language phrase that the bot already understands.

English

save this as recipe morning_routine: open Firefox, go to news.com, take a screenshot
remember as routine slack_focus: open Slack, click the Channels button

Arabic

احفظ هذا كروتين الصباح: افتح Firefox، روح إلى news.com، خذ لقطة شاشة
احفظ روتين العمل: افتح Slack، انقر على زر Channels

The bot replies with ✅ Saved recipe "morning_routine" (3 step(s)) and writes the recipe to ~/.nullhand/recipes.json. Names are normalised to snake_case automatically. Run them by name later:

recipe morning_routine
recipe الصباح

Supported step types (any phrase that maps to one of these tools is allowed): open_app, type_text, press_key, wait, click_text, click_ui_element_fuzzy, wait_for_text, wait_for_window, wait_for_element, clear_field, focus_via_palette, focus_text_field. Coordinate-based clicks and run_recipe (nesting) are intentionally rejected so saved recipes stay portable across screens.
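To make the separator rule concrete, here is a rough sketch of splitting a recipe body into steps and normalising a name to snake_case. It is an illustration under assumptions: `splitSteps`, `snakeCase`, and the regular expression are hypothetical, and the Arabic comma is included only because the Arabic examples above use it.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
	"unicode"
)

// stepSep matches punctuation separators (comma, semicolon, Arabic comma,
// newline) or the connective words "then", "and", "ثم" surrounded by spaces.
var stepSep = regexp.MustCompile(`(?i)\s*[,;،\n]\s*|\s+(?:then|and|ثم)\s+`)

// splitSteps breaks a recipe body into trimmed, non-empty step phrases.
func splitSteps(body string) []string {
	var steps []string
	for _, p := range stepSep.Split(body, -1) {
		if p = strings.TrimSpace(p); p != "" {
			steps = append(steps, p)
		}
	}
	return steps
}

// snakeCase lowercases a name and joins runs of letters/digits with "_".
func snakeCase(name string) string {
	var b strings.Builder
	pendingSep := false
	for _, r := range strings.ToLower(strings.TrimSpace(name)) {
		if unicode.IsLetter(r) || unicode.IsDigit(r) {
			if pendingSep && b.Len() > 0 {
				b.WriteByte('_')
			}
			pendingSep = false
			b.WriteRune(r)
		} else {
			pendingSep = true
		}
	}
	return b.String()
}

func main() {
	body := "open Firefox, go to news.com, take a screenshot"
	for i, s := range splitSteps(body) {
		fmt.Printf("%d. %s\n", i+1, s)
	}
	fmt.Println(snakeCase("Morning Routine"))
}
```

Splitting on a bare "and" is a blunt rule: a step whose own text contains " and " would be cut in two, so a real parser likely needs more context than this sketch has.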

Managing recipes from Telegram

Browse and curate the recipe library through /recipes:

/recipes                               # full list (built-in + your own)
/recipes show whatsapp_send_message    # see steps of one recipe
/recipes run morning_routine           # execute a recipe by name
/recipes run browser_open_url browser=Firefox url=github.com
/recipes preview morning_routine       # dry-run; show steps without executing
/recipes delete old_routine            # remove a user-saved recipe
/recipes rename old_name new_name      # rename a user recipe

Built-in recipes are protected — you can't delete or rename them, but you can shadow any default by saving a new recipe with the same name (the user file overrides the default).
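The `run <name> [k=v ...]` form maps naturally onto a string map of parameters. The sketch below shows one way to parse it; `parseRun` is a hypothetical helper, and it assumes values contain no spaces, which this document does not specify.

```go
package main

import (
	"fmt"
	"strings"
)

// parseRun extracts the recipe name and k=v parameters from a
// "/recipes run <name> [k=v ...]" message.
func parseRun(text string) (name string, params map[string]string, err error) {
	fields := strings.Fields(text)
	if len(fields) < 3 || fields[0] != "/recipes" || fields[1] != "run" {
		return "", nil, fmt.Errorf("usage: /recipes run <name> [k=v ...]")
	}
	params = make(map[string]string, len(fields)-3)
	for _, arg := range fields[3:] {
		k, v, ok := strings.Cut(arg, "=")
		if !ok || k == "" {
			return "", nil, fmt.Errorf("expected key=value, got %q", arg)
		}
		params[k] = v
	}
	return fields[2], params, nil
}

func main() {
	name, params, err := parseRun("/recipes run browser_open_url browser=Firefox url=github.com")
	fmt.Println(name, params, err)
}
```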

Conversation memory

The local parser remembers the most recent browser, contact, URL, and search query per chat (10-minute window). Follow-up commands without an explicit subject fall back to the remembered entity:

open Firefox                  → opens Firefox
go to github.com              → opens github.com in Firefox (not the default)
ابحث عن golang               → searches in Firefox

ارسل لعزوز في الواتساب: hi   → context saved: contact=عزوز
ارسل: كيف الحال              → contact=عزوز implied

Memory is per-chat and not persisted across bot restarts.
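A per-chat, in-memory store with a 10-minute expiry could look like the sketch below. The type and method names are hypothetical, and one detail is an assumption: the window is measured from the last update, and reading the context does not extend it.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// memoryTTL is the 10-minute window described above.
const memoryTTL = 10 * time.Minute

// entities holds the most recent values a follow-up command may fall back to.
type entities struct{ Browser, Contact, URL, Query string }

type slot struct {
	ent     entities
	updated time.Time
}

// memory keeps one slot per chat ID; nothing is written to disk.
type memory struct {
	mu    sync.Mutex
	ttl   time.Duration
	chats map[int64]slot
}

func newMemory(ttl time.Duration) *memory {
	return &memory{ttl: ttl, chats: make(map[int64]slot)}
}

// remember applies update to the chat's context and refreshes its timestamp.
func (m *memory) remember(chatID int64, now time.Time, update func(*entities)) {
	m.mu.Lock()
	defer m.mu.Unlock()
	s := m.chats[chatID]
	if now.Sub(s.updated) > m.ttl {
		s = slot{} // stale context is discarded rather than merged
	}
	update(&s.ent)
	s.updated = now
	m.chats[chatID] = s
}

// recall returns the chat's context if it is still inside the window.
func (m *memory) recall(chatID int64, now time.Time) (entities, bool) {
	m.mu.Lock()
	defer m.mu.Unlock()
	s, ok := m.chats[chatID]
	if !ok || now.Sub(s.updated) > m.ttl {
		delete(m.chats, chatID)
		return entities{}, false
	}
	return s.ent, true
}

// sampleNow is a fixed clock value so the demo is deterministic.
var sampleNow = time.Date(2026, 4, 17, 9, 0, 0, 0, time.UTC)

func main() {
	m := newMemory(memoryTTL)
	m.remember(42, sampleNow, func(e *entities) { e.Browser = "Firefox" })
	ctx, ok := m.recall(42, sampleNow.Add(3*time.Minute))
	fmt.Println(ctx.Browser, ok)
	_, ok = m.recall(42, sampleNow.Add(11*time.Minute))
	fmt.Println(ok)
}
```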

File Transfer

Sending a file from Linux to Telegram

Natural language:

send me /home/user/documents/report.pdf

Upload keyword with path:

upload /var/log/syslog

Menu button: tap Send File in /menu, then enter the path when prompted.

How it works:

Files under 50 MB are sent directly
Files over 50 MB and entire directories are automatically zipped before sending
The file type determines the Telegram method: images use sendPhoto, everything else uses sendDocument
Temporary zip files are always cleaned up after sending
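The rules above amount to a small decision function. The following sketch encodes them under a few assumptions: `planSend` and `sendPlan` are hypothetical names, 50 MB is taken as 50 × 1024 × 1024 bytes, and images are recognised by a short, illustrative extension list.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// directLimit is the size above which a file is zipped before sending.
// Treating "50 MB" as 50 MiB is an assumption.
const directLimit int64 = 50 << 20

// sendPlan records whether to zip first and which Telegram method to call.
type sendPlan struct {
	Zip    bool
	Method string
}

// planSend applies the rules: directories and oversized files are zipped and
// sent as documents; images go through sendPhoto; the rest use sendDocument.
func planSend(path string, size int64, isDir bool) sendPlan {
	if isDir || size > directLimit {
		return sendPlan{Zip: true, Method: "sendDocument"}
	}
	switch strings.ToLower(filepath.Ext(path)) {
	case ".png", ".jpg", ".jpeg": // illustrative list, not exhaustive
		return sendPlan{Method: "sendPhoto"}
	}
	return sendPlan{Method: "sendDocument"}
}

func main() {
	fmt.Printf("%+v\n", planSend("/home/user/shot.png", 2<<20, false))
	fmt.Printf("%+v\n", planSend("/home/user/report.pdf", 2<<20, false))
	fmt.Printf("%+v\n", planSend("/home/user/projects", 0, true))
}
```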

Receiving a file from Telegram

Simply send or forward any file (document, photo, video, audio) to the bot. You will be asked where to save it:

📥 Where should I save "report.pdf"?
[ 🏠 Home ]  [ 🖥️ Desktop ]
[ 📥 Downloads ]  [ ✏️ Custom path ]

Tap a button to save to that location, or tap Custom path and type a full directory path (e.g. /home/user/projects/).

If a file with the same name already exists, a timestamp is appended automatically (report_20260417_153012.pdf).
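The suffix in that example follows a `YYYYMMDD_HHMMSS` pattern placed before the extension. A small sketch of the renaming, with `timestampedName` and `resolveName` as hypothetical helper names:

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
	"time"
)

// timestampedName inserts a YYYYMMDD_HHMMSS stamp before the file extension.
func timestampedName(name string, now time.Time) string {
	ext := filepath.Ext(name)
	base := strings.TrimSuffix(name, ext)
	return fmt.Sprintf("%s_%s%s", base, now.Format("20060102_150405"), ext)
}

// resolveName keeps the original name unless exists reports a collision.
func resolveName(name string, exists func(string) bool, now time.Time) string {
	if !exists(name) {
		return name
	}
	return timestampedName(name, now)
}

// sampleTime is a fixed clock value so the demo is deterministic.
var sampleTime = time.Date(2026, 4, 17, 15, 30, 12, 0, time.UTC)

func main() {
	taken := func(n string) bool { return n == "report.pdf" }
	fmt.Println(resolveName("report.pdf", taken, sampleTime)) // report_20260417_153012.pdf
	fmt.Println(resolveName("notes.txt", taken, sampleTime))  // notes.txt
}
```

In real use the `exists` callback would wrap a check such as `os.Stat` on the destination directory.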

OCR

Nullhand can read text visible on screen using Tesseract OCR. Both English and Arabic UI text are supported when the matching language pack is installed.

Requires:

# Linux
sudo apt install tesseract-ocr            # English (always required)
sudo apt install tesseract-ocr-ara        # Arabic UI text (recommended)

# macOS
brew install tesseract                    # core
brew install tesseract-lang               # all languages including Arabic

Trigger via natural language:

read the screen
what does the screen say
read text on screen
ocr
extract text from screen
what's written on screen
اقرأ الشاشة / لقطة شاشة مع نص

Trigger via slash command:

/ocr

Trigger via menu button: tap Read Screen in /menu.

How it works:

Full screenshot is captured via scrot (Linux) or screencapture (macOS)
Screenshot is written to a temp file
tesseract <file> stdout -l <langs> is executed — <langs> is auto-detected: ara+eng if the Arabic pack is installed, otherwise eng
Output is trimmed and truncated to 4096 characters (Telegram message limit)
Temp file is deleted immediately after
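Two of the steps above are easy to show in isolation: choosing the `-l` value from the output of `tesseract --list-langs`, and trimming the result to fit one Telegram message. This is a sketch with hypothetical helper names; counting the limit in runes is a simplification, since Telegram's own accounting may differ for some characters.

```go
package main

import (
	"fmt"
	"strings"
)

// telegramLimit is the message length cap mentioned above.
const telegramLimit = 4096

// pickLangs parses `tesseract --list-langs` output and returns the value to
// pass to -l: "ara+eng" when both packs are present, otherwise "eng".
func pickLangs(listOutput string) string {
	hasAra, hasEng := false, false
	for _, line := range strings.Split(listOutput, "\n") {
		switch strings.TrimSpace(line) {
		case "ara":
			hasAra = true
		case "eng":
			hasEng = true
		}
	}
	if hasAra && hasEng {
		return "ara+eng"
	}
	return "eng"
}

// truncate trims surrounding whitespace and keeps at most max runes, so
// multi-byte Arabic text is never cut in the middle of a character.
func truncate(s string, max int) string {
	r := []rune(strings.TrimSpace(s))
	if len(r) > max {
		r = r[:max]
	}
	return string(r)
}

func main() {
	sample := "List of available languages (3):\nara\neng\nosd\n"
	fmt.Println(pickLangs(sample))
	fmt.Println(len([]rune(truncate(strings.Repeat("ب", 5000), telegramLimit))))
}
```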

The same auto-detected language list also drives click_text(...) and wait_for_text(...) — meaning Arabic-labelled buttons like إرسال can be located on screen without any configuration once the Arabic pack is installed.

If Tesseract is missing entirely, the bot responds with the install command rather than crashing. If only English is installed, you'll see a one-line hint at startup suggesting how to add Arabic.

Scheduled Tasks

Schedule recurring tasks using natural language or slash commands. Tasks are persisted to ~/.nullhand/schedule.json and automatically reloaded on bot restart.

Cron-like schedule grammar

Beyond the basic "every day at 9am" form, the parser understands:

Pattern	Example	Fires
Daily (default)	`every day at 9am`	Every day at 09:00
Specific weekday	`every Monday at 8am`	Mondays at 08:00
Multiple weekdays	`every Mon and Wed and Fri at 9am`	M/W/F at 09:00
Weekday group	`every weekday at 9am`	Mon-Fri at 09:00
Weekend group	`every weekend at 10am`	Sat+Sun at 10:00
Multiple times	`every day at 9am and 5pm`	Twice daily
Combined	`every weekday at 9am and 1pm and 5pm`	Mon-Fri × 3 times
Arabic weekdays	`every الإثنين at 9am`	Same as `every Monday at 9am`

Creating a task (natural language)

The bot detects schedule intent when your message contains phrases like "every", "schedule", or "remind me to" and at least one time token.

schedule a screenshot every day at 9am

remind me to run sysinfo every day at 8:30am

run git status every day at 14:00

send me /home/user/backup.tar.gz every day at 2am

read screen every day at 9pm

Supported time formats: 8am, 8:30am, 14:00, 9pm
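A parser for these four forms needs to handle an optional am/pm suffix and optional minutes. The sketch below is one possible implementation, not the project's parser; `parseClock` is a hypothetical name, and rejecting a bare number such as "9" is a design choice made for this example.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseClock converts "8am", "8:30am", "14:00", or "9pm" to a 24-hour time.
func parseClock(s string) (hour, minute int, err error) {
	s = strings.ToLower(strings.TrimSpace(s))
	suffix := ""
	if strings.HasSuffix(s, "am") || strings.HasSuffix(s, "pm") {
		suffix, s = s[len(s)-2:], strings.TrimSpace(s[:len(s)-2])
	}
	hh, mm, hasMinutes := strings.Cut(s, ":")
	if hour, err = strconv.Atoi(hh); err != nil {
		return 0, 0, fmt.Errorf("bad hour in %q", s)
	}
	if hasMinutes {
		if minute, err = strconv.Atoi(mm); err != nil {
			return 0, 0, fmt.Errorf("bad minute in %q", s)
		}
	}
	if suffix != "" {
		if hour < 1 || hour > 12 {
			return 0, 0, fmt.Errorf("12-hour clock needs an hour from 1 to 12, got %d", hour)
		}
		hour %= 12 // 12am becomes 0; 12pm becomes 0 and is shifted below
		if suffix == "pm" {
			hour += 12
		}
	} else if !hasMinutes {
		return 0, 0, fmt.Errorf("%q needs am/pm or an HH:MM form", s)
	}
	if hour < 0 || hour > 23 || minute < 0 || minute > 59 {
		return 0, 0, fmt.Errorf("time out of range: %d:%d", hour, minute)
	}
	return hour, minute, nil
}

func main() {
	for _, s := range []string{"8am", "8:30am", "14:00", "9pm"} {
		h, m, err := parseClock(s)
		fmt.Printf("%-7s -> %02d:%02d (err=%v)\n", s, h, m, err)
	}
}
```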

Supported actions:

Phrase contains	Scheduled action
`screenshot`	Capture screen, send as photo
`sysinfo`, `cpu`, `status`, `system info`	Send system status report
`read screen`, `ocr`	Run OCR and send text
`run <cmd>` or `shell <cmd>`	Run shell command, send output
`send` + a `/path`	Send file to Telegram

Managing tasks

/schedule list

/schedule cancel task_001

/schedule clear

Example output of /schedule list:

📋 Active scheduled tasks:
🆔 task_001 — screenshot — every day at 09:00
🆔 task_002 — sysinfo — every day at 14:00

Use /schedule cancel <id> to remove a task.

Implementation detail: the scheduler aligns to the next whole minute on start, then checks every minute. Panics in task callbacks are recovered and logged.
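Both behaviours in that note can be sketched briefly: computing the delay to the next whole minute, and wrapping a task so a panic is logged instead of killing the process. The helper names are hypothetical and the real scheduler may be structured differently.

```go
package main

import (
	"fmt"
	"log"
	"time"
)

// untilNextMinute returns how long to sleep so the first check happens
// exactly on a minute boundary.
func untilNextMinute(now time.Time) time.Duration {
	return now.Truncate(time.Minute).Add(time.Minute).Sub(now)
}

// runSafely executes task and reports whether it panicked; the panic value
// is logged and swallowed so the scheduler loop keeps running.
func runSafely(id string, task func()) (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			log.Printf("scheduled task %s panicked: %v", id, r)
			panicked = true
		}
	}()
	task()
	return false
}

// sampleNow is a fixed clock value so the demo is deterministic.
var sampleNow = time.Date(2026, 4, 17, 9, 31, 5, 0, time.UTC)

func main() {
	fmt.Println("first check in", untilNextMinute(sampleNow)) // first check in 55s
	fmt.Println("panicked:", runSafely("task_001", func() { panic("boom") }))
}
```

A loop built on these would sleep for `untilNextMinute(time.Now())` once, then use a one-minute `time.Ticker` and call `runSafely` for each due task.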

Audit Log

Every action is appended to ~/.nullhand/audit.log.

Log format:

[2026-04-17 09:31:05] user=123456789 action=screenshot
[2026-04-17 09:32:11] user=123456789 action=shell cmd="git status"
[2026-04-17 09:33:00] user=123456789 action=file_send path="/home/user/report.pdf"
[2026-04-17 09:34:45] user=123456789 action=otp_unlock
[2026-04-17 09:35:00] user=123456789 action=schedule_create id="task_001"
[2026-04-17 09:40:00] user=123456789 action=scheduled_task id="task_001"
[2026-04-17 09:41:10] user=123456789 action=natural_language input="open Firefox and go to..."

Actions logged:

Action	Triggered by
`otp_unlock`	Successful OTP entry
`otp_lock`	Lock Bot button
`screenshot`	`/screenshot`, menu button, or AI tool
`shell`	`/shell`, menu button, or AI tool
`app_open`	`/open` command
`clipboard`	`/paste`, menu button
`sysinfo`	`/status`, menu button
`ocr`	`/ocr`, natural language, menu button
`file_send`	File send trigger
`file_receive`	File received from Telegram
`downloads`	Downloads menu button
`natural_language`	Free-form AI task (first 80 chars logged)
`recipe_save`	User-authored recipe saved to `~/.nullhand/recipes.json`
`recipe_save_failed`	Recipe parsed but disk write failed
`recipe_run`	`/recipes run …` invocation
`recipe_preview`	`/recipes preview …` dry-run
`recipe_delete` / `recipe_rename`	`/recipes delete …` or `/recipes rename …`
`voice_received`	Voice note arrived (duration + size logged)
`voice_transcribed`	Whisper produced a transcript (first 80 chars logged)
`health`	`/health` invocation
`preview`	"preview: …" / "dry-run: …" inline preview
`schedule_create`	New scheduled task
`schedule_cancel`	Task cancelled
`scheduled_task`	Scheduled task fired

The log directory (~/.nullhand/) is created with mode 0700. The log file has mode 0600. Logging failures are silently swallowed so a disk error never crashes the bot.
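An append-only writer matching the format and permissions above might look like this sketch. `formatEntry` and `appendEntry` are hypothetical names; the demo writes to a temporary directory rather than `~/.nullhand/`, and ignoring the write error mirrors the behaviour described above.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"time"
)

// formatEntry renders one audit line in the format shown above.
// detail is optional extra key=value text such as cmd="git status".
func formatEntry(ts time.Time, userID int64, action, detail string) string {
	line := fmt.Sprintf("[%s] user=%d action=%s", ts.Format("2006-01-02 15:04:05"), userID, action)
	if detail != "" {
		line += " " + detail
	}
	return line + "\n"
}

// appendEntry creates dir with mode 0700 and appends to audit.log with
// mode 0600. Errors are returned so the caller can decide to ignore them.
func appendEntry(dir, line string) error {
	if err := os.MkdirAll(dir, 0o700); err != nil {
		return err
	}
	f, err := os.OpenFile(filepath.Join(dir, "audit.log"),
		os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o600)
	if err != nil {
		return err
	}
	defer f.Close()
	_, err = f.WriteString(line)
	return err
}

// sampleTime is a fixed clock value so the demo is deterministic.
var sampleTime = time.Date(2026, 4, 17, 9, 32, 11, 0, time.UTC)

func main() {
	dir, err := os.MkdirTemp("", "nullhand-audit-demo")
	if err != nil {
		fmt.Println("could not create temp dir:", err)
		return
	}
	defer os.RemoveAll(dir)

	line := formatEntry(sampleTime, 123456789, "shell", `cmd="git status"`)
	_ = appendEntry(dir, line) // a failed write is ignored on purpose
	fmt.Print(line)
}
```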

Read the log:

cat ~/.nullhand/audit.log

Tail it live:

tail -f ~/.nullhand/audit.log

Voice Notes

Record a voice note inside Telegram in any language and Nullhand will transcribe it via whisper.cpp, then run the resulting text through the normal command pipeline. Useful when typing on a phone is awkward — especially for Arabic.

Requires:

# Linux
sudo apt install ffmpeg
# whisper.cpp: build from https://github.com/ggerganov/whisper.cpp
# and put the resulting `whisper-cli` binary on PATH
# Download a model file too (~150 MB for ggml-base, supports Arabic):
#   bash whisper.cpp/models/download-ggml-model.sh base
# Then point whisper-cli to it via -m or the WHISPER_MODEL env var.

# macOS
brew install ffmpeg
brew install whisper-cpp
# Models go under /opt/homebrew/share/whisper-cpp/ or similar; whisper-cli
# auto-discovers if WHISPER_MODEL is unset.

Pipeline:

Bot sees voice field on the Telegram update
Downloads the .ogg via getFile/download
ffmpeg converts to 16 kHz mono WAV
whisper-cli <wav> -otxt -nt -l ar produces a .txt transcript
Bot replies "🎙️ Heard: " then re-routes the transcript through the normal handler
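The two external commands in this pipeline can be expressed as argument lists ready for `os/exec`. This is a sketch under assumptions: the function names are hypothetical, the `-y` overwrite flag is an addition for the example, and the whisper arguments simply repeat the invocation shown in the pipeline without a model path.

```go
package main

import (
	"fmt"
	"strings"
)

// ffmpegArgs converts a Telegram .ogg voice note to 16 kHz mono WAV.
// -ar sets the sample rate and -ac the channel count.
func ffmpegArgs(in, out string) []string {
	return []string{"-y", "-i", in, "-ar", "16000", "-ac", "1", out}
}

// whisperArgs mirrors the invocation listed above: text output, no
// timestamps, Arabic language hint.
func whisperArgs(wav string) []string {
	return []string{wav, "-otxt", "-nt", "-l", "ar"}
}

func main() {
	// In a real pipeline these would be passed to exec.Command and run in
	// sequence; here they are only printed.
	fmt.Println("ffmpeg", strings.Join(ffmpegArgs("voice.ogg", "voice.wav"), " "))
	fmt.Println("whisper-cli", strings.Join(whisperArgs("voice.wav"), " "))
}
```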

The default language hint is Arabic (-l ar) because the multilingual Whisper model, when given an Arabic hint, handles English code-switching gracefully (the reverse is not true). Send preview: … first if you want to check the transcription before it executes.

If whisper or ffmpeg is missing, the bot replies with a clear error and the install command — it never crashes silently. Verify install with /health (look for the "Voice transcription" line).

Health Diagnostics

/health returns a single-message snapshot of the bot's runtime state — useful for triaging "why doesn't X work?" questions without leaving Telegram.

Sample output:

🩺 Nullhand health report

Platform: linux/amd64
AI provider: local
OCR languages: ara+eng
Voice transcription: ✅ whisper-cli + ffmpeg
Screen Recording: ✅ ok
Accessibility:    ✅ ok

Scheduled tasks (3):
  • task_001 — screenshot — Every weekday at 09:00
  • task_002 — sysinfo — Every day at 09:00 and 17:00
  • task_003 — backup — Every Saturday at 02:00

Recipes: 27 total (24 built-in, 3 user-defined)

Allowed Telegram user: 123456789
Session unlocked: true

The OCR languages line reflects what tesseract --list-langs returned at startup. If it shows eng only, install the Arabic pack to enable bilingual screen reading.

Security

Single-user only. The bot accepts messages from exactly one Telegram user ID (set during first-run setup). Messages from any other account are silently dropped.

OTP session gate. Before any command is processed, the session must be unlocked with the current OTP code. The code is:

Generated with Go's crypto/rand
A 6-digit number in the range 100000–999999
Stored in memory only, never written to disk or logged
Automatically replaced every 2 minutes (new code printed to terminal)
Invalidated on successful entry (cannot be reused within the same session)
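Generating a code with these properties takes only a few lines with `crypto/rand`. The sketch below is illustrative (`newOTP` is a hypothetical name); it draws uniformly from 900000 values and shifts the result into the 100000–999999 range, which avoids modulo bias.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"math/big"
)

// newOTP returns a uniformly random 6-digit code in [100000, 999999].
func newOTP() (string, error) {
	// rand.Int draws uniformly from [0, 900000).
	n, err := rand.Int(rand.Reader, big.NewInt(900000))
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("%06d", n.Int64()+100000), nil
}

func main() {
	code, err := newOTP()
	if err != nil {
		fmt.Println("could not generate OTP:", err)
		return
	}
	fmt.Println("OTP CODE:", code)
}
```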

X11-only. The startup check rejects runs under Wayland ($WAYLAND_DISPLAY set) and headless SSH sessions ($DISPLAY unset).
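The environment check described here reduces to two variable lookups. A minimal sketch, with `checkSession` as a hypothetical name and the lookup function injected so the logic can be exercised without changing the real environment:

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// checkSession rejects Wayland sessions and sessions without an X display.
func checkSession(getenv func(string) string) error {
	if getenv("WAYLAND_DISPLAY") != "" {
		return errors.New("running under Wayland; log in with an Xorg session")
	}
	if getenv("DISPLAY") == "" {
		return errors.New("DISPLAY is not set; no X11 session available")
	}
	return nil
}

func main() {
	if err := checkSession(os.Getenv); err != nil {
		fmt.Println("startup check failed:", err)
		return
	}
	fmt.Println("X11 session looks usable")
}
```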

Capability checks. Before starting, Nullhand verifies that scrot can actually take a screenshot and that xdotool can query the active window. If either check fails, the process exits with a clear message.

No inbound network ports. Nullhand uses Telegram long-polling outbound only — there is no listening server or open port.

AI Providers

Configure the provider during first-run setup or edit ~/.nullhand/config.json.

Provider	`ai_provider` value	Requires API key	Vision	Notes
Anthropic Claude	`claude`	Yes	Yes	Set `ai_api_key`
OpenAI	`openai`	Yes	Yes	Set `ai_api_key`; optional `ai_base_url` for proxies
Google Gemini	`gemini`	Yes	Yes	Set `ai_api_key`
DeepSeek	`deepseek`	Yes	No	Set `ai_api_key`
Grok (xAI)	`grok`	Yes	No	Set `ai_api_key`
Ollama (local LLM)	`ollama`	No	Model-dependent	Set `ai_base_url` and `ai_model`; use a vision model for screenshot analysis
Built-in rule-based	`local`	No	No	Zero cost, zero external dependency. Bilingual (English + Arabic). Routes to smart recipes for messaging, browser, settings, and button clicks

Privacy note: Cloud providers (Claude, OpenAI, Gemini, DeepSeek, Grok) receive your commands and screenshots when the AI agent calls analyze_screenshot. If privacy matters, use Ollama or local.

	Local AI (Ollama)	Cloud AI (Claude, GPT, etc.)
Privacy	100% local	Data sent to provider servers
Cost	Free	Requires paid API key
Vision	Supported (vision models)	Supported
Internet	Only for Telegram	Required for AI + Telegram

Local AI Setup

Option 1 — Built-in rule-based parser (zero dependencies)

The local provider requires no API key, no network, and no external process. Use it to get started immediately or in air-gapped environments.

{
  "ai_provider": "local"
}

What local understands out of the box:

All basic primitives: open/close apps, click coordinates, type, press key, screenshot, paste, run shell, list/read files, scroll, wait
WhatsApp / Slack / Discord / Messages send-to-contact flows (calls into smart recipes that wait for windows and OCR-click contact rows)
Browser navigation: open URL, search, address-bar typing, back/forward/refresh, new/close tab
System Settings: search inside settings, open named panel (WiFi, Bluetooth, Display, ...)
Button click: "click the X button" / "اضغط زر X" — uses fuzzy AT-SPI match with OCR fallback
Terminal commands, file browsing, git operations, VS Code/Cursor command-palette flows

Both English and Arabic phrasings are supported for every flow. See the Natural Language Examples section above for representative phrases.

Smart-pattern matching is priority-ordered: highly specific patterns (settings search, button click, app-specific messaging) are tried before generic ones (bare "search X" → Google) to avoid misclassification.
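
To illustrate why the ordering matters, here is a minimal sketch of first-match-wins classification. The rule names and regular expressions are hypothetical and are not Nullhand's real rule table; they only show how a specific pattern placed ahead of a generic one keeps "search settings for WiFi" from being treated as a web search.

```go
package main

import (
	"fmt"
	"regexp"
)

// rule pairs a pattern with the intent it maps to.
// These names and patterns are hypothetical, for illustration only.
type rule struct {
	intent  string
	pattern *regexp.Regexp
}

// rules is ordered from most specific to most generic; the first match wins.
var rules = []rule{
	{"settings_search", regexp.MustCompile(`(?i)^search (?:in )?settings for (.+)$`)},
	{"button_click", regexp.MustCompile(`(?i)^click the (.+) button$`)},
	{"web_search", regexp.MustCompile(`(?i)^search (.+)$`)},
}

// classify returns the intent and captured argument of the first matching rule.
func classify(input string) (intent, arg string, ok bool) {
	for _, r := range rules {
		if m := r.pattern.FindStringSubmatch(input); m != nil {
			return r.intent, m[1], true
		}
	}
	return "", "", false
}

func main() {
	inputs := []string{"search settings for WiFi", "search cats", "click the Send button"}
	for _, s := range inputs {
		intent, arg, ok := classify(s)
		fmt.Printf("%q -> intent=%s arg=%q matched=%v\n", s, intent, arg, ok)
	}
}
```

If the generic `web_search` rule were listed first, it would also match "search settings for WiFi" and capture "settings for WiFi" as the query, which is the misclassification the ordering avoids.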

The local parser does not support vision (screenshot analysis by an LLM) or open-ended multi-step planning — for those, use Claude/OpenAI/Gemini/Ollama.

Option 2 — Ollama (recommended for full AI capability)

Ollama runs open-source LLMs locally. For full screenshot analysis support, use a vision model.

# 1. Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# 2. Pull a vision model (recommended — supports analyze_screenshot tool)
ollama pull qwen3-vl:8b    # 6.1 GB download, needs ~8 GB RAM

# Or a smaller vision model if RAM is limited
ollama pull qwen3-vl:2b    # 1.9 GB download, needs ~3-4 GB RAM

# 3. Start Ollama (if not already running as a service)
ollama serve

RAM requirements:

Model	Download size	RAM needed	Quality
`qwen3-vl:2b`	1.9 GB	~3–4 GB	Good
`qwen3-vl:8b`	6.1 GB	~8 GB	Excellent

Configure Nullhand to use Ollama:

{
  "ai_provider": "ollama",
  "ai_model": "qwen3-vl:8b",
  "ai_base_url": "http://localhost:11434"
}

If you don't need screenshot analysis and want a lighter model:

ollama pull llama3

{
  "ai_provider": "ollama",
  "ai_model": "llama3",
  "ai_base_url": "http://localhost:11434"
}
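
Before pointing Nullhand at Ollama, it can help to confirm that the server is reachable and that the configured model has actually been pulled. The sketch below is a standalone check, not part of Nullhand. It assumes Ollama's `GET /api/tags` endpoint, which lists locally available models, and the `hasModel` helper is a name introduced here for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"time"
)

// tagsResponse mirrors only the part of the /api/tags reply used here.
type tagsResponse struct {
	Models []struct {
		Name string `json:"name"`
	} `json:"models"`
}

// hasModel reports whether the /api/tags body lists the wanted model.
// A name given without a tag is also compared against "<name>:latest".
func hasModel(body []byte, want string) (bool, error) {
	var tr tagsResponse
	if err := json.Unmarshal(body, &tr); err != nil {
		return false, err
	}
	for _, m := range tr.Models {
		if m.Name == want || m.Name == want+":latest" {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	// Same values as ai_base_url and ai_model in the config above.
	baseURL, model := "http://localhost:11434", "qwen3-vl:8b"

	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Get(baseURL + "/api/tags")
	if err != nil {
		fmt.Println("Ollama is not reachable:", err)
		return
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Println("could not read response:", err)
		return
	}
	found, err := hasModel(body, model)
	if err != nil {
		fmt.Println("unexpected response format:", err)
		return
	}
	if !found {
		fmt.Printf("model %s is not pulled; run: ollama pull %s\n", model, model)
		return
	}
	fmt.Println("Ollama is running and the model is available")
}
```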

Troubleshooting

Bot can't connect to Telegram

Symptom: dial tcp: lookup api.telegram.org: server misbehaving or connection timeout errors in terminal.

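Fix: Your DNS may not be resolving correctly. As a quick fix, point the resolver at a public DNS server. Note that this overwrites /etc/resolv.conf, and NetworkManager or systemd-resolved may regenerate the file later: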

echo "nameserver 8.8.8.8" | sudo tee /etc/resolv.conf

Then restart the bot.

If on a VM, also try:

sudo systemctl restart NetworkManager

Screenshot not working / scrot fails silently

Symptom: /screenshot returns nothing, or scrot produces an empty file.

Fix 1 — Set DISPLAY variable:

export DISPLAY=:0

Add this to your ~/.bashrc to make it permanent:

echo 'export DISPLAY=:0' >> ~/.bashrc && source ~/.bashrc

Fix 2 — Allow local X11 connections:

xhost +local:

Run this after every login, or add it to your startup applications.

Fix 3 — Verify scrot works manually:

DISPLAY=:0 scrot /tmp/test.png && echo "works" && ls -la /tmp/test.png

If the file is 0 bytes, your X11 session may not be properly initialized.

Must use X11, not Wayland

Symptom: $DISPLAY not set error on startup, or xdotool/scrot failing completely.

Cause: Nullhand requires an X11 session. Wayland is not supported in v1.

Fix: At the login screen, click the gear icon ⚙️ and select "Ubuntu on Xorg" (or your distro's equivalent X11 session) before logging in.

To verify you're on X11:

echo $XDG_SESSION_TYPE

This should output x11. If it outputs wayland, log out and select the Xorg session.
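
The two environment checks in this section can be combined into one small preflight program. This is a sketch under the assumption that `XDG_SESSION_TYPE` and `DISPLAY` are the only variables worth inspecting; `sessionProblem` is a name introduced here and does not describe Nullhand's own startup checks.

```go
package main

import (
	"fmt"
	"os"
)

// sessionProblem returns a description of the problem, or "" if the
// session looks usable for X11 tools such as xdotool and scrot.
func sessionProblem(sessionType, display string) string {
	switch {
	case sessionType == "wayland":
		return "session is Wayland; log out and select an Xorg session"
	case display == "":
		return "DISPLAY is not set; try: export DISPLAY=:0"
	default:
		return ""
	}
}

func main() {
	p := sessionProblem(os.Getenv("XDG_SESSION_TYPE"), os.Getenv("DISPLAY"))
	if p != "" {
		fmt.Println("problem:", p)
		return
	}
	fmt.Println("X11 session looks usable")
}
```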

xdotool not working / click and type commands fail

Symptom: /click, /type, /key return ✓ but nothing happens on screen.

Fix: Ensure DISPLAY is set and xdotool can reach the display:

export DISPLAY=:0
xdotool getactivewindow

If getactivewindow returns a window ID, xdotool is working correctly. If it errors, your X11 session needs the local connection fix:

xhost +local:

OCR returns empty or garbled text

Symptom: /ocr returns no text or random characters.

Cause: Tesseract may not be installed, or the screen content is purely graphical.

Fix:

sudo apt install tesseract-ocr
tesseract --version

Note: OCR works best on text-heavy screens. Purely graphical content (icons, images) will return little or no text — this is expected behavior.

Clipboard commands not working (/paste returns empty)

Symptom: /paste returns empty or fails silently.

Fix: Ensure xclip is installed and DISPLAY is set:

sudo apt install xclip
export DISPLAY=:0
xclip -selection clipboard -o

If xclip errors with "Can't open display", run xhost +local: first.

AI agent not responding / "empty choices" error

Symptom: Natural language commands return AI call failed: empty choices or similar.

Cause: Your AI provider's API is unavailable or the API key has no credits.

Fix options:

Switch to the built-in local provider (no API key needed): Edit ~/.nullhand/config.json and set "ai_provider": "local"
Check your API key has credits at your provider's dashboard
Try a different AI provider

Note: The local provider handles rule-based commands and the built-in smart recipes (open app, screenshot, status, messaging, browser navigation) but does not support vision or open-ended multi-step planning.

OTP code not showing / bot not starting

Symptom: Bot starts but no OTP box appears, or bot exits immediately.

Fix: Check the terminal output for error messages. Common causes:

Missing dependencies → run the full apt install command from the Requirements section
Wrong display session → ensure you're on X11 not Wayland
Config file corrupted → delete ~/.nullhand/config.json and run setup again:

rm ~/.nullhand/config.json && ./nullhand

Running in a VirtualBox VM

Clipboard sharing between host and VM:

sudo apt install virtualbox-guest-x11
sudo reboot

Then in VirtualBox menu: Devices → Shared Clipboard → Bidirectional

No internet in VM:

In VirtualBox Settings → Network → change to Bridged Adapter
Select your active network adapter (WiFi or Ethernet) from the Name dropdown
Start VM and run:

sudo systemctl restart NetworkManager
echo "nameserver 8.8.8.8" | sudo tee /etc/resolv.conf
ping google.com

Slow performance: Allocate more resources in VirtualBox Settings:

RAM: 4096 MB recommended
CPUs: 2 minimum
Video Memory: 128 MB (Display settings)

Scheduled tasks not firing

Symptom: Scheduled tasks were set but never executed at the expected time.

Diagnosis:

Confirm the tasks are loaded — /schedule list should show them.
Verify the bot is actually running at the scheduled minute (it polls every 60s).
Check ~/.nullhand/schedule.json exists and contains the expected entries.
Tail ~/.nullhand/audit.log and look for scheduled_task entries firing at the right hour:minute.

Persistence note: tasks are saved to ~/.nullhand/schedule.json whenever you add, cancel, or clear them, and reloaded automatically at startup. If the file is corrupted or unreadable, the bot logs a warning and starts with an empty schedule rather than failing to boot.

General dependency check

Run this to verify all required tools are installed:

which git go xdotool scrot wmctrl xclip convert tesseract && echo "✅ All dependencies found"

If any are missing:

sudo apt install -y git golang xdotool scrot wmctrl xclip imagemagick python3-pyatspi at-spi2-core desktop-file-utils tesseract-ocr

Contributing

This is a Linux port of the original Nullhand by AzozzALFiras. Original repo: https://github.com/AzozzALFiras/Nullhand

To contribute to this Linux port, fork https://github.com/AzozzALFiras/Nullhand and open a pull request.

License

See LICENSE in the repository root.
