AzozzALFiras/Nullhand

Nullhand Linux

Control your Linux desktop from Telegram. Send a message, get a screenshot back. Type text, click coordinates, open apps, transfer files, schedule tasks — all through your phone.


What is this?

Nullhand is a Telegram bot that runs as a background process on your Linux machine and gives you full desktop control over chat. You send natural language or slash commands; the bot acts on your screen in real time.

It ships with an OTP session gate, a structured audit log, a built-in scheduler, bidirectional file transfer, and a pluggable AI backend (local rule-based parser included — no API key required to get started).


🚀 Looking for a Server Edition? Meet AXGhost For Your Servers

While Nullhand is your ultimate command center for desktop, we've built a specialized, enterprise-grade tool specifically for servers.

AXGhost is a zero-cloud, zero-AI Go daemon that transforms Telegram into a highly secure, natural-language administration console for your server infrastructure.

Why AXGhost?

  • 🌍 Bilingual Brilliance: Seamlessly execute commands using natural language in both Arabic and English.
  • 🛡️ Ironclad Security: Strict whitelisted execution ensures only explicitly approved, safe operations can run.
  • 🔐 100% Private (Zero-Cloud / Zero-AI): A completely self-contained daemon. No third-party APIs, no AI token limits, and absolutely no data leakage.
  • 📜 Full Audit Trail: Comprehensive logging provides total visibility over every system action you take.

👉 Discover AXGhost on the AevonX Marketplace 💻 View the Open Source Repository on GitHub


Features

  • Natural language control — "open Firefox", "take a screenshot", "type Hello World"
  • Bilingual parser (English + Arabic) — "افتح فايرفوكس وروح إلى github.com", "ابحث في الإعدادات عن WiFi", "اضغط زر إرسال"
  • Smart recipes with state verification — recipes wait for windows to appear, OCR-click search results, and clear fields before typing instead of relying on fixed sleep timers
  • Multi-step app workflows out of the box — WhatsApp send (with contact-picker selection via OCR), browser URL navigation, Settings search, generic button click — all callable from one phrase
  • Author your own recipes from Telegram — "save this as recipe morning_routine: open Firefox, go to news.com" / "احفظ هذا كروتين الصباح: …" — parsed and persisted to ~/.nullhand/recipes.json
  • Voice notes → text — record a Telegram voice note in any language; Nullhand transcribes it via whisper.cpp (Arabic + English bilingual support) and runs it as if you typed it
  • Conversation memory — follow-up commands fall back to recently-used entities ("open Firefox" then "go to github.com" remembers Firefox; "send hi" remembers the last contact)
  • Preview / dry-run mode — see exactly what a command will do before running it: preview: … / معاينة: … returns a numbered execution plan with no side effects
  • Slash commands — explicit commands with arguments for scripting workflows
  • Inline quick-action menu — one tap for the most common actions
  • Screenshot & bilingual OCR — capture the screen, extract visible text in English or Arabic, or locate a specific phrase's pixel coordinates via Tesseract HOCR (auto-uses ara+eng when the Arabic language pack is installed)
  • Mouse & keyboard automation — click, double-click, right-click, drag, scroll, type, key shortcuts, clear field
  • Accessibility-aware element control — click UI elements by exact label, fuzzy substring match, or OCR fallback for Electron apps
  • App launcher — open GNOME/GTK/Snap applications by name (Linux) or .app bundles (macOS)
  • File transfer (bidirectional) — send files from your desktop to Telegram; receive files from Telegram to disk
  • Persistent scheduled tasks (cron-like) — set recurring screenshots, shell commands, or system info reports; supports daily, weekday, weekend, specific days (Mon/Wed/Fri), and multiple fire times per day; survives bot restarts via ~/.nullhand/schedule.json
  • Audit log — every action appended to ~/.nullhand/audit.log
  • OTP session lock — cryptographically random 6-digit code, auto-rotates every 2 minutes
  • Multiple AI backends — Claude, OpenAI, Gemini, DeepSeek, Grok, Ollama, or offline local mode
  • Interactive file browser — browse directories with inline keyboard navigation

How it works

You (Telegram)
│
▼
OTP Gate ──── locked? → ignore message
│
▼
Message Router (in order)
│
├── File received? ──────────────→ Destination picker → Save to disk
│
├── OCR trigger? ────────────────→ scrot → tesseract → reply text
│
├── Schedule command? ───────────→ Create/list/cancel task
│
├── File send request? ──────────→ Read file → zip if needed → send
│
├── Slash command? ──────────────→ Execute directly → reply
│
└── Everything else ─────────────→ AI Agent Loop
                                        │
                                   Take screenshot
                                        │
                                   Send to AI model
                                        │
                                   AI picks a tool
                                   (click/type/shell/open)
                                        │
                                   Execute on desktop
                                        │
                                   Take new screenshot
                                        │
                                   Done? → reply result
                                   Not done? → repeat

Every action is logged to ~/.nullhand/audit.log with timestamp and user ID.


Requirements

System

  • Linux with an X11 session (Wayland is not supported in this version)
  • Log in with "Ubuntu on Xorg" (or equivalent) at the display manager
  • $DISPLAY must be set; $WAYLAND_DISPLAY must be unset
  • GNOME desktop recommended (some launcher entries are GNOME-specific)

Dependencies

Install all required tools in one command:

sudo apt install \
  xdotool \
  scrot \
  wmctrl \
  xclip \
  imagemagick \
  python3 \
  x11-xserver-utils \
  libgtk-3-bin \
  python3-pyatspi \
  at-spi2-core

For OCR support (optional but recommended):

sudo apt install tesseract-ocr            # English OCR
sudo apt install tesseract-ocr-ara        # Arabic UI text (recommended for Arabic users)

If both packages are present, Nullhand auto-uses ara+eng so click_text("إرسال") and wait_for_text("بحث") work alongside English. With only tesseract-ocr installed, OCR falls back to English-only and prints a one-line install hint at startup.

For voice-note transcription (optional but useful for mobile users):

# Linux: install ffmpeg + whisper.cpp manually
sudo apt install ffmpeg
git clone https://github.com/ggerganov/whisper.cpp && cd whisper.cpp && make
sudo cp build/bin/whisper-cli /usr/local/bin/
bash models/download-ggml-model.sh base    # downloads ~150 MB model

# macOS: Homebrew has both
brew install ffmpeg whisper-cpp

Verify with /health after starting the bot — look for the "Voice transcription" line.

| Tool | Package | Purpose |
|---|---|---|
| xdotool | xdotool | Key presses, active window query |
| scrot | scrot | Screenshots |
| wmctrl | wmctrl | App listing and window focus |
| xclip | xclip | Clipboard read/write |
| convert | imagemagick | Screenshot resizing for HiDPI |
| python3 | python3 | Accessibility scripting |
| xrandr | x11-xserver-utils | Screen resolution detection |
| gtk-launch | libgtk-3-bin | App launcher via .desktop files |
| python3-pyatspi | python3-pyatspi | AT-SPI accessibility tree access |
| at-spi2-core | at-spi2-core | AT-SPI2 daemon |
| tesseract | tesseract-ocr | OCR: read text from screen |
| Arabic OCR pack | tesseract-ocr-ara | Recognise Arabic UI labels (auto-detected) |

Go version

Go 1.21 or later

Telegram setup

  1. Open Telegram and message @BotFather
  2. Send /newbot and follow the prompts
  3. Copy the bot token (format: 123456789:ABCdef...)
  4. Get your Telegram user ID — message @userinfobot to find it
  5. Start a private chat with your new bot before running Nullhand

Installation

# 1. Clone the repository
git clone https://github.com/AzozzALFiras/Nullhand
cd Nullhand

# 2. Install system dependencies (see Requirements above)
sudo apt install xdotool scrot wmctrl xclip imagemagick python3 \
  x11-xserver-utils libgtk-3-bin python3-pyatspi at-spi2-core

# 3. Build for Linux
GOOS=linux go build -o nullhand ./cmd/nullhand

# 4. Run
./nullhand

On first run, a setup wizard will prompt you for your Telegram bot token, your Telegram user ID, and your preferred AI provider. Configuration is saved to ~/.nullhand/config.json.


First Run & OTP

When Nullhand starts (and on every restart), it prints a one-time password to the terminal:

╔══════════════════════════════╗
║  OTP CODE: 482917            ║
║  Expires in 2 minutes        ║
╚══════════════════════════════╝

Enter this code in Telegram to unlock the bot.

You must send this exact 6-digit code to the bot in Telegram before any command is accepted. The code:

  • Is generated with crypto/rand — cryptographically random
  • Expires after 2 minutes and is automatically replaced with a new one (printed to terminal again)
  • Once entered correctly, the session stays unlocked until you restart or use the Lock Bot button in /menu
  • Is stored in memory only — never written to disk

To re-lock the session manually, tap Lock Bot in /menu or press the menu:lock inline button.


Commands & Usage

Natural Language Examples

Just send a message in plain English or Arabic. The local rule-based parser handles both languages without an API key.

Basics

take a screenshot
what's my CPU usage
open Firefox
type Hello World
click at 960 540
press ctrl+t
read the screen
run git status in terminal
send me /home/user/report.pdf

Browser navigation — opens the browser, waits for the window, clears the address bar, types the URL, and hits Enter

open firefox and go to github.com
افتح فايرفوكس وروح إلى github.com
type google.com in the address bar
اكتب google.com في شريط العنوان
search for "go programming"
ابحث عن golang
new tab / علامة تبويب جديدة
back / ارجع
refresh / تحديث
close tab / أغلق التبويب

WhatsApp messaging — opens WhatsApp, opens new-chat, types the contact name, OCR-clicks the matching contact in the autocomplete list, then types and sends the message

open whatsapp and send azozz a message hello
ارسل لعزوز في الواتساب: مرحبا
واتساب عزوز: مرحبا
افتح واتساب وأرسل لعزوز رسالة مرحبا

Settings (GNOME / Cinnamon / KDE) — opens Settings, focuses the integrated search bar, types the query

search settings for wifi
ابحث في الإعدادات عن WiFi
open WiFi settings
افتح إعدادات WiFi

Click any visible button — tries AT-SPI fuzzy match first, falls back to OCR-locate-and-click for Electron apps

click the Send button
press OK
اضغط زر إرسال
انقر على زر حفظ
click Send in WhatsApp

Schedule recurring tasks (persisted across restarts, cron-like)

schedule a screenshot every day at 9am
remind me to run sysinfo every day at 14:00
schedule a screenshot every weekday at 9am
schedule a screenshot every Monday at 8:30am
schedule a screenshot every Mon and Wed and Fri at 9am
schedule a screenshot every weekend at 10am
schedule a screenshot every day at 9am and 5pm    ← multiple times

Voice notes: record a Telegram voice note in Arabic or English; Nullhand transcribes it with whisper.cpp and runs it as a normal command. The bot replies "🎙️ Heard: <transcript>" and then executes the transcript. Works with any of the natural-language patterns above.

Save your own recipes — the bot turns each step into a reusable recipe and writes it to ~/.nullhand/recipes.json

save this as recipe morning_routine: open Firefox, go to news.com, take a screenshot
احفظ هذا كروتين الصباح: افتح Firefox، روح إلى news.com، خذ لقطة شاشة

Run a saved recipe later:

recipe morning_routine
recipe الصباح

Conversation memory — follow-up commands fall back to recently-used entities

open Firefox
go to github.com           ← uses Firefox automatically
search for golang          ← still Firefox

ارسل لعزوز في الواتساب: مرحبا
ارسل: كيف الحال           ← resolves contact = عزوز from memory

Preview / dry-run — see what a command will do before actually doing it

preview: open whatsapp and send azozz hi
dry-run: open firefox and go to github.com
معاينة: افتح فايرفوكس وروح إلى github.com
جرب: ابحث في الإعدادات عن WiFi

The bot replies with a numbered plan of every tool call (and recipe step) that would run, expanding run_recipe(...) calls so you see the full sequence. Nothing is executed.

Slash Commands (table)

| Command | Arguments | Description |
|---|---|---|
| /start | | Welcome message and command list |
| /help | | Show all available commands |
| /screenshot | | Capture the full screen and send as photo |
| /status | | CPU, memory, and active application info |
| /apps | | List currently open windows |
| /open | <app name> | Open an application by name |
| /ls | [path] | List directory contents |
| /read | <path> | Read a file and return its contents |
| /shell | <command> | Run a whitelisted shell command |
| /click | <x> <y> | Click at the given screen coordinates |
| /type | <text> | Type text into the active window |
| /key | <shortcut> | Press a key or modifier combination |
| /paste | | Get current clipboard contents |
| /stop | | Cancel the currently running AI task |
| /diag | | Show diagnostic info (frontmost app, screen size) |
| /inspect | | Dump accessibility tree of the frontmost window |
| /ocr | | Extract visible text from the screen |
| /schedule | list, cancel <id>, or clear | Manage scheduled tasks |
| /recipes | (none), <name>, show <name>, run <name> [k=v ...], preview <name> [k=v ...], delete <name>, or rename <old> <new> | Browse and manage built-in + user-saved recipes |
| /health | | System health: OS, AI provider, OCR languages, permissions, scheduled tasks count, recipes count |
| /menu | | Open the inline quick-action toolbar |

Keyboard shortcut examples for /key:

/key enter
/key ctrl+t
/key ctrl+shift+5
/key escape
/key f5
/key super

Modifier aliases: cmd and command map to ctrl; option maps to alt.
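The alias rule above can be sketched in a few lines of Go. This is an illustration only, not Nullhand's actual code: the function name normalizeShortcut and the choice to lowercase the whole shortcut are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// modifierAliases maps macOS-style modifier names to the Linux equivalents
// listed above. Any other token passes through unchanged.
var modifierAliases = map[string]string{
	"cmd":     "ctrl",
	"command": "ctrl",
	"option":  "alt",
}

// normalizeShortcut lowercases a shortcut such as "Cmd+Shift+T" (lowercasing
// is an assumption) and rewrites aliased modifiers.
func normalizeShortcut(s string) string {
	parts := strings.Split(strings.ToLower(strings.TrimSpace(s)), "+")
	for i, p := range parts {
		p = strings.TrimSpace(p)
		if alias, ok := modifierAliases[p]; ok {
			p = alias
		}
		parts[i] = p
	}
	return strings.Join(parts, "+")
}

func main() {
	fmt.Println(normalizeShortcut("cmd+t"))     // ctrl+t
	fmt.Println(normalizeShortcut("Option+F4")) // alt+f4
}
```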

Inline Menu (/menu)

Send /menu to get the quick-action toolbar with inline keyboard buttons:

| Button | Action |
|---|---|
| 📸 Screenshot | Capture and send the current screen |
| 💻 System Info | Show CPU, memory, active app |
| 📋 Clipboard | Read and return clipboard contents |
| 🐚 Run Command | Prompt for a shell command, execute it |
| 📤 Send File | Prompt for a file path, upload to Telegram |
| 📥 Downloads | List ~/Downloads directory |
| 🔍 Read Screen | OCR: extract text from the current screen |
| 🔒 Lock Bot | Lock the session; new OTP printed to terminal |
| ❓ Help | Show natural language usage examples |

Smart Recipes

Recipes are pre-built multi-step workflows that the bot can run by name. Unlike a blind keystroke macro, every recipe step verifies state before the next step fires — windows must appear, fields must be empty before typing, and contact pickers are selected via OCR rather than guessed Enter presses.

Why recipes (and not just raw click/type)?

A flow like "open WhatsApp, search for Azozz, send 'hi'" fails with naïve automation because:

  • The new-chat search box appears asynchronously after Ctrl+N
  • The autocomplete dropdown takes a variable amount of time to populate
  • Pressing Return on a typed name often jumps to the wrong contact
  • WhatsApp on Linux is Electron, so AT-SPI cannot see the contact list

Nullhand's recipe engine solves this by combining six step kinds:

| Step kind | What it does | Used to fix |
|---|---|---|
| wait_for_window | Polls every 200 ms until a window with a matching title is active | App-launch race conditions |
| wait_for_text | Polls OCR every 400 ms until the requested phrase is visible on screen | Slow-loading dialogs and dropdowns |
| wait_for_element | Polls AT-SPI every 250 ms for an element matching a label substring | Native GTK/Qt apps |
| click_text | Locates a text region via OCR HOCR and clicks its bounding-box center | Electron apps where AT-SPI is blind (WhatsApp, Slack, VS Code, Discord) |
| click_fuzzy | AT-SPI substring match; falls back to click_text automatically | Buttons whose accessible name differs slightly from the visible label |
| clear_field | Ctrl+A then Delete | Replacing existing text in an address bar or search box |

Built-in recipes (selected)

| Recipe | Parameters | What it does |
|---|---|---|
| whatsapp_send_message | contact, message | Open WhatsApp → wait for window → Ctrl+N → wait for "Search" → type contact → wait for autocomplete → OCR-click matching row → type message → Enter |
| whatsapp_new_message | contact | Same as above without sending; opens the chat ready for follow-up |
| browser_open_url | browser, url | Open browser → wait for window → Ctrl+L → clear field → type URL → Enter |
| browser_google_search | browser, query | Same flow but submits to Google |
| browser_new_tab_and_search | browser, query | Ctrl+T → clear → query → Enter |
| browser_click_link | text | OCR-click any visible link or button on the current page |
| browser_back / browser_forward / browser_reload | browser | Standard navigation shortcuts |
| settings_open | (none) | Open the system Settings app and wait for it |
| settings_search | query | Open Settings → Ctrl+F → clear → type query |
| settings_open_panel | panel | Open Settings → fuzzy-click the named panel (WiFi, Bluetooth, Display, ...) |
| click_button | label | Fuzzy-click a button in the frontmost app, OCR fallback included |
| press_button_in_app | app, label | Open app, wait, then fuzzy-click the labelled button inside it |

The full list is available at runtime via the list_recipes tool or by reading internal/service/recipe/defaults.go. User-defined recipes can be added in ~/.nullhand/recipes.json to override or extend the defaults.

Calling a recipe

Most natural-language phrases route to a recipe automatically (see the examples above). To call one explicitly:

recipe whatsapp_send_message {"contact":"Azozz","message":"hi"}
recipe settings_search {"query":"WiFi"}
recipe click_button {"label":"إرسال"}

Or via the AI agent's tool call (when using a cloud provider):

run_recipe(name="browser_open_url", params_json='{"browser":"Firefox","url":"github.com"}')

Authoring your own recipes from Telegram

You don't need to edit JSON by hand. Send a single message that names the recipe and lists its steps separated by commas (or ;, then, and, ثم, or newlines). Each step is a normal natural-language phrase that the bot already understands.

English

save this as recipe morning_routine: open Firefox, go to news.com, take a screenshot
remember as routine slack_focus: open Slack, click the Channels button

Arabic

احفظ هذا كروتين الصباح: افتح Firefox، روح إلى news.com، خذ لقطة شاشة
احفظ روتين العمل: افتح Slack، انقر على زر Channels

The bot replies with ✅ Saved recipe "morning_routine" (3 step(s)) and writes the recipe to ~/.nullhand/recipes.json. Names are normalised to snake_case automatically. Run them by name later:

recipe morning_routine
recipe الصباح

Supported step types (any phrase that maps to one of these tools is allowed): open_app, type_text, press_key, wait, click_text, click_ui_element_fuzzy, wait_for_text, wait_for_window, wait_for_element, clear_field, focus_via_palette, focus_text_field. Coordinate-based clicks and run_recipe (nesting) are intentionally rejected so saved recipes stay portable across screens.
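As a rough illustration of the authoring format, the sketch below splits a step list on the separators named above and normalises a recipe name to snake_case. The regular expressions and the helper names splitSteps and snakeCase are assumptions, not the project's parser. The Arabic comma is included because the Arabic examples use it.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// stepSep matches commas (Latin or Arabic), semicolons, newlines, and the
// standalone words "then", "and", and "ثم".
var stepSep = regexp.MustCompile(`(?i)\s*[,،;\n]\s*|\s+(?:then|and|ثم)\s+`)

// nonWord matches any run of characters that are neither letters nor digits.
var nonWord = regexp.MustCompile(`[^\p{L}\p{N}]+`)

// splitSteps returns the non-empty, trimmed steps of a recipe body.
func splitSteps(body string) []string {
	var steps []string
	for _, s := range stepSep.Split(body, -1) {
		if s = strings.TrimSpace(s); s != "" {
			steps = append(steps, s)
		}
	}
	return steps
}

// snakeCase lowercases a name and joins its words with underscores.
func snakeCase(name string) string {
	return strings.Trim(nonWord.ReplaceAllString(strings.ToLower(name), "_"), "_")
}

func main() {
	msg := "save this as recipe Morning Routine: open Firefox, go to news.com, take a screenshot"
	head, body, _ := strings.Cut(msg, ":")
	name := snakeCase(strings.TrimPrefix(head, "save this as recipe "))
	steps := splitSteps(body)
	fmt.Printf("%s (%d steps): %q\n", name, len(steps), steps)
}
```

One consequence of treating "and" as a separator is that a step such as "type salt and pepper" would be split in two, so a real parser needs extra rules for quoted or typed text.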

Managing recipes from Telegram

Browse and curate the recipe library through /recipes:

/recipes                               # full list (built-in + your own)
/recipes show whatsapp_send_message    # see steps of one recipe
/recipes run morning_routine           # execute a recipe by name
/recipes run browser_open_url browser=Firefox url=github.com
/recipes preview morning_routine       # dry-run; show steps without executing
/recipes delete old_routine            # remove a user-saved recipe
/recipes rename old_name new_name      # rename a user recipe

Built-in recipes are protected — you can't delete or rename them, but you can shadow any default by saving a new recipe with the same name (the user file overrides the default).

Conversation memory

The local parser remembers the most recent browser, contact, URL, and search query per chat (10-minute window). Follow-up commands without an explicit subject fall back to the remembered entity:

open Firefox                  → opens Firefox
go to github.com              → opens github.com in Firefox (not the default)
ابحث عن golang               → searches in Firefox

ارسل لعزوز في الواتساب: hi   → remembers contact=عزوز
ارسل: كيف الحال              → contact=عزوز implied

Memory is per-chat and not persisted across bot restarts.


File Transfer

Sending a file from Linux to Telegram

Natural language:

send me /home/user/documents/report.pdf

Upload keyword with path:

upload /var/log/syslog

Menu button: tap Send File in /menu, then enter the path when prompted.

How it works:

  • Files under 50 MB are sent directly
  • Files over 50 MB and entire directories are automatically zipped before sending
  • The file type determines the Telegram method: images use sendPhoto, everything else uses sendDocument
  • Temporary zip files are always cleaned up after sending
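The delivery rules above reduce to a small decision function. In this sketch the exact byte value of "50 MB", the handling of a file of exactly that size, and the list of image extensions are all assumptions; planSend is a hypothetical name.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

const directLimit = 50 << 20 // 50 MB, interpreted here as 50 * 1024 * 1024 bytes

type sendPlan struct {
	Zip    bool   // compress into a temporary archive first
	Method string // Telegram Bot API method to call
}

// planSend decides how to deliver a path given its size and kind.
func planSend(path string, size int64, isDir bool) sendPlan {
	if isDir || size > directLimit {
		return sendPlan{Zip: true, Method: "sendDocument"}
	}
	switch strings.ToLower(filepath.Ext(path)) {
	case ".jpg", ".jpeg", ".png": // assumed set of image extensions
		return sendPlan{Method: "sendPhoto"}
	}
	return sendPlan{Method: "sendDocument"}
}

func main() {
	fmt.Printf("%+v\n", planSend("/home/user/shot.png", 2<<20, false))
	fmt.Printf("%+v\n", planSend("/home/user/projects", 0, true))
}
```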

Receiving a file from Telegram

Simply send or forward any file (document, photo, video, audio) to the bot. You will be asked where to save it:

📥 Where should I save "report.pdf"?
[ 🏠 Home ]  [ 🖥️ Desktop ]
[ 📥 Downloads ]  [ ✏️ Custom path ]

Tap a button to save to that location, or tap Custom path and type a full directory path (e.g. /home/user/projects/).

If a file with the same name already exists, a timestamp is appended automatically (report_20260417_153012.pdf).


OCR

Nullhand can read text visible on screen using Tesseract OCR. Both English and Arabic UI text are supported when the matching language pack is installed.

Requires:

# Linux
sudo apt install tesseract-ocr            # English (always required)
sudo apt install tesseract-ocr-ara        # Arabic UI text (recommended)

# macOS
brew install tesseract                    # core
brew install tesseract-lang               # all languages including Arabic

Trigger via natural language:

read the screen
what does the screen say
read text on screen
ocr
extract text from screen
what's written on screen
اقرأ الشاشة / لقطة شاشة مع نص

Trigger via slash command:

/ocr

Trigger via menu button: tap Read Screen in /menu.

How it works:

  1. Full screenshot is captured via scrot (Linux) or screencapture (macOS)
  2. Screenshot is written to a temp file
  3. tesseract <file> stdout -l <langs> is executed — <langs> is auto-detected: ara+eng if the Arabic pack is installed, otherwise eng
  4. Output is trimmed and truncated to 4096 characters (Telegram message limit)
  5. Temp file is deleted immediately after

The same auto-detected language list also drives click_text(...) and wait_for_text(...) — meaning Arabic-labelled buttons like إرسال can be located on screen without any configuration once the Arabic pack is installed.

If Tesseract is missing entirely, the bot responds with the install command rather than crashing. If only English is installed, you'll see a one-line hint at startup suggesting how to add Arabic.


Scheduled Tasks

Schedule recurring tasks using natural language or slash commands. Tasks are persisted to ~/.nullhand/schedule.json and automatically reloaded on bot restart.

Cron-like schedule grammar

Beyond the basic "every day at 9am" form, the parser understands:

| Pattern | Example | Fires |
|---|---|---|
| Daily (default) | every day at 9am | Every day at 09:00 |
| Specific weekday | every Monday at 8am | Mondays at 08:00 |
| Multiple weekdays | every Mon and Wed and Fri at 9am | M/W/F at 09:00 |
| Weekday group | every weekday at 9am | Mon-Fri at 09:00 |
| Weekend group | every weekend at 10am | Sat+Sun at 10:00 |
| Multiple times | every day at 9am and 5pm | Twice daily |
| Combined | every weekday at 9am and 1pm and 5pm | Mon-Fri × 3 times |
| Arabic weekdays | every الإثنين at 9am | Same as every Monday at 9am |

Creating a task (natural language)

The bot detects schedule intent when your message contains phrases like "every", "schedule", or "remind me to" and at least one time token.

schedule a screenshot every day at 9am
remind me to run sysinfo every day at 8:30am
run git status every day at 14:00
send me /home/user/backup.tar.gz every day at 2am
read screen every day at 9pm

Supported time formats: 8am, 8:30am, 14:00, 9pm
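For illustration, the four time formats can be normalised to a 24-hour hour and minute as sketched below. parseClock is a hypothetical helper; how the real parser treats edge cases such as 12am is not stated in this README, so the conventional reading (12am is midnight, 12pm is noon) is assumed.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseClock converts "8am", "8:30am", "14:00" or "9pm" to 24-hour values.
func parseClock(s string) (hour, minute int, err error) {
	s = strings.ToLower(strings.TrimSpace(s))
	suffix := ""
	if strings.HasSuffix(s, "am") || strings.HasSuffix(s, "pm") {
		suffix, s = s[len(s)-2:], strings.TrimSpace(s[:len(s)-2])
	}
	hs, ms, hasMin := strings.Cut(s, ":")
	if hour, err = strconv.Atoi(hs); err != nil {
		return 0, 0, fmt.Errorf("bad hour in %q", s)
	}
	if hasMin {
		if minute, err = strconv.Atoi(ms); err != nil {
			return 0, 0, fmt.Errorf("bad minute in %q", s)
		}
	}
	if suffix != "" {
		if hour < 1 || hour > 12 {
			return 0, 0, fmt.Errorf("hour %d is not valid with %s", hour, suffix)
		}
		hour %= 12 // 12am becomes 0; 12pm becomes 0 and gains 12 below
		if suffix == "pm" {
			hour += 12
		}
	}
	if hour < 0 || hour > 23 || minute < 0 || minute > 59 {
		return 0, 0, fmt.Errorf("time out of range: %02d:%02d", hour, minute)
	}
	return hour, minute, nil
}

func main() {
	for _, in := range []string{"8am", "8:30am", "14:00", "9pm"} {
		h, m, err := parseClock(in)
		fmt.Printf("%-7s -> %02d:%02d (err=%v)\n", in, h, m, err)
	}
}
```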

Supported actions:

| Phrase contains | Scheduled action |
|---|---|
| screenshot | Capture screen, send as photo |
| sysinfo, cpu, status, system info | Send system status report |
| read screen, ocr | Run OCR and send text |
| run <cmd> or shell <cmd> | Run shell command, send output |
| send + a /path | Send file to Telegram |

Managing tasks

/schedule list
/schedule cancel task_001
/schedule clear

Example output of /schedule list:

📋 Active scheduled tasks:
🆔 task_001 — screenshot — every day at 09:00
🆔 task_002 — sysinfo — every day at 14:00

Use /schedule cancel <id> to remove a task.

Implementation detail: the scheduler aligns to the next whole minute on start, then checks every minute. Panics in task callbacks are recovered and logged.


Audit Log

Every action is appended to ~/.nullhand/audit.log.

Log format:

[2026-04-17 09:31:05] user=123456789 action=screenshot
[2026-04-17 09:32:11] user=123456789 action=shell cmd="git status"
[2026-04-17 09:33:00] user=123456789 action=file_send path="/home/user/report.pdf"
[2026-04-17 09:34:45] user=123456789 action=otp_unlock
[2026-04-17 09:35:00] user=123456789 action=schedule_create id="task_001"
[2026-04-17 09:40:00] user=123456789 action=scheduled_task id="task_001"
[2026-04-17 09:41:10] user=123456789 action=natural_language input="open Firefox and go to..."

Actions logged:

| Action | Triggered by |
|---|---|
| otp_unlock | Successful OTP entry |
| otp_lock | Lock Bot button |
| screenshot | /screenshot, menu button, or AI tool |
| shell | /shell, menu button, or AI tool |
| app_open | /open command |
| clipboard | /paste, menu button |
| sysinfo | /status, menu button |
| ocr | /ocr, natural language, menu button |
| file_send | File send trigger |
| file_receive | File received from Telegram |
| downloads | Downloads menu button |
| natural_language | Free-form AI task (first 80 chars logged) |
| recipe_save | User-authored recipe saved to ~/.nullhand/recipes.json |
| recipe_save_failed | Recipe parsed but disk write failed |
| recipe_run | /recipes run … invocation |
| recipe_preview | /recipes preview … dry-run |
| recipe_delete / recipe_rename | /recipes delete … or /recipes rename … invocation |
| voice_received | Voice note arrived (duration + size logged) |
| voice_transcribed | Whisper produced a transcript (first 80 chars logged) |
| health | /health invocation |
| preview | "preview: …" / "dry-run: …" inline preview |
| schedule_create | New scheduled task |
| schedule_cancel | Task cancelled |
| scheduled_task | Scheduled task fired |

The log directory (~/.nullhand/) is created with mode 0700. The log file has mode 0600. Logging failures are silently swallowed so a disk error never crashes the bot.

Read the log:

cat ~/.nullhand/audit.log

Tail it live:

tail -f ~/.nullhand/audit.log

Voice Notes

Record a voice note inside Telegram in any language and Nullhand will transcribe it via whisper.cpp, then run the resulting text through the normal command pipeline. Useful when typing on a phone is awkward — especially for Arabic.

Requires:

# Linux
sudo apt install ffmpeg
# whisper.cpp: build from https://github.com/ggerganov/whisper.cpp
# and put the resulting `whisper-cli` binary on PATH
# Download a model file too (~150 MB for ggml-base, supports Arabic):
#   bash whisper.cpp/models/download-ggml-model.sh base
# Then point whisper-cli to it via -m or the WHISPER_MODEL env var.

# macOS
brew install ffmpeg
brew install whisper-cpp
# Models go under /opt/homebrew/share/whisper-cpp/ or similar; whisper-cli
# auto-discovers if WHISPER_MODEL is unset.

Pipeline:

  1. Bot sees voice field on the Telegram update
  2. Downloads the .ogg via getFile/download
  3. ffmpeg converts to 16 kHz mono WAV
  4. whisper-cli <wav> -otxt -nt -l ar produces a .txt transcript
  5. Bot replies "🎙️ Heard: <transcript>" then re-routes the transcript through the normal handler
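Steps 3 and 4 amount to two external commands. The sketch below only builds them with os/exec so the arguments can be inspected; it does not run anything. The whisper-cli flags come from step 4 and the Requires notes, while the ffmpeg flags (-y, -ar 16000, -ac 1) are one common way to get 16 kHz mono WAV and are an assumption about what Nullhand passes.

```go
package main

import (
	"fmt"
	"os/exec"
)

// ffmpegCmd converts a Telegram .ogg voice note to 16 kHz mono WAV.
func ffmpegCmd(oggPath, wavPath string) *exec.Cmd {
	return exec.Command("ffmpeg", "-y", "-i", oggPath, "-ar", "16000", "-ac", "1", wavPath)
}

// whisperCmd mirrors the invocation in step 4. A non-empty modelPath is
// passed with -m, as mentioned in the requirements above.
func whisperCmd(wavPath, modelPath string) *exec.Cmd {
	args := []string{wavPath, "-otxt", "-nt", "-l", "ar"}
	if modelPath != "" {
		args = append(args, "-m", modelPath)
	}
	return exec.Command("whisper-cli", args...)
}

func main() {
	// File names are placeholders for illustration.
	fmt.Println(ffmpegCmd("voice.ogg", "voice.wav").Args)
	fmt.Println(whisperCmd("voice.wav", "ggml-base.bin").Args)
}
```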

The default language hint is Arabic (-l ar) because transcription with the Arabic hint handles English code-switching gracefully, while the English hint does not handle Arabic. Send preview: … first if you want to check the transcription before it executes.

If whisper or ffmpeg is missing, the bot replies with a clear error and the install command — it never crashes silently. Verify install with /health (look for the "Voice transcription" line).


Health Diagnostics

/health returns a single-message snapshot of the bot's runtime state — useful for triaging "why doesn't X work?" questions without leaving Telegram.

Sample output:

🩺 Nullhand health report

Platform: linux/amd64
AI provider: local
OCR languages: ara+eng
Voice transcription: ✅ whisper-cli + ffmpeg
Screen Recording: ✅ ok
Accessibility:    ✅ ok

Scheduled tasks (3):
  • task_001 — screenshot — Every weekday at 09:00
  • task_002 — sysinfo — Every day at 09:00 and 17:00
  • task_003 — backup — Every Saturday at 02:00

Recipes: 27 total (24 built-in, 3 user-defined)

Allowed Telegram user: 123456789
Session unlocked: true

The OCR languages line reflects what tesseract --list-langs returned at startup. If it shows eng only, install the Arabic pack to enable bilingual screen reading.


Security

Single-user only. The bot accepts messages from exactly one Telegram user ID (set during first-run setup). Messages from any other account are silently dropped.

OTP session gate. Before any command is processed, the session must be unlocked with the current OTP code. The code is:

  • Generated with Go's crypto/rand
  • A 6-digit number in the range 100000–999999
  • Stored in memory only, never written to disk or logged
  • Automatically replaced every 2 minutes (new code printed to terminal)
  • Invalidated on successful entry (cannot be reused within the same session)
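A code with these properties can be generated in a few lines with crypto/rand. This is a minimal sketch, not the project's implementation: the function names are hypothetical, and the constant-time comparison is an added precaution that the README does not mention.

```go
package main

import (
	"crypto/rand"
	"crypto/subtle"
	"fmt"
	"math/big"
)

// newOTP returns a uniformly random 6-digit code in the range 100000-999999.
func newOTP() (string, error) {
	n, err := rand.Int(rand.Reader, big.NewInt(900000)) // uniform in [0, 900000)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("%d", n.Int64()+100000), nil
}

// checkOTP compares the attempt with the current code in constant time.
func checkOTP(current, attempt string) bool {
	return subtle.ConstantTimeCompare([]byte(current), []byte(attempt)) == 1
}

func main() {
	code, err := newOTP()
	if err != nil {
		fmt.Println("could not generate OTP:", err)
		return
	}
	fmt.Println("OTP CODE:", code)
	fmt.Println(checkOTP(code, code))
}
```

rand.Int draws from the full range without modulo bias, which a simple remainder of random bytes would not guarantee.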

X11-only. The startup check rejects runs under Wayland ($WAYLAND_DISPLAY set) and headless SSH sessions ($DISPLAY unset).
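The environment test behind this check is simple enough to show directly. The sketch below is an assumption about its shape; checkSession takes a getenv function so it can be exercised without changing the real environment.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// checkSession rejects Wayland sessions and sessions without an X display.
func checkSession(getenv func(string) string) error {
	if getenv("WAYLAND_DISPLAY") != "" {
		return errors.New("wayland session detected: log in with an Xorg session instead")
	}
	if getenv("DISPLAY") == "" {
		return errors.New("DISPLAY is not set: no X11 session available")
	}
	return nil
}

func main() {
	if err := checkSession(os.Getenv); err != nil {
		fmt.Println("startup check failed:", err)
		return
	}
	fmt.Println("X11 session OK")
}
```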

Capability checks. Before starting, Nullhand verifies that scrot can actually take a screenshot and that xdotool can query the active window. If either check fails, the process exits with a clear message.

No inbound network ports. Nullhand uses Telegram long-polling outbound only — there is no listening server or open port.


AI Providers

Configure the provider during first-run setup or edit ~/.nullhand/config.json.

| Provider | ai_provider value | Requires API key | Vision | Notes |
|---|---|---|---|---|
| Anthropic Claude | claude | Yes | Yes | Set ai_api_key |
| OpenAI | openai | Yes | Yes | Set ai_api_key; optional ai_base_url for proxies |
| Google Gemini | gemini | Yes | Yes | Set ai_api_key |
| DeepSeek | deepseek | Yes | No | Set ai_api_key |
| Grok (xAI) | grok | Yes | No | Set ai_api_key |
| Ollama (local LLM) | ollama | No | Model-dependent | Set ai_base_url and ai_model; use a vision model for screenshot analysis |
| Built-in rule-based | local | No | No | Zero cost, zero external dependency. Bilingual (English + Arabic). Routes to smart recipes for messaging, browser, settings, and button clicks |

Privacy note: Cloud providers (Claude, OpenAI, Gemini, DeepSeek, Grok) receive your commands and screenshots when the AI agent calls analyze_screenshot. If privacy matters, use Ollama or local.

| Aspect | Local AI (Ollama) | Cloud AI (Claude, GPT, etc.) |
|---|---|---|
| Privacy | 100% local | Data sent to provider servers |
| Cost | Free | Requires paid API key |
| Vision | Supported (vision models) | Supported |
| Internet | Only for Telegram | Required for AI + Telegram |

Local AI Setup

Option 1 — Built-in rule-based parser (zero dependencies)

The local provider requires no API key, no network, and no external process. Use it to get started immediately or in air-gapped environments.

{
  "ai_provider": "local"
}

What local understands out of the box:

  • All basic primitives: open/close apps, click coordinates, type, press key, screenshot, paste, run shell, list/read files, scroll, wait
  • WhatsApp / Slack / Discord / Messages send-to-contact flows (calls into smart recipes that wait for windows and OCR-click contact rows)
  • Browser navigation: open URL, search, address-bar typing, back/forward/refresh, new/close tab
  • System Settings: search inside settings, open named panel (WiFi, Bluetooth, Display, ...)
  • Button click: "click the X button" / "اضغط زر X" — uses fuzzy AT-SPI match with OCR fallback
  • Terminal commands, file browsing, git operations, VS Code/Cursor command-palette flows

Both English and Arabic phrasings are supported for every flow. See the Natural Language Examples section above for representative phrases.

Smart-pattern matching is priority-ordered: highly specific patterns (settings search, button click, app-specific messaging) are tried before generic ones (bare "search X" → Google) to avoid misclassification.
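
The priority ordering can be illustrated with a shell case statement, which also takes the first branch that matches. This is only a sketch of the idea, not Nullhand's parser: the function name classify, the intent labels, and the example phrases are assumptions made for illustration.

```shell
# Hypothetical illustration of priority-ordered matching (first match wins).
classify() {
  case "$1" in
    # Specific patterns come first.
    "search settings for "*) echo "settings_search" ;;
    "click the "*" button")  echo "button_click" ;;
    # The generic pattern is tried only after the specific ones fail.
    "search "*)              echo "web_search" ;;
    *)                       echo "unknown" ;;
  esac
}

# Example: classify "search settings for WiFi"
```

Moving the generic "search "* branch to the top would make it swallow the settings phrase as well, which is the kind of misclassification the ordering avoids.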

The local parser does not support vision (screenshot analysis by an LLM) or open-ended multi-step planning — for those, use Claude/OpenAI/Gemini/Ollama.

Option 2 — Ollama (recommended for full AI capability)

Ollama runs open-source LLMs locally. For full screenshot analysis support, use a vision model.

# 1. Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# 2. Pull a vision model (recommended — supports analyze_screenshot tool)
ollama pull qwen3-vl:8b    # 6.1 GB download, needs ~8 GB RAM

# Or a smaller vision model if RAM is limited
ollama pull qwen3-vl:2b    # 1.9 GB download, needs ~3-4 GB RAM

# 3. Start Ollama (if not already running as a service)
ollama serve

RAM requirements:

| Model | Download size | RAM needed | Quality |
| --- | --- | --- | --- |
| qwen3-vl:2b | 1.9 GB | ~3–4 GB | Good |
| qwen3-vl:8b | 6.1 GB | ~8 GB | Excellent |
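
As a rough way to choose between the two models, the RAM figures above can be compared with what the machine currently has free. The sketch below is an assumption-laden convenience, not part of Nullhand: pick_model and avail_gb are hypothetical names, the thresholds are 8 GB and 4 GB (the upper end of "~3–4 GB"), and it reads MemAvailable from /proc/meminfo, which reflects free memory right now rather than installed RAM.

```shell
# Hypothetical helper: pick a model tag from available RAM in whole GB.
pick_model() {
  if [ "$1" -ge 8 ]; then
    echo "qwen3-vl:8b"
  elif [ "$1" -ge 4 ]; then
    echo "qwen3-vl:2b"
  else
    echo "none"   # below both thresholds in the table
  fi
}

# Available RAM in whole GB, read from /proc/meminfo (Linux only; value is in kB).
avail_gb() {
  awk '/^MemAvailable:/ { print int($2 / 1024 / 1024) }' /proc/meminfo
}

# Example: pick_model "$(avail_gb)"
```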

Configure Nullhand to use Ollama:

{
  "ai_provider": "ollama",
  "ai_model": "qwen3-vl:8b",
  "ai_base_url": "http://localhost:11434"
}
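
Before starting the bot, it can help to confirm that the model named in ai_model has actually been pulled. The sketch below filters the output of ollama list, assuming its first line is a header and its first column is the model name; has_model is a hypothetical helper, not an Ollama or Nullhand command.

```shell
# Hypothetical helper: reads `ollama list` output on stdin and succeeds
# only if the first column of some row equals the given model tag.
has_model() {
  awk -v want="$1" 'NR > 1 && $1 == want { found = 1 } END { exit !found }'
}

# Example:
#   ollama list | has_model "qwen3-vl:8b" || ollama pull "qwen3-vl:8b"
```

The match is exact, so a model pulled without a tag may need to be checked under the name that ollama list prints for it (typically with a :latest suffix).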

If you don't need screenshot analysis and want a lighter model:

ollama pull llama3

Then point Nullhand at it:

{
  "ai_provider": "ollama",
  "ai_model": "llama3",
  "ai_base_url": "http://localhost:11434"
}

Troubleshooting

Bot can't connect to Telegram

Symptom: dial tcp: lookup api.telegram.org: server misbehaving or connection timeout errors in terminal.

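Fix: Your DNS may not be resolving correctly. Point the resolver at a public DNS server (note that this replaces the current contents of /etc/resolv.conf):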

echo "nameserver 8.8.8.8" | sudo tee /etc/resolv.conf

Then restart the bot.

If on a VM, also try:

sudo systemctl restart NetworkManager
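
Since the DNS fix writes to /etc/resolv.conf, it may be worth checking the file before and after. On many distributions that file is managed by NetworkManager or systemd-resolved, so a manual change may not survive a restart of those services. The helper below is a hypothetical sketch that only looks for a nameserver line; it does not test that the server answers.

```shell
# Hypothetical helper: succeeds if a resolv.conf-style file has a nameserver line.
has_nameserver() {
  grep -Eq '^[[:space:]]*nameserver[[:space:]]+[^[:space:]]+' "$1"
}

# Example:
#   sudo cp -p /etc/resolv.conf /etc/resolv.conf.bak   # optional backup first
#   has_nameserver /etc/resolv.conf || echo "no nameserver configured"
```

To test resolution itself, getent hosts api.telegram.org should print an address once DNS is working.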

Screenshot not working / scrot fails silently

Symptom: /screenshot returns nothing, or scrot produces an empty file.

Fix 1 — Set DISPLAY variable:

export DISPLAY=:0

Add this to your ~/.bashrc to make it permanent:

echo 'export DISPLAY=:0' >> ~/.bashrc && source ~/.bashrc

Fix 2 — Allow local X11 connections:

xhost +local:

Run this after every login, or add it to your startup applications.

Fix 3 — Verify scrot works manually:

DISPLAY=:0 scrot /tmp/test.png && echo "works" && ls -la /tmp/test.png

If the file is 0 bytes, your X11 session may not be properly initialized.
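
The three outcomes of the manual test (no file, 0-byte file, valid file) can be told apart with a small check. This is a sketch; shot_state is a hypothetical helper name, and it only inspects the file size, not whether the PNG is a valid image.

```shell
# Hypothetical helper: report the state of a screenshot file.
shot_state() {
  if [ ! -e "$1" ]; then
    echo "missing"   # scrot never wrote the file (check DISPLAY and xhost)
  elif [ ! -s "$1" ]; then
    echo "empty"     # 0-byte file: the X11 session may not be initialized
  else
    echo "ok"
  fi
}

# Example:
#   DISPLAY=:0 scrot /tmp/test.png; shot_state /tmp/test.png
```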


Must use X11, not Wayland

Symptom: $DISPLAY not set error on startup, or xdotool/scrot failing completely.

Cause: Nullhand requires an X11 session. Wayland is not supported in v1.

Fix: At the login screen, click the gear icon ⚙️ and select "Ubuntu on Xorg" (or your distro's equivalent X11 session) before logging in.

To verify you're on X11:

echo "$XDG_SESSION_TYPE"

The command should print x11. If it prints wayland, log out and select the Xorg session.
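
If you want the same check in a startup script, the session type can be mapped to a short verdict. The sketch below is hypothetical (session_verdict is not a Nullhand command). It also covers the case where the variable is unset or has another value such as tty, which can happen when the shell was not started from the graphical session.

```shell
# Hypothetical helper: turn an XDG_SESSION_TYPE value into a verdict.
session_verdict() {
  case "$1" in
    x11)     echo "ok: X11 session" ;;
    wayland) echo "unsupported: Wayland session, log in with Xorg" ;;
    "")      echo "unknown: XDG_SESSION_TYPE is not set" ;;
    *)       echo "unknown: session type $1" ;;
  esac
}

# Example: session_verdict "${XDG_SESSION_TYPE:-}"
```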


xdotool not working / click and type commands fail

Symptom: /click, /type, /key return ✓ but nothing happens on screen.

Fix: Ensure DISPLAY is set and xdotool can reach the display:

export DISPLAY=:0
xdotool getactivewindow

If getactivewindow returns a window ID, xdotool is working correctly. If it errors, your X11 session needs the local connection fix:

xhost +local:

OCR returns empty or garbled text

Symptom: /ocr returns no text or random characters.

Cause: Tesseract may not be installed, or the screen content is purely graphical.

Fix:

sudo apt install tesseract-ocr
tesseract --version

Note: OCR works best on text-heavy screens. Purely graphical content (icons, images) will return little or no text — this is expected behavior.


Clipboard commands not working (/paste returns empty)

Symptom: /paste returns empty or fails silently.

Fix: Ensure xclip is installed and DISPLAY is set:

sudo apt install xclip
export DISPLAY=:0
xclip -selection clipboard -o

If xclip errors with "Can't open display", run xhost +local: first.


AI agent not responding / "empty choices" error

Symptom: Natural language commands return AI call failed: empty choices or similar.

Cause: Your AI provider's API is unavailable or the API key has no credits.

Fix options:

  1. Switch to the built-in local provider (no API key needed): edit ~/.nullhand/config.json and set "ai_provider": "local"
  2. Check in your provider's dashboard that your API key still has credits
  3. Try a different AI provider

Note: The local provider handles simple commands (open app, screenshot, status) but does not support vision or complex multi-step tasks.
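
Switching providers by hand is a one-line change in the config file. For scripted setups, the sketch below does the same edit with sed while keeping a backup. It is a hypothetical helper (set_provider is not part of Nullhand), it assumes the "ai_provider" key and value sit on one line, and it leaves the file unchanged if the key is absent.

```shell
# Hypothetical helper: set "ai_provider" in a config file, keeping a .bak copy.
# Naive text substitution, not a JSON-aware edit.
set_provider() {
  cp -p "$1" "$1.bak" || return 1
  sed 's/\("ai_provider"[[:space:]]*:[[:space:]]*\)"[^"]*"/\1"'"$2"'"/' "$1.bak" > "$1"
}

# Example: set_provider ~/.nullhand/config.json local
```

The backup contains the same API key as the original file, so treat it with the same care.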


OTP code not showing / bot not starting

Symptom: Bot starts but no OTP box appears, or bot exits immediately.

Fix: Check the terminal output for error messages. Common causes:

  • Missing dependencies → run the full apt install command from the Requirements section
  • Wrong display session → ensure you're on X11 not Wayland
  • Config file corrupted → delete ~/.nullhand/config.json and run setup again:
rm ~/.nullhand/config.json && ./nullhand

Running in a VirtualBox VM

Clipboard sharing between host and VM:

sudo apt install virtualbox-guest-x11
sudo reboot

Then, in the VirtualBox menu, choose Devices → Shared Clipboard → Bidirectional.

No internet in VM:

  1. In VirtualBox Settings → Network → change to Bridged Adapter
  2. Select your active network adapter (WiFi or Ethernet) from the Name dropdown
  3. Start VM and run:
sudo systemctl restart NetworkManager
echo "nameserver 8.8.8.8" | sudo tee /etc/resolv.conf
ping google.com

Slow performance: Allocate more resources in VirtualBox Settings:

  • RAM: 4096 MB recommended
  • CPUs: 2 minimum
  • Video Memory: 128 MB (Display settings)

Scheduled tasks not firing

Symptom: Scheduled tasks were set but never executed at the expected time.

Diagnosis:

  1. Confirm the tasks are loaded — /schedule list should show them.
  2. Verify the bot is actually running at the scheduled minute (it polls every 60s).
  3. Check ~/.nullhand/schedule.json exists and contains the expected entries.
  4. Tail ~/.nullhand/audit.log and look for scheduled_task entries firing at the right hour:minute.

Persistence note: tasks are saved to ~/.nullhand/schedule.json whenever you add, cancel, or clear them, and reloaded automatically at startup. If the file is corrupted or unreadable, the bot logs a warning and starts with an empty schedule rather than failing to boot.
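
For step 4 of the diagnosis, a short filter over the audit log shows the most recent scheduler activity. This sketch assumes only that matching log lines contain the literal string scheduled_task; the rest of the log format is not assumed, and recent_scheduled is a hypothetical helper name.

```shell
# Hypothetical helper: print the last N audit-log lines that mention
# scheduled_task (N defaults to 5).
recent_scheduled() {
  grep 'scheduled_task' "$1" | tail -n "${2:-5}"
}

# Example: recent_scheduled ~/.nullhand/audit.log 10
```

No output means no scheduled task has been logged, which points at steps 1 to 3 rather than at the task itself.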


General dependency check

Run this to verify all required tools are installed:

which git go xdotool scrot wmctrl xclip convert tesseract && echo "✅ All dependencies found"

If any are missing:

sudo apt install -y git golang xdotool scrot wmctrl xclip imagemagick python3-pyatspi at-spi2-core desktop-file-utils tesseract-ocr
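
To see exactly which tools are missing rather than only whether the whole check passed, a loop over command -v can list them by name. This is a sketch; missing_tools is a hypothetical helper, and it checks executables on PATH only, so packages without a command of their own (such as python3-pyatspi) are not covered.

```shell
# Hypothetical helper: print each named command that is not on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Example:
#   missing_tools git go xdotool scrot wmctrl xclip convert tesseract
```

Empty output means every listed command was found.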

Contributing

This is a Linux port of the original Nullhand by AzozzALFiras. Original repo: https://github.com/AzozzALFiras/Nullhand. To contribute to this Linux port, fork https://github.com/AzozzALFiras/Nullhand and open a pull request.


License

See LICENSE in the repository root.

