A personal AI assistant that runs on your Mac and responds to your Telegram messages. It uses a local AI model for simple tasks and delegates complex work (coding, debugging, analysis) to Claude Code.
Architecture:
You (Telegram) --> Hermes Gateway --> Qwen 3.5 9B (local AI, fast)
                                  --> Claude Code (cloud AI, smart)
What you need:
- Mac with Apple Silicon (M1/M2/M3/M4) and 16GB+ RAM
- Telegram account (free)
- Claude Code with Max plan (optional, for heavy-lifting delegation)
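If you are not sure which chip or how much RAM your Mac has, you can check from Terminal (these are standard macOS commands, not part of Hermes):

```bash
# Chip name -- should mention Apple M1/M2/M3/M4
sysctl -n machdep.cpu.brand_string

# Installed RAM in GB -- should be 16 or more
echo "$(( $(sysctl -n hw.memsize) / 1073741824 )) GB"
```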
Homebrew is a package manager for Mac. Open Terminal (search for it in Spotlight) and paste:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

It will ask for your Mac password. Type it (nothing will appear on screen, that's normal) and press Enter.
When it finishes, it may tell you to run two more commands. Run them. They look like:
echo 'eval "$(/opt/homebrew/bin/brew shellenv)"' >> ~/.zprofile
eval "$(/opt/homebrew/bin/brew shellenv)"

Close and reopen Terminal, then verify:
brew --version

You should see something like Homebrew 5.x.x.
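Optionally, run Homebrew's built-in self-check before installing anything; it flags common PATH and permission problems:

```bash
brew doctor
```

Warnings are often harmless, but errors are worth fixing before you continue.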
brew install cmake git

# Clone the Hermes repository
git clone https://github.com/NousResearch/hermes-agent.git ~/.hermes/hermes-agent
# Run the setup script
cd ~/.hermes/hermes-agent
./setup-hermes.sh

The setup script will:
- Install Python 3.11 via uv
- Create a virtual environment
- Install all dependencies
- Add hermes to your PATH
When it asks "Would you like to run the setup wizard?", press N (we'll configure manually).
Reload your shell:
source ~/.zshrc

Verify:
hermes --version

We need a local AI model. We'll download Qwen 3.5 9B (a good balance of speed and quality for 16GB Macs).
# Create model directory
mkdir -p ~/.hermes/models
# Download from Hugging Face (about 5.2 GB)
curl -L -o ~/.hermes/models/Qwen3.5-9B-Q4_K_M.gguf \
"https://huggingface.co/lmstudio-community/Qwen3.5-9B-GGUF/resolve/main/Qwen3.5-9B-Q4_K_M.gguf"This will take a few minutes depending on your internet speed.
TurboQuant is a compression technology from Google that makes AI models run faster and use less memory. We'll build a version of llama.cpp that supports it.
# Clone the TurboQuant fork
git clone https://github.com/TheTom/llama-cpp-turboquant.git ~/.hermes/llama-cpp-turboquant
# Switch to the TurboQuant branch
cd ~/.hermes/llama-cpp-turboquant
git checkout feature/turboquant-kv-cache
# Build with Apple Metal GPU support
cmake -B build -DGGML_METAL=ON -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release -j$(sysctl -n hw.ncpu)

Verify it built:
ls ~/.hermes/llama-cpp-turboquant/build/bin/llama-server

You should see the file path printed (not "No such file").
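If you want an extra sanity check beyond the file existing, the binary should also run and print its build information. This assumes the TurboQuant fork keeps upstream llama.cpp's standard --version flag, which is likely but not guaranteed:

```bash
# Prints version/build info if the binary runs at all
~/.hermes/llama-cpp-turboquant/build/bin/llama-server --version
```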
Next, create a Telegram bot:

- Open Telegram on your phone
- Search for @BotFather and start a chat
- Send /newbot
- Give it a name (e.g., "My Hermes Bot")
- Give it a username (e.g., my_hermes_bot) -- must end in "bot"
- Copy the bot token BotFather gives you (looks like 1234567890:AAH...)
Now get your Telegram user ID:

- Search for @userinfobot in Telegram
- Send /start
- Copy the numeric ID it gives you (looks like 5057031728)
Save both of these -- you'll need them in the next step.
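Before wiring these into Hermes, you can confirm the bot token works by calling the Telegram Bot API directly (replace the placeholder with your real token):

```bash
# A valid token returns JSON with "ok":true and your bot's username
curl -s "https://api.telegram.org/bot<YOUR_BOT_TOKEN>/getMe"
```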
Create ~/.hermes/.env with your settings. Replace the placeholder values:
cat > ~/.hermes/.env << 'ENVEOF'
# LLM Configuration
LLM_MODEL=qwen/qwen3.5-9b
OPENAI_BASE_URL=http://localhost:1235/v1
OPENAI_API_KEY=local
HERMES_API_TIMEOUT=3600
HERMES_STREAM_READ_TIMEOUT=300
HERMES_MAX_ITERATIONS=60
TERMINAL_ENV=local
# Telegram (replace with YOUR values)
TELEGRAM_BOT_TOKEN=YOUR_BOT_TOKEN_HERE
TELEGRAM_ALLOWED_USERS=YOUR_USER_ID_HERE
ENVEOF

Now edit the file to add your actual Telegram values:
nano ~/.hermes/.env

Replace YOUR_BOT_TOKEN_HERE with your bot token and YOUR_USER_ID_HERE with your user ID. Save with Ctrl+O, Enter, Ctrl+X.
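A quick way to make sure you didn't leave a placeholder behind (this just greps the file you created above):

```bash
# Prints nothing if both placeholders were replaced
grep -E "YOUR_(BOT_TOKEN|USER_ID)_HERE" ~/.hermes/.env
```

If it prints nothing, both values are in place. Next, create the Hermes config file: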
cat > ~/.hermes/config.yaml << 'YAMLEOF'
model:
  default: qwen/qwen3.5-9b
  provider: custom
  base_url: http://localhost:1235/v1
  context_length: 65536
toolsets:
  - hermes-cli
agent:
  max_turns: 60
  tool_use_enforcement: auto
  verbose: false
  reasoning_effort: medium
terminal:
  backend: local
  timeout: 180
  persistent_shell: true
  lifetime_seconds: 300
compression:
  enabled: true
  threshold: 0.8
  target_ratio: 0.2
  protect_last_n: 20
  summary_model: qwen/qwen3.5-9b
  summary_provider: custom
  summary_base_url: http://localhost:1235/v1
display:
  compact: false
  streaming: true
  show_cost: false
  tool_progress: all
memory:
  memory_enabled: true
  user_profile_enabled: true
delegation:
  model: ''
  provider: ''
  base_url: ''
  api_key: ''
  max_iterations: 50
default_toolsets:
  - terminal
  - file
  - web
platform_toolsets:
  cli:
    - terminal
    - file
    - memory
    - web
    - todo
    - cronjob
    - clarify
  telegram:
    - terminal
    - memory
    - web
    - todo
    - cronjob
skills:
  creation_nudge_interval: 15
security:
  redact_secrets: true
YAMLEOF

This step requires a Claude Code Max plan. If you don't have one, skip to Step 9.
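Before moving on (to Claude Code setup or straight to Step 9), it's worth checking that the config file you just wrote parses as valid YAML, since a stray indent will break it. This sketch assumes PyYAML is importable from the Hermes virtual environment, which is typical but is an assumption:

```bash
~/.hermes/hermes-agent/venv/bin/python -c "import yaml; yaml.safe_load(open('$HOME/.hermes/config.yaml')); print('config.yaml is valid YAML')"
```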
Claude Code is a powerful AI coding assistant. By configuring Hermes to delegate complex tasks to Claude Code, your local AI handles simple stuff and Claude handles the hard work.
Visit https://claude.ai/code and follow the installation instructions for Mac.
Verify:
claude --version

Next, create ~/.hermes/SOUL.md. This file tells Hermes when and how to delegate to Claude:
cat > ~/.hermes/SOUL.md << 'SOULEOF'
# Hermes Agent
You are a helpful AI assistant. Be concise and helpful.
## Delegation Rule (IMPORTANT)
You are a lightweight orchestrator. For ANY task that involves:
- Writing or editing code
- Debugging or fixing bugs
- Multi-step terminal work
- Complex reasoning or analysis
- Anything you're not 100% confident about
**You MUST delegate to Claude Code via the terminal tool.** Do not attempt these yourself. Run:
```bash
cd /path/to/relevant/project && claude -p "TASK DESCRIPTION HERE"
```

Key flags:

- `claude -p "prompt"` -- non-interactive, returns output
- `claude -p --model sonnet "prompt"` -- use Sonnet (faster)
- `claude -p --dangerously-skip-permissions "prompt"` -- skip permission prompts

Always cd to the relevant project directory first. Include full context in the prompt.

## Handle These Yourself
- Simple conversation and questions
- Sending messages
- Reading/listing files
- Quick terminal commands (ls, cat, grep, etc.)
- Memory and note-taking
- Cron jobs and scheduling
- Web searches
SOULEOF
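To confirm delegation will actually work before Hermes tries it, you can run Claude Code once by hand in the same non-interactive mode the SOUL.md rule describes (the directory here is just an example):

```bash
# Any directory works for this smoke test
cd ~ && claude -p "Reply with the single word: ready"
```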
---
## Step 9: Set Up Watchdog Monitoring
The watchdog automatically checks every 5 minutes that everything is running, and restarts services if they go down.
### 9a: Create the health check script
```bash
cat > ~/.hermes/check-health.sh << 'HEALTHEOF'
#!/bin/bash
echo "=== Health Check ==="
echo ""
echo -n "LLM Server: "
models=$(curl -s --connect-timeout 2 http://localhost:1235/v1/models 2>/dev/null)
if [ -n "$models" ]; then
echo "OK"
else
echo "DOWN"
fi
echo -n "Gateway: "
cd ~/.hermes/hermes-agent
status=$(./venv/bin/hermes gateway status 2>&1)
if echo "$status" | grep -q "running"; then
echo "OK"
else
echo "DOWN"
fi
echo -n "LLM test: "
response=$(curl -s --connect-timeout 5 --max-time 30 http://localhost:1235/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model":"qwen/qwen3.5-9b","messages":[{"role":"user","content":"say ok"}],"max_tokens":5}' 2>/dev/null)
if echo "$response" | grep -q "choices"; then
    echo "OK"
else
    echo "FAILED"
fi
echo ""
echo "=== Done ==="
HEALTHEOF
chmod +x ~/.hermes/check-health.sh
```

### 9b: Create the watchdog script
cat > ~/.hermes/watchdog.sh << 'WATCHEOF'
#!/bin/bash
LOG="$HOME/.hermes/logs/watchdog.log"
mkdir -p "$(dirname "$LOG")"
TIMESTAMP=$(date '+%Y-%m-%d %H:%M:%S')
log() { echo "$TIMESTAMP $1" >> "$LOG"; }
ISSUES=0
# Check LLM server
if ! curl -s --connect-timeout 3 http://localhost:1235/v1/models >/dev/null 2>&1; then
    log "CRITICAL: LLM server not responding"
    if pgrep -f "llama-server" >/dev/null 2>&1; then
        pkill -f "llama-server" 2>/dev/null
        sleep 2
    fi
    log "Starting llama-server with TurboQuant..."
    # Update the model path below if you saved the model somewhere else
    MODEL_PATH="$HOME/.hermes/models/Qwen3.5-9B-Q4_K_M.gguf"
    # Fallback: check LM Studio's download location
    if [ ! -f "$MODEL_PATH" ]; then
        MODEL_PATH="$HOME/.lmstudio/models/lmstudio-community/Qwen3.5-9B-GGUF/Qwen3.5-9B-Q4_K_M.gguf"
    fi
    "$HOME/.hermes/llama-cpp-turboquant/build/bin/llama-server" \
        -m "$MODEL_PATH" \
        --cache-type-k turbo3 --cache-type-v turbo3 \
        --host 127.0.0.1 --port 1235 \
        --alias "qwen/qwen3.5-9b" \
        -ngl 99 -c 65536 \
        >> "$HOME/.hermes/logs/llama-server.log" 2>&1 &
    sleep 15
    if curl -s --connect-timeout 3 http://localhost:1235/v1/models >/dev/null 2>&1; then
        log "RECOVERED: llama-server restarted"
    else
        log "FAILED: llama-server did not recover"
    fi
    ISSUES=$((ISSUES + 1))
fi
# Check Gateway
cd "$HOME/.hermes/hermes-agent"
if ! ./venv/bin/hermes gateway status 2>&1 | grep -q "running"; then
    log "CRITICAL: Gateway is down -- restarting"
    ./venv/bin/hermes gateway start >> "$LOG" 2>&1
    sleep 5
    ISSUES=$((ISSUES + 1))
fi
# Trim log
if [ -f "$LOG" ] && [ $(wc -l < "$LOG") -gt 500 ]; then
tail -500 "$LOG" > "$LOG.tmp" && mv "$LOG.tmp" "$LOG"
fi
if [ $ISSUES -eq 0 ]; then
MINUTE=$(date '+%M')
if [ "$((MINUTE % 50))" -lt 5 ]; then
log "OK: All systems healthy"
fi
fi
WATCHEOF
chmod +x ~/.hermes/watchdog.sh

### 9c: Install the watchdog as a LaunchAgent

cat > ~/Library/LaunchAgents/com.hermes.watchdog.plist << 'PLISTEOF'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
"http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.hermes.watchdog</string>
    <key>ProgramArguments</key>
    <array>
        <string>/bin/bash</string>
        <string>HOMEDIR/.hermes/watchdog.sh</string>
    </array>
    <key>StartInterval</key>
    <integer>300</integer>
    <key>RunAtLoad</key>
    <true/>
    <key>StandardErrorPath</key>
    <string>HOMEDIR/.hermes/logs/watchdog-stderr.log</string>
    <key>EnvironmentVariables</key>
    <dict>
        <key>PATH</key>
        <string>/usr/local/bin:/usr/bin:/bin:/opt/homebrew/bin:HOMEDIR/.local/bin</string>
    </dict>
</dict>
</plist>
PLISTEOF
# Replace HOMEDIR with your actual home directory
sed -i '' "s|HOMEDIR|$HOME|g" ~/Library/LaunchAgents/com.hermes.watchdog.plist
# Load the watchdog
launchctl load ~/Library/LaunchAgents/com.hermes.watchdog.plist

With the watchdog in place, start the LLM server manually for the first time:

# Find your model (update path if needed)
MODEL_PATH="$HOME/.hermes/models/Qwen3.5-9B-Q4_K_M.gguf"
~/.hermes/llama-cpp-turboquant/build/bin/llama-server \
-m "$MODEL_PATH" \
--cache-type-k turbo3 --cache-type-v turbo3 \
--host 127.0.0.1 --port 1235 \
--alias "qwen/qwen3.5-9b" \
-ngl 99 -c 65536 \
> ~/.hermes/logs/llama-server.log 2>&1 &
echo "LLM server starting... waiting 15 seconds"
sleep 15

Verify:
curl -s http://localhost:1235/v1/models

You should see JSON output with qwen/qwen3.5-9b.

Now start the Hermes gateway:
cd ~/.hermes/hermes-agent
./venv/bin/hermes gateway start

Then pair your Telegram account:

- Open Telegram and message your bot
- Send /start
- The bot will give you a pairing code (e.g., ABC12345)
- In your terminal, run:
  cd ~/.hermes/hermes-agent
  ./venv/bin/hermes pairing approve telegram YOUR_PAIRING_CODE
- Send another message to the bot -- it should respond!
Finally, run the health check:

~/.hermes/check-health.sh

Everything should show OK.
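If you want to poke the local model directly, outside of Hermes, you can send the same kind of OpenAI-style request the health check uses:

```bash
curl -s http://localhost:1235/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"qwen/qwen3.5-9b","messages":[{"role":"user","content":"Say hi in five words"}],"max_tokens":30}'
```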
You can also chat with Hermes directly in your terminal:
hermes

Type a message and press Enter. Type /exit to quit.
If something goes wrong, here are the most common issues.

If Hermes complains that its memory or context is full: the model's context window has filled up. Restart Hermes (Ctrl+C, then run hermes again) to start a fresh session.
If responses are very slow: the local model is taking a long time to respond. This is normal for complex tasks; the timeout settings in .env should handle most cases. If it persists, try restarting the LLM server.
If the bot doesn't respond at all:

- Run ~/.hermes/check-health.sh to see what's down
- If the Gateway is DOWN: cd ~/.hermes/hermes-agent && ./venv/bin/hermes gateway restart
- If the LLM is DOWN: the watchdog should auto-restart it within 5 minutes
If requests fail with a model-name error: make sure the model name in config.yaml matches what the server reports:
curl -s http://localhost:1235/v1/models

If all else fails, do a full restart:

# Stop everything
pkill -f "llama-server"
cd ~/.hermes/hermes-agent && ./venv/bin/hermes gateway stop
# Start LLM server
~/.hermes/llama-cpp-turboquant/build/bin/llama-server \
-m ~/.hermes/models/Qwen3.5-9B-Q4_K_M.gguf \
--cache-type-k turbo3 --cache-type-v turbo3 \
--host 127.0.0.1 --port 1235 \
--alias "qwen/qwen3.5-9b" \
-ngl 99 -c 65536 \
> ~/.hermes/logs/llama-server.log 2>&1 &
sleep 15
# Start gateway
cd ~/.hermes/hermes-agent && ./venv/bin/hermes gateway start

To check the logs:

# Watchdog log
cat ~/.hermes/logs/watchdog.log
# LLM server log
tail -50 ~/.hermes/logs/llama-server.log
# Hermes error log
tail -50 ~/.hermes/logs/errors.log

You now have:
- Qwen 3.5 9B running locally with TurboQuant acceleration
- Hermes Agent as your personal AI gateway
- Telegram bot for messaging from anywhere
- Claude Code delegation for complex tasks (if configured)
- Watchdog monitoring that auto-restarts services
Your AI assistant is always running, reachable from your phone, and smart enough to know when to ask Claude for help.
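For day-to-day use, these are the commands from this guide you'll reach for most often (paths assume the default locations used above):

```bash
# Quick status of everything
~/.hermes/check-health.sh

# Restart the gateway if Telegram stops responding
cd ~/.hermes/hermes-agent && ./venv/bin/hermes gateway restart

# Follow the watchdog's log live
tail -f ~/.hermes/logs/watchdog.log
```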