
Hermes Agent Setup Guide

Your Personal AI Assistant on Mac with Telegram


What You're Building

A personal AI assistant that runs on your Mac and responds to your Telegram messages. It uses a local AI model for simple tasks and delegates complex work (coding, debugging, analysis) to Claude Code.

Architecture:

You (Telegram) --> Hermes Gateway --> Qwen 3.5 9B (local AI, fast)
                                  --> Claude Code (cloud AI, smart)

What you need:

  • Mac with Apple Silicon (M1/M2/M3/M4) and 16GB+ RAM
  • Telegram account (free)
  • Claude Code with Max plan (optional, for heavy-lifting delegation)

Step 1: Install Homebrew

Homebrew is a package manager for Mac. Open Terminal (search for it in Spotlight) and paste:

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

It will ask for your Mac password. Type it (nothing appears on screen as you type; that's normal) and press Enter.

When it finishes, it may tell you to run two more commands. Run them. They look like:

echo >> ~/.zprofile
eval "$(/opt/homebrew/bin/brew shellenv)"

Close and reopen Terminal, then verify:

brew --version

You should see a version number printed (e.g., Homebrew 4.x.x).


Step 2: Install Build Tools

brew install cmake git

Step 3: Install Hermes Agent

# Clone the Hermes repository
git clone https://github.com/NousResearch/hermes-agent.git ~/.hermes/hermes-agent

# Run the setup script
cd ~/.hermes/hermes-agent
./setup-hermes.sh

The setup script will:

  • Install Python 3.11 via uv
  • Create a virtual environment
  • Install all dependencies
  • Add hermes to your PATH

When it asks "Would you like to run the setup wizard?", press N (we'll configure manually).

Reload your shell:

source ~/.zshrc

Verify:

hermes --version

Step 4: Download the AI Model

We need a local AI model. We'll download Qwen 3.5 9B (a good balance of speed and quality for 16GB Macs).

# Create model directory
mkdir -p ~/.hermes/models

# Download from Hugging Face (about 5.2 GB)
curl -L -o ~/.hermes/models/Qwen3.5-9B-Q4_K_M.gguf \
  "https://huggingface.co/lmstudio-community/Qwen3.5-9B-GGUF/resolve/main/Qwen3.5-9B-Q4_K_M.gguf"

This will take a few minutes depending on your internet speed.
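A truncated download fails later in confusing ways, so it's worth a quick size sanity check before moving on. The `check_min_size` helper below is just for this guide (not a Hermes command); it's demoed on a throwaway file, with the real check shown as a comment:

```bash
# Succeeds only if the file exists and is at least <min_bytes> long.
check_min_size() {  # usage: check_min_size <file> <min_bytes>
  [ -f "$1" ] && [ "$(wc -c < "$1")" -ge "$2" ]
}

# Demo on a tiny throwaway file:
tmp=$(mktemp)
printf 'xxxxx' > "$tmp"          # 5 bytes
check_min_size "$tmp" 3 && echo "big enough"
check_min_size "$tmp" 100 || echo "too small -- re-download"
rm -f "$tmp"

# Real check (the model is ~5.2 GB, so anything under ~5 GB is suspect):
# check_min_size ~/.hermes/models/Qwen3.5-9B-Q4_K_M.gguf 5000000000
```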


Step 5: Build llama.cpp with TurboQuant

TurboQuant is a quantization scheme that compresses the model's KV cache, so the model runs faster and uses less memory. We'll build a version of llama.cpp that supports it.

# Clone the TurboQuant fork
git clone https://github.com/TheTom/llama-cpp-turboquant.git ~/.hermes/llama-cpp-turboquant

# Switch to the TurboQuant branch
cd ~/.hermes/llama-cpp-turboquant
git checkout feature/turboquant-kv-cache

# Build with Apple Metal GPU support
cmake -B build -DGGML_METAL=ON -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release -j$(sysctl -n hw.ncpu)

Verify it built:

ls ~/.hermes/llama-cpp-turboquant/build/bin/llama-server

You should see the file path printed (not "No such file").
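A note on the `-j$(sysctl -n hw.ncpu)` flag in the build step: it runs one compile job per CPU core. If you're curious how many that is, this prints it (with a Linux fallback, since `sysctl -n hw.ncpu` is macOS-specific):

```bash
# Count CPU cores: sysctl on macOS, nproc on Linux.
njobs=$(sysctl -n hw.ncpu 2>/dev/null || nproc)
echo "cmake would build with $njobs parallel jobs"
```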


Step 6: Create a Telegram Bot

  1. Open Telegram on your phone
  2. Search for @BotFather and start a chat
  3. Send /newbot
  4. Give it a name (e.g., "My Hermes Bot")
  5. Give it a username (e.g., my_hermes_bot) -- must end in bot
  6. Copy the bot token BotFather gives you (looks like 1234567890:AAH...)

Now get your Telegram user ID:

  1. Search for @userinfobot in Telegram
  2. Send /start
  3. Copy the numeric ID it gives you (looks like 5057031728)

Save both of these -- you'll need them in the next step.
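Both values have a recognizable shape, so a quick format check catches copy-paste mistakes early. This is only a loose sanity check (the exact token format is Telegram's, not ours), and the values below are made up:

```bash
# Made-up example values -- substitute your own.
TOKEN="1234567890:AAHexampleexampleexampleexampleexam"
USER_ID="5057031728"

# Bot tokens look like <digits>:<long alphanumeric secret>; user IDs are all digits.
echo "$TOKEN"   | grep -Eq '^[0-9]+:[A-Za-z0-9_-]{30,}$' && echo "token format ok"
echo "$USER_ID" | grep -Eq '^[0-9]+$'                    && echo "user id format ok"
```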


Step 7: Configure Hermes

7a: Create the environment file

Create ~/.hermes/.env with your settings. Replace the placeholder values:

cat > ~/.hermes/.env << 'ENVEOF'
# LLM Configuration
LLM_MODEL=qwen/qwen3.5-9b
OPENAI_BASE_URL=http://localhost:1235/v1
OPENAI_API_KEY=local
HERMES_API_TIMEOUT=3600
HERMES_STREAM_READ_TIMEOUT=300
HERMES_MAX_ITERATIONS=60
TERMINAL_ENV=local

# Telegram (replace with YOUR values)
TELEGRAM_BOT_TOKEN=YOUR_BOT_TOKEN_HERE
TELEGRAM_ALLOWED_USERS=YOUR_USER_ID_HERE
ENVEOF

Now edit the file to add your actual Telegram values:

nano ~/.hermes/.env

Replace YOUR_BOT_TOKEN_HERE with your bot token and YOUR_USER_ID_HERE with your user ID. Save with Ctrl+O, Enter, Ctrl+X.
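It's easy to save the file with a placeholder still in place, which only fails later and silently. A small check helps; the `has_placeholders` helper is just for this guide, demoed on throwaway files so both outcomes are visible:

```bash
# Returns success if the given env file still contains template placeholders.
has_placeholders() {
  grep -qE 'YOUR_(BOT_TOKEN|USER_ID)_HERE' "$1"
}

tmp=$(mktemp)
echo "TELEGRAM_BOT_TOKEN=YOUR_BOT_TOKEN_HERE" > "$tmp"
has_placeholders "$tmp" && echo "placeholders remain -- keep editing"
echo "TELEGRAM_BOT_TOKEN=1234567890:AAH_example" > "$tmp"
has_placeholders "$tmp" || echo "looks configured"
rm -f "$tmp"

# Real check:
# has_placeholders ~/.hermes/.env && echo "placeholders remain -- edit ~/.hermes/.env"
```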

7b: Create the config file

cat > ~/.hermes/config.yaml << 'YAMLEOF'
model:
  default: qwen/qwen3.5-9b
  provider: custom
  base_url: http://localhost:1235/v1
  context_length: 65536
toolsets:
- hermes-cli
agent:
  max_turns: 60
  tool_use_enforcement: auto
  verbose: false
  reasoning_effort: medium
terminal:
  backend: local
  timeout: 180
  persistent_shell: true
  lifetime_seconds: 300
compression:
  enabled: true
  threshold: 0.8
  target_ratio: 0.2
  protect_last_n: 20
  summary_model: qwen/qwen3.5-9b
  summary_provider: custom
  summary_base_url: http://localhost:1235/v1
display:
  compact: false
  streaming: true
  show_cost: false
  tool_progress: all
memory:
  memory_enabled: true
  user_profile_enabled: true
delegation:
  model: ''
  provider: ''
  base_url: ''
  api_key: ''
  max_iterations: 50
  default_toolsets:
  - terminal
  - file
  - web
platform_toolsets:
  cli:
  - terminal
  - file
  - memory
  - web
  - todo
  - cronjob
  - clarify
  telegram:
  - terminal
  - memory
  - web
  - todo
  - cronjob
skills:
  creation_nudge_interval: 15
security:
  redact_secrets: true
YAMLEOF
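To get a feel for the compression settings, here is the back-of-envelope arithmetic, assuming threshold and target_ratio are fractions of the 65,536-token context (our reading of the config, not documented behavior):

```bash
# With context_length 65536, threshold 0.8 and target_ratio 0.2:
CTX=65536
echo "compression would trigger near $((CTX * 8 / 10)) tokens"
echo "history would be summarized down toward $((CTX * 2 / 10)) tokens"
```

The protect_last_n setting keeps the 20 most recent messages out of the summarization entirely.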

Step 8: Set Up Claude Code Delegation (Optional)

This step requires a Claude Code Max plan. If you don't have one, skip to Step 9.

Claude Code is a powerful AI coding assistant. By configuring Hermes to delegate complex tasks to Claude Code, your local AI handles simple stuff and Claude handles the hard work.

8a: Install Claude Code

Visit https://claude.ai/code and follow the installation instructions for Mac.

Verify:

claude --version

8b: Create the SOUL.md file

This file tells Hermes when and how to delegate to Claude:

cat > ~/.hermes/SOUL.md << 'SOULEOF'
# Hermes Agent

You are a helpful AI assistant. Be concise and helpful.

## Delegation Rule (IMPORTANT)

You are a lightweight orchestrator. For ANY task that involves:
- Writing or editing code
- Debugging or fixing bugs
- Multi-step terminal work
- Complex reasoning or analysis
- Anything you're not 100% confident about

**You MUST delegate to Claude Code via the terminal tool.** Do not attempt these yourself. Run:

```bash
cd /path/to/relevant/project && claude -p "TASK DESCRIPTION HERE"
```

Key flags:

- `claude -p "prompt"` -- non-interactive, returns output
- `claude -p --model sonnet "prompt"` -- use Sonnet (faster)
- `claude -p --dangerously-skip-permissions "prompt"` -- skip permission prompts

Always cd to the relevant project directory first. Include full context in the prompt.

## What YOU handle directly (no delegation needed)

- Simple conversation and questions
- Sending messages
- Reading/listing files
- Quick terminal commands (ls, cat, grep, etc.)
- Memory and note-taking
- Cron jobs and scheduling
- Web searches

When in doubt, delegate to Claude.

SOULEOF


Step 9: Set Up Watchdog Monitoring

The watchdog automatically checks every 5 minutes that everything is running, and restarts services if they go down.

9a: Create the health check script

cat > ~/.hermes/check-health.sh << 'HEALTHEOF'
#!/bin/bash
echo "=== Health Check ==="
echo ""

echo -n "LLM Server: "
models=$(curl -s --connect-timeout 2 http://localhost:1235/v1/models 2>/dev/null)
if [ -n "$models" ]; then
    echo "OK"
else
    echo "DOWN"
fi

echo -n "Gateway:    "
cd ~/.hermes/hermes-agent
status=$(./venv/bin/hermes gateway status 2>&1)
if echo "$status" | grep -q "running"; then
    echo "OK"
else
    echo "DOWN"
fi

echo -n "LLM test:   "
response=$(curl -s --connect-timeout 5 --max-time 30 http://localhost:1235/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"qwen/qwen3.5-9b","messages":[{"role":"user","content":"say ok"}],"max_tokens":5}' 2>/dev/null)
if echo "$response" | grep -q "choices"; then
    echo "OK"
else
    echo "FAILED"
fi

echo ""
echo "=== Done ==="
HEALTHEOF
chmod +x ~/.hermes/check-health.sh

9b: Create the watchdog script

cat > ~/.hermes/watchdog.sh << 'WATCHEOF'
#!/bin/bash
LOG="$HOME/.hermes/logs/watchdog.log"
mkdir -p "$(dirname "$LOG")"
# Stamp each entry at call time, not at script start
log() { echo "$(date '+%Y-%m-%d %H:%M:%S') $1" >> "$LOG"; }
ISSUES=0

# Check LLM server
if ! curl -s --connect-timeout 3 http://localhost:1235/v1/models >/dev/null 2>&1; then
    log "CRITICAL: LLM server not responding"
    if pgrep -f "llama-server" >/dev/null 2>&1; then
        pkill -f "llama-server" 2>/dev/null
        sleep 2
    fi
    log "Starting llama-server with TurboQuant..."

    # Update the model path below if you saved the model somewhere else
    MODEL_PATH="$HOME/.hermes/models/Qwen3.5-9B-Q4_K_M.gguf"
    # Fallback: check LM Studio's download location
    if [ ! -f "$MODEL_PATH" ]; then
        MODEL_PATH="$HOME/.lmstudio/models/lmstudio-community/Qwen3.5-9B-GGUF/Qwen3.5-9B-Q4_K_M.gguf"
    fi

    $HOME/.hermes/llama-cpp-turboquant/build/bin/llama-server \
        -m "$MODEL_PATH" \
        --cache-type-k turbo3 --cache-type-v turbo3 \
        --host 127.0.0.1 --port 1235 \
        --alias "qwen/qwen3.5-9b" \
        -ngl 99 -c 65536 \
        >> "$HOME/.hermes/logs/llama-server.log" 2>&1 &
    sleep 15
    if curl -s --connect-timeout 3 http://localhost:1235/v1/models >/dev/null 2>&1; then
        log "RECOVERED: llama-server restarted"
    else
        log "FAILED: llama-server did not recover"
    fi
    ISSUES=$((ISSUES + 1))
fi

# Check Gateway
cd "$HOME/.hermes/hermes-agent" || exit 1
if ! ./venv/bin/hermes gateway status 2>&1 | grep -q "running"; then
    log "CRITICAL: Gateway is down -- restarting"
    ./venv/bin/hermes gateway start >> "$LOG" 2>&1
    sleep 5
    ISSUES=$((ISSUES + 1))
fi

# Trim log
if [ -f "$LOG" ] && [ $(wc -l < "$LOG") -gt 500 ]; then
    tail -500 "$LOG" > "$LOG.tmp" && mv "$LOG.tmp" "$LOG"
fi

if [ $ISSUES -eq 0 ]; then
    MINUTE=$(date '+%M')
    # 10# forces base-10 so minutes like "08" aren't rejected as bad octal
    if [ "$((10#$MINUTE % 50))" -lt 5 ]; then
        log "OK: All systems healthy"
    fi
fi
WATCHEOF
chmod +x ~/.hermes/watchdog.sh

9c: Install the watchdog as a scheduled task

cat > ~/Library/LaunchAgents/com.hermes.watchdog.plist << 'PLISTEOF'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.hermes.watchdog</string>
    <key>ProgramArguments</key>
    <array>
        <string>/bin/bash</string>
        <string>HOMEDIR/.hermes/watchdog.sh</string>
    </array>
    <key>StartInterval</key>
    <integer>300</integer>
    <key>RunAtLoad</key>
    <true/>
    <key>StandardErrorPath</key>
    <string>HOMEDIR/.hermes/logs/watchdog-stderr.log</string>
    <key>EnvironmentVariables</key>
    <dict>
        <key>PATH</key>
        <string>/usr/local/bin:/usr/bin:/bin:/opt/homebrew/bin:HOMEDIR/.local/bin</string>
    </dict>
</dict>
</plist>
PLISTEOF

# Replace HOMEDIR with your actual home directory
sed -i '' "s|HOMEDIR|$HOME|g" ~/Library/LaunchAgents/com.hermes.watchdog.plist

# Load the watchdog
launchctl load ~/Library/LaunchAgents/com.hermes.watchdog.plist
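A note on the sed step: the empty '' after -i is the macOS (BSD) sed syntax for in-place editing (GNU sed takes -i with no argument). The substitution itself is plain text replacement, which you can preview without touching the real plist:

```bash
# Same HOMEDIR -> $HOME substitution the install step performs, on a sample line:
echo "<string>HOMEDIR/.hermes/watchdog.sh</string>" | sed "s|HOMEDIR|$HOME|g"
```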

Step 10: Start Everything

10a: Start the LLM server

# Find your model (update path if needed)
MODEL_PATH="$HOME/.hermes/models/Qwen3.5-9B-Q4_K_M.gguf"

~/.hermes/llama-cpp-turboquant/build/bin/llama-server \
  -m "$MODEL_PATH" \
  --cache-type-k turbo3 --cache-type-v turbo3 \
  --host 127.0.0.1 --port 1235 \
  --alias "qwen/qwen3.5-9b" \
  -ngl 99 -c 65536 \
  > ~/.hermes/logs/llama-server.log 2>&1 &

echo "LLM server starting... waiting 15 seconds"
sleep 15
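The fixed 15-second sleep is a guess; first load of a 9B model can take longer on some machines. If you'd rather poll until the server actually answers, a small retry helper works (`wait_for` is our own sketch, not a Hermes or llama.cpp command):

```bash
# Retry a command until it succeeds: wait_for <tries> <delay_seconds> <cmd...>
wait_for() {
  tries=$1; delay=$2; shift 2
  i=0
  until "$@"; do
    i=$((i + 1))
    [ "$i" -ge "$tries" ] && return 1
    sleep "$delay"
  done
}

# Demoed with `true`, which succeeds immediately:
wait_for 30 2 true && echo "server is up"

# Real use, replacing the sleep above:
# wait_for 30 2 curl -s -o /dev/null --connect-timeout 2 http://localhost:1235/v1/models
```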

Verify:

curl -s http://localhost:1235/v1/models

You should see JSON output with qwen/qwen3.5-9b.
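If you'd rather see just the model id than the raw JSON, Python's standard library can pull it out. The canned string below mimics the shape of an OpenAI-style /v1/models response so the sketch runs even without the server:

```bash
# Extract data[0].id from an OpenAI-style /v1/models response (canned here):
echo '{"object":"list","data":[{"id":"qwen/qwen3.5-9b"}]}' |
  python3 -c "import json, sys; print(json.load(sys.stdin)['data'][0]['id'])"

# Against the live server:
# curl -s http://localhost:1235/v1/models |
#   python3 -c "import json, sys; print(json.load(sys.stdin)['data'][0]['id'])"
```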

10b: Start the Hermes gateway

cd ~/.hermes/hermes-agent
./venv/bin/hermes gateway start

10c: Pair with Telegram

  1. Open Telegram and message your bot
  2. Send /start
  3. The bot will give you a pairing code (e.g., ABC12345)
  4. In your terminal, run:

cd ~/.hermes/hermes-agent
./venv/bin/hermes pairing approve telegram YOUR_PAIRING_CODE

  5. Send another message to the bot -- it should respond!

10d: Run a health check

~/.hermes/check-health.sh

Everything should show OK.


Step 11: Start Hermes in the Terminal (Optional)

You can also chat with Hermes directly in your terminal:

hermes

Type a message and press Enter. Type /exit to quit.


Troubleshooting

"Context size exceeded"

The conversation has filled the model's context window. Restart Hermes (Ctrl+C, then hermes again) to start a fresh session.

"ReadTimeout" or "timed out"

The local model is taking too long to respond. This is normal for complex tasks. The timeout settings in .env should handle most cases. If persistent, try restarting the LLM server.

Telegram bot not responding

  1. Run ~/.hermes/check-health.sh to see what's down
  2. If Gateway is DOWN: cd ~/.hermes/hermes-agent && ./venv/bin/hermes gateway restart
  3. If LLM is DOWN: the watchdog should auto-restart it within 5 minutes

"model not found" error

Make sure the model name in config.yaml matches what the server reports:

curl -s http://localhost:1235/v1/models

How to restart everything

# Stop everything
pkill -f "llama-server"
cd ~/.hermes/hermes-agent && ./venv/bin/hermes gateway stop

# Start LLM server
~/.hermes/llama-cpp-turboquant/build/bin/llama-server \
  -m ~/.hermes/models/Qwen3.5-9B-Q4_K_M.gguf \
  --cache-type-k turbo3 --cache-type-v turbo3 \
  --host 127.0.0.1 --port 1235 \
  --alias "qwen/qwen3.5-9b" \
  -ngl 99 -c 65536 \
  > ~/.hermes/logs/llama-server.log 2>&1 &
sleep 15

# Start gateway
cd ~/.hermes/hermes-agent && ./venv/bin/hermes gateway start

How to check logs

# Watchdog log
cat ~/.hermes/logs/watchdog.log

# LLM server log
tail -50 ~/.hermes/logs/llama-server.log

# Hermes error log
tail -50 ~/.hermes/logs/errors.log

Summary

You now have:

  • Qwen 3.5 9B running locally with TurboQuant acceleration
  • Hermes Agent as your personal AI gateway
  • Telegram bot for messaging from anywhere
  • Claude Code delegation for complex tasks (if configured)
  • Watchdog monitoring that auto-restarts services

Your AI assistant is always running, reachable from your phone, and smart enough to know when to ask Claude for help.

About

Step-by-step guide to setting up Hermes Agent with a local AI model, Telegram bot, TurboQuant acceleration, and Claude Code delegation on Mac
