A local-first AI agent powered by Ollama. Follows the "sniper agent" philosophy: lean context, single-purpose, no bloat. Runs fully offline on CPU hardware — no cloud APIs required.
- Fast/Think routing — `phi4-mini` for speed, `qwen3:4b` for deep reasoning
- Agentic loop — calls tools, feeds results back, iterates until done
- Text-based tool calling — tools described in system prompt, parsed from XML response
- Persistent memory — saves and recalls facts across sessions
- Persona system — each persona has its own soul, memory, and session history
- Telegram bot — chat with Norman from your phone
- HTTP gateway — REST API for programmatic access
- Daemon mode — runs with a heartbeat and scheduled tasks
| Tool | What it does |
|---|---|
| `shell` | Run shell commands (30s timeout) |
| `read_file` | Read file contents |
| `write_file` | Write or create files |
| `list_directory` | List directory contents |
| `save_memory` | Save facts to persistent memory |
| `recall_memory` | Keyword search across memory |
| `web_fetch` | Fetch a URL and strip it to text |
| `web_search` | Search DuckDuckGo, return top 5 results |
| `create_schedule` | Schedule interval or daily tasks |
| `list_schedules` | List all scheduled tasks |
| `cancel_schedule` | Cancel a scheduled task |
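The memory tools are plain text underneath: `save_memory` appends facts, and `recall_memory` does a keyword search over them. A minimal sketch of that kind of search — the function signature and in-memory layout here are illustrative, not Norman's actual implementation:

```python
def recall_memory(query: str, memory_text: str, limit: int = 5) -> list[str]:
    """Return memory lines that contain any keyword from the query."""
    # Ignore very short words so "is", "a" etc. don't match everything
    keywords = [w.lower() for w in query.split() if len(w) > 2]
    hits = []
    for line in memory_text.splitlines():
        if any(k in line.lower() for k in keywords):
            hits.append(line.strip())
    return hits[:limit]

memory = "User prefers dark mode\nProject deadline is Friday\nUser runs Arch Linux"
matches = recall_memory("what linux does the user run", memory)
```

Substring matching like this is crude but cheap, which fits the lean-context philosophy: no embeddings, no index, just grep-style recall.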
Requirements: Python 3.12+, Ollama installed and running
```bash
# 1. Pull the models
ollama pull phi4-mini
ollama pull qwen3:4b

# 2. Clone and install Norman
git clone https://github.com/YOUR_USERNAME/Norman.git
cd Norman
pip install -e .
```

No API key is needed for basic use. For Telegram bot mode, copy `.env.example`:

```bash
cp .env.example .env
```

Then edit `.env` (only needed for Telegram):

```
TELEGRAM_BOT_TOKEN=your_telegram_bot_token_here
TELEGRAM_OWNER_ID=your_telegram_user_id_here
```
```bash
# Interactive REPL
norman chat

# Single-shot prompt
norman run "summarize /var/log/syslog"

# Deep reasoning (slower, more thorough)
norman run "@think explain the tradeoffs of B-tree vs LSM-tree"

# Telegram bot
norman telegram

# HTTP gateway (port 18789)
norman serve

# Daemon with heartbeat triggers
norman daemon
```

Prefix any prompt with `@think`, or use reasoning keywords (analyze, compare, debug, explain why), to route to `qwen3:4b` with thinking enabled:

```
you> @think why is my memory usage growing over time?
```
Triggers → Context → LLM → Tools → loop back
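That cycle is the whole agent. A minimal sketch of such a loop — an illustrative skeleton, not `core/loop.py` itself, with the iteration cap mirroring the safety brake described below:

```python
def agent_loop(prompt, call_llm, execute_tool, max_iters=10):
    """Minimal agentic loop: call the LLM, run any requested tool,
    feed the result back, repeat until the model stops asking for tools."""
    messages = [{"role": "user", "content": prompt}]
    for _ in range(max_iters):  # safety brake against infinite tool chains
        reply = call_llm(messages)
        messages.append({"role": "assistant", "content": reply["content"]})
        if not reply.get("tool_call"):
            return reply["content"]  # model answered directly: done
        name, args = reply["tool_call"]
        result = execute_tool(name, args)
        # Tool output goes back in as a user-role message for the next turn
        messages.append({"role": "user", "content": f"<tool_result>{result}</tool_result>"})
    return "stopped: iteration limit reached"
```

`call_llm` and `execute_tool` are injected here so the loop stays testable; the real loop also handles streaming and compaction.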
| Zone | Module | Purpose |
|---|---|---|
| Triggers | `triggers/` | Heartbeat, cron, user input |
| Context | `context/` | SOUL.md + MEMORY.md + tool descriptions + history |
| Tools | `tools/` | Shell, files, memory, web, schedules |
| Loop | `core/loop.py` | Call LLM → execute tools → feed back → repeat |
Key modules:
- `core/llm.py` — Ollama client via the native `/api/chat` streaming endpoint
- `core/router.py` — Fast/think routing heuristics
- `core/loop.py` — Agentic loop with a 10-iteration safety brake
- `core/compactor.py` — Summarizes old messages when context hits 50% of budget
- `context/builder.py` — Assembles the system prompt from SOUL.md + MEMORY.md + tool descriptions
- `tools/registry.py` — Tool registration, schema generation, text-based tool prompts
- `config.py` — All settings in one `NormanConfig` dataclass
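The compactor's trigger condition is easy to illustrate. A hedged sketch of the idea — the 50% threshold comes from the description above, while the 4-chars-per-token estimate, budget size, and `summarize` callback are stand-ins, not Norman's actual code:

```python
def maybe_compact(messages, summarize, budget_tokens=8192):
    """Summarize the oldest half of the conversation once the rough
    token count passes 50% of the context budget."""
    # Crude estimate: ~4 characters per token
    est_tokens = sum(len(m["content"]) // 4 for m in messages)
    if est_tokens < budget_tokens * 0.5:
        return messages  # still under the threshold: no compaction
    half = len(messages) // 2
    summary = summarize(messages[:half])
    # Replace the oldest turns with a single summary message
    return [{"role": "system", "content": f"Summary of earlier turns: {summary}"}] + messages[half:]
```

Summarizing only the oldest half keeps recent turns verbatim, which matters because tool results from the last iteration are usually what the model needs exactly.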
| Mode | Model | Speed | Triggers |
|---|---|---|---|
| Fast (default) | `phi4-mini:latest` | ~10s | Simple queries, tool calls, general chat |
| Think | `qwen3:4b` | ~60-150s | `@think` prefix, or keywords: analyze, compare, debug, explain why |
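The routing heuristic amounts to a prefix check plus keyword matching. An illustrative version using the keyword list from the table above; `core/router.py` may differ in detail:

```python
THINK_KEYWORDS = ("analyze", "compare", "debug", "explain why")

def pick_model(prompt: str) -> str:
    """Route to the thinking model on an @think prefix or reasoning keywords."""
    text = prompt.strip().lower()
    if text.startswith("@think"):
        return "qwen3:4b"
    if any(kw in text for kw in THINK_KEYWORDS):
        return "qwen3:4b"
    return "phi4-mini:latest"  # fast path is the default
```

Because routing happens before any model call, a misroute costs nothing but latency: the fast model still answers, just less thoroughly.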
Norman uses text-based tool calling — tool descriptions are injected into the system prompt and the model responds with `<tool_call>` XML blocks. This is faster than Ollama's native `tools` API parameter, which triggers excessive reasoning on thinking models.
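Parsing those blocks needs nothing heavier than a regex. A sketch, assuming the model emits a `<tool_call>` element wrapping a JSON object with `name` and `arguments` keys (the exact payload shape inside the tag is an assumption):

```python
import json
import re

TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

def parse_tool_call(reply: str):
    """Extract (tool_name, args) from a <tool_call> block, or None."""
    m = TOOL_CALL_RE.search(reply)
    if not m:
        return None  # plain answer, no tool requested
    payload = json.loads(m.group(1))
    return payload["name"], payload.get("arguments", {})
```

If parsing returns `None`, the loop treats the reply as a final answer; otherwise the named tool runs and its output is fed back for the next iteration.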
Each persona lives in personas/<name>/ with:
```
personas/default/
├── SOUL.md          # System prompt / personality
├── MEMORY.md        # Accumulated long-term memory
├── heartbeat.md     # What to check on each heartbeat
├── sessions/        # JSONL chat history (gitignored)
├── schedules.json   # Active scheduled tasks (gitignored)
└── memory/          # Dated memory snapshots (gitignored)
```
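Session history in `sessions/` is JSON Lines: one message per line, appended as the chat proceeds. A sketch of reading and writing that format — the `role`/`content` field names are illustrative, not a guarantee of Norman's schema:

```python
import json
from pathlib import Path

def append_message(path: Path, role: str, content: str) -> None:
    """Append one chat turn as a single JSON line."""
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps({"role": role, "content": content}) + "\n")

def load_session(path: Path) -> list[dict]:
    """Read a session file back into a list of message dicts."""
    if not path.exists():
        return []
    return [json.loads(line)
            for line in path.read_text(encoding="utf-8").splitlines()
            if line.strip()]
```

Append-only JSONL means a crash mid-session loses at most the last partial line, and `grep` works on the history as-is.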
Switch personas with `--persona`:

```bash
norman chat --persona work
```

To add a custom tool, drop a class in `norman/tools/`:

```python
# norman/tools/my_tool.py
class MyTool:
    name = "my_tool"
    description = "What it does"
    parameters = {
        "type": "object",
        "properties": {"input": {"type": "string"}},
        "required": ["input"],
    }

    def execute(self, input: str) -> str:
        return f"result: {input}"
```

Register it in `cli.py:_build_agent()`:
```python
from norman.tools.my_tool import MyTool

registry.register(MyTool())
```

For the HTTP gateway, install the optional extra and start the server:

```bash
pip install -e ".[gateway]"
norman serve
# POST http://127.0.0.1:18789/run {"prompt": "..."}
```

Dependencies:

- `httpx` — HTTP client (talks to the Ollama native API directly)
- `beautifulsoup4` — HTML stripping for web tools
- `fastapi` + `uvicorn` — gateway (optional)
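With the gateway running, any HTTP client can drive Norman. A sketch of a caller — shown with the stdlib `urllib` so it runs anywhere, though Norman itself uses `httpx`; only the `{"prompt": ...}` request body is documented above, so the raw-text return here is an assumption:

```python
import json
import urllib.request

def run_prompt(prompt: str, base_url: str = "http://127.0.0.1:18789") -> str:
    """POST a prompt to the local Norman gateway and return the raw response body."""
    req = urllib.request.Request(
        f"{base_url}/run",
        data=json.dumps({"prompt": prompt}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Generous timeout: thinking-model runs can take minutes
    with urllib.request.urlopen(req, timeout=300) as resp:
        return resp.read().decode("utf-8")
```

Because the gateway binds to 127.0.0.1, this stays local-only by default; put a reverse proxy with auth in front if you need remote access.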
MIT