Stop tuning your AI. Let it tune itself.
You spend hours tweaking prompts, writing rules, researching new features, configuring tools. Homunculus flips this: define your goals, and the system evolves itself — skills, agents, hooks, scripts, everything — while you focus on actual work.
```
npx homunculus-code init
```

One command. Define your goals. Your assistant starts evolving.
Proof it works: One developer ran this system for 3 weeks. It auto-generated 179 behavioral patterns, routed them into 10 tested skills, created 3 specialized agents, 15 commands, and 24 automation scripts. The nightly agent made 155 autonomous commits — improving the system while the developer slept. See results →
Every Claude Code power user hits the same wall:
Week 1: "Claude is amazing!"
Week 2: "Let me add some rules and hooks..."
Week 3: "I need skills, agents, MCP servers, custom commands..."
Week 4: "I spend more time configuring Claude than using it."
Sound familiar? Even OpenClaw — with 300K+ stars and self-extending capabilities — still needs you to decide what to improve, when to improve it, and whether the improvement worked. Edit SOUL.md, tweak AGENTS.md, break something, open another AI session to fix it. The AI can extend itself, but it can't set its own direction or validate its own quality.
Here's the difference:
| Without Homunculus | With Homunculus |
|---|---|
| You notice tests keep failing | Goal tree detects `test_pass_rate` dropped |
| → You research testing patterns | → Nightly agent auto-evolves `tdd` skill |
| → You write a pre-commit skill | → Eval confirms 100% pass |
| → You test it manually | → Morning report: "Fixed. Here's what changed." |
| → It breaks after a Claude update | → Next update? System re-evaluates and adapts |
| → You fix it again... | → You slept through all of this |
Most AI tools optimize locally — "you did X, so I'll remember X." Homunculus optimizes globally — toward goals you define in a tree:
```
               🎯 My AI Assistant
        ┌──────────┼──────────┐
        │          │          │
  Code Quality   Speed    Knowledge
  ┌────┴────┐      │      ┌────┴────┐
Testing  Review  Tasks  Research  Memory
```
Each node defines why it exists, how to measure it, and what currently implements it:
```yaml
# architecture.yaml — from the reference system
autonomous_action:
  purpose: "Act without waiting for human commands"
  goals:
    nightly_research:
      purpose: "Discover better approaches while developer sleeps"
      realized_by: heartbeat/heartbeat.sh  # a shell script
      health_check:
        command: "test $(find logs/ -mtime -1 | wc -l) -gt 0"
        expected: "nightly agent ran in last 24h"
    task_management:
      purpose: "Track and complete tasks autonomously"
      realized_by: quest-board/server.js  # a web app
      metrics:
        - name: forge_completion_rate
          healthy: "> 70%"

continuous_evolution:
  purpose: "Improve over time without human intervention"
  goals:
    pattern_extraction:
      purpose: "Learn from every session"
      realized_by: scripts/evaluate-session.js  # a Node script
    skill_aggregation:
      purpose: "Converge patterns into tested skills"
      realized_by: homunculus/evolved/skills/  # evolved artifacts
```

The `realized_by` field can point to anything:
| Type | Example | When to use |
|---|---|---|
| Skill | `skills/tdd-workflow.md` | Behavioral knowledge |
| Agent | `agents/code-reviewer.md` | Specialized AI subagent |
| Hook | `hooks/pre-commit.sh` | Automated trigger |
| Script | `scripts/deploy.sh` | Shell automation |
| Cron | `launchd/nightly-check.plist` | Scheduled task |
| MCP | `mcp-servers/github` | External integration |
| Rule | `rules/security.md` | Claude Code behavioral rule |
| Command | `commands/quality-gate.md` | Slash command workflow |
Goals are stable. Implementations evolve. The system automatically routes each behavior to its optimal mechanism — hook, rule, skill, script, agent, or system. Implementations get replaced and upgraded while goals stay intact.
```
     You use Claude Code normally
                │
     ┌──────────┼──────────┐
     │          │          │
  Observe  Health Check  Research
  (hooks)   (goal tree)  (nightly)
     │          │          │
     └──────────┼──────────┘
                │
                ▼
         ┌────────────┐
         │   Evolve   │   Goals stay the same.
         │  ────────  │   Implementations get better.
         │  Extract   │
         │  Converge  │
         │  Eval      │
         │  Replace   │
         └────────────┘
```
Three inputs, one engine:
- Observe — hooks watch your tool usage, extract recurring patterns into "instincts"
- Health check — the goal tree identifies which goals are unhealthy → focus there
- Research — a nightly agent scans for better approaches and proposes experiments
The evolution engine then:
- Extracts behavioral patterns → instincts (tagged with best mechanism + goal path)
- Routes each instinct to its optimal mechanism → hook, rule, skill, script, agent, or system
- For skills: eval → improve loop until 100% pass rate
- Reviews all goals nightly → is the current mechanism still the best one?
- Archives instincts once implemented → the mechanism is the source of truth now
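The steps above can be sketched as one loop. This is pseudocode-level JavaScript under my own naming — `routeInstinct`, `runEval`, and `improveSkill` are hypothetical stand-ins for the engine's internals, not its real API:

```javascript
// Illustrative shape of one evolution cycle: route each instinct to a
// mechanism, then (for skills) eval → improve until 100% pass, then archive.
function evolutionCycle(instincts, { routeInstinct, runEval, improveSkill }) {
  const archived = [];
  for (const instinct of instincts) {
    const mechanism = routeInstinct(instinct); // hook | rule | skill | script | agent | system
    let impl = { mechanism, body: instinct.pattern, version: 1 };

    if (mechanism === "skill") {
      let result = runEval(impl);
      // eval → improve loop, bounded so a sketch can't spin forever
      for (let round = 0; result.passRate < 1.0 && round < 5; round++) {
        impl = improveSkill(impl, result.failures);
        impl.version += 1;
        result = runEval(impl);
      }
    }
    // Once implemented, the mechanism is the source of truth; the raw
    // instinct is archived rather than deleted.
    archived.push({ ...instinct, implementedBy: impl });
  }
  return archived;
}
```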
Run in your project directory (where your CLAUDE.md or .claude/ lives):
```
npx homunculus-code init

Homunculus — Self-evolving AI Assistant

✓ Created homunculus/ directory structure
✓ Added evolution rules
✓ Copied evolution scripts
✓ Added slash commands (/hm-goal, /hm-night, /hm-status)
✓ Configured observation hook

Done! Homunculus is installed.

Next steps:
  1. Run claude to open Claude Code
  2. Type /hm-goal to define your goals (AI-guided)
  3. Type /hm-night to run your first evolution cycle
```
Open Claude Code and type /hm-goal. Claude asks one question at a time, then generates your goal tree:
```
> /hm-goal

Claude: What kind of project is this?
You:    A personal AI assistant

Claude: What do you spend most time on that you wish was better?
You:    Keeping up with AI news, too much to follow

Claude: If your AI could fix one thing overnight, what would it be?
You:    Auto fetch new AI news and give me a morning report

Claude: Here's your goal tree:

  🎯 Personal AI Assistant
  ├── ai_intelligence — Stay ahead on AI developments
  │   ├── news_curation — Auto-fetch top AI news daily
  │   └── morning_briefing — Deliver a concise report every morning
  ├── self_improvement — Get smarter over time
  │   ├── skill_evolution — Learn new skills from observations
  │   └── quality — Ensure evolved skills actually work
  └── productivity — Save the user time
      └── automation — Automate repetitive workflows

  Does this look right?

You:    Great!

Claude: ✅ architecture.yaml created with 3 goals!
```
```
> /hm-night
```
Watch Claude check your goals, route instincts to the right mechanisms (hook/rule/skill/script/agent), evaluate and improve implementations, and generate a report — all in one command. This is what the nightly agent does autonomously while you sleep.
The observation hook watches your usage automatically. As patterns emerge, instincts are extracted and routed to the right mechanism:
| Command | What it does |
|---|---|
| `/hm-goal` | Define or refine your goals |
| `/hm-night` | Run a full evolution cycle (can run manually, but best set up as the nightly agent) |
| `/hm-status` | Check evolution progress |
/hm-night performs the complete evolution pipeline: routes instincts to the best mechanism (hook/rule/skill/script/agent), runs eval + improve on skills, reviews goal health, and generates a report. You can run it manually anytime, but the real power is letting it run autonomously every night.
The first time you run `/hm-night`, it will ask if you want to set up the nightly agent for automatic overnight evolution.
The goal tree in `architecture.yaml` is the central nervous system. Every goal has a purpose, metrics, and health checks. The evolution engine reads this file to decide what to improve and how to measure success.
Instincts are small behavioral patterns auto-extracted from your usage. Each instinct is tagged with:
- Confidence score — grows with reinforcement, decays over time (half-life: 90 days)
- Suggested mechanism — which implementation type fits best (hook/rule/skill/script/...)
- Goal path — which goal in the tree this serves
Think of instincts as raw material. They get routed to the right mechanism and archived once implemented.
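A minimal sketch of that confidence model, assuming exponential decay with the 90-day half-life described above and an illustrative +0.1 reinforcement step (the step size and field names are my assumptions, not the real schema):

```javascript
// Instinct confidence: reinforcement bumps it up, time decays it
// with a 90-day half-life.
const HALF_LIFE_DAYS = 90;
const DAY_MS = 86_400_000;

function currentConfidence(instinct, nowMs = Date.now()) {
  const ageDays = (nowMs - instinct.lastReinforcedMs) / DAY_MS;
  return instinct.confidence * Math.pow(0.5, ageDays / HALF_LIFE_DAYS);
}

function reinforce(instinct, nowMs = Date.now()) {
  // Decay first, then bump — capped at 1.0. The +0.1 step is an assumption.
  const decayed = currentConfidence(instinct, nowMs);
  return { ...instinct, confidence: Math.min(1, decayed + 0.1), lastReinforcedMs: nowMs };
}
```

The decay is what lets the system prune itself: an instinct nobody reinforces quietly fades below the extraction threshold instead of accumulating forever.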
The system chooses the best mechanism for each behavior:
| Behavior type | Mechanism |
|---|---|
| Deterministic, every time | Hook (zero AI judgment) |
| Tied to specific files/dirs | Rule (path-scoped) |
| Reusable knowledge collection | Skill (with eval spec) |
| Periodic automation | Script + scheduler |
| External service connection | MCP |
| Needs isolated context | Agent |
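A toy version of that routing table as code — the trait names and the fallback are my invention; the real router applies AI judgment rather than boolean flags:

```javascript
// Encode the routing heuristics from the table above, checked in order.
function routeBehavior(traits) {
  if (traits.deterministic) return "hook";       // zero AI judgment needed
  if (traits.pathScoped) return "rule";          // tied to specific files/dirs
  if (traits.reusableKnowledge) return "skill";  // gets an eval spec
  if (traits.periodic) return "script";          // plus a scheduler entry
  if (traits.externalService) return "mcp";
  if (traits.needsIsolatedContext) return "agent";
  return "rule"; // conservative default (my assumption)
}
```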
When multiple instincts cover the same area and routing says "skill," they converge into a tested, versioned knowledge module. Every skill has an eval spec with scenario tests. Skills that fail eval get automatically improved.
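To make that concrete, here is what an eval spec for an evolved skill *might* look like. This is a hypothetical sketch — the field names and schema are my illustration, not the actual Homunculus format:

```yaml
# Hypothetical eval spec — schema is illustrative only
skill: tdd-workflow
version: 3
scenarios:
  - name: new-feature-starts-with-failing-test
    given: "User asks to add an input-validation feature"
    expect: "A failing test is written before any implementation code"
  - name: refactor-keeps-tests-green
    given: "User asks to refactor the parser module"
    expect: "Tests run before and after; no new failures introduced"
pass_threshold: 1.0   # a skill must reach 100% before replacing the old version
```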
The core principle: same goal, evolving implementation.
Goal: "Catch bugs before merge"
v1: instinct → "remember to run tests"
v2: rule → .claude/rules/testing.md (path-scoped guidance)
v3: skill → tdd-workflow.md (with eval spec)
v4: hook → pre-commit.sh (deterministic, automated)
The system reviews these nightly — if a skill should be a hook, it suggests the upgrade.
This is what makes the system truly autonomous. The nightly agent runs /hm-night automatically while you sleep — routing instincts to the right mechanism, evaluating skills, reviewing goal health, and researching better approaches.
Setup: The first time you run /hm-night, it asks if you want to enable automatic nightly runs. Say yes, and it configures a scheduler (launchd on macOS, cron on Linux) to run every night.
You can also run /hm-night manually anytime to trigger a cycle on demand.
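On Linux, that scheduler setup amounts to a crontab entry along these lines (illustrative — the actual entry Homunculus writes may differ, and the 03:00 time, project path, and log path are my placeholders):

```
# Hypothetical crontab entry: run the nightly evolution cycle at 03:00
0 3 * * * cd /path/to/your-project && claude -p "/hm-night" >> logs/nightly.log 2>&1
```

On macOS the equivalent is a launchd plist with a `StartCalendarInterval`, which survives the machine being asleep at the scheduled time better than cron does.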
```
You go to sleep
      │
      ▼
┌─────────────────────────────────────────────┐
│  Nightly Agent (phase pipeline)             │
│                                             │
│  P0: Assigned tasks                         │
│                                             │
│  P1: Evolution cycle                        │
│      Route instincts → 8 mechanisms         │
│      Eval + improve all implementations     │
│      Review: best mechanism per goal?       │
│                                             │
│  P2: Research (cross-night dedup)           │
│                                             │
│  P3: Experiments (hypothesis → verify)      │
│                                             │
│  P4: Sync (CLAUDE.md / architecture.yaml)   │
│                                             │
│  Bonus: Extra rounds if budget allows       │
└─────────────────────────────────────────────┘
      │
      ▼
You wake up to a smarter assistant + a report
```
Here's what a real morning report looks like:

```markdown
## Morning Report — 2026-03-22

### What Changed Overnight
- Improved skill: claude-code-reference v4.6 → v4.8
  (added coverage for CC v2.1.77-80 features)
- Archived 3 outdated instincts (covered by evolved skills)
- New experiment passed: eval noise threshold set to 5pp

### Goal Health
- continuous_evolution: ✅ healthy (10 skills, all 100% eval)
- code_quality: ✅ healthy (144/144 tests passing)
- resource_awareness: ⚠️ attention (context usage trending up)
  → Queued experiment: split large skill into chapters

### Research Findings
- Claude Code v2.1.81: new --bare flag could speed up headless mode
  → Experiment queued for tomorrow night
- New pattern detected in community: writer/reviewer agent separation
  → Instinct created, will converge if reinforced

### Suggested Actions (for you)
- Review 2 knowledge card candidates from overnight research
- Approve experiment: context reduction via skill splitting
```

In our reference system, the nightly agent produced 155 autonomous commits — routing instincts to the right mechanisms, evolving skills, running experiments, researching better approaches, and archiving outdated patterns. All without any human input.
The nightly agent is what turns Homunculus from "a tool you use" into "a system that grows on its own."
See docs/nightly-agent.md for setup.
Built and tested on a real personal AI assistant. In 3 weeks (starting from zero):
| What evolved | Count | Details |
|---|---|---|
| Goal tree | 10 goals / 46+ sub-goals | Each with health checks and metrics |
| Instincts | 179 | 24 active + 155 auto-archived (system prunes itself) |
| Skills | 10 | All 100% eval pass rate (135 test scenarios) |
| Experiments | 15 | Structured A/B tests with pass/fail tracking |
| Subagents | 3 | Auto-extracted from repetitive main-thread patterns |
| Scheduled agents | 5 | Nightly heartbeat, Discord bridge, daily news, trading × 2 |
| Hooks | 11 | Observation, compaction, quality gates |
| Scripts | 24 | Session lifecycle, health checks, evolution reports |
| Slash commands | 15 | Workflow automations (forge-dev, quality-gate, eval...) |
| Rules | 6 | Core patterns, evolution system, knowledge management |
| ADRs | 8 | Architecture decision records |
| Total commits | 1,335 | Mostly automated by nightly agent |
The nightly agent alone: 155 autonomous commits.
The system even evolved its own task management board:
*(Screenshots from the reference system: the Jarvis Dashboard and the Quest Board.)*
See the full reference implementation →
| | Homunculus | OpenClaw | Cursor Rules | Claude Memory |
|---|---|---|---|---|
| Goal-driven | Goal tree with metrics + health checks | No | No | No |
| Learns from usage | Auto-observation → instincts → 8 mechanisms | Self-extending | Manual | Auto-memory |
| Quality control | Eval specs + scenario tests | None | None | None |
| Autonomous overnight | Nightly agent: eval + improve + research + experiment | No | No | No |
| Self-improving | Eval → improve → replace loop | Partial | No | No |
| Meta-evolution | Evolution mechanism evolves itself | No | No | No |
| Implementation agnostic | Skills, agents, hooks, scripts, MCP, cron... | Skills only | Rules only | Memory only |
OpenClaw is great at self-extending. Homunculus goes further: it decides what to improve based on goal health, validates improvements with evals, and does it all autonomously overnight. They solve different problems. OpenClaw is a power tool. Homunculus is an operating system for evolution.
After npx homunculus-code init:
```
your-project/
├── architecture.yaml            # Your goal tree (the brain)
├── homunculus/
│   ├── instincts/
│   │   ├── personal/            # Auto-extracted patterns
│   │   └── archived/            # Auto-pruned old patterns
│   ├── evolved/
│   │   ├── skills/              # Converged, tested knowledge
│   │   ├── agents/              # Specialized subagents
│   │   └── evals/               # Skill evaluation specs
│   └── experiments/             # A/B test tracking
├── .claude/
│   ├── rules/
│   │   └── evolution-system.md  # How Claude should evolve
│   └── commands/
│       ├── hm-goal.md           # /hm-goal — define or view goals
│       ├── hm-night.md          # /hm-night — run evolution cycle
│       ├── hm-status.md         # /hm-status — evolution dashboard
│       ├── eval-skill.md        # /eval-skill
│       ├── improve-skill.md     # /improve-skill
│       └── evolve.md            # /evolve
└── scripts/
    ├── observe.sh               # Observation hook
    ├── evaluate-session.js      # Pattern extraction
    └── prune-instincts.js       # Automatic cleanup
```
The evolution mechanism itself evolves:
- Instinct survival rate too low? → Automatically raise extraction thresholds
- Eval discrimination too low? → Add harder boundary scenarios
- Skill convergence too slow? → Adjust aggregation triggers
- Mechanism coverage low? → Flag goals that only rely on prompts for upgrade
- Dispatch compliance off? → Review if agent dispatches follow the token decision tree
Tracked via five metrics:
- `instinct_survival_rate` — % of instincts that survive 14 days
- `skill_convergence` — time from first instinct to evolved skill
- `eval_discrimination` — % of eval scenarios that actually distinguish between versions
- `mechanism_coverage` — % of goals with non-prompt implementations
- `compliance_rate` — % of agent dispatches at appropriate context pressure
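As an illustration, the first of these metrics could be computed like this (the `createdMs`/`prunedMs` field names are my assumptions about the instinct records, not the actual format):

```javascript
// instinct_survival_rate: share of instincts that lived at least
// 14 days before being pruned (still-active instincts count as survivors).
const SURVIVAL_WINDOW_DAYS = 14;
const DAY_MS = 86_400_000;

function instinctSurvivalRate(instincts, nowMs = Date.now()) {
  // Only instincts old enough to have had a chance to survive the window
  const eligible = instincts.filter(i => nowMs - i.createdMs >= SURVIVAL_WINDOW_DAYS * DAY_MS);
  if (eligible.length === 0) return null; // not enough history yet
  const survived = eligible.filter(i => {
    const endMs = i.prunedMs ?? nowMs;
    return endMs - i.createdMs >= SURVIVAL_WINDOW_DAYS * DAY_MS;
  });
  return survived.length / eligible.length;
}
```

A falling rate suggests the extractor is producing noise, which is exactly the signal the meta-evolution rule above uses to raise extraction thresholds.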
- Claude Code v2.1.70+
- Node.js 18+
- macOS or Linux
Is this only for Claude Code?
The concepts (goal tree, eval-driven evolution, replaceable implementations) are tool-agnostic. The current implementation targets Claude Code hooks and commands, but the core pipeline could be adapted to other AI harnesses.
Does it cost extra API credits?
The observation hook is lightweight (no API calls). Instinct extraction uses a short Claude call per session (~$0.01). The nightly agent is optional and budget-configurable.
Can I use my existing CLAUDE.md and rules?
Yes. npx homunculus-code init adds to your project without overwriting existing files. Your current setup becomes the starting point for evolution.
How is this different from Claude Code's built-in memory?
Claude's memory records facts. Homunculus evolves behavior — tested skills, automated hooks, specialized agents — all driven by goals you define, with quality gates that prevent regression.
How does this compare to OpenClaw?
OpenClaw is excellent at self-extending. Homunculus solves a different problem: autonomous, goal-directed evolution. It decides what needs improving (via goal health), validates improvements (via evals), and does the work overnight (via the nightly agent). You could use both: OpenClaw for on-demand capability extension, Homunculus for the autonomous evolution layer on top.
I see hook errors on startup (SessionStart / Stop)
Those are from your own user-level hooks (~/.claude/settings.json), not from Homunculus. If your hooks use relative paths like node scripts/foo.js, they'll fail in projects that don't have those scripts. Fix by adding guards:
```json
"command": "test -f scripts/foo.js && node scripts/foo.js || true"
```

Homunculus only adds hooks to the project-level `.claude/settings.json`.
- Stop Tuning Your AI. Let It Tune Itself. — The story behind Homunculus and why goal trees beat manual configuration.
"Your AI assistant should be a seed, not a statue."
Stop spending your evenings tuning AI. Plant a seed, define your goals, and let it grow. The more you use it, the better it gets — and it tells you exactly how and why through eval scores, goal health checks, and morning reports.
Homunculus builds on ideas from several projects and research:
- everything-claude-code — Continuous Learning pattern and Skill Creator's eval → improve loop. Homunculus adopted and extended these into a goal-tree-driven, autonomous evolution system.
- OpenClaw — Demonstrated that AI assistants can extend their own capabilities. Homunculus adds goal direction, eval quality gates, and autonomous overnight operation.
- Karpathy's Autoresearch — Proved AI can run autonomous experiment loops (118 iterations, 12+ hours). Inspired the nightly agent's research cycle.
- Anthropic's Eval Research — Eval methodology, noise tolerance (±6pp), and pass@k / pass^k metrics.
See CONTRIBUTING.md.
MIT
Built by Javan and his self-evolving AI assistant.


