From d5575aaad366a095bb56a6f7f06b78e46073297e Mon Sep 17 00:00:00 2001
From: No9 Labs <no9iolabs@gmail.com>
Date: Tue, 7 Apr 2026 09:56:55 -0400
Subject: [PATCH] docs: comprehensive README update to reflect current repo
 structure

- Update role descriptions: 5 -> 6 operators (add security-check preflight)
- Fix merged PR count: 30+ -> 155+
- Fix module count: 34 -> 23 product + 7 framework
- Add new stats: daemon sessions (100+), documented learnings (90+)
- Update make targets to match actual Makefile (remove non-existent ones)
- Add verify-cycle CLI command to examples
- Update config to match .nightshift.json.example (8 blocked paths, 6 globs)
- Add role selection section explaining signal-driven operator scoring
- Expand Recursive/ tree with skills/, annotations, and descriptions
- Update guard rails with specifics (category weights, circuit breaker, etc.)
- Add env var descriptions and two new vars (FORCE_ROLE, CHECKPOINTS)
- Expand "What ships today" with complete self-maintaining feature list
- Update "Current frontier" with current shipped items and open queue
---
 README.md | 266 +++++++++++++++++++++++++++++++-----------------------
 1 file changed, 155 insertions(+), 111 deletions(-)

diff --git a/README.md b/README.md
index bf63ad3..bf41ce2 100644
--- a/README.md
+++ b/README.md
@@ -21,13 +21,14 @@
 
 ## This repo maintains itself
 
-Most of the code in this repository was written, tested, reviewed, and merged by AI agents. One unified daemon (`Recursive/engine/daemon.sh`) auto-selects from five roles each cycle via `Recursive/engine/pick-role.py`:
+Most of the code in this repository was written, tested, reviewed, and merged by AI agents. One unified daemon (`Recursive/engine/daemon.sh`) runs six operators: each cycle it auto-selects one of five roles via `Recursive/engine/pick-role.py`, and a security-check preflight runs before each build:
 
-- **Builder**: reads the task queue, runs a pentest preflight, builds or fixes one scoped task, tests it, opens a PR, reviews it, and merges it
-- **Reviewer**: audits shipped code and fixes quality gaps
-- **Overseer**: audits process drift, task hygiene, and systemic issues
-- **Strategist**: produces a top-down health report for humans
-- **Achiever**: measures autonomy score (0-100), eliminates human dependencies
+- **Builder**: reads the task queue, builds or fixes one scoped task, tests it, opens a PR, reviews it via sub-agents, and merges it
+- **Reviewer**: picks one file, deep-reviews it against a checklist, fixes every issue found, and logs the review
+- **Overseer**: triages the task queue, closes duplicates and obsolete work, updates stale metadata
+- **Strategist**: gathers evidence across sessions, evaluations, and costs, then produces a top-down health report with auto-created follow-up tasks
+- **Achiever**: measures autonomy score (0-100) across a 20-check scorecard, identifies the highest-impact human dependency, and eliminates it
+- **Security checker**: red-team preflight that runs before each build -- scans for fragile paths, subprocess injection, and credential leaks, then outputs a severity-classified pentest report
 
 The human role is operational: start the daemon and monitor it. The agents own the engineering loop -- including deciding what to work on.
 
@@ -53,10 +54,13 @@ numbers change.
 | Loop 1 hardening | 99% | `.recursive/vision-tracker/TRACKER.md` |
 | Loop 2 feature builder | 100% | `.recursive/vision-tracker/TRACKER.md` |
 | Self-maintaining repo | 68% | `.recursive/vision-tracker/TRACKER.md` |
-| Meta-prompt system | 78% | `.recursive/vision-tracker/TRACKER.md` |
-| Tests | 847 passing | `python3 -m pytest nightshift/tests/ -q` |
-| Python modules | 34 | `.recursive/architecture/MODULE_MAP.md` |
-| Merged PRs | 30+ | `gh pr list --state merged --json number` |
+| Meta-prompt system | 79% | `.recursive/vision-tracker/TRACKER.md` |
+| Tests | 847 passing | `python3 -m pytest nightshift/tests/ Recursive/tests/ -q` |
+| Python modules (product) | 23 | `.recursive/architecture/MODULE_MAP.md` |
+| Python modules (framework) | 7 | `Recursive/lib/` + `Recursive/engine/` |
+| Merged PRs | 155+ | `gh pr list --state merged --json number` |
+| Daemon sessions | 100+ | `.recursive/sessions/index.md` |
+| Documented learnings | 90+ | `.recursive/learnings/INDEX.md` |
 
 ---
 
@@ -78,13 +82,16 @@ and persist build state for resume/status flows.
 
 The self-maintaining layer around those loops already ships:
 
-- task queue sync and prioritization
-- structured handoffs and learnings
-- per-version changelogs
-- a generated vision tracker and module map
-- cross-session cost analysis
-- a builder pentest preflight before code changes
-- branch/PR/review/merge automation
+- signal-driven role selection (build/review/oversee/strategize/achieve)
+- security-check preflight before every build cycle
+- task queue sync from GitHub Issues and internal prioritization
+- structured handoffs, 90+ documented learnings, and cross-session memory
+- per-version changelogs and auto-generated vision tracker
+- architecture module map generation
+- cross-session cost analysis and budget enforcement
+- review pipeline of 5 sub-agents (code, architecture, docs, safety, meta)
+- branch/PR/review/merge automation with prompt-integrity guard
+- self-evaluation against [Phractal](https://github.com/fazxes/Phractal) with 10-dimension scoring
 
 ## Install
 
@@ -115,10 +122,10 @@ Runtime/Nightshift/*.state.json
 EOF
 ```
 
-Optional per-repo config:
+Optional per-repo config (copy and edit):
 
 ```bash
-cp .recursive.json.example .recursive.json
+cp .nightshift.json.example .nightshift.json
 ```
 
 ## Running Nightshift
@@ -128,15 +135,16 @@ cp .recursive.json.example .recursive.json
 Use the Python module entry point that the codebase actually ships:
 
 ```bash
-python3 -m nightshift run --agent claude
-python3 -m nightshift test --agent claude --cycles 2 --cycle-minutes 5
-python3 -m nightshift summarize
-python3 -m nightshift plan "Add OAuth login"
-python3 -m nightshift build "Add OAuth login" --yes
-python3 -m nightshift build --status
-python3 -m nightshift build --resume
-python3 -m nightshift multi /repo1 /repo2 --agent claude --test --cycles 1
-python3 -m nightshift module-map --write
+python3 -m nightshift run --agent claude              # full overnight shift
+python3 -m nightshift test --agent claude --cycles 2   # short validation shift
+python3 -m nightshift summarize                        # print shift state JSON
+python3 -m nightshift verify-cycle --worktree-dir PATH --pre-head HASH  # verify cycle offline
+python3 -m nightshift plan "Add OAuth login"           # plan a feature build
+python3 -m nightshift build "Add OAuth login" --yes    # build a feature end-to-end
+python3 -m nightshift build --status                   # check build progress
+python3 -m nightshift build --resume                   # resume interrupted build
+python3 -m nightshift multi /repo1 /repo2 --agent claude --test --cycles 1  # multi-repo
+python3 -m nightshift module-map --write               # generate architecture map
 ```
 
 `python3 -m nightshift test ...` now keeps its state files, runner logs, and
@@ -156,12 +164,13 @@ Use the bundled wrapper scripts:
 ### Self-maintaining mode
 
 ```bash
-make daemon      # builder
-make review      # reviewer
-make overseer    # process auditor
-make strategist  # one-shot strategic report
-make tasks       # task queue summary
-make check       # local CI gate
+make daemon       # start the daemon (auto-picks operator each cycle)
+make tasks        # show pending/blocked/in-progress task queue
+make check        # full local CI gate (lint + typecheck + tests)
+make test         # run the full test suite
+make dry-run      # preview cycle prompt without spawning agents
+make quick-test   # 2-cycle validation run (~10 min)
+make clean        # remove runtime artifacts
 ```
 
 Daemon examples:
@@ -182,8 +191,8 @@ Abridged example. Full source of truth: [`.nightshift.json.example`](.nightshift
   "hours": 8,
   "cycle_minutes": 30,
   "verify_command": null,
-  "blocked_paths": [".github/", "deploy/", "deployment/", "infra/"],
-  "blocked_globs": ["*.lock", "package-lock.json", "pnpm-lock.yaml"],
+  "blocked_paths": [".github/", "deploy/", "deployment/", "infra/", "k8s/", "ops/", "terraform/", "vendor/"],
+  "blocked_globs": ["*.lock", "package-lock.json", "pnpm-lock.yaml", "yarn.lock", "bun.lockb", "Cargo.lock"],
   "max_fixes_per_cycle": 3,
   "max_files_per_fix": 5,
   "max_files_per_cycle": 12,
@@ -210,27 +219,47 @@ signals such as `pyproject.toml`, `package.json`, `Cargo.toml`, or `go.mod`.
 
 Environment variables:
 
-- `RECURSIVE_CLAUDE_MODEL`
-- `RECURSIVE_CODEX_MODEL`
-- `RECURSIVE_CODEX_THINKING`
-- `RECURSIVE_BUDGET`
-- `RECURSIVE_PENTEST_AGENT`
-- `RECURSIVE_PENTEST_MAX_TURNS`
+- `RECURSIVE_CLAUDE_MODEL` -- override Claude model (default: claude-opus-4-6)
+- `RECURSIVE_CODEX_MODEL` -- override Codex model (default: gpt-5.4)
+- `RECURSIVE_CODEX_THINKING` -- Codex thinking level (default: extra_high)
+- `RECURSIVE_BUDGET` -- max USD spend before daemon stops
+- `RECURSIVE_PENTEST_AGENT` -- agent for security preflight (default: same as main)
+- `RECURSIVE_PENTEST_MAX_TURNS` -- max turns for pentest agent
+- `RECURSIVE_FORCE_ROLE` -- bypass role scoring (build/review/oversee/strategize/achieve)
+- `RECURSIVE_PIPELINE_CHECKPOINTS` -- enable verification checkpoints (0/1)
+
+## How it picks what to do
+
+The daemon reads live system signals each cycle and scores all five roles.
+The highest score wins, with ties broken in favor of build. Key signals:
+
+| Signal | Effect |
+|--------|--------|
+| 5+ consecutive builds | Triggers **review** |
+| 50+ pending tasks | Triggers **oversee** |
+| 15+ sessions since last strategy | Triggers **strategize** |
+| Autonomy score < 70 | Triggers **achieve** |
+| Urgent tasks in queue | Boosts **build** |
+
+Override with `RECURSIVE_FORCE_ROLE=review` to bypass scoring.
+Full scoring math: `Recursive/ops/ROLE-SCORING.md`.
 
 ## How it keeps context between sessions
 
 Nightshift is designed for stateless agents, so the repo carries the memory:
 
 - **Handoffs**: every session writes a structured summary to `.recursive/handoffs/`, and the next session starts from `LATEST.md`
-- **Learnings**: agents read `.recursive/learnings/INDEX.md` first, then open only the relevant learning files
-- **Task queue**: work lives in `.recursive/tasks/`; urgent pending tasks outrank normal ones, then the queue falls back to lowest-numbered pending internal work
-- **Evaluations**: after each merge, the next session runs Nightshift against Phractal and turns low scores into tracked follow-up work
+- **Learnings**: agents read `.recursive/learnings/INDEX.md` first (90+ hard-won patterns), then open only the relevant learning files
+- **Task queue**: work lives in `.recursive/tasks/`; urgent pending tasks outrank normal ones, then the queue falls back to lowest-numbered pending internal work. GitHub Issues with the `task` label are auto-synced.
+- **Evaluations**: the daemon periodically runs Nightshift against Phractal and scores the result across 10 dimensions; low scores become tracked follow-up tasks
+- **Session index**: every session is logged with timestamp, role, exit code, duration, cost, feature, and PR link
 
 ```bash
 cat .recursive/handoffs/LATEST.md
 cat .recursive/learnings/INDEX.md
 make tasks
 ls .recursive/evaluations/
+cat .recursive/sessions/index.md
 ```
 
 Humans can add work by opening GitHub issues with the `task` label:
@@ -244,35 +273,42 @@ gh issue create --title "Fix CI" --label "task,urgent"
 
 Nightshift does not trust the model to "be careful." It verifies:
 
-- commit + shift-log presence
-- blocked-path and lockfile violations
-- repo verification commands when configured
+- commit + shift-log presence after every cycle
+- blocked-path and lockfile violations (8 blocked paths, 6 lockfile patterns)
+- repo verification commands (auto-inferred or configured)
 - file deletion attempts
-- repeated category or path tunnel vision
+- repeated category or path tunnel vision (category balancing)
 - prompt/control-file modifications during self-maintenance
+- circuit breaker: stops after 3 consecutive failures
 
 ### Diff scorer
 
-Accepted fixes are scored `1-10` for production impact using category, content,
-test, and breadth signals. Below threshold: revert the cycle. Above threshold:
-keep the commit.
+Accepted fixes are scored `1-10` for production impact using category weight
+(Security: 8, Error Handling: 6, Tests: 6, A11y: 5, etc.), diff content
+analysis, test file bonuses, and multi-category bonuses. Below threshold
+(default 3): revert the cycle. Above threshold: keep the commit.
 
 ### Prompt injection protection
 
 Instruction files from target repos (`CLAUDE.md`, `AGENTS.md`, etc.) are wrapped
-in an untrusted boundary before the agent sees them. They are treated as coding
-convention references only, never as behavioral directives.
+in an untrusted boundary before the agent sees them. Symlinks are rejected,
+files larger than 100KB are truncated, and total instruction context is capped
+at 200KB. The wrapped files are treated as coding convention references only,
+never as behavioral directives.
 
 ### Self-modification guard
 
-Before builder work starts, Nightshift snapshots control files, runs a pentest
-preflight, and hard-resets back to `origin/main` before the main fixer session.
-Any control-file diff is surfaced explicitly in the next builder prompt.
+Before builder work starts, Nightshift snapshots all framework control files
+(operator `SKILL.md` files, `daemon.sh`, `autonomous.md`, etc.), runs a red-team
+security-check preflight, and hard-resets back to `origin/main` before the
+main session. After the session, it compares pre/post snapshots and surfaces
+any control-file diff as an alert in the next cycle's prompt.
 
 ### Cost tracking
 
-Session costs are parsed from stream-json logs. Budget enforcement can stop the
-daemon when cumulative spend exceeds `RECURSIVE_BUDGET`.
+Session costs are parsed from agent stream-json logs. Per-session and cumulative
+costs are tracked in `.recursive/sessions/`. Budget enforcement via
+`RECURSIVE_BUDGET` can stop the daemon when cumulative spend exceeds the limit.
 
 ---
 
@@ -280,8 +316,8 @@ daemon when cumulative spend exceeds `RECURSIVE_BUDGET`.
 
 ### Product -- `nightshift/`
 
-The Python package is organized into subdirectories by concern. 34 modules
-across 5 subdirectories. The generated
+The Python package is organized into subdirectories by concern: 23 product
+modules across 5 subdirectories. The generated
 [module map](.recursive/architecture/MODULE_MAP.md) is the authoritative inventory.
 
 ```text
@@ -343,64 +379,68 @@ nightshift/
 
 ### Framework -- `Recursive/`
 
-The autonomous orchestration framework that drives the daemon, role selection,
-operator prompts, and agent lifecycle.
+A portable autonomous orchestration framework that drives the daemon, role
+selection, operator prompts, agent lifecycle, and session memory. Designed to
+work on any codebase -- Nightshift is just the first project it operates on.
 
 ```text
 Recursive/
 ├── engine/                   # Daemon runtime
-│   ├── daemon.sh             # Main daemon loop
-│   ├── lib-agent.sh          # Agent lifecycle helpers
-│   ├── pick-role.py          # Role scoring engine
+│   ├── daemon.sh             # Main daemon loop (hot-reloads each cycle)
+│   ├── lib-agent.sh          # Agent lifecycle, prompt guard, session utils
+│   ├── pick-role.py          # Signal-driven role scoring engine
 │   ├── watchdog.sh           # Process watchdog
 │   └── format-stream.py      # Stream-log formatter
 │
-├── operators/                # Role-specific prompt sets
-│   ├── build/
-│   ├── review/
-│   ├── oversee/
-│   ├── strategize/
-│   ├── achieve/
-│   └── security-check/
+├── operators/                # Role-specific prompt sets (SKILL.md + references/)
+│   ├── build/                # Default workhorse: pick task, build, ship PR
+│   ├── review/               # Deep file-by-file code review
+│   ├── oversee/              # Task queue triage and metadata cleanup
+│   ├── strategize/           # Big-picture health report with auto-created tasks
+│   ├── achieve/              # Autonomy measurement and human-dependency elimination
+│   └── security-check/       # Red-team preflight (read-only, runs before build)
 │
-├── agents/                   # Sub-agent prompts (reviewers)
-│   ├── code-reviewer.md
-│   ├── architecture-reviewer.md
-│   ├── docs-reviewer.md
-│   ├── safety-reviewer.md
-│   └── meta-reviewer.md
+├── agents/                   # Sub-agent prompts (specialist reviewers)
+│   ├── code-reviewer.md      # Structure, types, tests, shell correctness
+│   ├── architecture-reviewer.md  # Dependency flow, module boundaries, design
+│   ├── docs-reviewer.md      # Changelog, handoff, tracker, cross-doc consistency
+│   ├── safety-reviewer.md    # Secrets, subprocess safety, file system safety
+│   └── meta-reviewer.md      # Daemon integrity, prompt health (framework PRs only)
 │
-├── lib/                      # Shared Python helpers
-│   ├── cleanup.py
-│   ├── compact.py
-│   ├── config.py
-│   ├── costs.py
-│   └── evaluation.py
+├── lib/                      # Shared Python helpers (zero nightshift deps)
+│   ├── cleanup.py            # Log rotation, branch pruning, task archival
+│   ├── compact.py            # Handoff compression
+│   ├── config.py             # Project config loader
+│   ├── costs.py              # Session cost tracking and budget enforcement
+│   └── evaluation.py         # Self-evaluation pipeline (10-dimension scoring)
 │
 ├── prompts/                  # System prompts
-│   ├── autonomous.md
-│   └── checkpoints.md
+│   ├── autonomous.md         # Universal rules prepended to every session
+│   └── checkpoints.md        # Optional verification pipeline checkpoints
 │
 ├── ops/                      # Operations documentation
-│   ├── DAEMON.md
-│   ├── OPERATIONS.md
-│   ├── PRE-PUSH-CHECKLIST.md
-│   └── ROLE-SCORING.md
+│   ├── DAEMON.md             # Daemon guide with troubleshooting
+│   ├── OPERATIONS.md         # Complete system map (42KB reference)
+│   ├── PRE-PUSH-CHECKLIST.md # Safety checklist before pushing
+│   └── ROLE-SCORING.md       # Deep dive into scoring math per role
 │
 ├── scripts/                  # Framework utilities
-│   ├── init.sh
-│   ├── list-tasks.sh
-│   ├── rollback.sh
-│   └── validate-tasks.sh
+│   ├── init.sh               # Bootstrap new Recursive project
+│   ├── list-tasks.sh         # Task queue display
+│   ├── rollback.sh           # Revert last N commits (recovery tool)
+│   └── validate-tasks.sh     # Task YAML frontmatter validator
+│
+├── skills/                   # Skill definitions
+│   └── setup/SKILL.md        # Project setup skill
 │
 ├── templates/                # Structured-doc templates
-│   ├── handoff.md
-│   ├── evaluation.md
-│   ├── session-index.md
-│   ├── task.md
-│   └── project-config.json
+│   ├── handoff.md            # Session handoff format
+│   ├── evaluation.md         # Eval report format (10 dimensions)
+│   ├── session-index.md      # Session index table header
+│   ├── task.md               # Task file format (YAML frontmatter)
+│   └── project-config.json   # .recursive.json template
 │
-└── tests/                    # Framework tests
+└── tests/                    # Framework tests (92 tests)
     └── test_pick_role.py
 ```
 
@@ -440,21 +480,25 @@ Type checking is `mypy --strict`. Linting is Ruff. The local gate is
 
 ## Current frontier
 
-Shipped already:
+Shipped:
 
-- hardening loop (Owl) with worktrees, scoring, and guard rails
-- feature builder loop (Raven) with plan/build/resume/status flows
-- multi-repo mode
-- module map generation
-- self-evaluation against Phractal
-- builder pentest preflight and prompt-integrity checks
-- cross-session learnings, handoffs, and cost tracking
+- hardening loop (Owl) with worktrees, diff scoring, and guard rails (99%)
+- feature builder loop (Raven) with plan/build/resume/status/sub-agents (100%)
+- unified daemon with 6 operators and signal-driven selection across 5 roles
+- red-team security-check preflight with severity-classified pentest reports
+- review pipeline of 5 sub-agents (code, architecture, docs, safety, meta)
+- self-evaluation against Phractal with 10-dimension scoring
+- multi-repo mode, module map generation, prompt injection boundaries
+- cross-session learnings (90+), structured handoffs, and cost tracking
+- autonomy measurement and human-dependency elimination (score: 85/100)
+- GitHub Issues auto-sync to internal task queue
 
-Still open in the queue:
+Open in the queue (69 pending tasks):
 
-- fix the remaining real-repo evaluation gaps on rejected runs
+- fix remaining real-repo evaluation gaps on rejected runs
 - automate release tagging and changelog/tracker updates
 - improve task queue hygiene and session-index fidelity
+- fix the budget limiter triple-failure (daemon cost tracking)
 - add monitoring / alerting integrations
 
 See [.recursive/vision-tracker/TRACKER.md](.recursive/vision-tracker/TRACKER.md) for the