Recusive · fazxes · Apr 9, 2026
diff --git a/.recursive/ops/DAEMON.md b/.recursive/ops/DAEMON.md
@@ -55,8 +55,8 @@ bash .recursive/engine/daemon.sh claude 60 10
 Arguments are:
 
 1. agent name
-2. pause between sessions in seconds
-3. max sessions (`0` means loop forever)
+2. duration in hours (default: 8)
+3. max sessions (`0` means unlimited; only the duration limit applies)
 
 ### tmux
 
@@ -98,7 +98,7 @@ Valid values are `build`, `review`, `oversee`, `strategize`, `achieve`, `securit
 ## How Role Selection Works
 
 At the start of every cycle, `.recursive/engine/daemon.sh` calls
-[.recursive/engine/pick-role.py](/Users/no9labs/Developer/.recursive/Nightshift/.recursive/engine/pick-role.py).
+`.recursive/engine/pick-role.py`.
 That scorer reads the live system state and prints one winner.
 
 Primary inputs:
@@ -111,8 +111,7 @@ Primary inputs:
 - the latest report in `.recursive/autonomy/`
 - open GitHub issues labeled `needs-human`
 
-The exact math belongs in
-[.recursive/ops/ROLE-SCORING.md](/Users/no9labs/Developer/.recursive/Nightshift/.recursive/ops/ROLE-SCORING.md),
+The exact math belongs in `.recursive/ops/ROLE-SCORING.md`,
 not in this file. Read that file when debugging "why did the daemon pick this
 role?" behavior.
 
@@ -205,8 +204,7 @@ These are the authoritative runtime artifacts:
 | Path | Purpose |
 |------|---------|
 | `.recursive/sessions/index.md` | Unified session history across all roles |
-| `.recursive/sessions/*.log` | Stream-json session logs |
-| `.recursive/sessions/*-pentest.log` | Pentest preflight logs |
+| `.recursive/sessions/raw/*.log` | Stream-json session logs |
 | `.recursive/sessions/costs.json` | Cost ledger used by budget checks |
 | `.recursive/handoffs/LATEST.md` | Short-term memory for the next cycle |
 | `.recursive/evaluations/*.md` | Real-repo evaluation reports |
@@ -321,8 +319,7 @@ resets the repo, and injects that alert into the next cycle.
 The circuit breaker stops the daemon after three failed cycles. Inspect:
 
 - `.recursive/sessions/index.md`
-- the latest session log
-- the latest pentest log
+- the latest session log in `.recursive/sessions/raw/`
 - `.recursive/handoffs/LATEST.md`
 
 ### Budget stop

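Note on the corrected `daemon.sh` arguments in DAEMON.md: the positional order (agent, duration in hours, max sessions) can be sketched in Python as below. `DaemonArgs` and `parse_daemon_args` are hypothetical names used only for illustration, not code from `daemon.sh`. The 8-hour default and the `0` = unlimited convention come from the doc change; treating a missing third argument as `0` is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class DaemonArgs:
    agent: str
    duration_hours: float
    max_sessions: Optional[int]  # None = unlimited; only the duration limit applies


def parse_daemon_args(argv: Sequence[str]) -> DaemonArgs:
    """Interpret positional args as: agent, duration in hours, max sessions."""
    if not argv:
        raise ValueError("agent name is required")
    agent = argv[0]
    # Documented default: 8 hours.
    duration_hours = float(argv[1]) if len(argv) > 1 else 8.0
    # Assumption: an omitted max-sessions argument behaves like 0 (unlimited).
    raw_max = int(argv[2]) if len(argv) > 2 else 0
    if duration_hours <= 0:
        raise ValueError("duration must be positive")
    if raw_max < 0:
        raise ValueError("max sessions must be >= 0")
    return DaemonArgs(agent, duration_hours, None if raw_max == 0 else raw_max)
```

Under this reading, `parse_daemon_args(["claude", "60", "10"])` describes a 60-hour run capped at 10 sessions, which is the interpretation the audit flags for the existing example.
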
diff --git a/.recursive/ops/OPERATIONS.md b/.recursive/ops/OPERATIONS.md
@@ -454,7 +454,7 @@ The Python package that IS Nightshift. The overnight hardening runner.
 
 ### Dependency flow (nightshift package)
 ```
-core/errors → settings/eval_targets → core/types → core/constants → core/shell → raven/summary → raven/coordination → infra/module_map → owl/readiness → owl/scoring → core/state → settings/config → infra/multi → raven/e2e → raven/profiler → infra/worktree → owl/cycle → raven/planner → raven/subagent → raven/decomposer → raven/integrator → raven/feature → cli
+core/errors → core/types → core/constants → core/shell → raven/summary → raven/coordination → infra/module_map → owl/readiness → owl/scoring → core/state → settings/config → settings/eval_targets → owl/eval_runner → infra/multi → raven/e2e → raven/profiler → infra/worktree → owl/cycle → raven/planner → raven/subagent → raven/decomposer → raven/integrator → raven/feature → infra/release → cli
 ```
 No circular imports. Each module only imports from modules to its left. `multi.py` receives the `run_nightshift` callable from `cli.py` via dependency injection to avoid circular deps.
 
@@ -472,7 +472,7 @@ Note: `cleanup.py`, `compact.py`, `costs.py`, `evaluation.py`, and `config.py` (
 ## System 7: Tests (`nightshift/tests/`)
 
 ### What it is
-915 pytest tests covering every pure function, config, state, CLI, and integration.
+1156 pytest tests covering every pure function, config, state, CLI, and integration.
 
 ### Files
 | File | Purpose |
@@ -797,6 +797,24 @@ What defines each version. Use this to know when a release is ready.
 - [x] Wave integrator module (`nightshift/integrator.py`)
 - [x] `nightshift build` CLI command (`nightshift/feature.py` -- build/status/resume)
 
+### v0.0.7 — Security Hardening (released 2026-04-05)
+- [x] Prompt injection protection for target repos
+- [x] Prompt self-modification guard across all daemon scripts
+- [x] Cost tracking and budget ceiling for daemon sessions
+- [x] Daemon log rotation and orphan branch pruning
+- [x] Automated handoff compaction in daemon
+- [x] Configurable model/effort/thinking per agent
+
+### v0.0.8 — Self-Maintaining (in progress)
+- [x] Auth-error circuit breaker bypass with notify_human
+- [x] Auto-release module (`nightshift/infra/release.py`)
+- [x] Eval runner CLI (`nightshift/owl/eval_runner.py`)
+- [x] Session index writer rewrite (single-line rows, delegation-aware counters)
+- [x] Worktree cleanup rewrite with self-removal guard
+- [x] Eval staleness signal in dashboard
+- [x] Delegation-aware sessions-since counters (signals.py + pick-role.py)
+- [ ] Wire E2E eval into daemon loop automatically
+
 ### v1.0.0 — Production
 - [ ] Loop 1 runs reliably overnight on real repos
 - [ ] Loop 2 can build a simple feature end-to-end

diff --git a/.recursive/ops/ROLE-SCORING.md b/.recursive/ops/ROLE-SCORING.md
@@ -60,6 +60,8 @@ used when the source file is missing or unreadable.
 | `tracker_moved` | Session index -- any `%` in recent status cells | false |
 | `recent_security_sessions` | Session index + archived pentest tasks (dual-signal) | 0 |
 | `friction_entries` | `.recursive/friction/log.md` -- count `## YYYY-MM-DD` headers | 0 |
+| `pentest_framework_tasks` | `.recursive/tasks/` -- pending tasks with `source: pentest` AND `target: recursive` | 0 |
+| `sessions_since_eval` | `.recursive/evaluations/` vs session index -- sessions since latest eval file was written (dashboard-only, not used in scoring) | 0 |
 
 **Eval file validation**: `read_latest_eval_score()` validates the file before reading the score.
 A file must have a `**Date**:` line and at least 3 scored dimension rows (`N/10` format) outside
@@ -160,9 +162,12 @@ friction_entries >= 5:                +50   (lots of friction accumulated)
 friction_entries >= 3
   AND sessions_since_evolve >= 5:     +30   (moderate friction, hasn't evolved recently)
 sessions_since_evolve >= 20:          +20   (overdue regardless of friction count)
+pentest_framework_tasks >= 1:         +40   (confirmed security vuln in .recursive/ -- security urgency)
 
-Hard cap: capped at 5 if sessions_since_evolve < 5 (don't re-run too frequently)
-Hard cap: capped at 5 if friction_entries == 0 (no friction = nothing to evolve)
+Hard cap: capped at 5 if sessions_since_evolve < 5 AND pentest_framework_tasks == 0
+           (don't re-run too frequently unless pentest tasks pending)
+Hard cap: capped at 5 if friction_entries == 0 AND pentest_framework_tasks == 0
+           (no friction and no pentest tasks = nothing to evolve)
 ```
 
 ### AUDIT -- framework quality review

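Note on the EVOLVE rules shown in the hunk above: the interaction between the new `pentest_framework_tasks` boost and the two hard caps can be sketched as follows. This is an illustration only, not the code in `pick-role.py`. It assumes the listed bonuses are additive and that the score starts from a `base` of 0; any EVOLVE rules above the visible hunk are not modelled. `evolve_score_sketch` is a hypothetical name.

```python
def evolve_score_sketch(
    friction_entries: int,
    sessions_since_evolve: int,
    pentest_framework_tasks: int,
    base: int = 0,  # assumption: rules outside the visible hunk are ignored
) -> int:
    score = base
    # Bonuses as listed in the hunk (assumed additive).
    if friction_entries >= 5:
        score += 50
    if friction_entries >= 3 and sessions_since_evolve >= 5:
        score += 30
    if sessions_since_evolve >= 20:
        score += 20
    if pentest_framework_tasks >= 1:
        score += 40

    # Both hard caps are lifted while pentest framework tasks are pending.
    no_pentest_tasks = pentest_framework_tasks == 0
    if sessions_since_evolve < 5 and no_pentest_tasks:
        score = min(score, 5)
    if friction_entries == 0 and no_pentest_tasks:
        score = min(score, 5)
    return score
```

With six friction entries and only two sessions since the last evolve, the sketch caps the score at 5 when no pentest task is pending, and leaves it uncapped once one is.
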
diff --git a/.recursive/reviews/2026-04-09-audit-126.md b/.recursive/reviews/2026-04-09-audit-126.md
@@ -0,0 +1,111 @@
+# Framework Audit -- Session #126
+
+**Date**: 2026-04-09
+**Trigger**: 18 sessions since last audit (session #107)
+**Auditor**: audit-agent
+
+---
+
+## Quality Audit
+
+Files audited: 12
+- `.recursive/ops/OPERATIONS.md`
+- `.recursive/ops/DAEMON.md`
+- `.recursive/ops/ROLE-SCORING.md`
+- `.recursive/engine/daemon.sh`
+- `.recursive/engine/lib-agent.sh`
+- `.recursive/engine/signals.py`
+- `.recursive/engine/dashboard.py`
+- `.recursive/engine/pick-role.py`
+- `.recursive/agents/brain.md`
+- `.recursive/sessions/index.md`
+- `.recursive/architecture/MODULE_MAP.md`
+- `CLAUDE.md`
+
+Issues found: 10
+Issues fixed in this PR: 8
+Tasks created for remaining issues: 2 (#0249, #0250)
+
+### Issues Fixed
+
+1. **OPERATIONS.md: Stale test count** -- Updated "915 pytest tests" to "1156 pytest tests". The count grew from 915 to 1156 across sessions #107-#125 with 241 new tests added.
+
+2. **OPERATIONS.md: Missing version milestones** -- Added v0.0.7 (Security Hardening, released 2026-04-05) and v0.0.8 (Self-Maintaining, in progress) milestone entries. The doc stopped at v0.0.6 despite both versions having changelog files.
+
+3. **DAEMON.md: Wrong argument description** -- Arg 2 was described as "pause between sessions in seconds", but daemon.sh uses it as `duration_hours` (default 8). The tmux example `daemon.sh claude 60` would therefore mean 60 hours, not a 60-second pause. Changed the description to "duration in hours (default: 8)" and clarified in arg 3 that 0 means unlimited.
+
+4. **DAEMON.md: Hardcoded absolute paths** -- Lines 101 and 115 had absolute links `/Users/no9labs/Developer/.recursive/Nightshift/...` with `.recursive` misplaced in the path. Replaced with relative paths.
+
+5. **DAEMON.md: Stale pentest log references** -- Referenced `.recursive/sessions/*-pentest.log` (v1 era artifact) in the Logs table and circuit breaker recovery section. These files do not exist in the v2 architecture. Replaced with `.recursive/sessions/raw/*.log`.
+
+6. **ROLE-SCORING.md: Missing signals** -- The signals table was missing `pentest_framework_tasks` (added session #109, used to boost evolve +40) and `sessions_since_eval` (added session #124, used for eval staleness alert). Both signals are actively used in pick-role.py and dashboard.py. Added both to the signals table and documented the `pentest_framework_tasks` boost in the EVOLVE scoring section.
+
+7. **sessions/index.md: Corrupted role field** -- Session 20260409-020609 had role `.*'"$LOG_FILE"2>/d` (shell injection artifact from a regex-extraction bug in daemon.sh's role extractor). Corrected to `brain`.
+
+8. **CLAUDE.md + OPERATIONS.md: Divergent dependency flows** -- CLAUDE.md was missing `raven.summary`, `raven.coordination`, `raven.e2e`, `raven.profiler` modules and had wrong ordering of `settings.config`/`settings.eval_targets`. Neither file included `owl.eval_runner` (added session #118). Synchronized both files to a consistent flow that includes all current modules.
+
+### Tasks Created
+
+- **#0249**: Regenerate MODULE_MAP.md -- stale since session #0001; it shows only 3 modules, while the actual package has 20+. Requires `make` or the CLI command, which touches nightshift/ (build zone).
+
+- **#0250**: Fix DAEMON.md cycle lifecycle description -- shows `git checkout main` and `git clean -fd` which don't appear in daemon.sh. Framework zone (evolve agent).
+
+---
+
+## Pattern Analysis
+
+Sessions analyzed: 19 (sessions #107-#125)
+Commitment hit rate: 19/19 = **100%** (perfect streak)
+Cost trend: **stable** (~$1.5-2.2 USD/session, with outliers for complex parallel sessions)
+
+### Decision Patterns
+
+**Role distribution (last 19 sessions)**:
+- build: 9 delegations
+- evolve: 12 delegations (many sessions ran both build+evolve in parallel)
+- oversee: 1 delegation (#0122)
+- strategize: 1 delegation (#0123)
+- security: 2 delegations (#0109, #0110)
+- audit: 1 delegation (#0107, this session)
+- review: 0 delegations
+
+**Observations**:
+- Build+evolve parallel pattern is the dominant strategy (used in 8 of 19 sessions). It's efficient and produces good throughput.
+- Security was run twice in rapid succession (#0109, #0110) -- the pentest->evolve fix cycle worked well.
+- Review role has not been delegated in 19 sessions. The code-reviewer/safety-reviewer sub-agents are used per-PR but the standalone review role (file-by-file quality review) has been skipped.
+- Advisory overrides are common but justified (5 of 19 sessions).
+
+**Override quality**: All overrides were justified with clear rationale. No habitual overrides observed.
+
+### Commitment Quality
+
+All 19 commitments were MET. Specific observations:
+- Predictions are consistently calibrated -- specific and measurable
+- Eval score predictions tend to be conservative (>= threshold) rather than point estimates
+- Test count predictions consistently underestimate actual new tests (e.g., predicted 3+, got 25)
+
+### Cost Analysis
+
+Session costs range from $0.39 to $2.60 USD. Two sessions stood out:
+- Session #0110 ($2.38): Among the most expensive -- parallel build+evolve with complex security work
+- Session #0114 ($2.60): Most expensive overall -- release module with 3 fix cycles
+
+Cost trend is stable. No drift upward or downward.
+
+### Optimization Opportunities
+
+1. **Review role gap**: 19 sessions without a standalone review. The review role does file-by-file deep quality checks that per-PR reviewers don't do. Consider triggering when consecutive_builds >= 10 (currently 5, but the brain often runs build+evolve in parallel, which doesn't change this counter).
+
+2. **Eval regression tracking**: Eval dropped from 86 to 83 between #0016 and #0017. The count-only payload issue in state file is task #0247. Track whether fixing this brings eval back above 86.
+
+3. **Queue not shrinking**: Queue stabilized at 69 after oversee in #0122 but hasn't continued to decrease. With 2 new tasks created per session on average, the queue will grow unless oversee runs more frequently.
+
+4. **MODULE_MAP.md rot**: The module map hasn't been regenerated since session #0001 and shows only 3 modules. Dashboard and brain signals that reference the map get no useful data. Task #0249 addresses this.
+
+---
+
+## Verification
+
+- `make check` passes (1156 tests)
+- No framework files in nightshift/ touched
+- All 10 issues identified; 8 fixed directly; 2 tasked
diff --git a/.recursive/sessions/index.md b/.recursive/sessions/index.md
@@ -85,5 +85,5 @@
 | 2026-04-08 22:41 | 20260408-183951 | brain      | 0    | -        | -        | success                                                        | #0177 eval rerun (53->86) + #0203 ROLE-SCORING v2   | [#198](https://github.com/Recusive/Nightshift/pull/198), [#197](https://github.com/Recusive/Nightshift/pull/197) |
 | 2026-04-09 01:44 | 20260409-011757 | brain | 0 | 26m | $2.158 | success | - | - |
 | 2026-04-09 02:06 | 20260409-014441 | brain | 0 | 21m | $2.0223 | success [PROMPT MODIFIED] [ORIGIN MODIFIED] | - | - |
-| 2026-04-09 02:25 | 20260409-020609 | .*'\"$LOG_FILE\"2>/d | 0 | 18m | $2.2003 | success [PROMPT MODIFIED] | - | - |
+| 2026-04-09 02:25 | 20260409-020609 | brain | 0 | 18m | $2.2003 | success [PROMPT MODIFIED] | - | - |
 | 2026-04-09 02:42 | 20260409-022508 | brain | 0 | 17m | $1.3038 | success [PROMPT MODIFIED] | - | - |
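
Note on the corrupted role cell fixed above: a check along these lines could flag such rows before they are written to `index.md`. This is a hedged sketch, not the daemon's role extractor. `KNOWN_ROLES` is an assumed, possibly incomplete set built from role names that appear in this PR, and `extract_role` / `is_valid_role` are hypothetical helpers. It also assumes the role is the third cell and that no cell contains a literal `|`.

```python
import re

# Assumption: role names collected from this PR; the real list may differ.
KNOWN_ROLES = frozenset(
    {"brain", "build", "review", "oversee", "strategize",
     "achieve", "security", "evolve", "audit"}
)
ROLE_PATTERN = re.compile(r"[a-z][a-z-]*")


def extract_role(row: str) -> str:
    """Return the third cell of a Markdown table row."""
    cells = [cell.strip() for cell in row.strip().strip("|").split("|")]
    if len(cells) < 3:
        raise ValueError("row has fewer than three cells")
    return cells[2]


def is_valid_role(role: str, known_roles: frozenset = KNOWN_ROLES) -> bool:
    """Accept only plain lowercase names that are also in the known set."""
    return ROLE_PATTERN.fullmatch(role) is not None and role in known_roles
```

A row whose third cell contains shell or regex fragments fails the pattern check, so it can be rejected or logged instead of landing in the index.
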
diff --git a/.recursive/tasks/.next-id b/.recursive/tasks/.next-id
@@ -1 +1 @@
-249
+251
diff --git a/.recursive/tasks/0249.md b/.recursive/tasks/0249.md
@@ -0,0 +1,22 @@
+---
+status: pending
+priority: normal
+target: recursive
+source: audit
+created: 2026-04-09
+completed:
+---
+
+# Regenerate MODULE_MAP.md (stale since session #0001)
+
+The `.recursive/architecture/MODULE_MAP.md` was last generated in session #0001 and shows only 3 top-level modules. It is severely stale -- the package has grown to include `core/`, `settings/`, `owl/`, `raven/`, and `infra/` subpackages with 20+ modules. The stale map gives future sessions incorrect orientation data.
+
+## Acceptance Criteria
+- [ ] Run `python3 -m nightshift module-map --write` from the repo root
+- [ ] Verify the new MODULE_MAP.md shows all subpackages and modules
+- [ ] Verify the dependency order matches the flow in CLAUDE.md
+- [ ] Commit the updated MODULE_MAP.md
+- [ ] PR passes code-reviewer
+
+## Notes
+This is a build-zone task (touches nightshift/). The command auto-generates the file -- no manual editing needed.
diff --git a/.recursive/tasks/0250.md b/.recursive/tasks/0250.md
@@ -0,0 +1,35 @@
+---
+status: pending
+priority: normal
+target: recursive
+source: audit
+created: 2026-04-09
+completed:
+---
+
+# Fix DAEMON.md cycle lifecycle description (git commands inaccurate)
+
+The DAEMON.md "Cycle Lifecycle" section shows these commands:
+
+```
+git fetch origin
+git checkout main
+git reset --hard origin/main
+git clean -fd
+```
+
+But the actual `daemon.sh` only runs:
+
+```
+git -C "$REPO_DIR" fetch origin main --quiet
+git -C "$REPO_DIR" reset --hard origin/main --quiet
+```
+
+There is no `git checkout main` and no `git clean -fd`. The doc misleads agents that read it to understand the reset behavior.
+
+## Acceptance Criteria
+- [ ] Update DAEMON.md "1. Reset and housekeeping" section to match actual daemon.sh reset commands
+- [ ] PR passes docs-reviewer
+
+## Notes
+Framework-zone task (touches `.recursive/ops/DAEMON.md`). Delegate to evolve agent.
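
Note on the task front matter used by #0249 and #0250: the `pentest_framework_tasks` signal added to ROLE-SCORING.md counts pending tasks with `source: pentest` and `target: recursive`. A minimal sketch of that count over front matter like the above follows. It is illustrative only; the actual implementation in `signals.py` is not shown in this PR, and `parse_front_matter` and `count_pentest_framework_tasks` are hypothetical names. It assumes flat `key: value` front matter between two `---` lines.

```python
from typing import Dict, Iterable


def parse_front_matter(text: str) -> Dict[str, str]:
    """Parse flat `key: value` pairs between the leading `---` delimiters."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: Dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return {}  # no closing delimiter: treat the file as malformed


def count_pentest_framework_tasks(task_texts: Iterable[str]) -> int:
    """Count pending tasks with source=pentest and target=recursive."""
    count = 0
    for text in task_texts:
        meta = parse_front_matter(text)
        if (
            meta.get("status") == "pending"
            and meta.get("source") == "pentest"
            and meta.get("target") == "recursive"
        ):
            count += 1
    return count
```

Both tasks created in this PR have `source: audit`, so under this sketch they would not contribute to the signal.
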
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -167,7 +167,7 @@ These are enforced by CI. Non-negotiable.
 - **One concern per module.** If you're adding >50 lines of new logic to an existing file, it belongs in its own module. cycle.py handles cycle logic -- not scoring. cli.py handles CLI -- not business logic.
 - **No hardcoded data in logic files.** Regex patterns, score maps, category weights, thresholds -- these go in `core/constants.py` or a dedicated `*_patterns.py`. Logic files import them.
 - **New module checklist:** create the `.py` file in the appropriate subpackage (`core/`, `settings/`, `owl/`, `raven/`, `infra/`), add to the subpackage's `__init__.py` re-exports, add to `nightshift/scripts/install.sh` PACKAGE_FILES, add to this file's structure tree.
-- **Follow the dependency flow:** `core.errors -> core.types -> core.constants -> core.shell -> core.state -> settings.config -> settings.eval_targets -> owl.cycle -> owl.scoring -> owl.readiness -> raven.planner -> raven.decomposer -> raven.subagent -> raven.integrator -> raven.feature -> infra.worktree -> infra.module_map -> infra.multi -> infra.release -> cli`. New modules slot into this chain. No circular imports. (`infra/multi.py` uses a late import of `run_nightshift` from `cli.py` to avoid circular deps.)
+- **Follow the dependency flow:** `core.errors -> core.types -> core.constants -> core.shell -> raven.summary -> raven.coordination -> infra.module_map -> owl.readiness -> owl.scoring -> core.state -> settings.config -> settings.eval_targets -> owl.eval_runner -> infra.multi -> raven.e2e -> raven.profiler -> infra.worktree -> owl.cycle -> raven.planner -> raven.subagent -> raven.decomposer -> raven.integrator -> raven.feature -> infra.release -> cli`. New modules slot into this chain. No circular imports. (`infra/multi.py` uses a late import of `run_nightshift` from `cli.py` to avoid circular deps.)
 - **Functions over inline code.** If a block of code does one thing and is >10 lines, extract it into a named function. The function name documents the intent.
 - **Config over magic numbers.** If a value might change (thresholds, limits, timeouts), put it in `DEFAULT_CONFIG` and `core/types.py`, not inline.
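
Note on the synchronized dependency flow: the rule that each module only imports from modules to its left can be checked mechanically. The sketch below is an illustration, not part of the Nightshift codebase. `parse_flow` and `find_violations` are hypothetical helpers, the `imports` mapping is assumed to be supplied by the caller (for example from a static import scan), and imports of names outside the chain (stdlib, third-party) are ignored.

```python
from typing import Dict, List, Sequence, Tuple

# The chain as written in the updated CLAUDE.md line above.
FLOW = (
    "core.errors -> core.types -> core.constants -> core.shell -> "
    "raven.summary -> raven.coordination -> infra.module_map -> "
    "owl.readiness -> owl.scoring -> core.state -> settings.config -> "
    "settings.eval_targets -> owl.eval_runner -> infra.multi -> raven.e2e -> "
    "raven.profiler -> infra.worktree -> owl.cycle -> raven.planner -> "
    "raven.subagent -> raven.decomposer -> raven.integrator -> "
    "raven.feature -> infra.release -> cli"
)


def parse_flow(flow: str) -> List[str]:
    """Split an 'a -> b -> c' chain into an ordered list of module names."""
    return [name.strip() for name in flow.split("->") if name.strip()]


def find_violations(
    order: Sequence[str], imports: Dict[str, Sequence[str]]
) -> List[Tuple[str, str]]:
    """Return (module, dependency) pairs where the dependency is not to the left."""
    position = {name: index for index, name in enumerate(order)}
    violations: List[Tuple[str, str]] = []
    for module, deps in imports.items():
        for dep in deps:
            if dep not in position:
                continue  # outside the chain, e.g. stdlib
            if position[dep] >= position[module]:
                violations.append((module, dep))
    return violations
```

A top-level import of `cli` from `infra.multi` would be reported by this check, which is consistent with the documented exception being handled by a late import rather than a module-level one.
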