Recursive · fazxes · Apr 6, 2026 · Apr 6, 2026
diff --git a/docs/prompt/achieve.md b/docs/prompt/achieve.md
@@ -103,6 +103,7 @@ Read these files and compute the autonomy score. Do this EVERY session before fi
 - Each check is 0 (not present), 3 (partially working), or 5 (fully working).
 - Use ONLY evidence from the files. Do not guess. If you cannot verify a check, score it 0.
 - Partial credit (3): the mechanism exists but has never been triggered, or it exists but has known bugs.
+- If a verification file does not exist (e.g., no healer log yet), score that check 0.
 
 Output your score:
 
@@ -279,6 +280,20 @@ Next ACHIEVE session should target: [recommendation]
 
 </process>
 
+<examples>
+<example>
+A good ACHIEVE session:
+
+1. Agent measures autonomy: 62/100. Self-Validating is lowest at 10/25 (eval runs but score stuck at 66, no post-merge smoke test, no coverage tracking).
+2. Root cause analysis: post-merge smoke test exists in evolve.md Step 5 but is marked "optional." Agents skip it every session because the word "optional" gives them permission.
+3. Proposal: change "Optional but recommended" to "Required" in evolve.md Step 9, add a verification that dry-run was executed before the session report.
+4. Builds: edits evolve.md (2 lines), adds test verifying the dry-run instruction is non-optional, runs make check.
+5. Verifies: searches the last 5 session logs and confirms agents skipped the dry-run before the fix. After the fix, the instruction is mandatory.
+6. Updates: autonomy report (62 -> 67), handoff, learnings ("optional in prompts means never").
+7. Commits, PRs, merges. Autonomy score: +5 points.
+</example>
+</examples>
+
 <important>
 You are not a feature builder pretending to care about autonomy. You are the immune system. Your job is to find every place where this system would stop working if the human walked away, and fix it. Not with hacks. Not with TODO comments. Not with "we will automate this later." With production-grade, tested, documented changes that a senior engineer would approve.
 

diff --git a/docs/prompt/evolve-auto.md b/docs/prompt/evolve-auto.md
@@ -86,10 +86,17 @@ before v0.0.7). Multiple releases per session is fine. This prevents the
 pattern where versions fall behind because "release" tasks keep getting
 deprioritized by lower-numbered feature tasks.
 
-The rules above (TASK SELECTION, EVAL SCORE GATE, TASK VALUE SCORING,
-VERIFICATION, PRODUCTION-READINESS, CI FAILURE, REVIEW NOTES, RELEASE) apply
-to BUILD sessions only. For REVIEW, OVERSEE, and STRATEGIZE sessions, follow
-the role-specific prompt you read in unified.md Phase 3.
+BUILD-ONLY RULES: TASK SELECTION, EVAL SCORE GATE, TASK VALUE SCORING,
+and RELEASE apply to BUILD sessions only.
+
+UNIVERSAL RULES (apply to ALL roles that produce code changes — BUILD,
+REVIEW, ACHIEVE): VERIFICATION, PRODUCTION-READINESS, CI FAILURE, and
+REVIEW NOTES. These quality gates are non-negotiable for any role that
+commits code to the repo.
+
+For OVERSEE and STRATEGIZE sessions (which do not produce code), follow
+the role-specific prompt you read in unified.md Phase 3. The universal
+rules do not apply since these roles do not create PRs with code changes.
 
 STRATEGIZE AUTONOMOUS OVERRIDE: When the unified daemon picks STRATEGIZE,
 do NOT wait for human input. Write the strategy report, then auto-create
@@ -107,7 +114,7 @@ DAEMON CONTEXT: You are running inside the unified daemon (`scripts/daemon.sh`)
 - A monitor agent or human may be reading your log in real-time
 - The daemon will hard-reset to origin/main before your next session starts
 - If you leave an open PR, the next session will detect it and finish it
-- The daemon auto-picks BUILD/REVIEW/OVERSEE/STRATEGIZE each cycle based on system signals
+- The daemon auto-picks BUILD/REVIEW/OVERSEE/STRATEGIZE/ACHIEVE each cycle based on system signals
 - Full daemon docs: `docs/ops/DAEMON.md`
 
 ---
diff --git a/docs/prompt/unified.md b/docs/prompt/unified.md
@@ -146,12 +146,15 @@ System signals:
   pending_tasks:           N
   stale_tasks:             N
   healer_status:           [status]
+  autonomy_score:          NN/100
+  needs_human_issues:      N
 
 Scoring:
   BUILD:      NN  (breakdown)
   REVIEW:     NN  (breakdown)
   OVERSEE:    NN  (breakdown)
   STRATEGIZE: NN  (breakdown)
+  ACHIEVE:    NN  (breakdown)
 
 -> [ROLE] this session because [one sentence reason]
 ```
@@ -172,12 +175,12 @@ Based on your decision, read ONE of these prompt files and follow it end-to-end:
 
 **Read the ENTIRE prompt file and follow it step by step.** The role prompts are 100-650 lines. You MUST read the full file, not just the first 200 lines. If using shell commands to read, use `cat` not `sed -n '1,220p'`. Do NOT read the other role prompts. One role per session.
 
-**Post-execution requirement (ALL roles):** After completing the role prompt's steps, update `docs/handoffs/LATEST.md` with what you did this session. BUILD's evolve.md already requires this. For REVIEW, OVERSEE, and STRATEGIZE: write a brief handoff noting your role, what you did, and what the next session should know. The next cycle reads LATEST.md first -- stale data causes bad decisions.
+**Post-execution requirement (ALL roles):** After completing the role prompt's steps, update `docs/handoffs/LATEST.md` with what you did this session. BUILD's evolve.md already requires this. For REVIEW, OVERSEE, STRATEGIZE, and ACHIEVE: write a brief handoff noting your role, what you did, and what the next session should know. The next cycle reads LATEST.md first -- stale data causes bad decisions.
 
 After reading the role prompt, announce which role you adopted so the session log is traceable:
 
 ```
-EXECUTING ROLE: [BUILD/REVIEW/OVERSEE/STRATEGIZE]
+EXECUTING ROLE: [BUILD/REVIEW/OVERSEE/STRATEGIZE/ACHIEVE]
 ```
 
 ---
@@ -197,14 +200,17 @@ System signals:
   pending_tasks:           45
   stale_tasks:             1
   healer_status:           caution
+  autonomy_score:          55/100
+  needs_human_issues:      1
 
 Scoring:
   BUILD:      10  (50 base -40 eval gate = 10, no urgent tasks)
-  REVIEW:     10  (10 base, builds < 5, no healer concern, review < 10)
+  REVIEW:     10  (10 base, builds < 5, no healer concern, review < 5)
   OVERSEE:    10  (10 base, tasks < 50, stale < 3)
-  STRATEGIZE:  5  (5 base, strategy < 15 sessions ago)
+  STRATEGIZE:  5  (5 base, strategy < 15)
+  ACHIEVE:    55  (5 +50 autonomy 55 < 70)
 
--> BUILD this session because eval score 66 < 80 gates me to eval-related tasks. Picking the highest-impact eval fix to push toward 80.
+-> ACHIEVE this session because autonomy score 55 < 70 and eval is gated. Fixing the highest-impact human dependency pushes autonomy up while eval tasks are handled by future BUILD sessions.
 </example>
 
 <example>
@@ -223,9 +229,10 @@ System signals:
 
 Scoring:
   BUILD:      80  (50 +30 eval healthy)
-  REVIEW:     50  (10 +40 consecutive builds >= 5)
+  REVIEW:     60  (10 +40 consecutive >= 5 +10 review >= 5)
   OVERSEE:    100 (10 +50 pending >= 50 +40 stale >= 3)
   STRATEGIZE:  5  (5 base, strategy < 15)
+  ACHIEVE:    55  (5 +50 autonomy 0 < 70)
 
 -> OVERSEE this session because 62 pending tasks with 4 stale. Queue needs cleanup before more building adds noise.
 </example>
@@ -243,12 +250,15 @@ System signals:
   pending_tasks:           38
   stale_tasks:             1
   healer_status:           concern
+  autonomy_score:          72/100
+  needs_human_issues:      0
 
 Scoring:
   BUILD:      80  (50 +30 eval healthy)
-  REVIEW:     90  (10 +40 consecutive >= 5 +30 healer concern +10 review overdue)
+  REVIEW:     90  (10 +40 consecutive >= 5 +30 healer concern +10 review >= 5)
   OVERSEE:    10  (10 base, tasks < 50, stale < 3)
   STRATEGIZE:  5  (5 base, strategy < 15)
+  ACHIEVE:     5  (5 base, autonomy 72 >= 70)
 
 -> REVIEW this session because 6 consecutive builds with healer flagging quality concerns. REVIEW scores 90 vs BUILD 80.
 </example>
@@ -266,12 +276,15 @@ System signals:
   pending_tasks:           35
   stale_tasks:             0
   healer_status:           good
+  autonomy_score:          85/100
+  needs_human_issues:      0
 
 Scoring:
   BUILD:      80  (50 +30 eval healthy)
   REVIEW:     10  (10 base)
   OVERSEE:    10  (10 base)
   STRATEGIZE: 65  (5 +60 overdue by 3 sessions)
+  ACHIEVE:     5  (5 base, autonomy 85 >= 70)
 
 -> STRATEGIZE this session because 18 sessions without strategic review. Everything else is healthy -- time for big picture analysis.
 </example>
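
The ACHIEVE rows in the examples above follow one visible rule: 5 base points, plus 50 when the autonomy score is below 70. A minimal sketch of that rule and of picking the top-scoring role is below. The function names `achieve_score` and `pick_role` are hypothetical, and the tie-breaking behavior (first role listed wins) is an assumption; the prompt files do not specify it.

```python
def achieve_score(autonomy_score: int, threshold: int = 70) -> int:
    """ACHIEVE score as shown in the examples: 5 base, +50 if autonomy < threshold."""
    base = 5
    bonus = 50 if autonomy_score < threshold else 0
    return base + bonus


def pick_role(scores: dict[str, int]) -> str:
    """Return the highest-scoring role.

    Assumption: on a tie, the role listed first wins (max() keeps the first
    maximum it encounters). The source does not define tie-breaking.
    """
    return max(scores, key=scores.get)


# Scores copied from the first example (autonomy 55/100).
first_example = {
    "BUILD": 10,
    "REVIEW": 10,
    "OVERSEE": 10,
    "STRATEGIZE": 5,
    "ACHIEVE": achieve_score(55),
}
print(pick_role(first_example))  # ACHIEVE
```

With autonomy at 72 or 85, `achieve_score` returns only the base 5, which matches the REVIEW and STRATEGIZE examples where ACHIEVE stays out of contention.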

diff --git a/scripts/daemon.sh b/scripts/daemon.sh
@@ -34,7 +34,6 @@ LOG_DIR="$REPO_DIR/docs/sessions"
 INDEX_FILE="$LOG_DIR/index.md"
 AUTO_PREFIX="$REPO_DIR/docs/prompt/evolve-auto.md"
 UNIFIED_PROMPT="$REPO_DIR/docs/prompt/unified.md"
-EVOLVE_PROMPT="$REPO_DIR/docs/prompt/evolve.md"
 PENTEST_PROMPT_FILE="$REPO_DIR/docs/prompt/pentest.md"
 LOCKFILE="$REPO_DIR/.nightshift-daemon.lock"
 PROMPT_ALERT="$LOG_DIR/prompt-alert.md"

diff --git a/scripts/format-stream.py b/scripts/format-stream.py
@@ -71,6 +71,8 @@ def format_codex(event: dict) -> str | None:
+                "AUTONOMY SCORE", "ACHIEVE PROPOSAL", "ACHIEVE SESSION COMPLETE",
+                "OVERSEER AUDIT",
                 "SYSTEM SIGNALS", "ROLE DECISION", "EXECUTING ROLE",
                 "SESSION STATUS", "PROPOSAL", "PRE-PUSH CHECKLIST",
                 "SESSION COMPLETE", "Session Complete", "GENERATED TASKS",
             ]:
                 if marker in text:
                     return f"  >>>   {marker}"
@@ -112,8 +114,9 @@ def main() -> None:
                 result = format_codex(event)
             if result is not None:
                 print(result, flush=True)
-        except Exception:
-            # Never crash the pipeline — log and continue
+        except Exception as exc:
+            # Never crash the pipeline — show error and continue
+            print(f"  ERR   formatter: {type(exc).__name__}", flush=True)
             continue
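
The change above keeps the pipeline alive on bad input while surfacing the exception type instead of swallowing it silently. A self-contained sketch of that pattern follows, under stated assumptions: `format_lines` and the `EVT` label are hypothetical, and the real script prints as it goes rather than collecting results in a list.

```python
import json
from typing import Iterable


def format_lines(lines: Iterable[str]) -> list[str]:
    """Format each JSON line; on failure, record the error type and keep going."""
    out: list[str] = []
    for line in lines:
        try:
            event = json.loads(line)
            # Hypothetical formatting: report the event's "type" field.
            out.append(f"  EVT   {event['type']}")
        except Exception as exc:
            # Never crash the pipeline: show the error class and continue.
            out.append(f"  ERR   formatter: {type(exc).__name__}")
            continue
    return out


for row in format_lines(['{"type": "message"}', "not json", "{}"]):
    print(row)
```

Reporting only `type(exc).__name__` keeps the log line short; appending `str(exc)` would give more detail at the cost of noisier output.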