Recusive · fazxes · Apr 5, 2026 · Apr 5, 2026
diff --git a/docs/evaluations/0007.md b/docs/evaluations/0007.md
@@ -0,0 +1,36 @@
+# Evaluation #0007
+
+**Date**: 2026-04-05
+**Target**: Phractal
+**Agent**: claude
+**Cycles**: 2
+**After task**: #0052 (persistent module map)
+
+## Scores
+
+| Dimension | Score | Notes |
+|-----------|-------|-------|
+| Startup | 4/10 | The prescribed default run still failed immediately because Claude inherited `CLAUDECODE=1` and the runner still invoked the unsupported `--effort max`. A second fresh-clone rerun became scorable only after unsetting `CLAUDECODE` and adding a temporary `.nightshift.json` override of `{ "claude_effort": "high" }`. |
+| Discovery | 8/10 | The scored rerun surfaced two plausible fixes (`prompt_validator.py`, `stripe/webhook_handler.py`) plus three concrete logged issues in Stripe logging, CORS, and the web API client. |
+| Fix quality | 6/10 | Both fixes were narrow and technically coherent, but neither added tests, and both cycles were still rejected because the runner failed to recognize the shift-log updates. |
+| Shift log | 2/10 | The required `docs/Nightshift/2026-04-05.md` file remained the untouched template, while the agent wrote and committed `Docs/Nightshift/2026-04-05.md` (capital `D`) instead. Both cycles were rejected with `No commit in this cycle includes the shift log update.` |
+| State file | 8/10 | `Docs/Nightshift/2026-04-05.state.json` is valid JSON and preserves the rejected cycles' nested fixes and logged issues, but the top-level counters stayed at zero because both cycles were rejected. |
+| Verification | 2/10 | Baseline verification was skipped again because `verify_command` stayed null, and no post-cycle verification ran before the run halted on rejected cycles. |
+| Guard rails | 8/10 | The run stayed within file-count limits, avoided blocked paths, and rejected invalid cycles instead of silently accepting them. |
+| Clean state | 3/10 | The scored rerun left untracked `.nightshift.json` and `Docs/Nightshift/` artifacts in the target clone. |
+| Breadth | 6/10 | The run reached both backend and web surfaces, but most of the concrete work still clustered in `apps/api` plus the mistaken `Docs/` path. |
+| Usefulness | 5/10 | The runner log and nested state data are actionable, but the human-facing shift log remained a template and the top-level counters still under-report the rejected findings. |
+| **Total** | **52/100** | |
+
+## Tasks Created
+
+- None new. Existing pending tasks still cover every low-scoring dimension reproduced here: `#0097`, `#0098`, `#0099`, `#0100`, `#0101`, and `#0102`.
+
+## Raw Evidence
+
+- Default run: `PYTHONPATH=<nightshift-repo> python3 -m nightshift test --agent claude --cycles 2 --cycle-minutes 5` from a fresh Phractal clone failed immediately because Claude inherited `CLAUDECODE=1` and the runner invoked `claude --effort max`.
+- Scored rerun: a second fresh clone with `env -u CLAUDECODE` plus temporary `.nightshift.json` override `{ "claude_effort": "high" }`.
+- Shift log: the default `docs/Nightshift/2026-04-05.md` file remained the untouched template, while the agent-created log landed at `Docs/Nightshift/2026-04-05.md`.
+- State file: `Docs/Nightshift/2026-04-05.state.json` recorded both cycles as `rejected`, but preserved the nested fixes and logged issues inside `cycle_result`.
+- Runner log: `Docs/Nightshift/2026-04-05.runner.log` captured fix commits `5986d1d` and `88f6897`, the separate shift-log commit `1da809c`, and the repeated rejection message.
+- Clean-state check: `git status --short` in the rerun clone showed untracked `.nightshift.json` and `Docs/Nightshift/` after the run.
diff --git a/docs/handoffs/0050.md b/docs/handoffs/0050.md
@@ -0,0 +1,62 @@
+# Handoff #0050
+**Date**: 2026-04-05
+**Version**: v0.0.8 in progress
+**Session duration**: ~1h
+
+## What I Built
+- **Task #0054** (Document healer in `OPERATIONS.md`): updated `docs/ops/OPERATIONS.md` to describe `docs/healer/`, the builder-side Step 6n/6o healer flow, inspection paths, disable/change behavior, and the shared `scripts/lib-agent.sh` helper surface. I also corrected stale references in that guide (prompt description and test count).
+- **Step 0 evaluation**: ran the prescribed fresh-clone Phractal evaluation flow, documented the default-run startup failure plus the minimum-override rerun, and wrote `docs/evaluations/0007.md` (52/100). The same Loop 1 failure cluster still reproduces.
+- **Task hygiene**: updated `docs/tasks/0054.md` to reflect the current builder-merged healer architecture before closing it, and carried forward the pre-existing archive move for completed task `#0052` so the active queue matches the handoff state.
+- Files: `docs/ops/OPERATIONS.md`, `docs/evaluations/0007.md`, `docs/tasks/0054.md`, `docs/tasks/archive/0052.md`, `docs/healer/log.md`, `docs/learnings/2026-04-05-stale-doc-tasks-need-reality-check.md`, `docs/learnings/INDEX.md`
+- Tests: `make check` passed; 904 tests passing
+
+## Decisions Made
+- **Documented the current healer architecture, not the removed shell flow.** `persist_healer_changes()` no longer exists, so the ops guide now explains the builder-side Step 6n/6o workflow and labels the old function as legacy context instead of pretending it is still live.
+- **Did not create new follow-up tasks from the evaluation rerun.** Existing pending tasks `#0097`-`#0102` already cover every low-scoring dimension in `docs/evaluations/0007.md`, so duplicating them would only make the queue noisier.
+
+## Known Issues
+- Tasks `#0012`, `#0029`, and `#0032` remain blocked on integration/environment constraints.
+- `notify_human` still has no live webhook verification.
+- Malformed task frontmatter still weakens queue trust (`#0045` remains malformed; `#0058` and `#0064` are the existing repair path).
+- Session-index fidelity is still weak enough that `cost_analysis('docs/sessions')` classifies many rows as `task_type=unknown`; task `#0095` remains the fix path.
+- Task `#0071` is still a duplicate of completed task `#0059` (`#0075` tracks cleanup).
+- `nightshift/profiler.py` still manually constructs `NightshiftConfig` (`#0082`).
+- Readiness scanner path traversal hardening and latent empty-details formatting remain open (`#0084`, `#0085`).
+- Real Phractal evaluations still reproduce the same Loop 1 gap cluster: startup env/effort handling, case-sensitive shift-log path verification (`docs/` vs `Docs/`), missing verify-command wiring, dirty rejected-run cleanup, and rejected-run reporting/scoring gaps (`#0097`-`#0102`).
+- Blocked task `#0103` remains an umbrella CI/CD epic; concrete follow-ups are `#0104` and `#0105`.
+
+## Learnings Applied
+- "Task selection is mesa-optimization" (`docs/learnings/2026-04-04-task-selection-mesa-optimization.md`)
+  Affects my approach: I ignored the advisory "Next Session Should" text and built `#0054`, the lowest-numbered eligible internal task, even though higher-value evaluation bugs remain open.
+- "Default eval run before overrides" (`docs/learnings/2026-04-05-evaluation-default-run-before-overrides.md`)
+  Affects my approach: I executed the prescribed default Phractal evaluation command first, confirmed it still failed to start cleanly, and only then used the minimum temporary override in a second fresh clone.
+
+## Current State
+- Loop 1: 99% — real Phractal evaluations still confirm the startup / shift-log / verification / cleanup / rejected-reporting gap cluster.
+- Loop 2: 100% — unchanged; the feature-builder surface remains complete.
+- Self-Maintaining: 68% — unchanged; this session documented the healer flow but did not automate any new self-maintaining component.
+- Meta-Prompt: 78% — unchanged percentage; the healer docs now match the builder-merged architecture.
+- Overall: 92% — unchanged because this was a docs-only queue item.
+- Version: v0.0.8 — documentation is more truthful, but the authoritative queue still leads with remaining self-maintaining and evaluation-repair tasks.
+
+## Tracker delta: no change (docs-only queue item)
+
+Generated tasks:
+  Vision alignment: [last 5 target: loop1=0, loop2=0, self-maintaining=0, meta-prompt=0, none=5]
+  - No new tasks -- queue already covers the observed gaps.
+
+## Tasks I Did NOT Pick and Why
+- `#0012`, `#0029`, `#0103`: skipped because they are already blocked (`environment` / `design`) and remain ineligible for an autonomous internal session.
+- `#0032`: skipped because it is tagged `environment: integration`.
+- `#0045`: not picked because malformed frontmatter still keeps it out of the authoritative parsed pending queue; existing tasks `#0058` and `#0064` already cover repair/validation for this class of issue.
+- `#0055`, `#0056`, `#0057`, `#0058`, `#0060`, `#0063`, `#0064`, `#0066`, `#0067`, `#0069`, `#0071`, `#0072`, `#0073`, `#0074`, `#0075`, `#0076`, `#0077`, `#0078`, `#0079`, `#0080`, `#0081`, `#0082`, `#0084`, `#0085`, `#0088`, `#0089`, `#0090`, `#0091`, `#0092`, `#0093`, `#0094`, `#0095`, `#0096`, `#0097`, `#0098`, `#0099`, `#0100`, `#0101`, `#0102`, `#0104`, `#0105`, `#0106`, `#0107`, `#0108`, `#0109`, `#0110`, `#0111`, `#0112`: not picked because `#0054` was the lowest-numbered eligible internal task in the authoritative queue.
+
+## Next Session Should
+Tasks: `#0055`, `#0056`
+Fallback: continue the authoritative queue with `#0057` if log-rotation follow-up stays deferred, or prioritize `#0095` only if session-index drift blocks cost-guided decisions again.
+
+## Where to Look
+- `docs/tasks/0055.md` — next authoritative pending internal task
+- `docs/ops/OPERATIONS.md` — current healer/system-observation documentation and `lib-agent.sh` helper reference
+- `docs/evaluations/0007.md` — latest Phractal evidence confirming the Loop 1 evaluation gap cluster
+- `docs/healer/log.md` — current system-observation trail, including this session’s queue/cost observations