Recusive · fazxes · Apr 6, 2026 · Apr 6, 2026
diff --git a/README.md b/README.md
@@ -139,6 +139,11 @@ python3 -m nightshift multi /repo1 /repo2 --agent claude --test --cycles 1
 python3 -m nightshift module-map --write
 ```
 
+`python3 -m nightshift test ...` now keeps its state files, runner logs, and
+linked worktree under `$TMPDIR/nightshift-test-runs/...` so evaluation clones
+stay clean. Full `run` mode still writes repo-local runtime artifacts under
+`docs/Nightshift/`.
+
 ### From the installed skill bundle
 
 Use the bundled wrapper scripts:

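Reviewer note on the README hunk above: the isolation it describes can be pictured as a per-run directory under the system temp root. The following is a minimal sketch, not the project's implementation. The helper names `hypothetical_test_runtime_dir` and `is_inside`, and the `run_id` subdirectory layout, are assumptions made for illustration; the README only states that artifacts live under `$TMPDIR/nightshift-test-runs/...`.

```python
import tempfile
from pathlib import Path


def hypothetical_test_runtime_dir(run_id: str) -> Path:
    """Return a per-run directory under <temp root>/nightshift-test-runs/.

    Hypothetical helper for illustration only; the layout below the
    nightshift-test-runs directory is an assumption.
    """
    # tempfile.gettempdir() honors $TMPDIR when it names a usable directory.
    root = Path(tempfile.gettempdir()) / "nightshift-test-runs"
    run_dir = root / run_id
    run_dir.mkdir(parents=True, exist_ok=True)
    return run_dir


def is_inside(path: Path, parent: Path) -> bool:
    """Return True when path resolves to a location at or beneath parent."""
    try:
        path.resolve().relative_to(parent.resolve())
        return True
    except ValueError:
        return False
```

With a layout like this, a check such as `is_inside(runtime_dir, clone_dir)` returning `False` is one way to confirm that test-mode artifacts cannot land inside the evaluation clone.
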
diff --git a/docs/architecture/MODULE_MAP.md b/docs/architecture/MODULE_MAP.md
@@ -1,6 +1,6 @@
 # Module Map
 
-Last updated: 2026-04-05 by session #0059
+Last updated: 2026-04-06 by session #0062
 Generated via: `python3 -m nightshift module-map --write`
 Stale after: 5 newer sessions without a refresh
 
@@ -9,37 +9,39 @@ Read it before opening modules one by one when you need fast orientation.
 
 ## Modules (29)
 
-| Module | Lines | Purpose | Key symbols | Last changed |
-|---|---:|---|---|---|
-| `errors.py` | 7 | Nightshift error types. | `NightshiftError` | 2802c51 |
-| `eval_targets.py` | 96 | Known evaluation targets and their repo-specific verification settings. | `infer_target_verify_command`, `_KNOWN_TARGET_VERIFY_COMMANDS` | session #0059 |
-| `types.py` | 561 | Strict type definitions for all Nightshift data structures. | `NightshiftConfig`, `DiffScore`, `Counters`, `Baseline` | PR #88 (7e36fa5) |
-| `constants.py` | 745 | Module-level constants and tiny utilities used across the package. | `now_local`, `print_status`, `DATA_VERSION`, `SUPPORTED_AGENTS` | PR #88 (7e36fa5) |
-| `shell.py` | 161 | Subprocess execution: streaming runner, git helper, shell utilities. | `run_command`, `run_capture`, `git`, `command_exists` | PR #27 (9e953eb) |
-| `summary.py` | 141 | Feature summary generation for Loop 2 build output. | `generate_feature_summary`, `_API_DIR_SEGMENTS`, `_CLI_DIR_SEGMENTS`, `_CONFIG_DIR_SEGMENTS` | PR #67 (89f8cd6) |
-| `cleanup.py` | 337 | Daemon housekeeping -- log rotation, healer archiving, and branch pruning. | `rotate_healer_log`, `rotate_logs`, `prune_orphan_branches`, `_HEALER_ENTRY_RE` | PR #88 (7e36fa5) |
-| `compact.py` | 318 | Handoff compaction -- merges numbered handoff files into weekly summaries. | `compact_handoffs`, `_NUMBERED_RE`, `_SECTION_RE`, `_DATE_RE` | PR #83 (56e0c97) |
-| `coordination.py` | 192 | Sub-agent coordination for Loop 2 -- detects file overlaps and generates hints. | `extract_file_references`, `detect_overlaps`, `generate_coordination_hints`, `inject_hints` | PR #72 (a5a3e47) |
-| `costs.py` | 672 | Cost tracking for daemon sessions -- parse token usage from logs and maintain a ledger. | `parse_session_tokens`, `calculate_cost`, `read_ledger`, `write_ledger` | PR #89 (7211bd4) |
-| `module_map.py` | 298 | Generate a persistent module map for fast cross-session orientation. | `module_map_path`, `generate_module_map`, `render_module_map`, `write_module_map` | PR #86 (77e5c25) |
-| `readiness.py` | 211 | Production-readiness checks for Loop 2 feature builds. | `collect_changed_files`, `check_secrets`, `check_debug_prints`, `check_test_coverage` | PR #69 (3877225) |
-| `scoring.py` | 113 | Post-cycle diff scoring: evaluates production impact of cycle changes. | `score_diff`, `log_score` | PR #10 (3e5f98f) |
-| `state.py` | 187 | Shift state: read, write, mutate counters, JSON I/O. | `load_json`, `write_json`, `read_state`, `top_path` | PR #28 (60e4ed5) |
-| `config.py` | 241 | Configuration loading, agent resolution, and environment detection. | `merge_config`, `prompt_for_agent`, `resolve_agent`, `infer_package_manager` | session #0059 |
-| `multi.py` | 117 | Multi-repo shift orchestration: run hardening loops across multiple repos. | `validate_repos`, `format_multi_summary`, `run_multi_shift` | PR #22 (12ac402) |
-| `e2e.py` | 113 | End-to-end test runner for Loop 2 feature builds. | `infer_test_command`, `detect_smoke_test`, `run_e2e_tests`, `_MAKEFILE_TEST_TARGET` | PR #70 (95ef827) |
-| `profiler.py` | 569 | Repo profiling for Loop 2 -- detects language, framework, dependencies, structure. | `profile_repo` | PR #78 (5cc11a3) |
-| `worktree.py` | 213 | Git worktree lifecycle: create, shift log, sync, revert, cleanup. | `canonical_repo_relative_path`, `resolve_nightshift_dir`, `validate_worktree`, `validate_repo_checkout` | PR #96 (34244ff) |
-| `cycle.py` | 855 | Per-cycle logic: prompt building, agent dispatch, verification, evaluation. | `extract_json`, `read_repo_instructions`, `wrap_repo_instructions`, `command_for_agent` | PR #96 (34244ff) |
-| `evaluation.py` | 874 | Self-evaluation loop: score nightshift runs against real repos. | `clone_target_repo`, `run_test_shift`, `parse_shift_artifacts`, `score_startup` | PR #96 (34244ff) |
-| `planner.py` | 483 | Feature planner for Loop 2 -- builds structured plans from repo profiles. | `build_plan_prompt`, `validate_plan`, `parse_plan`, `execution_order` | PR #78 (5cc11a3) |
-| `subagent.py` | 281 | Sub-agent spawner for Loop 2 -- executes work orders via codex or claude CLI. | `spawn_task`, `spawn_wave`, `format_wave_result`, `_TASK_COMPLETION_REQUIRED_KEYS` | PR #33 (bd23cc4) |
-| `decomposer.py` | 175 | Task decomposer for Loop 2 -- converts FeaturePlans into sub-agent work orders. | `build_work_order_prompt`, `decompose_plan`, `format_work_orders` | PR #78 (5cc11a3) |
-| `integrator.py` | 325 | Wave integrator for Loop 2 -- merges sub-agent work, runs tests, handles failures. | `collect_wave_files`, `stage_files`, `run_test_suite`, `diagnose_failure` | PR #33 (bd23cc4) |
-| `feature.py` | 696 | Loop 2 feature-build orchestration and persisted build state. | `feature_state_path`, `feature_log_dir`, `read_feature_state`, `write_feature_state` | PR #78 (5cc11a3) |
-| `cli.py` | 543 | CLI entry points: run, test, summarize, verify-cycle, module-map. | `run_nightshift`, `summarize`, `verify_cycle_cli`, `plan_feature` | PR #96 (34244ff) |
-| `__main__.py` | 5 | Entry point for python3 -m nightshift. | `main` | 2802c51 |
-| `__init__.py` | 537 | Nightshift -- autonomous overnight codebase improvement agent. | `AGENT_DEFAULT_MODELS`, `BACKEND_DIR_NAMES`, `BACKEND_EXTENSIONS`, `CATEGORY_ORDER` | session #0059 |
+
+| Module            | Lines | Purpose                                                                                 | Key symbols                                                                                                            | Last changed      |
+| ----------------- | ----- | --------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------- | ----------------- |
+| `errors.py`       | 7     | Nightshift error types.                                                                 | `NightshiftError`                                                                                                      | 2802c51           |
+| `eval_targets.py` | 96    | Known evaluation targets and their repo-specific verification settings.                 | `infer_target_verify_command`, `_KNOWN_TARGET_VERIFY_COMMANDS`                                                         | PR #106 (e2d235c) |
+| `types.py`        | 561   | Strict type definitions for all Nightshift data structures.                             | `NightshiftConfig`, `DiffScore`, `Counters`, `Baseline`                                                                | PR #88 (7e36fa5)  |
+| `constants.py`    | 749   | Module-level constants and tiny utilities used across the package.                      | `now_local`, `print_status`, `DATA_VERSION`, `SUPPORTED_AGENTS`                                                        | session #0062     |
+| `shell.py`        | 161   | Subprocess execution: streaming runner, git helper, shell utilities.                    | `run_command`, `run_capture`, `git`, `command_exists`                                                                  | PR #27 (9e953eb)  |
+| `summary.py`      | 141   | Feature summary generation for Loop 2 build output.                                     | `generate_feature_summary`, `_API_DIR_SEGMENTS`, `_CLI_DIR_SEGMENTS`, `_CONFIG_DIR_SEGMENTS`                           | PR #67 (89f8cd6)  |
+| `cleanup.py`      | 337   | Daemon housekeeping -- log rotation, healer archiving, and branch pruning.              | `rotate_healer_log`, `rotate_logs`, `prune_orphan_branches`, `_HEALER_ENTRY_RE`                                        | PR #88 (7e36fa5)  |
+| `compact.py`      | 318   | Handoff compaction -- merges numbered handoff files into weekly summaries.              | `compact_handoffs`, `_NUMBERED_RE`, `_SECTION_RE`, `_DATE_RE`                                                          | PR #83 (56e0c97)  |
+| `coordination.py` | 192   | Sub-agent coordination for Loop 2 -- detects file overlaps and generates hints.         | `extract_file_references`, `detect_overlaps`, `generate_coordination_hints`, `inject_hints`                            | PR #72 (a5a3e47)  |
+| `costs.py`        | 672   | Cost tracking for daemon sessions -- parse token usage from logs and maintain a ledger. | `parse_session_tokens`, `calculate_cost`, `read_ledger`, `write_ledger`                                                | PR #89 (7211bd4)  |
+| `module_map.py`   | 298   | Generate a persistent module map for fast cross-session orientation.                    | `module_map_path`, `generate_module_map`, `render_module_map`, `write_module_map`                                      | PR #86 (77e5c25)  |
+| `readiness.py`    | 211   | Production-readiness checks for Loop 2 feature builds.                                  | `collect_changed_files`, `check_secrets`, `check_debug_prints`, `check_test_coverage`                                  | PR #69 (3877225)  |
+| `scoring.py`      | 113   | Post-cycle diff scoring: evaluates production impact of cycle changes.                  | `score_diff`, `log_score`                                                                                              | PR #10 (3e5f98f)  |
+| `state.py`        | 187   | Shift state: read, write, mutate counters, JSON I/O.                                    | `load_json`, `write_json`, `read_state`, `top_path`                                                                    | PR #28 (60e4ed5)  |
+| `config.py`       | 241   | Configuration loading, agent resolution, and environment detection.                     | `merge_config`, `prompt_for_agent`, `resolve_agent`, `infer_package_manager`                                           | PR #106 (e2d235c) |
+| `multi.py`        | 117   | Multi-repo shift orchestration: run hardening loops across multiple repos.              | `validate_repos`, `format_multi_summary`, `run_multi_shift`                                                            | PR #22 (12ac402)  |
+| `e2e.py`          | 113   | End-to-end test runner for Loop 2 feature builds.                                       | `infer_test_command`, `detect_smoke_test`, `run_e2e_tests`, `_MAKEFILE_TEST_TARGET`                                    | PR #70 (95ef827)  |
+| `profiler.py`     | 569   | Repo profiling for Loop 2 -- detects language, framework, dependencies, structure.      | `profile_repo`                                                                                                         | PR #78 (5cc11a3)  |
+| `worktree.py`     | 232   | Git worktree lifecycle: create, shift log, sync, revert, cleanup.                       | `canonical_repo_relative_path`, `resolve_nightshift_dir`, `resolve_shift_log_relative_dir`, `resolve_test_runtime_dir` | session #0062     |
+| `cycle.py`        | 855   | Per-cycle logic: prompt building, agent dispatch, verification, evaluation.             | `extract_json`, `read_repo_instructions`, `wrap_repo_instructions`, `command_for_agent`                                | PR #96 (34244ff)  |
+| `evaluation.py`   | 906   | Self-evaluation loop: score nightshift runs against real repos.                         | `clone_target_repo`, `run_test_shift`, `parse_shift_artifacts`, `score_startup`                                        | session #0062     |
+| `planner.py`      | 483   | Feature planner for Loop 2 -- builds structured plans from repo profiles.               | `build_plan_prompt`, `validate_plan`, `parse_plan`, `execution_order`                                                  | PR #78 (5cc11a3)  |
+| `subagent.py`     | 281   | Sub-agent spawner for Loop 2 -- executes work orders via codex or claude CLI.           | `spawn_task`, `spawn_wave`, `format_wave_result`, `_TASK_COMPLETION_REQUIRED_KEYS`                                     | PR #33 (bd23cc4)  |
+| `decomposer.py`   | 175   | Task decomposer for Loop 2 -- converts FeaturePlans into sub-agent work orders.         | `build_work_order_prompt`, `decompose_plan`, `format_work_orders`                                                      | PR #78 (5cc11a3)  |
+| `integrator.py`   | 325   | Wave integrator for Loop 2 -- merges sub-agent work, runs tests, handles failures.      | `collect_wave_files`, `stage_files`, `run_test_suite`, `diagnose_failure`                                              | PR #33 (bd23cc4)  |
+| `feature.py`      | 696   | Loop 2 feature-build orchestration and persisted build state.                           | `feature_state_path`, `feature_log_dir`, `read_feature_state`, `write_feature_state`                                   | PR #78 (5cc11a3)  |
+| `cli.py`          | 550   | CLI entry points: run, test, summarize, verify-cycle, module-map.                       | `run_nightshift`, `summarize`, `verify_cycle_cli`, `plan_feature`                                                      | session #0062     |
+| `__main__.py`     | 5     | Entry point for python3 -m nightshift.                                                  | `main`                                                                                                                 | 2802c51           |
+| `__init__.py`     | 547   | Nightshift -- autonomous overnight codebase improvement agent.                          | `AGENT_DEFAULT_MODELS`, `BACKEND_DIR_NAMES`, `BACKEND_EXTENSIONS`, `CATEGORY_ORDER`                                    | session #0062     |
+
 
 ## Dependency Order
 
@@ -50,8 +52,9 @@ Topological order derived from internal `nightshift.*` imports.
 
 ## Recent Shipped Sessions
 
-- PR #105: docs: close stale eval startup task
-- PR #104: fix: gate autonomous queue on eval score
-- PR #99: test: cover malformed task frontmatter edge case
-- PR #98: docs: track task parser review follow-up
-- PR #97: feat: add task frontmatter validator
+- PR #125: feat: watchdog + natural task creation (no artificial caps)
+- PR #124: fix: round 6 audit — 9 remaining issues patched
+- PR #123: feat: overseer rewrite — ticket closer, not process auditor
+- PR #122: overseer: dedupe auto-release queue
+- PR #121: overseer: fix unified-daemon operator docs
+
diff --git a/docs/changelog/v0.0.8.md b/docs/changelog/v0.0.8.md
@@ -14,6 +14,7 @@ Closing the self-maintaining gap: auto-release, auto-changelog, evaluation CLI,
 - **[docs]** Refreshed `README.md` against the live repo so it now documents the real `python3 -m nightshift` entry points, installed wrapper scripts, current tracker snapshot, current config surface, and the current handoff/learnings/task workflow instead of stale marketing-era commands and percentages. (tasks `#0118`, `#0067`)
 
 ## Fixed
+- **[fix]** `nightshift test` now keeps evaluation state, runner logs, and linked worktrees under an isolated temp-root runtime directory, so rejected Phractal eval runs no longer dirty the cloned target repo while evaluation artifact parsing still finds the shifted state/log files. (task `#0100`)
 - **[fix]** Shell scripts in `scripts/` now use ASCII-only section dividers and restart/status text, removing box-drawing and em-dash characters that violated repo conventions and rendered inconsistently across terminals/filesystems. (task #0038)
 - **[meta]** Corrected the authoritative Step 0 evaluation command in `docs/prompt/evolve.md` so fresh-clone Phractal evaluations pass `--repo-dir /tmp/nightshift-eval` from the Nightshift repo root instead of accidentally targeting the Nightshift checkout. (task `#0117`)
 - **[fix]** Nightshift now resolves the repo's actual `docs/` casing across runtime artifacts, shift-log verification, and evaluation artifact parsing, so repos that use `Docs/Nightshift/` no longer get false rejected cycles or mis-targeted self-evaluation reads, and legitimate final-cycle shift-log summary commits no longer trip the extra-commit guard rail. (tasks `#0098`, `#0121`)
@@ -22,6 +23,7 @@ Closing the self-maintaining gap: auto-release, auto-changelog, evaluation CLI,
 ## Removed
 
 ## Internal
+- **[test]** Added regression coverage for isolated test-mode runtime artifacts and for rejected test-mode runs leaving the cloned target repo clean. Test suite is now 992 passing.
 - **[test]** Added regression coverage for repo-URL-based evaluation verifier selection, percent-bearing git remote URLs, and the documentation contract for known target metadata. Test suite is now 943 passing.
 - **[meta]** Recorded `docs/evaluations/0014.md` from a fresh-clone Phractal run, confirmed the default Claude startup path still launches cleanly without `CLAUDECODE` or effort overrides, and closed stale eval task `#0097` so the eval gate now points at the remaining verification/cleanup gaps instead of obsolete startup drift.
 - **[meta]** Added an eval-score gate to `docs/prompt/evolve-auto.md` and mirrored it in the builder operations docs so, after Step 0, any latest real-repo evaluation below `80/100` forces the autonomous builder to prefer eval-related normal-priority tasks over unrelated queue cleanup. Added prompt-contract regression coverage for the new rule and recorded fresh Phractal evaluation `docs/evaluations/0013.md` at `70/100`. (task `#0131`)
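Reviewer note on the new `[test]` changelog entry: a "cloned target repo stays clean" check can be expressed by asserting that `git status --porcelain` prints nothing for the clone. The sketch below is an assumption about how such a check might look, not the regression test added in this PR; `porcelain_is_clean` and `clone_is_clean` are hypothetical names.

```python
import subprocess
from pathlib import Path


def porcelain_is_clean(porcelain_output: str) -> bool:
    """Return True when `git status --porcelain` output lists no changes.

    Porcelain output has one line per modified, staged, or untracked path,
    so a clean working tree produces empty output.
    """
    return porcelain_output.strip() == ""


def clone_is_clean(repo_dir: Path) -> bool:
    """Run `git status --porcelain` in repo_dir and report cleanliness.

    Hypothetical helper; raises CalledProcessError if git fails,
    for example when repo_dir is not a git checkout.
    """
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return porcelain_is_clean(result.stdout)
```

Because untracked files also appear in porcelain output, this style of check would catch stray state files or runner logs written into the clone, which is the failure mode the fix for task `#0100` addresses.
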