From e7bdf231d20210db484fcc95d5e0d125ab42a2c1 Mon Sep 17 00:00:00 2001 From: No9 Labs Date: Thu, 9 Apr 2026 02:53:11 -0400 Subject: [PATCH 1/2] audit: session-126 framework audit -- fix 6 doc issues, create 2 tasks Quality audit findings and fixes (18 sessions since last audit): 1. OPERATIONS.md: test count 915 -> 1156 (stale by 241 tests) 2. OPERATIONS.md: add v0.0.7 and v0.0.8 version milestones (both missing) 3. OPERATIONS.md + CLAUDE.md: sync dependency flows -- both were divergent; add owl/eval_runner; correct settings/eval_targets ordering 4. DAEMON.md: arg 2 was "pause seconds" but daemon uses it as duration_hours 5. DAEMON.md: remove hardcoded absolute paths (/Users/no9labs/...) with wrong .recursive placement; replace with relative paths 6. DAEMON.md: remove stale pentest log references (v1 artifact, files don't exist) 7. ROLE-SCORING.md: add pentest_framework_tasks and sessions_since_eval signals (both active in pick-role.py since sessions #109 and #124 respectively) 8. sessions/index.md: fix corrupted role field in session 20260409-020609 (shell injection artifact from regex extractor: .*'"$LOG_FILE"2>/d -> brain) Tasks created: - #0249: Regenerate MODULE_MAP.md (stale since session #0001, shows only 3 modules) - #0250: Fix DAEMON.md cycle lifecycle git commands (shows checkout+clean-fd, actual daemon only does fetch+reset) make check: 1156 tests pass --- .recursive/ops/DAEMON.md | 15 ++- .recursive/ops/OPERATIONS.md | 22 +++- .recursive/ops/ROLE-SCORING.md | 9 +- .recursive/reviews/2026-04-09-audit-126.md | 111 +++++++++++++++++++++ .recursive/sessions/index.md | 2 +- .recursive/tasks/.next-id | 2 +- .recursive/tasks/0249.md | 22 ++++ .recursive/tasks/0250.md | 35 +++++++ CLAUDE.md | 2 +- 9 files changed, 204 insertions(+), 16 deletions(-) create mode 100644 .recursive/reviews/2026-04-09-audit-126.md create mode 100644 .recursive/tasks/0249.md create mode 100644 .recursive/tasks/0250.md diff --git a/.recursive/ops/DAEMON.md b/.recursive/ops/DAEMON.md 
index 52ba639..d84fdae 100644 --- a/.recursive/ops/DAEMON.md +++ b/.recursive/ops/DAEMON.md @@ -55,8 +55,8 @@ bash .recursive/engine/daemon.sh claude 60 10 Arguments are: 1. agent name -2. pause between sessions in seconds -3. max sessions (`0` means loop forever) +2. duration in hours (default: 8) +3. max sessions (`0` means unlimited, use duration limit) ### tmux @@ -98,7 +98,7 @@ Valid values are `build`, `review`, `oversee`, `strategize`, `achieve`, `securit ## How Role Selection Works At the start of every cycle, `.recursive/engine/daemon.sh` calls -[.recursive/engine/pick-role.py](/Users/no9labs/Developer/.recursive/Nightshift/.recursive/engine/pick-role.py). +`.recursive/engine/pick-role.py`. That scorer reads the live system state and prints one winner. Primary inputs: @@ -111,8 +111,7 @@ Primary inputs: - the latest report in `.recursive/autonomy/` - open GitHub issues labeled `needs-human` -The exact math belongs in -[.recursive/ops/ROLE-SCORING.md](/Users/no9labs/Developer/.recursive/Nightshift/.recursive/ops/ROLE-SCORING.md), +The exact math belongs in `.recursive/ops/ROLE-SCORING.md`, not in this file. Read that file when debugging "why did the daemon pick this role?" behavior. @@ -205,8 +204,7 @@ These are the authoritative runtime artifacts: | Path | Purpose | |------|---------| | `.recursive/sessions/index.md` | Unified session history across all roles | -| `.recursive/sessions/*.log` | Stream-json session logs | -| `.recursive/sessions/*-pentest.log` | Pentest preflight logs | +| `.recursive/sessions/raw/*.log` | Stream-json session logs | | `.recursive/sessions/costs.json` | Cost ledger used by budget checks | | `.recursive/handoffs/LATEST.md` | Short-term memory for the next cycle | | `.recursive/evaluations/*.md` | Real-repo evaluation reports | @@ -321,8 +319,7 @@ resets the repo, and injects that alert into the next cycle. The circuit breaker stops the daemon after three failed cycles. 
Inspect: - `.recursive/sessions/index.md` -- the latest session log -- the latest pentest log +- the latest session log in `.recursive/sessions/raw/` - `.recursive/handoffs/LATEST.md` ### Budget stop diff --git a/.recursive/ops/OPERATIONS.md b/.recursive/ops/OPERATIONS.md index 26c6b40..3d5fda5 100644 --- a/.recursive/ops/OPERATIONS.md +++ b/.recursive/ops/OPERATIONS.md @@ -454,7 +454,7 @@ The Python package that IS Nightshift. The overnight hardening runner. ### Dependency flow (nightshift package) ``` -core/errors → settings/eval_targets → core/types → core/constants → core/shell → raven/summary → raven/coordination → infra/module_map → owl/readiness → owl/scoring → core/state → settings/config → infra/multi → raven/e2e → raven/profiler → infra/worktree → owl/cycle → raven/planner → raven/subagent → raven/decomposer → raven/integrator → raven/feature → cli +core/errors → core/types → core/constants → core/shell → raven/summary → raven/coordination → infra/module_map → owl/readiness → owl/scoring → owl/eval_runner → core/state → settings/config → settings/eval_targets → infra/multi → raven/e2e → raven/profiler → infra/worktree → owl/cycle → raven/planner → raven/subagent → raven/decomposer → raven/integrator → raven/feature → infra/release → cli ``` No circular imports. Each module only imports from modules to its left. `multi.py` receives the `run_nightshift` callable from `cli.py` via dependency injection to avoid circular deps. @@ -472,7 +472,7 @@ Note: `cleanup.py`, `compact.py`, `costs.py`, `evaluation.py`, and `config.py` ( ## System 7: Tests (`nightshift/tests/`) ### What it is -915 pytest tests covering every pure function, config, state, CLI, and integration. +1156 pytest tests covering every pure function, config, state, CLI, and integration. ### Files | File | Purpose | @@ -797,6 +797,24 @@ What defines each version. Use this to know when a release is ready. 
- [x] Wave integrator module (`nightshift/integrator.py`) - [x] `nightshift build` CLI command (`nightshift/feature.py` -- build/status/resume) +### v0.0.7 — Security Hardening (released 2026-04-05) +- [x] Prompt injection protection for target repos +- [x] Prompt self-modification guard across all daemon scripts +- [x] Cost tracking and budget ceiling for daemon sessions +- [x] Daemon log rotation and orphan branch pruning +- [x] Automated handoff compaction in daemon +- [x] Configurable model/effort/thinking per agent + +### v0.0.8 — Self-Maintaining (in progress) +- [x] Auth-error circuit breaker bypass with notify_human +- [x] Auto-release module (`nightshift/infra/release.py`) +- [x] Eval runner CLI (`nightshift/owl/eval_runner.py`) +- [x] Session index writer rewrite (single-line rows, delegation-aware counters) +- [x] Worktree cleanup rewrite with self-removal guard +- [x] Eval staleness signal in dashboard +- [x] Delegation-aware sessions-since counters (signals.py + pick-role.py) +- [ ] Wire E2E eval into daemon loop automatically + ### v1.0.0 — Production - [ ] Loop 1 runs reliably overnight on real repos - [ ] Loop 2 can build a simple feature end-to-end diff --git a/.recursive/ops/ROLE-SCORING.md b/.recursive/ops/ROLE-SCORING.md index 11df446..181a92b 100644 --- a/.recursive/ops/ROLE-SCORING.md +++ b/.recursive/ops/ROLE-SCORING.md @@ -60,6 +60,8 @@ used when the source file is missing or unreadable. 
| `tracker_moved` | Session index -- any `%` in recent status cells | false | | `recent_security_sessions` | Session index + archived pentest tasks (dual-signal) | 0 | | `friction_entries` | `.recursive/friction/log.md` -- count `## YYYY-MM-DD` headers | 0 | +| `pentest_framework_tasks` | `.recursive/tasks/` -- pending tasks with `source: pentest` AND `target: recursive` | 0 | +| `sessions_since_eval` | `.recursive/evaluations/` vs session index -- sessions since latest eval file was written | 0 | **Eval file validation**: `read_latest_eval_score()` validates the file before reading the score. A file must have a `**Date**:` line and at least 3 scored dimension rows (`N/10` format) outside @@ -160,9 +162,12 @@ friction_entries >= 5: +50 (lots of friction accumulated) friction_entries >= 3 AND sessions_since_evolve >= 5: +30 (moderate friction, hasn't evolved recently) sessions_since_evolve >= 20: +20 (overdue regardless of friction count) +pentest_framework_tasks >= 1: +40 (confirmed security vuln in .recursive/ -- security urgency) -Hard cap: capped at 5 if sessions_since_evolve < 5 (don't re-run too frequently) -Hard cap: capped at 5 if friction_entries == 0 (no friction = nothing to evolve) +Hard cap: capped at 5 if sessions_since_evolve < 5 AND pentest_framework_tasks == 0 + (don't re-run too frequently unless pentest tasks pending) +Hard cap: capped at 5 if friction_entries == 0 AND pentest_framework_tasks == 0 + (no friction and no pentest tasks = nothing to evolve) ``` ### AUDIT -- framework quality review diff --git a/.recursive/reviews/2026-04-09-audit-126.md b/.recursive/reviews/2026-04-09-audit-126.md new file mode 100644 index 0000000..236968f --- /dev/null +++ b/.recursive/reviews/2026-04-09-audit-126.md @@ -0,0 +1,111 @@ +# Framework Audit -- Session #126 + +**Date**: 2026-04-09 +**Trigger**: 18 sessions since last audit (session #107) +**Auditor**: audit-agent + +--- + +## Quality Audit + +Files audited: 12 +- `.recursive/ops/OPERATIONS.md` +- 
`.recursive/ops/DAEMON.md` +- `.recursive/ops/ROLE-SCORING.md` +- `.recursive/engine/daemon.sh` +- `.recursive/engine/lib-agent.sh` +- `.recursive/engine/signals.py` +- `.recursive/engine/dashboard.py` +- `.recursive/engine/pick-role.py` +- `.recursive/agents/brain.md` +- `.recursive/sessions/index.md` +- `.recursive/architecture/MODULE_MAP.md` +- `CLAUDE.md` + +Issues found: 8 +Issues fixed in this PR: 6 +Tasks created for remaining issues: 2 (#0249, #0250) + +### Issues Fixed + +1. **OPERATIONS.md: Stale test count** -- Updated "915 pytest tests" to "1156 pytest tests". The count grew from 915 to 1156 across sessions #107-#125 with 241 new tests added. + +2. **OPERATIONS.md: Missing version milestones** -- Added v0.0.7 (Security Hardening, released 2026-04-05) and v0.0.8 (Self-Maintaining, in progress) milestone entries. The doc stopped at v0.0.6 despite both versions having changelog files. + +3. **DAEMON.md: Wrong argument description** -- Arg 2 was described as "pause between sessions in seconds" but daemon.sh uses it as `duration_hours` (default 8). The tmux example `daemon.sh claude 60` would mean 60 hours, not 60 second pause. Fixed description to "duration in hours (default: 8)" and arg 3 to clarify 0=unlimited. + +4. **DAEMON.md: Hardcoded absolute paths** -- Lines 101 and 115 had absolute links `/Users/no9labs/Developer/.recursive/Nightshift/...` with `.recursive` misplaced in the path. Replaced with relative paths. + +5. **DAEMON.md: Stale pentest log references** -- Referenced `.recursive/sessions/*-pentest.log` (v1 era artifact) in the Logs table and circuit breaker recovery section. These files do not exist in the v2 architecture. Replaced with `.recursive/sessions/raw/*.log`. + +6. **ROLE-SCORING.md: Missing signals** -- The signals table was missing `pentest_framework_tasks` (added session #109, used to boost evolve +40) and `sessions_since_eval` (added session #124, used for eval staleness alert). 
Both signals are actively used in pick-role.py and dashboard.py. Added both to the signals table and documented the `pentest_framework_tasks` boost in the EVOLVE scoring section. + +7. **sessions/index.md: Corrupted role field** -- Session 20260409-020609 had role `.*'"$LOG_FILE"2>/d` (shell injection artifact from a regex-extraction bug in daemon.sh's role extractor). Corrected to `brain`. + +8. **CLAUDE.md + OPERATIONS.md: Divergent dependency flows** -- CLAUDE.md was missing `raven.summary`, `raven.coordination`, `raven.e2e`, `raven.profiler` modules and had wrong ordering of `settings.config`/`settings.eval_targets`. Neither file included `owl.eval_runner` (added session #118). Synchronized both files to a consistent flow that includes all current modules. + +### Tasks Created + +- **#0249**: Regenerate MODULE_MAP.md -- stale since session #0001, shows only 3 modules, actual package has 20+. Requires `make` or the CLI command which touches nightshift/ (build zone). + +- **#0250**: Fix DAEMON.md cycle lifecycle description -- shows `git checkout main` and `git clean -fd` which don't appear in daemon.sh. Framework zone (evolve agent). + +--- + +## Pattern Analysis + +Sessions analyzed: 19 (sessions #107-#125) +Commitment hit rate: 19/19 = **100%** (perfect streak) +Cost trend: **stable** (~$1.5-2.2 USD/session, with outliers for complex parallel sessions) + +### Decision Patterns + +**Role distribution (last 19 sessions)**: +- build: 9 delegations +- evolve: 12 delegations (many sessions ran both build+evolve in parallel) +- oversee: 1 delegation (#0122) +- strategize: 1 delegation (#0123) +- security: 2 delegations (#0109, #0110) +- audit: 1 delegation (#0107, this session) +- review: 0 delegations + +**Observations**: +- Build+evolve parallel pattern is the dominant strategy (used in 8 of 19 sessions). It's efficient and produces good throughput. +- Security was run twice in rapid succession (#109, #110) -- the pentest->evolve fix cycle worked well. 
+- Review role has not been delegated in 19 sessions. The code-reviewer/safety-reviewer sub-agents are used per-PR but the standalone review role (file-by-file quality review) has been skipped. +- Advisory overrides are common but justified (5 of 19 sessions). + +**Override quality**: All overrides were justified with clear rationale. No habitual overrides observed. + +### Commitment Quality + +All 19 commitments were MET. Specific observations: +- Predictions are consistently calibrated -- specific and measurable +- Eval score predictions tend to be conservative (>= threshold) rather than point estimates +- Test count predictions consistently underestimate actual new tests (e.g., predicted 3+, got 25) + +### Cost Analysis + +Session costs range from $0.39 to $2.60 USD. Two sessions stood out: +- Session #0110 ($2.38): Among the most expensive -- parallel build+evolve with complex security work +- Session #0114 ($2.60): Most expensive overall -- release module with 3 fix cycles + +Cost trend is stable. No drift upward or downward. + +### Optimization Opportunities + +1. **Review role gap**: 19 sessions without a standalone review. The review role does file-by-file deep quality checks that per-PR reviewers don't do. Consider triggering when consecutive_builds >= 10 (currently 5, but the brain often runs build+evolve in parallel, which doesn't change this counter). + +2. **Eval regression tracking**: Eval dropped from 86 to 83 between #0016 and #0017. The count-only payload issue in state file is task #0247. Track whether fixing this brings eval back above 86. + +3. **Queue not shrinking**: Queue stabilized at 69 after oversee in #0122 but hasn't continued to decrease. With 2 new tasks created per session on average, the queue will grow unless oversee runs more frequently. + +4. **MODULE_MAP.md rot**: The module map hasn't been regenerated since session #0001 and shows only 3 modules. Dashboard and brain signals that reference the map get no useful data. Task #0249 addresses this. 
+ +--- + +## Verification + +- `make check` passes (1156 tests) +- No framework files in nightshift/ touched +- All 8 issues identified; 6 fixed directly; 2 tasked diff --git a/.recursive/sessions/index.md b/.recursive/sessions/index.md index 70ce23c..8b80c2a 100644 --- a/.recursive/sessions/index.md +++ b/.recursive/sessions/index.md @@ -85,5 +85,5 @@ | 2026-04-08 22:41 | 20260408-183951 | brain | 0 | - | - | success | #0177 eval rerun (53->86) + #0203 ROLE-SCORING v2 | [#198](https://github.com/Recusive/Nightshift/pull/198), [#197](https://github.com/Recusive/Nightshift/pull/197) | | 2026-04-09 01:44 | 20260409-011757 | brain | 0 | 26m | $2.158 | success | - | - | | 2026-04-09 02:06 | 20260409-014441 | brain | 0 | 21m | $2.0223 | success [PROMPT MODIFIED] [ORIGIN MODIFIED] | - | - | -| 2026-04-09 02:25 | 20260409-020609 | .*'\"$LOG_FILE\"2>/d | 0 | 18m | $2.2003 | success [PROMPT MODIFIED] | - | - | +| 2026-04-09 02:25 | 20260409-020609 | brain | 0 | 18m | $2.2003 | success [PROMPT MODIFIED] | - | - | | 2026-04-09 02:42 | 20260409-022508 | brain | 0 | 17m | $1.3038 | success [PROMPT MODIFIED] | - | - | diff --git a/.recursive/tasks/.next-id b/.recursive/tasks/.next-id index 720fe95..1f7e0d6 100644 --- a/.recursive/tasks/.next-id +++ b/.recursive/tasks/.next-id @@ -1 +1 @@ -249 +251 diff --git a/.recursive/tasks/0249.md b/.recursive/tasks/0249.md new file mode 100644 index 0000000..524665b --- /dev/null +++ b/.recursive/tasks/0249.md @@ -0,0 +1,22 @@ +--- +status: pending +priority: normal +target: recursive +source: audit +created: 2026-04-09 +completed: +--- + +# Regenerate MODULE_MAP.md (stale since session #0001) + +The `.recursive/architecture/MODULE_MAP.md` was last generated in session #0001 and shows only 3 top-level modules. It is severely stale -- the package has grown to include `core/`, `settings/`, `owl/`, `raven/`, and `infra/` subpackages with 20+ modules. The stale map gives future sessions incorrect orientation data. 
+ +## Acceptance Criteria +- [ ] Run `python3 -m nightshift module-map --write` from the repo root +- [ ] Verify the new MODULE_MAP.md shows all subpackages and modules +- [ ] Verify the dependency order matches the flow in CLAUDE.md +- [ ] Commit the updated MODULE_MAP.md +- [ ] PR passes code-reviewer + +## Notes +This is a build-zone task (touches nightshift/). The command auto-generates the file -- no manual editing needed. diff --git a/.recursive/tasks/0250.md b/.recursive/tasks/0250.md new file mode 100644 index 0000000..15a14ea --- /dev/null +++ b/.recursive/tasks/0250.md @@ -0,0 +1,35 @@ +--- +status: pending +priority: normal +target: recursive +source: audit +created: 2026-04-09 +completed: +--- + +# Fix DAEMON.md cycle lifecycle description (git commands inaccurate) + +The DAEMON.md "Cycle Lifecycle" section shows these commands: + +``` +git fetch origin +git checkout main +git reset --hard origin/main +git clean -fd +``` + +But the actual `daemon.sh` only runs: + +``` +git -C "$REPO_DIR" fetch origin main --quiet +git -C "$REPO_DIR" reset --hard origin/main --quiet +``` + +No `git checkout main` and no `git clean -fd`. The doc is misleading agents who read it to understand the reset behavior. + +## Acceptance Criteria +- [ ] Update DAEMON.md "1. Reset and housekeeping" section to match actual daemon.sh reset commands +- [ ] PR passes docs-reviewer + +## Notes +Framework-zone task (touches `.recursive/ops/DAEMON.md`). Delegate to evolve agent. diff --git a/CLAUDE.md b/CLAUDE.md index f4cbccf..94ada92 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -167,7 +167,7 @@ These are enforced by CI. Non-negotiable. - **One concern per module.** If you're adding >50 lines of new logic to an existing file, it belongs in its own module. cycle.py handles cycle logic -- not scoring. cli.py handles CLI -- not business logic. 
- **No hardcoded data in logic files.** Regex patterns, score maps, category weights, thresholds -- these go in `core/constants.py` or a dedicated `*_patterns.py`. Logic files import them. - **New module checklist:** create the `.py` file in the appropriate subpackage (`core/`, `settings/`, `owl/`, `raven/`, `infra/`), add to the subpackage's `__init__.py` re-exports, add to `nightshift/scripts/install.sh` PACKAGE_FILES, add to this file's structure tree. -- **Follow the dependency flow:** `core.errors -> core.types -> core.constants -> core.shell -> core.state -> settings.config -> settings.eval_targets -> owl.cycle -> owl.scoring -> owl.readiness -> raven.planner -> raven.decomposer -> raven.subagent -> raven.integrator -> raven.feature -> infra.worktree -> infra.module_map -> infra.multi -> infra.release -> cli`. New modules slot into this chain. No circular imports. (`infra/multi.py` uses a late import of `run_nightshift` from `cli.py` to avoid circular deps.) +- **Follow the dependency flow:** `core.errors -> core.types -> core.constants -> core.shell -> raven.summary -> raven.coordination -> infra.module_map -> owl.readiness -> owl.scoring -> owl.eval_runner -> core.state -> settings.config -> settings.eval_targets -> infra.multi -> raven.e2e -> raven.profiler -> infra.worktree -> owl.cycle -> raven.planner -> raven.subagent -> raven.decomposer -> raven.integrator -> raven.feature -> infra.release -> cli`. New modules slot into this chain. No circular imports. (`infra/multi.py` uses a late import of `run_nightshift` from `cli.py` to avoid circular deps.) - **Functions over inline code.** If a block of code does one thing and is >10 lines, extract it into a named function. The function name documents the intent. - **Config over magic numbers.** If a value might change (thresholds, limits, timeouts), put it in `DEFAULT_CONFIG` and `core/types.py`, not inline. 
From cc85939b3906812f0b4a0778a07e7cc7d4eed2e0 Mon Sep 17 00:00:00 2001 From: No9 Labs Date: Thu, 9 Apr 2026 03:04:00 -0400 Subject: [PATCH 2/2] fix: correct dependency flow ordering and ROLE-SCORING signal docs Move owl.eval_runner after settings.eval_targets in the dependency chain in both CLAUDE.md and OPERATIONS.md, since eval_runner imports settings.config. Mark sessions_since_eval in ROLE-SCORING.md as dashboard-only (not used in pick-role.py scoring) to prevent misleading documentation. --- .recursive/ops/OPERATIONS.md | 2 +- .recursive/ops/ROLE-SCORING.md | 2 +- CLAUDE.md | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/.recursive/ops/OPERATIONS.md b/.recursive/ops/OPERATIONS.md index 3d5fda5..e09dd46 100644 --- a/.recursive/ops/OPERATIONS.md +++ b/.recursive/ops/OPERATIONS.md @@ -454,7 +454,7 @@ The Python package that IS Nightshift. The overnight hardening runner. ### Dependency flow (nightshift package) ``` -core/errors → core/types → core/constants → core/shell → raven/summary → raven/coordination → infra/module_map → owl/readiness → owl/scoring → owl/eval_runner → core/state → settings/config → settings/eval_targets → infra/multi → raven/e2e → raven/profiler → infra/worktree → owl/cycle → raven/planner → raven/subagent → raven/decomposer → raven/integrator → raven/feature → infra/release → cli +core/errors → core/types → core/constants → core/shell → raven/summary → raven/coordination → infra/module_map → owl/readiness → owl/scoring → core/state → settings/config → settings/eval_targets → owl/eval_runner → infra/multi → raven/e2e → raven/profiler → infra/worktree → owl/cycle → raven/planner → raven/subagent → raven/decomposer → raven/integrator → raven/feature → infra/release → cli ``` No circular imports. Each module only imports from modules to its left. `multi.py` receives the `run_nightshift` callable from `cli.py` via dependency injection to avoid circular deps. 
diff --git a/.recursive/ops/ROLE-SCORING.md b/.recursive/ops/ROLE-SCORING.md index 181a92b..5c85849 100644 --- a/.recursive/ops/ROLE-SCORING.md +++ b/.recursive/ops/ROLE-SCORING.md @@ -61,7 +61,7 @@ used when the source file is missing or unreadable. | `recent_security_sessions` | Session index + archived pentest tasks (dual-signal) | 0 | | `friction_entries` | `.recursive/friction/log.md` -- count `## YYYY-MM-DD` headers | 0 | | `pentest_framework_tasks` | `.recursive/tasks/` -- pending tasks with `source: pentest` AND `target: recursive` | 0 | -| `sessions_since_eval` | `.recursive/evaluations/` vs session index -- sessions since latest eval file was written | 0 | +| `sessions_since_eval` | `.recursive/evaluations/` vs session index -- sessions since latest eval file was written (dashboard-only, not used in scoring) | 0 | **Eval file validation**: `read_latest_eval_score()` validates the file before reading the score. A file must have a `**Date**:` line and at least 3 scored dimension rows (`N/10` format) outside diff --git a/CLAUDE.md b/CLAUDE.md index 94ada92..a46cb1f 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -167,7 +167,7 @@ These are enforced by CI. Non-negotiable. - **One concern per module.** If you're adding >50 lines of new logic to an existing file, it belongs in its own module. cycle.py handles cycle logic -- not scoring. cli.py handles CLI -- not business logic. - **No hardcoded data in logic files.** Regex patterns, score maps, category weights, thresholds -- these go in `core/constants.py` or a dedicated `*_patterns.py`. Logic files import them. - **New module checklist:** create the `.py` file in the appropriate subpackage (`core/`, `settings/`, `owl/`, `raven/`, `infra/`), add to the subpackage's `__init__.py` re-exports, add to `nightshift/scripts/install.sh` PACKAGE_FILES, add to this file's structure tree. 
-- **Follow the dependency flow:** `core.errors -> core.types -> core.constants -> core.shell -> raven.summary -> raven.coordination -> infra.module_map -> owl.readiness -> owl.scoring -> owl.eval_runner -> core.state -> settings.config -> settings.eval_targets -> infra.multi -> raven.e2e -> raven.profiler -> infra.worktree -> owl.cycle -> raven.planner -> raven.subagent -> raven.decomposer -> raven.integrator -> raven.feature -> infra.release -> cli`. New modules slot into this chain. No circular imports. (`infra/multi.py` uses a late import of `run_nightshift` from `cli.py` to avoid circular deps.) +- **Follow the dependency flow:** `core.errors -> core.types -> core.constants -> core.shell -> raven.summary -> raven.coordination -> infra.module_map -> owl.readiness -> owl.scoring -> core.state -> settings.config -> settings.eval_targets -> owl.eval_runner -> infra.multi -> raven.e2e -> raven.profiler -> infra.worktree -> owl.cycle -> raven.planner -> raven.subagent -> raven.decomposer -> raven.integrator -> raven.feature -> infra.release -> cli`. New modules slot into this chain. No circular imports. (`infra/multi.py` uses a late import of `run_nightshift` from `cli.py` to avoid circular deps.) - **Functions over inline code.** If a block of code does one thing and is >10 lines, extract it into a named function. The function name documents the intent. - **Config over magic numbers.** If a value might change (thresholds, limits, timeouts), put it in `DEFAULT_CONFIG` and `core/types.py`, not inline.