Skip to content

audit: session-126 framework audit (18 sessions since last)#248

Merged
fazxes merged 2 commits intomainfrom
audit/session-126-framework
Apr 9, 2026
Merged

audit: session-126 framework audit (18 sessions since last)#248
fazxes merged 2 commits intomainfrom
audit/session-126-framework

Conversation

@fazxes
Copy link
Copy Markdown
Member

@fazxes fazxes commented Apr 9, 2026

Summary

Framework audit covering 18 sessions since the last audit (session #107). Audited 12 framework files. Found 8 issues; fixed 6 directly; created 2 tasks for issues requiring project-zone work.

Issues Fixed

  • OPERATIONS.md test count: 915 -> 1156 (stale by 241 tests added across sessions feat: unified daemon — agent picks its own role each cycle #107-feat: watchdog + natural task creation #125)
  • OPERATIONS.md version milestones: Added v0.0.7 (Security Hardening, released) and v0.0.8 (Self-Maintaining, in progress) entries -- doc stopped at v0.0.6 despite both versions having changelog files
  • DAEMON.md argument description: Arg 2 described as "pause between sessions in seconds" but daemon.sh uses it as DURATION_HOURS (default 8). The tmux example daemon.sh claude 60 would be interpreted as 60 hours by the daemon.
  • DAEMON.md hardcoded paths: Lines 101+115 had absolute paths /Users/no9labs/Developer/.recursive/Nightshift/... with .recursive misplaced. Replaced with plain relative references.
  • DAEMON.md stale pentest log references: Referenced .recursive/sessions/*-pentest.log (v1 artifact, no longer exists in v2). Removed from Logs table and circuit breaker recovery instructions.
  • ROLE-SCORING.md missing signals: pentest_framework_tasks (added session feat: live formatted output in daemon tmux pane #109, +40 boost to evolve) and sessions_since_eval (added session fix: round 6 — all 9 remaining issues patched #124, staleness alert) were not documented. Both are actively used in pick-role.py.
  • sessions/index.md corrupted row: Session 20260409-020609 had role field .*'"$LOG_FILE"2>/d (shell injection artifact from daemon.sh role extractor). Corrected to brain.
  • CLAUDE.md + OPERATIONS.md divergent dependency flows: The two files had different, inconsistent dependency chains. Neither included owl/eval_runner (added session Rewrite DAEMON.md for unified daemon + pick-role.py architecture #118). Synchronized both to a consistent flow.

Tasks Created

Pattern Analysis

Sessions #107-#125 analyzed:

  • Commitment hit rate: 100% (19/19 MET, perfect streak)
  • Cost trend: stable (~$1.5-2.2 USD/session)
  • Review role: 0 delegations in 19 sessions (standalone review, not per-PR reviewers)
  • Queue trend: not shrinking -- stabilized at 69 after oversee but ~2 new tasks per session means slow growth

Test Plan

  • make check passes (1156 tests, no regressions)
  • No nightshift/ files modified (framework zone only)
  • All contradictions resolved
  • Dead paths removed
  • Stale counts updated

Quality audit findings and fixes (18 sessions since last audit):

1. OPERATIONS.md: test count 915 -> 1156 (stale by 241 tests)
2. OPERATIONS.md: add v0.0.7 and v0.0.8 version milestones (both missing)
3. OPERATIONS.md + CLAUDE.md: sync dependency flows -- both were divergent;
   add owl/eval_runner; correct settings/eval_targets ordering
4. DAEMON.md: arg 2 was "pause seconds" but daemon uses it as duration_hours
5. DAEMON.md: remove hardcoded absolute paths (/Users/no9labs/...) with wrong
   .recursive placement; replace with relative paths
6. DAEMON.md: remove stale pentest log references (v1 artifact, files don't exist)
7. ROLE-SCORING.md: add pentest_framework_tasks and sessions_since_eval signals
   (both active in pick-role.py since sessions #109 and #124 respectively)
8. sessions/index.md: fix corrupted role field in session 20260409-020609
   (shell injection artifact from regex extractor: .*'"$LOG_FILE"2>/d -> brain)

Tasks created:
- #249: Regenerate MODULE_MAP.md (stale since session #1, shows only 3 modules)
- #250: Fix DAEMON.md cycle lifecycle git commands (shows checkout+clean-fd, actual
         daemon only does fetch+reset)

make check: 1156 tests pass
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e7bdf231d2

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread CLAUDE.md Outdated
- **No hardcoded data in logic files.** Regex patterns, score maps, category weights, thresholds -- these go in `core/constants.py` or a dedicated `*_patterns.py`. Logic files import them.
- **New module checklist:** create the `.py` file in the appropriate subpackage (`core/`, `settings/`, `owl/`, `raven/`, `infra/`), add to the subpackage's `__init__.py` re-exports, add to `nightshift/scripts/install.sh` PACKAGE_FILES, add to this file's structure tree.
- **Follow the dependency flow:** `core.errors -> core.types -> core.constants -> core.shell -> core.state -> settings.config -> settings.eval_targets -> owl.cycle -> owl.scoring -> owl.readiness -> raven.planner -> raven.decomposer -> raven.subagent -> raven.integrator -> raven.feature -> infra.worktree -> infra.module_map -> infra.multi -> infra.release -> cli`. New modules slot into this chain. No circular imports. (`infra/multi.py` uses a late import of `run_nightshift` from `cli.py` to avoid circular deps.)
- **Follow the dependency flow:** `core.errors -> core.types -> core.constants -> core.shell -> raven.summary -> raven.coordination -> infra.module_map -> owl.readiness -> owl.scoring -> owl.eval_runner -> core.state -> settings.config -> settings.eval_targets -> infra.multi -> raven.e2e -> raven.profiler -> infra.worktree -> owl.cycle -> raven.planner -> raven.subagent -> raven.decomposer -> raven.integrator -> raven.feature -> infra.release -> cli`. New modules slot into this chain. No circular imports. (`infra/multi.py` uses a late import of `run_nightshift` from `cli.py` to avoid circular deps.)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Correct settings dependency direction in architecture flow

The new dependency flow now says settings.config -> settings.eval_targets, but nightshift/settings/config.py directly imports infer_target_verify_command from nightshift.settings.eval_targets (line 16), so settings.eval_targets must appear earlier in the chain for the documented “imports only from the left” rule to remain true. Keeping this reversed in the contributor contract can drive future module placement/import decisions that violate the intended layering and reintroduce circular-import risk.

Useful? React with 👍 / 👎.

Move owl.eval_runner after settings.eval_targets in the dependency chain
in both CLAUDE.md and OPERATIONS.md, since eval_runner imports settings.config.
Mark sessions_since_eval in ROLE-SCORING.md as dashboard-only (not used in
pick-role.py scoring) to prevent misleading documentation.
@fazxes fazxes merged commit 6a0c94e into main Apr 9, 2026
7 checks passed
@fazxes fazxes deleted the audit/session-126-framework branch April 9, 2026 07:06
fazxes added a commit that referenced this pull request Apr 9, 2026
…hift test

When `nightshift test --repo-dir /tmp/nonexistent` is run and the
directory does not exist, the tool now auto-clones from PHRACTAL_URL
instead of crashing with a FileNotFoundError. If the clone fails, a
NightshiftError is raised with the exact `git clone` command so the user
can run it manually.

- Add PHRACTAL_URL constant to settings/eval_targets.py
- Add clone_repo() to infra/worktree.py (S603/S607 already suppressed there)
- Add _ensure_repo_dir() to cli.py, called at the top of run_nightshift()
- Export clone_repo and PHRACTAL_URL from nightshift/__init__.py
- Add TestCloneRepo and TestRunNightshiftMissingRepoDir test classes
fazxes added a commit that referenced this pull request Apr 9, 2026
…n mode

The previous code called _ensure_repo_dir(repo_dir) unconditionally,
meaning a user running `nightshift run --repo-dir /nonexistent` would
silently clone Phractal into that path.  Guard the call behind
`if test_mode and not repo_dir.exists()` so auto-clone only fires for
`nightshift test`.  Add a regression test that asserts run mode does
NOT create the missing directory.
fazxes added a commit that referenced this pull request Apr 9, 2026
feat(#248): auto-clone eval target when repo-dir is missing
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant