feat: unified daemon - agent picks its own role each cycle #107
One daemon loop replaces four. Each cycle the agent reads system signals (eval scores, task queue size, session history, healer status) and scores four roles: BUILD, REVIEW, OVERSEE, STRATEGIZE. The highest score wins.

- Created docs/prompt/unified.md with assessment, scoring, and execution phases
- daemon.sh loads unified.md instead of evolve.md directly
- Agent reads the role-specific prompt (evolve/review/overseer/strategist) during session
- Session index now includes a Role column for traceability
- Role extraction works for both Claude and Codex log formats
- unified.md added to prompt guard watched files
- Updated CLAUDE.md, evolve-auto.md, overseer.md, strategist.md
- No changes to the role prompts themselves; they're battle-tested
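The "score four roles, highest wins" step can be sketched as below. This is a minimal illustration, not the rules in docs/prompt/unified.md: the `Signals` field names, the thresholds, and the point values are all assumptions made up for the example. Only the four role names and the four signal families come from the description above.

```python
from dataclasses import dataclass

# Role names come from the PR description; list order doubles as the tie-break.
ROLES = ("BUILD", "REVIEW", "OVERSEE", "STRATEGIZE")


@dataclass
class Signals:
    """The four signal families named in the PR; field names are hypothetical."""
    eval_score: float = 100.0         # latest eval score (0-100 scale assumed)
    queue_size: int = 0               # pending tasks in the queue
    sessions_since_strategy: int = 0  # derived from session history (assumed)
    healer_ok: bool = True            # healer status


def score_roles(s: Signals) -> dict:
    """Illustrative rules only; thresholds and weights are invented."""
    scores = {role: 0 for role in ROLES}
    scores["BUILD"] = 50  # default role when nothing else is pressing
    if s.eval_score < 70:
        scores["REVIEW"] += 60
    if s.queue_size > 50:
        scores["OVERSEE"] += 70
    if not s.healer_ok:
        scores["OVERSEE"] += 30
    if s.sessions_since_strategy >= 20:
        scores["STRATEGIZE"] += 80
    return scores


def pick_role(s: Signals) -> str:
    scores = score_roles(s)
    # max() returns the first maximal item, so ties resolve in ROLES order.
    return max(ROLES, key=lambda role: scores[role])
```

Iterating over the fixed `ROLES` tuple rather than the dict keeps tie-breaking explicit, which matters if the scoring is meant to be deterministic.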
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3c59bb3056
      fi

    - echo "| $(date '+%Y-%m-%d %H:%M') | $SESSION_ID | $EXIT_CODE | ${DURATION_MIN}m | \$$COST_USD | ${STATUS}${PROMPT_TAMPERED} | $FEATURE | $PR_URL |" >> "$INDEX_FILE"
    + echo "| $(date '+%Y-%m-%d %H:%M') | $SESSION_ID | $SESSION_ROLE | $EXIT_CODE | ${DURATION_MIN}m | \$$COST_USD | ${STATUS}${PROMPT_TAMPERED} | $FEATURE | $PR_URL |" >> "$INDEX_FILE"
Keep session-index schema compatible with analytics parser
This row writer now inserts a Role column, but nightshift/costs.py::_parse_session_index still reads duration from cells[3] and feature from cells[6] (legacy layout). With the new layout, those indexes map to exit and status, so new sessions are parsed with duration 0 and feature values like success, which corrupts cost_analysis('docs/sessions') outputs and any cost/health decisions based on them.
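One way to make the parser robust to a new column is to look cells up by header name instead of by position. The sketch below is not the code in nightshift/costs.py. The function names and the header labels used in the usage note (`Duration`, `Feature`, and so on) are assumptions; only the existence of a Role column is taken from the PR.

```python
from typing import Dict, List


def parse_session_index(lines: List[str]) -> List[Dict[str, str]]:
    """Parse a markdown table into dicts keyed by lower-cased header name."""
    header: List[str] = []
    rows: List[Dict[str, str]] = []
    for raw in lines:
        line = raw.strip()
        if len(line) < 2 or not line.startswith("|"):
            continue
        # Drop the outer pipes, then split the remaining text into cells.
        inner = line[1:-1] if line.endswith("|") else line[1:]
        cells = [cell.strip() for cell in inner.split("|")]
        if not header:
            header = [cell.lower() for cell in cells]
            continue
        if all(set(cell) <= set("-:") for cell in cells):
            continue  # the |---|---| separator row
        if len(cells) != len(header):
            continue  # malformed row: skip it rather than misread it
        rows.append(dict(zip(header, cells)))
    return rows


def duration_minutes(row: Dict[str, str]) -> int:
    """Convert a cell such as '12m' to 12; anything unparsable becomes 0."""
    text = row.get("duration", "").rstrip("m")
    return int(text) if text.isdigit() else 0
```

With this shape, a legacy table and a table with the extra Role column both yield the same `duration` and `feature` values, as long as the header row names the columns.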
      build_prompt() {
          cat "$AUTO_PREFIX"
    -     cat "$EVOLVE_PROMPT"
    +     cat "$UNIFIED_PROMPT"
Remove BUILD-only preface from unified role prompt
The unified daemon still prepends docs/prompt/evolve-auto.md before docs/prompt/unified.md; that preface contains mandatory BUILD-specific instructions (e.g., task-selection/build-step directives and “Follow the evolve prompt exactly”). When scoring picks REVIEW/OVERSEE/STRATEGIZE, the agent receives conflicting hard instructions, which can cause it to run the BUILD workflow instead of the selected role and undermine the unified role scheduler.
…le tasks

Pentest findings (both confirmed by source inspection):
- #142 (urgent): $after_task shell injection in run_evaluation() -- agent-controlled text interpolated into Python -c string; fix via env var
- #143 (urgent): PR title injected raw into builder and pentest prompts -- sanitize or wrap in XML data label before injection

Closed:
- #77 (wontfix): code-reviewer.md no longer exists; system unified into review.md in PR #107; the described issue is absent from the codebase
- #129 (wontfix/superseded): absorbed into #106 (lower-numbered, picked first); #106 updated to include the 50-task hard cap explicitly

Queue: 53 pending (same count; +2 urgent, -2 wontfix)
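The two mitigations named in the pentest findings can be sketched as follows. The code in question is shell; this sketch shows the same ideas in Python. The function names, the `AFTER_TASK` variable name, and the `pr_title` label are hypothetical, and the sketch is not the fix that landed.

```python
import os
import subprocess
import sys
from xml.sax.saxutils import escape


def run_with_untrusted_text(after_task: str) -> str:
    """Pass untrusted text via the environment, not the -c source string."""
    # The code string is a constant: nothing agent-controlled is spliced in.
    code = "import os; print(len(os.environ['AFTER_TASK']))"
    env = dict(os.environ, AFTER_TASK=after_task)
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def wrap_as_data(label: str, text: str) -> str:
    """Escape &, <, > and wrap the text in a data tag before prompt injection."""
    return f"<{label}>{escape(text)}</{label}>"
```

Because the child process reads the value from `os.environ`, quotes or semicolons in the text are never parsed as code. Escaping angle brackets keeps a crafted title from closing the data tag early, although the wrapper on its own does not guarantee that a model treats the content as data.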
…y report

Tasks created from analysis of sessions #107-#122 and 7 human-filed issues:
- #241 (urgent): fix worktree cleanup -- .claude/worktrees/agent-* leaking
- #242 (urgent): add sessions_since_eval signal + brain eval cadence rule
- #243 (normal): run nightshift against Phractal immediately (eval #17)

Root causes identified: eval loop broken (14 sessions stale), worktree leak confirmed live, Phractal E2E never runs in daemon cadence.
…one)

Queue before: 72 pending + 9 wontfix-in-active-dir
Queue after: 65 pending + 0 wontfix (all converted to done for archiving)

Merged into primary tasks (5 closures):
- #175 -> #174: both add tests to TestAuthFailureDetection, same PR
- #163 -> #162: both are scoring module tests from PR #158 review, same PR
- #124 -> #122: both validate doc snapshot consistency, same PR scope
- #196 -> #173: both add entries to PROMPT_GUARD_FILES in lib-agent.sh
- #180 -> #179: both touch _is_valid_eval_file() in pick-role.py, same PR

Closed as obsolete (1):
- #78: references non-existent "evolve.md Step 8" and the multi-agent review panel replaced by unified review in PR #107

Closed as low-value (1):
- #230: _DELEGATION_ROLE_MAP covers all 8 current agent types; new agent types require major framework work making the map update obvious

Converted wontfix -> done for archiving (9):
- #77, #80, #107, #111, #115, #119, #127, #129, #134

All had wontfix status with rationale already documented; changed to done so daemon's archive_done_tasks() housekeeping removes them
Summary
- docs/prompt/unified.md: assessment protocol, deterministic scoring rules, decision output format, 3 examples

Why
Four separate daemons with a shared lockfile required human intervention to switch modes. The agent should decide what the system needs, not the human. This is the AGI move — one loop, self-directed.
Scoring logic
Test plan
- make check passes (943 tests)