feat: Python role selector — works for both Claude and Codex by fazxes · Pull Request #114 · Recusive/Nightshift

fazxes · 2026-04-06T05:26:31Z

Summary

Replaces the LLM-based unified prompt with a deterministic Python scoring engine.

Why

The unified prompt (unified.md) relied on the agent following structured-output instructions to pick its role. An audit found that Codex ignores these instructions 100% of the time (900 sed reads vs 0 cat reads across 23 sessions), so the role decision must happen in Python, not in the LLM.

What changed

scripts/pick-role.py — 372-line scoring engine, reads signal files, computes scores, prints winner
tests/test_pick_role.py — 38 unit tests covering 10 stress test scenarios
daemon.sh — calls pick-role.py at cycle start, loads winning role prompt
unified.md → docs/ops/ROLE-SCORING.md (reference doc, no longer active prompt)
review.md Step 4 fixed to use make check
scripts/ added to prompt guard (trojan detection)
Pentest report wrapped in XML data tags (injection resistance)
Session index header fixed to 9 columns
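
The PR does not describe how the prompt guard detects tampering, so the following is only a sketch of one common approach: snapshot content hashes for every file under the guarded directories, then compare snapshots between cycles. The function names `snapshot` and `changed_files` are hypothetical and are not taken from daemon.sh.

```python
import hashlib
from pathlib import Path


def snapshot(guard_dirs):
    """Map each file under the guarded directories to its SHA-256 digest."""
    digests = {}
    for directory in guard_dirs:
        for path in sorted(Path(directory).rglob("*")):
            if path.is_file():
                digests[str(path)] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def changed_files(before, after):
    """Return paths that were added, removed, or modified between snapshots."""
    all_paths = before.keys() | after.keys()
    return sorted(p for p in all_paths if before.get(p) != after.get(p))
```

With scripts/ in the guarded set, an edit to pick-role.py made during an agent session would show up in `changed_files` on the next comparison.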

How it works

daemon.sh → pick-role.py reads signals → prints "oversee" → daemon loads overseer.md → agent executes

Both Claude and Codex get the same treatment. No structured output required.
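
As a rough illustration of the flow above, here is a minimal sketch of a deterministic role picker. It is not the code in scripts/pick-role.py: the three role names listed, the signal keys (`open_tasks`, `sessions_since_review`), and every weight and threshold are assumptions made for the example. The real engine scores five roles using rules not shown in this description.

```python
import os
import sys

# Hypothetical subset of roles; the real engine scores five.
ROLES = ("build", "review", "oversee")


def score_roles(signals):
    """Return a score per role from a dict of numeric signals (illustrative rules)."""
    scores = {role: 0 for role in ROLES}
    scores["build"] += 10  # assumed baseline so "build" wins when nothing else fires
    if signals.get("open_tasks", 0) > 50:
        scores["oversee"] += 30  # assumed: a large task queue favors triage
    if signals.get("sessions_since_review", 0) >= 5:
        scores["review"] += 20  # assumed: review becomes due after several sessions
    return scores


def pick_role(signals, env=None):
    """Pick the winning role deterministically; FORCE_ROLE overrides scoring."""
    env = os.environ if env is None else env
    forced = env.get("FORCE_ROLE", "")
    if forced in ROLES:
        return forced, {}
    scores = score_roles(signals)
    # Order by (-score, name) so ties always break the same way.
    winner = min(ROLES, key=lambda role: (-scores[role], role))
    return winner, scores


if __name__ == "__main__":
    role, scores = pick_role({"open_tasks": 72, "sessions_since_review": 2})
    print(f"scores: {scores}", file=sys.stderr)  # reasoning goes to stderr
    print(role)  # only the role name goes to stdout for the caller to capture
```

Keeping stdout to a single role name means the daemon can capture the result with plain command substitution, and the same inputs always yield the same role regardless of which agent runs afterwards.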

Test plan

38 pick-role unit tests pass
make check passes (981 tests)
Shell syntax valid
pick-role.py runs live against the real repo
Start daemon, verify first cycle picks correct role

The unified prompt relied on the agent following structured output instructions to pick its role. Codex ignores these 100% of the time (900 sed reads vs 0 cat reads across 23 sessions). The role decision now happens in Python before the agent starts.

scripts/pick-role.py:
- Reads signal files (eval scores, task counts, session history, etc.)
- Computes deterministic scores for 5 roles
- Prints winning role to stdout, reasoning to stderr
- Works identically for Claude and Codex
- 38 unit tests covering 10 stress test scenarios

daemon.sh:
- pick_session_role() calls pick-role.py, loads the winning role prompt
- build_prompt() cats evolve-auto.md + role prompt (not unified.md)
- Removed old Python role extraction from logs (always defaulted to "build")
- Removed FORCE_ROLE prompt injection (pick-role.py handles env var)
- Pentest report wrapped in <pentest_data> XML tags (injection resistance)

Also:
- unified.md moved to docs/ops/ROLE-SCORING.md (reference doc, not active prompt)
- review.md Step 4 fixed: make check replaces individual tool runs
- scripts/ added to PROMPT_GUARD_FILES and PROMPT_GUARD_DIRS (trojan detection)
- Session index header fixed to 9 columns
- Existing pentest test updated for new XML framing
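
The `<pentest_data>` framing mentioned above can be sketched as follows. The tag name comes from the commit message, but the helper `wrap_pentest_report` and the step that neutralizes embedded closing tags are assumptions for illustration. The PR does not say whether daemon.sh escapes the report body.

```python
def wrap_pentest_report(report: str) -> str:
    """Frame untrusted report text as data inside <pentest_data> tags."""
    # Assumption: neutralize any embedded closing tag so the report cannot
    # end the data block early and place instructions after it.
    safe = report.replace("</pentest_data>", "&lt;/pentest_data&gt;")
    return f"<pentest_data>\n{safe}\n</pentest_data>"
```

The point of the framing is that the prompt can tell the agent to treat everything inside the tags as data to analyze, never as instructions to follow.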

Closed 20 tasks with evidence:
- DONE (2): #73 (AGENTS.md created), #181 (docs/prompt/ deleted)
- WONTFIX-OBSOLETE (5): #78, #89, #128, #141, #157 (reference docs/prompt/ or docs/ops/ paths deleted in session #103)
- WONTFIX-DUPLICATE (1): #88 (subset of #69)
- WONTFIX-NEVER-PICKED (12): #66, #69, #90, #96, #112, #114, #120, #123, #132, #133, #138, #145 (low priority, 20-80+ sessions without being picked, speculative)

Priority fix: #103 downgraded from urgent to normal (umbrella epic, not an actionable urgent fix).

fazxes merged commit 9b0b9b8 into main Apr 6, 2026

fazxes deleted the feat/pick-role-engine branch April 6, 2026 05:26

fazxes mentioned this pull request Apr 9, 2026

oversee: triage task queue — close 16 duplicates and superseded tasks #252

Merged

fazxes merged 1 commit into main from feat/pick-role-engine
