feat: Python role selector — works for both Claude and Codex #114

Merged: fazxes merged 1 commit into main from feat/pick-role-engine on Apr 6, 2026

fazxes (Member) commented on Apr 6, 2026

Summary

Replaces the LLM-based unified prompt with a deterministic Python scoring engine.

Why

The unified prompt (unified.md) relied on the agent following structured-output instructions to pick its role. An audit found that Codex ignores these instructions 100% of the time (900 sed reads vs. 0 cat reads across 23 sessions), so the role decision must happen in Python, not in the LLM.

What changed

  • scripts/pick-role.py — 372-line scoring engine, reads signal files, computes scores, prints winner
  • tests/test_pick_role.py — 38 unit tests covering 10 stress test scenarios
  • daemon.sh — calls pick-role.py at cycle start, loads winning role prompt
  • unified.md moved to docs/ops/ROLE-SCORING.md (reference doc, no longer the active prompt)
  • review.md Step 4 fixed to use make check
  • scripts/ added to prompt guard (trojan detection)
  • Pentest report wrapped in XML data tags (injection resistance)
  • Session index header fixed to 9 columns
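As a rough illustration of what a deterministic scoring engine of this kind might look like, here is a minimal sketch. It is not the contents of scripts/pick-role.py: only the role names "oversee" and "build" and the FORCE_ROLE variable appear in this PR, so the other three role names, every signal name, every weight, and the alphabetical tie-break are assumptions made for the example. Treating FORCE_ROLE as a hard override is also an assumption.

```python
import os
import sys

# Hypothetical role list: only "oversee" and "build" are named in this PR.
ROLES = ("build", "evolve", "oversee", "research", "review")


def score_roles(signals):
    """Return one score per role, computed only from the given signals."""
    scores = {role: 0.0 for role in ROLES}
    # Signal names and weights are illustrative assumptions.
    scores["build"] += 2.0 * signals.get("open_tasks", 0)
    scores["oversee"] += 3.0 * signals.get("sessions_since_oversee", 0)
    scores["review"] += 5.0 * signals.get("failing_checks", 0)
    scores["research"] += 1.0 * signals.get("stale_docs", 0)
    scores["evolve"] += 4.0 * max(0.0, 0.5 - signals.get("eval_score", 1.0))
    return scores


def pick_role(signals, env=None):
    """Return (winning role, reasoning string) for the given signals."""
    env = os.environ if env is None else env
    forced = env.get("FORCE_ROLE", "")
    if forced in ROLES:
        # Assumed semantics: a valid FORCE_ROLE bypasses scoring entirely.
        return forced, "FORCE_ROLE=" + forced
    scores = score_roles(signals)
    # Highest score wins; ties break alphabetically so the result is stable.
    winner = min(scores, key=lambda role: (-scores[role], role))
    reasoning = ", ".join(f"{r}={scores[r]:.1f}" for r in sorted(scores))
    return winner, reasoning


if __name__ == "__main__":
    role, why = pick_role({"open_tasks": 3, "sessions_since_oversee": 4}, env={})
    print(why, file=sys.stderr)  # reasoning goes to stderr
    print(role)                  # only the winning role goes to stdout
```

Keeping the winner alone on stdout lets a shell caller capture it with plain command substitution, while the reasoning stays visible in logs via stderr.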

How it works

daemon.sh → pick-role.py reads signals → prints "oversee" → daemon loads overseer.md → agent executes

Both Claude and Codex get the same treatment. No structured output required.
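The hand-off from selector output to role prompt can be sketched in Python as follows. This is only an illustration of the flow above, not the daemon.sh implementation: the "oversee" to overseer.md mapping and the evolve-auto.md base prompt come from this PR, while the other mapping entries, the function signatures, and the choice to raise on an unknown role are assumptions.

```python
from pathlib import Path

# "oversee" -> "overseer.md" is taken from the PR text; the remaining
# entries are hypothetical file names used for illustration.
ROLE_PROMPTS = {
    "oversee": "overseer.md",
    "build": "build.md",
    "review": "review.md",
}


def resolve_role(selector_stdout):
    """Read the role from the selector's stdout (last non-empty line)."""
    lines = [ln.strip() for ln in selector_stdout.splitlines() if ln.strip()]
    role = lines[-1] if lines else ""
    if role not in ROLE_PROMPTS:
        # Assumed behavior: fail loudly instead of silently defaulting.
        raise ValueError(f"unknown role from selector: {role!r}")
    return role


def build_prompt(prompt_dir, role):
    """Concatenate the base prompt and the winning role prompt."""
    base = (Path(prompt_dir) / "evolve-auto.md").read_text(encoding="utf-8")
    role_text = (Path(prompt_dir) / ROLE_PROMPTS[role]).read_text(encoding="utf-8")
    return base.rstrip("\n") + "\n\n" + role_text
```

Because the role is fixed before the agent starts, the agent only ever sees one role prompt and never has to emit a parseable decision of its own.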

Test plan

  • 38 pick-role unit tests pass
  • make check passes (981 tests)
  • Shell syntax valid
  • pick-role.py runs live against the real repo
  • Start daemon, verify first cycle picks correct role

The unified prompt relied on the agent following structured output
instructions to pick its role. Codex ignores these 100% of the time
(900 sed reads vs 0 cat reads across 23 sessions). The role decision
now happens in Python before the agent starts.

scripts/pick-role.py:
- Reads signal files (eval scores, task counts, session history, etc.)
- Computes deterministic scores for 5 roles
- Prints winning role to stdout, reasoning to stderr
- Works identically for Claude and Codex
- 38 unit tests covering 10 stress test scenarios

daemon.sh:
- pick_session_role() calls pick-role.py, loads the winning role prompt
- build_prompt() cats evolve-auto.md + role prompt (not unified.md)
- Removed old Python role extraction from logs (always defaulted to "build")
- Removed FORCE_ROLE prompt injection (pick-role.py handles env var)
- Pentest report wrapped in <pentest_data> XML tags (injection resistance)
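
The XML framing in the last item can be illustrated with a short Python sketch. The tag name comes from this commit message; the function name and the step that escapes embedded tags are assumptions, since the commit does not say how the daemon treats a report that already contains the tag.

```python
def wrap_pentest_report(report):
    """Wrap untrusted report text in <pentest_data> tags."""
    # Assumed hardening step: neutralize embedded tags so the report
    # cannot close the wrapper early and smuggle text outside it.
    safe = report.replace("<pentest_data>", "&lt;pentest_data&gt;")
    safe = safe.replace("</pentest_data>", "&lt;/pentest_data&gt;")
    return "<pentest_data>\n" + safe.strip("\n") + "\n</pentest_data>"
```

The match here is case-sensitive and exact, so a production version would likely need a stricter escape; the sketch only shows the framing idea.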

Also:
- unified.md moved to docs/ops/ROLE-SCORING.md (reference doc, not active prompt)
- review.md Step 4 fixed: make check replaces individual tool runs
- scripts/ added to PROMPT_GUARD_FILES and PROMPT_GUARD_DIRS (trojan detection)
- Session index header fixed to 9 columns
- Existing pentest test updated for new XML framing

fazxes merged commit 9b0b9b8 into main on Apr 6, 2026
fazxes deleted the feat/pick-role-engine branch on April 6, 2026 05:26
fazxes added a commit that referenced this pull request on Apr 7, 2026

Closed 20 tasks with evidence:
- DONE (2): #73 (AGENTS.md created), #181 (docs/prompt/ deleted)
- WONTFIX-OBSOLETE (5): #78, #89, #128, #141, #157
  (reference docs/prompt/ or docs/ops/ paths deleted in session #103)
- WONTFIX-DUPLICATE (1): #88 (subset of #69)
- WONTFIX-NEVER-PICKED (12): #66, #69, #90, #96, #112,
  #114, #120, #123, #132, #133, #138, #145
  (low priority, 20-80+ sessions without being picked, speculative)

Priority fix: #103 downgraded from urgent to normal (umbrella epic,
not an actionable urgent fix).