docs: rewrite ROLE-SCORING.md for v2 brain-delegates-to-sub-agents architecture #197

Merged
fazxes merged 2 commits into main from docs/role-scoring-v2-rewrite
Apr 8, 2026

fazxes (Member) commented on Apr 8, 2026

Summary

  • Rewrites .recursive/ops/ROLE-SCORING.md to accurately describe the v2 architecture where pick-role.py computes an advisory recommendation automatically -- not a manual scoring process for the brain agent
  • Removes the v1 PHASE 1/2/3 structure (ASSESS / DECIDE / EXECUTE) that was misleading in v2
  • Documents all 8 roles (build, review, oversee, strategize, achieve, security-check, evolve, audit) with scoring rules sourced directly from pick-role.py's compute_scores() function
  • Updates examples to show the v2 brain 4-checkpoint analysis flow (Signal Analysis, Forced Tradeoff, Pre-Commitment, Commitment Check)
  • Removes all references to EXECUTING ROLE: and direct SKILL.md reading instructions
  • Marks task #203 as done

Changes

  • .recursive/ops/ROLE-SCORING.md -- complete rewrite
  • .recursive/tasks/0203.md -- status: pending -> done

Test plan

  • make check passes (all 882 tests, ruff, mypy, ASCII-clean, install.sh refs)
  • No nightshift/ files modified (framework-zone only)
  • New document accurately reflects pick-role.py scoring algorithm
  • All 8 roles documented with threshold values matching the source code
  • No references to EXECUTING ROLE: or manual SKILL.md reading

…ents architecture

Remove the v1 PHASE 1/2/3 manual scoring process that was misleading in v2.
The document now accurately describes that pick-role.py computes an advisory
recommendation automatically, and the brain reads it as one input among several.

All 8 roles (build, review, oversee, strategize, achieve, security-check, evolve,
audit) are documented with their scoring rules sourced directly from pick-role.py.
Examples updated to show the v2 brain checkpoint analysis flow. Removed all
references to EXECUTING ROLE: and manual SKILL.md reading instructions.

Closes task #203.

chatgpt-codex-connector (Bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8a22476098

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread .recursive/ops/ROLE-SCORING.md Outdated
```
BUILD: 80 (50 +30 eval healthy)
REVIEW: 60 (10 +40 consecutive >= 5 +10 review >= 5)
OVERSEE: 100 (5 +60 pending >= 80... wait: 62 < 80, so +45 pending >= 50 with so >= 3 = 50)
```

[P2] Fix Example 2 advisory math to match pick-role.py

This example claims OVERSEE: 100 and an oversee recommendation for pending_tasks: 62, but compute_scores() in .recursive/engine/pick-role.py gives oversee=50 for that range and build=80, so pick_role() would choose build. Because this file is presented as the scoring reference, this mismatch will mislead anyone validating daemon role selection against the documented examples.
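
The arithmetic behind this comment can be checked with a small sketch. It assumes, based only on the breakdown lines quoted in this thread, that BUILD is a base of 50 plus 30 when eval is healthy, and that OVERSEE is a base of 5 with tiered bonuses (+60 at pending >= 80, otherwise +45 at pending >= 50 with at least 3 sessions since the last oversee). The function names, parameter names, the value `sessions_since_oversee=3`, and the highest-score-wins selection are illustrative stand-ins, not the real `compute_scores()` or `pick_role()`; any other bonuses those functions apply are omitted.

```python
# Hypothetical re-creation of two scoring rules quoted in the review thread.
# This is NOT the code from pick-role.py; names and structure are assumptions.

def build_score(eval_healthy: bool) -> int:
    score = 50  # base quoted in the breakdown line "BUILD: 80 (50 +30 eval healthy)"
    if eval_healthy:
        score += 30
    return score


def oversee_score(pending_tasks: int, sessions_since_oversee: int) -> int:
    score = 5  # base quoted in the OVERSEE breakdown line
    if pending_tasks >= 80:
        score += 60
    elif pending_tasks >= 50 and sessions_since_oversee >= 3:
        # Assumed to be a lower tier that does not stack with the >= 80 bonus.
        score += 45
    return score


scores = {
    "build": build_score(eval_healthy=True),
    "oversee": oversee_score(pending_tasks=62, sessions_since_oversee=3),
}
# Assumption: the advisory role is simply the highest-scoring one.
recommended = max(scores, key=scores.get)
print(scores, recommended)  # {'build': 80, 'oversee': 50} build
```

Under these assumptions, 62 pending tasks cannot produce an OVERSEE score of 100, which is the mismatch flagged above.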


Comment thread .recursive/ops/ROLE-SCORING.md Outdated
STRATEGIZE: 5 (5 base, strategy < 10)
ACHIEVE: 5 (5 base)
SECURITY: 5 (5 base)
EVOLVE: 80 (5 +50 friction >=5... wait: 4 < 5 so +0; +30 friction >= 3 AND se >= 5 = 35)

[P2] Correct Example 4 impossible evolve tie scenario

The line states EVOLVE: 80 and a tie with BUILD, but with the shown signals (friction_entries=4, sessions_since_evolve=8) the EVOLVE formula in compute_scores() only reaches 35, so this advisory path cannot occur as written. Keeping an impossible selector outcome in the v2 reference can cause false incident/debugging conclusions when operators compare real advisory output to this doc.
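
A minimal sketch of the EVOLVE arithmetic, assuming only the two bonuses named in the quoted breakdown line (+50 when friction entries reach 5, and +30 when friction entries reach 3 and at least 5 sessions have passed since the last evolve) on top of a base of 5. The function and parameter names are hypothetical and this is not the code in `compute_scores()`; any further EVOLVE rules are left out.

```python
# Hypothetical re-creation of the EVOLVE rule quoted in the review thread.
# Not the real pick-role.py code; names are assumptions.

def evolve_score(friction_entries: int, sessions_since_evolve: int) -> int:
    score = 5  # base from the quoted line
    if friction_entries >= 5:
        score += 50
    if friction_entries >= 3 and sessions_since_evolve >= 5:
        # Assumed to stack with the +50 bonus rather than replace it.
        score += 30
    return score


print(evolve_score(friction_entries=4, sessions_since_evolve=8))  # 35
print(evolve_score(friction_entries=5, sessions_since_evolve=8))  # 85
```

With these two rules the only reachable totals are 5, 35, 55, and 85, so a score of 80 cannot come out of the signals shown in the example.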


- Add "(elif, mutually exclusive)" to ACHIEVE autonomy_score < 90 bonus
  to clarify it cannot stack with the < 70 bonus (code uses elif)
- Fix Example 2: update signals (pending=85, stale=6, so=4, tracker_moved=true)
  and JSON score (100->90) to match breakdown math; OVERSEE=90 correctly beats BUILD=80
- Fix Example 3: ACHIEVE score 20->25 (auto=72 triggers elif auto<90: 5+20=25);
  add sessions_since_achieve to signals to prevent hard-cap ineligibility
- Fix Example 4: friction_entries 4->5 so EVOLVE=85 (5+50+30) beats BUILD=80;
  add tracker_moved=true to prevent STRATEGIZE from incorrectly winning at 95;
  fix JSON score (80->85); fix ACHIEVE to -1 (ineligible per hard cap)
- Fix Example 1: OVERSEE 10->5 (so defaults 0, hard cap applies);
  add sessions_since_achieve to signals to prevent hard-cap ineligibility
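
The "elif, mutually exclusive" note in the first bullet can be illustrated with a short sketch. It assumes a base of 5 and a +20 bonus for `autonomy_score < 90`, as stated in the Example 3 bullet. The size of the `< 70` bonus is not given in this thread, so it is passed in as a parameter, and the value 40 used below is a placeholder, not the real number. The function name is hypothetical and other ACHIEVE rules, including the hard cap, are omitted.

```python
# Hypothetical sketch of the mutually exclusive ACHIEVE autonomy bonuses.
# Not the real pick-role.py code; bonus_below_70 is a placeholder input.

def achieve_score(autonomy_score: int, bonus_below_70: int) -> int:
    score = 5  # base
    if autonomy_score < 70:
        score += bonus_below_70
    elif autonomy_score < 90:
        # Only reached when autonomy_score is in [70, 90), so it never
        # stacks with the < 70 bonus.
        score += 20
    return score


print(achieve_score(72, bonus_below_70=40))  # 25, matching the fixed Example 3
print(achieve_score(60, bonus_below_70=40))  # 45, not 65: the +20 is skipped
```
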
fazxes merged commit e85f0a0 into main on Apr 8, 2026
6 checks passed
fazxes added a commit that referenced this pull request Apr 8, 2026
Eval gate cleared: #177 done (PR #198), score 86/100.
ROLE-SCORING.md rewritten for v2: #203 done (PR #197).
Follow-up task #207 created (eval Breadth improvement).
fazxes deleted the docs/role-scoring-v2-rewrite branch on April 8, 2026, 23:29