
feat: ACHIEVE role — autonomy engineer for zero human intervention#111

Merged: fazxes merged 1 commit into main from feat/achieve-role on Apr 6, 2026

fazxes (Member) commented Apr 6, 2026

Summary

5th role for the unified daemon: ACHIEVE. It measures an autonomy score (0-100), identifies human dependencies, and eliminates the highest-impact one each session.

Autonomy Score Framework (20 checks, 5 points each)

  • Self-Healing (25): crash recovery, prompt guard, CI auto-fix, eval gate, self-restart
  • Self-Directing (25): role selection, task generation, auto-release, stale culling, strategy triggers
  • Self-Validating (25): E2E eval, score trend, smoke test, code review, test regression
  • Self-Improving (25): learnings, healer trends, prompt refinement, cost stability, success rate
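The score arithmetic above (4 categories x 5 checks x 5 points = 100) can be sketched as follows. This is a minimal illustration, not the prompt's actual scoring code: the check names are transcribed from the list above, while representing each check as a plain pass/fail bool and counting a missing check as failed are assumptions.

```python
# Sketch of the autonomy score: 20 checks, 5 points each, 25 points per category.
# Check names follow the PR description; bool results and "missing = failed" are assumptions.
CATEGORIES: dict[str, list[str]] = {
    "self_healing": ["crash_recovery", "prompt_guard", "ci_auto_fix", "eval_gate", "self_restart"],
    "self_directing": ["role_selection", "task_generation", "auto_release", "stale_culling", "strategy_triggers"],
    "self_validating": ["e2e_eval", "score_trend", "smoke_test", "code_review", "test_regression"],
    "self_improving": ["learnings", "healer_trends", "prompt_refinement", "cost_stability", "success_rate"],
}
POINTS_PER_CHECK = 5


def autonomy_score(results: dict[str, bool]) -> tuple[int, dict[str, int]]:
    """Return (total 0-100, per-category subtotals 0-25); unknown checks count as failed."""
    per_category = {
        category: POINTS_PER_CHECK * sum(results.get(check, False) for check in checks)
        for category, checks in CATEGORIES.items()
    }
    return sum(per_category.values()), per_category


if __name__ == "__main__":
    results = {check: True for checks in CATEGORIES.values() for check in checks}
    results["eval_gate"] = False  # one failing Self-Healing check
    total, per_category = autonomy_score(results)
    print(total, per_category["self_healing"])  # prints: 95 20
```

Returning the per-category subtotals alongside the total makes it easy to see which category (and therefore which human dependency) is costing the most points.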

Anti-slop quality gates

Every fix must pass: Linus Test, New Hire Test, 3 AM Test, Pride Test

Files

  • docs/prompt/achieve.md — 288-line prompt (blueprint-first pattern)
  • docs/autonomy/README.md — score framework docs
  • Wired into unified.md, evolve-auto.md, daemon.sh, lib-agent.sh, format-stream.py, CLAUDE.md

Test plan

  • make check passes
  • Shell syntax valid
  • ACHIEVE appears in unified.md scoring table
  • Role extraction regex includes ACHIEVE
  • Prompt guard watches achieve.md
  • First ACHIEVE session should compute autonomy score and fix one dependency
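The "role extraction regex includes ACHIEVE" check could look something like the sketch below. The daemon's real regex is not shown in this PR, so the `ROLE: <NAME>` marker format and the helper `extract_role` are hypothetical. BUILD, REVIEW, STRATEGIZE and ACHIEVE are the role names that appear in this thread; the remaining role is not named here, so it is left out.

```python
import re
from typing import Optional

# Hypothetical marker format "ROLE: <NAME>"; adjust to whatever the daemon actually emits.
KNOWN_ROLES = ("BUILD", "REVIEW", "STRATEGIZE", "ACHIEVE")
ROLE_RE = re.compile(r"\bROLE:\s*(" + "|".join(KNOWN_ROLES) + r")\b")


def extract_role(text: str) -> Optional[str]:
    """Return the first known role named after 'ROLE:', or None if there is none."""
    match = ROLE_RE.search(text)
    return match.group(1) if match else None
```

The trailing `\b` keeps a longer word such as `ACHIEVED` from being read as the ACHIEVE role.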

…cies

5th role for the unified daemon. Measures autonomy score (0-100) across
4 categories (self-healing, self-directing, self-validating, self-improving),
identifies the highest-impact human dependency, and fixes it with
production-grade changes.

- docs/prompt/achieve.md: 288-line prompt following blueprint-first patterns
  (identity, context, 7 rules, 10-step process, anti-slop quality gates)
- docs/autonomy/README.md: score framework documentation
- Wired into unified.md scoring (triggers when autonomy < 70 or when needs-human issues are open)
- Added to prompt guard, role extraction, format-stream markers
- evolve-auto.md gets ACHIEVE context
- CLAUDE.md documents the 5th role
- Max once per 5 sessions to prevent over-introspection
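The trigger rule and the cooldown listed above can be combined into one eligibility check, sketched below. The thresholds (70 and 5) come from the commit message; the helper name `achieve_eligible`, measuring the cooldown as "current session number minus last ACHIEVE session number", and treating a missing autonomy report as low autonomy are assumptions.

```python
from typing import Optional

AUTONOMY_THRESHOLD = 70  # from the commit message: triggers when autonomy < 70
ACHIEVE_COOLDOWN = 5     # from the commit message: max once per 5 sessions


def achieve_eligible(autonomy: Optional[int], needs_human_open: int,
                     sessions_since_achieve: Optional[int]) -> bool:
    """Hypothetical gate: cooldown first, then the low-autonomy / needs-human trigger.

    sessions_since_achieve is assumed to be (current session - last ACHIEVE session),
    or None if ACHIEVE has never run.
    """
    # The cooldown is a hard constraint, so it overrides any score.
    if sessions_since_achieve is not None and sessions_since_achieve < ACHIEVE_COOLDOWN:
        return False
    # Assumption: no autonomy report yet counts as low autonomy.
    low_autonomy = autonomy is None or autonomy < AUTONOMY_THRESHOLD
    return low_autonomy or needs_human_open > 0
```

Checking the cooldown before the score is what keeps a persistently low score from selecting ACHIEVE in consecutive sessions.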

Prompt designed using Anthropic's 12-dimension framework:
clarity 10, role 10, data separation 9, output format 10,
chain of thought 9, examples (via scorecard), hallucination prevention 9,
structure 10, polish 9.
fazxes merged commit 2a5f67c into main on Apr 6, 2026
fazxes deleted the feat/achieve-role branch on April 6, 2026 04:48

chatgpt-codex-connector (Bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2a5f67c9a3

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread docs/prompt/unified.md

**Hard constraints:**
- STRATEGIZE max once per 10 sessions (cap prevents hiding in strategy mode)
- ACHIEVE max once per 5 sessions (autonomy work is high-value but infrequent)

P1: Make ACHIEVE cooldown computable before enforcing it

The new hard constraint ACHIEVE max once per 5 sessions is not operationalized by the signal set in PHASE 1, which means the agent has no explicit sessions_since_achieve value to apply this cap during scoring. In low-score states (e.g., no autonomy report + eval gate active), ACHIEVE can be repeatedly selected across consecutive sessions, starving BUILD/REVIEW despite the intended cooldown. Add a tracked signal (from docs/sessions/index.md) and apply it in the scoring/constraint logic so this limit is enforceable rather than advisory.
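One way to derive the suggested signal from docs/sessions/index.md is sketched below. The layout of that file is not shown in this PR, so the assumed row format `| <session #> | <ROLE> | ... |` and the helper `sessions_since_achieve` are hypothetical; the pattern would need to be adapted to the real file.

```python
import re
from typing import Optional

# Hypothetical row format for docs/sessions/index.md: "| <session #> | <ROLE> | ... |".
# Header and separator rows do not match because they lack a numeric first cell.
ROW_RE = re.compile(r"^\|\s*(\d+)\s*\|\s*([A-Z]+)\s*\|", re.MULTILINE)


def sessions_since_achieve(index_md: str, current_session: int) -> Optional[int]:
    """Gap between current_session and the latest ACHIEVE session, or None if ACHIEVE never ran."""
    achieve_sessions = [int(num) for num, role in ROW_RE.findall(index_md) if role == "ACHIEVE"]
    if not achieve_sessions:
        return None
    return current_session - max(achieve_sessions)
```

Taking `max()` of the matching session numbers avoids relying on row order in the file. The scoring step would then drop ACHIEVE from consideration while the returned gap is below 5, and treat `None` as "never ran".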

fazxes added a commit that referenced this pull request Apr 6, 2026
Pentest found daemon.sh crashes on bash 3.2 due to `local` outside
function (PR #143 regression). Created urgent tasks #154 and #155.

Done: #116 (PR #126), #151 (tracker count fixed by PR #142)
Wontfix: #80, #107, #111, #115, #127, #134 (speculative/superseded)
fazxes added a commit that referenced this pull request Apr 9, 2026
…one)

Queue before: 72 pending + 9 wontfix-in-active-dir
Queue after: 65 pending + 0 wontfix (all converted to done for archiving)

Merged into primary tasks (5 closures):
- #175 -> #174: both add tests to TestAuthFailureDetection, same PR
- #163 -> #162: both are scoring module tests from PR #158 review, same PR
- #124 -> #122: both validate doc snapshot consistency, same PR scope
- #196 -> #173: both add entries to PROMPT_GUARD_FILES in lib-agent.sh
- #180 -> #179: both touch _is_valid_eval_file() in pick-role.py, same PR

Closed as obsolete (1):
- #78: references non-existent "evolve.md Step 8" and the multi-agent
  review panel replaced by unified review in PR #107

Closed as low-value (1):
- #230: _DELEGATION_ROLE_MAP covers all 8 current agent types; new agent
  types require major framework work making the map update obvious

Converted wontfix -> done for archiving (9):
- #77, #80, #107, #111, #115, #119, #127, #129, #134
  All had wontfix status with rationale already documented; changed to
  done so daemon's archive_done_tasks() housekeeping removes them