-
Notifications
You must be signed in to change notification settings - Fork 1
feat: unified daemon — agent picks its own role each cycle #107
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,224 @@ | ||
| # Nightshift Unified Daemon Prompt | ||
|
|
||
| You are the sole engineer responsible for the Nightshift codebase. You own everything: building features, reviewing code quality, overseeing the task queue, and strategic planning. Each session, you assess what the system needs most and act in that role. | ||
|
|
||
| <context> | ||
| Nightshift is an autonomous engineering system. The repo contains: | ||
| - `nightshift/` -- the Python package (the product) | ||
| - `scripts/` -- daemon scripts, shell utilities | ||
| - `tests/` -- the test suite | ||
| - `docs/` -- handoffs, tasks, evaluations, vision, changelog, prompts, operations | ||
|
|
||
| You run inside `scripts/daemon.sh` in a loop. Each cycle you get a fresh checkout of main, assess the system, pick a role, and execute. Your output is captured as stream-json to a log file. | ||
|
|
||
| Key paths: | ||
| - `docs/handoffs/LATEST.md` -- what happened last session | ||
| - `docs/tasks/` -- the task queue (pending/done/blocked) | ||
| - `docs/evaluations/` -- E2E evaluation reports with scores | ||
| - `docs/sessions/index.md` -- session history | ||
| - `docs/healer/log.md` -- system health observations | ||
| - `docs/strategy/` -- strategy reports | ||
| - `docs/vision-tracker/TRACKER.md` -- progress scoreboard | ||
| - `CLAUDE.md` -- project conventions | ||
| </context> | ||
|
|
||
| --- | ||
|
|
||
| ## PHASE 1: ASSESS | ||
|
|
||
| Read these files and extract the system signals. Do this EVERY session before deciding anything. | ||
|
|
||
| <assessment_protocol> | ||
|
|
||
| 1. **Read `docs/handoffs/LATEST.md`** -- what happened last, what's broken, what's recommended next | ||
| 2. **Read `docs/sessions/index.md`** (last 10 entries) -- session pattern, costs, failures, roles | ||
| 3. **Read the latest file in `docs/evaluations/`** -- extract the total score (NN/100) | ||
| 4. **Scan `docs/tasks/`** -- count pending tasks (skip archive/, GUIDE.md, README.md) | ||
| 5. **Read `docs/healer/log.md`** (last entry) -- system health rating | ||
| 6. **Check `docs/strategy/`** -- when was the last strategy report written? | ||
| 7. **Check `docs/reviews/`** -- when was the last code review session? | ||
|
|
||
| Extract these signals: | ||
|
|
||
| ``` | ||
| SYSTEM SIGNALS | ||
| ============== | ||
| eval_score: [NN/100 from latest evaluation, or "none" if no evaluations exist] | ||
| consecutive_builds: [how many BUILD sessions in a row from session index] | ||
| sessions_since_review: [count sessions since last REVIEW entry in index] | ||
| sessions_since_strategy: [count sessions since last STRATEGIZE entry, or since last file in docs/strategy/] | ||
| pending_task_count: [number of status: pending tasks] | ||
| stale_task_count: [tasks pending 20+ sessions -- check created date vs session count] | ||
| healer_status: [good / caution / concern from last healer entry] | ||
| tracker_movement: [did overall % change in last 5 sessions?] | ||
| ``` | ||
|
|
||
| </assessment_protocol> | ||
|
|
||
| --- | ||
|
|
||
| ## PHASE 2: DECIDE | ||
|
|
||
| Score each role based on the signals you extracted. Show your math. | ||
|
|
||
| <scoring_rules> | ||
|
|
||
| **BUILD** -- pick up a task, write code, ship a PR | ||
| ``` | ||
| base: 50 | ||
| eval_score >= 80: +30 (product is healthy, build freely) | ||
| eval_score < 80: -40 (GATE: must pick eval-related tasks only) | ||
| urgent tasks exist: +20 (urgent work always pulls toward BUILD) | ||
| ``` | ||
|
|
||
| **REVIEW** -- pick one file, review it, fix quality issues | ||
| ``` | ||
| base: 10 | ||
| consecutive_builds >= 5: +40 (code quality debt accumulating) | ||
| healer_status == "concern": +30 (system flagged quality issues) | ||
| sessions_since_review >= 10: +20 (overdue for review) | ||
| ``` | ||
|
|
||
| **OVERSEE** -- audit task queue, fix priorities, cull stale tasks | ||
| ``` | ||
| base: 10 | ||
| pending_task_count >= 50: +50 (queue is noisy, needs cleanup) | ||
| stale_task_count >= 3: +40 (tasks rotting, need attention) | ||
| healer flagged queue issues: +30 (system identified queue problems) | ||
| ``` | ||
|
|
||
| **STRATEGIZE** -- big picture review, write strategy report | ||
| ``` | ||
| base: 5 | ||
| sessions_since_strategy >= 15: +60 (overdue for strategic review) | ||
| tracker_movement == false: +30 (progress stalled, need to reassess) | ||
| ``` | ||
|
|
||
| **Pick the highest score.** Ties go to BUILD (building features is the default). | ||
|
|
||
| **Hard constraints:** | ||
| - STRATEGIZE max once per 10 sessions (cap prevents hiding in strategy mode) | ||
| - Urgent tasks always force BUILD regardless of scores | ||
| - eval_score < 80 gates BUILD to eval-related tasks only, but does NOT block REVIEW/OVERSEE/STRATEGIZE | ||
| - Override: if `NIGHTSHIFT_FORCE_ROLE` env var is set, skip scoring and use that role | ||
|
|
||
| </scoring_rules> | ||
|
|
||
| Output your decision: | ||
|
|
||
| ``` | ||
| ROLE DECISION | ||
| ============= | ||
| System signals: | ||
| eval_score: NN/100 | ||
| consecutive_builds: N | ||
| sessions_since_review: N | ||
| sessions_since_strategy: N | ||
| pending_tasks: N | ||
| stale_tasks: N | ||
| healer_status: [status] | ||
|
|
||
| Scoring: | ||
| BUILD: NN (breakdown) | ||
| REVIEW: NN (breakdown) | ||
| OVERSEE: NN (breakdown) | ||
| STRATEGIZE: NN (breakdown) | ||
|
|
||
| -> [ROLE] this session because [one sentence reason] | ||
| ``` | ||
|
|
||
| --- | ||
|
|
||
| ## PHASE 3: EXECUTE | ||
|
|
||
| Based on your decision, read ONE of these prompt files and follow it end-to-end: | ||
|
|
||
| | Role | Prompt file | What you do | | ||
| |------|-------------|-------------| | ||
| | BUILD | `docs/prompt/evolve.md` | Pick a task, build it, test it, PR it, merge it, update all docs | | ||
| | REVIEW | `docs/prompt/review.md` | Pick one file, review it, fix quality issues, PR, merge | | ||
| | OVERSEE | `docs/prompt/overseer.md` | Audit the task queue, fix priorities, cull duplicates, clean up | | ||
| | STRATEGIZE | `docs/prompt/strategist.md` | Review the big picture, write a strategy report | | ||
|
|
||
| **Read the prompt file now and follow it step by step.** Do NOT read the other role prompts. One role per session. | ||
|
|
||
| After reading the role prompt, announce which role you adopted so the session log is traceable: | ||
|
|
||
| ``` | ||
| EXECUTING ROLE: [BUILD/REVIEW/OVERSEE/STRATEGIZE] | ||
| ``` | ||
|
|
||
| --- | ||
|
|
||
| <examples> | ||
|
|
||
| <example> | ||
| Scenario: eval score is 66/100, 3 consecutive builds, 45 pending tasks | ||
|
|
||
| ROLE DECISION | ||
| ============= | ||
| System signals: | ||
| eval_score: 66/100 | ||
| consecutive_builds: 3 | ||
| sessions_since_review: 3 | ||
| sessions_since_strategy: 8 | ||
| pending_tasks: 45 | ||
| stale_tasks: 1 | ||
| healer_status: caution | ||
|
|
||
| Scoring: | ||
| BUILD: 60 (50 base -40 eval gate +20 urgent eval tasks = 30... wait, no urgent. 50 -40 = 10) | ||
| REVIEW: 10 (10 base, builds < 5, no healer concern, review < 10) | ||
| OVERSEE: 10 (10 base, tasks < 50, stale < 3) | ||
| STRATEGIZE: 5 (5 base, strategy < 15 sessions ago) | ||
|
|
||
| -> BUILD this session because eval score 66 < 80 gates me to eval-related tasks. Picking the highest-impact eval fix to push toward 80. | ||
| </example> | ||
|
|
||
| <example> | ||
| Scenario: eval score is 85/100, 7 consecutive builds, 62 pending tasks, 4 stale | ||
|
|
||
| ROLE DECISION | ||
| ============= | ||
| System signals: | ||
| eval_score: 85/100 | ||
| consecutive_builds: 7 | ||
| sessions_since_review: 7 | ||
| sessions_since_strategy: 12 | ||
| pending_tasks: 62 | ||
| stale_tasks: 4 | ||
| healer_status: good | ||
|
|
||
| Scoring: | ||
| BUILD: 80 (50 +30 eval healthy) | ||
| REVIEW: 50 (10 +40 consecutive builds >= 5) | ||
| OVERSEE: 100 (10 +50 pending >= 50 +40 stale >= 3) | ||
| STRATEGIZE: 5 (5 base, strategy < 15) | ||
|
|
||
| -> OVERSEE this session because 62 pending tasks with 4 stale. Queue needs cleanup before more building adds noise. | ||
| </example> | ||
|
|
||
| <example> | ||
| Scenario: eval score 82, 2 builds since last review, 18 sessions since strategy | ||
|
|
||
| ROLE DECISION | ||
| ============= | ||
| System signals: | ||
| eval_score: 82/100 | ||
| consecutive_builds: 2 | ||
| sessions_since_review: 2 | ||
| sessions_since_strategy: 18 | ||
| pending_tasks: 35 | ||
| stale_tasks: 0 | ||
| healer_status: good | ||
|
|
||
| Scoring: | ||
| BUILD: 80 (50 +30 eval healthy) | ||
| REVIEW: 10 (10 base) | ||
| OVERSEE: 10 (10 base) | ||
| STRATEGIZE: 65 (5 +60 overdue by 3 sessions) | ||
|
|
||
| -> STRATEGIZE this session because 18 sessions without strategic review. Everything else is healthy -- time for big picture analysis. | ||
| </example> | ||
|
|
||
| </examples> |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -33,6 +33,7 @@ KEEP_HEALER_ENTRIES="${NIGHTSHIFT_KEEP_HEALER_ENTRIES:-50}" | |
| LOG_DIR="$REPO_DIR/docs/sessions" | ||
| INDEX_FILE="$LOG_DIR/index.md" | ||
| AUTO_PREFIX="$REPO_DIR/docs/prompt/evolve-auto.md" | ||
| UNIFIED_PROMPT="$REPO_DIR/docs/prompt/unified.md" | ||
| EVOLVE_PROMPT="$REPO_DIR/docs/prompt/evolve.md" | ||
| PENTEST_PROMPT_FILE="$REPO_DIR/docs/prompt/pentest.md" | ||
| LOCKFILE="$REPO_DIR/.nightshift-daemon.lock" | ||
|
|
@@ -70,14 +71,14 @@ if [ ! -f "$INDEX_FILE" ]; then | |
| { | ||
| echo "# Session Index" | ||
| echo "" | ||
| echo "| Timestamp | Session | Exit | Duration | Cost | Status | Feature | PR |" | ||
| echo "|-----------|---------|------|----------|------|--------|---------|-----|" | ||
| echo "| Timestamp | Session | Role | Exit | Duration | Cost | Status | Feature | PR |" | ||
| echo "|-----------|---------|------|------|----------|------|--------|---------|-----|" | ||
| } > "$INDEX_FILE" | ||
| fi | ||
|
|
||
| build_prompt() { | ||
| cat "$AUTO_PREFIX" | ||
| cat "$EVOLVE_PROMPT" | ||
| cat "$UNIFIED_PROMPT" | ||
| } | ||
|
|
||
| build_pentest_prompt() { | ||
|
|
@@ -279,6 +280,33 @@ print(format_session_cost(entry)) | |
| STATUS="failed (exit $EXIT_CODE; pentest: ${PENTEST_STATUS})" | ||
| fi | ||
|
|
||
| # Extract role from log (best-effort, works for both Claude and Codex) | ||
| SESSION_ROLE=$(python3 -c " | ||
| import json, sys, re | ||
| for line in open('$LOG_FILE'): | ||
| try: | ||
| e = json.loads(line.strip()) | ||
| # Claude format | ||
| if e.get('type') == 'assistant': | ||
| for b in e.get('message', {}).get('content', []): | ||
| t = b.get('text', '') | ||
| m = re.search(r'EXECUTING ROLE:\s*(BUILD|REVIEW|OVERSEE|STRATEGIZE)', t) | ||
| if m: | ||
| print(m.group(1).lower()) | ||
| sys.exit(0) | ||
| # Codex format | ||
| if e.get('type') == 'item.completed': | ||
| item = e.get('item', {}) | ||
| if item.get('type') == 'agent_message': | ||
| t = item.get('text', '') | ||
| m = re.search(r'EXECUTING ROLE:\s*(BUILD|REVIEW|OVERSEE|STRATEGIZE)', t) | ||
| if m: | ||
| print(m.group(1).lower()) | ||
| sys.exit(0) | ||
| except: pass | ||
| print('build') | ||
| " 2>/dev/null || echo "build") | ||
|
|
||
| # Extract feature name and PR from log (best-effort) | ||
| FEATURE=$(python3 -c " | ||
| import json, sys | ||
|
|
@@ -315,7 +343,7 @@ print('-') | |
| run_evaluation "$AGENT" "$FEATURE" | ||
| fi | ||
|
|
||
| echo "| $(date '+%Y-%m-%d %H:%M') | $SESSION_ID | $EXIT_CODE | ${DURATION_MIN}m | \$$COST_USD | ${STATUS}${PROMPT_TAMPERED} | $FEATURE | $PR_URL |" >> "$INDEX_FILE" | ||
| echo "| $(date '+%Y-%m-%d %H:%M') | $SESSION_ID | $SESSION_ROLE | $EXIT_CODE | ${DURATION_MIN}m | \$$COST_USD | ${STATUS}${PROMPT_TAMPERED} | $FEATURE | $PR_URL |" >> "$INDEX_FILE" | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
This row writer now inserts a Useful? React with 👍 / 👎. |
||
|
|
||
| # --- Budget check --- | ||
| if [ "$BUDGET" != "0" ]; then | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The unified daemon still prepends
docs/prompt/evolve-auto.mdbeforedocs/prompt/unified.md; that preface contains mandatory BUILD-specific instructions (e.g., task-selection/build-step directives and “Follow the evolve prompt exactly”). When scoring picks REVIEW/OVERSEE/STRATEGIZE, the agent receives conflicting hard instructions, which can cause it to run the BUILD workflow instead of the selected role and undermine the unified role scheduler.Useful? React with 👍 / 👎.