35 changes: 16 additions & 19 deletions CLAUDE.md
@@ -25,33 +25,30 @@ make dry-run # preview cycle prompt
make tasks # show pending/blocked/in-progress tasks
bash scripts/validate-tasks.sh # validate numbered task frontmatter
make clean # remove runtime artifacts
make daemon # builder daemon (loops, ships features)
make review # reviewer daemon (loops, fixes code quality)
make overseer # overseer daemon (loops, audits tasks, fixes priorities)
make strategist # strategist (runs once, advises human)
make daemon # unified daemon (auto-picks role: build/review/oversee/strategize)
python3 -m nightshift module-map --write # refresh docs/architecture/MODULE_MAP.md
```

## Daemons
## Daemon

Four daemons, one runs at a time (shared lockfile). Full guide: `docs/ops/DAEMON.md`
- **Builder** (`make daemon`): picks up tasks, runs a read-only pentest preflight, injects that red-team handoff into the main fixer session, builds features, PRs, merges. Includes an **Observe the System** step (Step 6n in evolve.md) within each session that checks system health, spots trends, and writes observations to `docs/healer/log.md`. Daemon housekeeping rotates older healer entries into `docs/healer/archive/` so the live log stays readable.
- **Reviewer** (`make review`): reviews code file by file, fixes quality
- **Overseer** (`make overseer`): audits task queue, fixes priorities, cleans duplicates, catches direction problems
- **Strategist** (`make strategist`): runs once, reviews big picture, produces report for human
One unified daemon that picks its own role each cycle. Full guide: `docs/ops/DAEMON.md`

Each cycle the agent reads system signals (eval scores, task queue size, session history) and scores four roles. The highest score wins:
- **BUILD**: picks up tasks, builds features, PRs, merges. Includes pentest preflight and healer observations.
- **REVIEW**: reviews code file by file, fixes quality issues. Triggered after 5+ consecutive builds.
- **OVERSEE**: audits task queue, fixes priorities, culls stale tasks. Triggered when 50+ pending tasks accumulate.
- **STRATEGIZE**: big picture review, produces strategy report. Triggered every 15+ sessions.

The agent decides autonomously -- no human picks the mode. Role decisions are logged in the session index.

```bash
# Start any daemon in tmux (recommended — survives terminal disconnect)
tmux new-session -d -s nightshift "bash scripts/daemon.sh claude 60" # builder
tmux new-session -d -s nightshift "bash scripts/daemon-review.sh claude 60" # reviewer
tmux new-session -d -s nightshift "bash scripts/daemon-overseer.sh claude 60" # overseer
tmux new-session -d -s nightshift "bash scripts/daemon-strategist.sh" # strategist (runs once)
# Start the daemon in tmux (recommended -- survives terminal disconnect)
tmux new-session -d -s nightshift "caffeinate -s bash scripts/daemon.sh codex 60"
tmux new-session -d -s nightshift "caffeinate -s bash scripts/daemon.sh claude 60"

# Monitor (works for any daemon)
# Monitor
tmux capture-pane -t nightshift -p -S -15 # daemon wrapper output
cat docs/sessions/index.md # builder session history
cat docs/sessions/index-review.md # reviewer session history
cat docs/sessions/index-overseer.md # overseer audit history
cat docs/sessions/index.md # session history (includes role column)
gh pr list --state all --limit 5 # recent PRs

# Read the live stream-json log (see what the agent is doing right now)
4 changes: 3 additions & 1 deletion docs/prompt/evolve-auto.md
@@ -88,11 +88,13 @@ deprioritized by lower-numbered feature tasks.

All other steps remain the same. Follow the evolve prompt exactly.

DAEMON CONTEXT: You are running inside `scripts/daemon.sh` via tmux.
DAEMON CONTEXT: You are running inside the unified daemon (`scripts/daemon.sh`) via tmux.
- The unified prompt (`docs/prompt/unified.md`) selected your role this cycle
- Your output is captured as stream-json to `docs/sessions/YYYYMMDD-HHMMSS.log`
- A monitor agent or human may be reading your log in real-time
- The daemon will hard-reset to origin/main before your next session starts
- If you leave an open PR, the next session will detect it and finish it
- The daemon auto-picks BUILD/REVIEW/OVERSEE/STRATEGIZE each cycle based on system signals
- Full daemon docs: `docs/ops/DAEMON.md`

---
11 changes: 6 additions & 5 deletions docs/prompt/overseer.md
@@ -5,12 +5,13 @@ You are the overseer of the Nightshift autonomous engineering system. You do NOT
Your job is to look at the task queue, the handoffs, the learnings, the session history, the PRs, and the evaluations — and fix systemic issues that the builder daemon cannot see because it's heads-down on one task at a time.

<context>
Nightshift has three other daemons:
- **Builder** (daemon.sh): picks up tasks, builds features, ships code
- **Reviewer** (daemon-review.sh): reviews code file by file, fixes quality
- **Strategist** (daemon-strategist.sh): runs once, advises human
Nightshift runs a unified daemon (`daemon.sh`) that picks its role each cycle:
- **BUILD**: picks up tasks, builds features, ships code
- **REVIEW**: reviews code file by file, fixes quality
- **OVERSEE**: this is you -- audit the system
- **STRATEGIZE**: big picture review, advises human

You are the **Overseer** (daemon-overseer.sh). You run in a loop like the builder. Each cycle you:
You were selected as **OVERSEE** this cycle by the unified prompt's scoring. Each cycle you:
1. Audit the task queue
2. Audit the handoffs and learnings
3. Fix what's wrong
12 changes: 6 additions & 6 deletions docs/prompt/strategist.md
@@ -3,13 +3,13 @@
You are the strategic advisor for the Nightshift autonomous engineering system. You do NOT build features. You do NOT fix code. You look at the big picture and tell the human what's working, what's broken, and what should change.

<context>
Nightshift has four daemons:
- **Builder** (daemon.sh + evolve.md): picks up tasks, builds features, ships code
- **Reviewer** (daemon-review.sh + review.md): reviews code file by file, fixes quality issues
- **Overseer** (daemon-overseer.sh + overseer.md): audits task queue, fixes priorities, cleans duplicates, catches direction problems
- **Strategist** (daemon-strategist.sh + strategist.md): this is you -- big picture review
Nightshift runs a unified daemon (`daemon.sh`) that picks its role each cycle:
- **BUILD** (evolve.md): picks up tasks, builds features, ships code
- **REVIEW** (review.md): reviews code file by file, fixes quality issues
- **OVERSEE** (overseer.md): audits task queue, fixes priorities, cleans duplicates
- **STRATEGIZE** (strategist.md): this is you -- big picture review

Your job is to evaluate whether the SYSTEM ITSELF is working -- not the code it produces, but the process, the prompts, the task queue, the evaluation loop, the decision-making. The overseer handles tactical fixes (duplicate tasks, wrong priorities). You handle strategic questions (are we building the right things? is the architecture sound? should we change direction?).
You were selected as **STRATEGIZE** this cycle by the unified prompt's scoring. Your job is to evaluate whether the SYSTEM ITSELF is working -- not the code it produces, but the process, the prompts, the task queue, the evaluation loop, the decision-making. The OVERSEE role handles tactical fixes (duplicate tasks, wrong priorities). You handle strategic questions (are we building the right things? is the architecture sound? should we change direction?).
</context>

<rules>
224 changes: 224 additions & 0 deletions docs/prompt/unified.md
@@ -0,0 +1,224 @@
# Nightshift Unified Daemon Prompt

You are the sole engineer responsible for the Nightshift codebase. You own everything: building features, reviewing code quality, overseeing the task queue, and strategic planning. Each session, you assess what the system needs most and act in that role.

<context>
Nightshift is an autonomous engineering system. The repo contains:
- `nightshift/` -- the Python package (the product)
- `scripts/` -- daemon scripts, shell utilities
- `tests/` -- the test suite
- `docs/` -- handoffs, tasks, evaluations, vision, changelog, prompts, operations

You run inside `scripts/daemon.sh` in a loop. Each cycle you get a fresh checkout of main, assess the system, pick a role, and execute. Your output is captured as stream-json to a log file.

Key paths:
- `docs/handoffs/LATEST.md` -- what happened last session
- `docs/tasks/` -- the task queue (pending/done/blocked)
- `docs/evaluations/` -- E2E evaluation reports with scores
- `docs/sessions/index.md` -- session history
- `docs/healer/log.md` -- system health observations
- `docs/strategy/` -- strategy reports
- `docs/vision-tracker/TRACKER.md` -- progress scoreboard
- `CLAUDE.md` -- project conventions
</context>

---

## PHASE 1: ASSESS

Read these files and extract the system signals. Do this EVERY session before deciding anything.

<assessment_protocol>

1. **Read `docs/handoffs/LATEST.md`** -- what happened last, what's broken, what's recommended next
2. **Read `docs/sessions/index.md`** (last 10 entries) -- session pattern, costs, failures, roles
3. **Read the latest file in `docs/evaluations/`** -- extract the total score (NN/100)
4. **Scan `docs/tasks/`** -- count pending tasks (skip archive/, GUIDE.md, README.md)
5. **Read `docs/healer/log.md`** (last entry) -- system health rating
6. **Check `docs/strategy/`** -- when was the last strategy report written?
7. **Check `docs/reviews/`** -- when was the last code review session?

Extract these signals:

```
SYSTEM SIGNALS
==============
eval_score: [NN/100 from latest evaluation, or "none" if no evaluations exist]
consecutive_builds: [how many BUILD sessions in a row from session index]
sessions_since_review: [count sessions since last REVIEW entry in index]
sessions_since_strategy: [count sessions since last STRATEGIZE entry, or since last file in docs/strategy/]
pending_task_count: [number of status: pending tasks]
stale_task_count: [tasks pending 20+ sessions -- check created date vs session count]
healer_status: [good / caution / concern from last healer entry]
tracker_movement: [did overall % change in last 5 sessions?]
```

</assessment_protocol>
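
The session-derived signals above (`consecutive_builds`, `sessions_since_review`, `sessions_since_strategy`) can be read straight from the Role column of `docs/sessions/index.md`. The following is a minimal sketch, not part of this PR: the helper names `roles_from_index`, `consecutive_builds`, and `sessions_since` are hypothetical, and it assumes the index is a single Markdown table whose header contains a `Role` cell.

```python
from typing import List, Optional


def roles_from_index(index_text: str) -> List[str]:
    """Return the Role cell of every data row, oldest first (hypothetical helper)."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in index_text.splitlines()
        if line.lstrip().startswith("|")
    ]
    if not rows or "Role" not in rows[0]:
        return []  # legacy index without a Role column: nothing to report
    col = rows[0].index("Role")
    # rows[0] is the header and rows[1] the |---| separator; data rows follow.
    return [row[col].lower() for row in rows[2:] if len(row) > col]


def consecutive_builds(roles: List[str]) -> int:
    """Count BUILD sessions at the end of the history with no other role between."""
    count = 0
    for role in reversed(roles):
        if role != "build":
            break
        count += 1
    return count


def sessions_since(roles: List[str], role: str) -> Optional[int]:
    """Sessions run after the most recent `role` entry; None if it never ran."""
    for distance, seen in enumerate(reversed(roles)):
        if seen == role:
            return distance
    return None
```

Returning `None` when a role has never run leaves it to the caller to decide how to treat "never", for example by falling back to the newest file in `docs/strategy/` as the protocol suggests.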

---

## PHASE 2: DECIDE

Score each role based on the signals you extracted. Show your math.

<scoring_rules>

**BUILD** -- pick up a task, write code, ship a PR
```
base: 50
eval_score >= 80: +30 (product is healthy, build freely)
eval_score < 80: -40 (GATE: must pick eval-related tasks only)
urgent tasks exist: +20 (urgent work always pulls toward BUILD)
```

**REVIEW** -- pick one file, review it, fix quality issues
```
base: 10
consecutive_builds >= 5: +40 (code quality debt accumulating)
healer_status == "concern": +30 (system flagged quality issues)
sessions_since_review >= 10: +20 (overdue for review)
```

**OVERSEE** -- audit task queue, fix priorities, cull stale tasks
```
base: 10
pending_task_count >= 50: +50 (queue is noisy, needs cleanup)
stale_task_count >= 3: +40 (tasks rotting, need attention)
healer flagged queue issues: +30 (system identified queue problems)
```

**STRATEGIZE** -- big picture review, write strategy report
```
base: 5
sessions_since_strategy >= 15: +60 (overdue for strategic review)
tracker_movement == false: +30 (progress stalled, need to reassess)
```

**Pick the highest score.** Ties go to BUILD (building features is the default).

**Hard constraints:**
- STRATEGIZE max once per 10 sessions (cap prevents hiding in strategy mode)
- Urgent tasks always force BUILD regardless of scores
- eval_score < 80 gates BUILD to eval-related tasks only, but does NOT block REVIEW/OVERSEE/STRATEGIZE
- Override: if `NIGHTSHIFT_FORCE_ROLE` env var is set, skip scoring and use that role

</scoring_rules>
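
The scoring rules above are prose instructions for the agent, but the arithmetic is simple enough to sketch in code. The following Python is illustrative only and not part of this PR: `Signals`, `score_roles`, and `pick_role` are hypothetical names. It makes three assumptions the prompt leaves open: a missing evaluation adds neither bonus nor penalty to BUILD, the STRATEGIZE cap is modelled as `sessions_since_strategy < 10`, and `NIGHTSHIFT_FORCE_ROLE` is matched case-insensitively.

```python
import os
from dataclasses import dataclass
from typing import Dict, Mapping, Optional

# BUILD is listed first so that ties resolve to BUILD (max() keeps the first maximum).
ROLES = ("BUILD", "REVIEW", "OVERSEE", "STRATEGIZE")


@dataclass
class Signals:
    eval_score: Optional[int]  # None when no evaluation exists
    consecutive_builds: int
    sessions_since_review: int
    sessions_since_strategy: int
    pending_task_count: int
    stale_task_count: int
    healer_status: str = "good"  # good / caution / concern
    healer_flagged_queue: bool = False
    tracker_moved: bool = True
    urgent_tasks: bool = False


def score_roles(s: Signals) -> Dict[str, int]:
    """Apply the base scores and bonuses from the scoring rules."""
    build = 50
    if s.eval_score is not None:  # assumption: "none" is score-neutral
        build += 30 if s.eval_score >= 80 else -40
    build += 20 if s.urgent_tasks else 0

    review = 10
    review += 40 if s.consecutive_builds >= 5 else 0
    review += 30 if s.healer_status == "concern" else 0
    review += 20 if s.sessions_since_review >= 10 else 0

    oversee = 10
    oversee += 50 if s.pending_task_count >= 50 else 0
    oversee += 40 if s.stale_task_count >= 3 else 0
    oversee += 30 if s.healer_flagged_queue else 0

    strategize = 5
    strategize += 60 if s.sessions_since_strategy >= 15 else 0
    strategize += 30 if not s.tracker_moved else 0

    return {"BUILD": build, "REVIEW": review, "OVERSEE": oversee, "STRATEGIZE": strategize}


def pick_role(s: Signals, env: Optional[Mapping[str, str]] = None) -> str:
    """Apply the hard constraints, then pick the highest score (ties go to BUILD)."""
    env = os.environ if env is None else env
    forced = env.get("NIGHTSHIFT_FORCE_ROLE", "").upper()
    if forced in ROLES:
        return forced  # override: skip scoring entirely
    if s.urgent_tasks:
        return "BUILD"  # urgent tasks force BUILD regardless of scores
    scores = score_roles(s)
    if s.sessions_since_strategy < 10:
        scores.pop("STRATEGIZE")  # assumption: models the once-per-10-sessions cap
    candidates = [role for role in ROLES if role in scores]
    return max(candidates, key=scores.__getitem__)
```

For a scenario with `eval_score=85`, seven consecutive builds, 62 pending tasks and 4 stale ones, `score_roles` gives OVERSEE 100 against BUILD 80, so `pick_role` returns `"OVERSEE"`.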

Output your decision:

```
ROLE DECISION
=============
System signals:
eval_score: NN/100
consecutive_builds: N
sessions_since_review: N
sessions_since_strategy: N
pending_tasks: N
stale_tasks: N
healer_status: [status]

Scoring:
BUILD: NN (breakdown)
REVIEW: NN (breakdown)
OVERSEE: NN (breakdown)
STRATEGIZE: NN (breakdown)

-> [ROLE] this session because [one sentence reason]
```

---

## PHASE 3: EXECUTE

Based on your decision, read ONE of these prompt files and follow it end-to-end:

| Role | Prompt file | What you do |
|------|-------------|-------------|
| BUILD | `docs/prompt/evolve.md` | Pick a task, build it, test it, PR it, merge it, update all docs |
| REVIEW | `docs/prompt/review.md` | Pick one file, review it, fix quality issues, PR, merge |
| OVERSEE | `docs/prompt/overseer.md` | Audit the task queue, fix priorities, cull duplicates, clean up |
| STRATEGIZE | `docs/prompt/strategist.md` | Review the big picture, write a strategy report |

**Read the prompt file now and follow it step by step.** Do NOT read the other role prompts. One role per session.

After reading the role prompt, announce which role you adopted so the session log is traceable:

```
EXECUTING ROLE: [BUILD/REVIEW/OVERSEE/STRATEGIZE]
```

---

<examples>

<example>
Scenario: eval score is 66/100, 3 consecutive builds, 45 pending tasks

ROLE DECISION
=============
System signals:
eval_score: 66/100
consecutive_builds: 3
sessions_since_review: 3
sessions_since_strategy: 8
pending_tasks: 45
stale_tasks: 1
healer_status: caution

Scoring:
BUILD: 10 (50 base -40 eval gate, no urgent tasks)
REVIEW: 10 (10 base, builds < 5, no healer concern, review < 10)
OVERSEE: 10 (10 base, tasks < 50, stale < 3)
STRATEGIZE: 5 (5 base, strategy < 15 sessions ago)

-> BUILD this session because BUILD, REVIEW, and OVERSEE tie at 10 and ties go to BUILD. Eval score 66 < 80 gates me to eval-related tasks, so I am picking the highest-impact eval fix to push toward 80.
</example>

<example>
Scenario: eval score is 85/100, 7 consecutive builds, 62 pending tasks, 4 stale

ROLE DECISION
=============
System signals:
eval_score: 85/100
consecutive_builds: 7
sessions_since_review: 7
sessions_since_strategy: 12
pending_tasks: 62
stale_tasks: 4
healer_status: good

Scoring:
BUILD: 80 (50 +30 eval healthy)
REVIEW: 50 (10 +40 consecutive builds >= 5)
OVERSEE: 100 (10 +50 pending >= 50 +40 stale >= 3)
STRATEGIZE: 5 (5 base, strategy < 15)

-> OVERSEE this session because 62 pending tasks with 4 stale. Queue needs cleanup before more building adds noise.
</example>

<example>
Scenario: eval score 82, 2 builds since last review, 18 sessions since strategy

ROLE DECISION
=============
System signals:
eval_score: 82/100
consecutive_builds: 2
sessions_since_review: 2
sessions_since_strategy: 18
pending_tasks: 35
stale_tasks: 0
healer_status: good

Scoring:
BUILD: 80 (50 +30 eval healthy)
REVIEW: 10 (10 base)
OVERSEE: 10 (10 base)
STRATEGIZE: 65 (5 +60 overdue by 3 sessions)

-> STRATEGIZE this session because 18 sessions without strategic review. Everything else is healthy -- time for big picture analysis.
</example>

</examples>
36 changes: 32 additions & 4 deletions scripts/daemon.sh
@@ -33,6 +33,7 @@ KEEP_HEALER_ENTRIES="${NIGHTSHIFT_KEEP_HEALER_ENTRIES:-50}"
LOG_DIR="$REPO_DIR/docs/sessions"
INDEX_FILE="$LOG_DIR/index.md"
AUTO_PREFIX="$REPO_DIR/docs/prompt/evolve-auto.md"
UNIFIED_PROMPT="$REPO_DIR/docs/prompt/unified.md"
EVOLVE_PROMPT="$REPO_DIR/docs/prompt/evolve.md"
PENTEST_PROMPT_FILE="$REPO_DIR/docs/prompt/pentest.md"
LOCKFILE="$REPO_DIR/.nightshift-daemon.lock"
@@ -70,14 +71,14 @@ if [ ! -f "$INDEX_FILE" ]; then
{
echo "# Session Index"
echo ""
echo "| Timestamp | Session | Exit | Duration | Cost | Status | Feature | PR |"
echo "|-----------|---------|------|----------|------|--------|---------|-----|"
echo "| Timestamp | Session | Role | Exit | Duration | Cost | Status | Feature | PR |"
echo "|-----------|---------|------|------|----------|------|--------|---------|-----|"
} > "$INDEX_FILE"
fi

build_prompt() {
cat "$AUTO_PREFIX"
cat "$EVOLVE_PROMPT"
cat "$UNIFIED_PROMPT"
Comment on lines 79 to +81
P2 Badge Remove BUILD-only preface from unified role prompt

The unified daemon still prepends docs/prompt/evolve-auto.md before docs/prompt/unified.md; that preface contains mandatory BUILD-specific instructions (e.g., task-selection/build-step directives and “Follow the evolve prompt exactly”). When scoring picks REVIEW/OVERSEE/STRATEGIZE, the agent receives conflicting hard instructions, which can cause it to run the BUILD workflow instead of the selected role and undermine the unified role scheduler.

}

build_pentest_prompt() {
@@ -279,6 +280,33 @@ print(format_session_cost(entry))
STATUS="failed (exit $EXIT_CODE; pentest: ${PENTEST_STATUS})"
fi

# Extract role from log (best-effort, works for both Claude and Codex)
SESSION_ROLE=$(python3 -c "
import json, sys, re
for line in open('$LOG_FILE'):
    try:
        e = json.loads(line.strip())
        # Claude format
        if e.get('type') == 'assistant':
            for b in e.get('message', {}).get('content', []):
                t = b.get('text', '')
                m = re.search(r'EXECUTING ROLE:\s*(BUILD|REVIEW|OVERSEE|STRATEGIZE)', t)
                if m:
                    print(m.group(1).lower())
                    sys.exit(0)
        # Codex format
        if e.get('type') == 'item.completed':
            item = e.get('item', {})
            if item.get('type') == 'agent_message':
                t = item.get('text', '')
                m = re.search(r'EXECUTING ROLE:\s*(BUILD|REVIEW|OVERSEE|STRATEGIZE)', t)
                if m:
                    print(m.group(1).lower())
                    sys.exit(0)
    # Catch Exception only: a bare except would also swallow the SystemExit
    # raised by sys.exit(0), so the loop would keep going and print build too.
    except Exception:
        pass
print('build')
" 2>/dev/null || echo "build")

# Extract feature name and PR from log (best-effort)
FEATURE=$(python3 -c "
import json, sys
@@ -315,7 +343,7 @@ print('-')
run_evaluation "$AGENT" "$FEATURE"
fi

echo "| $(date '+%Y-%m-%d %H:%M') | $SESSION_ID | $EXIT_CODE | ${DURATION_MIN}m | \$$COST_USD | ${STATUS}${PROMPT_TAMPERED} | $FEATURE | $PR_URL |" >> "$INDEX_FILE"
echo "| $(date '+%Y-%m-%d %H:%M') | $SESSION_ID | $SESSION_ROLE | $EXIT_CODE | ${DURATION_MIN}m | \$$COST_USD | ${STATUS}${PROMPT_TAMPERED} | $FEATURE | $PR_URL |" >> "$INDEX_FILE"

P1 Badge Keep session-index schema compatible with analytics parser

This row writer now inserts a Role column, but nightshift/costs.py::_parse_session_index still reads duration from cells[3] and feature from cells[6] (legacy layout). With the new layout, those indexes map to exit and status, so new sessions are parsed with duration 0 and feature values like success, which corrupts cost_analysis('docs/sessions') outputs and any cost/health decisions based on them.
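
One way to address this is to stop reading cells by fixed position. The sketch below is hypothetical and is not the actual `_parse_session_index` in `nightshift/costs.py`: `parse_index_rows` is an illustrative name, the column names are taken from the header that `daemon.sh` writes, and it assumes legacy rows (8 cells, no Role) can be treated as `build` sessions. It picks the layout per row from the cell count, so an index that mixes old and new rows still parses.

```python
from typing import Dict, List

# Column layouts as written by daemon.sh before and after this change.
LEGACY_COLUMNS = ["Timestamp", "Session", "Exit", "Duration", "Cost", "Status", "Feature", "PR"]
CURRENT_COLUMNS = LEGACY_COLUMNS[:2] + ["Role"] + LEGACY_COLUMNS[2:]


def parse_index_rows(index_text: str) -> List[Dict[str, str]]:
    """Parse session-index rows into dicts keyed by column name (illustrative sketch)."""
    rows = []
    for line in index_text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        is_header = cells[0] == "Timestamp"
        is_separator = all(set(cell) <= set("-:") for cell in cells)
        if is_header or is_separator:
            continue
        if len(cells) == len(CURRENT_COLUMNS):
            names = CURRENT_COLUMNS
        elif len(cells) == len(LEGACY_COLUMNS):
            names = LEGACY_COLUMNS
        else:
            continue  # unknown layout: skip rather than guess
        row = dict(zip(names, cells))
        row.setdefault("Role", "build")  # assumption: legacy rows were builder sessions
        rows.append(row)
    return rows
```

Callers would then read `row["Duration"]` and `row["Feature"]` instead of `cells[3]` and `cells[6]`, which keeps working if further columns are added later.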


# --- Budget check ---
if [ "$BUDGET" != "0" ]; then