2 changes: 1 addition & 1 deletion CLAUDE.md
Original file line number Diff line number Diff line change
@@ -32,7 +32,7 @@ make strategist # strategist (runs once, advises human)
## Daemons

Four daemons, one runs at a time (shared lockfile). Full guide: `docs/ops/DAEMON.md`
-- **Builder** (`make daemon`): picks up tasks, builds features, PRs, merges
+- **Builder** (`make daemon`): picks up tasks, builds features, PRs, merges. Includes a **healer** step between sessions that observes system health, identifies patterns, and creates follow-up tasks (`docs/prompt/healer.md`).
- **Reviewer** (`make review`): reviews code file by file, fixes quality
- **Overseer** (`make overseer`): audits task queue, fixes priorities, cleans duplicates, catches direction problems
- **Strategist** (`make strategist`): runs once, reviews big picture, produces report for human
3 changes: 3 additions & 0 deletions docs/changelog/v0.0.7.md
@@ -30,6 +30,8 @@ Security hardening for running against untrusted target repositories.

- **[security]** New-file detection in prompt self-modification guard: `save_prompt_snapshots()` now captures directory listings for watched directories (`docs/prompt/`). After a cycle, `check_prompt_integrity()` compares the listings and flags any new files the agent created. Detects both new files in existing directories and entirely new directories. New `PROMPT_GUARD_DIRS` array in `scripts/lib-agent.sh` makes it easy to add more watched directories. (task #0037)

- **[feat]** Healer -- meta-layer observer inside the daemon loop. A lightweight agent call that runs between sessions in the builder daemon, after housekeeping and before the builder starts. Reads the last handoff, session index, task queue, vision tracker, and learnings to identify patterns, trends, and upcoming problems. Writes observations to `docs/healer/log.md` and creates follow-up tasks in `docs/tasks/` using `.next-id`. Persists outputs via branch+PR+merge through `persist_healer_changes()`. Max 15 turns to stay fast and cheap. Protected by prompt guard. (`docs/prompt/healer.md`, `scripts/daemon.sh`, `scripts/lib-agent.sh`; task #0046)
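
The new-file detection in the **[security]** entry above can be illustrated with a minimal Python sketch. This is not the shell code in `scripts/lib-agent.sh`; `snapshot_listing` and `new_entries` are hypothetical names, and the sketch only assumes the behaviour the entry describes: record a listing of each watched directory before a cycle, record it again afterwards, and flag anything that appeared in between.

```python
from pathlib import Path


def snapshot_listing(root: Path, watched_dirs: list[str]) -> set[str]:
    """Return every path (file or directory) inside the watched directories.

    Hypothetical helper: paths are stored relative to the repository root.
    """
    listing: set[str] = set()
    for rel_dir in watched_dirs:
        base = root / rel_dir
        if not base.is_dir():
            continue  # a watched directory that does not exist yet contributes nothing
        listing.add(base.relative_to(root).as_posix())
        for path in base.rglob("*"):
            listing.add(path.relative_to(root).as_posix())
    return listing


def new_entries(before: set[str], after: set[str]) -> list[str]:
    """Paths present after the cycle that were absent before it."""
    return sorted(after - before)
```

Because directories are recorded as well as files, a newly created subdirectory is flagged even when it is still empty.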

## Fixed

- **[cost]** Codex/OpenAI sessions now produce non-zero cost estimates. Added pricing for gpt-5.4 ($2.50/$15.00 per MTok), gpt-5.4-mini ($0.75/$4.50), and gpt-5.4-nano ($0.20/$1.25) to `MODEL_PRICING`. `parse_session_tokens()` now handles Codex `turn.completed` events (field mapping: `cached_input_tokens` -> cache_read, input adjusted to exclude cached). `record_session()` uses `AGENT_DEFAULT_MODELS` as model fallback when logs lack model identifiers. (`nightshift/constants.py`, `nightshift/costs.py`; task #0039)
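
As a rough illustration of the field mapping in the **[cost]** entry above, the sketch below splits cached tokens out of the reported input and prices a session. The input/output rates are the ones listed in the entry. The `input_tokens` and `output_tokens` key names, the `Usage` type, and both function names are assumptions made for illustration, and the cache-read rate is left as a caller-supplied argument because the entry does not state it.

```python
from dataclasses import dataclass

# (input, output) USD per million tokens, as listed in the changelog entry.
PRICING_PER_MTOK = {
    "gpt-5.4": (2.50, 15.00),
    "gpt-5.4-mini": (0.75, 4.50),
    "gpt-5.4-nano": (0.20, 1.25),
}


@dataclass
class Usage:
    input_tokens: int       # uncached input only
    cache_read_tokens: int  # taken from cached_input_tokens
    output_tokens: int


def usage_from_turn(event_usage: dict) -> Usage:
    """Map one turn's usage dict; key names other than cached_input_tokens are assumed."""
    cached = event_usage.get("cached_input_tokens", 0)
    total_input = event_usage.get("input_tokens", 0)
    # The reported input includes cached tokens, so subtract them to avoid double counting.
    return Usage(max(total_input - cached, 0), cached, event_usage.get("output_tokens", 0))


def estimate_cost(model: str, usage: Usage, cache_read_rate: float) -> float:
    """Cost in USD; cache_read_rate is USD per million cached tokens (not given in the entry)."""
    in_rate, out_rate = PRICING_PER_MTOK[model]
    total = (
        usage.input_tokens * in_rate
        + usage.cache_read_tokens * cache_read_rate
        + usage.output_tokens * out_rate
    )
    return total / 1_000_000
```

Keeping the rates in a single table mirrors the idea of a `MODEL_PRICING` constant, so re-verifying prices only touches one place.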
@@ -46,3 +48,4 @@ Security hardening for running against untrusted target repositories.
- 19 new tests covering `read_repo_instructions()` (empty dir, single file, multiple files, missing files, empty files, content preservation, nested paths), `wrap_repo_instructions()` (empty input, whitespace, preamble/suffix ordering, behavioral warnings), and prompt injection protection in `build_prompt()` (omitted when empty, wrapped when present, old instruction removed, shift log preserved, ordering, adversarial content handling)
- 20 new tests for Codex/OpenAI cost tracking: `parse_session_tokens()` Codex format (turn.completed parsing, model_hint fallback, no-cached tokens, missing usage, mixed events, missing file with hint), `calculate_cost()` for gpt-5.4/mini/nano pricing (input, output, cache read, mixed), `record_session()` with Codex agent (default model resolution, non-zero cost), constants (pricing entries, AGENT_DEFAULT_MODELS)
- 14 new tests for handoff compaction: below threshold no-op, at/above threshold compacts, weekly summary format, decisions/issues preservation, ignores non-numbered files, duplicate weekly suffix, nonexistent/empty dir, custom threshold, weekly dir auto-creation, session range ordering, ISO week in filename
- 9 new tests for healer infrastructure: prompt file existence, required sections (handoffs, sessions, tasks, tracker, log), boundary definitions, log file existence, daemon.sh wiring (references healer, calls persist, healer before prompt guard), prompt guard inclusion, persist function definition, max-turns cap
73 changes: 73 additions & 0 deletions docs/handoffs/0028.md
@@ -0,0 +1,73 @@
# Handoff #0028
**Date**: 2026-04-05
**Version**: v0.0.7 in progress

## What I Built
- **Task #0046** (Build the Healer -- meta-layer observer inside the daemon loop): A lightweight agent call that runs between sessions in the builder daemon. Observes system health by reading the last handoff, session index, task queue, vision tracker, and learnings. Identifies patterns and trends, writes observations to `docs/healer/log.md`, and creates follow-up tasks in `docs/tasks/` when it spots issues. Persists its outputs via branch+PR+merge through `persist_healer_changes()` in `lib-agent.sh`.
- Files created: `docs/prompt/healer.md`, `docs/healer/log.md`
- Files modified: `scripts/daemon.sh`, `scripts/lib-agent.sh`, `tests/test_nightshift.py`, `docs/tasks/0046.md`
- Tests: +9 new, 659 total passing

## Decisions Made
- Healer runs only in builder daemon (not reviewer/overseer) -- those have their own observation models
- Max turns = 15 (fast, cheap -- the healer should think, not build)
- Persistence via branch+PR+merge (follows project workflow, no direct push to main)
- Healer prompt is self-contained: agent reads files itself, creates tasks using .next-id
- Added healer.md to PROMPT_GUARD_FILES so it cannot be self-modified by agents
- Healer log is committed to repo (not gitignored) so observations persist in git history

## Known Issues
- Task #0012 (Phractal re-validation) still pending -- needs API access
- v0.0.6 release not yet tagged
- Codex `.git/` sandbox issue untested
- OpenAI pricing should be re-verified periodically; rates change
- Healer has not been tested in a real daemon run yet (first real test will be next daemon cycle)

## Current State
- Loop 1: 100% (22/22)
- Loop 2: 63% (7/11) -- unchanged
- Self-Maintaining: 57% (was 54%) -- feedback loop 10% -> 40% (healer is the feedback loop)
- Meta-Prompt: 61% (was 57%) -- session history/learning 0% -> 25% (healer reads history, writes observations)
- Overall: 77% (was 76%) (weighted)
- Version: v0.0.7 in progress
- Test count: 659

## Tracker delta: 76% -> 77%

## Evaluate
Run evaluation against Phractal for the changes merged this session.

## Tasks I Did NOT Pick and Why
- #0012: blocked (environment: integration -- needs API access)
- #0018: low priority, v0.0.6 target
- #0028: blocked (environment: integration)
- #0029: blocked (environment: integration)
- #0031: normal priority -- skipped because #0046 is urgent
- #0032: environment: integration -- skipped per rules
- #0033: normal priority -- skipped because #0046 is urgent
- #0038: low priority, v0.0.8
- #0040: normal priority -- skipped because #0046 is urgent
- #0041: low priority, v0.0.8
- #0042: low priority, v0.0.8
- #0044: low priority, v0.0.8
- #0045: low priority, v0.0.8
- #0047: normal priority, v0.0.8
- #0048: urgent v0.0.8 but higher number than #0046
- #0049: normal priority, v0.0.8
- #0050: normal priority, v0.0.8
- #0051: low priority, v0.0.9
- #0052: normal priority, v0.0.8
- #0053: urgent v0.0.8 but higher number than #0046

## Next Session Should
Tasks: #0048, #0053, #0031
1. **Task #0048** (urgent) -- Human escalation channel. Wire notify_human into healer.
2. **Task #0053** (urgent) -- Agent generates its own tasks across all dimensions.
3. **Task #0031** (normal, v0.0.7) -- Task queue vision-alignment check.

## Where to Look
- `docs/prompt/healer.md` -- full healer prompt
- `docs/healer/log.md` -- observation journal (empty until first real run)
- `scripts/daemon.sh` lines 128-144 -- healer call in main loop
- `scripts/lib-agent.sh` lines 462-497 -- persist_healer_changes function
- `scripts/lib-agent.sh` line 30 -- healer.md in PROMPT_GUARD_FILES
74 changes: 50 additions & 24 deletions docs/handoffs/LATEST.md
@@ -1,47 +1,73 @@
-# Handoff #0027
-**Date**: 2026-04-04
+# Handoff #0028
+**Date**: 2026-04-05
**Version**: v0.0.7 in progress

## What I Built
-- **Task #0030** (Automated handoff compaction in daemon): New Python module `nightshift/compact.py` with `compact_handoffs()` function that auto-compacts numbered handoff files into weekly summaries when 7+ accumulate. Shell wrapper `compact_handoffs()` in `lib-agent.sh` called by all 3 looping daemon scripts (builder, reviewer, overseer) before each cycle. Parses handoff markdown to extract session numbers, dates, versions, what was built, decisions, known issues, and state. Generates weekly summaries following the exact format in `docs/handoffs/README.md`. Handles duplicate weekly filenames with letter suffixes.
-- Also merged PR #42 (README rewrite) from previous session.
-- Files created: `nightshift/compact.py`
-- Files modified: `nightshift/types.py`, `nightshift/constants.py`, `nightshift/__init__.py`, `scripts/lib-agent.sh`, `scripts/daemon.sh`, `scripts/daemon-review.sh`, `scripts/daemon-overseer.sh`, `scripts/install.sh`, `CLAUDE.md`, `tests/test_nightshift.py`
-- Tests: +14 new, 650 total passing
+- **Task #0046** (Build the Healer -- meta-layer observer inside the daemon loop): A lightweight agent call that runs between sessions in the builder daemon. Observes system health by reading the last handoff, session index, task queue, vision tracker, and learnings. Identifies patterns and trends, writes observations to `docs/healer/log.md`, and creates follow-up tasks in `docs/tasks/` when it spots issues. Persists its outputs via branch+PR+merge through `persist_healer_changes()` in `lib-agent.sh`.
+- Files created: `docs/prompt/healer.md`, `docs/healer/log.md`
+- Files modified: `scripts/daemon.sh`, `scripts/lib-agent.sh`, `tests/test_nightshift.py`, `docs/tasks/0046.md`
+- Tests: +9 new, 659 total passing

## Decisions Made
-- Compaction logic in Python (not bash) for testability and markdown parsing robustness
-- `compact.py` at same dependency level as `cleanup.py` (depends only on constants and types)
-- Threshold is a constant (`HANDOFF_COMPACTION_THRESHOLD = 7`) in constants.py
-- Weekly filename collision handled with letter suffixes (b, c, d...) matching existing pattern (week-2026-W14b.md exists)
-- Summary uses "Start/End of batch" format for progress section to preserve both endpoints
+- Healer runs only in builder daemon (not reviewer/overseer) -- those have their own observation models
+- Max turns = 15 (fast, cheap -- the healer should think, not build)
+- Persistence via branch+PR+merge (follows project workflow, no direct push to main)
+- Healer prompt is self-contained: agent reads files itself, creates tasks using .next-id
+- Added healer.md to PROMPT_GUARD_FILES so it cannot be self-modified by agents
+- Healer log is committed to repo (not gitignored) so observations persist in git history

## Known Issues
- Task #0012 (Phractal re-validation) still pending -- needs API access
- v0.0.6 release not yet tagged
- Codex `.git/` sandbox issue untested
- OpenAI pricing should be re-verified periodically; rates change
+- Healer has not been tested in a real daemon run yet (first real test will be next daemon cycle)

## Current State
- Loop 1: 100% (22/22)
- Loop 2: 63% (7/11) -- unchanged
-- Self-Maintaining: 54% (7/13) -- unchanged
-- Meta-Prompt: 57% (4/7) -- unchanged
-- Overall: 76% (weighted) -- unchanged
+- Self-Maintaining: 57% (was 54%) -- feedback loop 10% -> 40% (healer is the feedback loop)
+- Meta-Prompt: 61% (was 57%) -- session history/learning 0% -> 25% (healer reads history, writes observations)
+- Overall: 77% (was 76%) (weighted)
- Version: v0.0.7 in progress
-- Test count: 650
+- Test count: 659

+## Tracker delta: 76% -> 77%
+
## Evaluate
Run evaluation against Phractal for the changes merged this session.

+## Tasks I Did NOT Pick and Why
+- #0012: blocked (environment: integration -- needs API access)
+- #0018: low priority, v0.0.6 target
+- #0028: blocked (environment: integration)
+- #0029: blocked (environment: integration)
+- #0031: normal priority -- skipped because #0046 is urgent
+- #0032: environment: integration -- skipped per rules
+- #0033: normal priority -- skipped because #0046 is urgent
+- #0038: low priority, v0.0.8
+- #0040: normal priority -- skipped because #0046 is urgent
+- #0041: low priority, v0.0.8
+- #0042: low priority, v0.0.8
+- #0044: low priority, v0.0.8
+- #0045: low priority, v0.0.8
+- #0047: normal priority, v0.0.8
+- #0048: urgent v0.0.8 but higher number than #0046
+- #0049: normal priority, v0.0.8
+- #0050: normal priority, v0.0.8
+- #0051: low priority, v0.0.9
+- #0052: normal priority, v0.0.8
+- #0053: urgent v0.0.8 but higher number than #0046
+
## Next Session Should
-Tasks: #0029, #0031, #0033
-1. **Task #0029** (normal) -- Pin Phractal to a known-good commit for eval stability
-2. **Task #0031** (normal) -- Task queue vision-alignment check
-3. **Task #0033** (normal) -- Learnings verification in status report
+Tasks: #0048, #0053, #0031
+1. **Task #0048** (urgent) -- Human escalation channel. Wire notify_human into healer.
+2. **Task #0053** (urgent) -- Agent generates its own tasks across all dimensions.
+3. **Task #0031** (normal, v0.0.7) -- Task queue vision-alignment check.

## Where to Look
-- `nightshift/compact.py` -- entire module, compaction logic
-- `scripts/lib-agent.sh` lines 230-255 -- compact_handoffs shell wrapper
-- `scripts/daemon.sh` line 125 -- compaction call in daemon loop
-- `nightshift/constants.py` line 633 -- HANDOFF_COMPACTION_THRESHOLD
+- `docs/prompt/healer.md` -- full healer prompt
+- `docs/healer/log.md` -- observation journal (empty until first real run)
+- `scripts/daemon.sh` lines 128-144 -- healer call in main loop
+- `scripts/lib-agent.sh` lines 462-497 -- persist_healer_changes function
+- `scripts/lib-agent.sh` line 30 -- healer.md in PROMPT_GUARD_FILES
5 changes: 5 additions & 0 deletions docs/healer/log.md
@@ -0,0 +1,5 @@
# Healer Log

Observations from the meta-layer observer. Newest entries first.

---
22 changes: 22 additions & 0 deletions docs/learnings/2026-04-05-healer-persistence-needs-workflow.md
@@ -0,0 +1,22 @@
---
type: pattern
date: 2026-04-05
session: 0028
---

# Daemon sub-agent outputs need explicit persistence

When adding a lightweight agent call inside the daemon loop (like the healer),
its file outputs (task files, log entries) will be wiped by `git reset --hard
origin/main` at the start of the next cycle unless you persist them.

Options considered:
- Gitignore the output dir: works for logs but task files must be committed
- Direct push to main: violates project rules
- Let builder commit them: fragile, builder might miss untracked files
- Branch+PR+merge: follows project workflow, every step fails gracefully

The branch+PR+merge approach via `persist_healer_changes()` is the right call.
Each step (`checkout -b`, `add`, `commit`, `push`, `pr create`, `pr merge`)
returns 0 on failure so the daemon never crashes. The cost is ~6 git/gh API
calls per cycle, but only when the sub-agent actually created output.
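
The fail-soft sequence described above can be sketched in Python. This is a hedged illustration, not the shell implementation of `persist_healer_changes()`: `run_command`, `healer_steps`, `persist`, and the branch, path, and message arguments are all hypothetical. It assumes only what the learning states, namely that the six steps run in order and that a failing step ends the attempt quietly instead of crashing the daemon.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], int]


def run_command(argv: Sequence[str]) -> int:
    """Run one command and return its exit code; never raise."""
    try:
        return subprocess.run(list(argv), check=False).returncode
    except OSError:
        return 127  # e.g. git or gh is not installed


def healer_steps(branch: str, paths: Sequence[str], message: str) -> list[list[str]]:
    """The six steps named in the learning, as argv lists (arguments are illustrative)."""
    return [
        ["git", "checkout", "-b", branch],
        ["git", "add", *paths],
        ["git", "commit", "-m", message],
        ["git", "push", "-u", "origin", branch],
        ["gh", "pr", "create", "--fill"],
        ["gh", "pr", "merge", "--squash", "--delete-branch"],
    ]


def persist(steps: Sequence[Sequence[str]], run: Runner = run_command) -> bool:
    """Run steps in order; stop at the first failure and report it without raising."""
    for argv in steps:
        if run(argv) != 0:
            print("persist: stopped at:", " ".join(argv))
            return False
    return True
```

Passing the runner as a parameter keeps the sequence testable without touching a real repository, which is one way a wiring test could check the step order.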
1 change: 1 addition & 0 deletions docs/learnings/INDEX.md
@@ -26,6 +26,7 @@ Read this file FIRST. Only open individual learning files when they are relevant
- [Thread config through callers](2026-04-04-thread-config-through-callers.md) — Pass NightshiftConfig down the call chain, don't read inside builders
- [Pre-load instructions in runner not agent](2026-04-04-preload-instructions-not-agent-read.md) — Runner reads repo instruction files and injects into prompt
- [Prompt guard in shared lib](2026-04-04-prompt-guard-in-shared-lib.md) — Cross-cutting daemon concerns go in lib-agent.sh
- [Healer persistence needs workflow](2026-04-05-healer-persistence-needs-workflow.md) — Daemon sub-agent outputs need branch+PR+merge; git reset wipes uncommitted files
- [Reuse planner functions](2026-04-03-reuse-existing-functions.md) — Don't reimplement; import from existing modules
- [Code structure rules work](2026-04-03-code-structure-rules-work.md) — CLAUDE.md structure rules catch real violations at review time
- [Plan agent is simpler than cycle agent](2026-04-04-plan-agent-simpler-than-cycle.md) — Plan invocation needs fewer args than full cycle
118 changes: 118 additions & 0 deletions docs/prompt/healer.md
@@ -0,0 +1,118 @@
# Healer -- Meta-Layer Observer

You are the Healer, a lightweight observer inside the Nightshift daemon loop.
You run between sessions, after housekeeping and before the builder starts.

Your job: be the human eye on the system. Notice patterns, trends, and upcoming
problems that no single builder session would catch.

You are NOT a builder. You observe, think, create tasks. The builder fixes.

## Step 1 -- Read

Read these files to understand the current state:

1. `docs/handoffs/LATEST.md` -- What the last session built, decisions, known issues
2. `docs/sessions/index.md` -- Session history (timestamps, exit codes, costs, features)
3. `docs/tasks/*.md` -- Task queue (scan frontmatter: status, priority, created date)
4. `docs/vision-tracker/TRACKER.md` -- Progress toward the vision
5. `docs/learnings/INDEX.md` -- Hard-won knowledge (one-line summaries)
6. `docs/healer/log.md` -- Your previous observations (read to avoid repeating yourself)

Do NOT read the full codebase. Do NOT read every learning file. Skim headers.

## Step 2 -- Think

Ask yourself these questions. Connect dots across sessions:

- **What just happened?** Did the last session build something useful or spin its wheels?
- **What is the trend?** Are sessions getting slower, more expensive, or repetitive?
Look at the last 5+ entries in the session index.
- **What is being avoided?** Are hard tasks aging while easy ones get picked?
Compare task creation dates to what is actually getting done.
- **What is about to break?** Is the task queue running dry? Are docs drifting
from reality? Is a pattern forming that will cause failures?
- **Is the system getting better or worse?** More merged PRs? Fewer failures?
Lower cost per feature? Or regressing?

Think in trends, not point failures. "Test count wrong in handoff" is a point
failure. "Builder has shipped 3 sessions without updating the tracker -- numbers
are drifting and future sessions will start from wrong baselines" is a trend.

## Step 3 -- Write observations

Append a new entry to `docs/healer/log.md` at the END of the file:

```
## YYYY-MM-DD -- Session [last-session-id]

**System health:** [good / caution / concern]

### Observations
- [Specific observation with evidence]
- [Another observation]

### Actions taken
- Created task #NNNN: [title]
- [Or: No tasks needed this cycle]
```

Be specific. "Costs are trending up" is weak. "Last 3 sessions averaged $X.XX
vs $Y.YY the prior 3, driven by longer turn counts" is useful.

## Step 4 -- Create tasks (if needed)

Only create tasks for issues you found. Max 3 tasks per run.

**How to create a task:**
1. Read `docs/tasks/.next-id` for the next number
2. Create `docs/tasks/NNNN.md` with the format below
3. Increment .next-id and write it back
4. Check existing pending tasks first -- no duplicates

**Task format:**
```markdown
---
status: pending
priority: [low / normal / urgent]
target: v0.0.8
created: YYYY-MM-DD
completed:
---

# [Title]

[What and why. Include evidence from your observations.]

## Acceptance Criteria
- [ ] Criterion 1
- [ ] Criterion 2
```
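
For readers who want to see the task-creation steps as code, here is a minimal Python sketch. The healer itself is an agent following the prose steps, so this only illustrates the same sequence: read `.next-id`, write the zero-padded task file in the format shown, then increment the counter. The function name `create_task` and its parameters are hypothetical, and the duplicate check against existing pending tasks is left out.

```python
from datetime import date
from pathlib import Path
from typing import Optional, Sequence

TASK_TEMPLATE = """---
status: pending
priority: {priority}
target: {target}
created: {created}
completed:
---

# {title}

{body}

## Acceptance Criteria
{criteria}
"""


def create_task(
    tasks_dir: Path,
    title: str,
    body: str,
    criteria: Sequence[str],
    priority: str = "normal",
    target: str = "v0.0.8",
    today: Optional[date] = None,
) -> Path:
    """Create the next numbered task file and bump .next-id (hypothetical helper)."""
    next_id_file = tasks_dir / ".next-id"
    task_id = int(next_id_file.read_text().strip())
    task_path = tasks_dir / f"{task_id:04d}.md"
    if task_path.exists():
        raise FileExistsError(f"{task_path} already exists; .next-id is stale")
    checklist = "\n".join(f"- [ ] {item}" for item in criteria)
    task_path.write_text(
        TASK_TEMPLATE.format(
            priority=priority,
            target=target,
            created=(today or date.today()).isoformat(),
            title=title,
            body=body,
            criteria=checklist,
        )
    )
    # Write the counter only after the task file exists, so a failed write never skips an ID.
    next_id_file.write_text(f"{task_id + 1}\n")
    return task_path
```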

**Priority guide:**
- `urgent`: System is degrading NOW (repeated failures, cost spike, drift)
- `normal`: Pattern that should be addressed within a few sessions
- `low`: Nice-to-have improvement

## Boundaries

**DO** create tasks for meta-layer issues:
- Daemon scripts, prompts, task system, handoff format
- Cost trends, session patterns, velocity
- Documentation drift, stale learnings
- Task queue health (stale tasks, queue running dry)

**DO NOT:**
- Create tasks for nightshift/*.py code improvements (builder's job)
- Modify any code, prompts, or configuration files directly
- Create PRs or branches
- Run tests or make check
- Create tasks for things already tracked as pending tasks

## Step 5 -- Report

End with exactly this format:

```
HEALER: [N] observations, [N] tasks created. System health: [good/caution/concern].
```
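
Because the report line has a fixed shape, a caller could parse it mechanically. The following is a sketch under the assumption that the report is the last non-empty line of the agent's output; `REPORT_RE`, `HealerReport`, and `parse_report` are hypothetical names, and nothing in this PR states that the daemon parses the line this way.

```python
import re
from typing import NamedTuple, Optional

# Mirrors the exact report format given in Step 5.
REPORT_RE = re.compile(
    r"^HEALER: (\d+) observations, (\d+) tasks created\. "
    r"System health: (good|caution|concern)\.$"
)


class HealerReport(NamedTuple):
    observations: int
    tasks_created: int
    health: str


def parse_report(output: str) -> Optional[HealerReport]:
    """Parse the last non-empty line of the output; return None if it does not match."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    if not lines:
        return None
    match = REPORT_RE.match(lines[-1])
    if match is None:
        return None
    return HealerReport(int(match.group(1)), int(match.group(2)), match.group(3))
```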
2 changes: 1 addition & 1 deletion docs/tasks/.next-id
@@ -1 +1 @@
-54
+57
4 changes: 2 additions & 2 deletions docs/tasks/0046.md
@@ -1,9 +1,9 @@
---
-status: pending
+status: done
priority: urgent
target: v0.0.8
created: 2026-04-05
-completed:
+completed: 2026-04-05
---

# Build the Healer — meta-layer observer inside the daemon loop