Merged
2 changes: 2 additions & 0 deletions docs/changelog/v0.0.7.md
@@ -34,6 +34,8 @@ Security hardening for running against untrusted target repositories.

- **[feat]** Human escalation channel (`notify_human`). When the daemon hits a situation requiring human attention (circuit breaker tripped, budget limit reached, healer critical pattern), it creates a GitHub issue with the `needs-human` label and optionally fires a webhook. Configurable via `notification_webhook` in `.nightshift.json`. Wired into the circuit breakers of all 3 looping daemons, the builder budget stop, and the healer prompt. Fails silently -- never crashes the daemon. Adds a `needs-human` label to the repo. (`scripts/lib-agent.sh`: `notify_human()`; `scripts/daemon.sh`, `scripts/daemon-review.sh`, `scripts/daemon-overseer.sh`; `nightshift/types.py`, `nightshift/constants.py`, `nightshift/config.py`; task #0048)

- **[meta]** Generate Work step (Step 6n) in evolve.md. After each session, the agent now scans the system across 7 dimensions (meta pipeline, code quality, repo health, architecture, agent DX, vision progress, security) and creates 1-5 follow-up tasks. Prevents the agent from being a passive task runner -- it actively identifies work. Enforces dimension diversity, duplicate checking, specific acceptance criteria, and a per-session cap of 5 tasks. Moves Meta-Prompt "Priority engine" from 0% to 50%. (task #0053)
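The escalation behaviour in the `notify_human` entry can be sketched in Python. The real function is bash in `scripts/lib-agent.sh`; this version only mirrors what the changelog and handoff describe: a `gh issue create` with the `needs-human` label, a `[Nightshift]` title prefix, an optional `{"text": ...}` webhook, and every failure swallowed. The helper names and the `run`/`post` injection points are hypothetical, added so the fail-silent path can be exercised without a network.

```python
import json
import subprocess
import urllib.request
from typing import Callable, Optional, Sequence

Runner = Callable[[Sequence[str]], None]
Poster = Callable[[str, bytes], None]


def build_issue_title(summary: str) -> str:
    # Prefix every escalation so it is easy to filter in the issue list.
    return f"[Nightshift] {summary}"


def build_webhook_payload(summary: str, details: str) -> bytes:
    # {"text": ...} is the payload shape Slack-style incoming webhooks accept.
    return json.dumps({"text": f"{build_issue_title(summary)}\n{details}"}).encode("utf-8")


def run_gh(cmd: Sequence[str]) -> None:
    subprocess.run(list(cmd), check=True, capture_output=True, timeout=60)


def post_json(url: str, body: bytes) -> None:
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(request, timeout=10):
        pass


def notify_human(
    summary: str,
    details: str,
    webhook_url: Optional[str] = None,
    run: Runner = run_gh,
    post: Poster = post_json,
) -> bool:
    """Escalate to a human; never raise. Returns True if any channel succeeded."""
    delivered = False
    try:
        run(["gh", "issue", "create", "--title", build_issue_title(summary),
             "--body", details, "--label", "needs-human"])
        delivered = True
    except Exception:
        pass  # gh missing, auth failure, network error: swallow, like `|| true`
    if webhook_url:
        try:
            post(webhook_url, build_webhook_payload(summary, details))
            delivered = True
        except Exception:
            pass  # webhook failures must not crash the daemon either
    return delivered
```

Returning a boolean instead of raising keeps the `|| true` contract: the caller can log whether anything was delivered, but the daemon never stops because of it.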

## Fixed

- **[cost]** Codex/OpenAI sessions now produce non-zero cost estimates. Added pricing for gpt-5.4 ($2.50/$15.00 per MTok), gpt-5.4-mini ($0.75/$4.50), and gpt-5.4-nano ($0.20/$1.25) to `MODEL_PRICING`. `parse_session_tokens()` now handles Codex `turn.completed` events (field mapping: `cached_input_tokens` -> cache_read; input tokens are adjusted to exclude cached tokens). `record_session()` falls back to `AGENT_DEFAULT_MODELS` for the model when logs lack a model identifier. (`nightshift/constants.py`, `nightshift/costs.py`; task #0039)
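A minimal sketch of the Codex token mapping and cost arithmetic described above. It assumes each `turn.completed` event carries a `usage` object with `input_tokens`, `cached_input_tokens`, and `output_tokens`, and that `input_tokens` includes the cached portion. The function names and the `AGENT_DEFAULT_MODELS` contents are hypothetical, and the cache-read rate is a parameter because no cache-read pricing is given here.

```python
import json
from typing import Dict, Iterable, Mapping, Optional, Tuple

# USD per million tokens as (input, output), taken from the changelog entry.
MODEL_PRICING: Dict[str, Tuple[float, float]] = {
    "gpt-5.4": (2.50, 15.00),
    "gpt-5.4-mini": (0.75, 4.50),
    "gpt-5.4-nano": (0.20, 1.25),
}

# Hypothetical fallback table: which model to assume when a log names none.
AGENT_DEFAULT_MODELS: Dict[str, str] = {"codex": "gpt-5.4"}


def parse_codex_tokens(lines: Iterable[str]) -> Dict[str, int]:
    """Sum token usage over Codex `turn.completed` JSON events."""
    totals = {"input": 0, "cache_read": 0, "output": 0}
    for line in lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore non-JSON log lines
        if not isinstance(event, dict) or event.get("type") != "turn.completed":
            continue
        usage = event.get("usage") or {}
        cached = int(usage.get("cached_input_tokens", 0))
        totals["cache_read"] += cached
        # Assumes input_tokens includes cached tokens, so remove them from input.
        totals["input"] += int(usage.get("input_tokens", 0)) - cached
        totals["output"] += int(usage.get("output_tokens", 0))
    return totals


def estimate_cost(
    tokens: Mapping[str, int],
    model: Optional[str],
    agent: str = "codex",
    cache_read_per_mtok: float = 0.0,  # placeholder: no cache-read rate is given here
) -> float:
    """Return an estimated USD cost, falling back to the agent's default model."""
    resolved = model or AGENT_DEFAULT_MODELS.get(agent)
    if resolved not in MODEL_PRICING:
        return 0.0  # unknown model: no estimate rather than a wrong one
    input_rate, output_rate = MODEL_PRICING[resolved]
    return (
        tokens["input"] * input_rate
        + tokens["output"] * output_rate
        + tokens["cache_read"] * cache_read_per_mtok
    ) / 1_000_000
```

As a worked check: 1,000,000 input tokens (400,000 of them cached) plus 100,000 output tokens on gpt-5.4 gives 600,000 × $2.50/MTok + 100,000 × $15.00/MTok = $1.50 + $1.50 = $3.00, with cache reads priced at zero.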
74 changes: 74 additions & 0 deletions docs/handoffs/0030.md
@@ -0,0 +1,74 @@
# Handoff #0030
**Date**: 2026-04-05
**Version**: v0.0.7 in progress

## What I Built
- **Task #0053** (Agent generates its own tasks across all dimensions): Added Step 6n "Generate Work (ALWAYS)" to `docs/prompt/evolve.md`. After each session, the agent now scans the system across 7 dimensions (meta pipeline, code quality, repo health, architecture, agent DX, vision progress, security) and creates 1-5 new tasks. Includes constraints: max 5 per session, duplicate checking, dimension diversity, specific acceptance criteria required, honest priority. Also updated the Step 12 report template to include a "Generated tasks" section.
- Files modified: `docs/prompt/evolve.md`
- Generated tasks this session: #0057 (task queue summary command), #0058 (task file frontmatter validator)

## Decisions Made
- Placed the new step as Step 6n (subsection of "Update Every Document") rather than a new top-level step to avoid renumbering all subsequent steps and breaking cross-references in CLAUDE.md and daemon scripts
- Used a table format for the 7 dimensions to make scanning fast for LLMs
- Set max 5 tasks per session -- enough to be useful, not enough to flood the queue
- Required spanning at least 2 different dimensions to prevent tunnel vision
- Priority engine marked at 50% (not 100%) because the current implementation is prompt-based; a true priority engine would also analyze historical patterns programmatically

## Known Issues
- Task #0012 (Phractal re-validation) still pending -- needs API access
- v0.0.6 release not yet tagged
- Codex `.git/` sandbox issue untested
- OpenAI pricing should be re-verified periodically; rates change
- Healer has not been tested in a real daemon run yet
- `notify_human` has not been tested with a live webhook
- Tasks #0024 and #0036 have malformed YAML frontmatter (missing/broken status field) -- #0058 would catch this
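The malformed frontmatter in #0024 and #0036 is the kind of problem #0058 proposes to catch. A minimal sketch, assuming task files open with a `---` block of flat `key: value` lines and that the valid statuses are the ones used elsewhere in this PR (`pending`, `in-progress`, `blocked`, `done`, `archived`). It is a line-based check, not a YAML parser, and `frontmatter_problems` is a hypothetical name.

```python
import re
from typing import Dict, List, Optional

# Statuses used across this PR's task files and handoffs.
VALID_STATUSES = {"pending", "in-progress", "blocked", "done", "archived"}
KEY = re.compile(r"[a-z_]+")


def frontmatter_problems(text: str) -> List[str]:
    """Return problems with a task file's frontmatter (empty list means OK)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return ["missing opening '---'"]
    end: Optional[int] = next(
        (i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---"), None
    )
    if end is None:
        return ["missing closing '---'"]
    fields: Dict[str, str] = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        # Only plain `key: value` lines count; `## status: done` is not a field.
        if sep and KEY.fullmatch(key.strip()):
            fields[key.strip()] = value.strip()
    status = fields.get("status")
    if status is None:
        return ["missing 'status' field"]
    if status not in VALID_STATUSES:
        return [f"invalid status {status!r}"]
    return []
```

A line such as `## status: done` is deliberately not treated as a field, so a file like #0036 would be reported as missing its status.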

## Current State
- Loop 1: 100% (22/22)
- Loop 2: 63% (7/11) -- unchanged
- Self-Maintaining: 59% -- unchanged
- Meta-Prompt: 68% (was 61%) -- Priority engine 0% -> 50%
- Overall: 78% (was 77%)
- Version: v0.0.7 in progress
- Test count: 663

## Tracker delta: 77% -> 78% (Meta-Prompt 61% -> 68%)

## Evaluate
Run evaluation against Phractal for the changes merged this session.

## Tasks I Did NOT Pick and Why
- #0012: blocked (environment: integration -- needs API access)
- #0018: low priority, v0.0.6 target
- #0028: blocked (environment: integration)
- #0029: blocked (environment: integration)
- #0031: normal priority, v0.0.7 -- task #0053 was urgent, took precedence
- #0032: environment: integration -- skipped per rules
- #0033: normal priority, v0.0.7 -- task #0053 was urgent, took precedence
- #0036: appears already done (malformed frontmatter shows `## status: done`)
- #0038: low priority, v0.0.8
- #0040: normal priority, v0.0.7 -- task #0053 was urgent, took precedence
- #0041: low priority, v0.0.8
- #0042: low priority, v0.0.8
- #0044: low priority, v0.0.8
- #0045: low priority, v0.0.8
- #0047: normal priority, v0.0.8
- #0049: normal priority, v0.0.8
- #0050: normal priority, v0.0.8
- #0051: low priority, v0.0.9
- #0052: normal priority, v0.0.8
- #0054: normal priority, v0.0.8
- #0055: low priority, v0.0.8
- #0056: low priority, v0.0.8

## Next Session Should
Tasks: #0031, #0033, #0040
1. **Task #0031** (normal, v0.0.7) -- Task queue vision-alignment check. Prevents consecutive tasks from all targeting the same vision section.
2. **Task #0033** (normal, v0.0.7) -- Learnings verification. Require agents to quote specific learnings in status reports.
3. **Task #0040** (normal, v0.0.7) -- Create CONTRIBUTING.md for agent-to-agent collaboration.

## Where to Look
- `docs/prompt/evolve.md` lines ~283-330 -- new Step 6n "Generate Work"
- `docs/prompt/evolve.md` lines ~448-450 -- updated report template with "Generated tasks"
- `docs/tasks/0057.md` -- generated task: queue summary command
- `docs/tasks/0058.md` -- generated task: frontmatter validator
59 changes: 28 additions & 31 deletions docs/handoffs/LATEST.md
@@ -1,37 +1,38 @@
# Handoff #0029
# Handoff #0030
**Date**: 2026-04-05
**Version**: v0.0.7 in progress

## What I Built
- **Task #0048** (Human escalation channel -- gh issue create + optional webhook): Added `notify_human()` function to `scripts/lib-agent.sh` that creates GitHub issues with the `needs-human` label and optionally fires a webhook. Wired into all 3 looping daemon circuit breakers, builder budget stop, and healer prompt (Step 5 escalation for critical patterns). Added `notification_webhook` to `NightshiftConfig`, `DEFAULT_CONFIG`, `config.py`, and `.nightshift.json.example`. Created `needs-human` label on the GitHub repo. Documented in `docs/ops/DAEMON.md`.
- Files modified: `scripts/lib-agent.sh`, `scripts/daemon.sh`, `scripts/daemon-review.sh`, `scripts/daemon-overseer.sh`, `nightshift/types.py`, `nightshift/constants.py`, `nightshift/config.py`, `nightshift/profiler.py`, `.nightshift.json.example`, `docs/prompt/healer.md`, `docs/ops/DAEMON.md`, `tests/test_nightshift.py`
- Tests: +4 new, 663 total passing
- **Task #0053** (Agent generates its own tasks across all dimensions): Added Step 6n "Generate Work (ALWAYS)" to `docs/prompt/evolve.md`. After each session, the agent now scans the system across 7 dimensions (meta pipeline, code quality, repo health, architecture, agent DX, vision progress, security) and creates 1-5 new tasks. Includes constraints: max 5 per session, duplicate checking, dimension diversity, specific acceptance criteria required, honest priority. Also updated the Step 12 report template to include a "Generated tasks" section.
- Files modified: `docs/prompt/evolve.md`
- Generated tasks this session: #0057 (task queue summary command), #0058 (task file frontmatter validator)

## Decisions Made
- `notify_human` fails silently (all calls wrapped in `|| true`) -- daemon stability is more important than notification delivery
- GitHub issue title prefixed with `[Nightshift]` for easy filtering
- Webhook payload uses `{"text": "..."}` format (compatible with Slack, Discord, and most webhook services)
- Healer only escalates for "concern" health status -- issues fixable by builder tasks should NOT trigger escalation
- Wired into reviewer and overseer circuit breakers too, not just builder
- Placed the new step as Step 6n (subsection of "Update Every Document") rather than a new top-level step to avoid renumbering all subsequent steps and breaking cross-references in CLAUDE.md and daemon scripts
- Used a table format for the 7 dimensions to make scanning fast for LLMs
- Set max 5 tasks per session -- enough to be useful, not enough to flood the queue
- Required spanning at least 2 different dimensions to prevent tunnel vision
- Priority engine marked at 50% (not 100%) because the current implementation is prompt-based; a true priority engine would also analyze historical patterns programmatically

## Known Issues
- Task #0012 (Phractal re-validation) still pending -- needs API access
- v0.0.6 release not yet tagged
- Codex `.git/` sandbox issue untested
- OpenAI pricing should be re-verified periodically; rates change
- Healer has not been tested in a real daemon run yet (first real test will be next daemon cycle)
- `notify_human` has not been tested with a live webhook -- works silently without one
- Healer has not been tested in a real daemon run yet
- `notify_human` has not been tested with a live webhook
- Tasks #0024 and #0036 have malformed YAML frontmatter (missing/broken status field) -- #0058 would catch this

## Current State
- Loop 1: 100% (22/22)
- Loop 2: 63% (7/11) -- unchanged
- Self-Maintaining: 59% (was 57%) -- feedback loop 40% -> 60% (notify_human closes the escalation path)
- Meta-Prompt: 61% -- unchanged
- Overall: 77% (weighted, unchanged -- Self-Maintaining delta too small to move the rounded total)
- Self-Maintaining: 59% -- unchanged
- Meta-Prompt: 68% (was 61%) -- Priority engine 0% -> 50%
- Overall: 78% (was 77%)
- Version: v0.0.7 in progress
- Test count: 663

## Tracker delta: 77% -> 77% (Self-Maintaining 57% -> 59%)
## Tracker delta: 77% -> 78% (Meta-Prompt 61% -> 68%)

## Evaluate
Run evaluation against Phractal for the changes merged this session.
@@ -41,12 +42,12 @@ Run evaluation against Phractal for the changes merged this session.
- #0018: low priority, v0.0.6 target
- #0028: blocked (environment: integration)
- #0029: blocked (environment: integration)
- #0031: normal priority, v0.0.7 -- skipped because #0048 is urgent
- #0031: normal priority, v0.0.7 -- task #0053 was urgent, took precedence
- #0032: environment: integration -- skipped per rules
- #0033: normal priority -- skipped because #0048 is urgent
- #0036: pending -- not reviewed this session
- #0033: normal priority, v0.0.7 -- task #0053 was urgent, took precedence
- #0036: appears already done (malformed frontmatter shows `## status: done`)
- #0038: low priority, v0.0.8
- #0040: normal priority -- skipped because #0048 is urgent
- #0040: normal priority, v0.0.7 -- task #0053 was urgent, took precedence
- #0041: low priority, v0.0.8
- #0042: low priority, v0.0.8
- #0044: low priority, v0.0.8
@@ -56,22 +57,18 @@ Run evaluation against Phractal for the changes merged this session.
- #0050: normal priority, v0.0.8
- #0051: low priority, v0.0.9
- #0052: normal priority, v0.0.8
- #0053: urgent v0.0.8 -- next highest priority after #0048
- #0054: normal priority, v0.0.8
- #0055: low priority, v0.0.8
- #0056: low priority, v0.0.8

## Next Session Should
Tasks: #0053, #0031, #0033
1. **Task #0053** (urgent) -- Agent generates its own tasks across all dimensions. Add a "generate work" step to evolve.md.
2. **Task #0031** (normal, v0.0.7) -- Task queue vision-alignment check. Prevents all tasks targeting the same section.
3. **Task #0033** (normal) -- whatever the task description says.
Tasks: #0031, #0033, #0040
1. **Task #0031** (normal, v0.0.7) -- Task queue vision-alignment check. Prevents consecutive tasks from all targeting the same vision section.
2. **Task #0033** (normal, v0.0.7) -- Learnings verification. Require agents to quote specific learnings in status reports.
3. **Task #0040** (normal, v0.0.7) -- Create CONTRIBUTING.md for agent-to-agent collaboration.

## Where to Look
- `scripts/lib-agent.sh` lines 502-530 -- `notify_human()` function
- `scripts/daemon.sh` -- circuit breaker (line ~280) and budget stop (line ~262) call `notify_human`
- `scripts/daemon-review.sh`, `scripts/daemon-overseer.sh` -- circuit breaker calls
- `docs/prompt/healer.md` Step 5 -- healer escalation instructions
- `docs/ops/DAEMON.md` "Human Escalation" section
- `nightshift/types.py` line 31 -- `notification_webhook` field
- `nightshift/config.py` -- validation for notification_webhook
- `docs/prompt/evolve.md` lines ~283-330 -- new Step 6n "Generate Work"
- `docs/prompt/evolve.md` lines ~448-450 -- updated report template with "Generated tasks"
- `docs/tasks/0057.md` -- generated task: queue summary command
- `docs/tasks/0058.md` -- generated task: frontmatter validator
9 changes: 9 additions & 0 deletions docs/learnings/2026-04-05-generate-work-placement.md
@@ -0,0 +1,9 @@
---
type: optimization
date: 2026-04-05
session: 0030
---

# New evolve.md steps go INSIDE Step 6 as subsections, not as new top-level steps

When adding a new capability to the evolve prompt (like "generate work"), inserting it as a new top-level step (Step 13, Step 14, etc.) forces renumbering all subsequent steps and breaks cross-references in CLAUDE.md, daemon scripts, and the autonomous override prompt. Instead, add it as a subsection under Step 6 (e.g., 6n, 6o). This keeps the step numbering stable while still making the new capability visible and mandatory. The Step 6 section is "Update Every Document" -- any per-session administrative action fits naturally here.
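One way to detect the breakage this learning describes is to check that every `Step N` reference in another file still points at a heading in `evolve.md`. A minimal sketch, assuming top-level steps are written `## STEP 7 ...` and subsections `### 6n. ...` (the two heading styles visible in this PR); the function names are hypothetical.

```python
import re
from typing import List, Set

TOP_LEVEL = re.compile(r"^##\s+STEP\s+(\d+)\b", re.IGNORECASE | re.MULTILINE)
SUB_STEP = re.compile(r"^###\s+(\d+[a-z])\.", re.MULTILINE)
REFERENCE = re.compile(r"\bStep\s+(\d+[a-z]?)\b")


def defined_steps(prompt: str) -> Set[str]:
    """Collect step labels defined as headings, e.g. {'6', '6n', '7'}."""
    return set(TOP_LEVEL.findall(prompt)) | set(SUB_STEP.findall(prompt))


def dangling_references(prompt: str, other: str) -> List[str]:
    """Return step labels referenced in `other` that no heading defines."""
    defined = defined_steps(prompt)
    return sorted({ref for ref in REFERENCE.findall(other) if ref not in defined})
```

Running this over CLAUDE.md and the daemon scripts after an edit to `evolve.md` would surface a renumbering mistake before an agent follows a stale reference.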
1 change: 1 addition & 0 deletions docs/learnings/INDEX.md
@@ -18,6 +18,7 @@ Read this file FIRST. Only open individual learning files when they are relevant
- [Task selection is mesa-optimization](2026-04-04-task-selection-mesa-optimization.md) — Agent optimizes session success over project progress; queue order is authoritative, handoff is advisory
- [Merge strategy: --merge never --squash](2026-04-03-merge-never-squash.md) — Always --merge --admin, preserve all commits on main
- [Turn budget kills good sessions](2026-04-03-turn-budget-kills-sessions.md) — 500 max turns = silent death mid-work; keep context lean
- [New evolve steps go inside Step 6](2026-04-05-generate-work-placement.md) — Add as subsection (6n, 6o) to avoid renumbering and breaking cross-references
- [Open PR recovery](2026-04-03-open-pr-recovery.md) — Daemon detects open PRs from crashed sessions and recovers them

## Code Patterns
45 changes: 45 additions & 0 deletions docs/prompt/evolve.md
@@ -233,6 +233,7 @@ Write `docs/handoffs/NNNN.md` (increment from the last number). Follow the exact

**Required sections in every handoff:**
- "Tracker delta: XX% -> XX%" (makes project progress visible)
- "Generated tasks: [list #NNNN titles, or 'none']" (from Step 6n — what work you identified)
- "Tasks I did NOT pick and why:" (skip accountability — list every pending task you read and chose not to build, with the reason)
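The required handoff sections can be checked mechanically. A sketch, assuming a case-insensitive substring match is enough (handoff #0030 writes `## Tasks I Did NOT Pick and Why` and reports generated tasks as `Generated tasks this session:`) and that the delta uses the `XX% -> XX%` form. The function names are hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Lower-cased markers for the sections every handoff must contain.
REQUIRED_MARKERS = ("tracker delta:", "generated tasks", "tasks i did not pick")
TRACKER_DELTA = re.compile(r"tracker delta:\s*(\d+)%\s*->\s*(\d+)%", re.IGNORECASE)


def missing_sections(handoff: str) -> List[str]:
    """Return the required markers that do not appear anywhere in the handoff."""
    text = handoff.lower()
    return [marker for marker in REQUIRED_MARKERS if marker not in text]


def parse_tracker_delta(handoff: str) -> Optional[Tuple[int, int]]:
    """Extract (before, after) percentages, or None if the delta is not numeric."""
    match = TRACKER_DELTA.search(handoff)
    return (int(match.group(1)), int(match.group(2))) if match else None
```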

### 6c. Changelog (ALWAYS except docs-only changes)
@@ -286,6 +287,46 @@ Check `docs/ops/OPERATIONS.md` version milestones:
- If yes: prepare for release (tag, changelog status, new version file)
- If no: note in the handoff what's still needed

### 6n. Generate Work (ALWAYS)

You are not a task runner. You are the engineer who owns this system. Before ending the session, step back and look at the system from every angle. Create 1-5 new tasks based on what you observe.

**How to scan:**
1. Read the vision tracker. What sections are furthest behind? What would move the percentage?
2. Scan `docs/sessions/index.md`, the last 3-5 entries. Any repeating patterns or stuck areas?
3. Think about friction you hit THIS session. What slowed you down? What was confusing?
4. Think about the meta layer. Are prompts bloated? Are handoffs useful? Is the task system working?
5. Scan for TODOs, hacks, or weak spots in any code you touched.

**Dimensions to consider** (create tasks across different ones, not all the same type):

| Dimension | Example questions |
|---|---|
| Meta / autonomous pipeline | Daemon reliability? Prompt staleness? Cost trending? Sessions stuck in patterns? |
| Code quality | Modules too big? Functions untested? Loose types? Dead code? Cryptic errors? |
| Repo health | CI speed? Dependency freshness? Test coverage drift? Flaky tests? Doc accuracy? |
| Architecture | Circular deps? Module tangles? Abstractions earning their keep? Config bloat? |
| Agent DX | CLAUDE.md accurate? Learnings applied? Handoff format effective? Cold-start speed? |
| Vision progress | Low-hanging tracker items? Blocked items unblockable? Avoided areas? |
| Security / robustness | Edge cases that crash? Input validation gaps? Auto-merge exploitable? Secrets exposed? |

**Constraints:**
- **Max 5 tasks per session.** Quality over quantity. Do not flood the queue.
- **Check for duplicates first.** Scan all pending tasks in `docs/tasks/`. If a task already covers your idea, skip it or update the existing task instead.
- **Span multiple dimensions.** If you create 3 tasks, they should not all be "code quality." Spread across at least 2 different dimensions.
- **Specific acceptance criteria required.** "Improve error handling" is not a task. "Add structured error types to config.py with specific messages for each validation failure" is.
- **Honest priority.** Not everything is urgent. Most generated tasks are `normal` or `low`.
- **Use `.next-id`** for task numbering (same as always -- read, use, increment, commit).
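The constraints above can be expressed as a small validator. A sketch under stated assumptions: the dimension keys are shortened from the table, the spread rule only applies when two or more tasks are created (a single task cannot span two dimensions), and duplicate detection is a crude exact-title match standing in for the judgment the prompt asks for. `GeneratedTask` and `check_generated` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Iterable, List

MAX_TASKS = 5
# Shortened keys for the seven dimensions in the table above.
DIMENSIONS = {
    "meta", "code quality", "repo health", "architecture",
    "agent dx", "vision progress", "security",
}
PRIORITIES = {"urgent", "normal", "low"}


@dataclass
class GeneratedTask:
    title: str
    dimension: str
    priority: str
    acceptance_criteria: List[str] = field(default_factory=list)


def check_generated(tasks: List[GeneratedTask], existing_titles: Iterable[str]) -> List[str]:
    """Return human-readable constraint violations (empty list means OK)."""
    problems: List[str] = []
    if len(tasks) > MAX_TASKS:
        problems.append(f"{len(tasks)} tasks exceeds the cap of {MAX_TASKS}")
    # A single task cannot span two dimensions, so only check spread for 2+.
    if len(tasks) >= 2 and len({t.dimension for t in tasks}) < 2:
        problems.append("all tasks share one dimension")
    existing = {title.strip().lower() for title in existing_titles}
    for task in tasks:
        if task.dimension not in DIMENSIONS:
            problems.append(f"{task.title!r} has unknown dimension {task.dimension!r}")
        if task.priority not in PRIORITIES:
            problems.append(f"{task.title!r} has unknown priority {task.priority!r}")
        if not task.acceptance_criteria:
            problems.append(f"{task.title!r} has no acceptance criteria")
        # Crude stand-in for the duplicate check: exact title match only.
        if task.title.strip().lower() in existing:
            problems.append(f"{task.title!r} duplicates a pending task")
    return problems
```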

**Output in the session:**
```
GENERATED TASKS
===============
#NNNN: [title] (dimension: [which], priority: [level])
#NNNN: [title] (dimension: [which], priority: [level])
...or "No new tasks -- queue already covers what I observed."
```

## STEP 7 — PRE-PUSH CHECKLIST

Before touching git, read `docs/ops/PRE-PUSH-CHECKLIST.md` and run through every item. This is mandatory. Answer each item honestly. If anything fails, fix it before proceeding. Output your checklist results:
@@ -442,6 +483,10 @@ Manual test suggestion:

Tracker delta: [XX% -> XX%] (or "no change" if cleanup only)

Generated tasks:
- #NNNN: [title] (dimension: [which])
...or "No new tasks"

Tasks I did NOT pick and why:
- #NNNN: [reason — blocked-environment, blocked-dependency, or explicit justification]

2 changes: 1 addition & 1 deletion docs/tasks/.next-id
@@ -1 +1 @@
57
59
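The bump from 57 to 59 corresponds to allocating two IDs (#0057 and #0058) under the read, use, increment rule from Step 6n. A minimal sketch, assuming `.next-id` holds a bare integer and IDs are zero-padded to four digits as in `docs/tasks/0057.md`. Committing the updated file is left to git, and the function names are hypothetical.

```python
from pathlib import Path
from typing import List, Tuple


def allocate(next_id: int, count: int) -> Tuple[List[str], int]:
    """Return `count` zero-padded IDs starting at next_id, plus the new next ID."""
    if count < 0:
        raise ValueError("count must be non-negative")
    ids = [f"{n:04d}" for n in range(next_id, next_id + count)]
    return ids, next_id + count


def allocate_from_file(path: Path, count: int) -> List[str]:
    """Read .next-id, reserve `count` IDs, and write the incremented value back."""
    ids, new_next = allocate(int(path.read_text().strip()), count)
    path.write_text(f"{new_next}\n")
    return ids
```

Keeping the arithmetic in a pure `allocate` function makes it testable without touching the real file.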
4 changes: 2 additions & 2 deletions docs/tasks/0053.md
@@ -1,9 +1,9 @@
---
status: pending
status: done
priority: urgent
target: v0.0.8
created: 2026-04-05
completed:
completed: 2026-04-05
---

# Agent generates its own tasks across all dimensions
35 changes: 35 additions & 0 deletions docs/tasks/0057.md
@@ -0,0 +1,35 @@
---
status: pending
priority: low
target: v0.0.8
created: 2026-04-05
completed:
---

# Task queue summary command -- `make tasks` or `scripts/list-tasks.sh`

Every builder session starts by scanning `docs/tasks/` to find pending work. This currently requires a custom bash loop or reading files one by one. A dedicated script would save 2-3 minutes per session and reduce the chance of missing a task.

## What to build

A `scripts/list-tasks.sh` script (and/or `make tasks` target) that outputs a formatted table of all pending/blocked/in-progress tasks, sorted by priority then number. Include: task number, status, priority, target version, environment tag, and title.

Example output:
```
TASK QUEUE
==========
0053 [pending] urgent v0.0.8 Agent generates its own tasks
0031 [pending] normal v0.0.7 Task queue vision-alignment check
0033 [pending] normal v0.0.7 Learnings verification
0040 [pending] normal v0.0.7 Create CONTRIBUTING.md
0012 [blocked] normal v0.0.4 integr. Re-validate against Phractal
...
```

## Acceptance Criteria
- [ ] `scripts/list-tasks.sh` exists and runs without errors
- [ ] Output sorted by priority (urgent > normal > low) then by task number
- [ ] Shows status, priority, target, environment, and title
- [ ] Skips done/archived tasks
- [ ] `make tasks` target added to Makefile
- [ ] Works with zero tasks (empty queue message)
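A sketch of the ordering and filtering logic the acceptance criteria ask for, written in Python rather than as the requested shell script. It assumes task files are named `NNNN.md`, use flat `key: value` frontmatter, may carry an `environment` field (as the `environment: integration` tags in the handoffs suggest), and take their title from the first `# ` heading. All names are hypothetical.

```python
from pathlib import Path
from typing import Dict, List, Optional

PRIORITY_ORDER = {"urgent": 0, "normal": 1, "low": 2}
SHOWN_STATUSES = {"pending", "blocked", "in-progress"}  # done/archived are skipped


def parse_task(number: str, text: str) -> Optional[Dict[str, str]]:
    """Parse flat `key: value` frontmatter plus the first `# ` heading as title."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None
    fields = {"number": number}
    i = 1
    while i < len(lines) and lines[i].strip() != "---":
        key, sep, value = lines[i].partition(":")
        if sep:
            fields[key.strip()] = value.strip()
        i += 1
    for line in lines[i + 1:]:
        if line.startswith("# "):
            fields["title"] = line[2:].strip()
            break
    return fields


def format_queue(tasks: List[Dict[str, str]]) -> str:
    active = [t for t in tasks if t.get("status") in SHOWN_STATUSES]
    header = ["TASK QUEUE", "=========="]
    if not active:
        return "\n".join(header + ["(empty queue: no pending, blocked, or in-progress tasks)"])
    # Unknown priorities sort last; ties break on task number.
    active.sort(key=lambda t: (PRIORITY_ORDER.get(t.get("priority", ""), len(PRIORITY_ORDER)),
                               int(t["number"])))
    rows = [
        f"{t['number']} [{t['status']}] {t.get('priority', '?'):<7} "
        f"{t.get('target', '?'):<7} {t.get('environment', ''):<12} {t.get('title', '(untitled)')}"
        for t in active
    ]
    return "\n".join(header + rows)


def load_tasks(task_dir: Path) -> List[Dict[str, str]]:
    tasks = []
    for path in sorted(task_dir.glob("[0-9][0-9][0-9][0-9].md")):
        parsed = parse_task(path.stem, path.read_text())
        if parsed is not None:
            tasks.append(parsed)
    return tasks
```

Called as `print(format_queue(load_tasks(Path("docs/tasks"))))`, this would produce a table in the shape of the example output; a `make tasks` target could wrap the same call.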