Closed
61 changes: 61 additions & 0 deletions .recursive/strategy/2026-04-09.md
@@ -0,0 +1,61 @@
# Strategy Report -- Session #0139

**Date**: 2026-04-09
**Period analyzed**: Sessions #0122-#0139
**Queue**: 67 pending, 1 blocked urgent task (`#0277`) after the task edits in this branch
**Eval**: last merged 89/100 in session #0134; latest branch-local rerun in #0139 reached 78/100 but stayed unmerged
**Autonomy**: 85/100 | **Tests**: 1222 | **Vision**: 92% overall

## Why the tracker is flat

The visible tracker is flat for two separate reasons.

1. The tracker file itself is stale. `.recursive/vision-tracker/TRACKER.md` still says "Last updated: 2026-04-06" and still shows Self-Maintaining at 68% with four 0% components. That means the dashboard view has not been recalculated to reflect the work done in sessions #0122-#0139.
2. The recent sessions mostly moved runtime hygiene, eval, and queue state instead of the four missing automation components that actually move Self-Maintaining: auto-release, auto-changelog, auto-tracker, and auto-CLAUDE.md. The visible 0% items are still untouched.

The net effect is that session volume is high, but the work is concentrated in areas that do not advance the remaining tracker gaps.
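
A staleness check on the tracker stamp could make the first cause visible automatically. The following is a minimal sketch, assuming the tracker carries a `Last updated: YYYY-MM-DD` line as quoted above; the function name `tracker_staleness_days` is hypothetical and not part of the existing tooling.

```python
import re
from datetime import date
from typing import Optional

# Matches the stamp quoted in the report, e.g. "Last updated: 2026-04-06".
LAST_UPDATED = re.compile(r"Last updated:\s*(\d{4}-\d{2}-\d{2})")


def tracker_staleness_days(tracker_text: str, today: date) -> Optional[int]:
    """Return days elapsed since the tracker stamp, or None if no stamp is found."""
    match = LAST_UPDATED.search(tracker_text)
    if match is None:
        return None
    return (today - date.fromisoformat(match.group(1))).days


# Illustrative input only; the real file is .recursive/vision-tracker/TRACKER.md.
sample = "# Vision Tracker\n\nLast updated: 2026-04-06\n"
print(tracker_staleness_days(sample, date(2026, 4, 9)))  # 3
```

A planner could compare the returned value against a threshold of its choosing and flag the tracker for a rewrite before reporting on it.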

## What is working

1. Non-build roles are now real instead of theoretical. Session #0122 delegated OVERSEE, #0123 delegated STRATEGIZE, and later sessions also used AUDIT and SECURITY (`.recursive/decisions/log.md:98-204`). That directly invalidates the older "never uses oversee" complaint.
2. Eval quality improved materially. Session #0132 reran Phractal at 84/100, session #0134 raised it to 89/100, and auto-clone validation succeeded along the way (`.recursive/commitments/log.md:128-141`).
3. Queue triage works when invoked. Session #0122 cut the queue from 77 to 69, and session #0128 cut it from 72 to 63 (`.recursive/decisions/log.md:98-138`).
4. The stop rule is protecting the repo from unsafe churn. Session #0139 stopped after two failed review cycles instead of grinding into a third repair loop (`.recursive/handoffs/LATEST.md:8-10, 23-31`).

## What is failing

1. The eval rerun path is blocked, not merely stale. Session #0139 produced PR #274, two branch-local eval reports, and then hit the stop rule because the PR failed review twice and was closed unmerged. The latest session index also shows the 65-minute #0139 run ending in failure (`.recursive/handoffs/LATEST.md:8-25`, `.recursive/sessions/index.md:102-106`).
2. The remaining tracker delta is concentrated in components that are not being attacked. `Auto-release`, `Auto-changelog update`, `Auto-tracker update`, and `Auto-CLAUDE.md update` are still at 0% in the tracker (`.recursive/vision-tracker/TRACKER.md:75-94`).
3. Queue churn still offsets gains. Session #0136 added 5 new security tasks and pushed the queue up by 4, and session #0137 added 3 more follow-up tasks even though all review cycles passed (`.recursive/commitments/log.md:148-156`).
4. The human-facing "flat tracker" symptom is real because the tracker file is not being rewritten on the same cadence as the sessions. The report history exists, but the tracker snapshot does not move with it (`.recursive/vision-tracker/TRACKER.md:3-5, 128-138`).

## What is missing

1. Auto-release, auto-changelog, auto-tracker, and auto-CLAUDE.md are still unstarted in the tracker (`.recursive/vision-tracker/TRACKER.md:90-94`).
2. Feedback ingestion is still 0% in the meta-prompt section (`.recursive/vision-tracker/TRACKER.md:98-114`).
3. There is no explicit "blocked eval vs stale eval" state. Right now the dashboard can say an eval is stale, but it does not stop the brain from treating a blocked eval path as if it were a measurement problem.
4. The queue has no enforced creation cap yet. The observed pattern is still "fixes create follow-ups faster than the queue is being retired" (`.recursive/tasks/0225.md:14-29`).

## Cost and efficiency

1. Session #0139 spent 65 minutes and $10.50 and ended blocked, which is a poor cost-to-progress ratio for a measurement session (`.recursive/sessions/index.md:106`).
2. The review cascade around #0277 consumed two review cycles, produced eval artifacts that exist only on the unmerged branch, and still left the core blocker unresolved (`.recursive/handoffs/LATEST.md:8-25`).
3. Security and hardening sessions are still cost-effective when they close real defects, but they also create follow-up volume. Session #0136 is the clearest example: the security scan found 4 CONFIRMED and 4 THEORETICAL findings and created 5 tasks (`.recursive/commitments/log.md:182-186`).

## Health assessment

Operational health is mixed.

The good news is that the stop rule and review process are functioning: unsafe loops are being halted, and the system is no longer pretending a blocked PR is ready. The bad news is that the system is still spending real session time on work that does not move the tracker snapshot or restore the eval loop.

## Task assessment

- `#0225`: valid, partially satisfied. Overseer has proven it can trim the queue, but the queue-trend / task-limit behavior is not yet durable.
- `#0226`: partially satisfied. The "never uses oversee/strategize/achieve" claim is no longer true, but the scheduling behavior is still ad hoc enough that it should not be treated as fully closed.
- `#0228`: should be superseded. The sessions_since_eval signal and rerun behavior now exist, but the remaining blocker is the nested-Claude eval path captured in `#0277`.

## Ranked 2-3 session plan

1. **Queue triage session**: run OVERSEE on runtime-state tasks first. Close or supersede stale follow-ups, especially the queue-hygiene family around `#0225`, and record the before/after queue count. This is the fastest way to make the next report show real movement instead of flat churn.
2. **Blocked eval fix session**: work the narrower successor to `#0277`. Keep the fix scoped to the eval path, make the temp runtime handling symlink-safe, sanitize the child subprocess environment, and add a real Claude Code fallback regression. This is the highest-leverage unblock.
3. **Measurement or autonomy session**: if the eval fix lands and there is a fresh code delta, rerun Phractal immediately. If not, spend the session on autonomy work so the planner can distinguish "stale eval with delta" from "blocked eval path with no delta" and stop recommending wasted reruns.
2 changes: 1 addition & 1 deletion .recursive/tasks/.next-id
@@ -1 +1 @@
-279
+282
8 changes: 6 additions & 2 deletions .recursive/tasks/0228.md
@@ -1,10 +1,10 @@
---
-status: pending
+status: done
priority: normal
target:
created: 2026-04-08
source: github-issue-209
-completed:
+completed: 2026-04-09
---

# Brain never re-runs eval after building nightshift
@@ -25,3 +25,7 @@ Add a rule to brain.md: after every 3 build sessions, the next session MUST incl
- Dashboard alerts when eval is 3+ sessions stale
- Brain includes eval rerun in its delegation when the alert fires
- Eval score trend is visible in the dashboard

## Note

Superseded by `#0242` for the signal/alert work and `#0277` for the remaining Claude Code eval-path blocker.
26 changes: 26 additions & 0 deletions .recursive/tasks/0279.md
@@ -0,0 +1,26 @@
---
status: pending
priority: normal
target: v0.0.9
vision_section: self-maintaining
created: 2026-04-09
source: strategy-report-0139
completed:
---

# Triage the runtime-state queue and close superseded follow-ups

## Problem

The queue is still carrying runtime-state follow-ups that have already been functionally covered by later sessions or merged PRs. Session #0122 proved OVERSEE can reduce the queue, but the queue rebounds because stale follow-ups stay open and duplicate scope is not consolidated.

## Fix

Run an OVERSEE pass focused on `.recursive/tasks/` and close or supersede tasks whose acceptance criteria are already satisfied by sessions #0122-#0139. Keep the remaining open items narrow: one root cause, one owner, one next step.

## Acceptance Criteria

- [ ] Tasks fully covered by later sessions or PRs are marked `done` with a superseded note
- [ ] Remaining open tasks in the #0225-#0228 family have an explicit next action
- [ ] Queue before/after counts are recorded in the handoff
- [ ] No duplicate stale follow-ups remain open for the same root cause
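
The before/after counts could be taken directly from the task front matter. This is a minimal sketch, assuming every task file in `.recursive/tasks/` is a `*.md` file that opens with a `---` front matter block containing a `status:` line, as the task files in this change do. The helper names `front_matter_status` and `queue_counts` are hypothetical.

```python
import re
from collections import Counter
from pathlib import Path
from typing import Optional

# Only spaces/tabs around the value, so an empty "status:" never spills into the next line.
STATUS_LINE = re.compile(r"^status:[ \t]*(\S+)[ \t]*$", re.MULTILINE)


def front_matter_status(text: str) -> Optional[str]:
    """Read `status:` from the leading front matter block only, ignoring the body."""
    if not text.startswith("---"):
        return None
    parts = text.split("---", 2)  # ["", front matter, body]
    if len(parts) < 3:
        return None
    match = STATUS_LINE.search(parts[1])
    return match.group(1) if match else None


def queue_counts(task_dir: Path) -> Counter:
    """Count task files by status; files without a readable status count as 'unknown'."""
    counts: Counter = Counter()
    for path in sorted(task_dir.glob("*.md")):
        status = front_matter_status(path.read_text(encoding="utf-8"))
        counts[status or "unknown"] += 1
    return counts


if __name__ == "__main__":
    tasks = Path(".recursive/tasks")
    if tasks.is_dir():
        print(dict(queue_counts(tasks)))
```

Running it once before the OVERSEE pass and once after gives the two numbers the handoff needs; `.next-id` is skipped because it does not match `*.md`.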
27 changes: 27 additions & 0 deletions .recursive/tasks/0280.md
@@ -0,0 +1,27 @@
---
status: pending
priority: urgent
target: v0.0.9
vision_section: self-maintaining
created: 2026-04-09
source: strategy-report-0139
completed:
---

# Unblock the Claude eval path with a narrower eval-runner fix

## Problem

Task `#0277` is blocked after two failed review cycles. The blocker is no longer "run another eval" but the nested-Claude execution path: the child subprocess needs symlink-safe temp handling, a sanitized environment, and a real end-to-end fallback regression before another scorable rerun is worth attempting.

## Fix

Scope the fix to the eval path only. Keep the shared CLI surface unchanged unless the fallback requires a clearly justified runner-level change. Make the child runtime safe, minimize inherited environment variables, and cover the real Claude Code fallback path with regression tests.

## Acceptance Criteria

- [ ] `nightshift test --agent claude --cycles 2 --cycle-minutes 5` completes from a Claude Code shell
- [ ] Temp runtime handling is symlink-safe and ownership-safe
- [ ] The child eval subprocess inherits a minimal, explicit environment
- [ ] Regression coverage proves the real fallback path works end-to-end
- [ ] The rerun yields a scorable report instead of stopping after two agent failures
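
The two safety criteria above can be illustrated in isolation from the runner. This is a minimal sketch under stated assumptions, not nightshift's actual eval runner: the allowlist `ALLOWED_ENV` and the helpers `minimal_env`, `make_private_runtime_dir`, and `run_child` are hypothetical names, and the variables a real Claude Code child needs may differ.

```python
import os
import shutil
import stat
import subprocess
import tempfile

# Hypothetical allowlist; the real set of required variables is an assumption.
ALLOWED_ENV = ("PATH", "HOME", "LANG", "TMPDIR")


def minimal_env(parent, allowed=ALLOWED_ENV):
    """Build an explicit child environment instead of inheriting everything."""
    return {key: parent[key] for key in allowed if key in parent}


def make_private_runtime_dir(prefix="eval-runtime-"):
    """Create a fresh directory and verify it is a real, caller-owned directory."""
    path = tempfile.mkdtemp(prefix=prefix)  # accessible only by the creating user
    info = os.lstat(path)  # lstat does not follow symlinks
    if not stat.S_ISDIR(info.st_mode):
        raise RuntimeError(f"{path} is not a plain directory")
    if hasattr(os, "getuid") and info.st_uid != os.getuid():
        raise RuntimeError(f"{path} is not owned by the current user")
    return path


def run_child(argv, parent_env):
    """Run argv inside a private runtime dir with a minimal environment."""
    runtime_dir = make_private_runtime_dir()
    try:
        return subprocess.run(
            argv,
            env=minimal_env(parent_env),
            cwd=runtime_dir,
            capture_output=True,
            text=True,
            check=False,
        )
    finally:
        shutil.rmtree(runtime_dir, ignore_errors=True)
```

Creating a fresh directory with `tempfile.mkdtemp` avoids reusing a predictable path that another process could have replaced with a symlink, and passing `env=` explicitly means nothing from the parent Claude Code shell leaks into the child unless it is on the allowlist.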
26 changes: 26 additions & 0 deletions .recursive/tasks/0281.md
@@ -0,0 +1,26 @@
---
status: pending
priority: normal
target: v0.0.9
vision_section: meta-prompt
created: 2026-04-09
source: strategy-report-0139
completed:
---

# Teach the planner to distinguish stale eval from blocked eval

## Problem

The dashboard can report that the last merged eval is stale, but the planner still lacks a first-class distinction between "stale with usable delta" and "blocked eval path with no delta." That gap risks wasting another session on a rerun that cannot measure anything.

## Fix

Add planner or signal logic that treats a blocked eval path as a different state from a stale eval. When there is no new `nightshift/` delta since the last merged eval, the next session should route to queue triage or autonomy work instead of automatically re-running Phractal.

## Acceptance Criteria

- [ ] Planner / dashboard state distinguishes `stale_eval_with_delta` from `eval_blocked_no_delta`
- [ ] The next session does not recommend a fresh eval rerun when there is no code delta and the eval path is blocked
- [ ] Queue triage or autonomy work is recommended instead in that state
- [ ] The planner still recommends a fresh eval rerun once a real code delta exists again
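
The state split in the criteria above can be sketched as a small pure function. This is an illustration under assumptions, not the planner's real logic: the inputs `sessions_since_eval`, `has_code_delta`, and `eval_path_blocked` are assumed to be available as signals, the default threshold of 3 follows the "3+ sessions stale" rule quoted in `#0228`, and the third state `eval_current` plus the recommendation strings are hypothetical.

```python
STALE_EVAL_WITH_DELTA = "stale_eval_with_delta"
EVAL_BLOCKED_NO_DELTA = "eval_blocked_no_delta"
EVAL_CURRENT = "eval_current"  # hypothetical name for "nothing to do"


def classify_eval_state(sessions_since_eval, has_code_delta, eval_path_blocked, stale_after=3):
    """Separate 'nothing new to measure and the path is broken' from 'worth rerunning'."""
    if eval_path_blocked and not has_code_delta:
        return EVAL_BLOCKED_NO_DELTA
    if has_code_delta and sessions_since_eval >= stale_after:
        return STALE_EVAL_WITH_DELTA
    return EVAL_CURRENT


def recommend_next_session(state):
    """Map a state to a session type; the labels are illustrative only."""
    if state == STALE_EVAL_WITH_DELTA:
        return "rerun eval"
    if state == EVAL_BLOCKED_NO_DELTA:
        return "queue triage or autonomy work"
    return "continue planned work"


state = classify_eval_state(sessions_since_eval=5, has_code_delta=False, eval_path_blocked=True)
print(state, "->", recommend_next_session(state))
```

In this sketch a code delta always wins over the blocked flag, which mirrors the last criterion: once a real delta exists again, the rerun is recommended.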