4 changes: 2 additions & 2 deletions .recursive/tasks/0077.md
@@ -1,10 +1,10 @@
---
status: wontfix
status: done
priority: normal
target: v0.0.8
vision_section: none
created: 2026-04-05
completed: 2026-04-06
completed: 2026-04-09
---

# Fix dependency flow overlap between code-reviewer and architecture-reviewer
8 changes: 6 additions & 2 deletions .recursive/tasks/0078.md
@@ -1,10 +1,10 @@
---
status: pending
status: done
priority: normal
target: v0.0.8
vision_section: none
created: 2026-04-05
completed:
completed: 2026-04-09
---

# Add live integration test gate to PR review pipeline
@@ -21,3 +21,7 @@ Source: code review advisory note on PR #63.
- [ ] Step provides examples of what "real test" means for different PR types (Python module, shell script, config change)
- [ ] The test is proportional to the PR -- small PRs get a simple smoke test, large PRs get more thorough verification
- [ ] The builder runs the test and includes the output in the PR body or merge commit

## Closed by OVERSEE — Obsolete

This task references "evolve.md Step 8", which does not exist: the current `evolve/SKILL.md` has 6 steps (Step 6 is PR + Merge + Handoff). The task also references the "multi-agent review panel" from PR #63, which was replaced by the unified review architecture (PR #107). The concept of running a live integration test before merge remains valid, but this task's specific instructions target nonexistent infrastructure. If the integration test gate is still wanted, it should be filed as a new task targeting the current `build/SKILL.md` review step with updated context.
4 changes: 2 additions & 2 deletions .recursive/tasks/0080.md
@@ -1,10 +1,10 @@
---
status: wontfix
status: done
priority: low
target: v0.0.9
vision_section: meta-prompt
created: 2026-04-05
completed:
completed: 2026-04-09
---

# Unify path-based file categorization across modules
4 changes: 2 additions & 2 deletions .recursive/tasks/0107.md
@@ -1,10 +1,10 @@
---
status: wontfix
status: done
priority: normal
target: v0.0.9
vision_section: self-maintaining
created: 2026-04-05
completed:
completed: 2026-04-09
---

# Activate reviewer daemon cadence and durable review indexing
4 changes: 2 additions & 2 deletions .recursive/tasks/0111.md
@@ -1,11 +1,11 @@
---
status: wontfix
status: done
priority: low
target: v0.0.8
vision_section: none
created: 2026-04-05
source: pr-86-review
completed:
completed: 2026-04-09
---

# Module map should distinguish late imports from hard dependencies
4 changes: 2 additions & 2 deletions .recursive/tasks/0115.md
@@ -1,10 +1,10 @@
---
status: wontfix
status: done
priority: low
target: v0.0.9
vision_section: self-maintaining
created: 2026-04-05
completed:
completed: 2026-04-09
source: pr-88-review
---

2 changes: 1 addition & 1 deletion .recursive/tasks/0119.md
@@ -1,5 +1,5 @@
---
status: wontfix
status: done
priority: normal
target: v0.0.8
vision_section: none
6 changes: 6 additions & 0 deletions .recursive/tasks/0122.md
@@ -19,4 +19,10 @@ README contracts.
- [ ] A validation check fails if README advertises a bare `nightshift` command without a shipped console script
- [ ] README config guidance is checked against `.recursive.json`, or the README is generated/sourced so that drift cannot happen silently
- [ ] README snapshot values are generated from canonical sources or validated against them so tracker/test/module counts cannot silently rot
- [ ] `scripts/validate-docs.sh` (or equivalent in `make check`) fails when the latest handoff, tracker, and README disagree on shared snapshot values such as test counts or section percentages (merged from #0124)
- [ ] Shared snapshot values are sourced from canonical data or validated together instead of being maintained independently
- [ ] The new README consistency rule runs in an existing verification path such as `make check` or `bash scripts/validate-docs.sh`

## Note

Task #0124 (validate tracker and handoff snapshots alongside README) was merged into this task by OVERSEE on 2026-04-09. Both cover doc snapshot validation and belong in the same PR.
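
A minimal sketch of the kind of cross-document check the criteria above describe, assuming each document states its test count in a phrase like `812 tests`. The pattern, the function names, and the sample inputs are hypothetical; the actual check in `scripts/validate-docs.sh` or `make check` may be structured quite differently.

```python
from __future__ import annotations

import re

# Hypothetical pattern: assumes every document reports its count as "<N> tests".
TEST_COUNT_RE = re.compile(r"(\d+)\s+tests\b")


def extract_test_count(text: str) -> int | None:
    """Return the first '<N> tests' value found in text, or None if absent."""
    match = TEST_COUNT_RE.search(text)
    return int(match.group(1)) if match else None


def find_snapshot_mismatches(docs: dict[str, str]) -> list[str]:
    """Return one 'name: count' line per document when the counts disagree.

    An empty list means every document reports the same value.
    """
    counts = {name: extract_test_count(body) for name, body in docs.items()}
    if len(set(counts.values())) <= 1:
        return []
    return [f"{name}: {count}" for name, count in sorted(counts.items())]
```

A caller would pass the README, tracker, and latest handoff contents keyed by filename and fail the check when the returned list is non-empty. The same shape could be repeated for section percentages or module counts.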
8 changes: 6 additions & 2 deletions .recursive/tasks/0124.md
@@ -1,10 +1,10 @@
---
status: pending
status: done
priority: normal
target: v0.0.9
vision_section: self-maintaining
created: 2026-04-05
completed:
completed: 2026-04-09
---

# Validate tracker and handoff snapshot values alongside README snapshots
@@ -19,3 +19,7 @@ tracker snapshot values first-class consistency targets.
- [ ] `scripts/validate-docs.sh` (or an equivalent check in `make check`) fails when the latest handoff, tracker, and README disagree on shared snapshot values such as test counts or section percentages
- [ ] Shared snapshot values are sourced from canonical data or validated together instead of being maintained independently
- [ ] Regression coverage exists for at least one stale handoff/tracker/README snapshot mismatch

## Closed by OVERSEE — Merged into #0122

Task #0122 (README consistency checks) already covers the same scope: validating that docs agree on shared snapshot values. #0124 adds tracker and handoff snapshot values to the validation surface, but this is naturally included in the same validation pass. A builder implementing #0122 should extend the validation to cover tracker and handoff snapshots (as described in #0124's AC) in the same PR rather than a separate one.
4 changes: 2 additions & 2 deletions .recursive/tasks/0127.md
@@ -1,10 +1,10 @@
---
status: wontfix
status: done
priority: low
target: v0.0.9
vision_section: none
created: 2026-04-05
completed:
completed: 2026-04-09
---

# Keep task-frontmatter parsing aligned across queue scripts
4 changes: 2 additions & 2 deletions .recursive/tasks/0129.md
@@ -1,11 +1,11 @@
---
status: wontfix
status: done
priority: normal
target: v0.0.9
vision_section: meta-prompt
created: 2026-04-05
source: github-issue-102
completed: 2026-04-06
completed: 2026-04-09
---

# Task queue cap: stop generating tasks when 50+ pending
4 changes: 2 additions & 2 deletions .recursive/tasks/0134.md
@@ -1,11 +1,11 @@
---
status: wontfix
status: done
priority: low
target: v0.0.9
vision_section: self-maintaining
created: 2026-04-05
source: review-pr-106-docs
completed:
completed: 2026-04-09
---

# Keep module-map key symbols aligned with verifier-related helper changes
6 changes: 6 additions & 0 deletions .recursive/tasks/0162.md
@@ -16,3 +16,9 @@ This means fix quality is scored against a superset of what was counted in disco
## Acceptance Criteria
- [ ] `score_discovery` counts fixes from rejected cycles even when some accepted cycles also exist, OR the behavior is explicitly documented as intentional with a comment explaining the tradeoff
- [ ] A regression test covers a mixed accepted+rejected run and verifies the fix counts are consistent between discovery and fix-quality scorers
- [ ] A test covers `_extract_cycle_fixes` with an accepted cycle where `fixes: []` and verifies it returns `[]` without falling through to `cycle_result` data (merged from #0163)
- [ ] Test suite remains green after additions

## Note

Task #0163 (empty fixes list test) was merged into this task by OVERSEE on 2026-04-09. Both tests belong in the same PR targeting the scoring module.
8 changes: 6 additions & 2 deletions .recursive/tasks/0163.md
@@ -1,10 +1,10 @@
---
status: pending
status: done
priority: low
target: v0.0.9
vision_section: loop1
created: 2026-04-06
completed:
completed: 2026-04-09
---

# Missing test for accepted cycle with empty fixes list in _extract_cycle_fixes
@@ -14,3 +14,7 @@ Code review for PR #158 flagged a missing test case: when a cycle has `fixes: []
## Acceptance Criteria
- [ ] A test in `TestScoreFixQuality` or a dedicated `TestExtractCycleFixes` class covers an accepted cycle with `fixes: []` and verifies `_extract_cycle_fixes` returns `[]` (not falling through to any `cycle_result` data)
- [ ] Test suite remains green after the addition

## Closed by OVERSEE — Merged into #0162

Both #0162 and #0163 are small test additions for the scoring module (`score_discovery`/`score_fix_quality`/`_extract_cycle_fixes`). They address the same PR #158 code review and would naturally be completed in one PR. The `_extract_cycle_fixes` empty-fixes test case (#0163) should be added alongside the mixed accepted+rejected run test (#0162 AC) when a builder picks #0162.
6 changes: 6 additions & 0 deletions .recursive/tasks/0173.md
@@ -26,6 +26,12 @@ to existing files.

- [ ] `nightshift/scripts/run.sh` and `nightshift/scripts/test.sh` are added to `PROMPT_GUARD_FILES`
in `Recursive/engine/lib-agent.sh`
- [ ] `.recursive.json` is added to `PROMPT_GUARD_FILES` in `Recursive/engine/lib-agent.sh`
(merged from #0196 — defense-in-depth against eval_frequency/verify_command tampering)
- [ ] `bash -n Recursive/engine/lib-agent.sh` passes
- [ ] Existing prompt-guard test coverage still passes
- [ ] `make check` passes

## Note

Task #0196 (add .recursive.json to PROMPT_GUARD_FILES) was merged into this task by OVERSEE on 2026-04-09. Both are single-line additions to the same array in `lib-agent.sh`.
8 changes: 8 additions & 0 deletions .recursive/tasks/0174.md
@@ -19,5 +19,13 @@ a `text` block or a `tool_result`) does NOT trigger a false positive.
- [ ] Add a regression test: a log where "not logged in" appears in a
non-`result`, non-`agent_message` event body returns exit code 1
from `is_auth_failure`
- [ ] Add a regression test: a log file with a corrupt non-JSON line followed by a valid
`type:result` auth-failure event returns exit code 0 from `is_auth_failure`
(covers malformed-JSON handling from merged task #0175)
- [ ] The test class `TestAuthFailureDetection` in `test_nightshift.py`
is the right place (PR #168 context)

## Note

Task #0175 (malformed JSON test) was merged into this task by OVERSEE on 2026-04-09.
Both test cases belong in the same PR against `TestAuthFailureDetection`.
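
The two regression cases can be illustrated with a small sketch. The `is_auth_failure` below is a hypothetical Python stand-in that follows the exit-code convention used above (0 means an auth failure was detected, 1 means it was not). The event field names `result` and `text` are assumptions, and the real function may live elsewhere and parse the log differently.

```python
import json


def is_auth_failure(log_text: str) -> int:
    """Hypothetical stand-in: return 0 if an auth failure is found, else 1."""
    for line in log_text.splitlines():
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # Skip corrupt, non-JSON lines and keep scanning.
        if not isinstance(event, dict):
            continue
        # Only result and agent_message events count; other event bodies
        # (text blocks, tool results) must not cause a false positive.
        if event.get("type") not in ("result", "agent_message"):
            continue
        body = str(event.get("result") or event.get("text") or "")
        if "not logged in" in body.lower():
            return 0
    return 1


# Case from this task: the phrase appears in a non-result event body.
non_result_log = json.dumps({"type": "tool_result", "text": "user is not logged in"})

# Case merged from #0175: a corrupt line precedes a valid auth-failure result.
corrupt_then_result_log = "{corrupt line\n" + json.dumps(
    {"type": "result", "result": "Not logged in"}
)
```

In `TestAuthFailureDetection`, the first log would be expected to yield 1 and the second 0.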
8 changes: 6 additions & 2 deletions .recursive/tasks/0175.md
@@ -1,10 +1,10 @@
---
status: pending
status: done
priority: normal
target: v0.0.9
vision_section: self-maintaining
created: 2026-04-06
completed:
completed: 2026-04-09
---

# Add test: is_auth_failure handles malformed JSON lines before a valid auth result
@@ -18,3 +18,7 @@ to skip malformed lines. This path has no explicit regression test.
`type:result` auth-failure event returns exit code 0 from `is_auth_failure`
- [ ] The test class `TestAuthFailureDetection` in `test_nightshift.py`
is the right place (PR #168 context)

## Closed by OVERSEE — Merged into #0174

Both #0174 and #0175 add tests to `TestAuthFailureDetection` in `test_nightshift.py` and will naturally be done in the same PR. The malformed-JSON test case (#0175 AC) is a straightforward additional test case that belongs alongside the non-result-event test case (#0174 AC). A builder picking #0174 should include the #0175 test case in the same commit.
6 changes: 6 additions & 0 deletions .recursive/tasks/0179.md
@@ -37,4 +37,10 @@ def test_crlf_eval_file_accepted(self) -> None:
## Acceptance Criteria

- [ ] `TestIsValidEvalFile` has a CRLF test that passes
- [ ] In `read_latest_eval_score()` in `pick-role.py`, when `_is_valid_eval_file()` rejects a file, print a warning to stderr including the filename (merged from #0180)
- [ ] A rejected eval file triggers the warning; existing rejection behavior (returns None) is unchanged
- [ ] `make check` passes

## Note

Task #0180 (log warning on `_is_valid_eval_file()` rejection) was merged into this task by OVERSEE on 2026-04-09. Both touch the same function and originated from the same PR #170 review.
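
A rough sketch of how the merged criteria fit together, assuming a simplified validity rule. Both functions here are hypothetical stand-ins: the real `_is_valid_eval_file()` and `read_latest_eval_score()` in `pick-role.py` have their own rules and signatures, and the `score:` line format is assumed purely for illustration.

```python
import sys
from typing import Optional


def _is_valid_eval_file(text: str) -> bool:
    """Hypothetical rule: a valid eval file contains a 'score:' line."""
    # splitlines() treats both LF and CRLF as line endings, so a file
    # saved with Windows line endings is not rejected.
    return any(line.startswith("score:") for line in text.splitlines())


def read_eval_score(filename: str, text: str) -> Optional[float]:
    """Return the score, or None (with a stderr warning) if the file is invalid."""
    if not _is_valid_eval_file(text):
        # Name the rejected file so the rejection is visible in logs.
        print(f"WARNING: ignoring invalid eval file: {filename}", file=sys.stderr)
        return None
    for line in text.splitlines():
        if line.startswith("score:"):
            return float(line.split(":", 1)[1])
    return None
```

The rejection path still returns `None`, so existing callers are unaffected; the only new behavior is the warning on stderr.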
8 changes: 6 additions & 2 deletions .recursive/tasks/0180.md
@@ -1,11 +1,11 @@
---
status: pending
status: done
priority: low
target: v0.0.8
vision_section: self-maintaining
created: 2026-04-06
source: review-pr-170
completed:
completed: 2026-04-09
---

# Log warning when _is_valid_eval_file() rejects an eval file in pick-role.py
@@ -37,3 +37,7 @@ if not _is_valid_eval_file(text):
- [ ] The warning appears in the ROLE DECISION stderr block visible in daemon logs
- [ ] Existing `TestEvalFileValidation` tests still pass (rejection still returns None)
- [ ] `make check` passes

## Closed by OVERSEE — Merged into #0179

Both #0179 (CRLF regression test for `_is_valid_eval_file()`) and #0180 (log warning when `_is_valid_eval_file()` rejects) touch the same function in `pick-role.py`. They are both follow-ups from the same PR #170 review and will naturally be done in a single PR. A builder picking #0179 should add the stderr warning from #0180 in the same change.
7 changes: 6 additions & 1 deletion .recursive/tasks/0196.md
@@ -1,9 +1,10 @@
---
status: pending
status: done
priority: low
target: v0.0.9
vision_section: self-maintaining
created: 2026-04-07
completed: 2026-04-09
source: pentest
---

@@ -28,3 +29,7 @@ Still worth guarding for defense-in-depth.
- [ ] `bash -n Recursive/engine/lib-agent.sh` passes
- [ ] Existing prompt-guard tests still pass
- [ ] `make check` passes

## Closed by OVERSEE — Merged into #0173

Both #0173 (add scripts/run.sh and scripts/test.sh to PROMPT_GUARD_FILES) and #0196 (add .recursive.json to PROMPT_GUARD_FILES) make static additions to the `PROMPT_GUARD_FILES` array in `lib-agent.sh`. They are one-line changes in the same array and will naturally be done in a single PR. A builder picking #0173 should add `.recursive.json` at the same time.
8 changes: 6 additions & 2 deletions .recursive/tasks/0230.md
@@ -1,10 +1,10 @@
---
status: pending
status: done
priority: low
target:
created: 2026-04-08
completed: 2026-04-09
source: pr-230-review
completed:
---

# Keep _DELEGATION_ROLE_MAP in sync with available sub-agents
@@ -15,3 +15,7 @@ Meta-reviewer advisory note on PR #230: if the brain ever delegates a new sub-ag
- [ ] When a new sub-agent type is added to `.recursive/agents/`, `_DELEGATION_ROLE_MAP` is documented as a required update in the new-agent checklist or OPERATIONS.md
- [ ] Alternatively, add a test that verifies all agent definition files in `.recursive/agents/` have a corresponding entry in `_DELEGATION_ROLE_MAP`
- [ ] `make check` passes

## Closed by OVERSEE — Low value / speculative

`_DELEGATION_ROLE_MAP` currently covers all 8 delegatable agent types (build, review, oversee, strategize, achieve, security-check, evolve, audit). The system has been stable with these 8 roles for the entire v2 era. New agent types are rare architectural events that require significant framework changes (new SKILL.md, new agent definition, new operator directory) — at that point, updating a 10-line dict in signals.py is trivial and will be obvious. The documentation overhead of tracking this in a checklist or test is not worth the maintenance cost for an event that may never occur.