
feat: watchdog + natural task creation #125

Merged
fazxes merged 1 commit into main from feat/watchdog-and-triage
Apr 6, 2026


fazxes commented Apr 6, 2026

Adds a watchdog that auto-restarts the daemon, and removes the artificial task caps so the agent uses judgment instead.

- scripts/watchdog.sh: auto-restarts daemon on crash, rate-limits
  to 5 restarts/hour, sleeps 1 hour if fundamentally broken
- Restored all 37 wontfixed tasks: the agent should do the work, not skip it
- evolve.md Step 6o: removed queue cap, replaced with judgment-based rule
  ("create only what matters, never to fill a quota")
- Merged #74 into #63 (genuine duplicate)
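
The restart policy described for scripts/watchdog.sh (at most 5 restarts per hour, then a 1-hour sleep) can be sketched as a sliding-window limiter. This is an illustration in Python, not the shell script itself; `RestartLimiter`, `next_delay`, and the parameter names are hypothetical, and the real script may count restarts differently.

```python
from collections import deque


class RestartLimiter:
    """Sliding-window limiter: allow at most max_restarts per window_seconds.

    Hypothetical sketch of the policy in the PR description; the actual
    watchdog is a shell script whose internals are not shown here.
    """

    def __init__(self, max_restarts=5, window_seconds=3600.0, backoff_seconds=3600.0):
        self.max_restarts = max_restarts
        self.window_seconds = window_seconds
        self.backoff_seconds = backoff_seconds
        self._stamps = deque()  # timestamps of recent restarts, oldest first

    def next_delay(self, now):
        """Return 0.0 and record a restart if one is allowed at time `now`.

        Otherwise return the back-off the caller should sleep before retrying.
        """
        # Drop restarts that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self.window_seconds:
            self._stamps.popleft()
        if len(self._stamps) >= self.max_restarts:
            # Too many crashes in the window: treat the daemon as broken.
            return self.backoff_seconds
        self._stamps.append(now)
        return 0.0
```

A caller would loop: wait for the daemon to exit, call `next_delay(time.time())`, sleep for the returned number of seconds if it is non-zero, and otherwise restart immediately.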
fazxes merged commit 0511a85 into main Apr 6, 2026
fazxes deleted the feat/watchdog-and-triage branch April 6, 2026 07:36
fazxes added a commit that referenced this pull request Apr 6, 2026
- docs/strategy/2026-04-06.md: full strategy review covering 70 sessions,
  15 PRs, cost analysis (Sonnet $2.30/test vs Opus $5.09/test), and prompt
  health. Key finding: eval gate deadlocked on stale 53/100.
- Pentest validation: all 3 prompt-alert changes confirmed legitimate (PR #168).
  Findings #172 (eval fabrication) and #125 (clean-state) are already tracked.
  New task #176 for the autonomy first-match bug.
- Task #176: fix read_latest_autonomy_score() to use re.findall()[-1]
- Task #177: re-run Step 0 eval to unblock the BUILD eval gate
- Task #178: fix cost classifier to recognize role-based session types
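
The first-match bug behind task #176 can be illustrated as follows. This is a minimal sketch, not the project's code: the pattern and the "Autonomy score: N/100" line format are assumptions, and `first_score` / `latest_score` are hypothetical names standing in for the before and after behavior of read_latest_autonomy_score().

```python
import re

# Assumed line format; the real log format is not shown in this thread.
SCORE_PATTERN = re.compile(r"autonomy score:\s*(\d+)/100", re.IGNORECASE)


def first_score(text):
    """Buggy variant: re.search stops at the first (oldest) match."""
    match = SCORE_PATTERN.search(text)
    return int(match.group(1)) if match else None


def latest_score(text):
    """Fixed variant: take the last match, as in re.findall()[-1]."""
    matches = SCORE_PATTERN.findall(text)  # list of captured digit strings
    return int(matches[-1]) if matches else None


# Made-up sample data for illustration only.
sample = "Autonomy score: 53/100\nsession notes\nAutonomy score: 71/100\n"
assert first_score(sample) == 53   # stale value wins
assert latest_score(sample) == 71  # most recent value wins
```

Guarding the empty case matters: indexing `[-1]` on an empty `findall` result raises `IndexError`, so the sketch returns `None` when no score is present.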
fazxes added a commit that referenced this pull request Apr 7, 2026
…detection

Pentest fix (fix-now): total_cost() now sums sessions[*].total_cost_usd instead
of trusting the cached root field. docs/sessions/ is gitignored so a pre-written
costs.json survives git clean -fd; a poisoned total_cost_usd could stop the daemon
after one real session or disable budget enforcement entirely.
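
The idea of the fix, recomputing the total from per-session entries instead of trusting the cached root field, can be sketched like this. The layout of costs.json is an assumption (a root `total_cost_usd` plus a `sessions` list or mapping whose entries each carry `total_cost_usd`); only the field names come from the commit message.

```python
def total_cost(costs):
    """Sum sessions[*].total_cost_usd and ignore the cached root total.

    `costs` is the parsed costs.json content. The structure is assumed:
    `sessions` may be a list of dicts or a dict of dicts.
    """
    sessions = costs.get("sessions", [])
    if isinstance(sessions, dict):
        sessions = list(sessions.values())
    return sum(float(s.get("total_cost_usd", 0.0)) for s in sessions)


# A poisoned root value no longer influences the result (made-up numbers).
poisoned = {
    "total_cost_usd": 9999.0,  # pre-written, untrusted cache
    "sessions": [{"total_cost_usd": 1.5}, {"total_cost_usd": 2.25}],
}
assert total_cost(poisoned) == 3.75
```

Note that this only removes trust in the root field; per-session values read from the same file are still taken at face value in this sketch.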

Task #125: score_clean_state() now penalizes dirty clones. parse_shift_artifacts()
captures git status --short output after the shift; the scorer deducts 2 points
when untracked or modified files remain. Previously, a run with exit=0 and
halt=clean scored 10/10 even when the repo was left dirty.
Restructured scoring: exit-code (0-4) + halt (0-4) + clean (0-2).
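
The restructured scoring can be sketched as below. The signature and the all-or-nothing award within each component are assumptions; the commit message gives only the component ranges (0-4, 0-4, 0-2) and the 2-point deduction for a dirty repo.

```python
def score_clean_state(exit_code, halt_clean, git_status_short):
    """Score a shift out of 10: exit-code (0-4) + halt (0-4) + clean (0-2).

    Hypothetical signature. `git_status_short` is the captured output of
    `git status --short`; any non-blank line means untracked or modified
    files remain in the clone.
    """
    exit_points = 4 if exit_code == 0 else 0
    halt_points = 4 if halt_clean else 0
    dirty = any(line.strip() for line in git_status_short.splitlines())
    clean_points = 0 if dirty else 2
    return exit_points + halt_points + clean_points


# Clean exit and clean halt, but leftover files: no longer a perfect score.
assert score_clean_state(0, True, "?? scratch.txt\n M README.md\n") == 8
assert score_clean_state(0, True, "") == 10
```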

7 new regression tests (3 poisoning, 5 dirty-clone, -1 updated).
fazxes added a commit that referenced this pull request Apr 7, 2026
…-dirty-clone-0125

fix: close costs.json budget-stop poisoning and dirty-clone eval detection (#125)