Merged
- scripts/watchdog.sh: auto-restarts daemon on crash, rate-limits
to 5 restarts/hour, sleeps 1 hour if fundamentally broken
- Restored all 37 wontfixed tasks: the agent should do the work, not skip it
- evolve.md Step 6o: removed queue cap, replaced with judgment-based rule
("create only what matters, never to fill a quota")
- Merged #74 into #63 (genuine duplicate)
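The restart rate limit described for scripts/watchdog.sh can be sketched as a sliding-window counter. This is a minimal illustration in Python, not the script itself: the class name `RestartLimiter`, its parameters, and the choice of a sliding window (rather than a fixed hourly bucket) are assumptions; only the limit of 5 restarts per hour and the 1-hour sleep come from the description above.

```python
from collections import deque

MAX_RESTARTS_PER_HOUR = 5   # from the PR description
WINDOW_SECONDS = 3600       # one hour
BACKOFF_SECONDS = 3600      # sleep applied when the limit is hit


class RestartLimiter:
    """Hypothetical helper: allows at most max_restarts per sliding window."""

    def __init__(self, max_restarts=MAX_RESTARTS_PER_HOUR, window=WINDOW_SECONDS):
        self.max_restarts = max_restarts
        self.window = window
        self._stamps = deque()  # timestamps of recent restarts, oldest first

    def allow_restart(self, now):
        # Forget restarts that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self.window:
            self._stamps.popleft()
        if len(self._stamps) >= self.max_restarts:
            return False  # caller should sleep BACKOFF_SECONDS before retrying
        self._stamps.append(now)
        return True
```

A watchdog loop would call `allow_restart(time.time())` after each crash and sleep for `BACKOFF_SECONDS` whenever it returns `False`, which is one way to treat a crash loop as "fundamentally broken".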
fazxes added a commit that referenced this pull request on Apr 6, 2026
- docs/strategy/2026-04-06.md: full strategy review covering 70 sessions, 15 PRs, cost analysis (Sonnet $2.30/test vs Opus $5.09/test), and prompt health. Key finding: eval gate deadlocked on stale 53/100.
- Pentest validation: all 3 prompt-alert changes confirmed legitimate (PR #168). Finding #172 (eval fabrication) and #125 (clean-state) already tracked. New task #176 for autonomy first-match bug.
- Task #176: fix read_latest_autonomy_score() to use re.findall()[-1]
- Task #177: re-run Step 0 eval to unblock the BUILD eval gate
- Task #178: fix cost classifier to recognize role-based session types
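The first-match bug behind Task #176 can be illustrated with a short sketch. The log format and the regular expression below are assumptions made for the example (the real pattern is not shown in this conversation); the point is only that taking the last `re.findall()` result returns the newest score, whereas a first-match search returns the oldest.

```python
import re
from typing import Optional

# Hypothetical pattern: the actual score format is not given in the PR.
AUTONOMY_RE = re.compile(r"autonomy score:\s*(\d+)", re.IGNORECASE)


def read_latest_autonomy_score(text: str) -> Optional[int]:
    """Return the last autonomy score found in text, or None if absent."""
    matches = AUTONOMY_RE.findall(text)
    # findall() returns matches in order of appearance, so [-1] is the latest.
    # Guard the empty case: indexing an empty list would raise IndexError.
    return int(matches[-1]) if matches else None
```

With a log containing an old score followed by a newer one, a first-match search would report the stale value, which is consistent with the "stale 53/100" symptom noted above.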
fazxes added a commit that referenced this pull request on Apr 7, 2026
…detection

Pentest fix (fix-now): total_cost() now sums sessions[*].total_cost_usd instead of trusting the cached root field. docs/sessions/ is gitignored, so a pre-written costs.json survives git clean -fd; a poisoned total_cost_usd could stop the daemon after one real session or disable budget enforcement entirely.

Task #125: score_clean_state() now penalizes dirty clones. parse_shift_artifacts() captures git status --short output after the shift; the scorer deducts 2 points when untracked or modified files remain. Previously, a run with exit=0 and halt=clean but a dirty repo scored 10/10. Restructured scoring: exit-code (0-4) + halt (0-4) + clean (0-2).

7 new regression tests (3 poisoning, 5 dirty-clone, -1 updated).
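The costs.json fix can be sketched as follows. The file layout is an assumption: this example treats `sessions` as a list of objects that each carry `total_cost_usd`, with a cached `total_cost_usd` at the root, which matches the field names in the commit message but may not match the real schema.

```python
def total_cost(costs: dict) -> float:
    """Recompute the total from per-session entries.

    The root-level total_cost_usd is deliberately ignored, so a
    pre-written (poisoned) value cannot influence budget enforcement.
    """
    sessions = costs.get("sessions", [])  # assumed to be a list of dicts
    return sum((float(s.get("total_cost_usd", 0.0)) for s in sessions), 0.0)


# A poisoned file: the cached root total disagrees with the sessions.
poisoned = {
    "total_cost_usd": 9999.0,
    "sessions": [{"total_cost_usd": 1.5}, {"total_cost_usd": 2.25}],
}
print(total_cost(poisoned))  # 3.75
```

Deriving the total on every read costs a little more than trusting the cache, but it removes the root field as an input an attacker can control.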
fazxes added a commit that referenced this pull request on Apr 7, 2026
…-dirty-clone-0125 fix: close costs.json budget-stop poisoning and dirty-clone eval detection (#125)
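The dirty-clone detection named in this commit title uses a three-part score: exit-code (0-4) + halt (0-4) + clean (0-2). A rough sketch of that shape is below. The all-or-nothing award per component and the function signature are assumptions; the source gives only the point ranges and the 2-point deduction for a dirty working tree.

```python
def score_clean_state(exit_code: int, halt: str, git_status_short: str) -> int:
    """Hypothetical scorer: exit-code (0-4) + halt (0-4) + clean (0-2)."""
    exit_points = 4 if exit_code == 0 else 0
    halt_points = 4 if halt == "clean" else 0
    # git status --short prints one line per untracked or modified path,
    # so any non-blank line means the clone is dirty.
    dirty = any(line.strip() for line in git_status_short.splitlines())
    clean_points = 0 if dirty else 2
    return exit_points + halt_points + clean_points
```

Under these assumptions, a run with exit 0 and a clean halt that leaves files behind scores 8 rather than 10.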
Watchdog auto-restarts daemon. Removed artificial task caps — agent uses judgment.