fix: pentest guard gaps + eval scorer rejected-cycle fallback by fazxes · Pull Request #158 · Recusive/Nightshift

fazxes · 2026-04-06T18:33:46Z

Summary

Security: Added scripts/watchdog.sh, scripts/daemon-strategist.sh, scripts/daemon-review.sh, and scripts/daemon-overseer.sh to PROMPT_GUARD_FILES in lib-agent.sh. The pre-build pentest scan found that these scripts, which invoke agents or control restart rate-limiting, were absent from both the working-tree guard and the origin-integrity check.
Fix (task #102): The evaluation scorers (score_discovery, score_fix_quality, score_usefulness) in evaluation.py now fall back to nested cycle_result data for rejected cycles when the aggregate counters stayed at zero. Runs in which every cycle was rejected no longer score 0 for discovery/usefulness when real fixes exist in the state JSON.
Added 2 helper functions, _extract_cycle_fixes() and _extract_cycle_issues(), plus 4 regression tests.

Test plan

make check passes: 1016 tests, all green
New tests: test_rejected_cycle_fixes_counted, test_rejected_cycle_with_real_title_gets_quality_points, test_rejected_cycle_fix_quality_scored, test_rejected_cycle_usefulness_counted
Pentest: the 4 scripts are now in PROMPT_GUARD_FILES, so running comm against the guard list no longer reports them as absent
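The comm check amounts to a set difference: scripts that invoke agents but do not appear in the guard list (what comm -23 prints for two sorted inputs). A minimal Python sketch, assuming PROMPT_GUARD_FILES is a bash array literal with one quoted path per line. The functions parse_guard_array and missing_from_guard and the SAMPLE contents are hypothetical and are not copied from lib-agent.sh.

```python
import re

# Hypothetical excerpt in the assumed layout; not the real lib-agent.sh.
SAMPLE = """
PROMPT_GUARD_FILES=(
  "scripts/daemon.sh"
  "scripts/lib-agent.sh"
  "scripts/watchdog.sh"  # restart rate-limiting
)
"""


def parse_guard_array(shell_text: str, name: str = "PROMPT_GUARD_FILES") -> list[str]:
    """Extract entries from a bash array literal NAME=( ... ) (assumed layout)."""
    match = re.search(rf"{re.escape(name)}=\((.*?)\)", shell_text, re.S)
    if match is None:
        return []
    entries: list[str] = []
    for line in match.group(1).splitlines():
        code = line.split("#", 1)[0]  # drop trailing comments
        entries.extend(tok.strip("\"'") for tok in code.split())
    return entries


def missing_from_guard(agent_scripts: list[str], guard_files: list[str]) -> list[str]:
    """Entries only in agent_scripts, like `comm -23` on the two sorted lists."""
    return sorted(set(agent_scripts) - set(guard_files))


guard = parse_guard_array(SAMPLE)
agent_scripts = ["scripts/watchdog.sh", "scripts/daemon-review.sh"]
print(missing_from_guard(agent_scripts, guard))  # ['scripts/daemon-review.sh']
```

An empty result means every agent-invoking script in the input list is covered by the guard.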

Security (pentest finding):
- Added scripts/watchdog.sh to PROMPT_GUARD_FILES in lib-agent.sh. watchdog.sh controls daemon restart rate-limiting and invokes daemon.sh directly, but was absent from both the working-tree guard and the origin-integrity check.
- Added scripts/daemon-strategist.sh, scripts/daemon-review.sh, and scripts/daemon-overseer.sh to PROMPT_GUARD_FILES. These legacy scripts source lib-agent.sh and invoke agents but were unguarded.

Fix (task #102):
- score_discovery(), score_fix_quality(), and score_usefulness() in evaluation.py now fall back to nested cycle_result data for rejected cycles when the aggregate counters stayed at zero. Previously, all-rejected runs scored 0 for discovery/usefulness even when real fixes existed.
- Added _extract_cycle_fixes() and _extract_cycle_issues() helpers that transparently handle both accepted (top-level fixes) and rejected (nested under cycle_result) cycle data.
- Added 4 regression tests: rejected-cycle fix counting, title-quality bonus, fix-quality scoring, and usefulness fallback.

Tests: 1016 passing (+4 new).

…llow-up tasks
- Fixed handoff 0082: added tracker delta, learnings applied, and generated tasks sections
- Fixed handoff 0082: corrected test count claim (4 tests, not 5)
- Added learning: PROMPT_GUARD_FILES must cover all agent-invoking scripts
- Created task #162: score_discovery/score_fix_quality asymmetry in mixed runs
- Created task #163: missing test for accepted cycle with empty fixes list
- Fixed changelog: corrected '5 regression tests' to '4' and added [test] internal entry

…one)
Queue before: 72 pending + 9 wontfix-in-active-dir
Queue after: 65 pending + 0 wontfix (all converted to done for archiving)

Merged into primary tasks (5 closures):
- #175 -> #174: both add tests to TestAuthFailureDetection, same PR
- #163 -> #162: both are scoring module tests from PR #158 review, same PR
- #124 -> #122: both validate doc snapshot consistency, same PR scope
- #196 -> #173: both add entries to PROMPT_GUARD_FILES in lib-agent.sh
- #180 -> #179: both touch _is_valid_eval_file() in pick-role.py, same PR

Closed as obsolete (1):
- #78: references the non-existent "evolve.md Step 8" and the multi-agent review panel, which was replaced by unified review in PR #107

Closed as low-value (1):
- #230: _DELEGATION_ROLE_MAP covers all 8 current agent types; new agent types would require major framework work, making the map update obvious

Converted wontfix -> done for archiving (9):
- #77, #80, #107, #111, #115, #119, #127, #129, #134
All had wontfix status with rationale already documented; they were changed to done so that the daemon's archive_done_tasks() housekeeping removes them.
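The queue numbers in the triage commit can be cross-checked with a little arithmetic. This assumes every merged or closed task came out of the 72 pending tasks and that the 9 conversions only affected the wontfix count; the task IDs are copied from the commit, while the variable names are illustrative.

```python
# Task IDs copied from the triage commit; the variable names are illustrative.
merged_into = {175: 174, 163: 162, 124: 122, 196: 173, 180: 179}
closed_obsolete = [78]
closed_low_value = [230]
converted_to_done = [77, 80, 107, 111, 115, 119, 127, 129, 134]

pending_before, wontfix_before = 72, 9

# Assumption: every closure removes exactly one task from the pending count.
closed = len(merged_into) + len(closed_obsolete) + len(closed_low_value)
pending_after = pending_before - closed
# Conversions move wontfix tasks to done; they do not touch the pending count.
wontfix_after = wontfix_before - len(converted_to_done)

print(closed, pending_after, wontfix_after)  # 7 65 0
```

Under those assumptions, the 7 closures account exactly for the drop from 72 to 65 pending tasks, and the commit makes 16 status changes in total (7 closures plus 9 conversions).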

fazxes added 2 commits April 6, 2026 14:33

fazxes merged commit 825f9f6 into main Apr 6, 2026
1 check passed

fazxes deleted the fix/pentest-guard-plus-eval-scorer branch April 6, 2026 18:47

fazxes mentioned this pull request Apr 8, 2026

eval: Evaluation #0016 -- score 86/100, eval gate clear #198

Merged


fazxes mentioned this pull request Apr 9, 2026

oversee: triage task queue — close 16 duplicates and superseded tasks #252

Merged

fazxes merged 2 commits into main from fix/pentest-guard-plus-eval-scorer

fazxes commented Apr 6, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

fazxes commented Apr 6, 2026

Summary

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant