
fix: pentest guard gaps + eval scorer rejected-cycle fallback #158

Merged
fazxes merged 2 commits into main from fix/pentest-guard-plus-eval-scorer on Apr 6, 2026


fazxes commented Apr 6, 2026

Summary

  • Security: Added scripts/watchdog.sh, scripts/daemon-strategist.sh, scripts/daemon-review.sh, and scripts/daemon-overseer.sh to PROMPT_GUARD_FILES in lib-agent.sh. These scripts invoke agents or control restart rate-limiting but were absent from both the working-tree guard and the origin-integrity check, per the pre-build pentest scan.
  • Fix (task #102): Evaluation scorers (score_discovery, score_fix_quality, score_usefulness) now fall back to nested cycle_result data for rejected cycles when the aggregate counters stayed at zero. Runs in which every cycle was rejected no longer score 0 for discovery/usefulness when real fixes exist in the state JSON.
  • Added two helper functions, _extract_cycle_fixes() and _extract_cycle_issues(), plus 4 regression tests.
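
The fallback described above can be illustrated with a minimal sketch. This is not the code from evaluation.py: the function names `extract_fixes` and `count_fixes`, and the state keys `fixes_applied` and `cycles`, are assumptions made for illustration. Only the top-level `fixes` key and the `cycle_result` nesting come from the description.

```python
from typing import Any


def extract_fixes(cycle: dict[str, Any]) -> list[dict[str, Any]]:
    """Return the fixes recorded for one cycle.

    Assumed layout: accepted cycles store `fixes` at the top level,
    rejected cycles store them under `cycle_result["fixes"]`.
    """
    fixes = cycle.get("fixes")
    if fixes:
        return list(fixes)
    nested = cycle.get("cycle_result") or {}
    return list(nested.get("fixes") or [])


def count_fixes(state: dict[str, Any]) -> int:
    """Prefer the aggregate counter; fall back to per-cycle data when it is zero."""
    aggregate = state.get("fixes_applied", 0)  # hypothetical counter name
    if aggregate > 0:
        return aggregate
    return sum(len(extract_fixes(c)) for c in state.get("cycles", []))


# A run where every cycle was rejected: the aggregate counter never moved,
# but a real fix is present in the nested cycle_result.
state = {
    "fixes_applied": 0,
    "cycles": [
        {"status": "rejected", "cycle_result": {"fixes": [{"title": "Guard watchdog.sh"}]}},
    ],
}
print(count_fixes(state))
```

With this shape, a scorer that reads `count_fixes(state)` instead of the raw counter would see one fix for the all-rejected run rather than zero.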

Test plan

  • make check passes: 1016 tests, all green
  • New tests: test_rejected_cycle_fixes_counted, test_rejected_cycle_with_real_title_gets_quality_points, test_rejected_cycle_fix_quality_scored, test_rejected_cycle_usefulness_counted
  • Pentest: the 4 scripts are now in PROMPT_GUARD_FILES, so a comm comparison against the guard list no longer reports them as absent
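
The pentest check amounts to a set difference between the scripts that invoke agents and the guard list. The sketch below shows that comparison in Python. The function name `unguarded_scripts` and the `scripts/example-guarded.sh` entry are hypothetical, and the lists are illustrative, not the actual contents of PROMPT_GUARD_FILES.

```python
def unguarded_scripts(agent_scripts: list[str], guard_files: list[str]) -> list[str]:
    """Return agent-invoking scripts missing from the guard list, sorted."""
    return sorted(set(agent_scripts) - set(guard_files))


# The four scripts named in this PR.
agent_scripts = [
    "scripts/watchdog.sh",
    "scripts/daemon-strategist.sh",
    "scripts/daemon-review.sh",
    "scripts/daemon-overseer.sh",
]

# Illustrative guard lists before and after the change (placeholder first entry).
guard_before = ["scripts/example-guarded.sh"]
guard_after = guard_before + agent_scripts

print(unguarded_scripts(agent_scripts, guard_before))  # all four are reported
print(unguarded_scripts(agent_scripts, guard_after))   # nothing is reported
```

Running a check like this in CI, against a list of agent-invoking scripts maintained separately from the guard list, is one way to catch a script that was added without being guarded.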

fazxes added 2 commits April 6, 2026 14:33
Security (pentest finding):
- Added scripts/watchdog.sh to PROMPT_GUARD_FILES in lib-agent.sh.
  watchdog.sh controls daemon restart rate-limiting and invokes daemon.sh
  directly but was absent from both the working-tree guard and the
  origin-integrity check.
- Added scripts/daemon-strategist.sh, scripts/daemon-review.sh, and
  scripts/daemon-overseer.sh to PROMPT_GUARD_FILES. These legacy scripts
  source lib-agent.sh and invoke agents but were unguarded.

Fix (task #102):
- score_discovery(), score_fix_quality(), score_usefulness() in
  evaluation.py now fall back to nested cycle_result data for rejected
  cycles when aggregate counters stayed at zero. Previously all-rejected
  runs scored 0 for discovery/usefulness even when real fixes existed.
- Added _extract_cycle_fixes() and _extract_cycle_issues() helpers that
  transparently handle accepted (top-level fixes) and rejected (cycle_result
  nesting) cycle data.
- Added 4 regression tests: rejected-cycle fix counting, title-quality
  bonus, fix-quality scoring, and usefulness fallback.

Tests: 1016 passing (+4 new).
…llow-up tasks

- Fixed handoff 0082: added tracker delta, learnings applied, generated tasks sections
- Fixed handoff 0082: corrected test count claim (4 tests, not 5)
- Added learning: PROMPT_GUARD_FILES must cover all agent-invoking scripts
- Created task #162: score_discovery/score_fix_quality asymmetry in mixed runs
- Created task #163: missing test for accepted cycle with empty fixes list
- Fixed changelog: corrected '5 regression tests' to '4', added [test] internal entry
fazxes merged commit 825f9f6 into main Apr 6, 2026
1 check passed
fazxes deleted the fix/pentest-guard-plus-eval-scorer branch April 6, 2026 18:47
fazxes added a commit that referenced this pull request Apr 9, 2026
…one)

Queue before: 72 pending + 9 wontfix-in-active-dir
Queue after: 65 pending + 0 wontfix (all converted to done for archiving)

Merged into primary tasks (5 closures):
- #175 -> #174: both add tests to TestAuthFailureDetection, same PR
- #163 -> #162: both are scoring module tests from PR #158 review, same PR
- #124 -> #122: both validate doc snapshot consistency, same PR scope
- #196 -> #173: both add entries to PROMPT_GUARD_FILES in lib-agent.sh
- #180 -> #179: both touch _is_valid_eval_file() in pick-role.py, same PR

Closed as obsolete (1):
- #78: references non-existent "evolve.md Step 8" and the multi-agent
  review panel replaced by unified review in PR #107

Closed as low-value (1):
- #230: _DELEGATION_ROLE_MAP covers all 8 current agent types; new agent
  types require major framework work making the map update obvious

Converted wontfix -> done for archiving (9):
- #77, #80, #107, #111, #115, #119, #127, #129, #134
  All had wontfix status with rationale already documented; changed to
  done so daemon's archive_done_tasks() housekeeping removes them
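
The reason for converting wontfix to done can be sketched as follows, assuming the archiver only picks up tasks whose status is exactly "done". Both functions are hypothetical stand-ins, not the daemon's actual archive_done_tasks(), and the task dictionaries are illustrative.

```python
def archive_done_tasks_sketch(tasks: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split tasks into (active, archived); only status == "done" is archived."""
    active = [t for t in tasks if t.get("status") != "done"]
    archived = [t for t in tasks if t.get("status") == "done"]
    return active, archived


def convert_wontfix_to_done(tasks: list[dict]) -> list[dict]:
    """Return copies of the tasks with every wontfix status rewritten to done."""
    return [
        {**t, "status": "done"} if t.get("status") == "wontfix" else dict(t)
        for t in tasks
    ]


tasks = [
    {"id": 77, "status": "wontfix"},
    {"id": 102, "status": "pending"},
]

# Under the stated assumption, a wontfix task is never archived and stays
# in the active directory.
active, archived = archive_done_tasks_sketch(tasks)
print(len(active), len(archived))

# After conversion, housekeeping moves it out of the active set.
active, archived = archive_done_tasks_sketch(convert_wontfix_to_done(tasks))
print([t["id"] for t in active], [t["id"] for t in archived])
```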