fix: pentest security hardening -- autonomy first-match, eval validation, unified.md guard #170

fazxes merged 2 commits into main from fix/pentest-security-hardening-0094 on Apr 6, 2026

fazxes commented on Apr 6, 2026

Summary

All three findings from the pre-build pentest scan were confirmed as real; none were false positives.

Test plan

  • 11 new tests: TestEvalFileValidation (5), TestIsValidEvalFile (4), TestReadAutonomyScore (2 new)
  • Existing TestReadEvalScore fixtures updated to use valid eval content
  • make check passes: 1097 tests pass, ruff is clean, mypy strict is clean, and both agents dry-run successfully

fazxes added 2 commits April 6, 2026 18:54
…ion, unified.md guard

Three confirmed pentest findings from the pre-build scan:

1. read_latest_autonomy_score() used re.search(), which returns the first
   TOTAL: match. ACHIEVE reports include both a baseline and an updated
   TOTAL, so the function returned the lower baseline score (76 today, as
   low as 67 historically). The understated score, off by up to 30 points,
   caused ACHIEVE to be over-scheduled.
   Fix: re.findall()[-1] returns the last (updated) score. (task #176)

2. read_latest_eval_score() had a broad fallback regex, Total.*?(\d+)/100,
   that accepted any file containing "Total: 99/100", or even prose such as
   "Total overhead is 12/100 cases". A fabricated single-line file could
   therefore bypass the eval gate entirely. Fix: removed the fallback and
   added _is_valid_eval_file(), which requires a **Date**: field and at
   least 3 dimension rows scored N/10. Files that fail validation return
   None. (task #172)

3. docs/prompt/unified.md is a prompt-like control file but was missing
   from PROMPT_GUARD_FILES in lib-agent.sh. Fix: added it to the list.
   (pentest finding)
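The first-match bug in finding 1 can be illustrated with a minimal sketch. It assumes each score appears on a line of the form "TOTAL: <n>/100"; the real ACHIEVE report layout may differ, and the names TOTAL_RE, first_total and last_total are hypothetical, not the project's actual code. The baseline of 76 comes from the description above; the updated score of 82 is made up for illustration.

```python
import re
from typing import List, Optional

# Hypothetical pattern: assumes score lines look like "TOTAL: 76/100".
TOTAL_RE = re.compile(r"TOTAL:\s*(\d+)")


def first_total(report: str) -> Optional[int]:
    """Old behaviour: re.search() stops at the first TOTAL: (the baseline)."""
    match = TOTAL_RE.search(report)
    return int(match.group(1)) if match else None


def last_total(report: str) -> Optional[int]:
    """Fixed behaviour: take the last TOTAL: (the updated score).

    Guards the no-match case, because a bare re.findall(...)[-1] raises
    IndexError when the report contains no TOTAL: line at all.
    """
    matches: List[str] = TOTAL_RE.findall(report)
    return int(matches[-1]) if matches else None


# Illustrative report: baseline first, updated score last (82 is made up).
report = """\
Baseline
TOTAL: 76/100

Updated
TOTAL: 82/100
"""

print(first_total(report))  # 76
print(last_total(report))   # 82
```

Because re.findall() returns the captured group for every match in document order, indexing with [-1] picks whichever TOTAL: line appears last, which is the updated score as long as reports always list the baseline first.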
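For finding 2, the sketch below shows why the removed fallback regex was too permissive and what a structural check in the spirit of _is_valid_eval_file() might look like. Only the fallback pattern is quoted from the description; the exact header and row formats are assumptions: the date is taken to be a line starting with "**Date**:", and each dimension is taken to be a Markdown table row containing a score from 0/10 to 10/10. The name is_valid_eval_file and the sample dimension names are hypothetical.

```python
import re

# The removed fallback pattern, as quoted in the PR description.
FALLBACK_RE = re.compile(r"Total.*?(\d+)/100")

# Hypothetical formats: a "**Date**:" line, and dimension rows written as
# Markdown table rows scored N/10. The \b after "/10" stops "99/100" matching.
DATE_RE = re.compile(r"^\*\*Date\*\*:", re.MULTILINE)
DIMENSION_ROW_RE = re.compile(r"^\s*\|.*?\b(?:10|[0-9])/10\b", re.MULTILINE)
MIN_DIMENSIONS = 3


def is_valid_eval_file(text: str) -> bool:
    """Sketch of the rule: a Date field plus at least 3 N/10 dimension rows."""
    if not DATE_RE.search(text):
        return False
    return len(DIMENSION_ROW_RE.findall(text)) >= MIN_DIMENSIONS


forged = "Total: 99/100\n"
prose = "Total overhead is 12/100 cases"
# Illustrative eval file; the dimension names and scores are made up.
genuine = """\
**Date**: 2026-04-06

| Dimension | Score |
|-----------|-------|
| Correctness | 8/10 |
| Safety | 9/10 |
| Clarity | 7/10 |
"""

# The fallback happily extracts a "score" from both bogus inputs.
print([FALLBACK_RE.search(t) is not None for t in (forged, prose)])  # [True, True]
# The structural check rejects them and accepts the well-formed file.
print([is_valid_eval_file(t) for t in (forged, prose, genuine)])  # [False, False, True]
```

Validating structure before reading any score means a one-line forged file fails closed (the caller returns None) instead of passing the eval gate.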

11 new tests. 1097 total. All checks pass.
fazxes merged commit 94f8e60 into main on Apr 6, 2026
fazxes deleted the fix/pentest-security-hardening-0094 branch on April 6, 2026 22:57