fix: pentest security hardening -- autonomy first-match, eval validation, unified.md guard by fazxes · Pull Request #170 · Recusive/Nightshift

fazxes · 2026-04-06T22:54:18Z

Summary

Autonomy first-match bug (task #176): read_latest_autonomy_score() returned the first TOTAL: match (the baseline, 76) instead of the last one (the updated score, 81). Fix: re.findall()[-1]. The stale baseline had been over-scheduling ACHIEVE by up to 30 points.
Eval fallback regex removed (task #172): the loose fallback regex Total.*?(\d+)/100 accepted fabricated files such as a single line reading Total: 99/100. Added an _is_valid_eval_file() content validator (requires a **Date**: line and at least 3 dimension rows) and removed the fallback entirely.
unified.md guard gap: docs/prompt/unified.md was absent from PROMPT_GUARD_FILES and has been added. It is not loaded by any active script, but its absence was an inconsistency in the security model.
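A minimal sketch of the autonomy first-match bug, assuming each score line in an ACHIEVE report has the form TOTAL: <n>. The regex, the sample report, and the helpers first_match_score and last_match_score are hypothetical illustrations, not code from the repository; they only contrast re.search() with re.findall()[-1].

```python
import re
from typing import Optional

# Hypothetical pattern: assumes each score line looks like "TOTAL: 81".
# The real regex in read_latest_autonomy_score() may differ.
TOTAL_RE = re.compile(r"TOTAL:\s*(\d+)")


def first_match_score(report: str) -> Optional[int]:
    """Pre-fix behaviour: re.search() stops at the first TOTAL: line."""
    match = TOTAL_RE.search(report)
    return int(match.group(1)) if match else None


def last_match_score(report: str) -> Optional[int]:
    """Post-fix behaviour: re.findall()[-1] takes the last TOTAL: line."""
    scores = TOTAL_RE.findall(report)
    return int(scores[-1]) if scores else None


# Hypothetical ACHIEVE report containing a baseline and an updated total.
REPORT = """\
## Baseline
TOTAL: 76

## After fixes
TOTAL: 81
"""

print(first_match_score(REPORT))  # 76: the stale baseline
print(last_match_score(REPORT))   # 81: the updated score
```

Taking the last match works because the updated total is written after the baseline; the sketch relies on that ordering.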

All three findings were confirmed as real in the pre-build pentest scan; there were no false positives.

Test plan

11 new tests: TestEvalFileValidation (5), TestIsValidEvalFile (4), TestReadAutonomyScore (2 new)
Existing TestReadEvalScore fixtures updated to use valid eval content
make check passes: 1097 tests, ruff clean, mypy strict clean, dry runs of both agents pass

…ion, unified.md guard

Three confirmed pentest findings from the pre-build scan:

1. read_latest_autonomy_score() used re.search(), returning the first TOTAL: match. ACHIEVE reports include both a baseline and an updated TOTAL; the first-match bug returned the lower baseline score (76 today, as low as 67 historically), over-scheduling ACHIEVE by up to 30 points. Fix: re.findall()[-1] returns the last (updated) score. (task #176)
2. read_latest_eval_score() had a broad fallback regex (Total.*?(\d+)/100) that accepted any file containing "Total: 99/100" or even prose like "Total overhead is 12/100 cases". A fabricated single-line file could bypass the eval gate entirely. Fix: removed the fallback; added _is_valid_eval_file() requiring **Date**: and >= 3 dimension rows (N/10). Files failing validation return None. (task #172)
3. docs/prompt/unified.md existed as a prompt-like control file but was absent from PROMPT_GUARD_FILES in lib-agent.sh. Added. (pentest finding)

11 new tests. 1097 total. All checks pass.
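Item 2 of the commit message can be illustrated with a hedged sketch. Only the loose regex Total.*?(\d+)/100 is taken from the description; the **Date**: and N/10 line patterns, the sample files, and this stand-in is_valid_eval_file() are assumptions about the eval format, not the repository's _is_valid_eval_file() implementation.

```python
import re

# The loose fallback regex this PR removed, as quoted in the description.
LOOSE_FALLBACK = re.compile(r"Total.*?(\d+)/100")

# Hypothetical formats: a line starting with "**Date**:" and dimension rows
# carrying an "N/10" score. The real _is_valid_eval_file() may differ.
DATE_RE = re.compile(r"^\*\*Date\*\*:", re.MULTILINE)
DIMENSION_RE = re.compile(r"\b\d{1,2}/10\b")  # matches 8/10, not 99/100


def is_valid_eval_file(text: str, min_dimensions: int = 3) -> bool:
    """Reject content that lacks a date line or enough dimension rows."""
    if not DATE_RE.search(text):
        return False
    rows = [line for line in text.splitlines() if DIMENSION_RE.search(line)]
    return len(rows) >= min_dimensions


FABRICATED = "Total: 99/100\n"
PROSE = "Total overhead is 12/100 cases.\n"
VALID = """\
**Date**: 2026-04-06

| Dimension | Score |
|-----------|-------|
| Correctness | 8/10 |
| Safety | 7/10 |
| Clarity | 9/10 |
"""

# The old fallback extracts a "score" from both bad inputs...
print(LOOSE_FALLBACK.search(FABRICATED).group(1))  # 99
print(LOOSE_FALLBACK.search(PROSE).group(1))       # 12
# ...while the content check rejects them before any score is read.
print(is_valid_eval_file(FABRICATED), is_valid_eval_file(PROSE))  # False False
print(is_valid_eval_file(VALID))  # True
```

Validating the file's overall shape before reading any score means a single fabricated line can no longer satisfy the eval gate on its own.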


fazxes added 2 commits April 6, 2026 18:54

docs: add review follow-up tasks #179 #180 #181 from PR #170 review notes

2f813ac

fazxes merged commit 94f8e60 into main Apr 6, 2026

fazxes deleted the fix/pentest-security-hardening-0094 branch April 6, 2026 22:57

fazxes mentioned this pull request Apr 9, 2026

oversee: triage task queue — close 16 duplicates and superseded tasks #252

Merged

fazxes merged 2 commits into main from fix/pentest-security-hardening-0094
