…ion, unified.md guard

Three confirmed pentest findings from the pre-build scan:

1. read_latest_autonomy_score() used re.search(), returning the first TOTAL: match. ACHIEVE reports include both a baseline and an updated TOTAL; the first-match bug returned the lower baseline score (76 today, as low as 67 historically), over-scheduling ACHIEVE by up to 30 points. Fix: re.findall()[-1] returns the last (updated) score. (task #176)

2. read_latest_eval_score() had a broad fallback regex (Total.*?(\d+)/100) that accepted any file containing "Total: 99/100" or even prose like "Total overhead is 12/100 cases". A fabricated single-line file could bypass the eval gate entirely. Fix: removed fallback; added _is_valid_eval_file() requiring **Date**: and >= 3 dimension rows (N/10). Files failing validation return None. (task #172)

3. docs/prompt/unified.md existed as a prompt-like control file but was absent from PROMPT_GUARD_FILES in lib-agent.sh. Added. (pentest finding)

11 new tests. 1097 total. All checks pass.
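The first-match versus last-match difference in finding 1 can be illustrated with a minimal sketch. The pattern `TOTAL_RE`, the helper names, and the sample report below are assumptions made for illustration; the actual regex and file layout used by `read_latest_autonomy_score()` are not shown in this PR text.

```python
import re

# Hypothetical pattern: the real regex in read_latest_autonomy_score() is not shown.
TOTAL_RE = re.compile(r"TOTAL:\s*(\d+)")


def first_total(text):
    """Old behaviour: re.search() stops at the first TOTAL: line (the baseline)."""
    match = TOTAL_RE.search(text)
    return int(match.group(1)) if match else None


def last_total(text):
    """Fixed behaviour: collect every TOTAL: line and keep the last (updated) one."""
    scores = TOTAL_RE.findall(text)
    # Guard the empty case: indexing [-1] on an empty list would raise IndexError.
    return int(scores[-1]) if scores else None


# Made-up report containing a baseline score followed by an updated score.
report = "Baseline\nTOTAL: 76\n\nUpdated\nTOTAL: 81\n"

print(first_total(report))  # baseline score
print(last_total(report))   # updated score
```

One detail worth noting: a bare `re.findall(...)[-1]` fails on a report with no `TOTAL:` line, so the sketch returns `None` in that case.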
Summary
- `read_latest_autonomy_score()` returned the first `TOTAL:` match (baseline 76) instead of the last (updated 81). Fix: `re.findall()[-1]`. Over-scheduled ACHIEVE by up to 30 points.
- The fallback regex `Total.*?(\d+)/100` in `read_latest_eval_score()` accepted fabricated files like `Total: 99/100`. Added `_is_valid_eval_file()` content validator (requires `**Date**:` and at least 3 dimension rows); removed the fallback entirely.
- `docs/prompt/unified.md` was absent from `PROMPT_GUARD_FILES`. Added. (Not loaded by any active script, but its absence was an inconsistency in the security model.)

All three were confirmed as real findings from the pre-build pentest scan. No false positives.
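A content validator along the lines described for the eval gate might look like the sketch below. This is not the PR's `_is_valid_eval_file()`: the function name, the two regexes, and the sample file contents are assumptions chosen to match the stated rules (a `**Date**:` line plus at least 3 dimension rows scored as N/10).

```python
import re

# Assumed patterns; the real validator's regexes are not shown in the PR text.
DATE_RE = re.compile(r"^\*\*Date\*\*:", re.MULTILINE)
# Matches 0/10 .. 10/10 as a standalone token, so "99/100" and "12/100" do not count.
DIMENSION_RE = re.compile(r"\b(?:10|\d)/10\b")
MIN_DIMENSION_ROWS = 3


def is_valid_eval_file_sketch(text):
    """Return True only if the text has a **Date**: line and >= 3 N/10 rows."""
    if not DATE_RE.search(text):
        return False
    dimension_rows = [line for line in text.splitlines() if DIMENSION_RE.search(line)]
    return len(dimension_rows) >= MIN_DIMENSION_ROWS


# Made-up inputs for illustration only.
fabricated = "Total: 99/100\n"
prose = "Total overhead is 12/100 cases\n"
plausible = (
    "**Date**: 2024-01-01\n"
    "| Accuracy | 8/10 |\n"
    "| Safety | 9/10 |\n"
    "| Clarity | 7/10 |\n"
)

print(is_valid_eval_file_sketch(fabricated))
print(is_valid_eval_file_sketch(prose))
print(is_valid_eval_file_sketch(plausible))
```

Validating structure before extracting a score means a single fabricated line cannot satisfy the gate on its own; a caller would return `None` whenever this check fails.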
Test plan
- `TestEvalFileValidation` (5), `TestIsValidEvalFile` (4), `TestReadAutonomyScore` (2 new)
- `TestReadEvalScore` fixtures updated to use valid eval content
- `make check` passes: 1097 tests, ruff clean, mypy strict, dry-runs both agents