v2.0.0 — Pre-Execution Gate
The biggest release since the project started. ast-guard has evolved from a Python pair-comparison tool into a multi-language pre-execution gate evaluated against 81,515 real agent code blocks from frontier models.
What's New
Check 6 — Behavioral Risk Scoring
A new additive risk scoring engine for standalone analysis, inspired by YARA, Bandit, and Semgrep. Rather than binary blocklists, each code block accumulates a risk score based on detected patterns. 20+ named patterns across five tiers — from safe exclusions to critical blocks.
Detected patterns include: test file manipulation, monkey-patching (time.time = lambda, module.func = stub), process termination tricks (sys.exit(0)), stack introspection (inspect.currentframe, sys._getframe), dunder method hijacking (__eq__ returning True), timer spoofing, PATH hijacking, LD_PRELOAD injection, sandbox escape patterns, and answer extraction via file traversal.
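To make the additive idea concrete, here is a minimal sketch of pattern-based risk scoring over a Python AST. It is not ast-guard's implementation: the pattern names, weights, and thresholds below are hypothetical and chosen only for illustration, and only four of the patterns listed above are covered. The verdict labels match the ones `scan_standalone()` reports.

```python
import ast

# Hypothetical weights: ast-guard's real pattern names, tiers, and scores may differ.
PATTERN_WEIGHTS = {
    "sys_exit_zero": 40,
    "stack_introspection": 30,
    "monkey_patch_time": 50,
    "eq_always_true": 50,
}
CRITICAL_THRESHOLD = 50  # hypothetical cut-off


def dotted_name(node):
    """Return 'a.b.c' for a Name/Attribute chain, else None."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None


def is_constant(node, value):
    """True if node is a literal of exactly the given value and type."""
    return (
        isinstance(node, ast.Constant)
        and type(node.value) is type(value)
        and node.value == value
    )


def detect_patterns(source):
    """Collect the names of every pattern found in the source."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name == "sys.exit" and len(node.args) == 1 and is_constant(node.args[0], 0):
                hits.append("sys_exit_zero")
            elif name in ("inspect.currentframe", "sys._getframe"):
                hits.append("stack_introspection")
        elif isinstance(node, ast.Assign):
            if any(dotted_name(t) == "time.time" for t in node.targets):
                hits.append("monkey_patch_time")
        elif isinstance(node, ast.FunctionDef) and node.name == "__eq__":
            body = node.body
            if (
                len(body) == 1
                and isinstance(body[0], ast.Return)
                and body[0].value is not None
                and is_constant(body[0].value, True)
            ):
                hits.append("eq_always_true")
    return hits


def risk_score(source):
    """Sum the weights of all hits and map the total to a verdict."""
    hits = detect_patterns(source)
    score = sum(PATTERN_WEIGHTS[h] for h in hits)
    if score >= CRITICAL_THRESHOLD:
        verdict = "CRITICAL"
    elif score > 0:
        verdict = "WARNING"
    else:
        verdict = "CLEAN"
    return {"score": score, "verdict": verdict, "patterns": hits}
```

Because scores add up, several individually weak signals in one block can cross the critical threshold together, which a binary blocklist cannot express.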
scan_standalone() — Standalone Analysis Mode
New public API function for analyzing single agent code blocks without a baseline. Designed for autonomous agent loops, trajectory analysis, and any context where an original reference is unavailable.
from ast_guard import scan_standalone
result = scan_standalone(agent_code)
print(result["verdict"])  # CLEAN / WARNING / CRITICAL
Multi-Language Engine
Python (native ast, zero dependencies), Bash, and JavaScript via tree-sitter. Language auto-detected from shebang, keywords, and syntax patterns. All languages route through the same check pipeline via a unified metric interface.
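As an illustration of shebang-plus-keyword detection, the sketch below checks the shebang first and falls back to counting language-specific keywords. The function name `detect_language`, the keyword lists, and the default are assumptions made for this example; ast-guard's actual detector may use different signals and precedence.

```python
import re

# Hypothetical shebang hints, checked in order.
SHEBANG_HINTS = (
    (r"\bpython", "python"),
    (r"\b(?:ba|z|da)?sh\b", "bash"),
    (r"\bnode\b", "javascript"),
)

# Hypothetical keyword patterns; each match counts as one vote.
KEYWORD_PATTERNS = {
    "python": r"^\s*(?:def|import|from|class|elif|except)\b",
    "bash": r"^\s*(?:fi|then|done|esac|echo|export)\b",
    "javascript": r"^\s*(?:const|let|var|function)\b|=>|===",
}


def detect_language(source, default="python"):
    """Guess the language of a code block from its shebang, then keywords."""
    lines = source.strip().splitlines()
    if lines and lines[0].startswith("#!"):
        for pattern, language in SHEBANG_HINTS:
            if re.search(pattern, lines[0]):
                return language
    counts = {
        language: len(re.findall(pattern, source, re.MULTILINE))
        for language, pattern in KEYWORD_PATTERNS.items()
    }
    best = max(counts, key=counts.get)
    # With no keyword evidence at all, fall back to the default.
    return best if counts[best] > 0 else default
```

A heuristic like this is cheap but can misfire on short or mixed snippets, so a real pipeline would likely confirm the guess by attempting a parse.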
pip install ast-guard[multilang]
Cross-Benchmark Evaluation Framework
Seven benchmark loaders (MALT, TRACE, EvilGenie, Countdown-Code, Helff Gaming Verifiers, School of Reward Hacks, SpecBench stub), a unified CodePair format, and a cross-benchmark runner.
Structural Benchmark
36 curated ground-truth code pairs across 12 structural hack categories. Hand-crafted from documented real-world patterns (METR o3 evaluation, TRACE taxonomy, EvilGenie, Terminal Wrench). 100% F1, 4.7ms mean scan time.
Scientific Documentation
- benchmarks/RESULTS.md — precision, recall, F1, confusion matrices across all datasets
- benchmarks/METHODOLOGY.md — full 6-iteration calibration history (ablation study)
- benchmarks/data/iteration_log.json — structured iteration data for reproducibility
Evaluation Results
| Benchmark | Samples | Key Metric | Value |
|---|---|---|---|
| Structural Benchmark (curated ground truth) | 36 | F1 | 100% |
| TRACE taxonomy (Deshpande et al. 2026) | 33 | F1 | 95.7% |
| School of Reward Hacks (longtermrisk) | 26 | Recall | 96.2% |
| Countdown-Code (Khan et al.) | 15,894 | True Negative Rate | 99.0% |
| MALT (METR) — 81,515 agent code blocks | 81,515 | Specificity | 78.5% |
| MALT — hardcoded_solution category | 429 | Detection Rate | 72.0% |
| MALT — bypass_constraints category | 2,379 | Detection Rate | 44.3% |
Performance: 4.7ms mean per scan. ~210 samples/second. Zero API cost.
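For readers comparing the metrics in the table, the sketch below shows the standard definitions of precision, recall, F1, and specificity (true negative rate) from confusion-matrix counts, plus the throughput implied by a mean scan time. The helper names are hypothetical and the counts in the usage example are illustrative only, not taken from the benchmarks.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


def throughput_per_second(mean_ms):
    """Sequential scans per second implied by a mean per-scan latency."""
    return 1000.0 / mean_ms


# Illustrative counts only (not benchmark data).
metrics = classification_metrics(tp=8, fp=2, tn=18, fn=2)
print(metrics)
print(throughput_per_second(4.7))
```

At 4.7 ms per scan, sequential throughput works out to about 213 scans per second, consistent with the ~210 samples/second figure. Specificity and recall answer different questions: the first measures how rarely benign code is flagged, the second how rarely a real hack is missed.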
Detection Gap Fixes (from v1.3.x)
- Multi-level alias detection: chained aliases (g = f; h = g), tuple unpacking (a, b = print, eval), dict dispatch (d = {"e": eval}; d["e"])
- chr() obfuscation via aliases and __builtins__["chr"]
- resolve_call_name bare-attr collision fix — eliminated false positives from dynamic method names
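The alias cases in the first bullet can be illustrated with a small, flow-insensitive resolver built on the standard `ast` module. This is a sketch under stated assumptions, not ast-guard's `resolve_call_name`: the function `find_aliased_calls` and the `DANGEROUS` set are hypothetical, only simple assignments are tracked, and the subscript handling assumes Python 3.9+ (where `Subscript.slice` is the index expression itself).

```python
import ast

DANGEROUS = {"eval", "exec", "compile", "__import__"}  # hypothetical target set


def find_aliased_calls(source):
    """Return sorted (line, resolved_name) pairs for calls that reach DANGEROUS."""
    aliases = {}   # variable name -> name it was assigned from
    dispatch = {}  # dict variable -> {constant key: name}

    def resolve(name):
        # Follow chained aliases (h -> g -> f -> eval), guarding against cycles.
        seen = set()
        while name in aliases and name not in seen:
            seen.add(name)
            name = aliases[name]
        return name

    def bind(target, value):
        if isinstance(target, ast.Name):
            if isinstance(value, ast.Name):
                aliases[target.id] = value.id
            elif isinstance(value, ast.Dict):
                dispatch[target.id] = {
                    k.value: v.id
                    for k, v in zip(value.keys, value.values)
                    if isinstance(k, ast.Constant) and isinstance(v, ast.Name)
                }
        elif (
            isinstance(target, ast.Tuple)
            and isinstance(value, ast.Tuple)
            and len(target.elts) == len(value.elts)
        ):
            # Tuple unpacking: bind element by element.
            for t, v in zip(target.elts, value.elts):
                bind(t, v)

    tree = ast.parse(source)

    # Pass 1: record every assignment, ignoring control flow and ordering.
    for node in ast.walk(tree):
        if isinstance(node, ast.Assign):
            for target in node.targets:
                bind(target, node.value)

    # Pass 2: resolve each call target through the alias and dispatch maps.
    findings = []
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        func, resolved = node.func, None
        if isinstance(func, ast.Name):
            resolved = resolve(func.id)
        elif (
            isinstance(func, ast.Subscript)
            and isinstance(func.value, ast.Name)
            and isinstance(func.slice, ast.Constant)
        ):
            mapped = dispatch.get(func.value.id, {}).get(func.slice.value)
            if mapped is not None:
                resolved = resolve(mapped)
        if resolved in DANGEROUS:
            findings.append((node.lineno, resolved))
    return sorted(findings)
```

The source is only parsed, never executed, so it is safe to run this on untrusted code. Being flow-insensitive, the sketch can over-report when a name is rebound to something harmless before the call; a production check would need scoping and ordering.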
Full Changelog
See CHANGELOG.md for complete details.