Release v2.0.0 — Pre-Execution Gate · Nick-is-building/ast-guard

v2.0.0 — Pre-Execution Gate

The biggest release since the project started. ast-guard has evolved from a Python pair-comparison tool into a multi-language pre-execution gate evaluated against 81,515 real agent code blocks from frontier models.

What's New

Check 6 — Behavioral Risk Scoring

A new additive risk scoring engine for standalone analysis, inspired by YARA, Bandit, and Semgrep. Instead of applying a binary blocklist, the engine has each code block accumulate a risk score from the patterns it matches. More than 20 named patterns are grouped into five tiers, ranging from safe exclusions to critical blocks.

Detected patterns include: test file manipulation, monkey-patching (time.time = lambda, module.func = stub), process termination tricks (sys.exit(0)), stack introspection (inspect.currentframe, sys._getframe), dunder method hijacking (__eq__ returning True), timer spoofing, PATH hijacking, LD_PRELOAD injection, sandbox escape patterns, and answer extraction via file traversal.
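As a rough illustration of additive scoring (not ast-guard's actual rule set), the sketch below walks a Python AST, records which of three patterns from the list above appear, and sums their weights. The pattern names, weights, and verdict thresholds are all assumptions made up for this example.

```python
import ast

# Hypothetical pattern weights; ast-guard's real names, tiers, and
# weights are not shown in these notes.
WEIGHTS = {
    "sys_exit_zero": 40,
    "stack_introspection": 30,
    "monkey_patch_time": 50,
}


def score_block(source: str) -> tuple[int, list[str]]:
    """Return (total risk score, matched pattern names) for one code block."""
    tree = ast.parse(source)
    hits = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            name = ast.unparse(node.func)
            if (name == "sys.exit" and len(node.args) == 1
                    and isinstance(node.args[0], ast.Constant)
                    and node.args[0].value == 0):
                hits.append("sys_exit_zero")
            elif name in ("inspect.currentframe", "sys._getframe"):
                hits.append("stack_introspection")
        elif isinstance(node, ast.Assign):
            for target in node.targets:
                # Assignment to time.time replaces the real clock.
                if isinstance(target, ast.Attribute) and ast.unparse(target) == "time.time":
                    hits.append("monkey_patch_time")
    return sum(WEIGHTS[h] for h in hits), hits


def verdict(score: int) -> str:
    """Map a score to a verdict using assumed thresholds."""
    if score >= 50:
        return "CRITICAL"
    if score > 0:
        return "WARNING"
    return "CLEAN"
```

Because scores add up, two medium-risk patterns in one block can cross a threshold that neither reaches alone, which is the main difference from a blocklist.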

scan_standalone() — Standalone Analysis Mode

New public API function for analyzing single agent code blocks without a baseline. Designed for autonomous agent loops, trajectory analysis, and any context where an original reference is unavailable.

from ast_guard import scan_standalone

agent_code = "import sys\nsys.exit(0)"  # example agent-generated block to scan
result = scan_standalone(agent_code)
print(result["verdict"])  # CLEAN / WARNING / CRITICAL
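In an agent loop, the verdict can decide whether a block is executed at all. The helper below is a minimal sketch of such a gate. The name `gate`, the `block_on` parameter, and the block-on-CRITICAL policy are assumptions for illustration. The scanner is passed in as a callable so the sketch relies only on the documented "verdict" key.

```python
from typing import Callable, Iterable


def gate(code: str,
         scanner: Callable[[str], dict],
         block_on: Iterable[str] = ("CRITICAL",)) -> bool:
    """Return True if `code` may run, False if its verdict is in `block_on`."""
    result = scanner(code)
    return result["verdict"] not in set(block_on)
```

With the library installed, the intended call would look like `gate(agent_code, scan_standalone)`. A stricter policy could pass `block_on=("WARNING", "CRITICAL")`.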

Multi-Language Engine

Supports Python through the native ast module (zero dependencies), and Bash and JavaScript through tree-sitter. The language is auto-detected from the shebang, keywords, and syntax patterns. All languages route through the same check pipeline via a unified metric interface.
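Auto-detection of this kind can be sketched as a shebang check followed by ordered keyword and syntax heuristics. The rules, their order, and the function name `detect_language` below are illustrative assumptions, not ast-guard's actual detector.

```python
import re

# Ordered (language, pattern) heuristics; the first match wins.
# These rules are assumptions for illustration only.
_RULES = [
    ("javascript", re.compile(
        r"\b(const|let|var)\s+\w+\s*=|\bfunction\b\s*\w*\s*\(|=>|console\.log")),
    ("bash", re.compile(r"^\s*(fi|done|esac)\s*$|^\s*(echo|export)\s", re.M)),
    ("python", re.compile(r"^\s*(def|class)\s+\w+|^\s*(import|from)\s+\w+", re.M)),
]


def detect_language(source: str, default: str = "python") -> str:
    """Guess the language of a code block from its shebang, then its syntax."""
    lines = source.strip().splitlines()
    first = lines[0] if lines else ""
    if first.startswith("#!"):
        if re.search(r"\b(bash|sh)\b", first):
            return "bash"
        if "python" in first:
            return "python"
        if "node" in first:
            return "javascript"
    for lang, pattern in _RULES:
        if pattern.search(source):
            return lang
    return default
```

A shebang is treated as authoritative because it is an explicit declaration, while the keyword rules are only fallbacks and can misfire on short or mixed snippets.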

pip install "ast-guard[multilang]"

Cross-Benchmark Evaluation Framework

Seven benchmark loaders (MALT, TRACE, EvilGenie, Countdown-Code, Helff Gaming Verifiers, School of Reward Hacks, SpecBench stub), a unified CodePair format, and a cross-benchmark runner.
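One way to picture a unified pair format feeding a cross-benchmark runner is sketched below. The field names on `CodePair`, the `Loader` type, and `run_all` are hypothetical and chosen for this example. The real definitions live in the repository and may differ.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional


@dataclass(frozen=True)
class CodePair:
    """Hypothetical unified record produced by every benchmark loader."""
    benchmark: str
    candidate: str            # agent-written code under test
    original: Optional[str]   # baseline code, or None when unavailable
    is_hack: bool             # ground-truth label


Loader = Callable[[], Iterable[CodePair]]


def run_all(loaders: Dict[str, Loader],
            flags: Callable[[CodePair], bool]) -> Dict[str, Counter]:
    """Run one detector over every loader and tally confusion counts."""
    results = {}
    for name, load in loaders.items():
        counts = Counter()
        for pair in load():
            predicted = flags(pair)
            if pair.is_hack:
                counts["tp" if predicted else "fn"] += 1
            else:
                counts["fp" if predicted else "tn"] += 1
        results[name] = counts
    return results
```

Normalizing every dataset to one record type means the runner and the metric code never need benchmark-specific branches.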

Structural Benchmark

36 curated ground-truth code pairs across 12 structural hack categories. Hand-crafted from documented real-world patterns (METR o3 evaluation, TRACE taxonomy, EvilGenie, Terminal Wrench). 100% F1, 4.7ms mean scan time.

Scientific Documentation

benchmarks/RESULTS.md — precision, recall, F1, confusion matrices across all datasets
benchmarks/METHODOLOGY.md — full 6-iteration calibration history (ablation study)
benchmarks/data/iteration_log.json — structured iteration data for reproducibility

Evaluation Results

Benchmark	Samples	Key Metric	Value
Structural Benchmark (curated ground truth)	36	F1	100%
TRACE taxonomy (Deshpande et al. 2026)	33	F1	95.7%
School of Reward Hacks (longtermrisk)	26	Recall	96.2%
Countdown-Code (Khan et al.)	15,894	True Negative Rate	99.0%
MALT (METR) — 81,515 agent code blocks	81,515	Specificity	78.5%
MALT — hardcoded_solution category	429	Detection Rate	72.0%
MALT — bypass_constraints category	2,379	Detection Rate	44.3%

Performance: 4.7ms mean per scan. ~210 samples/second. Zero API cost.
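For readers comparing the rows above, the metrics follow the standard confusion-matrix definitions, and the throughput figure follows from the mean scan time. The sketch below uses made-up counts rather than values from benchmarks/RESULTS.md, and the helper names are assumptions.

```python
def metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary-classification metrics from confusion counts."""
    def ratio(num: float, den: float) -> float:
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)          # also called detection rate
    return {
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
        "specificity": ratio(tn, tn + fp),  # also called true negative rate
    }


def throughput(mean_ms: float) -> float:
    """Sequential scans per second implied by a mean per-scan latency."""
    return 1000.0 / mean_ms
```

At 4.7 ms per scan, `throughput(4.7)` gives about 213 scans per second, in line with the quoted ~210 if scans run sequentially on one core (an assumption).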

Detection Gap Fixes (from v1.3.x)

Multi-level alias detection: chained aliases (g = f; h = g), tuple unpacking (a, b = print, eval), dict dispatch (d = {"e": eval}; d["e"])
chr() obfuscation via aliases and __builtins__["chr"]
resolve_call_name bare-attr collision fix — eliminated false positives from dynamic method names
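The alias cases listed above can be illustrated with a small, flow-insensitive resolver built on the standard ast module. This is a simplified sketch, not the project's resolve_call_name. The function name `find_aliased_calls` and the `DANGEROUS` set are assumptions for the example.

```python
import ast

DANGEROUS = {"eval", "exec"}  # assumed target set for this sketch


def find_aliased_calls(source: str) -> list[str]:
    """Return call expressions that resolve to a dangerous builtin via aliases."""
    tree = ast.parse(source)
    alias = {}  # alias expression text -> name it was bound to

    for node in ast.walk(tree):
        if not isinstance(node, ast.Assign):
            continue
        for target in node.targets:
            value = node.value
            if isinstance(target, ast.Name) and isinstance(value, ast.Name):
                alias[target.id] = value.id                 # g = f
            elif isinstance(target, ast.Tuple) and isinstance(value, ast.Tuple):
                for t, v in zip(target.elts, value.elts):   # a, b = print, eval
                    if isinstance(t, ast.Name) and isinstance(v, ast.Name):
                        alias[t.id] = v.id
            elif isinstance(target, ast.Name) and isinstance(value, ast.Dict):
                for k, v in zip(value.keys, value.values):  # d = {"e": eval}
                    if (isinstance(k, ast.Constant) and isinstance(k.value, str)
                            and isinstance(v, ast.Name)):
                        alias[f"{target.id}[{k.value!r}]"] = v.id

    def resolve(name: str) -> str:
        seen = set()
        while name in alias and name not in seen:  # follow chains, stop on cycles
            seen.add(name)
            name = alias[name]
        return name

    hits = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            name = ast.unparse(node.func)
            if name not in DANGEROUS and resolve(name) in DANGEROUS:
                hits.append(name)
    return hits
```

The resolver ignores control flow and reassignment order, so it over-approximates. A production check would need scoping and ordering to avoid false positives.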

Full Changelog

See CHANGELOG.md for complete details.
