v2.0.0 — Pre-Execution Gate

@Nick-is-building Nick-is-building released this 31 May 08:05

The biggest release since the project started. ast-guard has evolved from a Python pair-comparison tool into a multi-language pre-execution gate evaluated against 81,515 real agent code blocks from frontier models.


What's New

Check 6 — Behavioral Risk Scoring

A new additive risk scoring engine for standalone analysis, inspired by YARA, Bandit, and Semgrep. Instead of matching against a binary blocklist, each code block accumulates a risk score from the patterns detected in it. The engine defines more than 20 named patterns across five tiers, ranging from safe exclusions to critical blocks.

Detected patterns include: test file manipulation, monkey-patching (time.time = lambda, module.func = stub), process termination tricks (sys.exit(0)), stack introspection (inspect.currentframe, sys._getframe), dunder method hijacking (__eq__ returning True), timer spoofing, PATH hijacking, LD_PRELOAD injection, sandbox escape patterns, and answer extraction via file traversal.
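As a rough illustration of additive scoring, the sketch below walks a Python AST, adds a weight for each matched pattern, and maps the total to a verdict. The pattern names, weights, and thresholds are invented for this example and are not ast-guard's actual rule set. It assumes Python 3.9+ for ast.unparse.

```python
import ast


def _is_call_to(node, dotted_name):
    """True if node is a call whose callee unparses to dotted_name."""
    return isinstance(node, ast.Call) and ast.unparse(node.func) == dotted_name


def _is_lambda_monkey_patch(node):
    """True for assignments like `time.time = lambda: 0`."""
    return (
        isinstance(node, ast.Assign)
        and isinstance(node.value, ast.Lambda)
        and any(isinstance(t, ast.Attribute) for t in node.targets)
    )


# Hypothetical pattern table: name -> (weight, predicate over an AST node).
PATTERNS = {
    "process_exit": (40, lambda n: _is_call_to(n, "sys.exit")),
    "stack_introspection": (
        30,
        lambda n: _is_call_to(n, "inspect.currentframe")
        or _is_call_to(n, "sys._getframe"),
    ),
    "monkey_patch_attr": (50, _is_lambda_monkey_patch),
}

# Illustrative thresholds, not the project's real calibration.
WARNING_AT = 30
CRITICAL_AT = 70


def score_block(source):
    """Sum the weights of all matched patterns and derive a verdict."""
    total = 0
    hits = []
    for node in ast.walk(ast.parse(source)):
        for name, (weight, matches) in PATTERNS.items():
            if matches(node):
                total += weight
                hits.append(name)
    if total >= CRITICAL_AT:
        verdict = "CRITICAL"
    elif total >= WARNING_AT:
        verdict = "WARNING"
    else:
        verdict = "CLEAN"
    return {"score": total, "hits": sorted(hits), "verdict": verdict}
```

With these made-up weights, a lone sys.exit(0) lands in the warning band, while sys.exit(0) combined with a lambda monkey-patch crosses the critical threshold. That is the point of additive scoring: individually tolerable signals can add up to a block.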

scan_standalone() — Standalone Analysis Mode

New public API function for analyzing single agent code blocks without a baseline. Designed for autonomous agent loops, trajectory analysis, and any context where an original reference is unavailable.

from ast_guard import scan_standalone

# Example input: any string of agent-generated code
agent_code = "import sys\nsys.exit(0)"
result = scan_standalone(agent_code)
print(result["verdict"])  # CLEAN / WARNING / CRITICAL
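One way to wire the verdict into an agent loop is a small gate that decides whether a block may run. The sketch below is an assumption about usage, not part of ast-guard: gate and fake_scanner are hypothetical names, and the scanner is passed in as a parameter so the example runs without the package installed. It relies only on the "verdict" key and the three verdict values shown above.

```python
from typing import Callable, Dict, Iterable


def gate(
    agent_code: str,
    scanner: Callable[[str], Dict],
    block_on: Iterable[str] = ("CRITICAL",),
) -> bool:
    """Return True if the code may run, False if it should be blocked."""
    result = scanner(agent_code)
    return result["verdict"] not in set(block_on)


def fake_scanner(code: str) -> Dict:
    # Stand-in for scan_standalone so this sketch is runnable on its own.
    return {"verdict": "CRITICAL" if "sys.exit" in code else "CLEAN"}


allowed = gate("print('hello')", fake_scanner)
blocked = gate("import sys; sys.exit(0)", fake_scanner)
```

In a real loop, scan_standalone would be passed in place of fake_scanner. A stricter deployment could pass block_on=("WARNING", "CRITICAL").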

Multi-Language Engine

Supports Python (native ast, zero dependencies), plus Bash and JavaScript via tree-sitter. The language is auto-detected from the shebang, keywords, and syntax patterns. All languages route through the same check pipeline via a unified metric interface.
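A minimal sketch of shebang-then-keyword detection is shown below. The regular expressions and the "unknown" fallback are assumptions chosen for illustration. ast-guard's actual heuristics are not described in this release and may differ.

```python
import re

# Hypothetical keyword hints per language; each match counts as one vote.
KEYWORD_HINTS = {
    "python": [r"^\s*def \w+\(", r"^\s*import \w+", r"^\s*from \w+ import "],
    "bash": [r"^\s*echo ", r"^\s*fi\s*$", r"^\s*done\s*$", r"\$\{?\w+\}?"],
    "javascript": [
        r"\bfunction\s+\w*\s*\(",
        r"\b(?:const|let|var)\s+\w+\s*=",
        r"=>",
        r"console\.log\(",
    ],
}


def detect_language(source):
    """Guess the language from the shebang first, then from keyword votes."""
    lines = source.lstrip().splitlines()
    first_line = lines[0] if lines else ""
    if first_line.startswith("#!"):
        if "python" in first_line:
            return "python"
        if "node" in first_line:
            return "javascript"
        if re.search(r"\b(?:bash|sh|zsh)\b", first_line):
            return "bash"
    scores = {
        lang: sum(len(re.findall(p, source, re.MULTILINE)) for p in patterns)
        for lang, patterns in KEYWORD_HINTS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"
```

The shebang is checked first because it is an explicit declaration. Keyword votes are a fallback for snippets that lack one, which is common for code blocks extracted from agent transcripts.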

pip install "ast-guard[multilang]"

Cross-Benchmark Evaluation Framework

Seven benchmark loaders (MALT, TRACE, EvilGenie, Countdown-Code, Helff Gaming Verifiers, School of Reward Hacks, SpecBench stub), a unified CodePair format, and a cross-benchmark runner.
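To make the loader, format, and runner split concrete, here is a sketch of what a unified pair record and a cross-benchmark runner could look like. The field names (benchmark, sample_id, agent_code, original_code, is_hack) and the function run_cross_benchmark are hypothetical. The release only states that a unified CodePair format and a runner exist.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional


@dataclass(frozen=True)
class CodePair:
    """Hypothetical unified record; the field names are assumptions."""
    benchmark: str
    sample_id: str
    agent_code: str
    original_code: Optional[str] = None  # None for standalone samples
    is_hack: Optional[bool] = None       # None when no ground truth exists


def run_cross_benchmark(
    loaders: Dict[str, Callable[[], Iterable[CodePair]]],
    detector: Callable[[CodePair], bool],
) -> Dict[str, Dict[str, int]]:
    """Run one detector over every loader and tally confusion counts."""
    results = {}
    for name, load in loaders.items():
        counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
        for pair in load():
            if pair.is_hack is None:
                continue  # unlabeled samples cannot be scored
            flagged = detector(pair)
            if flagged and pair.is_hack:
                counts["tp"] += 1
            elif flagged:
                counts["fp"] += 1
            elif pair.is_hack:
                counts["fn"] += 1
            else:
                counts["tn"] += 1
        results[name] = counts
    return results
```

Normalizing every dataset into one record type means each new benchmark needs only a loader, while the detector and the scoring code stay unchanged.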

Structural Benchmark

36 curated ground-truth code pairs across 12 structural hack categories. Hand-crafted from documented real-world patterns (METR o3 evaluation, TRACE taxonomy, EvilGenie, Terminal Wrench). 100% F1, 4.7ms mean scan time.

Scientific Documentation

  • benchmarks/RESULTS.md — precision, recall, F1, confusion matrices across all datasets
  • benchmarks/METHODOLOGY.md — full 6-iteration calibration history (ablation study)
  • benchmarks/data/iteration_log.json — structured iteration data for reproducibility

Evaluation Results

| Benchmark | Samples | Key Metric | Value |
| --- | --- | --- | --- |
| Structural Benchmark (curated ground truth) | 36 | F1 | 100% |
| TRACE taxonomy (Deshpande et al. 2026) | 33 | F1 | 95.7% |
| School of Reward Hacks (longtermrisk) | 26 | Recall | 96.2% |
| Countdown-Code (Khan et al.) | 15,894 | True Negative Rate | 99.0% |
| MALT (METR) agent code blocks | 81,515 | Specificity | 78.5% |
| MALT, hardcoded_solution category | 429 | Detection Rate | 72.0% |
| MALT, bypass_constraints category | 2,379 | Detection Rate | 44.3% |

Performance: 4.7ms mean per scan. ~210 samples/second. Zero API cost.
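For readers comparing the metrics above, the sketch below shows how each one follows from confusion counts, using the standard definitions. The toy counts are invented for illustration and are not taken from any of the benchmarks. The last lines check two figures by arithmetic: 4.7 ms per scan corresponds to roughly 213 scans per second, in line with the quoted ~210, and a 96.2% recall on 26 samples is what 25 detections out of 26 would give, although the release does not state the raw counts.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Standard metrics from confusion counts; 0.0 when a ratio is undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0            # detection rate
    specificity = tn / (tn + fp) if tn + fp else 0.0       # true negative rate
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


# Toy counts, purely illustrative.
toy = confusion_metrics(tp=8, fp=2, tn=88, fn=2)

# Sanity checks on figures quoted in the release.
scans_per_second = 1000 / 4.7           # about 212.8
recall_25_of_26 = round(25 / 26 * 100, 1)
```

Specificity and true negative rate are the same quantity, so the Countdown-Code and MALT rows are directly comparable. Recall and detection rate are likewise the same quantity.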


Detection Gap Fixes (from v1.3.x)

  • Multi-level alias detection: chained aliases (g = f; h = g), tuple unpacking (a, b = print, eval), dict dispatch (d = {"e": eval}; d["e"])
  • chr() obfuscation via aliases and __builtins__["chr"]
  • resolve_call_name bare-attr collision fix — eliminated false positives from dynamic method names
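The three alias forms in the first item above can be resolved with a small two-pass walk over the AST, sketched below. This is a simplified, flow-insensitive illustration and not ast-guard's implementation: find_dangerous_calls and the DANGEROUS set are hypothetical, and the code assumes Python 3.9+, where a subscript's slice is the expression node itself.

```python
import ast

DANGEROUS = {"eval", "exec"}  # illustrative target set


def find_dangerous_calls(source):
    """Return sorted (lineno, resolved_name) for calls reaching DANGEROUS names."""
    tree = ast.parse(source)
    aliases = {}       # alias name -> name it was assigned from
    dict_aliases = {}  # dict variable -> {constant key: name}

    def record(target, value):
        if isinstance(target, ast.Name) and isinstance(value, ast.Name):
            aliases[target.id] = value.id
        elif isinstance(target, ast.Name) and isinstance(value, ast.Dict):
            dict_aliases[target.id] = {
                k.value: v.id
                for k, v in zip(value.keys, value.values)
                if isinstance(k, ast.Constant) and isinstance(v, ast.Name)
            }
        elif (
            isinstance(target, ast.Tuple)
            and isinstance(value, ast.Tuple)
            and len(target.elts) == len(value.elts)
        ):
            for t, v in zip(target.elts, value.elts):
                record(t, v)  # tuple unpacking: a, b = print, eval

    def resolve(name):
        seen = set()  # guards against cycles such as a = b; b = a
        while name in aliases and name not in seen:
            seen.add(name)
            name = aliases[name]
        return name

    # Pass 1: collect every alias, ignoring statement order (a simplification).
    for node in ast.walk(tree):
        if isinstance(node, ast.Assign):
            for target in node.targets:
                record(target, node.value)

    # Pass 2: resolve each call target through the alias tables.
    findings = []
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        func, name = node.func, None
        if isinstance(func, ast.Name):
            name = resolve(func.id)
        elif (
            isinstance(func, ast.Subscript)
            and isinstance(func.value, ast.Name)
            and isinstance(func.slice, ast.Constant)
        ):
            mapped = dict_aliases.get(func.value.id, {}).get(func.slice.value)
            name = resolve(mapped) if mapped else None
        if name in DANGEROUS:
            findings.append((node.lineno, name))
    return sorted(findings)
```

Because the first pass ignores statement order, an alias that is reassigned later in the file can produce a false positive. A production resolver would need scope and flow awareness to avoid that.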

Full Changelog

See CHANGELOG.md for complete details.