Releases: Nick-is-building/ast-guard
v2.0.0 — Pre-Execution Gate
The biggest release since the project started. ast-guard has evolved from a Python pair-comparison tool into a multi-language pre-execution gate evaluated against 81,515 real agent code blocks from frontier models.
What's New
Check 6 — Behavioral Risk Scoring
A new additive risk scoring engine for standalone analysis, inspired by YARA, Bandit, and Semgrep. Instead of applying a binary blocklist, ast-guard lets each code block accumulate a risk score from the patterns it matches. More than 20 named patterns are organized into five tiers, ranging from safe exclusions to critical blocks.
Detected patterns include test file manipulation, monkey-patching (`time.time = lambda: ...`, `module.func = stub`), process termination tricks (`sys.exit(0)`), stack introspection (`inspect.currentframe`, `sys._getframe`), dunder method hijacking (`__eq__` returning `True`), timer spoofing, `PATH` hijacking, `LD_PRELOAD` injection, sandbox escape patterns, and answer extraction via file traversal.
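To make the additive idea concrete, here is a minimal sketch of pattern-weighted scoring with Python's `ast` module. The pattern set, the weights, and the two verdict thresholds are hypothetical placeholders chosen for illustration; they are not ast-guard's actual tiers or scores. The point is that several individually moderate findings can add up to a CRITICAL verdict.

```python
import ast
from dataclasses import dataclass

# Hypothetical weights and thresholds, for illustration only.
CALL_WEIGHTS = {
    "sys.exit": 40,               # process termination trick
    "inspect.currentframe": 50,   # stack introspection
    "sys._getframe": 50,
}
MONKEY_PATCH_WEIGHT = 35
DUNDER_EQ_TRUE_WEIGHT = 60
WARNING_THRESHOLD = 30
CRITICAL_THRESHOLD = 70


@dataclass(frozen=True)
class Finding:
    pattern: str
    weight: int
    line: int


def dotted(node: ast.AST) -> str:
    """Render Name/Attribute chains such as sys._getframe as a dotted string."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        return f"{dotted(node.value)}.{node.attr}"
    return ""


def score(code: str) -> tuple[int, str, list[Finding]]:
    tree = ast.parse(code)
    # Names bound by plain `import x` statements; used to spot monkey-patching of modules.
    modules = {a.asname or a.name.split(".")[0]
               for n in ast.walk(tree) if isinstance(n, ast.Import) for a in n.names}
    findings: list[Finding] = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Call) and dotted(node.func) in CALL_WEIGHTS:
            name = dotted(node.func)
            findings.append(Finding(name, CALL_WEIGHTS[name], node.lineno))
        elif isinstance(node, ast.Assign):
            for target in node.targets:
                # e.g. time.time = lambda: 0 (an attribute of an imported module is overwritten)
                if (isinstance(target, ast.Attribute) and isinstance(target.value, ast.Name)
                        and target.value.id in modules):
                    findings.append(Finding(f"monkey_patch:{dotted(target)}",
                                            MONKEY_PATCH_WEIGHT, node.lineno))
        elif isinstance(node, ast.FunctionDef) and node.name == "__eq__":
            body = node.body
            # __eq__ whose whole body is `return True`
            if (len(body) == 1 and isinstance(body[0], ast.Return)
                    and isinstance(body[0].value, ast.Constant) and body[0].value.value is True):
                findings.append(Finding("dunder_eq_true", DUNDER_EQ_TRUE_WEIGHT, node.lineno))
    total = sum(f.weight for f in findings)  # additive: every finding contributes
    if total >= CRITICAL_THRESHOLD:
        verdict = "CRITICAL"
    elif total >= WARNING_THRESHOLD:
        verdict = "WARNING"
    else:
        verdict = "CLEAN"
    return total, verdict, findings
```

With these made-up weights, `time.time = lambda: 0` alone scores 35 (WARNING), but combined with `sys.exit(0)` the total reaches 75 and crosses the CRITICAL threshold.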
scan_standalone() — Standalone Analysis Mode
New public API function for analyzing single agent code blocks without a baseline. Designed for autonomous agent loops, trajectory analysis, and any context where an original reference is unavailable.
```python
from ast_guard import scan_standalone

agent_code = "import time\ntime.time = lambda: 0\n"  # one agent-written code block
result = scan_standalone(agent_code)
print(result["verdict"])  # CLEAN / WARNING / CRITICAL
```
Multi-Language Engine
Python (native ast, zero dependencies), Bash, and JavaScript via tree-sitter. Language auto-detected from shebang, keywords, and syntax patterns. All languages route through the same check pipeline via a unified metric interface.
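A rough sketch of how shebang-first, keyword-second detection could work is shown below. The regular expressions and the tie-breaking rule (fall back to Python) are assumptions for illustration, not ast-guard's detection logic. Real-world code mixes signals (JavaScript template literals also contain `${...}`, for example), so heuristics like these are approximate.

```python
import re

# Hypothetical heuristics for illustration; not ast-guard's actual rules.
SHEBANG_PREFIXES = {"python": "python", "bash": "bash", "sh": "bash", "node": "javascript"}

KEYWORD_PATTERNS = {
    # Insertion order matters: Python comes first, so it wins ties (including zero matches).
    "python": re.compile(r"^\s*(?:def |class |import |from \S+ import )", re.M),
    "bash": re.compile(r"^\s*(?:echo |export |then$|fi$|done$)|\$\{?\w+", re.M),
    "javascript": re.compile(r"\b(?:const|let|function)\s+\w+|console\.log|=>"),
}


def detect_language(code: str) -> str:
    stripped = code.lstrip()
    if stripped.startswith("#!"):
        first_line = stripped.splitlines()[0]
        # "#!/usr/bin/env python3" -> "python3", "#!/bin/bash" -> "bash"
        parts = first_line.split("/")[-1].split()
        interpreter = parts[-1] if parts else ""
        for prefix, language in SHEBANG_PREFIXES.items():
            if interpreter.startswith(prefix):
                return language
    # No usable shebang: count keyword/syntax hits per language and take the best.
    scores = {lang: len(p.findall(code)) for lang, p in KEYWORD_PATTERNS.items()}
    return max(scores, key=scores.get)
```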
```bash
pip install "ast-guard[multilang]"
```
Cross-Benchmark Evaluation Framework
Seven benchmark loaders (MALT, TRACE, EvilGenie, Countdown-Code, Helff Gaming Verifiers, School of Reward Hacks, SpecBench stub), a unified CodePair format, and a cross-benchmark runner.
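One way to picture a unified pair format is a small frozen dataclass plus one adapter per loader. Every field name here (`sample_id`, `original`, `label`, and so on) and the `from_record` and `is_standalone` helpers are hypothetical; the real `CodePair` schema may differ. Making `original` optional lets one record type carry both paired datasets and datasets that only provide agent code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CodePair:
    # Hypothetical schema for illustration; the real CodePair fields may differ.
    sample_id: str
    benchmark: str            # e.g. "MALT", "TRACE"
    generated: str            # agent-written code under test
    original: Optional[str]   # reference code, or None when a dataset has no baseline
    label: Optional[bool]     # True = hack, False = clean, None = unlabeled
    category: str = ""


def from_record(benchmark: str, record: dict) -> CodePair:
    """Hypothetical loader adapter: map one dataset-specific record onto CodePair."""
    return CodePair(
        sample_id=str(record["id"]),
        benchmark=benchmark,
        generated=record["code"],
        original=record.get("reference"),
        label=record.get("is_hack"),
        category=record.get("category", ""),
    )


def is_standalone(pair: CodePair) -> bool:
    # Pairs without a baseline would be routed to standalone analysis instead of pair comparison.
    return pair.original is None
```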
Structural Benchmark
36 curated ground-truth code pairs across 12 structural hack categories. Hand-crafted from documented real-world patterns (METR o3 evaluation, TRACE taxonomy, EvilGenie, Terminal Wrench). 100% F1, 4.7ms mean scan time.
Scientific Documentation
- benchmarks/RESULTS.md — precision, recall, F1, confusion matrices across all datasets
- benchmarks/METHODOLOGY.md — full 6-iteration calibration history (ablation study)
- benchmarks/data/iteration_log.json — structured iteration data for reproducibility
Evaluation Results
| Benchmark | Samples | Key Metric | Value |
|---|---|---|---|
| Structural Benchmark (curated ground truth) | 36 | F1 | 100% |
| TRACE taxonomy (Deshpande et al. 2026) | 33 | F1 | 95.7% |
| School of Reward Hacks (longtermrisk) | 26 | Recall | 96.2% |
| Countdown-Code (Khan et al.) | 15,894 | True Negative Rate | 99.0% |
| MALT (METR), all agent code blocks | 81,515 | Specificity | 78.5% |
| MALT — hardcoded_solution category | 429 | Detection Rate | 72.0% |
| MALT — bypass_constraints category | 2,379 | Detection Rate | 44.3% |
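The table mixes several metrics. For reference, the sketch below shows how each is derived from a confusion matrix: specificity and true negative rate are the same quantity, and "detection rate" is recall. The example counts (20 hacks caught, 2 missed, 8 clean samples passed, no false positives) are chosen to be consistent with the v1.1.0 TRACE suite described further down; they are not taken from the v2.0.0 evaluation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    tp: int  # hacks flagged
    fp: int  # clean code flagged
    tn: int  # clean code passed
    fn: int  # hacks missed

    @staticmethod
    def _ratio(num: int, den: int) -> float:
        return num / den if den else 0.0

    def precision(self) -> float:
        return self._ratio(self.tp, self.tp + self.fp)

    def recall(self) -> float:
        # Reported as "detection rate" in the table
        return self._ratio(self.tp, self.tp + self.fn)

    def specificity(self) -> float:
        # Same quantity as the true negative rate
        return self._ratio(self.tn, self.tn + self.fp)

    def f1(self) -> float:
        p, r = self.precision(), self.recall()
        return 2 * p * r / (p + r) if p + r else 0.0


trace_v110 = Confusion(tp=20, fp=0, tn=8, fn=2)
print(round(trace_v110.recall(), 3), round(trace_v110.f1(), 3))  # 0.909 0.952
```

Recall of 20/22 reproduces the 90.9% detection rate, and with perfect precision F1 works out to 40/42.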
Performance: 4.7ms mean per scan. ~210 samples/second. Zero API cost.
Detection Gap Fixes (from v1.3.x)
- Multi-level alias detection: chained aliases (`g = f; h = g`), tuple unpacking (`a, b = print, eval`), dict dispatch (`d = {"e": eval}; d["e"](...)`)
- `chr()` obfuscation via aliases and `__builtins__["chr"]`
- `resolve_call_name` bare-attribute collision fix: eliminated false positives from dynamic method names
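The sketch below illustrates one simple way to resolve the three alias shapes listed above before checking a call against a forbidden set. It is flow-insensitive (it ignores the order of assignments and reassignments), and `FORBIDDEN` and `forbidden_calls` are hypothetical names; ast-guard's own `resolve_call_name` logic is not reproduced here.

```python
import ast

FORBIDDEN = {"eval", "exec", "compile", "__import__"}  # hypothetical forbidden set


def forbidden_calls(code: str) -> set[str]:
    """Forbidden builtins reached directly or through aliases (flow-insensitive sketch)."""
    tree = ast.parse(code)
    aliases: dict[str, str] = {}               # local name -> name it was assigned from
    tables: dict[str, dict[object, str]] = {}  # dict variable -> {key: function name}

    def bind(target: ast.AST, value: ast.AST) -> None:
        if isinstance(target, ast.Name) and isinstance(value, ast.Name):
            aliases[target.id] = value.id                   # g = f
        elif isinstance(target, ast.Tuple) and isinstance(value, ast.Tuple):
            for t, v in zip(target.elts, value.elts):        # a, b = print, eval
                bind(t, v)
        elif isinstance(target, ast.Name) and isinstance(value, ast.Dict):
            tables[target.id] = {                            # d = {"e": eval}
                k.value: v.id
                for k, v in zip(value.keys, value.values)
                if isinstance(k, ast.Constant) and isinstance(v, ast.Name)
            }

    for node in ast.walk(tree):
        if isinstance(node, ast.Assign):
            for target in node.targets:
                bind(target, node.value)

    def resolve(name: str) -> str:
        seen: set[str] = set()
        while name in aliases and name not in seen:  # follow h -> g -> f, stop on cycles
            seen.add(name)
            name = aliases[name]
        return name

    hits: set[str] = set()
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if isinstance(func, ast.Name):
            name = resolve(func.id)
        elif (isinstance(func, ast.Subscript) and isinstance(func.value, ast.Name)
              and isinstance(func.slice, ast.Constant)):
            name = resolve(tables.get(func.value.id, {}).get(func.slice.value, ""))  # d["e"](...)
        else:
            continue
        if name in FORBIDDEN:
            hits.add(name)
    return hits
```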
Full Changelog
See CHANGELOG.md for complete details.
v1.3.0 — Extensional Enumeration Detection & Detection Gap Fixes
What's New
Check 5 — Extensional Enumeration Detector
Detects functions that replace algorithmic logic with exhaustive input-output mappings (compare-return chains). Based on the extensional enumeration pattern identified by Helff et al. ("LLMs Gaming Verifiers", arXiv:2604.15149).
- Flags functions where ≥70% of if-statements are `if x == Const: return Const` with minimal loop logic
- WARNING individually, CRITICAL in combination with Check 1 or Check 2
- Configurable thresholds: `enumeration_ratio`, `enumeration_min_ifs`
- Supports `match`/`case` (Python 3.10+)
Detection Gap Fixes
- Check 2 rename bypass closed: file-level complexity fallback when function names differ between original and generated code
- `builtins.eval` detection: the `builtins` module is now recognized alongside `__builtins__`
- Syntax-error telemetry fix: no longer corrupts telemetry data on unparseable generated code
- Exception handling: `except SyntaxError` instead of `except Exception`, so analyzer bugs now propagate correctly
GitHub Action
Native SARIF scanning action for CI/CD pipelines with optional upload to GitHub Security tab:
```yaml
- uses: Nick-is-building/ast-guard/.github/actions/ast-guard@main
  with:
    original: original.py
    generated: optimized.py
    mode: strict
    upload-sarif: "true"
```
ast-guard v1.2.0 — Constant Folding, SARIF Output, Enhanced Obfuscation Detection
What's New
Constant Folding for Obfuscation Detection
- `resolve_constant_string()` recursively resolves string concatenation via `ast.BinOp(ast.Add)`
- Catches patterns like `__builtins__['ev' + 'al']` that previously evaded Check 3
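The folding step can be sketched as a small recursive function over `ast.BinOp` nodes. `fold_str_concat` is a hypothetical stand-in that illustrates the idea behind `resolve_constant_string()`; it is not that function's actual source.

```python
import ast
from typing import Optional


def fold_str_concat(node: ast.AST) -> Optional[str]:
    """Fold `'a' + 'b' + ...` into one string; return None if any part is not a str literal."""
    if isinstance(node, ast.Constant) and isinstance(node.value, str):
        return node.value
    if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Add):
        left = fold_str_concat(node.left)
        right = fold_str_concat(node.right)
        if left is not None and right is not None:
            return left + right
    return None


# __builtins__['ev' + 'al'](...) hides "eval" behind a concatenation in the subscript
call = ast.parse("__builtins__['ev' + 'al']('1 + 1')", mode="eval").body
print(fold_str_concat(call.func.slice))  # eval
```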
New Anti-Obfuscation Paths
- `__builtins__.__dict__['eval']`: attribute chain to `__dict__` now detected
- `getattr(globals()['__builtins__'], 'eval')`: subscript on `globals()` as first argument now detected
- Centralized via `_is_builtins_reference()` helper
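Below is a sketch of a centralized "is this the builtins namespace?" predicate and a scanner that uses it for the two access paths above. Both `looks_like_builtins` and `builtin_lookups` are hypothetical names; the real `_is_builtins_reference()` helper may cover more or different forms. String concatenation inside the subscript is deliberately not handled here, since that is the job of constant folding.

```python
import ast


def looks_like_builtins(node: ast.AST) -> bool:
    """Hypothetical predicate: does `node` evaluate to the builtins namespace?"""
    if isinstance(node, ast.Name):
        return node.id in {"__builtins__", "builtins"}
    if isinstance(node, ast.Attribute) and node.attr == "__dict__":
        return looks_like_builtins(node.value)  # __builtins__.__dict__
    return (isinstance(node, ast.Subscript)      # globals()['__builtins__']
            and isinstance(node.value, ast.Call)
            and isinstance(node.value.func, ast.Name)
            and node.value.func.id == "globals"
            and isinstance(node.slice, ast.Constant)
            and node.slice.value == "__builtins__")


def builtin_lookups(code: str) -> list:
    """Names fetched from the builtins namespace by subscript or getattr()."""
    found = []
    for node in ast.walk(ast.parse(code)):
        if (isinstance(node, ast.Subscript) and looks_like_builtins(node.value)
                and isinstance(node.slice, ast.Constant)):
            found.append(node.slice.value)    # __builtins__.__dict__['eval']
        elif (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
              and node.func.id == "getattr" and len(node.args) >= 2
              and looks_like_builtins(node.args[0])
              and isinstance(node.args[1], ast.Constant)):
            found.append(node.args[1].value)  # getattr(globals()['__builtins__'], 'eval')
    return found
```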
Complexity Floor for Small Functions
- New `complexity_abs_min` threshold (default: 5)
- Check 2 only fires when original complexity meets the minimum floor
- Prevents false positives on legitimate simplifications of small functions (e.g., complexity 3→1)
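The interaction between the floor and the collapse check can be sketched as follows. The complexity measure is a simplified McCabe-style count (1 plus decision points), and the 50% `drop_ratio` collapse criterion is an assumption; only `complexity_abs_min` and its default of 5 come from the release notes.

```python
import ast

# Simplified decision points; a full McCabe count would also cover e.g. match cases.
DECISION_NODES = (ast.If, ast.For, ast.While, ast.IfExp, ast.ExceptHandler, ast.comprehension)


def complexity(code: str) -> int:
    """Rough cyclomatic complexity: 1 + decision points."""
    score = 1
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, DECISION_NODES):
            score += 1
        elif isinstance(node, ast.BoolOp):
            score += len(node.values) - 1  # each extra `and` / `or` operand
    return score


def complexity_collapsed(original: str, generated: str,
                         complexity_abs_min: int = 5, drop_ratio: float = 0.5) -> bool:
    before, after = complexity(original), complexity(generated)
    if before < complexity_abs_min:
        return False  # below the floor: small functions may legitimately simplify
    return after <= before * drop_ratio  # hypothetical collapse criterion
```

A three-branch sign function rewritten as `(x > 0) - (x < 0)` drops from 3 to 1 but sits below the floor, so it is not flagged; a function with complexity 6 collapsed to a single table lookup is.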
Set-Literal-Size Allowlist Blocker
- New `set_literal_max` threshold (default: 15)
- Data Structure Swap allowlist override is blocked when a set literal exceeds this size
- Catches precomputed lookup tables disguised as data structure optimizations
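A sketch of the size check: find set literals above `set_literal_max` and, if any exist, refuse the Data Structure Swap allowlist override. The `allow_data_structure_swap` wiring is hypothetical. Note that calls such as `set([...])` are not set literals and would need separate handling.

```python
import ast


def oversized_set_literals(code: str, set_literal_max: int = 15) -> list[int]:
    """Line numbers of set literals with more than `set_literal_max` elements."""
    return [
        node.lineno
        for node in ast.walk(ast.parse(code))
        if isinstance(node, ast.Set) and len(node.elts) > set_literal_max
    ]


def allow_data_structure_swap(generated: str, set_literal_max: int = 15) -> bool:
    # Hypothetical wiring: the allowlist override only applies if no set literal is oversized.
    return not oversized_set_literals(generated, set_literal_max)


# A 16-element literal looks more like a precomputed answer table than an optimization.
lookup = "ANSWERS = {" + ", ".join(str(n * n) for n in range(16)) + "}\n"
print(oversized_set_literals(lookup))  # [1]
```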
SARIF v2.1.0 Output
- New `--sarif` CLI flag produces SARIF v2.1.0 output
- Compatible with `github/codeql-action/upload-sarif` for GitHub Security tab integration
- 4 rule definitions mapping to the 4 core checks
- Includes artifact references for both original and generated files
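For orientation, the skeleton of a SARIF v2.1.0 log with four rules, two artifacts, and one result is sketched below. The rule IDs (`AG001` to `AG004`), the severity mapping (CRITICAL to `error`, WARNING to `warning`), and the `to_sarif` function are assumptions; the actual `--sarif` output may use different identifiers and fields.

```python
import json

# Hypothetical rule IDs; the real --sarif output may name its four rules differently.
RULES = [
    ("AG001", "Hardcoding"),
    ("AG002", "Complexity Collapse"),
    ("AG003", "Forbidden Calls & Obfuscation"),
    ("AG004", "Import Drift"),
]
LEVELS = {"CRITICAL": "error", "WARNING": "warning"}  # SARIF levels: none/note/warning/error


def to_sarif(findings, original_uri: str, generated_uri: str) -> dict:
    """findings: iterable of (rule_id, verdict, message, line) in the generated file."""
    results = [
        {
            "ruleId": rule_id,
            "level": LEVELS.get(verdict, "note"),
            "message": {"text": message},
            "locations": [{
                "physicalLocation": {
                    "artifactLocation": {"uri": generated_uri},
                    "region": {"startLine": line},
                }
            }],
        }
        for rule_id, verdict, message, line in findings
    ]
    return {
        "version": "2.1.0",
        "runs": [{
            "tool": {"driver": {
                "name": "ast-guard",
                "rules": [{"id": rid, "shortDescription": {"text": name}} for rid, name in RULES],
            }},
            # Both files are listed as artifacts so reviewers can open either side.
            "artifacts": [{"location": {"uri": original_uri}},
                          {"location": {"uri": generated_uri}}],
            "results": results,
        }],
    }


log = to_sarif([("AG003", "CRITICAL", "eval() introduced", 12)], "original.py", "optimized.py")
print(json.dumps(log, indent=2))
```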
Additional
- All remaining German docstrings translated to English
- 14 new tests for v1.2 features
- 57 tests total across all modules
- Version bumped to 1.2.0
ast-guard v1.1.0 — MCP Server, TRACE Benchmark, FailProofAI Integration
What's New
MCP Server
- Integrated Model Context Protocol server directly into ast-guard (`ast_guard/mcp_server.py`)
- Two MCP tools: `ast_guard_scan` and `ast_guard_feedback`
- Optional dependency: `pip install ast-guard[mcp]`
- Entry point: `ast-guard-mcp` command
- 8 dedicated MCP tests
TRACE Benchmark Suite
- 22 hacked samples across 14 TRACE subcategories
- 8 clean/benign samples
- Results: 90.9% detection rate, 100% precision, 0% false positives
- CLI runner with `--json` and `--verbose` options
FailProofAI Integration Proposal
- Issue #375 with a working policy prototype using the `PreToolUse` hook
- `deny()` on CRITICAL, `instruct()` on WARNING, `allow()` on CLEAN
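The verdict-to-decision mapping in the proposal can be sketched independently of FailProofAI's own API, which is not reproduced here. `Decision`, `decide`, and `pre_tool_use` are hypothetical; `scan` stands in for any scanner that returns a dict with a `"verdict"` key, such as the result shape shown for `scan_standalone()` in the v2.0.0 notes. Falling back to DENY for an unknown verdict is a fail-closed design choice of this sketch, not part of the proposal.

```python
from enum import Enum
from typing import Callable


class Decision(Enum):
    ALLOW = "allow"
    INSTRUCT = "instruct"
    DENY = "deny"


# Mapping from the proposal: CRITICAL -> deny, WARNING -> instruct, CLEAN -> allow
POLICY = {"CRITICAL": Decision.DENY, "WARNING": Decision.INSTRUCT, "CLEAN": Decision.ALLOW}


def decide(verdict: str) -> Decision:
    # Fail closed: anything unrecognized is treated like CRITICAL (a choice of this sketch)
    return POLICY.get(verdict, Decision.DENY)


def pre_tool_use(code: str, scan: Callable[[str], dict]) -> tuple[Decision, str]:
    """Hypothetical hook body; `scan` returns a dict with a "verdict" key."""
    verdict = scan(code).get("verdict", "UNKNOWN")
    return decide(verdict), f"ast-guard verdict: {verdict}"
```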
Codebase
- Full English translation (code, comments, docstrings, category names)
- Complete README rewrite with benchmarks, MCP docs, and related work
- GitHub Actions CI across Python 3.11, 3.12, 3.13
- 43 tests across all modules
ast-guard v1.0.0 — First Public Release
The world's first deterministic reward hacking detector for LLM-generated Python code.
What is ast-guard?
When LLMs autonomously generate and test code, they can cheat: hardcoding outputs, replacing algorithms with lookup tables, or manipulating test environments. ast-guard catches this structurally via AST analysis, before the code ever runs.
Highlights
- Four detection checks: Hardcoding (if-count, literal-count, long strings), Complexity Collapse, Forbidden Calls & Obfuscation, Import Drift
- Three sensitivity modes: strict (blocks execution), standard (warnings only), audit (silent telemetry)
- Zero dependencies: Pure Python standard library, works everywhere Python 3.11+ runs
- Diff-based analysis: Only flags what's NEW in the generated code
- Anti-obfuscation: Catches variable aliasing, `__builtins__` access, `getattr` tricks, `chr()` encoding
- Allowlist-aware: Recognizes legitimate optimizations (comprehensions, built-ins, data structure swaps)
- Built-in telemetry: Anonymized local metrics collection for community-driven threshold calibration
- Privacy by design: Never stores code, filenames, or timestamps
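Diff-based analysis can be illustrated with call counts: count calls in both versions and keep only what the generated code adds. `call_names` and `new_calls` are hypothetical helpers, and only direct calls by bare name are counted in this sketch.

```python
import ast
from collections import Counter


def call_names(code: str) -> Counter:
    """Count direct calls by bare name, e.g. Counter({'print': 2, 'eval': 1})."""
    counts: Counter = Counter()
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            counts[node.func.id] += 1
    return counts


def new_calls(original: str, generated: str) -> Counter:
    # Counter subtraction keeps only positive counts: calls the generated code adds
    return call_names(generated) - call_names(original)


original = "def f(x):\n    return eval(x)\n"
generated = "def f(x):\n    print(x)\n    return eval(x)\n"
print(new_calls(original, generated))  # Counter({'print': 1})
```

Because `collections.Counter` subtraction drops zero and negative counts, the `eval` already present in the original is not reported, while the newly added `print` is.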
Quick Start
```bash
git clone https://github.com/Nick-is-building/ast-guard.git
cd ast-guard
python -m pytest tests/ -v  # 35 tests, all passing
```
Requirements
- Python 3.11+
- No pip install needed