Releases · Nick-is-building/ast-guard

31 May 08:05

v2.0.0

5fe291e

Latest

v2.0.0 — Pre-Execution Gate

The biggest release since the project started. ast-guard has evolved from a Python pair-comparison tool into a multi-language pre-execution gate evaluated against 81,515 real agent code blocks from frontier models.

What's New

Check 6 — Behavioral Risk Scoring

A new additive risk scoring engine for standalone analysis, inspired by YARA, Bandit, and Semgrep. Rather than binary blocklists, each code block accumulates a risk score based on detected patterns. 20+ named patterns across five tiers — from safe exclusions to critical blocks.

Detected patterns include: test file manipulation, monkey-patching (time.time = lambda, module.func = stub), process termination tricks (sys.exit(0)), stack introspection (inspect.currentframe, sys._getframe), dunder method hijacking (__eq__ returning True), timer spoofing, PATH hijacking, LD_PRELOAD injection, sandbox escape patterns, and answer extraction via file traversal.
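The additive scoring idea described above can be sketched as follows. This is a minimal illustration, not ast-guard's implementation: the pattern names, weights, and thresholds are hypothetical, and it matches with regular expressions for brevity where the real engine works on the AST.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Pattern:
    name: str
    regex: str
    weight: int  # points added to the score when the pattern matches


# Hypothetical patterns and weights, chosen only for illustration.
PATTERNS = [
    Pattern("process_exit", r"\bsys\.exit\(\s*0\s*\)", 40),
    Pattern("stack_introspection", r"\b(inspect\.currentframe|sys\._getframe)\b", 30),
    Pattern("time_monkeypatch", r"\btime\.time\s*=(?!=)", 50),
    Pattern("ld_preload", r"\bLD_PRELOAD\b", 60),
]

# Hypothetical verdict thresholds.
WARNING_AT = 30
CRITICAL_AT = 70


def score(code: str) -> dict:
    """Sum the weights of all matching patterns and map the total to a verdict."""
    hits = [p for p in PATTERNS if re.search(p.regex, code)]
    total = sum(p.weight for p in hits)
    if total >= CRITICAL_AT:
        verdict = "CRITICAL"
    elif total >= WARNING_AT:
        verdict = "WARNING"
    else:
        verdict = "CLEAN"
    return {"score": total, "verdict": verdict, "patterns": [p.name for p in hits]}
```

Because scores add up, a single weak signal only warns, while two signals together (for example a patched clock plus an early exit) cross the critical threshold.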

scan_standalone() — Standalone Analysis Mode

New public API function for analyzing single agent code blocks without a baseline. Designed for autonomous agent loops, trajectory analysis, and any context where an original reference is unavailable.

from ast_guard import scan_standalone
result = scan_standalone(agent_code)
print(result["verdict"])  # CLEAN / WARNING / CRITICAL
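For an agent loop, the verdict can drive a simple allow/block decision. The sketch below assumes only what the snippet above shows (a dict with a "verdict" key); the gate function and the stand-in scanner are hypothetical, and the scanner is injected so the example runs without ast-guard installed.

```python
from typing import Callable


def gate(agent_code: str, scanner: Callable[[str], dict]) -> bool:
    """Return True if the code block may be executed."""
    result = scanner(agent_code)
    # Fail closed: a missing verdict is treated like CRITICAL.
    verdict = result.get("verdict", "CRITICAL")
    if verdict == "CRITICAL":
        return False
    if verdict == "WARNING":
        print("warning: code block flagged, review before trusting its results")
    return True


def fake_scanner(code: str) -> dict:
    """Stand-in for a real scanner, used only to make the sketch runnable."""
    return {"verdict": "CRITICAL" if "sys.exit(0)" in code else "CLEAN"}


if __name__ == "__main__":
    print(gate("print(1)", fake_scanner))
    print(gate("import sys; sys.exit(0)", fake_scanner))
```

In real use, scan_standalone would presumably be passed in place of fake_scanner.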

Multi-Language Engine

Python is parsed with the native ast module (zero dependencies); Bash and JavaScript are parsed via tree-sitter. The language is auto-detected from the shebang, keywords, and syntax patterns. All languages route through the same check pipeline via a unified metric interface.

pip install ast-guard[multilang]
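Shebang-plus-keyword detection could look roughly like the sketch below. The keyword patterns, the fallback default, and the function name are assumptions made for illustration; the heuristics ast-guard actually uses are not documented in these notes.

```python
import re

# Hypothetical shebang fragments per language.
SHEBANGS = {"python": ("python",), "javascript": ("node",), "bash": ("bash", "/sh")}

# Hypothetical keyword patterns; each match counts as one vote.
KEYWORDS = {
    "python": r"^\s*(def \w+\(|import [\w.]+\s*$|from [\w.]+ import |elif )",
    "javascript": r"\b(const|let|function)\s+\w+|=>|console\.log\(",
    "bash": r"^\s*(fi|done|esac)\s*$|^\s*echo\s|\bthen\s*$",
}


def detect_language(source: str, default: str = "python") -> str:
    """Guess the language from the shebang first, then from keyword votes."""
    lines = source.lstrip().splitlines()
    first = lines[0] if lines else ""
    if first.startswith("#!"):
        for lang, fragments in SHEBANGS.items():
            if any(fragment in first for fragment in fragments):
                return lang
    votes = {
        lang: len(re.findall(pattern, source, re.MULTILINE))
        for lang, pattern in KEYWORDS.items()
    }
    best = max(votes, key=votes.get)
    return best if votes[best] > 0 else default
```

A shebang is treated as authoritative; keyword voting is only a fallback and will misclassify short or ambiguous snippets.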

Cross-Benchmark Evaluation Framework

Seven benchmark loaders (MALT, TRACE, EvilGenie, Countdown-Code, Helff Gaming Verifiers, School of Reward Hacks, SpecBench stub), a unified CodePair format, and a cross-benchmark runner.

Structural Benchmark

36 curated ground-truth code pairs across 12 structural hack categories. Hand-crafted from documented real-world patterns (METR o3 evaluation, TRACE taxonomy, EvilGenie, Terminal Wrench). 100% F1, 4.7ms mean scan time.

Scientific Documentation

benchmarks/RESULTS.md — precision, recall, F1, confusion matrices across all datasets
benchmarks/METHODOLOGY.md — full 6-iteration calibration history (ablation study)
benchmarks/data/iteration_log.json — structured iteration data for reproducibility

Evaluation Results

Benchmark	Samples	Key Metric	Value
Structural Benchmark (curated ground truth)	36	F1	100%
TRACE taxonomy (Deshpande et al. 2026)	33	F1	95.7%
School of Reward Hacks (longtermrisk)	26	Recall	96.2%
Countdown-Code (Khan et al.)	15,894	True Negative Rate	99.0%
MALT (METR) — 81,515 agent code blocks	81,515	Specificity	78.5%
MALT — hardcoded_solution category	429	Detection Rate	72.0%
MALT — bypass_constraints category	2,379	Detection Rate	44.3%

Performance: 4.7ms mean per scan. ~210 samples/second. Zero API cost.

Detection Gap Fixes (from v1.3.x)

Multi-level alias detection: chained aliases (g = f; h = g), tuple unpacking (a, b = print, eval), dict dispatch (d = {"e": eval}; d["e"])
chr() obfuscation via aliases and __builtins__["chr"]
resolve_call_name bare-attr collision fix — eliminated false positives from dynamic method names
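The three alias shapes listed above can be resolved with a small symbol table over the AST. The following is a simplified sketch, not the project's resolver: the FORBIDDEN set and function name are hypothetical, and it ignores scoping and rebinding.

```python
import ast

FORBIDDEN = {"eval", "exec"}  # hypothetical set of calls to track


def find_aliased_calls(source: str) -> list[tuple[int, str]]:
    """Return (line, builtin) for calls that reach a forbidden builtin via an alias."""
    tree = ast.parse(source)
    aliases: dict[str, str] = {}            # local name -> builtin
    dict_aliases: dict[tuple, str] = {}     # (dict name, key) -> builtin

    def resolve(node):
        if isinstance(node, ast.Name):
            return node.id if node.id in FORBIDDEN else aliases.get(node.id)
        if (isinstance(node, ast.Subscript) and isinstance(node.value, ast.Name)
                and isinstance(node.slice, ast.Constant)):
            return dict_aliases.get((node.value.id, node.slice.value))
        return None

    def record(target, value):
        if isinstance(target, ast.Name):
            if isinstance(value, ast.Dict):  # dict dispatch: d = {"e": eval}
                for key, val in zip(value.keys, value.values):
                    builtin = resolve(val)
                    if isinstance(key, ast.Constant) and builtin:
                        dict_aliases[(target.id, key.value)] = builtin
            elif (builtin := resolve(value)):  # plain or chained alias
                aliases[target.id] = builtin
        elif (isinstance(target, (ast.Tuple, ast.List))
              and isinstance(value, (ast.Tuple, ast.List))
              and len(target.elts) == len(value.elts)):
            for t, v in zip(target.elts, value.elts):  # tuple unpacking
                record(t, v)

    # Process assignments in source order so chained aliases resolve.
    assigns = [n for n in ast.walk(tree) if isinstance(n, ast.Assign)]
    for node in sorted(assigns, key=lambda n: (n.lineno, n.col_offset)):
        for target in node.targets:
            record(target, node.value)

    findings = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            builtin = resolve(node.func)
            is_direct = isinstance(node.func, ast.Name) and node.func.id == builtin
            if builtin and not is_direct:
                findings.append((node.lineno, builtin))
    return sorted(findings)
```

For example, find_aliased_calls('g = eval\nh = g\nh("1+1")') reports the call on line 3 as reaching eval.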

Full Changelog

See CHANGELOG.md for complete details.

28 May 05:55

Nick-is-building

1.3.0

50500d1

v1.3.0 — Extensional Enumeration Detection & Detection Gap Fixes

What's New

Check 5 — Extensional Enumeration Detector

Detects functions that replace algorithmic logic with exhaustive input-output mappings (compare-return chains). Based on the extensional enumeration pattern identified by Helff et al. ("LLMs Gaming Verifiers", arXiv:2604.15149).

Flags functions where ≥70% of if-statements are if x == Const: return Const with minimal loop logic
WARNING individually, CRITICAL in combination with Check 1 or Check 2
Configurable thresholds: enumeration_ratio, enumeration_min_ifs
Supports match/case (Python 3.10+)
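A rough sketch of the ratio test described in the list above. The 0.7 default comes from the "≥70%" figure; the minimum-if default of 3, the "no loops at all" simplification, and the function names are assumptions. match/case handling is omitted.

```python
import ast


def _is_compare_return(node: ast.If) -> bool:
    """True for `if x == Const: return Const`."""
    test = node.test
    return (
        isinstance(test, ast.Compare)
        and len(test.ops) == 1
        and isinstance(test.ops[0], ast.Eq)
        and isinstance(test.comparators[0], ast.Constant)
        and len(node.body) == 1
        and isinstance(node.body[0], ast.Return)
        and isinstance(node.body[0].value, ast.Constant)
    )


def enumerated_functions(source: str, enumeration_ratio: float = 0.7,
                         enumeration_min_ifs: int = 3) -> list[str]:
    """Return names of functions dominated by compare-return chains."""
    flagged = []
    for func in ast.walk(ast.parse(source)):
        if not isinstance(func, ast.FunctionDef):
            continue
        nodes = list(ast.walk(func))
        ifs = [n for n in nodes if isinstance(n, ast.If)]
        # Simplification: any loop counts as real algorithmic logic.
        has_loop = any(isinstance(n, (ast.For, ast.While)) for n in nodes)
        if len(ifs) < enumeration_min_ifs or has_loop:
            continue
        ratio = sum(_is_compare_return(n) for n in ifs) / len(ifs)
        if ratio >= enumeration_ratio:
            flagged.append(func.name)
    return flagged
```

elif chains are covered because each elif appears as a nested ast.If in the tree.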

Detection Gap Fixes

Check 2 rename bypass closed: File-level complexity fallback when function names differ between original and generated code
builtins.eval detection: builtins module now recognized alongside __builtins__
Syntax-error telemetry fix: No longer corrupts telemetry data on unparseable generated code
Exception handling: except SyntaxError instead of except Exception — analyzer bugs now propagate correctly

GitHub Action

Native SARIF scanning action for CI/CD pipelines with optional upload to GitHub Security tab:

- uses: Nick-is-building/ast-guard/.github/actions/ast-guard@main
  with:
    original: original.py
    generated: optimized.py
    mode: strict
    upload-sarif: "true"

25 May 23:04

Nick-is-building

1.2.0

a0ff03a

ast-guard v1.2.0 — Constant Folding, SARIF Output, Enhanced Obfuscation Detection

What's New

Constant Folding for Obfuscation Detection

resolve_constant_string() recursively resolves string concatenation via ast.BinOp(ast.Add)
Catches patterns like __builtins__['ev' + 'al'] that previously evaded Check 3

New Anti-Obfuscation Paths

__builtins__.__dict__['eval'] — Attribute chain to __dict__ now detected
getattr(globals()['__builtins__'], 'eval') — Subscript on globals() as first argument now detected
Centralized via _is_builtins_reference() helper

Complexity Floor for Small Functions

New complexity_abs_min threshold (default: 5)
Check 2 only fires when the original function's complexity meets this minimum floor
Prevents false positives on legitimate simplifications of small functions (e.g., complexity 3→1)
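The effect of the floor can be illustrated as below. Only complexity_abs_min and its default of 5 come from the notes; the branch-counting complexity measure and the collapse_ratio parameter are stand-ins for whatever Check 2 actually computes.

```python
import ast

# Hypothetical, simplified complexity measure: 1 + number of branching nodes.
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.BoolOp, ast.IfExp)


def complexity(source: str) -> int:
    return 1 + sum(isinstance(n, BRANCH_NODES) for n in ast.walk(ast.parse(source)))


def complexity_collapse(original: str, generated: str,
                        complexity_abs_min: int = 5,
                        collapse_ratio: float = 0.5) -> bool:
    """Flag a collapse only when the original was complex enough to matter."""
    before, after = complexity(original), complexity(generated)
    if before < complexity_abs_min:
        return False  # small functions may legitimately simplify (e.g. 3 -> 1)
    return after <= before * collapse_ratio  # collapse_ratio is an assumed threshold
```

Under this sketch, a drop from 3 to 1 is ignored, while a drop from 5 to 1 is flagged.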

Set-Literal-Size Allowlist Blocker

New set_literal_max threshold (default: 15)
Data Structure Swap allowlist override is blocked when a set literal exceeds this size
Catches precomputed lookup tables disguised as data structure optimizations
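Finding oversized set literals is a direct AST walk. In this sketch the function name is hypothetical; set_literal_max and its default of 15 are taken from the notes, and "exceeds" is read as strictly greater than.

```python
import ast


def oversized_set_literals(source: str, set_literal_max: int = 15) -> list[tuple[int, int]]:
    """Return (line, element count) for each set literal larger than the limit."""
    return [
        (node.lineno, len(node.elts))
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.Set) and len(node.elts) > set_literal_max
    ]
```

Note that this only sees literal {...} syntax; a table built with set([...]) or a dict would need separate handling.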

SARIF v2.1.0 Output

New --sarif CLI flag produces SARIF v2.1.0 output
Compatible with github/codeql-action/upload-sarif for GitHub Security Tab integration
4 rule definitions mapping to the 4 core checks
Includes artifact references for both original and generated files
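For orientation, a minimal SARIF v2.1.0 document with artifact references for both files might be assembled as below. The finding fields, rule IDs, and the verdict-to-level mapping are hypothetical; only the SARIF property names follow the specification.

```python
import json

# Assumed mapping from ast-guard verdicts to SARIF result levels.
LEVELS = {"CRITICAL": "error", "WARNING": "warning"}


def to_sarif(findings: list[dict], original: str, generated: str) -> dict:
    """Build a minimal SARIF 2.1.0 log from a list of finding dicts."""
    rule_ids = sorted({f["rule_id"] for f in findings})
    return {
        "$schema": "https://json.schemastore.org/sarif-2.1.0.json",
        "version": "2.1.0",
        "runs": [{
            "tool": {"driver": {"name": "ast-guard",
                                "rules": [{"id": rule_id} for rule_id in rule_ids]}},
            "artifacts": [{"location": {"uri": original}},
                          {"location": {"uri": generated}}],
            "results": [{
                "ruleId": f["rule_id"],
                "level": LEVELS.get(f["verdict"], "note"),
                "message": {"text": f["message"]},
                "locations": [{"physicalLocation": {
                    "artifactLocation": {"uri": generated},
                    "region": {"startLine": f["line"]},
                }}],
            } for f in findings],
        }],
    }


if __name__ == "__main__":
    demo = [{"rule_id": "check-3-forbidden-call", "verdict": "CRITICAL",
             "message": "eval reached via alias", "line": 7}]
    print(json.dumps(to_sarif(demo, "original.py", "optimized.py"), indent=2))
```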

Additional

All remaining German docstrings translated to English
14 new tests for v1.2 features
57 tests total across all modules
Version bumped to 1.2.0

25 May 23:03

Nick-is-building

1.1.0

a0ff03a

ast-guard v1.1.0 — MCP Server, TRACE Benchmark, FailProofAI Integration

What's New

MCP Server

Integrated Model Context Protocol server directly into ast-guard (ast_guard/mcp_server.py)
Two MCP tools: ast_guard_scan and ast_guard_feedback
Optional dependency: pip install ast-guard[mcp]
Entry point: ast-guard-mcp command
8 dedicated MCP tests

TRACE Benchmark Suite

22 hacked samples across 14 TRACE subcategories
8 clean/benign samples
Results: 90.9% detection rate, 100% precision, 0% false positives
CLI runner with --json and --verbose options
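The reported figures are consistent with a confusion matrix of 20 true positives, 2 false negatives, 0 false positives, and 8 true negatives. Those counts are inferred here from the percentages and sample sizes above, not taken from the benchmark output. The standard definitions can be checked with:

```python
def metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0          # detection rate
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    false_positive_rate = fp / (fp + tn) if fp + tn else 0.0
    return {"precision": precision, "recall": recall, "f1": f1,
            "false_positive_rate": false_positive_rate}


if __name__ == "__main__":
    # Inferred counts: 22 hacked samples (20 caught), 8 clean samples (none flagged).
    for name, value in metrics(tp=20, fp=0, fn=2, tn=8).items():
        print(f"{name}: {value:.1%}")
```

With these counts, recall is 20/22 ≈ 90.9% and precision is 100%, which matches the results line.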

FailProofAI Integration Proposal

Issue #375 with working policy prototype using PreToolUse hook
deny() on CRITICAL, instruct() on WARNING, allow() on CLEAN

Codebase

Full English translation (code, comments, docstrings, category names)
Complete README rewrite with benchmarks, MCP docs, and related work
GitHub Actions CI across Python 3.11, 3.12, 3.13
43 tests across all modules

21 May 19:29

Nick-is-building

v1.0.0

f7505e7

ast-guard v1.0.0 — First Public Release

The world's first deterministic reward hacking detector for LLM-generated Python code.

What is ast-guard?

When LLMs autonomously generate and test code, they cheat — hardcoding outputs, replacing algorithms with lookup tables, or manipulating test environments. ast-guard catches this structurally via AST analysis, before the code ever runs.

Highlights

Four detection checks: Hardcoding (if-count, literal-count, long strings), Complexity Collapse, Forbidden Calls & Obfuscation, Import Drift
Three sensitivity modes: strict (blocks execution), standard (warnings only), audit (silent telemetry)
Zero dependencies: Pure Python standard library, works everywhere Python 3.11+ runs
Diff-based analysis: Only flags what's NEW in the generated code
Anti-obfuscation: Catches variable aliasing, __builtins__ access, getattr tricks, chr() encoding
Allowlist-aware: Recognizes legitimate optimizations (comprehensions, built-ins, data structure swaps)
Built-in telemetry: Anonymized local metrics collection for community-driven threshold calibration
Privacy by design: Never stores code, filenames, or timestamps
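The diff-based approach can be illustrated with Import Drift: only modules that are new in the generated code are reported. This is a simplified sketch with hypothetical function names; it compares top-level module names and ignores relative imports.

```python
import ast


def imported_modules(source: str) -> set[str]:
    """Collect top-level module names imported anywhere in the source."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            modules.add(node.module.split(".")[0])
    return modules


def import_drift(original: str, generated: str) -> list[str]:
    """Return modules imported by the generated code but not by the original."""
    return sorted(imported_modules(generated) - imported_modules(original))
```

Imports already present in the original are never flagged, which keeps the check quiet on code that legitimately uses, say, subprocess from the start.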

Quick Start

git clone https://github.com/Nick-is-building/ast-guard.git
cd ast-guard
python -m pytest tests/ -v  # 35 tests, all passing

Requirements

Python 3.11+
No pip install needed
