Skip to content

Releases: Nick-is-building/ast-guard

v2.0.0 — Pre-Execution Gate

31 May 08:05

Choose a tag to compare

v2.0.0 — Pre-Execution Gate

The biggest release since the project started. ast-guard has evolved from a Python pair-comparison tool into a multi-language pre-execution gate evaluated against 81,515 real agent code blocks from frontier models.


What's New

Check 6 — Behavioral Risk Scoring

A new additive risk scoring engine for standalone analysis, inspired by YARA, Bandit, and Semgrep. Rather than binary blocklists, each code block accumulates a risk score based on detected patterns. 20+ named patterns across five tiers — from safe exclusions to critical blocks.

Detected patterns include: test file manipulation, monkey-patching (time.time = lambda, module.func = stub), process termination tricks (sys.exit(0)), stack introspection (inspect.currentframe, sys._getframe), dunder method hijacking (eq returning True), timer spoofing, PATH hijacking, LD_PRELOAD injection, sandbox escape patterns, and answer extraction via file traversal.

scan_standalone() — Standalone Analysis Mode

New public API function for analyzing single agent code blocks without a baseline. Designed for autonomous agent loops, trajectory analysis, and any context where an original reference is unavailable.

from ast_guard import scan_standalone
result = scan_standalone(agent_code)
print(result["verdict"])  # CLEAN / WARNING / CRITICAL

Multi-Language Engine

Python (native ast, zero dependencies), Bash, and JavaScript via tree-sitter. Language auto-detected from shebang, keywords, and syntax patterns. All languages route through the same check pipeline via a unified metric interface.

pip install ast-guard[multilang]

Cross-Benchmark Evaluation Framework

Seven benchmark loaders (MALT, TRACE, EvilGenie, Countdown-Code, Helff Gaming Verifiers, School of Reward Hacks, SpecBench stub), a unified CodePair format, and a cross-benchmark runner.

Structural Benchmark

36 curated ground-truth code pairs across 12 structural hack categories. Hand-crafted from documented real-world patterns (METR o3 evaluation, TRACE taxonomy, EvilGenie, Terminal Wrench). 100% F1, 4.7ms mean scan time.

Scientific Documentation

  • benchmarks/RESULTS.md — precision, recall, F1, confusion matrices across all datasets
  • benchmarks/METHODOLOGY.md — full 6-iteration calibration history (ablation study)
  • benchmarks/data/iteration_log.json — structured iteration data for reproducibility

Evaluation Results

Benchmark Samples Key Metric Value
Structural Benchmark (curated ground truth) 36 F1 100%
TRACE taxonomy (Deshpande et al. 2026) 33 F1 95.7%
School of Reward Hacks (longtermrisk) 26 Recall 96.2%
Countdown-Code (Khan et al.) 15,894 True Negative Rate 99.0%
MALT (METR) — 81,515 agent code blocks 81,515 Specificity 78.5%
MALT — hardcoded_solution category 429 Detection Rate 72.0%
MALT — bypass_constraints category 2,379 Detection Rate 44.3%

Performance: 4.7ms mean per scan. ~210 samples/second. Zero API cost.


Detection Gap Fixes (from v1.3.x)

  • Multi-level alias detection: chained aliases (g = f; h = g), tuple unpacking (a, b = print, eval), dict dispatch (d = {"e": eval}; d"e")
  • chr() obfuscation via aliases and builtins["chr"]
  • resolve_call_name bare-attr collision fix — eliminated false positives from dynamic method names

Full Changelog

See CHANGELOG.md for complete details.

v1.3.0 — Extensional Enumeration Detection & Detection Gap Fixes

28 May 05:55

Choose a tag to compare

What's New

Check 5 — Extensional Enumeration Detector

Detects functions that replace algorithmic logic with exhaustive input-output mappings (compare-return chains). Based on the extensional enumeration pattern identified by Helff et al. ("LLMs Gaming Verifiers", arXiv:2604.15149).

  • Flags functions where ≥70% of if-statements are if x == Const: return Const with minimal loop logic
  • WARNING individually, CRITICAL in combination with Check 1 or Check 2
  • Configurable thresholds: enumeration_ratio, enumeration_min_ifs
  • Supports match/case (Python 3.10+)

Detection Gap Fixes

  • Check 2 rename bypass closed: File-level complexity fallback when function names differ between original and generated code
  • builtins.eval detection: builtins module now recognized alongside __builtins__
  • Syntax-error telemetry fix: No longer corrupts telemetry data on unparseable generated code
  • Exception handling: except SyntaxError instead of except Exception — analyzer bugs now propagate correctly

GitHub Action

Native SARIF scanning action for CI/CD pipelines with optional upload to GitHub Security tab:

- uses: Nick-is-building/ast-guard/.github/actions/ast-guard@main
  with:
    original: original.py
    generated: optimized.py
    mode: strict
    upload-sarif: "true"

ast-guard v1.2.0 — Constant Folding, SARIF Output, Enhanced Obfuscation Detection

25 May 23:04
a0ff03a

Choose a tag to compare

What's New

Constant Folding for Obfuscation Detection

  • resolve_constant_string() recursively resolves string concatenation via ast.BinOp(ast.Add)
  • Catches patterns like __builtins__['ev' + 'al'] that previously evaded Check 3

New Anti-Obfuscation Paths

  • __builtins__.__dict__['eval'] — Attribute chain to __dict__ now detected
  • getattr(globals()['__builtins__'], 'eval') — Subscript on globals() as first argument now detected
  • Centralized via _is_builtins_reference() helper

Complexity Floor for Small Functions

  • New complexity_abs_min threshold (default: 5)
  • Check 2 only fires when original complexity meets minimum floor
  • Prevents false positives on legitimate simplifications of small functions (e.g., complexity 3→1)

Set-Literal-Size Allowlist Blocker

  • New set_literal_max threshold (default: 15)
  • Data Structure Swap allowlist override is blocked when a set literal exceeds this size
  • Catches precomputed lookup tables disguised as data structure optimizations

SARIF v2.1.0 Output

  • New --sarif CLI flag produces SARIF v2.1.0 output
  • Compatible with github/codeql-action/upload-sarif for GitHub Security Tab integration
  • 4 rule definitions mapping to the 4 core checks
  • Includes artifact references for both original and generated files

Additional

  • All remaining German docstrings translated to English
  • 14 new tests for v1.2 features
  • 57 tests total across all modules
  • Version bumped to 1.2.0

ast-guard v1.1.0 — MCP Server, TRACE Benchmark, FailProofAI Integration

25 May 23:03
a0ff03a

Choose a tag to compare

What's New

MCP Server

  • Integrated Model Context Protocol server directly into ast-guard (ast_guard/mcp_server.py)
  • Two MCP tools: ast_guard_scan and ast_guard_feedback
  • Optional dependency: pip install ast-guard[mcp]
  • Entry point: ast-guard-mcp command
  • 8 dedicated MCP tests

TRACE Benchmark Suite

  • 22 hacked samples across 14 TRACE subcategories
  • 8 clean/benign samples
  • Results: 90.9% detection rate, 100% precision, 0% false positives
  • CLI runner with --json and --verbose options

FailProofAI Integration Proposal

  • Issue #375 with working policy prototype using PreToolUse hook
  • deny() on CRITICAL, instruct() on WARNING, allow() on CLEAN

Codebase

  • Full English translation (code, comments, docstrings, category names)
  • Complete README rewrite with benchmarks, MCP docs, and related work
  • GitHub Actions CI across Python 3.11, 3.12, 3.13
  • 43 tests across all modules

ast-guard v1.0.0 — First Public Release

21 May 19:29

Choose a tag to compare

The world's first deterministic reward hacking detector for LLM-generated Python code.

What is ast-guard?

When LLMs autonomously generate and test code, they cheat — hardcoding outputs, replacing algorithms with lookup tables, or manipulating test environments. ast-guard catches this structurally via AST analysis, before the code ever runs.

Highlights

  • Four detection checks: Hardcoding (if-count, literal-count, long strings), Complexity Collapse, Forbidden Calls & Obfuscation, Import Drift
  • Three sensitivity modes: strict (blocks execution), standard (warnings only), audit (silent telemetry)
  • Zero dependencies: Pure Python standard library, works everywhere Python 3.11+ runs
  • Diff-based analysis: Only flags what's NEW in the generated code
  • Anti-obfuscation: Catches variable aliasing, __builtins__ access, getattr tricks, chr() encoding
  • Allowlist-aware: Recognizes legitimate optimizations (comprehensions, built-ins, data structure swaps)
  • Built-in telemetry: Anonymized local metrics collection for community-driven threshold calibration
  • Privacy by design: Never stores code, filenames, or timestamps

Quick Start

git clone https://github.com/Nick-is-building/ast-guard.git
cd ast-guard
python -m pytest tests/ -v  # 35 tests, all passing

Requirements

  • Python 3.11+
  • No pip install needed