
v1.2.0 — Industry Benchmarks & 6 New Safety Shields


@ankitlade12 released this on 01 Apr 03:46

What's New

6 New Safety Shields (19 → 24 modules)

  • Data Exfiltration Guard: Detects LLM outputs that smuggle data out via base64 encoding, steganography, or URLs
  • Privilege Escalation Detector: Stops agents from escalating their own privileges via tool requests, instruction modification, or self-delegation
  • Prompt Fuzzer: Built-in red-teaming with 5 attack categories and 8 mutation strategies
  • Unicode Shield: Detects zero-width characters, homoglyphs, bidirectional (bidi) overrides, and tag-character attacks
  • HITL Policy Gate: Configurable human-in-the-loop approval workflows for high-risk actions
  • Compliance Reporter: Auto-generates SOC 2, HIPAA, and GDPR compliance reports
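The kinds of checks the Unicode Shield describes can be approximated with the standard library. This is a minimal sketch under stated assumptions, not AgentArmor's implementation: `scan_unicode` and `Finding` are hypothetical names, the character sets cover common zero-width, bidi-control, and tag-block (U+E0000 to U+E007F) code points, and homoglyphs are approximated by flagging any space-separated word that mixes Latin, Cyrillic, or Greek letters.

```python
import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One suspicious span (hypothetical type, not AgentArmor's API)."""
    kind: str
    index: int
    detail: str


# Invisible characters that can hide instructions inside otherwise normal text.
ZERO_WIDTH = {0x200B, 0x200C, 0x200D, 0x2060, 0xFEFF}
# Bidi embeddings/overrides (U+202A..U+202E) and isolates (U+2066..U+2069).
BIDI_CONTROLS = set(range(0x202A, 0x202F)) | set(range(0x2066, 0x206A))
# Unicode tag block, which can carry invisible ASCII-like payloads.
TAG_CHARS = range(0xE0000, 0xE0080)
# Scripts whose letters are commonly confused with Latin ones.
CONFUSABLE_SCRIPTS = {"LATIN", "CYRILLIC", "GREEK"}


def _script(ch: str) -> str:
    # Approximate the script from the first word of the Unicode name,
    # e.g. "CYRILLIC SMALL LETTER A" -> "CYRILLIC".
    name = unicodedata.name(ch, "")
    return name.split(" ", 1)[0]


def scan_unicode(text: str) -> list[Finding]:
    findings: list[Finding] = []
    for i, ch in enumerate(text):
        cp = ord(ch)
        if cp in ZERO_WIDTH:
            kind = "zero_width"
        elif cp in BIDI_CONTROLS:
            kind = "bidi_control"
        elif cp in TAG_CHARS:
            kind = "tag_character"
        else:
            continue
        findings.append(Finding(kind, i, f"U+{cp:04X}"))

    # Homoglyph heuristic: a single space-separated word mixing confusable scripts.
    offset = 0
    for word in text.split(" "):
        scripts = {_script(c) for c in word if c.isalpha()} & CONFUSABLE_SCRIPTS
        if len(scripts) > 1:
            findings.append(Finding("mixed_script", offset, "+".join(sorted(scripts))))
        offset += len(word) + 1
    return findings


if __name__ == "__main__":
    # "p\u0430ypal" uses a Cyrillic "а" in place of the Latin "a".
    print([f.kind for f in scan_unicode("p\u0430ypal")])  # ['mixed_script']
```

A mixed-script heuristic like this misses same-script lookalikes (for example `rn` versus `m`) and can flag legitimate multilingual words, so a production shield would plausibly rely on a confusables table such as the one published with Unicode UTS #39.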

Industry Benchmarks

Evaluated against 10 industry datasets and 2 synthetic benchmarks (5,100+ samples in total):

| Benchmark | Best F1 | Module |
|---|---|---|
| AdvBench | 95.8% | Combined |
| HarmBench | 94.7% | Combined |
| Exfiltration | 100.0% | Exfiltration Guard |
| Unicode Injection | 95.4% | Unicode Shield |
| Fuzzer Self-Test | 91.7% | Combined |
| ToxiGen | 73.8% | Toxicity ML |
| TruthfulQA | 72.5% | Grounding |
| HaluEval | 71.8% | Grounding |
| JailbreakBench | 71.6% | Combined |
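For readers reproducing scores like these, the standard binary-detection definitions are sketched below. This is a generic illustration, assuming "positive" means "flagged by the module"; `detection_metrics` is a hypothetical helper, and the counts in the usage lines are made up rather than taken from these benchmarks. F1 is computed in the count form 2*TP / (2*TP + FP + FN), which is algebraically equal to 2PR / (P + R).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectionMetrics:
    precision: float
    recall: float
    f1: float
    fp_rate: float


def _safe_div(num: int, den: int) -> float:
    # Return 0.0 instead of raising when a denominator is empty.
    return num / den if den else 0.0


def detection_metrics(tp: int, fp: int, fn: int, tn: int) -> DetectionMetrics:
    """Metrics for a detector where 'positive' means 'flagged as harmful'."""
    return DetectionMetrics(
        precision=_safe_div(tp, tp + fp),
        recall=_safe_div(tp, tp + fn),
        # Count form of F1, equivalent to 2PR / (P + R).
        f1=_safe_div(2 * tp, 2 * tp + fp + fn),
        # Share of benign samples that were wrongly flagged.
        fp_rate=_safe_div(fp, fp + tn),
    )


if __name__ == "__main__":
    m = detection_metrics(tp=45, fp=5, fn=5, tn=45)  # illustrative counts only
    print(f"F1={m.f1:.1%} FP rate={m.fp_rate:.1%}")  # F1=90.0% FP rate=10.0%
```

Tracking FP rate alongside F1 matters because a detector can post a high F1 on attack-heavy datasets while still flagging a noticeable share of benign prompts.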

Module Upgrades

  • ML Shield: training set expanded from 65 to 175 examples; detection threshold lowered from 0.85 to 0.65
  • Toxicity: built-in TF-IDF classifier trained on 181 toxic and 111 safe examples
  • Grounding: adds TF-IDF semantic similarity, character n-gram matching, and stemmed word overlap
  • Shield: 13 new patterns for detecting harmful-content requests
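The Grounding upgrade combines TF-IDF weighting with character n-grams. One way to sketch that combination is a cosine similarity between character-trigram TF-IDF vectors of an answer and its source text. This is an illustration under assumptions, not AgentArmor's code: `grounding_score` and its helpers are hypothetical, the n-gram size of 3 is arbitrary, and the smoothed IDF formula is borrowed from common practice so that n-grams shared by both texts keep a non-zero weight. Note that this measures lexical overlap rather than meaning.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    # Pad with spaces so word boundaries contribute their own n-grams.
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def tfidf_vectors(docs: list[str], n: int = 3) -> list[dict[str, float]]:
    counts = [char_ngrams(d, n) for d in docs]
    # Document frequency: in how many docs each n-gram appears.
    df = Counter(gram for c in counts for gram in c)
    total = len(docs)
    # Smoothed IDF: n-grams present in every doc still get weight 1.
    idf = {gram: math.log((1 + total) / (1 + freq)) + 1 for gram, freq in df.items()}
    return [{gram: tf * idf[gram] for gram, tf in c.items()} for c in counts]


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b.get(gram, 0.0) for gram, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def grounding_score(answer: str, source: str, n: int = 3) -> float:
    """Lexical similarity in [0, 1] between an answer and its source (hypothetical)."""
    answer_vec, source_vec = tfidf_vectors([answer, source], n)
    return cosine(answer_vec, source_vec)
```

A caller would then compare the score to a cutoff, for example `grounding_score(answer, source) >= 0.5`, where 0.5 is a placeholder and not the module's actual threshold.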

Benchmark Infrastructure

  • 12 dataset adapters with stratified sampling
  • End-to-end (E2E) runner covering 9 models from OpenAI, Anthropic, and Google
  • Baseline comparisons against OpenAI Moderation, Perspective API, and LlamaGuard
  • 3 CI/CD workflows: smoke-test PR gate, weekly industry benchmark run, and E2E run on release
  • False-positive (FP) rate reported as the headline metric in all results
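Stratified sampling keeps every label represented when a large dataset is cut down to a benchmark-sized subset. A minimal sketch, assuming each sample carries a label (for example harmful or benign) and that the rule is a fixed cap per label; `stratified_sample` is a hypothetical helper and not necessarily how the 12 adapters allocate samples.

```python
import random
from collections import defaultdict
from typing import Callable, Hashable, Sequence, TypeVar

T = TypeVar("T")


def stratified_sample(
    items: Sequence[T],
    label_of: Callable[[T], Hashable],
    per_label: int,
    seed: int = 0,
) -> list[T]:
    """Draw up to `per_label` items from each label, reproducibly for a given seed."""
    rng = random.Random(seed)  # local RNG so global random state is untouched
    buckets: dict[Hashable, list[T]] = defaultdict(list)
    for item in items:
        buckets[label_of(item)].append(item)

    sample: list[T] = []
    # Iterate labels in a fixed order so the same seed always yields the same sample.
    for label in sorted(buckets, key=repr):
        bucket = buckets[label]
        sample.extend(rng.sample(bucket, min(per_label, len(bucket))))
    return sample


if __name__ == "__main__":
    data = [("harmful", i) for i in range(80)] + [("benign", i) for i in range(20)]
    subset = stratified_sample(data, lambda row: row[0], per_label=30, seed=1)
    print(len(subset))  # 50: 30 harmful + all 20 benign
```

A per-label cap balances the classes, which gives FP-rate estimates enough benign samples to be stable; proportional allocation would instead preserve each dataset's original base rate.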

Other

  • Google GenAI migration: `google-genai` replaces the deprecated `google-generativeai`
  • HITL Gate rewritten with deterministic risk mapping
  • New optional extra for industry dataset evaluation: `pip install "agentarmor[benchmarks]"`
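A deterministic risk mapping means the same action always lands in the same risk tier, with no model call deciding it. A minimal sketch of that idea, assuming a static lookup table and an approval threshold; the action names, the `Risk` tiers, and `requires_approval` are hypothetical and not the HITL Gate's actual configuration.

```python
from enum import IntEnum


class Risk(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


# Hypothetical action names; a real deployment would supply its own table.
RISK_MAP = {
    "read_file": Risk.LOW,
    "send_email": Risk.MEDIUM,
    "delete_file": Risk.HIGH,
    "transfer_funds": Risk.CRITICAL,
}


def risk_of(action: str) -> Risk:
    # Unknown actions fail closed to HIGH instead of being silently allowed.
    return RISK_MAP.get(action, Risk.HIGH)


def requires_approval(action: str, approval_threshold: Risk = Risk.HIGH) -> bool:
    """True when a human must approve the action before it runs."""
    return risk_of(action) >= approval_threshold


if __name__ == "__main__":
    for action in ("read_file", "delete_file", "unknown_tool"):
        print(action, requires_approval(action))
```

Because the lookup is a pure function of the action name, approvals are reproducible and auditable, which is the practical benefit of a deterministic mapping over a model-scored one.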

Install

```bash
pip install agentarmor==1.2.0
pip install "agentarmor[all]==1.2.0"  # All providers + ML (quotes keep zsh from globbing the brackets)
```

Full Changelog: v1.1.0...v1.2.0