v1.2.0 — Industry Benchmarks & 6 New Safety Shields
What's New
6 New Safety Shields (19 → 24 modules)
- Data Exfiltration Guard — Catches LLMs smuggling data out via base64, steganography, URLs
- Privilege Escalation Detector — Stops agents from going rogue (tool requests, instruction modification, self-delegation)
- Prompt Fuzzer — Built-in red-teaming with 5 attack categories and 8 mutation strategies
- Unicode Shield — Detects zero-width chars, homoglyphs, bidi overrides, tag character attacks
- HITL Policy Gate — Configurable human approval workflows for high-risk actions
- Compliance Reporter — Auto-generates SOC2/HIPAA/GDPR compliance reports
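
To give a sense of what the Unicode Shield looks for, here is a minimal standard-library sketch that flags zero-width characters, bidi controls, and tag characters. It is an illustration only, not AgentArmor's implementation: the function name `scan_unicode` and the code point sets are assumptions chosen for this example, and homoglyph detection is left out because it needs a confusables table.

```python
# Illustrative code point sets; assumed for this sketch, not AgentArmor's rule set.
ZERO_WIDTH = {0x200B, 0x200C, 0x200D, 0x2060, 0xFEFF}
# U+202A..U+202E (embeddings/overrides) and U+2066..U+2069 (isolates).
BIDI_CONTROLS = set(range(0x202A, 0x202F)) | set(range(0x2066, 0x206A))
# Unicode "Tags" block, U+E0000..U+E007F, usable for hiding ASCII text.
TAG_RANGE = range(0xE0000, 0xE0080)


def scan_unicode(text: str) -> list[tuple[int, str, str]]:
    """Return (index, category, code point) for each suspicious character."""
    findings = []
    for i, ch in enumerate(text):
        cp = ord(ch)
        if cp in ZERO_WIDTH:
            kind = "zero_width"
        elif cp in BIDI_CONTROLS:
            kind = "bidi_control"
        elif cp in TAG_RANGE:
            kind = "tag_character"
        else:
            continue
        findings.append((i, kind, f"U+{cp:04X}"))
    return findings


if __name__ == "__main__":
    # A zero-width space hidden inside an otherwise ordinary word.
    print(scan_unicode("pay\u200bload"))
```

A real shield would likely also normalize the text and score findings rather than return a flat list; the sketch only shows the detection step.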
Industry Benchmarks
Evaluated against 10 industry datasets + 2 synthetic benchmarks (5,100+ samples):
| Benchmark | Best F1 | Module |
|---|---|---|
| AdvBench | 95.8% | Combined |
| HarmBench | 94.7% | Combined |
| Exfiltration | 100.0% | Exfiltration Guard |
| Unicode Injection | 95.4% | Unicode Shield |
| Fuzzer Self-Test | 91.7% | Combined |
| ToxiGen | 73.8% | Toxicity ML |
| TruthfulQA | 72.5% | Grounding |
| HaluEval | 71.8% | Grounding |
| JailbreakBench | 71.6% | Combined |
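
For readers comparing these numbers with their own runs, the sketch below shows the standard way F1 and false-positive rate are derived from confusion-matrix counts. The helper name `f1_and_fp_rate` and the counts in the usage line are made up for illustration; they are not taken from the benchmark runs above.

```python
def f1_and_fp_rate(tp: int, fp: int, fn: int, tn: int) -> tuple[float, float]:
    """Compute (F1, false-positive rate) from confusion-matrix counts.

    Any ratio with an empty denominator is reported as 0.0.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    # FP rate: share of benign samples that were wrongly flagged.
    fp_rate = fp / (fp + tn) if fp + tn else 0.0
    return f1, fp_rate


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    f1, fpr = f1_and_fp_rate(tp=90, fp=10, fn=10, tn=90)
    print(f"F1={f1:.3f} FP rate={fpr:.3f}")
```

F1 alone can hide over-blocking, because it ignores true negatives. That is why a false-positive rate is worth reading next to it.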
Module Upgrades
- ML Shield: 65 → 175 training examples, threshold 0.85 → 0.65
- Toxicity: Built-in TF-IDF classifier (181 toxic + 111 safe examples)
- Grounding: TF-IDF semantic similarity, character n-grams, stemmed overlap
- Shield: 13 new harmful content request patterns
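
As a rough picture of the character n-gram signal mentioned for Grounding, the following sketch scores two strings by Jaccard overlap of their character trigrams. The choice of Jaccard, the default `n = 3`, and the function names are assumptions for this example; the release notes do not say how AgentArmor combines n-grams with its TF-IDF and stemmed-overlap signals.

```python
def char_ngrams(text: str, n: int = 3) -> set[str]:
    """Return the set of character n-grams after lowercasing and collapsing whitespace."""
    normalized = " ".join(text.lower().split())
    return {normalized[i:i + n] for i in range(len(normalized) - n + 1)}


def ngram_jaccard(a: str, b: str, n: int = 3) -> float:
    """Jaccard similarity of two strings' character n-gram sets (0.0 to 1.0)."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a or not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


if __name__ == "__main__":
    claim = "The Eiffel Tower is in Paris."
    source = "The Eiffel Tower is located in Paris, France."
    print(round(ngram_jaccard(claim, source), 3))
```

Character n-grams tolerate small spelling and inflection differences that exact word matching would miss, which is presumably why they are used alongside stemmed overlap.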
Benchmark Infrastructure
- 12 dataset adapters with stratified sampling
- E2E runner for 9 models across OpenAI, Anthropic, Google
- Baseline comparisons (OpenAI Moderation, Perspective API, LlamaGuard)
- 3 CI/CD workflows (smoke PR gate, industry weekly, E2E release)
- FP rate as headline metric in all reporting
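
Stratified sampling, as used by the dataset adapters, keeps class proportions intact when a dataset is subsampled. The sketch below shows one simple proportional scheme using only the standard library. The function name, the `label_key` parameter, and the "at least one row per class" rule are assumptions for illustration, not the adapters' actual behavior.

```python
import random
from collections import defaultdict


def stratified_sample(rows, label_key, k, seed=0):
    """Draw roughly k rows while preserving the label distribution.

    Because of rounding and the one-row-per-class minimum, the result
    may contain slightly more or fewer than k rows.
    """
    rng = random.Random(seed)  # seeded for reproducible benchmark subsets
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)

    total = len(rows)
    sample = []
    for label in sorted(by_label):
        group = by_label[label]
        # Proportional allocation, with at least one row per class.
        take = min(len(group), max(1, round(k * len(group) / total)))
        sample.extend(rng.sample(group, take))
    return sample


if __name__ == "__main__":
    # Hypothetical dataset: 80 benign prompts and 20 attack prompts.
    data = [{"text": f"s{i}", "label": "safe"} for i in range(80)]
    data += [{"text": f"a{i}", "label": "attack"} for i in range(20)]
    subset = stratified_sample(data, "label", k=10)
    print(len(subset), sum(r["label"] == "attack" for r in subset))
```

Fixing the seed makes repeated runs compare the same subset, which matters when a metric gates pull requests.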
Other
- Google GenAI migration (`google-genai` replaces deprecated `google-generativeai`)
- HITL Gate rewritten with deterministic risk mapping
- New optional dependency group: `pip install "agentarmor[benchmarks]"` for industry dataset evaluation
Install
```bash
pip install agentarmor==1.2.0
pip install "agentarmor[all]==1.2.0"  # All providers + ML (quoted so zsh does not expand the brackets)
```
Full Changelog: v1.1.0...v1.2.0