detectors: add Agent Threat Rules#1676

Open
eeee2345 wants to merge 3 commits into NVIDIA:main from eeee2345:feat/atr-detectors

Conversation


@eeee2345 eeee2345 commented Apr 8, 2026

Summary

Adds ATR (Agent Threat Rules) detectors for AI agent-specific threats not covered by garak's existing injection/jailbreak detectors. Focuses on MCP tool poisoning, skill compromise, context exfiltration, and excessive autonomy.

What's included

Two files:

  • garak/detectors/atr.py -- 9 detector classes + rule sync + rule generation utilities
  • garak/detectors/atr_rules.json -- 714 regex patterns from 108 ATR rules (bundled, no runtime dependency)

9 detector classes:

| Detector | ATR Category | Patterns | Catches |
|---|---|---|---|
| atr.AgentThreats | all 9 | 714 | Comprehensive scan |
| atr.PromptInjection | prompt-injection | 198 | Instruction overrides, persona hijacking |
| atr.ToolPoisoning | tool-poisoning | 64 | Hidden instructions in tool descriptions |
| atr.CredentialExfiltration | context-exfiltration | 102 | API keys, private keys, DB URLs |
| atr.PrivilegeEscalation | privilege-escalation | 68 | Shell commands, permission changes |
| atr.SkillCompromise | skill-compromise | 148 | Typosquatting, rug pulls, impersonation |
| atr.ExcessiveAutonomy | excessive-autonomy | 42 | Retry loops, resource exhaustion |
| atr.AgentManipulation | agent-manipulation | 56 | Cross-agent attacks, trust exploitation |
| atr.DataPoisoning | data-poisoning | 36 | Poisoned content, injected instructions |

Two utility functions:

  • sync_rules_from_github() -- pull latest ATR rules (requires git + PyYAML, optional)
  • generate_rule_from_probe() -- generate ATR rule drafts from successful garak probe outputs (red team -> blue team feedback loop)

Why this belongs in garak

garak's existing detectors cover LLM-level threats (jailbreaks, known-bad signatures, encoding tricks). ATR covers agent-level threats specific to the MCP/tool-use ecosystem:

  • Tool descriptions with hidden exfiltration instructions
  • Skill impersonation via typosquatted tool names
  • Rug pull attacks (tools that change behavior after trust is established)
  • Cross-agent manipulation in multi-agent systems

These are not redundant with existing garak detectors -- they scan for a different attack surface.

Provenance

ATR rules are MIT-licensed, community-driven, and adopted by Cisco AI Defense (34 rules merged into their official skill-scanner). The bundled JSON is a static snapshot; sync_rules_from_github() enables updates without waiting for garak releases.

Source: https://github.com/Agent-Threat-Rule/agent-threat-rules

Usage

# Use as detector in garak config
detectors:
  - atr.PromptInjection
  - atr.ToolPoisoning
  - atr.CredentialExfiltration

# Or scan everything
detectors:
  - atr.AgentThreats

# Sync to latest rules
from garak.detectors.atr import sync_rules_from_github
sync_rules_from_github()

# Generate rules from probe results
from garak.detectors.atr import generate_rule_from_probe
rule = generate_rule_from_probe(successful_attacks, category="tool-poisoning")
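
For readers unfamiliar with pattern-based detection, the following is a minimal sketch of how a category-scoped regex detector might score outputs. It assumes the `[rule_id, severity, pattern]` layout that the sync code in this PR writes, and case-insensitive matching. `SAMPLE_RULES`, `compile_rules`, and `score_outputs` are hypothetical names, and the two patterns are illustrative only, not taken from the ATR ruleset.

```python
import re

# Hypothetical sample rules: category -> list of [rule_id, severity, pattern].
# These patterns are illustrative only and are not taken from the ATR ruleset.
SAMPLE_RULES = {
    "prompt-injection": [
        ["DEMO-001", "high", r"ignore\s+(all\s+)?previous\s+instructions"],
    ],
    "context-exfiltration": [
        ["DEMO-002", "critical", r"-----BEGIN [A-Z ]*PRIVATE KEY-----"],
    ],
}


def compile_rules(rules, categories=None):
    """Compile the patterns for the selected categories (all if None)."""
    compiled = []
    for category in categories or list(rules):
        for rule_id, _severity, pattern in rules.get(category, []):
            compiled.append((rule_id, re.compile(pattern, re.IGNORECASE)))
    return compiled


def score_outputs(outputs, compiled):
    """Return 1.0 for each output that matches any pattern, else 0.0."""
    return [
        1.0 if any(rx.search(text) for _, rx in compiled) else 0.0
        for text in outputs
    ]
```

A category-specific detector would pass a single category to `compile_rules`, while the comprehensive one would pass none and compile everything.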

@github-actions
Contributor

github-actions bot commented Apr 8, 2026

DCO Assistant Lite bot All contributors have signed the DCO ✍️ ✅

@leondz
Collaborator

leondz commented Apr 8, 2026

please sign dco for review

Comment thread garak/detectors/atr.py Outdated
Comment on lines +40 to +67
def sync_rules_from_github(
    repo: str = "Agent-Threat-Rule/agent-threat-rules",
    branch: str = "main",
    output: Path | None = None,
) -> int:
    """Fetch latest ATR rules from GitHub and update the bundled JSON.

    Requires: git, PyYAML (pip install pyyaml).
    Returns the number of patterns synced.

    Usage::

        from garak.detectors.atr import sync_rules_from_github
        count = sync_rules_from_github()
        print(f"Synced {count} patterns")
    """
    import yaml  # PyYAML -- optional dependency

    dest = output or _RULES_PATH
    with tempfile.TemporaryDirectory() as tmpdir:
        subprocess.run(
            ["git", "clone", "--depth", "1", "-b", branch,
             f"https://github.com/{repo}.git", tmpdir],
            check=True, capture_output=True,
        )
        rules_dir = Path(tmpdir) / "rules"
        if not rules_dir.exists():
            raise FileNotFoundError(f"No rules/ directory in {repo}")
Collaborator

@leondz leondz Apr 8, 2026

please use the data mechanism within garak

@eeee2345 eeee2345 force-pushed the feat/atr-detectors branch from ce52ef2 to c960ed7 Compare April 8, 2026 20:08
Signed-off-by: Panguard AI <support@panguard.ai>
Signed-off-by: eeee2345 <imadam4real@gmail.com>
@eeee2345 eeee2345 force-pushed the feat/atr-detectors branch from c960ed7 to 90c887d Compare April 8, 2026 20:11
@eeee2345
Author

eeee2345 commented Apr 8, 2026

I have read the DCO Document and I hereby sign the DCO

github-actions bot added a commit that referenced this pull request Apr 8, 2026
@eeee2345
Author

eeee2345 commented Apr 8, 2026

recheck

Collaborator

@jmartin-tech jmartin-tech left a comment

Thanks for submission.

As an extra detector to mix into a run this could be useful. The helper methods that have been placed in the detector should likely be extracted as separate tooling.

Comment thread garak/detectors/atr.py Outdated
Comment on lines +60 to +64
subprocess.run(
    ["git", "clone", "--depth", "1", "-b", branch,
     f"https://github.com/{repo}.git", tmpdir],
    check=True, capture_output=True,
)
Collaborator

subprocess.run is not an acceptable method of retrieving data at runtime.

Comment thread garak/detectors/atr.py Outdated
Comment on lines +88 to +171
def generate_rule_from_probe(
    probe_outputs: list[str],
    category: str = "prompt-injection",
    severity: str = "high",
    min_common_length: int = 8,
) -> str:
    """Generate an ATR rule YAML draft from successful Garak probe outputs.

    Takes a list of strings that bypassed defenses (successful attacks)
    and extracts common substrings as detection patterns. Returns a
    YAML rule string ready for review and submission to ATR.

    This is a starting point -- generated rules should be reviewed by
    a human before being added to the ATR ruleset.

    Usage::

        from garak.detectors.atr import generate_rule_from_probe
        attacks = ["ignore previous instructions and ...", "forget all rules and ..."]
        rule_yaml = generate_rule_from_probe(attacks, category="prompt-injection")
        print(rule_yaml)
    """
    if not probe_outputs:
        return ""

    # Extract keywords that appear in 50%+ of outputs
    word_counts: dict[str, int] = {}
    for text in probe_outputs:
        words = set(re.findall(r"\b[a-zA-Z]{4,}\b", text.lower()))
        for w in words:
            word_counts[w] = word_counts.get(w, 0) + 1

    threshold = max(2, len(probe_outputs) // 2)
    common_words = sorted(
        [w for w, c in word_counts.items() if c >= threshold],
        key=lambda w: word_counts[w],
        reverse=True,
    )[:6]

    if not common_words:
        return ""

    # Build regex pattern from common words
    pattern = r"(?i)\b" + r"\b.*\b".join(re.escape(w) for w in common_words[:4]) + r"\b"

    date = datetime.now().strftime("%Y/%m/%d")
    rule_id = f"ATR-DRAFT-{hash(pattern) % 100000:05d}"

    return f"""title: "Garak-generated: {common_words[0]} pattern"
id: {rule_id}
rule_version: 1
status: draft
description: >
  Auto-generated from {len(probe_outputs)} successful Garak probe outputs.
  Common keywords: {', '.join(common_words[:6])}.
  REVIEW REQUIRED before adding to production ruleset.
author: "garak + ATR"
date: "{date}"
schema_version: "0.1"
detection_tier: pattern
maturity: experimental
severity: {severity}
tags:
  category: {category}
  subcategory: garak-generated
  confidence: low
agent_source:
  type: mcp_exchange
  framework: [any]
  provider: [any]
detection:
  conditions:
    - field: content
      operator: regex
      value: '{pattern}'
      description: "Pattern from {len(probe_outputs)} Garak probe outputs"
  condition: any
response:
  actions: [alert]
test_cases:
  true_positives:
    - input: "{probe_outputs[0][:100].replace(chr(34), chr(39))}"
      expected: triggered
"""
Collaborator

This is an interesting utility for creating YAML files to contribute to defensive tooling, but it is never called by code in this PR. I would suggest extracting it as a utility that can post-process either a report.jsonl or a hitlog.jsonl. It could be placed in the tools path, for use only in a repo-based install, or exposed as an analyze module so that it ships with the installed package, similar to how report_digest is exposed.
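
As a rough illustration of the post-processing step suggested here, the sketch below collects one string field from each record of a JSONL file so the strings can be handed to a rule generator. The field name `output` is an assumption about the hitlog layout, not something confirmed in this thread, and `collect_outputs` is a hypothetical helper.

```python
import json
from pathlib import Path


def collect_outputs(jsonl_path, field="output"):
    """Collect one string field from every valid JSON line in a JSONL file.

    The field name "output" is an assumption about the hitlog layout.
    """
    outputs = []
    for line in Path(jsonl_path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines rather than abort the whole run
        value = record.get(field) if isinstance(record, dict) else None
        if isinstance(value, str) and value:
            outputs.append(value)
    return outputs
```

The returned list has the same shape as the `probe_outputs` argument of the function under review, so a standalone tool could chain the two.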

Comment thread garak/detectors/atr.py Outdated
Comment on lines +40 to +85
def sync_rules_from_github(
    repo: str = "Agent-Threat-Rule/agent-threat-rules",
    branch: str = "main",
    output: Path | None = None,
) -> int:
    """Fetch latest ATR rules from GitHub and update the bundled JSON.

    Requires: git, PyYAML (pip install pyyaml).
    Returns the number of patterns synced.

    Usage::

        from garak.detectors.atr import sync_rules_from_github
        count = sync_rules_from_github()
        print(f"Synced {count} patterns")
    """
    import yaml  # PyYAML -- optional dependency

    dest = output or _RULES_PATH
    with tempfile.TemporaryDirectory() as tmpdir:
        subprocess.run(
            ["git", "clone", "--depth", "1", "-b", branch,
             f"https://github.com/{repo}.git", tmpdir],
            check=True, capture_output=True,
        )
        rules_dir = Path(tmpdir) / "rules"
        if not rules_dir.exists():
            raise FileNotFoundError(f"No rules/ directory in {repo}")

        result: dict[str, list[list[str]]] = {}
        for yaml_file in sorted(rules_dir.rglob("*.yaml")):
            doc = yaml.safe_load(yaml_file.read_text())
            if not doc or not doc.get("detection", {}).get("conditions"):
                continue
            cat = doc.get("tags", {}).get("category", "unknown")
            if cat not in result:
                result[cat] = []
            for cond in doc["detection"]["conditions"]:
                if cond.get("operator") == "regex" and cond.get("value"):
                    pat = re.sub(r"^\(\?[imsx]+\)", "", cond["value"])
                    result[cat].append([doc["id"], doc.get("severity", "medium"), pat])

    dest.write_text(json.dumps(result, indent=2, ensure_ascii=True))
    total = sum(len(v) for v in result.values())
    logger.info("ATR sync: %d patterns across %d categories -> %s", total, len(result), dest)
    return total
Collaborator

I would suggest this is also likely better extracted into the tools path as a separate utility, executed independently to configure the user's system. By default the utility should likely write the generated configuration to the user's XDG-based data_path, or to stdout, so the user can place it in the correct location in their XDG_DATA_HOME for the detector to pick it up in place of the shipped version.
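
To make the XDG lookup concrete, here is a minimal sketch of resolving a per-user data location by the XDG Base Directory rules (`$XDG_DATA_HOME`, falling back to `~/.local/share`). Both function names are hypothetical, and the `garak/data/atr/rules.json` sub-path is an assumption about where an override would live. In garak itself this resolution is handled by the data_path helper rather than by code like this.

```python
import os
from pathlib import Path


def user_data_home(env=None, home=None):
    """Resolve the XDG data home: $XDG_DATA_HOME, else ~/.local/share."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    configured = env.get("XDG_DATA_HOME", "")
    # The XDG spec says relative paths are invalid and must be ignored.
    if configured and Path(configured).is_absolute():
        return Path(configured)
    return home / ".local" / "share"


def user_rules_path(env=None, home=None):
    """Hypothetical override location for the ATR rules file."""
    # "garak/data/atr/rules.json" below the data home is an assumption.
    return user_data_home(env, home) / "garak" / "data" / "atr" / "rules.json"
```

Passing `env` and `home` explicitly keeps the lookup testable without touching the real environment.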

Comment thread garak/detectors/atr.py Outdated
Comment on lines +31 to +37
_RULES_PATH = Path(__file__).parent / "atr_rules.json"
_ALL_RULES: dict[str, list[list[str]]] = {}
if _RULES_PATH.exists():
    with open(_RULES_PATH) as f:
        _ALL_RULES = json.load(f)
else:
    logger.warning("ATR rules file not found: %s", _RULES_PATH)
Collaborator

@jmartin-tech jmartin-tech Apr 8, 2026

This should use the data_path access pattern see:

with open(
    data_path / "graph_connectivity.json",
    "r",
    encoding="utf-8",
) as f:
    self.prompts = json.load(f)

from garak.data import path as data_path

This helper class provides access to files in the installed package's data directory and supports user override of the file via the XDG Base Directory Specification, so users can provide their own content without needing write permissions to the Python runtime library path.

Also, it is preferred to load this inside __init__ for a detector instead of globally on module import. This could be accomplished using the ABC abstract class pattern. See:

class PackageHallucinationProbe(garak.probes.Probe, ABC):
    """Abstract base class for package hallucination probes
    Generators sometimes recommend importing non-existent packages into code. These
    package names can be found by attackers and then squatted in public package
    repositories, so that incorrect code from generators will start to run, silently
    loading malicious squatted packages onto the machine. This is bad. This probe
    checks whether a model will recommend code that uses non-existent packages."""

    lang = "*"
    doc_uri = "https://vulcan.io/blog/ai-hallucinations-package-risk"
    tags = [
        "owasp:llm09",
        "owasp:llm02",
        "quality:Robustness:GenerativeMisinformation",
        "payload:malicious:badcode",
    ]
    goal = "base probe for importing non-existent packages"
    DEFAULT_PARAMS = garak.probes.Probe.DEFAULT_PARAMS | {
        "follow_prompt_cap": True,
    }

    @property
    @abstractmethod
    def language_name(self) -> str:
        """Programming language name - must be overridden by subclasses"""
        raise NotImplementedError

    def __init__(self, config_root=_config):
        super().__init__(config_root=config_root)
        self.prompts = []
        for stub_prompt in stub_prompts:
            for code_task in code_tasks:
                self.prompts.append(
                    stub_prompt.replace("<language>", self.language_name).replace(
                        "<task>", code_task
                    )
                )
        if self.follow_prompt_cap:
            self._prune_data(cap=self.soft_probe_prompt_cap)
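
Applied to a rule-based detector, the same idea might look like the sketch below: a base class loads and compiles rules in `__init__`, and subclasses only declare their categories. `RuleDetectorBase`, `ToolPoisoningDemo`, and the injected `rule_loader` are hypothetical names so the sketch runs without garak; a real detector would derive from garak's detector base class and read its file through data_path.

```python
import re
from abc import ABC


class RuleDetectorBase(ABC):
    """Hypothetical base class: loads and compiles rules per instance."""

    # Subclasses narrow this; an empty list means "use every category".
    categories: list = []

    def __init__(self, rule_loader):
        # rule_loader is injected so the sketch runs without garak; it is
        # expected to return {category: [[rule_id, severity, pattern], ...]}.
        rules = rule_loader()  # local variable: not kept after __init__
        self._compiled = []
        for category in self.categories or list(rules):
            for rule_id, _severity, pattern in rules.get(category, []):
                self._compiled.append(
                    (rule_id, re.compile(pattern, re.IGNORECASE))
                )

    def matches(self, text):
        """Return the IDs of all rules whose pattern occurs in text."""
        return [rule_id for rule_id, rx in self._compiled if rx.search(text)]


class ToolPoisoningDemo(RuleDetectorBase):
    categories = ["tool-poisoning"]
```

Nothing is read when the module is imported; the loader runs only when a detector is instantiated.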

@eeee2345
Author

eeee2345 commented Apr 8, 2026

Thanks @jmartin-tech @leondz for the thorough review. Addressed all four points:

  1. data_path: rules moved to garak/data/atr/rules.json, loaded via from garak.data import path as data_path. Supports XDG user override.

  2. No subprocess: removed entirely from the detector. Sync tool now uses urllib.request to download a zip — no git dependency.

  3. Extracted tools: sync_rules() and generate_rule() moved to tools/atr.py. Writes to XDG data_path by default or --stdout. Detector is now pure detection logic only.

  4. Init-time loading: _load_rules() called in __init__, not on module import.

For context on the rule set and methodology — the full spec is at agentthreatrule.org and the academic paper is on Zenodo. Would appreciate any design feedback on how the detector categories map to garak's existing taxonomy — happy to adjust the tagging.
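
For reviewers who want the shape of the zip-based sync without opening the diff, the following is a minimal sketch of the approach described in point 2, not the code in tools/atr.py. `download_archive` and `read_rule_files` are hypothetical names, the archive URL is left to the caller, and the assumption is that rule files sit under a `rules/` directory inside the archive, as the earlier git-based version expected.

```python
import io
import urllib.request
import zipfile


def download_archive(url, timeout=30):
    """Fetch a zip archive over HTTPS and return its raw bytes."""
    if not url.startswith("https://"):
        raise ValueError("only https:// URLs are accepted")
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def read_rule_files(archive_bytes, suffix=".yaml"):
    """Return {member_name: text} for rule files under a rules/ directory."""
    files = {}
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        for name in archive.namelist():
            # GitHub archives prefix every member with "<repo>-<branch>/".
            parts = name.split("/")
            if "rules" in parts[:-1] and name.endswith(suffix):
                files[name] = archive.read(name).decode("utf-8")
    return files
```

Everything is handled in memory, so no temporary checkout and no git binary are involved.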

…subprocess

Changes per reviewer comments:
1. Rules loading uses garak's data_path mechanism (L37 feedback)
   - Moved atr_rules.json -> garak/data/atr/rules.json
   - Detector loads via from garak.data import path as data_path
   - Supports XDG user override
2. Removed subprocess.run (L64 feedback)
   - sync tool uses urllib.request to download zip
   - No git dependency required
3. Extracted helper methods to tools/ (L85, L171 feedback)
   - tools/atr.py: sync_rules() + generate_rule()
   - Writes to XDG data_path by default, or --stdout
   - Detector is now pure detection logic only
4. Rules loaded in __init__, not module level (L37 feedback)
   - _load_rules() called per-instance, not on import

Signed-off-by: eeee2345 <eeee2345@users.noreply.github.com>
@eeee2345 eeee2345 force-pushed the feat/atr-detectors branch from abf1394 to 63f60e8 Compare April 8, 2026 21:49
@leondz leondz changed the title from "feat: add ATR detectors -- 108 AI agent threat detection rules" to "detectors: add Agent Threat Rules" Apr 9, 2026
Collaborator

@jmartin-tech jmartin-tech left a comment

A few more adjustments requested.

Comment thread tools/atr.py Outdated
        xdg_dir.mkdir(parents=True, exist_ok=True)
        return xdg_dir / "rules.json"
    except Exception:
        return Path(__file__).parent.parent / "garak" / "data" / "atr" / "rules.json"
Collaborator

I am not sure the fallback location in the exception handler makes sense. If the import fails, the tool was likely executed from a location other than the repo source. It is also somewhat unexpected for a tool to create something in a relative path like that. I would hazard that supporting either the XDG path or a user-supplied command-line location is sufficient; if the XDG path search raises an exception, it may be best to exit early and suggest that the user supply a valid --output value or use the --stdout option.

Suggested change
-     return Path(__file__).parent.parent / "garak" / "data" / "atr" / "rules.json"
+     print("The user XDG storage location could not be identified, supply --output or --stdout options or ensure garak is available in the python environment.", file=sys.stderr)
+     sys.exit(1)
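
One way the surrounding command-line handling could be arranged is sketched below, assuming the tool takes the `--output` and `--stdout` options mentioned above. `build_parser`, `resolve_destination`, and the `xdg_lookup` callable are hypothetical names, and the actual argument handling in tools/atr.py may differ.

```python
import argparse
import sys
from pathlib import Path


def build_parser():
    """CLI with mutually exclusive --output and --stdout options."""
    parser = argparse.ArgumentParser(description="Sync rules (sketch)")
    target = parser.add_mutually_exclusive_group()
    target.add_argument("--output", type=Path, help="file to write rules to")
    target.add_argument("--stdout", action="store_true", help="print rules")
    return parser


def resolve_destination(args, xdg_lookup):
    """Return a Path, or None for stdout; exit with guidance on failure."""
    if args.stdout:
        return None
    if args.output is not None:
        return args.output
    try:
        return xdg_lookup()  # hypothetical callable returning the XDG path
    except Exception:
        print(
            "The user XDG storage location could not be identified; "
            "supply --output or --stdout.",
            file=sys.stderr,
        )
        sys.exit(1)
```

Explicit options take precedence, so the XDG lookup is attempted only when the user gave no destination.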

Comment thread tools/atr.py


if __name__ == "__main__":
    main()
Collaborator

Required to ensure consistent Windows support:

Suggested change
-     main()
+     sys.stdout.reconfigure(encoding="utf-8")
+     main()

Comment thread garak/detectors/atr.py Outdated

def _load_rules() -> dict[str, list[list[str]]]:
    """Load ATR rules from garak's data directory."""
    rules_path = data_path / _ATR_RULES_FILE
Collaborator

Combining in this manner may not work with data_path, as the / operator is doing custom things in this context. I am also not sure the constant really adds value here, as it is only used once; I could see value in the constant if it were used in test code.

_RULES_FILENAME = "rules.json"
Suggested change
-     rules_path = data_path / _ATR_RULES_FILE
+     rules_path = data_path / __name__.split(".")[-1] / _RULES_FILENAME
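
For clarity on what the `__name__` expression yields, here is a tiny sketch. `data_subdir` is a hypothetical helper name. Inside an imported module `garak.detectors.atr` the expression gives `atr`; if the same file were executed directly as a script, `__name__` would be `"__main__"` and the derived directory name would follow that.

```python
def data_subdir(module_name):
    """Return the last dotted component of a module name."""
    return module_name.split(".")[-1]


print(data_subdir("garak.detectors.atr"))  # prints "atr"
```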

Comment thread garak/detectors/atr.py Outdated
Comment on lines +68 to +77
        self._rules = _load_rules()
        self._compiled: list[tuple[str, re.Pattern]] = []
        for cat in self.atr_categories or list(self._rules.keys()):
            self._compiled.extend(_compile_category(self._rules, cat))
        logger.info(
            "ATR detector %s: %d patterns from %d categories",
            self.__class__.__name__,
            len(self._compiled),
            len(self.atr_categories) or len(self._rules),
        )
Collaborator

Based on the code in detect there is no reason to keep the _rules dictionary outside the init scope.

Suggested change
- self._rules = _load_rules()
- self._compiled: list[tuple[str, re.Pattern]] = []
- for cat in self.atr_categories or list(self._rules.keys()):
-     self._compiled.extend(_compile_category(self._rules, cat))
- logger.info(
-     "ATR detector %s: %d patterns from %d categories",
-     self.__class__.__name__,
-     len(self._compiled),
-     len(self.atr_categories) or len(self._rules),
- )
+ rules = _load_rules()
+ self._compiled: list[tuple[str, re.Pattern]] = []
+ for cat in self.atr_categories or list(rules.keys()):
+     self._compiled.extend(_compile_category(rules, cat))
+ logger.info(
+     "ATR detector %s: %d patterns from %d categories",
+     self.__class__.__name__,
+     len(self._compiled),
+     len(self.atr_categories) or len(rules),
+ )

@eeee2345
Author

@jmartin-tech @leondz — all four review points have been addressed (data_path, no subprocess, extracted tools, init-time loading). Ready for re-review when you have a moment.

Since the original submission, ATR has shipped v2.0.0 with some changes worth noting:

  • 113 rules (up from 108), including 3 rules generated end-to-end by our Threat Cloud crystallization pipeline — the first detection rules produced by automated threat intelligence, not hand-written regex
  • RFC-001 v1.1: a vendor-neutral quality standard for detection rules with maturity levels, confidence scoring, and review tier definitions. This means every ATR rule ships with machine-readable quality metadata
  • 96,096-skill ecosystem scan discovered 751 active malware from 3 coordinated threat actors — validating these rules against real attacks, not just benchmarks
  • Compound detection gates: MCP-context rules now require 30%+ condition match, reducing false positives on legitimate documentation
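
To illustrate the compound gate in the last bullet, one possible reading is that a rule fires only when at least 30% of its conditions match. The sketch below implements that reading; `gate_matches` and its parameters are hypothetical, and how ATR v2.0.0 actually computes the gate is not shown in this thread.

```python
import re


def gate_matches(text, condition_patterns, min_fraction=0.3):
    """True when at least min_fraction of the conditions match text."""
    if not condition_patterns:
        return False
    hits = sum(
        1 for pattern in condition_patterns
        if re.search(pattern, text, re.IGNORECASE)
    )
    return hits / len(condition_patterns) >= min_fraction
```

With four conditions, a single match (25%) stays below the gate while two matches (50%) pass it. That is the intended effect on documentation that mentions one suspicious term in passing.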

Happy to update the PR to v2.0.0 rules if that's useful. The sync tool already supports pulling latest from npm, so garak users would get rule updates automatically via atr sync.

Also — if there's interest, ATR's Threat Cloud can accept detection signals from garak runs. That means every garak user running ATR detectors would contribute back to the rule pipeline. No PII, just pattern hashes. Happy to discuss if that's in scope.

@eeee2345
Author

Apologies — my previous comment referenced the round-1 feedback only. I missed that there was a second round of review on 4/10.

This commit addresses all round-2 items:

  1. Fallback path: removed relative fallback, exits with error + guidance
  2. Windows encoding: added sys.stdout.reconfigure
  3. data_path: inlined per suggestion, dropped constant
  4. _rules scope: moved to local in init

Also updated rules.json to ATR v2.0.0 (113 rules, 736 patterns).

- tools/atr.py: exit with error if XDG path unavailable (no relative fallback)
- tools/atr.py: add sys.stdout.reconfigure for Windows encoding
- detectors/atr.py: inline data_path construction, drop module-level constant
- detectors/atr.py: move _rules to local scope in __init__
- rules.json: update to ATR v2.0.0 (113 rules, 736 patterns, 9 categories)

Signed-off-by: eeee2345 <imadam4real@gmail.com>
@eeee2345 eeee2345 force-pushed the feat/atr-detectors branch from 35d0cd9 to ddef4a5 Compare April 15, 2026 23:44