
SkillWard Banner

SkillWard


SkillWard is a security scanner for AI Agent Skills that combines static analysis, LLM evaluation, and sandbox verification to comprehensively identify potential risks in Agent Skills.

Highlights · Architecture · UI · Benchmark · Quick Start · Structure · English | 中文

"Five scanners run on 238,180 Skills showed highly inconsistent results: only 0.12% were flagged by all five, with individual flag rates ranging from 3.79% to 41.93%." — Holzbauer et al., Malicious Or Not: Adding Repository Context to Agent Skill Classification, 2026

SkillWard enables security review of AI Agent Skills before they are published or deployed, reducing the potential risks of Agent usage. Beyond static analysis and LLM evaluation, it executes suspicious Skills in isolated Docker sandboxes, replacing uncertain warnings with runtime evidence. Across 5,000 real-world Skills, ~25% were flagged as unsafe; among the ~38% of samples escalated to the sandbox as suspicious, roughly one-third revealed runtime threats that review-only pipelines could not catch.

How does SkillWard address this challenge?

We ran two existing open-source scanning tools on the same dataset as reference baselines (see Comparison for details). Here are three real-world cases:

  • Unique Detection: Threats missed by other tools, precisely caught by SkillWard — see ai-skill-scanner
  • Low False Positives: Compliant content wrongly blocked by other tools, correctly cleared by SkillWard — see roku
  • Deeper Analysis: For threats all tools detect, SkillWard provides more complete risk tracing and evidence — see amber-hunter

Highlights

  • Three-Stage Security Coverage - Static analysis, LLM evaluation, and sandbox execution turn obvious threats and ambiguous warnings into high-confidence decisions
  • Autonomous Sandbox Execution - An in-container Agent provisions environments, installs dependencies, repairs common failures, and drives Skills end-to-end with up to 99% deployment success
  • Runtime Security Guard - A purpose-built Guard monitors Agent runtime behavior, capturing clear evidence for exfiltration, suspicious network access, sensitive writes, and hidden credential risks
  • Ready Out of the Box, Extensible on Demand - Single-skill or batch scans, Quick Scan / Sandbox Scan / Deep Trace modes, tunable via environment variables, LLM provider configuration, and Docker settings
  • Evidence-Rich Results - Every scan returns real-time logs, three-stage findings, threat evidence, and remediation guidance that security and platform teams can act on immediately

Architecture

System Architecture

SkillWard analyzes each Skill in three stages, combining static and dynamic techniques:

Stage A · Static Analysis: Runs in seconds, catches known malicious patterns and suspicious signals

Scans Skill code and configuration using YARA rules and regex to identify known malicious patterns (credential theft, code injection, etc.), validates that a Skill's declared permissions and capabilities match its actual code behavior, and detects hidden files, encoding obfuscation, prompt poisoning, and other suspicious characteristics.
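The signature-matching idea behind Stage A can be illustrated in a few lines of Python. The patterns and category names below are illustrative only, not SkillWard's actual YARA rule set:

```python
import re

# Illustrative signature set -- SkillWard's real rules are YARA-based and far larger.
SIGNATURES = {
    "credential_theft": re.compile(r"(AWS_SECRET|api[_-]?key|\.ssh/id_rsa)", re.I),
    "code_injection":   re.compile(r"\b(eval|exec)\s*\(", re.I),
    "obfuscation":      re.compile(r"base64\.b64decode|\\x[0-9a-f]{2}", re.I),
}

def scan_text(text: str) -> list[tuple[str, int]]:
    """Return (category, line_number) for every signature hit in a Skill file."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for category, pattern in SIGNATURES.items():
            if pattern.search(line):
                findings.append((category, lineno))
    return findings
```

In the real pipeline, YARA rules add file-type awareness, string combinations, and rule metadata that plain regexes lack, and the permission/behavior consistency check runs alongside this matching.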

Stage B · LLM Evaluation: Semantic reasoning to judge intent and assign safety confidence

Adds semantic reasoning on top of static signals. Skills that can be confidently classified are resolved here; Skills that remain uncertain advance to Stage C for sandbox verification.
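The resolve-or-escalate behavior amounts to confidence thresholding; a minimal sketch, where the 0.8 threshold and the verdict labels are assumptions for illustration:

```python
from dataclasses import dataclass

@dataclass
class LLMVerdict:
    label: str         # "safe" or "unsafe", as judged by the model
    confidence: float  # model-reported confidence in [0.0, 1.0]

def triage(verdict: LLMVerdict, threshold: float = 0.8) -> str:
    """Resolve high-confidence verdicts at Stage B; escalate the rest to Stage C."""
    if verdict.confidence >= threshold:
        return verdict.label      # resolved here, no sandbox run needed
    return "suspicious"           # escalated to sandbox verification
```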

Stage C · Sandbox Verification: Actually runs suspicious Skills, leaving hidden risks nowhere to hide

An in-container Agent executes the Skill end-to-end, with a custom Guard monitoring throughout. Pre-planted honeypot decoys lure malicious Skills into revealing credential theft, data exfiltration, supply chain attacks, and other hidden behavior.


SkillWard UI

SkillWard UI provides a clean, intuitive web interface, supporting single or batch Skill submission, three scan modes (Quick Scan / Sandbox Scan / Deep Trace), and comprehensive scan results display.

Demos: Single Skill Scan · Batch Scan

Detailed Analysis Report

Screenshots: Report Overview + Three-Stage Analysis · Threat Details + Detection Evidence + Recommendations

Each report includes: Analysis Results (three-stage verdicts, confidence scores, threat levels), Issue Location (file path, line number, highlighted code snippets), and Remediation Suggestions (actionable security recommendations).


Benchmark

We evaluated SkillWard on a real-world AI Agent Skills dataset containing Skills collected from ClawHub and known-malicious samples curated from security communities.

Pipeline Results

Stage A + B: Static Scan + LLM Evaluation

Combining YARA rules, regex-based static analysis, and LLM semantic evaluation, all Skills are quickly triaged: ~49% safe, ~13% unsafe, and ~38% suspicious; suspicious Skills are escalated to Stage C for sandbox verification.

Stage C: Sandbox Verification

After executing this batch of suspicious Skills end-to-end inside an isolated Docker sandbox, roughly one-third revealed potential threats that neither static analysis nor LLM evaluation could catch, including:

  • Credential exfiltration that only surfaces along the execution path
  • Persistence backdoors via crontab / SSH / startup scripts
  • Postinstall supply-chain attacks triggered during package installation
  • Outbound exfiltration chains identifiable only after correlating multi-step operations
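As one concrete example, persistence via crontab, SSH, or startup scripts ultimately reduces to writes against a small, well-known set of paths, which a Guard-style monitor can flag directly. The path lists below are illustrative, not the Guard's actual configuration:

```python
SENSITIVE_PREFIXES = (
    "/etc/cron", "/var/spool/cron",   # scheduled-task persistence
    "/etc/init.d", "/etc/systemd",    # startup-script persistence
)
SENSITIVE_SUFFIXES = (".ssh/authorized_keys", ".bashrc", ".profile")

def is_persistence_write(path: str) -> bool:
    """Flag file writes that commonly establish persistence in a sandbox run."""
    return path.startswith(SENSITIVE_PREFIXES) or path.endswith(SENSITIVE_SUFFIXES)
```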

Stage C verdict breakdown for these suspicious Skills:

Level | Meaning | % of suspicious
safe | Confirmed safe after sandbox verification | ~69%
medium risk | Medium-risk behavior (undeclared external requests, env-var harvesting, etc.) | ~17%
high risk | High-risk behavior (credential theft, persistence backdoors, remote code execution, etc.) | ~14%

Overall

Across all stages: Stage A + B directly blocked ~13% unsafe Skills, and ~38% suspicious Skills entered the sandbox; among those suspicious Skills, ~17% were judged medium risk and ~14% were judged high risk.
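These stage-level figures are consistent with the ~25% overall unsafe rate quoted in the introduction:

```python
# Reproducing the headline number from the stage-level percentages above.
unsafe_ab = 0.13              # blocked directly by Stage A + B
suspicious = 0.38             # escalated to Stage C
medium_of_suspicious = 0.17   # Stage C medium-risk share
high_of_suspicious = 0.14     # Stage C high-risk share

overall_unsafe = unsafe_ab + suspicious * (medium_of_suspicious + high_of_suspicious)
print(round(overall_unsafe, 3))  # 0.248, i.e. the ~25% flagged-unsafe figure
```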

Common Threat Patterns (% of unsafe Skills)

Pattern | Share of unsafe Skills
Credential theft (API keys, passwords, private keys) | 36%
Undeclared external network requests | 24%
Env var / .env harvesting | 15%
Remote code download and execution | 9%
Persistence backdoor (crontab / SSH / startup) | 8%
Supply chain and privilege escalation | 8%

For detailed case studies and comparison, see How does SkillWard address this challenge? above.


Quick Start

Requirements: Python 3.10+ / Docker (sandbox) / Node.js 18+ (UI mode)

1. Install & Configure

# Clone the repository
git clone https://github.com/Fangcun-AI/SkillWard.git
cd SkillWard

# Install dependencies
pip install -r requirements.txt && pip install -e ./skill-scanner

# Pull Docker sandbox image
docker pull fangcunai/skillward:amd64    # Intel/AMD
docker pull fangcunai/skillward:arm64    # Apple Silicon/ARM

# Configure environment variables (.env.example lists all available options — fill in as needed)
cp guardian-api/.env.example guardian-api/.env

For detailed configuration, see Configuration Guide

2. Run Scans

# Full pipeline (static + LLM + sandbox)
python guardian-api/guardian.py /path/to/skills-dir -o ./output --enable-after-tool --parallel 4 -v

# Stage A + B only (static + LLM, no Docker required)
python guardian-api/guardian.py /path/to/skills-dir --stage pre-scan -o ./output -v

# Stage C only (Docker sandbox)
python guardian-api/guardian.py /path/to/skills-dir --stage runtime -o ./output --enable-after-tool --parallel 4

3. Common Scenarios

# Scan specific Skills only
python guardian-api/guardian.py /path/to/skills-dir -s skill-a,skill-b -o ./output

# Quick test run (first 10 Skills)
python guardian-api/guardian.py /path/to/skills-dir -n 10 -o ./output

# Increase sandbox timeout for complex Skills
python guardian-api/guardian.py /path/to/skills-dir --timeout 900 --prep-timeout 600 -o ./output

For more options and usage details, see CLI Guide

Tip

Optional: Launch Web UI

cd guardian-api && python guardian_api.py       # API server (terminal 1, from repo root)
cd guardian-ui && npm install && npm run dev    # Frontend (terminal 2, from repo root)

Repository Structure

SkillWard/
├── docs/                         # Documentation (config, CLI, cases, comparison)
├── guardian-api/                 # Backend: scanning pipeline & API server
│   ├── guardian.py               # Core three-stage scanning engine
│   └── guardian_api.py           # FastAPI server (SSE streaming)
├── guardian-ui/                  # Frontend: Next.js web dashboard
├── skill-scanner/                # Static analysis engine (15 analyzers)
├── models/                       # Data model definitions
├── services/                     # Business logic services
├── utils/                        # Utility functions
├── resources/                    # Banner, screenshots, demo assets
├── requirements.txt
├── README.md
└── README_CN.md
Guide | Description
Configuration | Quick start, LLM model providers, sandbox security monitoring, optional tuning
CLI Guide | Full command-line reference, common usage, and output files
Showcase | Real-world detection cases: how SkillWard catches threats in public Skills
Comparison | Side-by-side analysis with two open-source scanning tools

License

Apache License 2.0
