
SkillWard Banner

SkillWard


SkillWard is a security scanner for AI Agent Skills that combines static analysis, LLM evaluation, and sandbox verification to comprehensively identify potential risks in Agent Skills.

Highlights · Architecture · UI · Benchmark · Quick Start · Structure · English | 中文

"Five scanners run on 238,180 Skills showed highly inconsistent results: only 0.12% were flagged by all five, with individual flag rates ranging from 3.79% to 41.93%." — Holzbauer et al., Malicious Or Not: Adding Repository Context to Agent Skill Classification, 2026

SkillWard enables security review of AI Agent Skills before they are published or deployed, reducing the potential risks of Agent usage. Beyond static analysis and LLM evaluation, it executes suspicious Skills in isolated Docker sandboxes, replacing uncertain warnings with runtime evidence. Across 5,000 real-world Skills, ~25% were flagged as unsafe; among the ~38% of samples escalated to the sandbox as suspicious, roughly one-third revealed runtime threats that review-only pipelines could not catch.

How does SkillWard address this challenge?

We ran two existing open-source scanning tools on the same dataset as reference baselines (see Comparison for details). Here are three real-world cases:

  • Unique Detection: Threats missed by other tools, precisely caught by SkillWard — see ai-skill-scanner
  • Low False Positives: Compliant content wrongly blocked by other tools, correctly cleared by SkillWard — see roku
  • Deeper Analysis: For threats all tools detect, SkillWard provides more complete risk tracing and evidence — see amber-hunter

Highlights

  • Three-Stage Security Coverage - Static analysis, LLM evaluation, and sandbox execution turn obvious threats and ambiguous warnings into high-confidence decisions
  • Autonomous Sandbox Execution - An in-container Agent provisions environments, installs dependencies, repairs common failures, and drives Skills end-to-end with up to 99% deployment success
  • Runtime Security Guard - A purpose-built Guard monitors Agent runtime behavior, capturing clear evidence for exfiltration, suspicious network access, sensitive writes, and hidden credential risks
  • Ready Out of the Box, Extensible on Demand - Single-skill or batch scans, Quick Scan / Sandbox Scan / Deep Trace modes, tunable via environment variables, LLM provider configuration, and Docker settings
  • Evidence-Rich Results - Every scan returns real-time logs, three-stage findings, threat evidence, and remediation guidance that security and platform teams can act on immediately

Architecture

System Architecture

SkillWard analyzes each Skill in three stages, combining static and dynamic techniques:

Stage A · Static Analysis: Runs in seconds, catches known malicious patterns and suspicious signals

Scans Skill code and configuration using YARA rules and regex to identify known malicious patterns (credential theft, code injection, etc.), validates that a Skill's declared permissions and capabilities match its actual code behavior, and detects hidden files, encoding obfuscation, prompt poisoning, and other suspicious characteristics.
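The signature-matching idea behind Stage A can be illustrated in a few lines of Python. The patterns and category names below are illustrative only, not SkillWard's actual YARA rule set:

```python
import re

# Illustrative signature set -- SkillWard's real rules are YARA-based and far larger.
SIGNATURES = {
    "credential_theft": re.compile(r"(AWS_SECRET|api[_-]?key|\.ssh/id_rsa)", re.I),
    "code_injection":   re.compile(r"\b(eval|exec)\s*\(", re.I),
    "obfuscation":      re.compile(r"base64\.b64decode|\\x[0-9a-f]{2}", re.I),
}

def scan_text(text: str) -> list[tuple[str, int]]:
    """Return (category, line_number) for every signature hit in a Skill file."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for category, pattern in SIGNATURES.items():
            if pattern.search(line):
                findings.append((category, lineno))
    return findings
```

In the real pipeline, YARA rules add file-type awareness, string combinations, and rule metadata that plain regexes lack, and the permission/behavior consistency check runs alongside this matching.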

Stage B · LLM Evaluation: Semantic reasoning to judge intent and assign safety confidence

Adds semantic reasoning on top of static signals. Skills that can be confidently classified are resolved here; Skills that remain uncertain advance to Stage C for sandbox verification.
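The resolve-or-escalate behavior amounts to confidence thresholding; a minimal sketch, where the 0.8 threshold and the verdict labels are assumptions for illustration:

```python
from dataclasses import dataclass

@dataclass
class LLMVerdict:
    label: str         # "safe" or "unsafe", as judged by the model
    confidence: float  # model-reported confidence in [0.0, 1.0]

def triage(verdict: LLMVerdict, threshold: float = 0.8) -> str:
    """Resolve high-confidence verdicts at Stage B; escalate the rest to Stage C."""
    if verdict.confidence >= threshold:
        return verdict.label      # resolved here, no sandbox run needed
    return "suspicious"           # escalated to sandbox verification
```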

Stage C · Sandbox Verification: Actually runs suspicious Skills, leaving hidden risks nowhere to hide

An in-container Agent executes the Skill end-to-end, with a custom Guard monitoring throughout. Pre-planted honeypot decoys lure malicious Skills into revealing credential theft, data exfiltration, supply chain attacks, and other hidden behavior.


SkillWard UI

SkillWard UI provides a clean, intuitive web interface, supporting single or batch Skill submission, three scan modes (Quick Scan / Sandbox Scan / Deep Trace), and comprehensive scan results display.

Demos: Single Skill Scan · Batch Scan

Detailed Analysis Report

Screenshots: Report Overview + Three-Stage Analysis · Threat Details + Detection Evidence + Recommendations

Each report includes: Analysis Results (three-stage verdicts, confidence scores, threat levels), Issue Location (file path, line number, highlighted code snippets), and Remediation Suggestions (actionable security recommendations).


Benchmark

We evaluated SkillWard on a real-world AI Agent Skills dataset containing Skills collected from ClawHub and known-malicious samples curated from security communities.

Pipeline Results

Stage A + B: Static Scan + LLM Evaluation

Combining YARA rules, regex-based static analysis, and LLM semantic evaluation, all Skills are quickly triaged: ~49% safe, ~13% unsafe, and ~38% suspicious; suspicious Skills are escalated to Stage C for sandbox verification.

Stage C: Sandbox Verification

After executing this batch of suspicious Skills end-to-end inside an isolated Docker sandbox, roughly one-third revealed potential threats that neither static analysis nor LLM evaluation could catch, including:

  • Credential exfiltration that only surfaces along the execution path
  • Persistence backdoors via crontab / SSH / startup scripts
  • Postinstall supply-chain attacks triggered during package installation
  • Outbound exfiltration chains identifiable only after correlating multi-step operations
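As one concrete example, persistence via crontab, SSH, or startup scripts ultimately reduces to writes against a small, well-known set of paths, which a Guard-style monitor can flag directly. The path lists below are illustrative, not the Guard's actual configuration:

```python
SENSITIVE_PREFIXES = (
    "/etc/cron", "/var/spool/cron",   # scheduled-task persistence
    "/etc/init.d", "/etc/systemd",    # startup-script persistence
)
SENSITIVE_SUFFIXES = (".ssh/authorized_keys", ".bashrc", ".profile")

def is_persistence_write(path: str) -> bool:
    """Flag file writes that commonly establish persistence in a sandbox run."""
    return path.startswith(SENSITIVE_PREFIXES) or path.endswith(SENSITIVE_SUFFIXES)
```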

Stage C verdict breakdown for these suspicious Skills:

Level | Meaning | % of suspicious
safe | Confirmed safe after sandbox verification | ~69%
medium risk | Medium-risk behavior (undeclared external requests, env-var harvesting, etc.) | ~17%
high risk | High-risk behavior (credential theft, persistence backdoors, remote code execution, etc.) | ~14%

Overall

Across all stages: Stage A + B directly blocked ~13% unsafe Skills, and ~38% suspicious Skills entered the sandbox; among those suspicious Skills, ~17% were judged medium risk and ~14% were judged high risk.
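These stage-level figures are consistent with the ~25% overall unsafe rate quoted in the introduction:

```python
# Reproducing the headline number from the stage-level percentages above.
unsafe_ab = 0.13              # blocked directly by Stage A + B
suspicious = 0.38             # escalated to Stage C
medium_of_suspicious = 0.17   # Stage C medium-risk share
high_of_suspicious = 0.14     # Stage C high-risk share

overall_unsafe = unsafe_ab + suspicious * (medium_of_suspicious + high_of_suspicious)
print(round(overall_unsafe, 3))  # 0.248, i.e. the ~25% flagged-unsafe figure
```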

Common Threat Patterns (% of unsafe Skills)

Pattern | Share of unsafe Skills
Credential theft (API keys, passwords, private keys) | 36%
Undeclared external network requests | 24%
Env var / .env harvesting | 15%
Remote code download and execution | 9%
Persistence backdoor (crontab / SSH / startup) | 8%
Supply chain and privilege escalation | 8%

For detailed case studies and comparison, see How does SkillWard address this challenge? above.


Quick Start

Requirements: Python 3.10+ / Docker (sandbox) / Node.js 18+ (UI mode)

1. Install & Configure

# Clone the repository
git clone https://github.com/Fangcun-AI/SkillWard.git
cd SkillWard

# Install dependencies
pip install -r requirements.txt && pip install -e ./skill-scanner

# Pull Docker sandbox image
docker pull fangcunai/skillward:amd64    # Intel/AMD
docker pull fangcunai/skillward:arm64    # Apple Silicon/ARM

# Configure environment variables (.env.example lists all available options — fill in as needed)
cp guardian-api/.env.example guardian-api/.env

For detailed configuration, see Configuration Guide

2. Run Scans

# Full pipeline (static + LLM + sandbox)
python guardian-api/guardian.py /path/to/skills-dir -o ./output --enable-after-tool --parallel 4 -v

# Stage A + B only (static + LLM, no Docker required)
python guardian-api/guardian.py /path/to/skills-dir --stage pre-scan -o ./output -v

# Stage C only (Docker sandbox)
python guardian-api/guardian.py /path/to/skills-dir --stage runtime -o ./output --enable-after-tool --parallel 4

3. Common Scenarios

# Scan specific Skills only
python guardian-api/guardian.py /path/to/skills-dir -s skill-a,skill-b -o ./output

# Quick test run (first 10 Skills)
python guardian-api/guardian.py /path/to/skills-dir -n 10 -o ./output

# Increase sandbox timeout for complex Skills
python guardian-api/guardian.py /path/to/skills-dir --timeout 900 --prep-timeout 600 -o ./output

For more options and usage details, see CLI Guide

Tip

Optional: Launch Web UI

cd guardian-api && python guardian_api.py       # API server (terminal 1, from repo root)
cd guardian-ui && npm install && npm run dev    # Frontend (terminal 2, from repo root)

Repository Structure

SkillWard/
├── docs/                         # Documentation (config, CLI, cases, comparison)
├── guardian-api/                 # Backend: scanning pipeline & API server
│   ├── guardian.py               # Core three-stage scanning engine
│   └── guardian_api.py           # FastAPI server (SSE streaming)
├── guardian-ui/                  # Frontend: Next.js web dashboard
├── skill-scanner/                # Static analysis engine (15 analyzers)
├── models/                       # Data model definitions
├── services/                     # Business logic services
├── utils/                        # Utility functions
├── resources/                    # Banner, screenshots, demo assets
├── requirements.txt
├── README.md
└── README_CN.md
Guide | Description
Configuration | Quick start, LLM model providers, sandbox security monitoring, optional tuning
CLI Guide | Full command-line reference, common usage, and output files
Showcase | Real-world detection cases: how SkillWard catches threats in public Skills
Comparison | Side-by-side analysis with two open-source scanning tools

License

Apache License 2.0
