UnityInFlow · hermanngeorge15 · Apr 2, 2026 · Apr 2, 2026 · Apr 2, 2026
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -0,0 +1,45 @@
+# Contributing to injection-scanner
+
+## Development
+
+```bash
+cargo build          # compile
+cargo test           # run tests
+cargo clippy -- -D warnings  # lint
+cargo fmt            # format
+cargo build --release # optimized binary
+```
+
+## Adding a New Pattern
+
+1. Choose the appropriate category YAML file in `patterns/core/`
+2. Add your pattern following the existing format:
+```yaml
+  - id: PI0XX
+    name: descriptive-name
+    pattern: "your\\s+regex\\s+pattern"
+    description: "What this pattern detects"
+    remediation: "How to fix it"
+    tags: [category]
+```
+3. Optionally override severity: `severity: CRITICAL` (otherwise inherits from category)
+4. Add test cases in the appropriate test file
+5. Run `cargo test` -- all green
+6. Submit a PR
+
+## Pattern ID Numbering
+
+- PI001-PI009: Role override (Category A)
+- PI010-PI019: Instruction injection (Category B)
+- PI020-PI029: Data exfiltration (Category C)
+- PI030-PI039: Jailbreaks (Category D)
+- PI040-PI049: Encoding/obfuscation (Category E)
+
+## Commit Convention
+
+```
+feat: add new pattern category
+fix: reduce false positives in PI001
+test: add non-match cases for exfiltration
+docs: update PATTERNS.md
+```
diff --git a/PATTERNS.md b/PATTERNS.md
@@ -0,0 +1,38 @@
+# Pattern Contribution Guide
+
+injection-scanner uses a YAML-based pattern library. Core patterns are embedded at compile time. Community patterns can be added via PR.
+
+## Pattern Format
+
+```yaml
+category: category_name
+default_severity: CRITICAL  # CRITICAL | HIGH | MEDIUM | LOW
+patterns:
+  - id: PI0XX
+    name: descriptive-name
+    pattern: "regex\\s+pattern"
+    severity: HIGH  # optional -- overrides category default
+    description: "What this detects"
+    remediation: "How to fix"
+    tags: [tag1, tag2]
+```
+
+## Categories
+
+| Category | ID Range | Default Severity |
+|---|---|---|
+| Role Override | PI001-PI009 | CRITICAL |
+| Instruction Injection | PI010-PI019 | HIGH |
+| Data Exfiltration | PI020-PI029 | CRITICAL |
+| Jailbreaks | PI030-PI039 | HIGH |
+| Encoding/Obfuscation | PI040-PI049 | HIGH |
+
+## Submitting a Pattern
+
+1. Fork the repo
+2. Add pattern to the appropriate `patterns/core/*.yaml` file
+3. Include in your PR:
+   - At least 3 true positive test cases
+   - At least 2 non-match cases (false positive prevention)
+4. Run `cargo test`
+5. Submit PR with title: `feat: add PI0XX pattern-name`
diff --git a/README.md b/README.md
@@ -1,16 +1,133 @@
 # injection-scanner
 
-> Prompt injection static scanner -- detects role overrides, instruction injection, exfiltration, jailbreaks, and encoding attacks in AI spec files, skills, and RAG documents.
+Prompt injection is the SQL injection of the AI era. As AI agents process untrusted text -- skill files, RAG documents, user inputs, CLAUDE.md specs -- a single injected instruction can hijack agent behavior, exfiltrate data, or bypass safety controls.
 
-**Status:** Under construction
+**injection-scanner** is a static analysis tool that catches prompt injection patterns before they reach your AI agent. It scans files for role overrides, instruction injection, data exfiltration, jailbreaks, and encoding attacks using a maintained YAML pattern library.
 
-## What this does
+## Installation
 
-Scans files for prompt injection patterns that could redirect AI agent behavior -- exfiltrating data, bypassing safety rules, or executing unintended actions.
+Download the pre-built binary for your platform from [GitHub Releases](https://github.com/UnityInFlow/injection-scanner/releases).
 
-## Installation
+```bash
+# macOS / Linux
+chmod +x injection-scanner
+sudo mv injection-scanner /usr/local/bin/
+
+# Verify
+injection-scanner --version
+```
+
+Or build from source:
+
+```bash
+cargo install --path .
+```
+
+## Usage
+
+### Scan a single file
+
+```bash
+injection-scanner check CLAUDE.md
+```
+
+### Scan a directory recursively
+
+```bash
+injection-scanner check .
+```
+
+### Read from stdin
+
+```bash
+cat skill.md | injection-scanner check -
+```
+
+### JSON output
+
+```bash
+injection-scanner check CLAUDE.md --format json
+```
+
+## Pattern Categories
+
+| Category | Patterns | Default Severity | Examples |
+|---|---|---|---|
+| Role Override | 7 | CRITICAL | "ignore all previous instructions", "you are now", "forget everything" |
+| Instruction Injection | 5 | HIGH | "SYSTEM OVERRIDE", hidden HTML instructions, "your real purpose is" |
+| Data Exfiltration | 6 | CRITICAL | "send data to https://...", "output your instructions", "repeat system message" |
+| Jailbreaks | 9 | HIGH | "developer mode enabled", "DAN mode", "pretend you have no restrictions" |
+| Encoding/Obfuscation | 3 | HIGH | Unicode RTL overrides, zero-width characters, zero-width sequences |
+
+**30 patterns** across 5 categories. See [PATTERNS.md](PATTERNS.md) for the full pattern contribution guide.
+
+## Output Examples
+
+### Text output (default)
+
+```
+tests/fixtures/injected-skill.md
+  :6 CRITICAL  Attempts to override agent instructions — Remove instruction override text. If documenting attacks, use code blocks.  (PI001)
+  :6 CRITICAL  System-level instruction injection — Remove system override. Only actual system prompts should use SYSTEM prefix.  (PI011)
+  :9 HIGH  Role reassignment attempt — Remove role override. Use explicit skill scoping instead.  (PI003)
+  :9 CRITICAL  System-level instruction injection — Remove system override. Only actual system prompts should use SYSTEM prefix.  (PI011)
+  :14 HIGH  Developer mode jailbreak — Remove developer mode activation.  (PI030)
+  :16 HIGH  Restriction removal via pretense — Remove restriction bypass.  (PI033)
+
+6 finding(s): 3 critical, 3 high, 0 medium, 0 low
+```
+
+### JSON output
+
+```json
+[
+  {
+    "file": "tests/fixtures/injected-skill.md",
+    "matches": [
+      {
+        "pattern_id": "PI001",
+        "pattern_name": "ignore-previous-instructions",
+        "severity": "CRITICAL",
+        "message": "Attempts to override agent instructions",
+        "remediation": "Remove instruction override text. If documenting attacks, use code blocks.",
+        "file": "tests/fixtures/injected-skill.md",
+        "line": 6,
+        "matched_text": "ignore all previous instructions"
+      }
+    ],
+    "critical_count": 3,
+    "high_count": 3,
+    "medium_count": 0,
+    "low_count": 0
+  }
+]
+```
+
+## Inline Suppression
+
+Suppress specific patterns on a line by adding a comment on the line above:
+
+```markdown
+<!-- injection-scanner:ignore PI001 -->
+ignore all previous instructions
+```
+
+Multiple patterns can be suppressed:
+
+```markdown
+<!-- injection-scanner:ignore PI001,PI003 -->
+```
+
+## Exit Codes
+
+| Code | Meaning |
+|---|---|
+| 0 | No findings |
+| 1 | One or more findings detected |
+
+## Part of UnityInFlow
 
-Coming soon -- pre-built binaries on GitHub Releases.
+injection-scanner is tool #03 in the [UnityInFlow](https://github.com/UnityInFlow) AI agent tooling ecosystem -- 20 open-source tools covering spec validation, token cost control, runtime, compliance, and more.
 
 ## License
 

diff --git a/src/main.rs b/src/main.rs
@@ -1,8 +1,19 @@
+use std::fs;
+use std::io::Read;
+use std::path::PathBuf;
+
+use anyhow::{Context, Result};
 use clap::Parser;
 
+use injection_scanner::allowlist::parse_suppressions;
+use injection_scanner::pattern::{PatternCategory, ScanReport};
+use injection_scanner::patterns::load_all_patterns;
+use injection_scanner::reporter::{format_json, format_text};
+use injection_scanner::scanner::scan_content;
+
 #[derive(Parser)]
 #[command(name = "injection-scanner")]
-#[command(about = "Prompt injection static scanner for AI spec files")]
+#[command(about = "Prompt injection static scanner for AI spec files, skills, and RAG documents")]
 #[command(version)]
 struct Cli {
     #[command(subcommand)]
@@ -13,16 +24,87 @@ struct Cli {
 enum Commands {
     /// Scan files for prompt injection patterns
     Check {
-        /// File or directory to scan
+        /// File or directory to scan (use - for stdin)
         path: String,
         /// Output format: text or json
         #[arg(long, default_value = "text")]
         format: String,
+        /// Additional patterns directory
+        #[arg(long)]
+        patterns: Option<PathBuf>,
     },
 }
 
-fn main() -> anyhow::Result<()> {
-    let _cli = Cli::parse();
-    println!("injection-scanner v0.0.1 — not yet implemented");
-    Ok(())
+fn scan_file(path: &str, content: &str, categories: &[PatternCategory]) -> ScanReport {
+    let suppressions = parse_suppressions(content);
+    scan_content(path, content, categories, &suppressions)
+}
+
+fn walkdir(dir: &PathBuf) -> Result<Vec<PathBuf>> {
+    let mut files = Vec::new();
+    for entry in
+        fs::read_dir(dir).with_context(|| format!("Failed to read directory {}", dir.display()))?
+    {
+        let entry = entry?;
+        let path = entry.path();
+        if path.is_file() {
+            let ext = path.extension().and_then(|e| e.to_str()).unwrap_or("");
+            if matches!(ext, "md" | "yaml" | "yml" | "txt" | "toml") {
+                files.push(path);
+            }
+        } else if path.is_dir() {
+            files.extend(walkdir(&path)?);
+        }
+    }
+    Ok(files)
+}
+
+fn main() -> Result<()> {
+    let cli = Cli::parse();
+
+    match cli.command {
+        Commands::Check {
+            path,
+            format,
+            patterns,
+        } => {
+            let categories =
+                load_all_patterns(patterns.as_deref()).context("Failed to load patterns")?;
+
+            let mut reports = Vec::new();
+
+            if path == "-" {
+                let mut content = String::new();
+                std::io::stdin()
+                    .read_to_string(&mut content)
+                    .context("Failed to read from stdin")?;
+                reports.push(scan_file("<stdin>", &content, &categories));
+            } else {
+                let target = PathBuf::from(&path);
+                if target.is_file() {
+                    let content = fs::read_to_string(&target)
+                        .with_context(|| format!("Failed to read {}", target.display()))?;
+                    reports.push(scan_file(&path, &content, &categories));
+                } else if target.is_dir() {
+                    for entry in walkdir(&target)? {
+                        let content = fs::read_to_string(&entry)
+                            .with_context(|| format!("Failed to read {}", entry.display()))?;
+                        reports.push(scan_file(&entry.to_string_lossy(), &content, &categories));
+                    }
+                } else {
+                    anyhow::bail!("Path does not exist: {}", path);
+                }
+            }
+
+            let output = match format.as_str() {
+                "json" => format_json(&reports)?,
+                _ => format_text(&reports),
+            };
+
+            print!("{}", output);
+
+            let has_findings = reports.iter().any(|r| r.has_findings());
+            std::process::exit(if has_findings { 1 } else { 0 });
+        }
+    }
 }