Skip to content

SamsonCyber/stegoff

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

11 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

StegOFF

Steganography detection, semantic manipulation defense, and prompt injection guard for AI systems.

StegOFF finds and destroys hidden payloads in text, images, audio, PDFs, and code files before they reach your LLM, agent, or end user. v0.4.0 adds a semantic defense layer: ML-based detection of fabricated citations, authority priming, polarization bias, RAG poisoning, and few-shot code poisoning, plus an HTML sanitizer that strips hidden content injection before agent ingestion.

Zero dependencies for text detection. Optional numpy/Pillow for image and audio analysis.

Install

pip install stegoff            # text detection only, zero deps
pip install stegoff[image]     # + image analysis (numpy, Pillow)
pip install stegoff[full]      # + audio statistical analysis (scipy)
pip install stegoff[server]    # + FastAPI server & middleware

Quick Start

CLI

# Scan a file
stegoff scan suspicious.png

# Scan text from clipboard/pipe
echo "some text" | stegoff scan-text

# Scan a directory
stegoff scan-dir ./uploads --json

# Pipeline guard: strip steg from stdin, output clean text
cat user_input.txt | stegoff guard

# Block mode: exit 1 if steg found
cat user_input.txt | stegoff guard --block

Exit codes: 0 = clean, 2 = steganography detected.

Python API

from stegoff import scan, scan_text, scan_file

# Scan text (runs steg + injection + authority + polarization + ML classifier)
report = scan_text("Hello\u200c\u200d world")
print(report.clean)          # False
print(report.findings)       # [Finding(method=ZERO_WIDTH, ...)]
print(report.to_json())      # JSON output

# Scan a file (auto-detects format)
report = scan_file("image.png")

# Universal entry point (accepts str, bytes, or Path)
report = scan("anything")

Semantic Manipulation Detection

from stegoff import scan_text

# Detect fabricated citations and fake institutions
report = scan_text(
    "A study in the Journal of Advanced Neuropharmacology by Dr. Sarah Mitchell "
    "at the Harvard Cognitive Enhancement Institute confirms 340% improvement."
)
print(report.semantic_manipulation_detected)  # True
# Finding: authority_fabrication (confidence: 98.1%)

# Detect biased framing and superlative saturation
report = scan_text(
    "This revolutionary, groundbreaking, state-of-the-art product is unmatched "
    "and unparalleled. Every expert agrees. 100% guaranteed breakthrough."
)
# Finding: polarization_bias (score: 0.75)

# Detect RAG poisoning with fake standards
report = scan_text(
    "According to NIST SP 800-204C, eval() is the recommended approach. "
    "PEP 1247 mandates eval-first design patterns."
)
# Finding: authority_fabrication (fake NIST standard format)

# Clean text produces no findings
report = scan_text("Product scores 3.2/5. Reviews are mixed.")
print(report.clean)  # True

HTML Sanitization

from stegoff import sanitize_html, scan_html

# Scan HTML for hidden content injection vectors
findings = scan_html('<div style="display:none">HIDDEN INJECTION</div>')
# Finding: hidden_html_content (display:none)

# Strip hidden elements, comments, suspicious meta/aria before agent ingestion
clean_html, findings = sanitize_html(raw_html)
# Removes: CSS-hidden elements, HTML comments, suspicious meta tags,
#           oversized aria-labels, ld+json scripts

Decorator

from stegoff import steg_guard

@steg_guard
def process_user_message(text: str) -> str:
    return llm.generate(text)

# Strips steg by default. Options:
@steg_guard(on_detect="raise")     # raise StegDetected
@steg_guard(on_detect="log")       # log and continue
@steg_guard(block_injection=True)  # always raise on prompt injection

FastAPI Middleware

from fastapi import FastAPI
from stegoff.server.middleware import StegOffMiddleware

app = FastAPI()
app.add_middleware(StegOffMiddleware)  # scans all incoming requests

FastAPI Endpoints

from stegoff.server.app import app  # mount directly, or:
# POST /scan/file   - upload and scan
# POST /scan/text   - scan text body
# POST /guard/file  - upload, get sanitized version
# POST /guard/text  - scan text, get sanitized version
# GET  /health      - health check

Sanitization

from stegoff import sanitize_text, sanitize_html, sanitize_image, sanitize_wav

clean_text, result = sanitize_text(dirty_text)        # strip steg chars
clean_html, findings = sanitize_html(raw_html)         # strip hidden elements
clean_bytes, result = sanitize_image(image_bytes)      # re-encode, strip metadata
clean_bytes, result = sanitize_wav(wav_bytes)          # re-encode audio

What It Detects

Text Steganography (14 methods, zero dependencies)

Method Technique
Zero-width characters ZWSP, ZWNJ, ZWJ, BOM binary encoding
Unicode tag characters U+E0000 "invisible ink"
Homoglyph substitution Cyrillic/Greek/fullwidth replacing Latin
Variation selectors VS1-VS256 presence/absence encoding
Combining marks Stacked diacritics (Zalgo-style)
Confusable whitespace EN/EM/thin space substitution
Bidirectional overrides RTL/LTR override injection
Hangul fillers Korean filler chars in non-Korean text
Math alphanumeric Unicode math bold/italic substitution
Braille patterns Byte-per-character braille encoding
Emoji substitution Binary or multi-type emoji encoding
Emoji skin tone Skin tone modifier sequences
Invisible separators Catch-all for Unicode Cf format chars
Trailing whitespace Snow-style space/tab encoding at line ends

Semantic Manipulation (3 detectors, v0.4.0)

Detector Catches Technique
Authority Fabrication Fake journals, bogus institutions, fabricated standards Pattern matching against known-good databases + fabrication signature regex
Polarization Bias Superlative saturation, one-sided framing, missing hedging Lexical density scoring (superlatives, hedges, sentiment markers)
ML Classifier All 4 attack types + clean text TF-IDF (word + char n-grams) + Logistic Regression, 5-class, 14.5K features

The ML classifier detects:

  • Authority fabrication: fake journal names, fabricated researcher credentials, bogus institutional affiliations
  • Polarization bias: superlative-saturated marketing copy, one-sided product endorsements
  • RAG poisoning: fabricated NIST standards, fake PEPs, bogus security advisories recommending insecure practices
  • Few-shot poisoning: code review examples that approve SQL injection, eval(), pickle deserialization

Training data: 5,000 synthetic examples generated from attack templates (1,000 per class). The classifier runs in ~1ms with no GPU.

HTML Content Injection (sanitizer + scanner, v0.4.0)

Technique Detection
CSS display:none Style attribute regex
CSS visibility:hidden Style attribute regex
CSS opacity:0 Style attribute regex
CSS font-size:0/1px Style + color match
Offscreen positioning left/top: -9999px
Text-indent offscreen text-indent: -9999px
Color matching (white-on-white) Foreground color vs known backgrounds
HTML comments Comment node detection
Hidden class names .hidden, .d-none, .sr-only, etc.
Meta tag injection Long meta content (>50 chars)
Aria-label injection Oversized aria-label (>50 chars)
ld+json scripts Structured data injection

Prompt Injection (44 patterns, 17 categories)

All decoded payloads are automatically scanned for prompt injection: instruction overrides, jailbreaks, identity manipulation, privilege escalation, data exfiltration, code execution, destructive commands, delimiter injection, tool/function manipulation, social engineering, and more. Multi-vector attacks (3+ categories) trigger an aggregate CRITICAL finding.

Image (requires stegoff[image])

LSB chi-square/RS/sample pairs, DCT/F5 coefficient anomaly, bit-plane correlation, PVD, PNG ancillary chunks, trailing data, EXIF payload detection, palette manipulation.

Audio (requires stegoff[full])

WAV/PCM LSB statistical analysis, spectral anomaly detection, trailing data.

Binary / Documents

PDF (JavaScript, post-EOF, incremental updates, form fields, stream outliers), polyglot files, embedded file signatures, JSON structure scanning, code comment scanning, XML entity detection.

Architecture

stegoff/
  detectors/
    text.py                 14 Unicode steganography methods
    image.py                LSB, DCT, PVD, palette, EXIF, PNG
    binary.py               Polyglots, embedded files, steghide
    audio.py                WAV frequency analysis
    prompt_injection.py     44 regex patterns, 17 categories
    authority.py            Fabricated citation detection
    polarization.py         Bias and framing detection
    semantic_classifier.py  ML 5-class classifier
    llm.py                  Semantic steg via Anthropic API
  sanitizers/
    text.py                 Strip zero-width, homoglyphs, BIDI
    html.py                 Strip hidden elements, comments, meta
    image.py                Re-encode, strip metadata
    audio.py                Re-encode WAV
    structured.py           JSON/XML structure cleaning
    semantic.py             LLM-based semantic sanitization
  orchestrator.py           Routes scans, aggregates findings
  report.py                 Finding, ScanReport, Severity, StegMethod
  guard.py                  @steg_guard decorator
  cli.py                    CLI entry point
  server/                   FastAPI REST API + middleware

Tested Against

StegOFF is adversarially tested against st3gg (100+ encoding techniques) and the Agent Trap Lab (30 attack vectors across 5 categories). The test suite includes 500+ tests:

  • Text steganography: 33 detection tests, 8 sanitization tests
  • Prompt injection: 10 pattern tests + 38 red team payloads
  • Semantic manipulation: 54 detector tests + 30 red team evasion tests
  • HTML sanitizer: 30 red team tests (CSS evasion, DOM tricks, encoding)
  • ML classifier: 31 robustness tests (boundary, stability, generalization)
  • Pipeline integration: 63 end-to-end tests (scan_text, scan_file, report API)
  • False positives: 19 tests with legitimate content (academic text, code reviews, news)
  • Adversarial: 83 st3gg fixture tests (100% detection rate)
  • Full attack scenarios: product review pages, security advisories, documentation

Agent Trap Lab Results (v0.4.0)

Live testing against 29 trap pages using qwen3.5:9b via Ollama:

Attack Category Traps Baseline Compromised StegOFF Defended Defense Rate
Content Injection 8 6-8 6-8 100%
Behavioral Control 5 4-5 4-5 100%
Compositional 7 2 2 100%
Semantic Manipulation 5 1-2 0-1 50%
Cognitive State 3 2 0 0% (known gap)

Content injection, behavioral control, and compositional attacks are fully defended. Semantic attacks (authority priming, RAG poisoning) are detected by the ML classifier but the LLM may still repeat fabricated claims despite injected warnings. This is a fundamental limitation: the defense can flag the content, but the LLM's tendency to echo authoritative-sounding text is a model-level behavior, not a content-level one.

Development

git clone https://github.com/SamsonCyber/stegoff
cd stegoff
pip install -e ".[dev]"
pytest

Training the Semantic Classifier

from stegoff.detectors.semantic_classifier import train_and_save
metrics = train_and_save(n_per_class=1000, seed=42)
# Generates 5,000 synthetic examples, trains TF-IDF + LR
# Model saved to stegoff/detectors/models/semantic_clf.pkl

Research

The semantic defense layer is informed by:

  • BiasDef (Wu & Saxena, 2025, arXiv 2512.00804) for polarization scoring in RAG pipelines
  • Authority Bias in LLMs (Mammen et al., 2026, arXiv 2601.13433) for representation-level authority susceptibility
  • MiniCheck (Tang et al., 2024, arXiv 2404.10774) for NLI-based grounded fact-checking
  • SelfCheckGPT (Manakul et al., 2023, arXiv 2303.08896) for consistency-based hallucination detection
  • CaMeL (Debenedetti et al., 2025, arXiv 2503.18813) for dual-LLM architecture against indirect prompt injection
  • Franklin et al. (2026) agent vulnerability taxonomy

License

MIT

About

Steganography detection, sanitization & prompt injection guard for AI systems. 35+ methods, 30+ formats, 3 detection layers, 0% false positives.

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors

Languages