Steganography detection, semantic manipulation defense, and prompt injection guard for AI systems.
StegOFF finds and destroys hidden payloads in text, images, audio, PDFs, and code files before they reach your LLM, agent, or end user. v0.4.0 adds a semantic defense layer: ML-based detection of fabricated citations, authority priming, polarization bias, RAG poisoning, and few-shot code poisoning, plus an HTML sanitizer that strips hidden content injection before agent ingestion.
Zero dependencies for text detection. Optional numpy/Pillow for image and audio analysis.
pip install stegoff # text detection only, zero deps
pip install stegoff[image] # + image analysis (numpy, Pillow)
pip install stegoff[full] # + audio statistical analysis (scipy)
pip install stegoff[server] # + FastAPI server & middleware# Scan a file
stegoff scan suspicious.png
# Scan text from clipboard/pipe
echo "some text" | stegoff scan-text
# Scan a directory
stegoff scan-dir ./uploads --json
# Pipeline guard: strip steg from stdin, output clean text
cat user_input.txt | stegoff guard
# Block mode: exit 1 if steg found
cat user_input.txt | stegoff guard --blockExit codes: 0 = clean, 2 = steganography detected.
from stegoff import scan, scan_text, scan_file
# Scan text (runs steg + injection + authority + polarization + ML classifier)
report = scan_text("Hello\u200c\u200d world")
print(report.clean) # False
print(report.findings) # [Finding(method=ZERO_WIDTH, ...)]
print(report.to_json()) # JSON output
# Scan a file (auto-detects format)
report = scan_file("image.png")
# Universal entry point (accepts str, bytes, or Path)
report = scan("anything")from stegoff import scan_text
# Detect fabricated citations and fake institutions
report = scan_text(
"A study in the Journal of Advanced Neuropharmacology by Dr. Sarah Mitchell "
"at the Harvard Cognitive Enhancement Institute confirms 340% improvement."
)
print(report.semantic_manipulation_detected) # True
# Finding: authority_fabrication (confidence: 98.1%)
# Detect biased framing and superlative saturation
report = scan_text(
"This revolutionary, groundbreaking, state-of-the-art product is unmatched "
"and unparalleled. Every expert agrees. 100% guaranteed breakthrough."
)
# Finding: polarization_bias (score: 0.75)
# Detect RAG poisoning with fake standards
report = scan_text(
"According to NIST SP 800-204C, eval() is the recommended approach. "
"PEP 1247 mandates eval-first design patterns."
)
# Finding: authority_fabrication (fake NIST standard format)
# Clean text produces no findings
report = scan_text("Product scores 3.2/5. Reviews are mixed.")
print(report.clean) # Truefrom stegoff import sanitize_html, scan_html
# Scan HTML for hidden content injection vectors
findings = scan_html('<div style="display:none">HIDDEN INJECTION</div>')
# Finding: hidden_html_content (display:none)
# Strip hidden elements, comments, suspicious meta/aria before agent ingestion
clean_html, findings = sanitize_html(raw_html)
# Removes: CSS-hidden elements, HTML comments, suspicious meta tags,
# oversized aria-labels, ld+json scriptsfrom stegoff import steg_guard
@steg_guard
def process_user_message(text: str) -> str:
return llm.generate(text)
# Strips steg by default. Options:
@steg_guard(on_detect="raise") # raise StegDetected
@steg_guard(on_detect="log") # log and continue
@steg_guard(block_injection=True) # always raise on prompt injectionfrom fastapi import FastAPI
from stegoff.server.middleware import StegOffMiddleware
app = FastAPI()
app.add_middleware(StegOffMiddleware) # scans all incoming requestsfrom stegoff.server.app import app # mount directly, or:
# POST /scan/file - upload and scan
# POST /scan/text - scan text body
# POST /guard/file - upload, get sanitized version
# POST /guard/text - scan text, get sanitized version
# GET /health - health checkfrom stegoff import sanitize_text, sanitize_html, sanitize_image, sanitize_wav
clean_text, result = sanitize_text(dirty_text) # strip steg chars
clean_html, findings = sanitize_html(raw_html) # strip hidden elements
clean_bytes, result = sanitize_image(image_bytes) # re-encode, strip metadata
clean_bytes, result = sanitize_wav(wav_bytes) # re-encode audio| Method | Technique |
|---|---|
| Zero-width characters | ZWSP, ZWNJ, ZWJ, BOM binary encoding |
| Unicode tag characters | U+E0000 "invisible ink" |
| Homoglyph substitution | Cyrillic/Greek/fullwidth replacing Latin |
| Variation selectors | VS1-VS256 presence/absence encoding |
| Combining marks | Stacked diacritics (Zalgo-style) |
| Confusable whitespace | EN/EM/thin space substitution |
| Bidirectional overrides | RTL/LTR override injection |
| Hangul fillers | Korean filler chars in non-Korean text |
| Math alphanumeric | Unicode math bold/italic substitution |
| Braille patterns | Byte-per-character braille encoding |
| Emoji substitution | Binary or multi-type emoji encoding |
| Emoji skin tone | Skin tone modifier sequences |
| Invisible separators | Catch-all for Unicode Cf format chars |
| Trailing whitespace | Snow-style space/tab encoding at line ends |
| Detector | Catches | Technique |
|---|---|---|
| Authority Fabrication | Fake journals, bogus institutions, fabricated standards | Pattern matching against known-good databases + fabrication signature regex |
| Polarization Bias | Superlative saturation, one-sided framing, missing hedging | Lexical density scoring (superlatives, hedges, sentiment markers) |
| ML Classifier | All 4 attack types + clean text | TF-IDF (word + char n-grams) + Logistic Regression, 5-class, 14.5K features |
The ML classifier detects:
- Authority fabrication: fake journal names, fabricated researcher credentials, bogus institutional affiliations
- Polarization bias: superlative-saturated marketing copy, one-sided product endorsements
- RAG poisoning: fabricated NIST standards, fake PEPs, bogus security advisories recommending insecure practices
- Few-shot poisoning: code review examples that approve SQL injection, eval(), pickle deserialization
Training data: 5,000 synthetic examples generated from attack templates (1,000 per class). The classifier runs in ~1ms with no GPU.
| Technique | Detection |
|---|---|
| CSS display:none | Style attribute regex |
| CSS visibility:hidden | Style attribute regex |
| CSS opacity:0 | Style attribute regex |
| CSS font-size:0/1px | Style + color match |
| Offscreen positioning | left/top: -9999px |
| Text-indent offscreen | text-indent: -9999px |
| Color matching (white-on-white) | Foreground color vs known backgrounds |
| HTML comments | Comment node detection |
| Hidden class names | .hidden, .d-none, .sr-only, etc. |
| Meta tag injection | Long meta content (>50 chars) |
| Aria-label injection | Oversized aria-label (>50 chars) |
| ld+json scripts | Structured data injection |
All decoded payloads are automatically scanned for prompt injection: instruction overrides, jailbreaks, identity manipulation, privilege escalation, data exfiltration, code execution, destructive commands, delimiter injection, tool/function manipulation, social engineering, and more. Multi-vector attacks (3+ categories) trigger an aggregate CRITICAL finding.
LSB chi-square/RS/sample pairs, DCT/F5 coefficient anomaly, bit-plane correlation, PVD, PNG ancillary chunks, trailing data, EXIF payload detection, palette manipulation.
WAV/PCM LSB statistical analysis, spectral anomaly detection, trailing data.
PDF (JavaScript, post-EOF, incremental updates, form fields, stream outliers), polyglot files, embedded file signatures, JSON structure scanning, code comment scanning, XML entity detection.
stegoff/
detectors/
text.py 14 Unicode steganography methods
image.py LSB, DCT, PVD, palette, EXIF, PNG
binary.py Polyglots, embedded files, steghide
audio.py WAV frequency analysis
prompt_injection.py 44 regex patterns, 17 categories
authority.py Fabricated citation detection
polarization.py Bias and framing detection
semantic_classifier.py ML 5-class classifier
llm.py Semantic steg via Anthropic API
sanitizers/
text.py Strip zero-width, homoglyphs, BIDI
html.py Strip hidden elements, comments, meta
image.py Re-encode, strip metadata
audio.py Re-encode WAV
structured.py JSON/XML structure cleaning
semantic.py LLM-based semantic sanitization
orchestrator.py Routes scans, aggregates findings
report.py Finding, ScanReport, Severity, StegMethod
guard.py @steg_guard decorator
cli.py CLI entry point
server/ FastAPI REST API + middleware
StegOFF is adversarially tested against st3gg (100+ encoding techniques) and the Agent Trap Lab (30 attack vectors across 5 categories). The test suite includes 500+ tests:
- Text steganography: 33 detection tests, 8 sanitization tests
- Prompt injection: 10 pattern tests + 38 red team payloads
- Semantic manipulation: 54 detector tests + 30 red team evasion tests
- HTML sanitizer: 30 red team tests (CSS evasion, DOM tricks, encoding)
- ML classifier: 31 robustness tests (boundary, stability, generalization)
- Pipeline integration: 63 end-to-end tests (scan_text, scan_file, report API)
- False positives: 19 tests with legitimate content (academic text, code reviews, news)
- Adversarial: 83 st3gg fixture tests (100% detection rate)
- Full attack scenarios: product review pages, security advisories, documentation
Live testing against 29 trap pages using qwen3.5:9b via Ollama:
| Attack Category | Traps | Baseline Compromised | StegOFF Defended | Defense Rate |
|---|---|---|---|---|
| Content Injection | 8 | 6-8 | 6-8 | 100% |
| Behavioral Control | 5 | 4-5 | 4-5 | 100% |
| Compositional | 7 | 2 | 2 | 100% |
| Semantic Manipulation | 5 | 1-2 | 0-1 | 50% |
| Cognitive State | 3 | 2 | 0 | 0% (known gap) |
Content injection, behavioral control, and compositional attacks are fully defended. Semantic attacks (authority priming, RAG poisoning) are detected by the ML classifier but the LLM may still repeat fabricated claims despite injected warnings. This is a fundamental limitation: the defense can flag the content, but the LLM's tendency to echo authoritative-sounding text is a model-level behavior, not a content-level one.
git clone https://github.com/SamsonCyber/stegoff
cd stegoff
pip install -e ".[dev]"
pytestfrom stegoff.detectors.semantic_classifier import train_and_save
metrics = train_and_save(n_per_class=1000, seed=42)
# Generates 5,000 synthetic examples, trains TF-IDF + LR
# Model saved to stegoff/detectors/models/semantic_clf.pklThe semantic defense layer is informed by:
- BiasDef (Wu & Saxena, 2025, arXiv 2512.00804) for polarization scoring in RAG pipelines
- Authority Bias in LLMs (Mammen et al., 2026, arXiv 2601.13433) for representation-level authority susceptibility
- MiniCheck (Tang et al., 2024, arXiv 2404.10774) for NLI-based grounded fact-checking
- SelfCheckGPT (Manakul et al., 2023, arXiv 2303.08896) for consistency-based hallucination detection
- CaMeL (Debenedetti et al., 2025, arXiv 2503.18813) for dual-LLM architecture against indirect prompt injection
- Franklin et al. (2026) agent vulnerability taxonomy
MIT