StegOFF

Steganography detection, semantic manipulation defense, and prompt injection guard for AI systems.

StegOFF finds and destroys hidden payloads in text, images, audio, PDFs, and code files before they reach your LLM, agent, or end user. v0.4.0 adds a semantic defense layer: ML-based detection of fabricated citations, authority priming, polarization bias, RAG poisoning, and few-shot code poisoning, plus an HTML sanitizer that strips hidden content injection before agent ingestion.

Zero dependencies for text detection. Optional numpy/Pillow for image and audio analysis.

Install

pip install stegoff            # text detection only, zero deps
pip install stegoff[image]     # + image analysis (numpy, Pillow)
pip install stegoff[full]      # + audio statistical analysis (scipy)
pip install stegoff[server]    # + FastAPI server & middleware

Quick Start

CLI

# Scan a file
stegoff scan suspicious.png

# Scan text from clipboard/pipe
echo "some text" | stegoff scan-text

# Scan a directory
stegoff scan-dir ./uploads --json

# Pipeline guard: strip steg from stdin, output clean text
cat user_input.txt | stegoff guard

# Block mode: exit 1 if steg found
cat user_input.txt | stegoff guard --block

Exit codes: 0 = clean, 2 = steganography detected.

Python API

from stegoff import scan, scan_text, scan_file

# Scan text (runs steg + injection + authority + polarization + ML classifier)
report = scan_text("Hello\u200c\u200d world")
print(report.clean)          # False
print(report.findings)       # [Finding(method=ZERO_WIDTH, ...)]
print(report.to_json())      # JSON output

# Scan a file (auto-detects format)
report = scan_file("image.png")

# Universal entry point (accepts str, bytes, or Path)
report = scan("anything")

Semantic Manipulation Detection

from stegoff import scan_text

# Detect fabricated citations and fake institutions
report = scan_text(
    "A study in the Journal of Advanced Neuropharmacology by Dr. Sarah Mitchell "
    "at the Harvard Cognitive Enhancement Institute confirms 340% improvement."
)
print(report.semantic_manipulation_detected)  # True
# Finding: authority_fabrication (confidence: 98.1%)

# Detect biased framing and superlative saturation
report = scan_text(
    "This revolutionary, groundbreaking, state-of-the-art product is unmatched "
    "and unparalleled. Every expert agrees. 100% guaranteed breakthrough."
)
# Finding: polarization_bias (score: 0.75)

# Detect RAG poisoning with fake standards
report = scan_text(
    "According to NIST SP 800-204C, eval() is the recommended approach. "
    "PEP 1247 mandates eval-first design patterns."
)
# Finding: authority_fabrication (fake NIST standard format)

# Clean text produces no findings
report = scan_text("Product scores 3.2/5. Reviews are mixed.")
print(report.clean)  # True

HTML Sanitization

from stegoff import sanitize_html, scan_html

# Scan HTML for hidden content injection vectors
findings = scan_html('<div style="display:none">HIDDEN INJECTION</div>')
# Finding: hidden_html_content (display:none)

# Strip hidden elements, comments, suspicious meta/aria before agent ingestion
clean_html, findings = sanitize_html(raw_html)
# Removes: CSS-hidden elements, HTML comments, suspicious meta tags,
#           oversized aria-labels, ld+json scripts

Decorator

from stegoff import steg_guard

@steg_guard
def process_user_message(text: str) -> str:
    return llm.generate(text)

# Strips steg by default. Options:
@steg_guard(on_detect="raise")     # raise StegDetected
@steg_guard(on_detect="log")       # log and continue
@steg_guard(block_injection=True)  # always raise on prompt injection

FastAPI Middleware

from fastapi import FastAPI
from stegoff.server.middleware import StegOffMiddleware

app = FastAPI()
app.add_middleware(StegOffMiddleware)  # scans all incoming requests

FastAPI Endpoints

from stegoff.server.app import app  # mount directly, or:
# POST /scan/file   - upload and scan
# POST /scan/text   - scan text body
# POST /guard/file  - upload, get sanitized version
# POST /guard/text  - scan text, get sanitized version
# GET  /health      - health check

Sanitization

from stegoff import sanitize_text, sanitize_html, sanitize_image, sanitize_wav

clean_text, result = sanitize_text(dirty_text)        # strip steg chars
clean_html, findings = sanitize_html(raw_html)         # strip hidden elements
clean_bytes, result = sanitize_image(image_bytes)      # re-encode, strip metadata
clean_bytes, result = sanitize_wav(wav_bytes)          # re-encode audio

What It Detects

Text Steganography (14 methods, zero dependencies)

Method	Technique
Zero-width characters	ZWSP, ZWNJ, ZWJ, BOM binary encoding
Unicode tag characters	U+E0000 "invisible ink"
Homoglyph substitution	Cyrillic/Greek/fullwidth replacing Latin
Variation selectors	VS1-VS256 presence/absence encoding
Combining marks	Stacked diacritics (Zalgo-style)
Confusable whitespace	EN/EM/thin space substitution
Bidirectional overrides	RTL/LTR override injection
Hangul fillers	Korean filler chars in non-Korean text
Math alphanumeric	Unicode math bold/italic substitution
Braille patterns	Byte-per-character braille encoding
Emoji substitution	Binary or multi-type emoji encoding
Emoji skin tone	Skin tone modifier sequences
Invisible separators	Catch-all for Unicode Cf format chars
Trailing whitespace	Snow-style space/tab encoding at line ends

Semantic Manipulation (3 detectors, v0.4.0)

Detector	Catches	Technique
Authority Fabrication	Fake journals, bogus institutions, fabricated standards	Pattern matching against known-good databases + fabrication signature regex
Polarization Bias	Superlative saturation, one-sided framing, missing hedging	Lexical density scoring (superlatives, hedges, sentiment markers)
ML Classifier	All 4 attack types + clean text	TF-IDF (word + char n-grams) + Logistic Regression, 5-class, 14.5K features

The ML classifier detects:

Authority fabrication: fake journal names, fabricated researcher credentials, bogus institutional affiliations
Polarization bias: superlative-saturated marketing copy, one-sided product endorsements
RAG poisoning: fabricated NIST standards, fake PEPs, bogus security advisories recommending insecure practices
Few-shot poisoning: code review examples that approve SQL injection, eval(), pickle deserialization

Training data: 5,000 synthetic examples generated from attack templates (1,000 per class). The classifier runs in ~1ms with no GPU.

HTML Content Injection (sanitizer + scanner, v0.4.0)

Technique	Detection
CSS display:none	Style attribute regex
CSS visibility:hidden	Style attribute regex
CSS opacity:0	Style attribute regex
CSS font-size:0/1px	Style + color match
Offscreen positioning	left/top: -9999px
Text-indent offscreen	text-indent: -9999px
Color matching (white-on-white)	Foreground color vs known backgrounds
HTML comments	Comment node detection
Hidden class names	.hidden, .d-none, .sr-only, etc.
Meta tag injection	Long meta content (>50 chars)
Aria-label injection	Oversized aria-label (>50 chars)
ld+json scripts	Structured data injection

Prompt Injection (44 patterns, 17 categories)

All decoded payloads are automatically scanned for prompt injection: instruction overrides, jailbreaks, identity manipulation, privilege escalation, data exfiltration, code execution, destructive commands, delimiter injection, tool/function manipulation, social engineering, and more. Multi-vector attacks (3+ categories) trigger an aggregate CRITICAL finding.

Image (requires `stegoff[image]`)

LSB chi-square/RS/sample pairs, DCT/F5 coefficient anomaly, bit-plane correlation, PVD, PNG ancillary chunks, trailing data, EXIF payload detection, palette manipulation.

Audio (requires `stegoff[full]`)

WAV/PCM LSB statistical analysis, spectral anomaly detection, trailing data.

Binary / Documents

PDF (JavaScript, post-EOF, incremental updates, form fields, stream outliers), polyglot files, embedded file signatures, JSON structure scanning, code comment scanning, XML entity detection.

Architecture

stegoff/
  detectors/
    text.py                 14 Unicode steganography methods
    image.py                LSB, DCT, PVD, palette, EXIF, PNG
    binary.py               Polyglots, embedded files, steghide
    audio.py                WAV frequency analysis
    prompt_injection.py     44 regex patterns, 17 categories
    authority.py            Fabricated citation detection
    polarization.py         Bias and framing detection
    semantic_classifier.py  ML 5-class classifier
    llm.py                  Semantic steg via Anthropic API
  sanitizers/
    text.py                 Strip zero-width, homoglyphs, BIDI
    html.py                 Strip hidden elements, comments, meta
    image.py                Re-encode, strip metadata
    audio.py                Re-encode WAV
    structured.py           JSON/XML structure cleaning
    semantic.py             LLM-based semantic sanitization
  orchestrator.py           Routes scans, aggregates findings
  report.py                 Finding, ScanReport, Severity, StegMethod
  guard.py                  @steg_guard decorator
  cli.py                    CLI entry point
  server/                   FastAPI REST API + middleware

Tested Against

StegOFF is adversarially tested against st3gg (100+ encoding techniques) and the Agent Trap Lab (30 attack vectors across 5 categories). The test suite includes 500+ tests:

Text steganography: 33 detection tests, 8 sanitization tests
Prompt injection: 10 pattern tests + 38 red team payloads
Semantic manipulation: 54 detector tests + 30 red team evasion tests
HTML sanitizer: 30 red team tests (CSS evasion, DOM tricks, encoding)
ML classifier: 31 robustness tests (boundary, stability, generalization)
Pipeline integration: 63 end-to-end tests (scan_text, scan_file, report API)
False positives: 19 tests with legitimate content (academic text, code reviews, news)
Adversarial: 83 st3gg fixture tests (100% detection rate)
Full attack scenarios: product review pages, security advisories, documentation

Agent Trap Lab Results (v0.4.0)

Live testing against 29 trap pages using qwen3.5:9b via Ollama:

Attack Category	Traps	Baseline Compromised	StegOFF Defended	Defense Rate
Content Injection	8	6-8	6-8	100%
Behavioral Control	5	4-5	4-5	100%
Compositional	7	2	2	100%
Semantic Manipulation	5	1-2	0-1	50%
Cognitive State	3	2	0	0% (known gap)

Content injection, behavioral control, and compositional attacks are fully defended. Semantic attacks (authority priming, RAG poisoning) are detected by the ML classifier but the LLM may still repeat fabricated claims despite injected warnings. This is a fundamental limitation: the defense can flag the content, but the LLM's tendency to echo authoritative-sounding text is a model-level behavior, not a content-level one.

Development

git clone https://github.com/SamsonCyber/stegoff
cd stegoff
pip install -e ".[dev]"
pytest

Training the Semantic Classifier

from stegoff.detectors.semantic_classifier import train_and_save
metrics = train_and_save(n_per_class=1000, seed=42)
# Generates 5,000 synthetic examples, trains TF-IDF + LR
# Model saved to stegoff/detectors/models/semantic_clf.pkl

Research

The semantic defense layer is informed by:

BiasDef (Wu & Saxena, 2025, arXiv 2512.00804) for polarization scoring in RAG pipelines
Authority Bias in LLMs (Mammen et al., 2026, arXiv 2601.13433) for representation-level authority susceptibility
MiniCheck (Tang et al., 2024, arXiv 2404.10774) for NLI-based grounded fact-checking
SelfCheckGPT (Manakul et al., 2023, arXiv 2303.08896) for consistency-based hallucination detection
CaMeL (Debenedetti et al., 2025, arXiv 2503.18813) for dual-LLM architecture against indirect prompt injection
Franklin et al. (2026) agent vulnerability taxonomy

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
assets		assets
benchmarks		benchmarks
stegoff		stegoff
tests		tests
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
REDTEAM_LOG.md		REDTEAM_LOG.md
benchmark_full.py		benchmark_full.py
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

StegOFF

Install

Quick Start

CLI

Python API

Semantic Manipulation Detection

HTML Sanitization

Decorator

FastAPI Middleware

FastAPI Endpoints

Sanitization

What It Detects

Text Steganography (14 methods, zero dependencies)

Semantic Manipulation (3 detectors, v0.4.0)

HTML Content Injection (sanitizer + scanner, v0.4.0)

Prompt Injection (44 patterns, 17 categories)

Image (requires `stegoff[image]`)

Audio (requires `stegoff[full]`)

Binary / Documents

Architecture

Tested Against

Agent Trap Lab Results (v0.4.0)

Development

Training the Semantic Classifier

Research

License

About

Uh oh!

Releases 1

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

StegOFF

Install

Quick Start

CLI

Python API

Semantic Manipulation Detection

HTML Sanitization

Decorator

FastAPI Middleware

FastAPI Endpoints

Sanitization

What It Detects

Text Steganography (14 methods, zero dependencies)

Semantic Manipulation (3 detectors, v0.4.0)

HTML Content Injection (sanitizer + scanner, v0.4.0)

Prompt Injection (44 patterns, 17 categories)

Image (requires stegoff[image])

Audio (requires stegoff[full])

Binary / Documents

Architecture

Tested Against

Agent Trap Lab Results (v0.4.0)

Development

Training the Semantic Classifier

Research

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Image (requires `stegoff[image]`)

Audio (requires `stegoff[full]`)

Packages