OCR in reverse. Make your documents worse.
A Python toolkit that generates photorealistic smartphone photos of logistics documents
with coffee stains, folds, blur, skew — and verified ground truth for every field.
penquify.com · Docs · GitHub
From Chilean slang "penca" (lousy, worse) — because your document photos should look realistically bad, not studio-perfect.
ERP purchase order        penquify generates         penquify generates
(or any JSON/PDF)   ──►   dispatch guide PDF   ──►   realistic photos
                          with supplier jargon,      with verified
                          unit mismatches,           ground truth +
                          realistic discrepancies    occlusion manifest
You don't build the PDF. You don't design the document. You give penquify an OC number, a JSON payload, or upload an existing PDF — and it:
- Generates a realistic document with supplier-style names (not your ERP master data names), realistic unit mismatches (CJ vs KG, UN vs L), and configurable quantity discrepancies
- Renders a clean PDF from Jinja2 templates (dispatch guides, invoices, POs, BOLs)
- Produces N photorealistic photos — each a different failure mode (blur, fold, stain, crop, angle)
- Verifies every field by blind-extracting from the photo and comparing programmatically against source data
- Generates an occlusion manifest explaining which fields are hidden in each variation and why
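
The verification and occlusion steps in the list above can be pictured as a field-by-field comparison. The following is only a minimal sketch of that idea, not penquify's actual implementation: `normalize`, `FieldResult`, `verify_fields`, and the status labels are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


def normalize(value: object) -> str:
    """Collapse case and whitespace so cosmetic differences do not count as errors."""
    return " ".join(str(value).split()).upper()


@dataclass
class FieldResult:
    field: str
    status: str  # "match", "mismatch", "occluded" or "missing"
    expected: str
    extracted: Optional[str]


def verify_fields(source: dict, extracted: dict, occluded: set) -> list:
    """Compare blind-extracted values against source data, field by field."""
    results = []
    for name, expected in source.items():
        got = extracted.get(name)
        if got is None:
            # A field the variation hides on purpose is expected to be absent.
            status = "occluded" if name in occluded else "missing"
        elif normalize(got) == normalize(expected):
            status = "match"
        else:
            status = "mismatch"
        results.append(
            FieldResult(name, status, str(expected), None if got is None else str(got))
        )
    return results


# Hypothetical data: source values reuse the document example in this README.
source = {"doc_number": "00847291", "emitter_name": "ACME FOODS LTDA.", "qty": 12}
extracted = {"emitter_name": "Acme  Foods Ltda.", "qty": "13"}
results = verify_fields(source, extracted, occluded={"doc_number"})
statuses = {r.field: r.status for r in results}
```

Separating "occluded" from "missing" is the point of the manifest: a field hidden by a stain or crop is an expected gap, while a field that should be legible but was not extracted is a real failure.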
8 built-in presets, plus unlimited custom variations via JSON or natural language.
# From scratch — penquify generates the document AND the photos
penquify demo
# From an existing PDF — penquify detects the schema and generates variations
penquify upload --image existing_invoice.pdf
# From a description — no JSON needed
penquify config --text "folded paper with grease, shot on old Motorola"

pip install penquify
# or from source
git clone https://github.com/MAXMARDONES/penquify.git
cd penquify && pip install -e ".[all]"
# browser engine for HTML → PDF rendering
playwright install chromium

export GEMINI_API_KEY="your-key" # required for photo generation
export PENQUIFY_OUTPUT="./output" # where files go (default: ~/penquify-output)

# Full demo: PDF + 8 photo variations
penquify demo
# PDF only from JSON
penquify pdf --doc-json invoice.json
# Photos from any document image
penquify photos --image scan.png --presets full_picture blurry coffee_stain
# Full dataset: 10 documents x 3 variations each
penquify dataset --doc-json docs.json --presets full_picture folded_skewed blurry

import asyncio

from penquify.models import Document, DocHeader, DocItem, PhotoVariation, Stain
from penquify.generators.pdf import generate_document_files
from penquify.generators.photo import generate_dataset

doc = Document(
    header=DocHeader(doc_type="guia_despacho", doc_number="00847291", date="16/04/2026",
                     emitter_name="ACME FOODS LTDA.", oc_number="4500000316"),
    items=[
        DocItem(pos=1, code="AF-001", description="FROZEN POTATO WEDGES",
                qty=12, unit="CJ", unit_price=15000, total=180000),
    ],
)

async def main():
    # Both calls are awaited, so they have to run inside an event loop.
    files = await generate_document_files(doc, "output/")
    photos = await generate_dataset(files["png"], preset_names=["full_picture", "blurry"])
    return files, photos

files, photos = asyncio.run(main())

| Template | Description | Status |
|---|---|---|
| guia_despacho | Chilean dispatch guide (guia de despacho electronica) | Done |
| factura_sii | Chilean tax invoice (DTE tipo 33, SII XML) | Planned |
| purchase_order | Standard purchase order | Planned |
| bill_of_lading | Transport bill of lading (BOL) | Planned |
| nota_credito | Credit note (DTE tipo 61) | Planned |
| remito | Argentine dispatch note | Planned |
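
The `--doc-json` commands take a JSON file, but its schema is not spelled out here. As a rough sketch, assuming the file mirrors the `DocHeader` and `DocItem` arguments from the Python example (an assumption, not a confirmed schema), a payload could be built and sanity-checked like this. `inconsistent_lines` is a hypothetical helper.

```python
import json

# Hypothetical payload shape: keys copied from the DocHeader / DocItem
# arguments in the Python example; the real --doc-json schema may differ.
payload = {
    "header": {
        "doc_type": "guia_despacho",
        "doc_number": "00847291",
        "date": "16/04/2026",
        "emitter_name": "ACME FOODS LTDA.",
        "oc_number": "4500000316",
    },
    "items": [
        {"pos": 1, "code": "AF-001", "description": "FROZEN POTATO WEDGES",
         "qty": 12, "unit": "CJ", "unit_price": 15000, "total": 180000},
    ],
}


def inconsistent_lines(doc: dict) -> list:
    """Return positions of items whose total is not qty * unit_price."""
    return [item["pos"] for item in doc["items"]
            if item["qty"] * item["unit_price"] != item["total"]]


bad = inconsistent_lines(payload)
text = json.dumps(payload, indent=2, ensure_ascii=False)
```

Writing `text` to a file such as `invoice.json` would then give something to pass to `penquify pdf --doc-json`. Checking totals before generation keeps accidental arithmetic errors apart from the discrepancies you inject on purpose.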
Templates are Jinja2 HTML, so you can add your own:

penquify pdf --template my_template.html --doc-json data.json

A fixed system instruction handles base realism (paper physics, camera behavior, operational context). The variation config controls specifics. Every field is optional; override only what you need.
| Preset | What it tests |
|---|---|
| full_picture | Baseline: clean handheld shot, 90% frame coverage |
| folded_skewed | Geometric distortion: dog-ear, crease, 6deg tilt |
| zoomed_detail | Close-up OCR: tight crop, oblique 25-30deg |
| blurry | Motion blur: rushed capture, partial legibility |
| cropped_header | Missing data: top 10-15% cut off |
| strong_oblique | Extreme angle: 45deg, strong curvature |
| coffee_stain | Contamination: stain over text |
| stapled_stack | Multi-page: stapled with sheets behind |
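
When scripting dataset runs, it can help to validate preset names before invoking the CLI. The sketch below only assembles the `penquify dataset` command line using flags that appear elsewhere in this README. `dataset_command` and the `PRESETS` mapping are illustrative helpers, not part of penquify.

```python
import shlex

# Preset names and the failure category each one tests, copied from the table.
PRESETS = {
    "full_picture": "baseline",
    "folded_skewed": "geometric distortion",
    "zoomed_detail": "close-up OCR",
    "blurry": "motion blur",
    "cropped_header": "missing data",
    "strong_oblique": "extreme angle",
    "coffee_stain": "contamination",
    "stapled_stack": "multi-page",
}


def dataset_command(doc_json: str, presets: list) -> list:
    """Build argv for `penquify dataset`, rejecting unknown preset names early."""
    unknown = [p for p in presets if p not in PRESETS]
    if unknown:
        raise ValueError(f"unknown presets: {unknown}")
    return ["penquify", "dataset", "--doc-json", doc_json, "--presets", *presets]


argv = dataset_command("docs.json", ["full_picture", "cropped_header", "coffee_stain"])
command_line = shlex.join(argv)
```

The resulting `argv` list can be handed to `subprocess.run`, which avoids shell quoting problems when file names contain spaces.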
{
"name": "my_variation",
"camera": "Samsung Galaxy S8",
"year_device_style": "2017 Android",
"aspect_ratio": "4:3",
"document_coverage": "90% of frame",
"background": "blurred warehouse at edges",
"curvature": "slight",
"folds": "dog_ear",
"wrinkles": "medium",
"angle": "45 degree oblique",
"skew": "strong",
"rotation_degrees": 8,
"motion_blur": true,
"glare": "strong",
"shadow_from_hand": true,
"jpeg_compression": "heavy",
"hand_visible": true,
"grip_type": "both hands",
"glove": "warehouse glove",
"stain": {"type": "coffee", "location": "upper_right", "opacity": "heavy", "text_obstruction": "partial"},
"cropped_header": true,
"stapled": true,
"stacked_sheets_behind": 2
}

Every string field is free text: cameras, angles, backgrounds, grip types. Use presets or write whatever describes your scenario.
galaxy_s7 galaxy_s8 galaxy_a5_2017 moto_g5 iphone_7 iphone_8 pixel_2 huawei_p10 xiaomi_note4 galaxy_s9 iphone_xr galaxy_a10 galaxy_a50 iphone_11 galaxy_a21s iphone_12 pixel_4a galaxy_a13 iphone_14 pixel_7 warehouse_generic field_worker
Or any free text: PhotoVariation(camera="Nokia 3310 with cracked screen")
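
Because every field is optional, a custom variation only needs the keys you want to override. The following sketch builds such a sparse config as a plain dict and catches misspelled keys. `make_variation` and `KNOWN_FIELDS` are hypothetical helpers, the field list is taken from the example JSON above, and whether penquify itself rejects unknown keys is not stated here.

```python
import json

# Field names copied from the example variation JSON in this README.
KNOWN_FIELDS = {
    "name", "camera", "year_device_style", "aspect_ratio", "document_coverage",
    "background", "curvature", "folds", "wrinkles", "angle", "skew",
    "rotation_degrees", "motion_blur", "glare", "shadow_from_hand",
    "jpeg_compression", "hand_visible", "grip_type", "glove", "stain",
    "cropped_header", "stapled", "stacked_sheets_behind",
}


def make_variation(name: str, **overrides) -> dict:
    """Return a variation dict holding only the fields explicitly overridden."""
    typos = sorted(set(overrides) - KNOWN_FIELDS)
    if typos:
        raise KeyError(f"not in the example schema: {typos}")
    return {"name": name, **overrides}


variation = make_variation(
    "night_shift_dock",
    camera="Nokia 3310 with cracked screen",
    motion_blur=True,
    stain={"type": "coffee", "location": "upper_right",
           "opacity": "heavy", "text_obstruction": "partial"},
)
variation_json = json.dumps(variation, indent=2)
```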
Don't know the schema? Just describe it:
import asyncio

from penquify.generators.config import text_to_variation

async def main():
    config = await text_to_variation(
        "blurry photo with coffee stain, strong angle, old Samsung, paper folded in half"
    )
    # → returns valid PhotoVariation JSON
    return config

config = asyncio.run(main())

uvicorn penquify.api.server:app --port 8080

| Method | Path | Description |
|---|---|---|
| POST | /generate/document | Document JSON → PDF + PNG |
| POST | /generate/photos | Image → realistic photos |
| POST | /generate/dataset | Document → PDF → photos (full pipeline) |
| POST | /generate/config | Natural language → variation JSON |
| GET | /documents | List generated runs |
| GET | /documents/{id}/{file} | Download file |
| GET | /presets | Photo presets |
| GET | /templates | Document templates |
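
To call these endpoints from Python without extra dependencies, the standard library is enough. The sketch below builds (but does not send) a request to `/generate/config`. The request body shape, a JSON object with a `text` key, is an assumption since the payload format is not documented here, and `config_request` is a hypothetical helper.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # matches the uvicorn --port 8080 command


def config_request(description: str) -> urllib.request.Request:
    """Build (but do not send) a POST to /generate/config."""
    # Assumption: the endpoint accepts a JSON object with a "text" key.
    body = json.dumps({"text": description}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/generate/config",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = config_request("folded paper with grease, shot on old Motorola")
# With the server running, send it with:
#   with urllib.request.urlopen(req) as resp:
#       variation = json.load(resp)
```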
5 tools for Claude Desktop, Cursor, Windsurf, or any MCP client:
{
"mcpServers": {
"penquify": {
"command": "python3",
"args": ["-m", "penquify.mcp"],
"env": {"GEMINI_API_KEY": "your-key"}
}
}
}

Tools: penquify_generate_document, penquify_generate_photos, penquify_generate_dataset, penquify_text_to_config, penquify_list_presets
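
If your MCP client already has a config file with other servers, pasting the snippet over it would drop them. A small sketch of merging instead, assuming the client stores servers under a top-level `mcpServers` key as in the JSON above. `add_penquify` is a hypothetical helper, and the location of the config file depends on the client.

```python
import json

# Server entry copied from the MCP config shown in this README.
PENQUIFY_SERVER = {
    "command": "python3",
    "args": ["-m", "penquify.mcp"],
    "env": {"GEMINI_API_KEY": "your-key"},
}


def add_penquify(config_text: str) -> str:
    """Merge the penquify entry into an MCP client config, keeping other servers."""
    config = json.loads(config_text) if config_text.strip() else {}
    config.setdefault("mcpServers", {})["penquify"] = PENQUIFY_SERVER
    return json.dumps(config, indent=2)


# Hypothetical existing config with one unrelated server.
existing = '{"mcpServers": {"other": {"command": "node", "args": ["server.js"]}}}'
merged = json.loads(add_penquify(existing))
```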
/penquify # Full reference: presets, cameras, variation schema
/generate # Generate a document from description or JSON
/dataset # Generate large synthetic datasets
/add-template # Add a new document template

from penquify.agent_plugin import penquify_tools
agent = Agent(model="claude-sonnet-4-6", tools=penquify_tools)

docker build -t penquify .
docker run -p 8080:8080 -e GEMINI_API_KEY=xxx penquify

GEMINI_API_KEY=xxx docker-compose up

kubectl apply -f k8s/secret.yaml # set GEMINI_API_KEY first
kubectl apply -f k8s/deployment.yaml

penquify/
  templates/           Jinja2 HTML per doc type
  generators/
    pdf.py             HTML → PDF/PNG (Playwright)
    photo.py           PNG → realistic photo (Gemini image gen)
    config.py          text → variation JSON (Gemini text)
  models/
    document.py        DocHeader + DocItem + Document
    variation.py       PhotoVariation + Stain + 8 presets
    cameras.py         22 camera presets + free text
  api/server.py        FastAPI REST
  mcp.py               MCP server (5 tools)
  agent_plugin.py      Agent SDK plugin
  storage/s3.py        AWS S3 upload
  cli.py               CLI entry point
- Jinja2 templates + Playwright PDF/PNG
- Gemini photo gen with system instruction + variation config
- 8 photo presets + 22 camera presets
- CLI (penquify demo/pdf/photos/dataset)
- FastAPI REST server (8 endpoints)
- MCP server (5 tools)
- Agent SDK plugin
- Claude Code skills (4 commands)
- Natural language → variation JSON (Gemini)
- S3 upload support
- Dockerfile + docker-compose + K8s manifests
- GitHub Actions CI
- CODE_OF_CONDUCT + CONTRIBUTING + LICENSE
- PostgreSQL persistent storage
- PostgREST auto-API
- More templates: factura SII, PO, BOL
- SII DTE XML generation
- Batch dataset generation with progress bar
- PyPI publish
- Demo images in README
MIT
Built by Max Mardones






