🕵️ PDFOrensic CLI - Documentation

A forensic command-line tool for analyzing PDF files: metadata, hidden objects, OCR, image extraction, text anomalies, etc.

🧠 Features

Full support for file or folder input
Detects scanned vs. digitally created PDFs
Handles multiple tools:
- pdfinfo, exiftool
- pdftotext, strings, grep
- pdfimages, qpdf, mutool
- ocrmypdf / tesseract
- xxd, file, diff
Interactive menu if no arguments are given
Command-line flags for automation
Produces a ZIP archive, a CSV summary, and a structured results folder for each run
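
The README does not say how scanned PDFs are told apart from digitally created ones. One common heuristic, sketched here as an assumption rather than the tool's actual logic, is to compare the amount of extractable text against the page count: a scanned PDF usually yields almost no text from `pdftotext`. The function name `classify_pdf` and the threshold of 50 characters per page are hypothetical.

```python
def classify_pdf(extracted_text: str, page_count: int,
                 min_chars_per_page: int = 50) -> str:
    """Return 'scanned' or 'digital' based on how much text was extracted.

    extracted_text: output of a text extractor such as pdftotext (assumed).
    page_count:     number of pages, e.g. as reported by pdfinfo (assumed).
    """
    if page_count <= 0:
        raise ValueError("page_count must be positive")
    # Count only non-whitespace characters so the form feeds and blank
    # lines emitted between pages do not inflate the total.
    visible_chars = sum(1 for ch in extracted_text if not ch.isspace())
    if visible_chars / page_count < min_chars_per_page:
        return "scanned"
    return "digital"
```

A hybrid PDF (scanned pages with an OCR text layer) would be classified as digital by this check, so a real implementation would likely also look at the embedded images.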

🚀 Usage

python pdforensic_cli_ultimate.py [<file_or_folder>] [options]

🔤 Options

Flag	Analysis	Description
`-m`	Metadata	Extract metadata with `pdfinfo`, `exiftool`
`-t`	Text	Extract text with `pdftotext`
`-s`	Structure	Analyze PDF internal structure
`-h`	Hidden text	Find hidden layers and text
`-i`	Image extraction	Extract all embedded images
`-o`	OCR	Run OCR (Tesseract or OCRmyPDF)
`-d`	Decode streams	Run `qpdf --qdf` to unpack objects
`-a`	All	Run all supported analyses
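
The flags in the table could be wired up with `argparse` roughly as follows. This is a minimal sketch, not the script's actual parser: the destination names (`metadata`, `hidden`, and so on) and the helper `selected_analyses` are assumptions. Note that `-h` is normally reserved by `argparse` for help, so the sketch disables the built-in help flag and re-adds it as `--help` only.

```python
import argparse

# (short flag, destination name, help text), taken from the options table.
FLAGS = [
    ("-m", "metadata", "Extract metadata with pdfinfo, exiftool"),
    ("-t", "text", "Extract text with pdftotext"),
    ("-s", "structure", "Analyze PDF internal structure"),
    ("-h", "hidden", "Find hidden layers and text"),
    ("-i", "images", "Extract all embedded images"),
    ("-o", "ocr", "Run OCR (Tesseract or OCRmyPDF)"),
    ("-d", "decode", "QPDF --qdf to unpack objects"),
    ("-a", "all", "Run all supported analyses"),
]


def build_parser() -> argparse.ArgumentParser:
    # add_help=False frees -h for "hidden text"; --help is re-added below.
    parser = argparse.ArgumentParser(prog="pdforensic_cli_ultimate.py",
                                     add_help=False)
    parser.add_argument("--help", action="help",
                        help="Show this message and exit")
    # The path is optional so that a bare invocation can fall back to the menu.
    parser.add_argument("path", nargs="?", help="PDF file or folder")
    for flag, dest, text in FLAGS:
        parser.add_argument(flag, dest=dest, action="store_true", help=text)
    return parser


def selected_analyses(args: argparse.Namespace) -> list[str]:
    """Expand -a into every analysis, otherwise return the chosen ones."""
    names = [dest for _, dest, _ in FLAGS if dest != "all"]
    if args.all:
        return names
    return [name for name in names if getattr(args, name)]
```

For example, `build_parser().parse_args(["document.pdf", "-o"])` yields a namespace whose `selected_analyses` result is `["ocr"]`.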

If no arguments are passed, you'll be prompted:

For the file/folder path
For the actions to perform (A, M, T, etc.)
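
A reply to the second prompt, such as `M, T`, has to be turned into a list of analyses. The sketch below shows one way to do that; the letter-to-name mapping mirrors the flags in the options table, but the function name `parse_actions` and the accepted separators (commas or spaces) are assumptions.

```python
# Letters mirror the command-line flags; "A" means "all".
ACTIONS = {
    "M": "metadata",
    "T": "text",
    "S": "structure",
    "H": "hidden",
    "I": "images",
    "O": "ocr",
    "D": "decode",
}


def parse_actions(reply: str) -> list[str]:
    """Turn a reply like 'M, T' or 'a' into a list of analysis names."""
    letters = [tok.upper() for tok in reply.replace(",", " ").split()]
    unknown = [l for l in letters if l != "A" and l not in ACTIONS]
    if unknown:
        raise ValueError(f"unknown action(s): {', '.join(unknown)}")
    if "A" in letters:
        return list(ACTIONS.values())
    # Keep the table order and drop duplicates.
    return [name for letter, name in ACTIONS.items() if letter in letters]
```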

📁 Output Structure

Each run will generate:

Folder: forensic_results/<filename>_<timestamp>
Inside:
- pdf_info.txt, exif.json, text.txt, images/, ocr_output.pdf, ocr_output.txt, etc.
- Final: summary.csv, output.zip
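
The layout above can be sketched with the standard library. Several details here are assumptions, because the README only names the folder pattern and the final files: the timestamp format, the use of the file stem for `<filename>`, the `summary.csv` columns, and the helper names `make_run_dir` and `finalize_run`.

```python
import csv
import zipfile
from datetime import datetime
from pathlib import Path

GENERATED = {"summary.csv", "output.zip"}


def make_run_dir(pdf_path, base="forensic_results", now=None) -> Path:
    """Create forensic_results/<filename>_<timestamp>/ with an images/ folder."""
    now = now or datetime.now()
    # Assumed timestamp format; the README only says <timestamp>.
    run_dir = Path(base) / f"{Path(pdf_path).stem}_{now:%Y%m%d_%H%M%S}"
    (run_dir / "images").mkdir(parents=True, exist_ok=True)
    return run_dir


def finalize_run(run_dir: Path) -> Path:
    """Write summary.csv (assumed columns: file, bytes) and output.zip."""
    # Collect result files first so the two generated files are not listed
    # in the summary and are not added to the archive twice.
    files = sorted(p for p in run_dir.rglob("*")
                   if p.is_file() and p.name not in GENERATED)
    summary = run_dir / "summary.csv"
    with open(summary, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["file", "bytes"])
        for p in files:
            writer.writerow([p.relative_to(run_dir).as_posix(),
                             p.stat().st_size])
    zip_path = run_dir / "output.zip"
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for p in files + [summary]:
            zf.write(p, p.relative_to(run_dir).as_posix())
    return zip_path
```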

⚙️ Config File (JSON)

External tool configuration is planned. Each entry maps a flag to a description and a list of command templates:

{
  "metadata": {
    "flag": "m",
    "description": "Extract metadata",
    "commands": [
      "pdfinfo {input} > {output}/pdfinfo.txt",
      "exiftool -j {input} > {output}/exif.json"
    ]
  },
  "ocr": {
    "flag": "o",
    "description": "OCR scanned PDF",
    "commands": [
      "tesseract {image} {output}/ocr --dpi 300 -l heb+eng --psm 3 --oem 1"
    ]
  }
}
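
A config in this shape could be consumed by filling the `{input}`, `{output}`, and `{image}` placeholders before running each command. The following is a minimal sketch under that assumption; `expand_commands` is a hypothetical helper, not part of the tool, and the sample config is a trimmed copy of the one above.

```python
import json
import shlex

SAMPLE_CONFIG = json.loads("""
{
  "metadata": {
    "flag": "m",
    "description": "Extract metadata",
    "commands": [
      "pdfinfo {input} > {output}/pdfinfo.txt",
      "exiftool -j {input} > {output}/exif.json"
    ]
  }
}
""")


def expand_commands(config: dict, flag: str, **paths: str) -> list[str]:
    """Return the shell commands for the tool registered under `flag`."""
    # Quote every path so spaces or shell metacharacters in file names
    # cannot change the meaning of the command.
    quoted = {key: shlex.quote(value) for key, value in paths.items()}
    for tool in config.values():
        if tool["flag"] == flag:
            # A placeholder with no matching keyword raises KeyError here.
            return [tpl.format(**quoted) for tpl in tool["commands"]]
    raise KeyError(f"no tool registered for flag {flag!r}")


commands = expand_commands(SAMPLE_CONFIG, "m", input="a.pdf", output="out")
```

Because the templates use `>` redirection, the resulting strings would need to be run through a shell (for example `subprocess.run(cmd, shell=True, check=True)`), which is why the paths are quoted first.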

🧪 Examples

# Full folder run with all tools
python pdforensic_cli_ultimate.py ./src -a

# Only OCR
python pdforensic_cli_ultimate.py document.pdf -o

# Interactive
python pdforensic_cli_ultimate.py

✅ TODO

OCR integration (force + sidecar)
Menu UI with ASCII banner
Support folders and loop through files
Image pre-processing for OCR (grayscale, binarize)
Generate ZIP + CSV summary for each run
Add JSON config support
Plugin support in future (pluggable tools)
Add PDF comparison (diff)

💡 Future Ideas

Markdown export with full breakdown
Raycast / Droplet mini app
Detect anomalies (OCR vs text mismatches)
Deepfake detection via ELA (error level analysis) + ML
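
As a rough illustration of the OCR vs. text mismatch idea, the embedded text layer and the OCR output can be compared with `difflib` from the standard library. This is only a sketch of one possible approach: the function names and the 0.8 similarity threshold are assumptions, and real OCR noise would probably call for a more tolerant comparison.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so layout differences are ignored."""
    return " ".join(text.lower().split())


def text_mismatch(embedded_text: str, ocr_text: str,
                  threshold: float = 0.8) -> tuple[float, bool]:
    """Return (similarity ratio, flagged) for a page's two text versions.

    A low ratio means the text layer says something different from what
    is visibly rendered, which may indicate hidden or altered text.
    """
    matcher = SequenceMatcher(None, normalize(embedded_text),
                              normalize(ocr_text), autojunk=False)
    ratio = matcher.ratio()
    return ratio, ratio < threshold
```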

👥 Contributing

Want to add a forensic trick? Add a new tool in tools_config.json and submit a PR 💥

Repository: Jakobish/pdforensic_toolkit
