PDF Processor

A comprehensive PDF processing application that combines OCR (Optical Character Recognition) and PDF unlocking capabilities. Available as both a GUI application and a Discord bot!

Features

OCR Processing: Extract text from PDF files and save it to a text file
PDF Unlocking: Remove password protection from PDF files
- Manual password entry
- Brute force password cracking (for simple passwords)
Discord Bot Integration: Process PDFs directly through Discord commands

Prerequisites

Python 3.7 or higher
Tesseract OCR engine
Poppler (for PDF to image conversion)
Required Python packages:
- pikepdf
- pdf2image
- pytesseract
- discord.py
- python-dotenv
- customtkinter

Installing Prerequisites

Windows:

Install Tesseract OCR:
- Download the Windows installer from the Tesseract GitHub page
- Add the Tesseract installation folder to your system PATH
Install Poppler:
- Download a Windows build from the Poppler releases page
- Extract it to a folder (e.g., C:\poppler-xx.xx.x)
- Add its bin folder to your system PATH

Linux:

sudo apt-get update
sudo apt-get install tesseract-ocr
sudo apt-get install poppler-utils

macOS:

brew install tesseract
brew install poppler
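Once the tools are installed, a quick check like the one below can confirm they are reachable on your PATH. This is a minimal sketch, not part of the project: the helper name `check_external_tools` is hypothetical, and it assumes Tesseract is invoked as `tesseract` and that pdf2image relies on Poppler's `pdftoppm` executable.

```python
import shutil

# Executables the OCR pipeline depends on (assumed names):
# "tesseract" for Tesseract OCR, "pdftoppm" from Poppler (used by pdf2image).
REQUIRED_TOOLS = ("tesseract", "pdftoppm")


def check_external_tools(tools=REQUIRED_TOOLS):
    """Hypothetical helper: map each tool name to True if it is found on PATH."""
    return {tool: shutil.which(tool) is not None for tool in tools}


if __name__ == "__main__":
    for tool, found in check_external_tools().items():
        print(f"{tool}: {'found' if found else 'NOT found on PATH'}")
```

If a tool is reported as missing on Windows, re-check that its folder was added to PATH and open a new terminal so the change takes effect.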

Installation

Clone this repository:

git clone https://github.com/CsPS0/PDF-Processor.git
cd PDF-Processor

Install Python dependencies:

pip install -r requirements.txt
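To confirm the environment matches the Prerequisites (Python 3.7 or higher plus the listed packages), a check like the following can help. It is a hypothetical helper, not shipped with the project. Note that some packages are imported under a different name than the one you install: `discord.py` is imported as `discord` and `python-dotenv` as `dotenv`.

```python
import importlib.util
import sys

# Install name -> import name for the packages listed under Prerequisites.
REQUIRED_MODULES = {
    "pikepdf": "pikepdf",
    "pdf2image": "pdf2image",
    "pytesseract": "pytesseract",
    "discord.py": "discord",
    "python-dotenv": "dotenv",
    "customtkinter": "customtkinter",
}


def check_python_environment(min_version=(3, 7)):
    """Hypothetical helper: return (version_ok, list of missing install names)."""
    version_ok = sys.version_info >= min_version
    # find_spec returns None for a top-level module that cannot be found.
    missing = [
        package
        for package, module in REQUIRED_MODULES.items()
        if importlib.util.find_spec(module) is None
    ]
    return version_ok, missing


if __name__ == "__main__":
    version_ok, missing = check_python_environment()
    print("Python version OK" if version_ok else "Python 3.7+ required")
    print("Missing packages:", ", ".join(missing) if missing else "none")
```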

Usage

Website Documentation

The project also includes a simple website in the docs/ directory, which serves as documentation and provides direct links:

Download App: Links to the latest executable (.exe) file available on GitHub Releases.
GitHub Page: Links to the main GitHub repository for the project.
Invite Bot: Links to invite the Discord bot to your server (requires replacing YOUR_CLIENT_ID in the URL with your bot's actual client ID).
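The invite link uses Discord's standard OAuth2 authorize URL. A minimal sketch for generating it is below. The function name is hypothetical, and the permission set (View Channel, Send Messages, Attach Files, Read Message History) is an assumption about what a bot that reads PDF attachments and replies with files needs; it is not taken from this project.

```python
from urllib.parse import urlencode

# Discord permission flag bits.
VIEW_CHANNEL = 1 << 10
SEND_MESSAGES = 1 << 11
ATTACH_FILES = 1 << 15
READ_MESSAGE_HISTORY = 1 << 16

# Assumed permission set for a bot that reads attachments and replies with files.
DEFAULT_PERMISSIONS = VIEW_CHANNEL | SEND_MESSAGES | ATTACH_FILES | READ_MESSAGE_HISTORY


def build_invite_url(client_id, permissions=DEFAULT_PERMISSIONS):
    """Hypothetical helper: build an OAuth2 URL that adds the bot to a server."""
    query = urlencode({"client_id": client_id, "permissions": permissions, "scope": "bot"})
    return f"https://discord.com/oauth2/authorize?{query}"


if __name__ == "__main__":
    # Replace with the client ID shown for your application in the Developer Portal.
    print(build_invite_url("YOUR_CLIENT_ID"))
```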

GUI Application

Run the application:

python app.py

The application has two main tabs:

OCR Tab

Click "Browse" to select a PDF file
Choose an export directory
Click "Start OCR" to begin text extraction
The extracted text will be saved to a text file in the chosen directory
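Conceptually, OCR on a PDF means rendering each page to an image and running Tesseract on it. The sketch below shows that pipeline with pdf2image and pytesseract, both listed in Prerequisites; it illustrates the idea under assumptions and is not the code in app.py. The function names, the `<name>.txt` naming scheme, and the 300 DPI default are hypothetical. The optional `poppler_path` and `tesseract_cmd` arguments cover the Windows case where the tools are installed but not on PATH.

```python
from pathlib import Path


def output_path_for(pdf_path, export_dir):
    """Hypothetical naming scheme: report.pdf -> <export_dir>/report.txt."""
    return Path(export_dir) / (Path(pdf_path).stem + ".txt")


def ocr_pdf_to_text(pdf_path, export_dir, dpi=300, poppler_path=None, tesseract_cmd=None):
    """Render each page with pdf2image, OCR it with pytesseract,
    and write the text of all pages to a single file. Returns the output path."""
    # Imported lazily so output_path_for works without the OCR stack installed.
    import pytesseract
    from pdf2image import convert_from_path

    if tesseract_cmd:
        # Point pytesseract at the Tesseract executable when it is not on PATH.
        pytesseract.pytesseract.tesseract_cmd = tesseract_cmd

    # poppler_path is only needed when Poppler's bin folder is not on PATH.
    pages = convert_from_path(str(pdf_path), dpi=dpi, poppler_path=poppler_path)
    texts = [pytesseract.image_to_string(page) for page in pages]

    out_path = output_path_for(pdf_path, export_dir)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    # Concatenate the page texts in page order.
    out_path.write_text("\n".join(texts), encoding="utf-8")
    return out_path
```

Higher DPI usually improves recognition of small text at the cost of memory and speed, which ties in with the note below that OCR accuracy depends on the quality of the PDF.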

Unlock PDF Tab

Click "Browse" to select a password-protected PDF
Choose an export directory
Either:
- Enter the known password and click "Unlock with Password"
- Click "Brute Force Unlock" to attempt to crack the password
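Both unlock paths can be built on pikepdf, which opens an encrypted PDF when given the right password and can save an unencrypted copy. The sketch below shows this plus a simple brute-force loop. All function names are hypothetical, and the default character set (ASCII letters and digits) is an assumption; the 1-4 character limit mirrors the default mentioned under Notes. With 62 characters, lengths 1-4 already give 15,018,570 candidates, which is why brute force is only practical for very short passwords.

```python
import itertools
import string

# Assumed default alphabet: 26 lowercase + 26 uppercase letters + 10 digits.
DEFAULT_CHARSET = string.ascii_letters + string.digits


def candidate_passwords(charset=DEFAULT_CHARSET, max_len=4):
    """Yield every string over `charset` of length 1..max_len, shortest first."""
    for length in range(1, max_len + 1):
        for chars in itertools.product(charset, repeat=length):
            yield "".join(chars)


def keyspace_size(charset_len, max_len=4):
    """Number of candidates candidate_passwords() yields for this alphabet size."""
    return sum(charset_len ** n for n in range(1, max_len + 1))


def unlock_pdf(pdf_path, out_path, password):
    """Open an encrypted PDF with pikepdf and save an unencrypted copy."""
    import pikepdf  # lazy import: only needed when actually unlocking

    with pikepdf.open(pdf_path, password=password) as pdf:
        pdf.save(out_path)  # no encryption argument, so the copy is unencrypted


def brute_force_unlock(pdf_path, out_path, candidates=None):
    """Try candidates until one opens the PDF; return it, or None if all fail."""
    import pikepdf

    if candidates is None:
        candidates = candidate_passwords()
    for password in candidates:
        try:
            unlock_pdf(pdf_path, out_path, password)
            return password
        except pikepdf.PasswordError:
            continue  # wrong password, try the next one
    return None
```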

Discord Bot

Create a new Discord application and bot at the Discord Developer Portal
Copy your bot's token
Create a .env file in the project root and add your token:

DISCORD_TOKEN=<your-bot-token-here>

Run the bot:

python bot.py
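The bot reads its token from the .env file. A minimal sketch of how that typically works with python-dotenv is below; `load_discord_token` is a hypothetical name and the actual bot.py may do this differently. If python-dotenv is not installed, the sketch falls back to the regular environment.

```python
import os


def load_discord_token(env_file=".env"):
    """Hypothetical helper: return DISCORD_TOKEN, loading .env first if possible."""
    try:
        from dotenv import load_dotenv
    except ImportError:
        load_dotenv = None

    if load_dotenv is not None:
        # By default this does not override variables already set in the environment.
        load_dotenv(env_file)

    token = os.getenv("DISCORD_TOKEN")
    if not token:
        raise RuntimeError("DISCORD_TOKEN is not set; add it to your .env file")
    return token
```

The returned token is what you would pass to discord.py's `run()` method on the bot or client object. Keep .env out of version control so the token is never committed.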

Bot Commands

pdf help - Show help message
pdf ocr - Extract text from a PDF file (attach the PDF to your message)
pdf unlock <password> - Unlock a PDF with a password (attach the PDF)
pdf bruteforce - Attempt to crack the PDF password (attach the PDF)
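Every command starts with the `pdf` prefix followed by a subcommand. The bot most likely relies on discord.py's own command handling. As a dependency-free illustration of the same structure, here is a hypothetical parser. Treating the prefix and subcommand case-insensitively and keeping the rest of the message as a single password argument are assumptions.

```python
PREFIX = "pdf"
COMMANDS = {"help", "ocr", "unlock", "bruteforce"}


def parse_command(message):
    """Hypothetical parser: 'pdf <command> [argument]' -> (command, argument or None).
    Returns None if the message is not a valid bot command."""
    # maxsplit=2 keeps everything after the subcommand as one argument,
    # so a password containing spaces stays intact.
    parts = message.split(maxsplit=2)
    if len(parts) < 2 or parts[0].lower() != PREFIX:
        return None
    command = parts[1].lower()
    argument = parts[2] if len(parts) == 3 else None
    if command not in COMMANDS:
        return None
    if command == "unlock" and argument is None:
        return None  # unlock requires a password
    return command, argument
```

For example, `parse_command("pdf unlock hunter2")` gives `("unlock", "hunter2")`, while an ordinary chat message gives `None` and is ignored.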

Notes

The brute force feature is limited to simple passwords (length 1-4 characters) by default
OCR accuracy depends on the quality of the PDF and the Tesseract installation
The Discord bot processes files in a temporary directory and cleans them up automatically
Custom Theme: The GUI application uses a custom theme defined in the themes/ directory. You can modify themes/website_theme.json to customize the application's appearance.
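The temporary-directory cleanup described in the notes can be done with Python's `tempfile.TemporaryDirectory`, which deletes the directory and everything in it when the `with` block exits. The helper below is a hypothetical sketch; in a discord.py bot the `data` bytes would typically come from `await attachment.read()`. Any result file has to be sent back to Discord before the block exits.

```python
import tempfile
from pathlib import Path


def process_in_temp_dir(filename, data, handler):
    """Hypothetical helper: save `data` into a fresh temporary directory,
    return handler(path), and delete the directory when done."""
    with tempfile.TemporaryDirectory(prefix="pdfbot-") as tmp:
        # Keep only the base name so a crafted filename cannot escape the directory.
        safe_name = Path(filename).name or "upload.pdf"
        path = Path(tmp) / safe_name
        path.write_bytes(data)
        # The handler must finish using any files in tmp before returning,
        # because the directory is removed as soon as this block exits.
        return handler(path)
```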

License

This project is licensed under the MIT License - see the LICENSE file for details.

Repository: CsPS0/PDF-Processor on GitHub
