📄 Doc Split - PDF Document Splitter

💝 Support the Project

If Doc Split saves you time and makes your work easier, consider supporting its development!

Your support helps me add new features, fix bugs, and keep the project alive! 🙏

📄 Doc Split - PDF Document Splitter

A powerful desktop application for splitting PDF documents and extracting data with customizable criteria. Perfect for processing student records, certificates, invoices, or any multi-page PDF documents.

✨ Features

Tab 1: Split by ID / Multi-Criteria

Smart Grouping - Groups pages with same ID even if they're scattered throughout the document
Multi-Criteria Grouping - Group by any extracted field (not just ID)
CSV Export - Extract data to CSV with page information
CSV-Only Mode - Extract data without generating PDF files
No Grouping Mode - Export each page as a separate CSV row
Custom Extraction - Configure unlimited extraction criteria with regex patterns
Stop Text - Stop reading at specific phrases for multi-line values
Flexible Naming - Use up to 4 extracted fields for output filenames

Tab 2: Split by Name

CSV Mapping - Match names from PDF to IDs from a CSV file
Smart Matching - Direct, case-insensitive, and partial name matching
Batch Processing - Process entire documents page by page

Tab 3: Split by Page Range

Multiple Ranges - Extract multiple page ranges at once (e.g., 1-5,10-15,20)
Flexible Syntax - Supports single pages, ranges, or mixed (1,3-5,10)
CSV Logging - Track which pages were extracted from which ranges

Tab 4: PDF Merger

Merge Multiple PDFs - Combine several PDF files into one
Page Range Selection - Include only specific pages from each file
Reorder Files - Control merge order with up/down buttons
Table of Contents - Add an automatic TOC page (requires reportlab)
Bookmarks - Create outline bookmarks for each merged file

General Features

🎨 Modern Flat UI - Clean Windows 10-style interface
💾 Auto-Save Settings - All preferences persist between sessions
📊 CSV Output - Extracted data saved with full metadata
🔄 Multi-threaded - Responsive UI during processing
🌙 Light Theme Only - Clean, professional appearance
📝 Detailed Logging - See exactly what's happening
🚀 100% Offline - No data leaves your computer

🚀 Quick Start Example 1: Extract data from invoices Open Tab 1

Click "Browse" to select your PDF

Configure criteria:

Add "Invoice Number" with prefix "Invoice #:"

Add "Customer Name" with prefix "Customer:"

Add "Amount" with prefix "Total:"

Select which fields to include in filenames

Click "SPLIT BY ID"

Example 2: Merge multiple reports Go to Tab 4 (PDF Merger)

Click "Add PDF(s)" and select your files

Double-click any file to set page ranges

Use Move Up/Down to arrange order

Choose output folder and filename

Click "MERGE PDFS"

📖 Detailed Usage Tab 1: Split by ID / Multi-Criteria Setting Up Extraction Criteria Add Criterion: Go to Criteria → Tab 1: Add Criterion

Configure:

Display Name: What to call this field (e.g., "Document ID")

Text Prefix: The exact text before your value (e.g., "ID Number:")

Stop Text: (Optional) Stop reading when this text appears

Data Type: "ID" for grouping, "Text" for regular fields

Grouping Options Single Criterion (ID) : Groups pages by the ID field

Multiple Criteria (Custom) : Groups pages by any field you select

Export Modes Grouped by Key : Pages with same key become one PDF/CSV row

Each Page Separately : Every page becomes its own CSV row (no PDFs)

CSV-Only Mode : Extract data without creating PDF files

Tab 2: Split by Name Workflow First process documents in Tab 1 to create a CSV file

Switch to Tab 2 and select your PDF

Load the CSV mapping file (auto-detected from Tab 1)

Configure extraction criteria (matches the PDF structure)

Click split - each page becomes a PDF named after matched IDs

Matching Logic Direct match: "John Smith" → "John Smith"

Case-insensitive: "JOHN SMITH" → "John Smith"

Partial match: "J. Smith" → "John Smith" (if close)

Tab 3: Split by Page Range Syntax Examples 5-10 - Pages 5 through 10

1,3,5 - Pages 1, 3, and 5

1-5,10-15,20 - Multiple ranges (max 20)

Tab 4: PDF Merger Page Range Filters Leave blank: Include all pages

1-5: Include only pages 1-5

3,7,10: Include specific pages

1-5,8,10-15: Mixed ranges

Features Table of Contents: Adds a TOC page at the beginning

Bookmarks: Creates clickable outline entries for navigation

Live Preview: See file order and page counts

⚙️ Configuration Settings Saved Automatically Extraction criteria (separate for Tab 1 & 2)

File naming rules (selected fields, separator, suffix)

CSV output settings

Grouping preferences

Export mode selections

Window size and position

Menu Options Menu Action File Exit application Criteria Add/remove/reset extraction rules for Tab 1 and 2 View Restore informational tips Settings Customize button text, rename tabs, save preferences Help Quick start guide and about information 💡 Tips & Tricks Optimizing Extraction Use Stop Text for multi-line values:

text Description: This is a long description that spans multiple lines on the day of the event

Set Stop Text to "on the day of" to capture only the description Test with CSV-Only Mode before generating PDFs

Use No Grouping Mode to quickly audit all pages

Check Page Numbers in CSV when debugging grouping issues

Best Practices Always test with a small PDF first

Save criteria as they're auto-saved between sessions

Use descriptive names for extraction criteria

Keep CSV files for future reference and Tab 2 processing

Back up settings by saving the .doc_split_settings.json file

🔧 Troubleshooting Common Issues Issue Solution No IDs found Check prefix matches exactly (spaces, colons) Pages not grouping Ensure ID field type is set to "ID" CSV not loading Check column names contain "name" or "id" Merger TOC missing Install reportlab: pip install reportlab Large file slow Use CSV-only mode for data extraction Debugging Tools Debug Tab 1 Criteria button - Shows current configuration

Debug PDF Text button - Displays raw text from first pages

Debug CSV button - Analyzes CSV structure for mapping

Processing Log - Real-time operation details

Getting Help Check the Quick Start Guide (Help menu)

Review the processing log for error details

Use debug buttons to verify configurations

Test with a small sample PDF first

📄 License This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments Built with PyPDF2 for PDF manipulation

Uses PyMuPDF (fitz) for text extraction

UI designed with Python's tkinter

```

Name		Name	Last commit message	Last commit date
Latest commit History 14 Commits
pdf_env		pdf_env
stuff		stuff
.gitignore		.gitignore
EULA.txt		EULA.txt
LICENSE		LICENSE
LICENSE.txt		LICENSE.txt
README.md		README.md
build_docsplit.py		build_docsplit.py
docsplit - Copy.py:Zone.Identifier		docsplit - Copy.py:Zone.Identifier
docu_split.py		docu_split.py
docu_split.py:Zone.Identifier		docu_split.py:Zone.Identifier
docu_split_public.py		docu_split_public.py
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

💝 Support the Project

📄 Doc Split - PDF Document Splitter

✨ Features

Tab 1: Split by ID / Multi-Criteria

Tab 2: Split by Name

Tab 3: Split by Page Range

Tab 4: PDF Merger

General Features

About

Licenses found

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

💝 Support the Project

📄 Doc Split - PDF Document Splitter

✨ Features

Tab 1: Split by ID / Multi-Criteria

Tab 2: Split by Name

Tab 3: Split by Page Range

Tab 4: PDF Merger

General Features

About

Resources

License

Licenses found

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages