Manual PDF/Image Extractor — fixed selection & pan behavior
A small GUI tool (Tkinter + Pillow + pdf2image) for visually selecting and saving cropped regions from PDFs or images. Designed for manual, precise extraction: draw a rectangle on a page to crop, preview it, and save it as PNG/JPG.
Key features
Open a single PDF (multi-page) or an image (JPEG/PNG).
Fast page navigation (Prev / Next).
Fit-to-window and zoom in/out (buttons and mouse wheel).
Two pan modes:
Middle-mouse drag to pan.
Hold SPACE and left-drag to pan.
Left-drag (without SPACE held) to draw a selection rectangle; after releasing, a preview appears.
Save the selected crop to a chosen output folder in PNG or JPG format.
Filename format when saving: page_###_obj###.png (or .jpg) — object counter increments per save.
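The fit-to-window and zoom features above come down to a little scale arithmetic. The following is a minimal sketch, not the tool's actual code: the names fit_scale, zoom, ZOOM_STEP, MIN_SCALE and MAX_SCALE, and the step and clamp values, are assumptions chosen for illustration.

```python
# Illustrative values; the real tool may use different steps and limits.
ZOOM_STEP = 1.25
MIN_SCALE, MAX_SCALE = 0.05, 8.0


def fit_scale(img_w, img_h, canvas_w, canvas_h):
    """Largest scale at which the whole image fits inside the canvas."""
    if min(img_w, img_h, canvas_w, canvas_h) <= 0:
        raise ValueError("all dimensions must be positive")
    # The tighter of the two axes decides the scale.
    return min(canvas_w / img_w, canvas_h / img_h)


def zoom(scale, steps):
    """Apply `steps` zoom increments (positive = in, negative = out), clamped."""
    return max(MIN_SCALE, min(MAX_SCALE, scale * ZOOM_STEP ** steps))
```

For example, a 1700 x 2200 page shown in an 850 x 1100 canvas fits at a scale of 0.5, and each zoom step multiplies or divides that scale by ZOOM_STEP.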
Installation / Requirements
Make sure you have Python 3.8+ and install the Python dependencies:
pip install pillow pdf2image
pdf2image requires a PDF rasterizer (Poppler). Install Poppler:
Linux (apt): sudo apt install poppler-utils
macOS (Homebrew): brew install poppler
Windows: download Poppler binaries and add their bin folder to PATH (e.g., from https://poppler.freedesktop.org/ or a maintained Windows build).
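To confirm that Poppler is reachable before launching the tool, a quick check like the one below may help. It is a sketch that looks for the pdftoppm and pdfinfo executables, which pdf2image calls; the function name poppler_on_path is hypothetical.

```python
import shutil


def poppler_on_path():
    """Return True if Poppler's pdftoppm and pdfinfo executables are on PATH."""
    return all(shutil.which(tool) is not None for tool in ("pdftoppm", "pdfinfo"))


if __name__ == "__main__":
    print("Poppler found" if poppler_on_path() else "Poppler not found on PATH")
```

If this reports that Poppler is missing on Windows, the usual cause is that the folder containing the Poppler executables was not added to PATH.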
How to run
Save the script as manual_extractor.py (or your preferred name) and run:
python manual_extractor.py
Usage (quick)
Click Select PDF/Image and choose a .pdf, .jpg, .jpeg or .png.
For PDFs, set the DPI (PDF) value before loading if you want higher/lower rasterization quality (default: 200).
Click Choose Output Folder to pick where crops will be saved.
Navigate pages with Prev Page / Next Page.
Zoom:
Click Zoom In / Zoom Out,
or use the mouse wheel.
Pan:
Middle-mouse button drag, OR
Hold SPACE and left-drag.
Selection & crop:
Left-drag to draw a selection rectangle (minimum ~6 canvas pixels).
Release to preview the crop in the right sidebar.
Click Save Crop to write the image (PNG or JPG) to the output folder.
Filenames are generated like page_001_obj001.png; obj increments for each saved crop on a page.
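The selection step has to translate a rectangle drawn in canvas pixels into a pixel box in the page image. The sketch below shows one way this could work, assuming the image is drawn at a uniform scale with its top-left corner at a known canvas offset. The function name and parameters are hypothetical; only MIN_SELECT_PIX = 6 comes from this README.

```python
MIN_SELECT_PIX = 6  # minimum selection size in canvas pixels


def canvas_rect_to_image_box(x0, y0, x1, y1, scale, offset, image_size):
    """Map a canvas drag rectangle to a (left, top, right, bottom) image box.

    Returns None if the drag is smaller than MIN_SELECT_PIX on either axis
    or if it does not overlap the image at all.
    """
    # Normalize so the drag can go in any direction.
    left, right = sorted((x0, x1))
    top, bottom = sorted((y0, y1))
    if right - left < MIN_SELECT_PIX or bottom - top < MIN_SELECT_PIX:
        return None

    off_x, off_y = offset
    img_w, img_h = image_size
    # Undo the pan offset, then the zoom, then clamp to the image bounds.
    box = (
        max(0, round((left - off_x) / scale)),
        max(0, round((top - off_y) / scale)),
        min(img_w, round((right - off_x) / scale)),
        min(img_h, round((bottom - off_y) / scale)),
    )
    if box[2] <= box[0] or box[3] <= box[1]:
        return None
    return box
```

The resulting 4-tuple has the (left, upper, right, lower) layout that Pillow's Image.crop expects, so it can be passed to it directly.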
Controls & shortcuts
Left mouse:
Drag = draw selection rectangle (default).
Hold SPACE + Left drag = pan.
Middle mouse:
Drag = pan.
Mouse wheel:
Scroll = zoom in/out (platform-dependent wheel events are handled).
Buttons:
Zoom In, Zoom Out, Fit to Window, Prev Page, Next Page.
DPI (PDF):
Set before opening a PDF to control page rasterization resolution.
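The platform-dependent wheel handling mentioned above usually arises because Tkinter reports wheel motion as <MouseWheel> events with a signed delta on Windows and macOS, and as <Button-4> / <Button-5> events on X11. The sketch below normalizes both into a single direction; wheel_direction and bind_wheel are hypothetical names and not necessarily how the tool does it.

```python
def wheel_direction(event):
    """Return +1 for scroll up (zoom in), -1 for scroll down (zoom out), else 0."""
    num = getattr(event, "num", None)
    if num == 4:   # X11: wheel up arrives as button 4
        return 1
    if num == 5:   # X11: wheel down arrives as button 5
        return -1
    # Windows/macOS: only the sign of delta matters here.
    delta = getattr(event, "delta", 0)
    return (delta > 0) - (delta < 0)


def bind_wheel(canvas, on_zoom):
    """Attach one zoom callback to all wheel event types on a Tkinter widget."""
    def handler(event):
        on_zoom(wheel_direction(event))

    canvas.bind("<MouseWheel>", handler)  # Windows, macOS
    canvas.bind("<Button-4>", handler)    # X11 scroll up
    canvas.bind("<Button-5>", handler)    # X11 scroll down
```

Using only the sign of delta sidesteps the differing magnitudes reported across platforms (multiples of 120 on Windows, small integers on macOS).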
Notes & tips
A minimum selection size is enforced (MIN_SELECT_PIX = 6); adjust it in the code if you need smaller selections.
Opening large PDFs or using a high DPI increases memory usage; lower the DPI to reduce RAM usage.
JPG files are saved with quality=95 by default; change this in the code if needed.
If crops appear blurred when zoomed, try increasing the DPI for PDFs or using higher-resolution source images.
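To get a feel for the DPI and memory trade-off in the notes above, the sketch below estimates the uncompressed size of a rasterized page. It assumes 8-bit RGB (3 bytes per pixel) and a US Letter page of 8.5 x 11 inches; the helper names are hypothetical and the figures are rough estimates, not measurements of the tool.

```python
def raster_size(width_in, height_in, dpi):
    """Pixel dimensions of a page of the given size (in inches) at `dpi`."""
    return round(width_in * dpi), round(height_in * dpi)


def rgb_bytes(width_in, height_in, dpi, pages=1):
    """Approximate bytes needed to hold `pages` uncompressed 8-bit RGB pages."""
    w, h = raster_size(width_in, height_in, dpi)
    return w * h * 3 * pages


if __name__ == "__main__":
    for dpi in (100, 200, 300):
        mib = rgb_bytes(8.5, 11, dpi) / 2**20
        print(f"{dpi} DPI: about {mib:.1f} MiB per US Letter page")
```

Memory grows with the square of the DPI, so doubling the DPI roughly quadruples the RAM needed per page.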
Files produced
Saved crops are named by page and object counter, for example:
page_002_obj003.png
This keeps multiple saved objects on the same page uniquely identified.
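The naming scheme can be reproduced with zero-padded format fields. The sketch below is an illustration rather than the tool's code: crop_filename and save_crop are hypothetical names, 1-based page and object numbers are assumed from the examples, and save_crop expects a Pillow image to be passed in.

```python
import os


def crop_filename(page, obj, ext="png"):
    """Build a name like page_002_obj003.png from page and object numbers."""
    ext = ext.lower().lstrip(".")
    if ext not in ("png", "jpg"):
        raise ValueError("ext must be 'png' or 'jpg'")
    return f"page_{page:03d}_obj{obj:03d}.{ext}"


def save_crop(image, folder, page, obj, ext="png"):
    """Save a Pillow image into `folder` using the naming scheme above."""
    path = os.path.join(folder, crop_filename(page, obj, ext))
    if path.endswith(".jpg"):
        # JPEG has no alpha channel, so convert to RGB before saving.
        image.convert("RGB").save(path, "JPEG", quality=95)
    else:
        image.save(path, "PNG")
    return path
```

Here crop_filename(2, 3) gives page_002_obj003.png, matching the example above.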