The main aim of this project is to convert a photographed document (like a paper taken from a mobile camera) into a clean, flat, and readable scanned copy—just like how a physical scanner would do.
In short: 👉 Turn a tilted, shadowy photo of a paper into a clear top-down “scanned” PDF.
When we take a photo of a document using a mobile camera, the picture usually doesn’t look clean like a real scanned copy.
Common problems are:
- The document looks tilted or curved (not straight).
- Extra background like the table or floor also comes in the photo.
- Light or shadows make the text hard to read.
- This project uses image processing to fix all these problems automatically.
It:
Finds the edges of the paper. Straightens (unskews) the image to make it flat. Removes background and adjusts brightness. Converts it into a clear black-and-white scanned copy.
So basically, your computer acts like a virtual scanner that can turn a normal camera photo of a document into a neat, readable, and professional-looking scanned image.
Component Description Python Main programming language OpenCV For all image processing and computer vision operations NumPy For array and mathematical operations Pillow (PIL) To save the final image as a PDF Command Line Interface (CLI) You run commands in CMD to process your images
Step-by-Step Process:
- Input Image
- User provides a photo of a document (e.g., doc.jpg).
- The program reads the image using OpenCV.
- Preprocessing
- Convert the image to grayscale.
- Apply Gaussian blur to reduce noise.
- Use Canny edge detection to highlight strong edges (like the paper’s borders).
- Apply morphological operations (dilate/erode) to close gaps in edges.
- Find the Document Border
- Detect all contours (shapes) in the image.
- Choose the largest 4-corner contour—this is likely the document.
- Perspective Correction (Warping)
- The 4 corner points are used to perform a perspective transform.
- This "unskews" the paper, making it appear top-down and rectangular.
- Binarization (Black & White Effect)
- Apply adaptive thresholding to remove lighting effects.
- Output becomes crisp and clean like a scanned page.
Save the result as:
- .png – Binarized final scan
- .pdf – Printable document
- _warped_preview.jpg – Flat color preview
- _contour_debug.jpg – Green outline showing detected page
- _edged.png – Saved if detection fails (for debugging)
- cv2 - OpenCV library for all image operations.
- numpy - for mathematical and array operations.
- path - for safe file path handling.
- PIL.Image - used to save final output as PDF.
- four_point_transform - imported from four_point.py to fix perspective.