AI-powered document scanner that automatically detects, corrects perspective, and enhances scanned documents from photos using OpenCV

LiteObject/doc-scanner

Document Scanner

A Python-based document scanner that automatically detects document boundaries, applies perspective correction, and generates high-quality scanned images using OpenCV.

Features

  • Automatic Document Detection: Uses Canny edge detection and contour analysis to identify document boundaries
  • Perspective Correction: Applies four-point transformation for proper document alignment
  • Multiple Quality Options: Generates 7 different processing versions for optimal results
  • Enhanced Image Processing: Includes noise reduction, contrast enhancement, and sharpening
  • Flexible Output: Supports custom output directories with organized file structure
  • Debug Mode: Optional intermediate image saving for troubleshooting
  • Command Line Interface: Easy-to-use CLI with comprehensive options
  • Robust Fallbacks: Avoids blank outputs by using a full-frame fallback when the detected region is too small
  • RECOMMENDED Control: Choose which processed variant is saved as RECOMMENDED via --prefer
  • Profiles: Bias auto selection for tables vs text via --doc-type
  • Printer-Friendly Conversion: Remove background colors to save ink while preserving text and images

Installation

Prerequisites

  • Python 3.7 or higher
  • OpenCV (cv2)
  • NumPy

Setup

  1. Clone the repository:
git clone https://github.com/LiteObject/doc-scanner.git
cd doc-scanner
  2. Create a virtual environment (recommended):
python -m venv .venv
# On Windows:
.venv\Scripts\activate
# On macOS/Linux:
source .venv/bin/activate
  3. Install dependencies:
pip install opencv-python numpy

Usage

Basic Usage

python scanner.py input_image.jpg

Make Documents Printer-Friendly

Remove background colors to save printer ink while keeping text and images readable:

# Basic usage - auto-detect and remove background
python make_printable.py document.jpg

# Use aggressive background removal for heavily colored documents
python make_printable.py document.jpg --method aggressive

# Uneven lighting or shaded pages
python make_printable.py document.jpg --method adaptive

# Pages with strong colored backgrounds
python make_printable.py document.jpg --method color

# Set custom threshold for fine control
python make_printable.py document.jpg --method custom --threshold 210

# Skip enhancement for faster processing
python make_printable.py document.jpg --no-enhance

# Enable debug mode to see before/after comparison
python make_printable.py document.jpg --debug

# Specify output directory
python make_printable.py document.jpg --output ./printable_docs

Advanced Options

# Specify custom output directory
python scanner.py document.jpg --output ./my_scans

# Enable debug mode
python scanner.py document.jpg --debug

# Combine options
python scanner.py document.jpg --output ./scans --debug

# Tune detection thresholds
python scanner.py document.jpg --min-area 1500 --fallback-min-area 600 --min-area-frac 0.05

# Choose the RECOMMENDED variant
# Force clean binary (default behavior)
python scanner.py document.jpg --prefer combined

# Use enhanced grayscale as recommended
python scanner.py document.jpg --prefer grayscale

# Let the tool auto-select based on content scoring
python scanner.py document.jpg --prefer auto

# Bias auto selection for tables (crisp B/W lines) or text (smoother)
python scanner.py document.jpg --prefer auto --doc-type table
python scanner.py document.jpg --prefer auto --doc-type text

Command Line Arguments

  • input_file: Path to the input image file (required)
  • --output, -o: Custom output directory path (optional)
  • --debug: Enable debug mode to save intermediate processing images (optional)
  • --min-area: Minimum contour area, measured in pixels of the resized image, to accept as the document (default: 1000)
  • --fallback-min-area: Minimum area to allow a fallback quadrilateral if no primary match is found (default: 500)
  • --min-area-frac: If the selected quadrilateral covers less than this fraction of the resized image, use full-frame fallback (default: 0.04 = 4%)
  • --prefer: Which variant to save as RECOMMENDED. Options: combined (default), grayscale, original, otsu, adaptive-mean, adaptive-gaussian, niblack, auto (score-based)
  • --doc-type: Biases auto selection. Options: auto (default), table (favor crisp B/W and structured edges), text (favor smoother grayscale)
  • --help, -h: Show help message and usage examples
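
As a rough illustration, the arguments listed above could be declared with argparse as follows. This is a sketch built only from the flags and defaults documented here, not the actual parser in scanner.py. The function name build_parser and the help strings are assumptions.

```python
import argparse

# Choices copied from the argument list above.
PREFER_CHOICES = ["combined", "grayscale", "original", "otsu",
                  "adaptive-mean", "adaptive-gaussian", "niblack", "auto"]
DOC_TYPES = ["auto", "table", "text"]


def build_parser():
    """Hypothetical parser mirroring the documented CLI (not the real scanner.py code)."""
    parser = argparse.ArgumentParser(description="Document scanner (illustrative sketch)")
    parser.add_argument("input_file", help="Path to the input image file")
    parser.add_argument("--output", "-o", default=None, help="Custom output directory")
    parser.add_argument("--debug", action="store_true",
                        help="Save intermediate processing images")
    parser.add_argument("--min-area", type=float, default=1000,
                        help="Minimum contour area in resized-image pixels")
    parser.add_argument("--fallback-min-area", type=float, default=500,
                        help="Minimum area for a fallback quadrilateral")
    parser.add_argument("--min-area-frac", type=float, default=0.04,
                        help="Below this fraction of the image, use full-frame fallback")
    parser.add_argument("--prefer", choices=PREFER_CHOICES, default="combined",
                        help="Variant saved as RECOMMENDED")
    parser.add_argument("--doc-type", choices=DOC_TYPES, default="auto",
                        help="Bias for auto selection")
    return parser


# argparse turns dashes into underscores: --min-area-frac becomes args.min_area_frac.
args = build_parser().parse_args(["document.jpg", "--prefer", "auto", "--doc-type", "table"])
print(args.input_file, args.prefer, args.doc_type, args.min_area_frac)
```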

Output Structure

The scanner creates an organized directory structure with multiple quality options:

output_directory/
├── scanned_output_YYYYMMDD_HHMMSS/
│   ├── RECOMMENDED_scanned_document.jpg  # Main result
│   ├── GRAYSCALE_enhanced.jpg           # Enhanced grayscale version
│   ├── quality_comparison/              # All processing versions
│   │   ├── 00_original_perspective_corrected.jpg
│   │   ├── 01_enhanced_grayscale.jpg
│   │   ├── 02_otsu_threshold.jpg
│   │   ├── 03_adaptive_mean.jpg
│   │   ├── 04_adaptive_gaussian.jpg
│   │   ├── 05_niblack_local.jpg
│   │   ├── 06_combined_optimized.jpg
│   │   └── README.txt               # Selection guide
│   └── debug_processing/            # Debug images (if --debug enabled)
│       ├── debug_01_resized.jpg
│       ├── debug_02_gray.jpg
│       ├── debug_03_edges.jpg
│       ├── debug_04_warped.jpg
│       ├── debug_detected_contour.jpg
│       ├── debug_alt_edges_*.jpg
│       └── debug_region_info.txt    # Area stats and fallback info
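
A timestamped layout like the one above can be produced with os and datetime, both listed under Dependencies. The sketch below shows one way to do it. The function name make_output_dirs and its parameters are hypothetical, and only the folder names are taken from the tree above.

```python
import os
import tempfile
from datetime import datetime


def make_output_dirs(base_dir, debug=False, now=None):
    """Create the run folder and its subfolders; returns a dict of paths.

    Hypothetical helper: folder names follow the documented tree, the rest is assumed.
    """
    now = now or datetime.now()
    run_dir = os.path.join(base_dir, "scanned_output_" + now.strftime("%Y%m%d_%H%M%S"))
    paths = {
        "run": run_dir,
        "quality": os.path.join(run_dir, "quality_comparison"),
    }
    if debug:
        # Only created when debug mode is requested.
        paths["debug"] = os.path.join(run_dir, "debug_processing")
    for path in paths.values():
        os.makedirs(path, exist_ok=True)
    return paths


# Example run inside a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    created = make_output_dirs(tmp, debug=True)
    print(sorted(os.listdir(created["run"])))
```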

Quality Processing Options

The scanner generates multiple versions using different image processing techniques:

  1. Original Perspective Corrected: Document after perspective transformation only
  2. Enhanced Grayscale: Noise reduction, contrast enhancement, and sharpening applied
  3. Otsu Threshold: Black and white using automatic threshold detection
  4. Adaptive Mean: Black and white using adaptive mean thresholding
  5. Adaptive Gaussian: Black and white using adaptive Gaussian thresholding
  6. Niblack Local: Black and white using Niblack-like local thresholding
  7. Combined Optimized: Recommended version with morphological cleanup
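
To make the "automatic threshold detection" of variant 3 concrete, here is a NumPy-only sketch of Otsu's method. With OpenCV this is typically a single cv2.threshold call using the cv2.THRESH_OTSU flag; whether scanner.py does exactly that is not stated here. The function name otsu_threshold is an assumption for illustration.

```python
import numpy as np


def otsu_threshold(gray):
    """Return the threshold t that maximizes between-class variance.

    gray must be a uint8 array; pixels <= t form the dark class.
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                  # probability of the dark class for each t
    mu = np.cumsum(prob * np.arange(256))    # cumulative mean up to t
    mu_total = mu[-1]
    denom = omega * (1.0 - omega)
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_total * omega - mu) ** 2 / denom
    sigma_b[denom == 0] = 0.0                # thresholds that leave one class empty
    return int(np.argmax(sigma_b))


# Synthetic page: dark "ink" (50) on the left, bright "paper" (200) on the right.
page = np.full((20, 20), 200, dtype=np.uint8)
page[:, :10] = 50
t = otsu_threshold(page)
binary = np.where(page > t, 255, 0).astype(np.uint8)
print(t, np.unique(binary))
```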

Image Processing Pipeline

  1. Preprocessing: Resize, convert to grayscale, apply Gaussian blur
  2. Edge Detection: Canny edge detection with multiple parameter sets
  3. Contour Detection: Find and analyze document boundaries
  4. Perspective Correction: Four-point transformation to correct document perspective
  5. Quality Enhancement: Apply denoising, CLAHE, and unsharp masking
  6. Thresholding: Multiple techniques for optimal text/background separation
  7. Post-processing: Morphological operations for cleanup
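
Step 4 needs the four detected corners in a consistent order before the transform can be computed. A common approach, sketched here with NumPy, sorts them by coordinate sum and difference and derives the output size from the longest opposing edges. The function names are hypothetical and this is not necessarily how scanner.py orders its points. The heuristic also assumes the document is not rotated close to 45 degrees.

```python
import numpy as np


def order_points(pts):
    """Return corners as [top-left, top-right, bottom-right, bottom-left]."""
    pts = np.asarray(pts, dtype=np.float32)
    s = pts.sum(axis=1)                 # x + y: smallest at top-left, largest at bottom-right
    d = np.diff(pts, axis=1).ravel()    # y - x: smallest at top-right, largest at bottom-left
    return np.array([pts[np.argmin(s)], pts[np.argmin(d)],
                     pts[np.argmax(s)], pts[np.argmax(d)]], dtype=np.float32)


def target_size(rect):
    """Width and height of the corrected image, from the longest opposing edges."""
    tl, tr, br, bl = rect
    width = int(max(np.linalg.norm(br - bl), np.linalg.norm(tr - tl)))
    height = int(max(np.linalg.norm(tr - br), np.linalg.norm(tl - bl)))
    return width, height


# Corners given in arbitrary order, as a contour detector might return them.
corners = [(100, 220), (10, 10), (5, 200), (110, 20)]
rect = order_points(corners)
print(rect.tolist(), target_size(rect))
```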

Error Handling

The scanner includes comprehensive error handling for:

  • File not found errors
  • Invalid image formats
  • Image loading failures
  • Contour detection issues
  • Perspective transformation problems
  • File I/O errors

Debug Mode

Enable debug mode with --debug to save intermediate processing images:

  • Resized input image, grayscale, and initial edges
  • Alternative edge images across multiple Canny thresholds
  • Detected contour overlay
  • Warped image (or full-frame fallback) and region stats

Full-frame fallback (anti-blank safeguard)

When the detected quadrilateral covers less than a configurable fraction of the resized image (--min-area-frac, default 0.04, i.e. 4%), the scanner skips the perspective warp and processes the full frame instead. This prevents blank or near-blank outputs caused by tiny or noisy contours.
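
The decision itself is a simple area comparison. The sketch below computes the quadrilateral area with the shoelace formula and compares it with the image area. The function names are hypothetical; only the 0.04 default comes from this README.

```python
def polygon_area(points):
    """Area of a simple polygon given as (x, y) vertices in order (shoelace formula)."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def should_use_full_frame(quad, image_width, image_height, min_area_frac=0.04):
    """True when the quad is too small relative to the image to be trusted."""
    frac = polygon_area(quad) / float(image_width * image_height)
    return frac < min_area_frac


tiny = [(0, 0), (20, 0), (20, 20), (0, 20)]          # 400 px^2 in a 500x500 image
large = [(50, 50), (450, 50), (450, 450), (50, 450)]  # 160000 px^2
print(should_use_full_frame(tiny, 500, 500), should_use_full_frame(large, 500, 500))
```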

Technical Details

Dependencies

  • OpenCV (cv2): Computer vision and image processing
  • NumPy: Array operations and mathematical computations
  • argparse: Command line argument parsing
  • datetime: Timestamp generation for output folders
  • os/sys: File system operations and system interactions

Key Algorithms

  • Four-Point Transformation: Perspective correction using homography
  • Adaptive Thresholding: Multiple techniques for varying lighting conditions
  • Non-Local Means Denoising: Advanced noise reduction
  • CLAHE: Contrast Limited Adaptive Histogram Equalization
  • Morphological Operations: Image cleanup and enhancement
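
For readers unfamiliar with the homography behind the four-point transformation, the sketch below solves for the 3x3 matrix directly with NumPy from four point pairs. OpenCV provides cv2.getPerspectiveTransform and cv2.warpPerspective for this step; the code here only illustrates the underlying math, and its function names are assumptions.

```python
import numpy as np


def homography_from_points(src, dst):
    """Solve for H (3x3, with H[2, 2] fixed to 1) mapping four src points onto four dst points."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        # From u = (h0*x + h1*y + h2) / (h6*x + h7*y + 1), and likewise for v.
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rhs.append(u)
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.append(v)
    h = np.linalg.solve(np.array(rows, dtype=np.float64), np.array(rhs, dtype=np.float64))
    return np.append(h, 1.0).reshape(3, 3)


def apply_homography(H, point):
    """Map one (x, y) point through H, including the perspective divide."""
    x, y = point
    u, v, w = H @ np.array([x, y, 1.0])
    return u / w, v / w


# Skewed document corners (tl, tr, br, bl) mapped onto an upright 100x200 rectangle.
src = [(10, 10), (110, 20), (100, 220), (5, 200)]
dst = [(0, 0), (99, 0), (99, 199), (0, 199)]
H = homography_from_points(src, dst)
print(np.round(H, 4))
```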

Troubleshooting

Common Issues

  1. "No contours found": Ensure the document has clear edges and good contrast
  2. "No rectangular contour found": Try with better lighting or clearer document boundaries
  3. Poor scan quality or a harsh-looking RECOMMENDED output:
    • The scanner scores all variants and skips near-blank candidates.
    • If the RECOMMENDED output looks too bold or harsh, try --prefer grayscale, or keep --prefer combined and adjust brightness/contrast in a viewer.
    • If results still look weak, raise --min-area-frac (e.g., 0.06–0.1) to force full-frame processing more often, or increase --min-area (e.g., 1500–3000).
    • Note: The console prints which variant was saved as RECOMMENDED (for example, Variant used: combined).
  4. File permission errors: Ensure write permissions for the output directory
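
The near-blank check mentioned under item 3 is not specified in this README. One plausible form, sketched below, measures the fraction of dark pixels in a black-and-white variant and rejects images that are almost entirely white or almost entirely black. The function name and both cutoffs are hypothetical values chosen for illustration.

```python
import numpy as np


def is_near_blank(binary, min_ink_frac=0.005, max_ink_frac=0.95):
    """Flag a binarized image whose dark-pixel share is implausibly low or high.

    Both cutoffs are assumed values, not taken from scanner.py.
    """
    ink_frac = np.count_nonzero(binary < 128) / binary.size
    return ink_frac < min_ink_frac or ink_frac > max_ink_frac


blank = np.full((100, 100), 255, dtype=np.uint8)
with_text = blank.copy()
with_text[10:20, 10:60] = 0   # 500 of 10000 pixels are dark (5%)
print(is_near_blank(blank), is_near_blank(with_text))
```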

Tips for Better Results

  • Use good lighting with minimal shadows
  • Ensure the document has clear, straight edges
  • Place the document on a contrasting background
  • Keep the camera/phone steady when taking the photo
  • Avoid reflections and glare on the document surface

Scripts

scanner.py

The main document scanner that detects boundaries and applies perspective correction.

make_printable.py

Prepares documents for printing by:

  • Removing background colors to save ink
  • Preserving text and image quality
  • Creating multiple versions (color w/ white background, text-optimized, grayscale, B&W)
  • Estimating ink savings
  • Providing enhanced versions for better print quality

make_printable.py Options

  • input_file: Path to the input image file (required)
  • --output, -o: Output directory (default: ./output)
  • --method, -m: Background removal method
    • auto: Automatically determine threshold based on background
    • light: Light background removal (keeps more detail)
    • aggressive: Aggressive removal (maximum ink saving)
    • adaptive: Adaptive thresholding for uneven lighting
    • color: HSV color-based removal for colored backgrounds
    • custom: Use custom threshold value
  • --threshold, -t: Custom threshold value (0-255) for method=custom
  • --no-enhance: Skip enhancement step for faster processing
  • --debug: Save debug images including before/after comparison

Printable Output Structure

output_directory/
├── printable_YYYYMMDD_HHMMSS/
│   ├── PRINTABLE_document.jpg           # Recommended for printing
│   ├── versions/
│   │   ├── 01_document_background_removed.jpg
│   │   ├── 02_document_enhanced.jpg     # If enhancement enabled
│   │   ├── 03_document_text_optimized.jpg
│   │   ├── 04_document_black_white.jpg  # Maximum ink saving (adaptive)
│   │   └── 05_document_grayscale.jpg
│   ├── debug/                           # If --debug enabled
│   │   ├── original.jpg
│   │   ├── mask.jpg
│   │   └── before_after_comparison.jpg
│   └── README.txt                       # Usage guide

License

This project is open source. Please check the license file for specific terms.

Future Enhancements

Potential improvements for future versions:

  • Batch processing for multiple documents
  • Graphical user interface (GUI) for easier use
  • Additional image enhancement algorithms
  • Support for different output formats (PDF, TIFF)
  • Configuration file support
  • Performance optimizations for large images
