SmartOrderReader

SmartOrderReader is a smart Python tool that automatically processes images of invoices, order confirmations, or receipts. It uses OCR (Optical Character Recognition) to extract the Order Number and Order/Creation Date from each image with high accuracy. The program then organizes the extracted data into a clean table and saves it as a CSV file for easy management and use in Excel or Google Sheets.

Features

  • ✅ Extract Order Number/Invoice Number/Order ID from images
  • ✅ Extract Order Date/Invoice Date/Creation Date from images
  • ✅ Support for multiple image formats (JPG, PNG, BMP, TIFF)
  • ✅ Process multiple images in a single run
  • ✅ Support for both English and Urdu text
  • ✅ Clean table output with professional formatting
  • ✅ CSV export for easy Excel/Google Sheets integration
  • ✅ Automatic image preprocessing for better OCR accuracy
  • ✅ Handles various date formats (MM/DD/YYYY, YYYY-MM-DD, Month DD, YYYY)
  • ✅ Smart pattern matching for different invoice formats

Prerequisites

  • Python 3.7 or higher
  • Tesseract OCR engine

Installing Tesseract OCR

Ubuntu/Debian:

sudo apt-get update
sudo apt-get install tesseract-ocr
# For Urdu support (optional):
sudo apt-get install tesseract-ocr-urd

macOS:

brew install tesseract
# For Urdu support (optional):
brew install tesseract-lang

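Windows:

Download and run a Tesseract installer for Windows (the Tesseract documentation links to the builds maintained by UB Mannheim), then make sure the install directory is on your system PATH.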

Installation

  1. Clone the repository:
git clone https://github.com/Next-GenDeveloper/SmartOrderReader.git
cd SmartOrderReader
  2. Install Python dependencies:
pip install -r requirements.txt

Usage

Basic Usage

Process one or more images:

python order_reader.py invoice1.jpg invoice2.png receipt.jpg

Advanced Usage

Specify a custom output file:

python order_reader.py *.jpg --output my_orders.csv

Process images with Urdu text support:

python order_reader.py invoice.png --lang eng+urd

Process all images in the current directory:

python order_reader.py *.jpg *.png
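
Shells such as bash expand `*.jpg` before the script starts, but the Windows Command Prompt passes the pattern through unchanged. A minimal sketch of expanding wildcards inside Python, assuming a hypothetical helper `expand_patterns` that is not part of the documented script:

```python
import glob

def expand_patterns(arguments):
    """Expand wildcard patterns; keep arguments that match nothing as-is."""
    paths = []
    for argument in arguments:
        matches = sorted(glob.glob(argument))
        # A plain file name (or an unmatched pattern) is passed through unchanged
        paths.extend(matches if matches else [argument])
    return paths
```

Passing unmatched arguments through lets the later file-open step report a clear "file not found" error instead of silently dropping the input.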

Command-Line Options

usage: order_reader.py [-h] [-o OUTPUT] [-l LANG] images [images ...]

SmartOrderReader - Extract order data from invoice/order images

positional arguments:
  images                Image files to process (supports jpg, png, jpeg, bmp, tiff)

options:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        Output CSV filename (default: order_data.csv)
  -l LANG, --lang LANG  OCR language (default: eng, use eng+urd for English+Urdu)
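
For reference, a parser with this interface could be declared with `argparse` roughly as follows. This is a sketch reconstructed from the help text above, not the script's actual source; the function name `build_parser` is an assumption.

```python
import argparse

def build_parser():
    parser = argparse.ArgumentParser(
        prog="order_reader.py",
        description="SmartOrderReader - Extract order data from invoice/order images",
    )
    # One or more image paths are required
    parser.add_argument("images", nargs="+",
                        help="Image files to process (supports jpg, png, jpeg, bmp, tiff)")
    parser.add_argument("-o", "--output", default="order_data.csv",
                        help="Output CSV filename (default: order_data.csv)")
    parser.add_argument("-l", "--lang", default="eng",
                        help="OCR language (default: eng, use eng+urd for English+Urdu)")
    return parser

args = build_parser().parse_args(["invoice1.jpg", "--lang", "eng+urd"])
print(args.images, args.output, args.lang)
```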

Output Format

The tool provides two types of output:

1. Table Format (Console)

================================================================================
EXTRACTED ORDER DATA
================================================================================

+-------------------+----------------+------------------+
| Image File Name   | Order Number   | Order Date       |
+===================+================+==================+
| invoice1.jpg      | ABC12345       | 12/25/2023       |
+-------------------+----------------+------------------+
| invoice2.png      | INV-2024-001   | January 15, 2024 |
+-------------------+----------------+------------------+
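
The console table resembles the grid layout produced by table libraries such as tabulate; whether the script uses one is not stated here. A standard-library sketch that renders a similar layout (the helper `render_grid` is hypothetical, and its column padding may differ from the script's output):

```python
def render_grid(headers, rows):
    """Render rows as a grid table with a '=' rule under the header."""
    table = [list(headers)] + [[str(cell) for cell in row] for row in rows]
    # Each column is as wide as its longest cell
    widths = [max(len(row[i]) for row in table) for i in range(len(headers))]

    def rule(char):
        return "+" + "+".join(char * (width + 2) for width in widths) + "+"

    def line(cells):
        padded = (cell.ljust(width) for cell, width in zip(cells, widths))
        return "| " + " | ".join(padded) + " |"

    out = [rule("-"), line(table[0]), rule("=")]
    for row in table[1:]:
        out += [line(row), rule("-")]
    return "\n".join(out)

print(render_grid(
    ["Image File Name", "Order Number", "Order Date"],
    [("invoice1.jpg", "ABC12345", "12/25/2023")],
))
```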

2. CSV Format

Image File Name,Order Number,Order Date
invoice1.jpg,ABC12345,12/25/2023
invoice2.png,INV-2024-001,"January 15, 2024"

The CSV file is automatically saved and can be directly imported into Excel or Google Sheets.
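
A date such as `January 15, 2024` contains a comma, so it has to be quoted to stay in a single column. Python's `csv` module handles this automatically. The sketch below assumes a hypothetical helper `rows_to_csv`; the script's own writing code may differ.

```python
import csv
import io

rows = [
    ("invoice1.jpg", "ABC12345", "12/25/2023"),
    ("invoice2.png", "INV-2024-001", "January 15, 2024"),
]

def rows_to_csv(rows):
    """Serialize rows to CSV text; fields containing commas are quoted."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Image File Name", "Order Number", "Order Date"])
    writer.writerows(rows)
    return buffer.getvalue()

print(rows_to_csv(rows))
```

Reading the file back with `csv.reader` (or opening it in Excel or Google Sheets) yields three columns per row, with the quoted date intact.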

Supported Patterns

Order Number Patterns

  • Order #, Order No., Order Number
  • Invoice #, Invoice No., Invoice Number
  • Order ID
  • PO # (Purchase Order)
  • Reference #, Ref #

Date Patterns

  • Order Date, Invoice Date
  • Date, Issue Date
  • Created On, Transaction Date
  • Creation Date

Date Formats

  • MM/DD/YYYY or DD/MM/YYYY (e.g., 12/25/2023)
  • YYYY-MM-DD (e.g., 2023-12-25)
  • Month DD, YYYY (e.g., December 25, 2023)
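
The three formats can be recognized with one regular expression each and validated with `datetime.strptime`. The sketch below is an illustration rather than the script's implementation; `DATE_FORMATS` and `find_date` are hypothetical names. It tries month-first before day-first for slash dates, so an ambiguous value such as 03/04/2023 is read as March 4. Month names are parsed with `%B`, which assumes an English locale.

```python
import re
from datetime import datetime

DATE_FORMATS = [
    (re.compile(r"\b\d{4}-\d{2}-\d{2}\b"), ("%Y-%m-%d",)),
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"), ("%m/%d/%Y", "%d/%m/%Y")),
    (re.compile(r"\b[A-Z][a-z]+ \d{1,2}, \d{4}\b"), ("%B %d, %Y",)),
]

def find_date(text):
    """Return the first valid date found in text, or None."""
    for pattern, formats in DATE_FORMATS:
        for candidate in pattern.findall(text):
            for fmt in formats:
                try:
                    return datetime.strptime(candidate, fmt).date()
                except ValueError:
                    continue  # not a real date in this format; try the next one
    return None

print(find_date("Invoice Date: December 25, 2023"))
```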

How It Works

  1. Image Preprocessing: Images are converted to grayscale and optimized for better OCR accuracy
  2. OCR Text Extraction: Tesseract OCR extracts all text from the image
  3. Pattern Matching: Smart regex patterns identify order numbers and dates
  4. Data Extraction: When several candidates match, the most relevant one is selected based on pattern priority
  5. Output Generation: Results are printed as a console table and written to a CSV file
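
The steps above can be sketched end to end as follows. This is an illustration under stated assumptions, not the script's source: the function names are hypothetical, the regular expressions are deliberately simplified, and Otsu thresholding is one possible choice for the "optimized" preprocessing step. The OCR function needs the `opencv-python` and `pytesseract` packages, imported lazily so the rest of the sketch runs without them.

```python
import re
from pathlib import Path

def ocr_image(path, lang="eng"):
    """Steps 1-2: grayscale + Otsu threshold (an assumption), then Tesseract OCR."""
    import cv2
    import pytesseract
    image = cv2.imread(str(path))
    if image is None:
        raise FileNotFoundError(f"Cannot read image: {path}")
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return pytesseract.image_to_string(binary, lang=lang)

def extract_fields(text):
    """Steps 3-4: take the first labeled order number and date, or empty strings."""
    number = re.search(
        r"(?:Order|Invoice)\s*(?:#|No\.?|Number|ID)\s*[:\-]?\s*([A-Z0-9][A-Z0-9\-]*)",
        text, re.IGNORECASE)
    date = re.search(
        r"Date\s*[:\-]?\s*(\d{1,2}/\d{1,2}/\d{4}|\d{4}-\d{2}-\d{2})",
        text, re.IGNORECASE)
    return (number.group(1) if number else "", date.group(1) if date else "")

def process_images(paths, ocr=ocr_image):
    """Step 5 input: one (file name, order number, order date) row per image."""
    rows = []
    for path in paths:
        number, date = extract_fields(ocr(path))
        rows.append((Path(path).name, number, date))
    return rows
```

Passing the OCR function as a parameter makes the extraction logic easy to exercise with canned text, without Tesseract installed.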

Troubleshooting

"pytesseract.TesseractNotFoundError"

  • Make sure Tesseract OCR is installed and in your system PATH
  • On Windows, you may need to set the path manually in the script

No text extracted from image

  • Ensure the image is clear and readable
  • Try increasing image resolution
  • Check if the text is in a supported language
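
To check whether a language pack is installed, the Tesseract CLI offers `tesseract --list-langs`. The sketch below wraps that command; the helper names are hypothetical, and the parser assumes the output is a "List of available languages" header line followed by one language code per line.

```python
import subprocess

def parse_languages(output):
    """Collect language codes, skipping blank lines and the header line."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    return {line for line in lines if not line.lower().startswith("list of")}

def missing_languages(lang_spec, installed):
    """Return the codes in a spec like 'eng+urd' that are not installed."""
    return [code for code in lang_spec.split("+") if code not in installed]

def installed_languages():
    """Ask the tesseract executable which languages it can use."""
    result = subprocess.run(["tesseract", "--list-langs"],
                            capture_output=True, text=True, check=True)
    # Older releases print the list to stderr, newer ones to stdout
    return parse_languages(result.stdout + result.stderr)
```

For example, `missing_languages("eng+urd", installed_languages())` returns `["urd"]` when the Urdu pack has not been installed.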

Wrong data extracted

  • Make sure the image quality is good
  • Verify that order number/date labels are clearly visible
  • Try different image preprocessing settings

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Author

Created by Next-GenDeveloper

Acknowledgments

  • Tesseract OCR for text recognition
  • OpenCV for image preprocessing
  • Python community for excellent libraries
