🧠 OCR Image Text Extractor

A Python-based Optical Character Recognition (OCR) tool to extract text from images using Tesseract OCR. This project supports both PIL-based and OpenCV-based image preprocessing for improved accuracy.

🚀 Features

Extracts text from images using pytesseract
Dual preprocessing methods: PIL (Python Imaging Library) and OpenCV
Contrast enhancement, binarization, blurring, dilation/erosion for noise reduction
Supports multiple languages via Tesseract's language packs
Command-line interface for quick usage
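
The PIL-based preprocessing path listed above can be sketched roughly as follows. This is a minimal illustration, not the code in ocr_script.py: the function name `preprocess_with_pil` and the fixed threshold of 128 are assumptions made for the example, and it covers only contrast enhancement, light denoising, and binarization.

```python
from PIL import Image, ImageFilter, ImageOps


def preprocess_with_pil(image: Image.Image, threshold: int = 128) -> Image.Image:
    """Return a binarized grayscale copy of `image` prepared for OCR.

    Hypothetical helper for illustration; the threshold value is an assumption.
    """
    gray = ImageOps.grayscale(image)    # drop color information
    gray = ImageOps.autocontrast(gray)  # stretch contrast to the full 0-255 range
    gray = gray.filter(ImageFilter.MedianFilter(size=3))  # suppress speckle noise
    # Binarize: pixels at or above the threshold become white, the rest black.
    return gray.point(lambda value: 255 if value >= threshold else 0)
```

The returned image can be passed directly to `pytesseract.image_to_string(image, lang="eng")`. A fixed threshold is the simplest choice; unevenly lit scans usually benefit from an adaptive method instead.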

🛠️ Requirements

Python 3.6+
Tesseract OCR (must be installed; either add it to your PATH or pass its location with --tesseract)

⚙️ Installation

Clone the repository:

git clone https://github.com/HappySR/ocr-image-text-extractor.git
cd ocr-image-text-extractor

Install dependencies:

pip install -r requirements.txt

Install Tesseract OCR:

Ubuntu/Debian:

sudo apt update && sudo apt install tesseract-ocr

Mac (Homebrew):

brew install tesseract

Windows: Download and install from https://github.com/UB-Mannheim/tesseract/wiki. Note the path to tesseract.exe so that you can pass it to the script with --tesseract.
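
To check whether Tesseract can be found after installation, something like the following may help. It is a sketch using only the standard library; `resolve_tesseract` is a hypothetical helper and not necessarily how ocr_script.py locates the executable.

```python
import os
import shutil
from typing import Optional


def resolve_tesseract(explicit_path: Optional[str] = None) -> str:
    """Return a path to the Tesseract executable or raise FileNotFoundError.

    Hypothetical helper: an explicit path wins, otherwise PATH is searched.
    """
    if explicit_path:
        if os.path.isfile(explicit_path):
            return explicit_path
        raise FileNotFoundError(f"No Tesseract executable at: {explicit_path}")
    found = shutil.which("tesseract")  # searches the directories on PATH
    if found is None:
        raise FileNotFoundError("tesseract is not on PATH; pass its location explicitly")
    return found
```

With pytesseract, the resolved path is typically assigned to `pytesseract.pytesseract.tesseract_cmd` before running OCR.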

🖼️ Usage

python ocr_script.py path/to/image.jpg

Optional arguments:

--tesseract or -t: Path to the Tesseract executable (required if not in PATH)
--lang or -l: Language for OCR (default: eng)
--no-cv2: Use PIL-based preprocessing instead of OpenCV
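
For reference, the options above could be declared with argparse along these lines. This is a sketch of one possible parser, assuming the flag names listed above; `build_parser` is a hypothetical name and the actual definitions in ocr_script.py may differ.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(
        description="Extract text from an image with Tesseract OCR."
    )
    parser.add_argument("image", help="Path to the input image")
    parser.add_argument(
        "-t", "--tesseract", default=None,
        help="Path to the Tesseract executable (needed if it is not on PATH)",
    )
    parser.add_argument(
        "-l", "--lang", default="eng",
        help="Language for OCR (default: eng)",
    )
    parser.add_argument(
        "--no-cv2", action="store_true",
        help="Use PIL-based preprocessing instead of OpenCV",
    )
    return parser


# Parse a sample command line instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["sample.png", "--lang", "eng", "--no-cv2"])
print(args.image, args.lang, args.no_cv2, args.tesseract)
```

Note that argparse exposes `--no-cv2` as the attribute `no_cv2`, and that `--tesseract` stays `None` when omitted so the script can fall back to PATH.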

Example:

python ocr_script.py sample.png --lang eng --tesseract "C:/Program Files/Tesseract-OCR/tesseract.exe"

📂 Project Structure

ocr-image-text-extractor/
├── ocr_script.py           # Main script with OCRProcessor class and CLI
├── README.md               # README file
└── requirements.txt        # Required Python packages

📌 Notes

OpenCV-based preprocessing generally yields better OCR accuracy for noisy images.
Tesseract supports multiple languages, but the matching language data must be installed (for example, the tesseract-ocr-deu package on Debian/Ubuntu for German).
For best results, ensure the input image has good contrast and minimal background noise.
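
One way to confirm that the language data for a given --lang value is installed is to compare it against the output of `tesseract --list-langs`. The sketch below only parses that output; the helper names are hypothetical, and the header line format is an assumption that may vary between Tesseract versions.

```python
from typing import List


def parse_list_langs(output: str) -> List[str]:
    """Extract language codes from the text printed by `tesseract --list-langs`."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    # Assumption: the header line starts with "List of available languages".
    return [line for line in lines if not line.startswith("List of available languages")]


def missing_languages(requested: str, available: List[str]) -> List[str]:
    """Return the codes in a spec such as "eng+deu" that are not installed."""
    return [code for code in requested.split("+") if code and code not in available]


# Sample text standing in for real command output.
sample = "List of available languages (2):\neng\nosd\n"
installed = parse_list_langs(sample)
print(missing_languages("eng+deu", installed))  # ['deu']
```

In a real run the text would come from invoking `tesseract --list-langs`, for instance via `subprocess.run`. pytesseract also provides `get_languages()`, which returns the installed codes directly.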

🧪 Sample Output

--- OCR Result ---
This is a sample text extracted
from an image using OCR!
------------------

📄 License

This project is licensed under the MIT License. See the LICENSE file for details.
