Adeyemi0/Python-OCR: This code extracts text from images

Overview

This project is a console-based document scanner application that uses Optical Character Recognition (OCR) to extract text and tables from images and PDF files. It leverages the PaddleOCR and PPStructure libraries for accurate text recognition and structured data extraction, making it useful for automating data entry and document analysis tasks.
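As a rough illustration of the text extraction step, the sketch below shows how a PaddleOCR result might be turned into plain text lines. It assumes the paddleocr 2.x Python API (`PaddleOCR(...).ocr(path, cls=True)`) and its nested result layout. The function names `run_ocr` and `extract_lines` and the confidence cutoff are hypothetical and are not taken from python-ocr.py.

```python
from typing import List, Tuple


def run_ocr(image_path: str):
    """Run PaddleOCR on one image (assumes the paddleocr 2.x API)."""
    # Imported lazily so extract_lines() can be used without PaddleOCR installed.
    from paddleocr import PaddleOCR

    ocr = PaddleOCR(use_angle_cls=True, lang="en")
    return ocr.ocr(image_path, cls=True)


def extract_lines(result, min_confidence: float = 0.5) -> List[Tuple[str, float]]:
    """Flatten a PaddleOCR 2.x style result into (text, confidence) pairs.

    Assumed shape: one list per page, each entry being
    [bounding_box, (text, confidence)]. A page with no text may be None.
    """
    lines = []
    for page in result or []:
        for _box, (text, confidence) in page or []:
            # Drop low-confidence detections (the cutoff is an arbitrary choice).
            if confidence >= min_confidence:
                lines.append((text, float(confidence)))
    return lines
```

For example, `extract_lines(run_ocr("scan.png"))` would yield the recognized lines for a placeholder file named scan.png, each paired with its confidence score.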

Features

Text Extraction: Extracts text from images and PDF documents using OCR.
Table Recognition: Detects and extracts tables from scanned documents and images.
Image Preprocessing: Enhances image quality for better OCR performance through various preprocessing techniques.
Output Export: Saves extracted data into an Excel file for easy sharing and further analysis.
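
The preprocessing techniques are not named above, so the following is only a minimal sketch of what such a step could look like: grayscale conversion, contrast stretching, and a fixed global threshold, written with NumPy alone. The function names and the threshold value of 128 are assumptions for illustration; the project itself may use different OpenCV operations.

```python
import numpy as np


def to_grayscale(image: np.ndarray) -> np.ndarray:
    """Convert an H x W x 3 RGB array to 8-bit grayscale (BT.601 weights)."""
    if image.ndim == 2:
        return image.astype(np.uint8)
    weights = np.array([0.299, 0.587, 0.114])
    gray = image[..., :3] @ weights
    return np.clip(gray, 0, 255).round().astype(np.uint8)


def stretch_contrast(gray: np.ndarray) -> np.ndarray:
    """Linearly rescale intensities so they span the full 0-255 range."""
    lo, hi = int(gray.min()), int(gray.max())
    if hi == lo:
        # A flat image has no contrast to stretch.
        return gray.copy()
    scaled = (gray.astype(np.float64) - lo) * 255.0 / (hi - lo)
    return scaled.round().astype(np.uint8)


def binarize(gray: np.ndarray, threshold: int = 128) -> np.ndarray:
    """Map pixels at or above the threshold to 255 and the rest to 0."""
    return np.where(gray >= threshold, 255, 0).astype(np.uint8)


def preprocess(image: np.ndarray) -> np.ndarray:
    """Chain the three steps: grayscale, contrast stretch, binarize."""
    return binarize(stretch_contrast(to_grayscale(image)))
```

A fixed threshold is the simplest choice and tends to struggle with uneven lighting; adaptive thresholding is a common alternative for scanned pages.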

Requirements

Python 3.8
opencv-python
numpy
paddleocr
ppstructure
pdfplumber
pandas
Pillow

Installation

Clone the repository:

git clone https://github.com/Adeyemi0/Python-OCR.git
cd Python-OCR

Install the required packages:

pip install opencv-python numpy paddleocr ppstructure pdfplumber pandas Pillow
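
To confirm that the installation worked, a small check along the following lines can report which packages are still missing. This is a sketch using only the standard library; the helper name `missing_packages` is hypothetical, and the mapping lists the usual import name for each pip package (for example, opencv-python imports as `cv2` and Pillow as `PIL`). The ppstructure entry is left out because its import name is not stated here.

```python
from importlib.util import find_spec
from typing import Dict, List

# pip package name -> top-level module name used in import statements
REQUIRED: Dict[str, str] = {
    "opencv-python": "cv2",
    "numpy": "numpy",
    "paddleocr": "paddleocr",
    "pdfplumber": "pdfplumber",
    "pandas": "pandas",
    "Pillow": "PIL",
}


def missing_packages(required: Dict[str, str] = REQUIRED) -> List[str]:
    """Return the pip names of packages whose module cannot be found."""
    return [pkg for pkg, module in required.items() if find_spec(module) is None]
```

Calling `missing_packages()` returns an empty list when every listed module can be located in the current environment.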

Usage

Run the console application from the repository root:

python python-ocr.py

Folders and files

README.md
packages.txt
python-ocr.py
requirements.txt
streamlitocr.py
