Character Recognition, Translation and Summarization Model

https://adaptxt.streamlit.app/

Overview:

This project is an end-to-end Optical Character Recognition (OCR) and translation application that extracts text from images and PDFs and translates it into several languages. It is written in Python and uses the Tesseract OCR engine for recognition, OpenCV for image preprocessing, Googletrans for translation, and Streamlit for the frontend.

Key Features:

OCR Processing:
- Utilizes Tesseract OCR to extract text from images and PDFs.
- Implements image preprocessing techniques, including conversion to grayscale, removal of table lines, and noise reduction, to enhance OCR accuracy.
Translation:
- Translates the extracted text into any of five languages (Hindi, French, Spanish, Mandarin, English) using the Googletrans library.
PDF Support:
- Supports PDF extraction, recognizing text from each page and translating it.
User Interaction:
- Allows the user to choose the target language for translation.
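
The preprocessing steps listed under OCR Processing could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the function names (line_kernel_lengths, preprocess_for_ocr, extract_text), the kernel sizes, and the choice of Otsu thresholding, morphological opening and a median blur are all hypothetical, and the sketch presumes dark text on a light background.

```python
def line_kernel_lengths(width, height, divisor=20, minimum=20):
    """Return (horizontal, vertical) kernel lengths in pixels.

    Strokes at least this long are treated as table lines rather than
    characters. The divisor and minimum are illustrative defaults.
    """
    return max(minimum, width // divisor), max(minimum, height // divisor)


def preprocess_for_ocr(image_bgr):
    """Convert to grayscale, erase long straight lines, then denoise."""
    import cv2  # imported lazily so the helper above works without OpenCV

    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    # Inverted Otsu threshold: ink becomes white (255) on a black background.
    _, ink = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

    h_len, v_len = line_kernel_lengths(gray.shape[1], gray.shape[0])
    h_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (h_len, 1))
    v_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1, v_len))
    # Morphological opening keeps only strokes at least as long as the kernel.
    lines = cv2.bitwise_or(
        cv2.morphologyEx(ink, cv2.MORPH_OPEN, h_kernel),
        cv2.morphologyEx(ink, cv2.MORPH_OPEN, v_kernel),
    )

    cleaned = gray.copy()
    cleaned[lines > 0] = 255  # paint the detected table lines white
    return cv2.medianBlur(cleaned, 3)  # suppress salt-and-pepper noise


def extract_text(path, lang="eng"):
    """Run Tesseract on the preprocessed image and return the text."""
    import cv2
    import pytesseract

    image = cv2.imread(path)
    if image is None:
        raise FileNotFoundError(path)
    return pytesseract.image_to_string(preprocess_for_ocr(image), lang=lang)
```

Calling extract_text("scan.png") would return the recognized text as a string, where scan.png is a placeholder file name. Sizing the kernels relative to the image keeps ordinary character strokes, which are much shorter than a table rule, out of the line mask.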

Frontend with Streamlit:

- The interface is built with Streamlit, so the OCR and translation model can be used without writing code.
- Users upload images or PDFs, and the application displays the recognized text along with translation options.
- A dropdown menu lets the user select the target language.
- Streamlit also simplifies deployment: the application is served as a web page and runs in any browser.
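
A frontend along these lines might look like the sketch below. It is only an assumption about how such a page could be laid out: the names file_kind and main, the widget labels, the accepted image extensions, and the recognize and translate callbacks (stand-ins for the project's OCR and translation functions) are hypothetical.

```python
LANGUAGES = ["Hindi", "French", "Spanish", "Mandarin", "English"]
IMAGE_TYPES = ["png", "jpg", "jpeg"]  # assumed set of accepted image formats


def file_kind(filename):
    """Classify an upload as 'pdf', 'image' or 'unsupported' by extension."""
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    if ext == "pdf":
        return "pdf"
    return "image" if ext in IMAGE_TYPES else "unsupported"


def main(recognize=lambda upload: "", translate=lambda text, language: text):
    """Render the page; recognize and translate are placeholder callbacks."""
    import streamlit as st  # imported here so file_kind is usable on its own

    st.title("OCR and Translation")
    upload = st.file_uploader("Upload an image or PDF", type=IMAGE_TYPES + ["pdf"])
    language = st.selectbox("Target language", LANGUAGES)

    if upload is None:
        st.info("Upload a file to begin.")
        return

    st.caption(f"Detected input type: {file_kind(upload.name)}")
    text = recognize(upload)
    st.text_area("Recognized text", text, height=250)

    if st.button("Translate"):
        translated = translate(text, language)
        st.text_area(f"Translation ({language})", translated, height=250)
```

To try it, add a call to main() at the end of the script and launch the file with streamlit run. Streamlit reruns the whole script on every interaction, which is why the function simply returns early until a file has been uploaded.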

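Dependencies:

Python packages: Pillow, pytesseract, opencv-python, googletrans, PyPDF2, streamlit and nltk. The Tesseract OCR engine must also be installed on the system. Installation commands are listed under How to Run.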

Usage:

Access the OCR and translation model through the Streamlit web interface.
Upload images or PDFs using the provided file upload functionality.
The application processes the input, displays the recognized text, and allows translation into the user-selected language.
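
The translation step in this flow can be sketched as below. Several things here are assumptions rather than facts about the project: the helper names (split_into_chunks, translate_text), the 4000-character chunk size, and the use of the synchronous Translator().translate(...) call found in older googletrans releases (newer releases may expose an async API instead). The language codes are the ones Googletrans uses for the five supported languages, with "zh-cn" standing in for Mandarin.

```python
LANGUAGE_CODES = {
    "Hindi": "hi",
    "French": "fr",
    "Spanish": "es",
    "Mandarin": "zh-cn",  # simplified Chinese
    "English": "en",
}


def split_into_chunks(text, limit=4000):
    """Split text on line boundaries into chunks of at most `limit` characters."""
    chunks, current = [], ""
    for line in text.splitlines(keepends=True):
        # A single line longer than the limit is cut into fixed-size pieces.
        while len(line) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(line[:limit])
            line = line[limit:]
        if len(current) + len(line) > limit:
            chunks.append(current)
            current = ""
        current += line
    if current:
        chunks.append(current)
    return chunks


def translate_text(text, language, translate_fn=None):
    """Translate text into the named language, one chunk at a time.

    translate_fn(chunk, code) is injectable so the function can be tested
    without network access; by default it wraps googletrans.
    """
    code = LANGUAGE_CODES[language]
    if translate_fn is None:
        # Assumption: a googletrans release with a synchronous translate().
        from googletrans import Translator

        translator = Translator()

        def translate_fn(chunk, dest):
            return translator.translate(chunk, dest=dest).text

    pieces = [translate_fn(chunk.rstrip("\n"), code) for chunk in split_into_chunks(text)]
    return "\n".join(pieces)
```

For example, translate_text(recognized_text, "French") would send the recognized text to Googletrans in pieces and join the results. Chunking matters mostly for multi-page PDFs, where the extracted text can exceed what a single request accepts.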

How to Run:

Ensure the required libraries are installed using:

pip install pillow
pip install pytesseract
pip install opencv-python
pip install googletrans
pip install PyPDF2
pip install streamlit
pip install nltk

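Make sure the Tesseract OCR engine itself is installed on your system and available on your PATH; pytesseract is only a Python wrapper around it.

Then start the application with:

streamlit run streamlit_script.py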
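
One way to confirm the Tesseract requirement before launching the app is a small check like the one below. It is a sketch using only the standard library; the function name tesseract_version is hypothetical, and it assumes the executable is called tesseract.

```python
import shutil
import subprocess


def tesseract_version(executable="tesseract"):
    """Return the first line of `tesseract --version`, or None if not found."""
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    # Some builds print the version banner to stderr instead of stdout.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else "unknown version"


if __name__ == "__main__":
    version = tesseract_version()
    if version is None:
        print("Tesseract was not found on PATH; install it before running the app.")
    else:
        print("Found", version)
```

If Tesseract is installed somewhere that is not on PATH, pytesseract can be pointed at it by setting pytesseract.pytesseract.tesseract_cmd to the full path of the executable.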
