OCR_excercise

A major course project for the Image Processing course at HUST.

A Vietnamese text recognition and translation desktop app built with Tkinter

Description

The project aims to develop an application that recognizes Vietnamese text in an image and translates it into a language chosen by the user. Text is recognized with the Tesseract OCR engine (called through pytesseract) and then translated with googletrans.

Getting Started

Dependencies

Python 3
OpenCV
Tkinter
googletrans
pytesseract (a Python wrapper; the Tesseract OCR engine itself must be installed separately)

Installing

Clone the repository
Install the dependencies with pip, either one by one or all at once with pip install -r requirements.txt

Executing program

Run the application using the command "python main.py"
Write the text in Vietnamese on the screen
The application will recognize the text and translate it into the desired language

Preprocess the image:

Resize image

def image_resize(image, width = None, height = None, inter = cv2.INTER_AREA):
    # initialize the dimensions of the image to be resized and
    # grab the image size
    dim = None
    (h, w) = image.shape[:2]
    ...
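The body of image_resize is elided above, so the author's exact logic is not shown. As a minimal sketch of how an aspect-ratio-preserving resize is commonly computed, the hypothetical helper below (compute_resize_dim is not a name from the repository) derives the target size from whichever of width or height is given:

```python
def compute_resize_dim(h, w, width=None, height=None):
    """Return (new_width, new_height) that preserves the aspect ratio.

    Hypothetical helper for illustration only; it is not taken from the
    repository and the elided function body may work differently.
    """
    # Nothing requested: keep the original size
    if width is None and height is None:
        return (w, h)
    # Only the height is given: scale the width by the same ratio
    if width is None:
        ratio = height / float(h)
        return (int(w * ratio), height)
    # The width is given: scale the height by the same ratio
    ratio = width / float(w)
    return (width, int(h * ratio))
```

The returned tuple is ordered (width, height), which is the order cv2.resize expects for its size argument, so it could be passed as cv2.resize(image, dim, interpolation=inter).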

Adjust contrast brightness

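def adjust_contrast_brightness(img, contrast: float = 1.0, brightness: int = 0):
    """
    Adjust the contrast and brightness of a uint8 image.
    contrast:   (0.0, inf), with 1.0 leaving the contrast unchanged
    brightness: [-255, 255], with 0 leaving the brightness unchanged
    """
    # Offset the brightness so that mid-gray stays roughly fixed when the contrast changes
    brightness += int(round(255 * (1 - contrast) / 2))
    # addWeighted computes img * contrast + img * 0 + brightness, saturated to [0, 255]
    return cv2.addWeighted(img, contrast, img, 0, brightness)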
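To make the arithmetic concrete, the sketch below reproduces the same formula for a single pixel value in plain Python. The function name adjusted_value is an assumption made for this illustration, and the rounding may differ from OpenCV's by one level in edge cases:

```python
def adjusted_value(pixel, contrast=1.0, brightness=0):
    """Approximate the per-pixel result of adjust_contrast_brightness.

    Hypothetical helper for illustration; not part of the repository.
    """
    # Same offset as in the function above: it re-centres the scaling on mid-gray
    offset = brightness + int(round(255 * (1 - contrast) / 2))
    value = contrast * pixel + offset
    # Clamp to the valid uint8 range, as OpenCV's saturation does
    return max(0, min(255, int(round(value))))


# With contrast 1.5, values move away from mid-gray while mid-gray itself stays put
samples = [adjusted_value(p, contrast=1.5) for p in (50, 128, 200)]
```

With contrast above 1.0, dark pixels get darker and bright pixels get brighter, which tends to separate text from background before thresholding.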

Convert to grayscale

def get_grayscale(image):
    return cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
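For reference, OpenCV documents the color-to-gray conversion as a weighted sum of the channels, Y = 0.299 R + 0.587 G + 0.114 B. The sketch below applies those weights to one BGR pixel in plain Python; bgr_to_gray is a hypothetical name used only here, and OpenCV's internal fixed-point rounding may differ slightly:

```python
def bgr_to_gray(b, g, r):
    """Approximate the gray level of a single BGR pixel.

    Hypothetical helper for illustration; not part of the repository.
    """
    # Green contributes most to perceived brightness, blue the least
    return int(round(0.114 * b + 0.587 * g + 0.299 * r))
```

Because the weights differ per channel, the channel order matters: cv2.imread returns BGR images, which matches COLOR_BGR2GRAY, whereas an image loaded in RGB order would need COLOR_RGB2GRAY instead.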

Apply AdaptiveThreshold

--> Apply OCR on the preprocessed image using Pytesseract.

--> Extract the text output and save it to a text file.

--> Visualize the preprocessed image and the extracted text output.

--> Save the results to file.

--> Translate the extracted text using googletrans.
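The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the repository: the function names ocr_image, translate_text and save_text are hypothetical, lang="vie" assumes the Vietnamese language data for Tesseract is installed, and the translation call assumes a googletrans release in which Translator.translate is synchronous (in some newer releases it is a coroutine and must be awaited).

```python
from pathlib import Path


def ocr_image(image, lang="vie"):
    """Run Tesseract on a preprocessed image and return the recognized text."""
    # Imported lazily so the rest of the module works without Tesseract installed
    import pytesseract
    return pytesseract.image_to_string(image, lang=lang)


def translate_text(text, dest="en", src="vi"):
    """Translate text with googletrans (assumes the synchronous API; needs network access)."""
    from googletrans import Translator
    return Translator().translate(text, src=src, dest=dest).text


def save_text(text, path):
    """Write text to a UTF-8 file so Vietnamese diacritics are preserved."""
    path = Path(path)
    path.write_text(text, encoding="utf-8")
    return path
```

A typical call sequence would be text = ocr_image(binary), then save_text(text, "output.txt"), then translate_text(text, dest="en"), where binary is the thresholded image and output.txt is a placeholder file name.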

User interface

Process

Training:

Tool for training Tesseract: jTessBoxEditor.
The font file to train on must have the .ttf extension, for example TimeNewRoman.ttf.
A text file of roughly 600 KB to 1 MB is needed so that training covers many different characters.
Install the Java Runtime Environment (JRE), the software layer that provides the services needed to run Java applications.

Repository: Thuylt185411/OCRExercise
