
Nnamaka/OCR_with_TFOD_and_EasyOCR


OCR_with_TFOD_and_EasyOCR

TFOD and EasyOCR for a robust OCR engine

EasyOCR

EasyOCR is a Python OCR (optical character recognition) library built on deep learning models. Its code base uses the PyTorch framework, and it can recognize 83+ languages.

Introduction

Optical character recognition is the conversion of images of typed, handwritten, or printed text into machine-encoded text, whether from a scanned document, a photo of a document, a scene photo, or subtitle text superimposed on an image. The OCR application developed here combines TFOD (the TensorFlow Object Detection API) and EasyOCR into a robust OCR system: the TFOD model locates the regions of interest in an image, and EasyOCR reads the text inside each region.

This README is a brief walkthrough of the major steps carried out to create this application. Refer to TFOD_and_EasyOCR.ipynb for the full procedure.

I used the labelImg tool to label and annotate my images. The annotations are saved in the Pascal VOC (XML) format and then converted to TFRecords to be fed into the TFOD pipeline.
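As a rough illustration of what the Pascal VOC to TFRecord conversion has to read, here is a minimal standard-library sketch, assuming labelImg's usual XML layout (`filename`, `size/width`, `size/height`, and one `object/bndbox` per box). It divides the pixel coordinates by the image size, since TFOD TFRecords store boxes as fractions in [0, 1]. The function `parse_voc` and the label `text_region` are hypothetical; the notebook's own conversion script, and the step that writes `tf.train.Example` records, are not shown here.

```python
import xml.etree.ElementTree as ET


def parse_voc(xml_text):
    """Read one labelImg Pascal VOC annotation and return normalized boxes.

    Returns (filename, objects) where each object is
    (class_name, xmin, ymin, xmax, ymax), with x divided by the image width
    and y divided by the image height (the [0, 1] form used in TFRecords).
    """
    root = ET.fromstring(xml_text)
    filename = root.findtext("filename")
    width = float(root.findtext("size/width"))
    height = float(root.findtext("size/height"))
    objects = []
    for obj in root.findall("object"):
        name = obj.findtext("name")
        bnd = obj.find("bndbox")
        xmin = float(bnd.findtext("xmin")) / width
        ymin = float(bnd.findtext("ymin")) / height
        xmax = float(bnd.findtext("xmax")) / width
        ymax = float(bnd.findtext("ymax")) / height
        objects.append((name, xmin, ymin, xmax, ymax))
    return filename, objects


# Hypothetical annotation for a 640 x 480 image with one labelled region
SAMPLE = """<annotation>
  <filename>doc1.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>text_region</name>
    <bndbox><xmin>64</xmin><ymin>48</ymin><xmax>320</xmax><ymax>240</ymax></bndbox>
  </object>
</annotation>"""

print(parse_voc(SAMPLE))
```

For this sample the region from pixel (64, 48) to (320, 240) becomes (0.1, 0.1, 0.5, 0.5) in normalized coordinates.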

Steps

Step 1 - Download the TFOD repo and requirements

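import os  # paths is the dict of project folders defined elsewhere in the notebook
# Clone the TensorFlow models repo (home of the Object Detection API) only if it is not already there
if not os.path.exists(os.path.join(paths['APIMODEL_PATH'], 'research', 'object_detection')):
    !git clone https://github.com/tensorflow/models {paths['APIMODEL_PATH']}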

After that we:

  • Install TFOD
  • Install Dependencies
  • Run Verification Script
  • Create the label map and TFRecords
  • Train and Evaluate the model
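The label map created in these steps is a small text file (`label_map.pbtxt`) that ties each class name to an integer ID. A minimal sketch of generating it, assuming a single hypothetical class called `text_region` (the notebook's actual label names may differ); IDs start at 1 because the Object Detection API reserves 0 for the background class. `label_map_text` and `write_label_map` are illustrative helpers, not part of the TFOD API.

```python
def label_map_text(labels):
    """Build the text of a TFOD label map (.pbtxt) from a list of class names.

    IDs start at 1; 0 is reserved for the background class.
    """
    items = []
    for idx, name in enumerate(labels, start=1):
        items.append(f"item {{\n  name: '{name}'\n  id: {idx}\n}}\n")
    return "".join(items)


def write_label_map(labels, path):
    """Write the label map text to `path` (e.g. a hypothetical annotations/label_map.pbtxt)."""
    with open(path, "w") as f:
        f.write(label_map_text(labels))


print(label_map_text(["text_region"]))
```

The printed file contains one `item { name: ... id: ... }` block per class, which is the format the TFOD training config's `label_map_path` points to.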

Step 2 - Install EasyOCR and import it into our environment

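!pip install easyocr
import easyocr
reader = easyocr.Reader(['en'])  # English models (assumed language); this is the `reader` used in Step 4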

Step 3 - Filter the detections from our TFOD model

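thresh = 0.7  # example score cut-off (assumed value); tune it for your data
# TFOD returns detections sorted by score, highest first, so slicing boxes and
# classes to len(scores) keeps exactly the detections above the threshold
scores = list(filter(lambda x: x > thresh, detections['detection_scores']))
boxes = detections['detection_boxes'][:len(scores)]
classes = detections['detection_classes'][:len(scores)]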
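The slicing in Step 3 relies on the detections arriving sorted by score, which TFOD normally guarantees. If that ordering is ever not guaranteed (for example after merging detections from several sources), a sketch of an order-independent alternative is to keep each score, box, and class together while filtering. `filter_detections` is a hypothetical helper, and the sample values below are made up for illustration.

```python
def filter_detections(detections, thresh):
    """Keep only detections whose score is above `thresh`.

    Pairs each score with its box and class instead of slicing, so it does
    not depend on the scores being sorted from highest to lowest.
    """
    kept = [
        (score, box, cls)
        for score, box, cls in zip(
            detections["detection_scores"],
            detections["detection_boxes"],
            detections["detection_classes"],
        )
        if score > thresh
    ]
    scores = [k[0] for k in kept]
    boxes = [k[1] for k in kept]
    classes = [k[2] for k in kept]
    return scores, boxes, classes


# Made-up, deliberately unsorted detections
detections = {
    "detection_scores": [0.95, 0.40, 0.85],
    "detection_boxes": [[0.1, 0.1, 0.2, 0.5], [0.3, 0.3, 0.4, 0.6], [0.6, 0.2, 0.7, 0.9]],
    "detection_classes": [0, 0, 0],
}
scores, boxes, classes = filter_detections(detections, 0.7)
print(scores, boxes, classes)
```

When the inputs are NumPy arrays from TFOD, iterating over `detection_boxes` yields one NumPy row per box, so each kept box can still be multiplied by `[height, width, height, width]` as in Step 4.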

Step 4 - Run inference with the OCR model

Now we loop through the detections and run EasyOCR on each detected region to get the final recognized text.

Note: the detection boxes must be rescaled to the original image size.

The TFOD pipeline resizes and pre-processes each image before it reaches the model, so the bounding boxes it outputs are not pixel positions in the original image. Instead, each box is given as [ymin, xmin, ymax, xmax] in normalized coordinates between 0 and 1. To crop the detected region from the original image, the y values must be multiplied by the image height and the x values by the image width.
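To make the arithmetic concrete, here is a small sketch of the same conversion as a standalone function, with a worked example for a 480 x 640 (height x width) image. Like the loop below, it truncates with `int()`; the clamping to the image bounds is a defensive addition of this sketch, not something the notebook does. `box_to_pixels` is a hypothetical name.

```python
def box_to_pixels(box, height, width):
    """Convert a normalized TFOD box [ymin, xmin, ymax, xmax] to pixel indices.

    y values are scaled by the image height, x values by the image width,
    truncated to ints, and clamped so the crop stays inside the image.
    """
    ymin, xmin, ymax, xmax = box
    top = min(max(int(ymin * height), 0), height)
    left = min(max(int(xmin * width), 0), width)
    bottom = min(max(int(ymax * height), 0), height)
    right = min(max(int(xmax * width), 0), width)
    return top, left, bottom, right


# Worked example: 0.1*480=48, 0.2*640=128, 0.5*480=240, 0.9*640=576
top, left, bottom, right = box_to_pixels([0.1, 0.2, 0.5, 0.9], 480, 640)
print(top, left, bottom, right)
```

The crop is then `image[top:bottom, left:right]`, i.e. rows 48 to 240 and columns 128 to 576 in this example.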

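# Original image size in pixels (rows, columns)
height, width = image_np_with_detections.shape[0], image_np_with_detections.shape[1]
for idx, box in enumerate(boxes):
    # box is [ymin, xmin, ymax, xmax] in [0, 1]; scale it to pixel coordinates
    roi = box * [height, width, height, width]
    # Crop the detected region: rows ymin:ymax, columns xmin:xmax
    region = image_np_with_detections[int(roi[0]):int(roi[2]), int(roi[1]):int(roi[3])]
    # Run EasyOCR on the crop; each entry is (bounding box, text, confidence)
    ocr_result = reader.readtext(region)
    print(ocr_result)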
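With its default settings, `reader.readtext` returns a list of `(bounding_box, text, confidence)` entries, so the printed `ocr_result` usually needs one more step to become plain text. A minimal sketch, assuming that default output shape; `join_ocr_text` and its `min_conf=0.5` default are hypothetical choices, and the sample entries are made up. (Passing `detail=0` to `readtext` returns only the text strings, at the cost of losing the confidences.)

```python
def join_ocr_text(ocr_result, min_conf=0.5):
    """Join the text pieces of an EasyOCR readtext() result.

    Assumes the default output: a list of (bounding_box, text, confidence)
    entries. Pieces with confidence below `min_conf` are dropped.
    """
    return " ".join(text for _, text, conf in ocr_result if conf >= min_conf)


# Made-up result in the shape readtext() returns
sample = [
    ([[0, 0], [90, 0], [90, 30], [0, 30]], "ABC", 0.98),
    ([[100, 0], [180, 0], [180, 30], [100, 30]], "123", 0.91),
    ([[0, 40], [60, 40], [60, 70], [0, 70]], "x?", 0.12),
]
print(join_ocr_text(sample))  # ABC 123
```

Filtering by confidence is a simple way to drop noisy fragments that EasyOCR picks up at the edges of a crop; the right cut-off depends on the documents being read.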

Credits

  • EasyOCR
  • labelImg

Tweet me at Dike Nnamaka
