# TFOD and EasyOCR for a robust OCR engine
![](https://github.com/Nnamaka/OCR_with_TFOD_and_EasyOCR/raw/main/annotating%20(1).gif)
EasyOCR is a deep learning model trained for OCR (optical character recognition). Its codebase is built on the PyTorch framework, and the model can recognize 83+ languages.
Optical character recognition is the conversion of images of typed, handwritten, or printed text into machine-encoded text, whether from a scanned document, a photo of a document, a scene photo, or subtitle text superimposed on an image. The OCR application developed here combines TFOD and EasyOCR to create a robust OCR system.
This README is a brief walkthrough of the major steps carried out to create this application. Refer to `TFOD_and_EasyOCR.ipynb` for the full procedure.
I used the labelImg tool to label and annotate my images. The annotations are saved in the Pascal VOC format and then converted to TFRecords to be fed into the TFOD pipeline.
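The Pascal VOC annotations that labelImg writes are plain XML files, one per image. A minimal sketch of reading a bounding box back out of one (the filename, label, and coordinates below are made up for illustration):

```python
import xml.etree.ElementTree as ET

# Illustrative Pascal VOC annotation, shaped like what labelImg produces.
# The filename, class name, and coordinates are hypothetical.
voc_xml = """<annotation>
  <filename>doc_001.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>text_region</name>
    <bndbox>
      <xmin>120</xmin><ymin>40</ymin><xmax>360</xmax><ymax>200</ymax>
    </bndbox>
  </object>
</annotation>"""

root = ET.fromstring(voc_xml)
for obj in root.iter('object'):
    label = obj.find('name').text
    box = obj.find('bndbox')
    coords = [int(box.find(tag).text) for tag in ('xmin', 'ymin', 'xmax', 'ymax')]
    print(label, coords)
```

The TFRecord conversion script essentially walks every annotation like this and serializes the boxes and labels alongside the image bytes.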
```python
# Clone the TensorFlow models repo if it is not already present
if not os.path.exists(os.path.join(paths['APIMODEL_PATH'], 'research', 'object_detection')):
    !git clone https://github.com/tensorflow/models {paths['APIMODEL_PATH']}
```
After that we:
- Install TFOD
- Install Dependencies
- Run Verification Script
- Create label map and TFRecords
- Train and Evaluate the model
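The label map from the steps above is a small `.pbtxt` file mapping each class name to an integer id. A minimal sketch of generating one (the single `text_region` class is an assumption for illustration; the real project defines its own classes):

```python
# Hypothetical class list for illustration.
labels = [{'name': 'text_region', 'id': 1}]

# Build the pbtxt text: one "item { ... }" entry per class.
lines = []
for label in labels:
    lines.append('item {')
    lines.append(f"  name: '{label['name']}'")
    lines.append(f"  id: {label['id']}")
    lines.append('}')
label_map = '\n'.join(lines)

with open('label_map.pbtxt', 'w') as f:
    f.write(label_map)
print(label_map)
```

TFOD reads this file to translate the integer class ids in the TFRecords back into human-readable names.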
```python
!pip install easyocr
import easyocr

# Instantiate a reader for the languages we need (English here, as an example)
reader = easyocr.Reader(['en'])
```
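EasyOCR's `readtext()` returns a list of `(bounding_box, text, confidence)` tuples. A sketch of pulling out the recognized strings above a confidence cutoff (the sample result below is made up; real results come from calling `reader.readtext()` on an image region):

```python
# Illustrative readtext()-style output; the boxes, strings, and
# confidences here are invented for this example.
sample_result = [
    ([[10, 10], [200, 10], [200, 40], [10, 40]], 'Invoice No. 4821', 0.93),
    ([[10, 60], [150, 60], [150, 90], [10, 90]], 'Total: $120.00', 0.88),
]

# Keep only text recognized with reasonable confidence
texts = [text for _, text, conf in sample_result if conf > 0.5]
print(texts)
```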
```python
# Keep only the detections whose score passes the threshold
scores = list(filter(lambda x: x > thresh, detections['detection_scores']))
boxes = detections['detection_boxes'][:len(scores)]
classes = detections['detection_classes'][:len(scores)]
```
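This filtering works because TFOD returns detections sorted by score in descending order, so truncating the boxes and classes to `len(scores)` keeps exactly the detections above the threshold. A quick sketch with dummy arrays (the values are invented for illustration):

```python
import numpy as np

# Dummy detections, sorted by score as the TFOD pipeline returns them.
detections = {
    'detection_scores': np.array([0.95, 0.80, 0.40, 0.10]),
    'detection_boxes': np.array([[0.1, 0.1, 0.5, 0.5],
                                 [0.2, 0.2, 0.6, 0.6],
                                 [0.3, 0.3, 0.7, 0.7],
                                 [0.4, 0.4, 0.8, 0.8]]),
    'detection_classes': np.array([1, 1, 2, 2]),
}
thresh = 0.5

scores = list(filter(lambda x: x > thresh, detections['detection_scores']))
boxes = detections['detection_boxes'][:len(scores)]
classes = detections['detection_classes'][:len(scores)]
print(len(scores), boxes.shape)  # only the two high-scoring detections survive
```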
Now we loop through the detections to get our final text recognition.
Note: we need to renormalize the detection boxes. The bounding-box coordinates output by the TFOD pipeline are normalized values that correspond to the pre-processed (resized) image the model actually saw, not to the original document image. To crop the right region out of the original image, the coordinates must be scaled back up to the original image's dimensions.
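Concretely, a TFOD box comes out as `[ymin, xmin, ymax, xmax]` in the range [0, 1], so multiplying element-wise by `[height, width, height, width]` recovers pixel coordinates. A worked sketch with example numbers (the box and image size are invented):

```python
import numpy as np

height, width = 480, 640                # example image dimensions
box = np.array([0.1, 0.2, 0.5, 0.6])    # normalized [ymin, xmin, ymax, xmax]

# Scale each coordinate by the matching image dimension
roi = box * [height, width, height, width]
print(roi)  # pixel-space [ymin, xmin, ymax, xmax]
```

Here `roi` works out to `[48, 128, 240, 384]`, i.e. a crop from row 48 to 240 and column 128 to 384 of the original image.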
```python
height, width = image_np_with_detections.shape[0], image_np_with_detections.shape[1]

for idx, box in enumerate(boxes):
    # Scale the normalized [ymin, xmin, ymax, xmax] box to pixel coordinates
    roi = box * [height, width, height, width]
    # Crop the detected region and run EasyOCR on it
    region = image_np_with_detections[int(roi[0]):int(roi[2]), int(roi[1]):int(roi[3])]
    ocr_result = reader.readtext(region)
    print(ocr_result)
```
## References
- EasyOCR
- labelImg