<a href="https://colab.research.google.com/github/Nnamaka/OCR_with_TFOD_and_EasyOCR/blob/main/TFOD_and_EasyOCR.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#Perform OCR on selected ROIs (Regions Of Interest) of a custom document
The task is divided into two major parts: text detection and text recognition. For text detection, we take a pretrained model from the TFOD (TensorFlow Object Detection) model zoo, perform transfer learning on it, and train it to detect certain ROIs on the custom document. This notebook names CenterNet_MobileNetV2_FPN_512x512 for that role, but note that the cells below, as written, download and configure `ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8`; the CenterNet names appear only in commented-out lines. For text recognition, I will use the OCR model EasyOCR. There are other great OCR options, e.g. Tesseract and PaddleOCR.


#**Part 1 - ROI detection(Text detection)**

###Create our folder structure

In [None]:
import os

In [None]:
import tensorflow as tf
print(tf.__version__)

Declaring and Assigning variable names.

In [None]:
CUSTOM_MODEL_NAME = 'my_ssd_mobnet' 
PRETRAINED_MODEL_NAME = 'ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8'
PRETRAINED_MODEL_URL = 'http://download.tensorflow.org/models/object_detection/tf2/20200711/ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8.tar.gz'
TF_RECORD_SCRIPT_NAME = 'generate_tfrecord.py'
LABEL_MAP_NAME = 'label_map.pbtxt'

Store paths in the `paths` dictionary.

In [None]:
paths = {
    'WORKSPACE_PATH': os.path.join('Tensorflow', 'workspace'),
    'SCRIPTS_PATH': os.path.join('Tensorflow','scripts'),
    'APIMODEL_PATH': os.path.join('Tensorflow','models'),
    'ANNOTATION_PATH': os.path.join('Tensorflow', 'workspace','annotations'),
    'IMAGE_PATH': os.path.join('Tensorflow', 'workspace','images'),
    'MODEL_PATH': os.path.join('Tensorflow', 'workspace','models'),
    'PRETRAINED_MODEL_PATH': os.path.join('Tensorflow', 'workspace','pre-trained-models'),
    'CHECKPOINT_PATH': os.path.join('Tensorflow', 'workspace','models',CUSTOM_MODEL_NAME), 
    'OUTPUT_PATH': os.path.join('Tensorflow', 'workspace','models',CUSTOM_MODEL_NAME, 'export'), 
    'TFJS_PATH':os.path.join('Tensorflow', 'workspace','models',CUSTOM_MODEL_NAME, 'tfjsexport'), 
    'TFLITE_PATH':os.path.join('Tensorflow', 'workspace','models',CUSTOM_MODEL_NAME, 'tfliteexport'), 
    'PROTOC_PATH':os.path.join('Tensorflow','protoc')
 }

In [None]:
files = {
    'PIPELINE_CONFIG':os.path.join('Tensorflow', 'workspace','models', CUSTOM_MODEL_NAME, 'pipeline.config'),
    'TF_RECORD_SCRIPT': os.path.join(paths['SCRIPTS_PATH'], TF_RECORD_SCRIPT_NAME), 
    'LABELMAP': os.path.join(paths['ANNOTATION_PATH'], LABEL_MAP_NAME)
}

Create directories.

In [None]:
for path in paths.values():
    # os.makedirs creates intermediate folders and behaves the same on posix and nt
    os.makedirs(path, exist_ok=True)

###Download model and install TFOD(Tensorflow object detection)

Clone the TensorFlow models repository, which contains the Object Detection API.

In [None]:
if not os.path.exists(os.path.join(paths['APIMODEL_PATH'], 'research', 'object_detection')):
    !git clone https://github.com/tensorflow/models {paths['APIMODEL_PATH']}

Install TFOD.

`posix` covers Unix-like systems, e.g. Ubuntu and macOS.

`nt` is Windows.

In [None]:
if os.name=='posix':  
    !apt-get install protobuf-compiler
    !cd Tensorflow/models/research && protoc object_detection/protos/*.proto --python_out=. && cp object_detection/packages/tf2/setup.py . && python -m pip install . 
    
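if os.name=='nt':
    # wget is not imported anywhere above; install and import it before calling wget.download
    !pip install wget
    import wget
    url="https://github.com/protocolbuffers/protobuf/releases/download/v3.15.6/protoc-3.15.6-win64.zip"
    wget.download(url)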
    !move protoc-3.15.6-win64.zip {paths['PROTOC_PATH']}
    !cd {paths['PROTOC_PATH']} && tar -xf protoc-3.15.6-win64.zip
    os.environ['PATH'] += os.pathsep + os.path.abspath(os.path.join(paths['PROTOC_PATH'], 'bin'))   
    !cd Tensorflow/models/research && protoc object_detection/protos/*.proto --python_out=. && copy object_detection\\packages\\tf2\\setup.py setup.py && python setup.py build && python setup.py install
    !cd Tensorflow/models/research/slim && pip install -e .

Install extra dependencies.

In [None]:
!pip install --upgrade opencv-contrib-python

!pip uninstall opencv-python==4.1.2.30 -y
!pip install opencv-python==4.5.5.64

!pip uninstall opencv-python-headless==4.1.2.30 -y
!pip install opencv-python-headless==4.5.5.64

In [None]:
!pip install dill==0.3.4 cloudpickle==1.2.0 requests==2.23.0 folium==0.2.1 imgaug==0.2.5

Run Verification Script.

In [None]:
VERIFICATION_SCRIPT = os.path.join(paths['APIMODEL_PATH'], 'research', 'object_detection', 'builders', 'model_builder_tf2_test.py')
# Verify Installation
!python {VERIFICATION_SCRIPT}

Install/Upgrade Tensorflow if necessary.

In [None]:
#!pip install tensorflow --upgrade
#!pip install tensorflow --upgrade tensorflow==1.15
# !pip install tensorflow --upgrade tensorflow==2.8.0
# !pip install tensorflow --upgrade tensorflow==2.5

Import the `object_detection` package as a sanity check.

In [None]:
import object_detection

Download Pretrained Model.

In [None]:
# the name of the model is different when you download it
# name_ext = "centernet_mobilenetv2fpn_512x512_coco17_od"

if os.name =='posix':
    !wget {PRETRAINED_MODEL_URL}

    !mv {PRETRAINED_MODEL_NAME+'.tar.gz'} {paths['PRETRAINED_MODEL_PATH']}
    # !mv {name_ext+'.tar.gz'} {paths['PRETRAINED_MODEL_PATH']}

    !cd {paths['PRETRAINED_MODEL_PATH']} && tar -zxvf {PRETRAINED_MODEL_NAME+'.tar.gz'}
    # !cd {paths['PRETRAINED_MODEL_PATH']} && tar -zxvf {name_ext+'.tar.gz'}

if os.name == 'nt':
    import wget  # needed for wget.download; run `pip install wget` first if it is missing
    wget.download(PRETRAINED_MODEL_URL)

    # Windows has no `mv` command, so use `move`
    !move {PRETRAINED_MODEL_NAME+'.tar.gz'} {paths['PRETRAINED_MODEL_PATH']}
    # !move {name_ext+'.tar.gz'} {paths['PRETRAINED_MODEL_PATH']}

    !cd {paths['PRETRAINED_MODEL_PATH']} && tar -zxvf {PRETRAINED_MODEL_NAME+'.tar.gz'}
    # !cd {paths['PRETRAINED_MODEL_PATH']} && tar -zxvf {name_ext+'.tar.gz'}

###Create Label Map.

Modify the `labels` list so that it matches the labels you used when annotating your dataset.

For this particular OCR task, I am targeting two ROIs (regions of interest) on a document.

> Therefore, I have them labeled as `chapter` and `title`.


> Please note that these labels are case sensitive. Be consistent with whatever label names you used when annotating your dataset.

 

In [None]:
# example:
# labels = [{'name':'ThumbsUp', 'id':1}, {'name':'ThumbsDown', 'id':2}, {'name':'ThankYou', 'id':3}, {'name':'LiveLong', 'id':4}]

labels = [{'name':'chapter', 'id':1}, {'name':'title', 'id':2}]


with open(files['LABELMAP'], 'w') as f:
    for label in labels:
        f.write('item { \n')
        f.write('\tname:\'{}\'\n'.format(label['name']))
        f.write('\tid:{}\n'.format(label['id']))
        f.write('}\n')
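
To check what ends up in the label map before writing it to disk, the same text can be built as a string first. This is a minimal sketch: `build_label_map` is a hypothetical helper (not part of TFOD) that reproduces the `item { name id }` layout written by the cell above.

```python
def build_label_map(labels):
    """Return label map text in the `item { name id }` layout used above."""
    blocks = []
    for label in labels:
        blocks.append(
            "item {\n"
            f"\tname:'{label['name']}'\n"
            f"\tid:{label['id']}\n"
            "}\n"
        )
    return "".join(blocks)


labels = [{'name': 'chapter', 'id': 1}, {'name': 'title', 'id': 2}]
label_map_text = build_label_map(labels)
print(label_map_text)
```

Printing the text makes it easy to spot a misspelled or wrongly cased label before training starts.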

###Create TF Records

I stored my dataset in Google Drive.

In [None]:
from google.colab import drive
drive.mount('/content/drive')



> Note: My dataset has already been annotated and split into `train` and `test` sets. I then compressed it into `archive.tar.gz` and uploaded it to my Google Drive.

The command to compress your images is:

> `tar -czf {ARCHIVE_PATH} {TRAIN_PATH} {TEST_PATH}`
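
If you prefer to build the archive from Python (for example on Windows, where `tar` may not be on the path), the standard `tarfile` module can do the same job. This is only a sketch: `make_archive` is a hypothetical helper, and the demo folders are throwaway placeholders rather than a real dataset.

```python
import os
import tarfile
import tempfile


def make_archive(archive_path, folders):
    """Pack each folder into a gzip-compressed tar under its base name."""
    with tarfile.open(archive_path, "w:gz") as tar:
        for folder in folders:
            tar.add(folder, arcname=os.path.basename(os.path.normpath(folder)))


# Demo on throwaway folders; replace these with your real train/test folders.
workdir = tempfile.mkdtemp()
for split in ("train", "test"):
    os.makedirs(os.path.join(workdir, split))
    with open(os.path.join(workdir, split, "sample.txt"), "w") as f:
        f.write("placeholder")

archive_path = os.path.join(workdir, "archive.tar.gz")
make_archive(
    archive_path,
    [os.path.join(workdir, "train"), os.path.join(workdir, "test")],
)

# List the archive contents to confirm both splits were packed.
with tarfile.open(archive_path, "r:gz") as tar:
    archived_names = sorted(tar.getnames())
print(archived_names)
```

Storing each folder under its base name means the archive extracts to plain `train` and `test` folders in whatever directory you extract it into.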







In [None]:
!cp '/content/drive/MyDrive/TFOD images/archive.tar.gz' {paths['IMAGE_PATH']}

Extract the archive and move the `train` and `test` folders to the images path.

In [None]:
ARCHIVE_FILES = os.path.join(paths['IMAGE_PATH'], 'archive.tar.gz')
if os.path.exists(ARCHIVE_FILES):
  !tar -zxvf {ARCHIVE_FILES}
  !mv '/content/test' '/content/train' {paths['IMAGE_PATH']}

In [None]:
# Use this instead if the archive extracted into /content/newSet
!mv '/content/newSet/test' '/content/newSet/train' {paths['IMAGE_PATH']}

Get the TF Record script and create the TF Records.

In [None]:
if not os.path.exists(files['TF_RECORD_SCRIPT']):
    !git clone https://github.com/nicknochnack/GenerateTFRecord {paths['SCRIPTS_PATH']}

In [None]:
!python {files['TF_RECORD_SCRIPT']} -x {os.path.join(paths['IMAGE_PATH'], 'train')} -l {files['LABELMAP']} -o {os.path.join(paths['ANNOTATION_PATH'], 'train.record')} 
!python {files['TF_RECORD_SCRIPT']} -x {os.path.join(paths['IMAGE_PATH'], 'test')} -l {files['LABELMAP']} -o {os.path.join(paths['ANNOTATION_PATH'], 'test.record')}


###Copy Model config file to training folder

In [None]:
# the folder from the decompressed model changed
# folder_name = "centernet_mobilenetv2_fpn_od"
if os.name =='posix':
    
    !cp {os.path.join(paths['PRETRAINED_MODEL_PATH'], PRETRAINED_MODEL_NAME, 'pipeline.config')} {os.path.join(paths['CHECKPOINT_PATH'])}
    # !cp {os.path.join(paths['PRETRAINED_MODEL_PATH'], folder_name, 'pipeline.config')} {os.path.join(paths['CHECKPOINT_PATH'])}
if os.name == 'nt':
    !copy {os.path.join(paths['PRETRAINED_MODEL_PATH'], PRETRAINED_MODEL_NAME, 'pipeline.config')} {os.path.join(paths['CHECKPOINT_PATH'])}
    # !copy {os.path.join(paths['PRETRAINED_MODEL_PATH'], folder_name, 'pipeline.config')} {os.path.join(paths['CHECKPOINT_PATH'])}

Update the config file for transfer learning.

In [None]:
import tensorflow as tf
from object_detection.utils import config_util
from object_detection.protos import pipeline_pb2
from google.protobuf import text_format

In [None]:
config = config_util.get_configs_from_pipeline_file(files['PIPELINE_CONFIG'])

In [None]:
config

In [None]:
pipeline_config = pipeline_pb2.TrainEvalPipelineConfig()
with tf.io.gfile.GFile(files['PIPELINE_CONFIG'], "r") as f:
    proto_str = f.read()
    text_format.Merge(proto_str, pipeline_config)

In [None]:
pipeline_config.model.ssd.num_classes = len(labels)
pipeline_config.train_config.batch_size = 4
pipeline_config.train_config.fine_tune_checkpoint = os.path.join(paths['PRETRAINED_MODEL_PATH'], PRETRAINED_MODEL_NAME, 'checkpoint', 'ckpt-0')
pipeline_config.train_config.fine_tune_checkpoint_type = "detection"
pipeline_config.train_input_reader.label_map_path= files['LABELMAP']
pipeline_config.train_input_reader.tf_record_input_reader.input_path[:] = [os.path.join(paths['ANNOTATION_PATH'], 'train.record')]
pipeline_config.eval_input_reader[0].label_map_path = files['LABELMAP']
pipeline_config.eval_input_reader[0].tf_record_input_reader.input_path[:] = [os.path.join(paths['ANNOTATION_PATH'], 'test.record')]

In [None]:
config_text = text_format.MessageToString(pipeline_config)
with tf.io.gfile.GFile(files['PIPELINE_CONFIG'], "wb") as f:
    f.write(config_text)

###Train the model

In [None]:
TRAINING_SCRIPT = os.path.join(paths['APIMODEL_PATH'], 'research', 'object_detection', 'model_main_tf2.py')

In [None]:
command = "python {} --model_dir={} --pipeline_config_path={} --num_train_steps=2000".format(TRAINING_SCRIPT, paths['CHECKPOINT_PATH'],files['PIPELINE_CONFIG'])


In [None]:
print(command)

In [None]:
!{command}

###Evaluate the Model

In [None]:
command = "python {} --model_dir={} --pipeline_config_path={} --checkpoint_dir={}".format(TRAINING_SCRIPT, paths['CHECKPOINT_PATH'],files['PIPELINE_CONFIG'], paths['CHECKPOINT_PATH'])


In [None]:
print(command)


In [None]:
!{command}


###Load Trained Model from Checkpoint

In [None]:
import os
import tensorflow as tf
from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as viz_utils
from object_detection.builders import model_builder
from object_detection.utils import config_util

In [None]:
# gpu_device = tf.config.list_physical_devices('GPU')
# mem_alloc = [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=5120)]
# tf.config.experimental.set_virtual_device_configuration(
#  gpu_device[0],mem_alloc)

NOTE: To load the latest checkpoint, look in `Tensorflow/workspace/models/my_ssd_mobnet` for the highest checkpoint number, then update the checkpoint name (`'ckpt-3'` below) in the path passed to `ckpt.restore()`.

In [None]:
# Load pipeline config and build a detection model
configs = config_util.get_configs_from_pipeline_file(files['PIPELINE_CONFIG'])
detection_model = model_builder.build(model_config=configs['model'], is_training=False)

# Restore checkpoint
ckpt = tf.compat.v2.train.Checkpoint(model=detection_model)
ckpt.restore(os.path.join(paths['CHECKPOINT_PATH'], 'ckpt-3')).expect_partial()

@tf.function
def detect_fn(image):
    image, shapes = detection_model.preprocess(image)
    prediction_dict = detection_model.predict(image, shapes)
    detections = detection_model.postprocess(prediction_dict, shapes)
    return detections
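
Instead of reading the checkpoint number off the folder by hand, it can be looked up in code. The sketch below assumes checkpoints are saved as `ckpt-N.index` files, which is what the training run above produces as far as I can tell; `latest_checkpoint_prefix` is a hypothetical helper. TensorFlow also ships `tf.train.latest_checkpoint(checkpoint_dir)`, which serves the same purpose.

```python
import os
import re
import tempfile


def latest_checkpoint_prefix(checkpoint_dir):
    """Return the path prefix of the highest-numbered `ckpt-N`, or None if there is none."""
    pattern = re.compile(r"^ckpt-(\d+)\.index$")
    numbers = []
    for file_name in os.listdir(checkpoint_dir):
        match = pattern.match(file_name)
        if match:
            numbers.append(int(match.group(1)))
    if not numbers:
        return None
    return os.path.join(checkpoint_dir, "ckpt-{}".format(max(numbers)))


# Demo with empty placeholder files standing in for real checkpoints.
demo_dir = tempfile.mkdtemp()
for n in (1, 2, 10):
    open(os.path.join(demo_dir, "ckpt-{}.index".format(n)), "w").close()

latest = latest_checkpoint_prefix(demo_dir)
print(latest)
```

The returned prefix could be passed to `ckpt.restore()` in place of the hard-coded `'ckpt-3'` path. The numbers are compared as integers, so `ckpt-10` correctly ranks above `ckpt-2`.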

###Detect from an Image

In [None]:
import cv2 
import numpy as np
from matplotlib import pyplot as plt
%matplotlib inline

In [None]:
category_index = label_map_util.create_category_index_from_labelmap(files['LABELMAP'])

Get the image.

In [None]:
name = 'IMG_20220514_171412_601.jpg'
IMAGE_PATH = os.path.join(paths['IMAGE_PATH'], 'test', name)

In [None]:
img = cv2.imread(IMAGE_PATH)
image_np = np.array(img)

input_tensor = tf.convert_to_tensor(np.expand_dims(image_np, 0), dtype=tf.float32)
detections = detect_fn(input_tensor)

num_detections = int(detections.pop('num_detections'))
detections = {key: value[0, :num_detections].numpy()
              for key, value in detections.items()}
detections['num_detections'] = num_detections

# detection_classes should be ints.
detections['detection_classes'] = detections['detection_classes'].astype(np.int64)

label_id_offset = 1
image_np_with_detections = image_np.copy()

viz_utils.visualize_boxes_and_labels_on_image_array(
            image_np_with_detections,
            detections['detection_boxes'],
            detections['detection_classes']+label_id_offset,
            detections['detection_scores'],
            category_index,
            use_normalized_coordinates=True,
            max_boxes_to_draw=5,
            min_score_thresh=.8,
            agnostic_mode=False)

plt.imshow(cv2.cvtColor(image_np_with_detections, cv2.COLOR_BGR2RGB))
plt.show()

#**Part 2 - Applying OCR (Text Recognition)**

In Part 1, we detected the regions of interest on our document (text detection). Now we will take each extracted region of interest and run it through an OCR model in order to interpret the text in it. This is called text recognition.

[EasyOCR](https://github.com/JaidedAI/EasyOCR) is the OCR model we will use in this project. It can run on the GPU, so we need to leave it some GPU memory.

If you think you don't have enough GPU memory, follow the instructions below.

***Follow this cell only if you don't have enough GPU memory***

Before we run our text detection model, we need to partition the GPU memory so that we can run text recognition with EasyOCR right after. This has to happen before TensorFlow first uses the GPU.

To do this, a commented-out piece of code sits right after the imports in the ***Load Trained Model from Checkpoint*** section of the notebook. Uncomment it to use it.

See the code below:

```
gpu_device = tf.config.list_physical_devices('GPU')
mem_alloc = [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=5120)]
tf.config.experimental.set_virtual_device_configuration(
 gpu_device[0],mem_alloc)
```


Let's see what our `detections` dictionary contains.

In [None]:
detections.keys()

Install EasyOCR.

In [None]:
!pip install easyocr

In [None]:
import easyocr

thresh = 0.7

Recall that our image with its detections is saved in the `image_np_with_detections` variable.

In [None]:
scores = list(filter(lambda x: x >thresh, detections['detection_scores']))
boxes = detections['detection_boxes'][:len(scores)]
classes = detections['detection_classes'][:len(scores)]
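
The slicing above keeps the first `len(scores)` boxes, which only lines up with the filtered scores if the detections are sorted by descending score. If you would rather not depend on that ordering, each score can be filtered together with its own box and class. This is a minimal sketch with made-up values; `filter_detections` is a hypothetical helper, and it also accepts the NumPy arrays stored in `detections`.

```python
def filter_detections(scores, boxes, classes, thresh):
    """Keep only the detections whose score is strictly above `thresh`."""
    kept = [
        (score, box, cls)
        for score, box, cls in zip(scores, boxes, classes)
        if score > thresh
    ]
    kept_scores = [item[0] for item in kept]
    kept_boxes = [item[1] for item in kept]
    kept_classes = [item[2] for item in kept]
    return kept_scores, kept_boxes, kept_classes


# Made-up values, deliberately NOT sorted by score.
demo_scores = [0.95, 0.40, 0.82]
demo_boxes = [[0.1, 0.1, 0.2, 0.9], [0.3, 0.1, 0.4, 0.9], [0.5, 0.1, 0.6, 0.9]]
demo_classes = [0, 1, 1]

kept_scores, kept_boxes, kept_classes = filter_detections(
    demo_scores, demo_boxes, demo_classes, thresh=0.7)
print(kept_scores, kept_classes)
```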

Get the image height and width, which we need to rescale the normalized detection boxes to pixel coordinates.

In [None]:
height, width = image_np_with_detections.shape[0], image_np_with_detections.shape[1]

In [None]:
height

In [None]:
width
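
TFOD returns each box as normalized `[ymin, xmin, ymax, xmax]` values between 0 and 1, so rescaling means multiplying the y values by the height and the x values by the width. This is a minimal sketch on made-up numbers; `to_pixel_box` is a hypothetical helper.

```python
def to_pixel_box(box, height, width):
    """Convert a normalized [ymin, xmin, ymax, xmax] box to integer pixel coordinates."""
    ymin, xmin, ymax, xmax = box
    return (
        int(ymin * height),
        int(xmin * width),
        int(ymax * height),
        int(xmax * width),
    )


# Made-up box on a made-up image that is 1000 pixels high and 800 pixels wide.
y1, x1, y2, x2 = to_pixel_box([0.125, 0.25, 0.5, 0.75], height=1000, width=800)
print(y1, x1, y2, x2)
# With a NumPy image, the region could then be cropped as image[y1:y2, x1:x2].
```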

Let's go through our detections and apply OCR to those regions.

In [None]:
for idx, box in enumerate(boxes):
  print(box)