# Optical Character Recognition using TAO OCRNet-ViT

Transfer learning is the process of transferring learned features from one application to another. It is a commonly used training technique where you take a model trained on one task and retrain it for a different task.

Train Adapt Optimize (TAO) Toolkit is a simple and easy-to-use Python-based AI toolkit for taking purpose-built AI models and customizing them with users' own data.

<img align="center" src="https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/TAO/tlt-tao-toolkit-bring-your-own-model-diagram.png" width="1080">

## Sample prediction of OCRNet
<img align="center" src="https://github.com/vpraveen-nv/model_card_images/blob/main/cv/notebook/ocrnet/OCRNet_inference.png?raw=true" width="960">

## Learning Objectives
In this notebook, you will learn how to leverage the simplicity and convenience of TAO to:

* Take a pretrained OCRNet-ViT model and train it on the ICDAR15 dataset
* Prune the trained OCRNet-ViT model
* Retrain the pruned model to recover lost accuracy
* Run inference on the trained model
* Export the pruned and retrained model to a .onnx file for deployment to DeepStream

## Table of Contents

This notebook shows an example use case of OCRNet-ViT using the Train Adapt Optimize (TAO) Toolkit.

0. [Set up env variables and map drives](#head-0)
1. [Installing the TAO launcher](#head-1)
2. [Prepare dataset and pre-trained model](#head-2) <br>
    2.1 [Download pre-trained model](#head-2-1) <br>
3. [Provide training specification](#head-3)
4. [Run TAO training](#head-4)
5. [Evaluate trained models](#head-5)
6. [Prune trained models](#head-6)
7. [Retrain pruned models](#head-7)
8. [Evaluate retrained model](#head-8)
9. [Run inference](#head-9)
10. [Model Export](#head-10)
11. [Verify deployed model](#head-11)

## 0. Set up env variables and map drives <a class="anchor" id="head-0"></a>

When using the purpose-built pretrained models from NGC, make sure to set the `$KEY` environment variable to the key mentioned in the model overview. Failing to do so can lead to errors when loading them as pretrained models.

The TAO launcher uses docker containers under the hood, and **for our data and results directories to be visible to the docker container, they need to be mapped**. The launcher can be configured using the config file `~/.tao_mounts.json`. Apart from the mounts, you can also configure additional options such as environment variables and the amount of shared memory available to the TAO launcher. <br>

`IMPORTANT NOTE:` The code below creates a sample `~/.tao_mounts.json` file. Here, we map the project, data, specs and results directories. You should configure it for your specific case so these directories are correctly visible to the docker container.


In [None]:
import os

# Please define this local project directory that needs to be mapped to the TAO docker session.
%env LOCAL_PROJECT_DIR=/path/to/local/tao-experiments

os.environ["HOST_DATA_DIR"] = os.path.join(os.getenv("LOCAL_PROJECT_DIR", os.getcwd()), "data", "ocrnet")
os.environ["HOST_RESULTS_DIR"] = os.path.join(os.getenv("LOCAL_PROJECT_DIR", os.getcwd()), "ocrnet")

# Set this path if you don't run the notebook from the samples directory.
# %env NOTEBOOK_ROOT=/path/to/local/tao-experiments/ocrnet
# The sample spec files are present in the same path as the downloaded samples.
os.environ["HOST_SPECS_DIR"] = os.path.join(
    os.getenv("NOTEBOOK_ROOT", os.getcwd()),
    "specs"
)

# Set your encryption key, and use the same key for all commands
%env KEY = nvidia_tao

In [None]:
! mkdir -p $HOST_DATA_DIR
! mkdir -p $HOST_SPECS_DIR
! mkdir -p $HOST_RESULTS_DIR

In [None]:
# Map the local directories into the TAO docker container.
import json
import os

mounts_file = os.path.expanduser("~/.tao_mounts.json")
tlt_configs = {
    "Mounts": [
        # Mapping the local project directory
        {
            "source": os.environ["LOCAL_PROJECT_DIR"],
            "destination": "/workspace/tao-experiments"
        },
        # Mapping the data directory
        {
            "source": os.environ["HOST_DATA_DIR"],
            "destination": "/data"
        },
        # Mapping the specs directory
        {
            "source": os.environ["HOST_SPECS_DIR"],
            "destination": "/specs"
        },
        # Mapping the results directory
        {
            "source": os.environ["HOST_RESULTS_DIR"],
            "destination": "/results"
        },
    ],
    "DockerOptions": {
        "shm_size": "16G",
        "ulimits": {
            "memlock": -1,
            "stack": 67108864
        }
    }
}
# Write the mounts file.
with open(mounts_file, "w") as mfile:
    json.dump(tlt_configs, mfile, indent=4)

In [None]:
!cat ~/.tao_mounts.json
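
Every TAO command in this notebook refers to container paths (`/data`, `/specs`, `/results`), while the shell commands use host paths (`$HOST_DATA_DIR`, ...). The sketch below shows how a host path translates to the container path according to the `Mounts` list. It is an illustration only: `load_mounts` and `to_container_path` are hypothetical helpers, not part of the TAO launcher, and picking the most specific (longest) matching mount is an assumption that mirrors how `HOST_DATA_DIR` is nested inside `LOCAL_PROJECT_DIR` here.

```python
import json
import os
import posixpath
from pathlib import PurePosixPath


def load_mounts(mounts_file="~/.tao_mounts.json"):
    """Read the "Mounts" list from a TAO launcher config file (hypothetical helper)."""
    with open(os.path.expanduser(mounts_file), "r") as f:
        return json.load(f).get("Mounts", [])


def to_container_path(host_path, mounts):
    """Translate a host path into the path seen inside the TAO container.

    If several mounts cover the path (e.g. HOST_DATA_DIR sits inside
    LOCAL_PROJECT_DIR), the mount with the longest source path wins.
    Returns None when no mount covers the path.
    """
    host = PurePosixPath(posixpath.normpath(host_path))
    best = None
    for mount in mounts:
        source = PurePosixPath(posixpath.normpath(mount["source"]))
        # Match the source directory itself or anything below it,
        # not mere string prefixes such as "tao-experiments-old".
        if host == source or source in host.parents:
            if best is None or len(source.parts) > len(best[0].parts):
                best = (source, PurePosixPath(mount["destination"]))
    if best is None:
        return None
    source, destination = best
    return str(destination / host.relative_to(source))
```

For example, `to_container_path(os.path.join(os.environ["HOST_DATA_DIR"], "train", "gt_new.txt"), load_mounts())` returns `/data/train/gt_new.txt`, the same file that the TAO commands reach as `$DATA_DIR/train/gt_new.txt`.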

## 1. Installing the TAO launcher <a class="anchor" id="head-1"></a>
The TAO launcher is a Python package distributed as a Python wheel on PyPI. You can install the launcher by executing the following cell.

Please note that TAO Toolkit recommends running the TAO launcher in a virtual environment with Python 3.6.9. You can follow the instructions on this [page](https://virtualenvwrapper.readthedocs.io/en/latest/install.html) to set up a Python virtual environment using the `virtualenv` and `virtualenvwrapper` packages. Once you have set up virtualenvwrapper, set the version of Python to be used in the virtual environment with the `VIRTUALENVWRAPPER_PYTHON` variable by running:

```sh
export VIRTUALENVWRAPPER_PYTHON=/path/to/bin/python3.x
```
where `x` is between 6 and 8 (inclusive).

We recommend performing this step first and then launching the notebook from the virtual environment. In addition to installing the TAO Python package, make sure the following software requirements are met:
* python >=3.7, <=3.10.x
* docker-ce > 19.03.5
* docker-API 1.40
* nvidia-container-toolkit > 1.3.0-1
* nvidia-container-runtime > 3.4.0-1
* nvidia-docker2 > 2.5.0-1
* nvidia-driver >= 455
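
As a quick sanity check, the snippet below compares the running interpreter against the `python >=3.7, <=3.10.x` range in the list above. It is a minimal sketch: `python_supported` is a hypothetical helper, not part of the TAO launcher, and it only checks the major and minor version. Run it from inside the virtual environment you use for the launcher.

```python
import sys

# Bounds taken from the requirement "python >=3.7, <=3.10.x" listed above.
MIN_VERSION = (3, 7)
MAX_VERSION = (3, 10)


def python_supported(version_info=sys.version_info):
    """Return True if the (major, minor) version lies within the listed range."""
    major_minor = tuple(version_info[:2])
    return MIN_VERSION <= major_minor <= MAX_VERSION


print(f"Python {sys.version.split()[0]} within the listed range: {python_supported()}")
```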

Once you have installed the prerequisites, log in to the docker registry nvcr.io by running the command below:

```sh
docker login nvcr.io
```

You will be prompted to enter a username and password. The username is `$oauthtoken` and the password is the API key generated from `ngc.nvidia.com`. Follow the instructions in the [NGC setup guide](https://docs.nvidia.com/ngc/ngc-overview/index.html#generating-api-key) to generate your own API key.


In [None]:
# SKIP this step IF you have already installed the TAO launcher.
!pip3 install nvidia-tao

In [None]:
# View the versions of the TAO launcher
!tao info

## 2. Prepare dataset and pre-trained model <a class="anchor" id="head-2"></a>

We will use the ICDAR15 word recognition dataset for this tutorial. For more details, visit https://rrc.cvc.uab.es/?ch=4&com=tasks. Download the ICDAR15 word recognition training dataset (https://rrc.cvc.uab.es/?ch=4&com=downloads) to `$HOST_DATA_DIR/train` and the test dataset to `$HOST_DATA_DIR/test`. The cells below expect `ch4_training_word_images_gt.zip` in the train folder, and `ch4_test_word_images_gt.zip` plus `Challenge4_Test_Task3_GT.txt` in the test folder.

In [None]:
# Create local dir
!mkdir -p $HOST_DATA_DIR
!mkdir -p $HOST_RESULTS_DIR

In [None]:
# Check the dataset is present
!if [ ! -f $HOST_DATA_DIR/test/ch4_test_word_images_gt.zip ]; then echo 'Test Image zip file not found, please download.'; else echo 'Found Test Image zip file.';fi
!if [ ! -f $HOST_DATA_DIR/test/Challenge4_Test_Task3_GT.txt ]; then echo 'Test Label file not found, please download.'; else echo 'Found Test Labels file.';fi
!if [ ! -f $HOST_DATA_DIR/train/ch4_training_word_images_gt.zip ]; then echo 'Train zip file not found, please download.'; else echo 'Found Train zip file.';fi

In [None]:
# unpack 
!unzip -u $HOST_DATA_DIR/test/ch4_test_word_images_gt.zip -d $HOST_DATA_DIR/test
!unzip -u $HOST_DATA_DIR/train/ch4_training_word_images_gt.zip -d $HOST_DATA_DIR/train

In [None]:
# verify
!ls -l $HOST_DATA_DIR/

In [None]:
# Preprocess the ground-truth labels so they align with the character list of the pretrained model:
# keep only alphanumeric labels, convert them to lower case (case-insensitive),
# and drop labels longer than 25 characters. Each output line is "<image path>\t<label>".
import os
import re

MAX_LABEL_LENGTH = 25
ALPHANUMERIC = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"


def preprocess_label(gt_file, filtered_file):
    # "utf-8-sig" also strips a leading byte-order mark if the file has one.
    with open(gt_file, "r", encoding="utf-8-sig") as f:
        gt_list = f.readlines()

    filtered_list = []
    for label_line in gt_list:
        # Ground-truth lines look like: word_1.png, "Tiredness"
        try:
            path, label = label_line.strip().split()
        except ValueError:
            # Skip blank lines and labels that contain spaces.
            continue
        path = path[:-1]  # drop the trailing comma after the image name
        label = label.strip("\"")
        if re.search(f"[^{ALPHANUMERIC}]", label):
            continue  # skip labels with non-alphanumeric characters
        if len(label) <= MAX_LABEL_LENGTH:
            filtered_list.append(f"{path}\t{label.lower()}\n")  # ignore the case

    with open(filtered_file, "w") as f:
        f.writelines(filtered_list)


orig_train_gt_file = os.path.join(os.getenv("HOST_DATA_DIR"), "train", "gt.txt")
processed_train_gt_file = os.path.join(os.getenv("HOST_DATA_DIR"), "train", "gt_new.txt")
orig_test_gt_file = os.path.join(os.getenv("HOST_DATA_DIR"), "test", "Challenge4_Test_Task3_GT.txt")
processed_test_gt_file = os.path.join(os.getenv("HOST_DATA_DIR"), "test", "gt_new.txt")
preprocess_label(orig_train_gt_file, processed_train_gt_file)
preprocess_label(orig_test_gt_file, processed_test_gt_file)


In [None]:
# Set the path from the perspective of the TAO docker container
%env DATA_DIR = /data
%env SPECS_DIR = /specs
%env RESULTS_DIR = /results

Next, we convert the raw dataset (images + label list) to LMDB format. LMDB (Lightning Memory-Mapped Database) is a memory-mapped key-value store. Because the data is memory-mapped, reads can be served from RAM, which gives better data I/O bandwidth. However, if you are working on a remote file system that is shared by multiple users at the same time, you should skip the following conversion steps and use the raw dataset loader of OCRNet instead.

In [None]:
# Convert the raw train dataset to lmdb
print("Converting the training set to LMDB.")
!tao model ocrnet dataset_convert -e $SPECS_DIR/experiment-vit.yaml \
                            dataset_convert.input_img_dir=$DATA_DIR/train \
                            dataset_convert.gt_file=$DATA_DIR/train/gt_new.txt \
                            dataset_convert.results_dir=$DATA_DIR/train/lmdb

In [None]:
# Convert the raw test dataset to lmdb
print("Converting the testing set to LMDB.")
!tao model ocrnet dataset_convert -e $SPECS_DIR/experiment-vit.yaml \
                            dataset_convert.input_img_dir=$DATA_DIR/test \
                            dataset_convert.gt_file=$DATA_DIR/test/gt_new.txt \
                            dataset_convert.results_dir=$DATA_DIR/test/lmdb

In [None]:
# The character_list file lists the characters the model can predict, one character per line.
# The model will only classify the characters in this list.
# Generate the character list file for the model:
character_list = "!#$%&'()*+,-./0123456789:;<=>[]^_abcdefghijklmnopqrstuvwxyz|~"
with open(os.path.join(os.getenv("HOST_DATA_DIR"), "character_list"), "w") as f:
    for ch in character_list:
        f.write(f"{ch}\n")
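
Because the model only classifies the characters in `character_list`, it can help to confirm that the processed labels contain nothing outside that list before converting or training. The helper below is a minimal, hypothetical sketch (not part of TAO); it assumes the tab-separated `<image path>\t<label>` lines written by `preprocess_label` and one character per line in the character list file.

```python
def find_unknown_characters(gt_file, character_list_file):
    """Return the set of label characters missing from the character list.

    Assumes one character per line in character_list_file and
    "<image path>\t<label>" lines in gt_file, as written by preprocess_label().
    """
    with open(character_list_file, "r", encoding="utf-8") as f:
        known = {line.rstrip("\n") for line in f if line.rstrip("\n")}

    unknown = set()
    with open(gt_file, "r", encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip("\n").split("\t", 1)
            if len(parts) == 2:
                # Collect every label character that the model cannot predict.
                unknown.update(ch for ch in parts[1] if ch not in known)
    return unknown
```

For example, calling it with `$HOST_DATA_DIR/train/gt_new.txt` and `$HOST_DATA_DIR/character_list` and getting back an empty set means every character in the training labels is covered by the list.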

In [None]:
!ls -rlt $HOST_DATA_DIR/train/lmdb

Additionally, if your own dataset is already in a volume (or folder), you can mount the volume on `HOST_DATA_DIR` (or create a soft link). For example:
```bash
# if your dataset is in /dev/sdc1
mount /dev/sdc1 $HOST_DATA_DIR

# if your dataset is in folder /var/dataset
ln -sf /var/dataset $HOST_DATA_DIR
```

### 2.1 Download pre-trained model <a class="anchor" id="head-2-1"></a>

We will use the NGC CLI to get the pre-trained models. For more details, go to [ngc.nvidia.com](https://ngc.nvidia.com) and click **SETUP** on the navigation bar.

In [None]:
# Installing NGC CLI on the local machine.
## Download and install
%env CLI=ngccli_cat_linux.zip
!mkdir -p $HOST_RESULTS_DIR/ngccli

# Remove any previously existing CLI installations
!rm -rf $HOST_RESULTS_DIR/ngccli/*
!wget "https://ngc.nvidia.com/downloads/$CLI" -P $HOST_RESULTS_DIR/ngccli
!unzip -u "$HOST_RESULTS_DIR/ngccli/$CLI" -d $HOST_RESULTS_DIR/ngccli/
!rm $HOST_RESULTS_DIR/ngccli/*.zip 
os.environ["PATH"]="{}/ngccli/ngc-cli:{}".format(os.getenv("HOST_RESULTS_DIR", ""), os.getenv("PATH", ""))

In [None]:
!ngc registry model list nvidia/tao/ocrnet:*

In [None]:
!mkdir -p $HOST_RESULTS_DIR/pretrained_ocrnet/

In [None]:
# Pull pretrained model from NGC
!ngc registry model download-version nvidia/tao/ocrnet:trainable_v2.0 --dest $HOST_RESULTS_DIR/pretrained_ocrnet

In [None]:
print("Check that model is downloaded into dir.")
!ls -l $HOST_RESULTS_DIR/pretrained_ocrnet/ocrnet_vtrainable_v2.0

## 3. Provide training specification <a class="anchor" id="head-3"></a>
* Dataset configuration
    * To use the newly generated dataset, update the `dataset` parameters in the spec file at `$HOST_SPECS_DIR/experiment-vit.yaml`, or override them on the command line as the commands in this notebook do.
    * You also need to provide the new `character_list_file`.
* Other training (hyper-)parameters such as batch size, number of epochs and learning rate.

In [None]:
!cat $HOST_SPECS_DIR/experiment-vit.yaml

## 4. Run TAO training <a class="anchor" id="head-4"></a>
* Provide the sample spec file and the output directory location for models
* WARNING: training may take several hours to a day to complete

In [None]:
!mkdir -p $HOST_RESULTS_DIR/experiment_dir_unpruned

In [None]:
!tao model ocrnet train -e $SPECS_DIR/experiment-vit.yaml \
              train.results_dir=$RESULTS_DIR/experiment_dir_unpruned \
              train.pretrained_model_path=$RESULTS_DIR/pretrained_ocrnet/ocrnet_vtrainable_v2.0/ocrnet-vit.pth \
              dataset.train_dataset_dir=[$DATA_DIR/train/lmdb] \
              dataset.val_dataset_dir=$DATA_DIR/test/lmdb \
              dataset.character_list_file=$DATA_DIR/character_list

In [None]:
## Training command for multi-GPU training. You can choose the number of GPUs, and which GPUs to use, by setting the `train.gpu_ids` parameter.
## The following command will trigger multi-gpu training on gpu 0 and gpu 1.
# !tao model ocrnet train -e $SPECS_DIR/experiment-vit.yaml \
#               train.gpu_ids=[0,1] \
#               train.results_dir=$RESULTS_DIR/experiment_dir_unpruned \
#               train.pretrained_model_path=$RESULTS_DIR/pretrained_ocrnet/ocrnet_vtrainable_v2.0/ocrnet-vit.pth \
#               dataset.train_dataset_dir=[$DATA_DIR/train/lmdb] \
#               dataset.val_dataset_dir=$DATA_DIR/test/lmdb \
#               dataset.character_list_file=$DATA_DIR/character_list

In [None]:
print('Trained:')
print('---------------------')
!ls -ltrh $HOST_RESULTS_DIR/experiment_dir_unpruned/

## 5. Evaluate trained models <a class="anchor" id="head-5"></a>

In [None]:
!tao model ocrnet evaluate -e $SPECS_DIR/experiment-vit.yaml \
                 evaluate.results_dir=$RESULTS_DIR/experiment_dir_unpruned \
                 evaluate.checkpoint=$RESULTS_DIR/experiment_dir_unpruned/best_accuracy.pth \
                 evaluate.test_dataset_dir=$DATA_DIR/test/lmdb \
                 dataset.character_list_file=$DATA_DIR/character_list

## 6. Prune trained models <a class="anchor" id="head-6"></a>
* Specify the trained model to prune
* Choose the pruning method from [`amount`, `threshold`, `experimental_hybrid`]. This notebook uses `experimental_hybrid` by default.
* Set `threshold` or `amount` for pruning
* A key to save and load the model
* Output directory to store the model

Usually, you only need to adjust `threshold` or `amount` to trade off accuracy against model size. `amount` applies to `amount` and `experimental_hybrid` pruning: the smaller the amount, the smaller the pruned model will be. `threshold` applies to `threshold` pruning: the higher the threshold, the smaller the pruned model will be. You can try several values to find the best trade-off between model size and model accuracy. For more details about the pruning algorithms, refer to the [TAO Toolkit documentation]().
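
To build intuition for why a higher `threshold` yields a smaller model, here is a generic magnitude-pruning illustration in plain Python: each output channel is scored by the L1 norm of its weights and kept only if the score reaches the threshold. This is a simplified sketch under that assumption; it is not TAO's `threshold` or `experimental_hybrid` implementation, and `channels_kept` and the toy weights are hypothetical.

```python
def channels_kept(channel_weights, threshold):
    """Return indices of channels whose L1 norm is >= threshold.

    Generic magnitude criterion for illustration only (not TAO's algorithm).
    """
    kept = []
    for idx, weights in enumerate(channel_weights):
        l1_norm = sum(abs(w) for w in weights)
        if l1_norm >= threshold:
            kept.append(idx)
    return kept


# Toy weights: four output channels with two weights each.
toy_weights = [[0.02, -0.01], [0.5, -0.4], [0.1, 0.05], [1.2, 0.3]]
for threshold in (0.1, 0.5, 1.0):
    kept = channels_kept(toy_weights, threshold)
    print(f"threshold={threshold}: keep {len(kept)}/{len(toy_weights)} channels {kept}")
```

Raising the threshold from 0.1 to 1.0 keeps three, then two, then one of the four toy channels, which is the size-versus-capacity trade-off the paragraph above describes.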

In [None]:
!mkdir -p $HOST_RESULTS_DIR/experiment_dir_pruned

In [None]:
!tao model ocrnet prune -e $SPECS_DIR/experiment-vit.yaml \
              prune.checkpoint=$RESULTS_DIR/experiment_dir_unpruned/best_accuracy.pth \
              prune.results_dir=$RESULTS_DIR/experiment_dir_pruned/ \
              prune.pruned_file=$RESULTS_DIR/experiment_dir_pruned/pruned_0.1.pth \
              dataset.character_list_file=$DATA_DIR/character_list

In [None]:
!ls -rlth $HOST_RESULTS_DIR/experiment_dir_pruned/

## 7. Retrain pruned models <a class="anchor" id="head-7"></a>
* The model needs to be retrained to recover the accuracy lost during pruning
* Specify the retraining specification
* WARNING: training may take several hours to a day to complete

In [None]:
!mkdir -p $HOST_RESULTS_DIR/experiment_dir_retrain

In [None]:
# Retraining using the pruned model as pretrained weights 
!tao model ocrnet train -e $SPECS_DIR/experiment-vit.yaml \
              train.results_dir=$RESULTS_DIR/experiment_dir_retrain \
              train.resume_training_checkpoint_path=$RESULTS_DIR/experiment_dir_pruned/pruned_0.1.pth \
              dataset.train_dataset_dir=[$DATA_DIR/train/lmdb] \
              dataset.val_dataset_dir=$DATA_DIR/test/lmdb \
              dataset.character_list_file=$DATA_DIR/character_list

In [None]:
# Listing the newly retrained model.
!ls -rlth $HOST_RESULTS_DIR/experiment_dir_retrain/

## 8. Evaluate retrained model <a class="anchor" id="head-8"></a>

In [None]:
!tao model ocrnet evaluate -e $SPECS_DIR/experiment-vit.yaml \
                 evaluate.results_dir=$RESULTS_DIR/experiment_dir_retrain \
                 evaluate.checkpoint=$RESULTS_DIR/experiment_dir_retrain/best_accuracy.pth \
                 evaluate.test_dataset_dir=$DATA_DIR/test/lmdb \
                 dataset.character_list_file=$DATA_DIR/character_list

## 9. Run inference <a class="anchor" id="head-9"></a>
In this section, we run the `inference` task to generate predictions with the retrained model. The predicted labels are printed in the log.

In [None]:
# Copy some test images
!mkdir -p $HOST_DATA_DIR/test_samples
!cp $HOST_DATA_DIR/test/word_100* $HOST_DATA_DIR/test_samples

In [None]:
# Run text recognition inference on the copied test samples
!tao model ocrnet inference -e $SPECS_DIR/experiment-vit.yaml \
                  inference.checkpoint=$RESULTS_DIR/experiment_dir_retrain/best_accuracy.pth \
                  inference.inference_dataset_dir=$DATA_DIR/test_samples \
                  inference.results_dir=$RESULTS_DIR/experiment_dir_retrain/ \
                  dataset.character_list_file=$DATA_DIR/character_list

## 10. Model Export <a class="anchor" id="head-10"></a>

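If you trained a non-QAT model, you can deploy it in FP32, FP16 or INT8 mode. The code block below exports the model to `.onnx`; the precision is then selected when generating the TensorRT engine (`gen_trt_engine.tensorrt.data_type`). For INT8, you need to provide a calibration image directory.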

In [None]:
# tao <task> export will fail if .onnx already exists. So we clear the export folder before tao <task> export
!rm -rf $HOST_RESULTS_DIR/export
# Generate .onnx file using tao container
!mkdir -p $HOST_RESULTS_DIR/export

# Export the model to .onnx
!tao model ocrnet export -e $SPECS_DIR/experiment-vit.yaml \
               export.results_dir=$RESULTS_DIR/export/ \
               export.checkpoint=$RESULTS_DIR/experiment_dir_retrain/best_accuracy.pth \
               export.onnx_file=$RESULTS_DIR/export/best_accuracy.onnx \
               dataset.character_list_file=$DATA_DIR/character_list

Using the `tao deploy` container, you can generate a TensorRT engine and verify the correctness of the generated engine through evaluation and inference.

`tao deploy` produces TensorRT engines optimized for the platform it runs on. Therefore, to get maximum performance, run the `tao deploy` command, which instantiates a deploy container, on your target device with the exported `.onnx` file. The `tao deploy` container only works on x86 platforms with discrete NVIDIA GPUs.

For Jetson devices, download the tao-converter for Jetson from the developer zone link [here](https://developer.nvidia.com/tao-converter).

In [None]:
# Convert to a TensorRT engine (FP32). Set gen_trt_engine.tensorrt.data_type=fp16 for FP16 mode.
!tao deploy ocrnet gen_trt_engine -e $SPECS_DIR/experiment-vit.yaml \
                               gen_trt_engine.onnx_file=$RESULTS_DIR/export/best_accuracy.onnx \
                               gen_trt_engine.trt_engine=$RESULTS_DIR/export/trt.engine \
                               gen_trt_engine.tensorrt.min_batch_size=1 \
                               gen_trt_engine.tensorrt.opt_batch_size=1 \
                               gen_trt_engine.tensorrt.max_batch_size=1 \
                               gen_trt_engine.tensorrt.data_type=fp32

In [None]:
print('Exported model:')
print('------------')
!ls -lh $HOST_RESULTS_DIR/export

## 11. Verify the deployed model <a class="anchor" id="head-11"></a>
Verify the converted engine by running inference and evaluation with TensorRT.

In [None]:
# Infer using TensorRT engine

# Once the engine is created, its batch size cannot be altered. If you wish to run with a different batch size,
# re-run tao deploy ocrnet gen_trt_engine.

!tao deploy ocrnet inference -e $SPECS_DIR/experiment-vit.yaml \
                             inference.trt_engine=$RESULTS_DIR/export/trt.engine \
                             inference.inference_dataset_dir=$DATA_DIR/test_samples \
                             inference.input_width=200 \
                             inference.input_height=64 \
                             dataset.character_list_file=$DATA_DIR/character_list

In [None]:
# Evaluation using TensorRT engine
!tao deploy ocrnet evaluate -e $SPECS_DIR/experiment-vit.yaml \
                            evaluate.trt_engine=$RESULTS_DIR/export/trt.engine \
                            evaluate.test_dataset_dir=$DATA_DIR/test \
                            evaluate.test_dataset_gt_file=$DATA_DIR/test/gt_new.txt \
                            evaluate.input_width=200 \
                            evaluate.input_height=64 \
                            dataset.character_list_file=$DATA_DIR/character_list