# Optical Character Recognition using TAO OCRNet

Transfer learning is the process of transferring learned features from one application to another. It is a commonly used training technique where you use a model trained on one task and re-train to use it on a different task. 

Train Adapt Optimize (TAO) Toolkit is a simple, easy-to-use Python-based AI toolkit for taking purpose-built AI models and customizing them with users' own data.

<img align="center" src="https://developer.nvidia.com/sites/default/files/akamai/TAO/tlt-tao-toolkit-bring-your-own-model-diagram.png" width="1080">

## Learning Objectives
In this notebook, you will learn how to leverage the simplicity and convenience of TAO to:

* Take a pretrained OCDNet/OCRNet model and fine-tune it on the IAM Handwriting dataset

## Table of Contents

This notebook shows an example use case of OCDNet/OCRNet using Train Adapt Optimize (TAO) Toolkit.

0. [Set up env variables and map drives](#head-0)
1. [Installing the TAO launcher](#head-1)
2. [Prepare dataset](#head-2) <br>
3. [Creating a custom character detection (OCDNet) model](#head-3)
4. [Creating a custom character recognition (OCRNet) model](#head-4)


## 0. Set up env variables and map drives <a class="anchor" id="head-0"></a>

This notebook requires an environment variable, `$LOCAL_PROJECT_DIR`, set to the path of the user's workspace. The dataset used by this notebook is expected to reside in `$LOCAL_PROJECT_DIR/data`.

The TAO launcher uses docker containers under the hood, and **for our data and results directory to be visible to the docker, they need to be mapped**. The launcher can be configured using the config file `~/.tao_mounts.json`. Apart from the mounts, you can also configure additional options like the Environment Variables and amount of Shared Memory available to the TAO launcher. <br>

`IMPORTANT NOTE:` The code below creates a sample `~/.tao_mounts.json`  file. Here, we can map directories in which we save the data, specs, results and cache. You should configure it for your specific case so these directories are correctly visible to the docker container.


In [4]:
import os

# Please define this local project directory that needs to be mapped to the TAO docker session.
# %env LOCAL_PROJECT_DIR=/path/to/local/tao-experiments
%env LOCAL_PROJECT_DIR=/hdd_10t/tylerz/devblog/

os.environ["HOST_DATA_DIR"] = os.path.join(os.getenv("LOCAL_PROJECT_DIR", os.getcwd()), "data", "ocdr")
os.environ["HOST_RESULTS_DIR"] = os.path.join(os.getenv("LOCAL_PROJECT_DIR", os.getcwd()), "ocdr")
os.environ["PRE_DATA_DIR"] = os.path.join(os.getenv("LOCAL_PROJECT_DIR", os.getcwd()), "data", "iamdata")

# Set this path if you don't run the notebook from the samples directory.
# %env NOTEBOOK_ROOT=/path/to/local/tao-experiments/ocrnet
# The sample spec files are present in the same path as the downloaded samples.
os.environ["HOST_SPECS_DIR"] = os.path.join(
    os.getenv("NOTEBOOK_ROOT", os.getcwd()),
    "specs"
)


env: LOCAL_PROJECT_DIR=/hdd_10t/tylerz/devblog/


In [5]:
! mkdir -p $HOST_DATA_DIR
! mkdir -p $HOST_SPECS_DIR
! mkdir -p $HOST_RESULTS_DIR
! mkdir -p $PRE_DATA_DIR

In [6]:
# Mapping up the local directories to the TAO docker.
import json
import os
mounts_file = os.path.expanduser("~/.tao_mounts.json")
tlt_configs = {
   "Mounts":[
       # Mapping the data directory
       {
           "source": os.environ["LOCAL_PROJECT_DIR"],
           "destination": "/workspace/tao-experiments"
       },
       {
           "source": os.environ["HOST_DATA_DIR"],
           "destination": "/data"
       },
       {
           "source": os.environ["HOST_SPECS_DIR"],
           "destination": "/specs"
       },
       {
           "source": os.environ["HOST_RESULTS_DIR"],
           "destination": "/results"
       },
   ],
   "DockerOptions": {
        "shm_size": "16G",
        "ulimits": {
            "memlock": -1,
            "stack": 67108864
         }
   }
}
# Writing the mounts file.
with open(mounts_file, "w") as mfile:
    json.dump(tlt_configs, mfile, indent=4)

In [7]:
!cat ~/.tao_mounts.json

{
    "Mounts": [
        {
            "source": "/hdd_10t/tylerz/devblog/",
            "destination": "/workspace/tao-experiments"
        },
        {
            "source": "/hdd_10t/tylerz/devblog/data/ocdr",
            "destination": "/data"
        },
        {
            "source": "/hdd_10t/tylerz/devblog/specs",
            "destination": "/specs"
        },
        {
            "source": "/hdd_10t/tylerz/devblog/ocdr",
            "destination": "/results"
        }
    ],
    "DockerOptions": {
        "shm_size": "16G",
        "ulimits": {
            "memlock": -1,
            "stack": 67108864
        }
    }
}
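
The launcher output later in this notebook notes that Docker runs commands as root unless a `"user": "UID:GID"` entry is added to the `DockerOptions` section of `~/.tao_mounts.json`, and the LMDB files listed further down are indeed owned by root. Below is a minimal sketch of adding that entry before the file is written. The helper name `with_host_user` is hypothetical, and the `1000:1000` values stand in for the output of `id -u` and `id -g` on your machine.

```python
import os

def with_host_user(docker_options, uid=None, gid=None):
    """Return a copy of DockerOptions that carries a "user": "UID:GID" entry."""
    # Fall back to the current user's ids when none are passed (POSIX only).
    uid = os.getuid() if uid is None else uid
    gid = os.getgid() if gid is None else gid
    updated = dict(docker_options)
    updated["user"] = f"{uid}:{gid}"
    return updated

docker_options = {"shm_size": "16G", "ulimits": {"memlock": -1, "stack": 67108864}}
# Placeholder ids for illustration; omit uid/gid to use the current user.
patched = with_host_user(docker_options, uid=1000, gid=1000)
print(patched["user"])
```

The returned dictionary could replace the `"DockerOptions"` value in `tlt_configs` before `json.dump` is called.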

## 1. Installing the TAO launcher <a class="anchor" id="head-1"></a>
The TAO launcher is a python package distributed as a python wheel listed in PyPI. You may install the launcher by executing the following cell.

Please note that TAO Toolkit recommends running the TAO launcher in a virtual env with python 3.6.9. You may follow the instructions on this [page](https://virtualenvwrapper.readthedocs.io/en/latest/install.html) to set up a python virtual env using the `virtualenv` and `virtualenvwrapper` packages. Once you have set up virtualenvwrapper, please set the version of python to be used in the virtual env with the `VIRTUALENVWRAPPER_PYTHON` variable. You may do so by running

```sh
export VIRTUALENVWRAPPER_PYTHON=/path/to/bin/python3.x
```
where x >= 6 and <= 8

We recommend performing this step first and then launching the notebook from the virtual environment. In addition to installing the TAO python package, please make sure the following software requirements are met:
* python >=3.6.9 < 3.8.x
* docker-ce > 19.03.5
* docker-API 1.40
* nvidia-container-toolkit > 1.3.0-1
* nvidia-container-runtime > 3.4.0-1
* nvidia-docker2 > 2.5.0-1
* nvidia-driver > 455+

Once you have installed the pre-requisites, please log in to the docker registry nvcr.io by following the command below

```sh
docker login nvcr.io
```

You will be prompted to enter a username and password. The username is `$oauthtoken` and the password is the API key generated from `ngc.nvidia.com`. Please follow the instructions in the [NGC setup guide](https://docs.nvidia.com/ngc/ngc-overview/index.html#generating-api-key) to generate your own API key.


In [None]:
# SKIP this step IF you have already installed the TAO launcher.
!pip3 install nvidia-tao

In [8]:
# View the versions of the TAO launcher
!tao info

Configuration of the TAO Toolkit Instance
task_group: ['model', 'dataset', 'deploy']
format_version: 3.0
toolkit_version: 5.0.0
published_date: 07/12/2023


## 2. Prepare dataset <a class="anchor" id="head-2"></a>

We will be using the IAM Handwriting Database. For more details, please visit
https://fki.tic.heia-fr.ch/databases/iam-handwriting-database. You will need to register for a free account and manually download the required files. Please review the licensing requirements; these files are for research use only.

There are several archives to download: `ascii.tgz` and the document images, which are split across three archives based on the document title. Place these files in your `data/iamdata` folder (`$PRE_DATA_DIR`):

* ascii.tgz
* formsA-D.tgz
* formsE-H.tgz
* formsI-Z.tgz
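
Because the archives have to be downloaded by hand, it can help to confirm that all four are in place before running the extraction cells. The following is a small sketch of such a check; the helper name `missing_archives` is hypothetical, and the demo runs against a temporary directory so that it does not depend on your data.

```python
import os
import tempfile

REQUIRED_ARCHIVES = ("ascii.tgz", "formsA-D.tgz", "formsE-H.tgz", "formsI-Z.tgz")

def missing_archives(data_dir, required=REQUIRED_ARCHIVES):
    """Return the names of required archives that are not present in data_dir."""
    return [name for name in required
            if not os.path.isfile(os.path.join(data_dir, name))]

# Demo: a temporary folder that contains only ascii.tgz.
with tempfile.TemporaryDirectory() as demo_dir:
    open(os.path.join(demo_dir, "ascii.tgz"), "wb").close()
    demo_missing = missing_archives(demo_dir)
print(demo_missing)
```

In the notebook itself, the call would be `missing_archives(os.environ["PRE_DATA_DIR"])`, and an empty list means every archive was found.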

In [9]:
# Extract only the words.txt file into the data/iamdata directory.
# words.txt contains the transcription and the x, y, w, h bounding box of each word in each image.
!tar -xf $PRE_DATA_DIR/ascii.tgz --directory $PRE_DATA_DIR/ words.txt 
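
If `tar` is not available in your environment, the same single-file extraction can be sketched with Python's standard `tarfile` module. The helper `extract_member` is a hypothetical name, and the demo builds a tiny stand-in archive in a temporary directory; it does not touch the real `ascii.tgz`.

```python
import io
import os
import tarfile
import tempfile

def extract_member(archive_path, member_name, dest_dir):
    """Copy one regular file out of a tar archive into dest_dir and return its path."""
    with tarfile.open(archive_path, "r:*") as tar:
        source = tar.extractfile(member_name)  # raises KeyError if the member is absent
        if source is None:
            raise ValueError(f"{member_name} is not a regular file")
        target = os.path.join(dest_dir, os.path.basename(member_name))
        with source, open(target, "wb") as out:
            out.write(source.read())
    return target

# Demo: build a small archive with two members and extract only words.txt.
with tempfile.TemporaryDirectory() as demo_dir:
    archive = os.path.join(demo_dir, "ascii.tgz")
    with tarfile.open(archive, "w:gz") as tar:
        for name, text in (("words.txt", b"# header\n"), ("lines.txt", b"other\n")):
            info = tarfile.TarInfo(name)
            info.size = len(text)
            tar.addfile(info, io.BytesIO(text))
    out_dir = os.path.join(demo_dir, "out")
    os.makedirs(out_dir)
    extracted = extract_member(archive, "words.txt", out_dir)
    extracted_names = sorted(os.listdir(out_dir))
    with open(extracted, "rb") as handle:
        extracted_bytes = handle.read()
print(extracted_names)
```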

In [10]:
#create directories to hold image data
!mkdir -p $PRE_DATA_DIR/train/img
!mkdir -p $PRE_DATA_DIR/test/img
!mkdir -p $PRE_DATA_DIR/train/gt
!mkdir -p $PRE_DATA_DIR/test/gt

In [11]:
#unpack the images, let's use the first two groups of images for training and the last for validation

!tar -xzf $PRE_DATA_DIR/formsA-D.tgz --directory $PRE_DATA_DIR/train/img
!tar -xzf $PRE_DATA_DIR/formsE-H.tgz --directory $PRE_DATA_DIR/train/img 
!tar -xzf $PRE_DATA_DIR/formsI-Z.tgz --directory $PRE_DATA_DIR/test/img 

In [14]:
# verify
!ls -l $PRE_DATA_DIR/test

total 24
drwxrwxr-x 2 tylerz tylerz  4096 Jul 13 15:42 gt
drwxrwxr-x 2 tylerz tylerz 20480 Jul 13 15:43 img


### 2.1 Prepare dataset for OCDNet

In [15]:
import pandas as pd

def extract_columns(line):
    """Parse one words.txt row into a label file name, four box corners, and the word."""
    parts = line.split()
    # The word id looks like a01-000u-00-01; the form id is its first two fields.
    form_id = '-'.join(parts[0].split('-')[:2])
    filename = 'gt_' + form_id + '.txt'

    word = parts[-1]
    # Columns 3-6 hold the bounding box as x, y, width, height.
    x, y, w, h = (int(value) for value in parts[3:7])
    # Corners in clockwise order, starting from the top-left.
    x2, y2 = x + w, y
    x3, y3 = x + w, y + h
    x4, y4 = x, y + h
    return filename, x, y, x2, y2, x3, y3, x4, y4, word

# Punctuation-only transcriptions that should not become labels.
SKIP_WORDS = {'', ', ,', '. .', '*- -', '; ;', '.', ','}

def process_text_file(file_path):
    data = []
    with open(file_path, 'r') as file:
        # Skip the first 19 lines of words.txt, which covers the metadata header.
        lines = file.readlines()[19:]
        for line in lines:
            filename, x, y, x2, y2, x3, y3, x4, y4, word = extract_columns(line.strip())
            # Keep only rows whose transcription is not in the skip list.
            if word not in SKIP_WORDS:
                data.append([filename, x, y, x2, y2, x3, y3, x4, y4, word])
    # Create a DataFrame from the extracted data
    columns = ['filename', 'x', 'y', 'x2', 'y2', 'x3', 'y3', 'x4', 'y4', 'word']
    df = pd.DataFrame(data, columns=columns)
    return df

# Example usage:
pfile = os.environ["PRE_DATA_DIR"] + '/words.txt'
df = process_text_file(pfile)
        
df.head()

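|   | filename | x | y | x2 | y2 | x3 | y3 | x4 | y4 | word |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | gt_a01-000u.txt | 507 | 766 | 720 | 766 | 720 | 814 | 507 | 814 | MOVE |
| 1 | gt_a01-000u.txt | 796 | 764 | 866 | 764 | 866 | 814 | 796 | 814 | to |
| 2 | gt_a01-000u.txt | 919 | 757 | 1085 | 757 | 1085 | 835 | 919 | 835 | stop |
| 3 | gt_a01-000u.txt | 1185 | 754 | 1311 | 754 | 1311 | 815 | 1185 | 815 | Mr. |
| 4 | gt_a01-000u.txt | 1438 | 746 | 1820 | 746 | 1820 | 819 | 1438 | 819 | Gaitskell |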
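
As a worked example of the corner arithmetic, the first row of the output corresponds to a box at x=507, y=766. Its width of 213 and height of 48 are inferred here from the row itself (720 - 507 and 814 - 766). The sketch below recomputes the eight coordinates from those four values; `box_to_corners` is a hypothetical helper that mirrors what `extract_columns` does.

```python
def box_to_corners(x, y, w, h):
    """Convert an x, y, width, height box into four corners, clockwise from top-left."""
    x, y, w, h = int(x), int(y), int(w), int(h)
    return (x, y,          # top-left
            x + w, y,      # top-right
            x + w, y + h,  # bottom-right
            x, y + h)      # bottom-left

# Values are passed as strings because that is how they come out of words.txt.
corners = box_to_corners("507", "766", "213", "48")
print(corners)
```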


The next cell groups the DataFrame from the previous step by file name and writes one text file per image. Each row of a file holds the four corner points of a word box (eight coordinates) followed by the transcribed word.


In [16]:
# Write one label file per image into the matching train or test gt folder.
# Forms whose id starts with one of these letters belong to the test split (formsI-Z.tgz).
TEST_PREFIXES = {'j', 'k', 'l', 'm', 'n', 'p', 'r'}
for filename, group in df.groupby('filename'):
    pdf = group[['x', 'y', 'x2', 'y2', 'x3', 'y3', 'x4', 'y4', 'word']]
    # filename looks like gt_a01-000u.txt, so index 3 is the first letter of the form id.
    split = 'test' if filename[3] in TEST_PREFIXES else 'train'
    out_path = os.path.join(os.environ["PRE_DATA_DIR"], split, 'gt', filename)
    pdf.to_csv(out_path, index=False, header=False)
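
Each label file written by this cell holds one row per word: eight comma-separated coordinates followed by the transcription. With its default settings, pandas quotes a field that itself contains a comma, so a plain `split(',')` can miscount fields when reading the files back. The sketch below reads a row with the standard `csv` module instead. The helper `parse_label_line` and the second sample row are hypothetical, and whether the TAO data loader treats quoted fields the same way is not confirmed here.

```python
import csv

def parse_label_line(line):
    """Split one label row into four (x, y) points and the transcription."""
    fields = next(csv.reader([line]))
    if len(fields) != 9:
        raise ValueError(f"expected 9 fields, got {len(fields)}: {line!r}")
    coords = [int(value) for value in fields[:8]]
    points = list(zip(coords[0::2], coords[1::2]))
    return points, fields[8]

# First row of gt_a01-000u.txt as produced above.
points, word = parse_label_line("507,766,720,766,720,814,507,814,MOVE")
# Hypothetical row whose transcription contains a comma and is therefore quoted.
_, quoted_word = parse_label_line('1,2,3,2,3,4,1,4,"1,000"')
print(points, word, quoted_word)
```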

In [17]:
# Copy the data from the data prep directory to the host data directory.
# You may wish to remove the original archives and images from the data prep directory afterwards.
!cp -a $PRE_DATA_DIR/train/. $HOST_DATA_DIR/train/
!cp -a $PRE_DATA_DIR/test/. $HOST_DATA_DIR/test/
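
Before moving on, it may be worth checking that every image has a label file. The sketch below assumes the naming used in the label-writing cell, where the labels for a form such as `a01-000u` live in `gt_a01-000u.txt`. The helper name `images_without_labels` is hypothetical, and the `.png` files in the demo are placeholders created in a temporary directory.

```python
import os
import tempfile

def images_without_labels(img_dir, gt_dir):
    """Return image file names in img_dir that have no gt_<stem>.txt in gt_dir."""
    missing = []
    for name in sorted(os.listdir(img_dir)):
        stem, _ext = os.path.splitext(name)
        if not os.path.isfile(os.path.join(gt_dir, f"gt_{stem}.txt")):
            missing.append(name)
    return missing

# Demo: two placeholder images, only one of which has a label file.
with tempfile.TemporaryDirectory() as demo_dir:
    img_dir = os.path.join(demo_dir, "img")
    gt_dir = os.path.join(demo_dir, "gt")
    os.makedirs(img_dir)
    os.makedirs(gt_dir)
    for image_name in ("a01-000u.png", "a01-003.png"):
        open(os.path.join(img_dir, image_name), "wb").close()
    open(os.path.join(gt_dir, "gt_a01-000u.txt"), "w").close()
    unlabeled = images_without_labels(img_dir, gt_dir)
print(unlabeled)
```

For the real data, the two arguments would be `$HOST_DATA_DIR/train/img` and `$HOST_DATA_DIR/train/gt` (and likewise for `test`).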

### 2.2 Prepare dataset for OCRNet

In [18]:
# Then convert the IAMDATA train split to TAO Toolkit OCRNet format
!python preprocess_data.py --images_dir=$HOST_DATA_DIR/train/img \
                           --labels_dir=$HOST_DATA_DIR/train/gt \
                           --output_images_dir=$HOST_DATA_DIR/train/processed \
                           --gt_file_path=$HOST_DATA_DIR/train/gt.txt \
                           --character_list_path=$HOST_DATA_DIR/train/character_list.txt


100%|███████████████████████████████████████| 1051/1051 [02:38<00:00,  6.64it/s]


In [20]:
# Then convert the IAMDATA test split to TAO Toolkit OCRNet format
!python preprocess_data.py --images_dir=$HOST_DATA_DIR/test/img \
                           --labels_dir=$HOST_DATA_DIR/test/gt \
                           --output_images_dir=$HOST_DATA_DIR/test/processed \
                           --gt_file_path=$HOST_DATA_DIR/test/gt.txt \
                           --character_list_path=$HOST_DATA_DIR/test/character_list.txt

100%|█████████████████████████████████████████| 488/488 [01:24<00:00,  5.77it/s]


In [21]:
# Set the path from the perspective of the TAO docker container
%env DATA_DIR = /data
%env SPECS_DIR = /specs
%env RESULTS_DIR = /results

env: DATA_DIR=/data
env: SPECS_DIR=/specs
env: RESULTS_DIR=/results
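
Commands run through the launcher see container paths, while cells prefixed with `!ls`, `!mkdir`, or `!ngc` see host paths. The sketch below illustrates how a host path maps to a container path given the entries in `~/.tao_mounts.json`. The helper `host_to_container` is hypothetical, and picking the most specific mount when several match is a convention chosen for this example, since a nested host directory is reachable through more than one container path.

```python
import posixpath

def host_to_container(host_path, mounts):
    """Translate a host path to a path that is valid inside the container."""
    host_path = posixpath.normpath(host_path)
    best = None
    for mount in mounts:
        source = posixpath.normpath(mount["source"])
        inside = host_path == source or host_path.startswith(source + "/")
        # Prefer the most specific (longest) matching mount source.
        if inside and (best is None or len(source) > len(best[0])):
            best = (source, mount["destination"])
    if best is None:
        raise ValueError(f"{host_path} is not under any mounted directory")
    source, destination = best
    relative = posixpath.relpath(host_path, source)
    return destination if relative == "." else posixpath.join(destination, relative)

# Mount entries copied from the ~/.tao_mounts.json shown earlier (specs omitted).
mounts = [
    {"source": "/hdd_10t/tylerz/devblog/", "destination": "/workspace/tao-experiments"},
    {"source": "/hdd_10t/tylerz/devblog/data/ocdr", "destination": "/data"},
    {"source": "/hdd_10t/tylerz/devblog/ocdr", "destination": "/results"},
]
container_path = host_to_container(
    "/hdd_10t/tylerz/devblog/ocdr/pretrained_ocdnet", mounts)
print(container_path)
```

This is why a model downloaded to `$HOST_RESULTS_DIR/pretrained_ocdnet` on the host is referenced as `$RESULTS_DIR/pretrained_ocdnet` in the `tao` commands.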


Next, we convert the raw dataset (images plus a label list) to LMDB format. LMDB is a memory-mapped key-value database, so reading the dataset from it can give better data I/O bandwidth than loading individual image files. If you are working on a remote file system that is shared by several users at the same time, skip the following steps and use the raw dataset loader of OCRNet instead.

In [22]:
# Convert the raw train dataset to lmdb
print("Converting the training set to LMDB.")
!tao model ocrnet dataset_convert -e $SPECS_DIR/ocr/experiment.yaml \
                            dataset_convert.input_img_dir=$DATA_DIR/train/processed \
                            dataset_convert.gt_file=$DATA_DIR/train/gt.txt \
                            dataset_convert.results_dir=$DATA_DIR/train/lmdb

Converting the training set to LMDB.
2023-07-13 15:54:40,414 [TAO Toolkit] [INFO] root 160: Registry: ['nvcr.io']
2023-07-13 15:54:40,581 [TAO Toolkit] [INFO] nvidia_tao_cli.components.instance_handler.local_instance 361: Running command in container: nvcr.io/nvstaging/tao/tao-toolkit-pyt:v3.22.11-1766-dev-cuda11.6
2023-07-13 15:54:41,574 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 297: The required docker doesn't exist locally/the manifest has changed. Pulling a new docker.
2023-07-13 15:54:41,575 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 156: Pulling the required container. This may take several minutes if you're doing this for the first time. Please wait here.
...
Pulling from repository: nvcr.io/nvstaging/tao/tao-toolkit-pyt
2023-07-13 15:55:03,838 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 165: Container pull complete.
Docker will run the commands as root. If you would like to ret

In [23]:
# Convert the raw test dataset to lmdb
print("Converting the testing set to LMDB.")
!tao model ocrnet dataset_convert -e $SPECS_DIR/ocr/experiment.yaml \
                            dataset_convert.input_img_dir=$DATA_DIR/test/processed \
                            dataset_convert.gt_file=$DATA_DIR/test/gt.txt \
                            dataset_convert.results_dir=$DATA_DIR/test/lmdb

Converting the testing set to LMDB.
2023-07-13 15:57:23,429 [TAO Toolkit] [INFO] root 160: Registry: ['nvcr.io']
2023-07-13 15:57:23,612 [TAO Toolkit] [INFO] nvidia_tao_cli.components.instance_handler.local_instance 361: Running command in container: nvcr.io/nvstaging/tao/tao-toolkit-pyt:v3.22.11-1766-dev-cuda11.6
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/tylerz/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
2023-07-13 15:57:24,244 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 275: Printing tty value True
[2023-07-13 07:57:30,044 - TAO Toolkit - torch.distributed.nn.jit.instantiator - INFO] Created a temporary directory at /tmp/tmp38fyu2ns
[2023-07-13 07:57:30,044 - TAO Toolkit - torch.distributed.nn.jit.instantiator - INFO] Writing /tmp/tmp38fyu2ns/_remote

In [25]:
!ls -rlt $HOST_DATA_DIR/train/lmdb

total 739548
-rw-r--r-- 1 root root      8192 Jul 13 15:55 lock.mdb
-rw-r--r-- 1 root root 757284864 Jul 13 15:56 data.mdb
-rw-r--r-- 1 root root       264 Jul 13 15:56 status.json


Additionally, if your own dataset already resides on a volume (or in a folder), you can mount the volume on `HOST_DATA_DIR` (or create a soft link), for example:
```bash
# if your dataset is in /dev/sdc1
mount /dev/sdc1 $HOST_DATA_DIR

# if your dataset is in folder /var/dataset
ln -sf /var/dataset $HOST_DATA_DIR
```

### 2.3 Prepare NGC CLI for downloading pretrained model <a class="anchor" id="head-2-1"></a>

We will use the NGC CLI to get the pre-trained models. For more details, go to [ngc.nvidia.com](https://ngc.nvidia.com) and click SETUP on the navigation bar.

In [26]:
# Installing NGC CLI on the local machine.
## Download and install
%env CLI=ngccli_cat_linux.zip
!mkdir -p $HOST_RESULTS_DIR/ngccli

# Remove any previously existing CLI installations
!rm -rf $HOST_RESULTS_DIR/ngccli/*
!wget "https://ngc.nvidia.com/downloads/$CLI" -P $HOST_RESULTS_DIR/ngccli
!unzip -u "$HOST_RESULTS_DIR/ngccli/$CLI" -d $HOST_RESULTS_DIR/ngccli/
!rm $HOST_RESULTS_DIR/ngccli/*.zip 
os.environ["PATH"]="{}/ngccli/ngc-cli:{}".format(os.getenv("HOST_RESULTS_DIR", ""), os.getenv("PATH", ""))

env: CLI=ngccli_cat_linux.zip
--2023-07-13 15:58:06--  https://ngc.nvidia.com/downloads/ngccli_cat_linux.zip
Resolving ngc.nvidia.com (ngc.nvidia.com)... 13.226.120.58, 13.226.120.126, 13.226.120.123, ...
Connecting to ngc.nvidia.com (ngc.nvidia.com)|13.226.120.58|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 43272608 (41M) [application/zip]
Saving to: ‘/hdd_10t/tylerz/devblog/ocdr/ngccli/ngccli_cat_linux.zip’


2023-07-13 15:58:08 (25.3 MB/s) - ‘/hdd_10t/tylerz/devblog/ocdr/ngccli/ngccli_cat_linux.zip’ saved [43272608/43272608]

Archive:  /hdd_10t/tylerz/devblog/ocdr/ngccli/ngccli_cat_linux.zip
   creating: /hdd_10t/tylerz/devblog/ocdr/ngccli/ngc-cli/
   creating: /hdd_10t/tylerz/devblog/ocdr/ngccli/ngc-cli/opentelemetry_semantic_conventions-0.38b0.dist-info/
  inflating: /hdd_10t/tylerz/devblog/ocdr/ngccli/ngc-cli/opentelemetry_semantic_conventions-0.38b0.dist-info/RECORD  
  inflating: /hdd_10t/tylerz/devblog/ocdr/ngccli/ngc-cli/opentelemetry_semantic_con

## 3. Creating a custom character detection (OCDNet) model <a class="anchor" id="head-3"></a>

### 3.1 Download pre-trained model

In [29]:
# View the current list of available OCDNet models
!ngc registry model list nvidia/tao/ocdnet:*

+-------+-------+-------+-------+-------+-------+-------+-------+-------+
| Versi | Accur | Epoch | Batch | GPU   | Memor | File  | Statu | Creat |
| on    | acy   | s     | Size  | Model | y Foo | Size  | s     | ed    |
|       |       |       |       |       | tprin |       |       | Date  |
|       |       |       |       |       | t     |       |       |       |
+-------+-------+-------+-------+-------+-------+-------+-------+-------+
| train |       |       | 1     | V100  | 401.6 | 401.5 | UPLOA | Jun   |
| able_ |       |       |       |       |       | 8 MB  | D_COM | 02,   |
| resne |       |       |       |       |       |       | PLETE | 2023  |
| t50_v |       |       |       |       |       |       |       |       |
| 1.0   |       |       |       |       |       |       |       |       |
| train |       |       | 1     | V100  | 195.2 | 195.1 | UPLOA | Jun   |
| able_ |       |       |       |       |       | 6 MB  | D_COM | 02,   |
| resne |       |       |       |     

In [30]:
!mkdir -p $HOST_RESULTS_DIR/pretrained_ocdnet/

In [32]:
# Pull pretrained model from NGC
!ngc registry model download-version nvidia/tao/ocdnet:trainable_resnet18_v1.0 --dest $HOST_RESULTS_DIR/pretrained_ocdnet/

Getting files to download...

In [55]:
print("Check that model is downloaded into dir.")
!ls -l $HOST_RESULTS_DIR/pretrained_ocdnet/ocdnet_vtrainable_resnet18_v1.0

Check that model is downloaded into dir.
total 199848
-rw------- 1 tylerz tylerz 204640685 Jul 13 16:00 ocdnet_deformable_resnet18.pth


### 3.2 Provide training specification

We provide specification files to configure the training parameters, including:

* num_gpus: Number of GPUs
* train: Configures the training hyperparameters
    * results_dir: Path where training results are stored
    * resume_training_checkpoint_path: Checkpoint to resume training from
    * num_epochs: Total number of training epochs
    * validation_interval: Validation interval
    * checkpoint_interval: Checkpoint interval
    * optimizer
        * type: Defaults to Adam.
        * lr: Initial learning rate
    * lr_scheduler
        * type: Only WarmupPolyLR is supported.
        * warmup_epoch: Number of epochs used to warm up to the initial learning rate. It should be different from num_epochs.
    * post_processing
        * type: Only SegDetectorRepresenter is supported.
        * thresh: Threshold for binarization
        * box_thresh: Threshold for bounding boxes
        * unclip_ratio: Defaults to 1.5. A larger ratio produces larger boxes.
    * metric
        * type: Only QuadMetric is supported.
        * is_output_polygon: Defaults to false. False outputs bounding boxes; true outputs polygons.
* dataset: Configures the dataset and augmentation methods
    * train_dataset:
        * data_path: Path to the training images. For multiple sources, use a list such as ['/path/1', '/path/2'].
        * pre_processes
            * size: Random crop size during training. Defaults to [640, 640].
        * loader
            * batch_size: Batch size for the dataloader
            * num_workers: Number of workers used for data loading
    * validate_dataset:
        * data_path: Path to the validation images. For multiple sources, use a list such as ['/path/1', '/path/2'].
        * pre_processes
            * short_size: Resize to width x height during evaluation. Defaults to [1280, 736].
            * resize_text_polys: Resize the coordinates of the text ground truth. Defaults to true.
        * loader
            * batch_size: Batch size for the dataloader
            * num_workers: Number of workers used for data loading
* model: Configures the model settings
    * backbone: The backbone type. deformable_resnet18 and deformable_resnet50 are supported.
    * load_pruned_graph: Defaults to False. Must be set to true when training a pruned model.
    * pruned_graph_path: Path to the pruned model graph
    * pretrained_model_path: Pretrained model to fine-tune from. The `.pth` format is supported.

Please refer to the TAO documentation on OCDNet for more configurable parameters.
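
The effect of `unclip_ratio` is easier to picture with numbers. The sketch below assumes the DBNet-style rule, in which a detected text region is grown outward by an offset of `area * unclip_ratio / perimeter`. Whether TAO's `SegDetectorRepresenter` applies exactly this rule is an assumption and is not confirmed by this notebook. The functions `unclip_offset` and `expand_box` are hypothetical and handle axis-aligned rectangles only.

```python
def unclip_offset(width, height, unclip_ratio=1.5):
    """Offset distance for a rectangle under the assumed area * ratio / perimeter rule."""
    area = width * height
    perimeter = 2 * (width + height)
    return area * unclip_ratio / perimeter

def expand_box(x, y, width, height, unclip_ratio=1.5):
    """Grow an axis-aligned box by the unclip offset on every side."""
    d = unclip_offset(width, height, unclip_ratio)
    return (x - d, y - d, x + width + d, y + height + d)

# A 213 x 48 region at (507, 766), sized like the first word box in the dataset.
offset = unclip_offset(213, 48)
expanded = expand_box(507, 766, 213, 48)
print(round(offset, 2), [round(value, 2) for value in expanded])
```

Under this assumption, raising the ratio from 1.5 to 2.0 increases the offset proportionally, which matches the note that a larger ratio produces larger boxes.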

In [52]:
!cat $HOST_SPECS_DIR/ocd/train.yaml

num_gpus: 1

model:
  load_pruned_graph: False
  pruned_graph_path: '/results/prune/pruned_0.1.pth'
  pretrained_model_path: '/data/ocdnet/ocdnet_deformable_resnet18.pth'
  backbone: deformable_resnet18

train:
  results_dir: /results/ocd/train
  num_epochs: 30
  #resume_training_checkpoint_path: '/results/train/resume.pth'
  checkpoint_interval: 1
  validation_interval: 1
  trainer:
    clip_grad_norm: 5.0

  optimizer:
    type: Adam
    args:
      lr: 0.001

  lr_scheduler:
    type: WarmupPolyLR
    args:
      warmup_epoch: 3

  post_processing:
    type: SegDetectorRepresenter
    args:
      thresh: 0.3
      box_thresh: 0.55
      max_candidates: 1000
      unclip_ratio: 1.5

  metric:
    type: QuadMetric
    args:
      is_output_polygon: false


dataset:
  train_dataset:
      data_path: ['/data/train']
      args:
        pre_processes:
          - type: IaaAugment
            args:
              - {'type':Fliplr, 'args':{'p':0.5}}
              - {'type': Affine, 'args':{

### 3.3 Run TAO training

In [56]:
#Train using TAO Launcher
print("Run training with ngc pretrained model.")
!tao model ocdnet train \
          -e $SPECS_DIR/ocd/train.yaml \
          -r $RESULTS_DIR/ocd/train \
          model.pretrained_model_path=$RESULTS_DIR/pretrained_ocdnet/ocdnet_vtrainable_resnet18_v1.0/ocdnet_deformable_resnet18.pth

Run training with ngc pretrained model.
2023-07-13 17:50:49,334 [TAO Toolkit] [INFO] root 160: Registry: ['nvcr.io']
2023-07-13 17:50:49,521 [TAO Toolkit] [INFO] nvidia_tao_cli.components.instance_handler.local_instance 361: Running command in container: nvcr.io/nvstaging/tao/tao-toolkit-pyt:v3.22.11-1766-dev-cuda11.6
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/tylerz/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
2023-07-13 17:50:50,187 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 275: Printing tty value True
[2023-07-13 09:50:55,783 - TAO Toolkit - torch.distributed.nn.jit.instantiator - INFO] Created a temporary directory at /tmp/tmp1ocd4dky
[2023-07-13 09:50:55,783 - TAO Toolkit - torch.distributed.nn.jit.instantiator - INFO] Writing /tmp/tmp1ocd4dky/_re

### 3.4 Evaluate trained model

In [58]:
# Evaluate the trained model
!tao model ocdnet evaluate \
           -e $SPECS_DIR/ocd/evaluate.yaml \
           evaluate.checkpoint=$RESULTS_DIR/ocd/train/model_best.pth

2023-07-13 19:05:17,247 [TAO Toolkit] [INFO] root 160: Registry: ['nvcr.io']
2023-07-13 19:05:17,421 [TAO Toolkit] [INFO] nvidia_tao_cli.components.instance_handler.local_instance 361: Running command in container: nvcr.io/nvstaging/tao/tao-toolkit-pyt:v3.22.11-1766-dev-cuda11.6
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/tylerz/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
2023-07-13 19:05:18,029 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 275: Printing tty value True
[2023-07-13 11:05:23,469 - TAO Toolkit - torch.distributed.nn.jit.instantiator - INFO] Created a temporary directory at /tmp/tmph3rf2c1h
[2023-07-13 11:05:23,469 - TAO Toolkit - torch.distributed.nn.jit.instantiator - INFO] Writing /tmp/tmph3rf2c1h/_remote_module_non_scriptable.py
[2023-07-1

### 3.5 Visualize the inference

In [59]:
#run inference using TAO
!tao model ocdnet inference \
           -e $SPECS_DIR/ocd/inference.yaml \
           inference.checkpoint=$RESULTS_DIR/ocd/train/model_best.pth \
           inference.input_folder=$DATA_DIR/test/img \
           inference.results_dir=$RESULTS_DIR/ocd/inference

2023-07-13 19:19:52,309 [TAO Toolkit] [INFO] root 160: Registry: ['nvcr.io']
2023-07-13 19:19:52,481 [TAO Toolkit] [INFO] nvidia_tao_cli.components.instance_handler.local_instance 361: Running command in container: nvcr.io/nvstaging/tao/tao-toolkit-pyt:v3.22.11-1766-dev-cuda11.6
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/tylerz/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
2023-07-13 19:19:53,143 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 275: Printing tty value True
[2023-07-13 11:19:58,831 - TAO Toolkit - torch.distributed.nn.jit.instantiator - INFO] Created a temporary directory at /tmp/tmpm7jxota2
[2023-07-13 11:19:58,831 - TAO Toolkit - torch.distributed.nn.jit.instantiator - INFO] Writing /tmp/tmpm7jxota2/_remote_module_non_scriptable.py
[2023-07-1

### 3.6 Export the model for deployment

In [63]:
# Export pth model to ONNX model
!tao model ocdnet export \
           -e $SPECS_DIR/ocd/export.yaml \
           export.checkpoint=$RESULTS_DIR/ocd/train/model_best.pth \
           export.onnx_file=$RESULTS_DIR/ocd/export/model_best.onnx

2023-07-13 19:32:25,165 [TAO Toolkit] [INFO] root 160: Registry: ['nvcr.io']
2023-07-13 19:32:25,350 [TAO Toolkit] [INFO] nvidia_tao_cli.components.instance_handler.local_instance 361: Running command in container: nvcr.io/nvstaging/tao/tao-toolkit-pyt:v3.22.11-1766-dev-cuda11.6
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/tylerz/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
2023-07-13 19:32:25,993 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 275: Printing tty value True
[2023-07-13 11:32:31,635 - TAO Toolkit - torch.distributed.nn.jit.instantiator - INFO] Created a temporary directory at /tmp/tmpspw21o6z
[2023-07-13 11:32:31,636 - TAO Toolkit - torch.distributed.nn.jit.instantiator - INFO] Writing /tmp/tmpspw21o6z/_remote_module_non_scriptable.py
[2023-07-1

## 4. Creating a custom character recognition (OCRNet) model <a class="anchor" id="head-4"></a>

### 4.1 Download pre-trained model

In [37]:
!ngc registry model list nvidia/tao/ocrnet:*

+-------+-------+-------+-------+-------+-------+-------+-------+-------+
| Versi | Accur | Epoch | Batch | GPU   | Memor | File  | Statu | Creat |
| on    | acy   | s     | Size  | Model | y Foo | Size  | s     | ed    |
|       |       |       |       |       | tprin |       |       | Date  |
|       |       |       |       |       | t     |       |       |       |
+-------+-------+-------+-------+-------+-------+-------+-------+-------+
| train |       |       |       |       |       | 186.9 | UPLOA | Apr   |
| able_ |       |       |       |       |       | 3 MB  | D_COM | 10,   |
| v1.0_ |       |       |       |       |       |       | PLETE | 2023  |
| decry |       |       |       |       |       |       |       |       |
| pt    |       |       |       |       |       |       |       |       |
| train |       |       | 1     | V100  | 186.9 | 186.9 | UPLOA | Jun   |
| able_ |       |       |       |       |       | 3 MB  | D_COM | 02,   |
| v1.0  |       |       |       |     

In [39]:
!mkdir -p $HOST_RESULTS_DIR/pretrained_ocrnet/

In [40]:
# Pull pretrained model from NGC
!ngc registry model download-version nvidia/tao/ocrnet:trainable_v1.0 --dest $HOST_RESULTS_DIR/pretrained_ocrnet

Getting files to download...

In [64]:
print("Check that model is downloaded into dir.")
!ls -l $HOST_RESULTS_DIR/pretrained_ocrnet/ocrnet_vtrainable_v1.0

Check that model is downloaded into dir.
total 191412
-rw------- 1 tylerz tylerz 196005541 Jul 13 16:40 ocrnet_resnet50.tlt
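The listing above shows `ocrnet_resnet50.tlt`, while the training command in section 4.3 points at `ocrnet_resnet50.pth`. The file name may differ between model versions, so it can help to look up which checkpoint is actually on disk before starting a long training run. The following is a minimal sketch using only the standard library; the helper name `find_pretrained_checkpoint` and the list of suffixes are assumptions made for illustration, not part of TAO.

```python
import os
from pathlib import Path


def find_pretrained_checkpoint(model_dir, stem="ocrnet_resnet50", suffixes=(".pth", ".tlt")):
    """Return the first existing checkpoint in model_dir, or None if none is found.

    The suffix list is an assumption: it covers the two extensions that
    appear in this notebook.
    """
    model_dir = Path(model_dir)
    for suffix in suffixes:
        candidate = model_dir / f"{stem}{suffix}"
        if candidate.is_file():
            return candidate
    return None


# Falls back to the current directory when HOST_RESULTS_DIR is not set.
host_results_dir = Path(os.environ.get("HOST_RESULTS_DIR", "."))
checkpoint = find_pretrained_checkpoint(
    host_results_dir / "pretrained_ocrnet" / "ocrnet_vtrainable_v1.0"
)
print("Pretrained checkpoint:", checkpoint)
```

If this prints `None`, or a file whose extension differs from the one in the training command, adjust `train.pretrained_model_path` accordingly.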


### 4.2 Provide training specification
* Dataset settings
    * To use the newly generated dataset, update the `dataset` section of the spec file at `$HOST_SPECS_DIR/ocr/experiment.yaml`.
    * You also need to prepare a new `character_list_file`.
* Other training (hyper-)parameters such as batch size, number of epochs, learning rate etc.
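As an illustration of preparing a character list, the sketch below collects the unique characters that occur in a set of ground-truth labels and writes them out. It assumes the file holds one character per line; check the OCRNet documentation for the exact format your TAO version expects. The sample labels and the helper names are hypothetical. The length check uses the `max_label_length: 25` value from the spec in this section.

```python
def build_character_list(labels):
    """Collect the unique characters in the labels, sorted for a stable order."""
    return sorted({ch for label in labels for ch in label})


def find_overlong_labels(labels, max_label_length=25):
    """Return labels longer than max_label_length (25 in the spec shown here)."""
    return [label for label in labels if len(label) > max_label_length]


def write_character_list(characters, path):
    """Write one character per line (assumed file format)."""
    with open(path, "w", encoding="utf-8") as f:
        for ch in characters:
            f.write(ch + "\n")


# Hypothetical labels; in practice these come from the dataset's ground truth.
labels = ["hello", "World", "OCR"]
characters = build_character_list(labels)
print(characters)
print(find_overlong_labels(labels))
```

Call `write_character_list(characters, path)` with the host-side path that maps to the `dataset.character_list_file` value you pass to the training command.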

In [42]:
!cat $HOST_SPECS_DIR/ocr/experiment.yaml

results_dir: /results
encryption_key: nvidia_tao
model:
  TPS: True
  backbone: ResNet
  feature_channel: 512
  sequence: BiLSTM
  hidden_size: 256
  prediction: CTC
  quantize: False
  input_width: 100
  input_height: 32
  input_channel: 1
dataset:
  train_dataset_dir: []
  val_dataset_dir: /data/test/lmdb
  character_list_file: /data/character_list
  max_label_length: 25
  batch_size: 32
  workers: 4
  augmentation:
    keep_aspect_ratio: False
train:
  seed: 1111
  gpu_ids: [0]
  optim:
    name: "adadelta"
    lr: 0.1
  clip_grad_norm: 5.0
  num_epochs: 10
  checkpoint_interval: 2
  validation_interval: 1
evaluate:
  gpu_id: 0
  checkpoint: "??"
  test_dataset_dir: "??"
  results_dir: "${results_dir}/evaluate"
prune:
  gpu_id: 0
  checkpoint: "??"
  results_dir: "${results_dir}/prune"
  prune_setting:
    mode: experimental_hybrid
    amount: 0.4
    granularity: 8
    raw_prune_score: L1
inference:
  gpu_id: 0
  checkpoint: "??"
  inference_dataset_dir: "??"
  results_dir: "${resu

### 4.3 Run TAO training <a class="anchor" id="head-4"></a>
* Provide the sample spec file and the output directory location for models
* WARNING: training can take several hours to a full day to complete.
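The training command in this section passes dotted `key=value` arguments after the spec file. Each one replaces the matching entry in `experiment.yaml`, so the run uses `num_epochs=20` and `lr=1.0` rather than the `10` and `0.1` stored in the spec. The sketch below illustrates that merge on a plain dictionary. It is a simplified model for intuition only: TAO's own configuration handling is not reproduced here, and `apply_overrides` and `parse_value` are hypothetical helpers.

```python
import ast
import copy


def parse_value(text):
    """Interpret numbers and booleans; fall back to the raw string (e.g. for paths)."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return text


def apply_overrides(config, overrides):
    """Return a copy of config with dotted key=value overrides applied."""
    result = copy.deepcopy(config)
    for item in overrides:
        dotted_key, _, raw_value = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = result
        for key in parents:
            # Create intermediate sections that the spec does not define yet.
            node = node.setdefault(key, {})
        node[leaf] = parse_value(raw_value)
    return result


# A subset of the spec shown in section 4.2.
spec = {"train": {"num_epochs": 10, "optim": {"name": "adadelta", "lr": 0.1}}}
merged = apply_overrides(spec, ["train.num_epochs=20", "train.optim.lr=1.0"])
print(merged["train"]["num_epochs"], merged["train"]["optim"]["lr"])  # prints: 20 1.0
```

The original `spec` dictionary is left untouched, which mirrors the fact that command-line overrides do not modify the spec file on disk.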

In [45]:
!tao model ocrnet train -e $SPECS_DIR/ocr/experiment.yaml \
              train.results_dir=$RESULTS_DIR/ocr/train \
              train.pretrained_model_path=$RESULTS_DIR/pretrained_ocrnet/ocrnet_vtrainable_v1.0/ocrnet_resnet50.pth \
              train.num_epochs=20 \
              train.optim.lr=1.0 \
              dataset.train_dataset_dir=[$DATA_DIR/train/lmdb] \
              dataset.val_dataset_dir=$DATA_DIR/test/lmdb \
              dataset.character_list_file=$DATA_DIR/train/character_list.txt

2023-07-13 17:00:32,916 [TAO Toolkit] [INFO] root 160: Registry: ['nvcr.io']
2023-07-13 17:00:33,110 [TAO Toolkit] [INFO] nvidia_tao_cli.components.instance_handler.local_instance 361: Running command in container: nvcr.io/nvstaging/tao/tao-toolkit-pyt:v3.22.11-1766-dev-cuda11.6
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/tylerz/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
2023-07-13 17:00:33,726 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 275: Printing tty value True
[2023-07-13 09:00:39,903 - TAO Toolkit - torch.distributed.nn.jit.instantiator - INFO] Created a temporary directory at /tmp/tmpv2hu0lj7
[2023-07-13 09:00:39,903 - TAO Toolkit - torch.distributed.nn.jit.instantiator - INFO] Writing /tmp/tmpv2hu0lj7/_remote_module_non_scriptable.py
'experimen

### 4.4 Evaluate trained models <a class="anchor" id="head-5"></a>

In [48]:
print('Trained:')
print('---------------------')
!ls -ltrh $HOST_RESULTS_DIR/ocr/train/

Trained:
---------------------
total 187M
-rw-r--r-- 1 root root 1.2K Jul 13 17:14 log_val.txt
-rw-r--r-- 1 root root 187M Jul 13 17:14 best_accuracy.pth
drwxr-xr-x 3 root root 4.0K Jul 13 17:14 lightning_logs
-rw-r--r-- 1 root root 1.2K Jul 13 17:14 status.json
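The listing above uses the host path `$HOST_RESULTS_DIR/ocr/train/`, whereas the `tao` commands refer to the same files through container paths such as `$RESULTS_DIR/ocr/train/`. The two are linked by the mounts in `~/.tao_mounts.json`. The sketch below shows how a host path translates to its container counterpart, assuming each mount entry carries a `source` (host) and a `destination` (container) key. The example paths and the helper name `host_to_container` are hypothetical.

```python
from pathlib import PurePosixPath


def host_to_container(host_path, mounts):
    """Translate a host path to the container path implied by the mounts.

    Returns None when no mount covers the path, which means the file would
    not be visible inside the container.
    """
    host_path = PurePosixPath(host_path)
    for mount in mounts:
        source = PurePosixPath(mount["source"])
        try:
            relative = host_path.relative_to(source)
        except ValueError:
            continue  # this mount does not contain host_path
        return str(PurePosixPath(mount["destination"]) / relative)
    return None


# Hypothetical mounts; replace them with the entries from your own config.
mounts = [
    {"source": "/home/user/project/results", "destination": "/results"},
    {"source": "/home/user/project/data", "destination": "/data"},
]
print(host_to_container("/home/user/project/results/ocr/train/best_accuracy.pth", mounts))
```

A `None` result is a quick hint that a checkpoint or dataset path passed to a `tao` command lies outside the mapped directories.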


In [49]:
!tao model ocrnet evaluate -e $SPECS_DIR/ocr/experiment.yaml \
                 evaluate.results_dir=$RESULTS_DIR/ocr/evaluate \
                 evaluate.checkpoint=$RESULTS_DIR/ocr/train/best_accuracy.pth \
                 evaluate.test_dataset_dir=$DATA_DIR/test/lmdb \
                 dataset.character_list_file=$DATA_DIR/train/character_list.txt

2023-07-13 17:29:45,359 [TAO Toolkit] [INFO] root 160: Registry: ['nvcr.io']
2023-07-13 17:29:45,554 [TAO Toolkit] [INFO] nvidia_tao_cli.components.instance_handler.local_instance 361: Running command in container: nvcr.io/nvstaging/tao/tao-toolkit-pyt:v3.22.11-1766-dev-cuda11.6
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/tylerz/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
2023-07-13 17:29:46,244 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 275: Printing tty value True
[2023-07-13 09:29:51,848 - TAO Toolkit - torch.distributed.nn.jit.instantiator - INFO] Created a temporary directory at /tmp/tmp0altodt2
[2023-07-13 09:29:51,848 - TAO Toolkit - torch.distributed.nn.jit.instantiator - INFO] Writing /tmp/tmp0altodt2/_remote_module_non_scriptable.py
'experimen
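The evaluation log above is truncated, so the reported metric is not visible here. One common way to score a text recognizer is exact-match accuracy: a prediction counts as correct only when the whole string equals the ground truth. The sketch below illustrates that definition; whether TAO computes its accuracy figure in exactly this way is an assumption, and the sample strings are made up.

```python
def sequence_accuracy(predictions, targets):
    """Fraction of predictions that match their target string exactly."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        return 0.0
    correct = sum(pred == target for pred, target in zip(predictions, targets))
    return correct / len(targets)


# Hypothetical recognizer output against hypothetical ground truth.
predictions = ["cat", "dog", "bird", "fish"]
targets = ["cat", "dig", "bird", "fish"]
print(sequence_accuracy(predictions, targets))  # prints: 0.75
```

Exact-match scoring is strict: a single wrong character makes the whole word count as an error, so it tends to be lower than a per-character measure on the same predictions.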

### 4.5 Inference

In [66]:
!tao model ocrnet inference -e $SPECS_DIR/ocr/experiment.yaml \
                 inference.results_dir=$RESULTS_DIR/ocr/inference \
                 inference.checkpoint=$RESULTS_DIR/ocr/train/best_accuracy.pth \
                 inference.inference_dataset_dir=$DATA_DIR/test/processed \
                 dataset.character_list_file=$DATA_DIR/train/character_list.txt

2023-07-15 09:27:34,525 [TAO Toolkit] [INFO] root 160: Registry: ['nvcr.io']
2023-07-15 09:27:34,706 [TAO Toolkit] [INFO] nvidia_tao_cli.components.instance_handler.local_instance 361: Running command in container: nvcr.io/nvstaging/tao/tao-toolkit-pyt:v3.22.11-1766-dev-cuda11.6
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/tylerz/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
2023-07-15 09:27:35,411 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 275: Printing tty value True
[2023-07-15 01:27:40,935 - TAO Toolkit - torch.distributed.nn.jit.instantiator - INFO] Created a temporary directory at /tmp/tmp3r4l3dep
[2023-07-15 01:27:40,935 - TAO Toolkit - torch.distributed.nn.jit.instantiator - INFO] Writing /tmp/tmp3r4l3dep/_remote_module_non_scriptable.py
'experimen
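The spec in section 4.2 sets `prediction: CTC`. With a CTC head, the network emits one class index per horizontal step of the feature sequence, and a decoding step turns that sequence into text. The sketch below shows greedy CTC decoding: collapse consecutive repeats, then drop the blank symbol. It assumes that index 0 is the blank and that indices 1..N follow the order of the character list; TAO's actual decoder and index layout may differ.

```python
BLANK = 0  # assumed index of the CTC blank symbol


def ctc_greedy_decode(frame_indices, characters):
    """Collapse repeated indices, remove blanks, and map the rest to characters.

    characters[i] is assumed to correspond to class index i + 1.
    """
    decoded = []
    previous = BLANK
    for index in frame_indices:
        # Emit a character only when the class changes and is not the blank.
        if index != BLANK and index != previous:
            decoded.append(characters[index - 1])
        previous = index
    return "".join(decoded)


characters = ["a", "b", "c"]
# Per-step argmax indices; the blank between the two 1s keeps the double "a".
frames = [1, 1, 0, 1, 2, 2, 0, 3]
print(ctc_greedy_decode(frames, characters))  # prints: aabc
```

The blank symbol is what allows repeated letters to survive: without the `0` between the runs of `1`, the two `a` characters would be merged into one.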


### 4.6 Export the model for deployment

In [50]:
!tao model ocrnet export -e $SPECS_DIR/ocr/experiment.yaml \
                 export.results_dir=$RESULTS_DIR/ocr/export \
                 export.checkpoint=$RESULTS_DIR/ocr/train/best_accuracy.pth \
                 export.onnx_file=$RESULTS_DIR/ocr/export/ocrnet.onnx \
                 dataset.character_list_file=$DATA_DIR/train/character_list.txt

2023-07-13 17:41:18,473 [TAO Toolkit] [INFO] root 160: Registry: ['nvcr.io']
2023-07-13 17:41:18,653 [TAO Toolkit] [INFO] nvidia_tao_cli.components.instance_handler.local_instance 361: Running command in container: nvcr.io/nvstaging/tao/tao-toolkit-pyt:v3.22.11-1766-dev-cuda11.6
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/tylerz/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
2023-07-13 17:41:19,299 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 275: Printing tty value True
[2023-07-13 09:41:25,006 - TAO Toolkit - torch.distributed.nn.jit.instantiator - INFO] Created a temporary directory at /tmp/tmpu76a8ila
[2023-07-13 09:41:25,006 - TAO Toolkit - torch.distributed.nn.jit.instantiator - INFO] Writing /tmp/tmpu76a8ila/_remote_module_non_scriptable.py
'experimen