# Retail Object Recognition

Transfer learning is the process of transferring learned features from one application to another. It is a commonly used training technique in which a model trained on one task is re-trained for use on a different task.

Train Adapt Optimize (TAO) Toolkit is a simple and easy-to-use Python-based AI toolkit for taking purpose-built AI models and customizing them with users' own data.

<img align="center" src="https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/TAO/tlt-tao-toolkit-bring-your-own-model-diagram.png" width="1080">

## Metric Learning Recognition

Retail Object Recognition pretrained models use the `metric learning recognition` pipeline in TAO. The model, MLRecogNet, is a classifier that encodes the input image to embedding vectors and predicts their labels based on the embedding vectors in the reference space. MLRecogNet consists of two parts:

* Trunk: A backbone network that encodes the input image to a feature vector.
* Embedder: A fully connected layer that maps the feature vector to the embedding space.

The embedding space is a high-dimensional space where the distance between the embedding vectors of the same class is small and the distance between the embedding vectors of different classes is large. The embedder is trained to minimize the distance between the embedding vectors of the same class and maximize the distance between the embedding vectors of different classes. The embedding vectors of the query images are compared with the embedding vectors of the reference images to predict the labels of the query images.
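The comparison between query and reference embeddings can be sketched in a few lines of plain Python. This is only an illustration of the idea, not TAO's implementation: it assumes L2-normalized embeddings and Euclidean distance, and the two-dimensional vectors and class names are made up for the example.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def predict_label(query, references):
    """Return (label, distance) of the reference embedding closest to the query.

    `references` is a list of (embedding, label) pairs.
    """
    q = l2_normalize(query)
    best_label, best_dist = None, float("inf")
    for emb, label in references:
        dist = math.dist(q, l2_normalize(emb))  # Euclidean distance (assumption)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label, best_dist


# Toy reference set: two embeddings for "cola", one for "chips" (hypothetical data).
references = [([1.0, 0.0], "cola"), ([0.9, 0.1], "cola"), ([0.0, 1.0], "chips")]

print(predict_label([0.8, 0.2], references)[0])  # cola
print(predict_label([0.1, 0.9], references)[0])  # chips
```

Because prediction only depends on the reference set, new classes can in principle be recognized by adding reference images, which is why the notebook later evaluates on unknown classes as well.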


## Learning Objectives

In this Notebook, you will learn how to leverage the simplicity and convenience of TAO to:

* Train a model for object recognition on an [ImageNet](https://www.image-net.org/) format example retail classification dataset.
* Evaluate the trained model & export results.
* Run Inference on the trained model.
* Export the trained model to an ONNX file for deployment to DeepStream or TensorRT.

At the end of this Notebook, you will have a trained `MLRecog` model that you can deploy using [DeepStream](https://developer.nvidia.com/deepstream-sdk).

## Table of Contents

This Notebook shows an example use case of MLRecogNet using Train Adapt Optimize (TAO) Toolkit.

0. [Set up env variables and map drives](#head-0)
1. [Installing the TAO launcher](#head-1)
2. [Prepare dataset and pre-trained model](#head-2)
3. [Provide training specification](#head-3)
4. [Run TAO training](#head-4)
5. [Evaluate trained models](#head-5)
6. [Inferences](#head-6)
7. [Deploy](#head-7)


## 0. Set up env variables and map drives <a class="anchor" id="head-0"></a>

The TAO launcher uses Docker containers under the hood, and **for our data and results directories to be visible to the Docker container, they must be mapped**. The launcher can be configured using the config file `~/.tao_mounts.json`. Apart from the mounts, you can also configure additional options such as environment variables and the amount of shared memory available to the TAO launcher. <br>

`IMPORTANT NOTE:` The code below creates a sample `~/.tao_mounts.json` file. Use it to configure your specific directories so that data, specs, results, and cache directories are correctly visible to the Docker container.


In [None]:
import os

# Please define this local project directory that needs to be mapped to the TAO docker session.
%env LOCAL_PROJECT_DIR=/path/to/local/tao-experiments

os.environ["HOST_DATA_DIR"] = os.path.join(os.getenv("LOCAL_PROJECT_DIR", os.getcwd()), "data")
os.environ["HOST_RESULTS_DIR"] = os.path.join(os.getenv("LOCAL_PROJECT_DIR", os.getcwd()), "ml_recognition", "results")
os.environ["HOST_MODEL_DIR"] = os.path.join(os.getenv("LOCAL_PROJECT_DIR", os.getcwd()), "ml_recognition", "models")

# Set this path if you don't run the notebook from the samples directory.
# %env NOTEBOOK_ROOT=/path/to/local/tao-experiments/metric_learning_recognition

# The sample spec files are present in the same path as the downloaded samples.
os.environ["HOST_SPECS_DIR"] = os.path.join(
    os.getenv("NOTEBOOK_ROOT", os.getcwd()),
    "specs"
)


In [None]:
! mkdir -p $HOST_DATA_DIR
! mkdir -p $HOST_SPECS_DIR
! mkdir -p $HOST_RESULTS_DIR
! mkdir -p $HOST_MODEL_DIR

In [None]:
# Mapping up the local directories to the TAO docker.
import json
import os
mounts_file = os.path.expanduser("~/.tao_mounts.json")
tao_configs = {
   "Mounts":[
       # Mapping the data directory
       {
           "source": os.environ["LOCAL_PROJECT_DIR"],
           "destination": "/workspace/tao-experiments"
       },
       {
           "source": os.environ["HOST_DATA_DIR"],
           "destination": "/data"
       },
       {
           "source": os.environ["HOST_MODEL_DIR"],
           "destination": "/model"
       },
       {
           "source": os.environ["HOST_SPECS_DIR"],
           "destination": "/specs"
       },
       {
           "source": os.environ["HOST_RESULTS_DIR"],
           "destination": "/results"
       }
   ],
   "DockerOptions": {
        "shm_size": "16G",
        "ulimits": {
            "memlock": -1,
            "stack": 67108864
         }
   }
}
# Writing the mounts file.
with open(mounts_file, "w") as mfile:
    json.dump(tao_configs, mfile, indent=4)

In [None]:
!cat ~/.tao_mounts.json
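Because the launcher commands later in this Notebook use container paths such as `/data` and `/results`, it can help to translate a host path into the path the container sees. The helper below is a minimal sketch based only on the `Mounts` structure written above; `to_container_path` and the `/home/user/tao-experiments` directory are hypothetical, and picking the most specific mount is a design choice of this sketch, not launcher behavior.

```python
def to_container_path(host_path, mounts):
    """Translate a host path to a container path using the most specific mount."""
    best_src, best_dst = None, None
    for mount in mounts:
        src = mount["source"].rstrip("/")
        inside = host_path == src or host_path.startswith(src + "/")
        # Prefer the longest matching source so nested mounts win.
        if inside and (best_src is None or len(src) > len(best_src)):
            best_src, best_dst = src, mount["destination"].rstrip("/")
    if best_src is None:
        raise ValueError(f"{host_path} is not under any mounted directory")
    return best_dst + host_path[len(best_src):]


# Hypothetical mounts mirroring the layout of the sample ~/.tao_mounts.json.
example_mounts = [
    {"source": "/home/user/tao-experiments", "destination": "/workspace/tao-experiments"},
    {"source": "/home/user/tao-experiments/data", "destination": "/data"},
]

print(to_container_path("/home/user/tao-experiments/data/train/a.jpg", example_mounts))
# /data/train/a.jpg
```

In the Notebook itself, the `tao_configs["Mounts"]` list from the cell above could be passed in place of `example_mounts`.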

## 1. Installing the TAO launcher <a class="anchor" id="head-1"></a>
The TAO launcher is a Python package distributed as a Python wheel listed in PyPI. Install the launcher by executing the following cell.

TAO Toolkit recommends running the TAO launcher in a virtual env with Python 3.6.9. Follow the [instructions](https://virtualenvwrapper.readthedocs.io/en/latest/install.html) to set up a Python virtual env using the `virtualenv` and `virtualenvwrapper` packages. After you have set up virtualenvwrapper, set the version of Python in the virtual env with the `VIRTUALENVWRAPPER_PYTHON` variable, by running:

```sh
export VIRTUALENVWRAPPER_PYTHON=/path/to/bin/python3.x
```
where x >= 6 and <= 8

We recommend performing this step first and then launching the Notebook from the virtual environment. In addition to installing the TAO Python package, you must meet the following software requirements:
* python >=3.7, <=3.10.x
* docker-ce > 19.03.5
* docker-API 1.40
* nvidia-container-toolkit > 1.3.0-1
* nvidia-container-runtime > 3.4.0-1
* nvidia-docker2 > 2.5.0-1
* nvidia-driver > 455+
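A quick pre-flight check can catch the most common setup problems before the launcher is installed. The sketch below only verifies the Python range from the list above and whether a `docker` executable is on the `PATH`; it does not check Docker, driver, or NVIDIA container package versions, and the helper names are made up for this example.

```python
import shutil
import sys

MIN_PYTHON = (3, 7)   # lower bound taken from the requirements list above
MAX_PYTHON = (3, 10)  # upper bound taken from the requirements list above


def python_supported(version=sys.version_info):
    """Return True if the (major, minor) version falls in the listed range."""
    return MIN_PYTHON <= tuple(version[:2]) <= MAX_PYTHON


def missing_tools(tools=("docker",)):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


print("Python version supported:", python_supported())
print("Missing tools:", missing_tools())
```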

After you have installed the pre-requisites, log in to the Docker registry nvcr.io using the following command:

```sh
docker login nvcr.io
```

Enter a username and password. The username is `$oauthtoken` and the password is the API key generated from `ngc.nvidia.com`. Follow the instructions in the [NGC setup guide](https://docs.nvidia.com/ngc/ngc-overview/index.html#generating-api-key) to generate your own API key.


In [None]:
# SKIP this step IF you have already installed the TAO launcher.
!pip3 install nvidia-tao

In [None]:
# View the versions of the TAO launcher
!tao info --verbose

## 2.1 Prepare dataset <a class="anchor" id="head-2"></a>

Here we use the [Retail Product Checkout dataset](https://www.kaggle.com/datasets/diyer22/retail-product-checkout-dataset) to illustrate how to train the metric learning recognition model for retail item recognition.

In [None]:
# [Action required] Download the dataset manually.
# [Action required] Put your downloaded .zip dataset file at $HOST_DATA_DIR/retail-product-checkout-dataset.zip

In [None]:
# Extract the files
# !apt-get install unzip
!mkdir -p $HOST_DATA_DIR

# set dataset root folder path
%env DATA_FOLDER=retail-product-checkout-dataset_classification_demo

# Run data processing script: 
# 1. crop the images and save as a classification dataset
# 2. split the dataset as train/val/test/reference sets
# 3. separate the classes to be known and unknown classes

# install the pkgs needed for process script if needed
!pip install Cython==0.29.36
!pip install opencv-python
!pip install pycocotools
!pip install tqdm
# now run the process script
!python $NOTEBOOK_ROOT/process_retail_product_checkout_dataset.py

In [None]:
# Verify
!ls -l $HOST_DATA_DIR/$DATA_FOLDER/known_classes

In [None]:
!ls -l $HOST_DATA_DIR/$DATA_FOLDER/unknown_classes
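Beyond listing the folders, it can be useful to confirm that every class directory actually contains images. The helper below is a sketch that assumes the ImageNet-style layout used in this Notebook (one subdirectory per class inside each split); `count_images_per_class` and the list of image suffixes are assumptions of this example, not part of the processing script.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed image extensions


def count_images_per_class(split_dir):
    """Map each class subdirectory name to the number of image files it holds."""
    counts = {}
    for class_dir in sorted(Path(split_dir).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(
                1 for f in class_dir.iterdir() if f.suffix.lower() in IMAGE_SUFFIXES
            )
    return counts


# Example usage with the paths defined in the cells above:
# import os
# root = Path(os.environ["HOST_DATA_DIR"]) / os.environ["DATA_FOLDER"] / "known_classes"
# for split in ("train", "val", "test", "reference"):
#     counts = count_images_per_class(root / split)
#     print(split, len(counts), "classes,", sum(counts.values()), "images")
```

A class with zero images in the reference split would have no embedding to match against, so it is worth checking before training.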

## 2.2 Prepare Pretrained Model

We will use the NGC CLI to get the pre-trained models. For more details, go to ngc.nvidia.com and click SETUP on the navigation bar.

In [None]:
# [Action required] Skip this cell if the NGC CLI is already installed.
# Download and install the NGC CLI on the local machine.
import os
import platform

if platform.machine() == "x86_64":
    os.environ["CLI"]="ngccli_linux.zip"
else:
    os.environ["CLI"]="ngccli_arm64.zip"


# # Remove any previously existing CLI installations
!rm -rf $LOCAL_PROJECT_DIR/ngccli/*
!wget "https://ngc.nvidia.com/downloads/$CLI" -P $LOCAL_PROJECT_DIR/ngccli
!unzip -u "$LOCAL_PROJECT_DIR/ngccli/$CLI" -d $LOCAL_PROJECT_DIR/ngccli/
!rm $LOCAL_PROJECT_DIR/ngccli/*.zip 
os.environ["PATH"]="{}/ngccli/ngc-cli:{}".format(os.getenv("LOCAL_PROJECT_DIR", ""), os.getenv("PATH", ""))

In [None]:
!ngc registry model list nvidia/tao/retail_object_recognition:*

In [None]:
# [Action required] Uncomment below code if you don't have pretrained model downloaded
# # Pull pretrained model from NGC

# # DINOv2-Large backbone and linear head (only available for NVAIE users)
# # coming: download links

# # FAN-Base backbone and linear head
# !ngc registry model download-version nvidia/tao/pretrained_fan_classification_nvimagenet:fan_base_hybrid_nvimagenet --dest $HOST_MODEL_DIR/
# !ngc registry model download-version nvidia/tao/retail_object_recognition:trainable_head_fan_base_v2.0 --dest $HOST_MODEL_DIR/

# # ResNet101 backbone
# !ngc registry model download-version nvidia/tao/trainable_v1.1:retail_object_recognition_v1.1.pth --dest $HOST_MODEL_DIR/

## 3. Provide training specification <a class="anchor" id="head-3"></a>

We provide specification files to configure the training parameters including:
 
* results_dir: global output directory. The train/evaluation/inference/export subdirectories are created under it based on the subtask. It can be overridden by the subtask ``results_dir`` fields.

* model: configure the model setting
  * backbone: type of backbone architecture, supported backbones: `resnet_50`, `resnet_101`, `fan_tiny`, `fan_small`, `fan_base`, `fan_large`, `nvdinov2_vit_large_legacy`
  * pretrained_model_path: path for the pretrained model weights
  * pretrained_trunk_path: path for trunk pretrained weights
  * pretrained_embedder_path: path for embedder pretrained weights
  * input_width: width of an input image
  * input_height: height of an input image
  * input_channels: number of color channels for input images, always in channel first format
  * feat_dim: size of the output embedding

* train: configure the training hyperparameters
  * train_trunk: If false, the trunk parameters will be frozen. Default true.
  * train_embedder: If false, the embedder parameters will be frozen. Default true.
  * optim: configure optimizer
  * num_epochs: number of epochs
  * checkpoint_interval: how often to save model checkpoints
  * grad_clip: enables gradient clipping
  * smooth_loss: enables label smoothing, True/False
  * batch_size: number of images in one batch for training
  * val_batch_size: number of images in one batch for validation
  * resume_training_checkpoint_path: path of a saved .pth checkpoint to resume training from
  * report_accuracy_per_class: enables a per-class accuracy report instead of the average class accuracy, True/False
  
* dataset: configure the dataset and augmentation methods
  * train_dataset: path for the train dataset directory
  * val_dataset: map of the validation or test dataset directory. It contains reference and query set.
  * workers: number of workers to do data loading
  * pixel_mean: pixel mean in 3 channels for normalization
  * pixel_std: pixel standard deviation in 3 channels for normalization
  * prob: probability of randomly flipping images horizontally
  * re_prob: constant for random erasing
  * gaussian_blur: configurations for Gaussian blur
  * color_augmentation: configurations for color augmentation
  * num_instance: number of times one image is repeated in a batch
  * class_map: path to the YAML file mapping dataset class name to the new class names

* evaluate: configure evaluate subtask parameters
  * checkpoint: the .pth model for evaluation
  * trt_engine: path of the TensorRT engine for evaluation
  * report_accuracy_per_class: enables a per-class accuracy report instead of the average class accuracy, True/False
  * topk: get predictions from the k nearest neighbors
  * batch_size: the batch size for evaluation
  * results_dir: the evaluation output directory. Takes priority over the global `results_dir`

* inference: configure inference subtask parameters
  * inference_input_type: the format of query dataset, image/image_folder/classification_folder
  * checkpoint: the .pth model for inference
  * trt_engine: path of the TensorRT engine for inference
  * input_path: the inference image/image folder/classification dataset folder
  * topk: get predictions by the k nearest neighbors
  * batch_size: the batch size for inference
  * results_dir: the inference output directory. Takes priority over the global `results_dir`
 
* export: configure export subtask parameters
  * checkpoint: the .pth model for export (to ONNX file)
  * onnx_file: the exported ONNX model path. Takes priority over the default ONNX name derived from `export.results_dir`.
  * gpu_id: the index of the single GPU used for export. Default 0.
 
 
* gen_trt_engine: configure TensorRT generation subtask parameters
  * gpu_id: the index of the single GPU used for TensorRT engine generation. Default 0.
  * onnx_file: path of the ONNX file that the TensorRT engine is built from
  * trt_engine: path of the TensorRT engine to generate
  * batch_size: the batch size of the TensorRT engine. When `batch_size=-1`, an engine with a dynamic batch size is generated.
  * verbose: if True, verbose TensorRT generation logs are printed
  * tensorrt: TensorRT engine generation settings
  * results_dir: the TensorRT engine generation output folder. Takes priority over the global `results_dir`

See the [TAO documentation - Metric Learning Recognition](https://docs.nvidia.com/tao/tao-toolkit/text/metric_learning_recognition/metric_learning_recognition.html) to get all the parameters that are configurable.
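The precedence between the global `results_dir` and a subtask `results_dir` described above can be pictured with a small sketch. It is not TAO code: `resolve_results_dir` is a hypothetical helper, and using the task name as the subdirectory name is an assumption made for the example.

```python
def resolve_results_dir(spec, task):
    """Return the output directory for a subtask.

    A subtask-level `results_dir` wins; otherwise a subdirectory of the
    global `results_dir` is used (subdirectory name assumed to be the task name).
    """
    task_dir = spec.get(task, {}).get("results_dir")
    if task_dir:
        return task_dir
    return f"{spec['results_dir'].rstrip('/')}/{task}"


spec = {
    "results_dir": "/results",
    "evaluate": {"results_dir": "/results/eval_unknown"},
    "inference": {},
}

print(resolve_results_dir(spec, "evaluate"))   # /results/eval_unknown
print(resolve_results_dir(spec, "inference"))  # /results/inference
```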

In [None]:
!cat $HOST_SPECS_DIR/train.yaml

The spec above trains a model with the nvdinov2_vit_large backbone (trunk). To use a pretrained model with a different backbone, refer to the configurations in `$HOST_SPECS_DIR/train_resnet.yaml` and `$HOST_SPECS_DIR/train_fan.yaml`.

In [None]:
# [Action Required] Update `model.pretrained_model_path`/`model.pretrained_embedder_path`/`model.pretrained_trunk_path` if you want to try pretrained models

## 4. Run TAO training <a class="anchor" id="head-4"></a>
* Provide the sample spec file and the output directory location for models.
* WARNING: Training can take from several hours up to a day to complete.

In [None]:
# NOTE: The following paths are set from the perspective of the TAO Docker.

# The data is saved here
%env DATA_DIR = /data
%env MODEL_DIR = /model
%env SPECS_DIR = /specs
%env RESULTS_DIR = /results


### 4.1 Train Metric Learning Recognition model

We will train an MLRecog model with a ResNet101 backbone and a 2048-dimensional embedding output. The backbone is loaded with weights trained on NVImageNetV2 (same classes as [ImageNet](https://www.image-net.org/) but using licensed datasets).

In [None]:
print("Train model")
! tao model ml_recog train \
              -e $SPECS_DIR/train.yaml \
              results_dir=$RESULTS_DIR \
              dataset.train_dataset=$DATA_DIR/$DATA_FOLDER/known_classes/train \
              dataset.val_dataset.reference=$DATA_DIR/$DATA_FOLDER/known_classes/reference \
              dataset.val_dataset.query=$DATA_DIR/$DATA_FOLDER/known_classes/val
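The `key=value` arguments that follow the spec file in the command above override individual fields of the YAML spec using dotted keys. The sketch below shows how such an override can be read as a path into a nested configuration. It is a conceptual illustration only: `apply_override` is a hypothetical helper, values are kept as plain strings, and the launcher's real parsing and type handling are not reproduced here.

```python
def apply_override(config, override):
    """Apply one 'a.b.c=value' override to a nested dict and return the dict."""
    key, value = override.split("=", 1)
    *parents, leaf = key.split(".")
    node = config
    for name in parents:
        # Create intermediate sections on demand.
        node = node.setdefault(name, {})
    node[leaf] = value
    return config


config = {"results_dir": "/old", "dataset": {"val_dataset": {"query": "/old/val"}}}
apply_override(config, "results_dir=/results")
apply_override(config, "dataset.val_dataset.query=/data/known_classes/val")

print(config["results_dir"])                      # /results
print(config["dataset"]["val_dataset"]["query"])  # /data/known_classes/val
```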

In [None]:

## Training command for multi-GPU training. We can define the number of GPUs and specify which GPUs are to be used by setting the `train.gpu_ids` parameter.
## The following command will trigger multi-GPU training on GPU 0 and GPU 1.
# ! tao model ml_recog train \
#               -e $SPECS_DIR/train.yaml \
#               results_dir=$RESULTS_DIR \
#               dataset.train_dataset=$DATA_DIR/$DATA_FOLDER/known_classes/train \
#               dataset.val_dataset.reference=$DATA_DIR/$DATA_FOLDER/known_classes/reference \
#               dataset.val_dataset.query=$DATA_DIR/$DATA_FOLDER/known_classes/val \
#               train.gpu_ids=[0,1]

In [None]:
print('checkpoints:')
print('---------------------')
!ls -ltrh $HOST_RESULTS_DIR/train

In [None]:
# You can set NUM_EPOCH to the epoch corresponding to any saved checkpoint
# %env NUM_EPOCH=029

# Get the name of the checkpoint corresponding to your set epoch
# tmp=!ls $HOST_RESULTS_DIR/train/*.pth | grep epoch_$NUM_EPOCH
# %env CHECKPOINT={tmp[0]}

# Or get the latest checkpoint
os.environ["CHECKPOINT"] = os.path.join(os.getenv("HOST_RESULTS_DIR"), "train/ml_model_latest.pth")

print('Rename a trained model: ')
print('---------------------')
!cp $CHECKPOINT $HOST_RESULTS_DIR/train/retail_object_recognition_model.pth
!ls -ltrh $HOST_RESULTS_DIR/train/retail_object_recognition_model.pth
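As an alternative to the commented `ls | grep` lines in the cell above, the checkpoint for a given epoch can be looked up in Python. This sketch assumes checkpoint file names contain a zero-padded `epoch_NNN` tag, as the `NUM_EPOCH=029` example suggests; `find_checkpoint` is a hypothetical helper and the exact file naming should be checked against the training output listing.

```python
from pathlib import Path


def find_checkpoint(train_dir, epoch):
    """Return the first .pth file whose name contains 'epoch_<NNN>', or None."""
    tag = f"epoch_{epoch:03d}"  # zero-padded to three digits (assumption)
    matches = sorted(p for p in Path(train_dir).glob("*.pth") if tag in p.name)
    return matches[0] if matches else None


# Example usage with the results directory defined earlier:
# import os
# ckpt = find_checkpoint(os.path.join(os.environ["HOST_RESULTS_DIR"], "train"), 29)
# if ckpt is not None:
#     os.environ["CHECKPOINT"] = str(ckpt)
```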

## 5. Evaluate trained models <a class="anchor" id="head-5"></a>
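Evaluate the trained model on the test split of both the known and the unknown classes, using the matching reference set in each case.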

In [None]:
# evaluate on known classes
! tao model ml_recog evaluate \
            -e $SPECS_DIR/evaluate.yaml \
            results_dir=$RESULTS_DIR \
            evaluate.checkpoint=$RESULTS_DIR/train/retail_object_recognition_model.pth \
            dataset.val_dataset.reference=$DATA_DIR/$DATA_FOLDER/known_classes/reference \
            dataset.val_dataset.query=$DATA_DIR/$DATA_FOLDER/known_classes/test 


In [None]:
# evaluate on unknown classes
! tao model ml_recog evaluate \
            -e $SPECS_DIR/evaluate.yaml \
            evaluate.results_dir=$RESULTS_DIR/eval_unknown \
            evaluate.checkpoint=$RESULTS_DIR/train/retail_object_recognition_model.pth \
            dataset.val_dataset.reference=$DATA_DIR/$DATA_FOLDER/unknown_classes/reference \
            dataset.val_dataset.query=$DATA_DIR/$DATA_FOLDER/unknown_classes/test 
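The difference between a per-class accuracy report and an average class accuracy (see `report_accuracy_per_class` in the spec description) can be shown with a small sketch. Treating the average as an unweighted mean over classes is an assumption of this example, and the labels are made up; the numbers are not TAO output.

```python
from collections import defaultdict


def per_class_accuracy(true_labels, predicted_labels):
    """Return {class: fraction of that class's samples predicted correctly}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(true_labels, predicted_labels):
        total[truth] += 1
        correct[truth] += int(truth == pred)
    return {cls: correct[cls] / total[cls] for cls in total}


def average_class_accuracy(true_labels, predicted_labels):
    """Unweighted mean of the per-class accuracies."""
    acc = per_class_accuracy(true_labels, predicted_labels)
    return sum(acc.values()) / len(acc)


truth = ["cola", "cola", "chips", "chips", "chips", "gum"]
preds = ["cola", "chips", "chips", "chips", "chips", "cola"]

print(per_class_accuracy(truth, preds))      # {'cola': 0.5, 'chips': 1.0, 'gum': 0.0}
print(average_class_accuracy(truth, preds))  # 0.5
```

The per-class view makes it easy to spot classes that are never recognized, which a single averaged number can hide.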


## 6. Inferences <a class="anchor" id="head-6"></a>
In this section, we run the metric_learning inference tool to generate inferences with the trained models and save the results under `$RESULTS_DIR`. 

In [None]:
# run inference on known classes
! tao model ml_recog inference \
                    -e $SPECS_DIR/infer.yaml \
                    results_dir=$RESULTS_DIR \
                    inference.checkpoint=$RESULTS_DIR/train/retail_object_recognition_model.pth \
                    dataset.val_dataset.reference=$DATA_DIR/$DATA_FOLDER/known_classes/reference \
                    inference.input_path=$DATA_DIR/$DATA_FOLDER/known_classes/test 

In [None]:
# run inference on unknown classes
! tao model ml_recog inference \
                    -e $SPECS_DIR/infer.yaml \
                    inference.results_dir=$RESULTS_DIR/inference_unknown \
                    inference.checkpoint=$RESULTS_DIR/train/retail_object_recognition_model.pth \
                    dataset.val_dataset.reference=$DATA_DIR/$DATA_FOLDER/unknown_classes/reference \
                    inference.input_path=$DATA_DIR/$DATA_FOLDER/unknown_classes/test 

In [None]:
print('Inference results:')
print('------------')
!ls -lth $HOST_RESULTS_DIR/inference
!ls -lth $HOST_RESULTS_DIR/inference_unknown

## 7. Deploy <a class="anchor" id="head-7"></a>

In [None]:
# Export the model to ONNX model.
! tao model ml_recog export \
                   -e $SPECS_DIR/export.yaml \
                   results_dir=$RESULTS_DIR \
                   export.checkpoint=$RESULTS_DIR/train/retail_object_recognition_model.pth \
                   export.onnx_file=$RESULTS_DIR/export/retail_object_recognition_model.onnx
               

In [None]:
print('Exported model:')
print('------------')
!ls -lth $HOST_RESULTS_DIR/export

In [None]:
# Generate an FP16 TensorRT engine using tao deploy.
# ResNet backbones also support INT8, but NVDINOv2 and FAN backbones do not.
!tao deploy ml_recog gen_trt_engine -e $SPECS_DIR/gen_trt_engine.yaml \
                                       gen_trt_engine.onnx_file=$RESULTS_DIR/export/retail_object_recognition_model.onnx \
                                       gen_trt_engine.trt_engine=$RESULTS_DIR/gen_trt_engine/retail_object_recognition_model.engine \
                                       gen_trt_engine.tensorrt.data_type=fp16 \
                                       results_dir=$RESULTS_DIR


In [None]:
print('Generated tensorrt engines and calibration files:')
print('------------')
!ls -lth $HOST_RESULTS_DIR/gen_trt_engine

In [None]:
# Evaluate with generated TensorRT engine
!tao deploy ml_recog evaluate -e $SPECS_DIR/evaluate.yaml \
                                 evaluate.trt_engine=$RESULTS_DIR/gen_trt_engine/retail_object_recognition_model.engine \
                                 results_dir=$RESULTS_DIR \
                                 dataset.val_dataset.reference=$DATA_DIR/$DATA_FOLDER/known_classes/reference \
                                 dataset.val_dataset.query=$DATA_DIR/$DATA_FOLDER/known_classes/test


In [None]:
# Inference with generated TensorRT engine
!tao deploy ml_recog inference -e $SPECS_DIR/infer.yaml \
                                  inference.trt_engine=$RESULTS_DIR/gen_trt_engine/retail_object_recognition_model.engine \
                                  results_dir=$RESULTS_DIR \
                                  dataset.val_dataset.reference=$DATA_DIR/$DATA_FOLDER/known_classes/reference \
                                  inference.input_path=$DATA_DIR/$DATA_FOLDER/known_classes/test


In [None]:
print('TensorRT Inference results:')
print('------------')
!ls -lth $HOST_RESULTS_DIR/trt_inference

In [None]:
!pip3 install pandas matplotlib
import ast
import os

import matplotlib.pyplot as plt
import pandas as pd
from PIL import Image

# Load the CSV file written by the TensorRT inference step
csv_file = os.path.join(os.environ['HOST_RESULTS_DIR'], "trt_inference", "trt_result.csv")
df = pd.read_csv(csv_file, header=None, names=['image_path', 'prediction', 'distance'])

# Convert string representations of lists to actual lists
df['prediction'] = df['prediction'].apply(ast.literal_eval)
df['distance'] = df['distance'].apply(ast.literal_eval)

# Sample a few rows (at most 5; change the number to show more or fewer images)
sampled_df = df.sample(n=min(5, len(df)))

# Visualize the images with their top-1 prediction and distance
container_prefix = "/data"  # container mount point of HOST_DATA_DIR
for index, row in sampled_df.iterrows():
    # Switch the container path (/data/...) back to the local data path
    image_path = os.environ['HOST_DATA_DIR'] + row['image_path'][len(container_prefix):]
    img = Image.open(image_path)

    plt.imshow(img)
    plt.title(f"Prediction: {row['prediction'][0]}, Distance: {row['distance'][0]:.4f}")
    plt.axis('off')
    plt.show()
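In addition to looking at individual images, the result file can be summarized without plotting. The sketch below uses only the standard library and assumes the three-column layout read in the cell above (image path, list of predictions, list of distances, with the top-1 entry first). The `summarize_results` helper, the sample rows, and the `0.5` distance threshold are hypothetical.

```python
import ast
import csv
import io
from collections import Counter


def summarize_results(csv_text, max_distance):
    """Return (top-1 label counts, image paths whose top-1 distance exceeds max_distance)."""
    counts = Counter()
    uncertain = []
    for image_path, prediction, distance in csv.reader(io.StringIO(csv_text)):
        labels = ast.literal_eval(prediction)
        distances = ast.literal_eval(distance)
        counts[labels[0]] += 1
        if distances[0] > max_distance:
            uncertain.append(image_path)
    return counts, uncertain


# Made-up rows in the assumed layout, for illustration only.
sample = """/data/demo/a.jpg,"['cola', 'gum']","[0.12, 0.85]"
/data/demo/b.jpg,"['chips', 'cola']","[0.61, 0.70]"
/data/demo/c.jpg,"['cola', 'chips']","[0.20, 0.41]"
"""

counts, uncertain = summarize_results(sample, max_distance=0.5)
print(dict(counts))  # {'cola': 2, 'chips': 1}
print(uncertain)     # ['/data/demo/b.jpg']
```

Queries with a large top-1 distance are candidates for closer inspection, for example items from classes that are missing from the reference set.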
