## Switch to CPU Instance (Advisable only for non-Colab-Pro instances)

1. Switch to a CPU instance until Step 2, since the earlier steps do not depend on a GPU. This leaves more of your Colab GPU time for the GPU-dependent tasks.
2. Change the runtime type to CPU: Runtime (top-left tab) -> Change Runtime Type -> None (Hardware Accelerator)
3. Then click Connect (top right)



## Mounting Google Drive
Mount your Google Drive storage to this Colab instance.

In [None]:
try:
    import google.colab
    %env GOOGLE_COLAB=1
    from google.colab import drive
    drive.mount('/content/drive', force_remount=True)
except ImportError:  # google.colab can only be imported inside Colab
    %env GOOGLE_COLAB=0
    print("Warning: Not a Colab Environment")

## Multi-task classification using TAO

Transfer learning is the process of transferring learned features from one application to another. It is a commonly used training technique where you use a model trained on one task and re-train to use it on a different task. 

Train Adapt Optimize (TAO) Toolkit is a simple, easy-to-use, Python-based AI toolkit for taking purpose-built AI models and customizing them with your own data.

<img align="center" src="https://developer.nvidia.com/sites/default/files/akamai/embedded-transfer-learning-toolkit-software-stack-1200x670px.png" width="1080">

## Learning Objectives
In this notebook, you will learn how to leverage the simplicity and convenience of TAO to:

* Take a pretrained ResNet-10 model and train a ResNet-10 multi-task classification model on a fashion dataset
* Prune the trained model
* Retrain the pruned model to recover lost accuracy
* Export the pruned model
* Run inference on the trained model
* Export the pruned and retrained model to a .etlt file for deployment to DeepStream

### Table of Contents
This notebook shows an example use case for classification using the Train Adapt Optimize (TAO) Toolkit.

0. [Set up env variables](#head-0)
1. [Prepare dataset and pre-trained model](#head-1) <br>
     1.1 [Download the dataset](#head-1-1)<br>
     1.2 [Verify the downloaded dataset](#head-1-2)<br>
     1.3 [Data preprocessing](#head-1-3)<br>
     1.4 [Download pretrained model](#head-1-4)
2. [Setup GPU environment](#head-2) <br>
    2.1 [Connect to GPU Instance](#head-2-1) <br>
    2.2 [Mounting Google drive](#head-2-2) <br>
    2.3 [Setup Python environment](#head-2-3) <br>
    2.4 [Reset env variables](#head-2-4) <br>
3. [Provide training specification](#head-3)
4. [Run TAO training](#head-4)
5. [Evaluate trained models](#head-5)
6. [Prune trained models](#head-6)
7. [Retrain pruned models](#head-7)
8. [Testing the model](#head-8)
9. [Inferences](#head-9)

## 0. Set up env variables <a class="anchor" id="head-0"></a>
When using the purpose-built pretrained models from NGC, set the `$KEY` environment variable to the key given in the model overview. Failing to do so can lead to errors when loading them as pretrained models.

*Note: Remove any stray files from the `$EXPERIMENT_DIR` and `$DATA_DIR` paths set below that may have been generated by previous experiments. Leftover files such as checkpoints may interfere with creating a training graph for a new experiment.*

*Note: By default, this notebook is set up to run training on 1 GPU. To use more GPUs, update the env variable `$NUM_GPUS` accordingly.*

In [None]:
# Setting up env variables for cleaner command line commands.
import os

%env TAO_DOCKER_DISABLE=1

%env KEY=nvidia_tlt
%env NUM_GPUS=1

# Change the paths according to your directory structure, these are just examples
%env COLAB_NOTEBOOKS_PATH=/home_duplicate/rarunachalam/colab_notebooks
%env EXPERIMENT_DIR=/results/multitask_classification
%env DATA_DIR=/content/drive/MyDrive/multitask_classification_data/

SPECS_DIR=f"{os.environ['COLAB_NOTEBOOKS_PATH']}/tensorflow/multitask_classification/specs"
%env SPECS_DIR={SPECS_DIR}
# Showing list of specification files.
!ls -rlt $SPECS_DIR

!sudo mkdir -p $DATA_DIR && sudo chmod -R 777 $DATA_DIR
!sudo mkdir -p $EXPERIMENT_DIR && sudo chmod -R 777 $EXPERIMENT_DIR
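
Because every later cell relies on these variables, it can help to confirm they are all set before continuing. The following is a minimal sketch; the helper name `missing_env_vars` is hypothetical and is not part of TAO. The variable names are the ones set in the cell above.

```python
import os

# Variables the rest of this notebook expects (names taken from the cell above).
REQUIRED_VARS = ["KEY", "NUM_GPUS", "COLAB_NOTEBOOKS_PATH",
                 "EXPERIMENT_DIR", "DATA_DIR", "SPECS_DIR"]


def missing_env_vars(names, environ=None):
    """Return the names that are unset or empty in the given environment mapping."""
    environ = os.environ if environ is None else environ
    return [name for name in names if not environ.get(name)]


# Example with an explicit mapping instead of the real environment.
example_env = {"KEY": "nvidia_tlt", "NUM_GPUS": "1", "DATA_DIR": ""}
print(missing_env_vars(REQUIRED_VARS, example_env))
```

Calling `missing_env_vars(REQUIRED_VARS)` with no second argument checks the real environment of the running kernel.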

## 1. Prepare datasets and pre-trained model <a class="anchor" id="head-1"></a>

We will use the Fashion Product Images (Small) dataset for this tutorial. The dataset is available on Kaggle.
 
In this tutorial, our trained classification network will perform three tasks: article category classification, base color classification and target season classification.

### 1.1 Download the dataset <a class="anchor" id="head-1-1"></a>

In [None]:
import os
!mkdir -p $DATA_DIR
!echo "Your DATA_DIR is: $DATA_DIR"

To download the dataset, you will need a Kaggle account. After logging in, you can download the dataset zip file here: https://www.kaggle.com/paramaggarwal/fashion-product-images-small

The downloaded file is `archive.zip`, which contains a subfolder called `myntradataset`. Unzip the contents of this subfolder into the `DATA_DIR` created in the cell above. You should end up with a folder called `images` and a CSV file called `styles.csv`.
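
If you prefer to do the extraction from Python, the step above could be sketched as follows. This is only an illustration using the standard `zipfile` module; the helper name `extract_subfolder` is hypothetical, and the location of `archive.zip` in the usage line is an assumption you would need to adapt.

```python
import os
import shutil
import zipfile


def extract_subfolder(zip_path, subfolder, dest_dir):
    """Extract only the files under `subfolder/` into dest_dir, dropping that prefix."""
    prefix = subfolder.strip("/") + "/"
    root = os.path.normpath(dest_dir)
    extracted = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir() or not info.filename.startswith(prefix):
                continue
            relative = info.filename[len(prefix):]
            target = os.path.normpath(os.path.join(root, relative))
            # Skip entries that would land outside dest_dir (for example via "..").
            if not target.startswith(root + os.sep):
                continue
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with archive.open(info) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            extracted.append(relative)
    return extracted


# Usage (the archive path is an assumption, adjust it to where you saved the file):
# extract_subfolder("/content/archive.zip", "myntradataset", os.environ["DATA_DIR"])
```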

### 1.2 Verify the downloaded dataset <a class="anchor" id="head-1-2"></a>

In [None]:
# Check the dataset is present
!mkdir -p $DATA_DIR
!if [ ! -d $DATA_DIR/images ]; then echo 'images folder NOT found.'; else echo 'Found images folder.';fi
!if [ ! -f $DATA_DIR/styles.csv ]; then echo 'CSV file NOT found.'; else echo 'Found CSV file.';fi

### 1.3 Data preprocessing <a class="anchor" id="head-1-3"></a>

To make the data usable for training in TAO, we need to preprocess it and split it into training and validation sets.

TAO multitask classification requires:
1. A training label CSV file containing labels for the training images
2. A validation label CSV file containing labels for the validation images
3. An image folder containing all training and validation images (it may also contain other images; the CSV files control which images are used)

The CSV files for training and validation labels should follow this pattern:
1. The first column should always be `fname`, containing the image file names (without a folder prefix)
2. The remaining columns should be named after the individual tasks. There is no limit on the number of tasks

For example, if your validation set has 2 images, the CSV should look like this:

| fname     | base_color | category | season |
|-----------|------------|----------|--------|
| 10000.jpg | Blue       | Shoes    | Spring |
| 10001.jpg | White      | Bags     | Fall   |

We also need to split the data into training and validation sets. Here, we use a randomly chosen 10% of the data as the validation set.

In [None]:
import os
import numpy as np
import pandas as pd

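# Note: error_bad_lines / warn_bad_lines were removed in pandas 2.0; on newer pandas use on_bad_lines='skip' instead.
df = pd.read_csv(os.environ['DATA_DIR'] + '/styles.csv', error_bad_lines=False, warn_bad_lines=False)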
df = df[['id', 'baseColour', 'subCategory', 'season']]
df = df.dropna()
category_cls = df.subCategory.value_counts()[:10].index # 10-class classification
season_cls = ['Spring', 'Summer', 'Fall', 'Winter'] # 4-class classification
color_cls = df.baseColour.value_counts()[:11].index # 11-class classification

# Get all valid rows
df = df[df.subCategory.isin(category_cls) & df.season.isin(season_cls) & df.baseColour.isin(color_cls)]
df.columns = ['fname', 'base_color', 'category', 'season']
df.fname = df.fname.astype(str)
df.fname = df.fname + '.jpg'

# remove entries whose image file is missing
all_img_files = os.listdir(os.environ['DATA_DIR'] + '/images')
df = df[df.fname.isin(all_img_files)]

idx = np.arange(len(df))
np.random.shuffle(idx)
val_df = df.iloc[idx[:(len(df) // 10)]]
train_df = df.iloc[idx[(len(df) // 10):]]

# Add a simple sanity check
assert len(val_df.season.unique()) == 4 and len(val_df.base_color.unique()) == 11 and \
    len(val_df.category.unique()) == 10, 'Validation set misses some classes, re-run this cell!'
assert len(train_df.season.unique()) == 4 and len(train_df.base_color.unique()) == 11 and \
    len(train_df.category.unique()) == 10, 'Training set misses some classes, re-run this cell!'

# save processed data labels
train_df.to_csv(os.environ['DATA_DIR'] + '/train.csv', index=False)
val_df.to_csv(os.environ['DATA_DIR'] + '/val.csv', index=False)
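
The cell above shuffles without a fixed seed, so each run produces a different split. If you want a reproducible split, one option is to seed the shuffle. The sketch below shows the idea with the standard library only; the helper name `split_indices` and the seed value are illustrative assumptions, not part of the notebook. In the cell above, the equivalent would be calling `np.random.seed(...)` before `np.random.shuffle(idx)`.

```python
import random


def split_indices(n_rows, val_fraction=0.1, seed=0):
    """Shuffle row indices with a fixed seed and split them into (train, val)."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_val = int(n_rows * val_fraction)
    return indices[n_val:], indices[:n_val]


train_idx, val_idx = split_indices(1000, val_fraction=0.1, seed=42)
print(len(train_idx), len(val_idx))
```

The returned index lists can be passed to `df.iloc[...]` in the same way the cell above uses `idx`.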

In [None]:
# verify
import os
import pandas as pd

print("Number of images in the train set: {}".format(
    len(pd.read_csv(os.environ['DATA_DIR'] + '/train.csv'))
))
print("Number of images in the validation set: {}".format(
    len(pd.read_csv(os.environ['DATA_DIR'] + '/val.csv'))
))

In [None]:
# Sample label.
pd.read_csv(os.environ['DATA_DIR'] + '/val.csv').head()
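
The layout rules for label CSV files described at the start of this section can also be checked programmatically. The following is a rough sketch using only the standard library; `check_label_csv` is a hypothetical helper, and it checks only the rules stated above (first column named `fname`, no folder prefix) plus a consistent number of fields per row.

```python
import csv
import io


def check_label_csv(handle):
    """Return a list of problems found in a multitask label CSV (empty if none)."""
    reader = csv.reader(handle)
    header = next(reader, None)
    if not header:
        return ["file is empty"]
    problems = []
    if header[0] != "fname":
        problems.append("first column must be 'fname', got %r" % header[0])
    if len(header) < 2:
        problems.append("at least one task column is required")
    for line_no, row in enumerate(reader, start=2):
        if len(row) != len(header):
            problems.append("line %d: expected %d fields, got %d"
                            % (line_no, len(header), len(row)))
        elif "/" in row[0] or "\\" in row[0]:
            problems.append("line %d: fname should not include a folder prefix" % line_no)
    return problems


# In-memory sample mirroring the example table shown earlier in this section.
sample = io.StringIO(
    "fname,base_color,category,season\n"
    "10000.jpg,Blue,Shoes,Spring\n"
    "10001.jpg,White,Bags,Fall\n"
)
print(check_label_csv(sample))
```

To check the generated files, pass an open file handle instead, for example `open(os.environ['DATA_DIR'] + '/val.csv', newline='')`.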

### 1.4 Download pre-trained model <a class="anchor" id="head-1-4"></a>

We will use the NGC CLI to get the pre-trained models. For more details, go to ngc.nvidia.com and click SETUP in the navigation bar.

In [None]:
# Installing NGC CLI on the local machine.
## Download and install
%env LOCAL_PROJECT_DIR=/ngc_content/
%env CLI=ngccli_cat_linux.zip
!mkdir -p $LOCAL_PROJECT_DIR/ngccli

# Remove any previously existing CLI installations
!rm -rf $LOCAL_PROJECT_DIR/ngccli/*
!wget "https://ngc.nvidia.com/downloads/$CLI" -P $LOCAL_PROJECT_DIR/ngccli
!unzip -u -q "$LOCAL_PROJECT_DIR/ngccli/$CLI" -d $LOCAL_PROJECT_DIR/ngccli/
!rm $LOCAL_PROJECT_DIR/ngccli/*.zip 
os.environ["PATH"]="{}/ngccli/ngc-cli:{}".format(os.getenv("LOCAL_PROJECT_DIR", ""), os.getenv("PATH", ""))
!cp /usr/lib/x86_64-linux-gnu/libstdc++.so.6 $LOCAL_PROJECT_DIR/ngccli/ngc-cli/libstdc++.so.6

In [None]:
!ngc registry model list nvidia/tao/pretrained_classification:*

In [None]:
!mkdir -p $EXPERIMENT_DIR/pretrained_resnet10/

In [None]:
# Pull pretrained model from NGC
!ngc registry model download-version nvidia/tao/pretrained_classification:resnet10 --dest $EXPERIMENT_DIR/pretrained_resnet10

In [None]:
print("Check that model is downloaded into dir.")
!ls -l $EXPERIMENT_DIR/pretrained_resnet10/pretrained_classification_vresnet10

## 2. Setup GPU environment <a class="anchor" id="head-2"></a>


### 2.1 Connect to GPU Instance <a class="anchor" id="head-2-1"></a>

1. Move any data saved to the Colab instance storage to Google Drive
2. Change the runtime type to GPU: Runtime (top-left tab) -> Change Runtime Type -> GPU (Hardware Accelerator)
3. Then click Connect (top right)



### 2.2 Mounting Google drive <a class="anchor" id="head-2-2"></a>
Mount your Google Drive storage to this Colab instance.

In [None]:
try:
    import google.colab
    %env GOOGLE_COLAB=1
    from google.colab import drive
    drive.mount('/content/drive', force_remount=True)
except ImportError:  # google.colab can only be imported inside Colab
    %env GOOGLE_COLAB=0
    print("Warning: Not a Colab Environment")

### 2.3 Setup Python environment <a class="anchor" id="head-2-3"></a>
Set up the environment needed to run the TAO networks by running the setup script.

In [None]:
# The runtime was restarted when switching to GPU, so os must be imported again.
import os

#FIXME
%env GENERAL_WHL_PATH=/content/drive/MyDrive/tf/general_whl
#FIXME
%env CODEBASE_WHL_PATH=/content/drive/MyDrive/tf/codebase_whl

if os.environ["GOOGLE_COLAB"] == "1":
    os.environ["bash_script"] = "setup_env.sh"
else:
    os.environ["bash_script"] = "setup_env_desktop.sh"

!sed -i "s|PATH_TO_GENERAL_WHL|$GENERAL_WHL_PATH|g" $COLAB_NOTEBOOKS_PATH/tensorflow/$bash_script
!sed -i "s|PATH_TO_CODEBASE_WHL|$CODEBASE_WHL_PATH|g" $COLAB_NOTEBOOKS_PATH/tensorflow/$bash_script
!sed -i "s|PATH_TO_COLAB_NOTEBOOKS|$COLAB_NOTEBOOKS_PATH|g" $COLAB_NOTEBOOKS_PATH/tensorflow/$bash_script

!sh $COLAB_NOTEBOOKS_PATH/tensorflow/$bash_script

In [None]:
import os
if os.environ.get("PYTHONPATH","") == "":
    os.environ["PYTHONPATH"] = ""
os.environ["PYTHONPATH"]+=":/opt/nvidia/"
if os.environ["GOOGLE_COLAB"] == "1":
    os.environ["PYTHONPATH"]+=":/usr/local/lib/python3.6/dist-packages/third_party/nvml"
else:
    os.environ["PYTHONPATH"]+=":/home_duplicate/rarunachalam/miniconda3/envs/tf_py_36/lib/python3.6/site-packages/third_party/nvml" # FIX MINICONDA PATH

### 2.4 Reset env variables <a class="anchor" id="head-2-4"></a>

In [None]:
# Setting up env variables for cleaner command line commands.
import os

%env TAO_DOCKER_DISABLE=1

%env KEY=nvidia_tlt
%env NUM_GPUS=1

# Change the paths according to your directory structure, these are just examples
%env COLAB_NOTEBOOKS_PATH=/home_duplicate/rarunachalam/colab_notebooks
%env EXPERIMENT_DIR=/results/multitask_classification
%env DATA_DIR=/content/drive/MyDrive/multitask_classification_data/

SPECS_DIR=f"{os.environ['COLAB_NOTEBOOKS_PATH']}/tensorflow/multitask_classification/specs"
%env SPECS_DIR={SPECS_DIR}
# Showing list of specification files.
!ls -rlt $SPECS_DIR

## 3. Provide training specification <a class="anchor" id="head-3"></a>
* Training dataset
* Validation dataset
* Pre-trained models
* Other training (hyper-)parameters such as batch size, number of epochs, learning rate etc.

In [None]:
!sed -i "s|TAO_DATA_PATH|$DATA_DIR/|g" $SPECS_DIR/mclassification_spec.cfg
!sed -i "s|EXPERIMENT_DIR_PATH|$EXPERIMENT_DIR/|g" $SPECS_DIR/mclassification_spec.cfg
!cat $SPECS_DIR/mclassification_spec.cfg
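
The `sed` commands above replace placeholder tokens in the spec file in place. Once the tokens are replaced, re-running the cell with different paths has no effect, because the placeholders are no longer present. The sketch below mirrors that substitution in Python and shows a simple way to see whether any placeholder is still unresolved. The helper names and the spec fragment are hypothetical; the real `mclassification_spec.cfg` contains different fields.

```python
def fill_placeholders(text, values):
    """Replace each placeholder token in `text` with its value, like the sed calls above."""
    for token, value in values.items():
        text = text.replace(token, value)
    return text


def unresolved(text, tokens):
    """Return the placeholder tokens still present in `text`."""
    return [token for token in tokens if token in text]


TOKENS = ["TAO_DATA_PATH", "EXPERIMENT_DIR_PATH"]

# Hypothetical spec fragment, used only to demonstrate the substitution.
template = (
    'example_data_file: "TAO_DATA_PATH/train.csv"\n'
    'example_output_dir: "EXPERIMENT_DIR_PATH/weights"\n'
)
filled = fill_placeholders(template, {
    "TAO_DATA_PATH": "/data",
    "EXPERIMENT_DIR_PATH": "/results",
})
print(unresolved(template, TOKENS))
print(unresolved(filled, TOKENS))
```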

## 4. Run TAO training <a class="anchor" id="head-4"></a>
* Provide the sample spec file and the output directory location for models

In [None]:
!tao multitask_classification train -e $SPECS_DIR/mclassification_spec.cfg \
                                    -r $EXPERIMENT_DIR \
                                    -k $KEY \
                                    --gpus $NUM_GPUS

In [None]:
print("To resume from checkpoint, please change pretrain_model_path to resume_model_path in config file.")

In [None]:
# Now check the evaluation stats in the csv file and pick the model with highest eval accuracy.
!cat $EXPERIMENT_DIR/multitask_cls_training_log_resnet10.csv
%env EPOCH=010
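
Instead of reading the log by eye, the best epoch could be picked programmatically. The following is a sketch under assumptions: the column names `epoch` and `val_accuracy` are hypothetical, so check the header of your own log file and pass the real metric column. Also verify whether the epoch numbering in the log matches the numbering in the checkpoint file names before using the result.

```python
import csv
import io


def best_epoch(handle, metric, epoch_column="epoch"):
    """Return (epoch, value) for the row with the highest value of `metric`."""
    best = None
    for row in csv.DictReader(handle):
        value = float(row[metric])
        if best is None or value > best[1]:
            best = (int(row[epoch_column]), value)
    return best


# Hypothetical log contents; the real CSV may use different column names.
log = io.StringIO(
    "epoch,val_accuracy\n"
    "1,0.71\n"
    "2,0.78\n"
    "3,0.76\n"
)
epoch, value = best_epoch(log, metric="val_accuracy")
# Zero-pad to three digits to match the EPOCH format used in this notebook.
print("EPOCH=%03d" % epoch)
```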

## 5. Evaluate trained models <a class="anchor" id="head-5"></a>


In [None]:
!tao multitask_classification evaluate -m $EXPERIMENT_DIR/weights/multitask_cls_resnet10_epoch_$EPOCH.tlt \
                                       -e $SPECS_DIR/mclassification_spec.cfg \
                                       -k $KEY

## 6. Prune trained models <a class="anchor" id="head-6"></a>
* Specify pre-trained model
* Equalization criterion
* Threshold for pruning

Usually, you only need to adjust `-pth` (the threshold) to trade off accuracy against model size. A higher `pth` gives you a smaller model (and thus faster inference) but worse accuracy. The right threshold depends on the dataset; a `pth` value of 0.65 is just a starting point. If the accuracy after retraining is good, you can increase this value to get a smaller model. Otherwise, lower it to get better accuracy.
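
To build intuition for why a higher threshold yields a smaller model, here is a toy illustration of generic magnitude-based channel pruning. It is not TAO's implementation: it simply keeps the filters whose L2 norm, divided by the largest norm in the layer, is at or above the threshold. The function name and the weights are made up, and the sketch ignores the equalization across connected layers that `-eq` controls.

```python
import math


def kept_channels(filters, threshold):
    """Return indices of filters whose max-normalized L2 norm is at or above threshold."""
    norms = [math.sqrt(sum(w * w for w in f)) for f in filters]
    largest = max(norms)
    if largest == 0:
        return []
    return [i for i, n in enumerate(norms) if n / largest >= threshold]


# Four toy filters with three weights each (made-up numbers).
toy_filters = [
    [0.9, -0.8, 0.7],
    [0.1, 0.0, -0.1],
    [0.5, 0.4, -0.6],
    [0.3, -0.2, 0.2],
]
for pth in (0.2, 0.5, 0.65):
    print(pth, kept_channels(toy_filters, pth))
```

As the threshold rises, fewer filters survive, which is the size versus accuracy trade-off described above.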

In [None]:
!mkdir -p $EXPERIMENT_DIR/resnet_pruned
!tao multitask_classification prune -m $EXPERIMENT_DIR/weights/multitask_cls_resnet10_epoch_$EPOCH.tlt \
                                    -o $EXPERIMENT_DIR/resnet_pruned/resnet10_pruned.tlt \
                                    -eq union \
                                    -pth 0.65 \
                                    -k $KEY \
                                    --results_dir $EXPERIMENT_DIR/logs

In [None]:
print('Pruned model:')
print('------------')
!ls -rlt $EXPERIMENT_DIR/resnet_pruned

## 7. Retrain pruned models <a class="anchor" id="head-7"></a>
* Model needs to be re-trained to bring back accuracy after pruning
* Specify re-training specification

In [None]:
!sed -i "s|TAO_DATA_PATH|$DATA_DIR/|g" $SPECS_DIR/mclassification_retrain_spec.cfg
!sed -i "s|EXPERIMENT_DIR_PATH|$EXPERIMENT_DIR/|g" $SPECS_DIR/mclassification_retrain_spec.cfg
!cat $SPECS_DIR/mclassification_retrain_spec.cfg

In [None]:
!tao multitask_classification train -e $SPECS_DIR/mclassification_retrain_spec.cfg \
                                    -r $EXPERIMENT_DIR/resnet_pruned \
                                    -k $KEY \
                                    --gpus $NUM_GPUS

## 8. Testing the model <a class="anchor" id="head-8"></a>

In [None]:
# Now check the evaluation stats in the csv file and pick the model with highest eval accuracy.
!cat $EXPERIMENT_DIR/resnet_pruned/multitask_cls_training_log_resnet10.csv
%env EPOCH=010

In [None]:
!tao multitask_classification evaluate -m $EXPERIMENT_DIR/resnet_pruned/weights/multitask_cls_resnet10_epoch_$EPOCH.tlt \
                                       -e $SPECS_DIR/mclassification_retrain_spec.cfg \
                                       -k $KEY

TAO also provides the `confmat` command to generate a confusion matrix for the model on an unseen dataset. You need to provide the image folder and the dataset labels. Here, we use the validation dataset as an example.

In [None]:
!tao multitask_classification confmat -m $EXPERIMENT_DIR/resnet_pruned/weights/multitask_cls_resnet10_epoch_$EPOCH.tlt \
                                      -i $DATA_DIR/images \
                                      -l $DATA_DIR/val.csv \
                                      -k $KEY
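
For readers unfamiliar with confusion matrices, the sketch below shows how one is computed for a single task from true and predicted labels. It is a generic illustration with made-up labels, not the code behind `confmat`, and the row and column orientation of TAO's own output may differ.

```python
from collections import Counter


def confusion_matrix(true_labels, predicted_labels, classes):
    """Build a matrix where rows are true classes and columns are predicted classes."""
    counts = Counter(zip(true_labels, predicted_labels))
    return [[counts[(t, p)] for p in classes] for t in classes]


# Made-up labels for the `season` task.
classes = ["Fall", "Spring", "Summer", "Winter"]
y_true = ["Fall", "Fall", "Summer", "Winter", "Spring", "Summer"]
y_pred = ["Fall", "Winter", "Summer", "Winter", "Summer", "Summer"]
for name, row in zip(classes, confusion_matrix(y_true, y_pred, classes)):
    print(name, row)
```

Entries on the diagonal count correct predictions; off-diagonal entries show which classes are confused with each other.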

## 9. Inferences <a class="anchor" id="head-9"></a>

TAO provides the `inference` command to run inference on a single image. You need to provide the class mapping JSON file generated during training.

In [None]:
!pip3 install matplotlib==3.3.3
import matplotlib.pyplot as plt
from PIL import Image 
import os

DEMO_IMAGE = '1654.jpg'
image_path = os.path.join(os.environ['DATA_DIR'], 'images', DEMO_IMAGE)
plt.imshow(Image.open(image_path))
# Expose the same path to the shell command in the next cell.
os.environ['DEMO_IMG_PATH'] = image_path

In [None]:
!tao multitask_classification inference -m $EXPERIMENT_DIR/resnet_pruned/weights/multitask_cls_resnet10_epoch_$EPOCH.tlt \
                                        -i $DEMO_IMG_PATH \
                                        -cm $EXPERIMENT_DIR/class_mapping.json \
                                        -k $KEY