### TAO remote client (Multitask Image classification)

Transfer learning is the process of transferring learned features from one application to another. It is a commonly used training technique in which a model trained on one task is retrained for use on a different task. The Train Adapt Optimize (TAO) Toolkit is a simple, easy-to-use Python-based AI toolkit for taking purpose-built AI models and customizing them with your own data.

![image](https://developer.nvidia.com/sites/default/files/akamai/TAO/tlt-tao-toolkit-bring-your-own-model-diagram.png)


### The workflow in a nutshell

- Creating a dataset
- Uploading the dataset to the service
- Running dataset convert
- Getting a PTM from NGC
- Model Actions
    - Train
    - Evaluate
    - Prune, retrain
    - Export
    - Convert
    - Inference on TAO
    - Inference on TRT

### Table of contents

1. [Install TAO remote client ](#head-1)
1. [Set the remote service base URL](#head-2)
1. [Access the shared volume](#head-3)
1. [Create the datasets](#head-4)
1. [List datasets](#head-5)
1. [Provide and customize dataset convert specs](#head-6)
1. [Run dataset convert](#head-7)
1. [Create a model experiment](#head-8)
1. [Find multitask-classification pretrained model](#head-9)
1. [Customize model metadata](#head-10)
1. [Provide train specs](#head-11)
1. [Run train](#head-12)
1. [Provide evaluate specs](#head-13)
1. [Run evaluate](#head-14)
1. [Provide prune specs](#head-15)
1. [Run prune](#head-16)
1. [Provide retrain specs](#head-17)
1. [Run retrain](#head-18)
1. [Run evaluate on retrained model](#head-18-1)
1. [Provide FP32 export specs](#head-19)
1. [Run FP32 export](#head-20)
1. [Provide Int8 export specs](#head-21)
1. [Run Int8 export](#head-22)
1. [Provide model convert specs](#head-23)
1. [Run model convert](#head-24)
1. [Provide TAO inference specs](#head-25)
1. [Run TAO inference](#head-26)
1. [Delete experiment](#head-30)
1. [Delete datasets](#head-31)
1. [Unmount shared volume](#head-32)
1. [Uninstall TAO Remote Client](#head-33)

### Requirements
Please find the server requirements [here](https://docs.nvidia.com/tao/tao-toolkit/text/tao_toolkit_api/api_setup.html#)

In [None]:
import os
import glob
import subprocess
import getpass
import uuid
import json

In [None]:
namespace = 'default'

### Install TAO remote client <a class="anchor" id="head-1"></a>

In [None]:
# SKIP this step IF you have already installed the TAO-Client wheel.
! pip3 install nvidia-tao-client

In [None]:
# View the version of the TAO-Client
! tao-client --version

### FIXME

1. Assign the ip_address and port_number in FIXME 1 and FIXME 2 ([info](https://docs.nvidia.com/tao/tao-toolkit/text/tao_toolkit_api/api_rest_api.html))
2. Set NGC API key in FIXME 3
3. Assign path of data directory in FIXME 4
4. Choose between default or custom dataset in FIXME 5

### Set the remote service base URL <a class="anchor" id="head-2"></a>

In [None]:
# Define the node_addr and port number
node_addr = "<ip_address>" # FIXME1 example: 10.137.149.22
node_port = "<port_number>" # FIXME2 example: 32334
# In host machine, node ip_address and port number can be obtained as follows,
# ip_address: hostname -i
# port_number: kubectl get service ingress-nginx-controller -o jsonpath='{.spec.ports[0].nodePort}'
%env BASE_URL=http://{node_addr}:{node_port}/{namespace}/api/v1
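
The base URL above is assembled from the node address, the node port, and the namespace. As a minimal sketch, the same string can be built and sanity-checked in plain Python before it is exported. The helper name `build_base_url` is hypothetical and is not part of `tao-client`.

```python
def build_base_url(node_addr, node_port, namespace="default"):
    """Return the service URL in the same shape as the %env BASE_URL line."""
    node_addr = str(node_addr).strip()
    node_port = str(node_port).strip()
    # The unedited FIXME placeholders look like "<ip_address>"; reject them early.
    if not node_addr or node_addr.startswith("<"):
        raise ValueError(f"node_addr is not set: {node_addr!r}")
    if not node_port.isdigit():
        raise ValueError(f"node_port must be numeric: {node_port!r}")
    return f"http://{node_addr}:{node_port}/{namespace}/api/v1"


# Example values taken from the FIXME comments above.
print(build_base_url("10.137.149.22", "32334"))
# → http://10.137.149.22:32334/default/api/v1
```

Failing early on a leftover placeholder tends to be easier to debug than a connection error from a later `tao-client` call.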

In [None]:
# FIXME: Set ngc_api_key variable
ngc_api_key = "<ngc_api_key>" # FIXME3 example: zZYtczM5amdtdDcwNjk0cnA2bGU2bXQ3bnQ6NmQ4NjNhMDItMTdmZS00Y2QxLWI2ZjktNmE5M2YxZTc0OGyM

# Exchange NGC_API_KEY for JWT
identity = json.loads(subprocess.getoutput(f'tao-client login --ngc-api-key {ngc_api_key}'))

%env USER={identity['user_id']}
%env TOKEN={identity['token']}
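
If the login fails, the command output is not valid JSON and `json.loads` raises an error that says little about the cause. The sketch below shows one way to parse the output defensively. It assumes only what the cell above relies on, namely that a successful login prints a JSON object with `user_id` and `token` keys; the function name and the sample string are hypothetical.

```python
import json


def parse_login_response(raw):
    """Parse the text printed by the login command and check the expected keys."""
    try:
        identity = json.loads(raw)
    except json.JSONDecodeError as err:
        # Show the start of the raw output so the real error message is visible.
        raise RuntimeError(f"Login did not return JSON: {raw[:200]!r}") from err
    if not isinstance(identity, dict):
        raise RuntimeError(f"Login returned unexpected JSON: {identity!r}")
    missing = [key for key in ("user_id", "token") if key not in identity]
    if missing:
        raise RuntimeError(f"Login response is missing keys: {missing}")
    return identity


# Hypothetical sample output, for illustration only.
sample = '{"user_id": "1234", "token": "abc"}'
print(parse_login_response(sample)["user_id"])
# → 1234
```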

### Access the shared volume <a class="anchor" id="head-3"></a>

In [None]:
# Get PVC ID
pvc_id = subprocess.getoutput(f'kubectl get pvc tao-toolkit-api-pvc -n {namespace} -o jsonpath="{{.spec.volumeName}}"')
print(pvc_id)

In [None]:
# Get NFS server info
provisioner = json.loads(subprocess.getoutput(f'helm get values nfs-subdir-external-provisioner -o json'))
nfs_server = provisioner['nfs']['server']
nfs_path = provisioner['nfs']['path']
print(nfs_server, nfs_path)

In [None]:
user = getpass.getuser()
home = os.path.expanduser('~')

! echo "Password for {user}"
password = getpass.getpass()

In [None]:
# Mount shared volume 
! mkdir -p ~/shared

command = "apt-get -y install nfs-common >> /dev/null"
! echo {password} | sudo -S -k {command}

command = f"mount -t nfs {nfs_server}:{nfs_path}/{namespace}-tao-toolkit-api-pvc-{pvc_id} ~/shared"
! echo {password} | sudo -S -k {command} && echo DONE
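
The mount source used above combines the NFS server, the export path, the namespace, and the volume name. A minimal sketch of that composition follows; the helper name and the sample values are hypothetical, and the directory pattern is simply the one used in the mount command above.

```python
def nfs_mount_source(nfs_server, nfs_path, namespace, pvc_id):
    """Build the `server:/path` string passed to `mount -t nfs`."""
    # Drop a trailing slash so the joined path does not contain "//".
    export_dir = f"{nfs_path.rstrip('/')}/{namespace}-tao-toolkit-api-pvc-{pvc_id}"
    return f"{nfs_server}:{export_dir}"


# Hypothetical values, for illustration only.
print(nfs_mount_source("10.0.0.5", "/mnt/nfs_share/", "default", "pvc-1234"))
# → 10.0.0.5:/mnt/nfs_share/default-tao-toolkit-api-pvc-pvc-1234
```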

### Create the datasets <a class="anchor" id="head-4"></a>

We will be using the `Fashion Product Images (Small)` dataset for this tutorial. This dataset is available on Kaggle. In this tutorial, our trained multitask-classification network will perform three tasks: article category classification, base color classification, and target season classification.

To download the dataset, you will need a Kaggle account. After logging in, you can download the dataset zip file [here](https://www.kaggle.com/paramaggarwal/fashion-product-images-small). The downloaded file is `archive.zip`, which contains a subfolder called `myntradataset`. Unzip the contents of this subfolder into your data directory (`DATA_DIR`, set in FIXME 4 below); you should then have a folder called `images` and a CSV file called `styles.csv`.

**If you use a custom dataset, it should follow this structure:**
```
DATA_DIR
├── images
│   ├── image_name_1.jpg
│   ├── image_name_2.jpg
│   └── ...
└── styles.csv
```
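
The layout above can be checked programmatically before any data is uploaded. The following is a minimal sketch; the function name `check_dataset_layout` is hypothetical, and it checks only the `images` folder and `styles.csv` shown in the tree.

```python
import os
import tempfile


def check_dataset_layout(data_dir):
    """Return a list of problems; an empty list means the layout matches."""
    problems = []
    images_dir = os.path.join(data_dir, "images")
    if not os.path.isdir(images_dir):
        problems.append("missing images/ directory")
    elif not any(name.lower().endswith(".jpg") for name in os.listdir(images_dir)):
        problems.append("images/ contains no .jpg files")
    if not os.path.isfile(os.path.join(data_dir, "styles.csv")):
        problems.append("missing styles.csv")
    return problems


# Demonstration on a throwaway directory that has an image but no CSV file.
with tempfile.TemporaryDirectory() as demo_dir:
    os.makedirs(os.path.join(demo_dir, "images"))
    open(os.path.join(demo_dir, "images", "image_name_1.jpg"), "w").close()
    print(check_dataset_layout(demo_dir))
# → ['missing styles.csv']
```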

In [None]:
DATA_DIR = "multitask-classification_data" # FIXME4
os.environ['DATA_DIR']= DATA_DIR
!mkdir -p $DATA_DIR

In [None]:
dataset_to_be_used = "default" # FIXME5 example: default/custom; default for the dataset used in this tutorial notebook; custom for a different dataset

In [None]:
if dataset_to_be_used == "default":
    if not os.path.exists(os.path.join(DATA_DIR,"archive.zip")):
        print("archive.zip not found. Download the Fashion Product Images (Small) archive into", DATA_DIR)
    else:
        !unzip -uq $DATA_DIR/archive.zip -d $DATA_DIR/

In [None]:
# Check the dataset is present
!if [ ! -d $DATA_DIR/images ]; then echo 'images folder NOT found.'; else echo 'Found images folder.';fi
!if [ ! -f $DATA_DIR/styles.csv ]; then echo 'CSV file NOT found.'; else echo 'Found CSV file.';fi

In [None]:
# Create subdirectories and remove existing files in them
!mkdir -p $DATA_DIR/images_train && rm -rf $DATA_DIR/images_train/*
!mkdir -p $DATA_DIR/images_val && rm -rf $DATA_DIR/images_val/*
!mkdir -p $DATA_DIR/images_test && rm -rf $DATA_DIR/images_test/*

In [None]:
!python3 -m pip install numpy
!python3 -m pip install pandas
import os
import shutil
import numpy as np
import pandas as pd

# on_bad_lines='skip' (pandas >= 1.3) replaces error_bad_lines/warn_bad_lines, which were removed in pandas 2.0
df = pd.read_csv(os.environ['DATA_DIR'] + '/styles.csv', on_bad_lines='skip')
df = df[['id', 'baseColour', 'subCategory', 'season']]
df = df.dropna()
category_cls = df.subCategory.value_counts()[:10].index # 10-class classification
season_cls = ['Spring', 'Summer', 'Fall', 'Winter'] # 4-class classification
color_cls = df.baseColour.value_counts()[:11].index # 11-class classification

# Get all valid rows
df = df[df.subCategory.isin(category_cls) & df.season.isin(season_cls) & df.baseColour.isin(color_cls)]
df.columns = ['fname', 'base_color', 'category', 'season']
df.fname = df.fname.astype(str)
df.fname = df.fname + '.jpg'

# remove entries whose image file is missing
all_img_files = os.listdir(os.environ['DATA_DIR'] + '/images')
df = df[df.fname.isin(all_img_files)]

idx = np.arange(len(df))
np.random.shuffle(idx)

train_split_idx = int(len(df)*0.8)
train_df = df.iloc[idx[:train_split_idx]]
val_df = df.iloc[idx[train_split_idx:train_split_idx+(len(df) // 10)]]
test_df = df.iloc[idx[train_split_idx+(len(df) // 10):]]

# Add a simple sanity check
assert len(train_df.season.unique()) == 4 and len(train_df.base_color.unique()) == 11 and \
    len(train_df.category.unique()) == 10, 'Training set misses some classes, re-run this cell!'
assert len(val_df.season.unique()) == 4 and len(val_df.base_color.unique()) == 11 and \
    len(val_df.category.unique()) == 10, 'Validation set misses some classes, re-run this cell!'
assert len(test_df.season.unique()) == 4 and len(test_df.base_color.unique()) == 11 and \
    len(test_df.category.unique()) == 10, 'Test set misses some classes, re-run this cell!'

for image_name in train_df["fname"]:
    source_file_name = os.path.join(DATA_DIR, "images", image_name)
    destination_file_name = os.path.join(DATA_DIR, "images_train", image_name)
    shutil.copy(source_file_name, destination_file_name)

# Validation images go into images_train as well as images_val, because the train
# dataset uploaded below holds both train.csv and val.csv next to a single images folder.
for image_name in val_df["fname"]:
    source_file_name = os.path.join(DATA_DIR, "images", image_name)
    destination_file_name = os.path.join(DATA_DIR, "images_train", image_name)
    shutil.copy(source_file_name, destination_file_name)
    destination_file_name = os.path.join(DATA_DIR, "images_val", image_name)
    shutil.copy(source_file_name, destination_file_name)

for image_name in test_df["fname"]:
    source_file_name = os.path.join(DATA_DIR, "images", image_name)
    destination_file_name = os.path.join(DATA_DIR, "images_test", image_name)
    shutil.copy(source_file_name, destination_file_name)
    
# save processed data labels
train_df.to_csv(os.environ['DATA_DIR'] + '/train.csv', index=False)
val_df.to_csv(os.environ['DATA_DIR'] + '/val.csv', index=False)
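
The cell above shuffles without a fixed seed, so every run produces a different split. The sketch below reproduces the same 80/10/10 arithmetic with a seeded shuffle, using only the standard library. The function name `split_indices` and the seed value are hypothetical choices, not part of the original workflow.

```python
import random


def split_indices(n_rows, seed=0):
    """Shuffle row indices and cut them into train/val/test like the cell above."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # seeded, so the split is repeatable
    train_end = int(n_rows * 0.8)         # first 80% of shuffled rows
    val_end = train_end + n_rows // 10    # next 10%
    return indices[:train_end], indices[train_end:val_end], indices[val_end:]


train_idx, val_idx, test_idx = split_indices(1000, seed=42)
print(len(train_idx), len(val_idx), len(test_idx))
# → 800 100 100
```

The returned lists can be passed to `df.iloc[...]` in place of the slices of `idx`. A fixed seed makes the class-coverage assertions either pass or fail consistently, rather than depending on the run.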

In [None]:
# Check that the split CSV files were written
!if [ ! -f $DATA_DIR/train.csv ]; then echo 'train csv NOT found.'; else echo 'Found train csv.';fi
!if [ ! -f $DATA_DIR/val.csv ]; then echo 'val csv NOT found.'; else echo 'Found val csv.';fi

In [None]:
train_dataset_id = subprocess.getoutput("tao-client multitask-classification dataset-create --dataset_type image_classification --dataset_format custom")
print(train_dataset_id)

In [None]:
! rsync -ah --info=progress2 {DATA_DIR}/images_train/* ~/shared/users/{os.environ['USER']}/datasets/{train_dataset_id}/images/
! rsync -ah --info=progress2 {DATA_DIR}/train.csv ~/shared/users/{os.environ['USER']}/datasets/{train_dataset_id}/
! rsync -ah --info=progress2 {DATA_DIR}/val.csv ~/shared/users/{os.environ['USER']}/datasets/{train_dataset_id}/
! echo DONE

In [None]:
eval_dataset_id = subprocess.getoutput("tao-client multitask-classification dataset-create --dataset_type image_classification --dataset_format custom")
print(eval_dataset_id)

In [None]:
! rsync -ah --info=progress2 {DATA_DIR}/images_val/* ~/shared/users/{os.environ['USER']}/datasets/{eval_dataset_id}/images/
! rsync -ah --info=progress2 {DATA_DIR}/val.csv ~/shared/users/{os.environ['USER']}/datasets/{eval_dataset_id}/
! echo DONE

In [None]:
infer_dataset_id = subprocess.getoutput("tao-client multitask-classification dataset-create --dataset_type image_classification --dataset_format raw")
print(infer_dataset_id)

In [None]:
! rsync -ah --info=progress2 {DATA_DIR}/images_test/* ~/shared/users/{os.environ['USER']}/datasets/{infer_dataset_id}/images/
! echo DONE

### List datasets <a class="anchor" id="head-5"></a>

In [None]:
pattern = os.path.join(home, 'shared', 'users', os.environ['USER'], 'datasets', '*', 'metadata.json')

datasets = []
for metadata_path in glob.glob(pattern):
    with open(metadata_path, 'r') as metadata_file:
        datasets.append(json.load(metadata_file))

print(json.dumps(datasets, indent=2))

### Create a model experiment <a class="anchor" id="head-8"></a>

In [None]:
network_arch = "multitask_classification"
model_id = subprocess.getoutput(f"tao-client multitask-classification model-create --network_arch {network_arch} --encryption_key tlt_encode ")
print(model_id)

### Find multitask_classification pretrained model <a class="anchor" id="head-9"></a>

In [None]:
pattern = os.path.join(home, 'shared', 'users', '*', 'models', '*', 'metadata.json')

ptm_id = None
for metadata_path in glob.glob(pattern):
    with open(metadata_path, 'r') as metadata_file:
        metadata = json.load(metadata_file)
    # ngc_path may be missing or null in some metadata files; treat that as no match
    ngc_path = metadata.get("ngc_path") or ""
    metadata_architecture = metadata.get("network_arch")
    if metadata_architecture == network_arch and ngc_path.endswith("pretrained_classification:resnet10"):
        ptm_id = metadata["id"]
        break

print(ptm_id)
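
The selection rule used above (matching architecture plus an `ngc_path` suffix) can be isolated into a small function that works on already-loaded metadata dictionaries. This is a sketch only: the function name and the sample records are hypothetical, and the keys are the ones read in the cell above.

```python
def find_ptm_id(metadata_list, network_arch, ngc_suffix):
    """Return the id of the first record matching the architecture and NGC path suffix."""
    for metadata in metadata_list:
        # A missing or null ngc_path is treated as an empty string (no match).
        ngc_path = metadata.get("ngc_path") or ""
        if metadata.get("network_arch") == network_arch and ngc_path.endswith(ngc_suffix):
            return metadata.get("id")
    return None


# Hypothetical records, for illustration only.
records = [
    {"id": "a", "network_arch": "multitask_classification", "ngc_path": None},
    {"id": "b", "network_arch": "multitask_classification",
     "ngc_path": "nvidia/tao/pretrained_classification:resnet10"},
]
print(find_ptm_id(records, "multitask_classification", "pretrained_classification:resnet10"))
# → b
```

If the function returns `None`, no pretrained model matched, and it is worth stopping before writing the `ptm` field into the model metadata.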

### Customize model metadata <a class="anchor" id="head-10"></a>

In [None]:
metadata_path = os.path.join(home, 'shared', 'users', os.environ['USER'], 'models', model_id, 'metadata.json')

with open(metadata_path , "r") as metadata_file:
    metadata = json.load(metadata_file)

metadata["train_datasets"] = [train_dataset_id]
metadata["eval_dataset"] = eval_dataset_id
metadata["inference_dataset"] = infer_dataset_id
metadata["ptm"] = ptm_id

with open(metadata_path, "w") as metadata_file:
    json.dump(metadata, metadata_file, indent=2)

print(json.dumps(metadata, indent=2))

### Provide train specs <a class="anchor" id="head-11"></a>

In [None]:
# Default train model specs
! tao-client multitask-classification model-train-defaults --id {model_id} | tee ~/shared/users/{os.environ['USER']}/models/{model_id}/specs/train.json

In [None]:
# Customize train model specs
specs_path = os.path.join(home, 'shared', 'users', os.environ['USER'], 'models', model_id, 'specs', 'train.json')

with open(specs_path , "r") as specs_file:
    specs = json.load(specs_file)

specs["training_config"]["num_epochs"] = 2

with open(specs_path, "w") as specs_file:
    json.dump(specs, specs_file, indent=2)

print(json.dumps(specs, indent=2))
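
The read, modify, write pattern above recurs for every spec file in this notebook. As a minimal sketch, it can be wrapped in one helper that takes nested keys. The name `update_json_file` and the sample spec content are hypothetical; only the `training_config` / `num_epochs` keys come from the cell above.

```python
import json
import os
import tempfile


def update_json_file(path, updates):
    """Apply {("key", "subkey"): value} updates to a JSON file and return the result."""
    with open(path, "r") as json_file:
        data = json.load(json_file)
    for key_path, value in updates.items():
        node = data
        for key in key_path[:-1]:
            node = node.setdefault(key, {})  # create intermediate objects if absent
        node[key_path[-1]] = value
    with open(path, "w") as json_file:
        json.dump(data, json_file, indent=2)
    return data


# Demonstration on a throwaway file with made-up default values.
with tempfile.TemporaryDirectory() as demo_dir:
    demo_path = os.path.join(demo_dir, "train.json")
    with open(demo_path, "w") as demo_file:
        json.dump({"training_config": {"num_epochs": 80}}, demo_file)
    specs = update_json_file(demo_path, {("training_config", "num_epochs"): 2})
    print(specs["training_config"]["num_epochs"])
# → 2
```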

### Run train <a class="anchor" id="head-12"></a>

In [None]:
train_job_id = subprocess.getoutput("tao-client multitask-classification model-train --id " + model_id)
print(train_job_id)

In [None]:
def my_tail(logs_dir, log_file):
    %env LOG_FILE={logs_dir}/{log_file}
    ! mkdir -p {logs_dir}
    ! [ ! -f "$LOG_FILE" ] && touch $LOG_FILE && chmod 666 $LOG_FILE
    ! tail -f -n +1 $LOG_FILE | while read LINE; do echo "$LINE"; [[ "$LINE" == "EOF" ]] && pkill -P $$ tail; done

# Check status (the file won't exist until the backend Toolkit container is running -- can take several minutes)
logs_dir = os.path.join(home, 'shared', 'users', os.environ['USER'], 'models', model_id, 'logs')
log_file = f"{train_job_id}.txt"

my_tail(logs_dir, log_file)
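
`my_tail` relies on IPython shell magics, so it only works inside a notebook. The sketch below is a plain-Python alternative built on the same idea: wait for the log file, stream complete lines, and stop at the `EOF` sentinel line. The function name, polling interval, and timeout are hypothetical choices, and unlike `my_tail` it does not print the sentinel itself.

```python
import os
import tempfile
import time


def follow_log(path, sentinel="EOF", poll_seconds=1.0, timeout_seconds=600.0):
    """Yield complete lines from a growing log file until the sentinel line appears."""
    deadline = time.monotonic() + timeout_seconds
    while not os.path.exists(path):  # the file appears only once the job starts
        if time.monotonic() > deadline:
            raise TimeoutError(f"{path} was not created in time")
        time.sleep(poll_seconds)
    with open(path, "r") as log:
        pending = ""  # holds a partially written line until its newline arrives
        while True:
            chunk = log.readline()
            if chunk.endswith("\n"):
                line = (pending + chunk).rstrip("\r\n")
                pending = ""
                if line == sentinel:
                    return
                yield line
            elif chunk:
                pending += chunk
            else:
                if pending == sentinel:  # sentinel written without a newline
                    return
                if time.monotonic() > deadline:
                    raise TimeoutError(f"no {sentinel!r} line seen in {path}")
                time.sleep(poll_seconds)


# Demonstration on a throwaway, already-finished log file.
with tempfile.TemporaryDirectory() as demo_dir:
    demo_log = os.path.join(demo_dir, "job.txt")
    with open(demo_log, "w") as demo_file:
        demo_file.write("epoch 1\nepoch 2\nEOF\n")
    print(list(follow_log(demo_log, poll_seconds=0.01, timeout_seconds=5)))
# → ['epoch 1', 'epoch 2']
```

The timeout keeps a notebook cell from hanging indefinitely if the job never writes the sentinel.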

### Provide evaluate specs <a class="anchor" id="head-13"></a>

In [None]:
# Default evaluate model specs
! tao-client multitask-classification model-evaluate-defaults --id {model_id} | tee ~/shared/users/{os.environ['USER']}/models/{model_id}/specs/evaluate.json

In [None]:
# Customize evaluate model specs
specs_path = os.path.join(home, 'shared', 'users', os.environ['USER'], 'models', model_id, 'specs', 'evaluate.json')

with open(specs_path , "r") as specs_file:
    specs = json.load(specs_file)

# Change any spec if you wish

with open(specs_path, "w") as specs_file:
    json.dump(specs, specs_file, indent=2)

print(json.dumps(specs, indent=2))

### Run evaluate <a class="anchor" id="head-14"></a>

In [None]:
eval_job_id = subprocess.getoutput(f"tao-client multitask-classification model-evaluate --id {model_id} --job {train_job_id}")
print(eval_job_id)

In [None]:
# Check status (the file won't exist until the backend Toolkit container is running -- can take several minutes)
log_file = f"{eval_job_id}.txt"
my_tail(logs_dir, log_file)

### Provide prune specs <a class="anchor" id="head-15"></a>

In [None]:
# Default prune model specs
! tao-client multitask-classification model-prune-defaults --id {model_id} | tee ~/shared/users/{os.environ['USER']}/models/{model_id}/specs/prune.json

### Run prune <a class="anchor" id="head-16"></a>

In [None]:
prune_job_id = subprocess.getoutput(f"tao-client multitask-classification model-prune --id {model_id} --job {train_job_id}")
print(prune_job_id)

In [None]:
# Check status (the file won't exist until the backend Toolkit container is running -- can take several minutes)
log_file = f"{prune_job_id}.txt"
my_tail(logs_dir, log_file)

### Provide retrain specs <a class="anchor" id="head-17"></a>

In [None]:
# Default retrain model specs
! tao-client multitask-classification model-retrain-defaults --id {model_id} | tee ~/shared/users/{os.environ['USER']}/models/{model_id}/specs/retrain.json

In [None]:
# Customize retrain model specs
specs_path = os.path.join(home, 'shared', 'users', os.environ['USER'], 'models', model_id, 'specs', 'retrain.json')

with open(specs_path , "r") as specs_file:
    specs = json.load(specs_file)

specs["training_config"]["num_epochs"] = 2

with open(specs_path, "w") as specs_file:
    json.dump(specs, specs_file, indent=2)

print(json.dumps(specs, indent=2))

### Run retrain <a class="anchor" id="head-18"></a>

In [None]:
retrain_job_id = subprocess.getoutput(f"tao-client multitask-classification model-retrain --id {model_id} --job {prune_job_id}")
print(retrain_job_id)

In [None]:
# Check status (the file won't exist until the backend Toolkit container is running -- can take several minutes)
log_file = f"{retrain_job_id}.txt"
my_tail(logs_dir, log_file)

### Run evaluate on retrained model <a class="anchor" id="head-18-1"></a>

In [None]:
eval2_job_id = subprocess.getoutput(f"tao-client multitask-classification model-evaluate --id {model_id} --job {retrain_job_id}")
print(eval2_job_id)

In [None]:
# Check status (the file won't exist until the backend Toolkit container is running -- can take several minutes)
log_file = f"{eval2_job_id}.txt"
my_tail(logs_dir, log_file)

### Provide FP32 export specs <a class="anchor" id="head-19"></a>

In [None]:
# Default export model specs
! tao-client multitask-classification model-export-defaults --id {model_id} | tee ~/shared/users/{os.environ['USER']}/models/{model_id}/specs/export.json

In [None]:
# Customize export model specs
specs_path = os.path.join(home, 'shared', 'users', os.environ['USER'], 'models', model_id, 'specs', 'export.json')

with open(specs_path , "r") as specs_file:
    specs = json.load(specs_file)

specs["data_type"] = "fp32"

with open(specs_path, "w") as specs_file:
    json.dump(specs, specs_file, indent=2)

print(json.dumps(specs, indent=2))

### Run FP32 export <a class="anchor" id="head-20"></a>

In [None]:
fp32_export_job_id = subprocess.getoutput(f"tao-client multitask-classification model-export --id {model_id} --job {train_job_id}")
print(fp32_export_job_id)

In [None]:
# Check status (the file won't exist until the backend Toolkit container is running -- can take several minutes)
log_file = f"{fp32_export_job_id}.txt"
my_tail(logs_dir, log_file)

### Provide Int8 export specs <a class="anchor" id="head-21"></a>

In [None]:
# Default export model specs
! tao-client multitask-classification model-export-defaults --id {model_id} | tee ~/shared/users/{os.environ['USER']}/models/{model_id}/specs/export.json

In [None]:
# Customize export model specs
specs_path = os.path.join(home, 'shared', 'users', os.environ['USER'], 'models', model_id, 'specs', 'export.json')

with open(specs_path , "r") as specs_file:
    specs = json.load(specs_file)

specs["data_type"] = "int8"
specs["batches"] = 10
specs["batch_size"] = 4

with open(specs_path, "w") as specs_file:
    json.dump(specs, specs_file, indent=2)

print(json.dumps(specs, indent=2))
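
With the values above, and assuming `batches` and `batch_size` together determine how many images are drawn for INT8 calibration, the export reads 10 × 4 = 40 images. The sketch below makes that arithmetic explicit and compares it with the number of images on hand. The function names and the `available_images` figure are hypothetical.

```python
def calibration_image_count(batches, batch_size):
    """Number of images consumed when reading `batches` batches of `batch_size`."""
    if batches <= 0 or batch_size <= 0:
        raise ValueError("batches and batch_size must be positive")
    return batches * batch_size


def check_calibration_budget(batches, batch_size, available_images):
    """Return (fits, needed): whether the dataset holds enough images, and how many are needed."""
    needed = calibration_image_count(batches, batch_size)
    return needed <= available_images, needed


# available_images is a made-up figure, for illustration only.
fits, needed = check_calibration_budget(batches=10, batch_size=4, available_images=3000)
print(needed, fits)
# → 40 True
```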

### Run Int8 export <a class="anchor" id="head-22"></a>

In [None]:
int8_export_job_id = subprocess.getoutput(f"tao-client multitask-classification model-export --id {model_id} --job {train_job_id}")
print(int8_export_job_id)

In [None]:
# Check status (the file won't exist until the backend Toolkit container is running -- can take several minutes)
log_file = f"{int8_export_job_id}.txt"
my_tail(logs_dir, log_file)

### Provide model convert specs <a class="anchor" id="head-23"></a>

In [None]:
# Default convert model specs
! tao-client multitask-classification model-convert-defaults --id {model_id} | tee ~/shared/users/{os.environ['USER']}/models/{model_id}/specs/convert.json

In [None]:
# Customize convert model specs
specs_path = os.path.join(home, 'shared', 'users', os.environ['USER'], 'models', model_id, 'specs', 'convert.json')

with open(specs_path , "r") as specs_file:
    specs = json.load(specs_file)

specs["t"] = "int8"
specs["b"] = 8
specs["m"] = 16
specs["d"] = "3,80,60"
specs["i"] = "nchw"
specs["o"] = "base_color/Softmax,category/Softmax,season/Softmax"

with open(specs_path, "w") as specs_file:
    json.dump(specs, specs_file, indent=2)

print(json.dumps(specs, indent=2))

### Run model convert <a class="anchor" id="head-24"></a>

In [None]:
convert_job_id = subprocess.getoutput(f"tao-client multitask-classification model-convert --id {model_id} --job {int8_export_job_id}")
print(convert_job_id)

In [None]:
# Check status (the file won't exist until the backend Toolkit container is running -- can take several minutes)
log_file = f"{convert_job_id}.txt"
my_tail(logs_dir, log_file)

### Provide TAO inference specs <a class="anchor" id="head-25"></a>

In [None]:
# Default inference model specs
! tao-client multitask-classification model-inference-defaults --id {model_id} | tee ~/shared/users/{os.environ['USER']}/models/{model_id}/specs/inference.json

In [None]:
# Customize TAO inference specs
specs_path = os.path.join(home, 'shared', 'users', os.environ['USER'], 'models', model_id, 'specs', 'inference.json')

with open(specs_path , "r") as specs_file:
    specs = json.load(specs_file)

# Change any spec if you wish

with open(specs_path, "w") as specs_file:
    json.dump(specs, specs_file, indent=2)

print(json.dumps(specs, indent=2))

### Run TAO inference <a class="anchor" id="head-26"></a>

In [None]:
tlt_inference_job_id = subprocess.getoutput(f"tao-client multitask-classification model-inference --id {model_id} --job {train_job_id}")
print(tlt_inference_job_id)

In [None]:
# Check status (the file won't exist until the backend Toolkit container is running -- can take several minutes)
log_file = f"{tlt_inference_job_id}.txt"
my_tail(logs_dir, log_file)

### Delete experiment <a class="anchor" id="head-30"></a>

In [None]:
! rm -rf ~/shared/users/{os.environ['USER']}/models/{model_id}
! echo DONE

### Delete datasets <a class="anchor" id="head-31"></a>

In [None]:
! rm -rf ~/shared/users/{os.environ['USER']}/datasets/{train_dataset_id}
! rm -rf ~/shared/users/{os.environ['USER']}/datasets/{eval_dataset_id}
! rm -rf ~/shared/users/{os.environ['USER']}/datasets/{infer_dataset_id}
! echo DONE

### Unmount shared volume <a class="anchor" id="head-32"></a>

In [None]:
command = "umount ~/shared"
! echo {password} | sudo -S -k {command} && echo DONE

### Uninstall TAO Remote Client <a class="anchor" id="head-33"></a>

In [None]:
! pip3 uninstall -y nvidia-tao-client