### Notebook to demonstrate the TAO workflow on purpose-built models

Transfer learning is the process of transferring learned features from one application to another. It is a commonly used training technique in which a model trained on one task is re-trained for use on a different task. The Train Adapt Optimize (TAO) Toolkit is a simple, easy-to-use, Python-based AI toolkit for taking purpose-built AI models and customizing them with users' own data.

![image](https://developer.nvidia.com/sites/default/files/akamai/TAO/tlt-tao-toolkit-bring-your-own-model-diagram.png)

### The workflow in a nutshell

- Create a dataset
- Upload the dataset to the service
- Run dataset convert (for specific models)
- Get a PTM (pretrained model) from NGC
- Model actions
    - Train (normal/AutoML)
    - Evaluate
    - Prune, retrain (for specific models)
    - Export
    - TAO-Deploy (for specific models)
    - Inference on TAO
    - Inference on TRT (for specific models)
    
### Table of contents

1. [Create datasets](#head-1)
1. [List the created datasets](#head-2)
1. [Dataset convert Action for train dataset](#head-3) (for specific models)
1. [Dataset convert Action for val dataset](#head-3.1) (for specific models)
1. [Create model](#head-4)
1. [List models](#head-5)
1. [Assign datasets](#head-6)
1. [Assign PTM](#head-7)
1. [View hyperparameters that are enabled by default](#head-8)
1. [Set AutoML related configurations](#head-9)
1. [Actions](#head-10)
1. [Train](#head-11)
1. [Evaluate](#head-12)
1. [Optimize: Apply specs for prune](#head-14) (for specific models)   
1. [Optimize: Apply specs for retrain](#head-15) (for specific models)
1. [Optimize: Run actions](#head-16) (for specific models)
1. [Export](#head-17)
1. [TRT Engine generation using TAO-Deploy](#head-18) (for specific models)
1. [TAO inference](#head-19)
1. [TRT inference](#head-20) (for specific models)

### Requirements
The server requirements are listed [here](https://docs.nvidia.com/tao/tao-toolkit/text/tao_toolkit_api/api_setup.html#).

In [None]:
import json
import os
import requests
import uuid
import time
from IPython.display import clear_output

### FIXME

1. Assign a model_name in FIXME 1

    1.1 Assign model type for action_recognition/fpenet/lprnet/pose_classification in FIXME 1.1

    1.2 Assign platform for action_recognition in FIXME 1.2
    
    1.3 Assign model input type for action_recognition in FIXME 1.3
2. Assign a workdir in FIXME 2
3. Assign the ip_address and port_number in FIXME 3 ([info](https://docs.nvidia.com/tao/tao-toolkit/text/tao_toolkit_api/api_rest_api.html))
4. Assign the ngc_api_key variable in FIXME 4
5. (Optional) Enable AutoML if needed in FIXME 5
6. Choose between default and custom dataset in FIXME 6
7. Assign path of DATA_DIR in FIXME 7
8. Choose between the Bayesian and Hyperband `automl_algorithm` in FIXME 8 (only if AutoML was enabled in FIXME 5)
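Before running the remaining cells, it can help to confirm that every FIXME placeholder has been replaced. The helper below is a minimal sketch: the name `unfilled_fixmes` and the `<name>` placeholder convention are assumptions based on the default values in this notebook, not part of the TAO API. It flags any setting whose string value still contains a `<...>` placeholder.

```python
import re

# Matches unreplaced template placeholders such as "<ip_address>" or "<ngc_api_key>"
PLACEHOLDER = re.compile(r"<[a-z_]+>")


def unfilled_fixmes(settings: dict) -> list:
    """Return the sorted names of settings whose string value still contains a placeholder."""
    return sorted(
        name
        for name, value in settings.items()
        if isinstance(value, str) and PLACEHOLDER.search(value)
    )


# Hypothetical usage with the defaults of the variables set in the cells below
settings = {
    "host_url": "http://<ip_address>:<port_number>",
    "ngc_api_key": "<ngc_api_key>",
    "workdir": "workdir_purpose_built_models",
}
print(unfilled_fixmes(settings))  # ['host_url', 'ngc_api_key']
```

Raising an error when the returned list is non-empty stops the notebook before the login request is sent with a template URL.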

In [None]:
# Define model_name (workspaces and other variables are set in the following cells)
# Available models (#FIXME 1):
# 1. action_recognition - https://docs.nvidia.com/tao/tao-toolkit/text/action_recognition_net.html
# 2. bpnet - https://docs.nvidia.com/tao/tao-toolkit/text/bodypose_estimation/bodyposenet.html
# 3. fpenet - https://docs.nvidia.com/tao/tao-toolkit/text/facial_landmarks_estimation/facial_landmarks_estimation.html
# 4. lprnet - https://docs.nvidia.com/tao/tao-toolkit/text/character_recognition/index.html
# 5. ml_recog - https://docs.nvidia.com/tao/tao-toolkit/text/ml_recog/index.html
# 6. ocdnet - https://docs.nvidia.com/tao/tao-toolkit/text/ocdnet/index.html
# 7. ocrnet - https://docs.nvidia.com/tao/tao-toolkit/text/ocrnet/index.html
# 8. optical_inspection - https://docs.nvidia.com/tao/tao-toolkit/text/optical_inspection/index.html
# 9. pose_classification - https://docs.nvidia.com/tao/tao-toolkit/text/pose_classification/index.html
# 10. pointpillars - https://docs.nvidia.com/tao/tao-toolkit/text/point_cloud/pointpillars.html
# 11. re_identification - https://docs.nvidia.com/tao/tao-toolkit/text/re_identification/index.html

model_name = "action_recognition" # FIXME1 (Add the model name from the above mentioned list)

In [None]:
if model_name in ("action_recognition", "fpenet", "lprnet", "pose_classification"):
    # FIXME1.1 - model_type (string)
    #   action_recognition: rgb/of/joint
    #   fpenet: 10/80 (value represents the number of keypoints)
    #   lprnet: us/ch (us for United States, ch for China)
    #   pose_classification: kinetics/nvidia
    model_type = "rgb"

    # Valid model_type values for each model that requires one
    valid_model_types = {
        "action_recognition": ("rgb", "of", "joint"),
        "fpenet": ("10", "80"),
        "lprnet": ("us", "ch"),
        "pose_classification": ("kinetics", "nvidia"),
    }
    if model_type not in valid_model_types[model_name]:
        raise ValueError(f"Choose one of {'/'.join(valid_model_types[model_name])} for {model_name} model_type")

    if model_name == "action_recognition":
        platform = "a100"  # FIXME1.2 a100/xavier - valid only for a model_type other than rgb
        model_input_type = "3d"  # FIXME1.3 3d/2d

In [None]:
workdir = "workdir_purpose_built_models" # FIXME2
host_url = "http://<ip_address>:<port_number>" # FIXME3 example: https://10.137.149.22:32334
# On the host machine, the node ip_address and port_number can be obtained as follows:
# ip_address: hostname -i
# port_number: kubectl get service ingress-nginx-controller -o jsonpath='{.spec.ports[0].nodePort}'
ngc_api_key = "<ngc_api_key>" # FIXME4 (replace with your NGC API key)

In [None]:
automl_enabled = False # FIXME5 set to True if you want to run automl for the model chosen in the previous cell

In [None]:
# Exchange NGC_API_KEY for JWT
response = requests.get(f"{host_url}/api/v1/login/{ngc_api_key}")
response.raise_for_status()  # Stop early if the login request was rejected
login_info = response.json()
user_id = login_info["user_id"]
print("User ID",user_id)
token = login_info["token"]
print("JWT",token)

# Set base URL
base_url = f"{host_url}/api/v1/user/{user_id}"
print("API calls will be forwarded to",base_url)

headers = {"Authorization": f"Bearer {token}"}

In [None]:
# Create workdir (no error if it already exists)
os.makedirs(workdir, exist_ok=True)

### Create datasets <a class="anchor" id="head-1"></a>
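Each dataset in this section is registered with the service, packaged as a `.tar.gz` archive, and uploaded. The notebook packages data with `tar -C <dir> -czf <archive> <items...>`. As a sketch, the same archive layout can be produced from Python with the standard `tarfile` module; `make_dataset_archive` is a hypothetical helper name, and entries are stored relative to `root`, matching the effect of `tar -C`.

```python
import os
import tarfile


def make_dataset_archive(root: str, items: list, archive_path: str) -> str:
    """Create a gzip-compressed tarball of `items`, stored relative to `root`.

    Roughly equivalent to: tar -C root -czf archive_path item1 item2 ...
    """
    with tarfile.open(archive_path, "w:gz") as tar:
        for item in items:
            # arcname keeps member paths relative to root, like `tar -C root`;
            # directories are added recursively
            tar.add(os.path.join(root, item), arcname=item)
    return archive_path


# Example (hypothetical): package the pointpillars train/val folders
# make_dataset_archive(DATA_DIR, ["train", "val"],
#                      f"{DATA_DIR}/purpose_built_models_train.tar.gz")
```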

**Action Recognition:** We will be using the HMDB51 [dataset](https://serre-lab.clps.brown.edu/resource/hmdb-a-large-human-motion-database/) for the tutorial, restricted to the `catch` and `smile` classes.

**BPNET:** We will be using the COCO 2017 dataset for the tutorial. The `download_coco.sh` script from `dataset_prepare/coco` downloads and unzips it from [here](https://cocodataset.org/#download).

**FPENET:** We will be using the `AFW dataset`. Download `afw.zip` from [here](https://ibug.doc.ic.ac.uk/download/annotations/afw.zip/) and place it in `$DATA_DIR`.

**LPRNET:** We will be using the `OpenALPR benchmark dataset` for the tutorial. The data preparation cell below downloads the dataset automatically and converts it to the format used by TAO.

**MLRecogNet:** We will be using the `Retail Product Checkout Dataset` for the tutorial. Download the dataset from [here](https://www.kaggle.com/datasets/diyer22/retail-product-checkout-dataset) and place `retail-product-checkout-dataset.zip` under `$DATA_DIR/metric_learning_recognition`.

**OCDNET:** We will be using the ICDAR2015 dataset for the OCDNET tutorial. Register [here](https://rrc.cvc.uab.es/?ch=4&com=tasks) and download the data for Task 4.1: Text Localization. Unzip the files into `$DATA_DIR` so that the images and ground truth are under `train/img`, `train/gt`, `test/img`, and `test/gt`.

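**OCRNET:** We will be using the ICDAR15 word recognition dataset for the tutorial. More details are available [here](https://rrc.cvc.uab.es/?ch=4&com=tasks). Download the ICDAR15 word recognition train and test data (`ch4_training_word_images_gt.zip`, `ch4_test_word_images_gt.zip`, and `Challenge4_Test_Task3_GT.txt`) [here](https://rrc.cvc.uab.es/?ch=4&com=downloads) into `$DATA_DIR`.

**Optical Inspection:** Place your data under `$DATA_DIR` in `train`, `val`, and `test` folders, each containing an `images` folder and a `dataset.csv` label file.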

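**PointPillars:** We will be using the `KITTI object detection dataset` for this example. More details are available [here](http://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark=2d). Download `data_object_image_2.zip`, `data_object_label_2.zip`, `data_object_velodyne.zip`, and `data_object_calib.zip` into `$DATA_DIR`.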

**Pose Classification:** We will be using either the Kinetics dataset from [DeepMind](https://deepmind.com/research/open-source/kinetics) or an NVIDIA-created dataset. Set `model_type` to `kinetics` for the Kinetics-based dataset or to `nvidia` for the NVIDIA-based dataset; the data preparation cell below downloads either one automatically.

**Re-Identification:** We will be using the [Market-1501](https://zheng-lab.cecs.anu.edu.au/Project/project_reid.html) dataset. It can be downloaded manually [here](https://drive.google.com/file/d/1TwkgQcIa_EgRjVMPSbyEKtcfljqURrzi/view?usp=sharing); with the default setup, the data preparation cell below downloads and extracts it automatically with `gdown`.
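For the datasets that must be downloaded manually, the data preparation cell only prints a message when an expected file or folder is missing and then carries on. The following is a sketch of the same checks in Python. `EXPECTED_INPUTS` and `missing_inputs` are hypothetical names; the paths are copied from the shell checks in the data preparation cell. The function returns the missing paths so you can stop before working with an incomplete dataset.

```python
import os

# Paths (relative to DATA_DIR) that the data preparation cell expects to find
EXPECTED_INPUTS = {
    "fpenet": ["afw.zip"],
    "ml_recog": ["metric_learning_recognition/retail-product-checkout-dataset.zip"],
    "ocdnet": ["train/img", "train/gt", "test/img", "test/gt"],
    "ocrnet": ["ch4_test_word_images_gt.zip", "Challenge4_Test_Task3_GT.txt",
               "ch4_training_word_images_gt.zip"],
    "optical_inspection": [f"{split}/{name}"
                           for split in ("train", "val", "test")
                           for name in ("images", "dataset.csv")],
    "pointpillars": [f"data_object_{part}.zip"
                     for part in ("image_2", "label_2", "velodyne", "calib")],
}


def missing_inputs(data_dir: str, model_name: str) -> list:
    """Return the expected paths for model_name that do not exist under data_dir."""
    return [rel for rel in EXPECTED_INPUTS.get(model_name, [])
            if not os.path.exists(os.path.join(data_dir, rel))]


# Example: fail early instead of continuing with an incomplete download
# missing = missing_inputs(DATA_DIR, model_name)
# if missing:
#     raise FileNotFoundError(f"Missing in {DATA_DIR}: {missing}")
```

Models not listed (for example the ones whose data is downloaded automatically) return an empty list.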

In [None]:
dataset_to_be_used = "default" # FIXME6 default/custom: default for the dataset used in this tutorial notebook; custom for a different dataset
DATA_DIR = os.path.abspath(model_name) # FIXME7 (set the absolute path of the data directory)
os.environ['DATA_DIR'] = DATA_DIR
!mkdir -p $DATA_DIR

In [None]:
if dataset_to_be_used == "default":
    if model_name == "action_recognition":
        !sudo apt-get update -y && sudo apt-get install unrar-free -y
        !wget -P $DATA_DIR http://serre-lab.clps.brown.edu/wp-content/uploads/2013/10/hmdb51_org.rar
        !mkdir -p $DATA_DIR/videos && unrar x -o+ $DATA_DIR/hmdb51_org.rar $DATA_DIR/videos
        !mkdir -p $DATA_DIR/raw_data
        !unrar x -o+ $DATA_DIR/videos/catch.rar $DATA_DIR/raw_data
        !unrar x -o+ $DATA_DIR/videos/smile.rar $DATA_DIR/raw_data
    elif model_name == "bpnet":
        !bash dataset_prepare/coco/download_coco.sh $DATA_DIR
        # Remove existing data
        !rm -rf $DATA_DIR/train2017/images
        !rm -rf $DATA_DIR/val2017/images
        # Rearrange data in the required format
        !mv $DATA_DIR/raw-data/* $DATA_DIR/
        !cp dataset_prepare/bpnet/* $DATA_DIR/
    elif model_name == "fpenet":
        !if [ ! -f $DATA_DIR/afw.zip ]; then echo 'afw zip file not found, please download.'; else echo 'Found afw zip file.';fi
        !mkdir -p $DATA_DIR/data
        !unzip -uq $DATA_DIR/afw.zip -d $DATA_DIR/data/afw
        !cp dataset_prepare/fpenet/data.json $DATA_DIR/
    elif model_name == "lprnet":
        !python3 -m pip install --upgrade pip
        !python3 -m pip install "opencv-python>=3.4.0.12,<=4.5.5.64"
        !bash dataset_prepare/lprnet/download_and_prepare_data.sh $DATA_DIR
    elif model_name == "ml_recog":
        !if [ ! -f $DATA_DIR/metric_learning_recognition/retail-product-checkout-dataset.zip ]; then echo 'retail-product-checkout-dataset.zip file not found, please download.'; else echo 'Found retail product dataset zip file.';fi
        !unzip -uq $DATA_DIR/metric_learning_recognition/retail-product-checkout-dataset.zip -d $DATA_DIR/metric_learning_recognition
    elif model_name == "ocdnet":
        !if [ ! -d $DATA_DIR/train/img ]; then echo 'Train image folder not found, please download.'; else echo 'Found Train image folder.';fi
        !if [ ! -d $DATA_DIR/train/gt ]; then echo 'Train ground truth folder not found, please download.'; else echo 'Found Train ground truth folder.';fi
        !if [ ! -d $DATA_DIR/test/img ]; then echo 'Val image folder not found, please download.'; else echo 'Found Val image folder.';fi
        !if [ ! -d $DATA_DIR/test/gt ]; then echo 'Val ground truth folder not found, please download.'; else echo 'Found Val ground truth folder.';fi
    elif model_name == "ocrnet":
        !mkdir -p $DATA_DIR/train && rm -rf $DATA_DIR/train/*
        !mkdir -p $DATA_DIR/test && rm -rf $DATA_DIR/test/*
        !if [ ! -f $DATA_DIR/ch4_test_word_images_gt.zip ]; then echo 'Test Image zip file not found, please download.'; else echo 'Found Test Image zip file.';fi
        !if [ ! -f $DATA_DIR/Challenge4_Test_Task3_GT.txt ]; then echo 'Test Label file not found, please download.'; else echo 'Found Test Labels file.';fi
        !if [ ! -f $DATA_DIR/ch4_training_word_images_gt.zip ]; then echo 'Train zip file not found, please download.'; else echo 'Found Train zip file.';fi
        !unzip -u $DATA_DIR/ch4_test_word_images_gt.zip -d $DATA_DIR/test
        !cp $DATA_DIR/Challenge4_Test_Task3_GT.txt $DATA_DIR/test/
        !unzip -u $DATA_DIR/ch4_training_word_images_gt.zip -d $DATA_DIR/train    
    elif model_name == "optical_inspection":
        !if [ ! -d $DATA_DIR/train/images ]; then echo 'Train image folder not found'; else echo 'Found train image folder';fi
        !if [ ! -f $DATA_DIR/train/dataset.csv ]; then echo 'Train label file not found'; else echo 'Found train label file';fi
        !if [ ! -d $DATA_DIR/val/images ]; then echo 'Val image folder not found'; else echo 'Found val image folder';fi
        !if [ ! -f $DATA_DIR/val/dataset.csv ]; then echo 'Val label file not found'; else echo 'Found val label file';fi
        !if [ ! -d $DATA_DIR/test/images ]; then echo 'Test image folder not found'; else echo 'Found test image folder';fi
        !if [ ! -f $DATA_DIR/test/dataset.csv ]; then echo 'Test label file not found'; else echo 'Found test label file';fi
    elif model_name == "pointpillars":
        !if [ ! -f $DATA_DIR/data_object_image_2.zip ]; then echo 'Image zip file not found, please download.'; else echo 'Found Image zip file.';fi
        !if [ ! -f $DATA_DIR/data_object_label_2.zip ]; then echo 'Label zip file not found, please download.'; else echo 'Found Labels zip file.';fi
        !if [ ! -f $DATA_DIR/data_object_velodyne.zip ]; then echo 'Velodyne zip file not found, please download.'; else echo 'Found Velodyne zip file.';fi
        !if [ ! -f $DATA_DIR/data_object_calib.zip ]; then echo 'Calib zip file not found, please download.'; else echo 'Found Calib zip file.';fi
        !unzip -u $DATA_DIR/data_object_image_2.zip -d $DATA_DIR
        !unzip -u $DATA_DIR/data_object_label_2.zip -d $DATA_DIR
        !unzip -u $DATA_DIR/data_object_velodyne.zip -d $DATA_DIR
        !unzip -u $DATA_DIR/data_object_calib.zip -d $DATA_DIR
    elif model_name == "pose_classification":
        !pip3 install -U gdown
        if model_type == "kinetics":
            !gdown https://drive.google.com/uc?id=1dmzCRQsFXJ18BlXj1G9sbDnsclXIdDdR -O $DATA_DIR/st-gcn-processed-data.zip
            !unzip $DATA_DIR/st-gcn-processed-data.zip -d $DATA_DIR
            !mv $DATA_DIR/data/Kinetics/kinetics-skeleton $DATA_DIR/kinetics
            !rm -r $DATA_DIR/data
            !rm $DATA_DIR/st-gcn-processed-data.zip
        elif model_type == "nvidia":
            !gdown https://drive.google.com/uc?id=1GhSt53-7MlFfauEZ2YkuzOaZVNIGo_c- -O $DATA_DIR/data_3dbp_nvidia.zip
            !mkdir -p $DATA_DIR/nvidia
            !unzip $DATA_DIR/data_3dbp_nvidia.zip -d $DATA_DIR/nvidia
            !rm $DATA_DIR/data_3dbp_nvidia.zip
    elif model_name == "re_identification":
        !pip3 install -U gdown
        !gdown https://drive.google.com/uc?id=0B8-rUzbwVRk0c054eEozWG9COHM -O $DATA_DIR/market1501.zip
        !unzip -u $DATA_DIR/market1501.zip -d $DATA_DIR
        !rm -rf $DATA_DIR/market1501
        !mv $DATA_DIR/Market-1501-v15.09.15 $DATA_DIR/market1501
        !rm $DATA_DIR/market1501.zip

In [None]:
if model_name == "lprnet":
    ds_type = "character_recognition"
    ds_format = "lprnet"
else:
    ds_type = model_name
    ds_format = "default"

if model_name in ("lprnet","ocdnet","ocrnet", "optical_inspection"):
    eval_dataset_path = f"{DATA_DIR}/purpose_built_models_val.tar.gz"
if model_name in ("lprnet", "optical_inspection"):
    test_dataset_path = f"{DATA_DIR}/purpose_built_models_test.tar.gz"
train_dataset_path = f"{DATA_DIR}/purpose_built_models_train.tar.gz"

In [None]:
# Create train dataset
data = json.dumps({"type":ds_type,"format":ds_format})

endpoint = f"{base_url}/dataset"

response = requests.post(endpoint,data=data,headers=headers)

print(response)
print(response.json())
dataset_id = response.json()["id"]

In [None]:
if dataset_to_be_used == "default":
    USER_EXPERIMENT_DIR = os.path.join("/shared/users",user_id,"datasets",dataset_id)
    if model_name == "action_recognition":
        !python3 -m pip install opencv-python numpy
        # For rgb action recognition
        !if [ -d tao_toolkit_recipes ]; then rm -rf tao_toolkit_recipes; fi
        !git clone https://github.com/NVIDIA-AI-IOT/tao_toolkit_recipes
        !cd tao_toolkit_recipes/tao_action_recognition/data_generation/ && bash ./preprocess_HMDB_RGB.sh $DATA_DIR/raw_data $DATA_DIR/processed_data 

        # For optical flow, comment out the 3 lines above and uncomment the lines below (Note: generating optical flow requires a Turing, Ampere, or newer GPU.)
        #!echo <passwd> | sudo -S apt install -y libfreeimage-dev
        #!cp dataset_prepare/action_recognition/AppOFCuda tao_toolkit_recipes/tao_action_recognition/data_generation/
        #!cd tao_toolkit_recipes/tao_action_recognition/data_generation/ && bash ./preprocess_HMDB.sh $DATA_DIR/raw_data $DATA_DIR/processed_data

        # download the split files and unrar
        !wget -P $DATA_DIR http://serre-lab.clps.brown.edu/wp-content/uploads/2013/10/test_train_splits.rar
        !mkdir -p $DATA_DIR/splits && unrar x -o+ $DATA_DIR/test_train_splits.rar $DATA_DIR/splits
        # run split_HMDB to generate training split
        !if [ -d $DATA_DIR/train ]; then rm -rf $DATA_DIR/train $DATA_DIR/test; fi
        !cd tao_toolkit_recipes/tao_action_recognition/data_generation/ && python3 ./split_dataset.py $DATA_DIR/processed_data $DATA_DIR/splits/testTrainMulti_7030_splits $DATA_DIR/train  $DATA_DIR/test

    elif model_name == "fpenet":
        !pip3 install numpy opencv-python
        if model_type == "80":
            output_json_path = os.path.join(os.environ['DATA_DIR'], 'data/afw/afw.json')
        elif model_type == "10":
            output_json_path = os.path.join(os.environ['DATA_DIR'], 'data/afw_10/afw_10.json')
        !python3 dataset_prepare/fpenet/data_utils.py --afw_data_path $DATA_DIR/data/afw --output_json_path $output_json_path --afw_image_save_path $DATA_DIR/data/afw --num_key_points $model_type --container_root_path $USER_EXPERIMENT_DIR

    elif model_name == "lprnet":
        character_file_link = "https://api.ngc.nvidia.com/v2/models/nvidia/tao/lprnet/versions/trainable_v1.0/files/{}_lp_characters.txt".format(model_type)
        !wget -q -O $DATA_DIR/train/characters.txt $character_file_link
        !cp $DATA_DIR/train/characters.txt $DATA_DIR/val/characters.txt

    elif model_name == "ocrnet":
        !python3 -m pip install tqdm
        orig_train_gt_file=os.path.join(os.getenv("DATA_DIR"), "train", "gt.txt")
        processed_train_gt_file=os.path.join(os.getenv("DATA_DIR"), "train", "gt_new.txt")
        orig_test_gt_file=os.path.join(os.getenv("DATA_DIR"), "test", "Challenge4_Test_Task3_GT.txt")
        processed_test_gt_file=os.path.join(os.getenv("DATA_DIR"), "test", "gt_new.txt")
        !python3 dataset_prepare/ocrnet/preprocess_label.py $orig_train_gt_file $processed_train_gt_file
        !python3 dataset_prepare/ocrnet/preprocess_label.py $orig_test_gt_file $processed_test_gt_file

    elif model_name == "pointpillars":
        !python3 -m pip install scikit-image numpy
        !mkdir -p $DATA_DIR/train/lidar $DATA_DIR/train/label $DATA_DIR/val/lidar $DATA_DIR/val/label

        !python3 dataset_prepare/pointpillars/gen_lidar_points.py -p $DATA_DIR/training/velodyne \
                                               -c $DATA_DIR/training/calib    \
                                               -i $DATA_DIR/training/image_2  \
                                               -o $DATA_DIR/train/lidar
        # Convert labels from the camera coordinate system to the LIDAR coordinate system
        !python3 dataset_prepare/pointpillars/gen_lidar_labels.py -l $DATA_DIR/training/label_2 \
                                               -c $DATA_DIR/training/calib \
                                               -o $DATA_DIR/train/label
        # Drop the DontCare class
        !python3 dataset_prepare/pointpillars/drop_class.py $DATA_DIR/train/label DontCare
        # Split into train/val sets
        # Change the val set IDs in dataset_prepare/pointpillars/val.txt if you need a different set of validation images
        !python3 dataset_prepare/pointpillars/kitti_split.py dataset_prepare/pointpillars/val.txt \
                                          $DATA_DIR/train/lidar \
                                          $DATA_DIR/train/label \
                                          $DATA_DIR/val/lidar \
                                          $DATA_DIR/val/label

    elif model_name == "pose_classification" and model_type == "kinetics":
        !pip3 install numpy
        # select actions
        !python3 dataset_prepare/pose_classification/select_subset_actions.py

    elif model_name == "re_identification":
        # 100 is the number of samples in the subset; you can choose any number <= the total number of samples in the dataset
        !python3 dataset_prepare/re_identification/obtain_subset_data.py 100
    
    elif model_name == "ml_recog":
        # Crop images from the detection set to form a classification set,
        # then split it into reference/train/val/test sets
        !sudo apt-get install gcc -y
        !python3 -m pip install opencv-python numpy pycocotools tqdm
        !python3 dataset_prepare/metric_learning_recognition/process_retail_product_checkout_dataset.py

In [None]:
# Update
dataset_information = {"name":"Train dataset",
                       "description":"My train dataset"}
data = json.dumps(dataset_information)

endpoint = f"{base_url}/dataset/{dataset_id}"

response = requests.patch(endpoint, data=data, headers=headers)

print(response)
print(response.json())

In [None]:
if model_name == "action_recognition":
    !tar -C $DATA_DIR -czf $DATA_DIR/purpose_built_models_train.tar.gz train test
elif model_name == "bpnet":
    !tar -C $DATA_DIR -czf $DATA_DIR/purpose_built_models_train.tar.gz train2017 val2017 annotations bpnet_18joints.json  coco_spec.json  infer_spec.yaml
elif model_name == "fpenet":
    !tar -C $DATA_DIR -czf $DATA_DIR/purpose_built_models_train.tar.gz data data.json
elif model_name == "lprnet":
    !tar -C $DATA_DIR/train/ -czf $DATA_DIR/purpose_built_models_train.tar.gz image label characters.txt
    !tar -C $DATA_DIR/val/ -czf $DATA_DIR/purpose_built_models_val.tar.gz image label characters.txt
    !tar -C $DATA_DIR/val/ -czf $DATA_DIR/purpose_built_models_test.tar.gz image
elif model_name == "ml_recog":
    !tar -C $DATA_DIR -czf $DATA_DIR/purpose_built_models_train.tar.gz metric_learning_recognition/retail-product-checkout-dataset_classification_demo/
elif model_name == "ocdnet":
    !tar -C $DATA_DIR -czf $DATA_DIR/purpose_built_models_train.tar.gz train
    !tar -C $DATA_DIR -czf $DATA_DIR/purpose_built_models_val.tar.gz test
elif model_name == "ocrnet":
    !tar -C $DATA_DIR -czf $DATA_DIR/purpose_built_models_train.tar.gz train character_list
    !tar -C $DATA_DIR -czf $DATA_DIR/purpose_built_models_val.tar.gz test character_list
elif model_name == "optical_inspection":
    !tar -C $DATA_DIR/train -czf $DATA_DIR/purpose_built_models_train.tar.gz images dataset.csv
    !tar -C $DATA_DIR/val -czf $DATA_DIR/purpose_built_models_val.tar.gz images dataset.csv
    !tar -C $DATA_DIR/test -czf $DATA_DIR/purpose_built_models_test.tar.gz images dataset.csv
elif model_name == "pointpillars":
    !tar -C $DATA_DIR -czf $DATA_DIR/purpose_built_models_train.tar.gz train val
elif model_name == "pose_classification":
    !tar -C $DATA_DIR -czf $DATA_DIR/purpose_built_models_train.tar.gz $model_type
elif model_name == "re_identification":
    !tar -C $DATA_DIR/market1501 -czf $DATA_DIR/purpose_built_models_train.tar.gz sample_train sample_test sample_query

In [None]:
# Upload
endpoint = f"{base_url}/dataset/{dataset_id}/upload"

# Open the archive in a with-block so the file handle is closed after the upload
with open(train_dataset_path, "rb") as f:
    response = requests.post(endpoint, files=[("file", f)], headers=headers)

print(response)
print(response.json())

In [None]:
# Create eval dataset
if model_name in ("lprnet", "ocdnet", "ocrnet", "optical_inspection"):
    data = json.dumps({"type":ds_type,"format":ds_format})

    endpoint = f"{base_url}/dataset"

    response = requests.post(endpoint,data=data,headers=headers)

    print(response)
    print(response.json())
    eval_dataset_id = response.json()["id"]

In [None]:
# Update
if model_name in ("lprnet", "ocdnet", "ocrnet", "optical_inspection"):
    dataset_information = {"name":"Eval dataset",
                           "description":"My eval dataset"}
    data = json.dumps(dataset_information)

    endpoint = f"{base_url}/dataset/{eval_dataset_id}"

    response = requests.patch(endpoint, data=data, headers=headers)

    print(response)
    print(response.json())

In [None]:
# Upload
if model_name in ("lprnet", "ocdnet", "ocrnet", "optical_inspection"):
    endpoint = f"{base_url}/dataset/{eval_dataset_id}/upload"

    # Open the archive in a with-block so the file handle is closed after the upload
    with open(eval_dataset_path, "rb") as f:
        response = requests.post(endpoint, files=[("file", f)], headers=headers)

    print(response)
    print(response.json())

In [None]:
# Create testing dataset for inference
if model_name in ("lprnet", "optical_inspection"):
    if model_name == "lprnet":
        ds_type = "character_recognition"
        ds_format = "raw"
    else:
        ds_type = model_name
        ds_format = "default"

    data = json.dumps({"type":ds_type,"format":ds_format})

    endpoint = f"{base_url}/dataset"

    response = requests.post(endpoint,data=data, headers=headers)

    print(response)
    print(response.json())
    test_dataset_id = response.json()["id"]

In [None]:
# Upload
if model_name in ("lprnet", "optical_inspection"):
    endpoint = f"{base_url}/dataset/{test_dataset_id}/upload"

    # Open the archive in a with-block so the file handle is closed after the upload
    with open(test_dataset_path, "rb") as f:
        response = requests.post(endpoint, files=[("file", f)], headers=headers)

    print(response)
    print(response.json())

### List the created datasets <a class="anchor" id="head-2"></a>
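The listing cell below separates columns with tab characters, so the columns drift out of alignment when IDs or names differ in length. As a sketch, a fixed-width table can be built with `str.ljust`. `format_dataset_table` is a hypothetical helper, and it assumes each entry has the `id`, `type`, `format`, and `name` keys that the cell prints.

```python
def format_dataset_table(datasets: list) -> str:
    """Render dataset entries as a left-aligned, fixed-width text table."""
    columns = ("id", "type", "format", "name")
    rows = [[str(d.get(col, "")) for col in columns] for d in datasets]
    # Each column is as wide as its longest cell, header included
    widths = [max([len(col)] + [len(row[i]) for row in rows])
              for i, col in enumerate(columns)]
    lines = [columns] + rows
    return "\n".join(
        "  ".join(cell.ljust(w) for cell, w in zip(line, widths)).rstrip()
        for line in lines
    )


# Usage inside the listing cell:
# print(format_dataset_table(response.json()))
```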

In [None]:
endpoint = f"{base_url}/dataset"

response = requests.get(endpoint, headers=headers)

print(response)
# print(response.json()) ## Uncomment for verbose list output
print("id\t\t\t\t\t type\t\t\t format\t\t name")
for rsp in response.json():
    print(rsp["id"],"\t",rsp["type"],"\t",rsp["format"],"\t\t",rsp["name"])

### Dataset convert Action for train dataset <a class="anchor" id="head-3"></a>
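Every action in this notebook is monitored with the same loop: fetch the job, stop on `Done`/`Error` or an HTTP status other than 200/201, otherwise sleep 15 seconds. The sketch below factors that loop into a reusable function with an optional timeout. `poll_job` is a hypothetical helper; `fetch` is any zero-argument callable that returns `(status_code, body_dict)`, so it can wrap `requests.get(endpoint, headers=headers)` or be replaced by a stub for testing.

```python
import time


def poll_job(fetch, interval=15.0, timeout=None, sleep=time.sleep):
    """Call fetch() until the job reaches a terminal state, then return the last body.

    fetch must return (status_code, body). Polling stops when the HTTP status is
    not 200/201 or body["status"] is "Done" or "Error" (the terminal states
    checked in this notebook). Raises TimeoutError once `timeout` seconds elapse.
    """
    start = time.monotonic()
    while True:
        status_code, body = fetch()
        if status_code not in (200, 201) or body.get("status") in ("Done", "Error"):
            return body
        if timeout is not None and time.monotonic() - start > timeout:
            raise TimeoutError(f"job still {body.get('status')!r} after {timeout}s")
        sleep(interval)


# Example wrapper around the REST call used in the monitoring cells:
# def fetch():
#     r = requests.get(endpoint, headers=headers)
#     return r.status_code, (r.json() if r.ok else {})
# final = poll_job(fetch)
```

Injecting `sleep` keeps the function testable without real delays.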

In [None]:
convert_action = "dataset_convert"

In [None]:
if model_name in ("bpnet", "fpenet", "ocrnet", "pointpillars"):
    # Get default spec schema
    endpoint = f"{base_url}/dataset/{dataset_id}/specs/{convert_action}/schema"

    response = requests.get(endpoint, headers=headers)

    print(response)
    # print(response.json()) ## Uncomment for verbose schema

    specs = response.json()["default"]

    print(json.dumps(specs, sort_keys=True, indent=4))

In [None]:
# Apply changes to specs dictionary if necessary
if model_name == "bpnet":
    specs["mode"] = "train"
elif model_name == "fpenet":
    specs["num_keypoints"] = int(model_type)

In [None]:
# Post spec
if model_name in ("bpnet", "fpenet", "ocrnet", "pointpillars"):
    data = json.dumps(specs)

    endpoint = f"{base_url}/dataset/{dataset_id}/specs/{convert_action}"

    response = requests.post(endpoint,data=data,headers=headers)

    print(response)
    print(json.dumps(response.json(), sort_keys=True, indent=4))

In [None]:
# Run action
if model_name in ("bpnet", "fpenet", "ocrnet", "pointpillars"):
    parent = None
    actions = [convert_action]
    data = json.dumps({"job":parent,"actions":actions})

    endpoint = f"{base_url}/dataset/{dataset_id}/job"

    response = requests.post(endpoint, data=data, headers=headers)

    print(response)
    print(response.json())

    train_ds_convert_id = response.json()[0]

In [None]:
# Monitor job status; this cell polls every 15 seconds until the job finishes
if model_name in ("bpnet", "fpenet", "ocrnet", "pointpillars"):
    job_id = train_ds_convert_id
    endpoint = f"{base_url}/dataset/{dataset_id}/job/{job_id}"

    while True:
        clear_output(wait=True)
        response = requests.get(endpoint, headers=headers)
        print(response)
        # Check the HTTP status first: an error response may not contain JSON
        if response.status_code not in (200,201):
            print(response.text)
            break
        print(response.json())
        if response.json().get("status") in ["Done","Error"]:
            break
        time.sleep(15)

### Dataset convert Action for val dataset <a class="anchor" id="head-3.1"></a>
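The validation-set conversion differs per model: for `bpnet`, the train dataset's `dataset_convert` spec is re-posted with `mode` set to `test`, while for `ocrnet` the separately uploaded eval dataset is converted with its default spec. A small sketch of that dispatch as a pure function follows; `val_convert_target` is a hypothetical name, and it could replace the repeated `if model_name == "bpnet"` branches in the cells below.

```python
def val_convert_target(model_name, dataset_id, eval_dataset_id=None):
    """Return (dataset id to convert, spec overrides) for the val dataset_convert step."""
    if model_name == "bpnet":
        # BPNET converts the val split from the same uploaded dataset in test mode
        return dataset_id, {"mode": "test"}
    if model_name == "ocrnet":
        # OCRNET converts its separately uploaded eval dataset with default specs
        return eval_dataset_id, {}
    raise ValueError(f"{model_name} has no val dataset_convert step in this notebook")


# Example (hypothetical):
# target_id, overrides = val_convert_target(model_name, dataset_id, eval_dataset_id)
# specs.update(overrides)
# endpoint = f"{base_url}/dataset/{target_id}/specs/{convert_action}"
```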

In [None]:
if model_name in ("bpnet", "ocrnet"):
    # Get default spec schema
    if model_name == "bpnet":
        endpoint = f"{base_url}/dataset/{dataset_id}/specs/{convert_action}/schema"
    else:
        endpoint = f"{base_url}/dataset/{eval_dataset_id}/specs/{convert_action}/schema"

    response = requests.get(endpoint, headers=headers)

    print(response)
    # print(response.json()) ## Uncomment for verbose schema

    specs = response.json()["default"]

    print(json.dumps(specs, sort_keys=True, indent=4))

In [None]:
# Apply changes to specs dictionary if necessary
if model_name == "bpnet":
    specs["mode"] = "test"

In [None]:
# Post spec
if model_name in ("bpnet", "ocrnet"):
    data = json.dumps(specs)

    if model_name == "bpnet":
        endpoint = f"{base_url}/dataset/{dataset_id}/specs/{convert_action}"
    else:
        endpoint = f"{base_url}/dataset/{eval_dataset_id}/specs/{convert_action}"

    response = requests.post(endpoint,data=data,headers=headers)

    print(response)
    print(json.dumps(response.json(), sort_keys=True, indent=4))

In [None]:
# Run action
if model_name in ("bpnet", "ocrnet"):
    parent = None
    actions = [convert_action]
    data = json.dumps({"job":parent,"actions":actions})

    if model_name == "bpnet":
        endpoint = f"{base_url}/dataset/{dataset_id}/job"
    else:
        endpoint = f"{base_url}/dataset/{eval_dataset_id}/job"
    
    response = requests.post(endpoint, data=data, headers=headers)

    print(response)
    print(response.json())

    eval_ds_convert_id = response.json()[0]

In [None]:
# Monitor job status; this cell polls every 15 seconds until the job finishes
if model_name in ("bpnet", "ocrnet"):
    job_id = eval_ds_convert_id
    if model_name == "bpnet":
        endpoint = f"{base_url}/dataset/{dataset_id}/job/{job_id}"
    else:
        endpoint = f"{base_url}/dataset/{eval_dataset_id}/job/{job_id}"

    while True:
        clear_output(wait=True)
        response = requests.get(endpoint, headers=headers)
        print(response)
        # Check the HTTP status first: an error response may not contain JSON
        if response.status_code not in (200,201):
            print(response.text)
            break
        print(response.json())
        if response.json().get("status") in ["Done","Error"]:
            break
        time.sleep(15)

### Create model <a class="anchor" id="head-4"></a>
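The encryption key passed when creating the model depends on the network architecture. The if/elif chain in the cell below can equivalently be written as a lookup table with a default. This sketch uses hypothetical names (`ENCODE_KEYS`, `encryption_key_for`) and the same key values as the cell.

```python
# Encryption keys used by this notebook, by network architecture
ENCODE_KEYS = {
    **dict.fromkeys(
        ("action_recognition", "pose_classification", "ml_recog", "ocrnet",
         "ocdnet", "optical_inspection", "re_identification"),
        "nvidia_tao",
    ),
    "pointpillars": "tlt_encode",
}


def encryption_key_for(model_name: str) -> str:
    """Return the model's encryption key; any other model uses "nvidia_tlt"."""
    return ENCODE_KEYS.get(model_name, "nvidia_tlt")


# Usage: encode_key = encryption_key_for(model_name)
```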

In [None]:
if model_name in ("action_recognition", "pose_classification", "ml_recog", "ocrnet", "ocdnet", "optical_inspection", "re_identification"):
    encode_key = "nvidia_tao"
elif model_name == "pointpillars":
    encode_key = "tlt_encode"
else:
    encode_key = "nvidia_tlt"

data = json.dumps({"network_arch":model_name,"encryption_key":encode_key})

endpoint = f"{base_url}/model"

response = requests.post(endpoint,data=data,headers=headers)

print(response)
print(response.json())
model_id = response.json()["id"]

### List models <a class="anchor" id="head-5"></a>

In [None]:
endpoint = f"{base_url}/model"

response = requests.get(endpoint, headers=headers)

print(response)
# print(response.json()) ## Uncomment for verbose list output
print("model id\t\t\t     network architecture")
for rsp in response.json():
    print(rsp["id"],rsp["network_arch"])

### Assign datasets <a class="anchor" id="head-6"></a>
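Which dataset roles are assigned depends on the model. The sketch below (a hypothetical `dataset_assignment` helper) builds the same PATCH payload as the cell, with the model groups copied from it; passing the IDs explicitly makes it easy to inspect the payload before sending it.

```python
def dataset_assignment(model_name, train_id, eval_id=None, test_id=None):
    """Build the dataset-assignment payload for PATCH {base_url}/model/{model_id}."""
    payload = {"train_datasets": [train_id]}
    if model_name in ("bpnet", "fpenet", "lprnet", "ml_recog", "ocdnet", "ocrnet"):
        # These models reuse the train dataset as the calibration dataset
        payload["calibration_dataset"] = train_id
    if model_name in ("lprnet", "ocdnet", "ocrnet", "optical_inspection"):
        payload["eval_dataset"] = eval_id
    if model_name in ("lprnet", "optical_inspection"):
        payload["inference_dataset"] = test_id
    return payload


# Usage (hypothetical):
# data = json.dumps(dataset_assignment(model_name, dataset_id,
#                                      globals().get("eval_dataset_id"),
#                                      globals().get("test_dataset_id")))
```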

In [None]:
dataset_information = {}
dataset_information["train_datasets"] = [dataset_id]
if model_name in ("bpnet","fpenet","lprnet","ml_recog","ocdnet","ocrnet"):
    dataset_information["calibration_dataset"] = dataset_id
if model_name in ("lprnet", "ocdnet", "ocrnet", "optical_inspection"):
    dataset_information["eval_dataset"] = eval_dataset_id
if model_name in ("lprnet", "optical_inspection"):
    dataset_information["inference_dataset"] = test_dataset_id

data = json.dumps(dataset_information)

endpoint = f"{base_url}/model/{model_id}"

response = requests.patch(endpoint, data=data, headers=headers)

print(response)
print(response.json())

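The dataset assignment above depends only on the model name. A minimal sketch of the same rules as a pure function makes it easy to check which datasets each model receives before sending the PATCH request. The name `build_dataset_info` and the set constants are hypothetical; the model groupings are copied from the cell above.

```python
# Model groups copied from the "Assign datasets" cell
CALIBRATION_MODELS = {"bpnet", "fpenet", "lprnet", "ml_recog", "ocdnet", "ocrnet"}
EVAL_MODELS = {"lprnet", "ocdnet", "ocrnet", "optical_inspection"}
INFERENCE_MODELS = {"lprnet", "optical_inspection"}


def build_dataset_info(model_name, dataset_id, eval_dataset_id=None, test_dataset_id=None):
    """Return the PATCH body that assigns datasets to a model."""
    info = {"train_datasets": [dataset_id]}
    if model_name in CALIBRATION_MODELS:
        # The train dataset doubles as the calibration dataset
        info["calibration_dataset"] = dataset_id
    if model_name in EVAL_MODELS:
        info["eval_dataset"] = eval_dataset_id
    if model_name in INFERENCE_MODELS:
        info["inference_dataset"] = test_dataset_id
    return info
```

Usage: `data = json.dumps(build_dataset_info(model_name, dataset_id, eval_dataset_id, test_dataset_id))`, passing `eval_dataset_id` and `test_dataset_id` only when they were created for the chosen model.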
### Assign PTM <a class="anchor" id="head-7"></a>

Search NGC for the pretrained models (PTMs) available for the chosen purpose-built model.

In [None]:
# List all pretrained models for the chosen network architecture
model_list = f"{base_url}/model"
response = requests.get(model_list, headers=headers)

response_json = response.json()

# Print the PTMs listed for this network architecture, skipping entries that have an encryption_key field
for rsp in response_json:
    if rsp["network_arch"] == model_name:
        if "encryption_key" not in rsp.keys():
            print(f'PTM Name: {rsp["name"]}; PTM version: {rsp["version"]}; NGC PATH: {rsp["ngc_path"]}; Additional info: {rsp["additional_id_info"]}')

In [None]:
# Map each purpose-built model to its default pretrained model (PTM) on NGC.
# To use a different PTM backbone, edit this map based on the output of the previous cell.
# Changing the backbone also requires updating the default specs/configs for train, evaluate, etc.
# For example, if you switch the PTM to resnet34, manually set the config key num_layers (if it exists) to 34.
pretrained_map = {"action_recognition":"actionrecognitionnet:trainable_v1.0",
                  "bpnet" : "bodyposenet:trainable_v1.0",
                  "fpenet" : "fpenet:trainable_v1.0",
                  "lprnet": "lprnet:trainable_v1.0",
                  "ml_recog": "retail_object_recognition:trainable_v1.0",
                  "ocdnet": "ocdnet:trainable_resnet18_v1.0",
                  "ocrnet": "ocrnet:trainable_v1.0",
                  "optical_inspection": "optical_inspection:trainable_v1.0",
                  "pointpillars":"pointpillarnet:trainable_v1.0",
                  "pose_classification":"poseclassificationnet:trainable_v1.0",
                  "re_identification":"reidentificationnet:trainable_v1.1"}

if model_name == "action_recognition":
    if model_type == "of":
        pretrained_map["action_recognition"] = "actionrecognitionnet:trainable_v2.0"
    elif model_type == "joint":
        pretrained_map["action_recognition"] = "actionrecognitionnet:trainable_v1.0,actionrecognitionnet:trainable_v2.0"
        
no_ptm_models = set()  # Models listed here skip the PTM assignment cells below

In [None]:
if model_name not in no_ptm_models:
    # Get pretrained model
    model_list = f"{base_url}/model"
    response = requests.get(model_list, headers=headers)

    response_json = response.json()

    ptm_model_names = pretrained_map[model_name].split(",")
    ptm = []

    # Search for ptm with given ngc path
    for ptm_model_name in ptm_model_names:
        ptm_id = None
        for rsp in response_json:
            if rsp["network_arch"] == model_name and rsp["ngc_path"].endswith(ptm_model_name):
                additional_id_info = []
                if rsp["additional_id_info"]:
                    additional_id_info = rsp["additional_id_info"].split(",")
                if (len(additional_id_info) == 0) or \
                    (model_name == "lprnet" and len(additional_id_info) == 1 and additional_id_info[0] == model_type) or \
                    (model_name == "action_recognition" and len(additional_id_info) == 1 and additional_id_info[0] == model_input_type) or \
                    (model_name == "action_recognition" and len(additional_id_info) == 2 and additional_id_info[0] == platform and additional_id_info[1] == model_input_type):
                    ptm_id = rsp["id"]
                    print("Metadata for model with requested NGC Path")
                    print(rsp)
                    break
        ptm.append(ptm_id)

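The PTM search in the previous cell can be expressed as a function that returns the first matching id, or `None` when nothing matches. This sketch mirrors the matching rules of that cell; `find_ptm_id` is a hypothetical name, and the guard against a missing `ngc_path` is an addition that the original cell does not have.

```python
def find_ptm_id(models, model_name, ngc_suffix, model_type=None,
                model_input_type=None, platform=None):
    """Return the id of the first listed PTM that matches, or None."""
    for rsp in models:
        # Same filters as the notebook cell, plus a guard for a missing ngc_path
        if rsp.get("network_arch") != model_name:
            continue
        if not (rsp.get("ngc_path") or "").endswith(ngc_suffix):
            continue
        extra = rsp.get("additional_id_info")
        info = extra.split(",") if extra else []
        # No extra info, or extra info that matches the requested variant
        if (not info
                or (model_name == "lprnet" and info == [model_type])
                or (model_name == "action_recognition" and info == [model_input_type])
                or (model_name == "action_recognition" and info == [platform, model_input_type])):
            return rsp["id"]
    return None
```

Usage: `ptm = [find_ptm_id(response_json, model_name, name, ...) for name in pretrained_map[model_name].split(",")]`, passing `model_type`, `model_input_type` and `platform` only for the models that define them.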
In [None]:
if model_name not in no_ptm_models:
    ptm_information = {"ptm":ptm}
    data = json.dumps(ptm_information)

    endpoint = f"{base_url}/model/{model_id}"

    response = requests.patch(endpoint, data=data, headers=headers)

    print(response)
    print(response.json())

### View hyperparameters that are enabled for AutoML by default <a class="anchor" id="head-8"></a>

In [None]:
if automl_enabled:
    # Get default spec schema
    endpoint = f"{base_url}/model/{model_id}/specs/train/schema"
    response = requests.get(endpoint, headers=headers)
    specs = response.json()["automl_default_parameters"]
    print(json.dumps(specs, sort_keys=True, indent=4))

### Set AutoML related configurations <a class="anchor" id="head-9"></a>
Refer to the following links for the parameters supported by each network. If necessary, add parameters in addition to the ones enabled for AutoML by default:

[ActionRecognitionNet](https://github.com/NVIDIA/tao_front_end_services/tree/main/api/specs_utils/specs/action_recognition/action_recognition%20-%20train.csv), 
[BPNET](https://github.com/NVIDIA/tao_front_end_services/tree/main/api/specs_utils/specs/bpnet/bpnet%20-%20train.csv), 
[FPENET](https://github.com/NVIDIA/tao_front_end_services/tree/main/api/specs_utils/specs/fpenet/fpenet%20-%20train.csv), 
[LPRNET](https://github.com/NVIDIA/tao_front_end_services/tree/main/api/specs_utils/specs/lprnet/lprnet%20-%20train.csv), 
[MetricLearningRecognition](https://github.com/NVIDIA/tao_front_end_services/tree/main/api/specs_utils/specs/ml_recog/ml_recog%20-%20train.csv), 
[OCDNET](https://github.com/NVIDIA/tao_front_end_services/tree/main/api/specs_utils/specs/ocdnet/ocdnet%20-%20train.csv), 
[OCRNET](https://github.com/NVIDIA/tao_front_end_services/tree/main/api/specs_utils/specs/ocrnet/ocrnet%20-%20train.csv), 
[OpticalInspection](https://github.com/NVIDIA/tao_front_end_services/tree/main/api/specs_utils/specs/optical_inspection/optical_inspection%20-%20train.csv), 
[Pointpillars](https://github.com/NVIDIA/tao_front_end_services/tree/main/api/specs_utils/specs/pointpillars/pointpillars%20-%20train.csv), 
[PoseClassificationNet](https://github.com/NVIDIA/tao_front_end_services/tree/main/api/specs_utils/specs/pose_classification/pose_classification%20-%20train.csv), 
[ReIdentificationNet](https://github.com/NVIDIA/tao_front_end_services/tree/main/api/specs_utils/specs/re_identification/re_identification%20-%20train.csv)

In [None]:
if automl_enabled:
    # Choose the AutoML algorithm: "Bayesian" or "HyperBand"
    automl_algorithm="Bayesian" # FIXME8 example: Bayesian/HyperBand

    # Don't change this; support for multiple metrics is planned for the future
    metric = "kpi"

    additional_automl_parameters = [] # Parameters from the lists linked above to enable in addition to the defaults
    remove_default_automl_parameters = [] # Hyperparameters to remove from the ones enabled for AutoML by default

    automl_information = {"automl_enabled":automl_enabled,
                          "automl_algorithm":automl_algorithm,
                          "epoch_multiplier": 1, # Used by HyperBand only
                          "metric":metric,
                          "automl_add_hyperparameters":str(additional_automl_parameters),
                          "automl_remove_hyperparameters":str(remove_default_automl_parameters)
                         }
    data = json.dumps(automl_information)

    endpoint = f"{base_url}/model/{model_id}"

    response = requests.patch(endpoint, data=data, headers=headers)

    print(response)
    print(json.dumps(response.json(), sort_keys=True, indent=4))

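The AutoML cell above sends the hyperparameter lists through `str()`, so the request body carries the Python representation of each list (for example `"['a', 'b']"`) inside a JSON string, not a JSON array. The snippet below only illustrates that payload shape. The hyperparameter name in it is a hypothetical placeholder (take real names from the CSV files linked above), and it uses `example_` variable names so it does not overwrite the notebook's own variables.

```python
import json

# Hypothetical hyperparameter name, for illustration only
example_add = ["train.optim.lr"]
example_remove = []

example_automl_information = {
    "automl_enabled": True,
    "automl_algorithm": "Bayesian",
    "epoch_multiplier": 1,
    "metric": "kpi",
    # str() produces the Python repr of the list, e.g. "['train.optim.lr']"
    "automl_add_hyperparameters": str(example_add),
    "automl_remove_hyperparameters": str(example_remove),
}
example_payload = json.dumps(example_automl_information)
print(example_payload)
```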
### Actions <a class="anchor" id="head-10"></a>

For all actions:
1. Get default spec schema and derive the default values
2. Modify defaults if needed
3. Post spec dictionary to the service
4. Run model action
5. Monitor the job by polling its retrieve endpoint
6. Download results using job download endpoint (if needed)

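Step 2 ("Modify defaults if needed") edits nested keys of the default spec dictionary, as the per-model cells below do. One way to keep those edits compact is a helper that applies dotted-path overrides to a copy of the defaults. `apply_overrides` is a hypothetical helper, not part of the TAO API.

```python
import copy


def apply_overrides(specs, overrides):
    """Return a copy of `specs` with dotted-path overrides applied.

    {"train.num_epochs": 20} sets result["train"]["num_epochs"] = 20.
    Intermediate keys must already exist in the default spec.
    """
    updated = copy.deepcopy(specs)  # keep the fetched defaults untouched
    for dotted_key, value in overrides.items():
        *parents, leaf = dotted_key.split(".")
        node = updated
        for key in parents:
            node = node[key]  # a KeyError here points at a typo in the path
        node[leaf] = value
    return updated
```

Usage: `specs = apply_overrides(specs, {"train.num_epochs": 20, "train.gpu_ids": [0]})`. Only intermediate keys are checked; a misspelled leaf key is added as a new key, so compare the result against the printed defaults.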
In [None]:
job_map = {}

### Train <a class="anchor" id="head-11"></a>

In [None]:
# Get default spec schema
endpoint = f"{base_url}/model/{model_id}/specs/train/schema"

response = requests.get(endpoint, headers=headers)

print(response)
#print(response.json()) ## Uncomment for verbose schema
specs = response.json()["default"]
print(json.dumps(specs, sort_keys=True, indent=4))

In [None]:
# Apply changes for any of the parameters listed in the previous cell as required
if model_name == "action_recognition":
    specs["model"]["model_type"] = model_type
    specs["model"]["input_type"] = model_input_type
    specs["train"]["num_epochs"] = 20
    specs["train"]["gpu_ids"] = [0]
elif model_name == "bpnet":
    specs["num_epoch"] = 20
    specs["finetuning_config"]["checkpoint_path"] = None
    specs["gpus"] = 1
elif model_name == "fpenet":
    specs["dataloader"]["dataset_info"]["root_path"] = None
    specs["num_keypoints"] = int(model_type)
    specs["dataloader"]["num_keypoints"] = int(model_type)
    specs["gpus"] = 1
elif model_name == "lprnet":
    specs["training_config"]["num_epochs"] = 24
    specs["gpus"] = 1
elif model_name == "ml_recog":
    specs["train"]["num_epochs"] = 30
    specs["train"]["gpu_ids"] = [0]
elif model_name == "ocdnet":
    specs["train"]["num_epochs"] = 30
    specs["train"]["gpu_id"] = [0]
    specs["num_gpus"] = 1
elif model_name == "ocrnet":
    specs["train"]["num_epochs"] = 20
    specs["train"]["gpu_ids"] = [0]
elif model_name == "optical_inspection":
    specs["train"]["num_epochs"] = 30
    specs["train"]["gpu_ids"] = [0]
elif model_name == "pose_classification":
    specs["train"]["num_epochs"] = 50
    specs["train"]["gpu_ids"] = [0]
    if model_type == "nvidia":
        specs["dataset"]["num_classes"] = 6
        specs["model"]["graph_layout"] = "nvidia"
    elif model_type == "kinetics":
        specs["dataset"]["num_classes"] = 5
        specs["model"]["graph_layout"] = "openpose"
elif model_name == "pointpillars":
    specs["train"]["num_epochs"] = 80
    specs["gpus"] = 1
elif model_name == "re_identification":
    specs["train"]["num_epochs"] = 120
    specs["train"]["gpu_ids"] = [0]
    specs["dataset"]["num_classes"] = 100 # The number set in the obtain_subset script
    specs["dataset"]["num_workers"] = 4 # Modify num_workers according to your hardware setup
    specs["dataset"]["batch_size"] = 16 # Modify batch_size according to your hardware setup

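The same `num_classes`/`graph_layout` pair for pose_classification is set in the train, evaluate, export and inference cells. A lookup table keeps those four places in sync. This is a sketch; `POSE_CLASSIFICATION_SETTINGS` and `apply_pose_settings` are hypothetical names, and the values are copied from this notebook.

```python
# Values copied from the pose_classification branches in this notebook
POSE_CLASSIFICATION_SETTINGS = {
    "nvidia": {"num_classes": 6, "graph_layout": "nvidia"},
    "kinetics": {"num_classes": 5, "graph_layout": "openpose"},
}


def apply_pose_settings(specs, model_type):
    """Set dataset.num_classes and model.graph_layout in place for known model types."""
    settings = POSE_CLASSIFICATION_SETTINGS.get(model_type)
    if settings is None:
        return specs  # other model types keep the defaults, as in the notebook
    specs["dataset"]["num_classes"] = settings["num_classes"]
    specs["model"]["graph_layout"] = settings["graph_layout"]
    return specs
```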
In [None]:
# Post spec
data = json.dumps(specs)

endpoint = f"{base_url}/model/{model_id}/specs/train"

response = requests.post(endpoint,data=data, headers=headers)

print(response)
print(json.dumps(response.json(), sort_keys=True, indent=4))

In [None]:
# Run action
parent = None
actions = ["train"]
data = json.dumps({"job":parent,"actions":actions})

endpoint = f"{base_url}/model/{model_id}/job"

response = requests.post(endpoint, data=data, headers=headers)

print(response)
print(response.json())

job_map["train"] = response.json()[0]
print(job_map)

In [None]:
# Monitor job status by repeatedly running this cell
# For AutoML: training times for different models, benchmarked on a single-GPU V100 machine, are listed here: https://docs.nvidia.com/tao/tao-toolkit/text/automl/automl.html#results-of-automl-experiments

job_id = job_map['train']
endpoint = f"{base_url}/model/{model_id}/job/{job_id}"

while True:
    clear_output(wait=True)
    response = requests.get(endpoint, headers=headers)
    if "error_desc" in response.json().keys() and response.json()["error_desc"] in ("Job not found", "No AutoML run found"):
        print("Job is being created")
        time.sleep(5)
        continue
    print(response)
    print(json.dumps(response.json(), sort_keys=True, indent=4))
    if response.json().get("status") in ["Done","Error"] or response.status_code not in (200,201):
        break
    time.sleep(15)

In [None]:
## To stop an AutoML job:
#    1. Manually stop the 'Monitor job status by repeatedly running this cell' cell (the cell right before this one)
#    2. Uncomment the snippet in the next cell and run it

In [None]:
# if automl_enabled:
#     job_id = job_map['train']
#     endpoint = f"{base_url}/model/{model_id}/job/{job_id}/cancel"

#     response = requests.post(endpoint, headers=headers)

#     print(response)
#     print(response.json())

In [None]:
## Resume AutoML

In [None]:
# To resume a stopped AutoML job, uncomment the snippet below, run this cell, and then rerun the 'Monitor job status by repeatedly running this cell' cell above (the 4th cell above this one)
# if automl_enabled:
#     job_id = job_map['train']
#     endpoint = f"{base_url}/model/{model_id}/job/{job_id}/resume"

#     response = requests.post(endpoint, headers=headers)

#     print(response)
#     print(response.json())

In [None]:
# Download the job contents once the job above shows "Done" status
# Downloading the train output can take a while
job_id = job_map["train"]
endpoint = f'{base_url}/model/{model_id}/job/{job_id}/download'

# Save
temptar = f'{job_id}.tar.gz'
with requests.get(endpoint, headers=headers, stream=True) as r:
    r.raise_for_status()
    with open(temptar, 'wb') as f:
        for chunk in r.iter_content(chunk_size=8192):
            f.write(chunk)

print("Untarring")
# Untar to destination
tar_command = f'tar -xf {temptar} -C {workdir}/'
os.system(tar_command)
os.remove(temptar)
print(f"Results at {workdir}/{job_id}")
model_downloaded_path = f"{workdir}/{job_id}"

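The download cells extract archives by shelling out to `tar` through `os.system`, which ignores failures. A sketch using Python's standard `tarfile` module instead, with a basic check that rejects archive members whose paths would land outside the destination folder; `extract_job_tarball` is a hypothetical name. The check does not cover unsafe links; on Python versions that support it (3.12+, and security releases of 3.8 through 3.11), passing `filter="data"` to `extractall` gives stricter protection.

```python
import os
import tarfile


def extract_job_tarball(tar_path, dest_dir):
    """Extract a downloaded job archive into dest_dir, rejecting paths that escape it."""
    dest_root = os.path.realpath(dest_dir)
    with tarfile.open(tar_path, "r:*") as tar:  # "r:*" auto-detects compression
        for member in tar.getmembers():
            target = os.path.realpath(os.path.join(dest_root, member.name))
            if os.path.commonpath([dest_root, target]) != dest_root:
                raise ValueError(f"Unsafe path in archive: {member.name}")
        tar.extractall(dest_root)
```

Usage: `extract_job_tarball(temptar, workdir)` in place of `os.system(tar_command)`; a corrupt archive then raises an exception instead of failing silently.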
In [None]:
# View the checkpoints generated by the training job. For AutoML jobs, this also shows the best-performing model's folder and the results of all AutoML experiments

if automl_enabled:
    !python3 -m pip install pandas==1.5.1
    import pandas as pd
    model_downloaded_path = f"{model_downloaded_path}/best_model"

if os.path.exists(model_downloaded_path):
    # List the checkpoint files
    print("\nCheckpoints for the training experiment")
    if os.path.exists(model_downloaded_path+"/train/weights") and len(os.listdir(model_downloaded_path+"/train/weights")) > 0:
        print(f"Folder: {model_downloaded_path}/train/weights")
        print("Files:", os.listdir(model_downloaded_path+"/train/weights"))
    elif os.path.exists(model_downloaded_path+"/weights") and len(os.listdir(model_downloaded_path+"/weights")) > 0:
        print(f"Folder: {model_downloaded_path}/weights")
        print("Files:", os.listdir(model_downloaded_path+"/weights"))
    else:
        print(f"Folder: {model_downloaded_path}")
        print("Files:", os.listdir(model_downloaded_path))

    if automl_enabled:
        with open(f"{model_downloaded_path}/controller.json", "r") as f:
            experiment_artifacts = json.load(f)
        data_frame = pd.DataFrame(experiment_artifacts)
        # Print experiment id/number and the corresponding result
        print("\nResults of all experiments")
        with pd.option_context('display.max_rows', None, 'display.max_columns', None, 'display.max_colwidth', None):
            print(data_frame[["id","result"]])

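The cell above looks for checkpoints in `train/weights`, then `weights`, and falls back to the download folder itself. The same search order as a small function, as a sketch; `find_checkpoint_dir` is a hypothetical name.

```python
import os


def find_checkpoint_dir(root):
    """Return the first non-empty checkpoint folder under root, else root itself."""
    candidates = (os.path.join(root, "train", "weights"), os.path.join(root, "weights"))
    for candidate in candidates:
        # Skip folders that are missing or empty, as the notebook cell does
        if os.path.isdir(candidate) and os.listdir(candidate):
            return candidate
    return root
```

Usage: `print(os.listdir(find_checkpoint_dir(model_downloaded_path)))`.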
### Evaluate <a class="anchor" id="head-12"></a>

In [None]:
# Get default spec schema
endpoint = f"{base_url}/model/{model_id}/specs/evaluate/schema"

response = requests.get(endpoint, headers=headers)

print(response)
#print(response.json()) ## Uncomment for verbose schema
specs = response.json()["default"]
print(json.dumps(specs, sort_keys=True, indent=4))

In [None]:
# Apply changes
if model_name == "action_recognition":
    specs["model"]["model_type"] = model_type
    specs["model"]["input_type"] = model_input_type
elif model_name == "fpenet":
    specs["dataloader"]["dataset_info"]["root_path"] = None
    specs["num_keypoints"] = int(model_type)
    specs["dataloader"]["num_keypoints"] = int(model_type)
elif model_name == "pose_classification":
    if model_type == "nvidia":
        specs["dataset"]["num_classes"] = 6
        specs["model"]["graph_layout"] = "nvidia"
    elif model_type == "kinetics":
        specs["dataset"]["num_classes"] = 5
        specs["model"]["graph_layout"] = "openpose"
elif model_name == "re_identification":
    specs["dataset"]["num_classes"] = 100 # The number set in the obtain_subset script

In [None]:
# Post spec
data = json.dumps(specs)

endpoint = f"{base_url}/model/{model_id}/specs/evaluate"

response = requests.post(endpoint,data=data,headers=headers)

print(response)
print(json.dumps(response.json(), sort_keys=True, indent=4))

In [None]:
# Run action
parent = job_map["train"]
actions = ["evaluate"]
data = json.dumps({"job":parent,"actions":actions})

endpoint = f"{base_url}/model/{model_id}/job"

response = requests.post(endpoint, data=data, headers=headers)

print(response)
print(response.json())

job_map["evaluate"] = response.json()[0]
print(job_map)

In [None]:
# Monitor job status by repeatedly running this cell
job_id = job_map['evaluate']
endpoint = f"{base_url}/model/{model_id}/job/{job_id}"

while True:
    clear_output(wait=True)
    response = requests.get(endpoint, headers=headers)
    print(response)
    print(response.json())
    if response.json().get("status") in ["Done","Error"] or response.status_code not in (200,201):
        break
    time.sleep(15)

### Optimize <a class="anchor" id="head-13"></a>

- For bpnet, ocdnet, ocrnet and pointpillars, the following cells optimize the trained model by pruning it and then retraining the pruned model

### Apply specs for prune <a class="anchor" id="head-14"></a>

In [None]:
# Get default spec schema
if model_name in ("bpnet", "ocdnet", "ocrnet", "pointpillars"):
    endpoint = f"{base_url}/model/{model_id}/specs/prune/schema"

    response = requests.get(endpoint, headers=headers)

    print(response)
    #print(response.json()) ## Uncomment for verbose schema
    specs = response.json()["default"]
    print(json.dumps(specs, sort_keys=True, indent=4))

In [None]:
# Apply changes
# No changes are needed for prune

In [None]:
# Post spec
if model_name in ("bpnet", "ocdnet", "ocrnet", "pointpillars"):
    data = json.dumps(specs)

    endpoint = f"{base_url}/model/{model_id}/specs/prune"

    response = requests.post(endpoint,data=data,headers=headers)

    print(response)
    print(json.dumps(response.json(), sort_keys=True, indent=4))

### Apply specs for retrain <a class="anchor" id="head-15"></a>

In [None]:
# Get default spec schema
if model_name in ("bpnet", "ocdnet", "ocrnet", "pointpillars"):
    endpoint = f"{base_url}/model/{model_id}/specs/retrain/schema"

    response = requests.get(endpoint,headers=headers)

    print(response)
    #print(response.json()) ## Uncomment for verbose schema
    specs = response.json()["default"]
    print(json.dumps(specs, sort_keys=True, indent=4))

In [None]:
# Apply changes for any of the parameters listed in the previous cell as required
if model_name == "bpnet":
    specs["num_epoch"] = 20
    specs["finetuning_config"]["checkpoint_path"] = None
    specs["gpus"] = 1
elif model_name == "ocdnet":
    specs["train"]["num_epochs"] = 30
    specs["train"]["gpu_id"] = [0]
    specs["num_gpus"] = 1
elif model_name == "ocrnet":
    specs["train"]["num_epochs"] = 20
    specs["train"]["gpu_ids"] = [0]
elif model_name == "pointpillars":
    specs["train"]["num_epochs"] = 80
    specs["gpus"] = 1

In [None]:
# Post spec
if model_name in ("bpnet", "ocdnet", "ocrnet", "pointpillars"):
    data = json.dumps(specs)

    endpoint = f"{base_url}/model/{model_id}/specs/retrain"

    response = requests.post(endpoint,data=data,headers=headers)

    print(response)
    print(json.dumps(response.json(), sort_keys=True, indent=4))

### Run Actions <a class="anchor" id="head-16"></a>

We use the API's job chaining feature to prune the model, retrain it, and evaluate the retrained model.

In [None]:
# Run actions
if model_name in ("bpnet", "ocdnet", "ocrnet", "pointpillars"):
    parent = job_map["train"]
    actions = ["prune","retrain","evaluate"]
    data = json.dumps({"job":parent,"actions":actions})

    endpoint = f"{base_url}/model/{model_id}/job"

    response = requests.post(endpoint, data=data, headers=headers)

    print(response)
    print(response.json())

    job_map["prune"] = response.json()[0]
    job_map["retrain"] = response.json()[1]
    job_map["eval_retrain"] = response.json()[2]
    print(job_map)

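The cell above relies on the service returning one job id per requested action, in the same order as the `actions` list. A sketch that makes that assumption explicit and fails loudly if the counts differ; `map_chained_jobs` is a hypothetical helper.

```python
def map_chained_jobs(job_ids, aliases):
    """Pair the job ids returned for a chained request with names, in order.

    Assumes, as the notebook does, one id per requested action in request order.
    """
    if len(job_ids) != len(aliases):
        raise ValueError(f"expected {len(aliases)} job ids, got {len(job_ids)}")
    return dict(zip(aliases, job_ids))
```

Usage: `job_map.update(map_chained_jobs(response.json(), ["prune", "retrain", "eval_retrain"]))`.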
In [None]:
# Monitor job status by repeatedly running this cell (prune)
if model_name in ("bpnet", "ocdnet", "ocrnet", "pointpillars"):
    job_id = job_map['prune']
    endpoint = f"{base_url}/model/{model_id}/job/{job_id}"

    while True:
        clear_output(wait=True)
        response = requests.get(endpoint, headers=headers)
        print(response)
        print(response.json())
        if response.json().get("status") in ["Done","Error"] or response.status_code not in (200,201):
            break
        time.sleep(15)

In [None]:
# Monitor job status by repeatedly running this cell (retrain)
if model_name in ("bpnet", "ocdnet", "ocrnet", "pointpillars"):
    job_id = job_map['retrain']
    endpoint = f"{base_url}/model/{model_id}/job/{job_id}"

    while True:
        clear_output(wait=True)
        response = requests.get(endpoint, headers=headers)
        print(response)
        print(response.json())
        if response.json().get("status") in ["Done","Error"] or response.status_code not in (200,201):
            break
        time.sleep(15)

In [None]:
# Monitor job status by repeatedly running this cell (evaluate)
if model_name in ("bpnet", "ocdnet", "ocrnet", "pointpillars"):
    job_id = job_map['eval_retrain']
    endpoint = f"{base_url}/model/{model_id}/job/{job_id}"

    while True:
        clear_output(wait=True)
        response = requests.get(endpoint, headers=headers)
        print(response)
        print(response.json())
        if response.json().get("status") in ["Done","Error"] or response.status_code not in (200,201):
            break
        time.sleep(15)

In [None]:
# Optional: cancel the retrain job if it is pending or running

# if model_name in ("bpnet", "ocdnet", "ocrnet", "pointpillars"):
#     job_id = job_map['retrain']
#     endpoint = f"{base_url}/model/{model_id}/job/{job_id}/cancel"

#     response = requests.post(endpoint, headers=headers)

#     print(response)
#     print(response.json())

In [None]:
# Optional: delete the retrain job if it is done or in error

# if model_name in ("bpnet", "ocdnet", "ocrnet", "pointpillars"):
#     job_id = job_map['retrain']
#     endpoint = f"{base_url}/model/{model_id}/job/{job_id}"

#     response = requests.delete(endpoint, headers=headers)

#     print(response)
#     print(response.json())

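The two optional cells above choose between cancel and delete by hand. As a sketch, the choice can follow the job's status: the notebook treats "Done" and "Error" as terminal, and this helper assumes that any other status means the job is still pending or running. `cleanup_request` is a hypothetical helper that only builds the request and does not send it.

```python
def cleanup_request(base_url, model_id, job_id, status):
    """Return (HTTP method, URL) to clean up a job, based on its status."""
    job_url = f"{base_url}/model/{model_id}/job/{job_id}"
    if status in ("Done", "Error"):
        return "DELETE", job_url  # finished jobs are deleted
    return "POST", f"{job_url}/cancel"  # assumed pending/running: cancel instead
```

Usage: `method, url = cleanup_request(base_url, model_id, job_id, response.json().get("status"))`, then `requests.request(method, url, headers=headers)`.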
### Export <a class="anchor" id="head-17"></a>

In [None]:
# Get default spec schema
endpoint = f"{base_url}/model/{model_id}/specs/export/schema"

response = requests.get(endpoint, headers=headers)

print(response)
# print(response.json()) ## Uncomment for verbose schema
specs = response.json()["default"]
print(json.dumps(specs, sort_keys=True, indent=4))

In [None]:
# Apply changes to the specs dictionary if necessary
if model_name == "action_recognition":
    specs["model"]["model_type"] = model_type
    specs["model"]["input_type"] = model_input_type
elif model_name == "bpnet":
    specs["data_type"] = "int8"
    specs["max_batch_size"] = 1
    specs["batches"] = 50
elif model_name == "lprnet":
    specs["data_type"] = "fp32"
elif model_name == "pose_classification":
    if model_type == "nvidia":
        specs["dataset"]["num_classes"] = 6
        specs["model"]["graph_layout"] = "nvidia"
    elif model_type == "kinetics":
        specs["dataset"]["num_classes"] = 5
        specs["model"]["graph_layout"] = "openpose"
elif model_name == "re_identification":
    specs["dataset"]["num_classes"] = 100 # The number set in the obtain_subset script

In [None]:
# Post spec
data = json.dumps(specs)

endpoint = f"{base_url}/model/{model_id}/specs/export"

response = requests.post(endpoint,data=data,headers=headers)

print(response)
print(json.dumps(response.json(), sort_keys=True, indent=4))

In [None]:
# Run action
parent = job_map["train"]
actions = ["export"]
data = json.dumps({"job":parent,"actions":actions})

endpoint = f"{base_url}/model/{model_id}/job"

response = requests.post(endpoint, data=data, headers=headers)

print(response)
print(response.json())

job_map["export"] = response.json()[0]
print(job_map)

In [None]:
# Monitor job status by repeatedly running this cell
job_id = job_map['export']
endpoint = f"{base_url}/model/{model_id}/job/{job_id}"

while True:
    clear_output(wait=True)
    response = requests.get(endpoint, headers=headers)
    print(response)
    print(response.json())
    if response.json().get("status") in ["Done","Error"] or response.status_code not in (200,201):
        break
    time.sleep(15)

In [None]:
# Download job contents once the above job shows "Done" status
job_id = job_map["export"]
endpoint = f'{base_url}/model/{model_id}/job/{job_id}/download'

# Save
temptar = f'{job_id}.tar.gz'
with requests.get(endpoint, headers=headers, stream=True) as r:
    r.raise_for_status()
    with open(temptar, 'wb') as f:
        for chunk in r.iter_content(chunk_size=8192):
            f.write(chunk)

print("Untarring")
# Untar to destination
tar_command = f'tar -xf {temptar} -C {workdir}/'
os.system(tar_command)
os.remove(temptar)
print(f"Results at {workdir}/{job_id}")
model_downloaded_path = f"{workdir}/{job_id}"

In [None]:
# Look for the generated .onnx file
!ls {model_downloaded_path}

### TRT Engine generation using TAO-Deploy <a class="anchor" id="head-18"></a>

- Here, we convert the exported model into a TensorRT engine for the target platform

In [None]:
# Get default spec schema
if model_name in ("bpnet", "lprnet", "ocdnet", "ocrnet", "optical_inspection", "ml_recog"):
    if model_name == "bpnet":
        engine_generation_action = "trtexec"
    else:
        engine_generation_action = "gen_trt_engine"
        
    endpoint = f"{base_url}/model/{model_id}/specs/{engine_generation_action}/schema"

    response = requests.get(endpoint, headers=headers)

    print(response)
    #print(response.json()) ## Uncomment for verbose schema
    specs = response.json()["default"]
    print(json.dumps(specs, sort_keys=True, indent=4))

In [None]:
# Apply changes
if model_name == "lprnet":
    specs["data_type"] = "fp32"
elif model_name in ("ml_recog", "ocdnet"):
    specs["gen_trt_engine"]["tensorrt"]["data_type"] = "int8"
elif model_name in ("ocrnet", "optical_inspection"):
    specs["gen_trt_engine"]["tensorrt"]["data_type"] = "fp16"

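The TAO-Deploy cells pick the engine-generation action and the TensorRT data type per model. The same choices as lookup functions, as a sketch; the names are hypothetical and the values are copied from the cells above.

```python
# Values copied from the TAO-Deploy cells in this notebook
ENGINE_MODELS = {"bpnet", "lprnet", "ocdnet", "ocrnet", "optical_inspection", "ml_recog"}
NESTED_TRT_DATA_TYPE = {"ml_recog": "int8", "ocdnet": "int8",
                        "ocrnet": "fp16", "optical_inspection": "fp16"}


def engine_action(model_name):
    """Return the spec/action name used to build a TensorRT engine, or None."""
    if model_name not in ENGINE_MODELS:
        return None
    return "trtexec" if model_name == "bpnet" else "gen_trt_engine"


def set_engine_data_type(specs, model_name):
    """Apply the data type overrides from the 'Apply changes' cell, in place."""
    if model_name == "lprnet":
        specs["data_type"] = "fp32"  # lprnet keeps data_type at the top level
    elif model_name in NESTED_TRT_DATA_TYPE:
        specs["gen_trt_engine"]["tensorrt"]["data_type"] = NESTED_TRT_DATA_TYPE[model_name]
    return specs
```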
In [None]:
# Post spec
if model_name in ("bpnet", "lprnet", "ocdnet", "ocrnet", "optical_inspection", "ml_recog"):
    data = json.dumps(specs)

    endpoint = f"{base_url}/model/{model_id}/specs/{engine_generation_action}"

    response = requests.post(endpoint,data=data,headers=headers)

    print(response)
    print(json.dumps(response.json(), sort_keys=True, indent=4))

In [None]:
# Run action
if model_name in ("bpnet", "lprnet", "ocdnet", "ocrnet", "optical_inspection", "ml_recog"):
    parent = job_map["export"]
    actions = [engine_generation_action]
    data = json.dumps({"job":parent,"actions":actions})

    endpoint = f"{base_url}/model/{model_id}/job"

    response = requests.post(endpoint, data=data, headers=headers)

    print(response)
    print(response.json())

    job_map[engine_generation_action] = response.json()[0]
    print(job_map)

In [None]:
# Monitor job status by repeatedly running this cell
if model_name in ("bpnet", "lprnet", "ocdnet", "ocrnet", "optical_inspection", "ml_recog"):
    job_id = job_map[engine_generation_action]
    endpoint = f"{base_url}/model/{model_id}/job/{job_id}"

    while True:    
        clear_output(wait=True)
        response = requests.get(endpoint, headers=headers)
        print(response)
        print(response.json())
        if response.json().get("status") in ["Done","Error"] or response.status_code not in (200,201):
            break
        time.sleep(15)

### TAO inference <a class="anchor" id="head-19"></a>

- Run inference on the inference data using the model checkpoint created at the train step

In [None]:
# Get default spec schema
endpoint = f"{base_url}/model/{model_id}/specs/inference/schema"

response = requests.get(endpoint, headers=headers)

print(response)
# print(response.json()) ## Uncomment for verbose schema
specs = response.json()["default"]
print(json.dumps(specs, sort_keys=True, indent=4))

In [None]:
# Apply changes to the specs dictionary if necessary
if model_name == "action_recognition":
    specs["model"]["model_type"] = model_type
    specs["model"]["input_type"] = model_input_type
elif model_name == "fpenet":
    specs["num_keypoints"] = int(model_type)
    specs["dataloader"]["num_keypoints"] = int(model_type)
elif model_name == "pose_classification":
    if model_type == "nvidia":
        specs["dataset"]["num_classes"] = 6
        specs["model"]["graph_layout"] = "nvidia"
    elif model_type == "kinetics":
        specs["dataset"]["num_classes"] = 5
        specs["model"]["graph_layout"] = "openpose"
elif model_name == "re_identification":
    specs["dataset"]["num_classes"] = 100 # The number set in the obtain_subset script

In [None]:
# Post spec
data = json.dumps(specs)

endpoint = f"{base_url}/model/{model_id}/specs/inference"

response = requests.post(endpoint,data=data,headers=headers)

print(response)
print(json.dumps(response.json(), sort_keys=True, indent=4))

In [None]:
# Run action
parent = job_map["train"]
actions = ["inference"]
data = json.dumps({"job":parent,"actions":actions})

endpoint = f"{base_url}/model/{model_id}/job"

response = requests.post(endpoint, data=data, headers=headers)

print(response)
print(response.json())

job_map["inference_tlt"] = response.json()[0]
print(job_map)

In [None]:
# Monitor job status by repeatedly running this cell
job_id = job_map['inference_tlt']
endpoint = f"{base_url}/model/{model_id}/job/{job_id}"

while True:
    clear_output(wait=True)
    response = requests.get(endpoint, headers=headers)
    print(response)
    print(response.json())
    if response.json().get("status") in ["Done","Error"] or response.status_code not in (200,201):
        break
    time.sleep(15)

In [None]:
# Download job contents once the above job shows "Done" status
job_id = job_map["inference_tlt"]
endpoint = f'{base_url}/model/{model_id}/job/{job_id}/download'

# Save
temptar = f'{job_id}.tar.gz'
with requests.get(endpoint, headers=headers, stream=True) as r:
    r.raise_for_status()
    with open(temptar, 'wb') as f:
        for chunk in r.iter_content(chunk_size=8192):
            f.write(chunk)

print("Untarring")
# Untar to destination
tar_command = f'tar -xf {temptar} -C {workdir}/'
os.system(tar_command)
os.remove(temptar)
print(f"Results at {workdir}/{job_id}")
inference_out_path = f"{workdir}/{job_id}"

In [None]:
# Display the inference output
if model_name in ("action_recognition","lprnet","ocrnet"):
    !cat {inference_out_path}/logs_from_toolkit.txt
elif model_name in ("bpnet","pointpillars"):
    !python3 -m pip install matplotlib
    import glob
    import matplotlib.pyplot as plt
    import matplotlib.image as mpimg
    if model_name == "bpnet":
        sample_image = glob.glob(f"{inference_out_path}/images_annotated/*.png")[0]
    elif model_name == "pointpillars":
        sample_image = glob.glob(f"{inference_out_path}/infer/detected_boxes/*.png")[0]
    def display_photo(path):
        img = mpimg.imread(path)
        # figsize is (width, height) in inches, while img.shape is (height, width[, channels])
        plt.figure(figsize=(img.shape[1] / 100 * 2, img.shape[0] / 100 * 2))
        plt.axis('off')
        plt.imshow(img, aspect='auto')
        plt.show()
elif model_name == "fpenet":
    !cat {inference_out_path}/result.txt
elif model_name == "ml_recog":
    !cat {inference_out_path}/inference/result.csv
elif model_name == "optical_inspection":
    !cat {inference_out_path}/inference/inference.csv
elif model_name == "pose_classification":
    !cat {inference_out_path}/results.txt
elif model_name == "re_identification":
    !cat {inference_out_path}/inference.json

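The output-display cell above reads a different result file for each model. As a sketch, the same mapping can be kept in one table and read with plain Python instead of `!cat`. `INFERENCE_OUTPUT_FILES` and `read_inference_output` are hypothetical names; the relative paths are copied from that cell, and bpnet and pointpillars are left out because the cell shows their annotated images instead.

```python
import os

# Relative result files copied from the inference output cell
INFERENCE_OUTPUT_FILES = {
    "action_recognition": "logs_from_toolkit.txt",
    "lprnet": "logs_from_toolkit.txt",
    "ocrnet": "logs_from_toolkit.txt",
    "fpenet": "result.txt",
    "ml_recog": os.path.join("inference", "result.csv"),
    "optical_inspection": os.path.join("inference", "inference.csv"),
    "pose_classification": "results.txt",
    "re_identification": "inference.json",
}


def read_inference_output(inference_out_path, model_name):
    """Return the text of the model's result file, or None if unmapped or missing."""
    rel_path = INFERENCE_OUTPUT_FILES.get(model_name)
    if rel_path is None:
        return None
    full_path = os.path.join(inference_out_path, rel_path)
    if not os.path.isfile(full_path):
        return None
    with open(full_path, "r") as f:
        return f.read()
```

Usage: `print(read_inference_output(inference_out_path, model_name))`.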
### TRT inference <a class="anchor" id="head-20"></a>

- There is no need to change the specs, since they were already posted at the TAO inference step

In [None]:
# Run action
if model_name in ("bpnet", "lprnet", "ocdnet", "ocrnet", "ml_recog", "optical_inspection"):
    parent = job_map[engine_generation_action]
    actions = ["inference"]
    data = json.dumps({"job":parent,"actions":actions})

    endpoint = f"{base_url}/model/{model_id}/job"

    response = requests.post(endpoint, data=data, headers=headers)

    print(response)
    print(response.json())

    job_map["inference_trt"] = response.json()[0]
    print(job_map)

In [None]:
# Monitor job status by repeatedly running this cell
if model_name in ("bpnet", "lprnet", "ocdnet", "ocrnet", "ml_recog", "optical_inspection"):
    job_id = job_map['inference_trt']
    endpoint = f"{base_url}/model/{model_id}/job/{job_id}"

    while True:    
        clear_output(wait=True)
        response = requests.get(endpoint, headers=headers)
        print(response)
        print(response.json())
        if response.json().get("status") in ["Done","Error"] or response.status_code not in (200,201):
            break
        time.sleep(15)

In [None]:
# Download job contents once the above job shows "Done" status
if model_name in ("bpnet", "lprnet", "ocdnet", "ocrnet", "ml_recog", "optical_inspection"):
    job_id = job_map["inference_trt"]
    endpoint = f'{base_url}/model/{model_id}/job/{job_id}/download'

    # Save
    temptar = f'{job_id}.tar.gz'
    with requests.get(endpoint, headers=headers, stream=True) as r:
        r.raise_for_status()
        with open(temptar, 'wb') as f:
            for chunk in r.iter_content(chunk_size=8192):
                f.write(chunk)

    print("Untarring")
    # Untar to destination
    tar_command = f'tar -xf {temptar} -C {workdir}/'
    os.system(tar_command)
    os.remove(temptar)
    print(f"Results at {workdir}/{job_id}")
    inference_out_path = f"{workdir}/{job_id}"

In [None]:
# List the TensorRT inference output files
if model_name in ("bpnet", "lprnet", "ocdnet", "ocrnet", "ml_recog", "optical_inspection"):
    !ls {inference_out_path}/