### Notebook to demonstrate the TAO workflow on purpose-built models

Transfer learning is the process of transferring learned features from one application to another. It is a commonly used training technique in which a model trained on one task is retrained for a different task. The Train Adapt Optimize (TAO) Toolkit is a simple, easy-to-use Python-based AI toolkit for taking purpose-built AI models and customizing them with your own data.

![image](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/TAO/tlt-tao-toolkit-bring-your-own-model-diagram.png)

### The workflow in a nutshell

- Create a dataset
- Upload the dataset to the service
- Run dataset convert (for specific models)
- Get a pretrained model (PTM) from NGC
- Model Actions
    - Train (Normal/AutoML)
    - Evaluate
    - Prune, retrain (for specific models)
    - Export
    - TAO-Deploy (for specific models)
    - Inference on TAO
    - Inference on TRT (for specific models)
    
### Table of contents

1. [Create datasets](#head-1)
1. [List the created datasets](#head-2)
1. [Dataset convert Action for train dataset](#head-3) (for specific models)
1. [Dataset convert Action for val dataset](#head-3.1) (for specific models)
1. [Create an experiment](#head-4)
1. [List experiments](#head-5)
1. [Assign datasets](#head-6)
1. [Assign PTM](#head-7)
1. [View hyperparameters that are enabled by default](#head-8)
1. [Set AutoML related configurations](#head-9)
1. [Actions](#head-10)
1. [Train](#head-11)
1. [Evaluate](#head-12)
1. [Optimize: Apply specs for prune](#head-14) (for specific models)   
1. [Optimize: Apply specs for retrain](#head-15) (for specific models)
1. [Optimize: Run actions](#head-16) (for specific models)
1. [Export](#head-17)
1. [TRT Engine generation using TAO-Deploy](#head-18) (for specific models)
1. [TAO inference](#head-19)
1. [TRT inference](#head-20) (for specific models)

### Requirements
The server requirements are listed [here](https://docs.nvidia.com/tao/tao-toolkit/text/tao_toolkit_api/api_setup.html#).

In [None]:
import json
import os
import requests
import shutil
import time
from IPython.display import clear_output
import subprocess
import glob

### FIXME

1. Assign a model_name in FIXME 1

    1.1 Assign model type for action_recognition/fpenet/lprnet/pose_classification in FIXME 1.1

    1.2 Assign platform for action_recognition in FIXME 1.2
    
    1.3 Assign model input type for action_recognition in FIXME 1.3
1. Assign a workdir in FIXME 2
1. Assign the ip_address and port_number in FIXME 3 ([info](https://docs.nvidia.com/tao/tao-toolkit/text/tao_toolkit_api/api_rest_api.html))
1. Assign the ngc_api_key variable in FIXME 4
1. (Optional) Enable AutoML if needed in FIXME 5
1. (Optional) Choose between bayesian and hyperband for automl_algorithm in FIXME 6 (only if AutoML was enabled in FIXME 5)
1. Choose to download jobs or not in FIXME 7
1. Choose between default and custom dataset in FIXME 8
1. Assign path of DATA_DIR in FIXME 9
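
Before running the cells below, it can help to confirm that none of the FIXME values still hold their template placeholders. The following is a minimal sketch of such a check; the helper names `find_unset_settings` and `is_valid_host_url` are hypothetical and not part of the TAO API or this notebook.

```python
from urllib.parse import urlparse


def find_unset_settings(settings):
    """Return the names of settings that still look like template placeholders.

    `settings` maps a variable name to its current value. A value counts as
    unset when it is empty or still contains an angle-bracket placeholder
    such as "<ip_address>".
    """
    unset = []
    for name, value in settings.items():
        text = str(value)
        if not text or ("<" in text and ">" in text):
            unset.append(name)
    return unset


def is_valid_host_url(host_url):
    """Check that host_url has an http(s) scheme, a host, and a numeric port."""
    try:
        parsed = urlparse(host_url)
        return (
            parsed.scheme in ("http", "https")
            and bool(parsed.hostname)
            and parsed.port is not None
        )
    except ValueError:
        # Reading .port raises ValueError when the port is not a valid number
        return False


# Example with the template values from this notebook left untouched
example = {
    "host_url": "http://<ip_address>:<port_number>",
    "ngc_api_key": "<ngc_api_key>",
    "workdir": "workdir_purpose_built_models",
}
print(find_unset_settings(example))
```

Running a check like this right after the FIXME cells surfaces a forgotten placeholder immediately, rather than as a failed login request later on.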

In [None]:
# Define model_name workspaces and other variables
# Available models (#FIXME 1):
# 1. action_recognition - https://docs.nvidia.com/tao/tao-toolkit/text/action_recognition_net.html
# 2. bpnet - https://docs.nvidia.com/tao/tao-toolkit/text/bodypose_estimation/bodyposenet.html
# 3. fpenet - https://docs.nvidia.com/tao/tao-toolkit/text/facial_landmarks_estimation/facial_landmarks_estimation.html
# 4. lprnet - https://docs.nvidia.com/tao/tao-toolkit/text/character_recognition/index.html
# 5. ml_recog - https://docs.nvidia.com/tao/tao-toolkit/text/ml_recog/index.html
# 6. ocdnet - https://docs.nvidia.com/tao/tao-toolkit/text/ocdnet/index.html
# 7. ocrnet - https://docs.nvidia.com/tao/tao-toolkit/text/ocrnet/index.html
# 8. optical_inspection - https://docs.nvidia.com/tao/tao-toolkit/text/optical_inspection/index.html
# 9. pose_classification - https://docs.nvidia.com/tao/tao-toolkit/text/pose_classification/index.html
# 10. pointpillars - https://docs.nvidia.com/tao/tao-toolkit/text/point_cloud/pointpillars.html
# 11. re_identification - https://docs.nvidia.com/tao/tao-toolkit/text/re_identification/index.html
# 12. centerpose -
# 13. visual_changenet_segment - https://docs.nvidia.com/tao/tao-toolkit/text/visual_changenet/index.html
# 14. visual_changenet_classify - https://docs.nvidia.com/tao/tao-toolkit/text/visual_changenet/index.html 

model_name = "action_recognition" # FIXME1 (Add the model name from the above mentioned list)

In [None]:
if model_name in ("action_recognition","fpenet","lprnet","pose_classification"):
    # FIXME1.1 - model_type - string
        # action-recognition: rgb/of/joint;
        # fpenet: 10/80 (value represents the number of keypoints)
        # lprnet: us/ch (us for United States, ch for China)
        # pose-classification: kinetics/nvidia
    model_type = "rgb"

    if model_name == "action_recognition":
        if model_type not in ("rgb","of","joint"):
            raise Exception("Choose one of rgb/of/joint for action recognition model_type")
    elif model_name == "fpenet":
        if model_type not in ("10","80"):
            raise Exception("Choose one of 10/80 for FPENET model_type")
    elif model_name == "lprnet":
        if model_type not in ("us","ch"):
            raise Exception("Choose one of us/ch for LPRNET model_type")
    elif model_name == "pose_classification":
        if model_type not in ("kinetics","nvidia"):
            raise Exception("Choose one of kinetics/nvidia for pose classification model_type")

    if model_name == "action_recognition":
        platform = "a100" # FIXME1.2 a100/xavier - valid only for model_type that is not rgb
        model_input_type = "3d" # FIXME1.3 3d/2d

In [None]:
workdir = "workdir_purpose_built_models" # FIXME2
host_url = "http://<ip_address>:<port_number>" # FIXME3 example: https://10.137.149.22:32334
# In host machine, node ip_address and port number can be obtained as follows,
# ip_address: hostname -i
# port_number: kubectl get service ingress-nginx-controller -o jsonpath='{.spec.ports[0].nodePort}'
ngc_api_key = "<ngc_api_key>" # FIXME4 example: (Add NGC API key)

In [None]:
automl_enabled = False # FIXME5 set to True if you want to run automl for the model chosen in the previous cell
automl_algorithm = "bayesian" # FIXME6 example: bayesian/hyperband
# FIXME7 Defaulted to False as downloading jobs from service to your machine takes time
# Set to True if you want to download jobs where examples have been provided like for train, export, inference.
download_jobs = False

In [None]:
# Exchange NGC_API_KEY for JWT
data = json.dumps({"ngc_api_key": ngc_api_key})
response = requests.post(f"{host_url}/api/v1/login", data=data)
assert response.status_code in (200, 201)
assert "user_id" in response.json().keys()
user_id = response.json()["user_id"]
print("User ID",user_id)
assert "token" in response.json().keys()
token = response.json()["token"]
print("JWT",token)

# Set base URL
base_url = f"{host_url}/api/v1/users/{user_id}"
print("API Calls will be forwarded to",base_url)

headers = {"Authorization": f"Bearer {token}"}

In [None]:
# Create workdir (no error if it already exists)
os.makedirs(workdir, exist_ok=True)

### Function to split tar files <a class="anchor" id="head-1.1"></a>
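
The function in this section packs archive members into consecutive splits until the next member would exceed the size limit (0.2 GiB by default). The grouping logic on its own can be sketched as below; `plan_splits` is a hypothetical helper that works on plain sizes and does not touch any files.

```python
def plan_splits(member_sizes, max_split_size):
    """Group member sizes into consecutive splits by greedy packing.

    A new split starts whenever adding the next member would push the
    current split past max_split_size. A member larger than the limit
    still gets a split of its own.
    """
    splits = [[]]
    current_size = 0
    for size in member_sizes:
        if splits[-1] and current_size + size > max_split_size:
            # Current split is full: start a new one
            splits.append([])
            current_size = 0
        splits[-1].append(size)
        current_size += size
    return splits


print(plan_splits([4, 4, 4, 9, 1], max_split_size=10))
```

Because members are never reordered, the splits preserve the order of the original archive. As a small simplification, this sketch never creates an empty group when the very first member is oversized.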

In [None]:
import os
import tarfile

def split_tar_file(input_tar_path, output_dir, max_split_size=0.2*1024*1024*1024):
    """Split a tar archive into smaller tar files of roughly max_split_size bytes each."""
    os.makedirs(output_dir, exist_ok=True)

    with tarfile.open(input_tar_path, 'r') as original_tar:
        current_split_size = 0
        current_split_number = 0
        current_split_name = os.path.join(output_dir, f'smaller_file_{current_split_number}.tar')
        split_tar = tarfile.open(current_split_name, 'w')
        try:
            for member in original_tar.getmembers():
                if current_split_size + member.size > max_split_size:
                    # Current split is full: close it and open a new split tar archive
                    split_tar.close()
                    current_split_number += 1
                    current_split_name = os.path.join(output_dir, f'smaller_file_{current_split_number}.tar')
                    current_split_size = 0
                    split_tar = tarfile.open(current_split_name, 'w')
                split_tar.addfile(member, original_tar.extractfile(member))
                current_split_size += member.size
        finally:
            # Always close the last split so its end-of-archive blocks are written
            split_tar.close()

### Set dataset type, format <a class="anchor" id="head-1.1"></a>

**Action Recognition:** We will be using the HMDB51 [dataset](https://serre-lab.clps.brown.edu/resource/hmdb-a-large-human-motion-database/) for the tutorial, restricted to the catch and smile classes.

**BPNET:** We will be using the `COCO dataset`. The `download_coco.sh` script from the dataset_prepare directory downloads and unzips the coco2017 dataset from [here](https://cocodataset.org/#download).

**FPENET:** We will be using `AFW dataset`. Download it from [here](https://ibug.doc.ic.ac.uk/download/annotations/afw.zip/) and place it in $DATA_DIR.

**LPRNET**: We will be using the `OpenALPR benchmark dataset` for the tutorial. The following script will download the dataset automatically and convert it to the format used by TAO.  

**MLRecogNet:** We will be using the `Retail Product Checkout Dataset` for the tutorial. Download the dataset from [here](https://www.kaggle.com/datasets/diyer22/retail-product-checkout-dataset) and place it under $DATA_DIR/metric_learning_recognition.

**OCDNET**: We will be using the ICDAR2015 dataset for the ocdnet tutorial. Please access the dataset [here](https://rrc.cvc.uab.es/?ch=4&com=tasks) to register and download the data from Task 4.1: Text Localization. Unzip the files to DATA_DIR

**OCRNET**: We will be using the ICDAR15 word recognition dataset for the tutorial. For more details, please visit [here](https://rrc.cvc.uab.es/?ch=4&com=tasks). Please download the ICDAR15 word recognition train and test datasets [here](https://rrc.cvc.uab.es/?ch=4&com=downloads) to DATA_DIR.

**Pointpillars:** We will be using the `kitti object detection dataset` for this example. To find more details, please visit [here](http://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark=2d)

**Pose Classification:** We will be using the Kinetics dataset from [Deepmind](https://deepmind.com/research/open-source/kinetics) or an NVIDIA-created dataset. For the Kinetics-based dataset, set model_type to `kinetics`; for the NVIDIA-based dataset, set model_type to `nvidia`.

**Re-Identification:** We will be using the [Market-1501](https://zheng-lab.cecs.anu.edu.au/Project/project_reid.html) dataset. Download the dataset [here](https://drive.google.com/file/d/1TwkgQcIa_EgRjVMPSbyEKtcfljqURrzi/view?usp=sharing) and extract it.

**Optical Inspection:** Bring your own dataset according to the format described [here](https://docs.nvidia.com/tao/tao-toolkit/text/data_annotation_format.html#optical-inspection-format). 

**Visual ChangeNet-Classification:** Bring your own dataset according to the format described [here](https://docs.nvidia.com/tao/tao-toolkit/text/data_annotation_format.html#optical-inspection-format). 

**Visual ChangeNet-Segmentation:** We will be using the LEVIR-CD256 dataset. Download the dataset [here](https://www.dropbox.com/s/18fb5jo0npu5evm/LEVIR-CD256.zip) and extract it.

**CenterPose:** We will be using the [Google Objectron](https://github.com/google-research-datasets/Objectron) dataset. The following script will download and preprocess the dataset automatically.
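
The cell that follows derives the dataset type and format from the chosen model name. As a rough illustration, the same rules can be written as a pure function; `resolve_dataset_settings` is a hypothetical name used only for this sketch.

```python
def resolve_dataset_settings(model_name):
    """Return (model_name, ds_type, ds_format) following the notebook's rules."""
    if model_name == "lprnet":
        return model_name, "character_recognition", "lprnet"
    if model_name in ("visual_changenet_classify", "visual_changenet_segment"):
        # Both variants share one network name; the format keeps the variant
        return "visual_changenet", "visual_changenet", model_name
    # Every other model uses its own name as the type and the default format
    return model_name, model_name, "default"


for name in ("lprnet", "visual_changenet_segment", "ocdnet"):
    print(name, "->", resolve_dataset_settings(name))
```

Note that for the two Visual ChangeNet variants the model name itself is rewritten to `visual_changenet`, which is why later cells distinguish the variants through `ds_format`.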

In [None]:
if model_name == "lprnet":
    ds_type = "character_recognition"
    ds_format = "lprnet"
elif model_name in ("visual_changenet_classify", "visual_changenet_segment"):
    ds_format = model_name
    ds_type = model_name = "visual_changenet"
else:
    ds_type = model_name
    ds_format = "default"

In [None]:
dataset_to_be_used = "default" #FIXME8 #default/custom; default for the dataset used in this tutorial notebook; custom for a different dataset
DATA_DIR = os.path.abspath(model_name) # FIXME9 (set absolute path of the data_directory)
os.environ['DATA_DIR']= DATA_DIR
!mkdir -p $DATA_DIR
job_map = {}

### Dataset download and pre-processing <a class="anchor" id="head-1"></a>
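
The cell in this section checks the expected directory layout with one `assert` per path, so a run stops at the first missing entry. A possible alternative, sketched here with a hypothetical `find_missing_paths` helper, collects every missing path in a single pass.

```python
import os
import tempfile


def find_missing_paths(base_dir, relative_paths):
    """Return the entries of relative_paths that do not exist under base_dir."""
    return [
        path for path in relative_paths
        if not os.path.exists(os.path.join(base_dir, path))
    ]


# Demo on a temporary directory that contains only train/image
with tempfile.TemporaryDirectory() as demo_dir:
    os.makedirs(os.path.join(demo_dir, "train", "image"))
    missing = find_missing_paths(demo_dir, ["train/image", "train/label"])
    print(missing)
```

Reporting all missing paths at once is mostly a convenience for datasets that have to be downloaded and placed by hand.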

In [None]:
if dataset_to_be_used == "default":
    if model_name == "action_recognition":
        !sudo apt-get update -y && sudo apt-get install unrar-free -y
        !wget -P $DATA_DIR --no-check-certificate http://serre-lab.clps.brown.edu/wp-content/uploads/2013/10/hmdb51_org.rar
        assert os.path.exists(f"{DATA_DIR}/hmdb51_org.rar")
        !mkdir -p $DATA_DIR/videos && unrar x -o+ $DATA_DIR/hmdb51_org.rar $DATA_DIR/videos
        !mkdir -p $DATA_DIR/raw_data
        !unrar x -o+ $DATA_DIR/videos/catch.rar $DATA_DIR/raw_data
        !unrar x -o+ $DATA_DIR/videos/smile.rar $DATA_DIR/raw_data
        assert os.path.exists(f"{DATA_DIR}/raw_data/catch")
        assert os.path.exists(f"{DATA_DIR}/raw_data/smile")
    elif model_name == "bpnet":
        !bash ../dataset_prepare/coco/download_coco.sh $DATA_DIR
        # Remove existing data
        !rm -rf $DATA_DIR/train2017/images
        !rm -rf $DATA_DIR/val2017/images
        # Rearrange data in the required format
        !mv $DATA_DIR/raw-data/* $DATA_DIR/
        !cp ../dataset_prepare/bpnet/* $DATA_DIR/
        assert os.path.exists(f"{DATA_DIR}/train2017")
        assert os.path.exists(f"{DATA_DIR}/val2017")
        assert os.path.exists(f"{DATA_DIR}/annotations")
        assert os.path.exists(f"{DATA_DIR}/bpnet_18joints.json")
        assert os.path.exists(f"{DATA_DIR}/coco_spec.json")
        assert os.path.exists(f"{DATA_DIR}/infer_spec.yaml")
    elif model_name == "fpenet":
        assert os.path.exists(f"{DATA_DIR}/afw.zip")
        !mkdir -p $DATA_DIR/data
        !unzip -uq $DATA_DIR/afw.zip -d $DATA_DIR/data/afw
        !cp ../dataset_prepare/fpenet/data.json $DATA_DIR/
        assert os.path.exists(f"{DATA_DIR}/data/afw")
        assert os.path.exists(f"{DATA_DIR}/data.json")
    elif model_name == "lprnet":
        !python3 -m pip install --upgrade pip
        !python3 -m pip install "opencv-python>=3.4.0.12,<=4.5.5.64"
        !bash ../dataset_prepare/lprnet/download_and_prepare_data.sh $DATA_DIR
        assert os.path.exists(f"{DATA_DIR}/train/image")
        assert os.path.exists(f"{DATA_DIR}/train/label")
        assert os.path.exists(f"{DATA_DIR}/val/image")
        assert os.path.exists(f"{DATA_DIR}/val/label")
    elif model_name == "ml_recog":
        assert os.path.exists(f"{DATA_DIR}/metric_learning_recognition/retail-product-checkout-dataset.zip")
        !unzip -uq $DATA_DIR/metric_learning_recognition/retail-product-checkout-dataset.zip -d $DATA_DIR/metric_learning_recognition
    elif model_name == "ocdnet":
        assert(os.path.exists(f"{DATA_DIR}/train/img"))
        assert(os.path.exists(f"{DATA_DIR}/train/gt"))
        assert(os.path.exists(f"{DATA_DIR}/test/img"))
        assert(os.path.exists(f"{DATA_DIR}/test/gt"))
    elif model_name == "ocrnet":
        # Verify the downloaded files are present before unpacking them
        assert os.path.exists(f"{DATA_DIR}/ch4_test_word_images_gt.zip")
        assert os.path.exists(f"{DATA_DIR}/Challenge4_Test_Task3_GT.txt")
        assert os.path.exists(f"{DATA_DIR}/ch4_training_word_images_gt.zip")
        !mkdir -p $DATA_DIR/train && rm -rf $DATA_DIR/train/*
        !mkdir -p $DATA_DIR/test && rm -rf $DATA_DIR/test/*
        !unzip -u $DATA_DIR/ch4_test_word_images_gt.zip -d $DATA_DIR/test
        !cp $DATA_DIR/Challenge4_Test_Task3_GT.txt $DATA_DIR/test
        !unzip -u $DATA_DIR/ch4_training_word_images_gt.zip -d $DATA_DIR/train
    elif model_name == "optical_inspection" or ds_format == "visual_changenet_classify":
        assert os.path.exists(f"{DATA_DIR}/train/images")
        assert os.path.exists(f"{DATA_DIR}/train/dataset.csv")
        assert os.path.exists(f"{DATA_DIR}/val/images")
        assert os.path.exists(f"{DATA_DIR}/val/dataset.csv")
        assert os.path.exists(f"{DATA_DIR}/test/images")
        assert os.path.exists(f"{DATA_DIR}/test/dataset.csv")
    elif model_name == "visual_changenet" and ds_format == "visual_changenet_segment":
        # Download the data (skipped if the archive is already present)
        URL_DATASET = "https://www.dropbox.com/s/18fb5jo0npu5evm/LEVIR-CD256.zip"
        os.environ["URL_DATASET"]=URL_DATASET
        !mkdir -p $DATA_DIR
        !if [ ! -f $DATA_DIR/LEVIR-CD-256.zip ]; then wget $URL_DATASET -O $DATA_DIR/LEVIR-CD-256.zip; else echo "image archive already downloaded"; fi
        # Check the dataset is present
        !if [ ! -f $DATA_DIR/LEVIR-CD-256.zip ]; then echo 'Dataset zip file not found, please download.'; else echo 'Found Dataset zip file.';fi
        # unpack 
        !unzip -u $DATA_DIR/LEVIR-CD-256.zip -d $DATA_DIR
        assert os.path.exists(f"{DATA_DIR}/LEVIR-CD256/A")
        assert os.path.exists(f"{DATA_DIR}/LEVIR-CD256/B")
        assert os.path.exists(f"{DATA_DIR}/LEVIR-CD256/label")
        assert os.path.exists(f"{DATA_DIR}/LEVIR-CD256/list/train.txt")
        assert os.path.exists(f"{DATA_DIR}/LEVIR-CD256/list/val.txt")
        assert os.path.exists(f"{DATA_DIR}/LEVIR-CD256/list/test.txt")
    elif model_name == "pointpillars":
        !unzip -u $DATA_DIR/data_object_image_2.zip -d $DATA_DIR
        !unzip -u $DATA_DIR/data_object_label_2.zip -d $DATA_DIR
        !unzip -u $DATA_DIR/data_object_velodyne.zip -d $DATA_DIR
        !unzip -u $DATA_DIR/data_object_calib.zip -d $DATA_DIR
        assert os.path.exists(f"{DATA_DIR}/training/image_2")
        assert os.path.exists(f"{DATA_DIR}/training/label_2")
        assert os.path.exists(f"{DATA_DIR}/training/velodyne")
        assert os.path.exists(f"{DATA_DIR}/training/calib")
    elif model_name == "pose_classification":
        !pip3 install -U gdown
        if model_type == "kinetics":
            !gdown https://drive.google.com/uc?id=1dmzCRQsFXJ18BlXj1G9sbDnsclXIdDdR -O $DATA_DIR/st-gcn-processed-data.zip
            !unzip $DATA_DIR/st-gcn-processed-data.zip -d $DATA_DIR
            !mv $DATA_DIR/data/Kinetics/kinetics-skeleton $DATA_DIR/kinetics
            !rm -r $DATA_DIR/data
            !rm $DATA_DIR/st-gcn-processed-data.zip
            assert os.path.exists(f"{DATA_DIR}/kinetics")
        elif model_type == "nvidia":
            !gdown https://drive.google.com/uc?id=1GhSt53-7MlFfauEZ2YkuzOaZVNIGo_c- -O $DATA_DIR/data_3dbp_nvidia.zip
            !mkdir -p $DATA_DIR/nvidia
            !unzip $DATA_DIR/data_3dbp_nvidia.zip -d $DATA_DIR/nvidia
            !rm $DATA_DIR/data_3dbp_nvidia.zip
            assert os.path.exists(f"{DATA_DIR}/nvidia")
            assert os.path.exists(f"{DATA_DIR}/{model_type}/train_data.npy") and os.path.exists(f"{DATA_DIR}/{model_type}/train_label.pkl") and os.path.exists(f"{DATA_DIR}/{model_type}/val_data.npy") and os.path.exists(f"{DATA_DIR}/{model_type}/val_label.pkl")
    elif model_name == "re_identification":
        !pip3 install -U gdown
        !gdown https://drive.google.com/uc?id=0B8-rUzbwVRk0c054eEozWG9COHM -O $DATA_DIR/market1501.zip
        !unzip -u $DATA_DIR/market1501.zip -d $DATA_DIR
        !rm -rf $DATA_DIR/market1501
        !mv $DATA_DIR/Market-1501-v15.09.15 $DATA_DIR/market1501
        !rm $DATA_DIR/market1501.zip
        assert os.path.exists(f"{DATA_DIR}/market1501")

In [None]:
if model_name in ("lprnet","ocdnet","ocrnet", "optical_inspection") or ds_format == "visual_changenet_classify":
    eval_dataset_path = f"{DATA_DIR}/purpose_built_models_val.tar.gz"
if model_name in ("lprnet", "optical_inspection") or ds_format == "visual_changenet_classify":
    test_dataset_path = f"{DATA_DIR}/purpose_built_models_test.tar.gz"
train_dataset_path = f"{DATA_DIR}/purpose_built_models_train.tar.gz"

In [None]:
# Create train dataset
data = json.dumps({"type":ds_type,"format":ds_format})
endpoint = f"{base_url}/datasets"
response = requests.post(endpoint,data=data,headers=headers)
assert response.status_code in (200, 201)

print(response)
print(response.json())
assert "id" in response.json().keys()
train_dataset_id = response.json()["id"]

In [None]:
if dataset_to_be_used == "default":
    USER_EXPERIMENT_DIR = os.path.join("/shared/users",user_id,"datasets",train_dataset_id)
    if model_name == "action_recognition":
        !python3 -m pip install opencv-python numpy
        # For rgb action recognition
        !if [ -d tao_toolkit_recipes ]; then rm -rf tao_toolkit_recipes; fi
        !git clone https://github.com/NVIDIA-AI-IOT/tao_toolkit_recipes
        assert os.path.exists("tao_toolkit_recipes")
        !cd tao_toolkit_recipes/tao_action_recognition/data_generation/ && bash ./preprocess_HMDB_RGB.sh $DATA_DIR/raw_data $DATA_DIR/processed_data 

        # For optical flow, comment out the 3 lines above and uncomment the lines below (Note: generating optical flow requires a Turing, Ampere, or newer GPU.)
        #!echo <passwd> | sudo -S apt install -y libfreeimage-dev
        #!cp ../dataset_prepare/action_recognition/AppOFCuda tao_toolkit_recipes/tao_action_recognition/data_generation/
        #!cd tao_toolkit_recipes/tao_action_recognition/data_generation/ && bash ./preprocess_HMDB.sh $DATA_DIR/raw_data $DATA_DIR/processed_data

        # download the split files and unrar
        !wget -P $DATA_DIR --no-check-certificate http://serre-lab.clps.brown.edu/wp-content/uploads/2013/10/test_train_splits.rar
        assert os.path.exists(f"{DATA_DIR}/test_train_splits.rar")
        !mkdir -p $DATA_DIR/splits && unrar x -o+ $DATA_DIR/test_train_splits.rar $DATA_DIR/splits
        assert os.path.exists(f"{DATA_DIR}/splits")
        # run split_HMDB to generate training split
        !if [ -d $DATA_DIR/train ]; then rm -rf $DATA_DIR/train $DATA_DIR/test; fi
        !cd tao_toolkit_recipes/tao_action_recognition/data_generation/ && python3 ./split_dataset.py $DATA_DIR/processed_data $DATA_DIR/splits/testTrainMulti_7030_splits $DATA_DIR/train  $DATA_DIR/test
        assert os.path.exists(f'{DATA_DIR}/train')
        assert os.path.exists(f'{DATA_DIR}/test')

        if os.path.exists("tao_toolkit_recipes"):
            shutil.rmtree("tao_toolkit_recipes")

        assert not os.path.exists("tao_toolkit_recipes")

    elif model_name == "fpenet":
        !pip3 install numpy opencv-python
        if model_type == "80":
            output_json_path = os.path.join(os.environ['DATA_DIR'], 'data/afw/afw.json')
        elif model_type == "10":
            output_json_path = os.path.join(os.environ['DATA_DIR'], 'data/afw_10/afw_10.json')
        !python3 ../dataset_prepare/fpenet/data_utils.py --afw_data_path $DATA_DIR/data/afw --output_json_path $output_json_path --afw_image_save_path $DATA_DIR/data/afw --num_key_points $model_type --container_root_path $USER_EXPERIMENT_DIR
        assert os.path.exists(output_json_path)
        with open(output_json_path, encoding='utf-8') as afw_json_file:
            afw_json = json.load(afw_json_file)
            assert afw_json

    elif model_name == "lprnet":
        character_file_link = "https://api.ngc.nvidia.com/v2/models/nvidia/tao/lprnet/versions/trainable_v1.0/files/{}_lp_characters.txt".format(model_type)
        !wget -q -O $DATA_DIR/train/characters.txt $character_file_link
        !cp $DATA_DIR/train/characters.txt $DATA_DIR/val/characters.txt
        assert os.path.exists(f"{DATA_DIR}/train/characters.txt")
        assert os.path.exists(f"{DATA_DIR}/val/characters.txt")

    elif model_name == "ml_recog":
        # crops images from detection set and form a classification set
        # splits to reference/train/val/test set
        !sudo apt-get update && sudo apt-get install gcc -y
        !python3 -m pip install opencv-python numpy pycocotools tqdm
        !python3 ../dataset_prepare/metric_learning_recognition/process_retail_product_checkout_dataset.py
        assert os.path.exists(f"{DATA_DIR}/metric_learning_recognition/retail-product-checkout-dataset_classification_demo")
        assert os.path.exists(f"{DATA_DIR}/metric_learning_recognition/retail-product-checkout-dataset_classification_demo/known_classes")
        assert os.path.exists(f"{DATA_DIR}/metric_learning_recognition/retail-product-checkout-dataset_classification_demo/unknown_classes")
        assert os.path.exists(f"{DATA_DIR}/metric_learning_recognition/retail-product-checkout-dataset_classification_demo/known_classes/train")
        assert os.path.exists(f"{DATA_DIR}/metric_learning_recognition/retail-product-checkout-dataset_classification_demo/unknown_classes/train")
        assert os.path.exists(f"{DATA_DIR}/metric_learning_recognition/retail-product-checkout-dataset_classification_demo/known_classes/test")
        assert os.path.exists(f"{DATA_DIR}/metric_learning_recognition/retail-product-checkout-dataset_classification_demo/unknown_classes/test")
        assert os.path.exists(f"{DATA_DIR}/metric_learning_recognition/retail-product-checkout-dataset_classification_demo/known_classes/val")
        assert os.path.exists(f"{DATA_DIR}/metric_learning_recognition/retail-product-checkout-dataset_classification_demo/unknown_classes/val")
        assert os.path.exists(f"{DATA_DIR}/metric_learning_recognition/retail-product-checkout-dataset_classification_demo/known_classes/reference")
        assert os.path.exists(f"{DATA_DIR}/metric_learning_recognition/retail-product-checkout-dataset_classification_demo/unknown_classes/reference")

    elif model_name == "ocrnet":
        orig_train_gt_file=os.path.join(os.getenv("DATA_DIR"), "train", "gt.txt")
        processed_train_gt_file=os.path.join(os.getenv("DATA_DIR"), "train", "gt_new.txt")
        orig_test_gt_file=os.path.join(os.getenv("DATA_DIR"), "test", "Challenge4_Test_Task3_GT.txt")
        processed_test_gt_file=os.path.join(os.getenv("DATA_DIR"), "test", "gt_new.txt")
        !python3 ../dataset_prepare/ocrnet/preprocess_label.py $orig_train_gt_file $processed_train_gt_file
        !python3 ../dataset_prepare/ocrnet/preprocess_label.py $orig_test_gt_file $processed_test_gt_file

    elif model_name == "pointpillars":
        !python3 -m pip install scikit-image numpy
        !mkdir -p $DATA_DIR/train/lidar $DATA_DIR/train/label $DATA_DIR/val/lidar $DATA_DIR/val/label

        # Convert labels from Camera coordinate system to LIDAR coordinate system, etc
        !python3 ../dataset_prepare/pointpillars/gen_lidar_points.py -p $DATA_DIR/training/velodyne \
                                               -c $DATA_DIR/training/calib    \
                                               -i $DATA_DIR/training/image_2  \
                                               -o $DATA_DIR/train/lidar
        assert os.listdir(f"{DATA_DIR}/train/lidar")
        # Generate the labels
        !python3 ../dataset_prepare/pointpillars/gen_lidar_labels.py -l $DATA_DIR/training/label_2 \
                                               -c $DATA_DIR/training/calib \
                                               -o $DATA_DIR/train/label
        # Drop DontCare class
        !python3 ../dataset_prepare/pointpillars/drop_class.py $DATA_DIR/train/label DontCare
        assert os.listdir(f"{DATA_DIR}/train/label")
        # train/val split: change the val set ids if you need a different set of validation images
        !python3 ../dataset_prepare/pointpillars/kitti_split.py ../dataset_prepare/pointpillars/val.txt \
                                          $DATA_DIR/train/lidar \
                                          $DATA_DIR/train/label \
                                          $DATA_DIR/val/lidar \
                                          $DATA_DIR/val/label
        assert os.listdir(f"{DATA_DIR}/val/label")
        assert os.listdir(f"{DATA_DIR}/val/lidar")

    elif model_name == "pose_classification" and model_type == "kinetics":
        !pip3 install numpy
        # select actions
        !python3 ../dataset_prepare/pose_classification/select_subset_actions.py
        assert os.path.exists(f"{DATA_DIR}/{model_type}/train_data.npy") and os.path.exists(f"{DATA_DIR}/{model_type}/train_label.pkl") and os.path.exists(f"{DATA_DIR}/{model_type}/val_data.npy") and os.path.exists(f"{DATA_DIR}/{model_type}/val_label.pkl")

    elif model_name == "re_identification":
        #100 is the number of samples to be present in the subset data - you can choose any number <= total samples in the dataset
        !python3 ../dataset_prepare/re_identification/obtain_subset_data.py 100
        assert os.path.exists(f"{DATA_DIR}/market1501/sample_train")
        assert os.path.exists(f"{DATA_DIR}/market1501/sample_test")
        assert os.path.exists(f"{DATA_DIR}/market1501/sample_query")

    elif model_name == "centerpose":
        # Select the training categories from: bike, book, bottle, camera, cereal_box, chair, laptop, shoe
        # Please set the "n" to -1 if you want to run the whole dataset training.
        testing_categories = 'bike'
        !pip3 install numpy opencv-python tqdm scipy==1.9.2 tensorflow==2.14.0
        !python3 ../dataset_prepare/centerpose/prepare_centerpose_dataset.py \
                                            -c $testing_categories \
                                            -n 100
        assert os.path.exists(f"{DATA_DIR}/{testing_categories}/train")
        assert os.path.exists(f"{DATA_DIR}/{testing_categories}/test")
        assert os.path.exists(f"{DATA_DIR}/{testing_categories}/val")

In [None]:
# Update
docker_env_vars = {} # Update any variables to be included while triggering Docker run-time, like MLOps variables
dataset_information = {"name":"Train dataset",
                       "description":"My train dataset",
                       "docker_env_vars": docker_env_vars}
data = json.dumps(dataset_information)

endpoint = f"{base_url}/datasets/{train_dataset_id}"

response = requests.patch(endpoint, data=data, headers=headers)
assert response.status_code in (200, 201)

print(response)
print(response.json())

### Tar the datasets <a class="anchor" id="head-1.3"></a>
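
The cell in this section relies on the `tar` command line tool. Where a shell is not available, a roughly equivalent archive can be built with Python's standard `tarfile` module. This is a minimal sketch under that assumption; `make_tar_gz` is a hypothetical helper, not something the notebook defines.

```python
import os
import tarfile
import tempfile


def make_tar_gz(base_dir, output_path, members):
    """Pack `members` (paths relative to base_dir) into a gzip-compressed tar."""
    with tarfile.open(output_path, "w:gz") as archive:
        for member in members:
            # arcname keeps the stored paths relative, like `tar -C base_dir`
            archive.add(os.path.join(base_dir, member), arcname=member)
    return output_path


# Demo: archive a small directory tree and list what was stored
with tempfile.TemporaryDirectory() as demo_dir:
    os.makedirs(os.path.join(demo_dir, "train"))
    with open(os.path.join(demo_dir, "train", "sample.txt"), "w") as handle:
        handle.write("hello")
    archive_path = make_tar_gz(demo_dir, os.path.join(demo_dir, "demo.tar.gz"), ["train"])
    with tarfile.open(archive_path, "r:gz") as archive:
        names = sorted(archive.getnames())
    print(names)
```

Directories are added recursively, so passing `["train"]` stores both the directory entry and the files beneath it.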

In [None]:
if model_name == "action_recognition":
    !tar -C $DATA_DIR -czf $DATA_DIR/purpose_built_models_train.tar.gz train test
elif model_name == "bpnet":
    !tar -C $DATA_DIR -czf $DATA_DIR/purpose_built_models_train.tar.gz train2017 val2017 annotations bpnet_18joints.json  coco_spec.json  infer_spec.yaml
elif model_name == "fpenet":
    !tar -C $DATA_DIR -czf $DATA_DIR/purpose_built_models_train.tar.gz data data.json
elif model_name == "lprnet":
    !tar -C $DATA_DIR/train/ -czf $DATA_DIR/purpose_built_models_train.tar.gz image label characters.txt
    !tar -C $DATA_DIR/val/ -czf $DATA_DIR/purpose_built_models_val.tar.gz image label characters.txt
    !tar -C $DATA_DIR/val/ -czf $DATA_DIR/purpose_built_models_test.tar.gz image
elif model_name == "ml_recog":
    !tar -C $DATA_DIR -czf $DATA_DIR/purpose_built_models_train.tar.gz metric_learning_recognition/retail-product-checkout-dataset_classification_demo/
elif model_name == "ocdnet":
    !tar -C $DATA_DIR -czf $DATA_DIR/purpose_built_models_train.tar.gz train
    !tar -C $DATA_DIR -czf $DATA_DIR/purpose_built_models_val.tar.gz test
elif model_name == "ocrnet":
    !tar -C $DATA_DIR -czf $DATA_DIR/purpose_built_models_train.tar.gz train character_list
    !tar -C $DATA_DIR -czf $DATA_DIR/purpose_built_models_val.tar.gz test character_list
elif model_name == "optical_inspection" or ds_format == "visual_changenet_classify":
    !tar -C $DATA_DIR/train -czf $DATA_DIR/purpose_built_models_train.tar.gz images dataset.csv
    !tar -C $DATA_DIR/val -czf $DATA_DIR/purpose_built_models_val.tar.gz images dataset.csv
    !tar -C $DATA_DIR/test -czf $DATA_DIR/purpose_built_models_test.tar.gz images dataset.csv
elif model_name == "visual_changenet" and ds_format == "visual_changenet_segment":
    !tar -C $DATA_DIR/LEVIR-CD256 -czf $DATA_DIR/purpose_built_models_train.tar.gz A B list label
elif model_name == "pointpillars":
    !tar -C $DATA_DIR -czf $DATA_DIR/purpose_built_models_train.tar.gz train val
elif model_name == "pose_classification":
    !tar -C $DATA_DIR -czf $DATA_DIR/purpose_built_models_train.tar.gz $model_type
elif model_name == "re_identification":
    !tar -C $DATA_DIR/market1501 -czf $DATA_DIR/purpose_built_models_train.tar.gz sample_train sample_test sample_query
elif model_name == 'centerpose':
    !tar -C $DATA_DIR/{testing_categories} -czf $DATA_DIR/purpose_built_models_train.tar.gz train val test

### Upload train dataset <a class="anchor" id="head-1.3"></a>

In [None]:
# Upload
output_dir = os.path.join(os.path.dirname(os.path.abspath(train_dataset_path)), model_name, "train")
split_tar_file(train_dataset_path, output_dir)
for idx, tar_dataset_path in enumerate(os.listdir(output_dir)):
    print(f"Uploading {idx+1}/{len(os.listdir(output_dir))} tar split")
    files = [("file",open(os.path.join(output_dir, tar_dataset_path),"rb"))]

    endpoint = f"{base_url}/datasets/{train_dataset_id}:upload"

    response = requests.post(endpoint, files=files, headers=headers)
    assert response.status_code in (200, 201)
    assert "message" in response.json().keys() and response.json()["message"] == "Server recieved file and upload process started"

    print(response)
    print(response.json())
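
The upload loop can also be factored into a helper that closes each file handle as soon as its request returns. The following is a minimal sketch, not part of the TAO API: `upload_splits` and its `post` parameter are hypothetical names, and `post` stands in for whatever call actually sends one split and reports its HTTP status code.

```python
import os
from typing import Callable, List


def upload_splits(split_dir: str, post: Callable[[str, object], int]) -> List[str]:
    """Upload every file in split_dir in sorted order, closing each handle after use.

    `post(name, fileobj)` is a hypothetical callable that sends one split and
    returns the HTTP status code of the request.
    """
    uploaded = []
    names = sorted(os.listdir(split_dir))
    for idx, name in enumerate(names, start=1):
        print(f"Uploading {idx}/{len(names)} tar split")
        # The context manager closes the handle even if post() raises
        with open(os.path.join(split_dir, name), "rb") as fh:
            status = post(name, fh)
        if status not in (200, 201):
            raise RuntimeError(f"Upload of {name} failed with status {status}")
        uploaded.append(name)
    return uploaded
```

In this notebook, `post` could be `lambda name, fh: requests.post(endpoint, files=[("file", fh)], headers=headers).status_code`, assuming `endpoint` and `headers` are defined as in the cell above.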

### Create and upload val dataset <a class="anchor" id="head-1.3"></a>

In [None]:
# Create eval dataset
if model_name in ("lprnet", "ocdnet", "ocrnet", "optical_inspection") or ds_format == "visual_changenet_classify":
    data = json.dumps({"type":ds_type,"format":ds_format})

    endpoint = f"{base_url}/datasets"

    response = requests.post(endpoint,data=data,headers=headers)
    assert response.status_code in (200, 201)

    print(response)
    print(response.json())
    assert "id" in response.json().keys()
    eval_dataset_id = response.json()["id"]

In [None]:
# Update
if model_name in ("lprnet", "ocdnet", "ocrnet", "optical_inspection") or ds_format == "visual_changenet_classify":
    docker_env_vars = {} # Update any variables to be included while triggering Docker run-time, such as MLOps variables
    dataset_information = {"name":"Eval dataset",
                           "description":"My eval dataset with OpenALPR",
                           "docker_env_vars": docker_env_vars}
    data = json.dumps(dataset_information)

    endpoint = f"{base_url}/datasets/{eval_dataset_id}"

    response = requests.patch(endpoint, data=data, headers=headers)
    assert response.status_code in (200, 201)

    print(response)
    print(response.json())

In [None]:
# Upload
if model_name in ("lprnet", "ocdnet", "ocrnet", "optical_inspection") or ds_format == "visual_changenet_classify":
    output_dir = os.path.join(os.path.dirname(os.path.abspath(eval_dataset_path)), model_name, "eval")
    split_tar_file(eval_dataset_path, output_dir)
    for idx, tar_dataset_path in enumerate(os.listdir(output_dir)):
        print(f"Uploading {idx+1}/{len(os.listdir(output_dir))} tar split")
        files = [("file",open(os.path.join(output_dir, tar_dataset_path),"rb"))]

        endpoint = f"{base_url}/datasets/{eval_dataset_id}:upload"

        response = requests.post(endpoint, files=files, headers=headers)
        assert response.status_code in (200, 201)
        assert "message" in response.json().keys() and response.json()["message"] == "Server recieved file and upload process started"

        print(response)
        print(response.json())

### Create and upload test dataset <a class="anchor" id="head-1.4"></a>

In [None]:
# Create testing dataset for inference
if model_name in ("lprnet", "optical_inspection") or ds_format == "visual_changenet_classify":
    if model_name == "lprnet":
        ds_type = "character_recognition"
        ds_format = "raw"
    elif ds_format == "visual_changenet_classify": 
        ds_type = "visual_changenet"
        ds_format = "visual_changenet_classify"
    else:
        ds_type = model_name
        ds_format = "default"

    data = json.dumps({"type":ds_type,"format":ds_format})

    endpoint = f"{base_url}/datasets"

    response = requests.post(endpoint,data=data, headers=headers)
    assert response.status_code in (200, 201)

    print(response)
    print(response.json())
    assert "id" in response.json().keys()
    test_dataset_id = response.json()["id"]

In [None]:
# Upload
if model_name in ("lprnet", "optical_inspection") or ds_format == "visual_changenet_classify":
    output_dir = os.path.join(os.path.dirname(os.path.abspath(test_dataset_path)), model_name, "test")
    split_tar_file(test_dataset_path, output_dir)
    for idx, tar_dataset_path in enumerate(os.listdir(output_dir)):
        print(f"Uploading {idx+1}/{len(os.listdir(output_dir))} tar split")
        files = [("file",open(os.path.join(output_dir, tar_dataset_path),"rb"))]

        endpoint = f"{base_url}/datasets/{test_dataset_id}:upload"

        response = requests.post(endpoint, files=files, headers=headers)
        assert response.status_code in (200, 201)
        assert "message" in response.json().keys() and response.json()["message"] == "Server recieved file and upload process started"

        print(response)
        print(response.json())

### List the created datasets <a class="anchor" id="head-2"></a>

In [None]:
endpoint = f"{base_url}/datasets"

response = requests.get(endpoint, headers=headers)
assert response.status_code in (200, 201)

print(response)
# print(response.json()) ## Uncomment for verbose list output
print("id\t\t\t\t\t type\t\t\t format\t\t name")
for rsp in response.json():
    rsp_keys = rsp.keys()
    assert "id" in rsp_keys
    assert "type" in rsp_keys
    assert "format" in rsp_keys
    assert "name" in rsp_keys
    print(rsp["id"],"\t",rsp["type"],"\t",rsp["format"],"\t\t",rsp["name"])
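
Tab-separated output drifts out of alignment when ids or type names vary in length. The sketch below shows one way to print the same listing with padded columns. `format_table` is a hypothetical helper, and the sample records are illustrative only; the real ones come from `response.json()`.

```python
from typing import Dict, List, Sequence


def format_table(rows: List[Dict[str, str]], columns: Sequence[str]) -> str:
    """Render rows as left-aligned columns, each sized to its widest value."""
    widths = {
        c: max([len(c)] + [len(str(r.get(c, ""))) for r in rows]) for c in columns
    }
    lines = ["  ".join(c.ljust(widths[c]) for c in columns)]
    for r in rows:
        lines.append("  ".join(str(r.get(c, "")).ljust(widths[c]) for c in columns))
    return "\n".join(lines)


# Illustrative records only; replace with response.json() in the notebook
sample = [
    {"id": "1234", "type": "ocrnet", "format": "default", "name": "Train dataset"},
    {"id": "56", "type": "character_recognition", "format": "raw", "name": "Eval dataset"},
]
print(format_table(sample, ("id", "type", "format", "name")))
```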

### Train Dataset convert Action <a class="anchor" id="head-3"></a>

In [None]:
convert_action = "dataset_convert"

In [None]:
if model_name in ("bpnet", "fpenet", "ocrnet", "pointpillars"):
    # Get default spec schema
    endpoint = f"{base_url}/datasets/{train_dataset_id}/specs/{convert_action}/schema"

    response = requests.get(endpoint, headers=headers)
    assert response.status_code in (200, 201)

    print(response)
    # print(response.json()) ## Uncomment for verbose schema

    assert "default" in response.json().keys()
    train_ds_convert_specs = response.json()["default"]

    print(json.dumps(train_ds_convert_specs, sort_keys=True, indent=4))

In [None]:
# Apply changes to specs dictionary if necessary
if model_name in ("bpnet", "fpenet", "ocrnet", "pointpillars"):
    if model_name == "bpnet":
        train_ds_convert_specs["mode"] = "train"
    elif model_name == "fpenet":
        train_ds_convert_specs["num_keypoints"] = int(model_type)
    print(json.dumps(train_ds_convert_specs, sort_keys=True, indent=4))

In [None]:
# Run action
if model_name in ("bpnet", "fpenet", "ocrnet", "pointpillars"):
    parent = None
    action = convert_action
    data = json.dumps({"parent_job_id":parent,"action":action, "specs":train_ds_convert_specs})

    endpoint = f"{base_url}/datasets/{train_dataset_id}/jobs"

    response = requests.post(endpoint, data=data, headers=headers)
    assert response.status_code in (200, 201)
    assert response.json()
    print(response)
    print(response.json())

    train_ds_convert_id = response.json()
    job_map["train_dataset_convert_"+model_name] = train_ds_convert_id

In [None]:
# Monitor job status by repeatedly running this cell
if model_name in ("bpnet", "fpenet", "ocrnet", "pointpillars"):
    job_id = train_ds_convert_id
    endpoint = f"{base_url}/datasets/{train_dataset_id}/jobs/{job_id}"

    while True:
        clear_output(wait=True) 
        response = requests.get(endpoint, headers=headers)
        assert response.status_code in (200, 201)
        print(response)
        print(response.json())
        assert "status" in response.json().keys() and response.json().get("status") != "Error"
        if response.json().get("status") in ["Done","Error", "Canceled"] or response.status_code not in (200,201):
            break
        time.sleep(15)

### Eval Dataset convert Action <a class="anchor" id="head-3.1"></a>

In [None]:
if model_name in ("bpnet", "ocrnet"):
    # Get default spec schema
    if model_name == "bpnet":
        endpoint = f"{base_url}/datasets/{train_dataset_id}/specs/{convert_action}/schema"
    else:
        endpoint = f"{base_url}/datasets/{eval_dataset_id}/specs/{convert_action}/schema"

    response = requests.get(endpoint, headers=headers)
    assert response.status_code in (200, 201)

    print(response)
    # print(response.json()) ## Uncomment for verbose schema

    assert "default" in response.json().keys()
    eval_ds_convert_specs = response.json()["default"]

    print(json.dumps(eval_ds_convert_specs, sort_keys=True, indent=4))

In [None]:
# Apply changes to specs dictionary if necessary
if model_name in ("bpnet", "ocrnet"):
    if model_name == "bpnet":
        eval_ds_convert_specs["mode"] = "test"
    print(json.dumps(eval_ds_convert_specs, sort_keys=True, indent=4))

In [None]:
# Run action
if model_name in ("bpnet", "ocrnet"):
    parent = job_map["train_dataset_convert_"+model_name]
    action = convert_action
    data = json.dumps({"parent_job_id":parent,"action":action,"specs":eval_ds_convert_specs})

    if model_name == "bpnet":
        endpoint = f"{base_url}/datasets/{train_dataset_id}/jobs"
    else:
        endpoint = f"{base_url}/datasets/{eval_dataset_id}/jobs"
    
    response = requests.post(endpoint, data=data, headers=headers)
    assert response.status_code in (200, 201)
    assert response.json()

    print(response)
    print(response.json())

    eval_ds_convert_id = response.json()
    job_map["eval_dataset_convert_"+model_name] = eval_ds_convert_id

In [None]:
# Monitor job status by repeatedly running this cell
if model_name in ("bpnet", "ocrnet"):
    job_id = eval_ds_convert_id
    if model_name == "bpnet":
        endpoint = f"{base_url}/datasets/{train_dataset_id}/jobs/{job_id}"
    else:
        endpoint = f"{base_url}/datasets/{eval_dataset_id}/jobs/{job_id}"

    while True:
        clear_output(wait=True) 
        response = requests.get(endpoint, headers=headers)
        assert response.status_code in (200, 201)
        print(response)
        print(response.json())
        assert "status" in response.json().keys() and response.json().get("status") != "Error"
        if response.json().get("status") in ["Done","Error", "Canceled"] or response.status_code not in (200,201):
            break
        time.sleep(15)

### Create an experiment <a class="anchor" id="head-4"></a>

In [None]:
if model_name in ("action_recognition", "pose_classification", "ml_recog", "ocrnet", "ocdnet", "optical_inspection", "re_identification"):
    encode_key = "nvidia_tao"
elif model_name == "pointpillars":
    encode_key = "tlt_encode"
else:
    encode_key = "nvidia_tlt"

checkpoint_choose_method = "best_model"
data = json.dumps({"network_arch":model_name,"encryption_key":encode_key,"checkpoint_choose_method":checkpoint_choose_method})

endpoint = f"{base_url}/experiments"
response = requests.post(endpoint,data=data,headers=headers)
assert response.status_code in (200, 201)

print(response)
print(response.json())
assert "id" in response.json().keys()
experiment_id = response.json()["id"]

### List experiments <a class="anchor" id="head-5"></a>

In [None]:
endpoint = f"{base_url}/experiments"
params = {"network_arch": model_name}
response = requests.get(endpoint, params=params, headers=headers)
assert response.status_code in (200, 201)

print(response)
# print(response.json()) ## Uncomment for verbose list output
print("name\t\t model id\t\t\t     network architecture")
for rsp in response.json():
    rsp_keys = rsp.keys()
    assert "name" in rsp_keys and "id" in rsp_keys and "network_arch" in rsp_keys
    print(rsp["name"], rsp["id"], rsp["network_arch"])

### Assign train, eval datasets <a class="anchor" id="head-6"></a>

In [None]:
docker_env_vars = {} # Update any variables to be included while triggering Docker run-time, such as MLOps variables
dataset_information = {}
dataset_information["train_datasets"] = [train_dataset_id]
if model_name in ("bpnet","fpenet","lprnet","ml_recog","ocdnet","ocrnet"):
    dataset_information["calibration_dataset"] = train_dataset_id
if model_name in ("lprnet", "ocdnet", "ocrnet", "optical_inspection"):
    dataset_information["eval_dataset"] = eval_dataset_id
if model_name in ("lprnet", "optical_inspection"):
    dataset_information["inference_dataset"] = test_dataset_id
if model_name == "centerpose":
    dataset_information["eval_dataset"] = train_dataset_id
    dataset_information["inference_dataset"] = train_dataset_id
if model_name == "visual_changenet" and ds_format == "visual_changenet_classify":
    dataset_information["eval_dataset"] = eval_dataset_id
    dataset_information["inference_dataset"] = test_dataset_id

dataset_information["docker_env_vars"] = docker_env_vars

data = json.dumps(dataset_information)

endpoint = f"{base_url}/experiments/{experiment_id}"

response = requests.patch(endpoint, data=data, headers=headers)
assert response.status_code in (200, 201)

print(response)
print(response.json())

### Assign PTM <a class="anchor" id="head-7"></a>

Search NGC for a pretrained model (PTM) that matches the chosen purpose-built model.

In [None]:
# List all pretrained models for the chosen network architecture
endpoint = f"{base_url}/experiments"
params = {"network_arch": model_name}
response = requests.get(endpoint, params=params, headers=headers)
assert response.status_code in (200, 201)

response_json = response.json()

# Search for ptm with given ngc path
for rsp in response_json:
    rsp_keys = rsp.keys()
    if "encryption_key" not in rsp.keys():
        assert "name" in rsp_keys and "version" in rsp_keys and "ngc_path" in rsp_keys and "additional_id_info" in rsp_keys
        print(f'PTM Name: {rsp["name"]}; PTM version: {rsp["version"]}; NGC PATH: {rsp["ngc_path"]}; Additional info: {rsp["additional_id_info"]}')

In [None]:
# Map each purpose-built model to its default pretrained model (PTM) version.
# To use a different PTM backbone, pick one from the output of the previous cell and update this map.
# Changing the default backbone also requires updating the default spec/config for train, eval, etc.
# For example, if you change the PTM to resnet34, manually set the config key num_layers (if it exists) to 34.
visual_changenet_ptm = "visual_changenet_segmentation_levircd:visual_changenet_levircd_trainable_v1.0" # For segmentation
if model_name == 'visual_changenet' and ds_format == 'visual_changenet_classify':
    visual_changenet_ptm = "visual_changenet_classification:visual_changenet_nvpcb_trainable_v1.0"
pretrained_map = {"action_recognition":"actionrecognitionnet:trainable_v1.0",
                  "bpnet" : "bodyposenet:trainable_v1.0",
                  "fpenet" : "fpenet:trainable_v1.0",
                  "lprnet": "lprnet:trainable_v1.0",
                  "ml_recog": "retail_object_recognition:trainable_v1.0",
                  "ocdnet": "ocdnet:trainable_resnet18_v1.0",
                  "ocrnet": "ocrnet:trainable_v1.0",
                  "optical_inspection": "optical_inspection:trainable_v1.0",
                  "pointpillars":"pointpillarnet:trainable_v1.0",
                  "pose_classification":"poseclassificationnet:trainable_v1.0",
                  "re_identification":"reidentificationnet:trainable_v1.1",
                  "visual_changenet":visual_changenet_ptm,
                  "centerpose": "pretrained_fan_classification_nvimagenet:fan_small_hybrid_nvimagenet"}
if model_name == "action_recognition":
    if model_type == "of":
        pretrained_map["action_recognition"] = "actionrecognitionnet:trainable_v2.0"
    elif model_type == "joint":
        pretrained_map["action_recognition"] = "actionrecognitionnet:trainable_v1.0,actionrecognitionnet:trainable_v2.0"
        
no_ptm_models = set()

In [None]:
if model_name not in no_ptm_models:
    # Get pretrained model
    endpoint = f"{base_url}/experiments"
    params = {"network_arch": model_name}
    response = requests.get(endpoint, params=params, headers=headers)
    assert response.status_code in (200, 201)

    response_json = response.json()
    ptm_model_names = pretrained_map[model_name].split(",")
    ptm = []

    # Search for ptm with given ngc path
    for ptm_model_name in ptm_model_names:
        ptm_id = None
        for rsp in response_json:
            rsp_keys = rsp.keys()
            assert "ngc_path" in rsp_keys
            if rsp["ngc_path"].endswith(ptm_model_name):
                # Check that the key exists before reading it; an empty or null value means no extra info
                assert "additional_id_info" in rsp_keys
                additional_id_info = []
                if rsp["additional_id_info"]:
                    additional_id_info = rsp["additional_id_info"].split(",")
                if (len(additional_id_info) == 0) or \
                    (model_name == "lprnet" and len(additional_id_info) == 1 and additional_id_info[0] == model_type) or \
                    (model_name == "action_recognition" and len(additional_id_info) == 1 and additional_id_info[0] == model_input_type) or \
                    (model_name == "action_recognition" and len(additional_id_info) == 2 and additional_id_info[0] == platform and additional_id_info[1] == model_input_type):
                    assert "id" in rsp_keys
                    ptm_id = rsp["id"]
                    print("Metadata for model with requested NGC Path")
                    print(rsp)
                    break
        ptm.append(ptm_id)
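
The search above can be read as a lookup function: find the first record whose `ngc_path` ends with the requested name and whose `additional_id_info` is either empty or matches the expected values. The sketch below is a simplified version of that idea, not the notebook's exact rule set. It compares the whole `additional_id_info` list against one expected list instead of applying the per-model conditions. `find_ptm_id` and `wanted_info` are hypothetical names.

```python
from typing import Dict, Iterable, List, Optional


def find_ptm_id(
    records: Iterable[Dict],
    ngc_suffix: str,
    wanted_info: Optional[List[str]] = None,
) -> Optional[str]:
    """Return the id of the first record whose ngc_path ends with ngc_suffix.

    A record with a non-empty `additional_id_info` (comma-separated string)
    matches only when its values equal wanted_info; a record without it
    always matches. Returns None when nothing matches.
    """
    for rec in records:
        if not rec.get("ngc_path", "").endswith(ngc_suffix):
            continue
        raw = rec.get("additional_id_info") or ""
        info = raw.split(",") if raw else []
        if not info or info == (wanted_info or []):
            return rec.get("id")
    return None
```

A `None` result means no PTM matched, which is worth checking before the id is patched into the experiment.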

In [None]:
if model_name not in no_ptm_models:
    ptm_information = {"base_experiment":ptm}
    data = json.dumps(ptm_information)
    endpoint = f"{base_url}/experiments/{experiment_id}"

    response = requests.patch(endpoint, data=data, headers=headers)
    assert response.status_code in (200, 201)

    print(response)
    print(response.json())

### View hyperparameters that are enabled for AutoML by default <a class="anchor" id="head-8"></a>

In [None]:
if automl_enabled:
    # Get default spec schema
    endpoint = f"{base_url}/experiments/{experiment_id}/specs/train/schema"
    response = requests.get(endpoint, headers=headers)
    assert response.status_code in (200, 201)
    assert "automl_default_parameters" in response.json().keys()
    automl_specs = response.json()["automl_default_parameters"]
    print(json.dumps(automl_specs, sort_keys=True, indent=4))

### Actions <a class="anchor" id="head-10"></a>

For all actions:
1. Get default spec schema and derive the default values
2. Modify defaults if needed
3. Post spec dictionary to the service
4. Run model action
5. Monitor job using retrieve
6. Download results using job download endpoint (if needed)
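
Step 5 (monitoring) is the same loop for every action, so it can be written once. The following is a minimal sketch under stated assumptions: `wait_for_job`, `fetch`, and `max_polls` are hypothetical names, and the terminal statuses are the ones checked by the monitoring cells in this notebook.

```python
import time
from typing import Callable, Dict

# Statuses after which the monitoring cells in this notebook stop polling
TERMINAL_STATUSES = ("Done", "Error", "Canceled")


def wait_for_job(
    fetch: Callable[[], Dict],
    interval_s: float = 15.0,
    max_polls: int = 1000,
) -> Dict:
    """Call fetch() until the job reports a terminal status; return the last payload."""
    for _ in range(max_polls):
        payload = fetch()
        if payload.get("status") in TERMINAL_STATUSES:
            return payload
        time.sleep(interval_s)
    raise TimeoutError(f"Job still not finished after {max_polls} polls")
```

In this notebook, `fetch` could be `lambda: requests.get(endpoint, headers=headers).json()`, with `endpoint` pointing at the job of interest. Capping the number of polls keeps a stuck job from blocking the kernel indefinitely.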

### Train <a class="anchor" id="head-11"></a>

#### Set AutoML related configurations <a class="anchor" id="head-9"></a>
See the links below for the parameters each network supports. You can add any of them to the AutoML search in addition to the parameters that are enabled by default:

[ActionRecognitionNet](https://github.com/NVIDIA/tao_front_end_services/tree/main/api/specs_utils/specs/action_recognition/action_recognition%20-%20train.csv), 
[BPNET](https://github.com/NVIDIA/tao_front_end_services/tree/main/api/specs_utils/specs/bpnet/bpnet%20-%20train.csv), 
[FPENET](https://github.com/NVIDIA/tao_front_end_services/tree/main/api/specs_utils/specs/fpenet/fpenet%20-%20train.csv), 
[LPRNET](https://github.com/NVIDIA/tao_front_end_services/tree/main/api/specs_utils/specs/lprnet/lprnet%20-%20train.csv), 
[MetricLearningRecognition](https://github.com/NVIDIA/tao_front_end_services/tree/main/api/specs_utils/specs/ml_recog/ml_recog%20-%20train.csv), 
[OCDNET](https://github.com/NVIDIA/tao_front_end_services/tree/main/api/specs_utils/specs/ocdnet/ocdnet%20-%20train.csv), 
[OCRNET](https://github.com/NVIDIA/tao_front_end_services/tree/main/api/specs_utils/specs/ocrnet/ocrnet%20-%20train.csv), 
[OpticalInspection](https://github.com/NVIDIA/tao_front_end_services/tree/main/api/specs_utils/specs/optical_inspection/optical_inspection%20-%20train.csv), 
[Pointpillars](https://github.com/NVIDIA/tao_front_end_services/tree/main/api/specs_utils/specs/pointpillars/pointpillars%20-%20train.csv), 
[PoseClassificationNet](https://github.com/NVIDIA/tao_front_end_services/tree/main/api/specs_utils/specs/pose_classification/pose_classification%20-%20train.csv), 
[ReIdentificationNet](https://github.com/NVIDIA/tao_front_end_services/tree/main/api/specs_utils/specs/re_identification/re_identification%20-%20train.csv)

In [None]:
if automl_enabled:
    # Choose any metric that is present in the kpi dictionary present in the model's status.json. 
    # Example status.json for each model can be found in the respective section in NVIDIA TAO DOCS here: https://docs.nvidia.com/tao/tao-toolkit/text/model_zoo/cv_models/index.html
    metric = "kpi"

    additional_automl_parameters = [] #Refer to parameter list mentioned in the above links and add any extra parameter in addition to the default enabled ones
    remove_default_automl_parameters = [] #Remove any hyperparameters that are enabled by default for AutoML

    automl_information = {"automl_enabled":True,
                          "automl_algorithm":automl_algorithm,
                          "metric":metric,
                          "automl_max_recommendations": 20, # Only for bayesian
                          "automl_R": 27, # Only for hyperband
                          "automl_nu": 3, # Only for hyperband
                          "epoch_multiplier": 1, # Only for hyperband
                          # Enable this if you want to add parameters to automl_add_hyperparameters below that are disabled by TAO in the automl_enabled column of the spec csv.
                          # Warning: The parameters that are disabled are not tested by TAO, so there might be unexpected behaviour in overriding this
                          "override_automl_disabled_params": False,
                          "automl_add_hyperparameters":str(additional_automl_parameters),
                          "automl_remove_hyperparameters":str(remove_default_automl_parameters)
                        }
    data = json.dumps(automl_information)

    endpoint = f"{base_url}/experiments/{experiment_id}"

    response = requests.patch(endpoint, data=data, headers=headers)
    assert response.status_code in (200, 201)

    print(response)
    print(json.dumps(response.json(), sort_keys=True, indent=4))

In [None]:
# Get default spec schema
endpoint = f"{base_url}/experiments/{experiment_id}/specs/train/schema"

response = requests.get(endpoint, headers=headers)
assert response.status_code in (200, 201)

print(response)
# print(response.json()) ## Uncomment for verbose schema
assert "default" in response.json().keys()
train_specs = response.json()["default"]
print(json.dumps(train_specs, sort_keys=True, indent=4))

In [None]:
# Apply changes for any of the parameters listed in the previous cell as required
if model_name == "action_recognition":
    train_specs["model"]["model_type"] = model_type
    train_specs["model"]["input_type"] = model_input_type
    train_specs["train"]["num_epochs"] = 20
    train_specs["train"]["num_gpus"] = 1
elif model_name == "bpnet":
    train_specs["num_epoch"] = 20
    train_specs["checkpoint_n_epoch"] = 5
    train_specs["validation_every_n_epoch"] = 5
    train_specs["finetuning_config"]["checkpoint_path"] = None
    train_specs["gpus"] = 1
elif model_name == "centerpose":
    train_specs["train"]["num_epochs"] = 20 # Please set it to 140 if you want to run the whole training.
    train_specs["train"]["validation_interval"] = 10
    train_specs["train"]["checkpoint_interval"] = 10
    train_specs["train"]["num_gpus"] = 1
    train_specs["dataset"]["category"] = testing_categories
elif model_name == "fpenet":
    train_specs["num_epoch"] = 10
    train_specs["checkpoint_n_epoch"] = 5
    train_specs["dataloader"]["dataset_info"]["root_path"] = None
    train_specs["num_keypoints"] = int(model_type)
    train_specs["dataloader"]["num_keypoints"] = int(model_type)
    train_specs["gpus"] = 1
elif model_name == "lprnet":
    train_specs["training_config"]["num_epochs"] = 24
    train_specs["gpus"] = 1
elif model_name == "ml_recog":
    train_specs["train"]["num_epochs"] = 30
    train_specs["train"]["gpu_ids"] = [0]
    train_specs["train"]["checkpoint_interval"] = 5
elif model_name == "ocdnet":
    train_specs["train"]["num_epochs"] = 30
    train_specs["train"]["checkpoint_interval"] = 5
    train_specs["train"]["validation_interval"] = 5
    train_specs["train"]["gpu_id"] = [0]
    train_specs["num_gpus"] = 1
elif model_name == "ocrnet":
    train_specs["train"]["num_epochs"] = 20
    train_specs["train"]["checkpoint_interval"] = 5
    train_specs["train"]["validation_interval"] = 5
    train_specs["train"]["num_gpus"] = 1
elif model_name == "optical_inspection":
    train_specs["train"]["num_epochs"] = 20
    train_specs["train"]["checkpoint_interval"] = 5
    train_specs["train"]["validation_interval"] = 5
    train_specs["train"]["gpu_ids"] = [0]
elif model_name == "pose_classification":
    train_specs["train"]["num_epochs"] = 50
    train_specs["train"]["gpu_ids"] = [0]
    train_specs["train"]["num_gpus"] = 1
    if model_type == "nvidia":
        train_specs["dataset"]["num_classes"] = 6
        train_specs["model"]["graph_layout"] = "nvidia"
    elif model_type == "kinetics":
        train_specs["dataset"]["num_classes"] = 5
        train_specs["model"]["graph_layout"] = "openpose"
elif model_name == "pointpillars":
    train_specs["train"]["num_epochs"] = 80
    train_specs["gpus"] = 1
elif model_name == "re_identification":
    train_specs["train"]["num_epochs"] = 120
    train_specs["train"]["gpu_ids"] = [0]
    train_specs["train"]["num_gpus"] = 1
    train_specs["dataset"]["num_classes"] = 100 # The number set in the obtain_subset script
    train_specs["dataset"]["num_workers"] = 4 # Modify num_workers according to your hardware setup
    train_specs["dataset"]["batch_size"] = 16 # Modify batch_size according to your hardware setup
elif model_name == "visual_changenet":
    train_specs["train"]["num_epochs"] = 30 
    train_specs["train"]["checkpoint_interval"] = 2
    train_specs["train"]["val_interval"] = 5
    train_specs["num_gpus"] = 1
    if ds_format == "visual_changenet_segment":
        train_specs["task"] = 'segment'
    elif ds_format == "visual_changenet_classify":
        train_specs["task"] = 'classify'
print(json.dumps(train_specs, sort_keys=True, indent=4))
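
Each override above assumes that the parent section (for example `train` or `dataset`) already exists in the default spec. A small helper can make that assumption explicit and fail with a readable message when a schema does not contain the expected section. This is a sketch only; `set_spec` and its dotted-key convention are hypothetical, not part of the TAO API.

```python
from typing import Any, Dict


def set_spec(specs: Dict[str, Any], dotted_key: str, value: Any) -> None:
    """Set a nested spec value, e.g. "train.num_epochs" -> specs["train"]["num_epochs"].

    Raises KeyError if any parent section is missing or is not a dictionary.
    """
    *parents, leaf = dotted_key.split(".")
    node = specs
    for key in parents:
        if not isinstance(node.get(key), dict):
            raise KeyError(f"'{key}' is missing or not a section (from '{dotted_key}')")
        node = node[key]
    node[leaf] = value
```

For example, `set_spec(train_specs, "train.num_epochs", 20)` has the same effect as `train_specs["train"]["num_epochs"] = 20` when the `train` section exists.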

In [None]:
# Run action
parent = job_map.get("eval_dataset_convert_"+model_name, job_map.get("train_dataset_convert_"+model_name, None))
parent_id = train_dataset_id
if model_name == "ocrnet": # Only model with eval dataset convert on eval dataset
    parent_id = eval_dataset_id
action = "train"
data = json.dumps({"parent_job_id":parent,"action":action,"specs":train_specs})
endpoint = f"{base_url}/experiments/{experiment_id}/jobs"

response = requests.post(endpoint, data=data, headers=headers)
assert response.status_code in (200, 201)
assert response.json()

print(response)
print(response.json())

if model_name == "visual_changenet":
    job_map["train_" + ds_format] = response.json()
else:
    job_map["train_" + model_name] = response.json()
print(job_map)

In [None]:
# Monitor job status by repeatedly running this cell
# For automl: Training times for different models benchmarked on 1 GPU V100 machine can be found here: https://docs.nvidia.com/tao/tao-toolkit/text/automl/automl.html#results-of-automl-experiments

if model_name == "visual_changenet":
    job_id = job_map["train_" + ds_format]
else:
    job_id = job_map["train_" + model_name]
endpoint = f"{base_url}/experiments/{experiment_id}/jobs/{job_id}"

while True:
    clear_output(wait=True)
    response = requests.get(endpoint, headers=headers)
    if "error_desc" in response.json().keys() and response.json()["error_desc"] in ("Job trying to retrieve not found", "No AutoML run found"):
        print("Job is being created")
        time.sleep(5)
        continue
    assert response.status_code in (200, 201)
    print(response)
    print(json.dumps(response.json(), sort_keys=True, indent=4))
    assert "status" in response.json().keys() and response.json().get("status") != "Error"
    if response.json().get("status") in ["Done","Error", "Canceled"] or response.status_code not in (200,201):
        break
    time.sleep(15)

In [None]:
## To Stop an AutoML JOB
#    1. Stop the 'Monitor job status by repeatedly running this cell' cell (the cell right before this cell) manually
#    2. Uncomment the snippet in the next cell and run the cell

In [None]:
# if automl_enabled:
#     if model_name == "visual_changenet":
#          job_id = job_map["train_" + ds_format]
#     else:
#         job_id = job_map["train_" + model_name]

#     endpoint = f"{base_url}/experiments/{experiment_id}/jobs/{job_id}:cancel"

#     response = requests.post(endpoint, headers=headers)
#     assert response.status_code in (200, 201)

#     print(response)
#     print(response.json())

In [None]:
## Resume AutoML

In [None]:
# Uncomment the below snippet if you want to resume an already stopped AutoML job and then run the 'Monitor job status by repeatedly running this cell' cell above (4th cell above from this cell)
# if automl_enabled:
#     if model_name == "visual_changenet":
#          job_id = job_map["train_" + ds_format]
#     else:
#         job_id = job_map["train_" + model_name]
#     endpoint = f"{base_url}/experiments/{experiment_id}/jobs/{job_id}:resume"

#     data = json.dumps({"parent_job_id":parent,"specs":train_specs})
#     response = requests.post(endpoint, data=data, headers=headers)
#     assert response.status_code in (200, 201)

#     print(response)
#     print(response.json())

### Download train job artifacts <a class="anchor" id="head-12"></a>

In [None]:
# Example to list the files of the executed train job
if model_name == "visual_changenet":
    job_id = job_map["train_" + ds_format]
else:
    job_id = job_map["train_" + model_name]
endpoint = f'{base_url}/experiments/{experiment_id}/jobs/{job_id}:list_files'

response = requests.get(endpoint, headers=headers)
print(json.dumps(response.json(), sort_keys=True, indent=4))

In [None]:
## To run this cell, patch the model with the proper metric before training. By default, loss is used, but some models don't log that parameter under the name 'loss'.

# # Download selective job contents once the above job shows "Done" status
# # Example to download selective files of train job (Note: will take time)
# endpoint = f'{base_url}/experiments/{experiment_id}/jobs/{job_id}:download_selective_files'

# file_lists = [] # Choose file names from the previous cell where all the files for this job were listed
# best_model = False # Enable this to download the checkpoint of the best performing model w.r.t. the metric chosen before starting training
# latest_model = True # Enable this to download the latest checkpoint of the training job; Disable best_model to use latest_model

# params = {"file_lists": file_lists, "best_model": best_model, "latest_model": latest_model}

# # Save
# temptar = f'{job_id}.tar.gz'
# with requests.get(endpoint, headers=headers, params=params, stream=True) as r:
#     r.raise_for_status()
#     with open(temptar, 'wb') as f:
#         for chunk in r.iter_content(chunk_size=8192):
#             f.write(chunk)

# print("Untarring")
# # Untar to destination
# tar_command = f'tar -xvf {temptar} -C {workdir}/'
# os.system(tar_command)
# os.remove(temptar)
# print(f"Results at {workdir}/{job_id}")
# model_downloaded_path = f"{workdir}/{job_id}"
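
The untar step can also be done with Python's standard `tarfile` module instead of shelling out to `tar`. The sketch below assumes the downloaded file is a regular tar archive (compressed or not). `extract_archive` is a hypothetical helper name.

```python
import os
import tarfile


def extract_archive(archive_path: str, dest_dir: str, remove_archive: bool = True) -> None:
    """Extract a tar archive into dest_dir and optionally delete the archive."""
    os.makedirs(dest_dir, exist_ok=True)
    # "r:*" lets tarfile detect the compression (gzip, bz2, xz, or none)
    with tarfile.open(archive_path, "r:*") as tar:
        if hasattr(tarfile, "data_filter"):
            # Python versions with extraction filters: reject unsafe members
            tar.extractall(dest_dir, filter="data")
        else:
            tar.extractall(dest_dir)
    if remove_archive:
        os.remove(archive_path)
```

With the variables from the cell above, the call would be `extract_archive(temptar, workdir)`.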

In [None]:
# Downloading the full train job takes a long time; this cell runs only when download_jobs is True
if download_jobs:
    endpoint = f"{base_url}/experiments/{experiment_id}/jobs/{job_id}"
    response = requests.get(endpoint, headers=headers)
    assert response.status_code in (200, 201)
    expected_file_size = response.json().get("job_tar_stats", {}).get("file_size")
    print("expected_file_size: ", expected_file_size)

    !python3 -m pip install tqdm
    from tqdm import tqdm

    endpoint = f'{base_url}/experiments/{experiment_id}/jobs/{job_id}:download'
    temptar = f'{job_id}.tar.gz'

    with tqdm(total=expected_file_size, unit='B', unit_scale=True) as progress_bar:
        while True:
            # Check if the file already exists
            headers_download_job = dict(headers)
            if os.path.exists(temptar):
                # Get the current file size
                file_size = os.path.getsize(temptar)
                print(f"File size of downloaded content until now is {file_size}")

                # If the file size matches the expected size, break out of the loop
                if file_size >= (expected_file_size-1):
                    print("Download completed successfully.")
                    print("Untarring")
                    # Untar to destination
                    tar_command = f'tar -xf {temptar} -C {workdir}/'
                    os.system(tar_command)
                    os.remove(temptar)
                    print(f"Results at {workdir}/{job_id}")
                    model_downloaded_path = f"{workdir}/{job_id}"
                    break

                # Set the headers to resume the download from where it left off
                headers_download_job['Range'] = f'bytes={file_size}-'
            # Open the file for writing in binary mode
            with open(temptar, 'ab') as f:
                try:
                    response = requests.get(endpoint, headers=headers_download_job, stream=True)
                    print(response)
                    # Check if the request was successful
                    if response.status_code in [200, 206]:
                        # Iterate over the content in chunks
                        for chunk in response.iter_content(chunk_size=1024):
                            if chunk:
                                # Write the chunk to the file
                                f.write(chunk)
                                # Flush and sync the file to disk
                                f.flush()
                                os.fsync(f.fileno())
                            progress_bar.update(len(chunk))
                    else:
                        print(f"Failed to download file. Status code: {response.status_code}")
                        time.sleep(5)  # Back off before retrying so a failing server is not hit in a tight loop
                except requests.exceptions.RequestException as e:
                    print("Connection interrupted during download, resuming download from breaking point")
                    time.sleep(5)  # Sleep for a while before retrying the request
                    continue  # Continue the loop to retry the request

In [None]:
# View the checkpoints generated by the training job. For AutoML jobs, also view the best performing model's config and the results of all AutoML experiments

if download_jobs:
    if automl_enabled:
        !python3 -m pip install pandas==1.5.1
        import pandas as pd
        model_downloaded_path = f"{model_downloaded_path}/best_model"
        assert glob.glob(f"{model_downloaded_path}/*.protobuf") or glob.glob(f"{model_downloaded_path}/*.yaml")

    assert os.path.exists(model_downloaded_path)
    assert (glob.glob(model_downloaded_path + "/**/*.tlt", recursive=True) + glob.glob(model_downloaded_path + "/**/*.hdf5", recursive=True) + glob.glob(model_downloaded_path + "/**/*.pth", recursive=True))

    if os.path.exists(model_downloaded_path):        
        #List the binary model file
        print("\nCheckpoints for the training experiment")
        if os.path.exists(model_downloaded_path+"/train/weights") and len(os.listdir(model_downloaded_path+"/train/weights")) > 0:
            print(f"Folder: {model_downloaded_path}/train/weights")
            print("Files:", os.listdir(model_downloaded_path+"/train/weights"))
        elif os.path.exists(model_downloaded_path+"/weights") and len(os.listdir(model_downloaded_path+"/weights")) > 0:
            print(f"Folder: {model_downloaded_path}/weights")
            print("Files:", os.listdir(model_downloaded_path+"/weights"))
        else:
            print(f"Folder: {model_downloaded_path}")
            print("Files:", os.listdir(model_downloaded_path))

        if automl_enabled:
            assert glob.glob(f"{model_downloaded_path}/*.protobuf") or glob.glob(f"{model_downloaded_path}/*.yaml")
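            # Use a context manager so the file handle is closed after reading
            with open(f"{model_downloaded_path}/controller.json", "r") as controller_file:
                experiment_artifacts = json.load(controller_file)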
            data_frame = pd.DataFrame(experiment_artifacts)
            # Print experiment id/number and the corresponding result
            print("\nResults of all experiments")
            with pd.option_context('display.max_rows', None, 'display.max_columns', None, 'display.max_colwidth', None):
                print(data_frame[["id","result"]])

### Evaluate <a class="anchor" id="head-12"></a>
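
The next two cells read the experiment's current checkpoint settings and then PATCH a new choice. As a rough sketch of what that request body contains, the hypothetical helper `choose_checkpoint` below (not part of the TAO API) builds it from the metadata returned by the GET call. The method names and the `from_epoch_number_<train job id>` key format are copied from the commented example in this notebook; the helper name, its arguments, and the validation are assumptions.

```python
import json

# Method names as listed in the notebook's comment
VALID_METHODS = ("best_model", "latest_model", "from_epoch_number")


def choose_checkpoint(current, method, train_job_id=None, epoch=None):
    """Return an updated checkpoint-selection dict without mutating `current`.

    `current` is assumed to be the experiment metadata from the GET call.
    """
    if method not in VALID_METHODS:
        raise ValueError(f"method must be one of {VALID_METHODS}, got {method!r}")
    updated = {
        "checkpoint_choose_method": method,
        # Copy the existing mapping so earlier per-job entries are preserved
        "checkpoint_epoch_number": dict(current.get("checkpoint_epoch_number") or {}),
    }
    if method == "from_epoch_number":
        if train_job_id is None or epoch is None:
            raise ValueError("from_epoch_number requires train_job_id and epoch")
        # Key format assumed from the notebook's commented example
        updated["checkpoint_epoch_number"][f"from_epoch_number_{train_job_id}"] = int(epoch)
    return updated


# Placeholder metadata and job ID, for illustration only
current = {"checkpoint_choose_method": "best_model", "checkpoint_epoch_number": {}}
body = choose_checkpoint(current, "from_epoch_number", "train-job-id", 3)
print(json.dumps(body, indent=4))
```

The result can be serialized with `json.dumps` and sent with `requests.patch`, exactly as the cell below does with `update_checkpoint_choosing`.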

In [None]:
# Get model handler parameters
endpoint = f"{base_url}/experiments/{experiment_id}"
response = requests.get(endpoint, headers=headers)
assert response.status_code in (200, 201)
assert response.json()

model_parameters = response.json()
update_checkpoint_choosing = {}
update_checkpoint_choosing["checkpoint_choose_method"] = model_parameters["checkpoint_choose_method"]
update_checkpoint_choosing["checkpoint_epoch_number"] = model_parameters["checkpoint_epoch_number"]
print(update_checkpoint_choosing)

In [None]:
# Change the method by which checkpoint from the parent action is chosen, when parent action is a train/retrain action.
# Example for evaluate action below, can be applied in the same way for other actions too
update_checkpoint_choosing["checkpoint_choose_method"] = "latest_model" # Choose between best_model/latest_model/from_epoch_number
# If from_epoch_number is chosen then assign the epoch number to the dictionary key in the format 'from_epoch_number_{train_job_id}'
# update_checkpoint_choosing["checkpoint_epoch_number"]["from_epoch_number_28a2754e-50ef-43a8-9733-98913776dd90"] = 3
data = json.dumps(update_checkpoint_choosing)

endpoint = f"{base_url}/experiments/{experiment_id}"

response = requests.patch(endpoint, data=data, headers=headers)
assert response.status_code in (200, 201)
print(response)
print(json.dumps(response.json(), sort_keys=True, indent=4))

In [None]:
# Get default spec schema
endpoint = f"{base_url}/experiments/{experiment_id}/specs/evaluate/schema"

response = requests.get(endpoint, headers=headers)
assert response.status_code in (200, 201)

print(response)
#print(response.json()) ## Uncomment for verbose schema
assert "default" in response.json().keys()
eval_specs = response.json()["default"]
print(json.dumps(eval_specs, sort_keys=True, indent=4))

In [None]:
# Apply changes
if model_name == "action_recognition":
    eval_specs["model"]["model_type"] = model_type
    eval_specs["model"]["input_type"] = model_input_type
elif model_name == "fpenet":
    eval_specs["dataloader"]["dataset_info"]["root_path"] = None
    eval_specs["num_keypoints"] = int(model_type)
    eval_specs["dataloader"]["num_keypoints"] = int(model_type)
elif model_name == "pose_classification":
    if model_type == "nvidia":
        eval_specs["dataset"]["num_classes"] = 6
        eval_specs["model"]["graph_layout"] = "nvidia"
    elif model_type == "kinetics":
        eval_specs["dataset"]["num_classes"] = 5
        eval_specs["model"]["graph_layout"] = "openpose"
elif model_name == "re_identification":
    eval_specs["dataset"]["num_classes"] = 100 #The number set in obtain_subset script
elif model_name == "visual_changenet" and ds_format == 'visual_changenet_segment':
    eval_specs["task"] = 'segment'
elif model_name == "visual_changenet" and ds_format == 'visual_changenet_classify':
    eval_specs["task"] = 'classify'
    eval_specs["train"]["classify"]["loss"] = "contrastive"
elif model_name == "centerpose":
    eval_specs["dataset"]["category"] = testing_categories
print(json.dumps(eval_specs, sort_keys=True, indent=4))

In [None]:
# Run action
if model_name == "visual_changenet":
    parent = job_map["train_" + ds_format]
else:
    parent = job_map["train_" + model_name]
action = "evaluate"
data = json.dumps({"parent_job_id":parent,"action":action,"specs":eval_specs})

endpoint = f"{base_url}/experiments/{experiment_id}/jobs"

response = requests.post(endpoint, data=data, headers=headers)
assert response.status_code in (200, 201)
assert response.json()

print(response)
print(response.json())

if model_name == "visual_changenet":
    job_map["evaluate_" + ds_format] = response.json()
else:
    job_map["evaluate_" + model_name] = response.json()
print(job_map)

In [None]:
# Monitor job status by repeatedly running this cell
if model_name == "visual_changenet":
    job_id = job_map["evaluate_" + ds_format]
else:
    job_id = job_map["evaluate_" + model_name]
endpoint = f"{base_url}/experiments/{experiment_id}/jobs/{job_id}"

while True:
    clear_output(wait=True)
    response = requests.get(endpoint, headers=headers)
    assert response.status_code in (200, 201)
    print(response)
    print(response.json())
    assert "status" in response.json().keys() and response.json().get("status") != "Error"
    if response.json().get("status") in ["Done","Error", "Canceled"] or response.status_code not in (200,201):
        break
    time.sleep(15)

### Prune, Retrain and Evaluation <a class="anchor" id="head-13"></a>

- We optimize the trained model by pruning and retraining in the following cells
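
The prune, retrain, and evaluate jobs in the following cells form a chain: each job names the previous one as its `parent_job_id`. The sketch below makes that chain explicit. The `job_map` key prefixes and the list of prunable models are taken from the cells in this section; the `STEPS` table, the `build_step_payload` helper, and the placeholder job IDs are hypothetical.

```python
import json

# Models for which this notebook runs prune and retrain
PRUNABLE_MODELS = ("bpnet", "ocdnet", "ocrnet", "pointpillars")

# action -> (job_map key prefix of the parent job, key prefix for the new job)
STEPS = {
    "prune": ("train_", "prune_"),
    "retrain": ("prune_", "retrain_"),
    "evaluate": ("retrain_", "eval_retrain_"),
}


def build_step_payload(action, model_name, job_map, specs):
    """Return the JSON body for one optimization job, or None if the model is not pruned."""
    if model_name not in PRUNABLE_MODELS:
        return None
    parent_prefix, _ = STEPS[action]
    parent = job_map[parent_prefix + model_name]
    return json.dumps({"parent_job_id": parent, "action": action, "specs": specs})


# Walk the chain with placeholder job IDs; in the notebook each ID
# comes from the response of the POST to the /jobs endpoint.
job_map = {"train_ocrnet": "job-train"}
for action, (_, key_prefix) in STEPS.items():
    print(build_step_payload(action, "ocrnet", job_map, specs={}))
    job_map[key_prefix + "ocrnet"] = f"job-{action}"
```

A job can only be submitted after its parent's ID is in `job_map`, which is why the cells run strictly in the order prune, retrain, evaluate.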

#### Prune <a class="anchor" id="head-14"></a>

In [None]:
# Get default spec schema
if model_name in ("bpnet", "ocdnet", "ocrnet", "pointpillars"):
    endpoint = f"{base_url}/experiments/{experiment_id}/specs/prune/schema"

    response = requests.get(endpoint, headers=headers)
    assert response.status_code in (200, 201)

    print(response)
    #print(response.json()) ## Uncomment for verbose schema
    assert "default" in response.json().keys()
    prune_specs = response.json()["default"]
    print(json.dumps(prune_specs, sort_keys=True, indent=4))

In [None]:
# Apply changes
# None for prune
if model_name in ("bpnet", "ocdnet", "ocrnet", "pointpillars"):
    print(json.dumps(prune_specs, sort_keys=True, indent=4))

In [None]:
# Run actions
if model_name in ("bpnet", "ocdnet", "ocrnet", "pointpillars"):
    parent = job_map["train_" + model_name]
    action = "prune"
    data = json.dumps({"parent_job_id":parent,"action":action,"specs":prune_specs})

    endpoint = f"{base_url}/experiments/{experiment_id}/jobs"

    response = requests.post(endpoint, data=data, headers=headers)
    assert response.status_code in (200, 201)
    assert response.json()

    print(response)
    print(response.json())

    job_map["prune_" + model_name] = response.json()
    print(job_map)

In [None]:
# Monitor job status by repeatedly running this cell (prune)
if model_name in ("bpnet", "ocdnet", "ocrnet", "pointpillars"):
    job_id = job_map["prune_" + model_name]
    endpoint = f"{base_url}/experiments/{experiment_id}/jobs/{job_id}"

    while True:
        clear_output(wait=True)
        response = requests.get(endpoint, headers=headers)
        assert response.status_code in (200, 201)
        print(response)
        print(response.json())
        assert "status" in response.json().keys() and response.json().get("status") != "Error"
        if response.json().get("status") in ["Done","Error", "Canceled"] or response.status_code not in (200,201):
            break
        time.sleep(15)

#### Retrain <a class="anchor" id="head-15"></a>
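
The retrain cells below set a handful of per-model values on the default spec. One way to keep those values in a single place is a table of dotted-path overrides, sketched here. The override values are copied from the "Apply changes" cell in this section; the `apply_overrides` helper and the toy default spec are hypothetical. Unlike the direct assignments in the notebook, this helper creates missing intermediate dictionaries instead of raising `KeyError`.

```python
import copy

# Values copied from the notebook's retrain "Apply changes" cell
RETRAIN_OVERRIDES = {
    "bpnet": {
        "num_epoch": 20,
        "checkpoint_n_epoch": 5,
        "validation_every_n_epoch": 5,
        "finetuning_config.checkpoint_path": None,
        "gpus": 1,
    },
    "ocdnet": {
        "train.num_epochs": 30,
        "train.checkpoint_interval": 5,
        "train.validation_interval": 5,
        "train.gpu_id": [0],
        "num_gpus": 1,
    },
    "ocrnet": {
        "train.num_epochs": 20,
        "train.checkpoint_interval": 5,
        "train.validation_interval": 5,
        "train.num_gpus": 1,
    },
    "pointpillars": {"train.num_epochs": 80, "gpus": 1},
}


def apply_overrides(specs, overrides):
    """Return a deep copy of `specs` with dotted-path overrides applied."""
    updated = copy.deepcopy(specs)
    for dotted_key, value in overrides.items():
        *parents, leaf = dotted_key.split(".")
        node = updated
        for name in parents:
            # Create intermediate dicts if the default spec lacks them
            node = node.setdefault(name, {})
        node[leaf] = value
    return updated


# Toy default spec, for illustration only
default_specs = {"train": {"num_epochs": 1, "checkpoint_interval": 1}}
print(apply_overrides(default_specs, RETRAIN_OVERRIDES["ocrnet"]))
```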

In [None]:
# Get default spec schema
if model_name in ("bpnet", "ocdnet", "ocrnet", "pointpillars"):
    endpoint = f"{base_url}/experiments/{experiment_id}/specs/retrain/schema"

    response = requests.get(endpoint,headers=headers)
    assert response.status_code in (200, 201)

    print(response)
    #print(response.json()) ## Uncomment for verbose schema
    assert "default" in response.json().keys()
    retrain_specs = response.json()["default"]
    print(json.dumps(retrain_specs, sort_keys=True, indent=4))

In [None]:
# Apply changes for any of the parameters listed in the previous cell as required
if model_name in ("bpnet", "ocdnet", "ocrnet", "pointpillars"):
    if model_name == "bpnet":
        retrain_specs["num_epoch"] = 20
        retrain_specs["checkpoint_n_epoch"] = 5
        retrain_specs["validation_every_n_epoch"] = 5
        retrain_specs["finetuning_config"]["checkpoint_path"] = None
        retrain_specs["gpus"] = 1
    elif model_name == "ocdnet":
        retrain_specs["train"]["num_epochs"] = 30
        retrain_specs["train"]["checkpoint_interval"] = 5
        retrain_specs["train"]["validation_interval"] = 5
        retrain_specs["train"]["gpu_id"] = [0]
        retrain_specs["num_gpus"] = 1
    elif model_name == "ocrnet":
        retrain_specs["train"]["num_epochs"] = 20
        retrain_specs["train"]["checkpoint_interval"] = 5
        retrain_specs["train"]["validation_interval"] = 5
        retrain_specs["train"]["num_gpus"] = 1
    elif model_name == "pointpillars":
        retrain_specs["train"]["num_epochs"] = 80
        retrain_specs["gpus"] = 1
    print(json.dumps(retrain_specs, sort_keys=True, indent=4))

In [None]:
# Run actions
if model_name in ("bpnet", "ocdnet", "ocrnet", "pointpillars"):
    parent = job_map["prune_" + model_name]
    action = "retrain"
    data = json.dumps({"parent_job_id":parent,"action":action,"specs":retrain_specs})

    endpoint = f"{base_url}/experiments/{experiment_id}/jobs"

    response = requests.post(endpoint, data=data, headers=headers)
    assert response.status_code in (200, 201)
    assert response.json()

    print(response)
    print(response.json())

    job_map["retrain_" + model_name] = response.json()
    print(job_map)

In [None]:
# Monitor job status by repeatedly running this cell (retrain)
if model_name in ("bpnet", "ocdnet", "ocrnet", "pointpillars"):
    job_id = job_map["retrain_" + model_name]
    endpoint = f"{base_url}/experiments/{experiment_id}/jobs/{job_id}"

    while True:
        clear_output(wait=True)
        response = requests.get(endpoint, headers=headers)
        assert response.status_code in (200, 201)
        print(response)
        print(response.json())
        assert "status" in response.json().keys() and response.json().get("status") != "Error"
        if response.json().get("status") in ["Done","Error", "Canceled"] or response.status_code not in (200,201):
            break
        time.sleep(15)

In [None]:
# Optional cancel job - for jobs that are pending/running (retrain)

# if model_name == "pointpillars":
#     job_id = job_map["retrain_" + model_name]
#     endpoint = f"{base_url}/experiments/{experiment_id}/jobs/{job_id}:cancel"

#     response = requests.post(endpoint, headers=headers)
#     assert response.status_code in (200, 201)

#     print(response)
#     print(response.json())

In [None]:
# Optional delete job - for jobs that are error/done (retrain)

# if model_name == "pointpillars":
#     job_id = job_map["retrain_" + model_name]
#     endpoint = f"{base_url}/experiments/{experiment_id}/jobs/{job_id}"

#     response = requests.delete(endpoint, headers=headers)
#     assert response.status_code in (200, 201)

#     print(response)
#     print(response.json())

#### Evaluate after retrain <a class="anchor" id="head-15"></a>

In [None]:
# Get default spec schema
if model_name in ("bpnet", "ocdnet", "ocrnet", "pointpillars"):
    endpoint = f"{base_url}/experiments/{experiment_id}/specs/evaluate/schema"

    response = requests.get(endpoint, headers=headers)
    assert response.status_code in (200, 201)

    print(response)
    #print(response.json()) ## Uncomment for verbose schema
    assert "default" in response.json().keys()
    eval_retrain_specs = response.json()["default"]
    print(json.dumps(eval_retrain_specs, sort_keys=True, indent=4))

In [None]:
# Apply changes to specs if necessary
if model_name in ("bpnet", "ocdnet", "ocrnet", "pointpillars"):
    print(json.dumps(eval_retrain_specs, sort_keys=True, indent=4))

In [None]:
# Run actions
if model_name in ("bpnet", "ocdnet", "ocrnet", "pointpillars"):
    parent = job_map["retrain_" + model_name]
    action = "evaluate"
    data = json.dumps({"parent_job_id":parent,"action":action,"specs":eval_retrain_specs})

    endpoint = f"{base_url}/experiments/{experiment_id}/jobs"

    response = requests.post(endpoint, data=data, headers=headers)
    assert response.status_code in (200, 201)
    assert response.json()

    print(response)
    print(response.json())

    job_map["eval_retrain_" + model_name] = response.json()
    print(job_map)

In [None]:
# Monitor job status by repeatedly running this cell (evaluate)
if model_name in ("bpnet", "ocdnet", "ocrnet", "pointpillars"):
    job_id = job_map["eval_retrain_" + model_name]
    endpoint = f"{base_url}/experiments/{experiment_id}/jobs/{job_id}"

    while True:
        clear_output(wait=True)
        response = requests.get(endpoint, headers=headers)
        assert response.status_code in (200, 201)
        print(response)
        print(response.json())
        assert "status" in response.json().keys() and response.json().get("status") != "Error"
        if response.json().get("status") in ["Done","Error", "Canceled"] or response.status_code not in (200,201):
            break
        time.sleep(15)

### Export <a class="anchor" id="head-17"></a>
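
Like every action in this notebook, the export job is tracked by polling its job endpoint until the status is `Done`, `Error`, or `Canceled`. The sketch below factors that loop into a reusable function. The terminal states and the 15-second interval come from the monitoring cells; the `wait_for_job` name, the injected `fetch_status` callable, and the `max_polls` limit are assumptions added so the loop can be bounded and tried without a server.

```python
import time

# Terminal states checked by the notebook's monitoring cells
TERMINAL_STATES = ("Done", "Error", "Canceled")


def wait_for_job(fetch_status, poll_seconds=15, max_polls=None, sleep=time.sleep):
    """Call `fetch_status()` until it returns a terminal state and return that state.

    Raises TimeoutError if `max_polls` is set and reached first.
    """
    polls = 0
    while True:
        status = fetch_status()
        polls += 1
        if status in TERMINAL_STATES:
            return status
        if max_polls is not None and polls >= max_polls:
            raise TimeoutError(f"job still {status!r} after {polls} polls")
        sleep(poll_seconds)


# Simulated status sequence; no network access or real waiting
fake_statuses = iter(["Pending", "Running", "Done"])
print(wait_for_job(lambda: next(fake_statuses), sleep=lambda seconds: None))
```

In the notebook, `fetch_status` could wrap the existing call, for example `lambda: requests.get(endpoint, headers=headers).json().get("status")`.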

In [None]:
# Get default spec schema
endpoint = f"{base_url}/experiments/{experiment_id}/specs/export/schema"

response = requests.get(endpoint, headers=headers)
assert response.status_code in (200, 201)

print(response)
# print(response.json()) ## Uncomment for verbose schema
assert "default" in response.json().keys()
export_specs = response.json()["default"]
print(json.dumps(export_specs, sort_keys=True, indent=4))

In [None]:
# Apply changes to the export_specs dictionary if necessary
if model_name == "action_recognition":
    export_specs["model"]["model_type"] = model_type
    export_specs["model"]["input_type"] = model_input_type
elif model_name == "bpnet":
    export_specs["data_type"] = "int8"
    export_specs["max_batch_size"] = 1
elif model_name == "lprnet":
    export_specs["data_type"] = "fp32"
elif model_name == "pose_classification":
    if model_type == "nvidia":
        export_specs["dataset"]["num_classes"] = 6
        export_specs["model"]["graph_layout"] = "nvidia"
    elif model_type == "kinetics":
        export_specs["dataset"]["num_classes"] = 5
        export_specs["model"]["graph_layout"] = "openpose"
elif model_name == "re_identification":
    export_specs["dataset"]["num_classes"] = 100 #The number set in obtain_subset script
elif model_name == "visual_changenet" and ds_format == 'visual_changenet_segment':
    export_specs["export"]["input_height"] = 256 
    export_specs["export"]["input_width"] = 256 
    export_specs["task"] = 'segment'
elif model_name == "visual_changenet" and ds_format == 'visual_changenet_classify':
    export_specs["export"]["input_height"] = 512 
    export_specs["export"]["input_width"] = 128
    export_specs["task"] = 'classify'
print(json.dumps(export_specs, sort_keys=True, indent=4))

In [None]:
# Run action
if model_name == "visual_changenet":
    parent = job_map["train_" + ds_format]
else:
    parent = job_map["train_" + model_name]
action = "export"
data = json.dumps({"parent_job_id":parent,"action":action,"specs":export_specs})

endpoint = f"{base_url}/experiments/{experiment_id}/jobs"

response = requests.post(endpoint, data=data, headers=headers)
assert response.status_code in (200, 201)
assert response.json()

print(response)
print(response.json())

if model_name == "visual_changenet":
    job_map["export_" + ds_format] = response.json()
else:
    job_map["export_" + model_name] = response.json()
print(job_map)

In [None]:
# Monitor job status by repeatedly running this cell
if model_name == "visual_changenet":
    job_id = job_map["export_" + ds_format]
else:
    job_id = job_map["export_" + model_name]
endpoint = f"{base_url}/experiments/{experiment_id}/jobs/{job_id}"

while True:
    clear_output(wait=True)
    response = requests.get(endpoint, headers=headers)
    assert response.status_code in (200, 201)
    print(response)
    print(response.json())
    assert "status" in response.json().keys() and response.json().get("status") != "Error"
    if response.json().get("status") in ["Done","Error", "Canceled"] or response.status_code not in (200,201):
        break
    time.sleep(15)

### TRT Engine generation using TAO-Deploy <a class="anchor" id="head-18"></a>

- Here, we convert the exported model into a TensorRT engine for the target platform
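
The cells in this section make two small decisions repeatedly: which engine-generation action a model uses, and which `job_map` key a job is stored under. A minimal sketch of both rules follows. The model list, the `trtexec` special case for `bpnet`, and the `ds_format` keying for `visual_changenet` are taken from the cells below; the function names are hypothetical.

```python
# Models for which this notebook generates a TensorRT engine
TRT_MODELS = (
    "bpnet", "lprnet", "ocdnet", "ocrnet",
    "optical_inspection", "ml_recog", "visual_changenet", "centerpose",
)


def engine_generation_action(model_name):
    """Return the action name used to build the engine, or None if unsupported."""
    if model_name not in TRT_MODELS:
        return None
    return "trtexec" if model_name == "bpnet" else "gen_trt_engine"


def job_key(prefix, model_name, ds_format=None):
    """Return the job_map key; visual_changenet jobs are keyed by dataset format."""
    suffix = ds_format if model_name == "visual_changenet" else model_name
    return prefix + suffix


print(engine_generation_action("bpnet"))
print(engine_generation_action("ocrnet"))
print(job_key("export_", "visual_changenet", "visual_changenet_segment"))
```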

In [None]:
# Get default spec schema
if model_name in ("bpnet","lprnet", "ocdnet", "ocrnet", "optical_inspection", "ml_recog", "visual_changenet", "centerpose"):
    if model_name == "bpnet":
        engine_generation_action = "trtexec"
    else:
        engine_generation_action = "gen_trt_engine"
    endpoint = f"{base_url}/experiments/{experiment_id}/specs/{engine_generation_action}/schema"

    response = requests.get(endpoint, headers=headers)
    assert response.status_code in (200, 201)

    print(response)
    #print(response.json()) ## Uncomment for verbose schema
    assert "default" in response.json().keys()
    tao_deploy_specs = response.json()["default"]
    print(json.dumps(tao_deploy_specs, sort_keys=True, indent=4))

In [None]:
# Apply changes
if model_name in ("bpnet","lprnet", "ocdnet", "ocrnet", "optical_inspection", "ml_recog", "visual_changenet", "centerpose"):
    if model_name == "lprnet":
        tao_deploy_specs["data_type"] = "fp32"
    elif model_name in ("ml_recog", "ocdnet"):
        tao_deploy_specs["gen_trt_engine"]["tensorrt"]["data_type"] = "int8"
    elif model_name in ("ocrnet", "optical_inspection"):
        tao_deploy_specs["gen_trt_engine"]["tensorrt"]["data_type"] = "fp16"
    elif model_name == "visual_changenet" and ds_format == 'visual_changenet_classify':
        tao_deploy_specs["gen_trt_engine"]["input_height"] = 512 
        tao_deploy_specs["gen_trt_engine"]["input_width"] = 128
        tao_deploy_specs["task"] = 'classify'
    elif model_name == "visual_changenet" and ds_format == 'visual_changenet_segment':
        tao_deploy_specs["gen_trt_engine"]["tensorrt"]["data_type"] = "fp16"
        tao_deploy_specs["gen_trt_engine"]["input_height"] = 256
        tao_deploy_specs["gen_trt_engine"]["input_width"]= 256
        tao_deploy_specs["task"] = 'segment'
    print(json.dumps(tao_deploy_specs, sort_keys=True, indent=4))        

In [None]:
# Run action
if model_name in ("bpnet", "lprnet", "ocdnet", "ocrnet", "optical_inspection", "ml_recog", "visual_changenet", "centerpose"):
    if model_name == "visual_changenet":
        parent = job_map["export_" + ds_format]
    else:
        parent = job_map["export_" + model_name]
    action = engine_generation_action
    data = json.dumps({"parent_job_id":parent,"action":action,"specs":tao_deploy_specs})

    endpoint = f"{base_url}/experiments/{experiment_id}/jobs"

    response = requests.post(endpoint, data=data, headers=headers)
    assert response.status_code in (200, 201)
    assert response.json()

    print(response)
    print(response.json())

    if model_name == "visual_changenet":
        job_map["gen_trt_engine_" + ds_format] = response.json()
    else:
        job_map["gen_trt_engine_" + model_name] = response.json()
    print(job_map)

In [None]:
# Monitor job status by repeatedly running this cell
if model_name in ("bpnet", "lprnet", "ocdnet", "ocrnet", "optical_inspection", "ml_recog", "visual_changenet", "centerpose"):
    if model_name == "visual_changenet":
        job_id = job_map["gen_trt_engine_" + ds_format]
    else:
        job_id = job_map["gen_trt_engine_" + model_name]
    endpoint = f"{base_url}/experiments/{experiment_id}/jobs/{job_id}"

    while True:    
        clear_output(wait=True)
        response = requests.get(endpoint, headers=headers)
        assert response.status_code in (200, 201)
        print(response)
        print(response.json())
        assert "status" in response.json().keys() and response.json().get("status") != "Error"
        if response.json().get("status") in ["Done","Error", "Canceled"] or response.status_code not in (200,201):
            break
        time.sleep(15)

### TAO inference <a class="anchor" id="head-19"></a>

- Run inference on a set of images using the .tlt model created at the train step
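
The download cell later in this section resumes an interrupted transfer by checking how many bytes are already on disk and asking the server for the rest with an HTTP `Range` header. The sketch below isolates that bookkeeping. The `bytes=<size>-` header and the `expected_file_size - 1` completion check mirror the notebook; the helper names and the placeholder `Authorization` header are assumptions.

```python
import os
import tempfile


def resume_headers(base_headers, partial_path):
    """Return a copy of the headers, adding Range if a partial file exists."""
    headers = dict(base_headers)
    if os.path.exists(partial_path):
        # Ask only for the bytes that are not on disk yet
        headers["Range"] = f"bytes={os.path.getsize(partial_path)}-"
    return headers


def is_complete(partial_path, expected_size):
    """Same completion test as the notebook: size >= expected_size - 1."""
    return os.path.exists(partial_path) and os.path.getsize(partial_path) >= expected_size - 1


# Demonstration on a temporary file; no network access involved
with tempfile.TemporaryDirectory() as tmp:
    partial = os.path.join(tmp, "job.tar.gz")
    base = {"Authorization": "Bearer <token>"}  # placeholder header
    print(resume_headers(base, partial))
    with open(partial, "wb") as f:
        f.write(b"x" * 1024)
    print(resume_headers(base, partial))
    print(is_complete(partial, 2048))
```

Resuming this way only works if the server honors range requests and answers with status 206; a server that ignores the header and returns the whole file with status 200 would cause duplicated bytes when the file is opened in append mode.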

In [None]:
# Get default spec schema
endpoint = f"{base_url}/experiments/{experiment_id}/specs/inference/schema"

response = requests.get(endpoint, headers=headers)
assert response.status_code in (200, 201)

print(response)
# print(response.json()) ## Uncomment for verbose schema
assert "default" in response.json().keys()
tao_inference_specs = response.json()["default"]
print(json.dumps(tao_inference_specs, sort_keys=True, indent=4))

In [None]:
# Apply changes to the tao_inference_specs dictionary if necessary
if model_name == "action_recognition":
    tao_inference_specs["model"]["model_type"] = model_type
    tao_inference_specs["model"]["input_type"] = model_input_type
elif model_name == "fpenet":
    tao_inference_specs["num_keypoints"] = int(model_type)
    tao_inference_specs["dataloader"]["num_keypoints"] = int(model_type)
elif model_name == "pose_classification":
    if model_type == "nvidia":
        tao_inference_specs["dataset"]["num_classes"] = 6
        tao_inference_specs["model"]["graph_layout"] = "nvidia"
    elif model_type == "kinetics":
        tao_inference_specs["dataset"]["num_classes"] = 5
        tao_inference_specs["model"]["graph_layout"] = "openpose"
elif model_name == "re_identification":
    tao_inference_specs["dataset"]["num_classes"] = 100 #The number set in obtain_subset script
elif model_name == "visual_changenet" and ds_format == 'visual_changenet_classify':
    tao_inference_specs["inference"]["batch_size"] = tao_inference_specs["dataset"]["classify"]['batch_size'] 
    tao_inference_specs["task"] = 'classify'
elif model_name == "visual_changenet" and ds_format == 'visual_changenet_segment':
    tao_inference_specs["inference"]["batch_size"] = tao_inference_specs["dataset"]["segment"]['batch_size'] 
    tao_inference_specs["task"] = 'segment'
elif model_name == "centerpose":
    tao_inference_specs["dataset"]["category"] = testing_categories
print(json.dumps(tao_inference_specs, sort_keys=True, indent=4))

In [None]:
# Run action
if model_name == "visual_changenet":
    parent = job_map["train_" + ds_format]
else:
    parent = job_map["train_" + model_name]
action = "inference"
data = json.dumps({"parent_job_id":parent,"action":action,"specs":tao_inference_specs})

endpoint = f"{base_url}/experiments/{experiment_id}/jobs"

response = requests.post(endpoint, data=data, headers=headers)
assert response.status_code in (200, 201)
assert response.json()

print(response)
print(response.json())

if model_name == "visual_changenet":
    job_map["inference_tlt_" + ds_format] = response.json()
else:
    job_map["inference_tlt_" + model_name] = response.json()
print(job_map)

In [None]:
# Monitor job status by repeatedly running this cell
if model_name == "visual_changenet":
    job_id = job_map["inference_tlt_" + ds_format]
else:
    job_id = job_map["inference_tlt_" + model_name]
endpoint = f"{base_url}/experiments/{experiment_id}/jobs/{job_id}"

while True:
    clear_output(wait=True)
    response = requests.get(endpoint, headers=headers)
    assert response.status_code in (200, 201)
    print(response)
    print(response.json())
    assert "status" in response.json().keys() and response.json().get("status") != "Error"
    if response.json().get("status") in ["Done","Error", "Canceled"] or response.status_code not in (200,201):
        break
    time.sleep(15)

In [None]:
# Download job contents once the above job shows "Done" status
if download_jobs:
    if model_name == "visual_changenet":
        job_id = job_map["inference_tlt_" + ds_format]
    else:
        job_id = job_map["inference_tlt_" + model_name]
    endpoint = f"{base_url}/experiments/{experiment_id}/jobs/{job_id}"
    response = requests.get(endpoint, headers=headers)
    assert response.status_code in (200, 201)
    expected_file_size = response.json().get("job_tar_stats", {}).get("file_size")
    print("expected_file_size: ", expected_file_size)

    !python3 -m pip install tqdm
    from tqdm import tqdm

    endpoint = f'{base_url}/experiments/{experiment_id}/jobs/{job_id}:download'
    temptar = f'{job_id}.tar.gz'

    with tqdm(total=expected_file_size, unit='B', unit_scale=True) as progress_bar:
        while True:
            # Check if the file already exists
            headers_download_job = dict(headers)
            if os.path.exists(temptar):
                # Get the current file size
                file_size = os.path.getsize(temptar)
                print(f"File size of downloaded content until now is {file_size}")

                # If the file size matches the expected size, break out of the loop
                if file_size >= (expected_file_size-1):
                    print("Download completed successfully.")
                    print("Untarring")
                    # Untar to destination
                    tar_command = f'tar -xf {temptar} -C {workdir}/'
                    os.system(tar_command)
                    os.remove(temptar)
                    print(f"Results at {workdir}/{job_id}")
                    inference_out_path = f"{workdir}/{job_id}"
                    break

                # Set the headers to resume the download from where it left off
                headers_download_job['Range'] = f'bytes={file_size}-'
            # Open the file for writing in binary mode
            with open(temptar, 'ab') as f:
                try:
                    response = requests.get(endpoint, headers=headers_download_job, stream=True)
                    print(response)
                    # Check if the request was successful
                    if response.status_code in [200, 206]:
                        # Iterate over the content in chunks
                        for chunk in response.iter_content(chunk_size=1024):
                            if chunk:
                                # Write the chunk to the file
                                f.write(chunk)
                                # Flush and sync the file to disk
                                f.flush()
                                os.fsync(f.fileno())
                            progress_bar.update(len(chunk))
                    else:
                        print(f"Failed to download file. Status code: {response.status_code}")
                        time.sleep(5)  # Back off before retrying so a failing server is not hit in a tight loop
                except requests.exceptions.RequestException as e:
                    print("Connection interrupted during download, resuming download from breaking point")
                    time.sleep(5)  # Sleep for a while before retrying the request
                    continue  # Continue the loop to retry the request

In [None]:
# Verify that the expected inference output exists, then display it
if download_jobs:
    if model_name in ("action_recognition","lprnet","ocrnet"):
        assert os.path.exists(f'{inference_out_path}/logs_from_toolkit.txt')
        !cat {inference_out_path}/logs_from_toolkit.txt
    elif model_name in ("bpnet","pointpillars", "centerpose"):
        if model_name == "bpnet":
            assert glob.glob(f"{inference_out_path}/images_annotated/*.png")
        elif model_name == "pointpillars":
            assert glob.glob(f"{inference_out_path}/infer/detected_boxes/*.png")
        elif model_name == "centerpose":
            assert glob.glob(f"{inference_out_path}/inference/*.png")
        !python3 -m pip install matplotlib
        import matplotlib.pyplot as plt
        import matplotlib.image as mpimg
        if model_name == "bpnet":
            sample_image = glob.glob(f"{inference_out_path}/images_annotated/*.png")[0]
        elif model_name == "pointpillars":
            sample_image = glob.glob(f"{inference_out_path}/infer/detected_boxes/*.png")[0]
        elif model_name == "centerpose":
            sample_image = glob.glob(f"{inference_out_path}/inference/*.png")[0]
        def display_photo(path):
            img = mpimg.imread(path)
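            # figsize is (width, height) in inches, while img.shape is (rows, cols[, channels])
            plt.figure(figsize = (max(1, int(img.shape[1]/100))*2, max(1, int(img.shape[0]/100))*2))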
            plt.axis('off')
            imgplot = plt.imshow(img, aspect='auto')
            plt.show()
        display_photo(sample_image)
    elif model_name == "fpenet":
        assert os.path.exists(f'{inference_out_path}/result.txt')
        !cat {inference_out_path}/result.txt
    elif model_name == "ml_recog":
        assert os.path.exists(f'{inference_out_path}/inference/result.csv')
        !cat {inference_out_path}/inference/result.csv
    elif model_name == "optical_inspection":
        assert os.path.exists(f'{inference_out_path}/inference/inference.csv')
        !cat {inference_out_path}/inference/inference.csv
    elif model_name == "pose_classification":
        assert os.path.exists(f'{inference_out_path}/results.txt')
        !cat {inference_out_path}/results.txt
    elif model_name == "re_identification":
        assert os.path.exists(f'{inference_out_path}/inference.json')
        !cat {inference_out_path}/inference.json
    elif model_name == "visual_changenet":
        if ds_format == 'visual_changenet_classify':
            assert os.path.exists(f'{inference_out_path}/inference/inference.csv')
            !cat {inference_out_path}/inference/inference.csv
        elif ds_format == 'visual_changenet_segment':
            assert os.path.exists(f'{inference_out_path}/inference/status.json')
            !cat {inference_out_path}/inference/status.json
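
The per-model checks above mostly follow one pattern: assert that a known file exists, then print it. A minimal sketch of a table-driven version, assuming the same relative paths as the cell above; `EXPECTED_OUTPUT` and `expected_output_path` are hypothetical names introduced for illustration, not part of the TAO API:

```python
import os

# Relative path of the file each model is expected to write,
# copied from the assertions in the cell above.
EXPECTED_OUTPUT = {
    "action_recognition": "logs_from_toolkit.txt",
    "lprnet": "logs_from_toolkit.txt",
    "ocrnet": "logs_from_toolkit.txt",
    "fpenet": "result.txt",
    "ml_recog": "inference/result.csv",
    "optical_inspection": "inference/inference.csv",
    "pose_classification": "results.txt",
    "re_identification": "inference.json",
    "visual_changenet_classify": "inference/inference.csv",
    "visual_changenet_segment": "inference/status.json",
}

def expected_output_path(out_dir, model_name, ds_format=None):
    """Return the expected result file for a model, or None if it is not in the table."""
    # visual_changenet is keyed by dataset format; every other model by its name
    key = ds_format if model_name == "visual_changenet" else model_name
    relative = EXPECTED_OUTPUT.get(key)
    return os.path.join(out_dir, relative) if relative else None
```

The image-producing models (bpnet, pointpillars, centerpose) are left out of the table because their checks glob for PNG files rather than testing a single path.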

### TRT inference <a class="anchor" id="head-20"></a>

- There is no need to change the specs, since they were already uploaded at the TAO inference step.
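
The TRT inference job is chained to the engine generation job through `parent_job_id`. A minimal sketch of how the request body is assembled, mirroring the "Run action" cell in this section; `build_inference_payload` is a hypothetical helper name used only for illustration:

```python
import json

def build_inference_payload(job_map, model_name, specs, ds_format=None):
    """Return the JSON body for an inference job whose parent is a gen_trt_engine job."""
    # visual_changenet stores its jobs under the dataset format; other models under the model name
    suffix = ds_format if model_name == "visual_changenet" else model_name
    parent = job_map["gen_trt_engine_" + suffix]
    return json.dumps({"parent_job_id": parent, "action": "inference", "specs": specs})
```

A missing `gen_trt_engine_*` entry raises `KeyError`, which signals that the engine generation step has not been run for this model.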

In [None]:
# Get default spec schema
if model_name in ("bpnet", "lprnet", "ocdnet", "ocrnet", "ml_recog", "optical_inspection", "visual_changenet", "centerpose"):
    endpoint = f"{base_url}/experiments/{experiment_id}/specs/inference/schema"
    response = requests.get(endpoint, headers=headers)
    assert response.status_code in (200, 201)

    print(response)
    # print(response.json()) ## Uncomment for verbose schema
    assert "default" in response.json().keys()
    trt_inference_specs = response.json()["default"]
    print(json.dumps(trt_inference_specs, sort_keys=True, indent=4))

In [None]:
# Apply changes to the specs dictionary if necessary
if model_name in ("bpnet", "lprnet", "ocdnet", "ocrnet", "ml_recog", "optical_inspection", "visual_changenet", "centerpose"):
    if model_name == "visual_changenet" and ds_format == 'visual_changenet_classify':
        trt_inference_specs["inference"]["batch_size"] = trt_inference_specs["dataset"]["classify"]['batch_size']
        trt_inference_specs["task"] = 'classify'
    elif model_name == "visual_changenet" and ds_format == 'visual_changenet_segment':
        trt_inference_specs["inference"]["batch_size"] = trt_inference_specs["dataset"]["segment"]['batch_size']
        trt_inference_specs["task"] = 'segment'
    print(json.dumps(trt_inference_specs, sort_keys=True, indent=4))

In [None]:
# Run action
if model_name in ("bpnet", "lprnet", "ocdnet", "ocrnet", "ml_recog", "optical_inspection", "visual_changenet", "centerpose"):
    if model_name == "visual_changenet":
        parent = job_map["gen_trt_engine_" + ds_format]
    else:
        parent = job_map["gen_trt_engine_" + model_name]
    action = "inference"
    data = json.dumps({"parent_job_id":parent,"action":action,"specs":trt_inference_specs})

    endpoint = f"{base_url}/experiments/{experiment_id}/jobs"

    response = requests.post(endpoint, data=data, headers=headers)
    assert response.status_code in (200, 201)
    assert response.json()

    print(response)
    print(response.json())

    if model_name == "visual_changenet":
        job_map["inference_trt_" + ds_format] = response.json()
    else:
        job_map["inference_trt_" + model_name] = response.json()
    print(job_map)

In [None]:
# Monitor job status: this cell polls every 15 seconds until the job reaches a final state
if model_name in ("bpnet", "lprnet", "ocdnet", "ocrnet", "ml_recog", "optical_inspection", "visual_changenet", "centerpose"):
    if model_name == "visual_changenet":
        job_id = job_map["inference_trt_" + ds_format]
    else:
        job_id = job_map["inference_trt_" + model_name]
    endpoint = f"{base_url}/experiments/{experiment_id}/jobs/{job_id}"

    while True:    
        clear_output(wait=True)
        response = requests.get(endpoint, headers=headers)
        assert response.status_code in (200, 201)
        print(response)
        print(response.json())
        assert "status" in response.json().keys() and response.json().get("status") != "Error"
        if response.json().get("status") in ["Done","Error", "Canceled"] or response.status_code not in (200,201):
            break
        time.sleep(15)
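
The loop above polls until the job reports `Done`, `Error`, or `Canceled`, with no upper bound on the wait. A sketch of the same polling pattern with a timeout, assuming the status is obtained through a caller-supplied function; `wait_for_job` and its parameters are hypothetical names, not part of the TAO API:

```python
import time

TERMINAL_STATES = ("Done", "Error", "Canceled")

def wait_for_job(fetch_status, poll_seconds=15, timeout_seconds=3600, sleep=time.sleep):
    """Poll fetch_status() until it returns a terminal state or the timeout expires."""
    waited = 0
    while True:
        status = fetch_status()
        if status in TERMINAL_STATES:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"job still '{status}' after {waited} seconds")
        sleep(poll_seconds)
        waited += poll_seconds
```

In the notebook, `fetch_status` could be `lambda: requests.get(endpoint, headers=headers).json().get("status")`. Passing `sleep` as a parameter keeps the function testable without real delays.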

In [None]:
# Download job contents once the above job shows "Done" status
if download_jobs:
    if model_name in ("bpnet", "lprnet", "ocdnet", "ocrnet", "ml_recog", "optical_inspection", "visual_changenet", "centerpose"):
        if model_name == "visual_changenet":
            job_id = job_map["inference_trt_" + ds_format]
        else:
            job_id = job_map["inference_trt_" + model_name]
        endpoint = f"{base_url}/experiments/{experiment_id}/jobs/{job_id}"
        response = requests.get(endpoint, headers=headers)
        assert response.status_code in (200, 201)
        expected_file_size = response.json().get("job_tar_stats", {}).get("file_size")
        print("expected_file_size: ", expected_file_size)

        !python3 -m pip install tqdm
        from tqdm import tqdm

        endpoint = f'{base_url}/experiments/{experiment_id}/jobs/{job_id}:download'
        temptar = f'{job_id}.tar.gz'

        with tqdm(total=expected_file_size, unit='B', unit_scale=True) as progress_bar:
            while True:
                # Check if the file already exists
                headers_download_job = dict(headers)
                if os.path.exists(temptar):
                    # Get the current file size
                    file_size = os.path.getsize(temptar)
                    print(f"File size of downloaded content until now is {file_size}")

                    # If the file size matches the expected size, break out of the loop
                    if file_size >= (expected_file_size-1):
                        print("Download completed successfully.")
                        print("Untarring")
                        # Untar to destination
                        tar_command = f'tar -xf {temptar} -C {workdir}/'
                        os.system(tar_command)
                        os.remove(temptar)
                        print(f"Results at {workdir}/{job_id}")
                        inference_out_path = f"{workdir}/{job_id}"
                        break

                    # Set the headers to resume the download from where it left off
                    headers_download_job['Range'] = f'bytes={file_size}-'
                # Open the file for writing in binary mode
                with open(temptar, 'ab') as f:
                    try:
                        response = requests.get(endpoint, headers=headers_download_job, stream=True)
                        print(response)
                        # Check if the request was successful
                        if response.status_code in [200, 206]:
                            # Iterate over the content in chunks
                            for chunk in response.iter_content(chunk_size=1024):
                                if chunk:
                                    # Write the chunk to the file
                                    f.write(chunk)
                                    # Flush and sync the file to disk
                                    f.flush()
                                    os.fsync(f.fileno())
                                progress_bar.update(len(chunk))
                        else:
                            print(f"Failed to download file. Status code: {response.status_code}")
                            time.sleep(5)  # Back off so a persistent failure does not retry in a tight loop
                    except requests.exceptions.RequestException as e:
                        print("Connection interrupted during download; resuming from the last byte written")
                        time.sleep(5)  # Sleep for a while before retrying the request
                        continue  # Continue the loop to retry the request
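
The download loop resumes an interrupted transfer by sending a `Range` header that starts at the size of the partial file. A minimal sketch of that header logic in isolation; `resume_headers` is a hypothetical helper name:

```python
import os

def resume_headers(base_headers, partial_path):
    """Copy base_headers, adding a Range header if a partial file is already on disk."""
    headers = dict(base_headers)  # never mutate the shared headers dict
    if os.path.exists(partial_path):
        offset = os.path.getsize(partial_path)
        if offset > 0:
            # "bytes=N-" asks the server for everything from byte N onward
            headers["Range"] = f"bytes={offset}-"
    return headers
```

A server that honors the range replies with status 206. A reply of 200 to a ranged request generally means the full file is being sent again, so appending it to the partial file would corrupt the archive; whether the TAO API server can respond that way is not confirmed here.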

In [None]:
# Inference output must be here
if download_jobs:
    if model_name in ("bpnet","lprnet", "ocdnet", "ocrnet", "ml_recog", "optical_inspection", "visual_changenet", "centerpose"):
        !ls {inference_out_path}/

### Delete experiment <a class="anchor" id="head-21"></a>

In [None]:
endpoint = f"{base_url}/experiments/{experiment_id}"

response = requests.delete(endpoint,headers=headers)
assert response.status_code in (200, 201)

print(response)
print(response.json())

### Delete dataset <a class="anchor" id="head-21"></a>

#### Delete train dataset <a class="anchor" id="head-21"></a>

In [None]:
endpoint = f"{base_url}/datasets/{train_dataset_id}"

response = requests.delete(endpoint,headers=headers)
assert response.status_code in (200, 201)

print(response)
print(response.json())

#### Delete val dataset <a class="anchor" id="head-21"></a>

In [None]:
if model_name in ("lprnet", "ocdnet", "ocrnet", "optical_inspection") or ds_format == 'visual_changenet_classify':
    endpoint = f"{base_url}/datasets/{eval_dataset_id}"

    response = requests.delete(endpoint,headers=headers)
    assert response.status_code in (200, 201)

    print(response)
    print(response.json())

#### Delete test dataset <a class="anchor" id="head-21"></a>

In [None]:
if model_name in ("lprnet", "optical_inspection") or ds_format == 'visual_changenet_classify':
    endpoint = f"{base_url}/datasets/{test_dataset_id}"

    response = requests.delete(endpoint,headers=headers)
    assert response.status_code in (200, 201)

    print(response)
    print(response.json())
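
The delete cells above repeat one request pattern, and each stops at its own `assert` on failure. A sketch of a loop that attempts every deletion and collects the status codes, so that one failure does not leave later resources behind; `delete_all` is a hypothetical helper, and `delete` is assumed to behave like `requests.delete`:

```python
def delete_all(delete, base_url, resource_paths, headers):
    """Call delete() on each resource path and return {path: status_code}."""
    results = {}
    for path in resource_paths:
        response = delete(f"{base_url}/{path}", headers=headers)
        results[path] = response.status_code
    return results
```

In the notebook this could be called as `delete_all(requests.delete, base_url, [f"experiments/{experiment_id}", f"datasets/{train_dataset_id}"], headers)`, adding the eval and test dataset paths for the models that create them.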