### Notebook to demonstrate the TAO workflow on purpose-built models

Transfer learning is the process of transferring learned features from one application to another. It is a commonly used training technique in which a model trained on one task is re-trained for a different task. The Train Adapt Optimize (TAO) Toolkit is a simple, easy-to-use, Python-based AI toolkit for taking purpose-built AI models and customizing them with users' own data.

![image](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/TAO/tlt-tao-toolkit-bring-your-own-model-diagram.png)

### The workflow in a nutshell

- Pulling datasets from the cloud
- Running dataset convert (for specific models)
- Getting a pretrained model (PTM) from NGC
- Model Actions
    - Train (Normal/AutoML)
    - Evaluate
    - Prune, retrain (for specific models)
    - Export
    - TAO-Deploy (for specific models)
    - Inference on TAO, TRT
    - Delete experiments/dataset
    
### Table of contents

1. [FIXME's](#head-1)
1. [Login](#head-2)
1. [Create a cloud workspace](#head-2)
1. [Set dataset formats](#head-3)
1. [Create and pull train dataset](#head-4)
1. [Create and pull val dataset](#head-5)
1. [Create and pull test dataset](#head-6)
1. [List the created datasets](#head-7)
1. [Train Dataset convert action](#head-8) (for specific models)
1. [Val dataset convert action](#head-9) (for specific models)
1. [Create an experiment](#head-10)
1. [List experiments](#head-11)
1. [Assign train, eval datasets](#head-12)
1. [Assign PTM](#head-13)
1. [Actions](#head-15)
1. [Train](#head-16)
1. [View hyperparameters that are enabled for AutoML by default](#head-14)
1. [Set AutoML related configurations](#head-16.1)
1. [Evaluate](#head-17)
1. [Optimize: Prune, retrain and evaluate](#head-18) (for specific models)
1. [Export](#head-19)
1. [TRT Engine generation using TAO-Deploy](#head-20) (for specific models)
1. [TAO inference](#head-21)
1. [TRT inference](#head-22) (for specific models)
1. [Delete experiment](#head-23)
1. [Delete dataset](#head-24)

### Requirements
The server requirements are listed in the [TAO Toolkit API setup documentation](https://docs.nvidia.com/tao/tao-toolkit/text/tao_toolkit_api/api_setup.html#).

In [None]:
import json
import os
import requests
import time
from IPython.display import clear_output
import glob
from remove_corrupted_images import remove_corrupted_images_workflow

### For the dataset folder structure required by the models supported in this notebook, see the notebooks under dataset_prepare, for example [this notebook](../dataset_prepare/purpose_built_models.ipynb)

### FIXME's <a class="anchor" id="head-1"></a>

1. Assign a model_name in FIXME 1

    1.1 Assign model type for action_recognition/pose_classification in FIXME 1.1
    
    1.2 Assign model input type for action_recognition in FIXME 1.2
1. (Optional) Enable AutoML if needed in FIXME 2
1. (Optional) Choose between the bayesian and hyperband automl_algorithm in FIXME 3 (only if AutoML was enabled in FIXME 2)
1. Assign the ip_address and port_number in FIXME 4 ([info](https://docs.nvidia.com/tao/tao-toolkit/text/tao_toolkit_api/api_rest_api.html))
1. Assign the ngc_key variable in FIXME 5
1. Assign the ngc_org_name variable in FIXME 6
1. Set cloud storage details in FIXME 7
1. Assign path of datasets relative to the bucket in FIXME 8

#### Choose a purpose built model

In [None]:
# Define model_name and related variables
# Available models (FIXME 1):
# 1. action_recognition - https://docs.nvidia.com/tao/tao-toolkit/text/action_recognition_net.html
# 2. bevfusion - https://docs.nvidia.com/tao/tao-toolkit/text/bevfusion/index.html
# 3. ml_recog - https://docs.nvidia.com/tao/tao-toolkit/text/ml_recog/index.html
# 4. ocdnet - https://docs.nvidia.com/tao/tao-toolkit/text/ocdnet/index.html
# 5. ocrnet - https://docs.nvidia.com/tao/tao-toolkit/text/ocrnet/index.html
# 6. optical_inspection - https://docs.nvidia.com/tao/tao-toolkit/text/optical_inspection/index.html
# 7. pose_classification - https://docs.nvidia.com/tao/tao-toolkit/text/pose_classification/index.html
# 8. pointpillars - https://docs.nvidia.com/tao/tao-toolkit/text/point_cloud/pointpillars.html
# 9. re_identification - https://docs.nvidia.com/tao/tao-toolkit/text/re_identification/index.html
# 10. centerpose - https://docs.nvidia.com/tao/tao-toolkit/text/centerpose/index.html
# 11. visual_changenet_classify - https://docs.nvidia.com/tao/tao-toolkit/text/visual_changenet/index.html
# 12. visual_changenet_segment - https://docs.nvidia.com/tao/tao-toolkit/text/visual_changenet/index.html
model_name = "action_recognition" # FIXME1 (Add the model name from the above mentioned list)

In [None]:
if model_name in ("action_recognition", "pose_classification"):
    # FIXME1.1 - model_type (string):
    #   action_recognition: rgb/of/joint
    #   pose_classification: kinetics/nvidia
    model_type = "rgb"

    if model_name == "action_recognition":
        if model_type not in ("rgb", "of", "joint"):
            raise ValueError("Choose one of rgb/of/joint for action recognition model_type")
    elif model_name == "pose_classification":
        if model_type not in ("kinetics", "nvidia"):
            raise ValueError("Choose one of kinetics/nvidia for pose classification model_type")

    if model_name == "action_recognition":
        model_input_type = "3d" # FIXME1.2 3d/2d
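The per-model checks above can also be written as a lookup table so the allowed values live in one place. This is a minimal sketch; `VALID_MODEL_TYPES` and `validate_model_type` are hypothetical names, and the allowed values are copied from the FIXME 1.1 comment.

```python
# Allowed model_type values per model, copied from the FIXME 1.1 comment.
VALID_MODEL_TYPES = {
    "action_recognition": ("rgb", "of", "joint"),
    "pose_classification": ("kinetics", "nvidia"),
}


def validate_model_type(model_name: str, model_type: str) -> None:
    """Raise ValueError if model_type is not allowed for model_name.

    Models without an entry in VALID_MODEL_TYPES take no model_type and are accepted.
    """
    allowed = VALID_MODEL_TYPES.get(model_name)
    if allowed is not None and model_type not in allowed:
        raise ValueError(
            f"Choose one of {'/'.join(allowed)} for {model_name} model_type, got {model_type!r}"
        )


validate_model_type("action_recognition", "rgb")  # passes silently
```

Adding a model with its own variants then only means adding one entry to the table.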

#### Toggle AutoML params
[AutoML documentation](https://docs.nvidia.com/tao/tao-toolkit/text/automl/automl.html#getting-started)

In [None]:
automl_enabled = False # FIXME2 set to True if you want to run automl for the model chosen in the previous cell
automl_algorithm = "bayesian" # FIXME3 example: bayesian/hyperband

#### Set API service's host information

In [None]:
host_url = "http://<ip_address>:<port_number>" # FIXME4 example: https://10.137.149.22:32334
# On the host machine, the node ip_address and port_number can be obtained as follows:
# ip_address: hostname -i
# port_number: kubectl get service tao-api-ingress-nginx-controller -o jsonpath='{.spec.ports[0].nodePort}'
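A common failure at the login step is leaving the placeholders in host_url. The sketch below uses only `urllib.parse` from the standard library to check the value before any request is sent; `check_host_url` is a hypothetical helper, and requiring an explicit port is an assumption based on the NodePort instructions above.

```python
from urllib.parse import urlparse


def check_host_url(url: str) -> str:
    """Return url without a trailing slash, or raise ValueError if it still
    contains the <ip_address>/<port_number> placeholders or lacks a scheme or port."""
    if "<" in url or ">" in url:
        raise ValueError("Replace <ip_address> and <port_number> in host_url (FIXME 4)")
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        raise ValueError(f"host_url must start with http:// or https://, got {url!r}")
    # Accessing .port also raises ValueError if the port is not numeric.
    if parsed.hostname is None or parsed.port is None:
        raise ValueError(f"host_url must include a host and a port, got {url!r}")
    return url.rstrip("/")
```

For example, `host_url = check_host_url(host_url)` would stop the notebook here instead of at the first failed request.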

#### Set NGC Personal key for authentication and NGC org to access API services

In [None]:
ngc_key = "<ngc_key>" # FIXME5 example: (Add NGC Personal key)

In [None]:
ngc_org_name = "ea-tlt" # FIXME6 your NGC ORG

### Login <a class="anchor" id="head-2"></a>

In [None]:
# Validate NGC_PERSONAL_KEY
data = json.dumps({"ngc_org_name": ngc_org_name,
                   "ngc_key": ngc_key,
                   "enable_telemetry": True})
response = requests.post(f"{host_url}/api/v1/login", data=data)
assert response.status_code in (200, 201)
assert "token" in response.json().keys()
token = response.json()["token"]
print("JWT",token)

# Set base URL
base_url = f"{host_url}/api/v1/orgs/{ngc_org_name}"
print("API Calls will be forwarded to",base_url)

headers = {"Authorization": f"Bearer {token}"}

### Get NVCF GPU details

Use one of the keys of the response JSON as platform_id when you run a job (NVCF backend only).

In [None]:
# # Valid only for NVCF backend during TAO-API helm deployment currently
# endpoint = f"{base_url}:gpu_types"
# response = requests.get(endpoint, headers=headers)

# assert response.ok
# print(response)
# print((json.dumps(response.json(), indent=4)))

### Create cloud workspace
This workspace is where your datasets reside and where the results of TAO API jobs are pushed.

If you want separate workspaces for datasets and experiments, duplicate the workspace creation cells and adjust the metadata accordingly.

In [None]:
#FIXME7 Dataset Cloud bucket details to download dataset for experiments (Can be read only)
cloud_metadata = {}
cloud_metadata["name"] = "AWS workspace info"  # A Representative name for this cloud info
cloud_metadata["cloud_type"] = "aws"  # If it's AWS, HuggingFace or Azure
cloud_metadata["cloud_specific_details"] = {}
cloud_metadata["cloud_specific_details"]["cloud_region"] = "us-west-1"  # Bucket region
cloud_metadata["cloud_specific_details"]["cloud_bucket_name"] = ""  # Bucket name
# Access and Secret for AWS
cloud_metadata["cloud_specific_details"]["access_key"] = ""
cloud_metadata["cloud_specific_details"]["secret_key"] = ""
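Empty bucket or credential fields are easy to miss before the workspace POST. A minimal pre-flight check for the AWS layout above might look like the following; `check_cloud_metadata` is hypothetical, and it only knows the AWS keys shown in the cell, so other cloud types (HuggingFace, Azure) would need their own required-key lists.

```python
def check_cloud_metadata(metadata: dict) -> list:
    """Return a list of problems with a workspace metadata dict (empty if none).

    Only the keys used in the AWS example above are checked.
    """
    problems = []
    for key in ("name", "cloud_type", "cloud_specific_details"):
        if not metadata.get(key):
            problems.append(f"missing {key}")
    details = metadata.get("cloud_specific_details") or {}
    if metadata.get("cloud_type") == "aws":
        for key in ("cloud_region", "cloud_bucket_name", "access_key", "secret_key"):
            if not details.get(key):
                problems.append(f"cloud_specific_details.{key} is empty")
    return problems
```

For example, `assert not check_cloud_metadata(cloud_metadata)` before the next cell. Avoid printing the metadata itself, since it contains the secret key.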

In [None]:
# Create cloud workspace
data = json.dumps(cloud_metadata)

endpoint = f"{base_url}/workspaces"

response = requests.post(endpoint,data=data,headers=headers)
assert response.status_code in (200, 201)

print(response)
print(json.dumps(response.json(), indent=4))

assert "id" in response.json().keys()
workspace_id = response.json()["id"]

#### Set dataset path (path within cloud bucket)

In [None]:
# FIXME8 : Set paths relative to cloud bucket
train_dataset_path = f"/data/purpose_built_models_{model_name}_train"
eval_dataset_path = f"/data/purpose_built_models_{model_name}_val"  # Used only by ocdnet, ocrnet, optical_inspection, visual_changenet_classify
test_dataset_path = f"/data/purpose_built_models_{model_name}_test"  # Used only by optical_inspection, visual_changenet_classify

### Set dataset formats <a class="anchor" id="head-3"></a>

In [None]:
if model_name in ("visual_changenet_classify", "visual_changenet_segment"):
    ds_format = model_name
    ds_type = model_name = "visual_changenet"
else:
    ds_type = model_name
    ds_format = "default"
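The cell above overwrites model_name with visual_changenet for both ChangeNet variants. If it is run a second time, model_name is no longer one of the two variant names, so the else branch resets ds_format to default. A side-effect-free alternative returns the network architecture separately; `dataset_type_and_format` is a hypothetical helper that mirrors the branches above.

```python
def dataset_type_and_format(model_choice: str) -> tuple:
    """Return (ds_type, ds_format, network_arch) for a purpose-built model choice.

    Unlike the cell above, the caller's original choice is left untouched,
    so calling this twice gives the same result.
    """
    if model_choice in ("visual_changenet_classify", "visual_changenet_segment"):
        return "visual_changenet", model_choice, "visual_changenet"
    return model_choice, "default", model_choice
```

Usage: `ds_type, ds_format, network_arch = dataset_type_and_format(model_choice)`, where `model_choice` keeps the FIXME 1 value.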

### Create and pull train dataset <a class="anchor" id="head-4"></a>

In [None]:
# Create train dataset
train_dataset_metadata = {"type":ds_type,
                          "format":ds_format,
                          "workspace":workspace_id,
                          "cloud_file_path": train_dataset_path,
                          "use_for": ["training"]
                          }

data = json.dumps(train_dataset_metadata)
endpoint = f"{base_url}/datasets"
response = requests.post(endpoint,data=data,headers=headers)
assert response.status_code in (200, 201)

print(response)
print(json.dumps(response.json(), indent=4))
assert "id" in response.json().keys()
train_dataset_id = response.json()["id"]

In [None]:
# Check progress
endpoint = f"{base_url}/datasets/{train_dataset_id}"

while True:
    clear_output(wait=True)
    response = requests.get(endpoint, headers=headers)
    assert response.status_code in (200, 201)

    print(response)
    print(json.dumps(response.json(), indent=4))
    if response.json().get("status") == "invalid_pull":
        raise ValueError("Dataset pull failed")
    if response.json().get("status") == "pull_complete":
        break
    time.sleep(5)
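The dataset-pull loop above has no upper bound on how long it waits. A generic poller with a timeout is sketched below; `wait_for_status` is a hypothetical helper, and `fetch_status` stands in for a function that GETs the dataset endpoint and returns its `status` field.

```python
import time
from typing import Callable


def wait_for_status(fetch_status: Callable[[], str], done: str, failed: str,
                    interval: float = 5.0, timeout: float = 3600.0,
                    sleep: Callable[[float], None] = time.sleep) -> str:
    """Call fetch_status() until it returns `done`; raise on `failed` or on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status == done:
            return status
        if status == failed:
            raise RuntimeError(f"status became {failed!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"status still {status!r} after {timeout} s")
        sleep(interval)
```

In this notebook, `fetch_status` could be `lambda: requests.get(endpoint, headers=headers).json().get("status")` with `done="pull_complete"` and `failed="invalid_pull"`. Injecting `sleep` keeps the helper testable without real waiting.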

#### Uncomment if you want to remove corrupted images in your dataset

In [None]:
# # This helper wraps creating a data-services experiment and running the job that removes corrupted images
# try:
#     from remove_corrupted_images import remove_corrupted_images_workflow
#     train_dataset_id = remove_corrupted_images_workflow(base_url, headers, workspace_id, train_dataset_id)
# except Exception as e:
#     raise e

### Create and pull val dataset <a class="anchor" id="head-5"></a>

In [None]:
# Create eval dataset
if model_name in ("ocdnet", "ocrnet", "optical_inspection") or ds_format == "visual_changenet_classify":
    eval_dataset_metadata = {"type":ds_type,
                             "format":ds_format,
                             "workspace":workspace_id,
                             "cloud_file_path": eval_dataset_path,
                             "use_for": ["evaluation"]
                            }

    data = json.dumps(eval_dataset_metadata)
    endpoint = f"{base_url}/datasets"
    response = requests.post(endpoint,data=data,headers=headers)
    assert response.status_code in (200, 201)

    print(response)
    print(json.dumps(response.json(), indent=4))
    assert "id" in response.json().keys()
    eval_dataset_id = response.json()["id"]
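A condition such as `ds_format in ("visual_changenet_classify")` looks like a membership test but is not one: parentheses around a single string do not create a tuple, so `in` performs a substring test. The short demonstration below shows where that differs from an exact check.

```python
# Parentheses around a single string do not make a tuple: this is still a str.
single = ("visual_changenet_classify")
# A one-element tuple needs a trailing comma.
as_tuple = ("visual_changenet_classify",)

print(type(single).__name__, type(as_tuple).__name__)  # str tuple
# With a str on the right, `in` is a substring test, so partial values match:
print("classify" in single, "" in single)  # True True
# With a tuple, `in` requires an exact element:
print("classify" in as_tuple, "visual_changenet_classify" in as_tuple)  # False True
```

For a single allowed value, `ds_format == "visual_changenet_classify"` states the intent most directly.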

In [None]:
# Check progress
if model_name in ("ocdnet", "ocrnet", "optical_inspection") or ds_format == "visual_changenet_classify":
    endpoint = f"{base_url}/datasets/{eval_dataset_id}"

    while True:
        clear_output(wait=True)
        response = requests.get(endpoint, headers=headers)
        assert response.status_code in (200, 201)

        print(response)
        print(json.dumps(response.json(), indent=4))
        if response.json().get("status") == "invalid_pull":
            raise ValueError("Dataset pull failed")
        if response.json().get("status") == "pull_complete":
            break
        time.sleep(5)

#### Uncomment if you want to remove corrupted images in your dataset

In [None]:
# # This helper wraps creating a data-services experiment and running the job that removes corrupted images
# if model_name in ("ocdnet", "ocrnet", "optical_inspection") or ds_format == "visual_changenet_classify":
#     try:
#         from remove_corrupted_images import remove_corrupted_images_workflow
#         eval_dataset_id = remove_corrupted_images_workflow(base_url, headers, workspace_id, eval_dataset_id)
#     except Exception as e:
#         raise e

### Create and pull test dataset <a class="anchor" id="head-6"></a>

In [None]:
# Create testing dataset for inference
if model_name == "optical_inspection" or ds_format == "visual_changenet_classify":
    if ds_format == "visual_changenet_classify": 
        ds_type = "visual_changenet"
        ds_format = "visual_changenet_classify"
    else:
        ds_type = model_name
        ds_format = "default"

    test_dataset_metadata = {"type":ds_type,
                             "format":ds_format,
                             "workspace":workspace_id,
                             "cloud_file_path": test_dataset_path,
                             "use_for": ["testing"]
                             }
    data = json.dumps(test_dataset_metadata)

    endpoint = f"{base_url}/datasets"

    response = requests.post(endpoint,data=data, headers=headers)
    assert response.status_code in (200, 201)

    print(response)
    print(json.dumps(response.json(), indent=4))
    assert "id" in response.json().keys()
    test_dataset_id = response.json()["id"]

In [None]:
# Check progress
if model_name == "optical_inspection" or ds_format == "visual_changenet_classify":
    endpoint = f"{base_url}/datasets/{test_dataset_id}"

    while True:
        clear_output(wait=True)
        response = requests.get(endpoint, headers=headers)
        assert response.status_code in (200, 201)

        print(response)
        print(json.dumps(response.json(), indent=4))
        if response.json().get("status") == "invalid_pull":
            raise ValueError("Dataset pull failed")
        if response.json().get("status") == "pull_complete":
            break
        time.sleep(5)

#### Uncomment if you want to remove corrupted images in your dataset

In [None]:
# # This helper wraps creating a data-services experiment and running the job that removes corrupted images
# if model_name == "optical_inspection" or ds_format == "visual_changenet_classify":
#     try:
#         from remove_corrupted_images import remove_corrupted_images_workflow
#         test_dataset_id = remove_corrupted_images_workflow(base_url, headers, workspace_id, test_dataset_id)
#     except Exception as e:
#         raise e

### List the created datasets <a class="anchor" id="head-7"></a>

In [None]:
endpoint = f"{base_url}/datasets"

response = requests.get(endpoint, headers=headers)
assert response.status_code in (200, 201)

print(response)
# print(json.dumps(response.json(), indent=4)) ## Uncomment for verbose list output
print("id\t\t\t\t\t type\t\t\t format\t\t name")
for rsp in response.json()["datasets"]:
    rsp_keys = rsp.keys()
    assert "id" in rsp_keys
    assert "type" in rsp_keys
    assert "format" in rsp_keys
    assert "name" in rsp_keys
    print(rsp["id"],"\t",rsp["type"],"\t",rsp["format"],"\t\t",rsp["name"])
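Tab-separated output drifts out of alignment when IDs, types or names differ in length. Padding each column to its widest value is one alternative; `format_dataset_table` is a hypothetical helper that expects the id, type, format and name keys checked in the loop above.

```python
def format_dataset_table(datasets: list) -> str:
    """Render id/type/format/name columns, each padded to its widest value."""
    columns = ("id", "type", "format", "name")
    rows = [columns] + [tuple(str(d[c]) for c in columns) for d in datasets]
    widths = [max(len(row[i]) for row in rows) for i in range(len(columns))]
    return "\n".join(
        "  ".join(cell.ljust(width) for cell, width in zip(row, widths)).rstrip()
        for row in rows
    )
```

Usage: `print(format_dataset_table(response.json()["datasets"]))`.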

In [None]:
job_map = {}

### Train Dataset convert Action <a class="anchor" id="head-8"></a>

In [None]:
convert_action = "dataset_convert"

In [None]:
if model_name in ("bevfusion", "ocrnet", "pointpillars"):
    # Get default spec schema
    endpoint = f"{base_url}/datasets/{train_dataset_id}/specs/{convert_action}/schema"

    response = requests.get(endpoint, headers=headers)
    assert response.status_code in (200, 201)

    print(response)
    # print(json.dumps(response.json(), indent=4)) ## Uncomment for verbose schema

    assert "default" in response.json().keys()
    train_ds_convert_specs = response.json()["default"]

    print(json.dumps(train_ds_convert_specs, sort_keys=True, indent=4))

In [None]:
# Apply changes to specs dictionary if necessary
if model_name in ("bevfusion", "ocrnet", "pointpillars"):
    print(json.dumps(train_ds_convert_specs, sort_keys=True, indent=4))

In [None]:
# Run action
if model_name in ("bevfusion", "ocrnet", "pointpillars"):
    parent = None
    action = convert_action
    data = json.dumps({"parent_job_id":parent,"action":action, "specs":train_ds_convert_specs,
                  #  "platform_id": "9af1aa90-8ea5-5a11-98d9-3879cd0da92c",  # Pick a platform_id from the output of {base_url}:gpu_types depending on GPU_type and instance_type
                   })

    endpoint = f"{base_url}/datasets/{train_dataset_id}/jobs"

    response = requests.post(endpoint, data=data, headers=headers)
    assert response.status_code in (200, 201)
    assert response.json()
    print(response)
    print(json.dumps(response.json(), indent=4))

    train_ds_convert_id = response.json()
    job_map["train_dataset_convert_"+model_name] = train_ds_convert_id

In [None]:
# Poll the job status until the job finishes, fails, is canceled or is paused
if model_name in ("bevfusion", "ocrnet", "pointpillars"):
    job_id = train_ds_convert_id
    endpoint = f"{base_url}/datasets/{train_dataset_id}/jobs/{job_id}"

    while True:
        clear_output(wait=True) 
        response = requests.get(endpoint, headers=headers)
        assert response.status_code in (200, 201)
        print(response)
        print(json.dumps(response.json(), indent=4))
        assert "status" in response.json().keys() and response.json().get("status") != "Error"
        if response.json().get("status") in ["Done","Error", "Canceled", "Paused"] or response.status_code not in (200,201):
            break
        time.sleep(15)

### Eval Dataset convert Action <a class="anchor" id="head-9"></a>

In [None]:
if model_name == "ocrnet":
    # Get default spec schema
    endpoint = f"{base_url}/datasets/{eval_dataset_id}/specs/{convert_action}/schema"

    response = requests.get(endpoint, headers=headers)
    assert response.status_code in (200, 201)

    print(response)
    # print(json.dumps(response.json(), indent=4)) ## Uncomment for verbose schema

    assert "default" in response.json().keys()
    eval_ds_convert_specs = response.json()["default"]

    print(json.dumps(eval_ds_convert_specs, sort_keys=True, indent=4))

In [None]:
# Apply changes to specs dictionary if necessary
if model_name == "ocrnet":
    print(json.dumps(eval_ds_convert_specs, sort_keys=True, indent=4))

In [None]:
# Run action
if model_name == "ocrnet":
    parent = job_map["train_dataset_convert_"+model_name]
    action = convert_action
    data = json.dumps({"parent_job_id":parent,"action":action,"specs":eval_ds_convert_specs,
                  #  "platform_id": "9af1aa90-8ea5-5a11-98d9-3879cd0da92c",  # Pick a platform_id from the output of {base_url}:gpu_types depending on GPU_type and instance_type
                   })

    endpoint = f"{base_url}/datasets/{eval_dataset_id}/jobs"
    
    response = requests.post(endpoint, data=data, headers=headers)
    assert response.status_code in (200, 201)
    assert response.json()

    print(response)
    print(json.dumps(response.json(), indent=4))

    eval_ds_convert_id = response.json()
    job_map["eval_dataset_convert_"+model_name] = eval_ds_convert_id
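Both convert cells build the same request body: a parent_job_id, the action, the specs and, on the NVCF backend, an optional platform_id. A small builder keeps the optional key out of the payload unless it is set. This is a sketch; `build_job_payload` is a hypothetical helper, with field names taken from the cells above.

```python
import json
from typing import Optional


def build_job_payload(action: str, specs: dict, parent: Optional[str] = None,
                      platform_id: Optional[str] = None) -> str:
    """Serialize a job request body in the shape used by the convert cells above."""
    body = {"parent_job_id": parent, "action": action, "specs": specs}
    if platform_id is not None:
        # NVCF backend only: pick a key from the {base_url}:gpu_types output.
        body["platform_id"] = platform_id
    return json.dumps(body)
```

For the eval convert, `build_job_payload(convert_action, eval_ds_convert_specs, parent=job_map["train_dataset_convert_" + model_name])` would produce the same body as the cell above.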

In [None]:
# Poll the job status until the job finishes, fails, is canceled or is paused
if model_name == "ocrnet":
    job_id = eval_ds_convert_id
    endpoint = f"{base_url}/datasets/{eval_dataset_id}/jobs/{job_id}"

    while True:
        clear_output(wait=True) 
        response = requests.get(endpoint, headers=headers)
        assert response.status_code in (200, 201)
        print(response)
        print(json.dumps(response.json(), indent=4))
        assert "status" in response.json().keys() and response.json().get("status") != "Error"
        if response.json().get("status") in ["Done","Error", "Canceled", "Paused"] or response.status_code not in (200,201):
            break
        time.sleep(15)

### Create an experiment <a class="anchor" id="head-10"></a>

In [None]:
if model_name in ("action_recognition", "centerpose", "pose_classification", "ml_recog", "ocrnet", "ocdnet", "optical_inspection", "re_identification", "visual_changenet"):
    encode_key = "nvidia_tao"
elif model_name == "pointpillars":
    encode_key = "tlt_encode"
else:
    encode_key = "nvidia_tlt"

checkpoint_choose_method = "best_model"
data = json.dumps({"network_arch":model_name,
                   "encryption_key":encode_key,
                   "checkpoint_choose_method":checkpoint_choose_method,
                   "workspace": workspace_id})

endpoint = f"{base_url}/experiments"
response = requests.post(endpoint,data=data,headers=headers)
assert response.status_code in (200, 201)

print(response)
print(json.dumps(response.json(), indent=4))
assert "id" in response.json().keys()
experiment_id = response.json()["id"]
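If you script experiments for several models, the encryption-key selection at the top of the cell above can be kept as data instead of an if/elif chain. The sketch reproduces the same choices, including the `nvidia_tlt` fallback that bevfusion reaches; `encryption_key_for` and the two constants are hypothetical names.

```python
# Same choices as the if/elif chain in the cell above.
NVIDIA_TAO_MODELS = {
    "action_recognition", "centerpose", "pose_classification", "ml_recog", "ocrnet",
    "ocdnet", "optical_inspection", "re_identification", "visual_changenet",
}
SPECIAL_KEYS = {"pointpillars": "tlt_encode"}


def encryption_key_for(network_arch: str) -> str:
    """Return the encryption key used when creating an experiment for network_arch."""
    if network_arch in NVIDIA_TAO_MODELS:
        return "nvidia_tao"
    # Anything not listed (for example bevfusion) falls back to nvidia_tlt.
    return SPECIAL_KEYS.get(network_arch, "nvidia_tlt")
```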

### List experiments <a class="anchor" id="head-11"></a>

In [None]:
endpoint = f"{base_url}/experiments"
params = {"network_arch": model_name}
response = requests.get(endpoint, params=params, headers=headers)
assert response.status_code in (200, 201)

print(response)
# print(json.dumps(response.json(), indent=4)) ## Uncomment for verbose list output
print("name\tmodel id\t\t\t\tnetwork architecture")
for rsp in response.json()["experiments"]:
    rsp_keys = rsp.keys()
    assert "id" in rsp_keys and "name" in rsp_keys and "network_arch" in rsp_keys
    print(rsp["name"], rsp["id"], rsp["network_arch"])

### Assign train, eval datasets <a class="anchor" id="head-12"></a>

In [None]:
docker_env_vars = {} # Add any environment variables (for example, MLOps variables) to pass to the Docker runtime
dataset_information = {}
dataset_information["train_datasets"] = [train_dataset_id]
if model_name in ("ml_recog","ocdnet","ocrnet"):
    dataset_information["calibration_dataset"] = train_dataset_id
if model_name in ("ocdnet", "ocrnet", "optical_inspection"):
    dataset_information["eval_dataset"] = eval_dataset_id
if model_name == "optical_inspection":
    dataset_information["inference_dataset"] = test_dataset_id
if model_name == "centerpose":
    dataset_information["eval_dataset"] = train_dataset_id
    dataset_information["inference_dataset"] = train_dataset_id
if model_name == "visual_changenet" and ds_format == "visual_changenet_classify":
    dataset_information["eval_dataset"] = eval_dataset_id
    dataset_information["inference_dataset"] = test_dataset_id

dataset_information["docker_env_vars"] = docker_env_vars

data = json.dumps(dataset_information)

endpoint = f"{base_url}/experiments/{experiment_id}"

response = requests.patch(endpoint, data=data, headers=headers)
assert response.status_code in (200, 201)

print(response)
print(json.dumps(response.json(), indent=4))
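The cell above reads eval_dataset_id and test_dataset_id only for some models, and a model that needs an ID that was never created fails with a NameError. A function form makes the requirements explicit and reports missing IDs by name. `dataset_information_for` is hypothetical and reproduces the branches of the cell above.

```python
from typing import Optional


def dataset_information_for(model_name: str, ds_format: str, train_id: str,
                            eval_id: Optional[str] = None, test_id: Optional[str] = None,
                            docker_env_vars: Optional[dict] = None) -> dict:
    """Build the experiment PATCH body that assigns datasets, as in the cell above."""
    info = {"train_datasets": [train_id]}
    if model_name in ("ml_recog", "ocdnet", "ocrnet"):
        info["calibration_dataset"] = train_id
    if model_name in ("ocdnet", "ocrnet", "optical_inspection"):
        info["eval_dataset"] = eval_id
    if model_name == "optical_inspection":
        info["inference_dataset"] = test_id
    if model_name == "centerpose":
        info["eval_dataset"] = train_id
        info["inference_dataset"] = train_id
    if model_name == "visual_changenet" and ds_format == "visual_changenet_classify":
        info["eval_dataset"] = eval_id
        info["inference_dataset"] = test_id
    # Fail early, naming every dataset role that has no ID.
    missing = sorted(key for key, value in info.items() if value is None)
    if missing:
        raise ValueError(f"{model_name} needs dataset IDs for: {', '.join(missing)}")
    info["docker_env_vars"] = dict(docker_env_vars or {})
    return info
```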

### Assign PTM <a class="anchor" id="head-13"></a>

Search NGC for the pretrained models (PTMs) available for the chosen purpose-built model

In [None]:
# List all pretrained models for the chosen network architecture
endpoint = f"{base_url}/experiments:base"
params = {"network_arch": model_name}
response = requests.get(endpoint, params=params, headers=headers)
assert response.status_code in (200, 201)

response_json = response.json()["experiments"]

# Print the name, version and NGC path of each base experiment that has no encryption_key field
for rsp in response_json:
    rsp_keys = rsp.keys()
    if "encryption_key" not in rsp.keys():
        assert "name" in rsp_keys and "version" in rsp_keys and "ngc_path" in rsp_keys
        print(f'PTM Name: {rsp["name"]}; PTM version: {rsp["version"]}; NGC PATH: {rsp["ngc_path"]}')

In [None]:
# Map each purpose-built model to its default pretrained model (PTM).
# To use a different PTM backbone, change this map based on the output of the previous cell.
# Changing the backbone also requires updating the default specs for train, eval and later actions;
# for example, if you switch the PTM to resnet34, manually set the config key num_layers (if present) to 34.
visual_changenet_ptm = "visual_changenet_segmentation_levircd:visual_changenet_levircd_trainable_v1.0" # For segmentation
if model_name == 'visual_changenet' and ds_format == 'visual_changenet_classify':
    visual_changenet_ptm = "visual_changenet_classification:visual_changenet_nvpcb_trainable_v1.0"
pretrained_map = {"action_recognition":"actionrecognitionnet:trainable_rgb_3d",
                  "bevfusion": "bevfusion:bevfusion_1.0",
                  "ml_recog": "retail_object_recognition:trainable_v1.0",
                  "ocdnet": "ocdnet:trainable_resnet18_v1.0",
                  "ocrnet": "ocrnet:trainable_v1.0",
                  "optical_inspection": "optical_inspection:trainable_v1.0",
                  "pointpillars":"pointpillarnet:trainable_v1.0",
                  "pose_classification":"poseclassificationnet:trainable_v1.0",
                  "re_identification":"reidentificationnet:trainable_v1.1",
                  "visual_changenet":visual_changenet_ptm,
                  "centerpose": "pretrained_fan_classification_nvimagenet:fan_small_hybrid_nvimagenet"}
if model_name == "action_recognition":
    if model_type == "of":
        pretrained_map["action_recognition"] = "actionrecognitionnet:trainable_v2.0"
    elif model_type == "joint":
        pretrained_map["action_recognition"] = "actionrecognitionnet:trainable_v1.0,actionrecognitionnet:trainable_v2.0"
        
no_ptm_models = set()

In [None]:
if model_name not in no_ptm_models:
    # Get pretrained model
    endpoint = f"{base_url}/experiments:base"
    params = {"network_arch": model_name}
    response = requests.get(endpoint, params=params, headers=headers)
    assert response.status_code in (200, 201)

    response_json = response.json()["experiments"]
    ptm_model_names = pretrained_map[model_name].split(",")
    ptm = []

    # Search for ptm with given ngc path
    for ptm_model_name in ptm_model_names:
        ptm_id = None
        for rsp in response_json:
            rsp_keys = rsp.keys()
            assert "ngc_path" in rsp_keys
            if rsp["ngc_path"].endswith(ptm_model_name):
                assert "id" in rsp_keys
                ptm_id = rsp["id"]
                print("Metadata for model with requested NGC Path")
                print(rsp)
                break
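        assert ptm_id is not None, f"No pretrained model found with an NGC path ending in {ptm_model_name}"
        ptm.append(ptm_id)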
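The PTM lookup can also be packaged as a function that fails loudly when an NGC path suffix has no match, which helps when you edit pretrained_map. `resolve_ptm_ids` is a hypothetical helper; it assumes each base experiment carries the `id` and `ngc_path` fields used above.

```python
def resolve_ptm_ids(base_experiments: list, ptm_spec: str) -> list:
    """Map a comma-separated list of NGC path suffixes to base-experiment IDs."""
    ids = []
    for suffix in ptm_spec.split(","):
        # First base experiment whose ngc_path ends with this suffix, else None.
        match = next((exp["id"] for exp in base_experiments
                      if exp.get("ngc_path", "").endswith(suffix)), None)
        if match is None:
            raise LookupError(f"No base experiment has an ngc_path ending in {suffix!r}")
        ids.append(match)
    return ids
```

Usage: `ptm = resolve_ptm_ids(response_json, pretrained_map[model_name])`.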

In [None]:
if model_name not in no_ptm_models:
    ptm_information = {"base_experiment":ptm}
    data = json.dumps(ptm_information)
    endpoint = f"{base_url}/experiments/{experiment_id}"

    response = requests.patch(endpoint, data=data, headers=headers)
    assert response.status_code in (200, 201)

    print(response)
    print(json.dumps(response.json(), indent=4))

### Actions <a class="anchor" id="head-15"></a>

For all actions:
1. Get default spec schema and derive the default values
2. Modify defaults if needed
3. Post spec dictionary to the service
4. Run model action
5. Monitor the job by retrieving its status
6. Download results using job download endpoint (if needed)
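The steps above can be sketched as one function with the REST calls passed in. In the cells of this notebook the specs travel in the same request that starts the job, so steps 3 and 4 collapse into one call; step 6 is left out. `run_action` and its three callables are hypothetical wrappers, and the terminal statuses are the ones the monitoring cells check.

```python
import copy
import time
from typing import Callable, Optional, Tuple

# Terminal statuses checked by the monitoring cells in this notebook.
TERMINAL_STATUSES = ("Done", "Error", "Canceled", "Paused")


def run_action(get_schema: Callable[[str], dict],
               post_job: Callable[[str, dict, Optional[str]], str],
               get_status: Callable[[str], str],
               action: str,
               overrides: Optional[Callable[[dict], None]] = None,
               parent: Optional[str] = None,
               poll_interval: float = 15.0,
               sleep: Callable[[float], None] = time.sleep) -> Tuple[str, str]:
    """Run one action end to end and return (job_id, final_status)."""
    # Step 1: take the defaults from the schema (copied so the source is untouched).
    specs = copy.deepcopy(get_schema(action)["default"])
    # Step 2: modify defaults if needed.
    if overrides is not None:
        overrides(specs)
    # Steps 3-4: send the specs with the request that starts the job.
    job_id = post_job(action, specs, parent)
    # Step 5: poll until the job reaches a terminal status.
    status = get_status(job_id)
    while status not in TERMINAL_STATUSES:
        sleep(poll_interval)
        status = get_status(job_id)
    return job_id, status
```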

### Train <a class="anchor" id="head-16"></a>

#### View hyperparameters that are enabled for AutoML by default <a class="anchor" id="head-14"></a>

In [None]:
if automl_enabled:
    # Get default spec schema
    endpoint = f"{base_url}/experiments/{experiment_id}/specs/train/schema"
    while True:
        response = requests.get(endpoint, headers=headers)
        if response.status_code == 404:
            if "Base spec file download state is " in response.json()["error_desc"]:
                print("Base experiment spec file is being downloaded")
                time.sleep(2)
                continue
            else:
                break
        else:
            break
    assert response.status_code in (200, 201)
    assert "automl_default_parameters" in response.json().keys()
    automl_params = response.json()["automl_default_parameters"]
    print(json.dumps(automl_params, sort_keys=True, indent=4))
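The schema request above retries for as long as the base spec file is downloading, with no limit, and indexes `error_desc` directly. A bounded variant is sketched below; `get_schema_when_ready` is hypothetical, and `fetch` is assumed to return the response's status code and parsed JSON, for example by wrapping `requests.get(endpoint, headers=headers)`.

```python
import time
from typing import Callable, Tuple


def get_schema_when_ready(fetch: Callable[[], Tuple[int, dict]], retry_delay: float = 2.0,
                          max_attempts: int = 150,
                          sleep: Callable[[float], None] = time.sleep) -> dict:
    """Return the schema JSON, retrying while the base spec file is still downloading."""
    for _ in range(max_attempts):
        status_code, body = fetch()
        downloading = (status_code == 404 and
                       "Base spec file download state is " in body.get("error_desc", ""))
        if not downloading:
            if status_code not in (200, 201):
                raise RuntimeError(f"schema request failed with {status_code}: {body}")
            return body
        sleep(retry_delay)
    raise TimeoutError("base spec file was still downloading after all retries")
```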

#### Set AutoML related configurations <a class="anchor" id="head-16.1"></a>
Refer to the following links for the parameters supported by each network, and add parameters beyond the default AutoML-enabled ones if necessary:

[ActionRecognitionNet](https://github.com/NVIDIA/tao_front_end_services/tree/main/api/specs_utils/specs/action_recognition/action_recognition%20-%20train.csv), 
[MetricLearningRecognition](https://github.com/NVIDIA/tao_front_end_services/tree/main/api/specs_utils/specs/ml_recog/ml_recog%20-%20train.csv), 
[OCDNET](https://github.com/NVIDIA/tao_front_end_services/tree/main/api/specs_utils/specs/ocdnet/ocdnet%20-%20train.csv), 
[OCRNET](https://github.com/NVIDIA/tao_front_end_services/tree/main/api/specs_utils/specs/ocrnet/ocrnet%20-%20train.csv), 
[OpticalInspection](https://github.com/NVIDIA/tao_front_end_services/tree/main/api/specs_utils/specs/optical_inspection/optical_inspection%20-%20train.csv), 
[Pointpillars](https://github.com/NVIDIA/tao_front_end_services/tree/main/api/specs_utils/specs/pointpillars/pointpillars%20-%20train.csv), 
[PoseClassificationNet](https://github.com/NVIDIA/tao_front_end_services/tree/main/api/specs_utils/specs/pose_classification/pose_classification%20-%20train.csv), 
[ReIdentificationNet](https://github.com/NVIDIA/tao_front_end_services/tree/main/api/specs_utils/specs/re_identification/re_identification%20-%20train.csv)

In [None]:
if automl_enabled:
    # Choose any metric present in the kpi dictionary of the model's status.json.
    # An example status.json for each model is in the corresponding section of the NVIDIA TAO docs: https://docs.nvidia.com/tao/tao-toolkit/text/model_zoo/cv_models/index.html
    metric = "kpi"

    # Refer to the parameter lists linked above to add or remove parameters beyond the default enabled ones in automl_params

    automl_information = {"automl_enabled": True,
                          "automl_algorithm": automl_algorithm,
                          "automl_max_recommendations": 20, # Only for bayesian
                          "automl_R": 27, # Only for hyperband
                          "automl_nu": 3, # Only for hyperband
                          "epoch_multiplier": 1, # Only for hyperband
                          # Warning: disabled parameters are not tested by TAO, so overriding them may cause unexpected behaviour
                          "override_automl_disabled_params": False,
                          "automl_hyperparameters": str(automl_params)}
    data = json.dumps({"metric":metric, "automl_settings": automl_information})

    endpoint = f"{base_url}/experiments/{experiment_id}"

    response = requests.patch(endpoint, data=data, headers=headers)
    assert response.status_code in (200, 201)
    
    print(response)
    print(json.dumps(response.json(), sort_keys=True, indent=4))
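The settings dictionary above sends both the Bayesian and the Hyperband keys, and its comments note which algorithm reads each one. A builder that includes only the keys for the selected algorithm is sketched below. `build_automl_settings` is hypothetical, and it is an assumption that the service accepts a body without the unused keys; if it does not, keep sending all of them as the cell above does.

```python
def build_automl_settings(algorithm: str, hyperparameters,
                          max_recommendations: int = 20, R: int = 27, nu: int = 3,
                          epoch_multiplier: int = 1) -> dict:
    """Build an automl_settings dict containing only the keys the chosen algorithm uses."""
    settings = {
        "automl_enabled": True,
        "automl_algorithm": algorithm,
        "override_automl_disabled_params": False,
        # The cell above sends the hyperparameters as their string representation.
        "automl_hyperparameters": str(hyperparameters),
    }
    if algorithm == "bayesian":
        settings["automl_max_recommendations"] = max_recommendations
    elif algorithm == "hyperband":
        settings["automl_R"] = R
        settings["automl_nu"] = nu
        settings["epoch_multiplier"] = epoch_multiplier
    else:
        raise ValueError(f"automl_algorithm must be bayesian or hyperband, got {algorithm!r}")
    return settings
```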

In [None]:
# Get default spec schema
endpoint = f"{base_url}/experiments/{experiment_id}/specs/train/schema"
while True:
    response = requests.get(endpoint, headers=headers)
    if response.status_code == 404:
        if "Base spec file download state is " in response.json()["error_desc"]:
            print("Base experiment spec file is being downloaded")
            time.sleep(2)
            continue
        else:
            break
    else:
        break

assert response.status_code in (200, 201)
assert "default" in response.json().keys()

print(response)
# print(json.dumps(response.json(), indent=4)) ## Uncomment for verbose schema
train_specs = response.json()["default"]
print(json.dumps(train_specs, sort_keys=True, indent=4))

In [None]:
# Apply changes for any of the parameters listed in the previous cell as required
train_specs["train"]["num_epochs"] = 30
train_specs["train"]["checkpoint_interval"] = 10
train_specs["train"]["validation_interval"] = 10
train_specs["train"]["num_gpus"] = 1
if model_name == "action_recognition":
    train_specs["model"]["model_type"] = model_type
    train_specs["model"]["input_type"] = model_input_type
    train_specs["dataset"]["batch_size"] = 2
    train_specs["dataset"]["label_map"] = {"catch": 0, "smile": 1}
elif model_name == "centerpose":
    train_specs["dataset"]["category"] = "bike"
elif model_name == "ocdnet":
    train_specs["dataset"]["train_dataset"]["loader"]["batch_size"] = 16
elif model_name == "ocrnet":
    train_specs["dataset"]["batch_size"] = 16
elif model_name == "pose_classification":
    if model_type == "nvidia":
        train_specs["dataset"]["num_classes"] = 6
        train_specs["model"]["graph_layout"] = "nvidia"
        train_specs["dataset"]["label_map"] = {"sitting_down": 0,"getting_up": 1,"sitting": 2,"standing": 3,"walking": 4,"jumping": 5}
    elif model_type == "kinetics":
        train_specs["dataset"]["num_classes"] = 5
        train_specs["model"]["graph_layout"] = "openpose"
        train_specs["dataset"]["label_map"] = {"front_raises": 0,"pull_ups": 1,"clean_and_jerk": 2,"presenting_weather_forecast": 3,"deadlifting": 4}
elif model_name == "re_identification":
    train_specs["dataset"]["num_classes"] = 100 #The number set in obtain_subset script
    train_specs["dataset"]["num_workers"] = 4 #Modify the num_workers according to your hardware setup
    train_specs["dataset"]["batch_size"] = 16 #Modify the batch_size according to your hardware setup
elif model_name == "visual_changenet":
    if ds_format == "visual_changenet_segment":
        train_specs["task"] = 'segment'
    elif ds_format == "visual_changenet_classify":
        train_specs["task"] = 'classify'
print(json.dumps(train_specs, sort_keys=True, indent=4))
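The spec dictionaries are nested several levels deep (for example `dataset.train_dataset.loader.batch_size` for ocdnet), and a mistyped key silently creates a new entry that the trainer ignores. A small sketch of a dotted-path setter that fails loudly on a missing section follows; `set_spec` is a hypothetical helper, not part of TAO.

```python
def set_spec(specs: dict, path: str, value) -> None:
    """Set a nested spec entry from a dotted path such as "train.num_epochs".

    Every section before the last key must already exist, so a mistyped
    section name raises KeyError instead of silently adding a new section.
    """
    *sections, leaf = path.split(".")
    node = specs
    for key in sections:
        child = node.get(key)
        if not isinstance(child, dict):
            raise KeyError(f"Spec section {key!r} not found in path {path!r}")
        node = child
    node[leaf] = value


# Example on a small stand-in for the default train specs
example_specs = {"train": {"num_epochs": 10}, "dataset": {"train_dataset": {"loader": {"batch_size": 8}}}}
set_spec(example_specs, "train.num_epochs", 30)
set_spec(example_specs, "dataset.train_dataset.loader.batch_size", 16)
print(example_specs["train"]["num_epochs"], example_specs["dataset"]["train_dataset"]["loader"]["batch_size"])  # 30 16
```

Only intermediate sections are checked; the final key may be new, which matches how the cell above adds entries such as `label_map`.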

In [None]:
# Run action
parent = job_map.get("eval_dataset_convert_"+model_name, job_map.get("train_dataset_convert_"+model_name, None))
parent_id = train_dataset_id
if model_name == "ocrnet": # Only model with eval dataset convert on eval dataset
    parent_id = eval_dataset_id
action = "train"
data = json.dumps({"parent_job_id":parent,"action":action,"specs":train_specs,
                  #  "platform_id": "9af1aa90-8ea5-5a11-98d9-3879cd0da92c",  # Pick a platform from the output of {base_url}:gpu_types depending on GPU_type and instance_type
                   })
endpoint = f"{base_url}/experiments/{experiment_id}/jobs"

response = requests.post(endpoint, data=data, headers=headers)
assert response.status_code in (200, 201)
assert response.json()

print(response)
print(json.dumps(response.json(), indent=4))

if model_name == "visual_changenet":
    job_map["train_" + ds_format] = response.json()
else:
    job_map["train_" + model_name] = response.json()
print(job_map)
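Every cell in this notebook repeats the same branch when reading or writing `job_map`: visual_changenet jobs are keyed by dataset format, all other models by model name. A minimal sketch of that convention as a single function follows; `job_key` is a hypothetical helper that mirrors the keys used in the cells, not a TAO API.

```python
def job_key(action: str, model_name: str, ds_format: str = "") -> str:
    """Return the job_map key this notebook uses for an action.

    visual_changenet jobs are keyed by dataset format (segment vs classify)
    so both variants can coexist; every other model is keyed by model name.
    """
    if model_name == "visual_changenet":
        if not ds_format:
            raise ValueError("visual_changenet jobs need ds_format")
        return f"{action}_{ds_format}"
    return f"{action}_{model_name}"


print(job_key("train", "ocrnet"))  # train_ocrnet
print(job_key("train", "visual_changenet", "visual_changenet_segment"))  # train_visual_changenet_segment
```

With it, `job_map[job_key("train", model_name, ds_format)] = response.json()` replaces the four-line `if`/`else` used after each job submission.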

In [None]:
# Monitor job status by repeatedly running this cell
# For AutoML: training times for different models, benchmarked on a single-GPU V100 machine, are listed here: https://docs.nvidia.com/tao/tao-toolkit/text/automl/automl.html#results-of-automl-experiments

if model_name == "visual_changenet":
    job_id = job_map["train_" + ds_format]
else:
    job_id = job_map["train_" + model_name]
endpoint = f"{base_url}/experiments/{experiment_id}/jobs/{job_id}"

while True:
    clear_output(wait=True)
    response = requests.get(endpoint, headers=headers)
    if "error_desc" in response.json().keys() and response.json()["error_desc"] in ("Job trying to retrieve not found", "No AutoML run found"):
        print("Job is being created")
        time.sleep(5)
        continue
    print(response)
    print(json.dumps(response.json(), sort_keys=True, indent=4))
    assert response.status_code in (200, 201)
    assert "status" in response.json().keys() and response.json().get("status") != "Error"
    if response.json().get("status") in ["Done","Error", "Canceled", "Paused"] or response.status_code not in (200,201):
        break
    time.sleep(15)
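The monitoring loops in this notebook share one pattern: wait while the job record is being created, then poll until the status is `Done`, `Error`, `Canceled` or `Paused`. A sketch of that loop as a reusable function is below. `wait_for_job`, its optional `timeout`, and the `on_update` callback are hypothetical; unlike the cells, it returns the final body instead of asserting on `Error`, leaving that decision to the caller.

```python
import time

# Statuses after which the notebook stops polling
TERMINAL_STATUSES = {"Done", "Error", "Canceled", "Paused"}
# error_desc values returned while the job record is still being created
NOT_READY_ERRORS = {"Job trying to retrieve not found", "No AutoML run found"}


def wait_for_job(fetch, endpoint, headers, poll_interval=15.0, timeout=None, on_update=None):
    """Poll a job endpoint until it reports a terminal status.

    fetch is any callable shaped like requests.get(url, headers=...).
    Returns the final JSON body; the caller decides how to treat "Error".
    """
    start = time.monotonic()
    while True:
        response = fetch(endpoint, headers=headers)
        body = response.json()
        if body.get("error_desc") not in NOT_READY_ERRORS:
            if response.status_code not in (200, 201):
                raise RuntimeError(f"Status request failed with HTTP {response.status_code}: {body}")
            if on_update is not None:
                on_update(body)  # e.g. clear the cell output and print the body
            if body.get("status") in TERMINAL_STATUSES:
                return body
        if timeout is not None and time.monotonic() - start > timeout:
            raise TimeoutError(f"Job not finished after {timeout} s")
        time.sleep(poll_interval)
```

In the notebook, `fetch` would be `requests.get`, and `on_update` could be a small function that calls `clear_output(wait=True)` and prints `json.dumps(body, indent=4)`.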

In [None]:
## To stop an AutoML job
#    1. Manually stop the 'Monitor job status by repeatedly running this cell' cell (the cell immediately before this one)
#    2. Uncomment the snippet in the next cell and run it

In [None]:
# if automl_enabled:
#     if model_name == "visual_changenet":
#         job_id = job_map["train_" + ds_format]
#     else:
#         job_id = job_map["train_" + model_name]

#     endpoint = f"{base_url}/experiments/{experiment_id}/jobs/{job_id}:pause"

#     response = requests.post(endpoint, headers=headers)
#     assert response.status_code in (200, 201)

#     print(response)
#     print(json.dumps(response.json(), indent=4))

In [None]:
## Resume AutoML

In [None]:
# # Uncomment the snippet below to resume a stopped AutoML job, then rerun the 'Monitor job status by repeatedly running this cell' cell above (the fourth cell above this one)
# if automl_enabled:
#     if model_name == "visual_changenet":
#         job_id = job_map["train_" + ds_format]
#     else:
#         job_id = job_map["train_" + model_name]
#     endpoint = f"{base_url}/experiments/{experiment_id}/jobs/{job_id}:resume"

#     data = json.dumps({"parent_job_id":parent,"specs":train_specs,
#                    "platform_id": "9af1aa90-8ea5-5a11-98d9-3879cd0da92c",  # Pick a platform from the output of {base_url}:gpu_types depending on GPU_type and instance_type
#                    })
#     response = requests.post(endpoint, data=data, headers=headers)
#     assert response.status_code in (200, 201)

#     print(response)
#     print(json.dumps(response.json(), indent=4))

### Publish model

#### Edit the method for choosing a checkpoint from the list of training checkpoint files

In [None]:
# Get model handler parameters
endpoint = f"{base_url}/experiments/{experiment_id}"
response = requests.get(endpoint, headers=headers)
assert response.status_code in (200, 201)
assert response.json()

model_parameters = response.json()
update_checkpoint_choosing = {}
update_checkpoint_choosing["checkpoint_choose_method"] = model_parameters["checkpoint_choose_method"]
update_checkpoint_choosing["checkpoint_epoch_number"] = model_parameters["checkpoint_epoch_number"]
print(update_checkpoint_choosing)

In [None]:
# Change how the checkpoint from the parent action is chosen when the parent action is a train/retrain action.
# The example below sets it before the evaluate action; it applies the same way to other actions
update_checkpoint_choosing["checkpoint_choose_method"] = "latest_model" # Choose between best_model/latest_model/from_epoch_number
# If from_epoch_number is chosen, assign the epoch number to the dictionary key in the format 'from_epoch_number_{train_job_id}'
# update_checkpoint_choosing["checkpoint_epoch_number"]["from_epoch_number_28a2754e-50ef-43a8-9733-98913776dd90"] = 3
data = json.dumps(update_checkpoint_choosing)

endpoint = f"{base_url}/experiments/{experiment_id}"

response = requests.patch(endpoint, data=data, headers=headers)
assert response.status_code in (200, 201)

print(response)
print(json.dumps(response.json(), sort_keys=True, indent=4))
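Building the checkpoint-selection body by hand makes it easy to mistype a method name or the `from_epoch_number_<train_job_id>` key shown in the commented example line in the cell above. A minimal sketch of a validating builder follows; `checkpoint_update` is a hypothetical helper, and it assumes the three method names listed in that cell are the only accepted values.

```python
VALID_METHODS = ("best_model", "latest_model", "from_epoch_number")


def checkpoint_update(method, train_job_id=None, epoch=None, current_epochs=None):
    """Build the PATCH body for choosing which parent checkpoint is used.

    For from_epoch_number, the epoch is stored under the key
    'from_epoch_number_<train_job_id>'.
    """
    if method not in VALID_METHODS:
        raise ValueError(f"method must be one of {VALID_METHODS}, got {method!r}")
    epochs = dict(current_epochs or {})  # copy so the caller's dict is untouched
    if method == "from_epoch_number":
        if train_job_id is None or epoch is None:
            raise ValueError("from_epoch_number needs both train_job_id and epoch")
        epochs[f"from_epoch_number_{train_job_id}"] = int(epoch)
    return {"checkpoint_choose_method": method, "checkpoint_epoch_number": epochs}


body = checkpoint_update("from_epoch_number", train_job_id="28a2754e-50ef-43a8-9733-98913776dd90", epoch=3)
print(body["checkpoint_epoch_number"])  # {'from_epoch_number_28a2754e-50ef-43a8-9733-98913776dd90': 3}
```

The result can be sent with `data = json.dumps(checkpoint_update(...))`; passing `current_epochs=model_parameters["checkpoint_epoch_number"]` keeps any epochs already set for other train jobs.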

#### Push model to a private NGC team registry

In [None]:
if model_name == "visual_changenet":
    job_id = job_map["train_" + ds_format]
else:
    job_id = job_map["train_" + model_name]
data = json.dumps({"display_name": f"TAO {model_name}",
                   "description": f"Train {model_name}",
                   "team_name":"tao_ea"})

endpoint = f"{base_url}/experiments/{experiment_id}/jobs/{job_id}:publish_model"

response = requests.post(endpoint, data=data, headers=headers)
assert response.status_code in (200, 201)
assert response.json()

print(response)
print(json.dumps(response.json(), indent=4))

#### Remove model from a private NGC team registry

In [None]:
# endpoint = f"{base_url}/experiments/{experiment_id}/jobs/{job_id}:remove_published_model"
# params = {"team_name": "tao_ea"}
# response = requests.delete(endpoint, params=params, headers=headers)
# assert response.status_code in (200, 201)
# assert response.json()

# print(response)
# print(json.dumps(response.json(), indent=4))

### Evaluate <a class="anchor" id="head-17"></a>

In [None]:
# Get default spec schema
endpoint = f"{base_url}/experiments/{experiment_id}/specs/evaluate/schema"
while True:
    response = requests.get(endpoint, headers=headers)
    if response.status_code == 404:
        if "Base spec file download state is " in response.json()["error_desc"]:
            print("Base experiment spec file is being downloaded")
            time.sleep(2)
            continue
        else:
            break
    else:
        break

assert response.status_code in (200, 201)

print(response)
#print(json.dumps(response.json(), indent=4)) ## Uncomment for verbose schema
assert "default" in response.json().keys()
eval_specs = response.json()["default"]
print(json.dumps(eval_specs, sort_keys=True, indent=4))

In [None]:
# Apply changes
if model_name == "action_recognition":
    eval_specs["model"]["model_type"] = model_type
    eval_specs["model"]["input_type"] = model_input_type
    eval_specs["dataset"]["label_map"] = {"catch": 0, "smile": 1}
elif model_name == "pose_classification":
    if model_type == "nvidia":
        eval_specs["dataset"]["num_classes"] = 6
        eval_specs["model"]["graph_layout"] = "nvidia"
        eval_specs["dataset"]["label_map"] = {"sitting_down": 0,"getting_up": 1,"sitting": 2,"standing": 3,"walking": 4,"jumping": 5}
    elif model_type == "kinetics":
        eval_specs["dataset"]["num_classes"] = 5
        eval_specs["model"]["graph_layout"] = "openpose"
        eval_specs["dataset"]["label_map"] = {"front_raises": 0,"pull_ups": 1,"clean_and_jerk": 2,"presenting_weather_forecast": 3,"deadlifting": 4}
elif model_name == "re_identification":
    eval_specs["dataset"]["num_classes"] = 100 #The number set in obtain_subset script
elif model_name == "visual_changenet" and ds_format == 'visual_changenet_segment':
    eval_specs["task"] = 'segment'
elif model_name == "visual_changenet" and ds_format == 'visual_changenet_classify':
    eval_specs["task"] = 'classify'
    eval_specs["train"]["classify"]["loss"] = "contrastive"
elif model_name == "centerpose":
    eval_specs["dataset"]["category"] = "bike"
print(json.dumps(eval_specs, sort_keys=True, indent=4))

In [None]:
# Run action
if model_name == "visual_changenet":
    parent = job_map["train_" + ds_format]
else:
    parent = job_map["train_" + model_name]
action = "evaluate"
data = json.dumps({"parent_job_id":parent,"action":action,"specs":eval_specs,
                  #  "platform_id": "9af1aa90-8ea5-5a11-98d9-3879cd0da92c",  # Pick a platform from the output of {base_url}:gpu_types depending on GPU_type and instance_type
                   })

endpoint = f"{base_url}/experiments/{experiment_id}/jobs"

response = requests.post(endpoint, data=data, headers=headers)
assert response.status_code in (200, 201)
assert response.json()

print(response)
print(json.dumps(response.json(), indent=4))

if model_name == "visual_changenet":
    job_map["evaluate_" + ds_format] = response.json()
else:
    job_map["evaluate_" + model_name] = response.json()
print(job_map)

In [None]:
# Monitor job status by repeatedly running this cell
if model_name == "visual_changenet":
    job_id = job_map["evaluate_" + ds_format]
else:
    job_id = job_map["evaluate_" + model_name]
endpoint = f"{base_url}/experiments/{experiment_id}/jobs/{job_id}"

while True:
    clear_output(wait=True)
    response = requests.get(endpoint, headers=headers)
    assert response.status_code in (200, 201)
    print(response)
    print(json.dumps(response.json(), indent=4))
    assert "status" in response.json().keys() and response.json().get("status") != "Error"
    if response.json().get("status") in ["Done","Error", "Canceled", "Paused"] or response.status_code not in (200,201):
        break
    time.sleep(15)

### Prune, Retrain and Evaluate <a class="anchor" id="head-18"></a>

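- The following cells optimize the trained model by pruning it and then retraining the pruned model to recover accuracy lost during pruning. These steps run only for ocdnet, ocrnet and pointpillars.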
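Each optimization step below submits a job whose `parent_job_id` is the job ID of the previous step: prune takes the train job, retrain takes the prune job, and the final evaluate takes the retrain job. The sketch below records that lineage and builds the JSON body the cells POST to `{base_url}/experiments/{experiment_id}/jobs`. `OPTIMIZE_PARENT` and `job_payload` are hypothetical names; the body fields are the ones used in the cells.

```python
import json

# Parent action of each optimization step, as wired in the cells below
# (the post-retrain evaluate job is stored in job_map under "eval_retrain_<model>")
OPTIMIZE_PARENT = {"prune": "train", "retrain": "prune", "evaluate": "retrain"}


def job_payload(parent_job_id, action, specs, platform_id=None):
    """Serialize the body POSTed to {base_url}/experiments/{experiment_id}/jobs."""
    body = {"parent_job_id": parent_job_id, "action": action, "specs": specs}
    if platform_id is not None:
        # Optional: pin the job to a platform taken from the gpu_types output
        body["platform_id"] = platform_id
    return json.dumps(body)


for step, parent_step in OPTIMIZE_PARENT.items():
    print(f"{step}: parent is the {parent_step} job")
```

For example, the retrain cell's body is equivalent to `job_payload(job_map["prune_" + model_name], "retrain", retrain_specs)`.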

#### Prune

In [None]:
# Get default spec schema
if model_name in ("ocdnet", "ocrnet", "pointpillars"):

    endpoint = f"{base_url}/experiments/{experiment_id}/specs/prune/schema"
    while True:
        response = requests.get(endpoint, headers=headers)
        if response.status_code == 404:
            if "Base spec file download state is " in response.json()["error_desc"]:
                print("Base experiment spec file is being downloaded")
                time.sleep(2)
                continue
            else:
                break
        else:
            break

    assert response.status_code in (200, 201)

    print(response)
    #print(json.dumps(response.json(), indent=4)) ## Uncomment for verbose schema
    assert "default" in response.json().keys()
    prune_specs = response.json()["default"]
    print(json.dumps(prune_specs, sort_keys=True, indent=4))

In [None]:
# Apply changes
# No changes are needed for prune
if model_name in ("ocdnet", "ocrnet", "pointpillars"):
    print(json.dumps(prune_specs, sort_keys=True, indent=4))

In [None]:
# Run actions
if model_name in ("ocdnet", "ocrnet", "pointpillars"):
    parent = job_map["train_" + model_name]
    action = "prune"
    data = json.dumps({"parent_job_id":parent,"action":action,"specs":prune_specs,
                  #  "platform_id": "9af1aa90-8ea5-5a11-98d9-3879cd0da92c",  # Pick a platform from the output of {base_url}:gpu_types depending on GPU_type and instance_type
                   })

    endpoint = f"{base_url}/experiments/{experiment_id}/jobs"

    response = requests.post(endpoint, data=data, headers=headers)
    assert response.status_code in (200, 201)
    assert response.json()

    print(response)
    print(json.dumps(response.json(), indent=4))

    job_map["prune_" + model_name] = response.json()
    print(job_map)

In [None]:
# Monitor job status by repeatedly running this cell (prune)
if model_name in ("ocdnet", "ocrnet", "pointpillars"):
    job_id = job_map["prune_" + model_name]
    endpoint = f"{base_url}/experiments/{experiment_id}/jobs/{job_id}"

    while True:
        clear_output(wait=True)
        response = requests.get(endpoint, headers=headers)
        assert response.status_code in (200, 201)
        print(response)
        print(json.dumps(response.json(), indent=4))
        assert "status" in response.json().keys() and response.json().get("status") != "Error"
        if response.json().get("status") in ["Done","Error", "Canceled", "Paused"] or response.status_code not in (200,201):
            break
        time.sleep(15)

#### Retrain

In [None]:
# Get default spec schema
if model_name in ("ocdnet", "ocrnet", "pointpillars"):
    endpoint = f"{base_url}/experiments/{experiment_id}/specs/retrain/schema"
    while True:
        response = requests.get(endpoint, headers=headers)
        if response.status_code == 404:
            if "Base spec file download state is " in response.json()["error_desc"]:
                print("Base experiment spec file is being downloaded")
                time.sleep(2)
                continue
            else:
                break
        else:
            break

    assert response.status_code in (200, 201)

    print(response)
    #print(json.dumps(response.json(), indent=4)) ## Uncomment for verbose schema
    assert "default" in response.json().keys()
    retrain_specs = response.json()["default"]
    print(json.dumps(retrain_specs, sort_keys=True, indent=4))

In [None]:
# Apply changes for any of the parameters listed in the previous cell as required
if model_name in ("ocdnet", "ocrnet", "pointpillars"):
    retrain_specs["train"]["num_epochs"] = 30
    retrain_specs["train"]["checkpoint_interval"] = 10
    retrain_specs["train"]["validation_interval"] = 10
    retrain_specs["train"]["num_gpus"] = 1
    if model_name == "ocdnet":
        retrain_specs["dataset"]["train_dataset"]["loader"]["batch_size"] = 16
    elif model_name == "ocrnet":
        retrain_specs["dataset"]["batch_size"] = 16
    print(json.dumps(retrain_specs, sort_keys=True, indent=4))

In [None]:
# Run actions
if model_name in ("ocdnet", "ocrnet", "pointpillars"):
    parent = job_map["prune_" + model_name]
    action = "retrain"
    data = json.dumps({"parent_job_id":parent,"action":action,"specs":retrain_specs,
                  #  "platform_id": "9af1aa90-8ea5-5a11-98d9-3879cd0da92c",  # Pick a platform from the output of {base_url}:gpu_types depending on GPU_type and instance_type
                   })

    endpoint = f"{base_url}/experiments/{experiment_id}/jobs"

    response = requests.post(endpoint, data=data, headers=headers)
    assert response.status_code in (200, 201)
    assert response.json()

    print(response)
    print(json.dumps(response.json(), indent=4))

    job_map["retrain_" + model_name] = response.json()
    print(job_map)

In [None]:
# Monitor job status by repeatedly running this cell (retrain)
if model_name in ("ocdnet", "ocrnet", "pointpillars"):
    job_id = job_map["retrain_" + model_name]
    endpoint = f"{base_url}/experiments/{experiment_id}/jobs/{job_id}"

    while True:
        clear_output(wait=True)
        response = requests.get(endpoint, headers=headers)
        assert response.status_code in (200, 201)
        print(response)
        print(json.dumps(response.json(), indent=4))
        assert "status" in response.json().keys() and response.json().get("status") != "Error"
        if response.json().get("status") in ["Done","Error", "Canceled", "Paused"] or response.status_code not in (200,201):
            break
        time.sleep(15)

In [None]:
# Optional cancel job - for jobs that are pending/running (retrain)

# if model_name == "pointpillars":
#     job_id = job_map["retrain_" + model_name]
#     endpoint = f"{base_url}/experiments/{experiment_id}/jobs/{job_id}:pause"

#     response = requests.post(endpoint, headers=headers)
#     assert response.status_code in (200, 201)

#     print(response)
#     print(json.dumps(response.json(), indent=4))

In [None]:
# Optional delete job - for jobs that are error/done (retrain)

# if model_name == "pointpillars":
#     job_id = job_map["retrain_" + model_name]
#     endpoint = f"{base_url}/experiments/{experiment_id}/jobs/{job_id}"

#     response = requests.delete(endpoint, headers=headers)
#     assert response.status_code in (200, 201)

#     print(response)
#     print(json.dumps(response.json(), indent=4))

#### Evaluate after retrain

In [None]:
# Get default spec schema
if model_name in ("ocdnet", "ocrnet", "pointpillars"):
    endpoint = f"{base_url}/experiments/{experiment_id}/specs/evaluate/schema"
    while True:
        response = requests.get(endpoint, headers=headers)
        if response.status_code == 404:
            if "Base spec file download state is " in response.json()["error_desc"]:
                print("Base experiment spec file is being downloaded")
                time.sleep(2)
                continue
            else:
                break
        else:
            break

    assert response.status_code in (200, 201)

    print(response)
    #print(json.dumps(response.json(), indent=4)) ## Uncomment for verbose schema
    assert "default" in response.json().keys()
    eval_retrain_specs = response.json()["default"]
    print(json.dumps(eval_retrain_specs, sort_keys=True, indent=4))

In [None]:
# Apply changes to specs if necessary
if model_name in ("ocdnet", "ocrnet", "pointpillars"):
    print(json.dumps(eval_retrain_specs, sort_keys=True, indent=4))

In [None]:
# Run actions
if model_name in ("ocdnet", "ocrnet", "pointpillars"):
    parent = job_map["retrain_" + model_name]
    action = "evaluate"
    data = json.dumps({"parent_job_id":parent,"action":action,"specs":eval_retrain_specs,
                  #  "platform_id": "9af1aa90-8ea5-5a11-98d9-3879cd0da92c",  # Pick a platform from the output of {base_url}:gpu_types depending on GPU_type and instance_type
                   })

    endpoint = f"{base_url}/experiments/{experiment_id}/jobs"

    response = requests.post(endpoint, data=data, headers=headers)
    assert response.status_code in (200, 201)
    assert response.json()

    print(response)
    print(json.dumps(response.json(), indent=4))

    job_map["eval_retrain_" + model_name] = response.json()
    print(job_map)

In [None]:
# Monitor job status by repeatedly running this cell (evaluate)
if model_name in ("ocdnet", "ocrnet", "pointpillars"):
    job_id = job_map["eval_retrain_" + model_name]
    endpoint = f"{base_url}/experiments/{experiment_id}/jobs/{job_id}"

    while True:
        clear_output(wait=True)
        response = requests.get(endpoint, headers=headers)
        assert response.status_code in (200, 201)
        print(response)
        print(json.dumps(response.json(), indent=4))
        assert "status" in response.json().keys() and response.json().get("status") != "Error"
        if response.json().get("status") in ["Done","Error", "Canceled", "Paused"] or response.status_code not in (200,201):
            break
        time.sleep(15)

### Export <a class="anchor" id="head-19"></a>

In [None]:
if model_name != "bevfusion":
    # Get default spec schema
    endpoint = f"{base_url}/experiments/{experiment_id}/specs/export/schema"
    while True:
        response = requests.get(endpoint, headers=headers)
        if response.status_code == 404:
            if "Base spec file download state is " in response.json()["error_desc"]:
                print("Base experiment spec file is being downloaded")
                time.sleep(2)
                continue
            else:
                break
        else:
            break
    assert response.status_code in (200, 201)

    print(response)
    # print(json.dumps(response.json(), indent=4)) ## Uncomment for verbose schema
    assert "default" in response.json().keys()
    export_specs = response.json()["default"]
    print(json.dumps(export_specs, sort_keys=True, indent=4))

In [None]:
# Apply changes to the export_specs dictionary if necessary
if model_name == "action_recognition":
    export_specs["model"]["model_type"] = model_type
    export_specs["model"]["input_type"] = model_input_type
    export_specs["dataset"]["label_map"] = {"catch": 0, "smile": 1}
elif model_name == "pose_classification":
    if model_type == "nvidia":
        export_specs["dataset"]["num_classes"] = 6
        export_specs["model"]["graph_layout"] = "nvidia"
        export_specs["dataset"]["label_map"] = {"sitting_down": 0,"getting_up": 1,"sitting": 2,"standing": 3,"walking": 4,"jumping": 5}
    elif model_type == "kinetics":
        export_specs["dataset"]["num_classes"] = 5
        export_specs["model"]["graph_layout"] = "openpose"
        export_specs["dataset"]["label_map"] = {"front_raises": 0,"pull_ups": 1,"clean_and_jerk": 2,"presenting_weather_forecast": 3,"deadlifting": 4}
elif model_name == "re_identification":
    export_specs["dataset"]["num_classes"] = 100 #The number set in obtain_subset script
elif model_name == "visual_changenet" and ds_format == 'visual_changenet_segment':
    export_specs["export"]["input_height"] = 256 
    export_specs["export"]["input_width"] = 256 
    export_specs["task"] = 'segment'
elif model_name == "visual_changenet" and ds_format == 'visual_changenet_classify':
    export_specs["export"]["input_height"] = 448 
    export_specs["export"]["input_width"] = 448
    export_specs["task"] = 'classify'
if model_name != "bevfusion":
    print(json.dumps(export_specs, sort_keys=True, indent=4))

In [None]:
# Run action
if model_name != "bevfusion":
    if model_name == "visual_changenet":
        parent = job_map["train_" + ds_format]
    else:
        parent = job_map["train_" + model_name]
    action = "export"
    data = json.dumps({"parent_job_id":parent,"action":action,"specs":export_specs,
                    #  "platform_id": "9af1aa90-8ea5-5a11-98d9-3879cd0da92c",  # Pick a platform from the output of {base_url}:gpu_types depending on GPU_type and instance_type
                    })

    endpoint = f"{base_url}/experiments/{experiment_id}/jobs"

    response = requests.post(endpoint, data=data, headers=headers)
    assert response.status_code in (200, 201)
    assert response.json()

    print(response)
    print(json.dumps(response.json(), indent=4))

    if model_name == "visual_changenet":
        job_map["export_" + ds_format] = response.json()
    else:
        job_map["export_" + model_name] = response.json()
    print(job_map)

In [None]:
# Monitor job status by repeatedly running this cell
if model_name != "bevfusion":
    if model_name == "visual_changenet":
        job_id = job_map["export_" + ds_format]
    else:
        job_id = job_map["export_" + model_name]
    endpoint = f"{base_url}/experiments/{experiment_id}/jobs/{job_id}"

    while True:
        clear_output(wait=True)
        response = requests.get(endpoint, headers=headers)
        assert response.status_code in (200, 201)
        print(response)
        print(json.dumps(response.json(), indent=4))
        assert "status" in response.json().keys() and response.json().get("status") != "Error"
        if response.json().get("status") in ["Done","Error", "Canceled", "Paused"] or response.status_code not in (200,201):
            break
        time.sleep(15)

### TRT Engine generation using TAO-Deploy <a class="anchor" id="head-20"></a>

- Here, the exported model is converted into a TensorRT engine for the target platform
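Which optional stages run depends on `model_name`, and the same tuples of model names are repeated across many cells. The sketch below gathers them in one place: the sets and precision choices are copied from the `model_name` checks in this notebook, while `runs_stage` and `trt_data_type` are hypothetical helpers, not TAO APIs.

```python
# Optional stages per model, copied from the model_name checks in this notebook
PRUNE_RETRAIN_MODELS = {"ocdnet", "ocrnet", "pointpillars"}
TRT_MODELS = {"ocdnet", "ocrnet", "optical_inspection", "ml_recog", "visual_changenet", "centerpose"}
NO_EXPORT_MODELS = {"bevfusion"}


def runs_stage(model_name: str, stage: str) -> bool:
    """Return True if this notebook runs `stage` for `model_name`."""
    if stage in ("prune", "retrain"):
        return model_name in PRUNE_RETRAIN_MODELS
    if stage in ("gen_trt_engine", "trt_inference"):
        return model_name in TRT_MODELS
    if stage == "export":
        return model_name not in NO_EXPORT_MODELS
    return True  # train, evaluate and TAO inference run for every model


def trt_data_type(model_name: str, ds_format: str = ""):
    """Precision the gen_trt_engine cell sets; None keeps the schema default."""
    if model_name in ("ml_recog", "ocdnet"):
        return "INT8"
    if model_name in ("ocrnet", "optical_inspection"):
        return "fp16"
    if model_name == "visual_changenet" and ds_format == "visual_changenet_segment":
        return "fp16"
    return None


print(runs_stage("pointpillars", "gen_trt_engine"))  # False
print(trt_data_type("ocdnet"))  # INT8
```

Keeping these lists in one table means adding a model to a stage is a one-line change instead of an edit to every cell that repeats the tuple.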

In [None]:
# Get default spec schema
if model_name in ("ocdnet", "ocrnet", "optical_inspection", "ml_recog", "visual_changenet", "centerpose"):
    endpoint = f"{base_url}/experiments/{experiment_id}/specs/gen_trt_engine/schema"
    while True:
        response = requests.get(endpoint, headers=headers)
        if response.status_code == 404:
            if "Base spec file download state is " in response.json()["error_desc"]:
                print("Base experiment spec file is being downloaded")
                time.sleep(2)
                continue
            else:
                break
        else:
            break
    assert response.status_code in (200, 201)

    print(response)
    #print(json.dumps(response.json(), indent=4)) ## Uncomment for verbose schema
    assert "default" in response.json().keys()
    tao_deploy_specs = response.json()["default"]
    print(json.dumps(tao_deploy_specs, sort_keys=True, indent=4))

In [None]:
# Apply changes
if model_name in ("ocdnet", "ocrnet", "optical_inspection", "ml_recog", "visual_changenet", "centerpose"):
    if model_name in ("ml_recog", "ocdnet"):
        tao_deploy_specs["gen_trt_engine"]["tensorrt"]["data_type"] = "INT8"
    elif model_name in ("ocrnet", "optical_inspection"):
        tao_deploy_specs["gen_trt_engine"]["tensorrt"]["data_type"] = "fp16"
    elif model_name == "visual_changenet" and ds_format == 'visual_changenet_classify':
        tao_deploy_specs["task"] = 'classify'
    elif model_name == "visual_changenet" and ds_format == 'visual_changenet_segment':
        tao_deploy_specs["gen_trt_engine"]["tensorrt"]["data_type"] = "fp16"
        tao_deploy_specs["task"] = 'segment'
    print(json.dumps(tao_deploy_specs, sort_keys=True, indent=4))        

In [None]:
# Run action
if model_name in ("ocdnet", "ocrnet", "optical_inspection", "ml_recog", "visual_changenet", "centerpose"):
    if model_name == "visual_changenet":
        parent = job_map["export_" + ds_format]
    else:
        parent = job_map["export_" + model_name]
    data = json.dumps({"parent_job_id":parent,"action":"gen_trt_engine","specs":tao_deploy_specs,
                  #  "platform_id": "9af1aa90-8ea5-5a11-98d9-3879cd0da92c",  # Pick a platform from the output of {base_url}:gpu_types depending on GPU_type and instance_type
                   })

    endpoint = f"{base_url}/experiments/{experiment_id}/jobs"

    response = requests.post(endpoint, data=data, headers=headers)
    assert response.status_code in (200, 201)
    assert response.json()

    print(response)
    print(json.dumps(response.json(), indent=4))

    if model_name == "visual_changenet":
        job_map["gen_trt_engine_" + ds_format] = response.json()
    else:
        job_map["gen_trt_engine_" + model_name] = response.json()
    print(job_map)

In [None]:
# Monitor job status by repeatedly running this cell
if model_name in ("ocdnet", "ocrnet", "optical_inspection", "ml_recog", "visual_changenet", "centerpose"):
    if model_name == "visual_changenet":
        job_id = job_map["gen_trt_engine_" + ds_format]
    else:
        job_id = job_map["gen_trt_engine_" + model_name]
    endpoint = f"{base_url}/experiments/{experiment_id}/jobs/{job_id}"

    while True:    
        clear_output(wait=True)
        response = requests.get(endpoint, headers=headers)
        assert response.status_code in (200, 201)
        print(response)
        print(json.dumps(response.json(), indent=4))
        assert "status" in response.json().keys() and response.json().get("status") != "Error"
        if response.json().get("status") in ["Done","Error", "Canceled", "Paused"] or response.status_code not in (200,201):
            break
        time.sleep(15)

### TAO inference <a class="anchor" id="head-21"></a>

- Run inference on a set of images using the model produced by the train step

In [None]:
# Get default spec schema
endpoint = f"{base_url}/experiments/{experiment_id}/specs/inference/schema"
while True:
    response = requests.get(endpoint, headers=headers)
    if response.status_code == 404:
        if "Base spec file download state is " in response.json()["error_desc"]:
            print("Base experiment spec file is being downloaded")
            time.sleep(2)
            continue
        else:
            break
    else:
        break
assert response.status_code in (200, 201)

print(response)
# print(json.dumps(response.json(), indent=4)) ## Uncomment for verbose schema
assert "default" in response.json().keys()
tao_inference_specs = response.json()["default"]
print(json.dumps(tao_inference_specs, sort_keys=True, indent=4))

In [None]:
# Apply changes to the tao_inference_specs dictionary if necessary
if model_name == "action_recognition":
    tao_inference_specs["model"]["model_type"] = model_type
    tao_inference_specs["model"]["input_type"] = model_input_type
    tao_inference_specs["dataset"]["label_map"] = {"catch": 0, "smile": 1}
elif model_name == "pose_classification":
    if model_type == "nvidia":
        tao_inference_specs["dataset"]["num_classes"] = 6
        tao_inference_specs["model"]["graph_layout"] = "nvidia"
        tao_inference_specs["dataset"]["label_map"] = {"sitting_down": 0,"getting_up": 1,"sitting": 2,"standing": 3,"walking": 4,"jumping": 5}
    elif model_type == "kinetics":
        tao_inference_specs["dataset"]["num_classes"] = 5
        tao_inference_specs["model"]["graph_layout"] = "openpose"
        tao_inference_specs["dataset"]["label_map"] = {"front_raises": 0,"pull_ups": 1,"clean_and_jerk": 2,"presenting_weather_forecast": 3,"deadlifting": 4}
elif model_name == "re_identification":
    tao_inference_specs["dataset"]["num_classes"] = 100 #The number set in obtain_subset script
elif model_name == "visual_changenet" and ds_format == 'visual_changenet_classify':
    tao_inference_specs["inference"]["batch_size"] = tao_inference_specs["dataset"]["classify"]['batch_size'] 
    tao_inference_specs["task"] = 'classify'
elif model_name == "visual_changenet" and ds_format == 'visual_changenet_segment':
    tao_inference_specs["inference"]["batch_size"] = tao_inference_specs["dataset"]["segment"]['batch_size'] 
    tao_inference_specs["task"] = 'segment'
elif model_name == "centerpose":
    tao_inference_specs["dataset"]["category"] = "bike"
print(json.dumps(tao_inference_specs, sort_keys=True, indent=4))

In [None]:
# Run action
if model_name == "visual_changenet":
    parent = job_map["train_" + ds_format]
else:
    parent = job_map["train_" + model_name]
action = "inference"
data = json.dumps({"parent_job_id":parent,"action":action,"specs":tao_inference_specs,
                  #  "platform_id": "9af1aa90-8ea5-5a11-98d9-3879cd0da92c",  # Pick a platform from the output of {base_url}:gpu_types depending on GPU_type and instance_type
                   })

endpoint = f"{base_url}/experiments/{experiment_id}/jobs"

response = requests.post(endpoint, data=data, headers=headers)
assert response.status_code in (200, 201)
assert response.json()

print(response)
print(json.dumps(response.json(), indent=4))

if model_name == "visual_changenet":
    job_map["inference_tao_" + ds_format] = response.json()
else:
    job_map["inference_tao_" + model_name] = response.json()
print(job_map)
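
The job request body is a JSON object with `parent_job_id`, `action` and `specs`, plus the optional `platform_id` that is commented out above. A small builder keeps the optional field out of the payload unless you set it. This is a sketch: `build_job_payload` is a hypothetical name, and it assumes the server accepts a body without `platform_id`, exactly as the cell above sends it.

```python
import json
from typing import Optional


def build_job_payload(parent_job_id: str, action: str, specs: dict,
                      platform_id: Optional[str] = None) -> str:
    """Hypothetical helper: serialize the body for POST {base_url}/experiments/{id}/jobs.
    platform_id is added only when a value is given."""
    body = {"parent_job_id": parent_job_id, "action": action, "specs": specs}
    if platform_id is not None:
        body["platform_id"] = platform_id
    return json.dumps(body)


# Usage (equivalent to the request in the cell above):
# data = build_job_payload(parent, "inference", tao_inference_specs)
# response = requests.post(f"{base_url}/experiments/{experiment_id}/jobs", data=data, headers=headers)
```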

In [None]:
# Poll the job status every 15 seconds until it reaches a terminal state (Done, Canceled or Paused).
# The asserts stop the cell on an HTTP error or a job status of "Error".
if model_name == "visual_changenet":
    job_id = job_map["inference_tao_" + ds_format]
else:
    job_id = job_map["inference_tao_" + model_name]
endpoint = f"{base_url}/experiments/{experiment_id}/jobs/{job_id}"

while True:
    clear_output(wait=True)
    response = requests.get(endpoint, headers=headers)
    assert response.status_code in (200, 201)
    print(response)
    body = response.json()
    print(json.dumps(body, indent=4))
    assert "status" in body and body["status"] != "Error"
    if body["status"] in ("Done", "Canceled", "Paused"):
        break
    time.sleep(15)
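
The monitoring loop polls the job until its status is terminal. The same logic can be factored into a function that receives the status-fetching call as an argument, which makes it reusable for the TRT inference job and testable without a server. A minimal sketch: `wait_for_job`, the `fetch_status` callable and the `max_polls` limit are illustrative assumptions; only the status strings come from the cell above.

```python
import time
from typing import Callable

# Terminal statuses checked by the monitoring cell above
TERMINAL_STATUSES = {"Done", "Error", "Canceled", "Paused"}


def wait_for_job(fetch_status: Callable[[], str], poll_interval: float = 15.0,
                 max_polls: int = 10_000) -> str:
    """Hypothetical helper: call fetch_status() until it returns a terminal
    status, sleeping poll_interval seconds between calls. Returns the final
    status; raises TimeoutError once max_polls calls have been made."""
    for _ in range(max_polls):
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(poll_interval)
    raise TimeoutError(f"job not finished after {max_polls} polls")


# Usage against the endpoint from the cell above:
# final = wait_for_job(lambda: requests.get(endpoint, headers=headers).json()["status"])
# assert final == "Done", final
```

Unlike the notebook cell, the helper returns "Error" instead of asserting, so the caller decides how to react.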

### TRT inference <a class="anchor" id="head-22"></a>

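- This step runs only for models that support TAO-Deploy (ocdnet, ocrnet, ml_recog, optical_inspection, visual_changenet, centerpose). The default inference specs are reused; only visual_changenet needs the same task and batch-size settings as in the TAO inference step.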

In [None]:
# Get default spec schema
if model_name in ("ocdnet", "ocrnet", "ml_recog", "optical_inspection", "visual_changenet", "centerpose"):
    endpoint = f"{base_url}/experiments/{experiment_id}/specs/inference/schema"
    while True:
        response = requests.get(endpoint, headers=headers)
        if response.status_code == 404:
            if "Base spec file download state is " in response.json()["error_desc"]:
                print("Base experiment spec file is being downloaded")
                time.sleep(2)
                continue
            else:
                break
        else:
            break
    assert response.status_code in (200, 201)

    print(response)
    # print(json.dumps(response.json(), indent=4)) ## Uncomment for verbose schema
    assert "default" in response.json().keys()
    trt_inference_specs = response.json()["default"]
    print(json.dumps(trt_inference_specs, sort_keys=True, indent=4))
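
The schema request above retries while the server answers 404 with an `error_desc` saying the base spec file is still downloading, and stops on any other response. A sketch of that retry rule as a standalone function, assuming a `fetch` callable that returns `(status_code, json_body)`; the function name, the `max_tries` cap and the `fetch` signature are hypothetical.

```python
import time
from typing import Callable, Tuple

# Message prefix the cell above checks for in error_desc
DOWNLOAD_PENDING = "Base spec file download state is "


def get_with_spec_retry(fetch: Callable[[], Tuple[int, dict]],
                        retry_delay: float = 2.0,
                        max_tries: int = 300) -> Tuple[int, dict]:
    """Hypothetical helper mirroring the cell's loop: retry while the response
    is a 404 whose error_desc says the base spec is still downloading, and
    return the first response that is anything else."""
    for _ in range(max_tries):
        code, body = fetch()
        if code == 404 and DOWNLOAD_PENDING in body.get("error_desc", ""):
            time.sleep(retry_delay)
            continue
        return code, body
    raise TimeoutError("base spec file is still downloading")


# Usage with requests:
# def fetch():
#     r = requests.get(endpoint, headers=headers)
#     return r.status_code, r.json()
# code, body = get_with_spec_retry(fetch)
```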

In [None]:
# Apply changes to the specs dictionary if necessary
if model_name in ("ocdnet", "ocrnet", "ml_recog", "optical_inspection", "visual_changenet", "centerpose"):
    if model_name == "visual_changenet" and ds_format == 'visual_changenet_classify':
        trt_inference_specs["inference"]["batch_size"] = trt_inference_specs["dataset"]["classify"]['batch_size']
        trt_inference_specs["task"] = 'classify'
    elif model_name == "visual_changenet" and ds_format == 'visual_changenet_segment':
        trt_inference_specs["inference"]["batch_size"] = trt_inference_specs["dataset"]["segment"]['batch_size']
        trt_inference_specs["task"] = 'segment'
    print(json.dumps(trt_inference_specs, sort_keys=True, indent=4))
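
For visual_changenet, both the TAO and TRT inference cells apply the same override: the task name is the suffix of `ds_format`, and the inference batch size is copied from that task's dataset section. A sketch of that rule as one function; `apply_changenet_task` is a hypothetical name, and it assumes only the two formats used in this notebook.

```python
def apply_changenet_task(specs: dict, ds_format: str) -> dict:
    """Hypothetical helper: derive the task from ds_format
    ('visual_changenet_classify' -> 'classify', 'visual_changenet_segment' -> 'segment'),
    set specs['task'], and copy that task's dataset batch_size into the
    inference section. Modifies specs in place and returns it."""
    prefix = "visual_changenet_"
    task = ds_format[len(prefix):] if ds_format.startswith(prefix) else ""
    if task not in ("classify", "segment"):
        raise ValueError(f"unsupported ds_format: {ds_format}")
    specs["inference"]["batch_size"] = specs["dataset"][task]["batch_size"]
    specs["task"] = task
    return specs


# Usage in either inference cell:
# trt_inference_specs = apply_changenet_task(trt_inference_specs, ds_format)
```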

In [None]:
# Run action
if model_name in ("ocdnet", "ocrnet", "ml_recog", "optical_inspection", "visual_changenet", "centerpose"):
    if model_name == "visual_changenet":
        parent = job_map["gen_trt_engine_" + ds_format]
    else:
        parent = job_map["gen_trt_engine_" + model_name]
    action = "inference"
    data = json.dumps({"parent_job_id":parent,"action":action,"specs":trt_inference_specs,
                  #  "platform_id": "9af1aa90-8ea5-5a11-98d9-3879cd0da92c",  # Pick a platform from the output of {base_url}:gpu_types depending on GPU_type and instance_type
                   })

    endpoint = f"{base_url}/experiments/{experiment_id}/jobs"

    response = requests.post(endpoint, data=data, headers=headers)
    assert response.status_code in (200, 201)
    assert response.json()

    print(response)
    print(json.dumps(response.json(), indent=4))

    if model_name == "visual_changenet":
        job_map["inference_trt_" + ds_format] = response.json()
    else:
        job_map["inference_trt_" + model_name] = response.json()
    print(job_map)
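
TAO inference uses the train job as its parent, while TRT inference uses the TAO-Deploy `gen_trt_engine` job. Every cell picks the `job_map` key with the same rule: visual_changenet jobs are keyed by dataset format (its classify and segment variants share one model name), all other models by model name. A sketch of that rule; `job_map_key` is a hypothetical helper, not something the notebook defines.

```python
def job_map_key(prefix: str, model_name: str, ds_format: str) -> str:
    """Hypothetical helper reproducing the notebook's job_map naming, e.g.
    prefix 'train_', 'gen_trt_engine_', 'inference_tao_' or 'inference_trt_'."""
    suffix = ds_format if model_name == "visual_changenet" else model_name
    return prefix + suffix


# Usage: parent of the TRT inference job
# parent = job_map[job_map_key("gen_trt_engine_", model_name, ds_format)]
```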

In [None]:
# Poll the job status every 15 seconds until it reaches a terminal state (Done, Canceled or Paused).
# The asserts stop the cell on an HTTP error or a job status of "Error".
if model_name in ("ocdnet", "ocrnet", "ml_recog", "optical_inspection", "visual_changenet", "centerpose"):
    if model_name == "visual_changenet":
        job_id = job_map["inference_trt_" + ds_format]
    else:
        job_id = job_map["inference_trt_" + model_name]
    endpoint = f"{base_url}/experiments/{experiment_id}/jobs/{job_id}"

    while True:
        clear_output(wait=True)
        response = requests.get(endpoint, headers=headers)
        assert response.status_code in (200, 201)
        print(response)
        body = response.json()
        print(json.dumps(body, indent=4))
        assert "status" in body and body["status"] != "Error"
        if body["status"] in ("Done", "Canceled", "Paused"):
            break
        time.sleep(15)

### Delete experiment <a class="anchor" id="head-23"></a>

In [None]:
endpoint = f"{base_url}/experiments/{experiment_id}"

response = requests.delete(endpoint,headers=headers)
assert response.status_code in (200, 201)

print(response)
print(json.dumps(response.json(), indent=4))

### Delete dataset <a class="anchor" id="head-24"></a>

#### Delete train dataset

In [None]:
endpoint = f"{base_url}/datasets/{train_dataset_id}"

response = requests.delete(endpoint,headers=headers)
assert response.status_code in (200, 201)

print(response)
print(json.dumps(response.json(), indent=4))

#### Delete val dataset

In [None]:
if model_name in ("ocdnet", "ocrnet", "optical_inspection") or ds_format == 'visual_changenet_classify':
    endpoint = f"{base_url}/datasets/{eval_dataset_id}"

    response = requests.delete(endpoint,headers=headers)
    assert response.status_code in (200, 201)

    print(response)
    print(json.dumps(response.json(), indent=4))

#### Delete test dataset

In [None]:
if model_name == "optical_inspection" or ds_format == 'visual_changenet_classify':
    endpoint = f"{base_url}/datasets/{test_dataset_id}"

    response = requests.delete(endpoint,headers=headers)
    assert response.status_code in (200, 201)

    print(response)
    print(json.dumps(response.json(), indent=4))
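
The three delete cells encode which datasets were created for each model: the train dataset always, the val dataset for ocdnet, ocrnet, optical_inspection and the visual_changenet classify format, and the test dataset only for optical_inspection and the visual_changenet classify format. A sketch that restates those conditions in one place; `datasets_to_delete` and the name-to-ID mapping in the usage comment are hypothetical.

```python
def datasets_to_delete(model_name: str, ds_format: str) -> list:
    """Hypothetical helper restating the conditions in the delete cells above.
    Returns the dataset roles ('train', 'val', 'test') that should be deleted."""
    names = ["train"]
    changenet_classify = ds_format == "visual_changenet_classify"
    if model_name in ("ocdnet", "ocrnet", "optical_inspection") or changenet_classify:
        names.append("val")
    if model_name == "optical_inspection" or changenet_classify:
        names.append("test")
    return names


# Usage (hypothetical mapping from role to the IDs created earlier):
# ids = {"train": train_dataset_id, "val": eval_dataset_id, "test": test_dataset_id}
# for role in datasets_to_delete(model_name, ds_format):
#     response = requests.delete(f"{base_url}/datasets/{ids[role]}", headers=headers)
#     assert response.status_code in (200, 201)
```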