### TAO remote client - Purpose built models

Transfer learning is the process of transferring learned features from one application to another. It is a commonly used training technique in which a model trained on one task is re-trained for a different task. Train Adapt Optimize (TAO) Toolkit is a simple, easy-to-use Python-based AI toolkit for taking purpose-built AI models and customizing them with users' own data.

![image](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/TAO/tlt-tao-toolkit-bring-your-own-model-diagram.png)


### The workflow in a nutshell

- Pulling datasets from cloud
- Running dataset convert (for specific models)
- Getting a PTM from NGC
- Model Actions
    - Train (Normal/AutoML)
    - Evaluate
    - Prune, retrain (for specific models)
    - Export
    - TAO-Deploy (for specific models)
    - Inference on TAO
    - Inference on TAO, TRT
    - Delete experiments/dataset

### Table of contents

1. [Install TAO remote client ](#head-1)
1. [FIXME's](#head-2)
1. [Login](#head-3)
1. [Create a cloud workspace](#head-2)
1. [Set dataset formats](#head-4)
1. [Create and pull train dataset](#head-5)
1. [Create and pull val dataset](#head-6)
1. [Create and pull test dataset](#head-7)
1. [List the created datasets](#head-8)
1. [Train Dataset convert action](#head-9) (for specific models)
1. [Val dataset convert action](#head-10) (for specific models)
1. [Create experiment (via create-job)](#head-11)
1. [List experiments](#head-12)
1. [Assign train, eval datasets](#head-13)
1. [Assign PTM](#head-14)
1. [Set AutoML related configurations](#head-15)
1. [Train](#head-16)
1. [View hyperparameters that are enabled by default](#head-16.1)
1. [Evaluate](#head-17)
1. [Optimize: Prune, retrain and evaluate](#head-18) (for specific models)
1. [Export](#head-19)
1. [TRT Engine generation using TAO-Deploy](#head-20) (for specific models)
1. [TAO inference](#head-21)
1. [TRT inference](#head-22) (for specific models)
1. [Delete experiment](#head-23)
1. [Delete dataset](#head-24)

### Requirements
Please find the server requirements [here](https://docs.nvidia.com/tao/tao-toolkit/text/tao_toolkit_api/api_setup.html#)

### Install TAO remote client <a class="anchor" id="head-1"></a>

In [None]:
# SKIP this step IF you have already installed the TAO-Client wheel.
! pip3 install nvidia-tao-client

In [None]:
# View the version of the TAO-Client
! tao --version

### Import python packages required for notebook

In [None]:
import os
import subprocess
import json
import time
from IPython.display import clear_output

In [None]:
# Restore variable in case of jupyter session restart and resume execution where it left off
%store -r model_name
%store -r automl_enabled
%store -r automl_algorithm
%store -r workspace_id
%store -r train_dataset_id
%store -r eval_dataset_id
%store -r test_dataset_id
%store -r experiment_id
%store -r job_map

In [None]:
namespace = 'default'
job_map = {}

In [None]:
EXPLICIT_EVAL_DATASET_MODELS = ["ocdnet", "ocrnet", "optical_inspection", "visual_changenet_classify"]
EXPLICIT_TEST_DATASET_MODELS = ["optical_inspection", "visual_changenet_classify"]
TRAIN_REUSE_FOR_EVAL_TEST_MODELS = ["sparse4d"]
TRAIN_DATASET_CONVERT_MODELS = ["bevfusion", "ocrnet", "pointpillars", "sparse4d"]
EVAL_DATASET_CONVERT_MODELS = ["ocrnet"]
PRUNEABLE_MODELS = ["ocdnet", "ocrnet", "pointpillars"]
UN_EXPORTABLE_MODELS = ["bevfusion"]
TAO_DEPLOY_MODELS = ["ocdnet", "ocrnet", "optical_inspection", "ml_recog", "visual_changenet_classify", "visual_changenet_segment", "centerpose"]
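
The lists above decide which optional steps the notebook runs for a given model. As a rough sketch, the same membership checks can be summarised in one place. The helper name `optional_steps` and the step labels are hypothetical and are not part of the TAO client; the lists are copied from the cell above.

```python
# Hypothetical helper: summarises which optional notebook steps apply to a model.
# The lists are copied from the notebook; the step labels are illustrative only.
EXPLICIT_EVAL_DATASET_MODELS = ["ocdnet", "ocrnet", "optical_inspection", "visual_changenet_classify"]
EXPLICIT_TEST_DATASET_MODELS = ["optical_inspection", "visual_changenet_classify"]
TRAIN_DATASET_CONVERT_MODELS = ["bevfusion", "ocrnet", "pointpillars", "sparse4d"]
EVAL_DATASET_CONVERT_MODELS = ["ocrnet"]
PRUNEABLE_MODELS = ["ocdnet", "ocrnet", "pointpillars"]
UN_EXPORTABLE_MODELS = ["bevfusion"]
TAO_DEPLOY_MODELS = ["ocdnet", "ocrnet", "optical_inspection", "ml_recog",
                     "visual_changenet_classify", "visual_changenet_segment", "centerpose"]


def optional_steps(model_name):
    """Return the optional steps that the membership checks enable for model_name."""
    checks = [
        ("separate eval dataset", model_name in EXPLICIT_EVAL_DATASET_MODELS),
        ("separate test dataset", model_name in EXPLICIT_TEST_DATASET_MODELS),
        ("train dataset convert", model_name in TRAIN_DATASET_CONVERT_MODELS),
        ("eval dataset convert", model_name in EVAL_DATASET_CONVERT_MODELS),
        ("prune and retrain", model_name in PRUNEABLE_MODELS),
        ("export", model_name not in UN_EXPORTABLE_MODELS),
        ("TAO-Deploy", model_name in TAO_DEPLOY_MODELS),
    ]
    return [label for label, applies in checks if applies]


print(optional_steps("ocrnet"))
```

Calling it before a long run gives a quick preview of which conditional cells will actually do work for the chosen `model_name`.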

### To see the dataset folder structure required for the models supported in this notebook, see the notebooks under dataset_prepare, for example [this notebook](../dataset_prepare/purpose_built_models.ipynb)

### FIXME's  <a class="anchor" id="head-2"></a>

1. Assign a model_name in FIXME 1

    1.1 Assign model type for action_recognition/pose_classification in FIXME 1.1

    1.2 Assign model input type for action_recognition in FIXME 1.2
1. (Optional) Enable AutoML if needed in FIXME 2
1. (Optional) Choose between bayesian and hyperband automl_algorithm in FIXME 3 (If automl was enabled in FIXME2)
1. Assign path of datasets relative to the bucket in FIXME 4
1. Assign the ip_address and port_number in FIXME 5 ([info](https://docs.nvidia.com/tao/tao-toolkit/text/tao_toolkit_api/api_rest_api.html))
1. Assign the ngc_key variable in FIXME 6
1. Assign the ngc_org_name variable in FIXME 7
1. Assign a workdir in FIXME 8 for log file download
1. Set cloud storage details in FIXME 9
1. Database backup/restore archive filename in FIXME 10

#### Choose a purpose built model

In [None]:
# Define model_name workspaces and other variables
# Available models (#FIXME 1):
# 1. action_recognition - https://docs.nvidia.com/tao/tao-toolkit/text/action_recognition_net.html
# 2. bevfusion - https://docs.nvidia.com/tao/tao-toolkit/text/bevfusion/index.html
# 3. ml_recog - https://docs.nvidia.com/tao/tao-toolkit/text/ml_recog/index.html
# 4. ocdnet - https://docs.nvidia.com/tao/tao-toolkit/text/ocdnet/index.html
# 5. ocrnet - https://docs.nvidia.com/tao/tao-toolkit/text/ocrnet/index.html
# 6. optical_inspection - https://docs.nvidia.com/tao/tao-toolkit/text/optical_inspection/index.html
# 7. pose_classification - https://docs.nvidia.com/tao/tao-toolkit/text/pose_classification/index.html
# 8. pointpillars - https://docs.nvidia.com/tao/tao-toolkit/text/point_cloud/pointpillars.html
# 9. re_identification - https://docs.nvidia.com/tao/tao-toolkit/text/re_identification/index.html
# 10. sparse4d - https://docs.nvidia.com/tao/tao-toolkit/text/sparse4d/index.html
# 11. centerpose - https://docs.nvidia.com/tao/tao-toolkit/text/centerpose/index.html
# 12. visual_changenet_classify - https://docs.nvidia.com/tao/tao-toolkit/text/visual_changenet/index.html
# 13. visual_changenet_segment - https://docs.nvidia.com/tao/tao-toolkit/text/visual_changenet/index.html

# FIXME 1 (Add the model name from the above mentioned list)
os.environ["TAO_MODEL_NAME"] = model_name = os.environ.get("TAO_MODEL_NAME", "ocrnet")
%store model_name

In [None]:
if model_name in ("action_recognition","pose_classification"):
    # FIXME1.1 - model_type - string
        # action_recognition: rgb/of/joint;
        # pose_classification: kinetics/nvidia
    model_type = "rgb" # FIXME1.1 action_recognition: rgb/of/joint; pose_classification: kinetics/nvidia

    if model_name == "action_recognition":
        if model_type not in ("rgb","of","joint"):
            raise Exception("Choose one of rgb/of/joint for action recognition model_type")
    elif model_name == "pose_classification":
        if model_type not in ("kinetics","nvidia"):
            raise Exception("Choose one of kinetics/nvidia for pose classification model_type")

    if model_name == "action_recognition":
        model_input_type = "3d" # FIXME1.2 3d/2d

#### Toggle AutoML params
[AutoML documentation](https://docs.nvidia.com/tao/tao-toolkit/text/automl/automl.html#getting-started)

In [None]:
# FIXME 2: Set to True if you want to run automl for the model chosen in the previous cell
automl_enabled = os.environ.get("TAO_AUTOML_ENABLED", "False").lower() == "true"
os.environ["TAO_AUTOML_ENABLED"] = str(automl_enabled)
# FIXME 3: One of bayesian/hyperband
os.environ["TAO_AUTOML_ALGORITHM"] = automl_algorithm = os.environ.get("TAO_AUTOML_ALGORITHM", "bayesian")

%store automl_enabled
%store automl_algorithm


### Common Functions used across the notebook

#### Function to parse logs

In [None]:
def my_tail(model_name_cli, job_id):
	status = None
	while True:
		time.sleep(10)
		clear_output(wait=True)
		response = subprocess.getoutput(f"tao {model_name_cli} get-job-metadata --job-id {job_id}")
		response = json.loads(response)
		if response and "status" in response.keys() and response.get("status") in ("Done", "Error", "Canceled", "Paused"):
			print(json.dumps(response.get("job_details", {}), indent=4))
			status = response.get("status")
			assert status == "Done", f"Status is not Done, it is {status}"
			break

		logs = subprocess.getoutput(f"tao {model_name_cli} get-job-logs --job-id {job_id}")
		if not logs:
			continue
		log_content_lines = logs.split("\n")        
		for line in log_content_lines:
			print(line.strip())
			if line.strip() == "Error EOF":
				status = "Error"
				break
			elif line.strip() == "Done EOF":
				status = "Done"
				break
		if status is not None:
			break
	return status
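
`my_tail` passes the CLI output straight to `json.loads`, so a reply that is not JSON (such as a plain-text error message) surfaces as a bare `JSONDecodeError` without showing what the CLI printed. A minimal sketch of a more explicit parse step follows. The helper names `parse_cli_json` and `is_terminal` are hypothetical; the terminal status values are the ones checked in `my_tail`.

```python
import json

# Status values that my_tail treats as final.
TERMINAL_STATUSES = ("Done", "Error", "Canceled", "Paused")


def parse_cli_json(raw_output):
    """Parse CLI output as JSON; on failure, raise an error that shows the raw text."""
    try:
        return json.loads(raw_output)
    except json.JSONDecodeError as err:
        raise RuntimeError(f"Expected JSON from the CLI but got: {raw_output!r}") from err


def is_terminal(metadata):
    """Return True when the job metadata reports a final status."""
    return metadata.get("status") in TERMINAL_STATUSES


# Stand-in strings replace real CLI output here.
metadata = parse_cli_json('{"status": "Done", "job_details": {}}')
print(is_terminal(metadata))
```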

#### Function to load login details from saved config

In [None]:
def load_tao_credentials_from_config():
    """Load TAO credentials from ~/.tao/config and set as environment variables"""
    from configparser import ConfigParser
    from pathlib import Path
    import os
    
    config_path = Path.home() / '.tao' / 'config'
    
    if not config_path.exists():
        print(f"Warning: Config file not found at {config_path}")
        print("Please run 'tao login' first")
        return False
    
    try:
        parser = ConfigParser()
        parser.read(config_path)
        
        # Read from [CURRENT] section
        if parser.has_section('CURRENT'):
            section = parser['CURRENT']
        else:
            print("Warning: No [CURRENT] section found in config file")
            return False
        
        # Set environment variables
        if 'tao_base_url' in section:
            os.environ['TAO_BASE_URL'] = section['tao_base_url']
            print(f"✓ TAO_BASE_URL set to: {section['tao_base_url']}")
        
        if 'tao_org' in section:
            os.environ['TAO_ORG'] = section['tao_org']
            print(f"✓ TAO_ORG set to: {section['tao_org']}")
        
        if 'tao_token' in section:
            os.environ['TAO_TOKEN'] = section['tao_token']
            print(f"✓ TAO_TOKEN set (expires: check token if auth fails)")
        
        return True
        
    except Exception as e:
        print(f"Error reading config file: {e}")
        return False
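
For reference, the function above expects an INI-style file with a `[CURRENT]` section and the keys `tao_base_url`, `tao_org` and `tao_token`. The sketch below parses a sample of that layout from a string instead of from `~/.tao/config`. All values are made up for illustration; the real file is written by `tao login` and may contain other sections or keys.

```python
from configparser import ConfigParser

# Illustrative contents only: the values are invented for this example.
sample_config = """
[CURRENT]
tao_base_url = https://tao.example.com/api/v2
tao_org = example_org
tao_token = example_token
"""

parser = ConfigParser()
parser.read_string(sample_config)
section = parser["CURRENT"]

# Map the three known keys to the environment variable names used in the notebook.
wanted_keys = ("tao_base_url", "tao_org", "tao_token")
env_updates = {key.upper(): section[key] for key in wanted_keys if key in section}
print(env_updates)
```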

#### Set API service's host information

In [None]:
# FIXME 5: Set TAO API environment variables

# Set to your TAO API endpoint
os.environ["TAO_BASE_URL"] = os.environ.get("TAO_BASE_URL", "https://your_tao_ip_address:port/api/v2")

#### Set NGC Personal key for authentication and NGC org to access API services

In [None]:
os.environ["NGC_KEY"] = ngc_key = os.environ.get("NGC_KEY", "your_ngc_key")  # FIXME6 example: (Add NGC Personal key)

In [None]:
os.environ["NGC_ORG"] = ngc_org_name = os.environ.get("NGC_ORG", "nvstaging")  # FIXME7 your NGC ORG

### Login <a class="anchor" id="head-3"></a>

In [None]:
# Exchange NGC_API_KEY for JWT
! tao login --ngc-org-name {ngc_org_name} --ngc-key {ngc_key} --enable-telemetry

# Load credentials when this cell runs
load_tao_credentials_from_config()

### Get NVCF gpu details <a class="anchor" id="head-2"></a>

One of the keys of the response JSON is to be used as the platform_id when you run each job.

In [None]:
# # Valid only for NVCF backend during TAO-API helm deployment currently
# response = json.loads(subprocess.getoutput("tao get-gpu-types"))
# print(json.dumps(response, indent=4))

### Create cloud workspace
This workspace is where your datasets reside and where the results of TAO API jobs are pushed.

If you want to have different workspaces for dataset and experiment, duplicate the workspace creation part and adjust the metadata accordingly.

In [None]:
# FIXME 9: Dataset Cloud bucket details to download dataset or push job artifacts for jobs

cloud_metadata = {}

# A Representative name for this cloud info
os.environ["TAO_WORKSPACE_NAME"] = cloud_metadata["name"] = os.environ.get("TAO_WORKSPACE_NAME", "AWS workspace info")

# Cloud specific details. Below is assuming AWS.
cloud_metadata["cloud_specific_details"] = {}

# Whether it is AWS, HuggingFace or Azure
os.environ["TAO_WORKSPACE_CLOUD_TYPE"] = cloud_metadata["cloud_specific_details"]["cloud_type"] = os.environ.get("TAO_WORKSPACE_CLOUD_TYPE", "aws")

# Bucket region
os.environ["TAO_WORKSPACE_CLOUD_REGION"] = cloud_metadata["cloud_specific_details"]["cloud_region"] = os.environ.get("TAO_WORKSPACE_CLOUD_REGION", "us-west-1")

# Bucket name
os.environ["TAO_WORKSPACE_CLOUD_BUCKET_NAME"] = cloud_metadata["cloud_specific_details"]["cloud_bucket_name"] = os.environ.get("TAO_WORKSPACE_CLOUD_BUCKET_NAME", "bucket_name")

# Access and Secret keys
os.environ["TAO_WORKSPACE_CLOUD_ACCESS_KEY"] = cloud_metadata["cloud_specific_details"]["access_key"] = os.environ.get("TAO_WORKSPACE_CLOUD_ACCESS_KEY", "access_key")
os.environ["TAO_WORKSPACE_CLOUD_SECRET_KEY"] = cloud_metadata["cloud_specific_details"]["secret_key"] = os.environ.get("TAO_WORKSPACE_CLOUD_SECRET_KEY", "secret_key")
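
The defaults in the cell above (`bucket_name`, `access_key`, `secret_key`) are placeholders, and nothing stops the notebook from creating a workspace with them. A small check, sketched here, can list which values were never replaced. The helper name `unresolved_placeholders` is hypothetical and the example credentials are made up.

```python
def unresolved_placeholders(details, placeholders=("bucket_name", "access_key", "secret_key")):
    """Return the keys whose values still equal one of the notebook's placeholder defaults."""
    return sorted(key for key, value in details.items() if value in placeholders)


# Example dictionary shaped like cloud_metadata["cloud_specific_details"]; values are invented.
example_details = {
    "cloud_type": "aws",
    "cloud_region": "us-west-1",
    "cloud_bucket_name": "bucket_name",  # still a placeholder
    "access_key": "AKIAEXAMPLE",         # made-up value
    "secret_key": "secret_key",          # still a placeholder
}
print(unresolved_placeholders(example_details))
```

Running it on `cloud_metadata["cloud_specific_details"]` before the workspace is created would flag any setting that still needs a real value.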

In [None]:
# Use single quotes inside the f-string: reusing double quotes is a syntax error before Python 3.12
cloud_details = cloud_metadata["cloud_specific_details"]
workspace_id = subprocess.getoutput(f"tao {model_name} create-workspace --name 'AWS Workspace' --cloud-type {cloud_details['cloud_type']} --cloud-specific-details '{json.dumps(cloud_details)}'")
print(workspace_id)
%store workspace_id

In [None]:
# #Optional: Restore database with a mongodump file saved in workspace dump/archive/{backup_filename}
# backup_file_name = "mongodump.tar.gz" # FIXME 10
# response = subprocess.getoutput(f"tao {model_name} restore-workspace --workspace-id {workspace_id} --backup_file_name {backup_file_name}")
# print(response)

#### Set dataset path (path within cloud bucket)

In [None]:
# FIXME 4: Set paths relative to cloud bucket
os.environ["TAO_TRAIN_DATASET_PATH"] = train_dataset_path =  os.environ.get("TAO_TRAIN_DATASET_PATH", f"/data/purpose_built_models_{model_name}_train")
os.environ["TAO_EVAL_DATASET_PATH"] = eval_dataset_path = os.environ.get("TAO_EVAL_DATASET_PATH", f"/data/purpose_built_models_{model_name}_val")   # ocdnet, ocrnet, optical_inspection, visual_changenet_classify
os.environ["TAO_TEST_DATASET_PATH"] = test_dataset_path = os.environ.get("TAO_TEST_DATASET_PATH", f"/data/purpose_built_models_{model_name}_test")  # optical_inspection, visual_changenet_classify
train_dataset_id = None
eval_dataset_id = None
test_dataset_id = None

#### Set dataset formats <a class="anchor" id="head-4"></a>

In [None]:
# The dataset type always matches the model name; only sparse4d uses a non-default format
ds_type = model_name
ds_format = "ovpkl" if model_name == "sparse4d" else "default"

### Create and pull train dataset <a class="anchor" id="head-5"></a>

In [None]:
train_dataset_id = subprocess.getoutput(f"tao {model_name} create-dataset --dataset-type {ds_type} --dataset-format {ds_format}  --workspace-id {workspace_id} --cloud-file-path {train_dataset_path} --use-for '{json.dumps(['training'])}'")
print(train_dataset_id)
if model_name in TRAIN_REUSE_FOR_EVAL_TEST_MODELS:
    eval_dataset_id = train_dataset_id
    test_dataset_id = train_dataset_id
    %store test_dataset_id
%store train_dataset_id

In [None]:
# Check progress
while True:
    clear_output(wait=True)
    response = subprocess.getoutput(f"tao {model_name} get-dataset-metadata --dataset-id {train_dataset_id} ")
    try:
        response = json.loads(response)
    except Exception as e:
        print(response)
        raise e
    print(json.dumps(response, sort_keys=True, indent=4))
    if response.get("status") == "invalid_pull":
        raise ValueError("Dataset pull failed")
    if response.get("status") == "pull_complete":
        break
    time.sleep(5)
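
The loop above polls until the status is `pull_complete` or `invalid_pull`, and would run indefinitely if neither is ever reported. A minimal sketch of the same pattern with a timeout follows. The helper name `wait_for_status` and its parameters are hypothetical, and the intermediate status strings in the usage example are invented stand-ins for real CLI replies.

```python
import time


def wait_for_status(fetch_metadata, done="pull_complete", failed="invalid_pull",
                    interval=5, timeout=600, sleep=time.sleep):
    """Poll fetch_metadata() until the status equals done; raise on failure or timeout."""
    waited = 0
    while True:
        metadata = fetch_metadata()
        status = metadata.get("status")
        if status == failed:
            raise ValueError("Dataset pull failed")
        if status == done:
            return metadata
        if waited >= timeout:
            raise TimeoutError(f"Status still {status!r} after {waited} seconds")
        sleep(interval)
        waited += interval


# Usage with a stand-in for the CLI call; the sleep is replaced so the example runs instantly.
replies = iter([{"status": "starting"}, {"status": "in_progress"}, {"status": "pull_complete"}])
result = wait_for_status(lambda: next(replies), sleep=lambda seconds: None)
print(result["status"])
```

In the notebook, `fetch_metadata` would wrap the `get-dataset-metadata` call and its `json.loads` step, so the same helper could serve the train, val and test dataset cells.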

#### Uncomment if you want to remove corrupted images in your dataset

In [None]:
# # This packages the data-services experiment creation and runs the job that removes corrupted images
# try:
#     from remove_corrupted_images import remove_corrupted_images_workflow
#     train_dataset_id = remove_corrupted_images_workflow(workspace_id, train_dataset_id)
# except Exception as e:
#     raise e

### Create and pull val dataset <a class="anchor" id="head-6"></a>

In [None]:
if model_name in EXPLICIT_EVAL_DATASET_MODELS:
    eval_dataset_id = subprocess.getoutput(f"tao {model_name} create-dataset --dataset-type {ds_type} --dataset-format {ds_format} --workspace-id {workspace_id} --cloud-file-path {eval_dataset_path} --use-for '{json.dumps(['evaluation'])}'")
    print(eval_dataset_id)
    %store eval_dataset_id

In [None]:
# Check progress
if model_name in EXPLICIT_EVAL_DATASET_MODELS:
    while True:
        clear_output(wait=True)
        response = subprocess.getoutput(f"tao {model_name} get-dataset-metadata --dataset-id {eval_dataset_id} ")
        try:
            response = json.loads(response)
        except Exception as e:
            print(response)
            raise e
        print(json.dumps(response, sort_keys=True, indent=4))
        if response.get("status") == "invalid_pull":
            raise ValueError("Dataset pull failed")
        if response.get("status") == "pull_complete":
            break
        time.sleep(5)

#### Uncomment if you want to remove corrupted images in your dataset

In [None]:
# # This packages the data-services experiment creation and runs the job that removes corrupted images
# try:
#     from remove_corrupted_images import remove_corrupted_images_workflow
#     eval_dataset_id = remove_corrupted_images_workflow(workspace_id, eval_dataset_id)
# except Exception as e:
#     raise e

### Create and pull test dataset <a class="anchor" id="head-7"></a>

In [None]:
if model_name in EXPLICIT_TEST_DATASET_MODELS:
    ds_type = model_name
    ds_format = "default"

    test_dataset_id = subprocess.getoutput(f"tao {model_name} create-dataset --dataset-type {ds_type} --dataset-format {ds_format} --cloud-file-path {test_dataset_path} --workspace-id {workspace_id} --use-for '{json.dumps(['testing'])}'")
    print(test_dataset_id)
    %store test_dataset_id

In [None]:
# Check progress
if model_name in EXPLICIT_TEST_DATASET_MODELS:
    while True:
        clear_output(wait=True)
        response = subprocess.getoutput(f"tao {model_name} get-dataset-metadata --dataset-id {test_dataset_id} ")
        try:
            response = json.loads(response)
        except Exception as e:
            print(response)
            raise e
        print(json.dumps(response, sort_keys=True, indent=4))
        if response.get("status") == "invalid_pull":
            raise ValueError("Dataset pull failed")
        if response.get("status") == "pull_complete":
            break
        time.sleep(5)

#### Uncomment if you want to remove corrupted images in your dataset

In [None]:
# # This packages the data-services experiment creation and runs the job that removes corrupted images
# try:
#     from remove_corrupted_images import remove_corrupted_images_workflow
#     test_dataset_id = remove_corrupted_images_workflow(workspace_id, test_dataset_id)
# except Exception as e:
#     raise e

### List the created datasets <a class="anchor" id="head-8"></a>

In [None]:
message = subprocess.getoutput(f"tao {model_name} list-datasets")
message = json.loads(message)
for rsp in message:
    rsp_keys = rsp.keys()
    assert "id" in rsp_keys
    assert "type" in rsp_keys
    assert "format" in rsp_keys
    assert "name" in rsp_keys
    print(rsp["id"],"\t",rsp["type"],"\t",rsp["format"],"\t\t",rsp["name"])

### Train Dataset convert Action <a class="anchor" id="head-9"></a>

In [None]:
# Default train dataset specs
if model_name in TRAIN_DATASET_CONVERT_MODELS:
    train_ds_convert_specs_response = subprocess.getoutput(f"tao {model_name} get-job-schema --action dataset_convert")
    train_ds_convert_specs_schema = json.loads(train_ds_convert_specs_response)
    train_ds_convert_specs = train_ds_convert_specs_schema.get("default", {})
    print(json.dumps(train_ds_convert_specs, indent=4))

In [None]:
# Customize train dataset specs
if model_name in TRAIN_DATASET_CONVERT_MODELS:
    if model_name == "sparse4d":
        train_ds_convert_specs["data"]["input_format"] = "AICity"
        train_ds_convert_specs["data"]["output_format"] = "OVPKL"
        train_ds_convert_specs["aicity"]["num_frames"] = 90
    print(json.dumps(train_ds_convert_specs, indent=4))

In [None]:
# Add --platform_id uuid for NVCF backend, where the uuid is a key from output of tao gpu-types
if model_name in TRAIN_DATASET_CONVERT_MODELS:
    job_id = subprocess.getoutput(f"tao {model_name} create-job --kind dataset --dataset-id {train_dataset_id} --action dataset_convert --specs '{json.dumps(train_ds_convert_specs)}'")
    job_map["train_convert_" + model_name] = job_id
    print(job_id)
    %store job_map

In [None]:
# Check status (the file won't exist until the backend Toolkit container is running -- can take several minutes)
if model_name in TRAIN_DATASET_CONVERT_MODELS:
    job_id = job_map["train_convert_" + model_name]
    status = my_tail(model_name, job_id)

### Eval Dataset convert Action <a class="anchor" id="head-10"></a>

In [None]:
# Default val dataset specs
if model_name in EVAL_DATASET_CONVERT_MODELS:
    eval_ds_convert_specs_response = subprocess.getoutput(f"tao {model_name} get-job-schema --action dataset_convert")
    eval_ds_convert_specs_schema = json.loads(eval_ds_convert_specs_response)
    eval_ds_convert_specs = eval_ds_convert_specs_schema.get("default", {})
    print(json.dumps(eval_ds_convert_specs, indent=4))

In [None]:
# Customize val dataset specs
if model_name in EVAL_DATASET_CONVERT_MODELS:
    print(json.dumps(eval_ds_convert_specs, indent=4))

In [None]:
# Add --platform_id uuid for NVCF backend, where the uuid is a key from output of tao gpu-types
if model_name in EVAL_DATASET_CONVERT_MODELS:
    train_convert_job_id = job_map["train_convert_" + model_name]
    job_id = subprocess.getoutput(f"tao {model_name} create-job --kind dataset --dataset-id {eval_dataset_id} --action dataset_convert --parent-job-id {train_convert_job_id} --specs '{json.dumps(eval_ds_convert_specs)}'")
    job_map["eval_convert_" + model_name] = job_id
    print(job_id)
    %store job_map

In [None]:
if model_name in EVAL_DATASET_CONVERT_MODELS:
    job_id = job_map["eval_convert_" + model_name]
    status = my_tail(model_name, job_id)

### Assign PTM <a class="anchor" id="head-14"></a>

Search NGC for a PTM for the chosen purpose-built model.

In [None]:
# List base experiments (PTMs) using TAO SDK  
filter_params = {"network_arch": model_name}
message = subprocess.getoutput(f"tao {model_name} list-base-experiments --filter-params '{json.dumps(filter_params)}'")
message = json.loads(message)
# Store base experiments list for reuse
base_experiments = message

print(f" Available base experiments (PTMs) for {model_name}:")
print("name\t\t\t     model id\t\t\t     network architecture")
print("-" * 120)

for exp in base_experiments:
    exp_name = exp.get("name", "N/A")
    exp_id = exp.get("id", "N/A")
    exp_arch = exp.get("network_arch", "N/A")
    print(f"{exp_name}\t{exp_id}\t{exp_arch}")

In [None]:
# Map each purpose-built model to its default pretrained model (PTM)
# To use a different PTM backbone, update this map using the output of the previous cell.
# Changing the backbone also requires matching changes to the default spec/config for train, eval, etc.
# For example, if you switch the PTM to resnet34, manually set the config key num_layers (if it exists) to 34
pretrained_map = {"action_recognition":"actionrecognitionnet:trainable_rgb_3d",
                  "bevfusion": "bevfusion:bevfusion_1.0",
                  "ml_recog": "retail_object_recognition:trainable_v1.0",
                  "ocdnet": "ocdnet:trainable_resnet18_v1.0",
                  "ocrnet": "nvidia/tao/ocrnet:trainable_v1.0",
                  "optical_inspection": "optical_inspection:trainable_v1.0",
                  "pointpillars":"pointpillarnet:trainable_v1.0",
                  "pose_classification":"poseclassificationnet:trainable_v1.0",
                  "re_identification":"reidentificationnet_transformer:swin_tiny_256_1",
                  "sparse4d": "sparse4d:resnet_101",
                  "visual_changenet_classify": "visual_changenet_classification:visual_changenet_nvpcb_trainable_v1.0",
                  "visual_changenet_segment": "visual_changenet_segmentation_levircd:visual_changenet_levircd_trainable_v1.0",
                  "centerpose": "pretrained_fan_classification_nvimagenet:fan_small_hybrid_nvimagenet"}

if model_name == "action_recognition":
    if model_type == "of":
        pretrained_map["action_recognition"] = "actionrecognitionnet:trainable_v2.0"
    elif model_type == "joint":
        pretrained_map["action_recognition"] = "actionrecognitionnet:trainable_v1.0,actionrecognitionnet:trainable_v2.0"

no_ptm_models = set([])

In [None]:
# Get pretrained model using TAO SDK
selected_ptm_id = None
if model_name not in no_ptm_models:

    # Search for PTM with given NGC path
    for exp in base_experiments:
        ngc_path = exp.get("ngc_path", "")
        if ngc_path.endswith(pretrained_map[model_name]):
            selected_ptm_id = exp.get("id")
            print(" Selected PTM metadata:")
            print(json.dumps(exp, indent=4))
            break

    if not selected_ptm_id:
        print(f" PTM with NGC path ending in '{pretrained_map[model_name]}' not found!")

if model_name not in no_ptm_models and selected_ptm_id:
    print(f" PTM ID {selected_ptm_id} will be used as base_experiment_id in job creation")
else:
    print(" No PTM will be used (training from scratch)")
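
The selection above matches on the end of each entry's `ngc_path`. The sketch below isolates that lookup as a function so it can be tried on sample data. The helper name `find_base_experiment` is hypothetical, and the sample entries (including their ids) are invented; real entries come from `list-base-experiments`.

```python
def find_base_experiment(base_experiments, ngc_suffix):
    """Return the id of the first entry whose ngc_path ends with ngc_suffix, or None."""
    for exp in base_experiments:
        if exp.get("ngc_path", "").endswith(ngc_suffix):
            return exp.get("id")
    return None


# Invented sample entries shaped like the list-base-experiments output used above.
sample_experiments = [
    {"id": "id-1", "ngc_path": "nvidia/tao/ocdnet:trainable_resnet18_v1.0"},
    {"id": "id-2", "ngc_path": "nvidia/tao/ocrnet:trainable_v1.0"},
]

print(find_base_experiment(sample_experiments, "nvidia/tao/ocrnet:trainable_v1.0"))
print(find_base_experiment(sample_experiments, "missing:model"))
```

A `None` result is worth checking before the train command is built, because an f-string would otherwise interpolate it as the literal text `None`.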

### Train <a class="anchor" id="head-16"></a>

#### View hyperparameters that are enabled for AutoML by default <a class="anchor" id="head-16.1"></a>

In [None]:
if automl_enabled:
    # View default automl params enabled
    automl_params = subprocess.getoutput(f"tao {model_name} get-automl-defaults")
    print(automl_params)
    automl_params = json.loads(automl_params)

#### Set AutoML related configurations <a class="anchor" id="head-15"></a>
Refer to these links to see the parameters supported by each network, and add parameters beyond the default AutoML-enabled ones if necessary:

[ActionRecognitionNet](https://github.com/NVIDIA/tao_front_end_services/tree/main/api/specs_utils/specs/action_recognition/action_recognition%20-%20train.csv), 
[MetricLearningRecognition](https://github.com/NVIDIA/tao_front_end_services/tree/main/api/specs_utils/specs/ml_recog/ml_recog%20-%20train.csv), 
[OCDNET](https://github.com/NVIDIA/tao_front_end_services/tree/main/api/specs_utils/specs/ocdnet/ocdnet%20-%20train.csv), 
[OCRNET](https://github.com/NVIDIA/tao_front_end_services/tree/main/api/specs_utils/specs/ocrnet/ocrnet%20-%20train.csv), 
[OpticalInspection](https://github.com/NVIDIA/tao_front_end_services/tree/main/api/specs_utils/specs/optical_inspection/optical_inspection%20-%20train.csv), 
[Pointpillars](https://github.com/NVIDIA/tao_front_end_services/tree/main/api/specs_utils/specs/pointpillars/pointpillars%20-%20train.csv), 
[PoseClassificationNet](https://github.com/NVIDIA/tao_front_end_services/tree/main/api/specs_utils/specs/pose_classification/pose_classification%20-%20train.csv), 
[ReIdentificationNet](https://github.com/NVIDIA/tao_front_end_services/tree/main/api/specs_utils/specs/re_identification/re_identification%20-%20train.csv)

In [None]:
# Set encryption key (CLI notebooks typically use this default)
encode_key = "tlt_encode"

# Prepare AutoML configuration if enabled (matching SDK approach)
automl_information = None

if automl_enabled:
    # Choose any metric that is present in the kpi dictionary present in the model's status.json
    metric = "kpi"
    
    automl_information = {
        "automl_enabled": True,
        "automl_algorithm": automl_algorithm,
        "automl_max_recommendations": 20,  # Only for bayesian
        "automl_R": 27,  # Only for hyperband
        "automl_nu": 3,  # Only for hyperband
        "epoch_multiplier": 1,  # Only for hyperband
        "override_automl_disabled_params": False,
        "automl_hyperparameters": str(automl_params),
        "metric": metric
    }
    
    print(" AutoML configuration prepared for job creation:")
    print(json.dumps(automl_information, sort_keys=True, indent=4))
else:
    print(" AutoML is disabled - training will use standard approach")
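
To give a feel for what `automl_R` and `automl_nu` control, the sketch below computes the bracket schedule of the textbook Hyperband algorithm, treating `automl_R` as the maximum resource per configuration and `automl_nu` as the reduction factor. Both that mapping and the assumption that TAO's Hyperband follows the textbook schedule are unconfirmed by this notebook, so treat the numbers as an illustration only. The function name `hyperband_brackets` is hypothetical.

```python
def hyperband_brackets(max_resource, reduction_factor):
    """Return (initial_configs, initial_resource) per bracket for textbook Hyperband."""
    # Largest s with reduction_factor**s <= max_resource, using integers to avoid float log errors.
    s_max = 0
    while reduction_factor ** (s_max + 1) <= max_resource:
        s_max += 1

    brackets = []
    for s in range(s_max, -1, -1):
        numerator = (s_max + 1) * reduction_factor ** s
        # Integer ceiling division: ceil(numerator / (s + 1))
        initial_configs = -(-numerator // (s + 1))
        initial_resource = max_resource // reduction_factor ** s
        brackets.append((initial_configs, initial_resource))
    return brackets


# Values taken from the cell above: automl_R = 27, automl_nu = 3
for configs, resource in hyperband_brackets(27, 3):
    print(f"start {configs} configurations with {resource} resource unit(s) each")
```

Under these assumptions, larger `automl_R` values add brackets and configurations, which increases the total training budget.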

#### Provide train specs <a class="anchor" id="head-15"></a>

In [None]:
# Default train model specs
train_specs_response = subprocess.getoutput(f"tao {model_name} get-job-schema --action train")
train_specs_schema = json.loads(train_specs_response)
train_specs = train_specs_schema.get("default", {})
print(json.dumps(train_specs, indent=4))

In [None]:
# Apply changes for any of the parameters listed in the previous cell as required
train_specs["train"]["num_epochs"] = 30
train_specs["train"]["checkpoint_interval"] = 10
train_specs["train"]["validation_interval"] = 10
train_specs["train"]["num_gpus"] = 1
if model_name == "action_recognition":
    train_specs["model"]["model_type"] = model_type
    train_specs["model"]["input_type"] = model_input_type
    train_specs["dataset"]["batch_size"] = 2
    train_specs["dataset"]["label_map"] = {"catch": 0, "smile": 1}
elif model_name == "centerpose":
    train_specs["dataset"]["category"] = "bike"
    train_specs["dataset"]["batch_size"] = 4
elif model_name == "ocdnet":
    train_specs["dataset"]["train_dataset"]["loader"]["batch_size"] = 16
elif model_name == "ocrnet":
    train_specs["dataset"]["batch_size"] = 16
elif model_name == "pose_classification":
    if model_type == "nvidia":
        train_specs["dataset"]["num_classes"] = 6
        train_specs["model"]["graph_layout"] = "nvidia"
        train_specs["dataset"]["label_map"] = {"sitting_down": 0,"getting_up": 1,"sitting": 2,"standing": 3,"walking": 4,"jumping": 5}
    elif model_type == "kinetics":
        train_specs["dataset"]["num_classes"] = 5
        train_specs["model"]["graph_layout"] = "openpose"
        train_specs["dataset"]["label_map"] = {"front_raises": 0,"pull_ups": 1,"clean_and_jerk": 2,"presenting_weather_forecast": 3,"deadlifting": 4}
elif model_name == "re_identification":
    train_specs["dataset"]["num_classes"] = 100 #The number set in obtain_subset script
    train_specs["dataset"]["num_workers"] = 4 #Modify the num_workers according to your hardware setup
    train_specs["dataset"]["batch_size"] = 16 #Modify the batch_size according to your hardware setup
elif model_name == "sparse4d":
    train_specs["dataset"]["sequences"]["split_num"] = 90
    train_specs["dataset"]["train_dataset"]["sequences_split_num"] = 90
print(json.dumps(train_specs, indent=4))
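
Plain dictionary assignment, as used above, silently creates a new key when a name is misspelled, and the typo then travels to the backend unnoticed. A sketch of a stricter setter that only overwrites keys already present in the default specs is shown below. The helper name `set_existing` and the sample specs are hypothetical, and real schemas may legitimately need new keys, in which case direct assignment remains the right tool.

```python
def set_existing(specs, dotted_path, value):
    """Set specs[a][b][c] = value for the path 'a.b.c'; raise KeyError if any key is missing."""
    *parents, leaf = dotted_path.split(".")
    node = specs
    for key in parents:
        node = node[key]  # KeyError here means the specs have no such section
    if leaf not in node:
        raise KeyError(f"{dotted_path} is not present in the default specs")
    node[leaf] = value


# Invented sample shaped like a small train spec.
sample_specs = {"train": {"num_epochs": 10}, "dataset": {"batch_size": 8}}
set_existing(sample_specs, "train.num_epochs", 30)
set_existing(sample_specs, "dataset.batch_size", 16)
print(sample_specs)
```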

#### Run train <a class="anchor" id="head-16"></a>

In [None]:
# Add --platform_id uuid for NVCF backend, where the uuid is a key from output of tao gpu-types
ds_convert_parent = ""
if model_name == "ocrnet":
    val_convert_job_id = job_map["eval_convert_" + model_name]
    ds_convert_parent = f"--parent_job_id {val_convert_job_id}"
elif model_name == "pointpillars":  # "ocrnet" is already handled by the branch above
    train_convert_job_id = job_map["train_convert_" + model_name]
    ds_convert_parent = f"--parent_job_id {train_convert_job_id}"

if not eval_dataset_id:
    eval_dataset_id = train_dataset_id

automl_settings = json.dumps(automl_information) if automl_information else 'null'    

train_datasets_json = json.dumps([train_dataset_id])
job_id = subprocess.getoutput(f"tao {model_name} create-job --kind experiment --action train --name {model_name}_training_job --encryption-key {encode_key} --workspace-id {workspace_id} --base-experiment-ids {selected_ptm_id} --train-datasets '{train_datasets_json}' --eval-dataset {eval_dataset_id} --inference-dataset {eval_dataset_id} --calibration-dataset {train_dataset_id} --specs '{json.dumps(train_specs)}' --automl-settings '{automl_settings}'")
job_map["train_" + model_name] = job_id
print(job_id)
%store job_map
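
The command above wraps the JSON specs in single quotes by hand, which breaks if any value contains a single quote (for example a label name). Because `subprocess.getoutput` runs the string through a shell, `shlex.quote` can do the quoting instead. The sketch below shows the idea on a shortened command that uses only the `--action` and `--specs` flags seen above; the helper name `build_create_job_command` and the sample specs are hypothetical, and this assumes a POSIX shell.

```python
import json
import shlex


def build_create_job_command(model_name, action, specs):
    """Build a shell command string with every argument quoted for a POSIX shell."""
    return " ".join([
        "tao", shlex.quote(model_name), "create-job",
        "--action", shlex.quote(action),
        "--specs", shlex.quote(json.dumps(specs)),
    ])


# Invented specs containing a single quote, which manual '...' wrapping would not survive.
sample_specs = {"dataset": {"label_map": {"it's a catch": 0}}}
command = build_create_job_command("ocrnet", "train", sample_specs)
print(command)

# Round trip: splitting the command the way a shell would recovers the original JSON.
recovered = json.loads(shlex.split(command)[-1])
print(recovered == sample_specs)
```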

In [None]:
# Monitor job status
job_id = job_map["train_" + model_name]
if automl_enabled:    
    while True:
        clear_output(wait=True)
        response = subprocess.getoutput(f"tao {model_name} get-job-metadata --job-id {job_id}")
        response = json.loads(response)
        job_details = response.get("job_details", {})
        if "error_desc" in response.keys() and response["error_desc"] in ("Job trying to retrieve not found", "No AutoML run found"):
            print("Job is being created")
            time.sleep(5)
            continue
        print(json.dumps(job_details, sort_keys=True, indent=4))
        assert "status" in response.keys() and response.get("status") != "Error"
        if response.get("status") in ["Done","Error"]:
            break
        time.sleep(15)
else:
    # Check status (the file won't exist until the backend Toolkit container is running -- can take several minutes)
    status = my_tail(model_name, job_id)
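
The AutoML monitoring loop above polls until the status is `Done` or `Error`, with no upper bound on how long it waits. The following is a minimal sketch of the same polling pattern with a timeout added. The names `wait_for_job` and `fetch_status`, and the timeout value, are hypothetical and not part of the TAO client; in this notebook `fetch_status` would wrap the `get-job-metadata` call and return its `status` field.

```python
import time

TERMINAL_STATUSES = ("Done", "Error")  # the same terminal states the loop above checks for


def wait_for_job(fetch_status, poll_seconds=15, timeout_seconds=3600, sleep=time.sleep):
    """Poll fetch_status() until it returns a terminal status or the timeout elapses.

    fetch_status is a hypothetical zero-argument callable that returns the job's
    status string, or None while the job is still being created.
    """
    waited = 0
    while waited <= timeout_seconds:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        sleep(poll_seconds)
        waited += poll_seconds
    raise TimeoutError(f"job did not finish within {timeout_seconds} seconds")


# Demonstration with a fake status source instead of a real TAO backend.
fake_statuses = iter([None, "Pending", "Running", "Done"])
final_status = wait_for_job(lambda: next(fake_statuses), sleep=lambda seconds: None)
print(final_status)
```

Passing `sleep` as a parameter keeps the demonstration instant; in the notebook the default `time.sleep` would be used.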

In [None]:
## To stop an AutoML job
#    1. Stop the 'Monitor job status' cell (the cell right before this cell) manually
#    2. Uncomment the snippet in the next cell and run the cell

In [None]:
# if automl_enabled:
#     job_id = job_map["train_" + model_name]
#     job_id = subprocess.getoutput(f"tao {model_name} pause-job --job-id {job_id}")
#     job_map["canceled_" + model_name] = job_id
#     print(job_id)
#     %store job_map

In [None]:
## Resume AutoML

In [None]:
# To resume a stopped AutoML job, uncomment the snippet below, run this cell, and then rerun the 'Monitor job status' cell (four cells above this one)
# if automl_enabled:
#     job_id = job_map["train_" + model_name]
#     job_id = subprocess.getoutput(f"tao {model_name} resume-job --job-id {job_id} --specs '{json.dumps(train_specs)}' {ds_convert_parent}")
#     job_map["resumed_" + model_name] = job_id
#     print(job_id)
#     %store job_map

### Publish model

#### Edit the method for choosing a checkpoint from the list of training checkpoint files

In [None]:
# Print model handler parameters
job_id = job_map["train_" + model_name]
model_parameters = subprocess.getoutput(f"tao {model_name} get-job-metadata --job-id {job_id}")
model_parameters = json.loads(model_parameters)
update_checkpoint_choosing = {}
update_checkpoint_choosing["checkpoint_choose_method"] = model_parameters["checkpoint_choose_method"]
update_checkpoint_choosing["checkpoint_epoch_number"] = model_parameters["checkpoint_epoch_number"]
print(json.dumps(update_checkpoint_choosing, indent=4))

In [None]:
# Checkpoint method configuration
# Checkpoint selection is handled per-job, not per-experiment
# You can configure this when creating export/inference jobs if needed

# Example: Change checkpoint selection method for future jobs
update_checkpoint_choosing["checkpoint_choose_method"] = "latest_model"  # Choose between best_model/latest_model/from_epoch_number
# Note: If from_epoch_number is chosen, you would specify the epoch in job creation specs

print("Checkpoint selection configuration updated:")
print(f"Method: {update_checkpoint_choosing['checkpoint_choose_method']}")
print("This will be applied to future job creations")
print(json.dumps(update_checkpoint_choosing, sort_keys=True, indent=4))

json_update_data = json.dumps(update_checkpoint_choosing)
updated_job_data = subprocess.getoutput(f"tao {model_name} update-job --job-id {job_id} --update-data '{json_update_data}'")
print("\n Updated job data:")
print(json.dumps(json.loads(updated_job_data), indent=4))

#### Push model to private NGC team registry

In [None]:
display_name = f"TAO {model_name}"  # Display name for the model to be published on the model card
description = f"Train {model_name}"  # Short description for the model to be published on the model card
team = "tao"  # Team within org for the model to be published to

job_id = job_map["train_" + model_name]
message = subprocess.getoutput(f"tao {model_name} publish-model --job-id {job_id} --display-name='{display_name}' --description='{description}' --team {team}")
print(message)

#### Remove model from private NGC team registry

In [None]:
# message = subprocess.getoutput(f"tao {model_name} remove-published-model --job-id {job_id} --team {team}")
# print(message)

### Evaluate <a class="anchor" id="head-17"></a>

#### Provide evaluate specs

In [None]:
# Default evaluate model specs
eval_specs_response = subprocess.getoutput(f"tao {model_name} get-job-schema --action evaluate")
eval_specs_schema = json.loads(eval_specs_response)
eval_specs = eval_specs_schema.get("default", {})
print(json.dumps(eval_specs, indent=4))
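
`subprocess.getoutput` returns stdout and stderr combined as a single string, so a CLI error message or warning makes the `json.loads` call above fail with a bare `JSONDecodeError`. The sketch below shows one way to surface the offending output instead. `parse_cli_json` is a hypothetical helper, and the sample JSON is made up for illustration rather than taken from a real schema response.

```python
import json


def parse_cli_json(output):
    """Parse CLI output as JSON, raising a readable error if it is not JSON."""
    try:
        return json.loads(output)
    except json.JSONDecodeError as exc:
        preview = output.strip()[:200]  # show only the start of long output
        raise ValueError(f"expected JSON from the CLI but got: {preview!r}") from exc


# Illustrative input only; a real response comes from `get-job-schema`.
schema = parse_cli_json('{"default": {"evaluate": {"batch_size": 4}}}')
print(schema.get("default", {}))
```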

In [None]:
# Customize evaluate model specs
if model_name == "action_recognition":
    eval_specs["model"]["model_type"] = model_type
    eval_specs["model"]["input_type"] = model_input_type
    eval_specs["dataset"]["label_map"] = {"catch": 0, "smile": 1}
elif model_name == "pose_classification":
    if model_type == "nvidia":
        eval_specs["dataset"]["num_classes"] = 6
        eval_specs["model"]["graph_layout"] = "nvidia"
        eval_specs["dataset"]["label_map"] = {"sitting_down": 0,"getting_up": 1,"sitting": 2,"standing": 3,"walking": 4,"jumping": 5}
    elif model_type == "kinetics":
        eval_specs["dataset"]["num_classes"] = 5
        eval_specs["model"]["graph_layout"] = "openpose"
        eval_specs["dataset"]["label_map"] = {"front_raises": 0,"pull_ups": 1,"clean_and_jerk": 2,"presenting_weather_forecast": 3,"deadlifting": 4}
elif model_name == "re_identification":
    eval_specs["dataset"]["num_classes"] = 100 #The number set in obtain_subset script
elif model_name == 'visual_changenet_segment':
    eval_specs["task"] = 'segment'
elif model_name == 'visual_changenet_classify':
    eval_specs["task"] = 'classify'
    eval_specs["train"]["classify"]["loss"] = "contrastive"
elif model_name == "centerpose":
    eval_specs["dataset"]["category"] = "bike"
print(json.dumps(eval_specs, indent=4))

#### Run evaluate

In [None]:
# Add --platform_id uuid for NVCF backend, where the uuid is a key from output of tao gpu-types
parent = job_map["train_" + model_name]
train_datasets_json = json.dumps([train_dataset_id])
test_dataset_str = f"--inference-dataset {test_dataset_id}" if test_dataset_id else ""
job_id = subprocess.getoutput(f"tao {model_name} create-job --kind experiment --action evaluate --name {model_name}_evaluation_job --encryption-key {encode_key} --workspace-id {workspace_id} --train-datasets '{train_datasets_json}' --eval-dataset {eval_dataset_id} {test_dataset_str} --parent-job-id {parent} --specs '{json.dumps(eval_specs)}'")
job_map["eval_" + model_name] = job_id
print(job_id)
%store job_map
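
The `create-job` commands in this notebook wrap the JSON specs in single quotes inside an f-string, which breaks if any spec value itself contains a single quote. The sketch below shows one way to build the same kind of command with `shlex.join`, which quotes each argument for a POSIX shell. `build_create_job_command` is a hypothetical helper; it only uses flags that already appear in the cells above, and the job ID and spec values are placeholders.

```python
import json
import shlex


def build_create_job_command(model_name, action, options, specs):
    """Assemble a `tao ... create-job` command string with shell-safe quoting."""
    args = ["tao", model_name, "create-job", "--kind", "experiment", "--action", action]
    for flag, value in options.items():
        args += [flag, str(value)]
    args += ["--specs", json.dumps(specs)]
    return shlex.join(args)  # quotes every argument for a POSIX shell


command = build_create_job_command(
    "re_identification",
    "evaluate",
    {"--name": "re_identification_evaluation_job", "--parent-job-id": "PARENT_JOB_ID"},
    {"dataset": {"note": "it's quoted"}},  # placeholder spec containing a single quote
)
print(command)
```

The resulting string can be passed to `subprocess.getoutput` in place of the hand-written f-string.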

In [None]:
# Check status (the file won't exist until the backend Toolkit container is running -- can take several minutes)
job_id = job_map["eval_" + model_name]
status = my_tail(model_name, job_id)

### Prune, Retrain and Evaluation <a class="anchor" id="head-18"></a>

- The following cells optimize the trained model by pruning it and then retraining the pruned model

#### Prune

In [None]:
# Default prune model specs
if model_name in PRUNEABLE_MODELS:
    prune_specs_response = subprocess.getoutput(f"tao {model_name} get-job-schema --action prune")
    prune_specs_schema = json.loads(prune_specs_response)
    prune_specs = prune_specs_schema.get("default", {})
    print(json.dumps(prune_specs, indent=4))

In [None]:
# Apply changes to prune specs if necessary
if model_name in PRUNEABLE_MODELS:
    print(json.dumps(prune_specs, indent=4))

In [None]:
# Add --platform_id uuid for NVCF backend, where the uuid is a key from output of tao gpu-types
if model_name in PRUNEABLE_MODELS:
    parent = job_map["train_" + model_name]
    train_datasets_json = json.dumps([train_dataset_id])
    job_id = subprocess.getoutput(f"tao {model_name} create-job --kind experiment --action prune --name {model_name}_prune_job --encryption-key {encode_key} --workspace-id {workspace_id} --train-datasets '{train_datasets_json}' --eval-dataset {eval_dataset_id} --parent-job-id {parent} --specs '{json.dumps(prune_specs)}'")
    job_map["prune_" + model_name] = job_id
    print(job_id)
    %store job_map

In [None]:
# Check status of pruning job (the file won't exist until the backend Toolkit container is running -- can take several minutes)
if model_name in PRUNEABLE_MODELS:
    job_id = job_map["prune_" + model_name]
    status = my_tail(model_name, job_id)

#### Retrain

In [None]:
# Default retrain model specs
if model_name in PRUNEABLE_MODELS:
    retrain_specs_response = subprocess.getoutput(f"tao {model_name} get-job-schema --action retrain")
    retrain_specs_schema = json.loads(retrain_specs_response)
    retrain_specs = retrain_specs_schema.get("default", {})
    print(json.dumps(retrain_specs, indent=4))

In [None]:
# Apply changes for any of the parameters listed in the previous cell as required
if model_name in PRUNEABLE_MODELS:
    retrain_specs["train"]["num_epochs"] = 30
    retrain_specs["train"]["checkpoint_interval"] = 10
    retrain_specs["train"]["validation_interval"] = 10
    retrain_specs["train"]["num_gpus"] = 1
    if model_name == "ocdnet":
        retrain_specs["dataset"]["train_dataset"]["loader"]["batch_size"] = 16
    elif model_name == "ocrnet":
        retrain_specs["dataset"]["batch_size"] = 16
    print(json.dumps(retrain_specs, sort_keys=True, indent=4))
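
Direct indexing such as `retrain_specs["train"]["num_epochs"]` raises a `KeyError` if the default schema for a model lacks one of the intermediate keys. The sketch below shows a small helper that creates missing levels while applying an override. `set_nested` is a hypothetical name, and `example_specs` is a stand-in for a real schema default.

```python
def set_nested(specs, dotted_path, value):
    """Set specs[a][b][c] = value for dotted_path "a.b.c", creating missing levels."""
    keys = dotted_path.split(".")
    node = specs
    for key in keys[:-1]:
        node = node.setdefault(key, {})  # descend, creating an empty dict if absent
    node[keys[-1]] = value
    return specs


example_specs = {"train": {"num_epochs": 80}}  # stand-in for a schema default
set_nested(example_specs, "train.num_epochs", 30)
set_nested(example_specs, "dataset.train_dataset.loader.batch_size", 16)
print(example_specs)
```

Whether silently creating a missing key is desirable depends on the model; a key the backend does not recognize may still be rejected when the job is created.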

In [None]:
# Add --platform_id uuid for NVCF backend, where the uuid is a key from output of tao gpu-types
if model_name in PRUNEABLE_MODELS:
    parent = job_map["prune_" + model_name]
    train_datasets_json = json.dumps([train_dataset_id])
    job_id = subprocess.getoutput(f"tao {model_name} create-job --kind experiment --action retrain --name {model_name}_training_job --encryption-key {encode_key} --workspace-id {workspace_id} --parent-job-id {parent} --train-datasets '{train_datasets_json}' --eval-dataset {eval_dataset_id} --specs '{json.dumps(retrain_specs)}'")
    job_map["retrain_" + model_name] = job_id
    print(job_id)
    %store job_map    

In [None]:
# Check status of retrain job (the file won't exist until the backend Toolkit container is running -- can take several minutes)
if model_name in PRUNEABLE_MODELS:
    job_id = job_map["retrain_" + model_name]
    status = my_tail(model_name, job_id)

#### Evaluate after retrain

In [None]:
# Default evaluate model specs
if model_name in PRUNEABLE_MODELS:
    eval_retrain_specs_response = subprocess.getoutput(f"tao {model_name} get-job-schema --action evaluate")
    eval_retrain_specs_schema = json.loads(eval_retrain_specs_response)
    eval_retrain_specs = eval_retrain_specs_schema.get("default", {})
    print(json.dumps(eval_retrain_specs, indent=4))

In [None]:
# Customize evaluate model specs if necessary
if model_name in PRUNEABLE_MODELS:
    print(json.dumps(eval_retrain_specs, indent=4))

In [None]:
# Add --platform_id uuid for NVCF backend, where the uuid is a key from output of tao gpu-types
if model_name in PRUNEABLE_MODELS:
    parent = job_map["retrain_" + model_name]
    train_datasets_json = json.dumps([train_dataset_id])
    job_id = subprocess.getoutput(f"tao {model_name} create-job --kind experiment --action evaluate --name {model_name}_evaluation_job --encryption-key {encode_key} --workspace-id {workspace_id} --train-datasets '{train_datasets_json}' --eval-dataset {eval_dataset_id} --parent-job-id {parent} --specs '{json.dumps(eval_retrain_specs)}'")
    job_map["eval2_" + model_name] = job_id
    print(job_id)
    %store job_map

In [None]:
# Check status (the file won't exist until the backend Toolkit container is running -- can take several minutes)
if model_name in PRUNEABLE_MODELS:
    job_id = job_map["eval2_" + model_name]
    status = my_tail(model_name, job_id)

### Export <a class="anchor" id="head-19"></a>

#### Provide Export specs

In [None]:
if model_name not in UN_EXPORTABLE_MODELS:
    # Default export model specs
    export_specs_response = subprocess.getoutput(f"tao {model_name} get-job-schema --action export")
    export_specs_schema = json.loads(export_specs_response)
    export_specs = export_specs_schema.get("default", {})
    print(json.dumps(export_specs, indent=4))

In [None]:
# Customize export model specs
if model_name == "action_recognition":
    export_specs["model"]["model_type"] = model_type
    export_specs["model"]["input_type"] = model_input_type
    export_specs["dataset"]["label_map"] = {"catch": 0, "smile": 1}
elif model_name == "pose_classification":
    if model_type == "nvidia":
        export_specs["dataset"]["num_classes"] = 6
        export_specs["model"]["graph_layout"] = "nvidia"
        export_specs["dataset"]["label_map"] = {"sitting_down": 0,"getting_up": 1,"sitting": 2,"standing": 3,"walking": 4,"jumping": 5}
    elif model_type == "kinetics":
        export_specs["dataset"]["num_classes"] = 5
        export_specs["model"]["graph_layout"] = "openpose"
        export_specs["dataset"]["label_map"] = {"front_raises": 0,"pull_ups": 1,"clean_and_jerk": 2,"presenting_weather_forecast": 3,"deadlifting": 4}
elif model_name == "re_identification":
    export_specs["dataset"]["num_classes"] = 100 #The number set in obtain_subset script
elif model_name == 'visual_changenet_segment':
    export_specs["export"]["input_height"] = 224 
    export_specs["export"]["input_width"] = 224 
    export_specs["task"] = 'segment'
elif model_name == 'visual_changenet_classify':
    export_specs["export"]["input_height"] = 896 
    export_specs["export"]["input_width"] = 224
    export_specs["task"] = 'classify'
if model_name not in UN_EXPORTABLE_MODELS:
    print(json.dumps(export_specs, indent=4))

#### Run export

In [None]:
if model_name not in UN_EXPORTABLE_MODELS:
    # Add --platform_id uuid for NVCF backend, where the uuid is a key from output of tao gpu-types
    parent = job_map["train_" + model_name]
    train_datasets_json = json.dumps([train_dataset_id])
    test_dataset_str = f"--inference-dataset {test_dataset_id}" if test_dataset_id else ""
    job_id = subprocess.getoutput(f"tao {model_name} create-job --kind experiment --action export --name {model_name}_export_job --encryption-key {encode_key} --workspace-id {workspace_id} --parent-job-id {parent} --train-datasets '{train_datasets_json}' --eval-dataset {eval_dataset_id} {test_dataset_str} --specs '{json.dumps(export_specs)}'")
    job_map["export_" + model_name] = job_id
    print(job_id)
    %store job_map

In [None]:
if model_name not in UN_EXPORTABLE_MODELS:
    # Check status (the file won't exist until the backend Toolkit container is running -- can take several minutes)
    job_id = job_map["export_" + model_name]
    status = my_tail(model_name, job_id)

### TRT Engine generation using TAO-Deploy <a class="anchor" id="head-20"></a>

#### Provide trt engine generation specs

In [None]:
# Default gen_trt_engine model specs
if model_name in TAO_DEPLOY_MODELS:
    tao_deploy_specs_response = subprocess.getoutput(f"tao {model_name} get-job-schema --action gen_trt_engine")
    tao_deploy_specs_schema = json.loads(tao_deploy_specs_response)
    tao_deploy_specs = tao_deploy_specs_schema.get("default", {})
    print(json.dumps(tao_deploy_specs, indent=4))

In [None]:
# Customize gen_trt_engine model specs
if model_name in TAO_DEPLOY_MODELS:
    # Make changes to the specs dictionary if required here
    if model_name in ("ml_recog", "ocdnet"):
        tao_deploy_specs["gen_trt_engine"]["tensorrt"]["data_type"] = "INT8"
    elif model_name in ("ocrnet", "optical_inspection"):
        tao_deploy_specs["gen_trt_engine"]["tensorrt"]["data_type"] = "fp16"   
    elif model_name == 'visual_changenet_classify':
        tao_deploy_specs["task"] = 'classify'
    elif model_name == 'visual_changenet_segment':
        tao_deploy_specs["gen_trt_engine"]["tensorrt"]["data_type"] = "fp16"
        tao_deploy_specs["task"] = 'segment' 
    print(json.dumps(tao_deploy_specs, indent=4))

#### Run TRT engine generation using TAO-Deploy

In [None]:
# Add --platform_id uuid for NVCF backend, where the uuid is a key from output of tao gpu-types
if model_name in TAO_DEPLOY_MODELS:
    parent = job_map["export_" + model_name]
    job_id = subprocess.getoutput(f"tao {model_name} create-job --kind experiment --action gen_trt_engine --name {model_name}_gen_trt_engine_job --encryption-key {encode_key} --workspace-id {workspace_id} --calibration-dataset {train_dataset_id} --parent-job-id {parent} --specs '{json.dumps(tao_deploy_specs)}'")
    job_map["gen_trt_engine_" + model_name] = job_id
    print(job_id)
    %store job_map

In [None]:
# Check status (the file won't exist until the backend Toolkit container is running -- can take several minutes)
if model_name in TAO_DEPLOY_MODELS:
    job_id = job_map["gen_trt_engine_" + model_name]
    status = my_tail(model_name, job_id)

### TAO inference <a class="anchor" id="head-21"></a>

#### Provide TAO inference specs

In [None]:
# Default inference model specs
tao_inference_specs_response = subprocess.getoutput(f"tao {model_name} get-job-schema --action inference")
tao_inference_specs_schema = json.loads(tao_inference_specs_response)
tao_inference_specs = tao_inference_specs_schema.get("default", {})
print(json.dumps(tao_inference_specs, indent=4))

In [None]:
# Customize TAO inference specs
if model_name == "action_recognition":
    tao_inference_specs["model"]["model_type"] = model_type
    tao_inference_specs["model"]["input_type"] = model_input_type
    tao_inference_specs["dataset"]["label_map"] = {"catch": 0, "smile": 1}
elif model_name == "pose_classification":
    if model_type == "nvidia":
        tao_inference_specs["dataset"]["num_classes"] = 6
        tao_inference_specs["model"]["graph_layout"] = "nvidia"
        tao_inference_specs["dataset"]["label_map"] = {"sitting_down": 0,"getting_up": 1,"sitting": 2,"standing": 3,"walking": 4,"jumping": 5}
    elif model_type == "kinetics":
        tao_inference_specs["dataset"]["num_classes"] = 5
        tao_inference_specs["model"]["graph_layout"] = "openpose"
        tao_inference_specs["dataset"]["label_map"] = {"front_raises": 0,"pull_ups": 1,"clean_and_jerk": 2,"presenting_weather_forecast": 3,"deadlifting": 4}
elif model_name == "re_identification":
    tao_inference_specs["dataset"]["num_classes"] = 100 #The number set in obtain_subset script
elif model_name == 'visual_changenet_classify':
    tao_inference_specs["inference"]["batch_size"] = tao_inference_specs["dataset"]["classify"]['batch_size'] 
    tao_inference_specs["task"] = 'classify'
elif model_name == 'visual_changenet_segment':
    tao_inference_specs["inference"]["batch_size"] = tao_inference_specs["dataset"]["segment"]['batch_size'] 
    tao_inference_specs["task"] = 'segment'
elif model_name == "centerpose":
    tao_inference_specs["dataset"]["category"] = "bike"
print(json.dumps(tao_inference_specs, indent=4))

#### Run TAO inference

In [None]:
# Add --platform_id uuid for NVCF backend, where the uuid is a key from output of tao gpu-types
inference_dataset = test_dataset_id
if not inference_dataset:
    inference_dataset = eval_dataset_id
if not inference_dataset:
    inference_dataset = train_dataset_id
train_datasets_json = json.dumps([train_dataset_id])

parent = job_map["train_" + model_name]
job_id = subprocess.getoutput(f"tao {model_name} create-job --kind experiment --action inference --name {model_name}_inference_job --encryption-key {encode_key} --workspace-id {workspace_id} --train-datasets '{train_datasets_json}' --eval-dataset {eval_dataset_id} --inference-dataset {inference_dataset} --parent-job-id {parent} --specs '{json.dumps(tao_inference_specs)}'")
job_map["tao_inference_" + model_name] = job_id
print(job_id)
%store job_map

In [None]:
# Check status (the file won't exist until the backend Toolkit container is running -- can take several minutes)
job_id = job_map["tao_inference_" + model_name]
status = my_tail(model_name, job_id)
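
The inference dataset is chosen with the same fallback order in both the TAO inference cell above and the TRT inference cell later on: test dataset first, then eval, then train. A minimal sketch of that rule as a single function follows. `pick_inference_dataset` is a hypothetical helper, and the IDs in the demonstration are placeholders.

```python
def pick_inference_dataset(test_dataset_id, eval_dataset_id, train_dataset_id):
    """Return the first non-empty dataset ID, preferring test, then eval, then train."""
    for candidate in (test_dataset_id, eval_dataset_id, train_dataset_id):
        if candidate:
            return candidate
    raise ValueError("no dataset ID is available for inference")


# Placeholder IDs: no test dataset was created, so the eval dataset is used.
chosen = pick_inference_dataset(None, "eval-dataset-id", "train-dataset-id")
print(chosen)
```

Unlike the inline version, this sketch raises an error when all three IDs are empty instead of passing an empty value to the CLI.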

### TRT inference <a class="anchor" id="head-22"></a>

#### Provide TRT inference specs

In [None]:
# Default inference model specs
if model_name in TAO_DEPLOY_MODELS:
    trt_inference_specs_response = subprocess.getoutput(f"tao {model_name} get-job-schema --action inference")
    trt_inference_specs_schema = json.loads(trt_inference_specs_response)
    trt_inference_specs = trt_inference_specs_schema.get("default", {})
    print(json.dumps(trt_inference_specs, indent=4))

In [None]:
# Customize TRT inference specs
# Change any spec if you wish
if model_name in TAO_DEPLOY_MODELS:
    if model_name == 'visual_changenet_classify':
        trt_inference_specs["inference"]["batch_size"] = trt_inference_specs["dataset"]["classify"]['batch_size']
        trt_inference_specs["task"] = 'classify'
    elif model_name == 'visual_changenet_segment':
        trt_inference_specs["inference"]["batch_size"] = trt_inference_specs["dataset"]["segment"]['batch_size']
        trt_inference_specs["task"] = 'segment'
    print(json.dumps(trt_inference_specs, indent=4))

#### Run TRT inference

In [None]:
# Add --platform_id uuid for NVCF backend, where the uuid is a key from output of tao gpu-types
if model_name in TAO_DEPLOY_MODELS:
    parent = job_map["gen_trt_engine_" + model_name]
    inference_dataset = test_dataset_id
    if not inference_dataset:
        inference_dataset = eval_dataset_id
    if not inference_dataset:
        inference_dataset = train_dataset_id
    train_datasets_json = json.dumps([train_dataset_id])
    job_id = subprocess.getoutput(f"tao {model_name} create-job --kind experiment --action inference --name {model_name}_inference_job --encryption-key {encode_key} --workspace-id {workspace_id} --train-datasets '{train_datasets_json}' --eval-dataset {eval_dataset_id} --inference-dataset {inference_dataset} --parent-job-id {parent} --specs '{json.dumps(trt_inference_specs)}'")
    job_map["trt_inference_" + model_name] = job_id 
    print(job_id)
%store job_map

In [None]:
# Check status (the file won't exist until the backend Toolkit container is running -- can take several minutes)
if model_name in TAO_DEPLOY_MODELS:
    job_id = job_map["trt_inference_" + model_name]
    status = my_tail(model_name, job_id)

In [None]:
# # Optional: Backup database with a mongodump file saved in workspace dump/archive/{backup_filename}
# backup_file_name = "mongodump.tar.gz" # FIXME 10
# subprocess.getoutput(f"tao {model_name} backup-workspace --workspace-id {workspace_id} --backup_file_name {backup_file_name}")

### Delete jobs <a class="anchor" id="head-23"></a>

In [None]:
print("Deleting all created jobs...")

jobs_to_delete = list(job_map.items())  # snapshot so the final count reflects the jobs processed
for job_key, job_id in jobs_to_delete:
    try:
        delete_response = subprocess.getoutput(f"tao {model_name} delete-job --job-id {job_id} --confirm")
        print(f"Deleted job: {job_key}")
    except Exception as e:
        print(f"Error deleting job {job_key}: {e}")

print(f"\n Job cleanup completed! Processed {len(jobs_to_delete)} jobs.")
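
`subprocess.getoutput` returns the command's output as a string and does not raise when the command exits with a non-zero status, so the `except` branch in the cell above is unlikely to fire for a failed deletion. The sketch below shows one way to detect failure through the exit code using `subprocess.run`. `run_and_check` is a hypothetical helper, and the commands in the demonstration are stand-ins so that the sketch runs without a TAO backend.

```python
import subprocess
import sys


def run_and_check(args):
    """Run a command without a shell; return (succeeded, combined output)."""
    result = subprocess.run(args, capture_output=True, text=True)
    return result.returncode == 0, (result.stdout + result.stderr).strip()


# Stand-in commands: one that succeeds and one that exits with status 3.
ok, output = run_and_check([sys.executable, "-c", "print('deleted')"])
failed_ok, failed_output = run_and_check([sys.executable, "-c", "import sys; sys.exit(3)"])
print(ok, output)
print(failed_ok)
```

In this notebook the argument list would be `["tao", model_name, "delete-job", "--job-id", job_id, "--confirm"]`, assuming the CLI signals failure through its exit status.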

### Delete dataset <a class="anchor" id="head-24"></a>

#### Delete train dataset

In [None]:
! tao {model_name} delete-dataset --dataset-id {train_dataset_id}

#### Delete val dataset

In [None]:
if model_name in EXPLICIT_EVAL_DATASET_MODELS:
    ! tao {model_name} delete-dataset --dataset-id {eval_dataset_id}

#### Delete test dataset

In [None]:
if model_name in EXPLICIT_TEST_DATASET_MODELS:
    ! tao {model_name} delete-dataset --dataset-id {test_dataset_id}