### TAO remote client - Object Detection

Transfer learning is the process of transferring learned features from one application to another. It is a commonly used training technique in which a model trained on one task is retrained for use on a different task. The Train Adapt Optimize (TAO) Toolkit is an easy-to-use, Python-based AI toolkit for taking purpose-built AI models and customizing them with your own data.

![image](https://developer.nvidia.com/sites/default/files/akamai/TAO/tlt-tao-toolkit-bring-your-own-model-diagram.png)

### Sample prediction for an Object Detection model
<img align="center" src="../example_images/sample_object_detection.jpg" width="960">

### The workflow in a nutshell

- Create a dataset
- Upload the KITTI dataset to the service
- Run dataset convert
- Get a pretrained model (PTM) from NGC
- Model Actions
    - Train (Normal/AutoML)
    - Evaluate
    - Prune, retrain
    - Export
    - Tao-Deploy
    - Inference on TAO
    - Inference on TRT

### Table of contents

1. [Install TAO remote client](#head-1)
1. [Set the remote service base URL](#head-2)
1. [Access the shared volume](#head-3)
1. [Create the datasets](#head-4)
1. [List datasets](#head-5)
1. [Provide and customize dataset convert specs](#head-6)
1. [Run dataset convert](#head-7)
1. [Create a model experiment](#head-8)
1. [Find pretrained model](#head-9)
1. [Customize model metadata](#head-10)
1. [View hyperparameters that are enabled for AutoML by default](#head-11)
1. [Set AutoML related configurations](#head-12)
1. [Provide train specs](#head-13)
1. [Run train](#head-14)
1. [View checkpoint files](#head-15)
1. [Provide evaluate specs](#head-16)
1. [Run evaluate](#head-17)
1. [Provide prune specs](#head-18)
1. [Run prune](#head-19)
1. [Provide retrain specs](#head-20)
1. [Run retrain](#head-21)
1. [Run evaluate on retrain](#head-21-1)
1. [Provide export specs](#head-22)
1. [Run export](#head-23)
1. [Provide trt engine generation specs](#head-26)
1. [Run TRT Engine generation using TAO-Deploy](#head-27)
1. [Provide TAO inference specs](#head-28)
1. [Run TAO inference](#head-29)
1. [Provide TRT inference specs](#head-30)
1. [Run TRT inference](#head-31)
1. [Delete experiment](#head-32)
1. [Delete datasets](#head-33)
1. [Unmount shared volume](#head-34)
1. [Uninstall TAO Remote Client](#head-35)

### Requirements
Please find the server requirements [here](https://docs.nvidia.com/tao/tao-toolkit/text/tao_toolkit_api/api_setup.html#)

In [None]:
import os
import glob
import subprocess
import getpass
import uuid
import json

In [None]:
namespace = 'default'

### FIXME

1. Assign a model_name in FIXME 1
2. Assign the ip_address and port_number in FIXME 2 and FIXME 3 ([info](https://docs.nvidia.com/tao/tao-toolkit/text/tao_toolkit_api/api_rest_api.html))
3. Assign the ngc_api_key variable in FIXME 4
4. (Optional) Enable AutoML in FIXME 5
5. Choose between the default and a custom dataset in FIXME 6
6. Assign the path of DATA_DIR in FIXME 7
7. Choose between the Bayesian and HyperBand automl_algorithm in FIXME 8 (only if AutoML was enabled in FIXME 5)

In [None]:
# Available models (#FIXME 1):
# 1. deformable-detr - https://docs.nvidia.com/tao/tao-toolkit/text/object_detection/deformable_detr.html
# 2. detectnet-v2 - https://docs.nvidia.com/tao/tao-toolkit/text/object_detection/detectnet_v2.html
# 3. dino - https://docs.nvidia.com/tao/tao-toolkit/text/object_detection/dino.html
# 4. dssd - https://docs.nvidia.com/tao/tao-toolkit/text/object_detection/dssd.html
# 5. efficientdet-tf1 - https://docs.nvidia.com/tao/tao-toolkit/text/object_detection/efficientdet_tf1.html
# 6. efficientdet-tf2 - https://docs.nvidia.com/tao/tao-toolkit/text/object_detection/efficientdet_tf2.html
# 7. faster-rcnn - https://docs.nvidia.com/tao/tao-toolkit/text/object_detection/fasterrcnn.html
# 8. retinanet - https://docs.nvidia.com/tao/tao-toolkit/text/object_detection/retinanet.html
# 9. ssd - https://docs.nvidia.com/tao/tao-toolkit/text/object_detection/ssd.html
# 10. yolo-v3 - https://docs.nvidia.com/tao/tao-toolkit/text/object_detection/yolo_v3.html
# 11. yolo-v4 - https://docs.nvidia.com/tao/tao-toolkit/text/object_detection/yolo_v4.html
# 12. yolo-v4-tiny - https://docs.nvidia.com/tao/tao-toolkit/text/object_detection/yolo_v4_tiny.html

model_name = "detectnet-v2" # FIXME1 (Add the model name from the above mentioned list)

### Install TAO remote client <a class="anchor" id="head-1"></a>

In [None]:
# SKIP this step IF you have already installed the TAO-Client wheel.
! pip3 install nvidia-tao-client

In [None]:
# View the version of the TAO-Client
! tao-client --version

### Set the remote service base URL and Token <a class="anchor" id="head-2"></a>
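
The base URL used in this section follows the pattern `http://<ip_address>:<port_number>/<namespace>/api/v1`. The snippet here is a minimal sketch of composing that URL while catching FIXME placeholders that were left unedited. `build_base_url` is a hypothetical helper for illustration and is not part of `tao-client`.

```python
def build_base_url(node_addr, node_port, namespace="default"):
    """Compose the service URL in the form this notebook assigns to BASE_URL."""
    for name, value in (("node_addr", node_addr), ("node_port", node_port)):
        text = str(value)
        # Unedited placeholders look like "<ip_address>" or "<port_number>"
        if not text or text.startswith("<"):
            raise ValueError(f"{name} still holds a placeholder: {value!r}")
    return f"http://{node_addr}:{node_port}/{namespace}/api/v1"
```

Calling `build_base_url(node_addr, node_port, namespace)` before setting `BASE_URL` surfaces a forgotten FIXME early instead of as a connection error later.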

In [None]:
# Define the node_addr and port number
node_addr = "<ip_address>" # FIXME2 example: 10.137.149.22
node_port = "<port_number>" # FIXME3 example: 32334
# On the host machine, the node ip_address and port number can be obtained as follows:
# ip_address: hostname -i
# port_number: kubectl get service ingress-nginx-controller -o jsonpath='{.spec.ports[0].nodePort}'

ngc_api_key = "<ngc_api_key>" # FIXME4 example: (Add NGC API key)

In [None]:
automl_enabled = False # FIXME5 set to True if you want to run automl for the model chosen in the previous cell

In [None]:
%env BASE_URL=http://{node_addr}:{node_port}/{namespace}/api/v1

# Exchange NGC_API_KEY for JWT
identity = json.loads(subprocess.getoutput(f'tao-client login --ngc-api-key {ngc_api_key}'))

%env USER={identity['user_id']}
%env TOKEN={identity['token']}

### Access the shared volume <a class="anchor" id="head-3"></a>

In [None]:
# Get PVC ID
pvc_id = subprocess.getoutput(f'kubectl get pvc tao-toolkit-api-pvc -n {namespace} -o jsonpath="{{.spec.volumeName}}"')
print(pvc_id)

In [None]:
# Get NFS server info
provisioner = json.loads(subprocess.getoutput('helm get values nfs-subdir-external-provisioner -o json'))
nfs_server = provisioner['nfs']['server']
nfs_path = provisioner['nfs']['path']
print(nfs_server, nfs_path)

In [None]:
user = getpass.getuser()
home = os.path.expanduser('~')

! echo "Password for {user}"
password = getpass.getpass()

In [None]:
# Mount shared volume 
! mkdir -p ~/shared

command = "apt-get -y install nfs-common >> /dev/null"
! echo {password} | sudo -S -k {command}

command = f"mount -t nfs {nfs_server}:{nfs_path}/{namespace}-tao-toolkit-api-pvc-{pvc_id} ~/shared"
! echo {password} | sudo -S -k {command} && echo DONE

### Create the datasets <a class="anchor" id="head-4"></a>

We will be using the KITTI object detection dataset for this example. For more details, see [here](http://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark=2d). You can request the images from [here](http://www.cvlibs.net/download.php?file=data_object_image_2.zip), and the training labels from [here](http://www.cvlibs.net/download.php?file=data_object_label_2.zip).

**If you use a custom dataset, it must follow this directory structure:**
```
$DATA_DIR/train
├── images
│   ├── image_name_1.jpg
│   ├── image_name_2.jpg
│   ├── ...
└── labels
    ├── image_name_1.txt
    ├── image_name_2.txt
    ├── ...
$DATA_DIR/val
├── images
│   ├── image_name_5.jpg
│   ├── image_name_6.jpg
│   ├── ...
└── labels
    ├── image_name_5.txt
    ├── image_name_6.txt
    ├── ...
```
Each image must have a label file with the same base name in the labels folder (for example, `image_name_1.jpg` and `image_name_1.txt`).
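
Before uploading a custom dataset, the name-pairing rule can be checked with a short script. This is a minimal sketch that assumes the `images`/`labels` layout shown above. `find_unpaired_files` is a hypothetical helper name and is not part of the TAO client.

```python
from pathlib import Path


def find_unpaired_files(split_dir):
    """Return (images_without_labels, labels_without_images) as sorted lists of base names."""
    split_dir = Path(split_dir)
    # Compare base names, e.g. images/image_name_1.jpg <-> labels/image_name_1.txt
    image_stems = {p.stem for p in (split_dir / "images").iterdir() if p.is_file()}
    label_stems = {p.stem for p in (split_dir / "labels").iterdir() if p.is_file()}
    return sorted(image_stems - label_stems), sorted(label_stems - image_stems)
```

For example, `find_unpaired_files(os.path.join(DATA_DIR, "train"))` returns two empty lists when every image has a matching label file and vice versa.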

In [None]:
dataset_to_be_used = "default" # FIXME6: "default" uses the dataset from this tutorial notebook; "custom" uses a different dataset
DATA_DIR = model_name # FIXME7
os.environ['DATA_DIR']= DATA_DIR
!mkdir -p $DATA_DIR

In [None]:
if dataset_to_be_used == "default":
    !python3 -m pip install awscli
    !aws s3 cp --no-sign-request s3://tao-object-detection-synthetic-dataset/tao_od_synthetic_train.tar.gz $DATA_DIR/
    !aws s3 cp --no-sign-request s3://tao-object-detection-synthetic-dataset/tao_od_synthetic_val.tar.gz $DATA_DIR/

    !mkdir -p $DATA_DIR/train/ && rm -rf $DATA_DIR/train/*
    !mkdir -p $DATA_DIR/val/ && rm -rf $DATA_DIR/val/*
    
    !tar -xzf $DATA_DIR/tao_od_synthetic_train.tar.gz -C $DATA_DIR/train/
    !tar -xzf $DATA_DIR/tao_od_synthetic_val.tar.gz -C $DATA_DIR/val/

In [None]:
if model_name in ("efficientdet-tf1", "efficientdet-tf2", "deformable-detr", "dino"):
    ds_format = "coco"
else:
    ds_format = "kitti"

In [None]:
train_dataset_id = subprocess.getoutput(f"tao-client {model_name} dataset-create --dataset_type object_detection --dataset_format {ds_format}")
print(train_dataset_id)

In [None]:
if model_name in ("efficientdet-tf1", "efficientdet-tf2", "deformable-detr", "dino"):
    import subprocess
    !python3 -m pip install ujson opencv-python tqdm
    if model_name == "efficientdet-tf2":
        label_map_extension = "yaml"
    else:
        label_map_extension = "txt"
    num_classes = subprocess.getoutput(f'python3 dataset_prepare/kitti/kitti_to_coco.py {DATA_DIR}/train/labels {DATA_DIR}/train {label_map_extension}')
    ! rsync -ah --info=progress2 $DATA_DIR/train/images ~/shared/users/{os.environ['USER']}/datasets/{train_dataset_id}/
    ! rsync -ah --info=progress2 $DATA_DIR/train/annotations.json ~/shared/users/{os.environ['USER']}/datasets/{train_dataset_id}/annotations.json
    ! rsync -ah --info=progress2 $DATA_DIR/train/label_map.{label_map_extension} ~/shared/users/{os.environ['USER']}/datasets/{train_dataset_id}/
else:
    ! rsync -ah --info=progress2 $DATA_DIR/train/images ~/shared/users/{os.environ['USER']}/datasets/{train_dataset_id}/
    ! rsync -ah --info=progress2 $DATA_DIR/train/labels ~/shared/users/{os.environ['USER']}/datasets/{train_dataset_id}/
! echo DONE

In [None]:
eval_dataset_id = subprocess.getoutput(f"tao-client {model_name} dataset-create --dataset_type object_detection --dataset_format {ds_format}")
print(eval_dataset_id)

In [None]:
if model_name in ("efficientdet-tf1", "efficientdet-tf2", "deformable-detr", "dino"):
    subprocess.getoutput(f'python3 dataset_prepare/kitti/kitti_to_coco.py {DATA_DIR}/val/labels {DATA_DIR}/val {label_map_extension}')
    ! rsync -ah --info=progress2 $DATA_DIR/val/images ~/shared/users/{os.environ['USER']}/datasets/{eval_dataset_id}/
    ! rsync -ah --info=progress2 $DATA_DIR/val/annotations.json ~/shared/users/{os.environ['USER']}/datasets/{eval_dataset_id}/annotations.json
    ! rsync -ah --info=progress2 $DATA_DIR/val/label_map.{label_map_extension} ~/shared/users/{os.environ['USER']}/datasets/{eval_dataset_id}/
else:
    ! rsync -ah --info=progress2 $DATA_DIR/val/images ~/shared/users/{os.environ['USER']}/datasets/{eval_dataset_id}/
    ! rsync -ah --info=progress2 $DATA_DIR/val/labels ~/shared/users/{os.environ['USER']}/datasets/{eval_dataset_id}/
! echo DONE

In [None]:
infer_dataset_id = subprocess.getoutput(f"tao-client {model_name} dataset-create --dataset_type object_detection --dataset_format raw")
print(infer_dataset_id)

In [None]:
! rsync -ah --info=progress2 $DATA_DIR/val/images ~/shared/users/{os.environ['USER']}/datasets/{infer_dataset_id}/
if model_name in ("efficientdet-tf1", "efficientdet-tf2", "deformable-detr", "dino"):
    ! rsync -ah --info=progress2 $DATA_DIR/val/annotations.json ~/shared/users/{os.environ['USER']}/datasets/{infer_dataset_id}/annotations.json
    ! rsync -ah --info=progress2 $DATA_DIR/val/label_map.{label_map_extension} ~/shared/users/{os.environ['USER']}/datasets/{infer_dataset_id}/
! echo DONE

### List datasets <a class="anchor" id="head-5"></a>

In [None]:
pattern = os.path.join(home, 'shared', 'users', os.environ['USER'], 'datasets', '*', 'metadata.json')

datasets = []
for metadata_path in glob.glob(pattern):
    with open(metadata_path, 'r') as metadata_file:
        datasets.append(json.load(metadata_file))

print(json.dumps(datasets, indent=2))

### Provide and customize dataset convert specs <a class="anchor" id="head-6"></a>
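
The spec cells in this notebook share one pattern: load the default spec JSON, change a few keys, and write the file back. A minimal sketch of that pattern as a reusable function is given here. `update_json_file` is a hypothetical helper name, not a TAO client API.

```python
import json


def update_json_file(path, apply_changes):
    """Load a JSON file, let apply_changes mutate the dict in place, write it back, and return it."""
    with open(path, "r") as json_file:
        data = json.load(json_file)
    apply_changes(data)  # caller edits the loaded dict
    with open(path, "w") as json_file:
        json.dump(data, json_file, indent=2)
    return data
```

A call such as `update_json_file(specs_path, lambda s: s["kitti_config"].update(image_extension=".jpg"))` would apply the same change as the KITTI branch of the customization cells.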

In [None]:
# Choose dataset convert action
if model_name not in ("deformable-detr","dino"):
    if model_name in ("dssd", "ssd", "retinanet"):
        convert_action = "convert_and_index"
    elif "efficientdet" in model_name:
        convert_action = "convert_" + model_name.replace("-","_")
    else:
        convert_action = "convert"

In [None]:
# Default train dataset specs
if model_name not in ("deformable-detr","dino"):
    ! tao-client {model_name} dataset-convert-defaults --id {train_dataset_id} --action {convert_action} | tee ~/shared/users/{os.environ['USER']}/datasets/{train_dataset_id}/specs/{convert_action}.json

In [None]:
# Customize train dataset specs
if model_name not in ("deformable-detr","dino"):
    specs_path = os.path.join(home, 'shared', 'users', os.environ['USER'], 'datasets', train_dataset_id, 'specs', f'{convert_action}.json')

    with open(specs_path , "r") as specs_file:
        specs = json.load(specs_file)

    # Apply changes
    if "efficientdet" in model_name:
        specs["dataset_convert"]["num_shards"] = 256
        specs["dataset_convert"]["tag"] = "train"
    else:
        specs["kitti_config"]["image_extension"] = ".jpg" #Change to png if your entire dataset is of png format

    if convert_action == "convert_and_index": # Mandatory for convert_and_index; optional for the "convert" action used by networks such as detectnet_v2, YOLO, and faster_rcnn
        # Map each class to a target class: either a superclass (for example, pedestrian to person) or the same name
        # List every class in the dataset along with its mapping
        specs["target_class_mapping"] = [{"key":"cone","value":"cone"},
                                 {"key":"cart","value":"cart"},
                                 {"key":"fire_extingusher","value":"fire_extingusher"},
                                 {"key":"forklift","value":"forklift"}]
    with open(specs_path, "w") as specs_file:
        json.dump(specs, specs_file, indent=2)

    print(json.dumps(specs, indent=2))

In [None]:
# Default eval dataset specs
if model_name not in ("deformable-detr","dino"):
    ! tao-client {model_name} dataset-convert-defaults --id {eval_dataset_id} --action {convert_action} | tee ~/shared/users/{os.environ['USER']}/datasets/{eval_dataset_id}/specs/{convert_action}.json

In [None]:
# Customize eval dataset specs
if model_name not in ("deformable-detr","dino"):
    specs_path = os.path.join(home, 'shared', 'users', os.environ['USER'], 'datasets', eval_dataset_id, 'specs', f'{convert_action}.json')

    with open(specs_path , "r") as specs_file:
        specs = json.load(specs_file)

    # Apply changes
    if "efficientdet" in model_name:
        specs["dataset_convert"]["num_shards"] = 256
        specs["dataset_convert"]["tag"] = "val"
    else:
        specs["kitti_config"]["image_extension"] = ".jpg" #Change to png if your entire dataset is of png format

    if convert_action == "convert_and_index": # Mandatory for convert_and_index; optional for the "convert" action used by networks such as detectnet_v2, YOLO, and faster_rcnn
        # Map each class to a target class: either a superclass (for example, pedestrian to person) or the same name
        # List every class in the dataset along with its mapping
        specs["target_class_mapping"] = [{"key":"cone","value":"cone"},
                                 {"key":"cart","value":"cart"},
                                 {"key":"fire_extingusher","value":"fire_extingusher"},
                                 {"key":"forklift","value":"forklift"}]

    with open(specs_path, "w") as specs_file:
        json.dump(specs, specs_file, indent=2)

    print(json.dumps(specs, indent=2))

### Run dataset convert <a class="anchor" id="head-7"></a>
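
Job logs in this notebook are followed until a line reading `EOF` appears. The same idea can be sketched in pure Python, which may be useful outside Jupyter where the `!` shell escapes are unavailable. `follow_until_eof` is a hypothetical helper. The `EOF` sentinel is taken from the log-tailing cell in this section, and the polling interval is an assumption.

```python
import os
import time


def follow_until_eof(log_path, sentinel="EOF", poll_seconds=1.0):
    """Yield complete lines from log_path as they appear, stopping after the sentinel line."""
    # The log may not exist until the backend container starts.
    while not os.path.exists(log_path):
        time.sleep(poll_seconds)
    pending = ""
    with open(log_path, "r") as log_file:
        while True:
            chunk = log_file.readline()
            if not chunk:
                time.sleep(poll_seconds)  # nothing new yet; poll again
                continue
            pending += chunk
            if not pending.endswith("\n"):
                continue  # partial line; wait for the writer to finish it
            line, pending = pending.rstrip("\n"), ""
            yield line
            if line == sentinel:
                return
```

Usage would look like `for line in follow_until_eof(os.path.join(logs_dir, log_file)): print(line)`.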

In [None]:
if model_name not in ("deformable-detr","dino"):
    train_convert_job_id = subprocess.getoutput(f"tao-client {model_name} dataset-convert --id {train_dataset_id}  --action {convert_action} ")
    print(train_convert_job_id)

In [None]:
def my_tail(logs_dir, log_file):
    %env LOG_FILE={logs_dir}/{log_file}
    ! mkdir -p {logs_dir}
    ! [ ! -f "$LOG_FILE" ] && touch $LOG_FILE && chmod 666 $LOG_FILE
    ! tail -f -n +1 $LOG_FILE | while read LINE; do echo "$LINE"; [[ "$LINE" == "EOF" ]] && pkill -P $$ tail; done
    
# Check status (the file won't exist until the backend Toolkit container is running -- can take several minutes)
if model_name not in ("deformable-detr","dino"):
    logs_dir = os.path.join(home, 'shared', 'users', os.environ['USER'], 'datasets', train_dataset_id, 'logs')
    log_file = f"{train_convert_job_id}.txt"

    my_tail(logs_dir, log_file)

In [None]:
if model_name not in ("deformable-detr","dino"):
    eval_convert_job_id = subprocess.getoutput(f"tao-client {model_name} dataset-convert --id {eval_dataset_id}  --action {convert_action} ")
    print(eval_convert_job_id)

In [None]:
# Check status (the file won't exist until the backend Toolkit container is running -- can take several minutes)
if model_name not in ("deformable-detr","dino"):
    logs_dir = os.path.join(home, 'shared', 'users', os.environ['USER'], 'datasets', eval_dataset_id, 'logs')
    log_file = f"{eval_convert_job_id}.txt"

    my_tail(logs_dir, log_file)

### Create a model experiment <a class="anchor" id="head-8"></a>

In [None]:
network_arch = model_name.replace("-","_")
model_id = subprocess.getoutput(f"tao-client {model_name} model-create --network_arch {network_arch} --encryption_key tlt_encode ")
print(model_id)

### Find pretrained model <a class="anchor" id="head-9"></a>
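
The cells in this section scan `metadata.json` files and pick the pretrained model whose `network_arch` matches and whose `ngc_path` ends with the expected suffix. A minimal sketch of that selection rule, written as a pure function over already-loaded metadata dictionaries, is shown here. `select_ptm_id` is a hypothetical name, and treating a missing `ngc_path` as an empty string is an assumption made for robustness.

```python
def select_ptm_id(metadata_entries, network_arch, ngc_suffix):
    """Return the id of the first entry matching the architecture and NGC path suffix, else None."""
    for entry in metadata_entries:
        ngc_path = entry.get("ngc_path") or ""  # tolerate entries without an NGC path
        if entry.get("network_arch") == network_arch and ngc_path.endswith(ngc_suffix):
            return entry.get("id")
    return None
```

Returning `None` when nothing matches makes a missing pretrained model explicit rather than leaving an empty list to be written into the experiment metadata.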

In [None]:
# List all pretrained models for the chosen network architecture
pattern = os.path.join(home, 'shared', 'users', '*', 'models', '*', 'metadata.json')

for ptm_metadata_path in glob.glob(pattern):
  with open(ptm_metadata_path, 'r') as metadata_file:
    ptm_metadata = json.load(metadata_file)
    metadata_network_arch = ptm_metadata.get("network_arch")
    if metadata_network_arch == network_arch:
      if "encryption_key" not in ptm_metadata.keys():
        print(f'PTM Name: {ptm_metadata["name"]}; PTM version: {ptm_metadata["version"]}; NGC PATH: {ptm_metadata["ngc_path"]}; Additional info: {ptm_metadata["additional_id_info"]}')

In [None]:
# Assign pretrained models to the different object detection networks
# Using the output of the previous cell, edit this map if you want to change the default PTM backbone.
# Changing the default backbone also requires updating the default spec/config for train, evaluate, etc.
# For example, if you change the PTM to resnet34, you must manually set the config key num_layers (if it exists) to 34
pretrained_map = {"detectnet_v2" : "detectnet_v2:resnet18",
                  "deformable_detr": "pretrained_deformable_detr_nvimagenet:resnet50",
                  "dino": "pretrained_dino_nvimagenet:resnet50",
                  "dssd" : "pretrained_object_detection:resnet18",
                  "efficientdet_tf1" : "pretrained_efficientdet:efficientnet_b0",
                  "efficientdet_tf2" : "pretrained_efficientdet_tf2:efficientnet_b0",
                  "faster_rcnn" : "pretrained_object_detection:resnet18",
                  "retinanet" : "pretrained_object_detection:resnet18",
                  "ssd" : "pretrained_object_detection:resnet18",
                  "yolo_v3" : "pretrained_object_detection:resnet18",
                  "yolo_v4" : "pretrained_object_detection:resnet18",
                  "yolo_v4_tiny": "pretrained_object_detection:cspdarknet_tiny"}
no_ptm_models = set()

In [None]:
if network_arch not in no_ptm_models:
    pattern = os.path.join(home, 'shared', 'users', '*', 'models', '*', 'metadata.json')

    ptm = []
    for ptm_metadata_path in glob.glob(pattern):
      with open(ptm_metadata_path, 'r') as metadata_file:
        ptm_metadata = json.load(metadata_file)
        ngc_path = ptm_metadata.get("ngc_path") or ""  # avoid AttributeError when an entry has no ngc_path
        metadata_network_arch = ptm_metadata.get("network_arch")
        if metadata_network_arch == network_arch and ngc_path.endswith(pretrained_map[network_arch]):
          ptm = [ptm_metadata["id"]]
          break

    print(ptm)

### Customize model metadata <a class="anchor" id="head-10"></a>

In [None]:
metadata_path = os.path.join(home, 'shared', 'users', os.environ['USER'], 'models', model_id, 'metadata.json')

with open(metadata_path , "r") as metadata_file:
    metadata = json.load(metadata_file)

metadata["train_datasets"] = [train_dataset_id]
metadata["eval_dataset"] = eval_dataset_id
metadata["inference_dataset"] = infer_dataset_id
metadata["calibration_dataset"] = train_dataset_id

if network_arch not in no_ptm_models:
    metadata["ptm"] = ptm

with open(metadata_path, "w") as metadata_file:
    json.dump(metadata, metadata_file, indent=2)

print(json.dumps(metadata, indent=2))

### View hyperparameters that are enabled for AutoML by default <a class="anchor" id="head-11"></a>

In [None]:
if automl_enabled:
    # View default automl specs enabled
    ! tao-client {model_name} model-automl-defaults --id {model_id} | tee ~/shared/users/{os.environ['USER']}/models/{model_id}/specs/automl_defaults.json

### Set AutoML related configurations <a class="anchor" id="head-12"></a>
Refer to the links below for the parameters supported by each network. If necessary, you can enable more parameters in addition to the ones enabled for AutoML by default:

[DetectNet_V2](https://github.com/NVIDIA/tao_front_end_services/tree/main/api/specs_utils/specs/detectnet_v2/detectnet_v2%20-%20train.csv), 
[Deformable Detr](https://github.com/NVIDIA/tao_front_end_services/tree/main/api/specs_utils/specs/deformable_detr/deformable_detr%20-%20train.csv), 
[DINO](https://github.com/NVIDIA/tao_front_end_services/tree/main/api/specs_utils/specs/dino/dino%20-%20train.csv), 
[DSSD](https://github.com/NVIDIA/tao_front_end_services/tree/main/api/specs_utils/specs/dssd/dssd%20-%20train.csv), 
[EfficientDet TF1](https://github.com/NVIDIA/tao_front_end_services/tree/main/api/specs_utils/specs/efficientdet_tf1/efficientdet_tf1%20-%20train.csv), 
[EfficientDet TF2](https://github.com/NVIDIA/tao_front_end_services/tree/main/api/specs_utils/specs/efficientdet_tf2/efficientdet_tf2%20-%20train.csv), 
[FasterRCNN](https://github.com/NVIDIA/tao_front_end_services/tree/main/api/specs_utils/specs/faster_rcnn/faster_rcnn%20-%20train.csv), 
[RetinaNet](https://github.com/NVIDIA/tao_front_end_services/tree/main/api/specs_utils/specs/retinanet/retinanet%20-%20train.csv), 
[SSD](https://github.com/NVIDIA/tao_front_end_services/tree/main/api/specs_utils/specs/ssd/ssd%20-%20train.csv), 
[YOLO_V3](https://github.com/NVIDIA/tao_front_end_services/tree/main/api/specs_utils/specs/yolo_v3/yolo_v3%20-%20train.csv), 
[YOLO_V4](https://github.com/NVIDIA/tao_front_end_services/tree/main/api/specs_utils/specs/yolo_v4/yolo_v4%20-%20train.csv), 
[YOLO_V4_Tiny](https://github.com/NVIDIA/tao_front_end_services/tree/main/api/specs_utils/specs/yolo_v4_tiny/yolo_v4_tiny%20-%20train.csv)

In [None]:
if automl_enabled:
    # Choose automl algorithm between "Bayesian" and "HyperBand".
    automl_algorithm="Bayesian" # FIXME8 example: Bayesian/HyperBand

    # Bayesian: do not change this value (more metrics will be supported in the future). HyperBand: only the loss is monitored, so this value has no effect
    metric = "kpi"

    additional_automl_parameters = [] #Refer to parameter list mentioned in the above links and add any extra parameter in addition to the default enabled ones
    remove_default_automl_parameters = [] #Remove any hyperparameters that are enabled by default for AutoML

    metadata["automl_algorithm"] = automl_algorithm
    metadata["automl_enabled"] = automl_enabled
    metadata["metric"] = metric
    metadata["epoch_multiplier"] = 1 # Will be considered for Hyperband only
    metadata["automl_add_hyperparameters"] = str(additional_automl_parameters)
    metadata["automl_remove_hyperparameters"] = str(remove_default_automl_parameters)

    with open(metadata_path, "w") as metadata_file:
        json.dump(metadata, metadata_file, indent=2)

    print(json.dumps(metadata, indent=2))

### Provide train specs <a class="anchor" id="head-13"></a>

In [None]:
# Default train model specs
! tao-client {model_name} model-train-defaults --id {model_id} | tee ~/shared/users/{os.environ['USER']}/models/{model_id}/specs/train.json

In [None]:
# Customize train model specs
specs_path = os.path.join(home, 'shared', 'users', os.environ['USER'], 'models', model_id, 'specs', 'train.json')

with open(specs_path , "r") as specs_file:
    specs = json.load(specs_file)

# Apply changes for any of the parameters listed in the previous cell as required
if model_name in ("deformable-detr", "dino"):
    specs["dataset"]["num_classes"] = int(num_classes) + 1
    specs["train"]["num_epochs"] = 10
    specs["train"]["num_gpus"] = 1
    
elif model_name == "efficientdet-tf1":
    specs["training_config"]["num_epochs"] = 10
    specs["training_config"]["train_batch_size"] = 2
    specs["training_config"]["num_examples_per_epoch"] = 1414 #number of images in your dataset/number of gpu's
    specs["dataset_config"]["num_classes"] = int(num_classes) #num_classes was computed during kitti_to_coco_conversion
    specs["eval_config"]["eval_epoch_cycle"] = 10
    specs["gpus"] = 1

elif model_name == "efficientdet-tf2":
    specs["train"]["num_epochs"] = 10
    specs["gpus"] = 1
    specs["train"]["batch_size"] = 4
    specs["train"]["num_examples_per_epoch"] = 1414 #number of images in your dataset/number of gpu's
    specs["dataset"]["num_classes"] = int(num_classes) #num_classes was computed during kitti_to_coco_conversion

else:
    # Example for detectnet_v2 (for each network the parameter key might be different)
    specs["training_config"]["num_epochs"] = 10 # num_epochs is the parameter name for all object detection networks
    specs["gpus"] = 1

if "dataset_config" in specs.keys() and "image_extension" in specs["dataset_config"].keys():
    specs["dataset_config"]["image_extension"] = "jpg"

with open(specs_path, "w") as specs_file:
    json.dump(specs, specs_file, indent=2)

print(json.dumps(specs, indent=2))

### Run train <a class="anchor" id="head-14"></a>
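
When AutoML is enabled, the monitoring cell in this section stops polling once the stats file reports that nothing is left to start. A minimal sketch of that stop condition as a standalone function is given here, assuming the two stat keys read by that cell. `automl_run_finished` is a hypothetical name.

```python
def automl_run_finished(stats):
    """Return True when the stats report zero epochs or zero iterations left to start."""
    keys = ("Number of epochs yet to start", "Number of iters yet to start")
    # A missing key defaults to -1, which never counts as finished
    return any(float(stats.get(key, -1)) == 0.0 for key in keys)
```

Isolating the condition this way makes it easy to check against a saved `automl_metadata.json` without running the polling loop.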

In [None]:
train_job_id = subprocess.getoutput(f"tao-client {model_name} model-train --id " + model_id)
print(train_job_id)

In [None]:
# Monitor job status
if automl_enabled:    
    # Set poll_automl_stats to True to see only progress stats, such as the time left and the number of epochs remaining.
    # Set poll_automl_stats to False to skip the stats and view the training logs instead. Viewing training logs is supported only for Bayesian.

    # For automl: Training times for different models benchmarked on 1 GPU V100 machine can be found here: https://docs.nvidia.com/tao/tao-toolkit/text/automl/automl.html#results-of-automl-experiments
    
    poll_automl_stats = True
    if poll_automl_stats:
        import time
        from IPython.display import clear_output
        stats_path = os.path.join(home, 'shared', 'users', os.environ['USER'], 'models', model_id, train_job_id, "automl_metadata.json")
        controller_json_path = os.path.join(home, 'shared', 'users', os.environ['USER'], 'models', model_id, train_job_id, "controller.json")
        while True:
            time.sleep(15)
            clear_output(wait=True)
            if os.path.exists(stats_path):
                try:
                    with open(stats_path , "r") as stats_file:
                        stats_dict = json.load(stats_file)
                    print(json.dumps(stats_dict, indent=2))
                    if float(stats_dict.get("Number of epochs yet to start",-1)) == 0.0 or float(stats_dict.get("Number of iters yet to start",-1)) == 0.0:
                        break
                except (json.JSONDecodeError):
                    print("Stats computed are being written to file. Stats will be visible on screen in a few seconds")
    else:
        # Print the log file - supported only for bayesian (the file won't exist until the backend Toolkit container is running -- can take several minutes)
        if automl_algorithm == "Bayesian":
            import time
            logs_dir = os.path.join(home, 'shared', 'users', os.environ['USER'], 'models', model_id)
            max_recommendations = metadata.get("automl_max_recommendations",20)
            for experiment_num in range(max_recommendations):
                log_file = f"{train_job_id}/experiment_{experiment_num}/log.txt"
                # Wait for the experiment log to appear, sleeping between checks instead of busy-waiting
                while not os.path.exists(os.path.join(logs_dir, log_file)):
                    time.sleep(5)
                print(f"\n\nViewing experiment {experiment_num}\n\n")
                my_tail(logs_dir, log_file)
    
else:
    # Check status (the file won't exist until the backend Toolkit container is running -- can take several minutes)
    logs_dir = os.path.join(home, 'shared', 'users', os.environ['USER'], 'models', model_id, 'logs')
    log_file = f"{train_job_id}.txt"

    my_tail(logs_dir, log_file)

In [None]:
## To Stop an AutoML JOB
#    1. Stop the 'Monitor job status' cell (the cell right before this cell) manually
#    2. Uncomment the snippet in the next cell and run the cell

In [None]:
# if automl_enabled:
#     canceled_job_id = subprocess.getoutput(f"tao-client {model_name} model-job-cancel --id {model_id} --job {train_job_id}")
#     print(canceled_job_id)

In [None]:
## Resume AutoML

In [None]:
# Uncomment the snippet below to resume a stopped AutoML job, then run the 'Monitor job status' cell again (four cells above this one)
# if automl_enabled:
#     resumed_job_id = subprocess.getoutput(f"tao-client {model_name} model-job-resume --id {model_id} --job {train_job_id}")
#     print(resumed_job_id)

### View checkpoint files <a class="anchor" id="head-15"></a>
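
Depending on the network, checkpoints may sit in `train/weights`, in `weights`, or directly in the job folder. A minimal sketch of that lookup order, mirroring the checks made in the cell below, is shown here. `find_checkpoint_dir` is a hypothetical helper, not part of the TAO client.

```python
import os


def find_checkpoint_dir(model_path):
    """Return the first non-empty candidate folder under model_path, or None if none is ready."""
    candidates = (
        os.path.join(model_path, "train", "weights"),
        os.path.join(model_path, "weights"),
        model_path,
    )
    for folder in candidates:
        if os.path.isdir(folder) and os.listdir(folder):
            return folder
    return None
```

A `None` result means the job has not written any artifacts yet, so the caller can wait and try again.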

In [None]:
# View the checkpoints generated by the training job. For AutoML jobs, also view the best-performing model's config and the results of all AutoML experiments

job_dir = f"{home}/shared/users/{os.environ['USER']}/models/{model_id}/{train_job_id}"
model_path = job_dir

if automl_enabled:
    !python3 -m pip install pandas==1.5.1
    import pandas as pd
    import glob
    model_path =  f"{job_dir}/best_model"

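import time
from IPython.display import clear_output

while True:
    time.sleep(5)  # pause between checks instead of busy-waiting while artifacts are written
    clear_output(wait=True)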
    if os.path.exists(model_path) and len(os.listdir(model_path)) > 0:
        #List the binary model file
        print("\nCheckpoints for the training experiment")
        if os.path.exists(model_path+"/train/weights") and len(os.listdir(model_path+"/train/weights")) > 0:
            print(f"Folder: {model_path}/train/weights")
            print("Files:", os.listdir(model_path+"/train/weights"))
        elif os.path.exists(model_path+"/weights") and len(os.listdir(model_path+"/weights")) > 0:
            print(f"Folder: {model_path}/weights")
            print("Files:", os.listdir(model_path+"/weights"))
        else:
            print(f"Folder: {model_path}")
            print("Files:", os.listdir(model_path))

        if automl_enabled:
            if os.path.exists(f"{model_path}/controller.json") and (len(glob.glob(os.path.join(model_path,"*.protobuf"))) > 0 or len(glob.glob(os.path.join(model_path,"*.yaml"))) > 0):
                with open(f"{model_path}/controller.json", "r") as controller_file:
                    experiment_artifacts = json.load(controller_file)
                data_frame = pd.DataFrame(experiment_artifacts)
                # Print experiment id/number and the corresponding result
                print("\nResults of all experiments")
                with pd.option_context('display.max_rows', None, 'display.max_columns', None, 'display.max_colwidth', None):
                    print(data_frame[["id","result"]])
                break
        else:
            break

### Provide evaluate specs <a class="anchor" id="head-16"></a>

In [None]:
# Default evaluate model specs
! tao-client {model_name} model-evaluate-defaults --id {model_id} | tee ~/shared/users/{os.environ['USER']}/models/{model_id}/specs/evaluate.json

In [None]:
# Customize evaluate model specs
specs_path = os.path.join(home, 'shared', 'users', os.environ['USER'], 'models', model_id, 'specs', 'evaluate.json')

with open(specs_path , "r") as specs_file:
    specs = json.load(specs_file)

if model_name in ("deformable-detr","dino"):
    specs["dataset"]["num_classes"] = int(num_classes) + 1

elif model_name == "efficientdet-tf1":
    specs["dataset_config"]["num_classes"] = int(num_classes) #num_classes was computed during kitti_to_coco_conversion

elif model_name == "efficientdet-tf2":
    specs["evaluate"]["num_samples"] = 353 #number of images in your dataset
    specs["dataset"]["num_classes"] = int(num_classes) #num_classes was computed during kitti_to_coco_conversion

if "dataset_config" in specs.keys() and "image_extension" in specs["dataset_config"].keys():
    specs["dataset_config"]["image_extension"] = "jpg"

with open(specs_path, "w") as specs_file:
    json.dump(specs, specs_file, indent=2)

print(json.dumps(specs, indent=2))
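
The load, modify, and save steps in the cell above recur for every action in this notebook. One way to factor them out is sketched below. `update_specs` is a hypothetical helper, not part of `tao-client`, and the demo writes to a temporary file so it runs without the TAO service.

```python
import json
import os
import tempfile

def update_specs(specs_path, updater):
    """Load a JSON spec file, let `updater` modify it in place, and write it back."""
    with open(specs_path, "r") as specs_file:
        specs = json.load(specs_file)
    updater(specs)
    with open(specs_path, "w") as specs_file:
        json.dump(specs, specs_file, indent=2)
    return specs

# Demo on a throwaway file; the keys mirror the "dataset" / "num_classes" layout used above.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "evaluate.json")
with open(demo_path, "w") as demo_file:
    json.dump({"dataset": {"num_classes": 0}}, demo_file)

def set_num_classes(specs):
    # Hypothetical value: 3 classes plus one, as done for deformable-detr and dino.
    specs["dataset"]["num_classes"] = 3 + 1

demo_specs = update_specs(demo_path, set_num_classes)
print(json.dumps(demo_specs, indent=2))
```

Keeping the file handling in one place leaves each cell with only the network-specific assignments.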

### Run evaluate <a class="anchor" id="head-17"></a>

In [None]:
eval_job_id = subprocess.getoutput(f"tao-client {model_name} model-evaluate --id {model_id} --job {train_job_id}")
print(eval_job_id)
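
`subprocess.getoutput` returns standard output and standard error combined, so if the `tao-client` call fails, `eval_job_id` holds the error text rather than an ID. Below is a minimal check, assuming that job IDs are UUID strings; that is an assumption about the service, not something this notebook states. `looks_like_job_id` is a hypothetical helper.

```python
import uuid

def looks_like_job_id(text):
    """Return True if `text` parses as a UUID (assumed job ID format)."""
    try:
        uuid.UUID(text.strip())
    except ValueError:
        return False
    return True

# Demo with a generated UUID and a made-up error message.
sample_id = str(uuid.uuid4())
sample_error = "Error: could not reach the service"
print(looks_like_job_id(sample_id), looks_like_job_id(sample_error))
```

Running such a check before tailing the log makes a failed submission visible immediately instead of surfacing later as a missing log file.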

In [None]:
# Check status (the file won't exist until the backend Toolkit container is running -- can take several minutes)
log_file = f"{eval_job_id}.txt"
logs_dir = os.path.join(home, 'shared', 'users', os.environ['USER'], 'models', model_id, 'logs')
my_tail(logs_dir, log_file)
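
Because the log file can take several minutes to appear, it can help to wait for it with a timeout before tailing. The following is a sketch only: `wait_for_file` and its default values are hypothetical, and it complements `my_tail` rather than replacing it.

```python
import os
import tempfile
import time

def wait_for_file(path, timeout_s=600.0, poll_s=5.0):
    """Return True once `path` exists, or False if `timeout_s` elapses first."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if os.path.exists(path):
            return True
        time.sleep(poll_s)
    # One last check in case the file appeared during the final sleep.
    return os.path.exists(path)

# Demo with short timeouts on a temporary directory.
demo_dir = tempfile.mkdtemp()
existing = os.path.join(demo_dir, "job.txt")
open(existing, "w").close()
found = wait_for_file(existing, timeout_s=1.0, poll_s=0.1)
missing = wait_for_file(os.path.join(demo_dir, "nope.txt"), timeout_s=0.3, poll_s=0.1)
print(found, missing)
```

In the notebook, `path` would be `os.path.join(logs_dir, log_file)`.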

### Provide prune specs <a class="anchor" id="head-18"></a>

In [None]:
# Default prune model specs
if model_name not in ("deformable-detr","dino"):
    ! tao-client {model_name} model-prune-defaults --id {model_id} | tee ~/shared/users/{os.environ['USER']}/models/{model_id}/specs/prune.json

### Run prune <a class="anchor" id="head-19"></a>

In [None]:
if model_name not in ("deformable-detr","dino"):
    prune_job_id = subprocess.getoutput(f"tao-client {model_name} model-prune --id {model_id} --job {train_job_id}")
    print(prune_job_id)

In [None]:
# Check status (the file won't exist until the backend Toolkit container is running -- can take several minutes)
if model_name not in ("deformable-detr","dino"):
    log_file = f"{prune_job_id}.txt"
    my_tail(logs_dir, log_file)

### Provide retrain specs <a class="anchor" id="head-20"></a>

In [None]:
# Default retrain model specs
if model_name not in ("deformable-detr","dino"):
    ! tao-client {model_name} model-retrain-defaults --id {model_id} | tee ~/shared/users/{os.environ['USER']}/models/{model_id}/specs/retrain.json

In [None]:
# Customize retrain model specs
if model_name not in ("deformable-detr","dino"):
    specs_path = os.path.join(home, 'shared', 'users', os.environ['USER'], 'models', model_id, 'specs', 'retrain.json')

    with open(specs_path , "r") as specs_file:
        specs = json.load(specs_file)

    # for efficientdet_tf1
    if model_name == "efficientdet-tf1":
        specs["training_config"]["num_epochs"] = 10
        specs["gpus"] = 1
        specs["training_config"]["train_batch_size"] = 8
        specs["training_config"]["num_examples_per_epoch"] = 1414 # number of images in your dataset / number of GPUs
        specs["dataset_config"]["num_classes"] = int(num_classes) #num_classes was computed during kitti_to_coco_conversion
        specs["eval_config"]["eval_epoch_cycle"] = 10

    # for efficientdet_tf2
    elif model_name == "efficientdet-tf2":
        specs["train"]["num_epochs"] = 10
        specs["gpus"] = 1
        specs["train"]["batch_size"] = 4
        specs["train"]["num_examples_per_epoch"] = 1414 # number of images in your dataset / number of GPUs
        specs["dataset"]["num_classes"] = int(num_classes) #num_classes was computed during kitti_to_coco_conversion

    # Example for the remaining networks (the parameter key might differ per network)
    else:
        specs["training_config"]["num_epochs"] = 10 # num_epochs is the parameter name for all object detection networks
        specs["gpus"] = 1

    if "dataset_config" in specs.keys() and "image_extension" in specs["dataset_config"].keys():
        specs["dataset_config"]["image_extension"] = "jpg"

    with open(specs_path, "w") as specs_file:
        json.dump(specs, specs_file, indent=2)

    print(json.dumps(specs, indent=2))

### Run retrain <a class="anchor" id="head-21"></a>

In [None]:
if model_name not in ("deformable-detr","dino"):
    retrain_job_id = subprocess.getoutput(f"tao-client {model_name} model-retrain --id {model_id} --job {prune_job_id}")
    print(retrain_job_id)

In [None]:
# Check status (the file won't exist until the backend Toolkit container is running -- can take several minutes)
if model_name not in ("deformable-detr","dino"):
    log_file = f"{retrain_job_id}.txt"
    my_tail(logs_dir, log_file)

### Run evaluate on retrained model <a class="anchor" id="head-21-1"></a>

In [None]:
if model_name not in ("deformable-detr","dino"):
    eval2_job_id = subprocess.getoutput(f"tao-client {model_name} model-evaluate --id {model_id} --job {retrain_job_id}")
    print(eval2_job_id)

In [None]:
# Check status (the file won't exist until the backend Toolkit container is running -- can take several minutes)
if model_name not in ("deformable-detr","dino"):
    log_file = f"{eval2_job_id}.txt"
    my_tail(logs_dir, log_file)

### Provide export specs <a class="anchor" id="head-22"></a>

In [None]:
# Default export model specs
! tao-client {model_name} model-export-defaults --id {model_id} | tee ~/shared/users/{os.environ['USER']}/models/{model_id}/specs/export.json

In [None]:
# Customize export model specs
specs_path = os.path.join(home, 'shared', 'users', os.environ['USER'], 'models', model_id, 'specs', 'export.json')

with open(specs_path , "r") as specs_file:
    specs = json.load(specs_file)

if model_name == "efficientdet-tf2":
    specs["dataset"]["num_classes"] = int(num_classes)
elif model_name in ("deformable-detr","dino"):
    specs["dataset"]["num_classes"] = int(num_classes) + 1

with open(specs_path, "w") as specs_file:
    json.dump(specs, specs_file, indent=2)

print(json.dumps(specs, indent=2))

### Run export <a class="anchor" id="head-23"></a>

In [None]:
export_job_id = subprocess.getoutput(f"tao-client {model_name} model-export --id {model_id} --job {train_job_id}")
print(export_job_id)

In [None]:
# Check status (the file won't exist until the backend Toolkit container is running -- can take several minutes)
log_file = f"{export_job_id}.txt"
my_tail(logs_dir, log_file)

### Provide trt engine generation specs <a class="anchor" id="head-26"></a>

In [None]:
# Default gen_trt_engine model specs
! tao-client {model_name} model-gen-trt-engine-defaults --id {model_id} | tee ~/shared/users/{os.environ['USER']}/models/{model_id}/specs/gen_trt_engine.json

In [None]:
# Customize gen_trt_engine model specs
specs_path = os.path.join(home, 'shared', 'users', os.environ['USER'], 'models', model_id, 'specs', 'gen_trt_engine.json')

with open(specs_path , "r") as specs_file:
    specs = json.load(specs_file)

if model_name == "efficientdet-tf2":
    specs["gen_trt_engine"]["tensorrt"]["data_type"] = "int8"
    specs["dataset"]["num_classes"] = int(num_classes)
elif model_name in ("deformable-detr", "dino"):
    specs["gen_trt_engine"]["tensorrt"]["data_type"] = "int8"
    specs["dataset"]["num_classes"] = int(num_classes) + 1
else:
    specs["data_type"] = "int8"

with open(specs_path, "w") as specs_file:
    json.dump(specs, specs_file, indent=2)

print(json.dumps(specs, indent=2))

### Run TRT Engine generation using TAO-Deploy <a class="anchor" id="head-27"></a>

In [None]:
gen_trt_engine_job_id = subprocess.getoutput(f"tao-client {model_name} model-gen-trt-engine --id {model_id} --job {export_job_id}")
print(gen_trt_engine_job_id)

In [None]:
# Check status (the file won't exist until the backend Toolkit container is running -- can take several minutes)
log_file = f"{gen_trt_engine_job_id}.txt"
my_tail(logs_dir, log_file)

### Provide TAO inference specs <a class="anchor" id="head-28"></a>

In [None]:
# Default inference model specs
! tao-client {model_name} model-inference-defaults --id {model_id} | tee ~/shared/users/{os.environ['USER']}/models/{model_id}/specs/inference.json

In [None]:
# Customize TAO inference specs
specs_path = os.path.join(home, 'shared', 'users', os.environ['USER'], 'models', model_id, 'specs', 'inference.json')

with open(specs_path , "r") as specs_file:
    specs = json.load(specs_file)

#Apply changes to the specs dictionary here if required
if model_name == "efficientdet-tf1":
    specs["dataset_config"]["num_classes"] = int(num_classes)
elif model_name == "efficientdet-tf2":
    specs["dataset"]["num_classes"] = int(num_classes)
elif model_name in ("deformable-detr","dino"):
    specs["dataset"]["num_classes"] = int(num_classes) + 1 

if "dataset_config" in specs.keys() and "image_extension" in specs["dataset_config"].keys():
    specs["dataset_config"]["image_extension"] = "jpg"

with open(specs_path, "w") as specs_file:
    json.dump(specs, specs_file, indent=2)

print(json.dumps(specs, indent=2))
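
The same per-network rule for `num_classes` appears in the evaluate, export, and inference cells. The sketch below gathers it into one function. `apply_num_classes` is a hypothetical helper; the key names and the `+ 1` for `deformable-detr` and `dino` are copied from the cells in this notebook.

```python
def apply_num_classes(specs, model_name, num_classes):
    """Set num_classes where each network expects it, following this notebook's cells."""
    n = int(num_classes)
    # Note: setdefault creates a missing section, whereas the notebook cells
    # would raise KeyError if the section were absent.
    if model_name == "efficientdet-tf1":
        specs.setdefault("dataset_config", {})["num_classes"] = n
    elif model_name == "efficientdet-tf2":
        specs.setdefault("dataset", {})["num_classes"] = n
    elif model_name in ("deformable-detr", "dino"):
        specs.setdefault("dataset", {})["num_classes"] = n + 1
    return specs

# Demo with empty spec dictionaries and a hypothetical class count of 3.
tf1_specs = apply_num_classes({}, "efficientdet-tf1", "3")
tf2_specs = apply_num_classes({}, "efficientdet-tf2", 3)
dino_specs = apply_num_classes({}, "dino", 3)
print(tf1_specs, tf2_specs, dino_specs)
```

Other networks are left unchanged, matching the cells above, which only handle these four model names.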

### Run TAO inference <a class="anchor" id="head-29"></a>

In [None]:
tlt_inference_job_id = subprocess.getoutput(f"tao-client {model_name} model-inference --id {model_id} --job {train_job_id}")
print(tlt_inference_job_id)

In [None]:
# Check status (the file won't exist until the backend Toolkit container is running -- can take several minutes)
log_file = f"{tlt_inference_job_id}.txt"
my_tail(logs_dir, log_file)

In [None]:
from IPython.display import Image
import glob
job_dir = f"{home}/shared/users/{os.environ['USER']}/models/{model_id}/{tlt_inference_job_id}"
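sample_images = glob.glob(f"{job_dir}/**/*.jpg", recursive=True)
# Show the 7th image when available; fall back to the last one if fewer were written
sample_image = sample_images[min(6, len(sample_images) - 1)]
Image(filename=sample_image)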

### Provide TRT inference specs <a class="anchor" id="head-30"></a>

In [None]:
# Default inference model specs
! tao-client {model_name} model-inference-defaults --id {model_id} | tee ~/shared/users/{os.environ['USER']}/models/{model_id}/specs/inference.json

In [None]:
# Customize TRT inference specs
specs_path = os.path.join(home, 'shared', 'users', os.environ['USER'], 'models', model_id, 'specs', 'inference.json')

with open(specs_path , "r") as specs_file:
    specs = json.load(specs_file)

#Apply changes to the specs dictionary here if required
if model_name == "efficientdet-tf1":
    specs["dataset_config"]["num_classes"] = int(num_classes)
elif model_name == "efficientdet-tf2":
    specs["dataset"]["num_classes"] = int(num_classes)
elif model_name in ("deformable-detr","dino"):
    specs["dataset"]["num_classes"] = int(num_classes) + 1 

with open(specs_path, "w") as specs_file:
    json.dump(specs, specs_file, indent=2)

print(json.dumps(specs, indent=2))

### Run TRT inference <a class="anchor" id="head-31"></a>

In [None]:
trt_inference_job_id = subprocess.getoutput(f"tao-client {model_name} model-inference --id {model_id} --job {gen_trt_engine_job_id}")
print(trt_inference_job_id)

In [None]:
# Check status (the file won't exist until the backend Toolkit container is running -- can take several minutes)
log_file = f"{trt_inference_job_id}.txt"
my_tail(logs_dir, log_file)

In [None]:
from IPython.display import Image
import glob
job_dir = f"{home}/shared/users/{os.environ['USER']}/models/{model_id}/{trt_inference_job_id}"
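sample_images = glob.glob(f"{job_dir}/**/*.jpg", recursive=True)
# Show the 7th image when available; fall back to the last one if fewer were written
sample_image = sample_images[min(6, len(sample_images) - 1)]
Image(filename=sample_image)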

### Delete experiment <a class="anchor" id="head-32"></a>

In [None]:
! rm -rf ~/shared/users/{os.environ['USER']}/models/{model_id}
! echo DONE

### Delete datasets <a class="anchor" id="head-33"></a>

In [None]:
! rm -rf ~/shared/users/{os.environ['USER']}/datasets/{train_dataset_id}
! rm -rf ~/shared/users/{os.environ['USER']}/datasets/{eval_dataset_id}
! rm -rf ~/shared/users/{os.environ['USER']}/datasets/{infer_dataset_id}
! echo DONE

### Unmount shared volume <a class="anchor" id="head-34"></a>

In [None]:
command = "umount ~/shared"
! echo {password} | sudo -S -k {command} && echo DONE

### Uninstall TAO Remote Client <a class="anchor" id="head-35"></a>

In [None]:
! pip3 uninstall -y nvidia-tao-client