### Notebook to demonstrate TAO-Remote Client AutoML workflow for Object Detection using Yolo-v4

Transfer learning is the process of transferring learned features from one application to another. It is a commonly used training technique in which a model trained on one task is re-trained for use on a different task. Train Adapt Optimize (TAO) Toolkit is a simple, easy-to-use, Python-based AI toolkit for taking purpose-built AI models and customizing them with users' own data.

![image](https://developer.nvidia.com/sites/default/files/akamai/TAO/tlt-tao-toolkit-bring-your-own-model-diagram.png)



### Learning Objective

This AutoML notebook identifies the optimal hyperparameters (e.g., learning rate, batch size, weight regularizer, number of layers) for Yolo-v4 so that object detection models reach better accuracy or converge faster. Yolo-v4 is the default model; a list of other supported models appears in a later cell.
- Take a pretrained model and choose the AutoML algorithm and parameters to start AutoML training.
- At the end of an AutoML run, you receive a config file that specifies the best-performing model, along with the binary model file to deploy in your application.


### The workflow in a nutshell

- Set AutoML algorithm configurations
  - Add/Remove AutoML parameters
- Override train config defaults
- Run AutoML


### Table of contents

1. [Create a model experiment](#head-8)
1. [Find pretrained model](#head-9)
1. [Set AutoML related configurations](#head-10)
1. [Provide train specs](#head-11)
1. [Run AutoML train](#head-12)
1. [Get the best model from AutoML](#head-13)
1. [Delete experiment](#head-14)
1. [Delete datasets](#head-15)

### Requirements
Please find the server requirements [here](https://docs.nvidia.com/tao/tao-toolkit/text/tao_toolkit_api/api_setup.html#)

In [None]:
import os
import glob
import subprocess
import getpass
import uuid
import json

In [None]:
namespace = subprocess.getoutput("echo $(helm list -A | grep tao-toolkit-api) | cut -d' ' -f2")
namespace

In [None]:
# Restore variables set in yolo_training.ipynb

with open("variables_to_store.json" , "r") as specs_file:
    variables_to_store = json.load(specs_file)

namespace = variables_to_store["namespace"]
node_addr = variables_to_store["node_addr"]
node_port = variables_to_store["node_port"]
home = variables_to_store["home"]
os.environ['USER'] = variables_to_store["USER"]
os.environ['TOKEN'] = variables_to_store["TOKEN"]
train_dataset_id = variables_to_store["train_dataset_id"]
eval_dataset_id = variables_to_store["eval_dataset_id"]

%env BASE_URL=http://{node_addr}:{node_port}/{namespace}/api/v1

In [None]:
# Available models :
# 1. detectnet-v2 - https://docs.nvidia.com/tao/tao-toolkit/text/object_detection/detectnet_v2.html
# 2. faster-rcnn - https://docs.nvidia.com/tao/tao-toolkit/text/object_detection/fasterrcnn.html
# 3. yolo-v3 - https://docs.nvidia.com/tao/tao-toolkit/text/object_detection/yolo_v3.html
# 4. yolo-v4 - https://docs.nvidia.com/tao/tao-toolkit/text/object_detection/yolo_v4.html
# 5. yolo-v4-tiny - https://docs.nvidia.com/tao/tao-toolkit/text/object_detection/yolo_v4_tiny.html

# There are 3 other models supported for AutoML but not supported in this notebook - EfficientDet, SSD, RetinaNet
# To run AutoML on one of these 3 models, download the dedicated notebook with: wget --content-disposition 'https://api.ngc.nvidia.com/v2/resources/nvidia/tao/tao-getting-started/versions/4.0.0/files/notebooks/tao_api_starter_kit/api/automl/object_detection.ipynb'

model_name = "yolo-v4" # You can switch the model_name to one of the 5 models listed above

### Create the datasets <a class="anchor" id="head-4"></a>

This example uses NVIDIA's synthetic dataset of warehouse images, stored in the `kitti object detection dataset` format. For more details about KITTI, see [here](http://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark=2d).
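Each KITTI label file contains one line per object, with 15 whitespace-separated fields: class name, truncation, occlusion, alpha, the 2D bounding box (left, top, right, bottom in pixels), and seven 3D fields (dimensions, location, rotation). 2D detectors generally rely on the class name and bounding box. The sketch below parses one such line and rejects malformed ones; `KittiObject` and `parse_kitti_line` are hypothetical helpers for sanity-checking a custom dataset, not part of TAO.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class KittiObject:
    label: str
    truncated: float
    occluded: int
    alpha: float
    bbox: Tuple[float, float, float, float]  # (left, top, right, bottom) in pixels


def parse_kitti_line(line: str) -> KittiObject:
    """Parse one object line of a KITTI label file (hypothetical helper)."""
    fields = line.split()
    if len(fields) < 15:
        raise ValueError(f"expected at least 15 fields, got {len(fields)}: {line!r}")
    left, top, right, bottom = (float(v) for v in fields[4:8])
    if right <= left or bottom <= top:
        raise ValueError(f"degenerate bounding box in {line!r}")
    # Fields 8-14 (3D dimensions, location, rotation_y) are ignored here
    return KittiObject(
        label=fields[0],
        truncated=float(fields[1]),
        occluded=int(float(fields[2])),  # tolerate "0" as well as "0.0"
        alpha=float(fields[3]),
        bbox=(left, top, right, bottom),
    )
```

For example, `parse_kitti_line("forklift 0.00 0 0.00 100.0 120.0 300.0 400.0 0.00 0.00 0.00 0.00 0.00 0.00 0.00")` returns an object labeled `forklift` with the box `(100.0, 120.0, 300.0, 400.0)`; the class name here is illustrative only.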

**If you use a custom dataset, it should follow this structure:**
```
DATA_DIR/train
├── images/
│   ├── image_name_1.jpg
│   ├── image_name_2.jpg
│   ├── ...
└── labels/
    ├── image_name_1.txt
    ├── image_name_2.txt
    ├── ...

DATA_DIR/val
├── images/
│   ├── image_name_1.jpg
│   ├── image_name_2.jpg
│   ├── ...
└── labels/
    ├── image_name_1.txt
    ├── image_name_2.txt
    ├── ...
```
Each image and its label file must have the same base name in the `images` and `labels` folders (for example, `image_name_1.jpg` and `image_name_1.txt`).
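Before uploading a custom dataset, it can help to confirm that every image has a label file and vice versa. This is a minimal sketch, assuming the `images/` and `labels/` layout shown above; `find_unpaired` and the set of image extensions are hypothetical choices to adapt to your data.

```python
from pathlib import Path
from typing import List, Tuple

# Assumption: adjust to the image formats in your dataset
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def find_unpaired(split_dir: str) -> Tuple[List[str], List[str]]:
    """Return (images without labels, labels without images) for one split, by base name."""
    split = Path(split_dir)
    image_stems = {p.stem for p in (split / "images").iterdir()
                   if p.suffix.lower() in IMAGE_SUFFIXES}
    label_stems = {p.stem for p in (split / "labels").iterdir() if p.suffix == ".txt"}
    return sorted(image_stems - label_stems), sorted(label_stems - image_stems)
```

Running `find_unpaired("DATA_DIR/train")` and `find_unpaired("DATA_DIR/val")` should return two empty lists for each split if the dataset is consistent.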

### Create a model experiment <a class="anchor" id="head-8"></a>

In [None]:
network_arch = model_name.replace("-","_")
model_id = subprocess.getoutput(f"tao-client {model_name} model-create --network_arch {network_arch} --encryption_key tlt_encode ")
print(model_id)

### Assign train and eval datasets

In [None]:
metadata_path = os.path.join(home, 'shared', 'users', os.environ['USER'], 'models', model_id, 'metadata.json')

with open(metadata_path , "r") as metadata_file:
    metadata = json.load(metadata_file)

metadata["train_datasets"] = [train_dataset_id]
metadata["eval_dataset"] = eval_dataset_id

### Find pretrained model <a class="anchor" id="head-9"></a>

In [None]:
# List all pretrained models for the chosen network architecture
pattern = os.path.join(home, 'shared', 'users', '*', 'models', '*', 'metadata.json')

for ptm_metadata_path in glob.glob(pattern):
  with open(ptm_metadata_path, 'r') as metadata_file:
    ptm_metadata = json.load(metadata_file)
    metadata_network_arch = ptm_metadata.get("network_arch")
    if metadata_network_arch == network_arch:
      if "encryption_key" not in ptm_metadata.keys():
        print(f'PTM Name: {ptm_metadata["name"]}; PTM version: {ptm_metadata["version"]}; NGC PATH: {ptm_metadata["ngc_path"]}; Additional info: {ptm_metadata["additional_id_info"]}')

In [None]:
# Choose one of the pretrained models listed in the output of the previous cell.
# By default, the PTM is "pretrained_object_detection:resnet18", and the default train and evaluate spec files use parameters associated with resnet18.
# If you change the PTM in the pretrained_map variable below (for example, to pretrained_object_detection:resnet34),
# you must also change the associated parameters in the train spec in the "Provide train specs" section.
# For example, you may need to set num_layers to 34 for pretrained_object_detection:resnet34.
# For more on this spec dependency, see the documentation at https://docs.nvidia.com/tao/tao-toolkit/text/object_detection/yolo_v4.html

In [None]:
# Map each network architecture to its default pretrained model
# If you change the number of layers (e.g., to 34), make the matching change in the pretrained model name
pretrained_map = {"detectnet_v2" : "detectnet_v2:resnet18",
                  "faster_rcnn" : "pretrained_object_detection:resnet18",
                  "yolo_v3" : "pretrained_object_detection:resnet18",
                  "yolo_v4" : "pretrained_object_detection:resnet18",
                  "yolo_v4_tiny": "pretrained_object_detection:cspdarknet_tiny"}

In [None]:
pattern = os.path.join(home, 'shared', 'users', '*', 'models', '*', 'metadata.json')

ptm_id = None
for ptm_metadata_path in glob.glob(pattern):
    with open(ptm_metadata_path, 'r') as metadata_file:
        ptm_metadata = json.load(metadata_file)
    # Guard against metadata files without an ngc_path entry
    ngc_path = ptm_metadata.get("ngc_path") or ""
    metadata_network_arch = ptm_metadata.get("network_arch")
    if metadata_network_arch == network_arch and ngc_path.endswith(pretrained_map[network_arch]):
        ptm_id = ptm_metadata["id"]
        break

if ptm_id is None:
    print(f"No pretrained model matching {pretrained_map[network_arch]} was found")

metadata["ptm"] = [ptm_id]
print(ptm_id)

### View hyperparameters that are enabled for AutoML by default

In [None]:
# View default automl specs enabled
! tao-client {model_name} model-automl-defaults --id {model_id} | tee ~/shared/users/{os.environ['USER']}/models/{model_id}/specs/automl_defaults.json

### Set AutoML related configurations <a class="anchor" id="head-10"></a>
Refer to the links below for the parameters supported by each network. If necessary, add more parameters in addition to the ones AutoML enables by default:

[DetectNet_V2](https://docs.nvidia.com/tao/tao-toolkit/text/tao_toolkit_api/api_action_specs.html#id6), 
[FasterRCNN](https://docs.nvidia.com/tao/tao-toolkit/text/tao_toolkit_api/api_action_specs.html#id24), 
[YOLO_V3](https://docs.nvidia.com/tao/tao-toolkit/text/tao_toolkit_api/api_action_specs.html#id85), 
[YOLO_V4](https://docs.nvidia.com/tao/tao-toolkit/text/tao_toolkit_api/api_action_specs.html#id92), 
[YOLO_V4_Tiny](https://docs.nvidia.com/tao/tao-toolkit/text/tao_toolkit_api/api_action_specs.html#id94)

For details on TAO AutoML, see the [TAO docs](https://docs.nvidia.com/tao/tao-toolkit/text/automl/automl.html) and the [TAO AutoML blog](https://developer.nvidia.com/blog/training-like-an-ai-pro-using-tao-automl/).
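The two AutoML algorithms differ in how they spend the training budget. As a rough illustration of the HyperBand option, the sketch below computes the bracket schedule of the standard Hyperband algorithm: each bracket starts `n` configurations with a small epoch budget, then repeatedly keeps the best `1/eta` of them while multiplying their epochs by `eta` (successive halving). The names `max_resource` (maximum epochs per configuration) and `eta` (reduction factor) are assumptions for illustration; this is the textbook schedule, not TAO's implementation, whose internal parameters may differ.

```python
import math
from typing import List, Tuple


def hyperband_schedule(max_resource: int, eta: int = 3) -> List[List[Tuple[int, float]]]:
    """For each Hyperband bracket, return its rounds as (num_configs, epochs_per_config)."""
    # s_max = floor(log_eta(max_resource)), computed with integers to avoid float error
    s_max = 0
    while eta ** (s_max + 1) <= max_resource:
        s_max += 1
    budget = (s_max + 1) * max_resource  # total budget per bracket
    brackets = []
    for s in range(s_max, -1, -1):
        # Starting number of configs and epochs per config for this bracket
        n = math.ceil(budget / max_resource * eta ** s / (s + 1))
        r = max_resource / eta ** s
        rounds = []
        for i in range(s + 1):
            # Successive halving: keep the top 1/eta, give survivors eta times more epochs
            rounds.append((n // eta ** i, r * eta ** i))
        brackets.append(rounds)
    return brackets
```

With `max_resource=81` and `eta=3`, the most aggressive bracket starts 81 configurations at 1 epoch each and ends with a single configuration trained for 81 epochs, while the most conservative bracket trains 5 configurations for the full 81 epochs. Bayesian search, by contrast, trains each recommended configuration fully and uses past results to choose the next one.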

In [None]:
# Choose the AutoML algorithm: "Bayesian" or "HyperBand"
automl_algorithm="Bayesian" # valid options: Bayesian/HyperBand

# Don't change this; multiple metrics will be supported in a future release
metric = "map"

additional_automl_parameters = [] # Add extra parameters from the lists linked above, on top of the ones enabled by default
remove_default_automl_parameters = [] # Remove any hyperparameters that are enabled for AutoML by default

metadata["automl_max_recommendations"] = 10
metadata["automl_algorithm"] = automl_algorithm
metadata["automl_enabled"] = True
metadata["metric"] = metric
metadata["automl_add_hyperparameters"] = str(additional_automl_parameters)
metadata["automl_remove_hyperparameters"] = str(remove_default_automl_parameters)

with open(metadata_path, "w") as metadata_file:
    json.dump(metadata, metadata_file, indent=2)

print(json.dumps(metadata, indent=2))

### Provide train specs <a class="anchor" id="head-11"></a>
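The customization cell below edits nested keys of the train spec dictionary directly. If a key name is mistyped, plain dictionary assignment silently adds a new, unused key. A small helper can catch that; `set_spec_value` and its dotted-path syntax are hypothetical conveniences, not part of `tao-client`.

```python
from typing import Any


def set_spec_value(specs: dict, dotted_key: str, value: Any) -> None:
    """Set a nested spec entry such as "augmentation_config.output_width".

    Raises KeyError if a section or the final key is missing, so typos in
    parameter names fail loudly instead of adding keys the trainer ignores.
    """
    *parents, leaf = dotted_key.split(".")
    section = specs
    for name in parents:
        section = section[name]  # KeyError if the section does not exist
    if leaf not in section:
        raise KeyError(f"{dotted_key!r} is not in the default spec")
    section[leaf] = value
```

For example, `set_spec_value(specs, "augmentation_config.output_width", 960)` has the same effect as the direct assignment in the cell below, but fails if the key is absent from the downloaded defaults. Drop the final `leaf` check if you intentionally need to add an optional key.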

In [None]:
# Default train model specs
! tao-client {model_name} model-train-defaults --id {model_id} | tee ~/shared/users/{os.environ['USER']}/models/{model_id}/specs/train.json

In [None]:
# Customize train model specs
specs_path = os.path.join(home, 'shared', 'users', os.environ['USER'], 'models', model_id, 'specs', 'train.json')

with open(specs_path , "r") as specs_file:
    specs = json.load(specs_file)

# Change any of the parameters listed in the previous cell's output as required
# The keys below are for yolo_v4; parameter keys may differ for other networks
# For example:
specs["augmentation_config"]["output_width"] = 960 # Half resolution; choose based on your dataset and the training/inference time trade-off
specs["augmentation_config"]["output_height"] = 544 # Half resolution; choose based on your dataset and the training/inference time trade-off

if "image_extension" in specs["dataset_config"].keys():
    specs["dataset_config"]["image_extension"] = "jpg"

with open(specs_path, "w") as specs_file:
    json.dump(specs, specs_file, indent=2)

print(json.dumps(specs, indent=2))

### Run AutoML train <a class="anchor" id="head-12"></a>
An AutoML run for Yolo-v4 takes about 18.5 hours to complete. You can view the live ETA in the polling cell below.

With the model's default specs, AutoML yields a model with an mAP of about 70%, compared with a baseline of about 64% without AutoML.
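The log-viewing branch of the polling cell calls `my_tail`, which this notebook never defines. As a stand-in, here is a minimal pure-Python sketch that waits for a log file to appear and prints it as it grows. The name `follow_log`, the assumption that a finished log ends with a line reading `EOF`, and the polling interval are all hypothetical; pass whatever end marker your logs actually use, or rely on `timeout_seconds`.

```python
import os
import time
from typing import Optional


def follow_log(path: str, end_marker: str = "EOF", poll_seconds: float = 5.0,
               timeout_seconds: Optional[float] = None) -> bool:
    """Print a growing log file until a line equal to end_marker appears.

    Returns True if the end marker was seen, False if timeout_seconds elapsed first.
    """
    start = time.monotonic()

    def timed_out() -> bool:
        return timeout_seconds is not None and time.monotonic() - start > timeout_seconds

    # The log file only appears once the training container has started
    while not os.path.exists(path):
        if timed_out():
            return False
        time.sleep(poll_seconds)

    with open(path, "r") as log:
        buffer = ""  # holds a partially written line until its newline arrives
        while True:
            chunk = log.readline()
            if chunk:
                buffer += chunk
                if not buffer.endswith("\n"):
                    continue  # partial line; wait for the rest
                print(buffer, end="")
                if buffer.strip() == end_marker:
                    return True
                buffer = ""
            elif buffer.strip() == end_marker:
                # Marker written as the last line without a trailing newline
                print(buffer)
                return True
            elif timed_out():
                return False
            else:
                time.sleep(poll_seconds)
```

In the polling cell, `follow_log(os.path.join(logs_dir, log_file))` could replace the `my_tail(logs_dir, log_file)` call, and it also covers the wait for the file to exist.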

In [None]:
train_job_id = subprocess.getoutput(f"tao-client {model_name} model-train --id {model_id}")
print(train_job_id)

In [None]:
# Set poll_automl_stats to True to see the time remaining, the number of epochs left, etc.
# Set poll_automl_stats to False to skip the stats and view the training logs instead. Viewing training logs is supported only for Bayesian
import time
from IPython.display import clear_output

poll_automl_stats = True
if poll_automl_stats:
    stats_path = os.path.join(home, 'shared', 'users', os.environ['USER'], 'models', model_id, train_job_id, "automl_metadata.json")
    controller_json_path = os.path.join(home, 'shared', 'users', os.environ['USER'], 'models', model_id, train_job_id, "controller.json")
    while True:
        time.sleep(15)
        clear_output(wait=True)
        if os.path.exists(stats_path):
            try:
                with open(stats_path, "r") as stats_file:
                    stats_dict = json.load(stats_file)
                print(json.dumps(stats_dict, indent=2))
                if float(stats_dict["Number of epochs yet to start"]) == 0.0:
                    break
            except json.JSONDecodeError:
                print("Stats are being written to the file and will appear on screen in a few seconds")
else:
    # Print the log files (supported only for Bayesian). A log file does not exist until the
    # backend Toolkit container is running, which can take several minutes.
    # Note: my_tail is not defined in this notebook; define it before running this branch.
    if automl_algorithm == "Bayesian":
        logs_dir = os.path.join(home, 'shared', 'users', os.environ['USER'], 'models', model_id)
        max_recommendations = metadata.get("automl_max_recommendations", 20)
        for experiment_num in range(max_recommendations):
            log_file = f"{train_job_id}/experiment_{experiment_num}/log.txt"
            # Wait for the log file without busy-looping
            while not os.path.exists(os.path.join(logs_dir, log_file)):
                time.sleep(15)
            print(f"\n\nViewing experiment {experiment_num}\n\n")
            my_tail(logs_dir, log_file)

### Get the best model from AutoML <a class="anchor" id="head-13"></a>
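Once `controller.json` is available, picking the best experiment amounts to taking the entry with the highest `result`. This minimal sketch assumes, as the `pd.DataFrame(...)` call in the cell below suggests, that the file holds a list of records with `id` and `result` keys, and that a higher result is better (true for mAP). Entries without a numeric result, such as unfinished experiments, are skipped. `best_experiment` is a hypothetical helper.

```python
import math
from typing import Any, Dict, List, Optional


def best_experiment(experiments: List[Dict[str, Any]]) -> Optional[Dict[str, Any]]:
    """Return the experiment with the highest numeric 'result', or None if none finished."""
    scored = [
        e for e in experiments
        if isinstance(e.get("result"), (int, float)) and not math.isnan(e["result"])
    ]
    if not scored:
        return None
    return max(scored, key=lambda e: e["result"])
```

With the DataFrame built in the cell below, the pandas equivalent is `data_frame.loc[data_frame["result"].idxmax()]`, which also skips NaN results.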

In [None]:
# The config and weights of the best configuration are stored in the best_model folder
# Copying the original AutoML experiment to the best_model folder takes a few seconds

# Training times for different models benchmarked on a single V100 GPU machine are listed here: https://docs.nvidia.com/tao/tao-toolkit/text/automl/automl.html#results-of-automl-experiments

!python3 -m pip install pandas
import time
import pandas as pd

automl_job_dir = f"{home}/shared/users/{os.environ['USER']}/models/{model_id}/{train_job_id}"
best_model_path = f"{automl_job_dir}/best_model"

while True:
    if os.path.exists(best_model_path) and len(os.listdir(best_model_path)) > 0 and os.path.exists(f"{best_model_path}/controller.json"):
        # List the binary model file
        print("\nCheckpoints for the best performing experiment")
        if os.path.exists(best_model_path+"/weights") and len(os.listdir(best_model_path+"/weights")) > 0:
            print(f"Folder: {best_model_path}/weights")
            print("Files:", os.listdir(best_model_path+"/weights"))
        else:
            print(f"Folder: {best_model_path}")
            print("Files:", os.listdir(best_model_path))

        with open(f"{best_model_path}/controller.json", "r") as controller_file:
            experiment_artifacts = json.load(controller_file)
        data_frame = pd.DataFrame(experiment_artifacts)
        # Print each experiment's id/number and its result
        print("\nResults of all experiments")
        with pd.option_context('display.max_rows', None, 'display.max_columns', None, 'display.max_colwidth', None):
            print(data_frame[["id","result"]])

        print("\nConfig/Spec file for the best performing experiment (recommendation_id.kitti with the maximum result value in the dataframe)")
        # List the recommendation config file of the best performing checkpoint
        !ls {best_model_path}/*.kitti

        break
    # Wait before checking again instead of busy-looping
    time.sleep(15)

### Delete experiment (Optional) <a class="anchor" id="head-14"></a>

In [None]:
# ! rm -rf ~/shared/users/{os.environ['USER']}/models/{model_id}
# ! echo DONE

### Delete datasets (Optional) <a class="anchor" id="head-15"></a>

In [None]:
# ! rm -rf ~/shared/users/{os.environ['USER']}/datasets/{train_dataset_id}
# ! rm -rf ~/shared/users/{os.environ['USER']}/datasets/{eval_dataset_id}
# ! echo DONE

This is the end of the Launchpad workflow.

We started by training a Yolo-v4 model and then optimized it for better inference performance.

Finally, we ran an AutoML experiment on Yolo-v4 and saw an improvement of around 5% <br>in accuracy metrics compared with the baseline model we trained initially.

You can try several other object detection models, other domains such as classification and segmentation, <br>
or even purpose-built models such as License Plate Recognition on your machine with the TAO getting started guide from [NGC](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tao/resources/tao-getting-started/files).