### TAO remote client - Optimizing YOLO

Transfer learning is the process of transferring learned features from one application to another. It is a commonly used training technique in which a model trained on one task is retrained for a different task. The Train Adapt Optimize (TAO) Toolkit is a simple, easy-to-use, Python-based AI toolkit for taking purpose-built AI models and customizing them with users' own data.

![image](https://developer.nvidia.com/sites/default/files/akamai/TAO/tlt-tao-toolkit-bring-your-own-model-diagram.png)


### The workflow in a nutshell

- Model Actions
    - Prune, retrain
    - Export
    - Convert
    - Inference on TRT
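
Each action in this notebook is launched with `--job <parent_job_id>`, so the jobs form a chain that leads back to the training job from `yolo_training.ipynb`. The sketch below records the parent of each job exactly as the cells in this notebook pass it and walks the chain back to training. The keys are plain Python labels chosen for illustration, not TAO API identifiers.

```python
# Parent job consumed by each action, as passed via --job in this notebook.
# Keys and values are illustrative labels, not TAO API identifiers.
PARENT = {
    "tao_inference": "train",
    "fp32_export": "train",
    "fp32_convert": "fp32_export",
    "fp32_trt_inference": "fp32_convert",
    "prune": "train",
    "retrain": "prune",
    "evaluate_retrained": "retrain",
    "fp16_export": "retrain",
    "fp16_convert": "fp16_export",
    "fp16_trt_inference": "fp16_convert",
}


def lineage(job):
    """Return the chain of jobs from `job` back to the training job."""
    chain = [job]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


if __name__ == "__main__":
    print(" <- ".join(lineage("fp16_trt_inference")))
```

The final FP16 TensorRT engine therefore depends on the pruned and retrained model, while the FP32 engine is built directly from the original trained model.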

### Table of contents

1. [Provide TAO inference specs](#head-0)
1. [Run TAO inference](#head-00)
1. [Provide FP32 export specs](#head-1)
1. [Run FP32 export](#head-2)
1. [Provide model convert specs](#head-3)
1. [Run model convert](#head-4)
1. [Provide TRT inference specs](#head-5)
1. [Run TRT inference](#head-6)
1. [Provide prune specs](#head-7)
1. [Run prune](#head-8)
1. [Provide retrain specs](#head-9)
1. [Run retrain](#head-10)
1. [Provide evaluate specs](#head-11)
1. [Run evaluate on retrained model](#head-12)
1. [Provide FP16 export specs](#head-13)
1. [Run FP16 export](#head-14)
1. [Provide model convert specs](#head-15)
1. [Run model convert](#head-16)
1. [Provide TRT inference specs](#head-17)
1. [Run TRT inference](#head-18)
1. [Delete experiment](#head-19)
1. [Delete datasets](#head-20)
1. [Unmount shared volume](#head-21)

### Requirements
Please find the server requirements [here](https://docs.nvidia.com/tao/tao-toolkit/text/tao_toolkit_api/api_setup.html#)

In [None]:
import os
import glob
import subprocess
import getpass
import uuid
import json
import time

In [None]:
# Restore variables set in yolo_training.ipynb

with open("variables_to_store.json", "r") as specs_file:
    variables_to_store = json.load(specs_file)

namespace = variables_to_store["namespace"]
model_id = variables_to_store["model_id"]
train_job_id = variables_to_store["train_job_id"]
node_addr = variables_to_store["node_addr"]
node_port = variables_to_store["node_port"]
home = variables_to_store["home"]
os.environ['USER'] = variables_to_store["USER"]
os.environ['TOKEN'] = variables_to_store["TOKEN"]

%env BASE_URL=http://{node_addr}:{node_port}/{namespace}/api/v1

In [None]:
def my_tail(logs_dir, log_file):
    %env LOG_FILE={logs_dir}/{log_file}
    ! mkdir -p {logs_dir}
    ! [ ! -f "$LOG_FILE" ] && touch $LOG_FILE && chmod 666 $LOG_FILE
    ! tail -f -n +1 $LOG_FILE | while read LINE; do echo "$LINE"; [[ "$LINE" == "EOF" ]] && pkill -P $$ tail; done
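
`my_tail` relies on IPython shell magics and a shell loop that stops when a line reading `EOF` appears. A pure-Python equivalent may be easier to adapt, for example to add a timeout. The `follow_log` helper below is hypothetical (not part of tao-client); like `my_tail`, it assumes the job log ends with a line reading `EOF`. Unlike `my_tail`, it does not create the log directory or file; it waits for the file to appear.

```python
import os
import time


def follow_log(path, poll_interval=5.0, timeout=None):
    """Print lines appended to `path` until a line equal to "EOF" is seen.

    Waits for the file to be created first. Returns True if EOF was seen,
    False if `timeout` seconds elapsed first (timeout=None waits forever).
    """
    deadline = None if timeout is None else time.monotonic() + timeout

    def timed_out():
        return deadline is not None and time.monotonic() > deadline

    # The log file only appears once the backend job has started.
    while not os.path.exists(path):
        if timed_out():
            return False
        time.sleep(poll_interval)

    with open(path, "r") as log:
        pending = ""
        while True:
            chunk = log.readline()
            if chunk:
                pending += chunk
                # Buffer partial lines until the newline arrives (a bare "EOF" counts).
                if not pending.endswith("\n") and pending != "EOF":
                    continue
                line = pending.rstrip("\r\n")
                pending = ""
                print(line)
                if line == "EOF":
                    return True
            elif timed_out():
                return False
            else:
                # No new data yet: wait before polling again.
                time.sleep(poll_interval)
```

For example, `follow_log(os.path.join(logs_dir, log_file), timeout=2 * 60 * 60)` would stop following after two hours instead of blocking the notebook indefinitely.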

In [None]:
logs_dir = os.path.join(home, 'shared', 'users', os.environ['USER'], 'models', model_id, 'logs')

### Provide TAO inference specs <a class="anchor" id="head-0"></a>

In [None]:
# Default inference model specs
! tao-client yolo-v4 model-inference-defaults --id {model_id} | tee ~/shared/users/{os.environ['USER']}/models/{model_id}/specs/inference.json

In [None]:
# Customize TAO inference specs
specs_path = os.path.join(home, 'shared', 'users', os.environ['USER'], 'models', model_id, 'specs', 'inference.json')

with open(specs_path, "r") as specs_file:
    specs = json.load(specs_file)

# Make any changes to the spec parameters in the dictionary here
# Half resolution; set this based on the dataset being used and the training/inference time trade-off
specs["augmentation_config"]["output_width"] = 960
specs["augmentation_config"]["output_height"] = 544

with open(specs_path, "w") as specs_file:
    json.dump(specs, specs_file, indent=2)

print(json.dumps(specs, indent=2))
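
Every "Customize ... specs" cell repeats the same load, modify, save pattern. A hypothetical helper such as `update_spec` below (not part of tao-client) could apply nested overrides written as dotted keys like `"augmentation_config.output_width"`. It raises `KeyError` for a key that is missing from the default spec, which catches typos that plain dictionary assignment would silently add to the spec.

```python
import json


def update_spec(specs_path, overrides):
    """Apply dotted-key overrides to a JSON spec file and return the result.

    Only keys that already exist in the default spec may be changed, so a
    misspelled key raises KeyError instead of being silently added.
    """
    with open(specs_path, "r") as specs_file:
        specs = json.load(specs_file)

    for dotted_key, value in overrides.items():
        *parents, leaf = dotted_key.split(".")
        node = specs
        for key in parents:
            node = node[key]  # KeyError if a parent section is missing
        if leaf not in node:
            raise KeyError(f"{dotted_key!r} is not in the default spec")
        node[leaf] = value

    with open(specs_path, "w") as specs_file:
        json.dump(specs, specs_file, indent=2)
    return specs
```

With it, the cell above would reduce to `specs = update_spec(specs_path, {"augmentation_config.output_width": 960, "augmentation_config.output_height": 544})`.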

### Run TAO inference <a class="anchor" id="head-00"></a>

In [None]:
tao_inference_job_id = subprocess.getoutput(f"tao-client yolo-v4 model-inference --id {model_id} --job {train_job_id}")
print(tao_inference_job_id)
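
`subprocess.getoutput` returns the command's combined stdout and stderr and ignores its exit status, so a failed submission leaves an error message in the job-ID variable and the next cell waits for a log file that never appears. A stricter sketch, using a hypothetical `submit_job` helper built on `subprocess.run(..., check=True)`, fails immediately instead. It assumes that on success tao-client prints only the job ID, as the cells in this notebook already rely on.

```python
import subprocess


def submit_job(command):
    """Run a job-submission command and return its stripped stdout.

    Raises subprocess.CalledProcessError if the command exits non-zero,
    and ValueError if it prints nothing.
    """
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    job_id = result.stdout.strip()
    if not job_id:
        raise ValueError(f"No job ID printed by: {' '.join(command)}")
    return job_id
```

Usage would look like `tao_inference_job_id = submit_job(["tao-client", "yolo-v4", "model-inference", "--id", model_id, "--job", train_job_id])`. Passing the arguments as a list also avoids shell quoting issues.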

In [None]:
# Check status (the file won't exist until the backend Toolkit container is running -- can take several minutes)
log_file = f"{tao_inference_job_id}.txt"
start_time = time.time()
my_tail(logs_dir, log_file)
tao_inference_time = time.time() - start_time

In [None]:
job_dir = f"{home}/shared/users/{os.environ['USER']}/models/{model_id}/{tao_inference_job_id}"
from IPython.display import Image
Image(filename=f"{job_dir}/images_annotated/001354.jpg")

### Provide FP32 export specs <a class="anchor" id="head-1"></a>

In [None]:
# Default export model specs
! tao-client yolo-v4 model-export-defaults --id {model_id} | tee ~/shared/users/{os.environ['USER']}/models/{model_id}/specs/export.json

In [None]:
# Customize export model specs
specs_path = os.path.join(home, 'shared', 'users', os.environ['USER'], 'models', model_id, 'specs', 'export.json')

with open(specs_path, "r") as specs_file:
    specs = json.load(specs_file)

# Make any changes to the spec parameters in the dictionary here
specs["data_type"] = "fp32"
specs["batches"] = 10
specs["batch_size"] = 16

with open(specs_path, "w") as specs_file:
    json.dump(specs, specs_file, indent=2)

print(json.dumps(specs, indent=2))

### Run FP32 export <a class="anchor" id="head-2"></a>
* After training is complete, the model must be exported to an ONNX file; this is done by the export action.
* Export is the intermediate step between training and generating a TensorRT (TRT) engine file.
* The export action converts the original `.tlt` model into the format that the TRT engine generation step expects.
* The model can be exported in three precisions: FP32, FP16, and INT8.

The export takes approximately 6 minutes.
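
To get a feel for what the export precision means numerically, the sketch below round-trips a few values through IEEE 754 single precision (FP32) and half precision (FP16) using the standard-library `struct` module (format codes `"f"` and `"e"`). FP16 has a 10-bit mantissa versus 23 bits for FP32, so values are stored more coarsely. This only illustrates the number formats; it does not model how TensorRT assigns precision to individual layers.

```python
import struct


def round_trip(value, fmt):
    """Store `value` in the given struct format and read it back as a Python float."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


for value in (0.1, 3.14159, 1000.7):
    fp32 = round_trip(value, "f")  # IEEE 754 single precision
    fp16 = round_trip(value, "e")  # IEEE 754 half precision
    print(f"{value:>10}: fp32={fp32!r:<22} fp16={fp16!r}")
```

For instance, 1000.7 comes back from FP16 as 1000.5, because representable FP16 values between 512 and 1024 are spaced 0.5 apart.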

In [None]:
fp32_export_job_id = subprocess.getoutput(f"tao-client yolo-v4 model-export --id {model_id} --job {train_job_id}")
print(fp32_export_job_id)

In [None]:
# Check status (the file won't exist until the backend Toolkit container is running -- can take several minutes)
log_file = f"{fp32_export_job_id}.txt"
my_tail(logs_dir, log_file)

### Provide model convert specs <a class="anchor" id="head-3"></a>

In [None]:
# Default convert model specs
! tao-client yolo-v4 model-convert-defaults --id {model_id} | tee ~/shared/users/{os.environ['USER']}/models/{model_id}/specs/convert.json

In [None]:
# Customize convert model specs
specs_path = os.path.join(home, 'shared', 'users', os.environ['USER'], 'models', model_id, 'specs', 'convert.json')

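with open(specs_path, "r") as specs_file:
    specs = json.load(specs_file)

# Make any changes to the spec parameters in the dictionary here
specs["t"] = "fp32"  # engine precision
specs["b"] = 8
# Optimization profile: <input_name>,<min_shape>,<opt_shape>,<max_shape>, each shape written as NxCxHxW
specs["p"] = "Input,1x3x544x960,8x3x544x960,16x3x544x960"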

with open(specs_path, "w") as specs_file:
    json.dump(specs, specs_file, indent=2)

print(json.dumps(specs, indent=2))
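
The `p` value follows the `<input_name>,<min_shape>,<opt_shape>,<max_shape>` form of a TensorRT optimization profile, with each shape written as `NxCxHxW`. Here the engine accepts batch sizes from 1 to 16, is tuned for a batch of 8, and expects 3x544x960 inputs, matching the 960x544 `output_width`/`output_height` used in the inference specs. The hypothetical `parse_profile` helper below is a small sketch for sanity-checking such a string before submitting the convert job.

```python
def parse_profile(profile):
    """Split "<input_name>,<min>,<opt>,<max>" into a name and three shape tuples."""
    name, *shapes = profile.split(",")
    if len(shapes) != 3:
        raise ValueError(f"expected 3 shapes after the input name, got {len(shapes)}")
    min_shape, opt_shape, max_shape = (
        tuple(int(dim) for dim in shape.split("x")) for shape in shapes
    )
    if not (len(min_shape) == len(opt_shape) == len(max_shape)):
        raise ValueError("all shapes must have the same rank")
    # Each dimension must satisfy min <= opt <= max.
    for lo, opt, hi in zip(min_shape, opt_shape, max_shape):
        if not lo <= opt <= hi:
            raise ValueError(f"dimension out of order: {lo}, {opt}, {hi}")
    return name, min_shape, opt_shape, max_shape


name, min_shape, opt_shape, max_shape = parse_profile(
    "Input,1x3x544x960,8x3x544x960,16x3x544x960"
)
print(name, "batch range:", min_shape[0], "to", max_shape[0], "| CxHxW:", opt_shape[1:])
```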

### Run model convert <a class="anchor" id="head-4"></a>
The model convert action creates a TensorRT engine file from the exported ONNX file. It takes approximately 6 minutes.

In [None]:
convert_job_id = subprocess.getoutput(f"tao-client yolo-v4 model-convert --id {model_id} --job {fp32_export_job_id}")
print(convert_job_id)

In [None]:
# Check status (the file won't exist until the backend Toolkit container is running -- can take several minutes)
log_file = f"{convert_job_id}.txt"
my_tail(logs_dir, log_file)

### Provide TRT inference specs <a class="anchor" id="head-5"></a>

In [None]:
# Default inference model specs
! tao-client yolo-v4 model-inference-defaults --id {model_id} | tee ~/shared/users/{os.environ['USER']}/models/{model_id}/specs/inference.json

In [None]:
# Customize TAO inference specs
specs_path = os.path.join(home, 'shared', 'users', os.environ['USER'], 'models', model_id, 'specs', 'inference.json')

with open(specs_path, "r") as specs_file:
    specs = json.load(specs_file)

# Make any changes to the spec parameters in the dictionary here
# Half resolution; set this based on the dataset being used and the training/inference time trade-off
specs["augmentation_config"]["output_width"] = 960
specs["augmentation_config"]["output_height"] = 544

with open(specs_path, "w") as specs_file:
    json.dump(specs, specs_file, indent=2)

print(json.dumps(specs, indent=2))

### Run TRT inference <a class="anchor" id="head-6"></a>

In [None]:
trt_inference_job_id = subprocess.getoutput(f"tao-client yolo-v4 model-inference --id {model_id} --job {convert_job_id}")
print(trt_inference_job_id)

In [None]:
# Check status (the file won't exist until the backend Toolkit container is running -- can take several minutes)
log_file = f"{trt_inference_job_id}.txt"
start_time = time.time()
my_tail(logs_dir, log_file)
trt_inference_time = time.time() - start_time

In [None]:
print("Time in seconds for inference on the TAO model:", tao_inference_time)
print("Time in seconds for inference on the FP32 TRT model:", trt_inference_time)
# Each number is the total wall-clock time: model loading + image I/O + inference + post-processing,
# plus any time spent waiting for the job to start. A per-stage breakdown is given below.

**Performance breakdown**

| Stage | TAO model inference | TensorRT model inference |
|---|---|---|
| Load model | ~13.5 s | ~1.3 s |
| Image I/O | ~9 s | ~9 s |
| Inference | ~9.2 s | ~3.4 s |
| Post-processing | ~5.9 s | ~5.9 s |
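
Adding up the approximate stage times from the breakdown above makes the comparison concrete: TensorRT mainly cuts model loading and the inference step, while image I/O and post-processing are unchanged, so the end-to-end speedup is smaller than the inference-only speedup. The figures in this sketch are copied from the breakdown; nothing is measured here.

```python
# Approximate per-stage times in seconds, copied from the breakdown above.
STAGES = ("Load model", "Image I/O", "Inference", "Post-processing")
tao_model = {"Load model": 13.5, "Image I/O": 9.0, "Inference": 9.2, "Post-processing": 5.9}
trt_fp32 = {"Load model": 1.3, "Image I/O": 9.0, "Inference": 3.4, "Post-processing": 5.9}

for stage in STAGES:
    before, after = tao_model[stage], trt_fp32[stage]
    print(f"{stage:<16} {before:5.1f}s -> {after:5.1f}s  ({before / after:.2f}x)")

total_before = sum(tao_model.values())
total_after = sum(trt_fp32.values())
print(f"{'Total':<16} {total_before:5.1f}s -> {total_after:5.1f}s  ({total_before / total_after:.2f}x)")
```

With these numbers, inference alone is about 2.7x faster, while the end-to-end total drops from about 37.6 s to 19.6 s (about 1.9x).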


In [None]:
job_dir = f"{home}/shared/users/{os.environ['USER']}/models/{model_id}/{trt_inference_job_id}"
from IPython.display import Image
Image(filename=f"{job_dir}/images_annotated/001354.jpg")

### Provide prune specs <a class="anchor" id="head-7"></a>
To control the size of the pruned model, you can change the following parameters of the prune action:
1. pruning_threshold - The threshold to compare a normalized norm against (default: 0.1)
1. pruning_granularity - The number of filters to remove at a time (default: 8)
1. min_num_filters - The minimum number of filters to keep per layer (default: 16)
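
The interplay of these three parameters can be illustrated with a simplified, hypothetical per-layer rule: normalize each filter's norm by the layer's largest norm, count the filters at or above `pruning_threshold`, round that count up to a multiple of `pruning_granularity`, and never keep fewer than `min_num_filters`. This is a sketch of the general idea of magnitude-based filter pruning, not TAO's actual implementation, which may normalize and group filters differently.

```python
import math


def filters_to_keep(norms, threshold=0.1, granularity=8, min_num_filters=16):
    """Number of filters one layer keeps under a simplified threshold rule.

    norms: per-filter norms (for example, L2 norms of each filter's weights).
    """
    total = len(norms)
    max_norm = max(norms)
    # Normalize by the largest norm so the threshold is scale-independent.
    above = 0 if max_norm == 0 else sum(1 for n in norms if n / max_norm >= threshold)
    # Filters are removed in groups of `granularity`, so round the kept count up.
    kept = math.ceil(above / granularity) * granularity
    # Never keep fewer than min_num_filters, and never more than the layer has.
    return min(max(kept, min_num_filters), total)


layer_norms = [i / 63 for i in range(64)]  # synthetic norms for a 64-filter layer
for threshold in (0.1, 0.5, 0.8):
    print(threshold, filters_to_keep(layer_norms, threshold=threshold))
```

In this toy layer, raising the threshold from 0.1 to 0.8 shrinks it from 64 to 16 filters, and `min_num_filters` acts as the floor for very high thresholds.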

In [None]:
# Default prune model specs
! tao-client yolo-v4 model-prune-defaults --id {model_id} | tee ~/shared/users/{os.environ['USER']}/models/{model_id}/specs/prune.json

In [None]:
# Customize prune model specs
specs_path = os.path.join(home, 'shared', 'users', os.environ['USER'], 'models', model_id, 'specs', 'prune.json')

with open(specs_path, "r") as specs_file:
    specs = json.load(specs_file)

# Make any changes to the spec parameters in the dictionary here
specs["pruning_threshold"] = 0.8

with open(specs_path, "w") as specs_file:
    json.dump(specs, specs_file, indent=2)

print(json.dumps(specs, indent=2))

### Run prune <a class="anchor" id="head-8"></a>
This job takes approximately 7 minutes.

In [None]:
prune_job_id = subprocess.getoutput(f"tao-client yolo-v4 model-prune --id {model_id} --job {train_job_id}")
print(prune_job_id)

In [None]:
# Check status (the file won't exist until the backend Toolkit container is running -- can take several minutes)
log_file = f"{prune_job_id}.txt"
my_tail(logs_dir, log_file)

In [None]:
prune_job_dir = f"{home}/shared/users/{os.environ['USER']}/models/{model_id}/{prune_job_id}"
pruned_model_size = os.path.getsize(f"{prune_job_dir}/model.tlt")  # size in bytes

train_job_dir = f"{home}/shared/users/{os.environ['USER']}/models/{model_id}/{train_job_id}"
# The checkpoint file name depends on the training run in yolo_training.ipynb
original_train_model_size = os.path.getsize(f"{train_job_dir}/weights/yolov4_resnet18_epoch_010.tlt")  # size in bytes

print(f"The original trained model size is {original_train_model_size / 1024:.1f} KB")
print(f"The pruned model size is {pruned_model_size / 1024:.1f} KB")
print(f"The pruned model is {original_train_model_size / pruned_model_size:.1f}x smaller than the original model")

### Provide retrain specs <a class="anchor" id="head-9"></a>

In [None]:
# Default retrain model specs
! tao-client yolo-v4 model-retrain-defaults --id {model_id} | tee ~/shared/users/{os.environ['USER']}/models/{model_id}/specs/retrain.json

In [None]:
# Customize retrain model specs
specs_path = os.path.join(home, 'shared', 'users', os.environ['USER'], 'models', model_id, 'specs', 'retrain.json')

with open(specs_path, "r") as specs_file:
    specs = json.load(specs_file)

# Make any changes to the spec parameters in the dictionary here
specs["training_config"]["num_epochs"] = 150
# Half resolution; set this based on the dataset being used and the training/inference time trade-off
specs["augmentation_config"]["output_width"] = 960
specs["augmentation_config"]["output_height"] = 544
specs["dataset_config"]["image_extension"] = "jpg"  # file extension of the dataset's images

with open(specs_path, "w") as specs_file:
    json.dump(specs, specs_file, indent=2)

print(json.dumps(specs, indent=2))

### Run retrain <a class="anchor" id="head-10"></a>
The pruned model needs to be retrained to recover the accuracy lost during pruning. Retraining takes approximately 1 hour.

In [None]:
retrain_job_id = subprocess.getoutput(f"tao-client yolo-v4 model-retrain --id {model_id} --job {prune_job_id}")
print(retrain_job_id)

In [None]:
# Check status (the file won't exist until the backend Toolkit container is running -- can take several minutes)
log_file = f"{retrain_job_id}.txt"
my_tail(logs_dir, log_file)

### Provide evaluate specs <a class="anchor" id="head-11"></a>

In [None]:
# Default evaluate model specs
! tao-client yolo-v4 model-evaluate-defaults --id {model_id} | tee ~/shared/users/{os.environ['USER']}/models/{model_id}/specs/evaluate.json

In [None]:
# Customize evaluate model specs
specs_path = os.path.join(home, 'shared', 'users', os.environ['USER'], 'models', model_id, 'specs', 'evaluate.json')

with open(specs_path, "r") as specs_file:
    specs = json.load(specs_file)

# Make any changes to the spec parameters in the dictionary here
# Half resolution; set this based on the dataset being used and the training/inference time trade-off
specs["augmentation_config"]["output_width"] = 960
specs["augmentation_config"]["output_height"] = 544
specs["dataset_config"]["image_extension"] = "jpg"  # file extension of the dataset's images

with open(specs_path, "w") as specs_file:
    json.dump(specs, specs_file, indent=2)

print(json.dumps(specs, indent=2))

### Run evaluate on retrained model <a class="anchor" id="head-12"></a>

In [None]:
eval2_job_id = subprocess.getoutput(f"tao-client yolo-v4 model-evaluate --id {model_id} --job {retrain_job_id}")
print(eval2_job_id)

In [None]:
# Check status (the file won't exist until the backend Toolkit container is running -- can take several minutes)
log_file = f"{eval2_job_id}.txt"
my_tail(logs_dir, log_file)

### Provide FP16 export specs <a class="anchor" id="head-13"></a>

In [None]:
# Default export model specs
! tao-client yolo-v4 model-export-defaults --id {model_id} | tee ~/shared/users/{os.environ['USER']}/models/{model_id}/specs/export.json

In [None]:
# Customize export model specs
specs_path = os.path.join(home, 'shared', 'users', os.environ['USER'], 'models', model_id, 'specs', 'export.json')

with open(specs_path, "r") as specs_file:
    specs = json.load(specs_file)

# Make any changes to the spec parameters in the dictionary here
specs["data_type"] = "fp16"
specs["batches"] = 10
specs["batch_size"] = 16

with open(specs_path, "w") as specs_file:
    json.dump(specs, specs_file, indent=2)

print(json.dumps(specs, indent=2))

### Run FP16 export <a class="anchor" id="head-14"></a>
The export takes about 12 minutes.

In [None]:
export_job_id_2 = subprocess.getoutput(f"tao-client yolo-v4 model-export --id {model_id} --job {retrain_job_id}")
print(export_job_id_2)

In [None]:
# Check status (the file won't exist until the backend Toolkit container is running -- can take several minutes)
log_file = f"{export_job_id_2}.txt"
my_tail(logs_dir, log_file)

### Provide model convert specs <a class="anchor" id="head-15"></a>

In [None]:
# Default convert model specs
! tao-client yolo-v4 model-convert-defaults --id {model_id} | tee ~/shared/users/{os.environ['USER']}/models/{model_id}/specs/convert.json

In [None]:
# Customize convert model specs
specs_path = os.path.join(home, 'shared', 'users', os.environ['USER'], 'models', model_id, 'specs', 'convert.json')

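with open(specs_path, "r") as specs_file:
    specs = json.load(specs_file)

# Make any changes to the spec parameters in the dictionary here
specs["t"] = "fp16"  # engine precision
specs["b"] = 8
# Optimization profile: <input_name>,<min_shape>,<opt_shape>,<max_shape>, each shape written as NxCxHxW
specs["p"] = "Input,1x3x544x960,8x3x544x960,16x3x544x960"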

with open(specs_path, "w") as specs_file:
    json.dump(specs, specs_file, indent=2)

print(json.dumps(specs, indent=2))

### Run model convert <a class="anchor" id="head-16"></a>
The convert operation takes approximately 10 minutes.

In [None]:
convert_job_id_2 = subprocess.getoutput(f"tao-client yolo-v4 model-convert --id {model_id} --job {export_job_id_2}")
print(convert_job_id_2)

In [None]:
# Check status (the file won't exist until the backend Toolkit container is running -- can take several minutes)
log_file = f"{convert_job_id_2}.txt"
my_tail(logs_dir, log_file)

### Provide TRT inference specs <a class="anchor" id="head-17"></a>

In [None]:
# Default inference model specs
! tao-client yolo-v4 model-inference-defaults --id {model_id} | tee ~/shared/users/{os.environ['USER']}/models/{model_id}/specs/inference.json

In [None]:
# Customize TAO inference specs
specs_path = os.path.join(home, 'shared', 'users', os.environ['USER'], 'models', model_id, 'specs', 'inference.json')

with open(specs_path, "r") as specs_file:
    specs = json.load(specs_file)

# Make any changes to the spec parameters in the dictionary here
# Half resolution; set this based on the dataset being used and the training/inference time trade-off
specs["augmentation_config"]["output_width"] = 960
specs["augmentation_config"]["output_height"] = 544

with open(specs_path, "w") as specs_file:
    json.dump(specs, specs_file, indent=2)

print(json.dumps(specs, indent=2))

### Run TRT inference <a class="anchor" id="head-18"></a>

In [None]:
tao_inference_job_id_2 = subprocess.getoutput(f"tao-client yolo-v4 model-inference --id {model_id} --job {convert_job_id_2}")
print(tao_inference_job_id_2)

In [None]:
# Check status (the file won't exist until the backend Toolkit container is running -- can take several minutes)
log_file = f"{tao_inference_job_id_2}.txt"
start_time = time.time()
my_tail(logs_dir, log_file)
inference_time = time.time() - start_time

In [None]:
print("Time in seconds for inference on the unoptimized FP32 TRT model:", trt_inference_time)
print("Time in seconds for inference on the pruned FP16 TRT model:", inference_time)
# Each number is the total wall-clock time: model loading + image I/O + inference + post-processing,
# plus any time spent waiting for the job to start. A per-stage breakdown is given below.

##### Performance breakdown
| Stage | TensorRT unoptimized FP32 model | TensorRT pruned, optimized FP16 model |
|---|---|---|
| Load model | ~1.3 s | ~1.3 s |
| Image I/O | ~9 s | ~9 s |
| Inference | ~3.4 s | ~2.2 s |
| Post-processing | ~5.9 s | ~5.9 s |

Inference-time speedup from the unoptimized FP32 model to the pruned FP16 model: **~55%**
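
The ~55% figure corresponds to the ratio of the two inference times, i.e. the pruned FP16 engine runs the inference step about 1.55 times as fast. Expressed instead as a reduction in inference time, the same numbers give roughly 35%:

$$
\text{speedup} = \frac{t_{\text{FP32}}}{t_{\text{FP16}}} = \frac{3.4\ \text{s}}{2.2\ \text{s}} \approx 1.55,
\qquad
\text{time reduction} = \frac{t_{\text{FP32}} - t_{\text{FP16}}}{t_{\text{FP32}}} = \frac{1.2\ \text{s}}{3.4\ \text{s}} \approx 0.35
$$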

In [None]:
job_dir = f"{home}/shared/users/{os.environ['USER']}/models/{model_id}/{tao_inference_job_id_2}"
from IPython.display import Image
Image(filename=f"{job_dir}/images_annotated/001354.jpg")