### TAO remote client (object detection with YOLO)

Transfer learning is the process of transferring learned features from one application to another. It is a commonly used training technique in which a model trained on one task is retrained for use on a different task. Train Adapt Optimize (TAO) Toolkit is an easy-to-use, Python-based AI toolkit for taking purpose-built AI models and customizing them with your own data.

![image](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/TAO/tlt-tao-toolkit-bring-your-own-model-diagram.png)


### The workflow in a nutshell

- Model Actions
    - Prune, retrain
    - Export
    - Convert
    - Inference on TRT

### Table of contents

1. [Provide FP32 export specs](#head-1)
1. [Run FP32 export](#head-2)
1. [Provide model convert specs](#head-3)
1. [Run model convert](#head-4)
1. [Provide TAO inference specs](#head-5)
1. [Run TAO inference](#head-6)
1. [Provide prune specs](#head-7)
1. [Run prune](#head-8)
1. [Provide retrain specs](#head-9)
1. [Run retrain](#head-10)
1. [Provide evaluate specs](#head-11)
1. [Run evaluate on retrained model](#head-12)
1. [Provide Int8 export specs](#head-13)
1. [Run Int8 export](#head-14)
1. [Provide model convert specs](#head-15)
1. [Run model convert](#head-16)
1. [Provide TAO inference specs](#head-17)
1. [Run TAO inference](#head-18)
1. [Delete experiment](#head-19)
1. [Delete datasets](#head-20)
1. [Unmount shared volume](#head-21)

### Requirements
See the [server requirements](https://docs.nvidia.com/tao/tao-toolkit/text/tao_toolkit_api/api_setup.html#).

In [None]:
import os
import glob
import subprocess
import getpass
import uuid
import json
import time

In [None]:
namespace = 'default'

### FIXME

1. Set model_id in FIXME 1
1. Set train_job_id in FIXME 2
1. Assign the ip_address and port_number in FIXME 3 and FIXME 4 ([info](https://docs.nvidia.com/tao/tao-toolkit/text/tao_toolkit_api/api_rest_api.html))
1. Set NGC API key in FIXME 5

In [None]:
model_id = FIXME #FIXME1
train_job_id = FIXME #FIXME2

### Set the remote service base URL

In [None]:
# Define the node_addr and port number
node_addr = "<ip_address>" # FIXME3 example: 10.137.149.22
node_port = "<port_number>" # FIXME4 example: 32334
# On the host machine, the node IP address and port number can be obtained as follows:
# ip_address: hostname -i
# port_number: kubectl get service ingress-nginx-controller -o jsonpath='{.spec.ports[0].nodePort}'
%env BASE_URL=http://{node_addr}:{node_port}/{namespace}/api/v1
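
The base URL above is assembled from three values, and a leftover placeholder only surfaces later as a failed request. The following is a minimal sketch of a check that could run before setting `BASE_URL`; the helper name `build_base_url` is hypothetical and not part of the TAO client.

```python
def build_base_url(node_addr, node_port, namespace="default"):
    """Return the REST base URL in the same shape as the %env line above."""
    # Reject the template placeholders such as "<ip_address>".
    if str(node_addr).startswith("<") or str(node_port).startswith("<"):
        raise ValueError("Replace the <ip_address> and <port_number> placeholders first")
    port = int(node_port)  # raises ValueError if the port is not numeric
    if not 0 < port < 65536:
        raise ValueError(f"Port out of range: {port}")
    return f"http://{node_addr}:{port}/{namespace}/api/v1"


print(build_base_url("10.137.149.22", "32334"))
# http://10.137.149.22:32334/default/api/v1
```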

In [None]:
# FIXME: Set ngc_api_key variable
ngc_api_key = "<ngc_api_key>" # FIXME5 example: zZYtczM5amdtdDcwNjk0cnA2bGU2bXQ3bnQ6NmQ4NjNhMDItMTdmZS00Y2QxLWI2ZjktNmE5M2YxZTc0OGyM

# Exchange NGC_API_KEY for JWT
identity = json.loads(subprocess.getoutput(f'nvtl login --ngc-api-key {ngc_api_key}'))

%env USER={identity['user_id']}
%env TOKEN={identity['token']}
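
If the login fails, `subprocess.getoutput` returns an error message rather than JSON, and `json.loads` raises a fairly opaque exception. The sketch below shows one way to validate the response before using it. It assumes only what the cell above relies on, namely that a successful login prints a JSON object with `user_id` and `token` keys; the helper name `parse_identity` is hypothetical.

```python
import json


def parse_identity(raw_output):
    """Parse the login output and check that the expected keys are present."""
    try:
        identity = json.loads(raw_output)
    except json.JSONDecodeError as err:
        # Show the beginning of the output so the real error message is visible.
        raise RuntimeError(f"Login did not return JSON: {raw_output[:200]!r}") from err
    if not isinstance(identity, dict):
        raise RuntimeError("Login response is not a JSON object")
    missing = {"user_id", "token"} - set(identity)
    if missing:
        raise RuntimeError(f"Login response is missing: {sorted(missing)}")
    return identity


print(parse_identity('{"user_id": "u1", "token": "t1"}')["user_id"])
# u1
```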

In [None]:
user = getpass.getuser()
home = os.path.expanduser('~')

! echo "Password for {user}"
password = getpass.getpass()

In [None]:
def my_tail(logs_dir, log_file):
    %env LOG_FILE={logs_dir}/{log_file}
    ! mkdir -p {logs_dir}
    ! [ ! -f "$LOG_FILE" ] && touch $LOG_FILE && chmod 666 $LOG_FILE
    ! tail -f -n +1 $LOG_FILE | while read LINE; do echo "$LINE"; [[ "$LINE" == "EOF" ]] && pkill -P $$ tail; done
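
`my_tail` relies on shell tools (`tail`, `pkill`, and the `[[ ... ]]` test). For environments where those are unavailable, the same idea can be sketched in plain Python: create the log file if needed, print lines as they appear, and stop at a line that reads `EOF`. The function name `follow_log` and its `poll_seconds` and `timeout_seconds` parameters are hypothetical additions; unlike `my_tail`, this sketch does not change the file's permissions.

```python
import os
import time


def follow_log(path, sentinel="EOF", poll_seconds=1.0, timeout_seconds=None):
    """Print lines from `path` as they are written; return them once `sentinel` is seen."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    open(path, "a").close()  # create the file if the job has not written it yet
    deadline = None if timeout_seconds is None else time.monotonic() + timeout_seconds
    lines, buffer = [], ""
    with open(path, "r") as log:
        while True:
            chunk = log.readline()
            if chunk:
                buffer += chunk
                if buffer.endswith("\n"):  # only act on complete lines
                    text, buffer = buffer.rstrip("\n"), ""
                    print(text)
                    lines.append(text)
                    if text == sentinel:
                        return lines
                continue
            # Nothing new yet: give up if the deadline passed, otherwise wait and retry.
            if deadline is not None and time.monotonic() > deadline:
                raise TimeoutError(f"No '{sentinel}' line in {path} after {timeout_seconds}s")
            time.sleep(poll_seconds)


# Usage in this notebook would look like:
# follow_log(os.path.join(logs_dir, log_file))
```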

In [None]:
logs_dir = os.path.join(home, 'shared', 'users', os.environ['USER'], 'models', model_id, 'logs')

### Provide FP32 export specs <a class="anchor" id="head-1"></a>

In [None]:
# Default export model specs
! nvtl yolo-v4 model-export-defaults --id {model_id} | tee ~/shared/users/{os.environ['USER']}/models/{model_id}/specs/export.json

In [None]:
# Customize export model specs
specs_path = os.path.join(home, 'shared', 'users', os.environ['USER'], 'models', model_id, 'specs', 'export.json')

with open(specs_path , "r") as specs_file:
    specs = json.load(specs_file)

specs["data_type"] = "fp32"

with open(specs_path, "w") as specs_file:
    json.dump(specs, specs_file, indent=2)

print(json.dumps(specs, indent=2))
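
The load, modify, and write-back pattern in the cell above recurs for every spec file in this notebook. As a sketch, it could be folded into one helper. The name `update_specs` and the dotted-key convention for nested sections are assumptions made for illustration, not part of the TAO client. Note one difference from the cells in this notebook: a missing nested section is created instead of raising `KeyError`.

```python
import json


def update_specs(specs_path, overrides):
    """Load a spec file, apply overrides, write it back, and return the result.

    Keys in `overrides` may use dots to reach nested sections,
    for example "augmentation_config.output_width".
    """
    with open(specs_path, "r") as specs_file:
        specs = json.load(specs_file)

    for dotted_key, value in overrides.items():
        *parents, leaf = dotted_key.split(".")
        node = specs
        for key in parents:
            node = node.setdefault(key, {})  # create the section if it is missing
        node[leaf] = value

    with open(specs_path, "w") as specs_file:
        json.dump(specs, specs_file, indent=2)
    return specs


# Usage in this notebook would look like:
# specs = update_specs(specs_path, {"data_type": "fp32"})
# print(json.dumps(specs, indent=2))
```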

### Run FP32 export <a class="anchor" id="head-2"></a>

In [None]:
fp32_export_job_id = subprocess.getoutput(f"nvtl yolo-v4 model-export --id {model_id} --job {train_job_id}")
print(fp32_export_job_id)

In [None]:
# Check status (the file won't exist until the backend Toolkit container is running -- can take several minutes)
log_file = f"{fp32_export_job_id}.txt"
my_tail(logs_dir, log_file)

### Provide model convert specs <a class="anchor" id="head-3"></a>

In [None]:
# Default convert model specs
! nvtl yolo-v4 model-convert-defaults --id {model_id} | tee ~/shared/users/{os.environ['USER']}/models/{model_id}/specs/convert.json

In [None]:
# Customize convert model specs
specs_path = os.path.join(home, 'shared', 'users', os.environ['USER'], 'models', model_id, 'specs', 'convert.json')

with open(specs_path , "r") as specs_file:
    specs = json.load(specs_file)

specs["t"] = "fp32"
specs["b"] = 8
specs["p"] = "Input,1x3x736x1280,8x3x736x1280,16x3x736x1280"

with open(specs_path, "w") as specs_file:
    json.dump(specs, specs_file, indent=2)

print(json.dumps(specs, indent=2))
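
The `p` value above packs an input name and three shapes into one comma-separated string. Assuming the three shapes are the minimum, optimal, and maximum batch shapes of the optimization profile (an interpretation of the value used here, to be checked against the converter documentation), the string can be generated rather than typed by hand. The helper name `build_profile` is hypothetical.

```python
def build_profile(input_name, chw, min_batch, opt_batch, max_batch):
    """Build a profile string such as 'Input,1x3x736x1280,8x3x736x1280,16x3x736x1280'."""
    if not 0 < min_batch <= opt_batch <= max_batch:
        raise ValueError("Expected 0 < min_batch <= opt_batch <= max_batch")
    dims = "x".join(str(d) for d in chw)  # channels x height x width
    shapes = [f"{batch}x{dims}" for batch in (min_batch, opt_batch, max_batch)]
    return ",".join([input_name, *shapes])


print(build_profile("Input", (3, 736, 1280), 1, 8, 16))
# Input,1x3x736x1280,8x3x736x1280,16x3x736x1280
```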

### Run model convert <a class="anchor" id="head-4"></a>

In [None]:
convert_job_id = subprocess.getoutput(f"nvtl yolo-v4 model-convert --id {model_id} --job {fp32_export_job_id}")
print(convert_job_id)

In [None]:
# Check status (the file won't exist until the backend Toolkit container is running -- can take several minutes)
log_file = f"{convert_job_id}.txt"
my_tail(logs_dir, log_file)

### Provide TAO inference specs <a class="anchor" id="head-5"></a>

In [None]:
# Default inference model specs
! nvtl yolo-v4 model-inference-defaults --id {model_id} | tee ~/shared/users/{os.environ['USER']}/models/{model_id}/specs/inference.json

In [None]:
# Customize TAO inference specs
specs_path = os.path.join(home, 'shared', 'users', os.environ['USER'], 'models', model_id, 'specs', 'inference.json')

with open(specs_path , "r") as specs_file:
    specs = json.load(specs_file)

specs["augmentation_config"]["output_width"] = 1280 # Setting to the dataset's original resolution's width
specs["augmentation_config"]["output_height"] = 736 # Setting to the dataset's original resolution's height

with open(specs_path, "w") as specs_file:
    json.dump(specs, specs_file, indent=2)

print(json.dumps(specs, indent=2))

### Run TAO inference <a class="anchor" id="head-6"></a>

In [None]:
tao_inference_job_id = subprocess.getoutput(f"nvtl yolo-v4 model-inference --id {model_id} --job {convert_job_id}")
print(tao_inference_job_id)

In [None]:
# Check status (the file won't exist until the backend Toolkit container is running -- can take several minutes)
log_file = f"{tao_inference_job_id}.txt"
start_time = time.time()
my_tail(logs_dir, log_file)
inference_time = time.time() - start_time

In [None]:
print("Time in seconds for inference on unoptimized model is ", inference_time)

In [None]:
job_dir = f"{home}/shared/users/{os.environ['USER']}/models/{model_id}/{tao_inference_job_id}"
from IPython.display import Image
sample_image = glob.glob(f"{job_dir}/images_annotated/*.jpg")[6]
Image(filename=sample_image) 
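
The cell above takes the seventh file returned by `glob.glob`, which raises `IndexError` when fewer than seven annotated images exist and may pick a different file between runs because `glob` does not guarantee an order. A small sketch of a more defensive selection follows; the helper name `pick_sample` is hypothetical.

```python
def pick_sample(paths, index=6):
    """Return the image at `index` in sorted order, or the last one if there are fewer."""
    ordered = sorted(paths)  # sort so the choice is reproducible
    if not ordered:
        raise FileNotFoundError("No annotated images found")
    return ordered[min(index, len(ordered) - 1)]


print(pick_sample(["b.jpg", "a.jpg", "c.jpg"]))
# c.jpg
```

In this notebook it would be called as `pick_sample(glob.glob(f"{job_dir}/images_annotated/*.jpg"))`.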

### Provide prune specs <a class="anchor" id="head-7"></a>

In [None]:
# Default prune model specs
! nvtl yolo-v4 model-prune-defaults --id {model_id} | tee ~/shared/users/{os.environ['USER']}/models/{model_id}/specs/prune.json

### Run prune <a class="anchor" id="head-8"></a>

In [None]:
prune_job_id = subprocess.getoutput(f"nvtl yolo-v4 model-prune --id {model_id} --job {train_job_id}")
print(prune_job_id)

In [None]:
# Check status (the file won't exist until the backend Toolkit container is running -- can take several minutes)
log_file = f"{prune_job_id}.txt"
my_tail(logs_dir, log_file)

In [None]:
prune_job_dir = f"{home}/shared/users/{os.environ['USER']}/models/{model_id}/{prune_job_id}"
pruned_model_size = json.loads(subprocess.getoutput(f'stat -c "%s" {prune_job_dir}/model.tlt'))

train_job_dir = f"{home}/shared/users/{os.environ['USER']}/models/{model_id}/{train_job_id}"
original_train_model_size = json.loads(subprocess.getoutput(f'stat -c "%s" {train_job_dir}/weights/yolov4_resnet18_epoch_010.tlt'))

print(f"The original trained model size is {original_train_model_size} bytes")
print(f"The pruned model size is {pruned_model_size} bytes")
print(f"The pruned model is {round(original_train_model_size/pruned_model_size,1)}x smaller than the original model")
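
`stat -c "%s"` reports file sizes in bytes. The same comparison can be sketched without a subprocess by using `os.path.getsize`, which also returns bytes. The helper name `size_ratio` is hypothetical.

```python
import os


def size_ratio(original_path, pruned_path):
    """Return (original_bytes, pruned_bytes, ratio) for two model files."""
    original_bytes = os.path.getsize(original_path)
    pruned_bytes = os.path.getsize(pruned_path)
    if pruned_bytes == 0:
        raise ValueError(f"{pruned_path} is empty")
    return original_bytes, pruned_bytes, round(original_bytes / pruned_bytes, 1)


# Usage in this notebook would look like:
# original, pruned, ratio = size_ratio(
#     f"{train_job_dir}/weights/yolov4_resnet18_epoch_010.tlt",
#     f"{prune_job_dir}/model.tlt",
# )
# print(f"{original} bytes -> {pruned} bytes ({ratio}x smaller)")
```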

### Provide retrain specs <a class="anchor" id="head-9"></a>

In [None]:
# Default retrain model specs
! nvtl yolo-v4 model-retrain-defaults --id {model_id} | tee ~/shared/users/{os.environ['USER']}/models/{model_id}/specs/retrain.json

In [None]:
# Customize retrain model specs
specs_path = os.path.join(home, 'shared', 'users', os.environ['USER'], 'models', model_id, 'specs', 'retrain.json')

with open(specs_path , "r") as specs_file:
    specs = json.load(specs_file)

specs["training_config"]["num_epochs"] = 10
specs["dataset_config"]["image_extension"] = "jpg" # Setting to the dataset's image_file extension type

specs["augmentation_config"]["output_width"] = 1280 # Setting to the dataset's original resolution's width
specs["augmentation_config"]["output_height"] = 736 # Setting to the dataset's original resolution's height

with open(specs_path, "w") as specs_file:
    json.dump(specs, specs_file, indent=2)

print(json.dumps(specs, indent=2))

### Run retrain <a class="anchor" id="head-10"></a>

In [None]:
retrain_job_id = subprocess.getoutput(f"nvtl yolo-v4 model-retrain --id {model_id} --job {prune_job_id}")
print(retrain_job_id)

In [None]:
# Check status (the file won't exist until the backend Toolkit container is running -- can take several minutes)
log_file = f"{retrain_job_id}.txt"
my_tail(logs_dir, log_file)

### Provide evaluate specs <a class="anchor" id="head-11"></a>

In [None]:
# Default evaluate model specs
! nvtl yolo-v4 model-evaluate-defaults --id {model_id} | tee ~/shared/users/{os.environ['USER']}/models/{model_id}/specs/evaluate.json

In [None]:
# Customize evaluate model specs
specs_path = os.path.join(home, 'shared', 'users', os.environ['USER'], 'models', model_id, 'specs', 'evaluate.json')

with open(specs_path , "r") as specs_file:
    specs = json.load(specs_file)

specs["dataset_config"]["image_extension"] = "jpg" # Setting to the dataset's image_file extension type

specs["augmentation_config"]["output_width"] = 1280 # Setting to the dataset's original resolution's width
specs["augmentation_config"]["output_height"] = 736 # Setting to the dataset's original resolution's height

with open(specs_path, "w") as specs_file:
    json.dump(specs, specs_file, indent=2)

print(json.dumps(specs, indent=2))

### Run evaluate on retrained model <a class="anchor" id="head-12"></a>

In [None]:
eval2_job_id = subprocess.getoutput(f"nvtl yolo-v4 model-evaluate --id {model_id} --job {retrain_job_id}")
print(eval2_job_id)

In [None]:
# Check status (the file won't exist until the backend Toolkit container is running -- can take several minutes)
log_file = f"{eval2_job_id}.txt"
my_tail(logs_dir, log_file)

### Provide Int8 export specs <a class="anchor" id="head-13"></a>

In [None]:
# Default export model specs
! nvtl yolo-v4 model-export-defaults --id {model_id} | tee ~/shared/users/{os.environ['USER']}/models/{model_id}/specs/export.json

In [None]:
# Customize export model specs
specs_path = os.path.join(home, 'shared', 'users', os.environ['USER'], 'models', model_id, 'specs', 'export.json')

with open(specs_path , "r") as specs_file:
    specs = json.load(specs_file)

specs["data_type"] = "int8"
specs["batches"] = 10
specs["batch_size"] = 16

with open(specs_path, "w") as specs_file:
    json.dump(specs, specs_file, indent=2)

print(json.dumps(specs, indent=2))

### Run Int8 export <a class="anchor" id="head-14"></a>

In [None]:
int8_export_job_id = subprocess.getoutput(f"nvtl yolo-v4 model-export --id {model_id} --job {retrain_job_id}")
print(int8_export_job_id)

In [None]:
# Check status (the file won't exist until the backend Toolkit container is running -- can take several minutes)
log_file = f"{int8_export_job_id}.txt"
my_tail(logs_dir, log_file)

### Provide model convert specs <a class="anchor" id="head-15"></a>

In [None]:
# Default convert model specs
! nvtl yolo-v4 model-convert-defaults --id {model_id} | tee ~/shared/users/{os.environ['USER']}/models/{model_id}/specs/convert.json

In [None]:
# Customize convert model specs
specs_path = os.path.join(home, 'shared', 'users', os.environ['USER'], 'models', model_id, 'specs', 'convert.json')

with open(specs_path , "r") as specs_file:
    specs = json.load(specs_file)

specs["t"] = "int8"
specs["b"] = 8
specs["p"] = "Input,1x3x736x1280,8x3x736x1280,16x3x736x1280"

with open(specs_path, "w") as specs_file:
    json.dump(specs, specs_file, indent=2)

print(json.dumps(specs, indent=2))

### Run model convert <a class="anchor" id="head-16"></a>

In [None]:
convert_job_id_2 = subprocess.getoutput(f"nvtl yolo-v4 model-convert --id {model_id} --job {int8_export_job_id}")
print(convert_job_id_2)

In [None]:
# Check status (the file won't exist until the backend Toolkit container is running -- can take several minutes)
log_file = f"{convert_job_id_2}.txt"
my_tail(logs_dir, log_file)

### Provide TAO inference specs <a class="anchor" id="head-17"></a>

In [None]:
# Default inference model specs
! nvtl yolo-v4 model-inference-defaults --id {model_id} | tee ~/shared/users/{os.environ['USER']}/models/{model_id}/specs/inference.json

In [None]:
# Customize TAO inference specs
specs_path = os.path.join(home, 'shared', 'users', os.environ['USER'], 'models', model_id, 'specs', 'inference.json')

with open(specs_path , "r") as specs_file:
    specs = json.load(specs_file)

specs["augmentation_config"]["output_width"] = 1280 # Setting to the dataset's original resolution's width
specs["augmentation_config"]["output_height"] = 736 # Setting to the dataset's original resolution's height

with open(specs_path, "w") as specs_file:
    json.dump(specs, specs_file, indent=2)

print(json.dumps(specs, indent=2))

### Run TAO inference <a class="anchor" id="head-18"></a>

In [None]:
tao_inference_job_id_2 = subprocess.getoutput(f"nvtl yolo-v4 model-inference --id {model_id} --job {convert_job_id_2}")
print(tao_inference_job_id_2)

In [None]:
# Check status (the file won't exist until the backend Toolkit container is running -- can take several minutes)
log_file = f"{tao_inference_job_id_2}.txt"
start_time = time.time()
my_tail(logs_dir, log_file)
inference_time = time.time() - start_time

In [None]:
print("Time in seconds for inference on optimized model is ", inference_time)
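
Both inference runs store their duration in the same `inference_time` variable, so the first measurement is overwritten by the second. To compare the two, each value would need to be kept under its own name; the names used below are hypothetical. Both timings are measured around `my_tail`, so they include the wait for the backend container to start and are only a rough indication of inference speed.

```python
def speedup(baseline_seconds, optimized_seconds):
    """Return how many times faster the optimized run was than the baseline run."""
    if baseline_seconds <= 0 or optimized_seconds <= 0:
        raise ValueError("Durations must be positive")
    return round(baseline_seconds / optimized_seconds, 2)


# Hypothetical example values, not measurements from this notebook.
unoptimized_inference_time = 120.0
optimized_inference_time = 48.0
print(f"Speedup: {speedup(unoptimized_inference_time, optimized_inference_time)}x")
# Speedup: 2.5x
```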

In [None]:
job_dir = f"{home}/shared/users/{os.environ['USER']}/models/{model_id}/{tao_inference_job_id_2}"
from IPython.display import Image
sample_image = glob.glob(f"{job_dir}/images_annotated/*.jpg")[6]
Image(filename=sample_image) 

### Delete experiment <a class="anchor" id="head-19"></a>

In [None]:
! rm -rf ~/shared/users/{os.environ['USER']}/models/{model_id}
! echo DONE

### Delete datasets <a class="anchor" id="head-20"></a>

In [None]:
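# train_dataset_id, eval_dataset_id and infer_dataset_id are not defined in this notebook;
# set them to the IDs of the datasets you want to delete before running this cell.
! rm -rf ~/shared/users/{os.environ['USER']}/datasets/{train_dataset_id}
! rm -rf ~/shared/users/{os.environ['USER']}/datasets/{eval_dataset_id}
! rm -rf ~/shared/users/{os.environ['USER']}/datasets/{infer_dataset_id}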
! echo DONE

### Unmount shared volume <a class="anchor" id="head-21"></a>

In [None]:
command = "umount ~/shared"
! echo {password} | sudo -S -k {command} && echo DONE