### TAO remote client (Simple object detection training with YOLO-V4)

Transfer learning is the process of transferring learned features from one application to another. It is a commonly used training technique in which a model trained on one task is re-trained for use on a different task. The Train Adapt Optimize (TAO) Toolkit is a simple, easy-to-use, Python-based AI toolkit for taking purpose-built AI models and customizing them with users' own data.

![image](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/TAO/tlt-tao-toolkit-bring-your-own-model-diagram.png)


### The workflow in a nutshell

- Create the datasets
- Upload the KITTI dataset to the service
- Run dataset convert
- Get a pretrained model (PTM) from NGC
- Model actions
    - Train
    - Evaluate

### Table of contents

1. [Install TAO remote client](#head-1)
1. [Set the remote service base URL](#head-2)
1. [Access the shared volume](#head-3)
1. [Create the datasets](#head-4)
1. [List datasets](#head-5)
1. [Provide and customize dataset convert specs](#head-6)
1. [Run dataset convert](#head-7)
1. [Create a model experiment](#head-8)
1. [Find the YOLOv4 pretrained model](#head-9)
1. [Customize model metadata](#head-10)
1. [Provide train specs](#head-11)
1. [Run train](#head-12)
1. [Provide evaluate specs](#head-13)
1. [Run evaluate](#head-14)

### Requirements
Please find the server requirements [here](https://docs.nvidia.com/tao/tao-toolkit/text/tao_toolkit_api/api_setup.html#)

In [None]:
import os
import glob
import subprocess
import getpass
import uuid
import json
import time

In [None]:
namespace = 'default'

### Install TAO remote client <a class="anchor" id="head-1"></a>

In [None]:
# SKIP this step IF you have already installed the TAO-Client wheel.
! pip3 install nvidia-transfer-learning-client

In [None]:
# View the version of the TAO-Client
! nvtl --version

### FIXME

1. Set the IP address and port number (`node_addr`, `node_port`) in FIXME 1 and FIXME 2 ([info](https://docs.nvidia.com/tao/tao-toolkit/text/tao_toolkit_api/api_rest_api.html))
1. Set the NGC API key in FIXME 3
1. Set the `DATA_DIR` path in FIXME 4

### Set the remote service base URL <a class="anchor" id="head-2"></a>

In [None]:
# Define the node_addr and port number
node_addr = "<ip_address>" # FIXME1 example: 10.137.149.22
node_port = "<port_number>" # FIXME2 example: 32334
# On the host machine, the node IP address and port number can be obtained as follows:
# ip_address: hostname -i
# port_number: kubectl get service ingress-nginx-controller -o jsonpath='{.spec.ports[0].nodePort}'
%env BASE_URL=http://{node_addr}:{node_port}/{namespace}/api/v1
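
If a FIXME placeholder is left unfilled, the `%env` cell above silently produces an unusable URL. The sketch below builds the same `http://<addr>:<port>/<namespace>/api/v1` string and fails early on placeholders; `build_base_url` is a hypothetical helper, not part of the TAO client.

```python
def build_base_url(node_addr, node_port, namespace="default"):
    """Build the TAO API base URL in the same shape as the %env BASE_URL cell."""
    # Reject FIXME placeholders such as "<ip_address>" before they reach nvtl.
    for name, value in (("node_addr", node_addr), ("node_port", node_port)):
        if not value or str(value).startswith("<"):
            raise ValueError(f"{name} is not set: {value!r}")
    # A NodePort is a plain integer, so anything else is likely a copy/paste slip.
    if not str(node_port).isdigit():
        raise ValueError(f"node_port must be numeric, got {node_port!r}")
    return f"http://{node_addr}:{node_port}/{namespace}/api/v1"
```

Setting `os.environ["BASE_URL"] = build_base_url(node_addr, node_port, namespace)` has the same effect as the `%env` magic.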

In [None]:
# FIXME: Set the ngc_api_key variable
ngc_api_key = "<ngc_api_key>" # FIXME3 example: zZYtczM5amdtdDcwNjk0cnA2bGU2bXQ3bnQ6NmQ4NjNhMDItMTdmZS00Y2QxLWI2ZjktNmE5M2YxZTc0OGyM

# Exchange NGC_API_KEY for JWT
identity = json.loads(subprocess.getoutput(f'nvtl login --ngc-api-key {ngc_api_key}'))

%env USER={identity['user_id']}
%env TOKEN={identity['token']}
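
`subprocess.getoutput()` returns stdout and stderr together, so a rejected key shows up as an opaque `JSONDecodeError` in the login cell. A minimal sketch of a stricter parser follows; it assumes, as the cell above does, that a successful `nvtl login` prints a JSON object with `user_id` and `token`. `parse_login_output` is a hypothetical name.

```python
import json

def parse_login_output(output):
    """Parse the JSON printed by `nvtl login`, raising a readable error otherwise."""
    try:
        identity = json.loads(output)
    except json.JSONDecodeError as err:
        # getoutput() also captures error text, so show it verbatim.
        raise RuntimeError(f"nvtl login did not return JSON: {output!r}") from err
    if not isinstance(identity, dict):
        raise RuntimeError(f"Unexpected nvtl login output: {output!r}")
    missing = [key for key in ("user_id", "token") if key not in identity]
    if missing:
        raise RuntimeError(f"nvtl login output is missing {missing}")
    return identity
```

Usage: `identity = parse_login_output(subprocess.getoutput(f'nvtl login --ngc-api-key {ngc_api_key}'))`.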

### Access the shared volume <a class="anchor" id="head-3"></a>

In [None]:
# Get PVC ID
pvc_id = subprocess.getoutput(f'kubectl get pvc tao-toolkit-api-pvc -n {namespace} -o jsonpath="{{.spec.volumeName}}"')
print(pvc_id)

In [None]:
# Get NFS server info
provisioner = json.loads(subprocess.getoutput(f'helm get values nfs-subdir-external-provisioner -o json'))
nfs_server = provisioner['nfs']['server']
nfs_path = provisioner['nfs']['path']
print(nfs_server, nfs_path)

In [None]:
user = getpass.getuser()
home = os.path.expanduser('~')

! echo "Password for {user}"
password = getpass.getpass()

In [None]:
# Mount shared volume 
! mkdir -p ~/shared

command = "apt-get -y install nfs-common >> /dev/null"
! echo {password} | sudo -S -k {command}

command = f"mount -t nfs {nfs_server}:{nfs_path}/{namespace}-tao-toolkit-api-pvc-{pvc_id} ~/shared"
! echo {password} | sudo -S -k {command} && echo DONE
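
The mount source in the cell above is assembled from the Helm values and the PVC volume name. The sketch below builds the same `server:path/<namespace>-tao-toolkit-api-pvc-<pvc_id>` string (additionally stripping a trailing slash from the path) and checks whether `~/shared` is actually a mount point. Both helper names are hypothetical; the structure of the Helm JSON (`nfs.server`, `nfs.path`) is taken from the cell that reads it.

```python
import json
import os

def nfs_mount_source(helm_values_json, namespace, pvc_id):
    """Return server:path for the API PVC, mirroring the string built in the mount cell."""
    values = json.loads(helm_values_json)
    server = values["nfs"]["server"]
    # Strip a trailing slash so the joined path never contains "//".
    path = values["nfs"]["path"].rstrip("/")
    return f"{server}:{path}/{namespace}-tao-toolkit-api-pvc-{pvc_id}"

def is_shared_mounted(mount_point="~/shared"):
    """True if a filesystem is currently mounted at mount_point."""
    return os.path.ismount(os.path.expanduser(mount_point))
```

After the mount cell prints `DONE`, `is_shared_mounted()` should return `True`.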

### Create the datasets <a class="anchor" id="head-4"></a>

This example uses NVIDIA's synthetic dataset of warehouse images, stored in the `KITTI object detection` format. For more details about KITTI, see [here](http://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark=2d).

**The dataset follows this structure**
```
$DATA_DIR/train
├── images
│   ├── image_name_1.jpg
│   ├── image_name_2.jpg
│   ├── ...
└── labels
    ├── image_name_1.txt
    ├── image_name_2.txt
    ├── ...
$DATA_DIR/val
├── images
│   ├── image_name_5.jpg
│   ├── image_name_6.jpg
│   ├── ...
└── labels
    ├── image_name_5.txt
    ├── image_name_6.txt
    ├── ...
```
Each image and its label file must have the same base name, for example `image_name_1.jpg` and `image_name_1.txt`.
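
The pairing rule can be checked locally once the archives are extracted. This is a small sketch (the function name is hypothetical) that compares base names in `images/` and `labels/` for one split folder laid out as shown above.

```python
from pathlib import Path

def find_unpaired_files(split_dir):
    """Return (images without a label, labels without an image) for one split folder."""
    split = Path(split_dir)
    # Compare base names only: image_name_1.jpg pairs with image_name_1.txt.
    image_stems = {p.stem for p in (split / "images").iterdir() if p.is_file()}
    label_stems = {p.stem for p in (split / "labels").glob("*.txt")}
    return sorted(image_stems - label_stems), sorted(label_stems - image_stems)
```

For example, `find_unpaired_files(os.path.join(DATA_DIR, "train"))` should return two empty lists for a well-formed split.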

In [None]:
DATA_DIR = "tao_synthetic_data" #FIXME4

In [None]:
!aws s3 cp s3://tao-detection-synthetic-dataset-dev/tao_od_synthetic_train.tar.gz $DATA_DIR/
!aws s3 cp s3://tao-detection-synthetic-dataset-dev/tao_od_synthetic_val.tar.gz $DATA_DIR/

!mkdir -p $DATA_DIR/train/ && rm -rf $DATA_DIR/train/*
!mkdir -p $DATA_DIR/val/ && rm -rf $DATA_DIR/val/*

!tar -xzf $DATA_DIR/tao_od_synthetic_train.tar.gz -C $DATA_DIR/train/
!tar -xzf $DATA_DIR/tao_od_synthetic_val.tar.gz -C $DATA_DIR/val/

In [None]:
train_dataset_id = subprocess.getoutput(f"nvtl yolo-v4 dataset-create --dataset_type object_detection --dataset_format kitti")
print(train_dataset_id)
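
Because `subprocess.getoutput()` never raises on a failed command, an error message could end up stored in `train_dataset_id` and only surface later as a confusing path. A hedged sketch: assuming `nvtl` prints just the new ID on success (the final cell of this notebook refers to these IDs as UUIDs), the otherwise unused `uuid` import can validate it.

```python
import uuid

def require_uuid(output):
    """Return the stripped CLI output, or raise if it is not a UUID."""
    candidate = output.strip()
    try:
        uuid.UUID(candidate)
    except ValueError as err:
        raise RuntimeError(f"Expected a UUID from nvtl, got: {candidate!r}") from err
    return candidate
```

Usage: wrap each create call, e.g. `train_dataset_id = require_uuid(subprocess.getoutput("nvtl yolo-v4 dataset-create ..."))`, keeping the original arguments.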

In [None]:
! rsync -ah --info=progress2 $DATA_DIR/train/images ~/shared/users/{os.environ['USER']}/datasets/{train_dataset_id}/
! rsync -ah --info=progress2 $DATA_DIR/train/labels ~/shared/users/{os.environ['USER']}/datasets/{train_dataset_id}/
! echo DONE
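
Where `rsync` is unavailable, a pure-Python copy is one possible substitute. This sketch (hypothetical helper name) uses `shutil.copytree(..., dirs_exist_ok=True)`, which requires Python 3.8+, and overwrites same-named files rather than doing rsync's delta transfer.

```python
import shutil
from pathlib import Path

def copy_split(src_split_dir, dest_dataset_dir, subdirs=("images", "labels")):
    """Copy selected subfolders of a local split into a dataset folder on the shared volume."""
    src, dest = Path(src_split_dir), Path(dest_dataset_dir)
    for name in subdirs:
        # dirs_exist_ok merges into an existing folder and overwrites same-named files.
        shutil.copytree(src / name, dest / name, dirs_exist_ok=True)
    # List every file now under dest, relative to it, for a quick sanity check.
    return sorted(p.relative_to(dest).as_posix() for p in dest.rglob("*") if p.is_file())
```

For the inference dataset, pass `subdirs=("images",)`, since only images are uploaded there.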

In [None]:
eval_dataset_id = subprocess.getoutput(f"nvtl yolo-v4 dataset-create --dataset_type object_detection --dataset_format kitti")
print(eval_dataset_id)

In [None]:
! rsync -ah --info=progress2 $DATA_DIR/val/images ~/shared/users/{os.environ['USER']}/datasets/{eval_dataset_id}/
! rsync -ah --info=progress2 $DATA_DIR/val/labels ~/shared/users/{os.environ['USER']}/datasets/{eval_dataset_id}/
! echo DONE

In [None]:
infer_dataset_id = subprocess.getoutput(f"nvtl yolo-v4 dataset-create --dataset_type object_detection --dataset_format raw")
print(infer_dataset_id)

In [None]:
! rsync -ah --info=progress2 $DATA_DIR/val/images ~/shared/users/{os.environ['USER']}/datasets/{infer_dataset_id}/
! echo DONE

### List datasets <a class="anchor" id="head-5"></a>

In [None]:
pattern = os.path.join(home, 'shared', 'users', os.environ['USER'], 'datasets', '*', 'metadata.json')

datasets = []
for metadata_path in glob.glob(pattern):
    with open(metadata_path, 'r') as metadata_file:
        datasets.append(json.load(metadata_file))

print(json.dumps(datasets, indent=2))
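
The list above is convenient to read but awkward to look up by ID. A sketch that indexes the same `metadata.json` files by ID; it assumes each file has an `"id"` field (as the model metadata used later does) and otherwise falls back to the parent folder name, which is the dataset ID in this directory layout.

```python
import glob
import json
import os

def load_metadata_by_id(pattern):
    """Load each metadata.json matching pattern into a dict keyed by dataset or model ID."""
    index = {}
    for metadata_path in sorted(glob.glob(pattern)):
        with open(metadata_path, "r") as metadata_file:
            metadata = json.load(metadata_file)
        # Assumes an "id" field; falls back to the folder name, which is the ID here.
        key = metadata.get("id") or os.path.basename(os.path.dirname(metadata_path))
        index[key] = metadata
    return index
```

Usage: `load_metadata_by_id(pattern)[train_dataset_id]`.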

### Provide and customize dataset convert specs <a class="anchor" id="head-6"></a>

In [None]:
# Default train dataset specs
! nvtl yolo-v4 dataset-convert-defaults --id {train_dataset_id} --action convert | tee ~/shared/users/{os.environ['USER']}/datasets/{train_dataset_id}/specs/convert.json

In [None]:
# Customize train dataset specs
specs_path = os.path.join(home, 'shared', 'users', os.environ['USER'], 'datasets', train_dataset_id, 'specs', 'convert.json')

with open(specs_path , "r") as specs_file:
    specs = json.load(specs_file)

specs["kitti_config"]["image_extension"] = ".jpg" # Setting to the dataset's image_file extension type

with open(specs_path, "w") as specs_file:
    json.dump(specs, specs_file, indent=2)

print(json.dumps(specs, indent=2))
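
The same read-modify-write pattern is repeated for every spec file in this notebook. A minimal sketch of a reusable helper (hypothetical name) that applies nested updates given as key tuples and writes the file back with the same indentation:

```python
import json

def update_json_spec(specs_path, updates):
    """Apply {("section", "key"): value} updates to a JSON spec file in place."""
    with open(specs_path, "r") as specs_file:
        specs = json.load(specs_file)
    for key_path, value in updates.items():
        node = specs
        for key in key_path[:-1]:
            node = node[key]  # KeyError here means the default spec lacks that section
        node[key_path[-1]] = value
    with open(specs_path, "w") as specs_file:
        json.dump(specs, specs_file, indent=2)
    return specs
```

Usage: `update_json_spec(specs_path, {("kitti_config", "image_extension"): ".jpg"})` reproduces the customization above.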

In [None]:
# Default eval dataset specs
! nvtl yolo-v4 dataset-convert-defaults --id {eval_dataset_id} --action convert | tee ~/shared/users/{os.environ['USER']}/datasets/{eval_dataset_id}/specs/convert.json

In [None]:
# Customize eval dataset specs
specs_path = os.path.join(home, 'shared', 'users', os.environ['USER'], 'datasets', eval_dataset_id, 'specs', 'convert.json')

with open(specs_path , "r") as specs_file:
    specs = json.load(specs_file)

specs["kitti_config"]["image_extension"] = ".jpg" # Setting to the dataset's image_file extension type

with open(specs_path, "w") as specs_file:
    json.dump(specs, specs_file, indent=2)

print(json.dumps(specs, indent=2))

### Run dataset convert <a class="anchor" id="head-7"></a>

In [None]:
train_convert_job_id = subprocess.getoutput(f"nvtl yolo-v4 dataset-convert --id {train_dataset_id} --action convert")
print(train_convert_job_id)

In [None]:
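def my_tail(logs_dir, log_file):
    """Stream a job log from the shared volume until the backend writes the final "EOF" line."""
    %env LOG_FILE={logs_dir}/{log_file}
    # Create an empty, world-writable log file if the job has not created it yet.
    ! mkdir -p {logs_dir}
    ! [ ! -f "$LOG_FILE" ] && touch $LOG_FILE && chmod 666 $LOG_FILE
    # Print the log from its first line and stop tail once the "EOF" sentinel line appears.
    ! tail -f -n +1 $LOG_FILE | while read LINE; do echo "$LINE"; [[ "$LINE" == "EOF" ]] && pkill -P $$ tail; done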
    
# Check status (the file won't exist until the backend Toolkit container is running -- can take several minutes)
logs_dir = os.path.join(home, 'shared', 'users', os.environ['USER'], 'datasets', train_dataset_id, 'logs')
log_file = f"{train_convert_job_id}.txt"

my_tail(logs_dir, log_file)
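
The shell pipeline in `my_tail` depends on bash features and `pkill`. A plain-Python alternative is sketched below under the same assumption `my_tail` makes: the backend ends each job log with a line containing only `EOF`. `follow_until_eof` and its timeout parameter are hypothetical additions.

```python
import os
import time

def follow_until_eof(log_path, sentinel="EOF", poll_seconds=1.0, timeout_seconds=None):
    """Yield complete lines from log_path as they are written, ending after the sentinel line."""
    deadline = None if timeout_seconds is None else time.monotonic() + timeout_seconds

    def wait():
        # Sleep between polls, giving up once the optional deadline has passed.
        if deadline is not None and time.monotonic() > deadline:
            raise TimeoutError(f"Gave up waiting on {log_path}")
        time.sleep(poll_seconds)

    # Like my_tail, tolerate a file that the backend has not created yet.
    while not os.path.exists(log_path):
        wait()
    with open(log_path, "r") as log_file:
        buffer = ""
        while True:
            chunk = log_file.readline()
            if not chunk:
                wait()
                continue
            buffer += chunk
            if not buffer.endswith("\n"):
                continue  # partial line: keep reading until the writer finishes it
            line, buffer = buffer.rstrip("\r\n"), ""
            yield line
            if line == sentinel:
                return
```

Usage: `for line in follow_until_eof(os.path.join(logs_dir, log_file)): print(line)`.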

In [None]:
eval_convert_job_id = subprocess.getoutput(f"nvtl yolo-v4 dataset-convert --id {eval_dataset_id} --action convert")
print(eval_convert_job_id)

In [None]:
# Check status (the file won't exist until the backend Toolkit container is running -- can take several minutes)
logs_dir = os.path.join(home, 'shared', 'users', os.environ['USER'], 'datasets', eval_dataset_id, 'logs')
log_file = f"{eval_convert_job_id}.txt"

my_tail(logs_dir, log_file)

### Create a model experiment <a class="anchor" id="head-8"></a>

In [None]:
network_arch = "yolo_v4"
model_id = subprocess.getoutput(f"nvtl yolo-v4 model-create --network_arch {network_arch} --encryption_key tlt_encode ")
print(model_id)

### Find the YOLOv4 pretrained model <a class="anchor" id="head-9"></a>

In [None]:
pattern = os.path.join(home, 'shared', 'users', '*', 'models', '*', 'metadata.json')

ptm_id = None
for metadata_path in glob.glob(pattern):
    with open(metadata_path, 'r') as metadata_file:
        metadata = json.load(metadata_file)
    # Some models may have no ngc_path; treat None as "" so the substring test can't raise
    ngc_path = metadata.get("ngc_path") or ""
    metadata_architecture = metadata.get("network_arch")
    if metadata_architecture == network_arch and "pretrained_object_detection:resnet18" in ngc_path:
        ptm_id = metadata["id"]
        break

print(ptm_id)
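
Separating the matching rule from the file reading makes it easy to test. The sketch below (hypothetical name) applies the same rule as the cell above to metadata dicts that are already loaded. Since `glob.glob` returns paths in arbitrary order, sorting them before loading makes the choice deterministic when several entries match.

```python
def find_ptm(metadatas, network_arch, ngc_fragment="pretrained_object_detection:resnet18"):
    """Return the id of the first entry matching network_arch whose ngc_path contains ngc_fragment."""
    for metadata in metadatas:
        # Entries without an NGC path cannot match the fragment.
        ngc_path = metadata.get("ngc_path") or ""
        if metadata.get("network_arch") == network_arch and ngc_fragment in ngc_path:
            return metadata.get("id")
    return None
```

Usage: `find_ptm((json.load(open(p)) for p in sorted(glob.glob(pattern))), network_arch)`.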

### Customize model metadata <a class="anchor" id="head-10"></a>

In [None]:
metadata_path = os.path.join(home, 'shared', 'users', os.environ['USER'], 'models', model_id, 'metadata.json')

with open(metadata_path , "r") as metadata_file:
    metadata = json.load(metadata_file)

metadata["train_datasets"] = [train_dataset_id]
metadata["eval_dataset"] = eval_dataset_id
metadata["inference_dataset"] = infer_dataset_id
metadata["ptm"] = ptm_id

with open(metadata_path, "w") as metadata_file:
    json.dump(metadata, metadata_file, indent=2)

print(json.dumps(metadata, indent=2))

### Provide train specs <a class="anchor" id="head-11"></a>

In [None]:
# Default train model specs
! nvtl yolo-v4 model-train-defaults --id {model_id} | tee ~/shared/users/{os.environ['USER']}/models/{model_id}/specs/train.json

In [None]:
# Customize train model specs
specs_path = os.path.join(home, 'shared', 'users', os.environ['USER'], 'models', model_id, 'specs', 'train.json')

with open(specs_path , "r") as specs_file:
    specs = json.load(specs_file)

specs["training_config"]["num_epochs"] = 100
specs["dataset_config"]["image_extension"] = "jpg" # Setting to the dataset's image_file extension type

specs["augmentation_config"]["output_width"] = 1280 # Setting to the dataset's original resolution's width
specs["augmentation_config"]["output_height"] = 736 # Setting to the dataset's original resolution's height

with open(specs_path, "w") as specs_file:
    json.dump(specs, specs_file, indent=2)

print(json.dumps(specs, indent=2))
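
YOLO-style detectors downsample the input by a stride of 32. For that reason the network input size (`output_width`, `output_height`) is generally expected to be a multiple of 32; confirm the exact constraint in the TAO YOLOv4 spec documentation for your version. Both 1280 and 736 satisfy it. By the same arithmetic, a 720-pixel height would round up to 736. A small sketch with hypothetical helper names:

```python
def round_up_to_multiple(value, multiple=32):
    """Smallest multiple of `multiple` that is greater than or equal to value."""
    return ((value + multiple - 1) // multiple) * multiple

def check_output_size(width, height, multiple=32):
    """Raise ValueError if either network input dimension is not divisible by `multiple`."""
    for name, size in (("output_width", width), ("output_height", height)):
        if size % multiple:
            suggestion = round_up_to_multiple(size, multiple)
            raise ValueError(f"{name}={size} is not a multiple of {multiple}; try {suggestion}")
```

Usage: `check_output_size(specs["augmentation_config"]["output_width"], specs["augmentation_config"]["output_height"])`.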

### Run train <a class="anchor" id="head-12"></a>

In [None]:
train_job_id = subprocess.getoutput(f"nvtl yolo-v4 model-train --id {model_id}")
print(train_job_id)

In [None]:
# Check status (the file won't exist until the backend Toolkit container is running -- can take several minutes)
logs_dir = os.path.join(home, 'shared', 'users', os.environ['USER'], 'models', model_id, 'logs')
log_file = f"{train_job_id}.txt"

my_tail(logs_dir, log_file)

### Provide evaluate specs <a class="anchor" id="head-13"></a>

In [None]:
# Default evaluate model specs
! nvtl yolo-v4 model-evaluate-defaults --id {model_id} | tee ~/shared/users/{os.environ['USER']}/models/{model_id}/specs/evaluate.json

In [None]:
# Customize evaluate model specs
specs_path = os.path.join(home, 'shared', 'users', os.environ['USER'], 'models', model_id, 'specs', 'evaluate.json')

with open(specs_path , "r") as specs_file:
    specs = json.load(specs_file)

specs["dataset_config"]["image_extension"] = "jpg" # Setting to the dataset's image_file extension type

specs["augmentation_config"]["output_width"] = 1280 # Setting to the dataset's original resolution's width
specs["augmentation_config"]["output_height"] = 736 # Setting to the dataset's original resolution's height

with open(specs_path, "w") as specs_file:
    json.dump(specs, specs_file, indent=2)

print(json.dumps(specs, indent=2))
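
The evaluate spec repeats the input resolution and image extension from the train spec, so the two can drift apart if only one is edited. A sketch (hypothetical names; the default keys are the ones customized in this notebook) that reports any differences between the two loaded spec dicts:

```python
DEFAULT_KEYS = (
    ("augmentation_config", "output_width"),
    ("augmentation_config", "output_height"),
    ("dataset_config", "image_extension"),
)

def mismatched_settings(train_specs, eval_specs, keys=DEFAULT_KEYS):
    """Map each (section, key) whose value differs between the two specs to (train, eval)."""
    mismatches = {}
    for section, key in keys:
        train_value = train_specs.get(section, {}).get(key)
        eval_value = eval_specs.get(section, {}).get(key)
        if train_value != eval_value:
            mismatches[(section, key)] = (train_value, eval_value)
    return mismatches
```

Load `train.json` and `evaluate.json` with `json.load` and expect an empty result before running evaluate.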

### Run evaluate <a class="anchor" id="head-14"></a>

In [None]:
eval_job_id = subprocess.getoutput(f"nvtl yolo-v4 model-evaluate --id {model_id} --job {train_job_id}")
print(eval_job_id)

In [None]:
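# Check status (the file won't exist until the backend Toolkit container is running -- can take several minutes)
# Evaluate logs go to the same model logs folder as training; set it again in case the kernel was restarted
logs_dir = os.path.join(home, 'shared', 'users', os.environ['USER'], 'models', model_id, 'logs')
log_file = f"{eval_job_id}.txt"
my_tail(logs_dir, log_file)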

In [None]:
# Copy these two UUIDs into FIXME 4 and FIXME 5 of yolo_optimization.ipynb
print(model_id)
print(train_job_id)
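
As an alternative to copying the IDs by hand, they could be written to a small JSON file that the optimization notebook reads. This is only a sketch: the file name `yolo_handoff.json` and both helpers are hypothetical, and `yolo_optimization.ipynb` would need a matching cell to load the file.

```python
import json

def save_handoff_ids(path, model_id, train_job_id):
    """Write the IDs needed by the optimization notebook to a small JSON file."""
    ids = {"model_id": model_id, "train_job_id": train_job_id}
    with open(path, "w") as handoff_file:
        json.dump(ids, handoff_file, indent=2)
    return ids

def load_handoff_ids(path):
    """Read the IDs back, for example at the top of the optimization notebook."""
    with open(path, "r") as handoff_file:
        return json.load(handoff_file)
```

Usage: `save_handoff_ids("yolo_handoff.json", model_id, train_job_id)` here, then `load_handoff_ids("yolo_handoff.json")` in the next notebook.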