### Notebook to demonstrate TAO-Remote Client AutoML workflow for Image Classification

Transfer learning is the process of transferring learned features from one application to another. It is a commonly used training technique in which a model trained on one task is re-trained for a different task. The Train Adapt Optimize (TAO) Toolkit is a simple, easy-to-use, Python-based AI toolkit for taking purpose-built AI models and customizing them with users' own data.

![image](https://developer.nvidia.com/sites/default/files/akamai/TAO/tlt-tao-toolkit-bring-your-own-model-diagram.png)


### Learning Objective

This AutoML notebook identifies the optimal hyperparameters (e.g., learning rate, batch size, weight regularizer, number of layers) for classification models, in order to obtain better accuracy or faster convergence.
- Take a pretrained model and choose the AutoML algorithm and parameters to start AutoML training.
- At the end of an AutoML run, you will receive a config file that specifies the best performing model, along with the binary model file to deploy it to your application.


### The workflow in a nutshell

- Create train and eval datasets
- Upload datasets to the service
- Set AutoML algorithm configurations
  - Add/Remove AutoML parameters
- Override train config defaults
- Run AutoML


### AutoML Workflow

The user selects a model topology, creates and uploads the datasets, configures the parameters, trains with AutoML, and compares the resulting models.

![image](https://raw.githubusercontent.com/vpraveen-nv/model_card_images/main/api/automl_workflow.png)

### Table of contents

1. [Install TAO remote client](#head-1)
1. [Set the remote service base URL](#head-2)
1. [Access the shared volume](#head-3)
1. [Create the datasets](#head-4)
1. [List datasets](#head-5)
1. [Create a model experiment](#head-6)
1. [Find pretrained model](#head-7)
1. [Set AutoML related configurations](#head-8)
1. [Provide train specs](#head-9)
1. [Run AutoML train](#head-10)
1. [Get the best model from AutoML](#head-11)
1. [Delete experiment](#head-12)
1. [Delete datasets](#head-13)
1. [Unmount shared volume](#head-14)
1. [Uninstall TAO Remote Client](#head-32)

### Requirements
Please find the server requirements [here](https://docs.nvidia.com/tao/tao-toolkit/text/tao_toolkit_api/api_setup.html#)

In [None]:
import os
import glob
import subprocess
import getpass
import uuid
import json

In [None]:
namespace = 'default'

### FIXME

1. Assign a model_name in FIXME 1
2. Choose between the default or a custom dataset in FIXME 2
3. Assign the ip_address and port_number in FIXME 3 and FIXME 4 ([info](https://docs.nvidia.com/tao/tao-toolkit/text/tao_toolkit_api/api_rest_api.html))
4. Set the NGC API key in FIXME 5
5. Assign the path of the data directory in FIXME 6
6. Choose between the Bayesian and HyperBand automl_algorithm in FIXME 7

In [None]:
# Available models (#FIXME 1):
# 1. classification - https://docs.nvidia.com/tao/tao-toolkit/text/image_classification.html
# 2. multitask-classification - https://docs.nvidia.com/tao/tao-toolkit/text/multitask_image_classification.html
# classification is the same as multi-class classification

model_name = "multitask-classification"  # FIXME1 (Add the model name from the above mentioned list)
dataset_to_be_used = "default" # FIXME2 example: default/custom; default for the dataset used in this tutorial notebook; custom for a different dataset

### Install TAO remote client <a class="anchor" id="head-1"></a>

In [None]:
# SKIP this step IF you have already installed the TAO-Client wheel.
! pip3 install nvidia-tao-client

In [None]:
# View the version of the TAO-Client
! tao-client --version

### Set the remote service base URL <a class="anchor" id="head-2"></a>

In [None]:
# Define the node_addr and port number
node_addr = "<ip_address>" # FIXME3 example: 10.137.149.22
node_port = "<port_number>" # FIXME4 example: 32334
# On the host machine, the node ip_address and port number can be obtained as follows:
# ip_address: hostname -i
# port_number: kubectl get service ingress-nginx-controller -o jsonpath='{.spec.ports[0].nodePort}'
%env BASE_URL=http://{node_addr}:{node_port}/{namespace}/api/v1

In [None]:
!echo $BASE_URL
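The base URL is only the node address, port and namespace joined into a fixed path. If you prefer to catch an unedited FIXME before any request is sent, a minimal sketch like the following can build and check it in plain Python. `build_base_url` is a hypothetical helper, not part of the TAO client; it only mirrors the URL format used in the `%env BASE_URL` cell above.

```python
def build_base_url(node_addr: str, node_port: str, namespace: str = "default") -> str:
    """Assemble the TAO API base URL (hypothetical helper mirroring the %env cell)."""
    # Unedited FIXME values still look like "<ip_address>" / "<port_number>"
    if not node_addr or node_addr.startswith("<"):
        raise ValueError("node_addr is still a placeholder; set FIXME3")
    if not str(node_port).isdigit():
        raise ValueError("node_port must be a number; set FIXME4")
    return f"http://{node_addr}:{node_port}/{namespace}/api/v1"
```

For example, `os.environ["BASE_URL"] = build_base_url(node_addr, node_port, namespace)` has the same effect as the `%env` line once the placeholders are filled in.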

In [None]:
# FIXME: Set the ngc_api_key variable
ngc_api_key = "<ngc_api_key>" # FIXME5 example: zZYtczM5amdtdDcwNjk0cnA2bGU2bXQ3bnQ6NmQ4NjNhMDItMTdmZS00Y2QxLWI2ZjktNmE5M2YxZTc0OGyM

# Exchange NGC_API_KEY for JWT
identity = json.loads(subprocess.getoutput(f'tao-client login --ngc-api-key {ngc_api_key}'))

%env USER={identity['user_id']}
%env TOKEN={identity['token']}
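`tao-client login` prints a JSON object, and the cell above reads only its `user_id` and `token` fields. If the CLI prints something other than JSON (for example an error message), `json.loads` raises a bare decoding error that hides what the CLI actually said. The hypothetical `parse_login_output` helper below is a sketch, assuming only those two fields are needed, that surfaces the raw output instead.

```python
import json

def parse_login_output(raw: str) -> dict:
    """Extract user_id and token from `tao-client login` output (hypothetical helper)."""
    try:
        identity = json.loads(raw)
    except json.JSONDecodeError as err:
        # Show the start of what the CLI printed instead of a bare decoding error
        raise RuntimeError(f"tao-client login did not return JSON: {raw[:200]!r}") from err
    if not isinstance(identity, dict):
        raise RuntimeError(f"unexpected login response: {identity!r}")
    missing = [key for key in ("user_id", "token") if key not in identity]
    if missing:
        raise RuntimeError(f"login response is missing {missing}")
    return {"user_id": identity["user_id"], "token": identity["token"]}
```

Usage: `identity = parse_login_output(subprocess.getoutput(f'tao-client login --ngc-api-key {ngc_api_key}'))`, after which the two `%env` lines work unchanged.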

### Access the shared volume <a class="anchor" id="head-3"></a>

In [None]:
# Get PVC ID
pvc_id = subprocess.getoutput(f'kubectl get pvc tao-toolkit-api-pvc -n {namespace} -o jsonpath="{{.spec.volumeName}}"')
print(pvc_id)

In [None]:
# Get NFS server info
provisioner = json.loads(subprocess.getoutput(f'helm get values nfs-subdir-external-provisioner -o json'))
nfs_server = provisioner['nfs']['server']
nfs_path = provisioner['nfs']['path']
print(nfs_server, nfs_path)

In [None]:
user = getpass.getuser()
home = os.path.expanduser('~')

! echo "Password for {user}"
password = getpass.getpass()

In [None]:
# Mount shared volume 
! mkdir -p ~/shared

command = "apt-get -y install nfs-common >> /dev/null"
! echo {password} | sudo -S -k {command}

command = f"mount -t nfs {nfs_server}:{nfs_path}/{namespace}-tao-toolkit-api-pvc-{pvc_id} ~/shared"
! echo {password} | sudo -S -k {command} && echo DONE

### Create the datasets <a class="anchor" id="head-4"></a>

**For multi-class classification:**

We will be using the Pascal VOC 2012 dataset for this tutorial. For more details, see [here](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/index.html#devkit). Download the dataset [here](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar) into the directory set by the environment variable $DATA_DIR.

**If you use a custom dataset, it should follow the structure below, and you should skip running** "**Split dataset into train and val sets**". The classification upload and verification cells read the split folders from `$DATA_DIR/split`, so place `images_train`, `images_val` and `images_test` there.
```
DATA_DIR/split
├── images_test
│   ├── class_name_1
│   │   ├── image_name_1.jpg
│   │   ├── image_name_2.jpg
│   │   ├── ...
│   ├── ...
│   └── class_name_n
│       ├── image_name_3.jpg
│       ├── image_name_4.jpg
│       ├── ...
├── images_train
│   ├── class_name_1
│   │   ├── image_name_5.jpg
│   │   ├── image_name_6.jpg
│   │   ├── ...
│   ├── ...
│   └── class_name_n
│       ├── image_name_7.jpg
│       ├── image_name_8.jpg
│       ├── ...
│
└── images_val
    ├── class_name_1
    │   ├── image_name_9.jpg
    │   ├── image_name_10.jpg
    │   ├── ...
    ├── ...
    └── class_name_n
        ├── image_name_11.jpg
        ├── image_name_12.jpg
        ├── ...
```
- Each class folder should contain the images that belong to that class
- The same class folders should be present in images_test, images_train and images_val
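Because the same class folders must exist in all three splits, a quick check before uploading can catch a mismatch early. The sketch below assumes the layout shown above; pass it the folder that directly contains `images_train`, `images_val` and `images_test`. `check_class_folders` is a hypothetical helper, not part of TAO.

```python
from pathlib import Path

SPLITS = ("images_train", "images_val", "images_test")

def check_class_folders(split_root: str) -> dict:
    """Return the sorted class folder names per split; raise if any split differs."""
    root = Path(split_root)
    classes = {}
    for split in SPLITS:
        split_dir = root / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"missing split folder: {split_dir}")
        # Only sub-directories count as classes; stray files are ignored
        classes[split] = sorted(p.name for p in split_dir.iterdir() if p.is_dir())
    reference = classes["images_train"]
    for split in SPLITS[1:]:
        if classes[split] != reference:
            raise ValueError(f"{split} has classes {classes[split]}, images_train has {reference}")
    return classes
```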

**For multi-task classification:**

We will be using the Fashion Product Images (Small) dataset for this tutorial. This dataset is available on Kaggle. In this tutorial, the trained classification network performs three tasks: article category classification, base color classification and target season classification.

To download the dataset, you need a Kaggle account. After logging in, you can download the dataset zip file [here](https://www.kaggle.com/paramaggarwal/fashion-product-images-small). The downloaded file is archive.zip, which contains a subfolder called myntradataset. Unzip the contents of this subfolder into your working directory ($DATA_DIR, created in the cell below) so that you have a folder called images and a CSV file called styles.csv.

**If you use a custom dataset, it should follow this structure:**
```
DATA_DIR
├── images
│   ├── image_name_1.jpg
│   ├── image_name_2.jpg
│   ├── ...
└── styles.csv
```
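The split cell later drops every `styles.csv` row whose `<id>.jpg` file is missing from `images`. To see how many rows a custom dataset would lose before running it, a sketch such as the hypothetical `find_missing_images` below can compare the two; it assumes `styles.csv` has an `id` column, as the split cell does.

```python
import csv
from pathlib import Path

def find_missing_images(data_dir: str) -> list:
    """Return the styles.csv ids that have no matching images/<id>.jpg file."""
    root = Path(data_dir)
    available = {p.name for p in (root / "images").glob("*.jpg")}
    missing = []
    with open(root / "styles.csv", newline="") as f:
        for row in csv.DictReader(f):
            # Same naming rule as the split cell: "<id>.jpg"
            if f"{row['id']}.jpg" not in available:
                missing.append(row["id"])
    return missing
```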

In [None]:
DATA_DIR = model_name # FIXME6
os.environ['DATA_DIR']= DATA_DIR
!mkdir -p $DATA_DIR

In [None]:
if dataset_to_be_used == "default":
    if model_name == "classification":
        if not os.path.exists(os.path.join(DATA_DIR,"VOCtrainval_11-May-2012.tar")):
            print("Download the VOC tar file into", DATA_DIR)
        else:
            !tar -xf $DATA_DIR/VOCtrainval_11-May-2012.tar -C $DATA_DIR
    elif model_name == "multitask-classification":
        if not os.path.exists(os.path.join(DATA_DIR,"archive.zip")):
            print("Download the Fashion zip data (archive.zip) into", DATA_DIR)
        else:
            !unzip -uq $DATA_DIR/archive.zip -d $DATA_DIR/

In [None]:
# Check the dataset is present
if model_name == "classification" and dataset_to_be_used == "default":
    !if [ ! -d $DATA_DIR/VOCdevkit ]; then echo 'Images folder NOT found.'; else echo 'Found images folder.';fi
    !rm -rf $DATA_DIR/split
elif model_name == "multitask-classification":
    !if [ ! -d $DATA_DIR/images ]; then echo 'Images folder NOT found.'; else echo 'Found images folder.';fi
    !if [ ! -f $DATA_DIR/styles.csv ]; then echo 'CSV file NOT found.'; else echo 'Found CSV file.';fi
    # Create subdirectories and remove existing files in them
    !mkdir -p $DATA_DIR/images_train && rm -rf $DATA_DIR/images_train/*
    !mkdir -p $DATA_DIR/images_val && rm -rf $DATA_DIR/images_val/*
    !mkdir -p $DATA_DIR/images_test && rm -rf $DATA_DIR/images_test/*

### Split dataset into train and val sets

In [None]:
# Split dataset into train and val sets
if model_name == "classification" and dataset_to_be_used == "default":
    !python3 -m pip install tqdm
    from os.path import join as join_path
    import os
    import glob
    import re
    import shutil

    DATA_DIR=os.environ.get('DATA_DIR')
    source_dir = join_path(DATA_DIR, "VOCdevkit/VOC2012")
    target_dir = join_path(DATA_DIR, "formatted")


    suffix = '_trainval.txt'
    classes_dir = join_path(source_dir, "ImageSets", "Main")
    images_dir = join_path(source_dir, "JPEGImages")
    classes_files = glob.glob(classes_dir+"/*"+suffix)
    for file in classes_files:
        # get the filename and make output class folder
        classname = os.path.basename(file)
        if classname.endswith(suffix):
            classname = classname[:-len(suffix)]
            target_dir_path = join_path(target_dir, classname)
            if not os.path.exists(target_dir_path):
                os.makedirs(target_dir_path)
        else:
            continue
        print(classname)

        with open(file) as f:
            content = f.readlines()

        for line in content:
            # Each line is "<image_id> <label>"; label 1 marks an image that contains this class
            tokens = line.split()
            if len(tokens) >= 2 and tokens[1] == '1':
                # copy this image into target dir_path
                target_file_path = join_path(target_dir_path, tokens[0] + '.jpg')
                src_file_path = join_path(images_dir, tokens[0] + '.jpg')
                shutil.copyfile(src_file_path, target_file_path)
    
    from random import shuffle
    from tqdm import tqdm

    DATA_DIR=os.environ.get('DATA_DIR')
    SOURCE_DIR=os.path.join(DATA_DIR, 'formatted')
    TARGET_DIR=os.path.join(DATA_DIR,'split')
    # list the class folders created above
    dir_list = next(os.walk(SOURCE_DIR))[1]
    # for each dir, create a new dir in split
    for dir_i in tqdm(dir_list):
        newdir_train = os.path.join(TARGET_DIR, 'images_train', dir_i)
        newdir_val = os.path.join(TARGET_DIR, 'images_val', dir_i)
        newdir_test = os.path.join(TARGET_DIR, 'images_test', dir_i)

        os.makedirs(newdir_train, exist_ok=True)
        os.makedirs(newdir_val, exist_ok=True)
        os.makedirs(newdir_test, exist_ok=True)

        img_list = glob.glob(os.path.join(SOURCE_DIR, dir_i, '*.jpg'))
        # shuffle data
        shuffle(img_list)

        # 70% train, 10% val, 20% test
        train_end = int(len(img_list)*0.7)
        val_end = int(len(img_list)*0.8)
        for img_path in img_list[:train_end]:
            shutil.copy2(img_path, newdir_train)
        for img_path in img_list[train_end:val_end]:
            shutil.copy2(img_path, newdir_val)
        for img_path in img_list[val_end:]:
            shutil.copy2(img_path, newdir_test)

    print('Done splitting dataset.')

elif model_name == "multitask-classification" and dataset_to_be_used == "default":
    !python3 -m pip install numpy
    !python3 -m pip install pandas
    import os
    import shutil
    import numpy as np
    import pandas as pd

    # on_bad_lines='skip' (pandas >= 1.3) replaces error_bad_lines/warn_bad_lines, which pandas 2.0 removed
    df = pd.read_csv(os.environ['DATA_DIR'] + '/styles.csv', on_bad_lines='skip')
    df = df[['id', 'baseColour', 'subCategory', 'season']]
    df = df.dropna()
    category_cls = df.subCategory.value_counts()[:10].index # 10-class multitask-classification
    season_cls = ['Spring', 'Summer', 'Fall', 'Winter'] # 4-class multitask-classification
    color_cls = df.baseColour.value_counts()[:11].index # 11-class multitask-classification

    # Get all valid rows
    df = df[df.subCategory.isin(category_cls) & df.season.isin(season_cls) & df.baseColour.isin(color_cls)]
    df.columns = ['fname', 'base_color', 'category', 'season']
    df.fname = df.fname.astype(str)
    df.fname = df.fname + '.jpg'

    # remove entries whose image file is missing
    all_img_files = os.listdir(os.environ['DATA_DIR'] + '/images')
    df = df[df.fname.isin(all_img_files)]

    idx = np.arange(len(df))
    np.random.shuffle(idx)

    train_split_idx = int(len(df)*0.8)
    train_df = df.iloc[idx[:train_split_idx]]
    val_df = df.iloc[idx[train_split_idx:]]

    # Add a simple sanity check
    assert len(train_df.season.unique()) == 4 and len(train_df.base_color.unique()) == 11 and \
        len(train_df.category.unique()) == 10, 'Training set misses some classes, re-run this cell!'
    assert len(val_df.season.unique()) == 4 and len(val_df.base_color.unique()) == 11 and \
        len(val_df.category.unique()) == 10, 'Validation set misses some classes, re-run this cell!'

    for image_name in train_df["fname"]:
        source_file_name = os.path.join(DATA_DIR, "images", image_name)
        destination_file_name = os.path.join(DATA_DIR, "images_train", image_name)
        shutil.copy(source_file_name, destination_file_name)

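    # val.csv is uploaded with the train dataset as well, so the validation images are copied
    # into images_train (for the train dataset) and into images_val (for the eval dataset)
    for image_name in val_df["fname"]: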
        source_file_name = os.path.join(DATA_DIR, "images", image_name)
        destination_file_name = os.path.join(DATA_DIR, "images_train", image_name)
        shutil.copy(source_file_name, destination_file_name)
        destination_file_name = os.path.join(DATA_DIR, "images_val", image_name)
        shutil.copy(source_file_name, destination_file_name)

    # save processed data labels
    train_df.to_csv(os.environ['DATA_DIR'] + '/train.csv', index=False)
    val_df.to_csv(os.environ['DATA_DIR'] + '/val.csv', index=False)
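Two details in the classification branch are easy to get wrong when adapting it: the VOC label-file format and the 70/10/20 cut points. The sketch below isolates both as small functions. `voc_positive_ids` and `split_sizes` are illustrative names, not part of TAO; they assume each `ImageSets/Main/<class>_trainval.txt` line is an image id followed by a label, with `1` marking a positive image, which is what the cell above relies on.

```python
def voc_positive_ids(lines):
    """Yield image ids whose label is '1' in a VOC <class>_trainval.txt file."""
    for line in lines:
        tokens = line.split()  # handles the variable spacing and blank lines
        if len(tokens) >= 2 and tokens[1] == "1":
            yield tokens[0]

def split_sizes(n: int, train_cut: float = 0.7, val_cut: float = 0.8):
    """Return (train, val, test) counts using the same int() cut points as the split cell."""
    train_end = int(n * train_cut)
    val_end = int(n * val_cut)
    return train_end, val_end - train_end, n - val_end
```

The cut points are passed separately instead of adding fractions because `0.7 + 0.1` evaluates to `0.7999999999999999` in floating point, so `int(10 * (0.7 + 0.1))` is 7 rather than 8.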

### Verify the dataset split

In [None]:
# verify
if model_name == "classification":
    !if [ ! -d $DATA_DIR/split/images_train ]; then echo 'train folder NOT found.'; else echo 'Found train images folder.';fi
    !if [ ! -d $DATA_DIR/split/images_val ]; then echo 'val folder NOT found.'; else echo 'Found val images folder.';fi
    !if [ ! -d $DATA_DIR/split/images_test ]; then echo 'test folder NOT found.'; else echo 'Found test images folder.';fi
elif model_name == "multitask-classification":
    import pandas as pd

    print("Number of images in the train set: {}".format(
        len(pd.read_csv(os.environ['DATA_DIR'] + '/train.csv'))
    ))
    print("Number of images in the validation set: {}".format(
        len(pd.read_csv(os.environ['DATA_DIR'] + '/val.csv'))
    ))
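The check above only confirms that folders exist or prints CSV row counts. To see how many images actually landed in each split (and, for classification, in each class folder), a sketch like the following walks the `images_*` folders. `count_images` is hypothetical; it assumes JPEG files with a `.jpg` extension, as produced by the split cell.

```python
from pathlib import Path

def count_images(split_root: str, pattern: str = "*.jpg") -> dict:
    """Count images in each images_* folder, per class subfolder when there are any."""
    counts = {}
    for split_dir in sorted(Path(split_root).glob("images_*")):
        if not split_dir.is_dir():
            continue
        class_dirs = sorted(d for d in split_dir.iterdir() if d.is_dir())
        if class_dirs:
            # Classification layout: images_<split>/<class_name>/*.jpg
            counts[split_dir.name] = {d.name: len(list(d.glob(pattern))) for d in class_dirs}
        else:
            # Multitask layout: images_<split>/*.jpg with labels kept in a CSV file
            counts[split_dir.name] = len(list(split_dir.glob(pattern)))
    return counts
```

Use `count_images(os.path.join(DATA_DIR, "split"))` for classification and `count_images(DATA_DIR)` for multitask classification; the plain `images` folder is not matched by `images_*`.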

### Create and upload datasets

In [None]:
if model_name == "classification":
    ds_format = "default"
elif model_name == "multitask-classification":
    ds_format = "custom"

In [None]:
train_dataset_id = subprocess.getoutput(f"tao-client {model_name} dataset-create --dataset_type image_classification --dataset_format {ds_format}")
print(train_dataset_id)

In [None]:
if model_name == "classification":
    ! rsync -ah --info=progress2 {DATA_DIR}/split/images_train ~/shared/users/{os.environ['USER']}/datasets/{train_dataset_id}/
elif model_name == "multitask-classification":
    ! rsync -ah --info=progress2 {DATA_DIR}/images_train ~/shared/users/{os.environ['USER']}/datasets/{train_dataset_id}/
    ! rsync -ah --info=progress2 {DATA_DIR}/train.csv ~/shared/users/{os.environ['USER']}/datasets/{train_dataset_id}/
    ! rsync -ah --info=progress2 {DATA_DIR}/val.csv ~/shared/users/{os.environ['USER']}/datasets/{train_dataset_id}/
! echo DONE

In [None]:
eval_dataset_id = subprocess.getoutput(f"tao-client {model_name} dataset-create --dataset_type image_classification --dataset_format {ds_format}")
print(eval_dataset_id)

In [None]:
if model_name == "classification":
    ! rsync -ah --info=progress2 {DATA_DIR}/split/images_val ~/shared/users/{os.environ['USER']}/datasets/{eval_dataset_id}/
elif model_name == "multitask-classification":
    ! rsync -ah --info=progress2 {DATA_DIR}/images_val ~/shared/users/{os.environ['USER']}/datasets/{eval_dataset_id}/
    ! rsync -ah --info=progress2 {DATA_DIR}/val.csv ~/shared/users/{os.environ['USER']}/datasets/{eval_dataset_id}/
! echo DONE
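The upload cells copy data into the shared volume with `rsync`. If `rsync` is not installed on the machine running the notebook, a roughly equivalent copy can be done with the standard library. This is a sketch under that assumption; `copy_into_dataset` is a hypothetical helper, and it only mimics `rsync -a src dest/` for source paths without a trailing slash, as used above. It does not report progress the way `--info=progress2` does.

```python
import os
import shutil

def copy_into_dataset(src: str, dataset_dir: str) -> str:
    """Copy a file or folder into dataset_dir, similar to `rsync -a src dataset_dir/`."""
    dataset_dir = os.path.expanduser(dataset_dir)  # os/shutil do not expand "~" themselves
    os.makedirs(dataset_dir, exist_ok=True)
    dest = os.path.join(dataset_dir, os.path.basename(os.path.normpath(src)))
    if os.path.isdir(src):
        # dirs_exist_ok (Python 3.8+) lets a re-run merge into an existing copy
        shutil.copytree(src, dest, dirs_exist_ok=True)
    else:
        shutil.copy2(src, dest)  # copy2 keeps timestamps, like rsync -a
    return dest
```

For example: `copy_into_dataset(f"{DATA_DIR}/images_val", f"~/shared/users/{os.environ['USER']}/datasets/{eval_dataset_id}")`.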

### List datasets <a class="anchor" id="head-5"></a>

In [None]:
pattern = os.path.join(home, 'shared', 'users', os.environ['USER'], 'datasets', '*', 'metadata.json')

datasets = []
for metadata_path in glob.glob(pattern):
    with open(metadata_path, 'r') as metadata_file:
        datasets.append(json.load(metadata_file))

print(json.dumps(datasets, indent=2))

### Create a model experiment <a class="anchor" id="head-6"></a>

In [None]:
network_arch = model_name.replace("-","_")
if network_arch == "classification":
    encode_key = "nvidia_tlt"
else:
    encode_key = "tlt_encode"
model_id = subprocess.getoutput(f"tao-client {model_name} model-create --network_arch {network_arch} --encryption_key {encode_key} ")
print(model_id)

### Assign train, eval datasets 

In [None]:
metadata_path = os.path.join(home, 'shared', 'users', os.environ['USER'], 'models', model_id, 'metadata.json')

with open(metadata_path , "r") as metadata_file:
    metadata = json.load(metadata_file)

metadata["train_datasets"] = [train_dataset_id]
metadata["eval_dataset"] = eval_dataset_id

### Find pretrained model <a class="anchor" id="head-7"></a>

In [None]:
# Map each network to the pretrained model used for this experiment.
# To see all available pretrained models, query the service's model endpoint ($BASE_URL/model) and adjust this map as needed.
# For example, if you change the number of layers to 34, change the pretrained model name accordingly.
pretrained_map = {"classification" : "pretrained_classification:resnet18",
                  "multitask_classification" : "pretrained_classification:resnet10"}

In [None]:
pattern = os.path.join(home, 'shared', 'users', '*', 'models', '*', 'metadata.json')

ptm_id = None
for ptm_metadata_path in glob.glob(pattern):
    with open(ptm_metadata_path, 'r') as metadata_file:
        ptm_metadata = json.load(metadata_file)
    # Treat a missing or null ngc_path as "no match" instead of crashing on .endswith
    ngc_path = ptm_metadata.get("ngc_path") or ""
    metadata_network_arch = ptm_metadata.get("network_arch")
    if metadata_network_arch == network_arch and ngc_path.endswith(pretrained_map[network_arch]):
        ptm_id = ptm_metadata["id"]
        break

metadata["ptm"] = ptm_id
print(ptm_id)
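The search above reduces to a pure function over the loaded metadata dicts, which can be tested without the shared volume. The sketch below uses only the keys the cell reads (`id`, `network_arch`, `ngc_path`); `find_ptm_id` is a hypothetical name.

```python
def find_ptm_id(metadatas, network_arch, ngc_suffix):
    """Return the id of the first metadata dict matching network_arch and ending with ngc_suffix."""
    for meta in metadatas:
        ngc_path = meta.get("ngc_path") or ""  # a missing or null ngc_path never matches
        if meta.get("network_arch") == network_arch and ngc_path.endswith(ngc_suffix):
            return meta.get("id")
    return None
```

For example, load every `metadata.json` matched by `pattern` into a list and call `find_ptm_id(metadatas, network_arch, pretrained_map[network_arch])`.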

### View hyperparameters that are enabled for AutoML by default

In [None]:
# View default automl specs enabled
! tao-client {model_name} model-automl-defaults --id {model_id} | tee ~/shared/users/{os.environ['USER']}/models/{model_id}/specs/automl_defaults.json

### Set AutoML related configurations <a class="anchor" id="head-8"></a>
Refer to these links for the parameters supported by each network, and add parameters beyond the default AutoML-enabled ones if necessary: [Multiclass_classification](https://docs.nvidia.com/tao/tao-toolkit/text/tao_toolkit_api/api_action_specs.html#train), 
[Multitask_classification](https://docs.nvidia.com/tao/tao-toolkit/text/tao_toolkit_api/api_action_specs.html#id27)

In [None]:
# Choose automl algorithm between "Bayesian" and "HyperBand".
automl_algorithm="Bayesian" # FIXME7 example: Bayesian/HyperBand

metric = "kpi" #Don't change this, in future multiple metrics will be supported
additional_automl_parameters = [] #Refer to parameter list mentioned in the above links and add any extra parameter in addition to the default enabled ones
remove_default_automl_parameters = [] #Remove any hyperparameters that are enabled by default for AutoML

metadata["automl_algorithm"] = automl_algorithm
metadata["automl_enabled"] = True
metadata["metric"] = metric
metadata["automl_add_hyperparameters"] = str(additional_automl_parameters)
metadata["automl_remove_hyperparameters"] = str(remove_default_automl_parameters)

with open(metadata_path, "w") as metadata_file:
    json.dump(metadata, metadata_file, indent=2)

print(json.dumps(metadata, indent=2))
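The AutoML fields written above accept only a few sensible values: the algorithm should be `Bayesian` or `HyperBand`, and asking to both add and remove the same hyperparameter is almost certainly a mistake. A sketch that validates this before `metadata.json` is written could look as follows. `build_automl_settings` is hypothetical; it reproduces the keys and the `str(list)` encoding used in the cell above rather than any documented API contract.

```python
ALLOWED_ALGORITHMS = ("Bayesian", "HyperBand")

def build_automl_settings(algorithm, add_params=(), remove_params=(), metric="kpi"):
    """Return the AutoML metadata fields set in the cell above, after basic validation."""
    if algorithm not in ALLOWED_ALGORITHMS:
        raise ValueError(f"automl_algorithm must be one of {ALLOWED_ALGORITHMS}, got {algorithm!r}")
    overlap = sorted(set(add_params) & set(remove_params))
    if overlap:
        raise ValueError(f"hyperparameters both added and removed: {overlap}")
    return {
        "automl_algorithm": algorithm,
        "automl_enabled": True,
        "metric": metric,
        # The notebook stores these lists as their Python string form, e.g. "['a']"
        "automl_add_hyperparameters": str(list(add_params)),
        "automl_remove_hyperparameters": str(list(remove_params)),
    }
```

`metadata.update(build_automl_settings(automl_algorithm, additional_automl_parameters, remove_default_automl_parameters))` produces the same fields as the assignments in the cell above.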

### Provide train specs <a class="anchor" id="head-9"></a>

In [None]:
# Default train model specs
! tao-client {model_name} model-train-defaults --id {model_id} | tee ~/shared/users/{os.environ['USER']}/models/{model_id}/specs/train.json

In [None]:
# Customize train model specs
specs_path = os.path.join(home, 'shared', 'users', os.environ['USER'], 'models', model_id, 'specs', 'train.json')

with open(specs_path , "r") as specs_file:
    specs = json.load(specs_file)

# Apply changes for any of the parameters listed in the previous cell as required
# Example for multitask-classification (for each network the parameter key might be different)
specs["training_config"]["num_epochs"] = 10
# Example for classification
# specs["train_config"]["n_epochs"] = 80


with open(specs_path, "w") as specs_file:
    json.dump(specs, specs_file, indent=2)

print(json.dumps(specs, indent=2))
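Parameter keys differ between networks (for example `training_config.num_epochs` for multitask classification and `train_config.n_epochs` for classification, as noted in the cell above). Plain item assignment silently creates a new key when the last name is misspelled, so a helper that only overwrites keys already present in the defaults can catch such typos. `set_nested` below is a hypothetical sketch, not a TAO API.

```python
def set_nested(specs: dict, key_path, value):
    """Set specs[k1][k2]...[kn] = value, raising KeyError if any key on the path is missing."""
    if not key_path:
        raise ValueError("key_path must not be empty")
    node = specs
    for depth, key in enumerate(key_path):
        if not isinstance(node, dict) or key not in node:
            dotted = ".".join(key_path[: depth + 1])
            raise KeyError(f"{dotted} is not in the default specs")
        if depth == len(key_path) - 1:
            node[key] = value  # the leaf key exists, so this only overwrites a default
        else:
            node = node[key]
    return specs
```

Usage: `set_nested(specs, ["training_config", "num_epochs"], 10)` before the `json.dump` call.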

### Run AutoML train <a class="anchor" id="head-10"></a>

In [None]:
train_job_id = subprocess.getoutput(f"tao-client {model_name} model-train --id " + model_id)
print(train_job_id)

In [None]:
# Utility function that streams a log file until it reaches an "EOF" line (used by the next cell)
def my_tail(logs_dir, log_file):
    %env LOG_FILE={logs_dir}/{log_file}
    ! mkdir -p {logs_dir}
    ! [ ! -f "$LOG_FILE" ] && touch $LOG_FILE && chmod 666 $LOG_FILE
    ! tail -f -n +1 $LOG_FILE | while read LINE; do echo "$LINE"; [[ "$LINE" == "EOF" ]] && pkill -P $$ tail; done
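`my_tail` relies on shell features (`tail -f`, bash-style `[[ ... ]]`, `pkill`) to stream a log until a line reading `EOF` appears. If those are unavailable, the same idea can be sketched in Python by polling the file for new lines. `follow_until_eof` is hypothetical; it assumes, as `my_tail` does, that the writer ends the log with a line containing exactly `EOF`, and it adds an optional timeout that `my_tail` does not have.

```python
import time

def follow_until_eof(path, sentinel="EOF", poll_seconds=1.0, timeout=None):
    """Print lines appended to `path` until a line equal to `sentinel`; return all lines seen."""
    deadline = None if timeout is None else time.monotonic() + timeout
    seen = []
    pending = ""  # text of a line that has not been terminated by "\n" yet
    with open(path, "r") as log:
        while True:
            chunk = log.readline()
            if chunk:
                pending += chunk
                if not pending.endswith("\n"):
                    continue  # the writer has not finished this line yet
                line, pending = pending.rstrip("\n"), ""
            elif pending == sentinel:
                line, pending = pending, ""  # sentinel written without a trailing newline
            else:
                if deadline is not None and time.monotonic() > deadline:
                    raise TimeoutError(f"{sentinel!r} not seen in {path} within {timeout}s")
                time.sleep(poll_seconds)
                continue
            print(line)
            seen.append(line)
            if line == sentinel:
                return seen
```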

In [None]:
# Set poll_automl_stats to True to see only the AutoML progress stats (time left, epochs remaining, etc.).
# Set poll_automl_stats to False to skip the stats and view the training logs instead. Viewing training logs is supported only for Bayesian.

# Training times for different models benchmarked on 1 GPU V100 machine can be found here: https://docs.nvidia.com/tao/tao-toolkit/text/automl/automl.html#results-of-automl-experiments

poll_automl_stats = True
if poll_automl_stats:
    import time
    from IPython.display import clear_output
    stats_path = os.path.join(home, 'shared', 'users', os.environ['USER'], 'models', model_id, train_job_id, "automl_metadata.json")
    controller_json_path = os.path.join(home, 'shared', 'users', os.environ['USER'], 'models', model_id, train_job_id, "controller.json")
    while True:
        time.sleep(15)
        clear_output(wait=True)
        if os.path.exists(stats_path):
            try:
                with open(stats_path , "r") as stats_file:
                    stats_dict = json.load(stats_file)
                print(json.dumps(stats_dict, indent=2))
                if float(stats_dict["Number of epochs yet to start"]) == 0.0:
                    break
            except (json.JSONDecodeError):
                print("Stats computed are being written to file. Stats will be visible on screen in a few seconds")
else:
    # Print the log file - supported only for bayesian (the file won't exist until the backend Toolkit container is running -- can take several minutes)
    if automl_algorithm == "Bayesian":
        import time
        logs_dir = os.path.join(home, 'shared', 'users', os.environ['USER'], 'models', model_id)
        max_recommendations = metadata.get("automl_max_recommendations",20)
        for experiment_num in range(max_recommendations):
            log_file = f"{train_job_id}/experiment_{experiment_num}/log.txt"
            # Wait for the experiment's log file to appear, checking every few seconds instead of busy-looping
            while not os.path.exists(os.path.join(logs_dir, log_file)):
                time.sleep(5)
            print(f"\n\nViewing experiment {experiment_num}\n\n")
            my_tail(logs_dir, log_file)
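The polling loop above runs until `Number of epochs yet to start` reaches 0 and has no upper bound. A sketch with an optional timeout could look like this, assuming the same `automl_metadata.json` key and that its value may be a string or a number (the cell converts it with `float`). `wait_for_stats` is a hypothetical helper.

```python
import json
import os
import time

def wait_for_stats(stats_path, poll_seconds=15, timeout=None):
    """Poll automl_metadata.json until 'Number of epochs yet to start' is 0, then return the stats."""
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        if os.path.exists(stats_path):
            try:
                with open(stats_path, "r") as stats_file:
                    stats = json.load(stats_file)
            except json.JSONDecodeError:
                stats = None  # the file is mid-write; retry on the next poll
            # A missing key is treated as "not finished yet"
            if stats is not None and float(stats.get("Number of epochs yet to start", 1)) == 0.0:
                return stats
        if deadline is not None and time.monotonic() > deadline:
            raise TimeoutError(f"AutoML stats not finished after {timeout} s")
        time.sleep(poll_seconds)
```

Usage: `print(json.dumps(wait_for_stats(stats_path, timeout=24 * 3600), indent=2))`.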

### Get the best model from AutoML <a class="anchor" id="head-11"></a>

In [None]:
# The config and the weights of the best configuration are present at best_model folder
# Takes a few seconds to copy the original automl experiment to best_model folder
!python3 -m pip install pandas
import pandas as pd

automl_job_dir = f"{home}/shared/users/{os.environ['USER']}/models/{model_id}/{train_job_id}"
best_model_path =  f"{automl_job_dir}/best_model"

import time

while True:
    if os.path.exists(best_model_path) and len(os.listdir(best_model_path)) > 0 and os.path.exists(f"{best_model_path}/controller.json"):
        # List the binary model file
        print("\nCheckpoints for the best performing experiment")
        if os.path.exists(best_model_path+"/weights") and len(os.listdir(best_model_path+"/weights")) > 0:
            print(f"Folder: {best_model_path}/weights")
            print("Files:", os.listdir(best_model_path+"/weights"))
        else:
            print(f"Folder: {best_model_path}")
            print("Files:", os.listdir(best_model_path))

        with open(f"{best_model_path}/controller.json", "r") as controller_file:
            experiment_artifacts = json.load(controller_file)
        data_frame = pd.DataFrame(experiment_artifacts)
        # Print experiment id/number and the corresponding result
        print("\nResults of all experiments")
        with pd.option_context('display.max_rows', None, 'display.max_columns', None, 'display.max_colwidth', None):
            print(data_frame[["id","result"]])

        print("\nConfig/Spec file for the best performing experiment (recommendation_id.kitti with the maximum result value in the dataframe)")
        # List the recommendation config file of the best performing checkpoint (recommendation_id.kitti with the maximum result value in the dataframe)
        !ls {best_model_path}/*.kitti

        break
    # best_model is not populated yet; wait before checking again instead of busy-looping
    time.sleep(15)
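The cell above prints every experiment's `id` and `result` and notes that the best configuration is the one with the maximum `result`. Selecting it programmatically can be sketched as below, assuming `controller.json` holds a list of per-experiment objects with those two keys and that a higher `result` is better; `best_experiment` is hypothetical.

```python
def best_experiment(records, metric_key="result"):
    """Return the record with the highest metric_key value, skipping records without one."""
    scored = [r for r in records if r.get(metric_key) is not None]
    if not scored:
        raise ValueError("no experiment has a result yet")
    # float() also accepts results stored as strings
    return max(scored, key=lambda r: float(r[metric_key]))
```

For example, `best_experiment(experiment_artifacts)["id"]` gives the id to look for among the `.kitti` files listed above.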

### Delete experiment <a class="anchor" id="head-12"></a>

In [None]:
! rm -rf ~/shared/users/{os.environ['USER']}/models/{model_id}
! echo DONE

### Delete datasets <a class="anchor" id="head-13"></a>

In [None]:
! rm -rf ~/shared/users/{os.environ['USER']}/datasets/{train_dataset_id}
! rm -rf ~/shared/users/{os.environ['USER']}/datasets/{eval_dataset_id}
! echo DONE

### Unmount shared volume <a class="anchor" id="head-14"></a>

In [None]:
command = "umount ~/shared"
! echo {password} | sudo -S -k {command} && echo DONE

### Uninstall TAO Remote Client <a class="anchor" id="head-32"></a>

In [None]:
! pip3 uninstall -y nvidia-tao-client