# Road Follower - Train Model

In this notebook we will train a neural network to take an input image, and output a set of x, y values corresponding to a target.

We will use the PyTorch deep learning framework to train a TorchVision model, such as ResNet18, for the road follower application. The architecture is selected with ``TRAIN_MODEL`` in the cell below.

In [None]:
# To run this notebook:
# 1. set the project interpreter to a PC Python installation, version 3.8 or above, and then
# 2. run jupyter lab in a command window

import torch
import torch.optim as optim
import torch.nn.functional as F
import torchvision
import torchvision.datasets as datasets
import torchvision.models as models
import torchvision.transforms as transforms
import glob
import PIL.Image
import os
import numpy as np
import ipywidgets


TRAIN_MODEL = "inception_v3"  # resnet18, resnet34, resnet50, resnet101, mobilenet_v2, vgg11, mobilenet_v3_large, inception_v3, efficientnet_b4, googlenet

# *** reference: https://pytorch.org/docs/stable/optim.html#algorithms
# use one of the following learning algorithms for evaluation
TRAIN_METHOD = "Adam"  # "Adam", "SGD", "ASGD", "Adadelta", "RAdam"; SGD additionally needs lr=0.01, momentum=0.92

### Download and extract data

Before you start, you should upload the zip file that you created in the ``data_collection.ipynb`` notebook on the robot. 

> If you're training on the JetBot you collected data on, you can skip this!

You should then extract this dataset by calling the command below:

In [None]:
# !unzip -q road_following.zip
from zipfile import ZipFile
dir_depo = 'D:\\AI_Lecture_Demos\\Data_Repo\\Cuterbot_2004_Repo'
os.makedirs(dir_depo, exist_ok=True)
# dir_depo = os.getcwd()
training_datafile = 'dataset_xy_0916_1.zip'  # check the data file is loaded to dir_depo

with ZipFile(os.path.join(dir_depo, training_datafile), 'r') as zObject:
    zObject.extractall(path=dir_depo)

You should see a folder named ``dataset_xy`` appear in the ``dir_depo`` directory.
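
As a quick sanity check on the extraction step, the sketch below extracts an archive and counts the ``.jpg`` files that end up in ``dataset_xy``. The helper name ``extract_and_count`` and the demo file names are hypothetical; the demo runs on a throwaway archive in a temporary directory so that it does not touch your real data.

```python
import os
import tempfile
from zipfile import ZipFile


def extract_and_count(zip_path, dest_dir, folder="dataset_xy"):
    """Extract zip_path into dest_dir and count the .jpg files in dest_dir/folder."""
    if not os.path.isfile(zip_path):
        raise FileNotFoundError(f"Upload the dataset archive first: {zip_path}")
    with ZipFile(zip_path, "r") as archive:
        archive.extractall(path=dest_dir)
    image_dir = os.path.join(dest_dir, folder)
    return sum(1 for name in os.listdir(image_dir) if name.lower().endswith(".jpg"))


# Demonstration on a throwaway archive (the file names are made up).
with tempfile.TemporaryDirectory() as tmp:
    demo_zip = os.path.join(tmp, "demo.zip")
    with ZipFile(demo_zip, "w") as archive:
        archive.writestr("dataset_xy/xy_050_120_a.jpg", b"")
        archive.writestr("dataset_xy/xy_112_112_b.jpg", b"")
    num_images = extract_and_count(demo_zip, tmp)
    print(num_images)
```

With your own data you would call the helper with ``os.path.join(dir_depo, training_datafile)`` and ``dir_depo``; a count of zero suggests the archive has a different internal folder layout.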

### Create Dataset Instance

Here we create a custom ``torch.utils.data.Dataset`` implementation, which implements the ``__len__`` and ``__getitem__`` functions.  This class
is responsible for loading images and parsing the x, y values from the image filenames.  Because we implement the ``torch.utils.data.Dataset`` class,
we can use all of the torch data utilities :)

We hard coded some transformations (like color jitter) into our dataset.  We made random horizontal flips optional (in case you want to follow a non-symmetric path, like a road
where we need to 'stay right').  If it doesn't matter whether your robot follows some convention, you could enable flips to augment the dataset.

In [None]:
def get_x(path, width):
    """Gets the x value from the image filename"""
    return (float(int(path.split("_")[1])) - width/2) / (width/2)

def get_y(path, height):
    """Gets the y value from the image filename"""
    return (float(int(path.split("_")[2])) - height/2) / (height/2)

class XYDataset(torch.utils.data.Dataset):
    
    def __init__(self, directory, random_hflips=False):
        self.directory = directory
        self.random_hflips = random_hflips
        self.image_paths = glob.glob(os.path.join(self.directory, '*.jpg'))
        self.color_jitter = transforms.ColorJitter(0.3, 0.3, 0.3, 0.3)
    
    def __len__(self):
        return len(self.image_paths)
    
    def __getitem__(self, idx):
        image_path = self.image_paths[idx]
        
        image = PIL.Image.open(image_path)
        width, height = image.size
        x = float(get_x(os.path.basename(image_path), width))
        y = float(get_y(os.path.basename(image_path), height))
      
        # flip only when random_hflips is enabled, then with probability 0.5
        if self.random_hflips and np.random.rand() > 0.5:
            image = transforms.functional.hflip(image)
            x = -x
        
        image = self.color_jitter(image)
        if TRAIN_MODEL == 'inception_v3':
            image = transforms.functional.resize(image, (299, 299))
        else:
            image = transforms.functional.resize(image, (224, 224))
        image = transforms.functional.to_tensor(image)
        image = image.numpy()[::-1].copy()
        image = torch.from_numpy(image)
        image = transforms.functional.normalize(image, [0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
        
        return image, torch.tensor([x, y]).float()
    
dataset = XYDataset(os.path.join(dir_depo, 'dataset_xy'), random_hflips=False)
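
To make the label scaling in ``get_x`` and ``get_y`` concrete, here is a minimal worked example. It assumes file names of the form ``xy_<x>_<y>_<id>.jpg`` (consistent with the ``split("_")`` indices used above); the helper ``parse_xy`` and the example file names are hypothetical.

```python
def parse_xy(filename, width, height):
    """Return (x, y) scaled to [-1, 1] from a name like 'xy_<x>_<y>_<id>.jpg'."""
    parts = filename.split("_")
    x_pixel, y_pixel = int(parts[1]), int(parts[2])
    # Shift the origin to the image center, then divide by the half-size.
    x = (x_pixel - width / 2) / (width / 2)
    y = (y_pixel - height / 2) / (height / 2)
    return x, y


center = parse_xy("xy_112_112_example.jpg", 224, 224)  # target at the image center
corner = parse_xy("xy_000_224_example.jpg", 224, 224)  # left edge, bottom edge
print(center, corner)
```

A target at the image center maps to (0, 0), and the image edges map to -1 or +1. This is also why a horizontal flip only needs to negate ``x``.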

### Split dataset into train and test sets
Once we have read the dataset, we split it into train and test sets. In this example we use a 90%-10% train/test split. The test set will be used to verify the accuracy of the model we train.

In [None]:
test_percent = 0.1
num_test = int(test_percent * len(dataset))
train_dataset, test_dataset = torch.utils.data.random_split(dataset, [len(dataset) - num_test, num_test])
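
The split above is different on every run. If you want to compare models or optimizers on the same split, one option is to pass a seeded generator to ``random_split``. The sketch below uses a stand-in ``TensorDataset`` of 100 items and an arbitrary seed; both are assumptions for illustration only.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Stand-in dataset with 100 items (the real notebook uses XYDataset).
demo_dataset = TensorDataset(torch.arange(100))

test_percent = 0.1
num_test = int(test_percent * len(demo_dataset))

# A seeded generator makes the random split repeatable across runs.
generator = torch.Generator().manual_seed(0)
train_part, test_part = random_split(
    demo_dataset, [len(demo_dataset) - num_test, num_test], generator=generator
)
print(len(train_part), len(test_part))
```

Note that ``int()`` rounds the test size down, so very small datasets can end up with an empty test set.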

### Create data loaders to load data in batches

We use the ``DataLoader`` class to load data in batches, shuffle it, and optionally load it with multiple subprocesses. In this example we use a batch size of 8. Choose the batch size based on the memory available on your GPU; it can also affect the accuracy of the model.

In [None]:
train_loader = torch.utils.data.DataLoader(
    train_dataset,
    batch_size=8,
    shuffle=True,
    num_workers=0
)

test_loader = torch.utils.data.DataLoader(
    test_dataset,
    batch_size=8,
    shuffle=True,
    num_workers=0
)
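
The number of batches per epoch, which is what ``len(train_loader)`` returns, follows directly from the dataset size and the batch size. The sketch below shows the arithmetic; the helper names and the sample count of 900 are made up for illustration.

```python
import math


def batches_per_epoch(num_samples, batch_size, drop_last=False):
    """Number of batches a loader yields per epoch for a map-style dataset."""
    if drop_last:
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)


def last_batch_size(num_samples, batch_size):
    """Size of the final batch when incomplete batches are kept."""
    remainder = num_samples % batch_size
    return remainder if remainder else batch_size


print(batches_per_epoch(900, 8), last_batch_size(900, 8))
```

With 900 training images and a batch size of 8, an epoch has 113 batches and the last one holds only 4 images.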

### Define Neural Network Model 
We use the models available in TorchVision (selected with the ``TRAIN_MODEL`` setting above), such as resnet18, resnet34, resnet50, resnet101, mobilenet_v2, vgg11, mobilenet_v3_large, inception_v3, efficientnet_b4, googlenet.

Refer to the following PyTorch model resources:
1. classifier : https://pytorch.org/vision/0.10/models.html#classification
2. github : https://github.com/pytorch/vision/tree/release/0.11/torchvision/models

More details on ResNet-18 and other variants: https://github.com/pytorch/vision/blob/master/torchvision/models/resnet.py

### Define Neural Network Model Learning Algorithm
We can use one of the following learning algorithms (selected with the ``TRAIN_METHOD`` setting above) to train a model:
"Adam", "SGD", "ASGD", "Adadelta", "RAdam". SGD additionally needs the parameters lr=0.01, momentum=0.92.
1. reference web site : https://pytorch.org/docs/stable/optim.html#algorithms
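
One way to keep the SGD-specific parameters next to the optimizer name is a small lookup table. This is only a sketch: ``OPTIMIZER_KWARGS`` and ``build_optimizer`` are hypothetical names, and the tiny ``Linear`` layer stands in for the real model.

```python
import torch
import torch.optim as optim

# Extra keyword arguments per optimizer; only SGD needs them here.
OPTIMIZER_KWARGS = {
    "SGD": {"lr": 0.01, "momentum": 0.92},
}


def build_optimizer(name, params):
    """Look up the optimizer class by name and apply its extra arguments."""
    kwargs = OPTIMIZER_KWARGS.get(name, {})
    return getattr(optim, name)(params, **kwargs)


demo_model = torch.nn.Linear(4, 2)  # stand-in for the real network
sgd = build_optimizer("SGD", demo_model.parameters())
adam = build_optimizer("Adam", demo_model.parameters())
print(type(sgd).__name__, sgd.defaults["lr"], sgd.defaults["momentum"])
```

Optimizers that are not in the table fall back to their library defaults.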

In a process called transfer learning, we can repurpose a pre-trained model (trained on millions of images) for a new task that has possibly much less data available.
More Details on Transfer Learning: https://www.youtube.com/watch?v=yofjFQddwHE 



In [None]:
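# model = models.resnet18(pretrained=True)
# Note: called without arguments, the constructor below returns randomly initialized weights;
# request pretrained weights (as in the commented line above) to use transfer learning.
model = getattr(models, TRAIN_MODEL)()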


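A ResNet model ends in a fully connected (``fc``) layer, which has 512 ``in_features`` for ResNet-18. Because we are training for regression, we replace it with a layer whose ``out_features`` is 2 (x and y). The other architectures are handled the same way on their own final layer.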

Finally, we transfer our model for execution on the GPU

In [None]:

# ----- replace the last layer with a 2-output regression head (x, y);
# ----- the model used in the live demo notebook must be modified in the same way.

if TRAIN_MODEL == 'mobilenet_v3_large':  # MobileNet v3: the final Linear layer is classifier[3]
    model.classifier[3] = torch.nn.Linear(model.classifier[3].in_features, 2)

elif TRAIN_MODEL == 'mobilenet_v2':  # MobileNet v2: the final Linear layer is classifier[1]
    model.classifier[1] = torch.nn.Linear(model.classifier[1].in_features, 2)

elif TRAIN_MODEL == 'vgg11':  # VGGNet: the final Linear layer is classifier[6]
    model.classifier[6] = torch.nn.Linear(model.classifier[6].in_features, 2)

elif 'resnet' in TRAIN_MODEL:  # ResNet: in_features is 512 for resnet18/34 and 2048 for resnet50/101
    model.fc = torch.nn.Linear(model.fc.in_features, 2)
    # model.fc = torch.nn.Linear(512, 2)

elif TRAIN_MODEL == 'inception_v3':   # Inception_v3: replace the main and the auxiliary heads
    model.fc = torch.nn.Linear(model.fc.in_features, 2)
    if model.aux_logits:
        model.AuxLogits.fc = torch.nn.Linear(model.AuxLogits.fc.in_features, 2)

elif 'efficientnet' in TRAIN_MODEL:   # EfficientNet: the final Linear layer is classifier[1]
    model.classifier[1] = torch.nn.Linear(model.classifier[1].in_features, 2)
    # model.classifier[0] = torch.nn.Dropout(p=0.4, inplace=True)

else:  # e.g. googlenet, which has no branch above
    raise ValueError(f"No regression head is defined for TRAIN_MODEL = {TRAIN_MODEL!r}")


# ** you may use CPU or GPU for training
processor = 'GPU'
if processor == 'GPU':
    device = torch.device('cuda')
    print("torch cuda version : ", torch.version.cuda)
    print("cuda is available for pytorch: ", torch.cuda.is_available())    
elif processor == 'CPU':
    device = torch.device('cpu')
model = model.float()
model = model.to(device, dtype=torch.float)
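
The cell above fails at ``model.to(device)`` if ``processor`` is set to ``'GPU'`` on a machine without CUDA. A common alternative, sketched here with a small stand-in layer, is to pick the device from ``torch.cuda.is_available()``.

```python
import torch

# Use the GPU when PyTorch can see one, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-in layer; the notebook would move the full model instead.
layer = torch.nn.Linear(4, 2).to(device, dtype=torch.float)
print(device, next(layer.parameters()).device)
```

If you adopt this, keep in mind that ``processor`` is also used later to name the output directories, so it would need to be set to match the selected device.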


### Train Regression:

We train for 70 epochs and save the model whenever the test loss improves on the best value seen so far.
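
The checkpoint rule can be illustrated without any training. The sketch below replays a made-up list of per-epoch test losses and records the epochs at which a save would happen; ``best_epochs`` is a hypothetical helper.

```python
def best_epochs(test_losses):
    """Return the (epoch, loss) pairs at which a checkpoint would be saved."""
    best_loss = None
    saved = []
    for epoch, test_loss in enumerate(test_losses):
        # Save on the first epoch and on every strict improvement afterwards.
        if best_loss is None or test_loss < best_loss:
            best_loss = test_loss
            saved.append((epoch, test_loss))
    return saved


saved = best_epochs([0.30, 0.22, 0.25, 0.19, 0.19])
print(saved)
```

Because the comparison is strict, an epoch that only ties the best loss does not overwrite the saved file.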

In [None]:
%cd "../../jetbot/utils"
import time
from tqdm.notebook import tqdm
from training_profile import *

dir_training_records = os.path.join(dir_depo, 'training records', processor, TRAIN_MODEL)
os.makedirs(dir_training_records, exist_ok=True)

DIR_MODEL_REPO = os.path.join(dir_depo, 'model_repo', processor)
os.makedirs(DIR_MODEL_REPO, exist_ok=True)    
# BEST_MODEL_PATH = 'best_steering_model_xy.pth'
BEST_MODEL_PATH = os.path.join(DIR_MODEL_REPO, "best_steering_model_xy_" + TRAIN_MODEL + ".pth")

NUM_EPOCHS = 70

optimizer = getattr(optim, TRAIN_METHOD)(model.parameters(), weight_decay=0)
# optimizer = getattr(optim, TRAIN_METHOD)(model.parameters(), lr=0.01, momentum=0.95)

loss_data = []
lt_epoch = []  # learning time per epoch
lt_sample = []  # learning time per batch

batch_size = len(train_loader)  # number of batches per epoch (not the number of samples per batch)
pbar_overall_format = "{desc} {percentage:.2f}% | {bar} | elapsed: {elapsed}; estimated to finish: {remaining}"
pbar_overall = tqdm(total=100, bar_format = pbar_overall_format)
show_batch_progress = False   # Set True to show the batch learning progress within an epoch
show_training_plot = False  # Set True to show the convergence profile during training

best_loss = None

for epoch in range(NUM_EPOCHS):
    start_epoch = time.time()
    
    model.train()
    train_loss = 0.0
    
    if show_batch_progress:
        pbar_batch = tqdm(train_loader, total = batch_size)
    else:
        pbar_batch = iter(train_loader)
    for index, (images, labels) in enumerate(pbar_batch):
        start_sample = time.time()
        
        images = images.to(device)
        labels = labels.to(device)
        optimizer.zero_grad()
        if TRAIN_MODEL == 'inception_v3':
            outputs = model(images)
            loss_main = F.mse_loss(outputs.logits, labels, reduction='mean')
            loss_aux = F.mse_loss(outputs.aux_logits, labels, reduction='mean')
            loss = loss_main + 0.3 * loss_aux
        else:
            outputs = model(images)
            loss = F.mse_loss(outputs, labels, reduction='mean')
        train_loss += float(loss)
        loss.backward()
        optimizer.step()
        
        end_sample = time.time()
        lt_sample.append(end_sample - start_sample)
        
        pbar_overall.update(round(100/(NUM_EPOCHS*batch_size), 2))
        pbar_overall.set_description(desc = f'Overall progress - Epoch [{epoch+1}/{NUM_EPOCHS}]')
        pbar_overall.set_postfix(best_loss = best_loss, train_loss = train_loss/(index+1))
               
        if show_batch_progress:
            pbar_batch.set_description(desc = f'Progress in the epoch {epoch+1} ')
            pbar_batch.set_postfix(mean_batch_loss = train_loss/(index+1))
    
    train_loss /= len(train_loader)
    
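    model.eval()  # in eval mode inception_v3 returns a plain tensor, without the auxiliary output
    test_loss = 0.0
    with torch.no_grad():  # gradients are not needed for evaluation
        for images, labels in test_loader:
            images = images.to(device)
            labels = labels.to(device)
            outputs = model(images)
            loss = F.mse_loss(outputs, labels)
            test_loss += float(loss)
    test_loss /= len(test_loader)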

    end_epoch = time.time()
    lt_epoch.append(end_epoch - start_epoch)
      
    if best_loss is None or test_loss < best_loss:
        torch.save(model.state_dict(), BEST_MODEL_PATH)
        best_loss = test_loss
    
    loss_data.append([train_loss, test_loss])
    
    # function plot_loss(loss_data, best_loss, no_epoch, dir_training_records, train_model, train_method) is in jetbot.utils
    if epoch == NUM_EPOCHS - 1:  # last epoch: range(NUM_EPOCHS) never reaches NUM_EPOCHS itself
        show_training_plot = True
    plot_loss(loss_data=loss_data, best_loss=best_loss, no_epoch=NUM_EPOCHS,
              dir_training_records=dir_training_records, # the directory where training records are stored
              train_model=TRAIN_MODEL, train_method=TRAIN_METHOD, processor=processor, # these parameters are for the plot title only
              show_training_plot=show_training_plot)

overall_time = pbar_overall.format_dict["elapsed"]
# function lt_plot(lt_epoch, lt_sample, dir_training_records, train_model, train_method) is in jetbot.utils
lt_plot(lt_epoch=lt_epoch, lt_sample=lt_sample, overall_time=overall_time,
        dir_training_records=dir_training_records, # the directory stored training records
        train_model=TRAIN_MODEL, train_method=TRAIN_METHOD, processor=processor) # these parameters are for the plot title only

print('Training is completed! \n You can close the figures by restarting the kernel!')
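
For ``inception_v3`` the training loop above adds the auxiliary head's loss to the main loss with a weight of 0.3. The arithmetic can be checked by hand with plain Python; the predictions and label below are made-up numbers, and ``mse`` is a small stand-in for ``F.mse_loss`` with ``reduction='mean'``.

```python
def mse(predictions, targets):
    """Mean squared error over all elements, like reduction='mean'."""
    errors = [
        (p - t) ** 2
        for pred_row, target_row in zip(predictions, targets)
        for p, t in zip(pred_row, target_row)
    ]
    return sum(errors) / len(errors)


labels = [[0.5, -0.5]]    # one sample with target (x, y)
main_out = [[0.0, 0.0]]   # main head prediction (made up)
aux_out = [[0.5, 0.5]]    # auxiliary head prediction (made up)

loss_main = mse(main_out, labels)   # (0.25 + 0.25) / 2 = 0.25
loss_aux = mse(aux_out, labels)     # (0.0 + 1.0) / 2 = 0.5
loss = loss_main + 0.3 * loss_aux   # 0.25 + 0.15 = 0.4
print(loss_main, loss_aux, loss)
```

The auxiliary term only exists in training mode, which is why the reported train loss for ``inception_v3`` is not directly comparable with its test loss.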

Once the model is trained, it will generate a ``best_steering_model_xy_<TRAIN_MODEL>.pth`` file, which you can use for inference in the live demo notebook.

If you trained on a machine other than the JetBot, you'll need to upload this file to the ``road_following`` example folder on the JetBot.
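
The saved file contains only the weights (a ``state_dict``), so the loading side has to rebuild the same architecture, including the 2-output head, before loading them. The round trip below is a minimal sketch with a small stand-in layer and a temporary file; the file name is made up.

```python
import os
import tempfile

import torch

trained = torch.nn.Linear(4, 2)  # stand-in for the trained network

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_model.pth")
    torch.save(trained.state_dict(), path)

    # Rebuild the same architecture first, then load the weights into it.
    restored = torch.nn.Linear(4, 2)
    restored.load_state_dict(torch.load(path, map_location="cpu"))

restored.eval()  # switch to inference mode before using the model
same = torch.equal(trained.weight, restored.weight)
print(same)
```

``map_location="cpu"`` lets a file saved on a GPU machine be loaded on a machine without CUDA; the model can be moved to another device afterwards.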