# Side Walk Guide - Train Model

In this notebook we train a neural network that takes an input image and outputs an x value corresponding to a target.

We use the PyTorch deep learning framework to train a ResNet-18 model for a road-following application.

In [5]:
import torch
import torch.optim as optim
import torch.nn.functional as F
import torchvision
import torchvision.datasets as datasets
import torchvision.models as models
import torchvision.transforms as transforms
import glob
import PIL.Image
import os
import numpy as np
import torch.utils.data


### Download and extract data

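Before you start, upload the ``training_dataset.zip`` file that you created in the ``data_collection.ipynb`` notebook. The cell below expects the archive to be named ``training_data2.zip``, so adjust the filename if yours differs.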


In [6]:
!unzip training_data2.zip
!mv training_data1 training_dataset
# !unzip training_data2
# !mv training_data2/* training_dataset

Archive:  training_data2.zip
   creating: training_data2/
replace __MACOSX/._training_data2? [y]es, [n]o, [A]ll, [N]one, [r]ename: 

### Create Dataset Instance

Here we create a custom ``torch.utils.data.Dataset`` implementation, which implements the ``__len__`` and ``__getitem__`` functions. This class is responsible for loading images and parsing the x value from each image filename. Because we implement the ``torch.utils.data.Dataset`` interface, we can use all of the torch data utilities.
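As a minimal sketch of the filename parsing, the snippet below reimplements the mapping used by ``get_x`` in isolation. The filename pattern ``<prefix>-<x_pixel>.jpg``, the example filenames, and the 600-pixel frame width are assumptions for illustration; the constants 300 and 5 come from the notebook's own code, which rescales a pixel x coordinate centered at 300 to a label between about -5 and 5.

```python
def label_from_filename(file_name, center=300.0, scale=5.0):
    """Parse the pixel x coordinate from '<prefix>-<x>.jpg' and rescale it.

    Assumes the x coordinate is the integer between the first '-' and the
    next '.', as in the notebook's get_x; the filenames below are made up.
    """
    x_pixel = int(file_name.split("-")[1].split(".")[0])
    return (x_pixel - center) * scale / center


def mirrored_pixel(x_pixel, width=600):
    """Pixel x coordinate after a horizontal flip (width is an assumption)."""
    return width - x_pixel


for name in ["frame-300.jpg", "frame-0.jpg", "frame-600.jpg", "frame-201.jpg"]:
    print(name, "->", label_from_filename(name))

# Under the assumed 600-pixel width, mirroring the pixel negates the label,
# which is why the dataset can simply set x = -x after a horizontal flip.
flipped_name = f"frame-{mirrored_pixel(201)}.jpg"
print(flipped_name, "->", label_from_filename(flipped_name))
```

A target at the horizontal center maps to 0, the left edge to -5, and the right edge to 5.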

In [7]:
def get_x(file_name):
    """Gets the x value from the image filename"""
    token = file_name.split("-")
    return (float(int(token[1].split(".")[0]) - 300.0) * 5 / 300.0)

class XLableAugmentImageTrainData(torch.utils.data.Dataset):

    def __init__(self, directory, random_hflips=False):
        self.directory = directory
        self.random_hflips = random_hflips
        self.image_paths = glob.glob(os.path.join(self.directory, '*.jpg'))
        self.augmentation = transforms.Compose([
            transforms.ColorJitter(0.3, 0.3, 0.3, 0.3),
            transforms.RandomRotation(degrees=30),
            transforms.RandomVerticalFlip(p=0.5),
            transforms.RandomApply([transforms.ColorJitter(0.5, 0.5, 0.5, 0.5)], p=0.5),
            transforms.RandomApply([transforms.Grayscale(num_output_channels=3)], p=0.2),
        ])

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        image_path = self.image_paths[idx]

        image = PIL.Image.open(image_path)
        x = float(get_x(os.path.basename(image_path)))

        if self.random_hflips and np.random.rand() > 0.5:
            image = transforms.functional.hflip(image)
            x = -x

        # Apply augmentations
        image = self.augmentation(image)

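        # Resize to 224x224 and convert to a channels-first float tensor in [0, 1]
        image = transforms.functional.resize(image, (224, 224))
        image = transforms.functional.to_tensor(image)
        # Reverse the channel axis (RGB -> BGR); copy() avoids negative strides
        image = image.numpy()[::-1].copy()
        image = torch.from_numpy(image)
        # Normalize each channel with the usual ImageNet mean and standard deviation
        image = transforms.functional.normalize(image, [0.485, 0.456, 0.406], [0.229, 0.224, 0.225])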

        return image, torch.tensor([x]).float()

# Example usage
dataset = XLableAugmentImageTrainData('training_dataset', random_hflips=True)

for i in range(5):
    image, x = dataset[i]
    print(f"Sample {i+1} - Image Shape: {image.shape}, x: {x.item()}")
total_samples = len(dataset)
print(f"Total number of samples in the dataset: {total_samples}")


Sample 1 - Image Shape: torch.Size([3, 224, 224]), x: -1.649999976158142
Sample 2 - Image Shape: torch.Size([3, 224, 224]), x: -0.23333333432674408
Sample 3 - Image Shape: torch.Size([3, 224, 224]), x: -4.300000190734863
Sample 4 - Image Shape: torch.Size([3, 224, 224]), x: 1.4166666269302368
Sample 5 - Image Shape: torch.Size([3, 224, 224]), x: -2.6500000953674316
Total number of samples in the dataset: 201
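The ``[::-1]`` slice in ``__getitem__`` reverses the first axis of the channels-first tensor, which swaps the channel order from RGB to BGR. A small NumPy sketch, using a made-up 3x1x2 array rather than a real image, makes the effect visible:

```python
import numpy as np

# Hypothetical channels-first "image": channel 0 (R) is all 0s,
# channel 1 (G) is all 1s, channel 2 (B) is all 2s.
chw = np.stack([np.full((1, 2), c, dtype=np.float32) for c in range(3)])

# Reversing axis 0 reorders the channels; height and width are untouched.
bgr = chw[::-1].copy()  # copy() produces a contiguous array

print(chw[:, 0, 0])  # [0. 1. 2.]
print(bgr[:, 0, 0])  # [2. 1. 0.]
```

The ``copy()`` matters because ``torch.from_numpy`` does not accept arrays with negative strides. Note also that the normalization constants are applied after the reversal, so they act on the reversed channel order; whether that is intended depends on how images are fed to the model at inference time.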


### Split dataset into train and test sets
Once the dataset is loaded, we split it into training and test sets. In this example we use an 80%-20% split. The test set is used to evaluate the model we train.

In [8]:
test_percent = 0.2
num_test = int(test_percent * len(dataset))
train_dataset, test_dataset = torch.utils.data.random_split(dataset, [len(dataset) - num_test, num_test])
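``random_split`` draws a fresh permutation on every run, so which images land in the test set changes between runs. If a repeatable split is wanted, one option is to pass a seeded generator. The sketch below uses a stand-in ``TensorDataset`` of 201 items (the size reported above) instead of the image dataset, and the seed value 0 is arbitrary.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Stand-in dataset with 201 items; the notebook would pass `dataset` instead.
stand_in = TensorDataset(torch.arange(201))

test_percent = 0.2
num_test = int(test_percent * len(stand_in))   # 40
num_train = len(stand_in) - num_test           # 161

# A seeded generator makes the permutation, and so the split, repeatable.
generator = torch.Generator().manual_seed(0)
train_subset, test_subset = random_split(
    stand_in, [num_train, num_test], generator=generator
)
print(len(train_subset), len(test_subset))  # 161 40
```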

### Create data loaders to load data in batches

We use the ``DataLoader`` class to load data in batches, shuffle it, and load it with multiple worker subprocesses. In this example we use a batch size of 16. The batch size is limited by the memory available on your GPU, and it can affect the accuracy of the model.

In [9]:
train_loader = torch.utils.data.DataLoader(
    train_dataset,
    batch_size=16,
    shuffle=True,
    num_workers=2
)

test_loader = torch.utils.data.DataLoader(
    test_dataset,
    batch_size=16,
    shuffle=True,
    num_workers=2
)
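The number of batches per epoch follows from the subset sizes and the batch size. As a quick check, assuming the 201-sample dataset and the 80%-20% split shown above, ``batch_size=16``, and the ``DataLoader`` default of keeping the last partial batch:

```python
import math

total_samples = 201
num_test = int(0.2 * total_samples)
num_train = total_samples - num_test
batch_size = 16

train_batches = math.ceil(num_train / batch_size)
test_batches = math.ceil(num_test / batch_size)
last_train_batch = num_train - (train_batches - 1) * batch_size

print(num_train, num_test)          # 161 40
print(train_batches, test_batches)  # 11 3
print(last_train_batch)             # 1
```

These counts match the ``[n/11]`` and ``[n/3]`` batch counters in the training log, and they show that the final training batch holds a single image.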

### Define Neural Network Model

We use the ResNet-18 model available in torchvision.

In a process called transfer learning, we repurpose a pre-trained model (trained on millions of images) for a new task that may have far less data available.


More details on ResNet-18: https://github.com/pytorch/vision/blob/master/torchvision/models/resnet.py

More details on transfer learning: https://www.youtube.com/watch?v=yofjFQddwHE

In [11]:
model = models.resnet18(pretrained=True)

Downloading: "https://download.pytorch.org/models/resnet18-f37072fd.pth" to /root/.cache/torch/hub/checkpoints/resnet18-f37072fd.pth
100%|██████████| 44.7M/44.7M [00:00<00:00, 123MB/s]


The ResNet-18 model ends in a fully connected (``fc``) layer with 512 ``in_features``. Because we are training for regression on a single value, we replace it with a layer that has 1 ``out_features``.

Finally, we move the model to the GPU for execution.

In [12]:
model.fc = torch.nn.Linear(512, 1)
device = torch.device('cuda')
model = model.to(device)

### Train Regression:

We train for 50 epochs and save a checkpoint whenever the test loss improves on the best value seen so far.

In [13]:
NUM_EPOCHS = 50
CHECKPOINT_PATH = 'checkpoint.pth'
best_loss = 1e9

optimizer = optim.Adam(model.parameters())

for epoch in range(NUM_EPOCHS):
    model.train()
    train_loss = 0.0

    # Initialize counters for debugging
    total_train_batches = len(train_loader)
    train_batch_count = 0

    for images, labels in iter(train_loader):
        images = images.to(device)
        labels = labels.to(device)
        optimizer.zero_grad()
        outputs = model(images)
        loss = F.mse_loss(outputs, labels)
        train_loss += float(loss)
        loss.backward()
        optimizer.step()

        # Print detailed training information for each batch
        train_batch_count += 1
        print(f"Epoch [{epoch + 1}/{NUM_EPOCHS}] - Batch [{train_batch_count}/{total_train_batches}] - Train Loss: {loss:.4f}")

    train_loss /= len(train_loader)

    model.eval()
    test_loss = 0.0

    # Initialize counters for debugging
    total_test_batches = len(test_loader)
    test_batch_count = 0

    # Gradients are not needed for evaluation, so disable autograd here
    with torch.no_grad():
        for images, labels in test_loader:
            images = images.to(device)
            labels = labels.to(device)
            outputs = model(images)
            loss = F.mse_loss(outputs, labels)
            test_loss += float(loss)

            # Print detailed testing information for each batch
            test_batch_count += 1
            print(f"Epoch [{epoch + 1}/{NUM_EPOCHS}] - Batch [{test_batch_count}/{total_test_batches}] - Test Loss: {loss:.4f}")

    test_loss /= len(test_loader)

    print(f'Epoch [{epoch + 1}/{NUM_EPOCHS}] - Train Loss: {train_loss:.4f}, Test Loss: {test_loss:.4f}')

    if test_loss < best_loss:
        torch.save(model.state_dict(), CHECKPOINT_PATH)
        best_loss = test_loss
        print(f"---<checkpoint saved>---{best_loss}")


Epoch [1/50] - Batch [1/11] - Train Loss: 7.9862
Epoch [1/50] - Batch [2/11] - Train Loss: 5.6437
Epoch [1/50] - Batch [3/11] - Train Loss: 7.7024
Epoch [1/50] - Batch [4/11] - Train Loss: 8.2697
Epoch [1/50] - Batch [5/11] - Train Loss: 5.4610
Epoch [1/50] - Batch [6/11] - Train Loss: 5.9512
Epoch [1/50] - Batch [7/11] - Train Loss: 6.5032
Epoch [1/50] - Batch [8/11] - Train Loss: 6.0601
Epoch [1/50] - Batch [9/11] - Train Loss: 4.3758
Epoch [1/50] - Batch [10/11] - Train Loss: 6.8699
Epoch [1/50] - Batch [11/11] - Train Loss: 1.0570
Epoch [1/50] - Batch [1/3] - Test Loss: 59.8005
Epoch [1/50] - Batch [2/3] - Test Loss: 49.9575
Epoch [1/50] - Batch [3/3] - Test Loss: 42.4331
Epoch [1/50] - Train Loss: 5.9891, Test Loss: 50.7304
---<checkpoint saved>---50.73037083943685
Epoch [2/50] - Batch [1/11] - Train Loss: 2.7954
Epoch [2/50] - Batch [2/11] - Train Loss: 5.4013
Epoch [2/50] - Batch [3/11] - Train Loss: 3.0810
Epoch [2/50] - Batch [4/11] - Train Loss: 3.8367
Epoch [2/50] - Batch [5
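The per-epoch figures are plain averages of the per-batch losses. Taking the batch losses printed for epoch 1 in the log above as input, a quick check reproduces the reported epoch values:

```python
# Batch losses copied from the epoch 1 log above.
train_batch_losses = [7.9862, 5.6437, 7.7024, 8.2697, 5.4610, 5.9512,
                      6.5032, 6.0601, 4.3758, 6.8699, 1.0570]
test_batch_losses = [59.8005, 49.9575, 42.4331]

# Same arithmetic as the loop: sum the batch losses, divide by the batch count.
epoch_train_loss = sum(train_batch_losses) / len(train_batch_losses)
epoch_test_loss = sum(test_batch_losses) / len(test_batch_losses)

print(f"{epoch_train_loss:.4f}")  # 5.9891
print(f"{epoch_test_loss:.4f}")   # 50.7304
```

Because every batch counts equally in this average, a small final batch carries the same weight as a full one.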

Once training finishes, ``checkpoint.pth`` contains the weights of the model with the lowest test loss. This model can then be used for prediction.