# Week 1: Introduction to Computer Vision

## Notebook 2: Keypoint Localization with a Convolutional Neural Network using PyTorch

Welcome to the second notebook of this week's Applied AI Study Group! We will study the localization problem using the [Facial Keypoints Dataset](https://www.kaggle.com/c/facial-keypoints-detection/data) provided by [Kaggle](https://www.kaggle.com/). Our task is to locate distinguishing points on human faces.

### 1. Localization

Localization is one of the core tasks of Computer Vision. In this task, we aim to find the locations of objects in the given data and express them in a representation suited to the problem. For instance, if we are looking for dogs in images, we represent their locations with bounding boxes. In our case, we locate the keypoints of human faces, so the output is a vector of (x, y) coordinates. An example of a localization task can be seen in the following image:

![Localization example](./images/localization.png "Example Localization")

### 2. Facial Keypoints Dataset

Our dataset is provided by Kaggle as CSV files. The training data contains 7049 images. Each row of the CSV table contains the (x, y) coordinates of the 15 keypoints we want to detect per face, followed by the input image as a row-ordered list of pixels in the last column. Examples from our dataset will be shown in upcoming code cells.
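To make the row layout concrete, here is a minimal sketch that decodes one made-up row in the same format. The `Image` column name comes from the dataset; the two coordinate columns and the tiny 2x2 image are hypothetical values chosen only to keep the example readable (real rows hold 30 coordinates and 96 * 96 pixels).

```python
import numpy as np
import pandas as pd

# Hypothetical miniature row: one keypoint (x, y) plus a 2x2 "image".
row = pd.Series({
    "left_eye_center_x": 1.0,
    "left_eye_center_y": 0.5,
    "Image": "0 64 128 255",
})

# The image is stored as one space-separated string in row-major order.
pixels = np.array(row["Image"].split(" "), dtype=float)
side = int(np.sqrt(pixels.size))  # 2 here, 96 in the real dataset
image = pixels.reshape(side, side)

# Every remaining column is a keypoint coordinate, stored as x, y pairs.
keypoints = row.drop("Image").to_numpy(dtype=float).reshape(-1, 2)

print(image.shape)      # (2, 2)
print(keypoints.shape)  # (1, 2)
```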

The dataset used in this notebook can be downloaded with the Kaggle API; the three code cells that do so come right after the installation notes in the next section.

### 3. Imports and Checks

You should already have installed NumPy and Matplotlib using `pip`, and PyTorch by following [Week 0 - Notebook 2](https://github.com/inzva/Applied-AI-Study-Group/blob/add-frameworks-week/Applied%20AI%20Study%20Group%20%236%20-%20January%202022/Week%200/2-mnist_classification_convnet_pytorch.ipynb).

Since we use `pandas` for the first time today, you can install it via `pip` as well.

    pip install pandas

As stated above, the following three code cells can be used to download the facial keypoints dataset with the Kaggle API. The commands are commented out; uncomment the ones you need.

In [None]:
#!kaggle competitions download -c facial-keypoints-detection

In [None]:
# run this cell if the downloaded dataset archive has not been unzipped yet
#!unzip facial-keypoints-detection.zip
#!ls

In [None]:
# run this cell if training.zip has not been unzipped yet
#!unzip training.zip
#!ls

If you are using `Google Colab`, you can uncomment the commented line in the following cell to import the `drive` module, which is used to make Google Drive available in this notebook at runtime. For further information: [Google Colab](https://colab.research.google.com/?utm_source=scs-index#scrollTo=GJBs_flRovLc)

In [None]:
import pandas as pd
# from google.colab import drive
import numpy as np
import matplotlib.pyplot as plt
import random
import torch

If the following cell runs successfully, you are good to go.

In [None]:
print(torch.__version__)
print(pd.__version__)
print(np.__version__)

We explore our dataset further in the following cell. First, we load it using `pandas`, a great library for working with `CSV` files. Then, we print the columns and the first 5 rows of our dataset to check what our data looks like.

With the third print statement, we check whether there are any `null` values in our dataset, which may hurt our training in later phases. As a data preprocessing step, we use the pandas function `fillna` with forward fill (`ffill`) to replace each `null` value with the last valid observation above it in the same column. This is similar to nearest neighbor interpolation, except that it only looks at earlier rows. For further information: [fillna](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.fillna.html).

Then, we check whether any `null` values are left.
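The forward fill described above can be tried on a toy table first. This is only an illustrative sketch: the column name and the values are made up, and it uses `DataFrame.ffill`, which performs the same forward fill.

```python
import numpy as np
import pandas as pd

# Hypothetical keypoint column with gaps; the values are made up.
toy = pd.DataFrame({"nose_tip_x": [44.0, np.nan, np.nan, 47.5]})

print(toy.isnull().any().value_counts())  # one column contains nulls

# Forward fill: each NaN takes the value of the closest valid row above it.
filled = toy.ffill()

print(filled["nose_tip_x"].tolist())  # [44.0, 44.0, 44.0, 47.5]
```

Note that a missing value in the very first row has no earlier observation to copy from, so forward fill alone would leave it as `NaN`.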

In [None]:
data = pd.read_csv('./datasets/facial_keypoints/training.csv')
print(data.columns)
print(data.head())
print(data.isnull().any().value_counts())
# forward fill; same effect as the deprecated data.fillna(method='ffill', inplace=True)
data.ffill(inplace=True)
print(data.isnull().any().value_counts())

In addition to `null` values, we may have empty image pixels in the given dataset. We fill those with `0` so that we can iterate over our dataset without any type errors or empty-pixel errors.

In [None]:
num_train_data = len(data)
pixel_list = []
for i in range(num_train_data):
    row = data['Image'][i].split(' ')
    pixel = ['0' if x == '' else x for x in row] # handling empty image pixels
    pixel_list.append(pixel)

In `PyTorch`, image data is represented as (B, C, H, W), where B = batch size, C = number of channels, H = height of the image, and W = width of the image. After reshaping, our dataset has the layout (B, H, W, C). So, we need to move the channel dimension to the second position to process the data properly with our model.

In [None]:
# PyTorch expects channels in the second dimension, so we swap axes: (B, H, W, C) -> (B, C, H, W)
image_tensor = np.array(pixel_list, dtype = 'float')
print(np.shape(image_tensor))
image_tensor = image_tensor.reshape(-1, 96, 96, 1)
image_tensor = np.swapaxes(image_tensor, 2, 3)
image_tensor = np.swapaxes(image_tensor, 1, 2)
print(np.shape(image_tensor))
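The two `swapaxes` calls above can also be written as a single `transpose`. The sketch below checks the equivalence on a small made-up batch; the height and width differ on purpose so that a wrong axis order would show up in the shape.

```python
import numpy as np

# Hypothetical batch of 4 single-channel images in (B, H, W, C) layout.
batch_bhwc = np.arange(4 * 3 * 5 * 1, dtype=float).reshape(4, 3, 5, 1)

# Same two swaps as in the cell above.
swapped = np.swapaxes(np.swapaxes(batch_bhwc, 2, 3), 1, 2)

# Single call that moves the channel axis to position 1.
transposed = batch_bhwc.transpose(0, 3, 1, 2)

print(swapped.shape)                        # (4, 1, 3, 5)
print(np.array_equal(swapped, transposed))  # True
```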

Here, we separate our data into training images and ground-truth labels. To do that, we remove the `Image` column from our dataset; the remaining columns are the facial keypoint coordinates, which are our ground truth.

In [None]:
labels = data.drop('Image', axis=1)

# every remaining column is a keypoint coordinate, so we convert the whole table at once
label_tensor = labels.to_numpy(dtype='float')

We visualize some samples to understand what our task looks like and to double-check that the processing we have done so far is correct.

In [None]:
from matplotlib.patches import Circle

index = random.randint(0,1000)

fig, ax = plt.subplots(1)
ax.set_aspect('equal')

ax.imshow(image_tensor[index].reshape(96,96),cmap='gray')

for xx, yy in label_tensor[index].reshape((15,2)):
    circ = Circle((xx,yy),2,color='red')
    ax.add_patch(circ)
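The plotting loop above relies on the label vector being laid out as alternating x and y values, so that `reshape((15, 2))` yields one keypoint per row. A minimal sketch with three made-up keypoints instead of fifteen:

```python
import numpy as np

# Hypothetical flat label vector for 3 keypoints, laid out as
# [x1, y1, x2, y2, x3, y3] like the coordinate columns of the CSV file.
flat_label = np.array([66.0, 39.0, 30.2, 36.4, 48.0, 57.0])

# Each row of the reshaped array is one (x, y) keypoint.
points = flat_label.reshape(-1, 2)

xs, ys = points[:, 0], points[:, 1]
print(points.shape)  # (3, 2)
print(xs.tolist())   # [66.0, 30.2, 48.0]
print(ys.tolist())   # [39.0, 36.4, 57.0]
```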

Since we will not submit our model to `Kaggle`, we set aside some of our training images as a separate set and use them as our test images. Then, we create our dataloaders for training and testing.

In [None]:
train_len = 6000
img_and_label = []
for i in range(train_len):
    img_and_label.append([image_tensor[i], label_tensor[i]])

# we use Dataloader objects in pytorch to easily iterate on our dataset while performing training loops
train_loader = torch.utils.data.DataLoader(img_and_label, shuffle=True, batch_size=500)
img1, lbl1 = next(iter(train_loader))
print("first training batch: \n" + "input shape: " + str(img1.shape) + "\n" + "label shape: " + str(lbl1.shape))

test_data = []
for i in range(train_len, num_train_data): # since we have no labels for real test data!
    test_data.append([image_tensor[i], label_tensor[i]])

test_loader = torch.utils.data.DataLoader(test_data, shuffle=True, batch_size=500)
test1, tlbl1 = next(iter(test_loader))
print("test batch: \n" + "input shape: " + str(test1.shape) + "\n" + "label shape: " + str(tlbl1.shape))
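As an alternative to building Python lists of `[image, label]` pairs, the same split could be expressed with `torch.utils.data.TensorDataset`. This is only a sketch under assumptions: the zero-filled arrays stand in for `image_tensor` and `label_tensor`, and the names `images`, `targets`, `split`, and the `*_sketch` loaders are hypothetical.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-ins for image_tensor and label_tensor (10 samples only).
images = np.zeros((10, 1, 96, 96), dtype=np.float32)
targets = np.zeros((10, 30), dtype=np.float32)

split = 8  # plays the role of train_len

train_set = TensorDataset(torch.from_numpy(images[:split]),
                          torch.from_numpy(targets[:split]))
test_set = TensorDataset(torch.from_numpy(images[split:]),
                         torch.from_numpy(targets[split:]))

train_loader_sketch = DataLoader(train_set, batch_size=4, shuffle=True)
test_loader_sketch = DataLoader(test_set, batch_size=4, shuffle=False)

xb, yb = next(iter(train_loader_sketch))
print(xb.shape, yb.shape)  # torch.Size([4, 1, 96, 96]) torch.Size([4, 30])
```

Slicing by index keeps the held-out samples identical between runs; shuffling is left to the training loader.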

Using PyTorch, we build our model in the following cell. In the feature extraction phase, our model consists of 2D convolution layers, batch normalization layers, Leaky ReLU as the activation function, and max pooling. Then, we use linear layers, 1D batch normalization, and the sigmoid activation function for keypoint regression. Note that we do not apply a sigmoid after the final layer, since we want to regress pixel positions rather than coordinates squashed into [0, 1]. Also, we flatten the intermediate feature map when moving from the 2D feature extraction stage to the 1D regression stage.

In [None]:
import torch.nn as nn
import torch.nn.functional as F

# we write our networks as classes. don't forget to inherit from nn.Module
class Net(nn.Module):
    # we always need an __init__ method to define our layers (similar to nodes in a graph)
    def __init__(self):
        super(Net, self).__init__()
        
        self.max_pool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.leaky_relu = nn.LeakyReLU(0.1)
        
        self.conv1 = nn.Conv2d(1, 32, 5) #1, 32
        self.conv1_bn = nn.BatchNorm2d(32)
        
        self.conv2 = nn.Conv2d(32, 64, 5) #32, 64
        self.conv2_bn = nn.BatchNorm2d(64)

        #self.conv3 = nn.Conv2d(16, 32, 5) #64, 128
        #self.conv3_bn = nn.BatchNorm2d(32)

        #self.conv4 = nn.Conv2d(32, 64, 5) #128, 256
        #self.conv4_bn = nn.BatchNorm2d(64)
        
        self.fc1 = nn.Linear(64 * 21 * 21, 120)
        self.fc1_bn = nn.BatchNorm1d(120)
        self.fc2 = nn.Linear(120, 84)
        self.fc2_bn = nn.BatchNorm1d(84)
        self.fc3 = nn.Linear(84, 30)
        
    # we always need a forward method to draw our computational graph (similar to completing the graph with edges)
    def forward(self, x):

        x = self.max_pool(self.leaky_relu(self.conv1_bn(self.conv1(x))))
        x = self.max_pool(self.leaky_relu(self.conv2_bn(self.conv2(x))))
        #x = self.leaky_relu(self.conv3_bn(self.conv3(x)))
        #x = self.max_pool(self.leaky_relu(self.conv4_bn(self.conv4(x))))

        # vectorize (flatten)
        x = x.reshape(-1, 64 * 21 * 21)
        #x = torch.flatten(x)
        #x = torch.reshape(x, (input_shape, -1))
        x = torch.sigmoid(self.fc1_bn(self.fc1(x)))        
        x = torch.sigmoid(self.fc2_bn(self.fc2(x)))
        x = self.fc3(x)
        return x

inzvaNet = Net()
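The input size `64 * 21 * 21` of `fc1` follows from how each layer shrinks the 96x96 input. The sketch below retraces that arithmetic with the standard output-size formula for convolution and pooling layers without dilation; the helper name `conv_out` is hypothetical.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution or pooling layer (floor division)."""
    return (size + 2 * padding - kernel) // stride + 1

size = 96
size = conv_out(size, kernel=5)            # conv1:    96 -> 92
size = conv_out(size, kernel=2, stride=2)  # max pool: 92 -> 46
size = conv_out(size, kernel=5)            # conv2:    46 -> 42
size = conv_out(size, kernel=2, stride=2)  # max pool: 42 -> 21

flat_features = 64 * size * size           # 64 channels after conv2
print(size, flat_features)                 # 21 28224
```

If a convolution layer is added or removed, the same helper gives the new value to use in `fc1` and in the `reshape` call.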

We specify our loss function, optimizer, and the number of epochs we will train our model in the following cell.

In [None]:
import torch.optim as optim

criterion = nn.MSELoss()
optimizer = optim.SGD(inzvaNet.parameters(), lr=0.0001, momentum=0.9)
num_epoch = 10
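With its default settings, `nn.MSELoss` averages the squared difference over every coordinate in the batch. A small check on made-up numbers (one face, two keypoints):

```python
import torch
import torch.nn as nn

# Hypothetical predictions and targets: [x1, y1, x2, y2] in pixels.
pred = torch.tensor([[10.0, 20.0, 30.0, 40.0]])
target = torch.tensor([[12.0, 20.0, 27.0, 40.0]])

mse = nn.MSELoss()(pred, target)
manual = ((pred - target) ** 2).mean()  # (4 + 0 + 9 + 0) / 4

print(mse.item(), manual.item())  # 3.25 3.25
```

Since the labels are pixel coordinates, the square root of this value can be read as a typical per-coordinate error in pixels.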

In [None]:
inzvaNet = inzvaNet.float()

We switch from CPU to GPU if a GPU device is available. This greatly reduces the time needed to train our model.

Note: Without a GPU, training this model will take considerably longer.

In [None]:
#we get info on our gpu, put it in the variable "device"
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Assuming that we are on a CUDA machine, this should print a CUDA device:

print(device)

#we carry our model into gpu
inzvaNet.to(device)

Then, we will initiate our training loop. We iterate through our dataset and update our model. Also, we can observe the improvement of our model with a decrease in the loss values throughout the training.

In [None]:
# our training loop
# gradients are cleared with optimizer.zero_grad() in every step; otherwise they would accumulate across batches
inzvaNet.train()  # make sure BatchNorm layers use batch statistics during training
for epoch in range(num_epoch):  # loop over the dataset multiple times

    running_loss = 0.0

    # the dataloader yields one batch per iteration of the loop
    # enumerate numbers the batches so that we know which batch we are in
    for i, batch in enumerate(train_loader, start = 0):
        # get the inputs; batch is a list of [inputs, labels]
        # (named `batch` so that it does not overwrite the `data` DataFrame)
        inputs, labels = batch[0].float().to(device), batch[1].float().to(device)

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = inzvaNet(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # accumulate statistics
        running_loss += loss.item()

    # print the average loss per mini-batch once per epoch
    # (with 6000 samples and batch_size=500 there are only 12 batches per epoch)
    print('Epoch %d Loss: %.3f' %
          (epoch + 1, running_loss / len(train_loader)))

print('Finished Training')

We now test our trained model to get a better understanding of where it fails and where it performs well.

In [None]:
rand_test = random.randint(0,49)
test_batch = next(iter(test_loader))

test_batch_data = test_batch[0].float().to(device)
test_batch_label = test_batch[1].float().to(device)

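# switch BatchNorm to inference mode and disable gradient tracking for prediction
inzvaNet.eval()
with torch.no_grad():
    preds = inzvaNet(test_batch_data).cpu()

fig, ax = plt.subplots(1)
ax.set_aspect('equal')

ax.imshow(test_batch_data[rand_test].cpu().view((96,96)), cmap = 'gray')

# predicted keypoints
for xx, yy in preds[rand_test].reshape((15,2)).numpy():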
    circ = Circle((xx, yy), 2, color='red')
    ax.add_patch(circ)

In [None]:
fig, ax = plt.subplots(1)
ax.set_aspect('equal')
ax.imshow(test_batch_data[rand_test].cpu().view((96,96)), cmap = 'gray')

# ground-truth keypoints (moved to the CPU for plotting)
for xx, yy in test_batch_label[rand_test].cpu().reshape((15,2)).numpy():
    circ = Circle((xx, yy), 2, color='red')
    ax.add_patch(circ)
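Besides visual inspection, a single number for the held-out set can be obtained by averaging the loss over all test batches. The following is a sketch, not part of the original notebook: the function name `evaluate` is hypothetical, and the tiny linear model and random data only stand in for the network and test set so that the block runs on its own.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def evaluate(model, loader, criterion, device):
    """Return the sample-weighted mean loss of `model` over `loader`."""
    model.eval()                  # BatchNorm layers use running statistics
    total, count = 0.0, 0
    with torch.no_grad():         # no gradients are needed for evaluation
        for inputs, labels in loader:
            inputs = inputs.float().to(device)
            labels = labels.float().to(device)
            loss = criterion(model(inputs), labels)
            total += loss.item() * inputs.size(0)
            count += inputs.size(0)
    return total / count

# Hypothetical stand-ins for the trained network and the test loader.
toy_model = nn.Linear(4, 2)
toy_loader = DataLoader(TensorDataset(torch.randn(10, 4), torch.randn(10, 2)),
                        batch_size=4)

test_mse = evaluate(toy_model, toy_loader, nn.MSELoss(), torch.device("cpu"))
print(test_mse, test_mse ** 0.5)  # mean squared error and its square root
```

In this notebook the call would presumably look like `evaluate(inzvaNet, test_loader, criterion, device)`.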