# Classification of Pet's Real-Life Images

Lab Assignment from [AI for Beginners Curriculum](https://github.com/microsoft/ai-for-beginners).

Now it's time to deal with a more challenging task: classification of the original [Oxford-IIIT Pet Dataset](https://www.robots.ox.ac.uk/~vgg/data/pets/). Let's start by loading and visualizing the dataset.

In [None]:
# !wget https://www.robots.ox.ac.uk/~vgg/data/pets/data/images.tar.gz
# !tar xfz images.tar.gz
# !rm images.tar.gz

We will define a generic function to display a series of images from a list:

In [None]:
import matplotlib.pyplot as plt
import os
from PIL import Image
import numpy as np

# def display_images(l,titles=None,fontsize=12):
#     n=len(l)
#     fig,ax = plt.subplots(1,n)
#     for i,im in enumerate(l):
#         ax[i].imshow(im)
#         ax[i].axis('off')
#         if titles is not None:
#             ax[i].set_title(titles[i],fontsize=fontsize)
#     fig.set_size_inches(fig.get_size_inches()*n)
#     plt.tight_layout()
#     plt.show()

You can see that all images are located in one directory called `images`, and each file name contains the name of the class (breed):

In [None]:
# fnames = os.listdir('images')[:5]
# display_images([Image.open(os.path.join('images',x)) for x in fnames],titles=fnames,fontsize=30)

To simplify classification and use the same approach to loading images as in the previous part, let's sort all images into corresponding directories:

In [None]:
# for fn in os.listdir('images'):
#     cls = fn[:fn.rfind('_')].lower()
#     os.makedirs(os.path.join('images',cls),exist_ok=True)
#     os.replace(os.path.join('images',fn),os.path.join('images',cls,fn))
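
The class name is taken from the part of the file name before the last underscore. The sketch below isolates that parsing step so it can be checked without touching the file system. The helper name `breed_from_filename` and the sample file names are hypothetical and used only for illustration.

```python
def breed_from_filename(filename: str) -> str:
    """Return the lower-cased breed encoded in a name such as 'Abyssinian_100.jpg'."""
    # Everything before the last underscore is the breed; the rest is the image index.
    return filename[:filename.rfind('_')].lower()

# Hypothetical file names, used only for illustration
examples = ['Abyssinian_100.jpg', 'american_bulldog_12.jpg', 'yorkshire_terrier_7.jpg']
breeds = [breed_from_filename(name) for name in examples]
print(breeds)
```

Using `rfind` rather than `find` matters for multi-word breeds, whose names contain underscores themselves.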

Let's also define the number of classes in our dataset:

In [None]:
num_classes = len(os.listdir('images'))
num_classes

## Preparing dataset for Deep Learning

To start training our neural network, we need to convert all images to tensors, and also create tensors corresponding to labels (class numbers). Most neural network frameworks contain simple tools for dealing with images:
* In TensorFlow, use `tf.keras.preprocessing.image_dataset_from_directory`
* In PyTorch, use `torchvision.datasets.ImageFolder`

As the pictures above show, the images are close to square but not all the same size, so we need to resize all of them to the same square size. We can also organize the images into minibatches.

In [None]:
# PREPARE THE DATASET
from torchvision import transforms, datasets
from torch.utils.data import DataLoader


std_normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                          std=[0.229, 0.224, 0.225])
trans = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(256),
        transforms.ToTensor(), 
        std_normalize])

dataset = datasets.ImageFolder('images', transform= trans)
print(len(dataset))

Now we need to separate the dataset into train and test portions:

In [None]:
# SPLIT INTO TRAIN-TEST DATASETS
from torch.utils.data import random_split

# Calculate the lengths of splits
total_len = len(dataset)
train_len = int(total_len * 0.67)
test_len = total_len - train_len

# Create the random splits
train_dataset, test_dataset = random_split(dataset, [train_len, test_len])
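
`random_split` produces a different partition on every run unless it is given a seeded generator. The following is a minimal sketch of a reproducible split, assuming a stand-in `TensorDataset` of 100 fake samples in place of the real image dataset. The seed value 42 and the helper name `split` are arbitrary choices for illustration.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Stand-in dataset with 100 fake samples (assumption: the real `dataset` would go here)
toy_dataset = TensorDataset(
    torch.arange(100).float().unsqueeze(1),
    torch.zeros(100, dtype=torch.long),
)

toy_train_len = int(len(toy_dataset) * 0.67)
toy_test_len = len(toy_dataset) - toy_train_len

def split(seed: int):
    """Split the toy dataset using a generator seeded with `seed`."""
    generator = torch.Generator().manual_seed(seed)
    return random_split(toy_dataset, [toy_train_len, toy_test_len], generator=generator)

train_a, test_a = split(42)
train_b, test_b = split(42)
print(len(train_a), len(test_a))
print(train_a.indices == train_b.indices)  # same seed, same partition
```

A fixed seed makes accuracy numbers comparable between runs, because the test set always contains the same images.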


Now define data loaders:

In [None]:
# DEFINE DATA LOADERS if needed
# Create data loaders
train_dataloader = DataLoader(train_dataset, batch_size=32, shuffle=True)
test_dataloader = DataLoader(test_dataset, batch_size=32, shuffle=True)
print(len(train_dataloader))
print(len(test_dataloader))

In [None]:
# [OPTIONAL] Plot the dataset
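
One possible way to fill in the optional plotting step: because the images were normalized with the ImageNet mean and standard deviation, they need to be un-normalized before display. The sketch below shows only that conversion. The helper names `denormalize` and `to_displayable` are hypothetical.

```python
import torch

# Same statistics as in the Normalize transform, shaped for broadcasting over [N, 3, H, W]
IMAGENET_MEAN = torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1)
IMAGENET_STD = torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1)

def denormalize(batch: torch.Tensor) -> torch.Tensor:
    """Invert the normalization for a batch of shape [N, 3, H, W] and clamp to [0, 1]."""
    return (batch.cpu() * IMAGENET_STD + IMAGENET_MEAN).clamp(0.0, 1.0)

def to_displayable(batch: torch.Tensor):
    """Return channels-last NumPy arrays of shape [N, H, W, 3], as expected by plt.imshow."""
    return denormalize(batch).permute(0, 2, 3, 1).numpy()
```

In the notebook, a batch could then be fetched with `images, labels = next(iter(train_dataloader))` and each element of `to_displayable(images[:5])` passed to `plt.imshow`.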

## Define a neural network

For image classification, you should probably define a convolutional neural network with several layers. Things to keep an eye on:
* Keep in mind the pyramid architecture, i.e. the number of filters should increase as you go deeper
* Do not forget activation functions (ReLU) between layers, and max pooling
* The final classifier can be with or without hidden layers, but the number of output neurons should be equal to the number of classes.
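
When the classifier uses a plain `Linear` layer, its input size has to be worked out from the convolution and pooling arithmetic. The sketch below does this for an illustrative stack of five unpadded 3x3 convolutions on a 256x256 input, with 2x2 max pooling after all but the last one. The layer counts and the helper names `conv_out` and `pool_out` are assumptions chosen for the example.

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size: int, kernel: int = 2) -> int:
    """Spatial output size of max pooling with stride equal to the kernel (floor mode)."""
    return size // kernel

size = 256
channels = [8, 32, 64, 128, 256]  # pyramid: more filters as we go deeper
for i, ch in enumerate(channels):
    size = conv_out(size, kernel=3)
    if i < len(channels) - 1:     # no pooling after the last convolution
        size = pool_out(size)

flat_features = channels[-1] * size * size
print(size, flat_features)
```

The value of `flat_features` is what the first `Linear` layer after `Flatten` would need as its number of inputs.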

An important thing is to get the activation function on the last layer + loss function right:
* In TensorFlow, you can use `softmax` as the activation, and `sparse_categorical_crossentropy` as the loss. The difference between sparse and non-sparse categorical cross-entropy is that the former expects the target as a class number rather than as a one-hot vector.
* In PyTorch, you can leave the final layer without an activation function and use the `CrossEntropyLoss` loss function. This loss applies (log-)softmax internally, so the network should output raw logits.
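
To see why no final activation is needed in PyTorch, the short check below compares `CrossEntropyLoss` on raw logits with `NLLLoss` on log-softmax outputs. The random logits and the target values are made up for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 37)               # raw network outputs for 4 samples, 37 classes
targets = torch.tensor([0, 5, 36, 12])    # class indices, not one-hot vectors

# CrossEntropyLoss takes raw logits directly
ce = nn.CrossEntropyLoss()(logits, targets)
# Equivalent two-step form: explicit log-softmax followed by negative log-likelihood
nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), targets)

print(torch.allclose(ce, nll))
```

Both forms give the same value, so adding a softmax layer before `CrossEntropyLoss` would apply the normalization twice.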

> **Hint:** In PyTorch, you can use a `LazyLinear` layer instead of `Linear` to avoid computing the number of inputs. It only requires the `out_features` parameter, which is the number of neurons in the layer; the dimension of the input data is picked up automatically on the first `forward` pass.
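
A minimal sketch of the hint above, using a small illustrative network on a 32x32 input (the layer sizes are assumptions made for the example, not a recommended architecture):

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3),   # [3, 32, 32] -> [8, 30, 30]
    nn.ReLU(),
    nn.MaxPool2d(2),                  # -> [8, 15, 15]
    nn.Flatten(),                     # -> [1800]
    nn.LazyLinear(out_features=37),   # input size is inferred on the first forward pass
)

x = torch.zeros(1, 3, 32, 32)
out = model(x)                        # the first call materializes the lazy layer
print(out.shape, model[-1].in_features)
```

The lazy layer has no usable weights until this first forward pass, so a dummy batch is typically run through the model before inspecting parameters or creating the optimizer.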

In [None]:
import torch
from torchinfo import summary

print(torch.__version__)
# Kaleb code
device = (
    "cuda"
    if torch.cuda.is_available()
    else "mps"
    if torch.backends.mps.is_available()
    else "cpu"
)
print(f"Using {device} device")
# Querying the device name is only valid when a CUDA device is present
if device == "cuda":
    print(torch.cuda.get_device_name(0))

In [None]:
# DEFINE NEURAL NETWORK ARCHITECTURE
import torch.nn as nn
class KeiNet(nn.Module):
    def __init__(self):
        super(KeiNet, self).__init__()
        self.conv = nn.Conv2d(in_channels=3,out_channels=9,kernel_size=(5,5))
        self.flatten = nn.Flatten()
        self.fc = nn.Linear(571536,37)

    def forward(self, x):
        # [3, 256, 256]
        x = nn.functional.relu(self.conv(x))
        # [9, 252, 252]
        x = self.flatten(x)
        # [571536]
        x = nn.functional.log_softmax(self.fc(x),dim=1)
        # [37]
        return x

test_net = KeiNet().to(device)
print(summary(test_net,input_size=(1,3,256,256)))

class KeiNet2(nn.Module):
    def __init__(self):
        super(KeiNet2, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=3,out_channels=8,kernel_size=(3,3))
        self.pool = nn.MaxPool2d(2)
        self.conv2 = nn.Conv2d(in_channels=8,out_channels=32,kernel_size=(3,3))
        self.conv3 = nn.Conv2d(in_channels=32,out_channels=64,kernel_size=(3,3))
        self.conv4 = nn.Conv2d(in_channels=64,out_channels=128,kernel_size=(3,3))
        self.conv5 = nn.Conv2d(in_channels=128,out_channels=256,kernel_size=(3,3))
        # self.conv6 = nn.Conv2d(in_channels=512,out_channels=1024,kernel_size=(3,3))
        self.flat = nn.Flatten()
        self.fc1 = nn.Linear(36864,1024)
        self.fc2 = nn.Linear(1024,512)
        self.fc3 = nn.Linear(512,37)

    def forward(self, x):
        # [3, 256, 256]
        x = nn.functional.relu(self.conv1(x))
        # [8, 254, 254]
        x = self.pool(x)
        # [8, 127, 127]
        x = nn.functional.relu(self.conv2(x))
        # [32, 125, 125]
        x = self.pool(x)
        # [32, 62, 62]
        x = nn.functional.relu(self.conv3(x))
        # [64, 60, 60]
        x = self.pool(x)
        # [64, 30, 30]
        x = nn.functional.relu(self.conv4(x))
        # [128, 28, 28]
        x = self.pool(x)
        # [128, 14, 14]
        x = nn.functional.relu(self.conv5(x))
        # [256, 12, 12]

        x = self.flat(x)
        # [36864] = 256 * 12 * 12
        x = nn.functional.relu(self.fc1(x))
        x = nn.functional.relu(self.fc2(x))
        x = self.fc3(x)
        
        
        return x
test_net_adv = KeiNet2().to(device)
print(summary(test_net_adv,input_size=(1,3,256,256)))

## Train the Neural Network

Now we are ready to train the neural network. During training, please collect accuracy on train and test data on each epoch, and then plot the accuracy to see if there is overfitting.

In [None]:
# TRAIN THE NEURAL NETWORK CODE AND PLOT CODE
def train(net, dataloader_train, dataloader_test, epochs = 25, lr = 0.001, verbose = False):
    # Initialize output
    train_loss, train_acc = [],[]
    test_loss, test_acc = [],[]

    # Select necessary loss functions, optimizer, etc.
    loss_fn = torch.nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(net.parameters(), lr=lr)

    # Begin training process
    for ep in range(epochs):
        train_loss_list, test_loss_list = [], []
        sum_train_acc, sum_test_acc = [],[]

        # Training
        net.train()
        for X, y in dataloader_train:
            X, y = X.to(device), y.to(device)

            # Make predictions and loss calculations
            pred = net(X)
            loss = loss_fn(pred, y)
            train_loss_list.append(loss.item())

            # Back prop
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

            # Batch accuracy calculations
            class_pred = torch.argmax(pred, axis=1)
            correct = (class_pred == y).sum()
            curr_acc = correct / len(y)
            sum_train_acc.append(curr_acc)

        train_loss.append(loss.item())
        temp_acc = (sum(sum_train_acc) / len(sum_train_acc)).cpu().detach().numpy()
        train_acc.append(temp_acc)
        if verbose:
            print(f"Epoch {ep} TRAIN last loss: {round(loss.item(), 4)}, average accuracy: {round(temp_acc * 100,4)}%")
        
        # Evaluating (gradients are not needed here, which saves memory and time)
        net.eval()
        with torch.no_grad():
            for X, y in dataloader_test:
                X, y = X.to(device), y.to(device)

                # Make predictions and loss calculations
                pred = net(X)
                loss = loss_fn(pred, y)

                # Batch accuracy calculations
                class_pred = torch.argmax(pred, dim=1)
                correct = (class_pred == y).sum()
                curr_acc = correct / len(y)
                sum_test_acc.append(curr_acc)

        test_loss.append(loss.item())
        temp_acc = (sum(sum_test_acc) / len(sum_test_acc)).cpu().detach().numpy()
        test_acc.append(temp_acc)
        if verbose:
            print(f"Epoch {ep} TEST last loss: {round(loss.item(),4)}, average accuracy: {round(temp_acc * 100,4)}%")
        
            
    
    return train_loss, train_acc, test_loss, test_acc



def graph_loss_acc(train_loss_list, train_acc_list, test_loss_list, test_acc_list):
    epochs = range(1, len(train_loss_list) + 1)
    plt.plot(epochs, train_loss_list, label='Training Loss')
    plt.plot(epochs, test_loss_list, label='Testing Loss')
    plt.legend()
    plt.xlabel('Epoch')
    plt.ylabel('Value')
    plt.title('Loss')
    plt.show()
    plt.plot(epochs, train_acc_list, label='Training Accuracy')
    plt.plot(epochs, test_acc_list, label='Testing Accuracy')
    plt.legend()
    plt.xlabel('Epoch')
    plt.ylabel('Percent')
    plt.title('Training Accuracy')
    plt.show()

In [None]:
# TRAIN AND PLOT
convnet = KeiNet2().to(device)
epochs = 10
lr = 0.0005
train_loss_list, train_acc_list, test_loss_list, test_acc_list = train(convnet, train_dataloader, test_dataloader, epochs=epochs, lr=lr, verbose=True)
graph_loss_acc(train_loss_list, train_acc_list, test_loss_list, test_acc_list)

Even if you have done everything correctly, you will probably see that the accuracy is quite low.

## Transfer Learning

To improve the accuracy, let's use a pre-trained neural network as a feature extractor. Feel free to experiment with VGG-16/VGG-19 models, ResNet50, etc.

> Since this training is slower, you may start by training the model for a small number of epochs, e.g. 3. You can always resume training to further improve accuracy if needed.

Pre-trained networks expect input prepared in a specific way (image size and normalization), so we will reload the dataset using a matching set of transforms:

In [None]:
# LOAD THE DATASET
# Perform standard transformations for VGG-16/VGG-19 if needed 
std_normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                          std=[0.229, 0.224, 0.225])
trans = transforms.Compose([
        transforms.Resize(244),
        transforms.CenterCrop(244),
        transforms.ToTensor(), 
        std_normalize])

dataset = datasets.ImageFolder('images', transform= trans)
print(len(dataset))

train_dataset_vgg, test_dataset_vgg = random_split(dataset, [train_len, test_len])


train_dataloader_vgg = DataLoader(train_dataset_vgg, batch_size=32, shuffle=True)
test_dataloader_vgg = DataLoader(test_dataset_vgg, batch_size=32, shuffle=True)
print(len(train_dataloader_vgg))
print(len(test_dataloader_vgg))

Let's load the pre-trained network:

In [None]:
import torchvision
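# Note: `pretrained=True` is deprecated in torchvision 0.13+, where the equivalent is
# weights=torchvision.models.VGG16_Weights.DEFAULT
vgg = torchvision.models.vgg16(pretrained=True)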
summary(vgg,input_size=(1,3,244,244))

Now define the classification model for your problem:
* In PyTorch, the model has a slot called `classifier`, which you can replace with your own classifier for the desired number of classes.
* In TensorFlow, use the VGG network as a feature extractor, and build a `Sequential` model with VGG as the first layer and your own classifier on top.

In [None]:
# BUILD MODEL for your problem with your own linear layers
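# 512*7*7 is the flattened size of the VGG-16 feature output after its adaptive average pooling
vgg.classifier = torch.nn.Linear(512*7*7,37)
# Move the whole network (feature extractor and new classifier) to the training device
vgg = vgg.to(device)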


summary(vgg,(1, 3,244,244))

Make sure to mark all parameters of the VGG feature extractor as non-trainable:

In [None]:
# MAKE VGG Layers not trainable
for x in vgg.features.parameters():
    x.requires_grad = False
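
To check that freezing worked, it can help to count trainable parameters and to hand only those to the optimizer. The sketch below uses a tiny stand-in model with the same `features` / `classifier` layout as torchvision's VGG so that it runs quickly. `TinyBackbone` is hypothetical and not part of any library.

```python
import torch
import torch.nn as nn

class TinyBackbone(nn.Module):
    """Small stand-in that mimics the `features` / `classifier` split of VGG."""
    def __init__(self, num_classes: int = 37):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 4, kernel_size=3), nn.ReLU(), nn.AdaptiveAvgPool2d(2)
        )
        self.classifier = nn.Linear(4 * 2 * 2, num_classes)

    def forward(self, x):
        return self.classifier(torch.flatten(self.features(x), 1))

model = TinyBackbone()

# Freeze the feature extractor, exactly as done for `vgg.features`
for p in model.features.parameters():
    p.requires_grad = False

trainable = [p for p in model.parameters() if p.requires_grad]
n_trainable = sum(p.numel() for p in trainable)
n_total = sum(p.numel() for p in model.parameters())
print(f"trainable: {n_trainable} of {n_total}")

# Passing only the trainable parameters keeps the optimizer state small
optimizer = torch.optim.Adam(trainable, lr=0.0005)
```

Frozen parameters receive no gradients, so an optimizer given all parameters would simply skip them; filtering them out is an optional tidy-up rather than a requirement.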

Now we can start the training. Be very patient, as training takes a long time, and our train function is not designed to print anything before the end of the epoch.

In [None]:
# TRAIN THE MODEL
epochs = 3
lr = 0.0005
train_loss_list_vgg, train_acc_list_vgg, test_loss_list_vgg, test_acc_list_vgg = train(vgg, train_dataloader_vgg, test_dataloader_vgg, epochs=epochs, lr=lr, verbose=True)
graph_loss_acc(train_loss_list_vgg, train_acc_list_vgg, test_loss_list_vgg, test_acc_list_vgg)

It seems much better now!

## Optional: Calculate Top 3 Accuracy

We can also compute top-3 accuracy using the same code as in the previous exercise.


In [None]:
# CALCULATE TOP-3 Accuracy of the model
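
The code from the previous exercise is not reproduced here, so the following is only one possible sketch of a top-k accuracy computation, based on `torch.topk`. The helper name `topk_accuracy` and the small hand-made logits are assumptions for illustration.

```python
import torch

def topk_accuracy(logits: torch.Tensor, targets: torch.Tensor, k: int = 3) -> float:
    """Fraction of samples whose true class is among the k highest-scoring classes."""
    topk = logits.topk(k, dim=1).indices                # [N, k] class indices
    hits = (topk == targets.unsqueeze(1)).any(dim=1)    # [N] booleans
    return hits.float().mean().item()

# Hand-made scores for 3 samples and 4 classes
logits = torch.tensor([[0.1, 0.9, 0.5, 0.2],
                       [0.8, 0.1, 0.05, 0.3],
                       [0.2, 0.1, 0.6, 0.7]])
targets = torch.tensor([2, 1, 0])

print(topk_accuracy(logits, targets, k=1))
print(topk_accuracy(logits, targets, k=3))
```

For a full evaluation, the hits would be accumulated over all batches of the test data loader inside a `torch.no_grad()` block, with the model in `eval()` mode.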