#  **ICT303 - Assignment 2**

**Your name: <enter here your full name>**

**Student ID: <enter here your student ID>** 

**Email: <enter here your email address>** 

In this assignment, you will build a deep learning model for identifying $120$ different breeds of dogs. Similar to the previous assignment, you will use real images from the [Kaggle competition](https://www.kaggle.com/c/dog-breed-identification). 

In this assignment, your are required to use a ResNet network. You can use ResNet implementation provided in PyTorch. Note however  that there are many versions of ResNet (they differ in terms of number of layers). Your task is to find the best configuration that gives the best performance. 

The rule is similar to the previous assignment:

1. Develop a better model to reduce the recognition error.  
2. Submit your results to Kaggle and take a sceenshot of your score. Then insert here the screenshot of your result. 

It is important that you start as earlier as possible. Tuning hyper-parameters takes time, and Kaggle limits the number of submissions per day.

The top 3 students in the Kaggle ranking will be invited for a coffee!

## **1. Obtaining and Organizing the Data Set**

The competition data is divided into a training set and testing set:
- The training set contains $10,222$ color images.
- The testing set contains $10,357$ color images. 

The images in both sets are in JPEG format. Each image contains three channels (R, G and B). The images have  different heights and widths.

There are $120$ breeds of dogs in the training set, e.g., *Labradors, Poodles, Dachshunds,
Samoyeds, Huskies, Chihuahuas, and Yorkshire Terriers*.

### **1.1. Downloading the Data Set**

After logging in to Kaggle, click on the “Data” tab on the dog breed identification competition webpage and download:
- the training data set `train.zip` and their corresponing labels `label.csv.zip`,
- the testing data set `test.zip`, 

After downloading the files, place them in the three paths below:
- kaggle_dog/train.zip
- kaggle_dog/test.zip
- kaggle_dog/labels.csv.zip

Run the code below to extract the data. 

In [None]:
import zipfile

data_dir = './kaggle_dog'

zipfiles = ['train.zip', 'test.zip', 'labels.csv.zip']
for f in zipfiles:
  with zipfile.ZipFile(data_dir + '/' + f, 'r') as z:
    z.extractall(data_dir)

In [2]:
data_dir = '../MLData-dog-breed-image-recognition'

### **1.2. Organizing the Data Set**

Next, we define the reorg_train_valid function to split the validation set from the original Kaggle competition training set. The parameter valid_ratio in this function is the ratio of the number of examples of each dog breeds in the validation set to the number of examples of the
breed with the least examples (66) in the original training set. 

After organizing the data, images of the same breed will be placed in the same folder so that we can read them later.

In [None]:
# Let's first install d2l package, since we will need some functions from this package
! pip install d2l==1.0.0a1.post0

In [None]:
! pip install torch
! pip install torchvision

In [3]:
def mkdir_if_not_exist(path):
    if not isinstance(path, str):
        path = os.path.join(*path)
    os.makedirs(path, exist_ok=True)

In [5]:
import collections
import d2l
import shutil
import os
import math

def reorg_train_valid(data_dir, train_dir, input_dir, valid_ratio, idx_label):
  # The number of examples of the least represented breed in the training set.
  min_n_train_per_label = (
      collections.Counter(idx_label.values()).most_common()[:-2:-1][0][1])
  
  # The number of examples of each breed in the validation set.
  n_valid_per_label = math.floor(min_n_train_per_label * valid_ratio)
  label_count = {}
  for train_file in os.listdir(os.path.join(data_dir, train_dir)):
    idx = train_file.split('.')[0]
    label = idx_label[idx]

    mkdir_if_not_exist([data_dir, input_dir, 'train_valid', label])
    
    shutil.copy(os.path.join(data_dir, train_dir, train_file),
                os.path.join(data_dir, input_dir, 'train_valid', label))
    
    if label not in label_count or label_count[label] < n_valid_per_label:
      mkdir_if_not_exist([data_dir, input_dir, 'valid', label])
      shutil.copy(os.path.join(data_dir, train_dir, train_file),
                  os.path.join(data_dir, input_dir, 'valid', label))
      label_count[label] = label_count.get(label, 0) + 1
      
    else:
      mkdir_if_not_exist([data_dir, input_dir, 'train', label])
      shutil.copy(os.path.join(data_dir, train_dir, train_file),
                  os.path.join(data_dir, input_dir, 'train', label))

The `reorg_dog_data` function below is used to read the training data labels, segment the validation set, and organize the training set.

In [6]:
def reorg_dog_data(data_dir, label_file, train_dir, test_dir, input_dir, valid_ratio):
  # Read the training data labels.
  with open(os.path.join(data_dir, label_file), 'r') as f:
    # Skip the file header line (column name).
    lines = f.readlines()[1:]
    tokens = [l.rstrip().split(',') for l in lines]
    idx_label = dict(((idx, label) for idx, label in tokens))
  
  reorg_train_valid(data_dir, train_dir, input_dir, valid_ratio, idx_label)

  # Organize the training set.
  mkdir_if_not_exist([data_dir, input_dir, 'test', 'unknown'])
  for test_file in os.listdir(os.path.join(data_dir, test_dir)):
    shutil.copy(os.path.join(data_dir, test_dir, test_file),
                os.path.join(data_dir, input_dir, 'test', 'unknown'))

During actual training and testing, we would use the entire Kaggle Competition data set and call the reorg_dog_data function to organize the data set. Likewise, we would need to set the batch_size to a larger integer, such as 128.

In [7]:
label_file, train_dir, test_dir = 'labels.csv', 'train', 'test'
input_dir, batch_size, valid_ratio = 'train_valid_test', 128, 0.1
reorg_dog_data(data_dir, label_file, train_dir, test_dir, input_dir, valid_ratio)

In [8]:
import torch
import torchvision.datasets as datasets
import torchvision.transforms as transforms

def ComputeImageDatasetStats(image_dir):

  transform = transforms.Compose([transforms.ToTensor()])
  # ImageFolder expects a certain directory structure where each class
  # corresponds to one folder, and all the images belonging to that class
  # reside within that folder. So, within your train directory, there should
  # be separate directories for each class, each containing the corresponding
  # images.
  dataset = datasets.ImageFolder(root=image_dir, transform=transform)
  dataloader = torch.utils.data.DataLoader(dataset, batch_size=1, shuffle=False)

  mean = 0.0
  for images, _ in dataloader:
      batch_samples = images.size(0)
      images = images.view(batch_samples, images.size(1), -1)

      # calculates the mean along the 2nd dimension of your images tensor.
      # In the context of image data, tensors are usually in the shape
      # (Batch size, Channels, Height, Width), so this operation computes
      # the mean of every channel in each image in the batch.
      # sum(0) part sums up the calculated means along the 0th dimension
      # (i.e., adds up the means of all images in the batch).
      mean += images.mean(2).sum(0)
  mean = mean / len(dataloader.dataset)

  var = 0.0
  total_elements = 0
  for images, _ in dataloader:
      batch_samples = images.size(0)
      images = images.view(batch_samples, images.size(1), -1)
      # The unsqueeze operation is adding an extra dimension at the
      # 1st index of the tensor mean. This operation is performed to make
      # the mean tensor align correctly with the images tensor for the
      # upcoming subtraction operation.
      # The sum([0,2]) operation sums up the values along the 0th and
      # 2nd dimensions of the tensor. For an image tensor with shape
      # (Batch size, Channels, Height, Width), this means it's adding up the
      # squared differences for each image in the batch and for each pixel
      # in each channel.
      total_elements += images.numel()/images.size(1)  # Calculate total pixel count
      var += ((images - mean.unsqueeze(1))**2).sum([0,2])
  std = torch.sqrt(var / total_elements)

  return mean, std

image_mean, image_std = ComputeImageDatasetStats(os.path.join(data_dir, input_dir, 'train_valid'))
print(f'Image mean: {image_mean} std: {image_std}')

Image mean: tensor([0.4765, 0.4523, 0.3923]) std: tensor([0.2654, 0.2606, 0.2648])


## **2. Image Augmentation**

Sometimes, when we do not have enough images to train our deep learning model, we data augmentation to simulate new data. For example, in the case of images, assume we only have $10$ images per class. We can create more instance by applying transformations to these images. For example, if the image is of a standin dog, we can rotate it $90$ and $180$ degrees to create two additional instances of the same dog. We can also scale it, etc.

Here are some more image augmentation operations that might be useful.

Start by training your model on the data set, the way it is provided. Then, think of the types of transformations you can apply to the training images to improve the performance. 

You can find more about how to apply transformations to images in this [link](https://pytorch.org/vision/stable/transforms.html).

In [9]:
from torchvision.datasets import ImageFolder
from torch.utils.data import DataLoader

# Define the transformations to be applied to the images
transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(10),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

# Load the dataset
train_valid_dataset = ImageFolder('../MLData-dog-breed-image-recognition/train_valid_test/train_valid/', transform=transform)

# Create a dataloader to load the images in batches
train_valid_loader = DataLoader(train_valid_dataset, batch_size=32, shuffle=True)

# Train your model using the augmented data

## **3. Loading (Reading) the Data Set**

Similar to previous labs, write here the Python code tat reads the training, validation and test set.

In [10]:
# Define the transformations to be applied to the images
transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(10),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

# Load the training set
train_dataset = ImageFolder('../MLData-dog-breed-image-recognition/train_valid_test/train/', transform=transform)
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)

# Load the validation set
valid_dataset = ImageFolder('../MLData-dog-breed-image-recognition/train_valid_test/valid/', transform=transform)
valid_loader = DataLoader(valid_dataset, batch_size=32, shuffle=True)

# Load the test set
test_dataset = ImageFolder('../MLData-dog-breed-image-recognition/train_valid_test/test/', transform=transform)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=True)

## **4. Defining and Training ResNet**

Here, you are required to use ResNet to recognise the breed of the dogs in the images. You need to write the class that defines the network, the training class and the code for training the network. You are not required to implement ResNet from scratch. Instead, use PyTorch's implementation of ResNet. 

Note that there are many versions of ResNet. They differ in the number of layers they use. You are required to test with at least 2 versions and report their respective performances. 

Note that you are required to follow the good practices when training your network. In particular, you need to look at the loss curves (training and validation losses).

In [10]:
if torch.cuda.is_available():
    device = torch.device('cuda')
    print('Using GPU:', torch.cuda.get_device_name(0))
else:
    device = torch.device('cpu')
    print('Using CPU')

Using CPU


In [13]:
import torch.nn as nn
import torch.optim as optim
import torchvision.models as models

# Define the ResNet model
class ResNetModel(nn.Module):
    def __init__(self, num_classes, dropout_prob=0.5):
        super(ResNetModel, self).__init__()
        self.resnet = models.resnet50(pretrained=True)
        num_features = self.resnet.fc.in_features
        self.resnet.fc = nn.Sequential(
            nn.Linear(num_features, 512),
            nn.ReLU(),
            nn.Dropout(dropout_prob),
            nn.Linear(512, num_classes)
        )
        self.dropout = nn.Dropout(dropout_prob)

    def forward(self, x):
        x = self.resnet(x)
        x = self.dropout(x)
        return x

# Define the training class
class Trainer:
    def __init__(self, model, train_loader, valid_loader, criterion, optimizer, device):
        self.model = model
        self.train_loader = train_loader
        self.valid_loader = valid_loader
        self.criterion = criterion
        self.optimizer = optimizer
        self.device = device

    def train(self, num_epochs):
        self.model.to(self.device)
        for epoch in range(num_epochs):
            train_loss = 0.0
            valid_loss = 0.0
            self.model.train()
            for i, (inputs, labels) in enumerate(self.train_loader):
                inputs = inputs.to(self.device)
                labels = labels.to(self.device)
                self.optimizer.zero_grad()
                outputs = self.model(inputs)
                loss = self.criterion(outputs, labels)
                loss.backward()
                self.optimizer.step()
                train_loss += loss.item() * inputs.size(0)
            train_loss /= len(self.train_loader.dataset)
            self.model.eval()
            with torch.no_grad():
                for i, (inputs, labels) in enumerate(self.valid_loader):
                    inputs = inputs.to(self.device)
                    labels = labels.to(self.device)
                    outputs = self.model(inputs)
                    loss = self.criterion(outputs, labels)
                    valid_loss += loss.item() * inputs.size(0)
                valid_loss /= len(self.valid_loader.dataset)
            print('Epoch: {}, Training Loss: {:.4f}, Validation Loss: {:.4f}'.format(epoch+1, train_loss, valid_loss))

# Define the hyperparameters
num_classes = 120
learning_rate = 0.0001
momentum = 0.9
num_epochs = 10

# Define the device to use for training
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

# Load the ResNet model and move it to the device
model = ResNetModel(num_classes, dropout_prob=0.5).to(device)

# Define the loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)

# Define the device to use for training
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

# Define the trainer object and train the model
trainer = Trainer(model, train_loader, valid_loader, criterion, optimizer, device)
trainer.train(num_epochs)

Downloading: "https://download.pytorch.org/models/resnet50-0676ba61.pth" to C:\Users\Hecatia/.cache\torch\hub\checkpoints\resnet50-0676ba61.pth
100.0%


Epoch: 1, Training Loss: 4.3181, Validation Loss: 3.4902
Epoch: 2, Training Loss: 3.7763, Validation Loss: 2.7907
Epoch: 3, Training Loss: 3.5484, Validation Loss: 2.7268


KeyboardInterrupt: 

## **6. Run on the Testing Set and Submit teh Results on Kaggle**

Finally, test your trained model on the test set and upload the results to the [Kaggle competition](https://www.kaggle.com/c/dog-breed-identification). 

You are required to submit a screenshot of your score.

## **7. Hints to Improve Your Results**
- Try to increase the batch size and the number of epochs.
- Try deeper ResNet networks.
