# PyTorch Implementation of Cats n' Dogs
*Created by Ben Perkins*

---
*Notable References:*

https://heartbeat.fritz.ai/basics-of-image-classification-with-pytorch-2f8973c51864

https://wandb.ai/authors/ayusht/reports/Dropout-in-PyTorch-An-Example--VmlldzoxNTgwOTE

----

I created this notebook to attempt to classify the Cats n' Dogs images with PyTorch. It includes:
* an alternative approach to creating the task datasets;
* **Data augmentation** steps to gain more accuracy in predictions;
* manual calculation of the image **mean and standard deviation** to pass the correct values to the `Normalize()` transform, which subtracts the per-channel mean (the `mean_pixel` value) from the data and divides by the per-channel standard deviation;
* **Object-Oriented Neural Network model**, adapted and augmented from the *Fritz.AI* website shown above;
* Use of **Sequential API** within the model to order the flow of the neural network.

The model consistently reached about 75% test accuracy, but so far I have not been able to overcome what looks like overfitting. It uses a 32x32 image size, and 75% may be the best it can do at that resolution. Although train accuracy reached up to 98%, with a loss of around 0.05, the test accuracy stagnated at around 75%.

`Epoch 9, Train Accuracy: 0.7727535963058472 , TrainLoss: 0.4672778248786926 , Test Accuracy: 0.7463377118110657, Best Accuracy: 0.7463377118110657`

In [1]:
# Import general modules
import os
import glob
from time import time
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tarfile
from tqdm.notebook import tqdm
from PIL import Image
import warnings

# PyTorch Modules
import torch
import torch.nn
import torch.nn.functional as F
from torch.nn import ReLU
from torch import nn, optim
from torch.optim import Adam
from torch.autograd import Variable
import torch.utils
from torch.utils.data import Dataset
from torch.utils.data import DataLoader
import torchvision
from torchvision import datasets, transforms, utils
from torchvision.io import read_image

In [2]:
def extract_tar(file, path):
    """
    function to extract tar.gz files to specified location
    
    Args:
        file (str): path where the file is located
        path (str): path where you want to extract
    """
    with tarfile.open(file) as tar:
        files_extracted = 0
        for member in tqdm(tar.getmembers()):
            if os.path.isfile(path + member.name[1:]):
                continue
            else:
                tar.extract(member, path)
                files_extracted += 1
        # The with-block closes the archive on exit, so no explicit tar.close() is needed
        if files_extracted < 3:
            print('Files already exist')

In [3]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [4]:
path = '/content/drive/MyDrive/Colab Notebooks/CatsNDogs/data/cadod/'

extract_tar('/content/drive/MyDrive/Colab Notebooks/CatsNDogs/data/cadod.tar.gz', path)

In [5]:
df = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/CatsNDogs/cadod.csv')

In [6]:
df.head()

Unnamed: 0,ImageID,Source,LabelName,Confidence,XMin,XMax,YMin,YMax,IsOccluded,IsTruncated,IsGroupOf,IsDepiction,IsInside,XClick1X,XClick2X,XClick3X,XClick4X,XClick1Y,XClick2Y,XClick3Y,XClick4Y
0,0000b9fcba019d36,xclick,/m/0bt9lr,1,0.165,0.90375,0.268333,0.998333,1,1,0,0,0,0.63625,0.90375,0.74875,0.165,0.268333,0.506667,0.998333,0.661667
1,0000cb13febe0138,xclick,/m/0bt9lr,1,0.0,0.651875,0.0,0.999062,1,1,0,0,0,0.3125,0.0,0.3175,0.651875,0.0,0.410882,0.999062,0.999062
2,0005a9520eb22c19,xclick,/m/0bt9lr,1,0.094167,0.611667,0.055626,0.998736,1,1,0,0,0,0.4875,0.611667,0.243333,0.094167,0.055626,0.226296,0.998736,0.305942
3,0006303f02219b07,xclick,/m/0bt9lr,1,0.0,0.999219,0.0,0.998824,1,1,0,0,0,0.508594,0.999219,0.0,0.478906,0.0,0.375294,0.72,0.998824
4,00064d23bf997652,xclick,/m/0bt9lr,1,0.240938,0.906183,0.0,0.694286,0,0,0,0,0,0.678038,0.906183,0.240938,0.522388,0.0,0.37,0.424286,0.694286


In [7]:
df.LabelName.unique()

array(['/m/0bt9lr', '/m/01yrx'], dtype=object)

# Create train_csv.csv

In [8]:
def label_img(row):
    if row['LabelName'] == '/m/0bt9lr':
        return 1
    if row['LabelName'] == '/m/01yrx':
        return 0


In [9]:
df2 = df.copy()  # work on a copy so that adding a column does not modify df
df2.head()

Unnamed: 0,ImageID,Source,LabelName,Confidence,XMin,XMax,YMin,YMax,IsOccluded,IsTruncated,IsGroupOf,IsDepiction,IsInside,XClick1X,XClick2X,XClick3X,XClick4X,XClick1Y,XClick2Y,XClick3Y,XClick4Y
0,0000b9fcba019d36,xclick,/m/0bt9lr,1,0.165,0.90375,0.268333,0.998333,1,1,0,0,0,0.63625,0.90375,0.74875,0.165,0.268333,0.506667,0.998333,0.661667
1,0000cb13febe0138,xclick,/m/0bt9lr,1,0.0,0.651875,0.0,0.999062,1,1,0,0,0,0.3125,0.0,0.3175,0.651875,0.0,0.410882,0.999062,0.999062
2,0005a9520eb22c19,xclick,/m/0bt9lr,1,0.094167,0.611667,0.055626,0.998736,1,1,0,0,0,0.4875,0.611667,0.243333,0.094167,0.055626,0.226296,0.998736,0.305942
3,0006303f02219b07,xclick,/m/0bt9lr,1,0.0,0.999219,0.0,0.998824,1,1,0,0,0,0.508594,0.999219,0.0,0.478906,0.0,0.375294,0.72,0.998824
4,00064d23bf997652,xclick,/m/0bt9lr,1,0.240938,0.906183,0.0,0.694286,0,0,0,0,0,0.678038,0.906183,0.240938,0.522388,0.0,0.37,0.424286,0.694286


In [10]:
df2['cdlabel'] = df2.apply(lambda row: label_img(row), axis=1)
df2['cdlabel'].unique()

array([1, 0])
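The row-wise `apply` above works, but the same label mapping can also be expressed as a dictionary lookup on the whole column. The following is a minimal sketch under stated assumptions: the `ImageID` values are made-up placeholders, and `LABEL_TO_CLASS` is a name introduced here for illustration. Only the two `LabelName` codes and their 1/0 targets come from the notebook.

```python
import pandas as pd

# The two LabelName codes and their targets, as used by label_img() above.
LABEL_TO_CLASS = {"/m/0bt9lr": 1, "/m/01yrx": 0}

# Tiny stand-in frame; the ImageID values are hypothetical placeholders.
demo = pd.DataFrame({
    "ImageID": ["img_a", "img_b", "img_c"],
    "LabelName": ["/m/0bt9lr", "/m/01yrx", "/m/0bt9lr"],
})

# Vectorized lookup: each code is replaced by its class index.
demo["cdlabel"] = demo["LabelName"].map(LABEL_TO_CLASS)

# Any code missing from the dictionary would show up as NaN here.
unmapped = int(demo["cdlabel"].isna().sum())
print(demo[["ImageID", "cdlabel"]])
print("unmapped rows:", unmapped)
```

Counting the unmapped rows is a cheap check that no unexpected label code slipped into the CSV.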

In [11]:
df3 = pd.concat([df2['ImageID'], df2['cdlabel']], axis=1, keys=['ImageID', 'label'])

In [12]:
df3.head()

Unnamed: 0,ImageID,label
0,0000b9fcba019d36,1
1,0000cb13febe0138,1
2,0005a9520eb22c19,1
3,0006303f02219b07,1
4,00064d23bf997652,1


In [13]:
df3['label'].unique()

array([1, 0])

In [14]:
df3.to_csv(r'train_csv.csv', index=False, header=True)

# PyTorch Implementation

In [15]:
# Configure device for GPU or CPU depending on what is available
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(device)

cuda:0


# Create Custom Dataset

In [17]:
train_file = 'train_csv.csv'
img_dir = '/content/drive/MyDrive/Colab Notebooks/CatsNDogs/data/cadod/'

In [18]:
class CustomDataset(Dataset):
    def __init__(self, root_dir, annotation_file, transform=None):
        self.root_dir = root_dir
        self.annotations = pd.read_csv(annotation_file)
        self.transform = transform

    def __len__(self):
        return len(self.annotations)

    def __getitem__(self, index):
        img_id = self.annotations.iloc[index, 0]
        img = Image.open(os.path.join(self.root_dir, img_id+'.jpg')).convert("RGB")
        y_label = torch.tensor(self.annotations.iloc[index, 1])
       
        if self.transform is not None:
            img = self.transform(img)

        return (img, y_label)


# Create dataset to calculate the Mean and Standard Deviation for Normalize()
*This ultimately went unused, but I did experiment with Normalize() to subtract the `mean_pixel` from the image tensors.*

In [19]:
calctransform = transforms.Compose([
    transforms.Resize((32, 32)),
    transforms.ToTensor()
])

In [20]:
calcdata = CustomDataset(root_dir=img_dir, annotation_file='train_csv.csv', transform=calctransform)

In [21]:
calcdata[1][0]

tensor([[[0.3608, 0.3843, 0.5412,  ..., 0.9961, 0.9961, 0.9961],
         [0.4078, 0.4902, 0.5765,  ..., 0.9961, 0.9961, 0.9961],
         [0.4863, 0.5373, 0.4118,  ..., 0.9961, 0.9961, 0.9961],
         ...,
         [0.6471, 0.5804, 0.3843,  ..., 0.9961, 0.9961, 0.9961],
         [0.5765, 0.5294, 0.2941,  ..., 0.9647, 0.9608, 0.9804],
         [0.4235, 0.3529, 0.2588,  ..., 0.9373, 0.9412, 0.9686]],

        [[0.3608, 0.3843, 0.5373,  ..., 1.0000, 1.0000, 1.0000],
         [0.4039, 0.4902, 0.5725,  ..., 1.0000, 1.0000, 1.0000],
         [0.4824, 0.5333, 0.4078,  ..., 1.0000, 1.0000, 1.0000],
         ...,
         [0.6431, 0.5765, 0.3843,  ..., 1.0000, 1.0000, 1.0000],
         [0.5725, 0.5255, 0.2941,  ..., 0.9686, 0.9647, 0.9843],
         [0.4196, 0.3490, 0.2588,  ..., 0.9451, 0.9412, 0.9725]],

        [[0.3529, 0.3765, 0.5294,  ..., 1.0000, 1.0000, 1.0000],
         [0.3961, 0.4824, 0.5647,  ..., 1.0000, 1.0000, 1.0000],
         [0.4745, 0.5255, 0.4000,  ..., 1.0000, 1.0000, 1.

In [22]:
calcloader = DataLoader(
    calcdata,
    batch_size=10,
    num_workers=1,
    shuffle=False,
    pin_memory=True
)


mean = 0.
std = 0.
nb_samples = 0.
for idx, (data, label) in enumerate(calcloader, 0):
    batch_samples = data.size(0)
    data = data.view(batch_samples, data.size(1), -1)
    mean += data.mean(2).sum(0)
    std += data.std(2).sum(0)
    nb_samples += batch_samples

mean /= nb_samples
std /= nb_samples

In [24]:
print("Mean of image tensors: ", mean)
print("Standard deviation of image tensors: ", std)

Mean of image tensors:  tensor([0.4708, 0.4266, 0.3788])
Standard deviation of image tensors:  tensor([0.2201, 0.2147, 0.2091])
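The loop above averages the per-image standard deviations. That is a common approximation, but it is not the same as the standard deviation taken over every pixel in the dataset. A possible alternative, sketched here on random stand-in batches rather than the real images, accumulates per-channel sums and sums of squares. The function name `channel_stats` and the fake data are assumptions made for illustration.

```python
import torch

def channel_stats(batches):
    """Per-channel mean and std over all pixels of all images.

    `batches` yields float tensors shaped (N, C, H, W).
    """
    total = None
    total_sq = None
    n_pixels = 0
    for data in batches:
        data = data.double()  # float64 keeps the running sums accurate
        if total is None:
            total = torch.zeros(data.size(1), dtype=torch.float64)
            total_sq = torch.zeros(data.size(1), dtype=torch.float64)
        total += data.sum(dim=(0, 2, 3))
        total_sq += (data ** 2).sum(dim=(0, 2, 3))
        n_pixels += data.size(0) * data.size(2) * data.size(3)
    mean = total / n_pixels
    var = total_sq / n_pixels - mean ** 2  # E[x^2] - E[x]^2
    std = var.clamp(min=0).sqrt()
    return mean, std

# Random stand-in data in place of the DataLoader output.
torch.manual_seed(0)
fake_batches = [torch.rand(10, 3, 32, 32) for _ in range(4)]
mean, std = channel_stats(fake_batches)
print("mean:", mean)
print("std: ", std)
```

With a real loader, the generator `(data for data, _ in calcloader)` could be passed in place of `fake_batches`.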


# Create datasets for use in model training and testing

In [25]:
# Initial data transformation to resized tensors
init_transform = transforms.Compose([
    transforms.Resize((32, 32)),
    transforms.ToTensor(),
    # transforms.Normalize((0.4708, 0.4266, 0.3788), (0.2201, 0.2147, 0.2091))
])

In [27]:
# Create dataset
all_data = CustomDataset(root_dir=img_dir, annotation_file='train_csv.csv', transform=init_transform)

In [28]:
batch_size = 64

num_train = int(len(all_data) * 0.8)

train_set, test_set = torch.utils.data.random_split(all_data, [num_train, len(all_data) - num_train])

# Make Train Loader:
train_dataloader = DataLoader(train_set, shuffle=True, batch_size=batch_size)
                                  
# Make test loader:
test_dataloader = DataLoader(test_set, shuffle=False, batch_size=batch_size)
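`random_split` draws a different partition on every run unless it is given a seeded generator, which makes test accuracies hard to compare between experiments. The sketch below shows the 80/20 arithmetic with a fixed seed. The dataset size of 12966 is inferred from the 10372 and 2594 counts that appear elsewhere in this notebook, and the seed value and the `TensorDataset` stand-in are arbitrary assumptions.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Assumed size: 10372 + 2594, the counts that appear in this notebook.
n_images = 12966
stand_in = TensorDataset(torch.arange(n_images))  # placeholder for all_data

num_train = int(n_images * 0.8)   # 80% for training, rounded down
num_test = n_images - num_train   # the remainder for testing

# A seeded generator makes the partition repeatable across runs.
generator = torch.Generator().manual_seed(42)  # seed value is arbitrary
train_set, test_set = random_split(
    stand_in, [num_train, num_test], generator=generator
)
print(len(train_set), len(test_set))
```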

# Define the Model

In [41]:
class Unit(nn.Module):
    def __init__(self, in_channels, out_channels):
        super(Unit, self).__init__()

        self.conv = nn.Conv2d(in_channels=in_channels, kernel_size=3, 
                              out_channels=out_channels, stride=1, padding=1)
        self.bn = nn.BatchNorm2d(num_features=out_channels)
        self.relu = nn.ReLU()

    def forward(self, input):
        output = self.conv(input)
        output = self.bn(output)
        output = self.relu(output)

        return output

class ImgNet(nn.Module):
    def __init__(self, num_classes):
        super(ImgNet, self).__init__()

        # Create 11 Unit layers with max pooling in between (units 7, 11 and 14 are commented out)
        self.unit1 = Unit(in_channels=3,out_channels=32)
        self.unit2 = Unit(in_channels=32, out_channels=32)
        self.unit3 = Unit(in_channels=32, out_channels=32)

        self.pool1 = nn.MaxPool2d(kernel_size=2)

        self.unit4 = Unit(in_channels=32, out_channels=64)
        self.unit5 = Unit(in_channels=64, out_channels=64)
        self.unit6 = Unit(in_channels=64, out_channels=64)
        # self.unit7 = Unit(in_channels=64, out_channels=64)

        self.pool2 = nn.MaxPool2d(kernel_size=2)

        self.unit8 = Unit(in_channels=64, out_channels=128)
        self.unit9 = Unit(in_channels=128, out_channels=128)
        self.unit10 = Unit(in_channels=128, out_channels=128)
        # self.unit11 = Unit(in_channels=128, out_channels=128)

        self.pool3 = nn.MaxPool2d(kernel_size=2)

        self.unit12 = Unit(in_channels=128, out_channels=128)
        self.unit13 = Unit(in_channels=128, out_channels=128)
        # self.unit14 = Unit(in_channels=128, out_channels=128)

        self.avgpool = nn.AvgPool2d(kernel_size=4)
        
        #Add all the units into the Sequential layer in exact order
        self.net = nn.Sequential(self.unit1, self.unit2, self.unit3, self.pool1, 
                                 self.unit4, self.unit5, self.unit6, 
                                 self.pool2, self.unit8, 
                                 self.unit9, self.unit10, self.pool3,
                                 self.unit12, self.unit13, self.avgpool)
        
        self.fc = nn.Linear(in_features=128, out_features=num_classes)

    def forward(self, input):
        output = self.net(input)
        output = output.view(-1,128)
        output = self.fc(output)
        return output
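The `view(-1, 128)` call in `forward` relies on the feature map having shrunk to 1x1 by the time it leaves `self.net`. The arithmetic can be checked with the usual output-size formula for convolution and pooling layers. This is a plain-Python sketch, and the helper `out_size` is a name introduced here.

```python
def out_size(size, kernel, stride, padding=0):
    """Spatial output size of a conv or pooling layer (floor mode, dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1

size = 32
trace = [size]
# Each entry: number of 3x3 Units (stride 1, padding 1), then the pool kernel.
# MaxPool2d and AvgPool2d use stride == kernel_size by default.
for n_units, pool_kernel in [(3, 2), (3, 2), (3, 2), (2, 4)]:
    for _ in range(n_units):
        size = out_size(size, kernel=3, stride=1, padding=1)  # size unchanged
    size = out_size(size, kernel=pool_kernel, stride=pool_kernel)
    trace.append(size)

print(trace)  # [32, 16, 8, 4, 1]
```

With a 64x64 input the same trace would end at 2x2, and `view(-1, 128)` would then fold the four spatial positions into the batch dimension without raising an error. If larger inputs are ever tried, `nn.AdaptiveAvgPool2d(1)` in place of the fixed `AvgPool2d` is one way to keep the 1x1 output.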


# Train Model

In [42]:
from torch.optim import Adam

cuda_avail = torch.cuda.is_available()

model = ImgNet(num_classes=2)

if cuda_avail:
    model.cuda()

optimizer = Adam(model.parameters(), lr=0.001, weight_decay=0.0001) # Need to try different lr values
loss_fn = nn.CrossEntropyLoss()

In [43]:
for name, param in model.named_parameters():
  print(name, '\t', param.shape)

unit1.conv.weight 	 torch.Size([32, 3, 3, 3])
unit1.conv.bias 	 torch.Size([32])
unit1.bn.weight 	 torch.Size([32])
unit1.bn.bias 	 torch.Size([32])
unit2.conv.weight 	 torch.Size([32, 32, 3, 3])
unit2.conv.bias 	 torch.Size([32])
unit2.bn.weight 	 torch.Size([32])
unit2.bn.bias 	 torch.Size([32])
unit3.conv.weight 	 torch.Size([32, 32, 3, 3])
unit3.conv.bias 	 torch.Size([32])
unit3.bn.weight 	 torch.Size([32])
unit3.bn.bias 	 torch.Size([32])
unit4.conv.weight 	 torch.Size([64, 32, 3, 3])
unit4.conv.bias 	 torch.Size([64])
unit4.bn.weight 	 torch.Size([64])
unit4.bn.bias 	 torch.Size([64])
unit5.conv.weight 	 torch.Size([64, 64, 3, 3])
unit5.conv.bias 	 torch.Size([64])
unit5.bn.weight 	 torch.Size([64])
unit5.bn.bias 	 torch.Size([64])
unit6.conv.weight 	 torch.Size([64, 64, 3, 3])
unit6.conv.bias 	 torch.Size([64])
unit6.bn.weight 	 torch.Size([64])
unit6.bn.bias 	 torch.Size([64])
unit8.conv.weight 	 torch.Size([128, 64, 3, 3])
unit8.conv.bias 	 torch.Size([128])
unit8.bn.weight 	

In [None]:
# Check for available GPU for modelNN:
if torch.cuda.is_available():
    model.cuda()

**Maybe a learning rate adjuster??**

In [32]:
# https://heartbeat.fritz.ai/basics-of-image-classification-with-pytorch-2f8973c51864

# # Create a learning rate adjustment function that divides the learning rate by 10 every 5 epochs, starting after epoch 10
# def adjust_learning_rate(epoch):
#     lr = 0.001

#     if epoch > 35:
#         lr = lr / 1000000
#     elif epoch > 30:
#         lr = lr / 100000
#     elif epoch > 25:
#         lr = lr / 10000
#     elif epoch > 20:
#         lr = lr / 1000
#     elif epoch > 15:
#         lr = lr / 100
#     elif epoch > 10:
#         lr = lr / 10

#     for param_group in optimizer.param_groups:
#         param_group["lr"] = lr
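A step schedule like the commented-out function can also be written with PyTorch's built-in `StepLR` scheduler. The sketch below uses a tiny stand-in model and no real training, purely to show how the learning rate changes per epoch. Unlike the function above, it starts decaying at epoch 5 rather than after epoch 10, so it is an approximation of that idea and not a drop-in replacement.

```python
import torch
from torch import nn
from torch.optim import Adam
from torch.optim.lr_scheduler import StepLR

model = nn.Linear(4, 2)  # stand-in for ImgNet
optimizer = Adam(model.parameters(), lr=0.001, weight_decay=0.0001)

# Multiply the learning rate by 0.1 every 5 epochs.
scheduler = StepLR(optimizer, step_size=5, gamma=0.1)

lrs = []
for epoch in range(12):
    lrs.append(optimizer.param_groups[0]["lr"])
    # ... one epoch of training would run here ...
    optimizer.step()   # does nothing here because no gradients were computed
    scheduler.step()   # advance the schedule once per epoch

for epoch, lr in enumerate(lrs):
    print(epoch, lr)
```

In the `train()` loop, the `scheduler.step()` call would sit where the commented-out `adjust_learning_rate(epoch)` call is.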


# Test Function

In [44]:
def save_models(epoch):
    torch.save(model.state_dict(), "cadodModel_{}.model".format(epoch))
    print("Checkpoint saved")
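`save_models` writes only the `state_dict`, so restoring a checkpoint means building the model first and then loading the weights into it. Here is a minimal round-trip sketch with a small stand-in module and a temporary directory. In the notebook the model would be `ImgNet(num_classes=2)` and the path one of the saved `cadodModel_{epoch}.model` files.

```python
import os
import tempfile

import torch
from torch import nn

net = nn.Linear(4, 2)  # stand-in for the trained ImgNet

with tempfile.TemporaryDirectory() as tmp:
    ckpt_path = os.path.join(tmp, "cadodModel_0.model")
    torch.save(net.state_dict(), ckpt_path)

    # Rebuild the same architecture, then load the saved weights into it.
    restored = nn.Linear(4, 2)
    state = torch.load(ckpt_path, map_location="cpu")  # also works without a GPU
    restored.load_state_dict(state)

restored.eval()  # switch to inference mode before predicting
print(torch.equal(net.weight, restored.weight))
```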

In [45]:
def test():
    model.eval()
    test_acc = 0.0
    # Gradients are not needed for evaluation
    with torch.no_grad():
        for images, labels in test_dataloader:
            # Move the batch to the GPU if available
            if cuda_avail:
                images = images.cuda()
                labels = labels.cuda()

            outputs = model(images)
            _, prediction = torch.max(outputs, 1)

            test_acc += torch.sum(prediction == labels).item()

    # Divide by the actual test-set size instead of a hard-coded 2594
    test_acc = test_acc / len(test_set)

    return test_acc

# Train Function

In [47]:
# https://heartbeat.fritz.ai/basics-of-image-classification-with-pytorch-2f8973c51864

def train(num_epochs):
    best_acc = 0.0

    for epoch in range(num_epochs):
        model.train()
        train_acc = 0.0
        train_loss = 0.0
        for i, (images, labels) in enumerate(train_dataloader):
            # Move images and labels to gpu if available (Variable is no longer needed)
            if cuda_avail:
                images = images.cuda()
                labels = labels.cuda()

            # Clear all accumulated gradients
            optimizer.zero_grad()
            # Predict classes using images from the training set
            outputs = model(images)
            # Compute the loss based on the predictions and actual labels
            loss = loss_fn(outputs, labels)
            # Backpropagate the loss
            loss.backward()

            # Adjust parameters according to the computed gradients
            optimizer.step()

            train_loss += loss.item() * images.size(0)
            _, prediction = torch.max(outputs.data, 1)
            
            train_acc += torch.sum(prediction == labels.data)

        # Call the learning rate adjustment function
        # adjust_learning_rate(epoch)

        # Compute the average acc and loss over all 10372 training images
        train_acc = train_acc / num_train
        train_loss = train_loss / num_train

        # Evaluate on the test set
        test_acc = test()

        # Save the model if the test acc is greater than our current best
        if test_acc > best_acc:
            save_models(epoch)
            best_acc = test_acc

        # Print the metrics
        print("Epoch {}, Train Accuracy: {} , TrainLoss: {} , Test Accuracy: {}, Best Accuracy: {}".format(epoch,
                                                                                        train_acc,
                                                                                        train_loss,
                                                                                        test_acc,
                                                                                        best_acc))
                        

# Run Tests

In [48]:
if __name__ == '__main__':
    train(35)



Checkpoint saved
Epoch 0, Train Accuracy: 0.5468569397926331 , TrainLoss: 0.6885741949081421 , Test Accuracy: 0.5666923522949219, Best Accuracy: 0.5666923522949219
Checkpoint saved
Epoch 1, Train Accuracy: 0.6025838851928711 , TrainLoss: 0.6654961705207825 , Test Accuracy: 0.5767154693603516, Best Accuracy: 0.5767154693603516
Epoch 2, Train Accuracy: 0.6386424899101257 , TrainLoss: 0.6384814381599426 , Test Accuracy: 0.5481880903244019, Best Accuracy: 0.5767154693603516
Checkpoint saved
Epoch 3, Train Accuracy: 0.667759358882904 , TrainLoss: 0.6104883551597595 , Test Accuracy: 0.6723207235336304, Best Accuracy: 0.6723207235336304
Checkpoint saved
Epoch 4, Train Accuracy: 0.6932125091552734 , TrainLoss: 0.5798431038856506 , Test Accuracy: 0.6838858723640442, Best Accuracy: 0.6838858723640442
Epoch 5, Train Accuracy: 0.7147126793861389 , TrainLoss: 0.5567981004714966 , Test Accuracy: 0.6808018088340759, Best Accuracy: 0.6838858723640442
Epoch 6, Train Accuracy: 0.7321635484695435 , Train

# Brief Discussion

Test accuracy topped out at about **75%** in most experiments. It would be interesting to try further regularization measures. I tried `Dropout`, but that seemed to make the testing loss erratic.
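For reference, one place `Dropout` could sit is between the pooled features and the final linear layer. The sketch below is a hypothetical head (`DropoutHead` is a name introduced here, and `p=0.5` is an arbitrary choice), not the configuration that was actually tried. It also shows why evaluation mode matters: dropout is random under `train()` and switched off under `eval()`.

```python
import torch
from torch import nn

class DropoutHead(nn.Module):
    """Hypothetical classifier head: dropout before the final linear layer."""

    def __init__(self, in_features=128, num_classes=2, p=0.5):
        super().__init__()
        self.dropout = nn.Dropout(p=p)
        self.fc = nn.Linear(in_features, num_classes)

    def forward(self, features):
        return self.fc(self.dropout(features))

torch.manual_seed(0)
head = DropoutHead()
features = torch.randn(8, 128)  # stand-in for the flattened 128-d features

head.eval()                     # dropout disabled: outputs are repeatable
eval_a = head(features)
eval_b = head(features)

head.train()                    # dropout active: outputs vary between calls
train_a = head(features)
train_b = head(features)

print(torch.equal(eval_a, eval_b), torch.equal(train_a, train_b))
```

The `test()` function in this notebook already calls `model.eval()`, so this is only one thing to rule out if the erratic behaviour is revisited.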