# Dogs Vs Cats with Torch

I made this notebook to learn PyTorch, since I’m moving from TensorFlow to PyTorch.

# Imports

* `zipfile` - To extract the train and test images from ZIP files
* `time` - Timestamp each metric saved to disk
* `os` - Navigate through the files
* `random` - Select a random number for a seed
* `numpy` - For shuffling and math in general
* `pandas` - Create submission CSV
* `shutil` - Move images
* `PIL.Image` - Load evaluation images
* `collections` - Create nested dictionaries
* `tqdm` - Fancy progressbars
* `torch.utils.data.DataLoader` - Create batches in an easy way
* `torch.utils.data.Dataset` - Create a custom Dataset for evaluation
* `torch.utils.data.sampler.SubsetRandomSampler` - Choose samples from a subset of indices
* `torchvision.datasets` - Load images and labels from a root directory
* `torchvision.transforms` - Apply transformations on a given dataset
* `torchvision.models` - Load pretrained models
* `torchvision.utils.make_grid` - Plot multiple images
* `torch` - General methods from PyTorch
* `torch.nn` - Modules to build neural net layers
* `torch.nn.functional` - Methods like activation functions
* `torch.optim` - Optimizers
* `matplotlib.pyplot` - General plotting
* `matplotlib.style` - Change the plot style

In [None]:
import zipfile
import time
import os
import random
import numpy as np
import pandas as pd
import shutil
from PIL import Image
import collections
from tqdm import tqdm
from torch.utils.data import DataLoader, Dataset
from torch.utils.data.sampler import SubsetRandomSampler
from torchvision import datasets, transforms, models
from torchvision.utils import make_grid
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import matplotlib.pyplot as plt
from matplotlib import style
%matplotlib inline

If we re-run the notebook without restarting the kernel, `metrics.log` stays on disk and the new metrics are appended to the ones from the previous run.

To prevent that, let's remove it.

In [None]:
# -f keeps rm quiet when the file does not exist yet (e.g. on the first run)
%rm -f '/kaggle/working/metrics.log'

Choose the device the model runs on during training, validation and evaluation.

In [None]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# DataLoader

Unzip the train and test sets into the `/kaggle/working/` directory.

In [None]:
# Extract the ZIPs
with zipfile.ZipFile("../input/dogs-vs-cats/train.zip", "r") as unzip:
    unzip.extractall(".")

    
with zipfile.ZipFile("../input/dogs-vs-cats/test1.zip", "r") as unzip:
    unzip.extractall(".")

Create another directory with 2 folders inside (one folder per class/label). Folders that already exist are not created again.

In [None]:
os.makedirs("my_train/dogs", exist_ok=True)
os.makedirs("my_train/cats", exist_ok=True)

Here the images from `train` are moved to the matching label folder inside `my_train`.

Each image name starts with its label (`cat` or `dog`), so it's easy to tell which folder an image belongs in.

In [None]:
root = "train"
imgs = next(os.walk(root))[2]  # file names in the top level of root
folders = {
    "cat": "my_train/cats",
    "dog": "my_train/dogs"
}

for img in tqdm(imgs):
    label = img.split(".")[0]
    
    old_path = os.path.join(root, img)
    new_path = os.path.join(folders[label], img)
    
    shutil.move(old_path, new_path)
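
The moving step can be tried safely on throwaway files before touching the real data. This is only a sketch: `sort_by_prefix` is a hypothetical helper (not part of the notebook), the files are empty placeholders in a temporary directory, and I assume every real image is named like `cat.0.jpg` or `dog.0.jpg`.

```python
import os
import shutil
import tempfile

def sort_by_prefix(src_dir, folders):
    """Move each file in src_dir into folders[<text before the first dot>]."""
    moved = 0
    for name in next(os.walk(src_dir))[2]:      # file names in the top level only
        label = name.split(".")[0]              # "cat.0.jpg" -> "cat"
        if label not in folders:                # leave unexpected files alone
            continue
        os.makedirs(folders[label], exist_ok=True)
        shutil.move(os.path.join(src_dir, name),
                    os.path.join(folders[label], name))
        moved += 1
    return moved

# Self-check on empty placeholder files
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "train")
    os.makedirs(src)
    for name in ["cat.0.jpg", "dog.0.jpg", "dog.1.jpg", "notes.txt"]:
        open(os.path.join(src, name), "w").close()

    folders = {"cat": os.path.join(tmp, "cats"), "dog": os.path.join(tmp, "dogs")}
    n_moved = sort_by_prefix(src, folders)
    n_cats = len(os.listdir(folders["cat"]))
    n_dogs = len(os.listdir(folders["dog"]))
    leftovers = os.listdir(src)
```

Unlike the loop above, this version skips files whose prefix is not a known label instead of raising a `KeyError`.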
    
    

Here the images from `my_train` are transformed and split into 2 datasets, `train` and `val`: one for training and one for validation during training.

You are probably asking: "Why those specific values in the `Normalize` transform?" They are the per-channel mean and standard deviation of the ImageNet dataset, estimated from millions of images. The pretrained models were trained on images normalized this way, so they are a good choice here too.

And why 2 arrays with 3 values each?

The first array is the mean and the second array is the standard deviation. There are 3 values because the images are normalized along the channel dimension, and RGB images have 3 channels (one value per channel).
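
To make the "one value per channel" idea concrete, here is a small NumPy sketch of the same arithmetic. It is an illustration under my own assumptions (a random 4x4 fake image in channels-first layout), not the torchvision implementation.

```python
import numpy as np

mean = np.array([0.485, 0.456, 0.406])
std = np.array([0.229, 0.224, 0.225])

# Fake image in (C, H, W) layout with values in [0, 1], like ToTensor produces
rng = np.random.default_rng(0)
img = rng.random((3, 4, 4))

# Reshape the statistics to (3, 1, 1) so each one broadcasts over its own channel
normalized = (img - mean[:, None, None]) / std[:, None, None]

# The red channel alone, computed by hand for comparison
red_expected = (img[0] - 0.485) / 0.229
```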

In [None]:
# Image size
IMG_SIZE = 260

# Transformations
data_transforms = transforms.Compose([
    transforms.Resize((IMG_SIZE, IMG_SIZE)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
    transforms.RandomHorizontalFlip(0.5)
])

# Load the images and labels
dataset = datasets.ImageFolder(root="my_train", transform=data_transforms)

# Get the name for each numeric label
classes = dataset.classes

# 20% of the train dataset will be used for
# validation
val_split = 0.2
dataset_size = len(dataset)
idxs = list(range(dataset_size))

# Number of validation images
split = int(val_split * dataset_size)

# Set a random seed and shuffle the ids
np.random.seed(random.randint(0, 99999))
np.random.shuffle(idxs)

dataset_size = {
    "train": dataset_size - split,
    "val": split
}

# Set data samplers to select N unique images for
# each dataset
train_sampler = SubsetRandomSampler(idxs[:-split])
val_sampler = SubsetRandomSampler(idxs[-split:])
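
The index split can be checked on its own, without PyTorch. The sketch below uses a hypothetical helper, `split_indices`, and made-up sizes. It slices with explicit positions because `idxs[:-split]` returns an empty list when `split` is 0, which could happen with a very small dataset.

```python
import numpy as np

def split_indices(n_items, val_split, seed):
    """Shuffle 0..n_items-1 and cut off the last val_split fraction for validation."""
    idxs = np.arange(n_items)
    rng = np.random.default_rng(seed)
    rng.shuffle(idxs)

    n_val = int(val_split * n_items)
    cut = n_items - n_val                 # explicit cut point, safe when n_val == 0
    return idxs[:cut], idxs[cut:]

train_idx, val_idx = split_indices(1000, 0.2, seed=42)
tiny_train, tiny_val = split_indices(3, 0.2, seed=0)   # int(0.6) == 0 validation items
```

The two index sets never overlap, so no image is used for both training and validation.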

Create 2 dataloaders over the same dataset. Each dataloader draws from its own subset of unique images, served in batches.

In [None]:
dataloaders = {
    "train": DataLoader(dataset, batch_size=64, sampler=train_sampler),
    "val": DataLoader(dataset, batch_size=64, sampler=val_sampler),
}

Just a simple function to show 4 images from the train dataloader.

Some people don't understand why we need these lines:\
`mean = np.array([0.485, 0.456, 0.406])
std = np.array([0.229, 0.224, 0.225])
inp = std * inp + mean
inp = np.clip(inp, 0, 1)`

If you're reading the markdown cells, you know that we use the mean and the standard deviation to normalize the images, which explains the first 2 lines. Now, why is `inp` set to `std * inp + mean`?

In statistics there is a quantity called the ***standard score*** or ***z-score***, and we can use it to normalize values (provided we know the mean and std of the population). The expression is:\
$Z = \frac{X - \mu}{\sigma}$, where $X$ is our image, $\mu$ the mean and $\sigma$ the std.

More info about that [here](https://en.wikipedia.org/wiki/Standard_score)

Our images were normalized with that expression, so matplotlib will not plot them as we expect. To "unnormalize" we solve the ***z-score*** expression for $X$:\
$X = Z \cdot \sigma + \mu$

As you can see, every operation from the previous expression is reversed, and in code that becomes:\
`std * inp + mean`

Now, what does the `clip()` method do? It limits all the values to an interval, in this case [0, 1]. If the "unnormalize" step produces values < 0 or > 1 (for example through rounding error), they are set to exactly 0 (if value < 0) or exactly 1 (if value > 1).
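
A quick round trip in NumPy shows both steps at work. This is a sketch on a random fake image in (H, W, C) layout, which is the layout matplotlib expects; the array names are mine.

```python
import numpy as np

mean = np.array([0.485, 0.456, 0.406])
std = np.array([0.229, 0.224, 0.225])

rng = np.random.default_rng(0)
original = rng.random((4, 4, 3))      # values in [0, 1], channels last

z = (original - mean) / std           # normalize: z-score per channel
restored = std * z + mean             # unnormalize: solve the z-score for X

# clip leaves values inside [0, 1] alone and pins the rest to the bounds
clipped = np.clip(np.array([-0.2, 0.0, 0.5, 1.0, 1.3]), 0, 1)
```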

In [None]:
def imshow(inp, title=None):
    inp = inp.numpy().transpose((1,2,0))
    mean = np.array([0.485, 0.456, 0.406])
    std = np.array([0.229, 0.224, 0.225])
    inp = std * inp + mean
    inp = np.clip(inp, 0, 1)
    plt.grid(False)
    plt.imshow(inp)
    
    if title is not None:
        plt.title(title)
    
    plt.pause(0.001)

    
features, labels = next(iter(dataloaders["train"]))
features = features[:4]
labels = labels[:4]
out = make_grid(features)

imshow(out, title=[classes[x] for x in labels])

# Test DataLoader
This custom dataset collects all the file names in the `test1` directory. When we loop through it, it reads each image, applies the transformations (if any were passed) and returns the image together with its name without the extension (which will be useful when creating the submission CSV).

In [None]:
class TestDataset(Dataset):
    
    def __init__(self, root_dir, transform=None):
        self.root_dir = root_dir
        self.paths = next(os.walk(root_dir))[2]
        self.transform = transform
    
    def __len__(self):
        return len(self.paths)
    
    def __getitem__(self, idx):
        
        if torch.is_tensor(idx):
            idx = idx.tolist()
        
        img_path = os.path.join(self.root_dir, self.paths[idx])
        
        img = Image.open(img_path)
        
        if self.transform is not None:
            img = self.transform(img)
        
        return img, self.paths[idx].split(".")[0]

Create a dataloader with the custom dataset. The transforms are the same, except that `RandomHorizontalFlip` is left out, because the evaluation images shouldn't be augmented.

In [None]:
test_dataset = TestDataset(root_dir="./test1", transform=transforms.Compose([
    transforms.Resize((IMG_SIZE, IMG_SIZE)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
]))

test_dataloader = DataLoader(test_dataset, batch_size=64)

In [None]:
imgs, _ = next(iter(test_dataloader))
imgs = imgs[:4]
out = make_grid(imgs)
imshow(out)

# Train Function

The most important function: this is where training and validation happen.

In [None]:
def train(model, loss_fn, optimizer, num_epochs=5, model_name="model", lr_scheduler=None):
    
    for epoch in range(num_epochs):
        print("Epoch {}/{}\n".format(epoch+1, num_epochs))
        
        for step in dataloaders:
            
            # Training mode (dropout is active)
            if step == "train":
                model.train()
            
            # Evaluation mode (dropout is disabled)
            else:
                model.eval()
            
            # Loss
            l = 0
            # Accuracy
            acc = 0
            
            for X_batch, y_batch in tqdm(dataloaders[step]):
                X_batch, y_batch = X_batch.to(device), y_batch.to(device)
                
                # Zero the gradients: each batch should have its own gradient
                optimizer.zero_grad()
                
                # Forward pass
                # Gradients are tracked only during the training step
                with torch.set_grad_enabled(step == "train"):
                    preds = model(X_batch)
                    
                    # torch.max returns a tuple with the max values and their indices.
                    # I only want the indices: the model outputs one unnormalized
                    # score (logit) per label, which is what CrossEntropyLoss expects,
                    # and the index of the highest score is the predicted label
                    _, max_idxs = torch.max(preds, dim=1)
                    
                    
                    # Calculate errors
                    loss = loss_fn(preds, y_batch)
                    
                    # The actual training happens here
                    if step == "train":
                        ## L2 regularization 
                        ## (not worth it with my own model)
                        #l2_factor = 0.005
                        #l2_reg = l2_factor * sum((w**2).sum() for w in model.parameters())
                        #loss = loss + l2_reg
                        
                        # Backprop + weights update
                        loss.backward()
                        optimizer.step()
                    
                # Running metrics
                l += loss.item() * X_batch.size(0)
                acc += torch.sum(max_idxs == y_batch.data)
                
                # Write on file the batch metrics
                with open(model_name+".log", "a") as f:
                    f.write("{},{},{},{}\n".format(
                        round(time.time(), 3),
                        step,
                        torch.sum(max_idxs == y_batch.data).item() / y_batch.size(0),
                        loss.item()
                    ))
                
            # Calculate the average metrics per epoch
            epoch_loss = l / dataset_size[step]
            epoch_acc = acc.double() / dataset_size[step]
            
            # Learning rate update
            # We can change the learning rate in each epoch
            # using learning rate scheduler
            if step == "train" and lr_scheduler is not None:
                lr_scheduler.step()
            
            print("{} Acc: {:.3f} - Loss: {:.3f}".format(step, epoch_acc, epoch_loss))
        print()
    
    return model
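
One detail in the running metrics is easy to miss: the batch loss is multiplied by the batch size before it is summed. `CrossEntropyLoss` returns the mean over the batch by default, so this turns it back into a per-sample sum, and dividing by the number of samples at the end gives the true epoch average even when the last batch is smaller. Here is a plain-Python sketch with made-up numbers:

```python
batch_sizes = [64, 64, 32]            # the last batch is smaller
batch_losses = [0.50, 0.30, 0.90]     # mean loss inside each batch (made-up)
batch_correct = [48, 56, 16]          # correct predictions per batch (made-up)

n_samples = sum(batch_sizes)

# Weighted by batch size, as in the train function
epoch_loss = sum(l * n for l, n in zip(batch_losses, batch_sizes)) / n_samples

# Plain mean over batches: gives the small batch too much weight
naive_loss = sum(batch_losses) / len(batch_losses)

epoch_acc = sum(batch_correct) / n_samples
```

With these numbers the weighted loss comes out lower than the naive one, because the high-loss batch is also the smallest.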

# Custom Model

Here is my test cell, where I tried to write some custom models, but they didn't work well.

**You can skip to the next cell, because this model will not be used.**

In [None]:
class CNN(nn.Module):
    def __init__(self):
        
        super().__init__()
        
        self.conv1 = nn.Conv2d(3, 64, 5, stride=(2,2))
        self.res_conv1 = None
        self.conv2 = nn.Conv2d(64, 64, 3, padding=3//2)
        self.conv3 = nn.Conv2d(64, 64, 3, padding=3//2)
        self.res_conv3 = None
        self.conv4 = nn.Conv2d(64, 64, 3, padding=3//2)
        self.conv5 = nn.Conv2d(64, 64, 3, padding=3//2)
        
         
        
        x = torch.randn(3, IMG_SIZE, IMG_SIZE).view(-1, 3, IMG_SIZE, IMG_SIZE)
        self.get_flatten = None
        
        # Get the number of flattened features after the last conv. layer
        self.forward_convolutions(x)
        
        self.dropout = nn.Dropout(0.5)
        
        self.fc1 = nn.Linear(self.get_flatten, 2)
    
    def forward_convolutions(self, X):
        X = F.relu( self.conv1(X) )
        self.res_conv1 = X
        
        X = F.relu( self.conv2(X) )
        X = F.relu( self.conv3(X) )
        X = self.res_conv1 + X
        self.res_conv3 = X
        
        X = F.relu( self.conv4(X) )
        X = F.relu( self.conv5(X) )
        X = self.res_conv3 + X
        
        
        # If we are still looking for the number of
        # features after the last convolution
        if self.get_flatten is None:
            self.get_flatten = 1

            for sz in X.size()[1:]:
                self.get_flatten *= sz
        
        
        return X
    
    def forward(self, X):
        X = self.forward_convolutions(X)
        
        X = X.view(-1, self.get_flatten)
        X = self.dropout(X)
        
        return self.fc1(X)

    
    def calc_outputs(self):
        # Calculate the output size for
        # each Conv2d layer
        
        conv_idx = 1
        
        out_h_prev = out_w_prev = IMG_SIZE
        
        for layer in self.children():
            if isinstance(layer, nn.Conv2d):
                inp = (out_h_prev, out_w_prev)
                out = layer.out_channels
                k = layer.kernel_size
                s = layer.stride
                p = layer.padding

                out_h = ( ( inp[0] + (2*p[0]) - k[0] ) // s[0] ) + 1
                out_w = ( ( inp[1] + (2*p[1]) - k[1] ) // s[1] ) + 1 
                
                print("Output Convolution Layer {} = ({},{})".format(
                    conv_idx,
                    out_h, # this model has no pooling layers, so the
                    out_w  # feature map size is printed as computed
                ))
                
                out_h_prev, out_w_prev = out_h, out_w
                
                conv_idx += 1

**Just a test for checking the final output of our feature extractor**

In [None]:
#test = CNN()

#test.calc_outputs()
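
The same output-size formula can be followed by hand for the layers of `CNN`. This is a sketch under my own assumptions (square inputs, dilation 1, no pooling), and `conv_out` is a hypothetical helper.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output width/height of a convolution on a square input (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1

IMG_SIZE = 260

# conv1: 5x5 kernel, stride 2, no padding
size = conv_out(IMG_SIZE, kernel=5, stride=2)

# conv2..conv5: 3x3 kernel, stride 1, padding 1, so the size is preserved
for _ in range(4):
    size = conv_out(size, kernel=3, stride=1, padding=1)

# Every conv layer has 64 output channels
flatten = 64 * size * size
```

The flattened feature count is large because nothing shrinks the feature map after the first convolution.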


# Train

Here I tested different parameters for my own model and for the pretrained resnet18 model.

What I found:
* Dropout with 50% of the neurons temporarily "dead" decreases the validation loss
* The Adam optimizer with a starting learning rate of 0.001 converges pretty fast
* A step learning rate schedule with gamma = 5 that changes the learning rate only once, after the 5th epoch, helps to increase the accuracy and decrease the loss on the validation set
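
As far as I understand `StepLR`, it multiplies the learning rate by `gamma` once every `step_size` scheduler steps. The plain-Python sketch below reproduces that rule without PyTorch, to show what different `gamma` values do. `step_lr` is a hypothetical helper, not a library function.

```python
def step_lr(base_lr, gamma, step_size, epoch):
    """Learning rate after `epoch` scheduler steps under a step schedule."""
    return base_lr * gamma ** (epoch // step_size)

# gamma > 1 raises the learning rate after every step_size epochs
grow = [step_lr(0.001, 5, 5, epoch) for epoch in range(10)]

# gamma < 1 lowers it; 0.2 divides the learning rate by 5
decay = [step_lr(0.001, 0.2, 5, epoch) for epoch in range(10)]
```

So with 10 epochs and `step_size=5`, the first 5 epochs run at the base rate and the last 5 at the base rate times `gamma`.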

In [None]:
## After some tests I found that 10 epochs are enough for the resnet18 model
model_names = {
    #"model_cnn_3conv,3k,1epoch": 1,
    #"model_cnn_3conv,3k,3epoch": 3,
    #"model_resnet18,5epoch": 5,
    "model_resnet18,10epoch": 10,
}

for k,v in model_names.items():
    ## Transfer Learning
    model = models.resnet18(pretrained=True)
    n_features = model.fc.in_features
    
    # Freeze layers, except our new Linear layer
    for layer in model.parameters():
        layer.requires_grad = False
    
    model.fc = nn.Sequential(collections.OrderedDict([
        ("fc_dropout1", nn.Dropout(0.5)),
        ("fc_softmax", nn.Linear(n_features, 2))
    ]))
    model = model.to(device)
    
    
    ## Own Model
    #model = CNN().to(device)
    
    # Loss function
    loss_fn = nn.CrossEntropyLoss()

    # Optimizer
    optimizer = optim.Adam(model.parameters(), lr=0.001)

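    ## StepLR multiplies the LR by gamma every step_size epochs.
    ## Note: gamma=5 raises the LR 5x after the 5th epoch; to decay
    ## it by a factor of 5 instead, gamma would have to be 0.2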
    exp_lr_scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=5)
    model = train(model, loss_fn, optimizer, num_epochs=v, 
                  model_name=k, lr_scheduler=exp_lr_scheduler)
    
    ## When we have multiple model_names it's good to delete
    ## the model object, to avoid reusing the previously trained
    ## weights
    #del model

# Visualize metrics

Here a big dictionary is created to hold all the train and validation accuracies and losses (the ones we saved to a file during training/validation).

In [None]:
nested_dict = lambda: collections.defaultdict(nested_dict)

models_metrics = nested_dict()

for k,v in model_names.items():

    train_accs = []
    train_losses =[]
    val_accs = []
    val_losses = []
    losses_removed = 0
    with open(k+".log") as f:
        for line in f:
            # Each line is "timestamp,step,accuracy,loss"
            _, step, acc, loss = line.strip().split(",")
            acc, loss = float(acc), float(loss)
            
            # The model does really badly on some batches.
            # To keep the plot with values between 0 and 1
            # I drop every batch whose loss is greater than 1
            if loss > 1.0:
                losses_removed += 1
                continue

            if step == "train":
                train_accs.append(acc)
                train_losses.append(loss)
            else:
                val_accs.append(acc)
                val_losses.append(loss)
    
    models_metrics[k]["train_accs"] = train_accs
    models_metrics[k]["train_losses"] = train_losses
    models_metrics[k]["val_accs"] = val_accs
    models_metrics[k]["val_losses"] = val_losses
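
The parsing step can be tested without running any training by feeding it a few fake lines in the same `timestamp,step,accuracy,loss` format that `train` writes. This is only a sketch: `parse_log` is a hypothetical helper and the numbers are invented.

```python
import io

def parse_log(lines, max_loss=1.0):
    """Group batch metrics by step, dropping batches whose loss exceeds max_loss."""
    metrics = {"train": {"acc": [], "loss": []},
               "val": {"acc": [], "loss": []}}
    skipped = 0
    for line in lines:
        _, step, acc, loss = line.strip().split(",")
        acc, loss = float(acc), float(loss)
        if loss > max_loss:
            skipped += 1
            continue
        metrics[step]["acc"].append(acc)
        metrics[step]["loss"].append(loss)
    return metrics, skipped

# A file-like object standing in for the real log file
fake_log = io.StringIO(
    "1589000000.123,train,0.75,0.52\n"
    "1589000001.456,train,0.50,1.80\n"
    "1589000002.789,val,0.90,0.25\n"
)
metrics, skipped = parse_log(fake_log)
```

Note that dropping a batch removes its accuracy as well as its loss, so the plotted accuracy is also computed without the worst batches.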

## Smooth functions

Because the batch metrics can be really unstable, I tried to implement 2 methods to smooth the plotted lines.

The **Hanning Window** is a function used in signal processing to smooth values. It is defined as:\
$w(n) = \frac{1}{2} - \frac{1}{2}\cos\left(\frac{2\pi n}{M-1}\right)$, where $n$ is the index of the current value ($0 \le n \le M-1$) and $M$ is the total number of values.

If we multiply each value by its window weight, it becomes numerically smaller, since the weights lie in [0, 1] and fall to 0 at both ends of the series. Strictly speaking this tapers the series rather than smoothing it; smoothing a noisy curve usually means convolving it with a normalized window.
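
For comparison, here is a sketch of the convolution variant, which is a different technique from the element-wise multiplication described above. It uses `np.hanning` and `np.convolve`; the helper name `hann_smooth`, the window length of 11 and the fake noisy series are my own choices.

```python
import numpy as np

def hann_smooth(values, window_len=11):
    """Smooth a 1-D series by convolving it with a normalized Hann window."""
    values = np.asarray(values, dtype=float)
    if window_len < 3 or len(values) < window_len:
        return values.copy()              # too short to smooth meaningfully

    window = np.hanning(window_len)
    window /= window.sum()                # weights sum to 1, so the scale is kept
    return np.convolve(values, window, mode="same")

# Fake noisy metric around 0.5
rng = np.random.default_rng(0)
noisy = 0.5 + 0.1 * rng.standard_normal(200)
smooth = hann_smooth(noisy)

# A constant series should stay constant away from the edges
flat = hann_smooth(np.full(50, 0.8))
```

The first and last few points are pulled toward zero, because `mode="same"` pads the series with zeros; they can simply be left out of the plot.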


The **Moving Average** is basically a mean that takes all the previous values into account, with older values weighing less and less. So when we reach the last value, the first value barely affects the average. I tried to implement the version used in TensorBoard, but it didn't work (probably because I need to set the exponents for each value). So I gave up and used the Hanning window.
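
Below is a minimal sketch of an exponential moving average with bias correction, which is where the exponents come in. I believe TensorBoard's smoothing works along these lines, but treat that as an assumption; `ema` and its parameters are hypothetical names.

```python
def ema(values, weight=0.6, debias=True):
    """Exponential moving average; weight in [0, 1), higher means smoother."""
    smoothed = []
    last = 0.0
    for i, value in enumerate(values, start=1):
        # Mix the previous average with the new value
        last = last * weight + (1 - weight) * value
        # Starting from 0 biases early values toward 0; dividing by
        # 1 - weight**i compensates for that
        correction = 1 - weight ** i if debias else 1.0
        smoothed.append(last / correction)
    return smoothed

constant = ema([0.8] * 5)                       # a constant series should stay constant
steps = ema([0.0, 1.0, 1.0, 1.0], weight=0.5)   # a jump is followed gradually
```

Note the product `(1 - weight) * value` in the update; using a sum there makes the output grow without bound.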

In [None]:
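def hanning_window(metric):
    M = len(metric)
    for n in range(M):
        # Window weight for index n: 0 at both ends, 1 in the middle
        # (a single value is left unchanged to avoid dividing by zero)
        weight = 0.5 - 0.5 * np.cos( (2*np.pi*n) / (M-1) ) if M > 1 else 1.0
        metric[n] *= weight

    return metric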


def moving_average(metric, w):
    
    last_smooth = metric[0]
    smoothed_metric = []
    
    for i, p in enumerate(metric):
        if i == 0:
            smooth = p
        else:
            # Weighted mix of the previous average and the new value
            smooth = last_smooth * w + (1 - w) * p
        
        smoothed_metric.append( smooth )
        last_smooth = smooth
        
    return smoothed_metric

Compute the mean of each metric, then transform the saved metric values using the Hanning window.

In [None]:
for k,v in model_names.items():
    
    mean_train_acc, mean_train_loss = np.mean(models_metrics[k]["train_accs"]), np.mean(models_metrics[k]["train_losses"])
    mean_val_acc, mean_val_loss = np.mean(models_metrics[k]["val_accs"]), np.mean(models_metrics[k]["val_losses"])
    
    models_metrics[k]["mean_train_acc"] = mean_train_acc
    models_metrics[k]["mean_train_loss"] = mean_train_loss
    models_metrics[k]["mean_val_acc"] = mean_val_acc
    models_metrics[k]["mean_val_loss"] = mean_val_loss

    
        
    models_metrics[k]["train_accs"] = hanning_window(models_metrics[k]["train_accs"])
    models_metrics[k]["train_losses"] = hanning_window(models_metrics[k]["train_losses"])
    models_metrics[k]["val_accs"] = hanning_window(models_metrics[k]["val_accs"])
    models_metrics[k]["val_losses"] = hanning_window(models_metrics[k]["val_losses"])

Now I use the big dictionary to plot the train accuracy/loss and the validation accuracy/loss.

In [None]:
style.use("ggplot")

for k,v in model_names.items():
    f, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 6))

    ax1.set_title(k)
    
    ax1.plot(np.arange(len(models_metrics[k]["train_accs"])), models_metrics[k]["train_accs"], label="Acc - Mean: {:.2f}".format(models_metrics[k]["mean_train_acc"]))
    ax1.plot(np.arange(len(models_metrics[k]["train_losses"])), models_metrics[k]["train_losses"], label="Loss - Mean: {:.2f}".format(models_metrics[k]["mean_train_loss"]))

    ax1.legend()
    
    ax2.plot(np.arange(len(models_metrics[k]["val_accs"])), models_metrics[k]["val_accs"], label="Val Acc - Mean: {:.2f}".format(models_metrics[k]["mean_val_acc"]))
    ax2.plot(np.arange(len(models_metrics[k]["val_losses"])), models_metrics[k]["val_losses"], label="Val Loss - Mean: {:.2f}".format(models_metrics[k]["mean_val_loss"]))
    ax2.legend()
    
    
    # Save the plot as image to analyze and compare with
    # other tests I made
    plt.savefig(k+".png")

# Evaluate

Here I use the test dataloader to predict the label of each image, storing the image name as the key and the predicted label as the value of a dictionary.
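
The model outputs unnormalized scores (logits), not probabilities, but that doesn't matter for picking the label: softmax keeps the order of the scores, so the largest logit is also the largest probability. Here is a NumPy sketch with made-up logits. The `classes` list is what I would expect `ImageFolder` to produce for the `cats` and `dogs` folders, since it sorts the folder names.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

classes = ["cats", "dogs"]            # assumed index order: 0 = cats, 1 = dogs

# Made-up model outputs for 3 images (one row per image, one column per class)
logits = np.array([[2.0, -1.0],
                   [0.3, 0.9],
                   [-4.0, -3.5]])

probs = softmax(logits)
pred_from_logits = logits.argmax(axis=1)
pred_from_probs = probs.argmax(axis=1)
pred_names = [classes[i] for i in pred_from_logits]
```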

In [None]:

pred_dict = dict()

# Make sure dropout is disabled for inference
model.eval()

with torch.no_grad():
    for X_batch,names in tqdm(test_dataloader):
        X_batch = X_batch.to(device)
        
        preds = model(X_batch)
        
        _, ys = torch.max(preds, dim=1)
        
        for y, name in zip(ys, names):
            pred_dict[int(name)] = y.item()
    

With the dictionary, a DataFrame is created following the submission structure.

In [None]:
# Sort by id so the rows follow the image numbering
df_test = pd.DataFrame(sorted(pred_dict.items()), columns=["id", "label"])

Save the DataFrame as a CSV file.

In [None]:
df_test.to_csv("submission.csv", index=False)