# CE-40719: Deep Learning
## HW3 - CNN / CNN Case Studies / CNN Applications
(23 points)

#### Name: Seyed Shayan Nazemi
#### Student No.: 98209037

In this assignment we go through the following topics:
- Writing custome pytorch modules
- Using `tensorboard` for logging and visualization
- Data Augmentation
- Saving / Loading Models

Please Keep in mind that:
- You can not use out-of-the-box pytorch modules (nn.Conv2d, nn.Linear, nn.BatchNorm, nn.Dropout, ...)
- You can run this notebook on your computer. If you prefer using Google Colab you may lose some of the functionalities of `tensorboard` (like Projector). You can install `tensorboard` on your computer using package manager of your choice, and download `runs` folder from Google Colab and run it locally using `tensorboard --logdir=runs`.
- Use the [documentation](https://pytorch.org/docs/stable/index.html).

In this assignment we are going to train a convolutional neural network to classify images from [fashion-mnist](https://github.com/zalandoresearch/fashion-mnist) dataset. Fashion-mnist is a simple dataset containing 60000 training and 10000 test $28 \times 28$ grayscale images of 10 different classes. Each class corresponds to a different kind of clothing. 

## 1. Setup (1.5 pts)

In [0]:
import torch
from torch import nn, optim
from torch.nn import functional as F
from torchvision import datasets, transforms, utils
from torch.utils.data import DataLoader
from torch.utils.tensorboard import SummaryWriter

In [217]:
print(torch.__version__)

1.4.0


In [0]:
%reload_ext tensorboard

In [219]:
# import os
# logs_base_dir = "runs"
# os.makedirs(logs_base_dir, exist_ok=True)
%tensorboard --logdir {logs_base_dir}

Reusing TensorBoard on port 6006 (pid 7655), started 0:31:04 ago. (Use '!kill 7655' to kill it.)

<IPython.core.display.Javascript object>

To easily train your model on different gpu devices or your computer's cpu you can define a `torch.device` object corresponding to that device and use `.to(device)` method to easily move modules or tensors to different devices. Pytorch provides helper functions in [torch.cuda](https://pytorch.org/docs/stable/cuda.html) package.

In [220]:
#################################################################################
#                          COMPLETE THE FOLLOWING SECTION                       #
#################################################################################
# create a cpu device if cuda is not available or cuda_device=None otherwise
# create a cuda:{cuda_device} device.
#################################################################################
device = None
if torch.cuda.is_available():
    cuda_device = torch.cuda.current_device()
    device = torch.device('cuda', cuda_device)
else:
    device = torch.device('cpu', 0)
pass
#################################################################################
#                                   THE END                                     #
#################################################################################
print(device)

cuda:0


Fashion-mnist dataset is available in `torchvision.datasets` package.

In [0]:
batch_size = 32
#################################################################################
#                          COMPLETE THE FOLLOWING SECTION                       #
#################################################################################
# Initialize and download trainset and testset with datasets.FashionMNIST and
# transform data into torch.Tensor. Initialize trainloader and testloader with
# given batch_size.
#################################################################################

transform = transforms.ToTensor()
trainset = datasets.FashionMNIST(root='./data',
                                    train=True,
                                    download=True,
                                    transform=transform)

testset = datasets.FashionMNIST(root='./data',
                                   train=False,
                                   download=True,
                                   transform=transform)

trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size)
testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size)

#################################################################################
#                                   THE END                                     #
#################################################################################
classes = ('T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
           'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle Boot')

To get a sense of data it is always helpfull to see a few of samples. We can do this using `tensorboard`. Please read [this](https://pytorch.org/docs/stable/tensorboard.html) documentation page to get familiar with tensorboard. Run the following cell to intialize a SummaryWriter and log some of the training images to tensorboard. You can run tensorboard using `tensorboard --logdir=runs` and view images.

In [0]:
writer = SummaryWriter('./runs/FashionMNIST')

dataiter = iter(trainloader)
images, labels = dataiter.next()

img_grid = utils.make_grid(images[:16], nrow=4)

writer.add_image('FashionMNIST', img_grid)

We can also visualize data (or any representation of it) using dimmensionality reduction techniques provided by tensorboard. The following cell adds raw pixel values as embeddings to visualize data. You can see visualizations in projector tab of tensorboard after running the following cell.

In [223]:
def select_n_random(data, labels, n=100):
    perm = torch.randperm(len(data))
    return data[perm][:n], labels[perm][:n]

nimages, nlabels = select_n_random(trainset.data, trainset.targets)

writer.add_embedding(nimages.view(-1, 28 * 28), 
                     metadata=[classes[label] for label in nlabels],
                     label_img=nimages.unsqueeze(1), 
                     tag='raw_pixels')
writer.flush()

AttributeError: ignored

## 2. Modules (7 pts)

In this part you will define all the required modules for a convolutional model. You can only use functional package `torch.nn.functional` unless stated otherwise.

### 2.1 Convolution Module (1.5 pts)

In [0]:
#################################################################################
#                          COMPLETE THE FOLLOWING SECTION                       #
#################################################################################
# define convolution parameters using nn.Parameter.
# initialize weihgt using nn.init.kaiming_uniform and bias by zeroes
# use F.conv2d in forward method.
#################################################################################
class Conv2d(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size, stride=1, padding=0, bias=True):
        super(Conv2d, self).__init__()
        self.kernel_size = kernel_size
        self.padding = padding
        self.stride = stride
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.W = nn.Parameter(nn.init.kaiming_uniform_(torch.empty(out_channels, in_channels, kernel_size[0], kernel_size[1])))
        self.b = nn.Parameter(nn.init.zeros_(torch.empty(out_channels)))
        pass

    def forward(self, x):
        out = F.conv2d(x, self.W, bias=self.b, stride=self.stride, padding=self.padding)
        return out
#################################################################################
#                                   THE END                                     #
#################################################################################

### 2.2 Linear (Fully-connected) Module (1.5 pts)

In [0]:
#################################################################################
#                          COMPLETE THE FOLLOWING SECTION                       #
#################################################################################
# define parameters using nn.Parameter.
# initialize weihgt using nn.init.kaiming_uniform and bias by zeroes
# use F.linear in forward method.
#################################################################################
class Linear(nn.Module):
    def __init__(self, in_features, out_features, bias=True):
        super(Linear, self).__init__()
        
        self.in_size = in_features
        self.out_size = out_features

        self.W = nn.Parameter(nn.init.kaiming_uniform_(torch.empty(out_features, in_features)))
        self.b = nn.Parameter(nn.init.zeros_(torch.empty(out_features)))
        pass

    def forward(self, x):
        out = F.linear(x, self.W, bias=self.b)
        return out
#################################################################################
#                                   THE END                                     #
#################################################################################

### 2.3 1D Batch Normalization Module (2 pts)

In [0]:
#################################################################################
#                          COMPLETE THE FOLLOWING SECTION                       #
#################################################################################
# define and intitialize running_mean and running_var by zeroes and ones
# respectively.
# define weight and bias using nn.Parameter. initialize weights to a 
# normal distribution (std=1) and bias to zero.
# use F.batch_norm in forward method.
# use self.training to differ between training and test phase.
#################################################################################
class BatchNorm(nn.Module):
    def __init__(self, num_features):
        super(BatchNorm, self).__init__()
        self.running_mean = nn.init.zeros_(torch.empty(num_features)).to(device)
        self.running_var = nn.init.ones_(torch.empty(num_features)).to(device)
        self.training = True
        self.W = nn.Parameter(nn.init.normal_(torch.empty(num_features)))
        self.b = nn.Parameter(nn.init.zeros_(torch.empty(num_features)))

        pass
        
    def forward(self, x):
        out = F.batch_norm(x, self.running_mean, self.running_var, weight=self.W, bias=self.b, training=self.training)
        return out
#################################################################################
#                                   THE END                                     #
#################################################################################

### 2.4 2D Batch Normalization Module (2 pts)

In [0]:
#################################################################################
#                          COMPLETE THE FOLLOWING SECTION                       #
#################################################################################
# define and intitialize running_mean and running_var by zeroes and ones
# respectively.
# define weight and bias using nn.Parameter. initialize weights to a
# normal distribution (std=1) and bias to zero.
# use F.batch_norm in forward method.
# use self.training to differ between training and test phase.
# more info on 2d batch normalization:
# https://stackoverflow.com/questions/38553927/batch-normalization-in-convolutional-neural-network
#################################################################################
class BatchNorm2d(nn.Module):
    def __init__(self, num_features):
        super(BatchNorm2d, self).__init__()
        self.running_mean = nn.init.zeros_(torch.empty(num_features)).to(device)
        self.running_var = nn.init.ones_(torch.empty(num_features)).to(device)
        self.training = True
        self.W = nn.Parameter(nn.init.normal_(torch.empty(num_features)))
        self.b = nn.Parameter(nn.init.zeros_(torch.empty(num_features)))

    def forward(self, x):
        out = F.batch_norm(x, self.running_mean, self.running_var, weight=self.W, bias=self.b, training=self.training)
        return out
#################################################################################
#                                   THE END                                     #
#################################################################################

## 3. Model (3.5 pts)

Using the modules defined in previous part define the following model:

`[Conv2d(3, 3), channels=8, stride=1, padding=1] > [BatchNorm2d] > [relu]`

`[Conv2d(5, 5), channels=16, stride=1, padding=0] > [BatchNorm2d] > [relu] > [max_pool2d(2, 2), stride=(2, 2), padding=0]`

`[Conv2d(5, 5), channels=32, stride=1, padding=0] > [BatchNorm2d] > [relu] > [max_pool2d(2, 2), stride=(2, 2), padding=0]`

`[Linear(128)] > [BatchNorm] > [relu]`

`[Linear(64)] > [BatchNorm] > [relu]`        __(features)__

`[Linear(10)]`

In [0]:
#################################################################################
#                          COMPLETE THE FOLLOWING SECTION                       #
#################################################################################

class Model(nn.Module):
    def __init__(self, dropout=None):
        super(Model, self).__init__()

        self.dropout = dropout

        self.is_training = True

        self.conv1 = Conv2d(in_channels=1, out_channels=8, kernel_size=(3,3), stride=1, padding=1)
        self.batch1 = BatchNorm2d(num_features=8)

        self.conv2 = Conv2d(in_channels=8, out_channels=16, kernel_size=(5,5), stride=1, padding=0)
        self.batch2 = BatchNorm2d(num_features=16)

        self.conv3 = Conv2d(in_channels=16, out_channels=32, kernel_size=(5,5), stride=1, padding=0)
        self.batch3 = BatchNorm2d(num_features=32)

        self.linear4 = Linear(in_features=512, out_features=128)
        self.batch4 = BatchNorm(128)

        self.linear5 = Linear(in_features=128, out_features=64)
        self.batch5 = BatchNorm(64)

        self.linear6 = Linear(in_features=64, out_features=10)

    def forward(self, x):
        out = self.conv1.forward(x)
        out = self.batch1.forward(out)
        out = F.relu(out)

        out = self.conv2.forward(out)
        out = self.batch2.forward(out)
        out = F.relu(out)
        out = F.max_pool2d(out, kernel_size=(2,2), stride=(2,2), padding=0)

        out = self.conv3.forward(out)
        out = self.batch3.forward(out)
        out = F.relu(out)
        out = F.max_pool2d(out, kernel_size=(2,2), stride=(2,2), padding=0)

        out = out.view(-1, 32 * 4 * 4)

        out = self.linear4.forward(out)
        out = self.batch4(out)
        out = F.relu(out)
        if(self.dropout):
            out = F.dropout(out, p = 0.5, training=self.is_training)

        out = self.linear5.forward(out)
        out = self.batch5(out)
        features = F.relu(out)
        if(self.dropout):
            features = F.dropout(features, p = 0.5, training=self.is_training)

        out = self.linear6(features)

        return out, features
#################################################################################
#                                   THE END                                     #
#################################################################################

## 4. Training the Model (5 pts)

In [0]:
def train(model, optimizer, trainloader, testloader, device, num_epoches, label):
#################################################################################
#                          COMPLETE THE FOLLOWING SECTION                       #
#################################################################################
# write the main training loop procedure:
# move data to defined device
# zero_grad optimizer
# forward
# compute loss using F.cross_entropy
# backward
# step the optimizer
# accumulate running loss
#################################################################################
    is_training = True
    for epoch in range(num_epoches):
        print('EPOCH {:2d}:'.format(epoch + 1))
        model.train()
        running_loss = 0.
        for i, (x, y) in enumerate(trainloader):
            optimizer.zero_grad()
            y_pred, feature = model(x.to(device))
            loss = F.cross_entropy(y_pred, y.to(device))
            loss.backward()
            optimizer.step()

            running_loss += loss.item() * x.shape[0]
#################################################################################
#                                   THE END                                     #
#################################################################################
       
#################################################################################
#                          COMPLETE THE FOLLOWING SECTION                       #
#################################################################################
# compute test loss:
# dont forget to change model mode from train to eval
# write the code in a with torch.no_grad() block to prevent computing and 
# accumulating gradients
# accumulate loss in test_loss variable
#################################################################################
            if i % 100 == 99:
                test_loss = 0.
                is_training = model.training
                model.eval()
                with torch.no_grad():
                    for (x_t, y_t) in testloader:
                        y_pred_t, _ = model(x_t.to(device))
                        loss = F.cross_entropy(y_pred_t, y_t.to(device))
                        test_loss += loss.item() * x_t.shape[0]

                    # test_loss /= len(testloader.sampler)
#################################################################################
#                                   THE END                                     #
#################################################################################
                writer.add_scalars('loss/'+label, 
                                   {'train': running_loss/100, 'test': test_loss/len(testloader)},
                                  global_step=epoch * len(trainloader) + i + 1)
                writer.flush()
                print('\titeration {:4d}: training_loss = {:5f}, test_loss = {:5f}'.format(i + 1, running_loss/100, test_loss/len(testloader)))
                running_loss = 0.
            
                model.train(mode = is_training)
#################################################################################
#                          COMPLETE THE FOLLOWING SECTION                       #
#################################################################################
# compute test accuracy:
# dont forget to change model mode from train to eval
# write the code in a with torch.no_grad() block to prevent computing and 
# accumulating gradients
# accumulate number of correct predictions in correct variable and total test
# samples in total variable
# accumulate number of classwise correct predictions in class_correct list 
# and total classwise test samples in class_total list
#################################################################################
        model.eval()
        with torch.no_grad():
            correct = 0
            total = 0
            
            class_correct = [0.] * 10
            class_total = [0.] * 10

            for (data, target) in testloader:
                output, _ = model(data.to(device))

                pred = torch.argmax(output, dim = 1)
                for idx, k in enumerate(pred):
                    class_total[target[idx]] += 1
                    if(k == target[idx]):
                        class_correct[k] += 1

            correct = sum(class_correct)
            total = sum(class_total)


            pass
#################################################################################
#                                   THE END                                     #
#################################################################################
        writer.add_scalars('accuracy/'+label, {'test': correct / total},
                           global_step=(epoch + 1) * len(trainloader))
        print('test_accuracy = {:5f}'.format(correct / total))
        
        writer.add_scalars('classwise_accuracy/'+label, 
                           {classes[i]: class_correct[i]/class_total[i] for i in range(10)},
                           global_step=(epoch + 1) * len(trainloader))
        for i in range(10):
            print('  >> {:11s}: {:5f}'.format(classes[i], class_correct[i]/class_total[i]))
            
        writer.flush()
        torch.save(model.state_dict(), './saved_models/model_{}.chkpt'.format(label))

In [275]:
#################################################################################
#                          COMPLETE THE FOLLOWING SECTION                       #
#################################################################################
# initilize model and train for 10 epoches using Adam optimizer
#################################################################################
num_epoches = 10
model = Model(dropout = False)
model.to(device)

optimizer = torch.optim.Adam(model.parameters())

writer.add_graph(model, images.to(device))
train(model, optimizer, trainloader, testloader, device, num_epoches, 'base')
#################################################################################
#                                   THE END                                     #
#################################################################################

EPOCH  1:
	iteration  100: training_loss = 39.446511, test_loss = 26.881039
	iteration  200: training_loss = 24.069481, test_loss = 21.521734
	iteration  300: training_loss = 21.462413, test_loss = 19.309621
	iteration  400: training_loss = 19.161621, test_loss = 18.050663
	iteration  500: training_loss = 17.978520, test_loss = 16.673897
	iteration  600: training_loss = 17.946598, test_loss = 16.959615
	iteration  700: training_loss = 16.863912, test_loss = 16.337858
	iteration  800: training_loss = 16.100125, test_loss = 15.573174
	iteration  900: training_loss = 15.639371, test_loss = 15.775224
	iteration 1000: training_loss = 15.479437, test_loss = 14.610753
	iteration 1100: training_loss = 14.423972, test_loss = 14.365429
	iteration 1200: training_loss = 15.159401, test_loss = 14.571200
	iteration 1300: training_loss = 14.684134, test_loss = 13.278542
	iteration 1400: training_loss = 13.932522, test_loss = 14.229921
	iteration 1500: training_loss = 14.512334, test_loss = 13.993987


In [0]:
#################################################################################
#                          COMPLETE THE FOLLOWING SECTION                       #
#################################################################################
# add model features corresponding to nimages as embedding to tensorboard
#################################################################################
pass
writer.flush()
#################################################################################
#                                   THE END                                     #
#################################################################################

## 5. Dropout and Data Augmentation (6 pts)

Add dropout with p=0.5 to first two linear layers of the model using `F.dropout`. You can either modify the model module to take an additional parameter or write a seperate module. 

Data Augmentation is a strategy for increasing dataset size to prevent overfitting and better generalization. Dataset can be augmented by any transformation on data that do not change its label.

Pytorch provides data augmentation transforms in `torchvision.transforms` package.

In [0]:
#################################################################################
#                          COMPLETE THE FOLLOWING SECTION                       #
#################################################################################
# compose a transform using transforms.Compose that horizontally flips images 
# and use transforms.RandomResizedCrop to crop a 20 * 20 patch of the image 
# and resizing back to 28 * 28
#################################################################################
transform = transforms.Compose([transforms.RandomHorizontalFlip(p = 1), 
                                transforms.RandomResizedCrop(28, scale=(20/28, 1), ratio=(1,1)), 
                               transforms.ToTensor()]
                               )

# transform = transforms.Compose([transforms.ToTensor()])

trainset = datasets.FashionMNIST(root='./data',
                                    train=True,
                                    download=True,
                                    transform=transform)

testset = datasets.FashionMNIST(root='./data',
                                   train=False,
                                   download=True,
                                   transform=transform)

trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size)
testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size)

#################################################################################
#                                   THE END                                     #
#################################################################################

In [0]:
dataiter = iter(trainloader)
images, labels = dataiter.next()

img_grid = utils.make_grid(images[:16], nrow=4)

writer.add_image('FashionMNIST/augmented', img_grid)
writer.flush()

In [273]:
#################################################################################
#                          COMPLETE THE FOLLOWING SECTION                       #
#################################################################################
# initilize model and train using augmented dataset for 10 epoches
#################################################################################
num_epoches = 10
model2 = Model(dropout = True)
model2.to(device)

optimizer = torch.optim.Adam(model2.parameters())

writer.add_graph(model2, images.to(device))
train(model2, optimizer, trainloader, testloader, device, num_epoches, 'dropout')
#################################################################################
#                                   THE END                                     #
#################################################################################

	%input.13 : Float(32, 128) = aten::dropout(%input.12, %169, %170) # /usr/local/lib/python3.6/dist-packages/torch/nn/functional.py:807:0
	%input : Float(32, 64) = aten::dropout(%input.16, %184, %185) # /usr/local/lib/python3.6/dist-packages/torch/nn/functional.py:807:0
This may cause errors in trace checking. To disable trace checking, pass check_trace=False to torch.jit.trace()
  check_tolerance, _force_outplace, True, _module_class)
Not within tolerance rtol=1e-05 atol=1e-05 at input[27, 7] (18.296180725097656 vs. 1.8828994035720825) and 319 other locations (100.00%)
  check_tolerance, _force_outplace, True, _module_class)
Not within tolerance rtol=1e-05 atol=1e-05 at input[27, 39] (36.843589782714844 vs. 0.0) and 777 other locations (37.00%)
  check_tolerance, _force_outplace, True, _module_class)


EPOCH  1:
	iteration  100: training_loss = 67.220806, test_loss = 53.943564
	iteration  200: training_loss = 48.266687, test_loss = 41.780737
	iteration  300: training_loss = 39.520360, test_loss = 35.435418
	iteration  400: training_loss = 34.763588, test_loss = 32.095810
	iteration  500: training_loss = 31.692995, test_loss = 29.405137
	iteration  600: training_loss = 30.571603, test_loss = 27.748336
	iteration  700: training_loss = 28.042048, test_loss = 26.188073
	iteration  800: training_loss = 26.657152, test_loss = 24.731947
	iteration  900: training_loss = 25.435404, test_loss = 23.887915
	iteration 1000: training_loss = 24.130634, test_loss = 22.661997
	iteration 1100: training_loss = 23.028777, test_loss = 21.974150
	iteration 1200: training_loss = 22.931733, test_loss = 21.311382
	iteration 1300: training_loss = 22.214107, test_loss = 20.536878
	iteration 1400: training_loss = 21.582360, test_loss = 20.447222
	iteration 1500: training_loss = 22.479642, test_loss = 20.145850


In [0]:
#################################################################################
#                          COMPLETE THE FOLLOWING SECTION                       #
#################################################################################
# add model features corresponding to nimages as embedding to tensorboard
#################################################################################
pass

writer.flush()
writer.close()
#################################################################################
#                                   THE END                                     #
#################################################################################