
<h1><center> Homework 4 </center></h1>

## Deadline: 11/01/2019 Friday 5pm ET 

### Review

Review sections 8.6, 9.6 in [Dive into Deep Learning book](https://en.d2l.ai/d2l-en.pdf).

### Writing (50 points)

1. Please read the two papers below. Choose one of them and write a one-page summary in LaTeX. You can find the template file in the latex folder. Make sure to use neurips2018.tex, and submit both the .tex and .pdf files.

* ResNet: https://arxiv.org/pdf/1512.03385.pdf  
* Batch Normalization: https://arxiv.org/pdf/1502.03167.pdf

You can use Texmaker (https://www.xm1math.net/texmaker/) to edit .tex files.

### Coding Questions [8.6, 9.6 in the book]

This time we want to use an AWS GPU to run the LeNet and ResNet18 models. Training deep learning models on CPUs is very time-consuming, so deep learning practitioners use GPUs almost all the time. For detailed information about how to use an AWS GPU and open an IPython notebook on a remote GPU, follow the steps in "Step-by-step guide to run template in AWS GPU" in the main repository.

In [0]:
pip install d2l

In [0]:
pip install mxnet

In [0]:
# import packages 

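from google.colab import files
# Upload utils.py from your machine and save it next to the notebook
# so that `import utils` below can find it.
src = list(files.upload().values())[0]
with open('utils.py', 'wb') as f:
    f.write(src)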
import torch
from torch import nn, optim
import torch.nn.functional as F
import utils
# Only BasicBlock is needed; the ResNet class is defined later in this notebook.
from torchvision.models.resnet import BasicBlock

## The Model

### LeNet
LeNet-5 is a convolutional network designed for handwritten and machine-printed character recognition.
You can find more detailed information here: http://yann.lecun.com/exdb/lenet/

To compare the results with those previously achieved with vanilla softmax regression, we continue to use the Fashion-MNIST image classification dataset. The input size of LeNet-5 is 32×32, so we need to resize the original 28×28 images.

In [0]:
batch_size = 256
train_iter, test_iter = utils.load_data_fashion_mnist(batch_size, resize=32)

In [0]:
train_iter

#### Define the Model (10 points)

##### Structure of LeNet-5:  
32×32 input image >  
Six 28×28 feature maps convolutional layer (5×5 size) >  
Average Pooling layers (2×2 size) >  
Sixteen 10×10 feature maps convolutional layer (5×5 size) >  
Average Pooling layers (2×2 size) >  
Fully connected to 120 neurons >  
Fully connected to 84 neurons >  
Fully connected to 10 outputs  
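
The feature-map sizes in the list above follow from the usual output-size formula for a convolution or pooling window, floor((n + 2p - k) / s) + 1. The sketch below traces them in plain Python. The helper names `conv_out` and `lenet5_shapes` are illustrative and not part of `utils`, and the trace assumes no padding and a stride of 2 for the pooling layers.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output side length of a square convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


def lenet5_shapes(input_size=32):
    """Trace (channels, height, width) through the LeNet-5 feature extractor."""
    shapes = [("input", (1, input_size, input_size))]
    s = conv_out(input_size, kernel=5)       # 5x5 convolution, 6 feature maps
    shapes.append(("conv1", (6, s, s)))
    s = conv_out(s, kernel=2, stride=2)      # 2x2 average pooling
    shapes.append(("pool1", (6, s, s)))
    s = conv_out(s, kernel=5)                # 5x5 convolution, 16 feature maps
    shapes.append(("conv2", (16, s, s)))
    s = conv_out(s, kernel=2, stride=2)      # 2x2 average pooling
    shapes.append(("pool2", (16, s, s)))
    return shapes


for name, shape in lenet5_shapes():
    print(name, shape)
```

The last shape, 16 × 5 × 5 = 400 values per image, is what the first fully connected layer receives after flattening.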

In [0]:
class LeNet5(nn.Module): 
    def __init__(self):
        super(LeNet5, self).__init__()
        self.conv1 = nn.Conv2d(in_channels = 1, out_channels = 6, kernel_size = 5)
        self.relu1 = nn.ReLU()
        self.ave_pool1 = nn.AvgPool2d(kernel_size = 2, stride = 2)
        self.conv2 = nn.Conv2d(in_channels = 6, out_channels = 16, kernel_size = 5)
        self.relu2 = nn.ReLU()
        self.ave_pool2 = nn.AvgPool2d(kernel_size = 2, stride = 2)
        self.linear1 = nn.Linear(16 * 5 * 5, 120)
        self.relu3 = nn.ReLU()
        self.linear2 = nn.Linear(120, 84)
        self.relu4 = nn.ReLU()
        self.linear3 = nn.Linear(84, 10)
    def forward(self, x):
        output1 = self.ave_pool1(self.relu1(self.conv1(x)))
        output2 = self.ave_pool2(self.relu2(self.conv2(output1)))
        output3 = output2.view(-1, self.num_flat_features(output2))
        output4 = self.relu3(self.linear1(output3))
        output5 = self.relu4(self.linear2(output4))
        output6 = self.linear3(output5)
        return output6
    def num_flat_features(self, x):
        # Multiply all dimensions except the batch dimension.
        size = x.size()[1:]
        num_features = 1
        for s in size:
            num_features *= s
        return num_features



#### Copy the network to GPU
We want to do forward propagation and backward propagation on the GPU rather than the CPU, so we move the model to the GPU.
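
If the notebook might be opened on a machine without a GPU, one common pattern is to pick the device at run time. This is a minimal sketch and not part of the assignment template; the small `nn.Linear` layer only stands in for a real model.

```python
import torch
from torch import nn

# Use the GPU when PyTorch can see one, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# The model and its inputs must live on the same device.
layer = nn.Linear(4, 2).to(device)
x = torch.zeros(3, 4, device=device)
out = layer(x)
print(out.device.type)
```

The training and prediction calls in this notebook take `device` as an argument, so the same object can be passed through unchanged.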

In [0]:
device = torch.device("cuda")
net = LeNet5().to(device)

#### See if your model works by invoking train and predict (10 points)

In [0]:
num_epochs, lr = 10, 0.5
optimizer = optim.SGD(net.parameters(), lr=lr)
loss = nn.CrossEntropyLoss()


utils.train(net, train_iter, test_iter, loss, num_epochs, optimizer, device)

In [0]:
utils.predict(net, test_iter, device, model='lenet')

### ResNet

#### Define the Model (20 points)

##### Structure of ResNet:  
Resnet18 Architecture
![title](img/ResNet-18-Architecture.png)  

We want to do forward propagation and backward propagation on the GPU rather than the CPU, so we move the model to the GPU.
You can use the resnet18 model that PyTorch provides or you can build your own resnet18 model.  
We need to consider two things here:
1. The input channel of Resnet18 is 3, while what we have is 1.
2. The output size of Resnet18 is 1000, while what we want is 10.

In [0]:
import math
class ResNet(nn.Module):

    def __init__(self, block, layers, num_classes=10):
        self.inplanes = 64
        super(ResNet, self).__init__()
        self.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3,
                               bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(block, 64, layers[0])
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2)
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2)
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2)
        self.avgpool = nn.AvgPool2d(7, stride=1)
        self.fc = nn.Linear(512 * block.expansion, num_classes)

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                m.weight.data.normal_(0, math.sqrt(2. / n))
            elif isinstance(m, nn.BatchNorm2d):
                m.weight.data.fill_(1)
                m.bias.data.zero_()

    def _make_layer(self, block, planes, blocks, stride=1):
        downsample = None
        if stride != 1 or self.inplanes != planes * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(self.inplanes, planes * block.expansion,
                          kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(planes * block.expansion),
            )

        layers = []
        layers.append(block(self.inplanes, planes, stride, downsample))
        self.inplanes = planes * block.expansion
        for i in range(1, blocks):
            layers.append(block(self.inplanes, planes))

        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        x = self.avgpool(x)
        x = x.view(x.size(0), -1)
        x = self.fc(x)

        return x

In [0]:
def fashion_MNIST_resnet():
    # Two BasicBlocks in each of the four stages gives the ResNet-18 layout.
    model = ResNet(BasicBlock, [2, 2, 2, 2])
    return model

In [0]:
device = torch.device('cuda')
net = fashion_MNIST_resnet().to(device)

#### Initialize the model parameters 
The input size of ResNet-18 is 224 × 224, so we need to resize the data to 224 × 224. This is already done in the load function.
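
The 224 × 224 requirement comes from the fixed 7 × 7 average pooling at the end of the `ResNet` class defined earlier. The sketch below traces the spatial size through the network in plain Python. The helper names `out_size` and `resnet18_spatial_sizes` are illustrative, and the trace assumes the kernel sizes, strides, and padding used in that class and in torchvision's `BasicBlock`.

```python
def out_size(size, kernel, stride=1, padding=0):
    """Output side length of a square convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


def resnet18_spatial_sizes(input_size=224):
    """Side length of the feature maps after each stage of ResNet-18."""
    sizes = {}
    s = out_size(input_size, kernel=7, stride=2, padding=3)  # conv1
    sizes["conv1"] = s
    s = out_size(s, kernel=3, stride=2, padding=1)           # maxpool
    sizes["maxpool"] = s
    sizes["layer1"] = s                                      # stride 1 keeps the size
    for name in ("layer2", "layer3", "layer4"):
        # The first 3x3 convolution of each of these stages has stride 2.
        s = out_size(s, kernel=3, stride=2, padding=1)
        sizes[name] = s
    sizes["avgpool"] = out_size(s, kernel=7, stride=1)       # fixed 7x7 average pooling
    return sizes


print(resnet18_spatial_sizes(224))
print(resnet18_spatial_sizes(32))
```

With a 224 × 224 input, `layer4` produces 7 × 7 maps, which the pooling layer reduces to 1 × 1. With a 32 × 32 input, the maps are already 1 × 1 after `layer4`, so the computed pooling size is not positive and the 7 × 7 window cannot be applied.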

In [0]:
batch_size = 256
train_iter, test_iter = utils.load_data_fashion_mnist_resnet(batch_size)

#### See if your model works by invoking train and predict (10 points)

In [0]:
num_epochs, lr = 10, 0.5
optimizer = optim.SGD(net.parameters(), lr=lr)
loss = nn.CrossEntropyLoss()

utils.train(net, train_iter, test_iter, loss, num_epochs, optimizer, device)

In [0]:
utils.predict(net, test_iter, device, model='resnet')