# Introduction
Image classification is the process of taking an input (like a picture) and outputting a class (like “cat”) or a probability that the input is a particular class (“there’s a 90% probability that this input is a cat”). You can look at a picture and know that you’re looking at a terrible shot of your own face, but how can a computer learn to do that? With a convolutional neural network!
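To make the "probability" framing concrete, here is a minimal pure-Python sketch of the softmax function, a common way to turn a classifier's raw scores into probabilities that sum to 1. The class names and scores below are invented for illustration. In this notebook the network itself outputs raw scores, and nn.CrossEntropyLoss applies the (log-)softmax internally.

```python
import math

def softmax(scores):
    """Convert raw class scores (logits) into probabilities that sum to 1."""
    # Subtracting the max score avoids overflow in exp() and does not change the result.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores a classifier might produce for three classes.
classes = ["cat", "dog", "bird"]
scores = [3.0, 0.5, 0.2]
probs = softmax(scores)

# The predicted class is the one with the highest probability.
best = max(range(len(classes)), key=lambda i: probs[i])
print(classes[best], round(probs[best], 3))
```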

-----
# Goals
We would like you to build a neural network from advanced DNN modules (e.g. convolution layers, ReLU activations, pooling layers and fully connected layers) that predicts the category of an input image.

-------------
## Packages
Let's first import the necessary packages:

In [155]:
import os
from glob import glob

import numpy as np
import torch
import torch.nn as nn
import torch.utils.data as data
import torchvision.datasets as dset
from PIL import Image
from tqdm import tqdm

-----
## GPU Device Configuration
Then, we configure our computational device, i.e. whether the calculations run on a GPU or on the CPU.
We use the torch.cuda.is_available() and torch.device() functions to select the device.

In [156]:
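# Use the GPU when one is available; otherwise fall back to the CPU.
# Both branches produce a torch.device object, so later .to(device) calls behave consistently.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(device)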

-----
## Configuration
### Hyperparameters
Next, we define the hyperparameters our model needs:
1. learning rate
2. batch size for training
3. batch size for testing
4. number of epochs
5. output directory

In [157]:
learnRate = 0.001
trainBatchSize = 100
testBatchSize = 100
epochs = 5
out_name = "./output"
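As a quick sanity check on these values, the sketch below works out how many mini-batches each epoch contains. It assumes the standard CIFAR-10 split of 50,000 training and 10,000 test images and a loader that keeps the final, possibly smaller batch (DataLoader's default drop_last=False). batches_per_epoch is a helper written only for this illustration.

```python
import math

# Standard CIFAR-10 split sizes.
NUM_TRAIN = 50_000
NUM_TEST = 10_000

def batches_per_epoch(num_samples, batch_size):
    """Mini-batches yielded per pass when the last, smaller batch is kept."""
    return math.ceil(num_samples / batch_size)

train_steps = batches_per_epoch(NUM_TRAIN, 100)  # trainBatchSize = 100
test_steps = batches_per_epoch(NUM_TEST, 100)    # testBatchSize = 100
total_updates = train_steps * 5                  # epochs = 5 optimizer passes
print(train_steps, test_steps, total_updates)
```

Because the training and test splits yield different numbers of batches, per-epoch losses are easier to compare when averaged per batch rather than summed.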

Create the output directory if it does not already exist:
we use os.path.exists() to check whether it exists and os.makedirs() to create it.

In [158]:
if not os.path.exists(out_name):
    os.makedirs(out_name)
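An equivalent, slightly more robust variant passes exist_ok=True to os.makedirs(), which makes the separate existence check unnecessary. The sketch below wraps this in a hypothetical helper, ensure_dir, and exercises it inside a temporary directory so it leaves nothing behind.

```python
import os
import tempfile

def ensure_dir(path):
    """Create `path` and any missing parent directories; no-op if it already exists."""
    # exist_ok=True folds the existence check into the call itself, so there is
    # no gap between checking with os.path.exists() and creating the directory.
    os.makedirs(path, exist_ok=True)
    return path

# Demonstrate inside a throwaway temporary directory.
with tempfile.TemporaryDirectory() as root:
    out = ensure_dir(os.path.join(root, "output"))
    ensure_dir(out)  # calling it a second time is harmless
    created = os.path.isdir(out)

print(created)
```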

-----
## Data Loading
Next, we are going to load and prepare our data.

### We first import the necessary libraries for data loading

In [159]:
import torchvision.transforms as transforms
from matplotlib import pyplot as plt

-----
### Image processing
Then, we define image preprocessing objects that our data loaders can apply directly to each sample.
We use the torchvision transforms API to perform the preprocessing:
1. Use transforms.Compose() to chain the transforms.
2. Use transforms.RandomHorizontalFlip() for data augmentation.
3. Add any extra transforms you like.
4. Create a transform for both the training set and the testing set. Note that the testing split does not need random augmentation; it only needs transforms.ToTensor().

In [160]:
train_transform = transforms.Compose([transforms.RandomHorizontalFlip(), transforms.ToTensor()])
test_transform = transforms.Compose([transforms.ToTensor()])
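To illustrate what these two transforms do to pixel data, here is a plain-Python approximation that treats an image as a list of rows. It is only a sketch: the real transforms.RandomHorizontalFlip operates on PIL images or tensors, and transforms.ToTensor additionally reorders the data to channels-first (C x H x W). The helper names are hypothetical.

```python
import random

def random_horizontal_flip(image, p=0.5, rng=random):
    """Mirror an image left-to-right with probability p.

    `image` is a list of rows, each row a list of pixel values.
    """
    if rng.random() < p:
        return [list(reversed(row)) for row in image]
    return [list(row) for row in image]

def to_unit_range(image):
    """Scale 8-bit pixel values (0-255) to floats in [0, 1], as ToTensor does for uint8 input."""
    return [[v / 255.0 for v in row] for row in image]

img = [[0, 128, 255],
       [10, 20, 30]]
flipped = random_horizontal_flip(img, p=1.0)  # p=1.0 always flips
kept = random_horizontal_flip(img, p=0.0)     # p=0.0 never flips
print(flipped, to_unit_range(kept)[0])
```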

-----
### We then download and prepare the data with the transforms defined above:
1. Use torchvision.datasets.CIFAR10() with the root, train, download and transform keyword arguments.
2. Use the same command to create both the train split and the test split.
3. Use torch.utils.data.DataLoader() to create a data loader from each dataset.
4. Do this for both the training split and the test split; only the training loader is shuffled.

In [161]:
train_set = dset.CIFAR10(root='./data', train=True, download=True, transform=train_transform)
train_loader = torch.utils.data.DataLoader(dataset=train_set, batch_size=trainBatchSize, shuffle=True)
test_set = dset.CIFAR10(root='./data', train=False, download=True, transform=test_transform)
test_loader = torch.utils.data.DataLoader(dataset=test_set, batch_size=testBatchSize, shuffle=False)
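Conceptually, shuffle=True makes the loader visit the samples in a random order and cut that order into consecutive mini-batches. The following is a simplified sketch of that index bookkeeping (minibatch_indices is a hypothetical helper, not part of PyTorch). With the default drop_last=False, the last batch holds whatever samples remain.

```python
import random

def minibatch_indices(num_samples, batch_size, shuffle=True, seed=None):
    """Yield lists of sample indices, one list per mini-batch."""
    order = list(range(num_samples))
    if shuffle:
        # A seeded generator makes the shuffled order reproducible.
        random.Random(seed).shuffle(order)
    for start in range(0, num_samples, batch_size):
        yield order[start:start + batch_size]

batches = list(minibatch_indices(10, 4, shuffle=True, seed=0))
print([len(b) for b in batches])
```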

-----
## Network
Next, we are going to design our GoogLeNet.
### First, we define the Inception module and then our GoogLeNet class
### Refer to the paper below ("Going Deeper with Convolutions") to understand the structure:
### https://arxiv.org/abs/1409.4842



------
### Inception module with dimension reductions (there are many ways to implement it)
1. Create a Python class called Inception that inherits from nn.Module.

2. Create an `__init__` method to initialize the class.
    1. It takes seven arguments: in_planes, kernel_1_x, kernel_3_in, kernel_3_x, kernel_5_in, kernel_5_x and pool_planes.

    2. It defines four branches: b1, b2, b3 and b4.

    3. b1 is a block consisting of a 1x1 2D convolution, a 2D batch normalization layer and a ReLU activation function.

    4. b2 is a block consisting of two 2D convolutions (1x1, then 3x3), two 2D batch normalization layers and two ReLU activation functions.

    5. b3 is a block consisting of three 2D convolutions (1x1, then two stacked 3x3 convolutions used in place of the paper's 5x5 convolution), three 2D batch normalization layers and three ReLU activation functions.

    6. b4 is a block consisting of a 3x3 max-pooling layer, a 1x1 2D convolution, a 2D batch normalization layer and a ReLU activation function.

3. Create the forward function.

    1. The forward function passes the input through every branch and returns the concatenation of all the outputs along the channel dimension.

In [162]:
class Inception(nn.Module):
    def __init__(self, in_planes, kernel_1_x, kernel_3_in, kernel_3_x, kernel_5_in, kernel_5_x, pool_planes):
        super(Inception, self).__init__()
        # 1x1 conv branch
        self.b1 = nn.Sequential(
            nn.Conv2d(in_planes, kernel_1_x, kernel_size=1),
            nn.BatchNorm2d(kernel_1_x),
            nn.ReLU(True),
        )

        # 1x1 conv -> 3x3 conv branch
        self.b2 = nn.Sequential(
            #1
            nn.Conv2d(in_planes, kernel_3_in, kernel_size=1),
            nn.BatchNorm2d(kernel_3_in),
            nn.ReLU(True),
            #2
            nn.Conv2d(kernel_3_in, kernel_3_x, kernel_size=3, padding=1),
            nn.BatchNorm2d(kernel_3_x),
            nn.ReLU(True)
        )


        # 1x1 conv -> two stacked 3x3 convs branch (stands in for the paper's 5x5 conv)
        self.b3 = nn.Sequential(
            #1
            nn.Conv2d(in_planes, kernel_5_in, kernel_size=1),
            nn.BatchNorm2d(kernel_5_in),
            nn.ReLU(True),
            #2
            nn.Conv2d(kernel_5_in, kernel_5_x, kernel_size=3, padding=1),
            nn.BatchNorm2d(kernel_5_x),
            nn.ReLU(True),
            #3
            nn.Conv2d(kernel_5_x, kernel_5_x, kernel_size=3, padding=1),
            nn.BatchNorm2d(kernel_5_x),
            nn.ReLU(True)
        )

         
        # 3x3 pool -> 1x1 conv branch
        self.b4 = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_planes, pool_planes, kernel_size=1),
            nn.BatchNorm2d(pool_planes),
            nn.ReLU(True)
        )

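    def forward(self, x):
        # Run every branch on the same input; all branches keep the spatial size,
        # so their outputs can be stacked along the channel dimension (dim=1).
        out_b1 = self.b1(x)
        out_b2 = self.b2(x)
        out_b3 = self.b3(x)
        out_b4 = self.b4(x)
        all_out = torch.cat([out_b1, out_b2, out_b3, out_b4], dim=1)
        return all_out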
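The torch.cat(..., dim=1) in forward() only works because every branch preserves the spatial size of its input, so the outputs differ only in their number of channels. The sketch below checks this with the standard output-size formula for convolution and pooling layers with dilation 1 and floor rounding, floor((H + 2*padding - kernel_size) / stride) + 1, and adds up the channels for the network's first Inception block. Both helpers are written only for this illustration.

```python
def conv_output_size(size, kernel_size, stride=1, padding=0):
    """Spatial output size of Conv2d / MaxPool2d (dilation=1, floor rounding)."""
    return (size + 2 * padding - kernel_size) // stride + 1

def inception_out_channels(kernel_1_x, kernel_3_x, kernel_5_x, pool_planes):
    """Channels produced by concatenating the four branch outputs along dim=1."""
    return kernel_1_x + kernel_3_x + kernel_5_x + pool_planes

H = 32  # CIFAR-10 images are 32x32
branch_sizes = [
    conv_output_size(H, kernel_size=1),                       # b1: 1x1 conv
    conv_output_size(H, kernel_size=3, padding=1),            # b2/b3: 3x3 conv, padding=1
    conv_output_size(H, kernel_size=3, stride=1, padding=1),  # b4: 3x3 max-pool, stride 1
]
# First block of the network: Inception(192, 64, 96, 128, 16, 32, 32)
print(branch_sizes, inception_out_channels(64, 128, 32, 32))
```

The intermediate widths (kernel_3_in, kernel_5_in) do not appear in the sum: they are the 1x1 "dimension reduction" layers and only affect the cost inside a branch.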


-----
### GoogLeNet module (there are many ways to implement it)


1. Create a Python class called GoogLeNet that inherits from nn.Module.

2. Create an `__init__` method to initialize the class. It consists of:

    1. A pre-inception block that serves as all the layers before the first Inception module: a 2D convolution (kernel_size=3, padding=1, 192 output channels), a 2D batch normalization layer and a ReLU activation function.

    2. Two Inception blocks

    3. A max-pooling layer

    4. Five Inception blocks

    5. A max-pooling layer

    6. Two Inception blocks

    7. An average pooling layer

    8. A fully connected layer

3. Create the forward function.

    1. The forward function passes the input through every block in order and returns the output.

In [163]:
class GoogLeNet(nn.Module):

    def __init__(self):
        super(GoogLeNet, self).__init__()

        # 1. Pre-inception layers: a 2D convolution (kernel_size=3, padding=1, 192 output channels),
        #    a 2D batch normalization layer and a ReLU activation function.
        self.pre_inception = nn.Sequential(
            nn.Conv2d(3, 192, kernel_size=3, padding=1),
            nn.BatchNorm2d(192),
            nn.ReLU(True)
        )

        # 2. Two Inception blocks
        self.incept1 = Inception(192,  64,  96, 128, 16, 32, 32)
        self.incept2 = Inception(256, 128, 128, 192, 32, 96, 64)

        # 3. Max-pooling layer (32x32 -> 16x16 for CIFAR-10 input)
        self.maxpool_1 = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

        # 4. Five Inception blocks
        self.incept3 = Inception(480, 192,  96, 208, 16,  48,  64)
        self.incept4 = Inception(512, 160, 112, 224, 24,  64,  64)
        self.incept5 = Inception(512, 128, 128, 256, 24,  64,  64)
        self.incept6 = Inception(512, 112, 144, 288, 32,  64,  64)
        self.incept7 = Inception(528, 256, 160, 320, 32, 128, 128)

        # 5. Max-pooling layer (16x16 -> 8x8)
        self.maxpool_2 = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

        # 6. Two Inception blocks
        self.incept8 = Inception(832, 256, 160, 320, 32, 128, 128)
        self.incept9 = Inception(832, 384, 192, 384, 48, 128, 128)

        # 7. Average pooling layer (8x8 -> 1x1)
        self.avgpool = nn.AvgPool2d(kernel_size=8, stride=1)

        # 8. Fully connected layer: 1024 features -> 10 classes in CIFAR-10
        self.fc_layer = nn.Linear(1024, 10)

    # Pass the input through every block in order and return the class scores (logits).
    def forward(self, x):
        x = self.pre_inception(x)
        x = self.incept1(x)
        x = self.incept2(x)
        x = self.maxpool_1(x)
        x = self.incept3(x)
        x = self.incept4(x)
        x = self.incept5(x)
        x = self.incept6(x)
        x = self.incept7(x)
        x = self.maxpool_2(x)
        x = self.incept8(x)
        x = self.incept9(x)
        x = self.avgpool(x)
        x = x.view(x.size(0), -1)  # flatten to (batch, 1024)
        x = self.fc_layer(x)
        return x
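Because each Inception block's in_planes must equal the total channels produced by the previous block, it is easy to introduce a mismatch when editing the configuration. The following sketch traces the channel count and spatial size through the network for a 32x32 CIFAR-10 input, using the constructor arguments copied from the class above. conv_output_size is a hypothetical helper implementing the usual floor-based size formula. The trace confirms that the average-pooled feature map flattens to the 1024 inputs expected by fc_layer.

```python
def conv_output_size(size, kernel_size, stride=1, padding=0):
    """Spatial output size of Conv2d / MaxPool2d / AvgPool2d (dilation=1, floor rounding)."""
    return (size + 2 * padding - kernel_size) // stride + 1

# (in_planes, kernel_1_x, kernel_3_in, kernel_3_x, kernel_5_in, kernel_5_x, pool_planes)
# copied from the GoogLeNet class; None marks a MaxPool2d(kernel_size=3, stride=2, padding=1).
stages = [
    (192, 64, 96, 128, 16, 32, 32),
    (256, 128, 128, 192, 32, 96, 64),
    None,
    (480, 192, 96, 208, 16, 48, 64),
    (512, 160, 112, 224, 24, 64, 64),
    (512, 128, 128, 256, 24, 64, 64),
    (512, 112, 144, 288, 32, 64, 64),
    (528, 256, 160, 320, 32, 128, 128),
    None,
    (832, 256, 160, 320, 32, 128, 128),
    (832, 384, 192, 384, 48, 128, 128),
]

channels, size = 192, 32  # after the 3x3, padding=1 pre-inception conv on a 32x32 input
for cfg in stages:
    if cfg is None:
        size = conv_output_size(size, kernel_size=3, stride=2, padding=1)
        continue
    in_planes, k1, _, k3, _, k5, pool = cfg
    assert in_planes == channels, f"expected {channels} input channels, got {in_planes}"
    channels = k1 + k3 + k5 + pool  # branch outputs are concatenated along dim=1

size = conv_output_size(size, kernel_size=8, stride=1)  # AvgPool2d(kernel_size=8, stride=1)
print(channels, size, channels * size * size)
```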



### Next, we create the network and send it to the target device

In [164]:
googlenetwork = GoogLeNet()
googlenetwork.to(device)

### Finally, we create:
 1. an optimizer (we use the Adam optimizer here)
 2. a criterion (cross-entropy loss) function
 3. a scheduler that decays the learning rate of each parameter group by gamma once the number of epochs reaches one of the milestones

In [165]:
#1. An optimizer (we use the Adam optimizer here)
optimizer = torch.optim.Adam(googlenetwork.parameters(), lr=learnRate)
#2. A criterion (cross-entropy loss) function
criterion = nn.CrossEntropyLoss()
#3. A scheduler that multiplies the learning rate by gamma each time the epoch count reaches a milestone.
#   Milestones are listed in increasing order.
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[10, 50], gamma=0.5)
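For reference, MultiStepLR multiplies the learning rate by gamma once for each milestone the epoch counter has reached. The sketch below computes that closed form for this notebook's settings (base rate 0.001, gamma 0.5, milestones at epochs 10 and 50). multistep_lr is a hypothetical helper, not a PyTorch function.

```python
from bisect import bisect_right

def multistep_lr(base_lr, epoch, milestones, gamma):
    """Learning rate after `epoch` scheduler steps under a MultiStepLR-style rule."""
    # bisect_right counts how many milestones are <= epoch, i.e. how many decays applied.
    return base_lr * gamma ** bisect_right(sorted(milestones), epoch)

for epoch in (0, 4, 9, 10, 49, 50):
    print(epoch, multistep_lr(0.001, epoch, milestones=[10, 50], gamma=0.5))
```

With epochs = 5, neither milestone is reached, so the learning rate stays at 0.001 for this whole run.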

-----
## Training
Then, we are going to train our network:

1. Set the network to training mode.
2. Initialize the training loss, the total number of samples and the number of correct predictions.
3. For each batch in the training split:
    1. Move the data to the correct device using .to().
    2. Reset the gradients held by the optimizer.
    3. Feed the data forward through the GoogLeNet.
    4. Use the criterion function to compute the loss.
    5. Backpropagate the loss.
    6. Update the network parameters using the optimizer.
    7. Accumulate the training loss.
    8. Find the predictions (hint: use torch.max()).
    9. Increment the number of samples seen.
    10. Increment the number of correct predictions.
    11. Print a log.
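A useful check for step 4: nn.CrossEntropyLoss (with its default reduction='mean') averages -log(softmax(scores)[label]) over the batch. The sketch below reimplements that in plain Python. It shows that a 10-class model producing equal scores for every class has a loss of log(10), roughly 2.30, which is a reasonable reference point for the loss of an untrained CIFAR-10 classifier. cross_entropy is a hypothetical helper written only for this illustration.

```python
import math

def cross_entropy(logits_batch, labels):
    """Mean cross-entropy over a batch of raw class scores.

    Per sample: -log(softmax(scores)[label]) = logsumexp(scores) - scores[label].
    """
    total = 0.0
    for scores, label in zip(logits_batch, labels):
        m = max(scores)  # shift by the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_sum_exp - scores[label]
    return total / len(labels)

# Equal scores for all 10 classes: every class gets probability 0.1.
uniform = [[0.0] * 10, [0.0] * 10]
print(cross_entropy(uniform, [3, 7]))
```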
    
-----
## Testing
Then, we are going to test our model:

1. Set the network to evaluation mode.
2. Initialize the test loss, the total number of samples and the number of correct predictions.
3. Loop over each batch in the test split inside a torch.no_grad() block:
    1. Move the data to the correct device using .to().
    2. Feed the data forward through the GoogLeNet.
    3. Use the criterion function to compute the loss.
    4. Accumulate the test loss.
    5. Find the predictions (hint: use torch.max()).
    6. Increment the number of samples seen.
    7. Increment the number of correct predictions.
    8. Print a log.
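Finding predictions and counting correct ones reduces to an argmax over each row of scores and a comparison with the labels, which is what torch.max() over dimension 1 followed by an element-wise comparison computes. Here is a plain-Python sketch of that bookkeeping; count_correct and the example scores are hypothetical.

```python
def count_correct(logits_batch, labels):
    """Return (number of correct predictions, batch size) using argmax over class scores."""
    correct = 0
    for scores, label in zip(logits_batch, labels):
        pred = max(range(len(scores)), key=scores.__getitem__)  # index of the highest score
        correct += int(pred == label)
    return correct, len(labels)

logits = [[0.1, 2.0, -1.0],  # predicts class 1
          [1.5, 0.2, 0.3],   # predicts class 0
          [0.0, 0.1, 0.9]]   # predicts class 2
correct, total = count_correct(logits, [1, 2, 2])
print(f"accuracy = {100 * correct / total:.1f}%")
```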

-----
## Epochs
For each epoch:
1. we train our model,
2. we test our model,
3. we record the training and testing accuracy and loss,
4. we step our scheduler (after the optimizer updates, as PyTorch 1.1 and later expect),
5. we save the model at the end and print the accuracy.

In [166]:
def train():
    """Run one training epoch, then evaluate on the test split.

    Returns (train accuracy %, test accuracy %, mean train loss per batch, mean test loss per batch).
    """
    # Set the network to training mode.
    googlenetwork.train()

    # Initialize the training loss, total number of samples and number of correct predictions.
    loss_train = 0.0
    total_train_data = 0
    correct_pred_train = 0

    # For each batch in the training split
    for inputs, labels in train_loader:
        # Move the data to the correct device using .to()
        inputs = inputs.to(device)
        labels = labels.to(device)

        # Reset the gradients held by the optimizer.
        optimizer.zero_grad()

        # Feed the data forward through the GoogLeNet
        outputs = googlenetwork(inputs)

        # Use the criterion function to compute the loss
        entropyLoss = criterion(outputs, labels)

        # Backpropagate the loss
        entropyLoss.backward()

        # Update the network parameters using the optimizer
        optimizer.step()

        # Accumulate the training loss
        loss_train += entropyLoss.item()

        # Find the predictions: index of the highest score for each sample
        _, pred = torch.max(outputs, 1)

        # Increment the number of samples seen
        total_train_data += labels.size(0)

        # Increment the number of correct predictions
        correct_pred_train += (pred == labels).sum().item()

    # Set the network to evaluation mode (BatchNorm then uses its running statistics)
    googlenetwork.eval()

    # Initialize the test loss, total number of samples and number of correct predictions.
    loss_test = 0.0
    total_test_data = 0
    correct_pred_test = 0

    # For each batch in the testing split; torch.no_grad() disables gradient tracking
    with torch.no_grad():
        for inputs, labels in test_loader:
            # Move the data to the correct device using .to()
            inputs = inputs.to(device)
            labels = labels.to(device)

            # Feed the data forward through the GoogLeNet
            outputs = googlenetwork(inputs)

            # Use the criterion function to compute the loss
            entropyLoss = criterion(outputs, labels)

            # Accumulate the test loss
            loss_test += entropyLoss.item()

            # Find the predictions
            _, pred = torch.max(outputs, 1)

            # Increment the number of samples seen
            total_test_data += labels.size(0)

            # Increment the number of correct predictions
            correct_pred_test += (pred == labels).sum().item()

    acc_train = (correct_pred_train / total_train_data) * 100
    acc_test = (correct_pred_test / total_test_data) * 100

    # Average the summed batch losses: the two splits have different numbers of
    # batches, so raw sums would not be comparable in the loss plot.
    loss_train /= len(train_loader)
    loss_test /= len(test_loader)

    return acc_train, acc_test, loss_train, loss_test

In [167]:
train_acc = []
test_acc = []
train_loss = []
test_loss = []

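for i in range(epochs):
    print('\nEpoch: {}'.format(i))

    acc_train, acc_test, loss_train, loss_test = train()

    # Step the scheduler once per epoch, after this epoch's optimizer updates
    scheduler.step()

    train_acc.append(acc_train)
    test_acc.append(acc_test)
    train_loss.append(loss_train)
    test_loss.append(loss_test)

    print('Train Accuracy = {:.3f}%'.format(acc_train), end=' | ')
    print('Train Loss = {:.5f}'.format(loss_train))

    print('Test Accuracy  = {:.3f}%'.format(acc_test), end=' | ')
    print('Test Loss  = {:.5f}'.format(loss_test))

# Save the trained weights to the output directory
torch.save(googlenetwork.state_dict(), os.path.join(out_name, 'googlenet.pth'))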

In [None]:
print(acc_train)

In [None]:
x_axis = list(range(epochs))
plt.plot(x_axis, train_loss, label='Training Loss')
plt.plot(x_axis, test_loss, label='Testing Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.show()