Before you turn this problem in, make sure everything runs as expected. First, **restart the kernel** (in the menubar, select Kernel$\rightarrow$Restart) and then **run all cells** (in the menubar, select Cell$\rightarrow$Run All).

Make sure you fill in any place that says `YOUR CODE HERE` or "YOUR ANSWER HERE", as well as your name and collaborators below:

In [None]:
NAME = ""
ID = ""

---

# Lab 3: ResNet and Create Dataset

In this lab, we will develop ResNets and SE Nets.

## Preliminaries (datasets and data loaders)

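Let's follow the same process as in lab 2. We import the necessary libraries, load the images, and convert them to normalized tensors. Note that the transforms below do not resize, so the CIFAR-10 images keep their native size of 3x32x32.
This version is changed a bit to allow different transforms for the training set (augmentation) and the val/test sets.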

In [None]:
import torch
import torchvision
from torchvision import datasets, models, transforms
import torch.nn as nn
import torch.optim as optim
import time
import os
from copy import copy
from copy import deepcopy
import torch.nn.functional as F

# Set device to GPU or CPU

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

In [None]:
# Allow augmentation transform for training set, no augmentation for val/test set

train_preprocess = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

eval_preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

# Download CIFAR-10 and split into training, validation, and test sets.
# The copy of the training dataset after the split allows us to keep
# the same training/validation split of the original training set but
# apply different transforms to the training set and validation set.

full_train_dataset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                                  download=True)

train_dataset, val_dataset = torch.utils.data.random_split(full_train_dataset, [40000, 10000])
train_dataset.dataset = copy(full_train_dataset)
train_dataset.dataset.transform = train_preprocess
val_dataset.dataset.transform = eval_preprocess

test_dataset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                            download=True, transform=eval_preprocess)

# DataLoaders for the three datasets

BATCH_SIZE=128
NUM_WORKERS=4

train_dataloader = torch.utils.data.DataLoader(train_dataset, batch_size=BATCH_SIZE,
                                            shuffle=True, num_workers=NUM_WORKERS)
val_dataloader = torch.utils.data.DataLoader(val_dataset, batch_size=BATCH_SIZE,
                                            shuffle=False, num_workers=NUM_WORKERS)
test_dataloader = torch.utils.data.DataLoader(test_dataset, batch_size=BATCH_SIZE,
                                            shuffle=False, num_workers=NUM_WORKERS)

dataloaders = {'train': train_dataloader, 'val': val_dataloader}
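
As a quick check of what `transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))` does, the sketch below applies the same per-value formula, `(x - mean) / std`, in plain Python. The helper name `normalize_value` is made up for this illustration; by this point `ToTensor()` has already scaled the pixels to the range [0, 1].

```python
def normalize_value(x, mean=0.5, std=0.5):
    """Apply the per-channel formula used by transforms.Normalize to one value."""
    return (x - mean) / std


# ToTensor() outputs values in [0, 1]; Normalize then maps them to [-1, 1].
for pixel in (0.0, 0.5, 1.0):
    print(pixel, '->', normalize_value(pixel))
```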


## ResNet

ResNet (residual network) was first published in [the ResNet paper](https://arxiv.org/pdf/1512.03385.pdf). Before ResNet, very deep networks were hard to train. As a signal passes through many convolutional layers, it is transformed so much that information about the original input is lost, and the gradients that flow back to the early layers shrink toward zero. This problem is called the **vanishing gradient** problem.

To mitigate vanishing gradients, the paper adds shortcut (skip) connections that bypass one or more layers.
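
To get a feel for why a shortcut helps, here is a toy scalar model, not taken from the paper: assume every layer has the same local slope `local_grad`. In a plain stack the gradient is the product of the slopes, while a layer of the form `y = f(x) + x` has slope `local_grad + 1`. The function names are made up for this sketch, and real networks (with nonlinearities and batch norm) behave less simply.

```python
def plain_chain_gradient(local_grad, depth):
    """Gradient through `depth` stacked layers y = f(x), each with slope `local_grad`."""
    return local_grad ** depth


def residual_chain_gradient(local_grad, depth):
    """Same stack, but each layer is y = f(x) + x, so its slope is local_grad + 1."""
    return (1.0 + local_grad) ** depth


depth = 20
print('plain   :', plain_chain_gradient(0.1, depth))     # shrinks toward zero
print('residual:', residual_chain_gradient(0.1, depth))  # does not vanish
```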

#### ResNet structure

A ResNet has four stages, and each stage is a stack of residual blocks. We therefore describe an architecture by a list of four numbers $[s_1, s_2, s_3, s_4]$, where $s_i$ is the number of residual blocks in stage $i$. For example, the structure of ResNet34 is written $[3,4,6,3]$.
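
The name of each ResNet follows from this list. A small sketch, with a made-up helper `count_layers`, counts the weighted layers as the initial convolution, plus the convolutions inside the blocks, plus the final linear layer. It assumes two convolutions per basic block and three per bottleneck block, and it does not count the 1x1 shortcut convolutions.

```python
def count_layers(blocks_per_stage, convs_per_block):
    """Count weighted layers: first conv + all block convs + final linear layer."""
    return 1 + sum(blocks_per_stage) * convs_per_block + 1


print(count_layers([2, 2, 2, 2], 2))   # ResNet18
print(count_layers([3, 4, 6, 3], 2))   # ResNet34
print(count_layers([3, 4, 6, 3], 3))   # ResNet50 (three convolutions per block)
```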

### ResNet18

On page 5 of [the ResNet paper](https://arxiv.org/pdf/1512.03385.pdf), the simplest ResNet
described is now known as ResNet18. It is a 1.8 GFLOP CNN with $2+2+2+2 = 8$ residual blocks (two convolutional
layers in each residual block), which along with the initial convolution (7x7 in the paper, 3x3 here)
and the final linear / softmax layer gives us 18 layers:

<img src="img/Resnet18_new.png" title="ResNet18" style="width: 1400px;" />

### Residual blocks

The residual block is a reusable block.
ResNet18 and ResNet34 use very basic residual blocks, but
ResNet50, ResNet101, and ResNet152 use more complicated residual blocks
with three convolutions, the middle of which is a
bottleneck that increases the representational power of the block
without an enormous increase in the number of parameters.

We need two types of residual block, one that preserves feature map size and one
that allows changes to the feature map size:

<img src="img/residualblock_new1.png" title="Residual block" style="width: 800px;" />

Note that only the shape-preserving residual block has a real identity mapping.
The 1x1 strided convolution is the simplest way to allow changes in the input
feature map size, but since the parameters are learned, after training, the result
may be quite different from an identity mapping.
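
The shortcut needs a strided convolution because the two paths must produce feature maps of the same size before they can be added. The sketch below uses the standard convolution output-size formula (with dilation 1) to check this for a 32x32 input; the helper name `conv_output_size` is made up for this example.

```python
def conv_output_size(size, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution with dilation 1."""
    return (size + 2 * padding - kernel_size) // stride + 1


print(conv_output_size(32, kernel_size=3, stride=1, padding=1))  # main path, stride 1
print(conv_output_size(32, kernel_size=3, stride=2, padding=1))  # main path, stride 2
print(conv_output_size(32, kernel_size=1, stride=2, padding=0))  # 1x1 shortcut, stride 2
```

With stride 2, both the 3x3 main-path convolution and the 1x1 shortcut halve the feature map, so the addition is well defined.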

Let's see how to implement a residual block in a reusable way. This
code is modified from https://github.com/kuangliu/pytorch-cifar/blob/master/models/resnet.py.

In [None]:
class BasicBlock(nn.Module):
    '''
    BasicBlock: Simple residual block with two conv layers
    '''
    EXPANSION = 1
    def __init__(self, in_planes, out_planes, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_planes)
        self.conv2 = nn.Conv2d(out_planes, out_planes, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_planes)
        self.shortcut = nn.Sequential()
        # If output size is not equal to input size, reshape it with 1x1 convolution
        if stride != 1 or in_planes != out_planes:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_planes, out_planes, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(out_planes)
            )

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out += self.shortcut(x)
        out = F.relu(out)
        return out

### Bottleneck blocks

When a ResNet has 50 or more layers, the basic residual block is replaced by a bottleneck block, which adds 1x1 convolutions around the 3x3 convolution. The first 1x1 convolution reduces the number of channels and the last one expands it again, so the expensive 3x3 convolution operates on fewer channels. This reduces the number of parameters and matrix multiplications. The idea is to make residual blocks as thin as possible so that depth can increase with fewer parameters.

Here's the bottleneck version with three layers per residual block:

In [None]:
class BottleneckBlock(nn.Module):
    '''
    BottleneckBlock: More powerful residual block with three convs, used for Resnet50 and up
    '''
    EXPANSION = 4
    def __init__(self, in_planes, planes, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_planes, planes, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv3 = nn.Conv2d(planes, self.EXPANSION * planes, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(self.EXPANSION * planes)

        self.shortcut = nn.Sequential()
        # If the output size is not equal to input size, reshape it with 1x1 convolution
        if stride != 1 or in_planes != self.EXPANSION * planes:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_planes, self.EXPANSION * planes,
                          kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(self.EXPANSION * planes)
            )

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = F.relu(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))
        out += self.shortcut(x)
        out = F.relu(out)
        return out
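
To make the parameter saving concrete, here is a rough count of convolution weights for a block whose input and output both have 256 channels. This is only an illustrative comparison: the channel count and the helper `conv_params` are assumptions for the sketch, and batch norm and shortcut weights are ignored.

```python
def conv_params(in_channels, out_channels, kernel_size):
    """Number of weights in a bias-free 2D convolution."""
    return in_channels * out_channels * kernel_size * kernel_size


channels = 256
# Two 3x3 convolutions at full width (basic-block style)
basic = 2 * conv_params(channels, channels, 3)
# 1x1 reduce -> 3x3 at reduced width -> 1x1 expand (bottleneck style, EXPANSION = 4)
planes = channels // 4
bottleneck = (conv_params(channels, planes, 1)
              + conv_params(planes, planes, 3)
              + conv_params(planes, channels, 1))
print(basic, bottleneck)
```

At this width the bottleneck uses far fewer weights than two full-width 3x3 convolutions, even though it has three layers instead of two.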

### Resnet

Here is the whole shebang for ResNet, with the layer sizes tailored a bit to our CIFAR-10 input size of 32x32:

In [None]:
class ResNet(nn.Module):
    def __init__(self, block, num_blocks, num_classes=10):
        super().__init__()
        self.in_planes = 64
        # Initial convolution
        self.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        # Residual blocks
        self.layer1 = self._make_layer(block, 64, num_blocks[0], stride=1)
        self.layer2 = self._make_layer(block, 128, num_blocks[1], stride=2)
        self.layer3 = self._make_layer(block, 256, num_blocks[2], stride=2)
        self.layer4 = self._make_layer(block, 512, num_blocks[3], stride=2)
        # FC layer = 1 layer
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.linear = nn.Linear(512 * block.EXPANSION, num_classes)

    def _make_layer(self, block, planes, num_blocks, stride):
        strides = [stride] + [1] * (num_blocks-1)
        layers = []
        for stride in strides:
            layers.append(block(self.in_planes, planes, stride))
            self.in_planes = planes * block.EXPANSION
        return nn.Sequential(*layers)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.layer1(out)
        out = self.layer2(out)
        out = self.layer3(out)
        out = self.layer4(out)
        out = self.avgpool(out)
        out = out.view(out.size(0), -1)
        out = self.linear(out)
        return out
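
It can help to trace what the four stages do to the feature map. The sketch below follows the strides and channel counts used in the `ResNet` class above for a CIFAR-10-sized 32x32 input. The helper `trace_stages` is made up for this illustration and only does the arithmetic; it does not run the network.

```python
def trace_stages(input_size, planes=(64, 128, 256, 512), strides=(1, 2, 2, 2), expansion=1):
    """Return (stage name, channels, spatial size) after each of the four stages."""
    size = input_size  # the 3x3, stride-1, padding-1 initial conv keeps the spatial size
    shapes = []
    for i, (p, s) in enumerate(zip(planes, strides), start=1):
        # only the first block of a stage is strided (3x3 conv, padding 1)
        size = (size + 2 * 1 - 3) // s + 1
        shapes.append((f'layer{i}', p * expansion, size))
    return shapes


for name, channels, size in trace_stages(32):
    print(f'{name}: {channels} x {size} x {size}')
```

Passing `expansion=4` gives the channel counts for the bottleneck variants, which is why the final linear layer takes `512 * block.EXPANSION` inputs.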

From this template we can make various ResNets:

In [None]:
def ResNet18(num_classes = 10):
    '''
    First conv layer: 1
    4 residual blocks with two sets of two convolutions each: 2*2 + 2*2 + 2*2 + 2*2 = 16 conv layers
    last FC layer: 1
    Total layers: 1+16+1 = 18
    '''
    return ResNet(BasicBlock, [2, 2, 2, 2], num_classes)


def ResNet34(num_classes = 10):
    '''
    First conv layer: 1
    4 residual blocks with [3, 4, 6, 3] sets of two convolutions each: 3*2 + 4*2 + 6*2 + 3*2 = 32
    last FC layer: 1
    Total layers: 1+32+1 = 34
    '''
    return ResNet(BasicBlock, [3, 4, 6, 3], num_classes)


def ResNet50(num_classes = 10):
    '''
    First conv layer: 1
    4 residual blocks with [3, 4, 6, 3] sets of three convolutions each: 3*3 + 4*3 + 6*3 + 3*3 = 48
    last FC layer: 1
    Total layers: 1+48+1 = 50
    '''
    return ResNet(BottleneckBlock, [3, 4, 6, 3], num_classes)


def ResNet101(num_classes = 10):
    '''
    First conv layer: 1
    4 residual blocks with [3, 4, 23, 3] sets of three convolutions each: 3*3 + 4*3 + 23*3 + 3*3 = 99
    last FC layer: 1
    Total layers: 1+99+1 = 101
    '''
    return ResNet(BottleneckBlock, [3, 4, 23, 3], num_classes)


def ResNet152(num_classes = 10):
    '''
    First conv layer: 1
    4 residual blocks with [3, 8, 36, 3] sets of three convolutions each: 3*3 + 8*3 + 36*3 + 3*3 = 150
    last FC layer: 1
    Total layers: 1+150+1 = 152
    '''
    return ResNet(BottleneckBlock, [3, 8, 36, 3], num_classes)

OK, let's test it! Here's a `train_model()` function again:

In [None]:
def train_model(model, dataloaders, criterion, optimizer, num_epochs=25, weights_name='weight_save', is_inception=False):
    '''
    train_model: train a model on a dataset
    
            Parameters:
                    model: Pytorch model
                    dataloaders: dataset
                    criterion: loss function
                    optimizer: update weights function
                    num_epochs: number of epochs
                    weights_name: file name to save weights
                    is_inception: whether the model is an Inception net (GoogLeNet)

            Returns:
                    model: Best model from evaluation result
                    val_acc_history: evaluation accuracy history
                    loss_acc_history: loss value history
    '''
    since = time.time()

    val_acc_history = []
    loss_acc_history = []

    best_model_wts = deepcopy(model.state_dict())
    best_acc = 0.0

    for epoch in range(num_epochs):
        epoch_start = time.time()

        print('Epoch {}/{}'.format(epoch, num_epochs - 1))
        print('-' * 10)

        # Each epoch has a training and validation phase
        for phase in ['train', 'val']:
            if phase == 'train':
                model.train()  # Set model to training mode
            else:
                model.eval()   # Set model to evaluate mode

            running_loss = 0.0
            running_corrects = 0

            # Iterate over data.
            for inputs, labels in dataloaders[phase]:
                # The model and the data must be on the same device:
                # if the model is on the GPU, the inputs and labels must be moved there too
                inputs = inputs.to(device)
                labels = labels.to(device)

                # zero the parameter gradients
                # so that gradients from the previous batch do not accumulate
                optimizer.zero_grad()

                # forward
                # track history if only in train
                with torch.set_grad_enabled(phase == 'train'):
                    # Get model outputs and calculate loss
                    # Special case for inception because in training it has an auxiliary output. In train
                    #   mode we calculate the loss by summing the final output and the auxiliary output
                    #   but in testing we only consider the final output.
                    if is_inception and phase == 'train':
                        # From https://discuss.pytorch.org/t/how-to-optimize-inception-model-with-auxiliary-classifiers/7958
                        outputs, aux_outputs = model(inputs)
                        # print('outputs', outputs)
                        loss1 = criterion(outputs, labels)
                        loss2 = criterion(aux_outputs, labels)
                        loss = loss1 + 0.4*loss2
                    else:
                        outputs = model(inputs)
                        loss = criterion(outputs, labels)

                    _, preds = torch.max(outputs, 1)

                    # backward + optimize only if in training phase
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()

                # statistics
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)

            epoch_loss = running_loss / len(dataloaders[phase].dataset)
            epoch_acc = running_corrects.double() / len(dataloaders[phase].dataset)
            epoch_end = time.time()
            
            elapsed_epoch = epoch_end - epoch_start

            print('{} Loss: {:.4f} Acc: {:.4f}'.format(phase, epoch_loss, epoch_acc))
            print("Epoch time taken: ", elapsed_epoch)

            # deep copy the model
            if phase == 'val' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = deepcopy(model.state_dict())
                torch.save(model.state_dict(), weights_name + ".pth")
            if phase == 'val':
                val_acc_history.append(epoch_acc)
            if phase == 'train':
                loss_acc_history.append(epoch_loss)

        print()

    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(time_elapsed // 60, time_elapsed % 60))
    print('Best val Acc: {:.4f}'.format(best_acc))

    # load best model weights
    model.load_state_dict(best_model_wts)
    return model, val_acc_history, loss_acc_history
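
One detail in `train_model()` is worth a closer look: the running loss is accumulated as `loss.item() * inputs.size(0)`. With its default settings, `nn.CrossEntropyLoss` returns the mean loss over a batch, so multiplying by the batch size and dividing by the dataset size at the end gives a true per-sample average even when the last batch is smaller. The plain-Python sketch below illustrates this with made-up loss values; `epoch_average` is a hypothetical helper.

```python
def epoch_average(batch_losses, batch_sizes):
    """Per-sample average loss over an epoch from per-batch mean losses."""
    running_loss = 0.0
    for loss, size in zip(batch_losses, batch_sizes):
        running_loss += loss * size   # same role as loss.item() * inputs.size(0)
    return running_loss / sum(batch_sizes)


# Two full batches of 128 samples and a final partial batch of 32 samples
losses = [1.0, 1.0, 4.0]
sizes = [128, 128, 32]
print(epoch_average(losses, sizes))   # weighted by batch size
print(sum(losses) / len(losses))      # naive mean over batches over-weights the small batch
```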

The training function is the same as in the previous lab. Now you can train the ResNet.

In [None]:
resnet = ResNet18().to(device)
# Optimizer and loss function
criterion = nn.CrossEntropyLoss()
params_to_update = resnet.parameters()
# Now we'll use Adam optimization
optimizer = optim.Adam(params_to_update, lr=0.01)

best_model, val_acc_history, loss_acc_history = train_model(resnet, dataloaders, criterion, optimizer, 25, 'resnet18_bestsofar')
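
`train_model()` saves the best weights so far to `weights_name + ".pth"`. A minimal sketch of loading such a file later is shown below. To keep it self-contained and fast it uses a small `nn.Linear` as a stand-in for the network and a temporary file with a made-up name; in the lab you would build `ResNet18()` and load `'resnet18_bestsofar.pth'` in the same way.

```python
import os
import tempfile

import torch
import torch.nn as nn

# Stand-in model for illustration only; in the lab this would be ResNet18()
model = nn.Linear(4, 2)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'example_bestsofar.pth')  # hypothetical file name
    torch.save(model.state_dict(), path)                   # what train_model() does

    # Later: rebuild the same architecture, then load the saved weights into it
    restored = nn.Linear(4, 2)
    restored.load_state_dict(torch.load(path, map_location='cpu'))
    restored.eval()  # switch to evaluation mode before inference

print(torch.equal(model.weight, restored.weight))
```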

## Squeeze and Excite networks

A Squeeze-and-Excitation (SE) block is a building block for CNNs that models channel interdependencies at almost no computational cost. Adding it to an ordinary ResNet is easy. The main idea of SENet is to add parameters for each channel so that the network can adaptively adjust the weighting of each feature map.

An ordinary convolutional layer weights all of its output channels equally when it passes them on. SENets change this by adding a content-aware mechanism that weights each channel adaptively. In its most basic form, this could mean adding a single parameter per channel: a scalar that says how relevant that channel is.
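
The three steps (squeeze, excite, scale) can be sketched directly with tensor operations. This is only an illustration of the shapes involved: the tensor sizes are arbitrary, and a plain sigmoid stands in for the small learned network that a real SE block uses in the excitation step.

```python
import torch

x = torch.rand(2, 8, 5, 5)                 # (batch, channels, height, width)

squeezed = x.mean(dim=(2, 3))              # squeeze: global average pool -> (2, 8)
gates = torch.sigmoid(squeezed)            # excite: stand-in for the learned layers, values in (0, 1)
scaled = x * gates.view(2, 8, 1, 1)        # scale: one weight per channel, broadcast over H and W

print(squeezed.shape, gates.shape, scaled.shape)
```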

The concept of squeeze and excite (SENet) is shown here:

<img src="img/SEInceptionModule.PNG" title="se block" style="width: 800px;" />

<img src="img/SEResnetModule.PNG" title="se block" style="width: 800px;" />

Implementation is beautifully simple. Here's an example of an SE module
from https://github.com/moskomule/senet.pytorch/blob/23839e07525f9f5d39982140fccc8b925fe4dee9/senet/se_module.py#L4:

In [None]:
class SELayer(nn.Module):
    def __init__(self, channel, reduction=16):
        super().__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channel, channel // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(channel // reduction, channel, bias=False),
            nn.Sigmoid()
        )

    def forward(self, x):
        b, c, _, _ = x.size()
        y = self.avg_pool(x).view(b, c)
        y = self.fc(y).view(b, c, 1, 1)
        return x * y.expand_as(x)
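
To see how little an SE layer adds, we can count its weights. The sketch below assumes the layout of `SELayer` above (two bias-free linear layers with `reduction=16`) and one SE layer in each of the eight residual blocks of a ResNet18-style network; `se_params` is a made-up helper.

```python
def se_params(channels, reduction=16):
    """Weights in one SELayer: two bias-free linear layers (reduce, then expand)."""
    hidden = channels // reduction
    return channels * hidden + hidden * channels


# One SE layer per residual block, two blocks per stage (the ResNet18 layout)
stage_channels = [64, 128, 256, 512]
extra = sum(2 * se_params(c) for c in stage_channels)
print(se_params(64), extra)
```

The total is tens of thousands of extra weights, which is small next to the millions of weights in the 3x3 convolutions.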

SE modules can be added anywhere you like:

<img src="img/SEblockSet.PNG" title="se add" style="width: 800px;" />

Let's use the standard option (option b above) recommended by the authors:

In [None]:
class ResidualSEBasicBlock(nn.Module):
    '''
    ResidualSEBasicBlock: Standard two-convolution residual block with an SE Module between the
                          second convolution and the identity addition
    '''
    EXPANSION = 1

    def __init__(self, in_planes, out_planes, stride=1, reduction=16):
        super().__init__()
        self.conv1 = nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_planes)
        self.conv2 = nn.Conv2d(out_planes, out_planes, kernel_size=3,
                               stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_planes)
        self.se = SELayer(out_planes, reduction)

        self.shortcut = nn.Sequential()
        # If output size is not equal to input size, reshape it with a 1x1 conv
        if stride != 1 or in_planes != out_planes:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_planes, out_planes,
                          kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(self.EXPANSION * out_planes)
            )

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out = self.se(out)              # SE module rescales the channels here
        out += self.shortcut(x)         # then add the shortcut
        out = F.relu(out)
        return out


def ResSENet18(num_classes = 10):
    return ResNet(ResidualSEBasicBlock, [2, 2, 2, 2], num_classes)


Let's try the SE version of ResNet18 and compare in terms of time and accuracy.

In [None]:
ressenet = ResSENet18().to(device)
# Optimizer, loss function
criterion2 = nn.CrossEntropyLoss()
params_to_update2 = ressenet.parameters()
optimizer2 = optim.Adam(params_to_update2, lr=0.01)

best_model2, val_acc_history2, loss_acc_history2 = train_model(ressenet, dataloaders, criterion2, optimizer2, 25, 'ressenet18_bestsofar')

In [None]:
import matplotlib.pyplot as plt

def plot_data(val_acc_history, loss_acc_history, val_acc_history2, loss_acc_history2):
    plt.plot(loss_acc_history, label = 'ResNet18')
    plt.plot(loss_acc_history2, label = 'ResSENet18')
    plt.title('Training loss over time')
    plt.legend()
    plt.show()
    plt.plot(val_acc_history, label = 'ResNet18')
    plt.plot(val_acc_history2, label = 'ResSENet18')
    plt.title('Validation accuracy over time')
    plt.legend()
    plt.show()

In [None]:
plot_data(val_acc_history, loss_acc_history, val_acc_history2, loss_acc_history2)

Interestingly, we can see that the additional parameters accelerate learning of the training set without
causing any degradation on the validation set, and in fact improve validation set performance early
on.

## Create your own dataset

If you want to use a model that you created or downloaded in your own project, keep in mind that datasets are not all stored in the same format. You need to inspect the data to work out how to read the images and labels you need. For computer vision datasets, some common layouts are:

1. Classification: images, labels
    - folderClassA, folderClassB
    - image_name
    - images folder, csv_labels
2. Detection: images, annotations
    - Yolo: images folder, annotation files
3. Segmentation: images, annotations
    - images folder, masks folder
    - images folder, annotation files
4. Image synthesis: images, labels (optional)
5. Image transfer: imagesA, imagesB

In this lab, I will explain only image classification.
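
As an illustration of the "images folder, csv_labels" layout for classification, here is a minimal sketch that turns a labels file into a list of (image path, class index) pairs. Everything specific in it is hypothetical: the column names `file_name` and `label`, the file names, and the `images/` folder are assumptions for the example, and the CSV is read from a string so the snippet runs on its own.

```python
import csv
import io

# Hypothetical labels file; a real one would be opened with open('labels.csv', newline='')
csv_text = """file_name,label
img_001.jpg,kaprao
img_002.jpg,horapa
img_003.jpg,kaprao
"""

class_to_index = {'kaprao': 0, 'horapa': 1}


def read_labels(csv_file, images_dir='images/'):
    """Return a list of (image path, class index) pairs from a labels CSV."""
    reader = csv.DictReader(csv_file)
    return [(images_dir + row['file_name'], class_to_index[row['label']]) for row in reader]


samples = read_labels(io.StringIO(csv_text))
print(samples)
```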

### Experiment: Kaprao-Horapa

First, download [vege_dataset.zip](https://www.dropbox.com/s/eip79rx1mmofbov/vege_dataset.zip?dl=0).
The dataset contains two classes, kaprao and horapa. Both are basils, but they are different kinds and are used differently in cooking.

Extract the file and look at the folders inside.

The dataset contains two folders with different names, one per class, so we can use the folder an image is in as its label.

### Create Dataset class using pytorch

Let's create an empty dataset class. The inputs of the class are:
- the dataset directory <code>../vege_dataset/</code>, where <code>..</code> is the root path of your dataset
- the transform function

In [None]:
# import important library
from torch.utils.data import Dataset, DataLoader

class BasilDataset(Dataset):
    def __init__(self, root_path="/vege_dataset/", transform=None):
        return
    
    def __len__(self):
        return 0
    
    def __getitem__(self, i):
        return

The important functions of the dataset class are:
- <code>__init__</code>: initializes the dataset. It is used to count the data and collect the paths of all the samples in your dataset.
- <code>__len__</code>: returns the total number of samples in the dataset.
- <code>__getitem__</code>: selects one sample from the dataset and returns its input and label in whatever form you want. You don't need to worry about batching; the **DataLoader** class will do that for you later.
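
The three functions can be seen in isolation with a toy example. The class below is a plain-Python sketch with a made-up name and made-up data; a real PyTorch dataset would subclass `torch.utils.data.Dataset` and return images, but the roles of the three functions are the same.

```python
class SquaresDataset:
    """Toy dataset: sample i is the pair (i, i * i)."""

    def __init__(self, n):
        self.n = n          # counting and collecting the data happens once, here

    def __len__(self):
        return self.n       # total number of samples

    def __getitem__(self, i):
        if not 0 <= i < self.n:
            raise IndexError(i)
        return {'input': i, 'label': i * i}   # exactly one sample, no batching


dataset = SquaresDataset(5)
print(len(dataset), dataset[3])
```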

### Initialize dataset
Next, write the <code>__init__</code> function to collect the file paths of your dataset.

Then <code>__len__</code> can return the total number of samples you have.

In [None]:
from os import listdir

class BasilDataset(Dataset):
    def __init__(self, root_path="/vege_dataset/", transform=None):
        # keep root directory
        self.dir = root_path
        # keep transform
        self.transform = transform

        # read all files in the kapao and horapa folders, skipping hidden files
        list_kaprao = [f for f in listdir(root_path + 'kapao/') if not f.startswith('.')]
        list_horapa = [f for f in listdir(root_path + 'horapa/') if not f.startswith('.')]
        # count the images in each class (used later to assign labels)
        self.kaprao_len = len(list_kaprao)
        self.horapa_len = len(list_horapa)

        # put the data file paths into ids: kaprao first, then horapa
        self.ids = [self.dir + 'kapao/' + file for file in list_kaprao]
        self.ids.extend([self.dir + 'horapa/' + file for file in list_horapa])
        
    def __len__(self):
        return len(self.ids)
    
    def __getitem__(self, i):
        return

### Get one item of your dataset in the list

In [None]:
from torch.utils.data import Dataset, DataLoader
from os import listdir
from PIL import Image

class BasilDataset(Dataset):
    def __init__(self, root_path="vege_dataset/", transform=None):
        # keep root directory
        self.dir = root_path
        # keep transform
        self.transform = transform

        # read all files in the kapao and horapa folders, skipping hidden files
        list_kaprao = [f for f in listdir(root_path + 'kapao/') if not f.startswith('.')]
        list_horapa = [f for f in listdir(root_path + 'horapa/') if not f.startswith('.')]
        # count the images in each class (used below to assign labels)
        self.kaprao_len = len(list_kaprao)
        self.horapa_len = len(list_horapa)

        # put the data file paths into ids: kaprao first, then horapa
        self.ids = [self.dir + 'kapao/' + file for file in list_kaprao]
        self.ids.extend([self.dir + 'horapa/' + file for file in list_horapa])
        
    def __len__(self):
        return len(self.ids)
    
    def __getitem__(self, i):
        idx = self.ids[i]
        img_file = idx
        
        # open photo and make sure it has 3 channels
        pil_img = Image.open(img_file).convert('RGB')
        
        # resize, normalize and convert to pytorch tensor
        # (fall back to the PIL image when no transform is given)
        img = self.transform(pil_img) if self.transform else pil_img
            
        # get label from file list counter
        if i < self.kaprao_len:
            label = 0
        else:
            label = 1
            
        return {
            'image': img,
            'label': label,
            'file_name' : img_file,
        }

### Test dataset

Now you can test your dataset to get images.

In [None]:
root = "vege_dataset/"

transform = transforms.Compose([
    transforms.Resize(32),
    transforms.RandomCrop(28), # CenterCrop
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])])

dataset = BasilDataset(root, transform)

In [None]:
import matplotlib.pyplot as plt

output_label = ['kaprao', 'horapa']

batch = dataset[0]
image, label, filename = batch['image'], batch['label'], batch['file_name']
pil_img = Image.open(filename)

print(output_label[label])
print(filename)
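# PyTorch tensors are channels-first (C, H, W), but pyplot expects (H, W, C),
# so we display the original PIL image rather than the normalized tensor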
plt.imshow(pil_img)
plt.show()

### Create the train loader

In [None]:
train_loader = DataLoader(dataset, batch_size=64, shuffle=True, pin_memory=True)
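
Because `__getitem__` returns a dictionary, each batch from the `DataLoader` is also a dictionary, with the values of every key collected together. The sketch below shows this with a hypothetical stand-in dataset (`ToyDictDataset`, with dummy images) so that it runs without the image files.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class ToyDictDataset(Dataset):
    """Hypothetical stand-in for BasilDataset that returns dummy 'images'."""

    def __len__(self):
        return 10

    def __getitem__(self, i):
        return {
            'image': torch.zeros(3, 28, 28),   # fake image tensor
            'label': i % 2,                    # plain Python int
            'file_name': f'img_{i}.jpg',       # plain Python string
        }


loader = DataLoader(ToyDictDataset(), batch_size=4, shuffle=False)
batch = next(iter(loader))
print(batch['image'].shape)    # images are stacked along a new batch dimension
print(batch['label'])          # ints become one integer tensor
print(batch['file_name'])      # strings are collected into a list
```

This is why the training loop can call `.to(device)` on `batch['image']` and `batch['label']`, while `batch['file_name']` stays a list of strings.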

### Use resnet class from above

In [None]:
resnet = ResNet18(2).to(device)
# Optimizer and loss function
criterion = nn.CrossEntropyLoss()
params_to_update = resnet.parameters()
# Now we'll use Adam optimization
optimizer = optim.Adam(params_to_update, lr=0.01)

In [None]:
n_epochs = 25

loss_history = []
loss_history_epoch = []
accuracy = []

for epoch in range(1, n_epochs + 1):
    epoch_iter = 0                  # the number of training iterations in current epoch, reset to 0 every epoch
    running_loss = 0
    running_corrects = 0
    for batch in train_loader:
        image, label, filename = batch['image'], batch['label'], batch['file_name']
            
        epoch_iter += image.shape[0]

        image = image.to(device)
        label = label.to(device)

        # training only
        optimizer.zero_grad()

        output = resnet(image)

        # the loss compares the predicted class scores with the true class index
        loss = criterion(output, label)       # training

        # prediction: the class with the highest score
        _, preds = torch.max(output, 1)

        running_loss += loss.item() * image.size(0)
        running_corrects += torch.sum(preds == label.data)

        loss.backward()       # backpropagation: compute the gradient of the loss for every weight
        optimizer.step()      # update the weights

        loss_history.append(loss.item() * image.size(0))
        if epoch_iter % 640 == 0:
            print('{} Loss: {:.4f} Acc: {:.4f}'.format(epoch_iter, loss.item(), running_corrects / epoch_iter))
        
    loss_history_epoch.append(running_loss / epoch_iter)
    accuracy.append(running_corrects / epoch_iter)

    print('Epoch: {} Loss: {:.4f} Acc: {:.4f}'.format(epoch, running_loss / epoch_iter, running_corrects / epoch_iter * 100.0))

## Take home exercises
1. Run the lab instructions. For the dataset part, randomly split the data into a 90% training set and a 10% test set. (30 points)
2. Create your own InceptionResNet. Remember that one Inception block serves as one ResNet module. You can use the InceptionNet pattern from chapter 3. Train the model on the CIFAR-10 dataset, then plot the training curves and report the accuracy. (40 points)
3. Find your own dataset that contains at least 3 classes. If you download it from somewhere, please cite the source. Write your own dataset class and explain how you set up the data and the labels. Train ResNet and InceptionResNet on the dataset and show your results. (30 points)

### Turn-in report

Export the output of the lab as a PDF. You can put everything in one file or create separate files for the homework and the in-class exercise. Submit both the PDF file and the Jupyter notebook.

You don't need to upload the dataset.