# Homework 2, *part 2*
### (60 points total)

In this part, you will build a convolutional neural network (CNN) to solve (yet another) image classification problem: the Tiny ImageNet dataset (200 classes, 100K training images, 10K validation images). Try to achieve as high accuracy as possible.

## Deliverables

* This file.
* A "checkpoint file" `"checkpoint.pth"` that contains your CNN's weights (you get them from `model.state_dict()`). Obtain it with `torch.save(..., "checkpoint.pth")`. When grading, we will load it to evaluate your accuracy.

**Should you decide to put your `"checkpoint.pth"` on Google Drive, update (edit) the following cell with the link to it:**

### [Dear TAs, I've put my "checkpoint.pth" on Google Drive, download it here](https://drive.google.com/file/d/1CJ3gseVFd5ma9MBCSFEbtio0Wyvcr25M/view?usp=sharing )

## Grading

* 9 points for reproducible training code and a filled report below.
* 11 points for building a network that gets above 25% accuracy.
* 4 points for using an **interactive** (please don't reinvent the wheel with `plt.plot`) tool for viewing progress, for example Tensorboard ([with this library](https://github.com/lanpa/tensorboardX) and [an extra hack for Colab](https://stackoverflow.com/a/57791702)). In this notebook, insert screenshots of accuracy and loss plots (training and validation) over iterations/epochs/time.
* 6 points for beating each of these accuracy milestones on the private **test** set:
  * 30%
  * 34%
  * 38%
  * 42%
  * 46%
  * 50%
  
*Private test set* means that you won't be able to evaluate your model on it. Rather, after you submit code and checkpoint, we will load your model and evaluate it on that test set ourselves, reporting your accuracy in a comment to the grade.

Note that there is an important formatting requirement, see below near "`DO_TRAIN = True`".

## Restrictions

* No pretrained networks.
* Don't enlarge images (e.g. don't resize them to $224 \times 224$ or $256 \times 256$).

## Tips

* **One change at a time**: never test several new things at once (unless you are super confident). Train a model, introduce one change, train again.
* Google a lot.
* Use GPU.
* Use regularization: L2, batch normalization, dropout, data augmentation...
* Pay much attention to accuracy and loss graphs (e.g. in Tensorboard). Track failures early, stop bad experiments early.

In [0]:
# Detect if we are in Google Colaboratory
try:
    import google.colab
    IN_COLAB = True
except ImportError:
    IN_COLAB = False

from pathlib import Path
# Determine the locations of auxiliary libraries and datasets.
# `AUX_DATA_ROOT` is where 'notmnist.py', 'animation.py' and 'tiny-imagenet-2020.zip' are.
if IN_COLAB:
    google.colab.drive.mount("/content/drive")
    
    # Change this if you created the shortcut in a different location
    AUX_DATA_ROOT = Path("/content/drive/My Drive/Deep Learning 2020 -- Home Assignment 2")
    
    assert AUX_DATA_ROOT.is_dir(), "Have you forgotten to 'Add a shortcut to Drive'?"
else:
    AUX_DATA_ROOT = Path(".")

The below cell puts training and validation images in `./tiny-imagenet-200/train` and `./tiny-imagenet-200/val`:

In [0]:
# Extract the dataset into the current directory
if not Path("tiny-imagenet-200/train/class_000/00000.jpg").is_file():
    import zipfile
    with zipfile.ZipFile(AUX_DATA_ROOT / 'tiny-imagenet-2020.zip', 'r') as archive:
        archive.extractall()

**You are required** to format your notebook cells so that `Run All` on a fresh notebook:
* trains your model from scratch, if `DO_TRAIN is True`;
* loads your trained model from `"./checkpoint.pth"`, then **computes** and prints its validation accuracy, if `DO_TRAIN is False`.

In [0]:
DO_TRAIN = True

## Train the model

In [0]:
# Your code here (feel free to add cells)
import torch as t
import numpy as np
import matplotlib.pyplot as plt
import torchvision as tv
from torchvision import transforms,utils
from torch.utils.data import DataLoader,Dataset
from torch import nn
import torch.nn.functional as F
import torch.optim as opt
from IPython import display
from tqdm import tqdm
from torch.utils.tensorboard import SummaryWriter
from torchvision.utils import make_grid
from tqdm import tqdm_notebook
import random
import time
import os

- Data Preparation

In [0]:
#training transformation...
transform_1 = transforms.Compose([
        transforms.RandomRotation(10),
        transforms.RandomHorizontalFlip(),
        transforms.ColorJitter(hue = 0.05, brightness = 0.2, contrast=0.2),
        transforms.ToTensor()])

#validation transformation..    
transform_2 = transforms.Compose([
        transforms.ToTensor()])


train_set = tv.datasets.ImageFolder('tiny-imagenet-200/train', transform=transform_1)
val_set = tv.datasets.ImageFolder('tiny-imagenet-200/val', transform=transform_2)

train_dataloader = t.utils.data.DataLoader(train_set, batch_size=200, shuffle=True)
# Evaluation does not need shuffling; a fixed order keeps validation runs reproducible
val_dataloader = t.utils.data.DataLoader(val_set, batch_size=200, shuffle=False)
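`transforms.RandomHorizontalFlip()` mirrors each training image left to right with probability 0.5 (its default). Purely as an illustration, here is a minimal NumPy sketch of the same idea on a channels-first array. `random_horizontal_flip` is a hypothetical helper, not the torchvision implementation, which in this pipeline operates on the PIL image before `ToTensor`.

```python
import numpy as np

def random_horizontal_flip(img: np.ndarray, p: float = 0.5, rng=None) -> np.ndarray:
    """Hypothetical helper: mirror a CHW image left-to-right with probability p."""
    rng = np.random.default_rng() if rng is None else rng
    if rng.random() < p:
        # In CHW layout the last axis is width, so reversing it mirrors the image.
        return img[..., ::-1].copy()
    return img

# A tiny 3x2x4 "image" (channels, height, width) with distinct values.
img = np.arange(3 * 2 * 4, dtype=np.float32).reshape(3, 2, 4)
flipped = random_horizontal_flip(img, p=1.0)  # always flip
kept = random_horizontal_flip(img, p=0.0)     # never flip
print(flipped[0, 0])  # → [3. 2. 1. 0.]
```

Flipping twice restores the original, and labels are unaffected, which is why horizontal flips are a cheap augmentation for natural images.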

In [0]:
train_dataloader

In [0]:
class Flatten(t.nn.Module):
    def forward(self,input):
        return input.view(input.size(0), -1)

t.manual_seed(0)  # seed weight initialization so training is reproducible
model = []
####################################################################
#1st Convolution layer....
model.append(nn.Conv2d(3,64,3,stride=1, padding=1, bias=False))
model.append(nn.BatchNorm2d(64))
model.append(nn.ReLU(inplace=True))
model.append(nn.MaxPool2d(3,padding=1,stride=2))
model.append(nn.Dropout(p=0.3))


######################################################################
#2nd Convolution layer....
model.append(nn.Conv2d(64,2*64,3,stride=1,padding=1,bias=False))
model.append(nn.BatchNorm2d(2*64))
model.append(nn.ReLU(inplace=True))
model.append(nn.Dropout(p=0.3))

######################################################################
#3rd Convolution layer....
model.append(nn.Conv2d(2*64,2*64,3,stride=2,padding=1,bias=False))
model.append(nn.BatchNorm2d(2*64))
model.append(nn.ReLU(inplace=True))
model.append(nn.Dropout(p=0.3))

######################################################################
#4th Convolution layer....
model.append(nn.Conv2d(2*64,4*64,3,stride=1,padding=1,bias=False))
model.append(nn.BatchNorm2d(4*64))
model.append(nn.ReLU(inplace=True))
model.append(nn.Dropout(p=0.3))

########################################################################
#5th Convolution layer....
model.append(nn.Conv2d(4*64,4*64,3,stride=2,padding=1,bias=False))
model.append(nn.BatchNorm2d(4*64))
model.append(nn.ReLU(inplace=True))
########################################################################
#6th Convolution layer....
model.append(nn.Conv2d(4*64,8*64,3,stride=1,padding=1,bias=False))
model.append(nn.BatchNorm2d(8*64))
model.append(nn.ReLU(inplace=True))
model.append(nn.Dropout(p=0.3))

#######################################################################
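# 8x8 -> 4x4 feature maps, so the flattened size is 8*64 channels * 4*4 = 8192
model.append(nn.AvgPool2d(3,padding=1, stride = 2))
model.append(Flatten())
model.append(nn.Linear(8*64*4*4,200))
# Redundant with CrossEntropyLoss (which applies log-softmax itself), but harmless
model.append(nn.LogSoftmax(dim=1))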

#######################################################################
model_net =  nn.Sequential(*model)
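Why does `nn.Linear(128*64, 200)` match the flattened features? The spatial size after a `Conv2d`, `MaxPool2d`, or `AvgPool2d` with dilation 1 and `ceil_mode=False` is $\lfloor (s + 2p - k)/s_{\text{stride}} \rfloor + 1$. The sketch below walks a 64×64 Tiny ImageNet image through the layers of `model_net`; `out_size` is a hypothetical helper written for this check.

```python
def out_size(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of Conv2d/MaxPool2d/AvgPool2d (dilation=1, ceil_mode=False)."""
    return (size + 2 * padding - kernel) // stride + 1

# (name, kernel, stride, padding) for each conv/pool layer, in model_net order.
# 3x3 convolutions with stride 1 and padding 1 keep the spatial size unchanged.
layers = [
    ("conv1", 3, 1, 1),
    ("maxpool", 3, 2, 1),
    ("conv2", 3, 1, 1),
    ("conv3", 3, 2, 1),
    ("conv4", 3, 1, 1),
    ("conv5", 3, 2, 1),
    ("conv6", 3, 1, 1),
    ("avgpool", 3, 2, 1),
]

size = 64  # Tiny ImageNet images are 64x64
for name, k, s, p in layers:
    size = out_size(size, k, s, p)
    print(f"{name:8s} -> {size}x{size}")

channels = 8 * 64  # output channels of the last convolution
print("flattened features:", channels * size * size)  # → flattened features: 8192
```

The three stride-2 stages (max pooling, conv3, conv5) and the final average pooling halve the resolution 64 → 32 → 16 → 8 → 4, so 512 × 4 × 4 = 8192 = 128 × 64.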

In [0]:
device = t.device("cuda:0" if t.cuda.is_available() else "cpu")
model_net = model_net.to(device)

In [0]:
print(model_net)

In [0]:
device

In [0]:
criterion = t.nn.CrossEntropyLoss()
adam_opt_ = t.optim.Adam(model_net.parameters())
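`nn.CrossEntropyLoss` combines log-softmax and negative log-likelihood, so the model's final `nn.LogSoftmax(dim=1)` means log-softmax is applied twice. Log-softmax only subtracts a per-row constant (the log-sum-exp), and the log-sum-exp of log-probabilities is 0, so the second application changes nothing. A minimal NumPy check on fake logits (the data is random and purely illustrative):

```python
import numpy as np

def log_softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable log-softmax over the last axis."""
    shifted = x - x.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 200))  # fake batch: 4 samples, 200 classes
labels = np.array([0, 5, 17, 199])

once = log_softmax(logits)   # what CrossEntropyLoss alone would use
twice = log_softmax(once)    # LogSoftmax layer followed by CrossEntropyLoss

# Cross-entropy: mean negative log-probability of the true class.
loss_once = -once[np.arange(4), labels].mean()
loss_twice = -twice[np.arange(4), labels].mean()
print(np.allclose(once, twice), np.isclose(loss_once, loss_twice))  # → True True
```

The loss, the gradients, and the `argmax` predictions are therefore the same with or without the extra layer; dropping it would only save a little computation.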

In [0]:
def get_acc(model, dataloader, device):
    """Top-1 accuracy of `model` over every sample in `dataloader`."""
    model.eval()  # disable dropout and use BatchNorm running statistics
    acc_value = 0
    with t.no_grad():
        for img, label in dataloader:
            img = img.to(device)
            label = label.to(device)
            pred = model(img).argmax(dim=-1, keepdim=True)
            acc_value += pred.eq(label.view_as(pred)).sum().item()
    return acc_value / len(dataloader.dataset)
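Evaluation has to run with the model in `eval()` mode, because `nn.Dropout` and `nn.BatchNorm2d` behave differently during training and evaluation. A freshly constructed `nn.Sequential` starts in training mode, so a loaded checkpoint evaluated without `eval()` would be scored with random dropout masks. The NumPy sketch below mimics inverted dropout as PyTorch documents it for `nn.Dropout` (zero each unit with probability p, scale survivors by 1/(1 - p) in training; identity in evaluation); `dropout` is a hypothetical helper.

```python
import numpy as np

def dropout(x: np.ndarray, p: float, training: bool, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical helper: inverted dropout, as nn.Dropout applies it."""
    if not training or p == 0.0:
        return x  # evaluation mode: identity
    keep = rng.random(x.shape) >= p  # keep each unit with probability 1 - p
    return x * keep / (1.0 - p)      # rescale survivors so the expected value is unchanged

rng = np.random.default_rng(0)
x = np.ones(10_000)
train_out = dropout(x, p=0.3, training=True, rng=rng)
eval_out = dropout(x, p=0.3, training=False, rng=rng)
print(np.array_equal(eval_out, x))  # → True
print(np.unique(train_out))         # two distinct values: 0 and 1/0.7
```

In training mode the output is random but has the same mean as the input; in evaluation mode it is deterministic, which is what an accuracy measurement needs.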

In [0]:
# tv.models.resnet18()
# class OwnResnet(nn.Module):
#     def __init__(self):
#         super().__init__()

#         self.resnet = tv.models.resnet18()
#         self.resnet.fc = nn.Linear(512, 200)

#     def forward(self, inputs):
#         return self.resnet(inputs)

In [0]:
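def random_seeds(seed_=0, device = 0):
    # Seed every RNG that training touches: Python, NumPy, PyTorch CPU and CUDA
    random.seed(seed_)
    np.random.seed(seed_)
    t.manual_seed(seed_)
    t.cuda.manual_seed_all(seed_)
    t.backends.cudnn.deterministic = True
    t.backends.cudnn.benchmark = False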

In [0]:
train_len = len(train_set)
bs=100
n = 30

In [0]:
#%load_ext tensorboard

In [0]:
%%time
def model_training(model, opt, scheduler=None, write=True):
    # Log to the same directory that the TensorBoard cell below watches
    writer = SummaryWriter(log_dir="./logs")
    model.to(device)
    criterion = t.nn.CrossEntropyLoss()
    val_acc_ = 0    # best validation accuracy so far
    val_acc_n = 0   # epoch (0-indexed) at which it was reached
    batches_n = (train_len)//bs
    lrc = [np.nan]*n              # mean training loss per epoch
    train_acc_curve = [np.nan]*n
    val_acc_curve = [np.nan]*n

    for epoch in tqdm_notebook(range(n)):
        model.train()

        running_loss = 0.0
        for img, label in train_dataloader:
            img = img.to(device)
            label = label.to(device)
            opt.zero_grad()
            pred = model(img)
            loss_value = criterion(pred, label)
            running_loss += loss_value.item()
            loss_value.backward()
            opt.step()
        lrc[epoch] = running_loss / len(train_dataloader)

        # Step the LR scheduler after the optimizer updates (the order PyTorch >= 1.1 expects)
        if scheduler is not None:
            scheduler.step()

        writer.add_scalar('train loss', lrc[epoch], global_step=epoch + 1)

        model.eval()
        train_acc_curve[epoch] = get_acc(model, train_dataloader, device)
        val_acc_curve[epoch] = get_acc(model, val_dataloader, device)
        val_accuracy = val_acc_curve[epoch]
        if val_accuracy > val_acc_:
            val_acc_ = val_accuracy
            val_acc_n = epoch
            if write:
                t.save(model.state_dict(), 'checkpoint.pth')
        if epoch == 0:
            writer.add_graph(model, img[:8])  # the model graph only needs to be logged once
        writer.add_scalar('Training accuracy', train_acc_curve[epoch], epoch + 1)
        writer.add_scalar('Validation accuracy', val_acc_curve[epoch], epoch + 1)

        plt.figure(figsize=(15,5))
        plt.plot(lrc, 'b')
        plt.xlabel('Epoch')
        plt.ylabel('Mean training loss')
        plt.title('Learning Curve')
        plt.show()

        plt.figure(figsize=(15,5))
        plt.plot(train_acc_curve, 'r', label='Training Accuracy')
        plt.plot(val_acc_curve, 'g', label='Validation Accuracy')
        plt.xlabel('Epoch')
        plt.ylabel('Accuracy')
        plt.title('Training and Validation Accuracies')
        plt.legend(loc='best')
        plt.show()
        print("Training loss (epoch mean): %.4f" % lrc[epoch])
        print("Training accuracy: %.2f%%" % (train_acc_curve[epoch] * 100))
        print("Validation accuracy: %.2f%%" % (val_accuracy * 100))
        print("Validation max accuracy: %.2f%%" % (val_acc_ * 100))
        print("Epoch of max validation accuracy (0-indexed): ", val_acc_n)
    writer.close()


random_seeds(device=device)
scheduler = t.optim.lr_scheduler.MultiStepLR(adam_opt_, (20, 30), gamma=.1)
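`MultiStepLR(adam_opt_, (20, 30), gamma=.1)` multiplies the learning rate by 0.1 each time the number of scheduler steps reaches a milestone. The pure-Python sketch below tabulates the resulting schedule for the `n = 30` epochs, assuming Adam's default base rate of 1e-3 and one `scheduler.step()` per epoch placed after the optimizer updates (with the call at the start of each epoch instead, every entry shifts one epoch earlier). `multistep_lr` is a hypothetical helper, not the PyTorch implementation.

```python
def multistep_lr(base_lr: float, milestones, gamma: float, epoch: int) -> float:
    """Hypothetical helper: learning rate after `epoch` scheduler steps (MultiStepLR rule)."""
    n_decays = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** n_decays

base_lr = 1e-3         # Adam's default learning rate
milestones = (20, 30)  # as passed to MultiStepLR above
schedule = [multistep_lr(base_lr, milestones, 0.1, e) for e in range(30)]

for e in (0, 19, 20, 29):
    print(f"epoch {e:2d}: lr = {schedule[e]:.0e}")
```

With only 30 epochs, the second milestone (30) is never reached, so the schedule amounts to a single tenfold decay at epoch 20.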

In [0]:

%load_ext tensorboard
logs_base_dir = "./logs"
os.makedirs(logs_base_dir, exist_ok=True)
%tensorboard --logdir {logs_base_dir}

In [0]:
%%time
if DO_TRAIN:
    # Train from scratch; the best-validation weights are saved to "checkpoint.pth"
    model_training(model_net, adam_opt_, scheduler=scheduler, write=True)

## Load and evaluate the model

In [0]:
# Your code here (load the model from "./checkpoint.pth")
# Please use `torch.load("checkpoint.pth", map_location='cpu')`
if not DO_TRAIN:
  model_net.load_state_dict(t.load('checkpoint.pth',map_location='cpu'))
  model_net.eval()  # evaluation mode: no dropout, BatchNorm running statistics

In [0]:
val_accuracy =  get_acc(model_net,val_dataloader, device) * 100
# Your code here
assert 0 <= val_accuracy <= 100
print("Validation accuracy: %.2f%%" % val_accuracy)

# Report

Below, please mention:

* A brief history of tweaks and improvements.
* Which network architectures have you tried? What is the final one and why?
* What is the training method (batch size, optimization algorithm, number of iterations, ...) and why?
* Which techniques have you tried to prevent overfitting? What were their effects? Which of them worked well?
* Any other insights you learned.

For example, start with:

"I have analyzed these and those conference papers/sources/blog posts. \
I tried this and that to adapt them to my problem. \
The conclusions this task taught me are ..."

- After studying the [Tiny ImageNet Challenge](http://cs231n.stanford.edu/reports/2017/pdfs/935.pdf) paper by Yinbin Ma of Stanford University, my initial approach to this problem was to implement ResNet-18.

- Training ResNet-18 from scratch (without pretrained weights) directly on Tiny ImageNet gave a validation accuracy of 30.9%. However, improving on this substantially would have required changing the layer layout and its parameters.

- I therefore decided to build a network architecture from scratch, using ideas from the Conv2d model in Seminar 3 coupled with data augmentation.

- The transformations applied to the training data serve as data augmentation, aimed at better accuracy: random rotations of up to ±10 degrees, random horizontal flips, and color jitter (brightness, contrast, and hue).

- My model is a simple convolutional neural network with ReLU as the activation in all hidden layers; in HW1, ReLU appeared to be the more effective activation function. I added batch normalization to reach higher accuracy in fewer training steps. The network uses six Conv2d layers followed by a single linear layer. For the output layer I used LogSoftmax, and cross-entropy as the loss function. Since `nn.CrossEntropyLoss` already applies log-softmax internally, and log-softmax leaves log-probabilities unchanged, the explicit LogSoftmax layer is redundant but harmless.

- To reduce overfitting, I added a dropout layer (p = 0.3) to every convolution block except the fifth, placed after the ReLU activation (after the max pooling in the first block).

- Training used a batch size of 200 (as set in the `DataLoader`s), 30 epochs, and the Adam optimizer with its default learning rate of 1e-3, reduced tenfold at the milestone epoch 20 by `MultiStepLR`. Adam worked better than SGD in the previous assignment.

- In my training procedure, I also tracked the maximum validation accuracy after every epoch and the epoch at which it occurred. This showed me the point at which the validation accuracy started to drop across several training trials.

- Building this model from scratch, combined with the knowledge gained from the seminars, gave me a better understanding of training methods, patterns, and approaches for getting better results at every stage.

- I obtained a training accuracy of 79.51% and a validation accuracy of 48.60%.