# Basic Instructions

1. Enter your Name and UID in the provided space.
2. Do the assignment in the notebook itself.
3. You are free to use Google Colab.
4. Upload to Google Drive.
5. Now, enter the Google Drive link in the provided space. (you can do this by opening the iPython notebook uploaded using Google Collab).
6. Submit the assignment to Gradescope.

Name:  **Name Here**  
UID:  **UID Here**

Link to Google Drive : **Link Here (make sure it works)**

Provide your code at the appropriate placeholders.

## 1. Packages

In [21]:
import numpy as np
import matplotlib.pyplot as plt
import h5py
import scipy
from PIL import Image
from scipy import ndimage
from tqdm import tqdm
import time

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader

In [295]:
%matplotlib inline
plt.rcParams['figure.figsize'] = (5.0, 4.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

%load_ext autoreload
%autoreload 2

np.random.seed(1)

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


## 2. Define your hyperparamters

In [298]:
batch_size = 8
learning_rate = 0.001
epochs = 1000

## 3. Dataset and Preprocessing

**Dataset Description**: 

**Preprocessing instructions:** As usual, you reshape and standardize the images before feeding them to the network.

<img src="imvectorkiank.png" style="width:450px;height:300px;">

<caption><center> <u>Figure 1</u>: Image to vector conversion. <br> </center></caption><br>

We are going to get help from  *torch.utils.data.Dataset*, which is an abstract class representing a dataset. Our *birdvnonbird* class should inherit Dataset and override the following methods:

* __\_\_len\_\___ : returns the size of the dataset
* __\_\_getitem\_\___ : helps indexing and creating the batches.

**Exercise:** Implement the *birdvnonbird* class that will help to load your data.

In [311]:
class birdvnonbird(Dataset):
    """
    Helps to load the dataset
    
    Arguments:
    file_name -- Data file name
    mode -- There are three modes possible: train, test, evaluate
    
    Returns:
    train mode -- x_train, y_train
    test mode -- x_test, y_test
    evaluate mode -- x_eval
    
    x has the shape (batch_size, input_layer_size)
    y has the shape (batch_size, output_layer_size)
    """
    
    def __init__(self, file_name, mode = 'train'):
        self.mode = mode
        self.file_name = file_name
        
        #Load the data file
        ### START CODE HERE ###
        data = h5py.File(file_name, "r")
        data = torch.load(file_name, "r")
        print(data)
        ### END CODE HERE ###
        
        #Extract x,y for training/testing/evaluation.
        if self.mode == 'train':
            self.set_x = np.array(data["train_set_x"])
            self.set_y = np.array(data["train_set_y"])
            self.classes = np.array(data["list_classes"][:])
            self.set_y = self.set_y.reshape((self.set_y.shape[0],1))
            
        if self.mode == 'test':
            self.set_x = np.array(data["test_set_x"])
            self.set_y = np.array(data["test_set_y"])
            self.classes = np.array(data["list_classes"][:])
            self.set_y = self.set_y.reshape((self.set_y.shape[0],1))
            
        if self.mode == 'evaluate':
            self.set_x = np.array(data["evaluate_set_x"])
            self.classes = np.array(data["list_classes"][:])
            
        self.set_x = (self.set_x.reshape(self.set_x.shape[0], -1))/255. 
        
    def __len__(self):
        return len(self.set_x)
    
    def __getitem__(self, idx):
        if self.mode == 'train':
            ## Return x_train, y_train
            return torch.Tensor(self.set_x[idx]),torch.Tensor(self.set_y[idx])
        if self.mode == 'test':
            ## Return x_test, y_test
            return torch.Tensor(self.set_x[idx]),torch.Tensor(self.set_y[idx])
        if self.mode == 'evaluate':
            ## Return only x_eval
            return torch.Tensor(self.set_x[idx])

In [313]:
train_file="data/train_catvnoncat.h5"
test_file="data/test_catvnoncat.h5"

data_file = "toy_dataset.pth"
train_set = birdvnonbird(data_file,"train")
test_set = birdvnonbird(test_file,"test")

train_dataloader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
test_dataloader = DataLoader(test_set, batch_size=batch_size, shuffle=False)

inputs,labels = next(iter(train_dataloader))
print(inputs.shape, labels.shape)

OSError: Unable to open file (file signature not found)

## 2. Architecture of your model

Now that you are familiar with the dataset, it is time to build a deep neural network to distinguish bird images from non-bird images.

###  2-layer neural network

<!-- <img src="2layerNN_kiank.png" style="width:650px;height:400px;">
<caption><center> <u>Figure 2</u>: 2-layer neural network. <br> The model can be summarized as: ***INPUT -> LINEAR -> RELU -> LINEAR -> SIGMOID -> OUTPUT***. </center></caption> -->

<!-- <u>Detailed Architecture of figure 2</u>: -->

- The input is a (64,64,3) image which is flattened to a vector of size $(12288,1)$. 
- The corresponding vector: $[x_0,x_1,...,x_{12287}]^T$ is then multiplied by the weight matrix $W^{[1]}$ of size $(n^{[1]}, 12288)$.
- You then add a bias term and take its relu to get the following vector: $[a_0^{[1]}, a_1^{[1]},..., a_{n^{[1]}-1}^{[1]}]^T$.
- You multiply the resulting vector by $W^{[2]}$ and add your intercept (bias). 
- Finally, you take the sigmoid of the result. If it is greater than 0.5, you classify it to be a cat.


###  General methodology

As usual you will follow the Deep Learning methodology to build the model:
    1. Initialize parameters / Define hyperparameters
    2. Loop for num_iterations:
        a. Forward propagation
        b. Compute loss function
        c. Backward propagation
        d. Update parameters (using parameters, and grads from backprop) 
    4. Use trained parameters to predict labels

Let's now implement those the model! 

$$\color{red}{text here}$$

In [228]:
class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        # 1 input a feature of dimenstion 1024.
        self.fc1 = nn.Linear(64*64*3, 7)
        self.fc2 = nn.Linear(7, 1)
        

    def forward(self, x):     #This is the forward propagation function which will be called everytime during forward pass
        #x is the input that we will give in the network.
        x = self.fc1(x) #Passsing the function through the first Fully connected layer
        x = torch.relu(x) #Applying the sigmoid activation to the outputof the first fc layer
        x = self.fc2(x)
        x = torch.sigmoid(x)

        return x

## 4. Define your hyperparamters

In [284]:
batch_size = 16
learning_rate = 0.001
epochs = 1000

## 5. Initialize the Dataloaders

In [285]:
train_dataloader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
test_dataloader = DataLoader(test_set, batch_size=batch_size, shuffle=False)

# for idx, batch in enumerate(test_dataloader):
#     inputs,labels = batch
#     print(inputs.shape,labels.shape,labels)
#     break

## 6. Define the optimizer and loss function

In [286]:
model = Net()  #making an instance of the network
optim = torch.optim.SGD(model.parameters(), lr = learning_rate)  #model.paramters() gives all the trainable paramters. 
loss_function = nn.BCELoss() #many people call this the criterion also.

## 7. Defining the for loop for train and validation phase

### In each the phases certain things one has to be careful of:

- Training Phase:
  - Make sure the model is in train mode. That is ensured by `model.train()`

  - While looping over instances of a batch, make sure the graidents are always set to zero before calling the backward function. That's done by `optim.zero_grad()`. If this is not done, the gradients get accumulated.

  - Call the backward function on the loss by `loss.backward()` so that the loss get back propagated.

  - Call the step function of the optimiser to update the weights of the network. This is done by `optim.step()`

- Validation/Testing Phase
  - Make sure your model is in eval mode. This makes the model deterministic rather than probabilistic. This is ensured by `model.eval()`
  - As we don't need any gradients doing our validation/ testing phase, we can esnure that they are not calculated by defining a block with `torch.no_grad()`

In [289]:
for epoch in range(epochs):

    #Training phase
    model.train()  #Setting the model to train phase
    train_loss = []
    train_correct = 0.
    train_count = 0.
    train_acc = 0.
    test_acc = 0.
    
    for idx, batch in enumerate(train_dataloader):
        inputs,labels = batch
        outputs = model(inputs)
        
        with torch.no_grad():
            prd = torch.argmax(outputs,1)
            train_correct = train_correct + torch.sum(prd==labels.squeeze(1)).item()  
            train_count = train_count + prd.shape[0]
        loss = loss_function(outputs,labels)
        train_loss.append(loss.item())

        optim.zero_grad()  #Setting the gradients to be zero
        loss.backward()  #This propagates the loss backward to the entire network, hence calculating the gradients of each weight
        optim.step()  #Updates the paramters with the graidents calculated in the previous step
    train_acc = train_correct/train_count*100
    #Validation phase
    model.eval()  #Setting the model to eval mode, hence making it deterministic.
    val_loss = []

    for idx, batch in enumerate(test_dataloader):
        with torch.no_grad():   #Does not calulate the graidents, as in val phase its not needed. Saves on memory.
            inputs,labels = batch
            outputs = model(inputs)
            loss = loss_function(outputs,labels)
            val_loss.append(loss.item())

    if epoch%100==0:
        print("Epoch : {}, Train loss: {} , Train Acc: {}, Val loss: {}".format(epoch, np.mean(train_loss), train_acc, np.mean(val_loss)))

  #You can caluculate the accuracy too, I got too lazy to write the code for it.
    

Epoch : 0, Train loss: 0.07297294453850814 , Train Acc: 65.55023923444976, Val loss: 0.6431166678667068
Epoch : 100, Train loss: 0.07406534467424665 , Train Acc: 65.55023923444976, Val loss: 0.6512221843004227
Epoch : 200, Train loss: 0.06880053452083043 , Train Acc: 65.55023923444976, Val loss: 0.6609604209661484
Epoch : 300, Train loss: 0.0662411672196218 , Train Acc: 65.55023923444976, Val loss: 0.6675984412431717
Epoch : 400, Train loss: 0.0600972996492471 , Train Acc: 65.55023923444976, Val loss: 0.6766654998064041
Epoch : 500, Train loss: 0.061264866430844576 , Train Acc: 65.55023923444976, Val loss: 0.6835901960730553
Epoch : 600, Train loss: 0.059312045840280395 , Train Acc: 65.55023923444976, Val loss: 0.6919058561325073
Epoch : 700, Train loss: 0.05377317241592599 , Train Acc: 65.55023923444976, Val loss: 0.6964394822716713
Epoch : 800, Train loss: 0.05292575926120792 , Train Acc: 65.55023923444976, Val loss: 0.7021104469895363
Epoch : 900, Train loss: 0.05079331293901695 , T

In [327]:
predictions_train = predict(train_x, train_y, parameters)

(1, 209) (1, 209)
Accuracy: 0.9999999999999998


In [328]:
predictions_test = predict(test_x, test_y, parameters)

(1, 50) (1, 50)
Accuracy: 0.72


***Exercise:***
Identify the hyperparameters in the model and For each hyperparameter
- Briefly explain its role
- Explore a range of values and describe their impact on (a) training loss and (b) test accuracy
- Report the best hyperparameter value found.

Note: Provide your results and explanations in the answer for this question.

**Type your answer here**

##  8. Results Analysis

First, let's take a look at some images the 2-layer model labeled incorrectly. This will show a few mislabeled images.

In [329]:
def explore_result(image_ids, probabilities):
    if image_ids.shape != (50, ) or probabilities.shape != (50, ):
        'make sure your arrays are correct shape'
    else:
        results_obj = {}
        results_obj['images'] = image_ids
        results_obj['probabilities'] = probabilities
        torch.save(results_obj)

In [290]:
explore_result(image_ids, probabilities)