# CS331 - Spring 2021 - Phase 1 [15%]

*__Submission Guidelines:__*
- Naming convention for submission of this notebook is `groupXX_Phase1.ipynb` where XX needs to be replaced by your group number. For example: group 1 would rename their notebook to `group01_Phase1.ipynb`
- Only the group lead is supposed to make the submission
- All the cells <b>must</b> be run once before submission. If your submission's cells do not show their results (plots etc.), marks will be deducted
- Only the code written within this notebook will be considered while grading. No other files will be entertained
- You are advised to follow good programming practices, including appropriate variable naming and meaningful comments

The university honor code should be maintained. Any violation, if found, will result in disciplinary action. 


#### <b>Introduction</b> 
This is the first of the three phases of this offering's project. In this phase we will essentially be building everything from scratch. The dataset used for this project is the Fashion-MNIST dataset. It consists of 70,000 images of fashion/clothing items belonging to 10 different categories/classes. It is further divided into 60,000 training images and 10,000 test images, and each image is a 28×28 grayscale image (hence 1 color channel). It is recommended that you go through [this link](https://www.kaggle.com/zalando-research/fashionmnist) to familiarize yourself with the dataset.

You will begin by manually loading the dataset in this notebook (more instructions on this will follow), followed by a from-scratch implementation of a Neural Network (NN). Once done, you will have to tweak the hyperparameters (such as learning rate, number of epochs etc.) to get the best results for your NN implementation.

###### <b>You will strictly be using for-loops for this phase's implementation of the NN (unless specified otherwise in the sub-section)

###### Modification of the provided code without prior discussion with the TAs will result in a grade deduction</b>

---

###### <b>Side note</b>
The `plot_model` method will only work if you have the `pydot` python package installed along with [Graphviz](https://graphviz.gitlab.io/download/). If you do not wish to use this then simply comment out the import for `pydot`

###### <b>Need Help?</b>
If you need help, please refer to the course staff ASAP and do not wait till the last moment as they might not be available on very short notice close to deadlines

#### <b>Before You Begin</b>

Skeleton code is provided to get you started. The main methods that you need to implement correspond to the four steps of the training process of a NN which are as follows:
1. Initialize variables and initialize weights
2. Forward pass
3. Backward pass AKA Backpropagation
4. Weight Update AKA Gradient Descent

__Look for comments in the code to see where you are supposed to write your code__ 

A `fit` function combines the forward pass, backward pass and weight update (the weights are initialized when the network is constructed) and trains the network to __fit__ the provided training examples. The provided `fit` method requires all four steps of the training process to be working correctly. It has been set up so that it expects the above four methods to take particular inputs and return particular outputs. __You are supposed to work within this restriction__ 



__To see if your model is working correctly, you need to make sure that your model loss is going down during training__
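
As a rough illustration of what "loss going down" looks like, the sketch below runs the four steps on a toy problem. It is not the skeleton's API: the 2-3-2 network, the four made-up data points and all variable names are assumptions for this example only, and it uses vectorized NumPy, which your own implementation in this phase may not.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (made up for illustration): the class is decided by the first feature.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
Y = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)  # one-hot labels

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - np.max(z))  # shift by the max for numerical stability
    return e / np.sum(e)

# Step 1: initialize weights (standard normal) and biases (zeros).
W_h, b_h = rng.normal(size=(2, 3)), np.zeros((1, 3))
W_o, b_o = rng.normal(size=(3, 2)), np.zeros((1, 2))

def mean_loss():
    """Average cross-entropy over the toy dataset with the current weights."""
    total = 0.0
    for x, y in zip(X, Y):
        h = sigmoid(x.reshape(1, -1) @ W_h + b_h)
        p = softmax(h @ W_o + b_o)
        total += -np.sum(y * np.log(p))
    return total / len(X)

history = [mean_loss()]
lr = 0.5
for epoch in range(200):
    for x, y in zip(X, Y):
        x = x.reshape(1, -1)
        y = y.reshape(1, -1)
        # Step 2: forward pass.
        h = sigmoid(x @ W_h + b_h)
        p = softmax(h @ W_o + b_o)
        # Step 3: backward pass (errors at the output and hidden layers).
        delta_o = p - y
        delta_h = (delta_o @ W_o.T) * h * (1.0 - h)
        # Step 4: weight update (one gradient descent step per sample).
        W_o -= lr * h.T @ delta_o
        b_o -= lr * delta_o
        W_h -= lr * x.T @ delta_h
        b_h -= lr * delta_h
    history.append(mean_loss())

print(f"loss before training: {history[0]:.3f}, after: {history[-1]:.3f}")
```

If the printed loss after training is not clearly lower than before, one of the four steps is usually at fault; the same reasoning applies to the full network in this notebook.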


In [None]:
# making all the necessary imports here

import numpy as np
import pandas as pd
import time
%matplotlib inline
import matplotlib.pyplot as plt
import matplotlib.image as mpimg # New import added for extraction
plt.style.use('seaborn')
from IPython.display import Image
import pydot
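from tqdm.notebook import tqdm as tqdm_notebook  # same progress bar as the deprecated tqdm.tqdm_notebook, kept under the name that fit() uses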
import seaborn as sns
from keras.datasets import fashion_mnist
from sklearn.model_selection import train_test_split
from keras.utils import np_utils
from sklearn.datasets import make_moons
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import confusion_matrix,classification_report
from google.colab import drive
import glob
import cv2

In [None]:
# This function will be used to plot the confusion matrix at the end of this notebook

def plot_confusion_matrix(conf_mat):
    classes = ['T-shirt/top','Trouser/pants','Pullover shirt','Dress','Coat','Sandal','Shirt','Sneaker','Bag','Ankle boot']
    df_cm = pd.DataFrame(conf_mat,classes,classes)
    plt.figure(figsize=(15,9))
    sns.set(font_scale=1.4)
    sns.heatmap(df_cm, annot=True,annot_kws={"size": 16})
    plt.show()

class_labels = ['T-shirt/top','Trouser/pants','Pullover shirt','Dress','Coat','Sandal','Shirt','Sneaker','Bag','Ankle boot']

In [None]:
# Enter group lead's roll number here. This will be used for plotting purposes

rollnumber = 22100282

#### __Read dataset__

Get the paths of all the training and test images in the dataset and print the lengths of the training and test path lists. You can use `glob` for this purpose; have a look [here](https://www.geeksforgeeks.org/how-to-use-glob-function-to-find-files-recursively-in-python/) to see how it is used. The dataset provided to you is a truncated version of the Fashion MNIST dataset (having only 2000 training images and 1600 test images).
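
A minimal sketch of this step, run on a throwaway directory tree instead of the real dataset. The folder names (`bag`, `coat`), the file names and the `LABELS` ordering are assumptions made up for the example; check the actual layout of the unzipped dataset before relying on any of them.

```python
import glob
import os
import tempfile

# Hypothetical class folders; the real dataset's folder names may differ.
LABELS = ["bag", "coat"]

def one_hot(index, num_classes):
    """Return a list of zeros with a single 1 at position `index`."""
    vec = [0] * num_classes
    vec[index] = 1
    return vec

with tempfile.TemporaryDirectory() as root:
    # Create train/<label>/<file> placeholders standing in for images.
    for label in LABELS:
        os.makedirs(os.path.join(root, "train", label))
        for i in range(2):
            open(os.path.join(root, "train", label, f"{i}.png"), "w").close()

    # One "*" per directory level: class folder, then file.
    paths = sorted(glob.glob(os.path.join(root, "train", "*", "*")))
    # The parent directory name is taken as the class label.
    names = [os.path.basename(os.path.dirname(p)) for p in paths]
    y = [one_hot(LABELS.index(n), len(LABELS)) for n in names]

print(len(paths))  # 4
print(names)       # ['bag', 'bag', 'coat', 'coat']
print(y)           # [[1, 0], [1, 0], [0, 1], [0, 1]]
```

Reading the label from the parent folder name, rather than searching the whole path for a substring, avoids accidental matches such as `shirt` inside `pullovershirt`.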

In [None]:
# Mounting Google Drive here
drive.mount('/drive')

# Edit this address so that it points to the dataset's zipped file on your Google Drive
!unzip -o -q "/drive/MyDrive/Project/dataset.zip" -d "/content/data/"

Mounted at /drive


In [None]:
classes = 10  # do not change this
X_train = None  # you must store the training images in this variable
y_train = None  # you must store the training images' labels in this variable
X_test = None   # you must store the test images in this variable
y_test = None   # you must store the test images' labels in this variable

###### Code Here ######
'''Please note that you will have to extract and one-hot encode the labels of the images for both y_train and y_test'''

# The pixel values were checked with for-loops over every pixel and are already
# normalized in both X_train and X_test, so no rescaling is applied here.


test_paths = []
train_paths = []

X_train = []
X_test = []
y_train = []
y_test = []

for name in glob.glob("/content/data/train/*/*"):
  train_paths.append(name)

for name in glob.glob("/content/data/test/*/*"):
  test_paths.append(name)



for train_path in train_paths:
  X_train.append(mpimg.imread(train_path))

for test_path in test_paths:
  X_test.append(mpimg.imread(test_path))

# One-hot encode the labels from the class name found in each image path.
# Labels are built as lists: tuples caused problems with the == comparison,
# which returned a vector of True/False values rather than a single bool.
#
# Each keyword is paired with its index in `class_labels`, so the confusion
# matrix and classification report name each class correctly. The pairing of
# folder keyword to class name is inferred from the names themselves.
# Keywords are checked in order, so "pullovershirt" must come before "shirt".
label_keywords = [("anklefoot", 9), ("bag", 8), ("coat", 4), ("dress", 3), ("pants", 1),
                  ("pullovershirt", 2), ("sandal", 5), ("shirt", 6), ("sneaker", 7), ("top", 0)]

def one_hot_from_path(path):
  for keyword, index in label_keywords:
    if keyword in path:
      label = [0] * classes
      label[index] = 1
      return label
  # Fail loudly: silently skipping a file would misalign images and labels.
  raise ValueError("No known class name found in path: " + path)

for train_path in train_paths:
  y_train.append(one_hot_from_path(train_path))

for test_path in test_paths:
  y_test.append(one_hot_from_path(test_path))

# We need to convert them to numpy arrays as the rest of the code relies on it.

X_train = np.array(X_train)
X_test = np.array(X_test)
y_train = np.array(y_train)
y_test = np.array(y_test)

# X_train images are of 28 X 28 pixels


print("Number of training sample: ", X_train.shape[0])  # You can change len(X_train) based on your implementation such that total number of training samples is printed

print("Number of testing sample: ", X_test.shape[0])   # You can change len(X_test) based on your implementation such that total number of test samples is printed

Number of training sample:  2000
Number of testing sample:  1600


#### __NN Implementation__
Your implementation of the NN needs to use the `sigmoid` activation function for the hidden layer(s) and the `softmax` activation function for the output layer. The NN model you will be creating here will consist of only three layers: 1 input layer, 1 hidden layer and 1 output layer.
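
For reference, here is a small standalone sketch of the two activation functions. The function bodies are one common way to write them, not the required solution; subtracting the maximum inside `softmax` is an added numerical-stability step that leaves the result unchanged.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic function; every output lies in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    """Softmax over all entries of z; the outputs are positive and sum to 1."""
    shifted = z - np.max(z)   # avoids overflow in np.exp for large inputs
    exps = np.exp(shifted)
    return exps / np.sum(exps)

z = np.array([[1.0, 2.0, 3.0]])
print(sigmoid(np.array([0.0])))   # [0.5]
print(softmax(z))                 # largest input gets the largest probability
print(softmax(z + 1000.0))        # same probabilities, and no overflow
```

Without the shift, `np.exp(1003.0)` overflows to `inf` and the division produces `nan`.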

In [None]:
class NeuralNetwork():
    
    @staticmethod                    
    def cross_entropy_loss(y_pred, y_true):  
        ###### Code Here ######
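        # Categorical cross-entropy, averaged over the samples. This is the loss
        # whose gradient through softmax is (y_pred - y_true), which is the
        # output error that backward_pass works out.
        # Clipping keeps log() away from 0 so the loss stays finite.
        eps = 1e-12
        clipped = np.clip(y_pred, eps, 1.0)
        return -np.sum(y_true * np.log(clipped)) / y_pred.shape[0]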
        
  
    @staticmethod
    def accuracy(y_pred, y_true):
        ###### Code Here ######
        
        correctly_classified = 0

        for pred,true in zip(y_pred,y_true):
          if pred == true:
            correctly_classified += 1

        
        return correctly_classified*100.0/len(y_true)
    
    
    @staticmethod
    def softmax(x):
        ###### Code Here ######

        # Shift by the maximum before exponentiating; this leaves the result
        # unchanged but prevents overflow in np.exp for large inputs
        exps = np.exp(x - np.max(x))

        # softmax definition: e^(z_i) / sum_j e^(z_j)
        return exps / np.sum(exps)
    
    
    @staticmethod
    def sigmoid(x):
        ###### Code Here ######
        
        return 1.0/(1.0+np.exp(-x))
                            
    def __init__(self, input_size, hidden_nodes, output_size):
        '''Creates a Feed-Forward Neural Network.
        The parameters represent the number of nodes in each layer (total 3). 
        Look at the inputs to the function'''
        
        self.num_layers = 3
        self.input_shape = input_size
        self.hidden_shape = hidden_nodes
        self.output_shape = output_size
        
        self.weights_ = []
        self.biases_ = []
        self.__init_weights()
    
    def __init_weights(self):
        '''Initializes all weights based on standard normal distribution and all biases to 0.'''
        
        ###### Code Here (Replace 'None' by appropriate values/variables) ######
        
        


        W_h = np.random.normal(size=(self.input_shape,self.hidden_shape))   
        b_h = np.zeros(shape=(1,self.hidden_shape))                         # This is a 2-D numpy array of shape (1,n) for later use in forward_pass function.

        W_o = np.random.normal(size=(self.hidden_shape,self.output_shape))   
        b_o = np.zeros(shape=(1,self.output_shape))                          # This is a 2-D numpy array of shape (1,n) for later use in forward_pass function.
        
        # self.weights_ becomes a list of np.arrays. 0th index has W_h and 1st index has W_o
        self.weights_.append(W_h)  
        self.weights_.append(W_o)  

        # self.biases_ becomes a list of np.arrays. 0th index has b_h and 1st index has b_o
        self.biases_.append(b_h)
        self.biases_.append(b_o)



                          
    def forward_pass(self, input_data):
        '''Executes the feed forward algorithm.
        "input_data" is the input to the network in row-major form
        Returns "activations", which is a list of all layer outputs (excluding input layer of course)'''

        ###### Code Here ######
        activations = []


        sum = 0
        dot = np.empty((1,self.hidden_shape))
        cols = self.weights_[0].shape[1]
        for i in range(0,cols):
          for j in range(0,self.input_shape):
            sum = sum + self.weights_[0][j,i]*input_data[0,j]     # for loop to help in finding activations in hidden layer (basically wa)
          dot[0,i] = sum
          sum = 0
        activation_hidden_layer = self.sigmoid(dot + self.biases_[0])  #adding bias to get activations (a_new = wa + b)

        sum = 0
        dot = np.empty((1,self.output_shape))
        cols = self.weights_[1].shape[1]
        for i in range(0,cols):
          for j in range(0,self.hidden_shape):
            sum = sum + self.weights_[1][j,i]*activation_hidden_layer[0,j]   #Same for next layer
          dot[0,i] = sum
          sum = 0
        activation_output_layer = self.softmax(dot + self.biases_[1])

        activations.append(activation_hidden_layer)
        activations.append(activation_output_layer)

        
        # Return a list (not a NumPy array) holding two 2-D arrays: the hidden activations
        # of shape (1, hidden_shape) and the output activations of shape (1, output_shape).
        # fit() and predict() rely on this structure, so do not change it.
        return activations

                          
    def backward_pass(self, targets, layer_activations):
        '''Executes the backpropagation algorithm.
        "targets" is the ground truth/labels
        "layer_activations" are the return value of the forward pass step
        Returns "deltas", which is a list containing weight update values for all layers (excluding the input layer of course)'''
        
        ###### Code Here ######


        # Error at the output layer's pre-activations. For a softmax output trained
        # with cross-entropy this is simply (prediction - target). It is algebraically
        # equal to -(t/y - (1-t)/(1-y)) * y*(1-y), but avoids dividing by zero when a
        # prediction saturates at exactly 0 or 1.
        output_error = layer_activations[1] - targets

        soft_derivative0 = layer_activations[0]*(1-layer_activations[0])              # derivative of sigmoid
        deltas = []        

        
        sum = 0
        dot = np.empty((1,self.hidden_shape))
        cols = self.weights_[1].shape[0]
        for i in range(0,cols):
          for j in range(0,self.output_shape):
            sum = sum + self.weights_[1][i,j]*output_error[0,j]                # Doing dot (matrix multiplication) product of output error with  
          dot[0,i] = sum                                                              # weights of hidden layer to back propagate error
          sum = 0
        error = dot*soft_derivative0
    
        deltas.append(output_error)
        deltas.append(error)

        return deltas
                           
    def weight_update(self, deltas, layer_inputs, lr):
        '''Executes the gradient descent algorithm.
        "deltas" is return value of the backward pass step
        "layer_inputs" is a list containing the inputs for all layers (including the input layer)
        "lr" is the learning rate'''
        
        ###### Code Here ######

    
        delC_by_delW_output = np.empty((self.hidden_shape,self.output_shape))         # Creating empty matrix for weight updates

        for i in range(0,self.hidden_shape):
            delC_by_delW_output[i] = layer_inputs[1][0,i]*deltas[0][0]                # Gradient descent for output layer: dot of error with inputs
                                                                                      # One for loop used because each row of desired array can be obtained  
        delC_by_delB_output = deltas[0]   # Bias updates                              # by essentially multiplying each element (scalar) of one array  
                                                                                      # with others entire row
        self.weights_[1] = self.weights_[1] - lr*(delC_by_delW_output)   # Updating weights
        self.biases_[1] = self.biases_[1] - lr*(delC_by_delB_output)     # Updating biases

        delC_by_delW_hidden = np.empty((self.input_shape,self.hidden_shape))   # Repeat process for hidden layer

        for i in range(0,self.input_shape):
            delC_by_delW_hidden[i] = layer_inputs[0][0,i]*deltas[1][0]
        
        delC_by_delB_hidden = deltas[1]

        self.weights_[0] = self.weights_[0] - lr*(delC_by_delW_hidden)
        self.biases_[0] = self.biases_[0] - lr*(delC_by_delB_hidden)


    ###### Do Not Change Anything Below this line in This Cell ######
    
    def fit(self, Xs, Ys, epochs, lr=1e-3):
            history = []
            for epoch in tqdm_notebook(range(epochs)):
                num_samples = Xs.shape[0]
                for i in range(num_samples):

                    sample_input = Xs[i,:].reshape((1,self.input_shape))
                    sample_target = Ys[i,:].reshape((1,self.output_shape))
                    
                    activations = self.forward_pass(sample_input)   # Call forward_pass function 
                    deltas = self.backward_pass(sample_target, activations)    # Call backward_pass function 
                    layer_inputs = [sample_input] + activations[:-1]
                    
                    # Call weight_update function 
                    self.weight_update(deltas, layer_inputs, lr)
                
                preds = self.predict(Xs)   # Call predict function 

                current_loss = self.cross_entropy_loss(preds, Ys)
                
                if  epoch==epochs-1:
                  confusion_mat=confusion_matrix(Ys.argmax(axis=1), preds.argmax(axis=1),labels=np.arange(10))  
                  plot_confusion_matrix(confusion_mat)
                  report = classification_report(Ys, np_utils.to_categorical(preds.argmax(axis=1),num_classes=classes), target_names=class_labels)
                  print(report)
                history.append(current_loss)
            return history

    def predict(self, Xs):
        '''Returns the model predictions (output of the last layer) for the given "Xs".'''
        predictions = []
        num_samples = Xs.shape[0]
        for i in range(num_samples):
            sample = Xs[i,:].reshape((1,self.input_shape))
            sample_prediction = self.forward_pass(sample)[-1]
            predictions.append(sample_prediction.reshape((self.output_shape,)))
        return np.array(predictions)

    def evaluate(self, Xs, Ys):
        '''Returns appropriate metrics for the task, calculated on the dataset passed to this method.'''
        pred = self.predict(Xs)
        return self.cross_entropy_loss(pred, Ys), self.accuracy(pred.argmax(axis=1), Ys.argmax(axis=1))
  
    def plot_model(self, filename):
        '''Provide the "filename" as a string including file extension. Creates an image showing the model as a graph.'''
        graph = pydot.Dot(graph_type='digraph')
        graph.set_rankdir('LR')
        graph.set_node_defaults(shape='circle', fontsize=0)
        nodes_per_layer = [self.input_shape, self.hidden_shape, self.output_shape]
        for i in range(self.num_layers-1):
            for n1 in range(nodes_per_layer[i]):
                for n2 in range(nodes_per_layer[i+1]):
                    edge = pydot.Edge(f'l{i}n{n1}', f'l{i+1}n{n2}')
                    graph.add_edge(edge)
        graph.write_png(filename)
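
For a softmax output layer trained with categorical cross-entropy, the error at the output pre-activations simplifies to `y_pred - y_true`. A finite-difference check is a cheap way to convince yourself of this (or to debug a backward pass). The sketch below is standalone and does not use the `NeuralNetwork` class; the values of `z` and `y` are arbitrary examples.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / np.sum(e)

def loss(z, y):
    """Categorical cross-entropy of softmax(z) against a one-hot target y."""
    return -np.sum(y * np.log(softmax(z)))

z = np.array([0.3, -1.2, 2.0])   # arbitrary pre-activations
y = np.array([0.0, 1.0, 0.0])    # one-hot target

# Analytic gradient of the loss with respect to z.
analytic = softmax(z) - y

# Numerical gradient by central differences, one coordinate at a time.
eps = 1e-6
numeric = np.zeros_like(z)
for k in range(len(z)):
    step = np.zeros_like(z)
    step[k] = eps
    numeric[k] = (loss(z + step, y) - loss(z - step, y)) / (2 * eps)

max_diff = np.max(np.abs(analytic - numeric))
print(max_diff)  # should be tiny compared with the gradient entries
```

The same idea extends to the weights: nudge one weight, recompute the loss, and compare the slope with the update your `backward_pass` and `weight_update` would apply.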

In [None]:
# These are the hyperparameters (a.k.a. Black Magic). Research them and tweak them to see what generates the best results for you

# Each 28x28 image is flattened into 784 pixel values
INPUT_SIZE = 784        # must be an int, this number represents the number of nodes/neurons in the input layer of the network

HIDDEN_NODES = 15     # must be an int, this number represents the number of nodes/neurons in the only hidden layer of the network
OUTPUT_SIZE = 10      # must be an int, this number represents the number of nodes/neurons in the output layer of the network
EPOCHS = 100         # must be an int
LEARNING_RATE = 1e-3

In [None]:
start = time.time()

nn = NeuralNetwork(input_size = INPUT_SIZE, hidden_nodes = HIDDEN_NODES, output_size = OUTPUT_SIZE)
history = nn.fit(X_train, y_train, epochs=EPOCHS, lr=LEARNING_RATE)
plt.plot(history);
plt.gca().set(xlabel='Epoch', ylabel='Cross-entropy', title='Training Plot {}'.format(rollnumber));
end = time.time()

print("Runtime of the algorithm is ", round((end - start),3)," seconds")


In [None]:
nn.evaluate(X_test,y_test)
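
`evaluate` returns the loss together with the accuracy as a percentage, computed by comparing the `argmax` of each prediction with the `argmax` of its one-hot label. The sketch below reproduces that comparison on made-up numbers (4 samples, 3 classes) so the returned value is easier to interpret.

```python
import numpy as np

# Made-up predicted probabilities and one-hot labels, for illustration only.
preds = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.3, 0.4, 0.3],
                  [0.2, 0.2, 0.6]])
labels = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [1, 0, 0],
                   [0, 0, 1]])

pred_classes = preds.argmax(axis=1)    # most probable class per sample
true_classes = labels.argmax(axis=1)   # position of the 1 in each label

correct = 0
for p, t in zip(pred_classes, true_classes):
    if p == t:
        correct += 1

accuracy = correct * 100.0 / len(true_classes)
print(accuracy)  # 75.0 (the third sample is misclassified)
```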