# Assignment #3
## P556: Applied Machine Learning

More often than not, we will use a deep learning library (Tensorflow, Pytorch, or the wrapper known as Keras) to implement our models. However, the abstraction afforded by those libraries can make it hard to troubleshoot issues if we don't understand what is going on under the hood. In this assignment you will implement a fully-connected and a convolutional neural network from scratch. To simplify the implementation, we are asking you to implement static architectures, but you are free to support variable number of layers/neurons/activations/optimizers/etc. We recommend that you make use of private methods so you can easily troubleshoot small parts of your model as you develop them, instead of trying to figure out which parts are not working correctly after implementing everything. Also, keep in mind that there is code from your fully-connected neural network that can be re-used on the CNN. 

Problem #1.1 (40 points): Implement a fully-connected neural network from scratch. The neural network will have the following architecture:

- Input layer
- Dense hidden layer with 512 neurons, using relu as the activation function
- Dropout with a value of 0.2
- Dense hidden layer with 512 neurons, using relu as the activation function
- Dropout with a value of 0.2
- Output layer, using softmax as the activation function

The model will use categorical crossentropy as its loss function. 
We will optimize the gradient descent using RMSProp, with a learning rate of 0.001 and a rho value of 0.9.
We will evaluate the model using accuracy.

Why this architecture? We are trying to reproduce from scratch the following [example from the Keras documentation](https://keras.io/examples/mnist_mlp/). This means that you can compare your results by running the Keras code provided above to see if you are on the right track.

In [0]:
import numpy as np
from sklearn.metrics import accuracy_score
#RMS prop var


class NeuralLayer_weights():
    def __init__(self,Nodes_present_layer,Nodes_next_layer):
        self.Present_Nodes=Nodes_present_layer
        self.Next_Nodes=Nodes_next_layer
        self.Weights_vector=np.random.rand(Nodes_present_layer,Nodes_next_layer)
        self.grad_sq=np.zeros((Nodes_present_layer,Nodes_next_layer))
        #self.input_nodes=0
        #self.output_nodes=0
    
    def getWeights(self):
        return self.Weights_vector
    
    #updating the weights after back propagation
    def setWeights(self,updated_weights):
        self.Weights_vector=updated_weights
        pass

    def getRMSPROP(self):
        return self.grad_sq
    
    def updateRMSprop(self,new_weights):
        self.grad_sq=new_weights
        pass
        
    def input_node_initial(self,input_nodes):
        self.input_node_initial=input_nodes
    
    def output_node_initial(self, output_nodes):
        self.output_node_inital= output_nodes
        
    def delta_value(self,delta):
        self.delta_value=delta    
    
    
    
    
class NeuralNetwork():
    def __init__(self,epochs, learning_rate,Num_Layer, NodesPerLayer, Dropout_val,rho):
        self.epochs=epochs
        self.learning_rate=learning_rate
        self.Num_Layer=Num_Layer
        self.NodesPerLayer=NodesPerLayer
        self.all_layers_weights=0
        self.Dropout_val=Dropout_val
        self.rho=rho
        
    #relu activation ftn--> takes vect as input and returns tranformed vector
    def relu_activation(self,input_units):
        # relu activation function is 0 for all values less than 0
        cost_ftn=[0 if units<0 else units for units in input_units]
        return cost_ftn 
    
    #return the softmax output
    def softmax_activation_ftn(self,z):
        """# softmax = e^(y)/sum(e^(Y))
        
        output_exp=[np.exp(y) for y in output]
        print(output)
        sum_output_exp=sum(output_exp)
        softmax_out=[out/sum_output_exp for out in output_exp]
        return softmax_out"""
        z_norm=np.exp(z-np.max(z,axis=0,keepdims=True))
        return(np.divide(z_norm,np.sum(z_norm,axis=0,keepdims=True)))
    
    #initialization of weights for all the layers-->x_train(vec) input
    def Layers_weight_initialization(self,x_train):
        input_nodes=x_train.shape[1]
        Nodes_Input=[input_nodes]+self.NodesPerLayer
        Nodes_output=self.NodesPerLayer+[self.NodesPerLayer[-1]]
        layers_weights=[]
        for (i,j) in zip(Nodes_Input,Nodes_output):
            layers_weights.append(NeuralLayer_weights(i,j))
        self.all_layers_weights=layers_weights
        pass

    #calculates the cross entropy on trained data 
    def cross_entropy(self,y_pred,y_act):
      prob_y=[i/sum(y_pred) for i in y_pred]
      err=0
      for i in range(len(prob_y)):
          if y_act[i]==1:
              err=err+(-np.log(prob_y[i]))
          else:
              err=err+(-np.log(1-prob_y[i]))
      return err

    #fits the training data
    def fit(self,x_train,y_train):
        #x_train_bias=np.hstack((np.ones(len(x_train),1)),x_train)
        self.Layers_weight_initialization(x_train)
        accuracy=[]
        for epoch in range(0,epochs):
            for (x,y) in zip(x_train,y_train):
                (activation_val,z)=self.feed_forward(x)
                delta=self.backprop(y,activation_val,z)
                self.updateweights(delta,z,activation_val)
                accuracy.append(self.cross_entropy(activation_val[self.Num_Layer-1],y))
            print("Cross Entropy error on epoch = "+str(epoch+1)+" is "+str(np.mean(accuracy)))

    #checks whether the prediction belongs to correct class
    def evaluate(self,y_actual,y_calcualte):
        #return -np.sum(np.log(y_calcualte) * y_actual, axis=1)
        return y_actual==y_calcualte
           
    def feed_forward(self,x):
        activation_val=[]
        z=[]
        for (layer_index,layer) in enumerate(self.all_layers_weights):
            if layer_index==0:
                activation_val.append(x)
                #z.append(x)
                weight=layer.getWeights()
            else:
                z.append(np.dot(np.transpose(activation_val[layer_index-1]),weight)+1)
                weight=layer.getWeights()
                activation=self.relu_activation(z[layer_index-1])
                if layer_index!=self.Num_Layer-1:
                    activation_val.append(activation)
                else:
                    #activation_val.append(self.softmax_activation_ftn(activation))
                    activation_val.append(activation)
        return (activation_val,z)
    
    def backprop(self,y_train,activation_val,z):
        delta=[]
        for (layer_index,layer) in enumerate(self.all_layers_weights[::-1]):
            if layer_index==0:
                flag=activation_val[self.Num_Layer-1]-y_train
                delta.append(flag)
            else:
                flag=np.dot(layer.getWeights(),delta[layer_index-1])
                g_prime_val=self.relu_prime(z[self.Num_Layer-2-layer_index])
                delta.append(flag*g_prime_val)  
        delta=np.array(delta[::-1])
        return delta[1:]
    
    def relu_prime(self,x):
        return np.max(np.sign(x),0)
        
    def updateweights(self,delta,z,activation_val):
        for currentLayer,index in zip(self.all_layers_weights[:-1],range(self.Num_Layer-1)):
            curr_weights=currentLayer.getWeights()
            if index==0:
                delta_v=delta[index].reshape(-1,1)
                act_val=activation_val[index].reshape(-1,1)
                grad=np.dot(act_val,delta_v.T)*10e-18
            else:
                delta_v=delta[index].reshape(-1,1)
                act_val=z[index-1].reshape(-1,1)
                grad=np.dot(act_val,delta_v.T)*10e-18
            rms_var=(self.rho)*currentLayer.getRMSPROP() + (1-self.rho)*((grad)**2)
            currentLayer.updateRMSprop(rms_var)
            #print(rms_var)
            curr_weights=curr_weights-((self.learning_rate)*np.sqrt(currentLayer.getRMSPROP())*grad)
            currentLayer.setWeights(curr_weights)

    def predict(self,xtest,ytest):
        accuracy=[]
        for (x,y) in zip(xtest,ytest):
                (activation_val,z)=self.feed_forward(x)
                accuracy.append(self.evaluate(self.softmax_activation_ftn(activation_val[self.Num_Layer-1]),y))
        return str(np.mean(accuracy)*100)
                                  

Problem #1.2 (10 points): Train your fully-connected neural network on the Fashion-MNIST dataset using 5-fold cross validation. Report accuracy on the folds, as well as on the test set.

In [282]:
# To simplify the usage of our dataset, we will be importing it from the Keras 
# library. Keras can be installed using pip: python -m pip install keras

# Original source for the dataset:
# https://github.com/zalandoresearch/fashion-mnist

# Reference to the Fashion-MNIST's Keras function: 
# https://keras.io/datasets/#fashion-mnist-database-of-fashion-articles

from keras.datasets import fashion_mnist
import keras

# the data, split between train and test sets
(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()

x_train = x_train.reshape(60000, 784)
x_test = x_test.reshape(10000, 784)
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

# convert class vectors to binary class matrices
num_classes = 10
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

## creating Neural Network

Number_layers=4
Nodes_layer=[512,512,num_classes]
learning_rate=0.001
epochs=5
Dropout_val=0.2
rho=0.9

myNet=NeuralNetwork(epochs, learning_rate,Number_layers,Nodes_layer,Dropout_val,rho)
myNet.fit(x_train,y_train)
accuracy=myNet.predict(x_test,y_test)
print("The Accuracy on the 10000 testing data-set is "+str(accuracy))


# k fold cross validation
from sklearn.model_selection import KFold
cv = KFold(n_splits=5, shuffle=False)
for train_index, test_index in cv.split(x_train):
    Number_layers=4
    Nodes_layer=[50,50,num_classes]
    learning_rate=0.001
    epochs=5
    Dropout_val=0.2
    rho=0.9
    myNet_k=NeuralNetwork(epochs, learning_rate,Number_layers,Nodes_layer,Dropout_val,rho)
    x_train_k=x_train[train_index]
    y_train_k=y_train[train_index]
    myNet_k.fit(x_train_k,y_train_k)
    x_test_k=x_train[test_index]
    y_test_k=y_train[test_index]
    accuracy=myNet_k.predict(x_test_k,y_test_k)
    print("The Accuracy on the k-fold testing data-set is "+str(accuracy))

60000 train samples
10000 test samples
Cross Entropy error on epoch = 1 is 3.2547626147868054
Cross Entropy error on epoch = 2 is 3.254762614786805
Cross Entropy error on epoch = 3 is 3.2547626147868045
Cross Entropy error on epoch = 4 is 3.2547626147868045
Cross Entropy error on epoch = 5 is 3.254762614786805
The Accuracy on the 10000 testing data-set is 81.452
Cross Entropy error on epoch = 1 is 3.251656173004386
Cross Entropy error on epoch = 2 is 3.251656173004386
Cross Entropy error on epoch = 3 is 3.251656173004386
Cross Entropy error on epoch = 4 is 3.251656173004386
Cross Entropy error on epoch = 5 is 3.2516561730043865
The Accuracy on the k-fold testing data-set is 66.25583333333334
Cross Entropy error on epoch = 1 is 3.255098772122755
Cross Entropy error on epoch = 2 is 3.2550987721227544
Cross Entropy error on epoch = 3 is 3.255098772122755
Cross Entropy error on epoch = 4 is 3.2550987721227544
Cross Entropy error on epoch = 5 is 3.255098772122754
The Accuracy on the k-fold 

#### The first 5 lines in the output gives the information about the normal test and train implementation i.e. Trained with 60000 images and tested on 10000 images

#### The next set of output represent 5 different iteration of k=5 fold implementation.