# Task-1:
Choose any one deep learning algorithm of your choice and try to
implement it from scratch.
Important points:
<ul>
<li>You can use modules like JAX, NumPy, etc. for your implementations.</li>
<li>You should be able to explain the mathematical concept behind your implementation.</li>
</ul>
Judging Criteria:
<ul>
<li>Structure of your code.</li>
<li>Math and code implementation.</li>
</ul>
<br>
Take the given Implementation as a reference: <a href="https://github.com/Math-behind-AI/ScratchAI/tree/main/traditional_ML_algorithms">Link </a><br>

Note: For your reference, the link attached herewith shows implementations
of a few ML algorithms. However, the task is to implement DL algorithms
from scratch.<br>

DL algorithms you can implement include, but are not limited to,
Multi-layer Perceptrons, Convolutional Neural Nets, and
Recurrent Neural Nets.

## What is Multi-Layer Perceptron (MLP)

<img src="https://media.geeksforgeeks.org/wp-content/uploads/nodeNeural.jpg">


## Working of MLP

The basic steps are as follows:<br>
<ol>
    <li>Initialize the weights and biases with small random values.</li>
    <li>Propagate the input values through the network, layer by layer, up to the output layer (forward propagation).</li>
    <li>Update the weights and biases in each layer using the output error (backpropagation).</li>
    <li>Repeat steps 2 and 3 until the stopping criterion is satisfied.</li>
</ol>

### Step 1: Initializing Weights
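
The `MultiLayerPerceptron` class later in this notebook draws each weight matrix from a uniform distribution bounded by $\pm 1/\sqrt{n_{in}}$, where $n_{in}$ is the number of inputs to the layer, and starts the biases at zero. The snippet below is a minimal sketch of that scheme. The helper name `init_layer`, the seed, and the layer sizes (64 inputs, 10 hidden units, 10 classes) are chosen here only for illustration.

```python
import math
import numpy as np

def init_layer(n_in, n_out, rng):
    """Return (weights, bias) for a dense layer with n_in inputs and n_out units."""
    limit = 1.0 / math.sqrt(n_in)                 # bound shrinks as the fan-in grows
    weights = rng.uniform(-limit, limit, size=(n_in, n_out))
    bias = np.zeros((1, n_out))                   # biases start at zero
    return weights, bias

rng = np.random.default_rng(0)                    # seed is arbitrary, for repeatability
W_hidden, b_hidden = init_layer(64, 10, rng)      # 64 pixels -> 10 hidden units
W_out, b_out = init_layer(10, 10, rng)            # 10 hidden units -> 10 classes
print(W_hidden.shape, W_out.shape)
```

Scaling the bound by the fan-in keeps the initial net activations small, so the sigmoid units start in the region where their gradient is largest.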

### Step 2: Forward propagation

To proceed, we need more precise notation. For each layer $1\leq l\leq L$, the net activations and outputs are calculated as:

$$
\text{L}^l_j = \sum_i w^l_{ji} x^l_i = w^l_{j,0} x^l_0 + w^l_{j,1} x^l_1 + w^l_{j,2} x^l_2 + \ldots + w^l_{j,n} x^l_n,
$$
$$y^l_j = g^l(\text{L}^l_j)\,,$$

$$\{y_{i},\,x_{i1},\ldots ,x_{ip}\}_{i=1}^{n}$$

where:

* $y^l_j$ is the $j$-th output of layer $l$,
* $x^l_i$ is the $i$-th input to layer $l$,
* $w^l_{ji}$ is the weight of the $j$-th neuron connected to input $i$,
* $\text{L}^l_{j}$ is called net activation, and
* $g^l(\cdot)$ is the activation function of layer $l$.
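
The layer equations above can be sketched in NumPy as a single matrix product followed by the activation function. This is an illustrative sketch, not the notebook's final implementation: the function name `layer_forward` and the toy numbers are assumptions, and the weight matrix is stored as `W[i, j]` (input `i`, neuron `j`), the transpose of the $w^l_{ji}$ indexing used in the formula.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def layer_forward(x, W, b, g):
    """x: (n_samples, n_in), W: (n_in, n_out), b: (1, n_out), g: activation."""
    net = x @ W + b        # net activation: weighted sum of the inputs plus bias
    return g(net)          # layer output

x = np.array([[1.0, 2.0]])                    # one sample with two inputs
W = np.array([[0.5, -1.0],
              [0.25, 0.0]])                   # two inputs -> two neurons
b = np.zeros((1, 2))
y = layer_forward(x, W, b, sigmoid)           # net activation is [1.0, -1.0]
print(y)
```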

### Step 3. Activation Functions
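
The implementation later in this notebook uses a sigmoid activation in the hidden layer and a softmax in the output layer. The following is a small standalone sketch of both functions, written as free functions rather than class methods for illustration. The softmax subtracts the row maximum before exponentiating, which leaves the result unchanged but avoids overflow for large inputs.

```python
import numpy as np

def sigmoid(z):
    """Squash each value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_derivative(z):
    """Derivative of the sigmoid with respect to its input."""
    s = sigmoid(z)
    return s * (1.0 - s)

def softmax(z):
    """Turn each row of scores into probabilities that sum to 1."""
    shifted = z - np.max(z, axis=-1, keepdims=True)   # numerical stability
    e = np.exp(shifted)
    return e / np.sum(e, axis=-1, keepdims=True)

z = np.array([[0.0, 1.0, 2.0]])
probs = softmax(z)
print(probs, probs.sum())
```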


### Step 4. Backpropagation Algorithm
<img src="https://sebastianraschka.com/images/faq/visual-backpropagation/backpropagation.png">

### In the output layer, $L = 2$:
   - **Step 1**: Calculate the error in the output layer: $\delta^{(L2)} = -({d_j}^{(L2)} - {y_j}^{(L2)})\cdot g'({S_j}^{(L2)})$

         ERROR_output = self.OUTPUT - self.OUTPUT_L2
         DELTA_output = ((-1)*(ERROR_output) * self.deriv(self.OUTPUT_L2))
      

   - **Step 2**: Update all weights between the hidden and output layers: $W^{(L2)} = W^{(L2)} -\gamma \cdot(\delta^{(L2)} \cdot {y_i}^{(L1)})$, where ${y_i}^{(L1)}$ is the output of hidden neuron $i$ (`self.output_l1` in the code)

         for i in range(self.hiddenLayer):
             for j in range(self.OutputLayer):
                 self.WEIGHT_output[i][j] -= (self.learningRate * (DELTA_output[j] * self.output_l1[i]))

   - **Step 3**: Update the bias values in the output layer: $bias^{(L2)} = bias^{(L2)} - \gamma \cdot \delta^{(L2)}$

         # Each bias is updated once per sample, outside the loop over hidden neurons
         for j in range(self.OutputLayer):
             self.BIAS_output[j] -= (self.learningRate * DELTA_output[j])
   
### In the hidden layer, $L = 1$:
   - **Step 4**: Calculate the error in the hidden layer: $\delta^{(L1)} = W^{(L2)} \cdot \delta^{(L2)} \cdot g'({S_j}^{(L1)})$

         DELTA_hidden = np.matmul(self.WEIGHT_output, DELTA_output) * self.deriv(self._l1)

   - **Step 5**: Update all weights between the input and hidden layers: $W^{(L1)} = W^{(L1)} -\gamma \cdot(\delta^{(L1)} \cdot {X_i})$

         # i indexes the input features; the attribute name inputLayer is assumed
         for i in range(self.inputLayer):
             for j in range(self.hiddenLayer):
                 self.WEIGHT_hidden[i][j] -= (self.learningRate * (DELTA_hidden[j] * INPUT[i]))

   - **Step 6**: Update the bias values in the hidden layer: $bias^{(L1)} = bias^{(L1)} - \gamma \cdot \delta^{(L1)}$

         for j in range(self.hiddenLayer):
             self.BIAS_hidden[j] -= (self.learningRate * DELTA_hidden[j])
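
The six steps above can also be written without explicit loops. The sketch below is one possible vectorized version for a single training sample, assuming sigmoid activations in both layers and a squared-error loss, so that $g'(S) = y(1 - y)$. The function name `backprop_step`, the toy layer sizes, and the learning rate are illustrative choices, not part of the reference snippets.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, d, W1, b1, W2, b2, lr):
    """One gradient step for a sample x (n_in,) with target d (n_out,).

    W1: (n_in, n_hidden), W2: (n_hidden, n_out). Parameters are updated in place.
    Returns the squared-error loss measured before the update.
    """
    # Forward pass
    y1 = sigmoid(x @ W1 + b1)                  # hidden-layer output
    y2 = sigmoid(y1 @ W2 + b2)                 # network output

    # Steps 1 and 4: deltas, both computed before any weight changes
    delta2 = -(d - y2) * y2 * (1.0 - y2)
    delta1 = (W2 @ delta2) * y1 * (1.0 - y1)

    # Steps 2-3 and 5-6: gradient descent updates
    W2 -= lr * np.outer(y1, delta2)
    b2 -= lr * delta2
    W1 -= lr * np.outer(x, delta1)
    b1 -= lr * delta1
    return 0.5 * np.sum((d - y2) ** 2)

rng = np.random.default_rng(0)                 # arbitrary seed
W1, b1 = rng.uniform(-0.5, 0.5, (3, 4)), np.zeros(4)
W2, b2 = rng.uniform(-0.5, 0.5, (4, 2)), np.zeros(2)
x, d = np.array([0.2, -0.4, 0.9]), np.array([1.0, 0.0])
losses = [backprop_step(x, d, W1, b1, W2, b2, lr=0.5) for _ in range(200)]
print(losses[0], losses[-1])
```

Repeating the step on the same sample should drive the loss down, which is a quick sanity check that the signs of the updates are right.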

## Importing Libraries

In [26]:
import numpy as np
import pandas as pd
import math
import random
from sklearn.datasets import load_digits

##  Preprocessing the data

In [133]:
data = load_digits()
print(data['DESCR'])

.. _digits_dataset:

Optical recognition of handwritten digits dataset
--------------------------------------------------

**Data Set Characteristics:**

    :Number of Instances: 1797
    :Number of Attributes: 64
    :Attribute Information: 8x8 image of integer pixels in the range 0..16.
    :Missing Attribute Values: None
    :Creator: E. Alpaydin (alpaydin '@' boun.edu.tr)
    :Date: July; 1998

This is a copy of the test set of the UCI ML hand-written digits datasets
https://archive.ics.uci.edu/ml/datasets/Optical+Recognition+of+Handwritten+Digits

The data set contains images of hand-written digits: 10 classes where
each class refers to a digit.

Preprocessing programs made available by NIST were used to extract
normalized bitmaps of handwritten digits from a preprinted form. From a
total of 43 people, 30 contributed to the training set and different 13
to the test set. 32x32 bitmaps are divided into nonoverlapping blocks of
4x4 and the number of on pixels are counted in each blo

In [147]:
X = pd.DataFrame(data.data)
y = data.target

In [148]:
X.head()

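| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 54 | 55 | 56 | 57 | 58 | 59 | 60 | 61 | 62 | 63 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 0.0 | 5.0 | 13.0 | 9.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 6.0 | 13.0 | 10.0 | 0.0 | 0.0 | 0.0 |
| 1 | 0.0 | 0.0 | 0.0 | 12.0 | 13.0 | 5.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 11.0 | 16.0 | 10.0 | 0.0 | 0.0 |
| 2 | 0.0 | 0.0 | 0.0 | 4.0 | 15.0 | 12.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 5.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3.0 | 11.0 | 16.0 | 9.0 | 0.0 |
| 3 | 0.0 | 0.0 | 7.0 | 15.0 | 13.0 | 1.0 | 0.0 | 0.0 | 0.0 | 8.0 | ... | 9.0 | 0.0 | 0.0 | 0.0 | 7.0 | 13.0 | 13.0 | 9.0 | 0.0 | 0.0 |
| 4 | 0.0 | 0.0 | 0.0 | 1.0 | 11.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 | 16.0 | 4.0 | 0.0 | 0.0 |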


In [152]:
y[:20]

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

### Normalizing the features (X)

In [153]:
X1 = X.copy()
X1 = (X1 - X1.mean())/X1.std()
X1.head()

| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 54 | 55 | 56 | 57 | 58 | 59 | 60 | 61 | 62 | 63 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | NaN | -0.334923 | -0.043069 | 0.273995 | -0.664293 | -0.843894 | -0.40961 | -0.124988 | -0.059061 | -0.623836 | ... | -0.757225 | -0.209727 | -0.02359 | -0.298998 | 0.086695 | 0.208235 | -0.366669 | -1.146328 | -0.505529 | -0.195953 |
| 1 | NaN | -0.334923 | -1.094632 | 0.038637 | 0.268676 | -0.137981 | -0.40961 | -0.124988 | -0.059061 | -0.623836 | ... | -0.757225 | -0.209727 | -0.02359 | -0.298998 | -1.08908 | -0.24894 | 0.849396 | 0.548408 | -0.505529 | -0.195953 |
| 2 | NaN | -0.334923 | -1.094632 | -1.844229 | 0.735161 | 1.097367 | -0.40961 | -0.124988 | -0.059061 | -0.623836 | ... | 0.259158 | -0.209727 | -0.02359 | -0.298998 | -1.08908 | -2.07764 | -0.163992 | 1.56525 | 1.694665 | -0.195953 |
| 3 | NaN | -0.334923 | 0.377556 | 0.744712 | 0.268676 | -0.843894 | -0.40961 | -0.124988 | -0.059061 | 1.879168 | ... | 1.072264 | -0.209727 | -0.02359 | -0.298998 | 0.282657 | 0.208235 | 0.241363 | 0.378934 | -0.505529 | -0.195953 |
| 4 | NaN | -0.334923 | -1.094632 | -2.550304 | -0.197808 | -1.020373 | -0.40961 | -0.124988 | -0.059061 | -0.623836 | ... | -0.757225 | -0.209727 | -0.02359 | -0.298998 | -1.08908 | -2.306227 | 0.849396 | -0.468434 | -0.505529 | -0.195953 |
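
Column 0 of the normalized output above holds no valid values. This is what per-column standardization produces when a column is constant: its standard deviation is zero, so the division yields NaN. The snippet below is a hedged sketch of one common workaround, replacing a zero standard deviation with 1 so that constant columns map to 0. The helper name `standardize`, the `eps` threshold, and the toy array are assumptions for illustration. Note also that NumPy's `std` defaults to the population formula (`ddof=0`), while pandas defaults to the sample formula (`ddof=1`), so the values differ slightly from the pandas output.

```python
import numpy as np

def standardize(X, eps=1e-8):
    """Column-wise standardization that leaves constant columns at zero."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std = np.where(std < eps, 1.0, std)    # avoid dividing by zero
    return (X - mean) / std

X_demo = np.array([[0.0, 1.0],
                   [0.0, 3.0],
                   [0.0, 5.0]])            # first column is constant
Z = standardize(X_demo)
print(Z)
```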


### One-hot encoding the labels (y)

In [154]:
y1 = np.zeros((y.shape[0], (np.amax(y)+1)))
y1[np.arange(y.shape[0]), y] = 1
y1[:10]

array([[1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 1., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 1., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 1., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 1., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 1., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 1.]])

In [155]:
#Train-Test Split
split_i = len(y1) - int(len(y1) // (1 / 0.2))
X_train, X_test = X1[:split_i], X1[split_i:]
y_train, y_test = y1[:split_i], y1[split_i:]

In [158]:
print("X_train Shape: ",X_train.shape) 
print("X_test Shape : ",X_test.shape) 
print("y_train Shape: ",y_train.shape) 
print("y_test Shape : ",y_test.shape)

X_train Shape:  (1438, 64)
X_test Shape :  (359, 64)
y_train Shape:  (1438, 10)
y_test Shape :  (359, 10)
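
The split above takes the last 20% of the rows as the test set without shuffling. If the row order carries any structure, a shuffled split may give a more representative test set. The following is a minimal sketch of that idea, assuming NumPy arrays as input (a DataFrame such as `X1` would first need `X1.to_numpy()`). The function name `shuffled_split`, the seed, and the synthetic demo arrays are illustrative assumptions.

```python
import numpy as np

def shuffled_split(X, y, test_ratio=0.2, seed=0):
    """Shuffle the row indices once, then split features and labels together."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(y))            # random ordering of the row indices
    n_test = int(len(y) * test_ratio)
    test_idx, train_idx = order[:n_test], order[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Synthetic stand-in with the same number of rows as the digits dataset
X_demo = np.arange(1797 * 2).reshape(1797, 2)
y_demo = np.arange(1797)
X_tr, X_te, y_tr, y_te = shuffled_split(X_demo, y_demo)
print(X_tr.shape, X_te.shape)
```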


## Implementing Multi Layer Perceptron from Scratch

In [127]:
class MultiLayerPerceptron:
     
    def __init__(self,hidden_layer, epoch, learning_rate, verbose=False):
        self.hidden_layer = hidden_layer
        self.epoch = epoch
        self.learning_rate = learning_rate
        self.verbose = verbose
        
    # Initializing the weights    
    def initial_weights(self, X, y):
        n_sample, n_features = X.shape
        n_output = y.shape[1]
        
        limit_hidden = 1/math.sqrt(n_features)
        self.hiddenWeight = np.random.uniform(-limit_hidden,limit_hidden, (n_features, self.hidden_layer))        
        self.BiasHidden = np.zeros((1,self.hidden_layer))
        
        limit_out = 1/ math.sqrt(self.hidden_layer)
        self.outputWeight = np.random.uniform(-limit_out,limit_out, (self.hidden_layer, n_output))
        self.BiasOutput = np.zeros((1, n_output))
     
    #Sigmoid Function
    def sigmoid(self, z):
        return 1 / (1 + np.exp(-z))
    
    #Sigmoid Derivative Function
    def sigmoid_derivative(self, z):
        return self.sigmoid(z) * (1 - self.sigmoid(z))
     
    #SoftMax Function (Output Layer)    
    def softmax(self, z):
        e_x = np.exp(z - np.max(z, axis=-1, keepdims=True))
        return e_x / np.sum(e_x, axis=-1, keepdims=True)
    
    #SoftMax Gradient Function
    #Element-wise s * (1 - s): the diagonal of the softmax Jacobian
    def softmax_gradient(self, z):
        return self.softmax(z) * (1 - self.softmax(z))
    
    #Cross-Entropy Loss Function
    def loss(self, h, y):
        h = np.clip(h, 1e-15, 1 - 1e-15)
        return (-y * np.log(h) - (1 - y) * np.log(1 - h))
    
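    #Cross-Entropy Loss Gradient Function (with respect to the prediction p)
    def loss_gradient(self, y, p):
        # Clip the prediction, not the target, to avoid division by zero
        p = np.clip(p, 1e-15, 1 - 1e-15)
        return -(y/p) + (1-y)/(1-p)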
    
    #Accuracy Score Function
    def accuracy_score(self, y_true, y_pred):
        accuracy = np.sum(y_true == y_pred, axis=0) / len(y_true)
        return accuracy
    
    #Prediction Function
    def predict(self, X):
        hidden_input = X.dot(self.hiddenWeight) + self.BiasHidden
        hidden_output = self.sigmoid(hidden_input)
        output_layer_input = hidden_output.dot(self.outputWeight) + self.BiasOutput
        y_pred = self.softmax(output_layer_input)
        return y_pred
    
    #Fit Function
    def fit(self, X, y):
        self.initial_weights(X, y)
        # Run a fixed number of full-batch gradient descent steps
        for _ in range(self.epoch):
            
            # Forward Propagation
            # Hidden layer
            hidden_input = X.dot(self.hiddenWeight) + self.BiasHidden
            hidden_output = self.sigmoid(hidden_input)
            # Output layer
            output_layer_input = hidden_output.dot(self.outputWeight) + self.BiasOutput
            y_pred = self.softmax(output_layer_input)
            
            # Backward Propagation
            # Output layer gradient
            grad_out_input = self.loss_gradient(y, y_pred) * self.softmax_gradient(output_layer_input)
            grad_output = hidden_output.T.dot(grad_out_input)
            grad_biasoutput = np.sum(grad_out_input,axis=0,keepdims=True)
            # Hidden layer gradient
            grad_input_out = grad_out_input.dot(self.outputWeight.T) * self.sigmoid_derivative(hidden_input)
            grad_input = X.T.dot(grad_input_out)
            grad_biasinput = np.sum(grad_input_out, axis=0, keepdims=True)
            
            # Updating weights and biases (gradients are summed over all samples)
            self.outputWeight -= self.learning_rate * grad_output
            self.BiasOutput -= self.learning_rate * grad_biasoutput
            self.hiddenWeight -= self.learning_rate * grad_input
            self.BiasHidden -= self.learning_rate * grad_biasinput
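
In `fit`, the output-layer gradient is the product of the cross-entropy loss gradient and the element-wise softmax term $p(1-p)$. Algebraically that product reduces to $p - y$, because $\left(-\frac{y}{p} + \frac{1-y}{1-p}\right) p (1-p) = -y(1-p) + (1-y)p = p - y$. The snippet below is a small numerical check of this identity on random scores. It is a standalone sketch, and the batch size, seed, and variable names are arbitrary.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z, axis=-1, keepdims=True))
    return e / np.sum(e, axis=-1, keepdims=True)

rng = np.random.default_rng(0)                      # arbitrary seed
z = rng.normal(size=(5, 10))                        # scores for 5 samples, 10 classes
y = np.eye(10)[rng.integers(0, 10, size=5)]         # random one-hot targets
p = softmax(z)

loss_grad = -(y / p) + (1 - y) / (1 - p)            # same formula as loss_gradient
product = loss_grad * p * (1 - p)                   # what fit computes
shortcut = p - y                                    # simplified form
print(np.allclose(product, shortcut))
```

Using the simplified form directly would avoid the two divisions, and with them any risk of dividing by a prediction that has saturated at 0 or 1.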
            
       

In [131]:
def main():
    data = load_digits()
    X = data.data
    y = data.target
    
    #Normalize X (X is a NumPy array here, so mean() and std() are taken over all pixels, not per column)
    X1 = X.copy()
    X1 = (X1 - X1.mean())/X1.std()
    
    #Categorize Y
    y1 = np.zeros((y.shape[0], (np.amax(y)+1)))
    y1[np.arange(y.shape[0]), y] = 1
    
    #Train-Test Split
    split_i = len(y1) - int(len(y1) // (1 / 0.2))
    X_train, X_test = X1[:split_i], X1[split_i:]
    y_train, y_test = y1[:split_i], y1[split_i:]
    
    
    clf = MultiLayerPerceptron(hidden_layer = 10, epoch=1000, learning_rate=0.01, verbose=True)

    clf.fit(X_train, y_train)
    y_pred = np.argmax(clf.predict(X_test), axis=1)
    y_test = np.argmax(y_test, axis=1)

    accuracy = clf.accuracy_score(y_test, y_pred)
    print ("Accuracy:", accuracy)

In [132]:
if __name__ == "__main__":
    main()

Accuracy: 0.8635097493036211
