# Topic
Option 2: Implement a neural network from scratch.

Write a `NeuralNetwork` class in Python that implements a neural network model with dense layers and ReLU activation. 

This class should contain the following methods:

1. `__init__`: Declares a neural network with a list of integers indicating the size of each layer. 
2. `fit`: Trains the neural network model using the Stochastic Gradient Descent method.
3. `predict`: Calculates the output values for a list of input data.

It is up to you to decide the details, but your implementation cannot use neural network classes from any existing library. Please refer to the slides on neural networks and this online textbook for the formulas and algorithms you should use. After completing the implementation, apply the model on a dataset (for example, the Iris dataset) to see whether it gives reasonable results.

Instructions: All code should be executed, and results should be displayed in the notebook. Writing is as important as coding. Clearly describe every step that you take.

## Library imports

In [1]:
import numpy as np # numerical computation
from pandas import read_csv  # data loading
import matplotlib.pyplot as plt # plotting
import os # file path handling

## Description
__Full neural network construction with only math and numpy__

For this short project, I will create a fully connected two-layer neural network to predict the digits shown in the images of the MNIST data set. The network takes the 784 pixels of an image as input. The hidden layer contains 10 neurons with ReLU activation, followed by a 10-neuron output layer with softmax activation.

To make this possible, I'll use the training and test datasets from __`mnist-in-csv`__ on `kaggle`.

## Test & Train data loading

In [2]:
script_dir = os.getcwd() # get the current working directory
# file names of the train and test datasets
# (both are CSV files)
train_file = "mnist_train.csv"  
test_file = "mnist_test.csv"
# build the absolute file paths
abs_train_file_path = os.path.join(script_dir, train_file)
abs_test_file_path = os.path.join(script_dir, test_file)
# load each file
train = read_csv(abs_train_file_path)
test = read_csv(abs_test_file_path)

### A brief exploration of the data to confirm the details mentioned above

> show the first rows of the data

In [3]:
train.head()


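| | label | 1x1 | 1x2 | 1x3 | 1x4 | 1x5 | 1x6 | 1x7 | 1x8 | 1x9 | ... | 28x19 | 28x20 | 28x21 | 28x22 | 28x23 | 28x24 | 28x25 | 28x26 | 28x27 | 28x28 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |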


This is a dataset of images, each made of 784 pixels, showing a digit between `0` and `9`. Each row contains a label, which is the digit shown in the image, followed by the values of the 784 pixels that belong to it.
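
Since each row stores a 28x28 image as 784 values, a row can be turned back into an image with a reshape. The following is a minimal sketch, assuming the pixel columns are ordered row by row as the header `1x1, 1x2, ..., 28x28` suggests; `row` is a hypothetical stand-in for one row of pixel values, not real data.

```python
import numpy as np

# Hypothetical pixel row: entry k stands for the k-th pixel column of the CSV
row = np.arange(784)

# Row-major reshape: the column named "rxc" lands at image[r-1, c-1]
image = row.reshape(28, 28)

print(image.shape)                                # (28, 28)
print(image[0, 0], image[0, 27], image[27, 27])   # 0 27 783
```

With a real row, something like `plt.imshow(image, cmap="gray")` could then be used to display the digit.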

> show train information

In [9]:
train.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 60000 entries, 0 to 59999
Columns: 785 entries, label to 28x28
dtypes: int64(785)
memory usage: 359.3 MB


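All 785 columns were parsed as `int64`, so the dataset `does not have missing or non-numeric values` (pandas would otherwise have converted the affected columns to `float64` or `object`).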

## Train and test data preprocessing
To build the NN, each pixel of an image must be fed to the input layer. We therefore convert the data from `DataFrame` to `ndarray`, separate the labels from the pixels, and transpose the feature matrix so that each column holds one example, which is the layout the model expects.

In [10]:
# convert the DataFrames to ndarrays
train = np.array(train)
test = np.array(test)
# split the train labels and features, then transpose the features
X_train = train[:, 1:].T
Y_train = train[:, 0].T
# split the test labels and features, then transpose the features
X_test = test[:, 1:].T
Y_test = test[:, 0].T
# scale the pixel values from [0, 255] to [0, 1]
X_train = X_train / 255.
X_test = X_test / 255.
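
As a quick check of the layout this step produces, here is a minimal sketch on a small synthetic array standing in for the CSV contents. The array `fake` and its size are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the loaded CSV: 5 rows, 1 label column + 784 pixel columns
fake = np.hstack([rng.integers(0, 10, size=(5, 1)),
                  rng.integers(0, 256, size=(5, 784))])

X = fake[:, 1:].T / 255.0   # features: one column per example, scaled to [0, 1]
Y = fake[:, 0]              # labels: one entry per example

print(X.shape, Y.shape)     # (784, 5) (5,)
print(X.min() >= 0.0, X.max() <= 1.0)
```

Keeping one example per column is what lets the later formulas write the first layer as a single product $W^{[1]} X$.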


Our NN will have a simple two-layer architecture. Input layer $a^{[0]}$ will have 784 units corresponding to the 784 pixels in each 28x28 input image. A hidden layer $a^{[1]}$ will have 10 units with ReLU activation, and finally our output layer $a^{[2]}$ will have 10 units corresponding to the ten digit classes with softmax activation.

**Forward propagation**

$$Z^{[1]} = W^{[1]} X + b^{[1]}$$
$$A^{[1]} = g_{\text{ReLU}}(Z^{[1]})$$
$$Z^{[2]} = W^{[2]} A^{[1]} + b^{[2]}$$
$$A^{[2]} = g_{\text{softmax}}(Z^{[2]})$$
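
The four equations above translate almost line by line into NumPy. This is a minimal sketch with randomly initialized parameters; the batch size `m = 4`, the `0.05` weight scale, and the helper names `relu` and `softmax` are assumptions for illustration.

```python
import numpy as np

def relu(Z):
    return np.maximum(Z, 0)

def softmax(Z):
    # subtract the column-wise max for numerical stability
    E = np.exp(Z - Z.max(axis=0, keepdims=True))
    return E / E.sum(axis=0, keepdims=True)

rng = np.random.default_rng(0)
m = 4                                     # hypothetical batch size
X = rng.random((784, m))                  # one example per column
W1, b1 = rng.normal(size=(10, 784)) * 0.05, np.zeros((10, 1))
W2, b2 = rng.normal(size=(10, 10)) * 0.05, np.zeros((10, 1))

Z1 = W1 @ X + b1      # (10, m)
A1 = relu(Z1)         # (10, m)
Z2 = W2 @ A1 + b2     # (10, m)
A2 = softmax(Z2)      # (10, m), each column is a probability distribution

print(A2.shape)       # (10, 4)
print(np.allclose(A2.sum(axis=0), 1.0))
```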

**Backward propagation**

$$dZ^{[2]} = A^{[2]} - Y$$
$$dW^{[2]} = \frac{1}{m} dZ^{[2]} A^{[1]T}$$
$$db^{[2]} = \frac{1}{m} \sum {dZ^{[2]}}$$
$$dZ^{[1]} = W^{[2]T} dZ^{[2]} \odot g^{[1]\prime} (Z^{[1]})$$
$$dW^{[1]} = \frac{1}{m} dZ^{[1]} A^{[0]T}$$
$$db^{[1]} = \frac{1}{m} \sum {dZ^{[1]}}$$
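
One way to gain confidence in these formulas is a finite-difference check. The sketch below assumes the loss is the cross-entropy averaged over the $m$ examples (the loss for which $dZ^{[2]} = A^{[2]} - Y$ holds with a softmax output); the small layer sizes and the helper names `forward` and `loss` are assumptions chosen to keep the example fast.

```python
import numpy as np

def forward(X, W1, b1, W2, b2):
    Z1 = W1 @ X + b1
    A1 = np.maximum(Z1, 0)
    Z2 = W2 @ A1 + b2
    E = np.exp(Z2 - Z2.max(axis=0, keepdims=True))
    return Z1, A1, E / E.sum(axis=0, keepdims=True)

def loss(X, Y1h, W1, b1, W2, b2):
    # assumed loss: cross-entropy averaged over the examples
    A2 = forward(X, W1, b1, W2, b2)[2]
    return -np.sum(Y1h * np.log(A2)) / X.shape[1]

rng = np.random.default_rng(1)
n_in, n_h, n_out, m = 6, 5, 3, 8          # hypothetical small sizes
X = rng.normal(size=(n_in, m))
labels = rng.integers(0, n_out, size=m)
Y1h = np.eye(n_out)[:, labels]            # one-hot labels, shape (n_out, m)
W1, b1 = rng.normal(size=(n_h, n_in)), rng.normal(size=(n_h, 1))
W2, b2 = rng.normal(size=(n_out, n_h)), rng.normal(size=(n_out, 1))

# analytic gradients, following the backward-propagation formulas
Z1, A1, A2 = forward(X, W1, b1, W2, b2)
dZ2 = A2 - Y1h
dW2 = dZ2 @ A1.T / m
dZ1 = (W2.T @ dZ2) * (Z1 > 0)
dW1 = dZ1 @ X.T / m

# central finite difference on a single entry of W1
eps = 1e-6
Wp, Wm = W1.copy(), W1.copy()
Wp[0, 0] += eps
Wm[0, 0] -= eps
numeric = (loss(X, Y1h, Wp, b1, W2, b2) - loss(X, Y1h, Wm, b1, W2, b2)) / (2 * eps)

print(abs(numeric - dW1[0, 0]))           # should be very close to 0
```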

**Parameter updates**

$$W^{[2]} := W^{[2]} - \alpha dW^{[2]}$$
$$b^{[2]} := b^{[2]} - \alpha db^{[2]}$$
$$W^{[1]} := W^{[1]} - \alpha dW^{[1]}$$
$$b^{[1]} := b^{[1]} - \alpha db^{[1]}$$
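
These update rules apply to whatever batch the gradients were computed on. Since the assignment asks for Stochastic Gradient Descent, one option is to apply them once per mini-batch rather than once per pass over the full dataset. The sketch below only shows how such mini-batches could be drawn; the helper `minibatches` and the batch size are hypothetical.

```python
import numpy as np

def minibatches(X, Y, batch_size, rng):
    """Yield (X_batch, Y_batch) pairs covering one shuffled pass over the columns of X."""
    idx = rng.permutation(X.shape[1])
    for start in range(0, idx.size, batch_size):
        cols = idx[start:start + batch_size]
        yield X[:, cols], Y[cols]

rng = np.random.default_rng(0)
X = rng.random((784, 10))                 # 10 synthetic examples, one per column
Y = rng.integers(0, 10, size=10)

sizes = [xb.shape[1] for xb, yb in minibatches(X, Y, 4, rng)]
print(sizes)  # [4, 4, 2]
```

Each yielded pair would then go through one forward pass, one backward pass, and one parameter update.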

**Vars and shapes**

Forward prop

- $A^{[0]} = X$: 784 x m
- $Z^{[1]} \sim A^{[1]}$: 10 x m
- $W^{[1]}$: 10 x 784 (as $W^{[1]} A^{[0]} \sim Z^{[1]}$)
- $b^{[1]}$: 10 x 1
- $Z^{[2]} \sim A^{[2]}$: 10 x m
- $W^{[2]}$: 10 x 10 (as $W^{[2]} A^{[1]} \sim Z^{[2]}$)
- $b^{[2]}$: 10 x 1

Backprop

- $dZ^{[2]}$: 10 x m ($\sim A^{[2]}$)
- $dW^{[2]}$: 10 x 10
- $db^{[2]}$: 10 x 1
- $dZ^{[1]}$: 10 x m ($\sim A^{[1]}$)
- $dW^{[1]}$: 10 x 784
- $db^{[1]}$: 10 x 1

In [None]:
class Activation:
    def __init__(self, name):
        self.name = name
    def __str__(self):
        return "{} activation function".format(self.name)
        
class ReLU(Activation):
    def __init__(self):
        # super() already binds self, so only the name is passed
        super().__init__("ReLU")
    def __call__(self, Z):
        """
        ReLU method
        
        Args:
            Z (ndarray): pre-activation values computed by a layer
            
        Return:
            (ndarray) element-wise maximum of Z and 0
        """
        return np.maximum(Z,0)
    
    def derivative_ReLU(self, Z):
        """
        derivative_ReLU method
        
        Args:
            Z (ndarray): pre-activation values computed by a layer
            
        Return:
            (ndarray) boolean array, True where Z > 0 and False elsewhere
        """
        return Z > 0
    
class SoftMax(Activation):
    def __init__(self):
        super().__init__("SoftMax")
    def __call__(self, Z):
        """
        softmax method
        
        Args:
            Z (ndarray): pre-activation values, one column per example
            
        Return:
            (ndarray) softmax values computed for each column of scores in Z.
        """
        # subtract each column's max so that np.exp cannot overflow
        exp = np.exp(Z - np.max(Z, axis=0, keepdims=True))
        return exp / exp.sum(axis=0, keepdims=True)
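
The subtraction of the maximum inside the softmax is there for numerical stability: it leaves the result unchanged but keeps `np.exp` from overflowing. A minimal sketch of the difference, using a standalone `softmax` function and made-up scores chosen to trigger the overflow:

```python
import numpy as np

def softmax(Z):
    # shift each column so its largest score is 0 before exponentiating
    shifted = Z - Z.max(axis=0, keepdims=True)
    E = np.exp(shifted)
    return E / E.sum(axis=0, keepdims=True)

# Two columns of scores that differ only by a constant offset of 999
Z = np.array([[1000.0, 1.0],
              [1001.0, 2.0],
              [1002.0, 3.0]])

with np.errstate(over="ignore", invalid="ignore"):
    # naive version: exp(1000) overflows to inf, and inf / inf gives nan
    naive = np.exp(Z) / np.exp(Z).sum(axis=0)

stable = softmax(Z)

print(np.isnan(naive[:, 0]).all())               # True
print(np.allclose(stable[:, 0], stable[:, 1]))   # True
```

Both columns give the same probabilities in the stable version because softmax is invariant to adding a constant to every score in a column.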


In [None]:
class Layer():
    def __init__(self, name):
        self.name = name

class Affine(Layer):
    def __init__(self, nin, nout, activation):
        super().__init__("Affine Layer")
        self.input_size = nin
        self.hidden_nodes = nout
        self.W = np.random.normal(size=(nout, nin)) * np.sqrt(1./(nin))
        self.b = np.random.normal(size=(nout, 1)) * np.sqrt(1./nout)
        self.activator = activation

    def forward_propagation(self,X):
        Z1 = self.W.dot(X) + self.b  # (nout, m)
        A1 = self.activator(Z1)  # (nout, m)
        return Z1, A1
        
    def update_params(self, alpha, dW, db):
        self.W -= alpha * dW
        self.b -= alpha * np.reshape(db, (self.hidden_nodes, 1))


In [None]:
class Optimizer():
    def __init__(self, name):
        self.name = name


class GradientDescent(Optimizer):
    def __init__(self, alpha, iterations):
        super().__init__("Gradient Descent Optimizer")
        self.alpha = alpha
        self.iterations = iterations

    def __call__(self, X, Y, layers, fprop, bprop, uparams):
        # fprop, bprop and uparams are the network's forward, backward
        # and update callables; they read and modify `layers` in place
        report_every = max(1, self.iterations // 10)
        for i in range(self.iterations):
            fvar = fprop(X)
            grads = bprop(X, Y, fvar)
            uparams(self.alpha, grads)

            if (i+1) % report_every == 0:
                print(f"Iteration: {i+1} / {self.iterations}")
                # predicted class = row index of the largest output activation
                prediction = np.argmax(fvar[-1]['A'], axis=0)
                print(f'Training accuracy: {np.mean(prediction == Y):.3%}')
        return layers


In [24]:
class mnist_nn:
    """ 
    mnist neural network build from scratch within numpy, pandas and matplotlib
    
    Args:
        features (ndarray): the features content all examples features values based for train ....
        labels (ndarray): the labels content all real labels of all examples features...
        alpha (Numeric): the alpha is an update coeficient...
        iterations (Numeric): the iterations is the number of epochs of training stage...
        
    Attributes:
        features (ndarray): where store the features content all examples features values based for train ....
        labels (ndarray): where store the labels content all real labels of all examples features...
        alpha (Numeric): where store the alpha is an update coeficient...
        iterations (Numeric): where store the iterations is the number of epochs of training stage...
        w1 (ndarray): the normally randomly initialized weights from the input layer to the first hidden layer...
        b1 (ndarray): the biais from the input layers to the first hidden layer...
        w2 (ndarray): the normally randomly initialized weights from the first hidden layer to the second hidden layer...
        b2 (ndarray): the biais from the first hidden layers to the second hidden layer...
        
    """
    def __init__(self, features, labels, alpha, iterations):
        ''' Return an instance of mnist_nn with all attributes and methods '''
        self.features = features
        self.labels = labels
        self.alpha = alpha
        self.iterations = iterations
        # architecture described above: 784 -> 10 (ReLU) -> 10 (softmax)
        self.layers = [
            Affine(features.shape[0], 10, ReLU()),
            Affine(10, 10, SoftMax()),
        ]

    def forward_propagation(self, X):
        ''' run X through every layer, keeping each layer's Z and A '''
        fvar = []
        A = X
        for layer in self.layers:
            Z, A = layer.forward_propagation(A)
            fvar.append({'Z': Z, 'A': A})
        return fvar

    @staticmethod
    def one_hot(Y):
        ''' return a matrix of zeros with a 1 only in the position corresponding to each value in Y '''
        one_hot_Y = np.zeros((Y.max()+1,Y.size))
        one_hot_Y[Y,np.arange(Y.size)] = 1
        return one_hot_Y

    def backward_propagation(self, X, Y, fvar):
        ''' return one {'dW', 'db'} dict per layer, in the same order as self.layers '''
        m = Y.size
        one_hot_Y = self.one_hot(Y)
        grads = [None] * len(self.layers)
        # output error for the softmax layer, as in the formulas above
        dZ = fvar[-1]['A'] - one_hot_Y  # 10, m
        for i in range(len(self.layers)-1,-1,-1):
            A_prev = X if i == 0 else fvar[i-1]['A']
            grads[i] = {'dW': 1/m * dZ.dot(A_prev.T),
                        'db': 1/m * np.sum(dZ,1)}
            if i > 0:
                # propagate the error back through the previous ReLU layer
                relu = self.layers[i-1].activator
                dZ = self.layers[i].W.T.dot(dZ) * relu.derivative_ReLU(fvar[i-1]['Z'])
        return grads

    def update_params(self, alpha, grads):
        ''' apply one gradient step to every layer '''
        for layer, g in zip(self.layers, grads):
            layer.update_params(alpha, g['dW'], g['db'])
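
For comparison with the interface requested in the assignment (`__init__` taking a list of layer sizes, `fit`, `predict`), here is one possible compact sketch. It is not the notebook's implementation: the softmax output with cross-entropy gradients, the mini-batch SGD loop, the default hyperparameters, and the synthetic three-blob dataset are all assumptions made to keep the example self-contained.

```python
import numpy as np

class NeuralNetwork:
    """Dense layers with ReLU hidden activations and a softmax output (sketch)."""

    def __init__(self, sizes, seed=0):
        self.rng = np.random.default_rng(seed)
        # one (W, b) pair per pair of consecutive layer sizes
        self.W = [self.rng.normal(size=(o, i)) * np.sqrt(1.0 / i)
                  for i, o in zip(sizes[:-1], sizes[1:])]
        self.b = [np.zeros((o, 1)) for o in sizes[1:]]

    def _forward(self, X):
        Zs, As = [], [X]
        for k, (W, b) in enumerate(zip(self.W, self.b)):
            Z = W @ As[-1] + b
            if k < len(self.W) - 1:
                A = np.maximum(Z, 0)                        # ReLU on hidden layers
            else:
                E = np.exp(Z - Z.max(axis=0, keepdims=True))
                A = E / E.sum(axis=0, keepdims=True)        # softmax on the output
            Zs.append(Z)
            As.append(A)
        return Zs, As

    def fit(self, X, Y, alpha=0.05, epochs=30, batch_size=16):
        n_classes = self.W[-1].shape[0]
        for _ in range(epochs):
            order = self.rng.permutation(X.shape[1])        # reshuffle every epoch
            for start in range(0, order.size, batch_size):
                cols = order[start:start + batch_size]
                Xb, Yb, m = X[:, cols], Y[cols], cols.size
                Zs, As = self._forward(Xb)
                dZ = As[-1] - np.eye(n_classes)[:, Yb]      # A - one_hot(Y)
                for k in range(len(self.W) - 1, -1, -1):
                    dW = dZ @ As[k].T / m
                    db = dZ.sum(axis=1, keepdims=True) / m
                    if k > 0:
                        # error for the previous layer, using W before its update
                        dZ = (self.W[k].T @ dZ) * (Zs[k - 1] > 0)
                    self.W[k] -= alpha * dW
                    self.b[k] -= alpha * db
        return self

    def predict(self, X):
        # predicted class = index of the largest output probability
        return np.argmax(self._forward(X)[1][-1], axis=0)

# Synthetic data: three well-separated 2-D blobs, one example per column
rng = np.random.default_rng(42)
centers = np.array([[-2.0, -2.0], [2.0, -2.0], [0.0, 2.0]])
Y = rng.integers(0, 3, size=300)
X = (centers[Y] + 0.5 * rng.normal(size=(300, 2))).T

net = NeuralNetwork([2, 16, 3]).fit(X, Y)
acc = np.mean(net.predict(X) == Y)
print(f"training accuracy: {acc:.3f}")
```

On MNIST the same class would be created with sizes such as `[784, 10, 10]` and fitted on `X_train`, `Y_train`.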
        

In [25]:
help(mnist_nn)


Help on class mnist_nn in module __main__:

class mnist_nn(builtins.object)
 |  mnist_nn(features, labels, alpha, iterations)
 |  
 |  mnist neural network build from scratch within numpy, pandas and matplotlib
 |  
 |  Args:
 |      features (ndarray): the features content all examples features values based for train ....
 |      labels (ndarray): the labels content all real labels of all examples features...
 |      alpha (Numeric): the alpha is an update coeficient...
 |      iterations (Numeric): the iterations is the number of epochs of training stage...
 |      
 |  Attributes:
 |      features (ndarray): where store the features content all examples features values based for train ....
 |      labels (ndarray): where store the labels content all real labels of all examples features...
 |      alpha (Numeric): where store the alpha is an update coeficient...
 |      iterations (Numeric): where store the iterations is the number of epochs of training stage...
 |      w1 (ndarray):