
# Building a Neural Network from scratch

Neural Networks have been a fascinating deep learning topic to me. As a starting point, I've decided to follow [James Loy's guide](https://towardsdatascience.com/how-to-build-your-own-neural-network-from-scratch-in-python-68998a08e4f6) to build a very simple Neural Network from scratch. This notebook keeps track of my learning process.






### The math behind it
Following James's guide, and with the help of [The Elements of Statistical Learning](https://web.stanford.edu/~hastie/Papers/ESLII.pdf), I've sorted my notes. A Neural Network loosely mimics the human brain, with each unit acting as a neuron and each connection as a synapse, but it can also be regarded as a mathematical function that maps an **input** (x) to a desired **output** (y hat).

The mapping works as follows:

*   Suppose the input x has p components and the output y has K classes.
*   First, create **derived features** from linear combinations of the inputs,
\begin{equation}
z_m = \sigma(w_m^Tx+b_m), \quad \text{for } m = 1,2,\dots,M
\end{equation}
where each w_m is a p-vector of unknown parameters called **weights**, b_m is known as the **bias** and σ is known as the **activation function**.
*   Then, for each class, a score is calculated as a linear combination of the derived features Z = (z_1, ..., z_M),
\begin{equation}
T_k = \beta_k^TZ+\beta_{0k}, \quad \text{for } k = 1,2,\dots,K
\end{equation}
*   Finally, the output function f_k applies the final transformation to the vector of scores T = (T_1, ..., T_K),
\begin{equation}
\hat{y}_k=f_k(T), \quad \text{for } k = 1,2,\dots,K
\end{equation}
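
The three steps above can be sketched in NumPy. This is only an illustration: the sizes `p`, `M`, `K`, the random parameters and the choice of softmax as the output function are assumptions made for the example, not something taken from James's guide.

```python
import numpy as np

def sigmoid(v):
    """Logistic activation, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-v))

def softmax(t):
    """One possible output function: turns K scores into probabilities."""
    e = np.exp(t - np.max(t))  # subtract the max for numerical stability
    return e / e.sum()

def forward(x, W, b, beta, beta0):
    """Map one input x of shape (p,) to K outputs.

    W: (M, p) weights and b: (M,) biases for the derived features.
    beta: (M, K) weights and beta0: (K,) biases for the class scores.
    """
    z = sigmoid(W @ x + b)   # derived features z_1..z_M
    t = beta.T @ z + beta0   # class scores T_1..T_K
    return softmax(t)        # final transformation of the scores

rng = np.random.default_rng(0)
p, M, K = 3, 4, 2            # hypothetical sizes
x = rng.normal(size=p)
y_hat = forward(x, rng.normal(size=(M, p)), rng.normal(size=M),
                rng.normal(size=(M, K)), rng.normal(size=K))
print(y_hat.shape, y_hat.sum())
```

With softmax as the output function, the K outputs are positive and sum to 1, so they can be read as class probabilities.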


The function can then have an arbitrary number of hidden layers, each with its own set of weights and biases. The following graph shows the architecture of a multi-layer Neural Network. ![alt text](https://docs.google.com/drawings/d/e/2PACX-1vRuEUt_8_bWMT0KUyRoZwflsA405mUXxqR4g423L-d0YI84W0DCJYT9xLC-gkkjw2Gc2VNnlawT5U0q/pub?w=960&h=720)


Choosing the right values for the weights and biases determines the accuracy of the prediction. Naturally, we want the values that minimise a loss function (e.g. squared loss or cross-entropy). The process of finding these values from input data is known as **training** the Neural Network.
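
To make the two loss functions mentioned above concrete, here is a small sketch. The function names, the clipping constant `eps` and the toy predictions are assumptions for illustration only.

```python
import numpy as np

def squared_loss(y, y_hat):
    """Sum of squared differences between targets and predictions."""
    return np.sum((y - y_hat) ** 2)

def binary_cross_entropy(y, y_hat, eps=1e-12):
    """Cross-entropy for 0/1 targets; predictions are clipped to avoid log(0)."""
    y_hat = np.clip(y_hat, eps, 1 - eps)
    return -np.sum(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

y = np.array([0.0, 1.0, 1.0, 0.0])      # targets
good = np.array([0.1, 0.9, 0.8, 0.2])   # predictions close to the targets
bad = np.array([0.9, 0.1, 0.2, 0.8])    # predictions far from the targets

print(squared_loss(y, good), squared_loss(y, bad))
print(binary_cross_entropy(y, good), binary_cross_entropy(y, bad))
```

Both losses are smaller for the predictions that sit closer to the targets, which is what training tries to achieve.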


We do this over many iterations, updating the weights and biases in each one to produce a better model. Each iteration can be broken down into 2 steps: step 1 calculates the predicted output (known as **feedforward**) and step 2 updates the weights and biases by gradient descent to reduce the loss, using gradients obtained with the chain rule (known as **backpropagation**).
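
The two-step iteration can be seen on a much simpler model before moving to a network. The sketch below fits a straight line by gradient descent; the toy data, the learning rate and the number of iterations are assumptions chosen for the example, and because the model is linear the gradient is written down directly rather than backpropagated through layers.

```python
import numpy as np

# Toy data generated from y = 2x + 1 (hypothetical example data)
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0

w, b = 0.0, 0.0        # parameters to learn
learning_rate = 0.05   # assumed step size

for _ in range(2000):
    # Step 1: predicted output for the current parameters
    y_hat = w * x + b
    # Step 2: gradient of the mean squared loss, then a descent step
    error = y_hat - y
    grad_w = 2.0 * np.mean(error * x)
    grad_b = 2.0 * np.mean(error)
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(round(w, 3), round(b, 3))
```

After enough iterations, `w` and `b` should approach 2 and 1, the values used to generate the data.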








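Following James's example, I'll use the sigmoid activation function (1/(1+exp(-v))) to build a 2-layer Neural Network. Note that the code below omits the bias terms, so each layer is just a weight matrix followed by the sigmoid.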
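
A property of the sigmoid that the training code relies on is that its derivative can be written in terms of its own output: if s = sigmoid(v), then the derivative is s(1 - s). The sketch below checks this numerically; the helper name `sigmoid_derivative_from_output` and the step `h` are illustrative choices.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def sigmoid_derivative_from_output(s):
    """Derivative of the sigmoid, written in terms of its output s = sigmoid(v)."""
    return s * (1.0 - s)

v = np.linspace(-4.0, 4.0, 9)
h = 1e-6
# central finite difference as an independent estimate of the derivative
numeric = (sigmoid(v + h) - sigmoid(v - h)) / (2 * h)
analytic = sigmoid_derivative_from_output(sigmoid(v))

print(np.max(np.abs(numeric - analytic)))  # should be very close to zero
```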


### Step 1: creating a Neural Network class


In [0]:
import numpy as np
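class NeuralNetwork:
    def __init__(self, x, y):
        self.input      = x                                      # training inputs, shape (n_samples, n_features)
        self.weights1   = np.random.rand(self.input.shape[1],4)  # input -> hidden layer (4 units)
        self.weights2   = np.random.rand(4,1)                    # hidden layer -> single output
        self.y          = y                                      # target outputs, shape (n_samples, 1)
        self.output     = np.zeros(y.shape)                      # predictions, filled in by feedforward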

### Step 2: training the Neural Network

feedforward: 


In [0]:
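def feedforward(self):
    # method of NeuralNetwork; sigmoid is defined in Step 3
    # hidden layer: sigmoid activation of the weighted inputs
    self.layer1 = sigmoid(np.dot(self.input, self.weights1))
    # output layer: sigmoid activation of the weighted hidden units
    self.output = sigmoid(np.dot(self.layer1, self.weights2))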
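
To see what the two matrix products do to the shapes, here is a standalone sketch of the same forward pass. The input matrix and the seeded random weights are assumptions for illustration; they stand in for `self.input`, `self.weights1` and `self.weights2`.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical input: 4 samples with 3 features each
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]])
rng = np.random.default_rng(0)
weights1 = rng.random((3, 4))   # input (3 features) -> hidden layer (4 units)
weights2 = rng.random((4, 1))   # hidden layer (4 units) -> single output

layer1 = sigmoid(X @ weights1)       # one row of hidden activations per sample
output = sigmoid(layer1 @ weights2)  # one prediction per sample
print(layer1.shape, output.shape)    # (4, 4) (4, 1)
```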
      

backpropagation: 


In [0]:
def backprop(self):
    # method of NeuralNetwork; sigmoid_derivative is defined in Step 3
    # chain rule on the sum-of-squares loss: each d_weights is the negative gradient
    # of the loss with respect to weights2 and weights1 (hence the factor 2*(y - output))
    d_weights2 = np.dot(self.layer1.T, (2*(self.y - self.output) * sigmoid_derivative(self.output)))
    d_weights1 = np.dot(self.input.T,  (np.dot(2*(self.y - self.output) * sigmoid_derivative(self.output), self.weights2.T) * sigmoid_derivative(self.layer1)))

    # step against the gradient (learning rate of 1), which reduces the loss
    self.weights1 += d_weights1
    self.weights2 += d_weights2
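
The chain-rule step in `backprop` can be written out as follows. This is a sketch under the assumption that the loss is the sum of squared errors, which is what the factor `2*(self.y - self.output)` suggests; the symbols A, δ_1 and δ_2 are introduced here for illustration and ⊙ denotes element-wise multiplication.
\begin{equation}
L = \sum_i (y_i - \hat{y}_i)^2, \quad A = \sigma(XW_1), \quad \hat{y} = \sigma(AW_2)
\end{equation}
\begin{equation}
\delta_2 = 2(y-\hat{y}) \odot \hat{y} \odot (1-\hat{y}), \quad -\frac{\partial L}{\partial W_2} = A^T\delta_2
\end{equation}
\begin{equation}
\delta_1 = (\delta_2 W_2^T) \odot A \odot (1-A), \quad -\frac{\partial L}{\partial W_1} = X^T\delta_1
\end{equation}
Here X is `self.input`, A is `self.layer1`, and W_1, W_2 are `self.weights1` and `self.weights2`. Under this reading, `d_weights2` equals A^Tδ_2 and `d_weights1` equals X^Tδ_1, so adding them to the weights moves downhill on the loss.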

### Step 3: putting it all together with test data

In [0]:
import numpy as np

def sigmoid(x):
    return 1.0/(1+ np.exp(-x))

def sigmoid_derivative(x):
    # x is expected to be a sigmoid output already, so the derivative is x * (1 - x)
    return x * (1.0 - x)
  
  
class NeuralNetwork:
    def __init__(self, x, y):
        self.input      = x
        self.weights1   = np.random.rand(self.input.shape[1],4) 
        self.weights2   = np.random.rand(4,1)                 
        self.y          = y
        self.output     = np.zeros(y.shape)
        
    def feedforward(self):
        # hidden layer, then output layer, each with a sigmoid activation
        self.layer1 = sigmoid(np.dot(self.input, self.weights1))
        self.output = sigmoid(np.dot(self.layer1, self.weights2))

    def backprop(self):
        # chain rule on the sum-of-squares loss: each d_weights is the negative gradient
        # of the loss with respect to weights2 and weights1
        d_weights2 = np.dot(self.layer1.T, (2*(self.y - self.output) * sigmoid_derivative(self.output)))
        d_weights1 = np.dot(self.input.T,  (np.dot(2*(self.y - self.output) * sigmoid_derivative(self.output), self.weights2.T) * sigmoid_derivative(self.layer1)))

        # step against the gradient (learning rate of 1), which reduces the loss
        self.weights1 += d_weights1
        self.weights2 += d_weights2
        
        
if __name__ == "__main__":
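    # inputs: the third column is always 1; targets are the XOR of the first two columns
    X = np.array([[0,0,1],
                  [0,1,1],
                  [1,0,1],
                  [1,1,1]])
    y = np.array([[0],[1],[1],[0]])
    nn = NeuralNetwork(X,y)

    # each iteration is one feedforward pass followed by one weight update
    for i in range(1500):
        nn.feedforward()
        nn.backprop()

    # predictions after training, one row per input sample
    print(nn.output)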


[[0.02266983]
 [0.97714259]
 [0.98481312]
 [0.01994611]]
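
Because the weights are initialised randomly, the printed numbers change from run to run. The sketch below applies the same updates as the class above, but as a single function with a fixed seed and a record of the loss at each iteration. The `train` helper, the `seed` argument and the loss history are illustrative additions, not part of James's guide.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train(X, y, hidden=4, iterations=1500, seed=0):
    """Train the 2-layer network; return final outputs and the loss per iteration."""
    rng = np.random.default_rng(seed)   # seeded so the run is repeatable
    w1 = rng.random((X.shape[1], hidden))
    w2 = rng.random((hidden, 1))
    losses = []
    for _ in range(iterations):
        # feedforward
        layer1 = sigmoid(X @ w1)
        output = sigmoid(layer1 @ w2)
        losses.append(float(np.sum((y - output) ** 2)))
        # backpropagation (negative gradients of the sum-of-squares loss)
        delta2 = 2 * (y - output) * output * (1 - output)
        delta1 = (delta2 @ w2.T) * layer1 * (1 - layer1)
        w2 += layer1.T @ delta2
        w1 += X.T @ delta1
    # one last forward pass with the final weights
    output = sigmoid(sigmoid(X @ w1) @ w2)
    return output, losses

X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]])
y = np.array([[0], [1], [1], [0]])
output, losses = train(X, y)
print(output.round(3).ravel())
print(losses[0], losses[-1])
```

Comparing the first and last entries of `losses` is a quick way to check whether training is making progress, and rerunning with the same `seed` gives the same predictions.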
