In [5]:
import numpy as np

# Create a Simple Neural Network in Python

The best way to understand how neural networks work is to build one from scratch, without using any deep-learning library (we only use NumPy for arrays).

So, what is a neural network?
A neural network is an algorithm inspired by the structure of the neurons inside a human brain. It is composed of neurons connected together by synapses. If there is sufficient synaptic input to a neuron, that neuron is activated. We call this process “thinking”.

![neuron.PNG](attachment:neuron.PNG)

We can model this process by creating a neural network on a computer. The diagram below shows the architecture of a 2-layer Neural Network.


![NN_archi.png](attachment:NN_archi.png)

Neural Networks consist of the following components:
- An input layer, x
- An arbitrary number of hidden layers (one in this example)
- An output layer, ŷ
- A set of weights and biases between each layer, W and b
- A choice of activation function for each hidden layer, σ. In this case, we will use a Sigmoid activation function.
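To make the list concrete, here is a minimal NumPy sketch of a forward pass through a 2-layer network like the one in the diagram. The hidden-layer size (4), the names `W1`, `b1`, `W2`, `b2` and the use of a sigmoid on the output layer are illustrative assumptions, not taken from the diagram.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid, applied element-wise."""
    return 1 / (1 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    """Forward pass of a 2-layer network: input x -> hidden layer -> output y_hat."""
    h = sigmoid(x @ W1 + b1)      # hidden activations, shape (n_samples, n_hidden)
    y_hat = sigmoid(h @ W2 + b2)  # output layer, shape (n_samples, 1)
    return y_hat

rng = np.random.default_rng(0)
x = np.array([[0, 0, 1], [1, 1, 1]])           # two examples with 3 inputs each
W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)  # input (3) -> hidden (4), hypothetical size
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)  # hidden (4) -> output (1)
y_hat = forward(x, W1, b1, W2, b2)
print(y_hat.shape)  # (2, 1)
```

The single neuron trained in the rest of this notebook is the degenerate case with no hidden layer and no bias: just `sigmoid(x @ w)`.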

## Question 1:

We’re going to train the neural network to solve the problem below. The first four examples are called a training set. Can you work out the pattern? Should the answer be 0 or 1? Give an explanation of your answer.

| x1 | x2 | x3 | Output |
|----|----|----|--------|
| 0  | 0  | 1  | 0      |
| 1  | 1  | 1  | 1      |
| 1  | 0  | 1  | 1      |
| 0  | 1  | 1  | 0      |
| 1  | 0  | 0  | ?      |

## Answer 1:

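Predicted output: 1

In every training example the output equals the first input, x1. The other two inputs carry no information about the output: x3 is 1 in every example, and x2 is 1 both for an output of 0 and for an output of 1. Since x1 = 1 in the new example 1 0 0, the output should be 1.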
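A quick check of this reasoning in NumPy (a small sketch; the variable names are our own):

```python
import numpy as np

# Training set from the table: columns are x1, x2, x3
X = np.array([[0, 0, 1], [1, 1, 1], [1, 0, 1], [0, 1, 1]])
y = np.array([0, 1, 1, 0])

# The output equals the first input column in every training example
print(np.array_equal(X[:, 0], y))  # True

# Applying the same rule to the new example 1 0 0
new_example = np.array([1, 0, 0])
print(new_example[0])  # 1
```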

To teach our neuron, we give each input a weight, which can be a positive or negative number. An input with a large positive or large negative weight has a strong effect on the neuron’s output. Before we start, we set each weight to a random number. Then we begin the training process:

1. Take the inputs from a training set example, adjust them by the weights, and pass them through a special formula to calculate the neuron’s output. Also known as $\textbf{feedforward}$.
2. Calculate the error, which is the difference between the neuron’s output and the desired output in the training set example. This is called the $\textbf{loss function}$.
3. Depending on the direction of the error, adjust the weights slightly. Also known as $\textbf{backpropagation}$.
4. Repeat this process for a fixed number of iterations.

Let's start with the first point: 
- Take the weighted sum of the neuron’s inputs, which is:
$weight_1 \cdot input_1 + weight_2 \cdot input_2 + weight_3 \cdot input_3 = \sum weight_i \cdot input_i  $. (1)
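As an illustration of equation (1), the weighted sum is just a dot product. The weights below are made-up values chosen only for this example:

```python
import numpy as np

inputs = np.array([1, 0, 1])          # one training example
weights = np.array([0.5, -0.2, 0.1])  # hypothetical weights, for illustration only

# Explicit sum of weight_i * input_i ...
weighted_sum = sum(w * x for w, x in zip(weights, inputs))
# ... is the same as the dot product of the two vectors
dot = np.dot(weights, inputs)
print(weighted_sum, dot)  # the two values agree
```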

Next we normalise this so the result lies between 0 and 1. To do this, we use a mathematically convenient function called the sigmoid function: an S-shaped curve that maps any real value to a value between 0 and 1, which is exactly what we need to normalise the weighted sum of the inputs.

![sigmoid-function-1.png](attachment:sigmoid-function-1.png)

Its mathematical formulation is: $\frac{1}{1+e^{-x}}$. (2)
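A minimal sketch of equation (2) in NumPy, showing that the output stays between 0 and 1 even for large positive or negative inputs:

```python
import numpy as np

def sigmoid(x):
    """Equation (2): maps any real number (or array) into the interval (0, 1)."""
    return 1 / (1 + np.exp(-x))

print(sigmoid(0))                        # 0.5
print(sigmoid(np.array([-10.0, 10.0])))  # close to 0 and close to 1
```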

So by substituting the first equation into the second, the final formula for the output of the neuron is:

Output of neuron $= \frac{1}{1+e^{-\sum weight_i \cdot input_i}}$
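Because the weighted sum in equation (1) is a dot product, this output can also be computed for every training example at once with a single matrix product. The notebook below works one row at a time with `output(X, weights, row)`; this vectorized form is an alternative sketch, using hypothetical all-zero weights so that every output is the sigmoid of 0:

```python
import numpy as np

def neuron_output(X, weights):
    """Sigmoid of the weighted sum, computed for every row of X at once."""
    return 1 / (1 + np.exp(-(X @ weights)))

X = np.array([[0, 0, 1], [1, 1, 1], [1, 0, 1], [0, 1, 1]])
weights = np.zeros(3)             # hypothetical all-zero weights
print(neuron_output(X, weights))  # [0.5 0.5 0.5 0.5]
```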

You now have all the information you need to write the code for the first step.

## Question 2:

Represent the training set shown earlier as NumPy arrays; the inputs (left) and outputs (right) should be stored like this:
$
\begin{bmatrix}
 0  &0  &1 \\ 
 1  &1  &1 \\ 
 1  &0  &1 \\ 
 0  &1  &1 
\end{bmatrix}\begin{bmatrix}
0\\ 
1\\ 
1\\ 
0
\end{bmatrix}$

## Answer 2:

In [6]:
import numpy as np

# X = {'x1':[0,1,1,0], 'x2':[0,1,0,1], 'x3':[1,1,1,1]}
X = np.array([[0,0,1], [1,1,1], [1,0,1], [0,1,1]])
y = np.array([[0],[1],[1],[0]])

Define the function that initializes the random weights:

Consider an L-layer neural network, which has L-1 hidden layers plus one input layer and one output layer. The parameters (weights and biases) for layer l are represented as:

![1*M3Ja24g0cK22gVqG6HFQbw.png](attachment:1*M3Ja24g0cK22gVqG6HFQbw.png)

Common ways to initialize the weights better:

a) If you’re using the ReLU activation function in a deep network (that is, as the activation function of the hidden layers' outputs):

- Generate a random sample of weights from a Gaussian distribution with mean 0 and standard deviation 1.
- Multiply that sample by the square root of (2/ni), where ni is the number of input units for that layer.

b) Likewise, if you’re using the tanh activation function:

- Generate a random sample of weights from a Gaussian distribution with mean 0 and standard deviation 1.
- Multiply that sample by the square root of (1/ni), where ni is the number of input units for that layer.

So what is Xavier’s initialization?

The only major difference in Xavier’s initialization is the no term: it also takes into account the number of output units of the layer.

For tanh:

- Generate a random sample of weights from a Gaussian distribution with mean 0 and standard deviation 1.
- Multiply that sample by the square root of (2/(ni+no)), where ni is the number of input units and no is the number of output units for that layer.
https://hackernoon.com/how-to-initialize-weights-in-a-neural-net-so-it-performs-well-3e9302d4490f
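A sketch of the three scaling rules above. The function name `init_weights`, its `method` strings and the (n_in, n_out) shape convention are our own choices; the Xavier scale used here is sqrt(2/(n_in + n_out)), the normal variant from Glorot and Bengio (2010).

```python
import numpy as np

def init_weights(n_in, n_out, method, rng):
    """Draw an (n_in, n_out) weight matrix from N(0, 1) and rescale it.

    method: 'he'     -> sqrt(2 / n_in), for ReLU layers
            'tanh'   -> sqrt(1 / n_in), for tanh layers
            'xavier' -> sqrt(2 / (n_in + n_out)), Xavier/Glorot normal
    """
    scales = {
        'he': np.sqrt(2 / n_in),
        'tanh': np.sqrt(1 / n_in),
        'xavier': np.sqrt(2 / (n_in + n_out)),
    }
    return rng.standard_normal((n_in, n_out)) * scales[method]

rng = np.random.default_rng(42)
W = init_weights(3, 4, 'xavier', rng)
print(W.shape)  # (3, 4)
```

These rules mainly matter for deeper networks, where badly scaled weights make activations vanish or explode layer after layer. For the single sigmoid neuron below, the notebook simply draws uniform weights in [0, 1) with `np.random.rand`.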

In [67]:
def weight(X, y):
    return np.random.rand(X.shape[1])

Define the sigmoid function:

![Screenshot%20from%202020-03-05%2009-58-50.png](attachment:Screenshot%20from%202020-03-05%2009-58-50.png)

In [8]:
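def sigmoid(x):
    """Sigmoid function: maps any real number (or NumPy array) into (0, 1)."""
    # The previous version looped over an undefined global `row` and summed
    # the weighted sums of all rows; the sigmoid only needs its argument x.
    return 1 / (1 + np.exp(-x))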

Define the function calculating the output of the neuron:

Output of neuron $= \frac{1}{1+e^{-\sum weight_i \cdot input_i}}$

In [9]:
def output(X, weights, row):
    return (1 / (1 + np.exp(-np.dot(weights.T, X[row]))))

Now that we have coded the elements needed for the first step, let's move on to the second step, the loss function. For this very simple neural network, the error is simply the difference between the desired output and the predicted output:

In [26]:
def error(X, y, weights, row):
    return y[row] - output(X,weights, row)

In [34]:
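def loss_function(X, y, weights, row):
    # Same as error(): desired output minus predicted output for one row.
    # output() takes its arguments in the order (X, weights, row).
    return y[row] - output(X, weights, row)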
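The two functions above work one row at a time. A vectorized sketch that computes the error for all four examples at once, again with hypothetical all-zero weights so that every prediction is 0.5:

```python
import numpy as np

def neuron_output(X, weights):
    """Sigmoid of the weighted sum for every row of X."""
    return 1 / (1 + np.exp(-(X @ weights)))

X = np.array([[0, 0, 1], [1, 1, 1], [1, 0, 1], [0, 1, 1]])
y = np.array([0, 1, 1, 0])  # kept 1-D to match the shape of the predictions
weights = np.zeros(3)       # hypothetical weights: every prediction is 0.5

errors = y - neuron_output(X, weights)  # desired output minus predicted output
print(errors)  # -0.5 where the target is 0, +0.5 where the target is 1
```

Note the shape of `y`: with the (4, 1) column used in Answer 2, `y - neuron_output(X, weights)` would broadcast to a 4x4 array, so `y.ravel()` would be needed in that case.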

3. Adjusting the weights

During the training cycle, we adjust the weights. Now that we’ve measured the error of our prediction (loss), we need to find a way to propagate the error back, and to update our weights and biases. But how much do we adjust the weights by? 

In order to know the appropriate amount to adjust the weights and biases by, we need to know the derivative of the loss function with respect to the weights and biases.

![gradient_sigmoid.png](attachment:gradient_sigmoid.png)

The calculation is based on the gradient of the sigmoid curve. To understand this, let's look at the figure above.

We used the sigmoid curve to calculate the output of the neuron. If the weighted sum of the inputs is a large positive or negative number, the output is close to 1 or 0, which signifies the neuron was quite confident one way or another. If the neuron is confident that the existing weight is correct, it doesn’t want to adjust it very much. In these regions the sigmoid curve is almost flat, so its gradient is small; multiplying the adjustment by the sigmoid gradient achieves exactly this.

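The gradient of the sigmoid curve can be found by taking the derivative of the sigmoid function $\sigma(x) = \frac{1}{1+e^{-x}}$, which is $\sigma'(x) = \sigma(x)\,(1-\sigma(x))$. Since the output of the neuron is $\sigma(x)$, this gives: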

SigmoidGradient = $output \cdot (1-output)$
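To convince yourself that output·(1-output) really is the derivative of the sigmoid, you can compare it with a finite-difference estimate. This is a small sketch; the grid of points and the step size `h` are arbitrary choices:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

x = np.linspace(-6, 6, 13)  # the integers -6, -5, ..., 6
out = sigmoid(x)
analytic = out * (1 - out)  # SigmoidGradient = output * (1 - output)

# Central finite difference as an independent check of the derivative
h = 1e-5
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
print(np.max(np.abs(analytic - numeric)))  # very small
print(analytic[6], analytic[0])            # largest at x = 0, tiny at x = -6
```

The gradient peaks at 0.25 when the weighted sum is 0 (output 0.5, the least confident case) and is close to 0 at both tails, which is the behaviour the confidence argument above relies on.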

Next, we want to make the adjustment proportional to the size of the error, and to adjust a weight only when its input actually contributed to the output: multiplying by the input means that a weight whose input is 0 is not adjusted.

Give the final formula for updating the weights:

In [55]:
def sigmoid_gradient(X, weights, row):
    return np.dot(output(X,weights, row),(1-output(X,weights, row)))

In [56]:
sigmoid_gradient(X, weights,0)

0.23997533548779773

In [52]:
X

array([[0, 0, 1],
       [1, 1, 1],
       [1, 0, 1],
       [0, 1, 1]])

In [48]:
1-output(X,weights,0)

array([0.39987675])

Weight adjustment = $error \cdot input \cdot output \cdot (1-output)$
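A sketch of one update step using this formula. Unlike `weight_adjustment` below, which updates the weights after each row, this version sums the adjustments over all four examples before applying them (batch updating); the all-zero starting weights are hypothetical:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

X = np.array([[0, 0, 1], [1, 1, 1], [1, 0, 1], [0, 1, 1]])
y = np.array([0, 1, 1, 0])
weights = np.zeros(3)  # hypothetical starting weights

out = sigmoid(X @ weights)  # step 1: feedforward
err = y - out               # step 2: error
# step 3: error * input * output * (1 - output), summed over the four examples
adjustment = X.T @ (err * out * (1 - out))
weights += adjustment
print(adjustment)  # x1 weight grows by 0.25; x2 and x3 unchanged
```

Only the x1 weight changes: x1 is 1 exactly in the examples whose target is 1, while the adjustments for x2 and x3 cancel out across the four examples.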

In [57]:
def weight_adjustment(X, y, weights, row):
    err = error(X, y, weights, row)
    sig_grad = sigmoid_gradient(X, weights, row)
    return (err * X[row] * sig_grad)

Write the function that trains the neuron following the process we just described:

In [63]:
def train_neuron(X, y, n_iter):
    weights = weight(X, y)
    print('='*80)
    print('Starting synaptic weights:\n', weights)
    for i in range(n_iter):
        for row in range(len(X)):
            weights += weight_adjustment(X,y, weights, row)
    print('='*80)
    print('New synaptic weights:', weights)
    y_pred = []
    for row in range(len(X)):
        y_pred.append(output(X,weights, row))
    return np.array(y_pred)

Now that we have seen each step separately, put them all together and write a program that does the following:
- use the training set to train the neural network over 10000 iterations.
- print the random starting synaptic weights.
- print the new synaptic weights after training.
- what would be the prediction for the 1,0,0 example?

In [136]:
train_neuron(X, y,  1000)

Starting synaptic weights:
 [0.06778747 0.89598879 0.45384793]
New synaptic weights: [ 7.26321138 -0.21815757 -3.41591832]


array([0.03180166, 0.974147  , 0.97910836, 0.02572887])
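The run above uses 1,000 iterations. A compact, vectorized sketch of the full program, with the 10,000 iterations the exercise asks for, could look like the following. It is an alternative to `train_neuron`, not a reproduction of it: it uses batch updates and starts from random weights in [-1, 1), so its weights will not match the output above (the `train` function and its `seed` parameter are our own additions).

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def train(X, y, n_iter, seed=1):
    """Train a single sigmoid neuron with batch updates (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    weights = 2 * rng.random(X.shape[1]) - 1  # random start in [-1, 1)
    print('Starting synaptic weights:', weights)
    for _ in range(n_iter):
        out = sigmoid(X @ weights)                # feedforward
        err = y - out                             # error
        weights += X.T @ (err * out * (1 - out))  # weight adjustment
    print('New synaptic weights:', weights)
    return weights

X = np.array([[0, 0, 1], [1, 1, 1], [1, 0, 1], [0, 1, 1]])
y = np.array([0, 1, 1, 0])
weights = train(X, y, n_iter=10_000)

new_example = np.array([1, 0, 0])
print('Prediction for [1, 0, 0]:', sigmoid(new_example @ weights))
```

After training, the weight on x1 is large and positive and the weight on x3 is negative, so the prediction for [1, 0, 0] should come out close to 1, in line with Answer 1.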

In [212]:
class Smartest_neuron_ever():
    def __init__(self, n_iter=1000):
        """Init function"""
        self.n_iter = n_iter

    def _init_weight(self,X):
        return np.random.rand(X.shape[1])
    
    def _output_layer(self, X):
        return (1 / (1 + np.exp(-np.dot(self.weights.T, X[self.row]))))
    
    def _error(self, X, y):
        return (y[self.row] - self._output_layer(X))
    
    def _sigmoid_gradient(self, X,):
        return np.dot(self._output_layer(X),(1-self._output_layer(X)))
    
    def _weight_adjustment(self, X, y):
        err = self._error(X, y)
        sig_grad = self._sigmoid_gradient(X)
        return (err * X[self.row] * sig_grad)

    def _training_neuron(self,X,y):
        self.weights = self._init_weight(X)
        print('='*80)
        print('Starting synaptic weights:\n', self.weights)
        for i in range(self.n_iter):
            for self.row in range(len(X)):
                self.weights += self._weight_adjustment(X,y)
        print('='*80)
        print('New synaptic weights, after ',self.n_iter,' iterations:\n', self.weights)
        self.y_pred = []
        for self.row in range(len(X)):
            self.y_pred.append(self._output_layer(X))
    
    def fit(self, X, y):
        assert isinstance(X, np.ndarray), 'X must be a numpy array'
        self._training_neuron(X,y)

    def predict(self, X):
        '''Use the previously trained neuron to predict the classes (0 or 1) for X'''
        try:
            self.weights
        except AttributeError:
            raise Exception('cannot predict using an unfitted neural network')

        self.y_pred = np.zeros(len(X))
        for row in range(len(X)):
            # Sigmoid output of the trained neuron, thresholded at 0.5
            if 1 / (1 + np.exp(-np.dot(self.weights.T, X[row]))) > 0.5:
                self.y_pred[row] = 1
            else:
                self.y_pred[row] = 0
        return self.y_pred
            

In [215]:
SNE= Smartest_neuron_ever(n_iter=10)

In [216]:
SNE.fit(X,y)

Starting synaptic weights:
 [0.95779228 0.42385394 0.55382758]
New synaptic weights, after  10  iterations:
 [ 1.9029599  -0.17715648 -0.68862557]


In [221]:
X_test=np.array([[1,0,0]])

In [222]:
SNE.predict(X_test)

array([1.])