##  Neural Networks Definition 


### But what is a Neural Network anyway?

For a dataset of dosages (low, medium and high), we can't fit a straight line that makes accurate predictions.

![alt](https://i.imgur.com/TcuaKgc.png)

A neural network can fit a squiggle to the data, even when the data is complicated.

### Definition 
  

- When you build a NN you have to decide which activation function to use (for learning: sigmoid, in practice: ReLU).
- When you build a NN you have to guess how many hidden layers and inputs you will use.

Let's use this as an example of a NN :

![alt](https://miro.medium.com/max/875/1*dypctO_eJnrXqX6JUa5MDA.jpeg)

Where a Hidden Layer can be defined as:

- $ H_j = \sum_i W_{ji} \, I_i + B_j $

$W$ represents the weights: there is one weight $W_{ji}$ for every connection between input node $i$ and hidden node $j$, and $B_j$ is the bias of hidden node $j$.
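
As a minimal sketch of the hidden-layer computation, assuming `W` has one row per hidden node and one column per input. The function name `hidden_layer` and all numbers are made up for illustration; an activation function would normally be applied to the result afterwards.

```python
import numpy as np

def hidden_layer(W, I, B):
    """Weighted sum of the inputs plus a bias, one value per hidden node."""
    return W @ I + B

# 2 inputs feeding 3 hidden nodes (illustrative values)
W = np.array([[0.2, -0.4],
              [0.5,  0.1],
              [-0.3, 0.8]])
I = np.array([1.0, 0.5])
B = np.zeros(3)

H = hidden_layer(W, I, B)
print(H.shape)  # (3,)
```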

### Perceptron  

A [Perceptron](https://en.wikipedia.org/wiki/Perceptron) can be defined as a single-layer NN. It is based on an artificial neuron that takes several binary inputs $x_1, x_2, \ldots$ and produces a single binary output.

![img](http://neuralnetworksanddeeplearning.com/images/tikz0.png)

For a hidden node, the computation is a 'weighted sum', like so:

![alt](https://i.imgur.com/hur6mHW.png)
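
A perceptron's weighted sum can be sketched in a few lines. This assumes a simple threshold rule (output 1 when the weighted sum exceeds a threshold); the weights and the threshold below are hypothetical values chosen so that the perceptron behaves like a logical AND.

```python
import numpy as np

def perceptron(x, w, threshold):
    """Return 1 if the weighted sum of the inputs exceeds the threshold, else 0."""
    return int(np.dot(w, x) > threshold)

# Hypothetical weights and threshold that implement a logical AND
w = np.array([1.0, 1.0])
threshold = 1.5

results = [perceptron(np.array(x), w, threshold)
           for x in [(0, 0), (0, 1), (1, 0), (1, 1)]]
print(results)  # [0, 0, 0, 1]
```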

# NeuralNetwork code and testing

Let's code a NN from scratch !

In [1]:
import numpy as np

## The weights , the heart of a NN

The weights are among the most important parameters of a NN. They are the links between the different layers of the NN: they are used to calculate the prediction and to feed the error back, and they are what gets adjusted to improve the NN. The weight links can be represented as matrices, so we will use matrices to implement them.

Basically :

- A matrix for the weights of the links between the input and hidden layers, $W_{input\_hidden}$, of $dim = (hidden\_nodes \times input\_nodes)$

- A matrix for the weights of the links between the hidden and the output layers, $W_{hidden\_output}$, of $dim = (output\_nodes \times hidden\_nodes)$

The initial values of the link weights should be small and random. Unlike in logistic regression, where we can simply set the initial weights to 0, in a NN this will NOT work: if you initialize the weights of the hidden units to 0, all hidden units will be symmetric, and the weight update will have identical rows:



$dw=\begin{bmatrix}
u & v \\
u & v 
\end{bmatrix}$

Since both hidden units (e.g. $a_1$ and $a_2$) have the same influence on the output layer, they remain symmetric after one iteration or many, and no matter how many times you run it, both units will compute the same function.
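
The symmetry problem can be checked numerically. This is a sketch under stated assumptions: a tiny network with sigmoid activations and no bias, and a made-up helper `hidden_weight_update`. A constant of 0.3 is used instead of 0 so that the update is not trivially zero; the point is that identical starting weights give identical rows.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hidden_weight_update(W_ih, W_ho, x, t):
    """Update direction for the input -> hidden weights (learning rate omitted)."""
    h = sigmoid(W_ih @ x)        # hidden outputs
    o = sigmoid(W_ho @ h)        # final outputs
    e_out = t - o                # output error
    e_hid = W_ho.T @ e_out       # error propagated back to the hidden layer
    return (e_hid * h * (1.0 - h)) @ x.T

x = np.array([[1.0], [0.5]])     # 2 inputs as a column vector
t = np.array([[1.0]])            # single target

# Every weight starts at the same value
dW_same = hidden_weight_update(np.full((2, 2), 0.3), np.full((1, 2), 0.3), x, t)

# Small random starting weights
rng = np.random.default_rng(0)
dW_rand = hidden_weight_update(rng.random((2, 2)) - 0.5, rng.random((1, 2)) - 0.5, x, t)

print(dW_same)  # both rows are identical
print(dW_rand)  # the rows differ
```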


Neural networks learn by refining their link weights. This is guided by the error: the difference between the right answer given by the training data and the network's actual output.

## Random Initialization

To avoid ending up in the extreme (saturated) parts of the corresponding activation function, it helps to initialize $W$ with small numbers.

In [2]:
np.random.random((1,3)) - 0.5 # shift from [0, 1) to [-0.5, 0.5) so that weights can also be negative


array([[ 0.4472479 , -0.00144761,  0.21367992]])

Gradient descent is a really good way of working out the minimum of a function, and it works especially well when that function is so complex and difficult that we couldn't easily work it out mathematically using algebra.

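The error of the hidden layer can be defined as a matrix product that propagates the output error back through the transposed weights:

$$ error_{hidden} = W^T_{hidden\_output} \cdot error_{output} $$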
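
A small numeric sketch of this matrix product, assuming 3 hidden nodes and 2 output nodes. The weight and error values are invented for illustration.

```python
import numpy as np

# Hypothetical hidden -> output weights, shape (output_nodes, hidden_nodes)
W_hidden_output = np.array([[0.2, 0.7, 0.1],
                            [0.6, 0.3, 0.9]])

# One error per output node, as a column vector
error_output = np.array([[0.5],
                         [-0.25]])

# Each hidden node receives the output errors weighted by its outgoing links
error_hidden = W_hidden_output.T @ error_output
print(error_hidden.shape)  # (3, 1)
```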

As a simpler example, take a straight line:

$$ y = mx + b $$

We want $m$ and $b$ to change based on the error:

$$ \Delta m = \alpha \, x \cdot error $$
$$ \Delta b = \alpha \cdot error $$
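
The two update rules can be tried on a toy problem. This sketch assumes `error = target - prediction` and uses made-up points that lie on the line $y = 2x + 1$; the function name `train_line`, the learning rate and the number of epochs are illustrative choices.

```python
def train_line(points, alpha=0.1, epochs=200):
    """Fit y = m*x + b by nudging m and b after every point."""
    m, b = 0.0, 0.0
    for _ in range(epochs):
        for x, target in points:
            error = target - (m * x + b)   # assumed sign convention
            m += alpha * x * error         # delta m = alpha * x * error
            b += alpha * error             # delta b = alpha * error
    return m, b

# Made-up points on the line y = 2x + 1
points = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
m, b = train_line(points)
print(m, b)  # should approach m = 2 and b = 1
```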

## Back propagation  with gradient descent

Gradient descent is defined as a "first-order iterative optimization algorithm for finding a local minimum of a differentiable function". In more human terms, it is a way of following the steepest descent to a minimum.


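In the case of neural networks, we are trying to minimize the NN error, so we need the slope of the error with respect to each weight:

$$ \frac{\partial E}{\partial W_{jk}} $$

This leads to the weight update:

$$ \Delta W_{jk} = \alpha \, E_k \cdot \sigma(O_k) \, (1-\sigma(O_k)) \cdot (O_j)^T $$

where $E_k$ is the error of layer $k$, $\sigma(O_k)$ is the sigmoid output of layer $k$ and $O_j$ is the output of the previous layer. This tells us how the error changes as the link weights change.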
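
A numeric sketch of the weight update, written in the element-wise product form used in the `train` method of this notebook. All values are invented for illustration, and `O_k` here already holds the sigmoid outputs of the final layer.

```python
import numpy as np

alpha = 0.5                                # learning rate
O_j = np.array([[0.6], [0.4], [0.7]])      # hidden outputs, shape (3, 1)
O_k = np.array([[0.8], [0.3]])             # sigmoid outputs of the final layer, shape (2, 1)
E_k = np.array([[0.2], [-0.3]])            # output errors (target - output), shape (2, 1)

# Element-wise product for the error and slope terms, then an outer product with O_j
delta_W = alpha * np.dot(E_k * O_k * (1.0 - O_k), O_j.T)
print(delta_W.shape)  # (2, 3)
```

The result has one entry per link between a hidden node and an output node, so it can be added directly to a weight matrix of shape `(output_nodes, hidden_nodes)`.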

Improving a neural network means reducing this error by changing those weights.

Choosing the right weights directly is too difficult. An alternative approach is to iteratively improve the weights by descending the <b>error</b> function, taking small steps. Each step is taken in the direction of the greatest downward slope from your current position. This is called gradient descent.
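
The idea of taking small steps downhill can be sketched on a one-dimensional function. This assumes a simple function $f(x) = (x - 3)^2$ whose minimum is known to be at $x = 3$; the name `gradient_descent`, the step size and the step count are illustrative.

```python
def gradient_descent(grad, x0, alpha=0.1, steps=100):
    """Repeatedly step against the slope, starting from x0."""
    x = x0
    for _ in range(steps):
        x -= alpha * grad(x)   # small step in the direction of the downward slope
    return x

# f(x) = (x - 3)^2 has slope 2 * (x - 3) and its minimum at x = 3
x_min = gradient_descent(lambda x: 2.0 * (x - 3.0), x0=0.0)
print(x_min)  # close to 3
```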

A good recommendation is to rescale inputs into the range 0.0 to 1.0. Some will add a small offset to the inputs, like 0.01, just to avoid having zero inputs, which are troublesome because they kill the learning ability: setting $O_j = 0$ zeroes the weight update expression.

- We should try to avoid saturating a NN by keeping the inputs small
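
One possible way to rescale inputs, sketched under the assumption that a linear min-max scaling into the range 0.01 to 1.0 is acceptable. The helper `rescale_inputs` and the sample values are hypothetical, and the helper assumes the inputs are not all equal.

```python
import numpy as np

def rescale_inputs(raw, offset=0.01):
    """Map values linearly into [offset, 1.0] so that no input is exactly zero."""
    raw = np.asarray(raw, dtype=float)
    lo, hi = raw.min(), raw.max()          # assumes hi > lo
    return (raw - lo) / (hi - lo) * (1.0 - offset) + offset

scaled = rescale_inputs([0, 64, 128, 255])
print(scaled.min(), scaled.max())
```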

How do we actually propagate the errors and update the weights?

In [1]:

from scipy.special import expit  # expit represents the sigmoid function

class neuralNetwork:
    # constructor
    def __init__(self,input_nodes,hidden_nodes,output_nodes,alpha):
        # define the sigmoid activation only once so that it can be referenced several times
        self.sigmoid = lambda x: expit(x)

        self.input_nodes = input_nodes
        self.hidden_nodes = hidden_nodes
        self.output_nodes = output_nodes
        self.alpha = alpha  # learning rate

        # weights: small random values in [-0.5, 0.5)
        # plain arrays are used instead of np.matrix: with np.matrix, `*` is a matrix
        # product, so the element-wise terms in train() fail with
        # "ValueError: shapes (2,1) and (2,1) not aligned: 1 (dim 1) != 2 (dim 0)"
        self.W_ih = np.random.random((self.hidden_nodes,self.input_nodes)) - 0.5
        self.W_ho = np.random.random((self.output_nodes,self.hidden_nodes)) - 0.5

    def train(self,inputs_list,target_variable): # target_variable represents 'y', the variable you want to predict
        # turn the inputs and the targets into 2D column vectors
        inputs = np.array(inputs_list, ndmin=2).T
        target = np.array(target_variable, ndmin=2).T

        # forward pass, keeping the hidden outputs because the weight updates need them
        hidden_outputs = self.sigmoid(np.dot(self.W_ih,inputs))
        outputs = self.sigmoid(np.dot(self.W_ho,hidden_outputs))

        # find the error e = (t - o) - matrix subtraction
        output_errors = target - outputs

        # find the hidden layer error (W.T * e), using the hidden -> output weights
        hidden_errors = np.dot(self.W_ho.T,output_errors)

        # NOTE : the outputs are already mapped through the sigmoid function
        # delta weight change between the hidden layer and the output layer
        self.W_ho += self.alpha * np.dot((output_errors * outputs * (1.0 - outputs)), np.transpose(hidden_outputs))
        # same rule one layer back: delta weight change between the input layer and the hidden layer
        self.W_ih += self.alpha * np.dot((hidden_errors * hidden_outputs * (1.0 - hidden_outputs)), np.transpose(inputs))

    # receive inputs -> generate hidden -> outputs
    # feeding the information forward through the NN
    def feedforward(self,inputs_list):
        # before predicting we need to turn the input into a 2D column vector
        inputs = np.array(inputs_list, ndmin=2).T
        # after that, we do a matrix multiplication of the weights with the inputs
        # TODO add Bias
        hidden_inputs = np.dot(self.W_ih,inputs)
        # calculate the sigmoid function
        hidden_outputs = self.sigmoid(hidden_inputs)

        # matrix product between the hidden -> output weights and the hidden outputs
        final_inputs = np.dot(self.W_ho,hidden_outputs)
        outputs = self.sigmoid(final_inputs)

        return outputs

# create an instance of the neural network:
# 2 input nodes, 2 hidden nodes, 2 output nodes and a learning rate of 0.5

In [96]:
n = neuralNetwork(2,2,2,0.5)

inputs = [1,0]
targets = [1,0]
n.train(inputs,targets)

In [None]:
np.diff

<function numpy.diff(a, n=1, axis=-1, prepend=<no value>, append=<no value>)>

## Training on XOR or OR

In [None]:
inputs = np.array(np.random.random((1,3)) , ndmin=2).T
inputs

array([[0.39163834],
       [0.52415167],
       [0.50979665]])