# Neural Networks Background

## Biological Neuron vs Artificial Neuron a.k.a Perceptron


<table width="800" border="1" cellpadding="5">
<tr>
<td align="center" valign="center">
<img src="images/neuron_and_artif_neuron.png" alt="Biological Neuron and Artificial Neuron" />
</td>

</tr>
</table>

<table width="400" border="1" cellpadding="5">
<tr>
<td align="center" valign="center">
<img src="images/Formula.png" />
</td>

</tr>
</table>

<ul>
  
  <li>In the basic neuron model, the dendrites carry signals to the cell body, where they are all summed <b>(weighted sum)</b>. If the final sum is above a certain threshold, the neuron fires, sending a spike along its axon.</li>
  <li>Each artificial neuron computes a <b>dot product</b> of the inputs and its weights, adds the bias, and applies a <b>non-linearity (i.e. the activation function)</b>, in this case the sigmoid.</li>
  <li>The idea is that the synaptic strengths (the weights w) are learnable and <b>control the strength of influence</b> (and its direction: excitatory for a positive weight, inhibitory for a negative weight) of one neuron on another.
</li>
<li>The <b>firing rate</b> of the neuron is modeled with an activation function f, which represents the frequency of the spikes along the axon.</li>
  
</ul>

<i>Source: http://cs231n.github.io/neural-networks-1/</i>
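
As a minimal sketch of the model described above, the snippet below computes the output of one sigmoid neuron. The input values, weights, and bias are made up for illustration and do not come from the figure.

```python
import numpy as np

def sigmoid(z):
    """Squash a real number into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, b):
    """Weighted sum of the inputs plus the bias, passed through the sigmoid."""
    return sigmoid(np.dot(w, x) + b)

# Hypothetical values chosen only for illustration
x = np.array([0.5, -1.0, 2.0])   # signals arriving on the "dendrites"
w = np.array([0.8, 0.2, -0.5])   # positive = excitatory, negative = inhibitory
b = 0.1                          # bias

output = neuron(x, w, b)
print(output)
```

Because of the sigmoid, the output always lies strictly between 0 and 1, which is what lets it be read as a firing rate.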

### NAND gate using One Perceptron

Perceptrons can be used to compute the elementary logical functions that underlie computation, such as AND, OR, and NAND. For example, suppose we have a perceptron with two inputs, each with weight −2, and an overall bias of 3. 

<img src="images/nand.png" />

<ul>
<li>Input 00 produces output 1, since (−2)∗0+(−2)∗0+3=3 is <b>positive</b>. </li>
<li>Inputs 01 and 10 produce output 1, since in both cases the weighted sum plus bias is (−2)∗1+(−2)∗0+3=1, which is also positive. </li>
<li>But the input 11 produces output 0, since (−2)∗1+(−2)∗1+3=−1 is <b>negative</b>. </li>
<li>And so our perceptron implements a <b>NAND gate</b>!</li>
</ul>

<i>Source: http://neuralnetworksanddeeplearning.com/chap1.html</i>
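
The truth table above can be checked for all four inputs at once. This is a small sketch; the helper name `perceptron` is introduced here for illustration only.

```python
import numpy as np

def perceptron(x, w, b):
    """Return 1 if the weighted sum plus bias is positive, otherwise 0."""
    return int(np.dot(w, x) + b > 0)

w = np.array([-2, -2])  # weights from the example above
b = 3                   # bias from the example above

nand_outputs = []
for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    out = perceptron(np.array([x1, x2]), w, b)
    nand_outputs.append(out)
    print(x1, x2, "->", out)
```

Only the input 11 yields 0, which matches the NAND truth table.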

#### Vectorized Form 
The input data and output labels can be represented in vectorized form. For example, the weighted sum becomes (W.T) · X + b, where W and X are the weights and inputs in vector form and b is the bias. 
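
As a worked example, using the weights (−2, −2) and bias 3 of the NAND perceptron from the previous section, the vectorized expression expands to:

$$
W^{T}X + b
= \begin{bmatrix} -2 & -2 \end{bmatrix}
  \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} + 3
= -2x_1 - 2x_2 + 3
$$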


#### Implementation in Numpy

In [1]:
import numpy as np

# inputs X = [x1, x2]
X = np.array([1,1])
# weights W = [w1, w2]
W = np.array([-2,-2])
# bias
b = 3
# output: True if the weighted sum plus bias is positive, else False
y = ((np.dot(W.T,X) + b) > 0)

print(y)

False

##  Logistic Regression and Activation Function

<table width="600" border="1" cellpadding="5">
<tr>
<td align="center" valign="center">
<img src="images/sigmoid_vs_linear.png" alt="Sigmoid vs Linear Function" />
</td>
</tr>
</table>


The “classic” application of the logistic regression model is binary classification. 

However, variants of logistic regression (for example, one-vs-rest or multinomial logistic regression) can also tackle multi-class classification problems. 

Logistic regression works well when the classes in the dataset are more or less “linearly separable.”


<ul>
<li>The logistic regression formula is <b>derived from the standard linear equation for a straight line</b>, y = mx + b.
</li>
<li>Applying the <b>sigmoid function (one example of an activation function)</b> to the output of the standard linear formula gives the logistic regression formula.
</li>
<li>The logistic regression function is useful for <b>predicting a class</b>, i.e. the output takes discrete / <b>categorical values</b> (versus linear regression, where the output is a continuous variable).
</li>
<li>The activation function therefore <b>adds a non-linearity</b> to the linear function. 
</li>
</ul>
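
The points above can be sketched in a few lines: a linear function is passed through the sigmoid, and the resulting probability is thresholded to get a class. The slope, intercept, and the 0.5 threshold are assumptions made for illustration; nothing here is fitted to data.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(x, m, b):
    """Pass the linear function m*x + b through the sigmoid."""
    return sigmoid(m * x + b)

def predict_class(x, m, b, threshold=0.5):
    """Turn the probability into a categorical 0/1 label."""
    return (predict_proba(x, m, b) >= threshold).astype(int)

# Hypothetical slope and intercept, not fitted to any data
m, b = 2.0, -1.0
x = np.array([-2.0, 0.0, 0.5, 3.0])

probs = predict_proba(x, m, b)
labels = predict_class(x, m, b)
print(probs)
print(labels)
```

The linear part `m * x + b` is unbounded, while `probs` stays between 0 and 1, and `labels` is the discrete output the list above refers to.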
 
#### Common Activation Functions
<table width="700" border="1" cellpadding="5">
<tr>
<td align="center" valign="center">
<img src="images/activation.png" alt="" />
</td>
</tr>
</table>
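
For reference, here is a short NumPy sketch of three widely used activation functions. The choice of sigmoid, tanh, and ReLU is an assumption; the figure above may list a different set.

```python
import numpy as np

def sigmoid(z):
    """Maps any real input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    """Maps any real input into (-1, 1) and is zero-centered."""
    return np.tanh(z)

def relu(z):
    """Passes positive inputs through unchanged and clips negatives to 0."""
    return np.maximum(0.0, z)

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z))
print(tanh(z))
print(relu(z))
```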


## Neural Network (i.e Multi-Layer Perceptron)


### Feedforward Network

Neural networks where the output from one layer is used as input to the next layer are called feedforward neural networks. This means there are no loops in the network - information is always fed forward, never fed back. 

<i>Source : https://ujjwalkarn.me/2016/08/09/quick-intro-neural-networks/</i>

### XOR function using multiple perceptrons

The XOR implementation with multiple neurons is a simple neural network with an input layer of 2 units, one hidden layer with 2 neurons, and one output layer.

<table width="800" border="1" cellpadding="5">
<tr>
<td align="center" valign="center">
<img src="images/xor.png" />
</td>
<td align="center" valign="center">
<img src="images/xor_with_perceptrons.png" />
</td>
</tr>
</table>

A "single-layer" perceptron can't implement XOR because the classes in XOR are not linearly separable: you cannot draw a straight line that separates the points (0,0) and (1,1) from the points (0,1) and (1,0).


<i>Source: http://toritris.weebly.com/perceptron-2-logical-operations.html</i>
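
One way to get a feel for this is a brute-force search over single-perceptron parameters. The sketch below tries every combination of two weights and a bias on a coarse grid (the grid range and step are arbitrary choices). A failed search is not a proof, but it is consistent with the geometric argument above.

```python
import itertools
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
targets = {
    "NAND": np.array([1, 1, 1, 0]),
    "XOR":  np.array([0, 1, 1, 0]),
}

# Candidate weights and biases on a coarse grid (an arbitrary choice for this demo)
grid = np.arange(-3, 3.5, 0.5)

def find_perceptron(y):
    """Return the first (w1, w2, b) on the grid that reproduces y, or None."""
    for w1, w2, b in itertools.product(grid, repeat=3):
        out = (X @ np.array([w1, w2]) + b > 0).astype(int)
        if np.array_equal(out, y):
            return (w1, w2, b)
    return None

solutions = {name: find_perceptron(y) for name, y in targets.items()}
print(solutions)
```

The search finds parameters for NAND but returns `None` for XOR.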






### Feedforward Implementation of XOR

In a feedforward implementation of XOR, the weights and biases have been precomputed to give the right output, i.e. zero error. The solution described here is at a global minimum of the loss function. A gradient-based optimization algorithm can find parameters that produce very little error, so gradient descent could converge to this point.

<table width="800" border="1" cellpadding="5">
<tr>
<td align="center" valign="center">
<img src="images/my_xor_nn_.png" alt="Drawing" style="width: 800px;"/>

</td>

</tr>
</table>



In [3]:
import numpy as np

def relu(x): return x*(x > 0)  # rectified linear unit: zero out negative values

X = np.array([[0,0], [0,1], [1,0], [1,1]]) # inputs, one row per example
y = np.array([ [0],   [1],   [1],   [0]])  # expected outputs (XOR truth table)

Wh = np.array([[1,1],[1,1]]) # weights for hidden layer
Wz = np.array([1,-2]) # weights for output (z) layer 
bias_c = np.array([0,-1]) # bias terms for hidden layer


XWh = np.dot(X,Wh.T) + bias_c # output of hidden layer before activation
activated_XWh = relu(XWh) # output of hidden layer after activation
yHat = (np.dot(activated_XWh, Wz.T)) # output layer before activation
Z = relu(yHat) # final prediction, one value per input row

print (Z)


[0 1 1 0]
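
To see where these values come from, here is the computation worked by hand for two of the inputs. The symbols $W_h$, $W_z$, and $c$ stand for `Wh`, `Wz`, and `bias_c` in the code above.

For $x = (1, 1)$:

$$
h = \mathrm{relu}\left(W_h x + c\right)
  = \mathrm{relu}\left(\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} 1 \\ 1 \end{bmatrix} + \begin{bmatrix} 0 \\ -1 \end{bmatrix}\right)
  = \mathrm{relu}\left(\begin{bmatrix} 2 \\ 1 \end{bmatrix}\right)
  = \begin{bmatrix} 2 \\ 1 \end{bmatrix},
\qquad
z = \mathrm{relu}\left(W_z \cdot h\right) = \mathrm{relu}(1 \cdot 2 - 2 \cdot 1) = 0
$$

For $x = (0, 1)$:

$$
h = \mathrm{relu}\left(\begin{bmatrix} 1 \\ 1 \end{bmatrix} + \begin{bmatrix} 0 \\ -1 \end{bmatrix}\right)
  = \begin{bmatrix} 1 \\ 0 \end{bmatrix},
\qquad
z = \mathrm{relu}(1 \cdot 1 - 2 \cdot 0) = 1
$$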


### Multilayer Feedforward network

The figure below shows a neural network with one input layer, two hidden layers, and one output layer.
One hidden layer has 4 units and the other hidden layer has 3 units.

<table width="800" border="1" cellpadding="5">
<tr>
<td align="center" valign="center">
<img src="images/multilayerff2.png" alt="Drawing" style="width: 800px;"/>

</td>

</tr>
</table>

<table width="800" border="1" cellpadding="5">
<tr>
<td align="center" valign="center">
<img src="images/activation1.png" alt="Drawing" style="width: 800px;"/>

</td>

</tr>
</table>

<table width="800" border="1" cellpadding="5">
<tr>
<td align="center" valign="center">
<img src="images/activation4.png" alt="Drawing" style="width: 800px;"/>

</td>

</tr>
</table>

<i>Source: https://www.coursera.org/learn/neural-networks-deep-learning/lecture/tyAGh/computing-a-neural-networks-output</i>
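
A forward pass through such a network can be sketched as a loop over layers. The hidden layer sizes 4 and 3 come from the description above; the 3 inputs, the single output, the sigmoid activation, and the random weights are all assumptions made for this sketch.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, params):
    """Feed x through each (W, b) pair in turn; nothing is fed back."""
    a = x
    for W, b in params:
        a = sigmoid(W @ a + b)
    return a

rng = np.random.default_rng(0)

# Layer sizes: 4 and 3 hidden units as in the text;
# the 3 inputs and 1 output are assumptions for this sketch.
sizes = [3, 4, 3, 1]
params = [(rng.standard_normal((n_out, n_in)), np.zeros(n_out))
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]

x = np.array([0.2, -0.4, 0.9])   # one hypothetical input example
y_hat = forward(x, params)
print(y_hat.shape)   # (1,)
```

Each weight matrix has shape (units in this layer, units in the previous layer), so the output of one layer always has the right size to be the input of the next.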

### Cost function

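The cost function measures the error between the network's predictions and the expected outputs, and it depends on the weights and biases. It is usually calculated after running one epoch (a single pass over the entire training set). Backpropagation computes the gradients of the cost function with respect to the weights and biases, and those gradients drive the weight updates.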


<table width="800" border="1" cellpadding="5">
<tr>
<td align="center" valign="center">
<img src="images/error.png" alt="Drawing" style="width: 600px;"/>

</td>

</tr>
</table>

Log loss graph

<table width="800" border="1" cellpadding="5">
<tr>
<td align="center" valign="center">
<img src="images/logistic_loss.png" alt="Drawing" style="width: 400px;"/>

</td>

</tr>
</table>

<i>Source: http://www.kaiyin.co.vu/2014/04/logistic-regression-with-gradient_7.html</i>
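
Assuming the cost in the figures above is the logistic (log) loss, it could be computed along these lines. The function name, the clipping constant `eps`, and the example predictions are illustrative choices, not taken from the source.

```python
import numpy as np

def log_loss(y_true, y_pred, eps=1e-12):
    """Mean binary cross-entropy over all examples."""
    y_pred = np.clip(y_pred, eps, 1 - eps)   # avoid log(0)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1, 0, 1, 0])

confident_right = np.array([0.9, 0.1, 0.9, 0.1])
confident_wrong = np.array([0.1, 0.9, 0.1, 0.9])

loss_right = log_loss(y_true, confident_right)
loss_wrong = log_loss(y_true, confident_wrong)
print(loss_right, loss_wrong)
```

Confident but wrong predictions are penalized far more heavily than confident correct ones, which is the shape the log loss graph illustrates.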

### Backprop Implementation of XOR

The code below implements forward propagation and backward propagation, and uses the resulting gradients in gradient descent to learn weights that minimize the error, ideally reaching the global minimum of the loss function.


In [4]:
#   XOR.py-A very simple neural network to do exclusive or.
#Source : http://python3.codes/neural-network-python-part-1-sigmoid-function-gradient-descent-backpropagation/

import numpy as np
 
epochs = 50000       # Number of iterations
inputLayerSize, hiddenLayerSize, outputLayerSize = 2, 3, 1
 
X = np.array([[0,0], [0,1], [1,0], [1,1]])
Y = np.array([ [0],   [1],   [1],   [0]])
 
def sigmoid (x): return 1/(1 + np.exp(-x))      # activation function
def sigmoid_der(x): return x * (1 - x)          # derivative of sigmoid, where x is already a sigmoid output
                                                # random initial weights on layer inputs
Wh = np.random.uniform(size=(inputLayerSize, hiddenLayerSize))
Wz = np.random.uniform(size=(hiddenLayerSize,outputLayerSize))
 
for i in range(epochs):
 
    ##forward propagation
    H = sigmoid(np.dot(X, Wh))                  # hidden layer results
    Z = sigmoid(np.dot(H, Wz))                  # output layer results
    # calculate error
    E = Y - Z                                   # how much we missed (error)
    # backward propagation
    dZ = E * sigmoid_der(Z)                     # delta Z
    dH = np.dot(dZ,Wz.T) * sigmoid_der(H)       # delta H
    # adjust weights for next iteration
    Wz +=  H.T.dot(dZ)                          # update output layer weights
    Wh +=  X.T.dot(dH)                          # update hidden layer weights
     
print(Z)                # what have we learnt?


[[ 0.01413497]
 [ 0.991483  ]
 [ 0.99148342]
 [ 0.00220432]]
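
The update rules in the loop can be written out as equations. The derivative helper is applied to the activations `H` and `Z` rather than to the raw weighted sums, which works because the sigmoid's derivative can be expressed through its own output:

$$
\sigma(x) = \frac{1}{1+e^{-x}}, \qquad
\sigma'(x) = \frac{e^{-x}}{(1+e^{-x})^{2}} = \sigma(x)\,\bigl(1-\sigma(x)\bigr)
$$

With $\odot$ denoting element-wise multiplication, one iteration of the loop computes:

$$
\delta_Z = (Y - Z)\odot Z \odot (1 - Z), \qquad
\delta_H = \left(\delta_Z W_z^{T}\right) \odot H \odot (1 - H)
$$

$$
W_z \leftarrow W_z + H^{T}\delta_Z, \qquad
W_h \leftarrow W_h + X^{T}\delta_H
$$

Read this way, the updates correspond to gradient descent on the squared error $\tfrac{1}{2}\sum (Y - Z)^2$ with a learning rate of 1. This is an interpretation of the code rather than something the source states. Note also that this network has no bias terms.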


## Linear Algebra (Basics) 

Reference : http://rlhick.people.wm.edu/stories/linear-algebra-python-basics.html