# Neural Networks Background

## Logistic Regression and Activation Function

Recall that linear regression tries to predict the data by <b>finding a linear (straight-line) equation</b> to model the data and predict future data points. It predicts a <b>continuous numeric variable</b> given a set of inputs.

Logistic regression, on the other hand, tries to find the <b>line/plane that separates the input space into two regions</b>, i.e. a linear decision boundary between two classes. The output is a <b>probability</b> that the given input point belongs to a certain class.
<table width="600" border="1" cellpadding="5">
<tr>
<td align="center" valign="center">
<img src="images/lin_reg.png" />
</td>
<td align="center" valign="center">
<img src="images/log_reg.png" />
</td>

</tr>
</table>

Source:  https://simple.wikipedia.org/wiki/Logistic_Regression


<table width="600" border="1" cellpadding="5">
<tr>
<td align="center" valign="center">
<img src="images/sigmoid_vs_linear.png" alt="Sigmoid vs Linear" />
</td>
</tr>
</table>

<ul>
<li>The logistic regression formula is <b>derived from the standard linear equation for a straight line</b>, i.e. $y = mx + b$.
</li>
<li>Applying the <b>Sigmoid function (an activation function)</b>, $\sigma(z) = \frac{1}{1 + e^{-z}}$, to the linear output $z = mx + b$ transforms the standard linear formula into the logistic regression formula.
</li>
<li>The logistic regression function is useful for <b>predicting the class</b>, i.e. when the output takes discrete / <b>categorical values</b> (versus linear regression, where the output is a continuous variable).
</li>
<li>The activation function therefore <b>adds a non-linearity</b> to the linear function.
</li>
</ul>
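As a minimal sketch of the bullets above (assuming NumPy, and a hypothetical slope `m` and intercept `b` rather than fitted values), the straight-line output `m*x + b` is passed through the sigmoid to get a probability, which is then thresholded at 0.5 to get a class:

```python
import numpy as np

def sigmoid(z):
    # squashes any real number into the open interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical slope m and intercept b for a one-feature model (not fitted to any data)
m, b = 2.0, -1.0
x = np.array([-2.0, 0.0, 0.5, 3.0])

linear = m * x + b                 # straight-line output, unbounded: [-5, -1, 0, 5]
prob = sigmoid(linear)             # logistic output, read as P(class = 1 | x)
pred = (prob >= 0.5).astype(int)   # threshold at 0.5 to get a class label

print(np.round(prob, 3))
print(pred)                        # [0 0 1 1]
```

Thresholding the probability at 0.5 is the same as checking whether `m*x + b >= 0`, which is why the decision boundary of logistic regression is still a straight line (or plane).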
 
 #### Common Activation Functions
<table width="700" border="1" cellpadding="5">
<tr>
<td align="center" valign="center">
<img src="images/activation.png" alt="" />
</td>
</tr>
</table>
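The figure is not reproduced in text form, so as a hedged sketch, here are NumPy versions of four activation functions that are commonly used (the figure may include a different set). The `leaky_relu` slope `alpha=0.01` is an assumed value:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))    # output range (0, 1)

def tanh(z):
    return np.tanh(z)                  # output range (-1, 1), zero-centred

def relu(z):
    return np.maximum(0.0, z)          # output range [0, inf)

def leaky_relu(z, alpha=0.01):
    # small slope alpha for negative inputs; 0.01 is an assumed, commonly used value
    return np.where(z > 0, z, alpha * z)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
for f in (sigmoid, tanh, relu, leaky_relu):
    print(f"{f.__name__:>10}:", np.round(f(z), 3))
```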


## Biological Neuron vs Artificial Neuron a.k.a. Perceptron


<table width="800" border="1" cellpadding="5">
<tr>
<td align="center" valign="center">
<img src="images/neuron_and_artif_neuron.png" alt="Biological Neuron and Artificial Neuron" />
</td>

</tr>
</table>

<table width="400" border="1" cellpadding="5">
<tr>
<td align="center" valign="center">
<img src="images/Formula.png" />
</td>

</tr>
</table>

<ul>
  
  <li>In the basic neuron model, the dendrites carry the signals to the cell body, where they all get summed <b>(weighted sum)</b>. If the final sum is above a certain threshold, the neuron can fire, sending a spike along its axon.</li>
  <li>Each artificial neuron performs a <b>dot product</b> of the input and its weights, adds the bias and applies the <b>non-linearity (i.e. activation function)</b>, in this case the sigmoid.</li>
  <li>The idea is that the synaptic strengths (the weights w) are learnable and <b>control the strength of influence</b> of one neuron on another, as well as its direction: excitatory (positive weight) or inhibitory (negative weight).
</li>
<li>The <b>firing rate</b> of the neuron, which represents the frequency of the spikes along the axon, is modeled with an activation function f.</li>


  
</ul>
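The bullets above describe a single artificial neuron: a dot product of inputs and weights, plus a bias, passed through an activation. A minimal NumPy sketch, with made-up inputs, weights and bias chosen only for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, b, activation=sigmoid):
    # weighted sum of the inputs (dot product) plus bias, then the non-linearity
    return activation(np.dot(w, x) + b)

# made-up values for illustration
x = np.array([0.5, -1.0, 2.0])   # inputs (signals arriving on the "dendrites")
w = np.array([0.8, 0.2, -0.5])   # weights (synaptic strengths; negative = inhibitory)
b = 0.1                          # bias

print(neuron(x, w, b))           # sigmoid(-0.7), about 0.332
```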


The artificial neuron is also called the <b>Sigmoid Neuron</b> when the activation function used is the sigmoid. These days, however, ReLU is usually the default choice for the hidden layers of deep neural networks, with the sigmoid typically reserved for the output layer (e.g. in binary classification).

<i>Source: http://cs231n.github.io/neural-networks-1/</i>

### NAND gate using One Perceptron
Another way to represent the above equation, i.e. the condition for the output, is to rearrange it as follows.


<table width="400" border="1" cellpadding="5">
<tr>
<td align="center" valign="center">
<img src="images/w.x.png"/>
</td>
</tr>
</table>



<ul>
    <li>Write $\sum_j w_j x_j$ as a dot product, $w \cdot x \equiv \sum_j w_j x_j$, where w and x are the vectors of weights and inputs.</li>
    <li>Move the threshold to the other side of the inequality and replace it by what's known as the perceptron's bias, $b \equiv -\text{threshold}$.</li>
</ul>
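Assuming the threshold form of the perceptron rule (output 1 when the weighted sum exceeds a threshold, as in the neuron description above), the two steps turn it into the following, using the same w, x and b as the bullets:

$$
\text{output} =
\begin{cases}
0 & \text{if } \sum_j w_j x_j \le \text{threshold} \\
1 & \text{if } \sum_j w_j x_j > \text{threshold}
\end{cases}
\qquad\Longrightarrow\qquad
\text{output} =
\begin{cases}
0 & \text{if } w \cdot x + b \le 0 \\
1 & \text{if } w \cdot x + b > 0
\end{cases}
$$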



Perceptrons can be used to compute elementary logical functions such as AND, OR, and NAND. For example, suppose we have a perceptron with two inputs, each with weight −2, and an overall bias of 3.



<table width="400" border="1" cellpadding="5">
<tr>
<td align="center" valign="center">
<img src="images/nand.png"/>
</td>
</tr>
</table>


<ul>
<li>Input 00 produces output 1, since $(-2)\cdot 0 + (-2)\cdot 0 + 3 = 3$ is <b>positive</b>. </li>
<li>Inputs 01 and 10 produce output 1, since $(-2)\cdot 0 + (-2)\cdot 1 + 3 = 1$ is also positive. </li>
<li>But the input 11 produces output 0, since $(-2)\cdot 1 + (-2)\cdot 1 + 3 = -1$ is <b>negative</b>. </li>
<li>And so our perceptron implements a <b>NAND gate</b>!</li>
</ul>
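A small sketch that checks the NAND perceptron on all four inputs. The NAND weights and bias are the ones from the text; the AND and OR weights are one possible choice picked for illustration, not taken from the source:

```python
import numpy as np

def perceptron(x, w, b):
    # step-function perceptron: 1 if w.x + b > 0, else 0
    return int(np.dot(w, x) + b > 0)

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]

# NAND weights and bias from the text: w = (-2, -2), b = 3
nand = [perceptron(np.array(x), np.array([-2, -2]), 3) for x in inputs]
# AND and OR weights below are illustrative choices, not from the source
and_ = [perceptron(np.array(x), np.array([1, 1]), -1.5) for x in inputs]
or_ = [perceptron(np.array(x), np.array([1, 1]), -0.5) for x in inputs]

print("NAND", nand)  # [1, 1, 1, 0]
print("AND ", and_)  # [0, 0, 0, 1]
print("OR  ", or_)   # [0, 1, 1, 1]
```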

<i>Source: http://neuralnetworksanddeeplearning.com/chap1.html</i>

#### Vectorized Form 
The input data and output labels can be represented in vectorized form, e.g. the weighted sum becomes $W^T X + b$, where W and X are the weight and input vectors.
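As a sketch of the vectorized form (assuming NumPy), stacking all four input pairs as the rows of a matrix `X` lets one matrix-vector product evaluate the NAND perceptron on every input at once. With inputs stored as rows, $W^T X$ for a single column vector becomes `X @ W` for the whole batch, and the scalar bias is broadcast to every row:

```python
import numpy as np

# All four input pairs stacked as rows of one matrix (shape 4 x 2)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
W = np.array([-2, -2])   # NAND weights from the example above
b = 3                    # NAND bias

# One matrix-vector product evaluates every input at once; b is broadcast to all rows
z = X @ W + b            # weighted sums: [3, 1, 1, -1]
y = (z > 0).astype(int)  # step activation: [1, 1, 1, 0]
print(z, y)
```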


#### Implementation in Numpy

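```python
import numpy as np

# X = [x1, x2]: the input pair (1, 1)
X = np.array([1, 1])
# W = [w1, w2] = [-2, -2]: the NAND weights
W = np.array([-2, -2])
# bias
b = 3
# output: True (i.e. 1) if w.x + b > 0, otherwise False (i.e. 0)
# (W.T is the same as W for a 1-D array)
y = (np.dot(W.T, X) + b) > 0

print(y)
```

```
False
```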


## Neural Network (i.e. Multi-Layer Perceptron)

A neural network is therefore nothing but a network of such individual perceptrons, each of which is called a node. The network consists of the <b>input layer, the hidden layer(s) and the output layer</b>. In a fully connected network, the output of each neuron in layer n is connected to every neuron in layer n+1. The output layer can have multiple nodes; when it uses a sigmoid activation, each output node produces a value in [0,1] that can be read as the probability of belonging to a certain class.

<table width="600" border="1" cellpadding="5">
<tr>
<td align="center" valign="center">
<img src="images/nn.png" />
</td>

</tr>
</table>


### Feedforward Network

Neural networks where the output from one layer is used as input to the next layer are called feedforward neural networks. This means there are no loops in the network - <b>information is always fed forward, never fed back</b>. 

<i>Source : https://ujjwalkarn.me/2016/08/09/quick-intro-neural-networks/</i>
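A hedged sketch of a fully connected feedforward pass, assuming NumPy, sigmoid activations in every layer, and a hypothetical 3-4-2 layer layout with random weights. Each weight matrix connects every node of one layer to every node of the next, and the loop only ever passes activations forward:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def feedforward(x, layers):
    # layers: list of (W, b) pairs; W has shape (n_out, n_in)
    a = x
    for W, b in layers:
        # the output of one layer becomes the input of the next; nothing flows back
        a = sigmoid(W @ a + b)
    return a

rng = np.random.default_rng(42)
# hypothetical 3-4-2 network: 3 inputs, one hidden layer of 4 nodes, 2 output nodes
layers = [
    (rng.normal(size=(4, 3)), np.zeros(4)),
    (rng.normal(size=(2, 4)), np.zeros(2)),
]
out = feedforward(np.array([1.0, 0.5, -0.2]), layers)
print(out)
```

Because every layer here uses the sigmoid, each of the two outputs lies strictly between 0 and 1.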

### XOR function using multiple perceptrons

The XOR implementation with multiple neurons is a simple neural network with an input layer of 2 nodes, one hidden layer with 2 neurons, and an output layer with 1 neuron.

<table width="800" border="1" cellpadding="5">
<tr>
<td align="center" valign="center">
<img src="images/xor.png" />
</td>
<td align="center" valign="center">
<img src="images/xor_with_perceptrons.png" />
</td>
</tr>
</table>
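One standard way to build XOR from the gates of the previous section is XOR(x1, x2) = AND(OR(x1, x2), NAND(x1, x2)): the two hidden neurons compute OR and NAND, and the output neuron ANDs them. The sketch below uses that construction with step-function perceptrons; the OR and AND weights are illustrative choices, and the weights in the figure may differ:

```python
import numpy as np

def perceptron(x, w, b):
    # step-function perceptron: 1 if w.x + b > 0, else 0
    return int(np.dot(w, x) + b > 0)

def xor(x1, x2):
    x = np.array([x1, x2])
    # hidden layer: two perceptrons (OR and AND weights are illustrative)
    h1 = perceptron(x, np.array([1, 1]), -0.5)   # OR
    h2 = perceptron(x, np.array([-2, -2]), 3)    # NAND (weights from the text)
    # output layer: one perceptron
    return perceptron(np.array([h1, h2]), np.array([1, 1]), -1.5)  # AND

table = [xor(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]]
print(table)  # [0, 1, 1, 0]
```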

A "single-layer" perceptron can't implement XOR because the classes in XOR are not linearly separable: you cannot draw a straight line that separates the points (0,0) and (1,1) from the points (0,1) and (1,0).


Source: http://toritris.weebly.com/perceptron-2-logical-operations.html
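The linear-separability claim can be checked directly. A single perceptron outputs 1 exactly when $w_1 x_1 + w_2 x_2 + b > 0$, so computing XOR would require $b \le 0$, $w_2 + b > 0$, $w_1 + b > 0$ and $w_1 + w_2 + b \le 0$. Adding the two middle inequalities gives $w_1 + w_2 + b > -b \ge 0$, which contradicts the last one. The brute-force search below is only a numerical sanity check of that argument (the grid range and resolution are arbitrary), not a proof:

```python
import itertools
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
xor_target = np.array([0, 1, 1, 0])

# arbitrary grid of candidate weights and biases in [-3, 3]
grid = np.linspace(-3, 3, 25)
found = False
for w1, w2, b in itertools.product(grid, repeat=3):
    out = (X @ np.array([w1, w2]) + b > 0).astype(int)  # single step-function perceptron
    if np.array_equal(out, xor_target):
        found = True
        break

print(found)  # False: no single perceptron on the grid computes XOR
```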






### Feedforward Implementation of XOR

In the feedforward implementation of XOR, the <b>weights and biases have been precomputed</b> to give the right output, i.e. zero error. A forward pass on its own does not learn anything: it only computes outputs from the weights it is given.

[<b>Side Note</b>: To <i>learn</i> weights and biases that produce the expected output, we start from random weights and biases and repeat the following for many iterations over all the inputs: feed the inputs forward through the network, measure the difference between the current output and the expected output, then propagate that error backwards from the output layer to the input layer, computing the rate of change (gradient) of the error with respect to each layer's weights, and adjust each weight a small step in the direction that reduces the error. Computing these gradients layer by layer is called <b>backpropagation</b>; using them to update the weights iteratively is <b>gradient descent</b>.]

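The solution we described to the XOR problem gives zero error, so it is at a global minimum of the loss function, which measures the difference between the expected output y and the computed output ŷ (for a non-negative loss such as the squared error, zero is the lowest possible value). Gradient descent could therefore converge to this point.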

<table width="800" border="1" cellpadding="5">
<tr>
<td align="center" valign="center">
<img src="images/my_xor_nn_.png" alt="Drawing" style="width: 800px;"/>

</td>

</tr>
</table>



```python
import numpy as np

def relu(x):
    # rectified linear unit: max(0, x), element-wise
    return x * (x > 0)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])  # input
y = np.array([[0], [1], [1], [0]])              # expected output

Wh = np.array([[1, 1], [1, 1]])  # weights for hidden layer
Wz = np.array([1, -2])           # weights for output (z) layer
bias_c = np.array([0, -1])       # bias terms for hidden layer

XWh = np.dot(X, Wh.T) + bias_c      # output of hidden layer before activation
activated_XWh = relu(XWh)           # output of hidden layer after ReLU
yHat = np.dot(activated_XWh, Wz.T)  # output layer before activation
Z = relu(yHat)                      # final output

print(Z)
```

```
[0 1 1 0]
```
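To make the "zero error" claim concrete, the sketch below recomputes the hand-picked network's output and measures the loss. Mean squared error is an assumed choice of loss here; the text only describes the loss as the difference between expected and computed output. Because it is an average of squares it can never be negative, so a value of 0 is a global minimum:

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

# hand-picked weights and biases from the cell above
Wh = np.array([[1, 1], [1, 1]])
bias_c = np.array([0, -1])
Wz = np.array([1, -2])

y_hat = relu(relu(X @ Wh.T + bias_c) @ Wz)

# MSE is an assumed choice of loss: an average of squares, so never below 0
mse = np.mean((y - y_hat) ** 2)
print(mse)  # 0.0
```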


### Backprop Implementation of XOR

The code below implements forward propagation and backpropagation, with gradient-descent weight updates, in order to learn weights that minimize the error, ideally reaching the global minimum of the loss function.


```python
#   XOR.py - A very simple neural network to do exclusive or.
# Source: http://python3.codes/neural-network-python-part-1-sigmoid-function-gradient-descent-backpropagation/

import numpy as np

epochs = 50000       # Number of iterations
inputLayerSize, hiddenLayerSize, outputLayerSize = 2, 3, 1

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
Y = np.array([[0], [1], [1], [0]])

def sigmoid(x):
    return 1 / (1 + np.exp(-x))        # activation function

def sigmoid_der(s):
    # derivative of the sigmoid, expressed in terms of its output s = sigmoid(x)
    return s * (1 - s)

# Weights on layer inputs, initialized randomly in [0, 1); this network has no bias terms.
# There is no fixed seed, so the printed output differs slightly between runs.
Wh = np.random.uniform(size=(inputLayerSize, hiddenLayerSize))
Wz = np.random.uniform(size=(hiddenLayerSize, outputLayerSize))

for i in range(epochs):
    # forward propagation
    H = sigmoid(np.dot(X, Wh))                  # hidden layer results
    Z = sigmoid(np.dot(H, Wz))                  # output layer results
    # calculate error
    E = Y - Z                                   # how much we missed (error)
    # backward propagation
    dZ = E * sigmoid_der(Z)                     # delta Z
    dH = np.dot(dZ, Wz.T) * sigmoid_der(H)      # delta H
    # adjust weights for next iteration (gradient-descent step, implicit learning rate of 1)
    Wz += H.T.dot(dZ)                           # update output layer weights
    Wh += X.T.dot(dH)                           # update hidden layer weights

print(Z)                # what have we learnt?
```

```
[[ 0.01410154]
 [ 0.99149623]
 [ 0.991496  ]
 [ 0.00224726]]
```
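The network in the cell above has no bias terms and uses an implicit learning rate of 1. As a hedged variant, not part of the original source, the sketch below adds bias vectors `bh` and `bz`, an explicit learning rate `lr`, and a fixed seed so runs are repeatable, and it records the mean squared error each epoch. These names and values are assumptions chosen for illustration:

```python
import numpy as np

# Assumptions (not in the original cell): fixed seed, bias vectors bh/bz, learning rate lr
rng = np.random.default_rng(0)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_der_from_output(s):
    # derivative of the sigmoid, written in terms of its output s = sigmoid(x)
    return s * (1 - s)

Wh = rng.uniform(size=(2, 3))   # input -> hidden weights
bh = np.zeros((1, 3))           # hidden-layer biases
Wz = rng.uniform(size=(3, 1))   # hidden -> output weights
bz = np.zeros((1, 1))           # output-layer bias
lr = 0.5                        # learning rate (step size)

losses = []
for epoch in range(20000):
    # forward propagation
    H = sigmoid(X @ Wh + bh)
    Z = sigmoid(H @ Wz + bz)
    E = Y - Z
    losses.append(float(np.mean(E ** 2)))   # mean squared error before this update
    # backward propagation (same deltas as the original cell)
    dZ = E * sigmoid_der_from_output(Z)
    dH = (dZ @ Wz.T) * sigmoid_der_from_output(H)
    # gradient-descent updates, scaled by lr and now including the biases
    Wz += lr * H.T @ dZ
    bz += lr * dZ.sum(axis=0, keepdims=True)
    Wh += lr * X.T @ dH
    bh += lr * dH.sum(axis=0, keepdims=True)

print(np.round(Z, 3))
print("first loss:", losses[0], "last loss:", losses[-1])
```

How close the final outputs get to `[0, 1, 1, 0]` depends on the seed, learning rate and number of epochs; XOR networks this small can occasionally get stuck in a poor local minimum.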


## Linear Algebra (Basics) 

Reference : http://rlhick.people.wm.edu/stories/linear-algebra-python-basics.html
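The notebook relies on a handful of NumPy linear algebra operations. A short sketch of them, using small made-up arrays:

```python
import numpy as np

# made-up arrays for illustration
v = np.array([1, 2, 3])
w = np.array([4, 5, 6])
A = np.array([[1, 0, 2],
              [0, 1, 1]])          # a 2 x 3 matrix

print(np.dot(v, w))                # dot product: 1*4 + 2*5 + 3*6 = 32
print(A @ v)                       # matrix-vector product: [7 5]
print(A.T.shape)                   # transpose swaps the axes: (3, 2)
print(v.T.shape)                   # transposing a 1-D array does nothing: (3,)
print(A @ v + np.array([10, 20]))  # adding a bias vector element-wise: [17 25]
```

The fourth line is why `W.T` in the earlier cells behaves exactly like `W` when `W` is a 1-D array.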