## Basic Neural Networks: Starting small, thinking big

Neural networks have generated a lot of buzz lately, and almost everyone is looking for some way to apply them. It may surprise you that a few decades ago, neural networks were not even considered practical tools (and were generally scoffed at). In this notebook, we look at the building blocks of neural networks and then introduce you to your very first, shiny new neural network. Let's dive right in.

### The Perceptron: The building block of Neural Networks

 ![title](imgs/mlp.png)

The figure above shows the most basic unit of a neural network - the _perceptron_. This unit is what gives neural networks their power over a wide variety of data. You can see that the input to the unit marked $\Sigma$ is a vector $X = \{x_{1}, x_{2}, \dots, x_{n}\}$. Remember the data notebook? There we defined something called _dimensions_. Well, $X$ is precisely the vector of dimensions of a single data point.


It's a little difficult to understand a concept like dimensions at first glance, so here is an example. Suppose you've had dinner and want ice cream. That decision is determined by a lot of factors. For example, you may ask yourself whether you have enough money or whether the weather is nice. Also note that having money and the weather are _independent_ of each other, i.e. whether you have money or not does not depend on the weather (unless you work at a weather station). So your decision to get ice cream is based on a collection of _independent_ variables which influence your decision in different ways. In a totally nerdy way, we're actually going to tabulate some possible choices that lead to good (yay ice cream) or bad (nay ice cream) decisions, for clarity.


In [None]:
# there is no need to understand this code, but feel free to muck around 
from IPython.display import HTML, display 
import tabulate 

table = [["Money", "Weather", "Mood", "Friends", "Actions"],
         ["Yes", "warm","awesome", "yes", "Buy"],
         ["Yes","cold","adventurous", "maybe", "Buy"],
         ["No", "warm", "meh", "yes", "Pass"],
         ["Yes", "hot","okay", "no", "Pass"]
        ]
# headers="firstrow" renders the first row as the table header
display(HTML(tabulate.tabulate(table, headers="firstrow", tablefmt='html')))



You can see that all these dimensions influence your decision in different ways. Moreover, you can choose to place more importance on certain factors than on others. This is represented by a weight vector $W = \{w_{1}, w_{2}, \dots, w_{n}\}$ in the figure. In addition, the cell marked $\Sigma$ has its own weight $w_{0}$. This is called the _bias_. Essentially, the $\Sigma$ cell performs this operation on $X$ and $W$:


\begin{equation}
y = w_{1}x_{1} + w_{2}x_{2} + \dots + w_{n}x_{n} + w_{0}
\end{equation}

This can be written more succinctly as $y = W \cdot X^{T} + w_{0}$. This is the most fundamental equation in neural networks. It squashes multiple dimensions into _one_ real value: a weighted sum of all the inputs, plus the bias. The vector $X$ is transposed because data points are usually stored as row vectors.
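To make the weighted sum concrete, here is a minimal sketch using the ice cream example. The numeric encoding of the inputs, the weights and the bias are all made up for illustration; nothing here is learned from data.

```python
import numpy as np

# Hypothetical numeric encoding of one row of the ice cream table:
# money (1 = yes), weather (1 = warm), mood (0 to 1), friends (1 = yes).
x = np.array([1.0, 1.0, 0.9, 1.0])

# Made-up weights: how much each factor matters to the decision.
w = np.array([2.0, 0.5, 1.0, 0.5])
w0 = -2.5  # made-up bias


def weighted_sum(x, w, w0):
    """Return w_1*x_1 + ... + w_n*x_n + w_0 as a single real value."""
    return np.dot(w, x) + w0


y = weighted_sum(x, w, w0)
print(y)
```

Four inputs go in and one number comes out. Changing a weight changes how strongly the corresponding factor pulls that number up or down.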
     
     
        
  



### Activations: The secret sauce of success 

So far our world has been linear. We have a couple of inputs, and we take a weighted sum of them to squash them into a single continuous value. But that doesn't translate into a decision. But wait! We have not explored the figure fully yet. That squiggly line you see after the $\Sigma$ cell? That is what gives neural networks their true power. We're going to discuss what exactly it does now.

In [None]:
import matplotlib.pyplot as plt 
from ipywidgets import interactive, FloatSlider, IntSlider
import numpy as np 

%matplotlib inline 

def centered_sigmoid(x, a=1, b=1):
    # Sigmoid centered at a; (b + 0.1) scales the width of the transition
    s = 1 / (1 + np.exp(-(x - a) / (b + 0.1)))
    return s 


def plot_sigmoid(a, b):
    figure = plt.figure(figsize=(10,10))
    x = np.linspace(0, 10, num=200)
    sigm = centered_sigmoid(x,a,b)
    sigm_n = (sigm - sigm.min())/(sigm.max()-sigm.min())
    plt.plot(x,sigm_n)
    plt.xlim(x.min(), x.max())  # x-axis runs left to right
    plt.title("The Sigmoid nonlinearity")
    plt.ylabel("Output")
    plt.xlabel("Input")
    plt.show()
    



a_slider = FloatSlider(min=-5, value=0, max=10, step=0.3)
b_slider = IntSlider(min=-5, value=0, max=10, step=1)
    
interactive(plot_sigmoid, a=a_slider, b=b_slider)






The squiggly line in the cell after the $\Sigma$ cell is what we call an _activation function_, or simply an _activation_. It does two things:

1. Squashes a value into a certain fixed range. 

2. Converts a linear value into a non-linear value. 


We saw in the main equation that $y$ was a _linear_ function of the input vector $X$. However, once we pass $y$ through the activation function, the result is confined to a certain fixed range. In the code example above, we introduced the `sigmoid` activation function, which squashes values into the range $(0,1)$. But before we discuss the sigmoid function, I'd like to point out _why_ we call this an activation function. 
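A quick way to check both properties is to evaluate a sigmoid on a few values. This is a minimal sketch, assuming the standard logistic sigmoid $1/(1+e^{-y})$ rather than the shifted and scaled version used for the plot above.

```python
import numpy as np


def sigmoid(y):
    """Standard logistic sigmoid: 1 / (1 + e^(-y))."""
    return 1.0 / (1.0 + np.exp(-y))


ys = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
out = sigmoid(ys)
print(out)

# Squashing: every output lies strictly between 0 and 1.
print(out.min() > 0.0, out.max() < 1.0)

# Non-linearity: sigmoid(a + b) is not sigmoid(a) + sigmoid(b).
a, b = 1.0, 2.0
print(np.isclose(sigmoid(a + b), sigmoid(a) + sigmoid(b)))
```

However large or small the input gets, the output never leaves the fixed range, and the function does not behave like a straight line.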



When we squash the value $y$ into a range, it doesn't necessarily make a neuron _fire_, i.e. it doesn't by itself cause the neuron to output a value. A neuron can _only fire_ when the output of the activation is greater than a predefined _threshold_. Since the output of the activation function decides whether a neuron fires or not, these functions are called activations.
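The firing rule can be sketched in a few lines. The threshold of 0.5, the weights, the bias and the numeric encoding of the inputs are all assumptions chosen for illustration, and `fires` is a hypothetical helper name.

```python
import numpy as np


def sigmoid(y):
    """Standard logistic sigmoid: 1 / (1 + e^(-y))."""
    return 1.0 / (1.0 + np.exp(-y))


def fires(x, w, w0, threshold=0.5):
    """Return True when the activation of the weighted sum exceeds the threshold."""
    y = np.dot(w, x) + w0
    return bool(sigmoid(y) > threshold)


w = np.array([2.0, 0.5, 1.0, 0.5])  # made-up weights
w0 = -2.5                           # made-up bias

# Inputs encoded as: money, warm weather, mood, friends (hypothetical encoding).
has_money = np.array([1.0, 1.0, 0.9, 1.0])
no_money = np.array([0.0, 1.0, 0.3, 1.0])

print(fires(has_money, w, w0))
print(fires(no_money, w, w0))
```

With these made-up numbers the neuron fires for the first input and stays silent for the second, because only the first pushes the activation above the threshold.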

-- TODO: Write about sigmoid. 

### Neural Networks: First contact

 ![title](imgs/fcon_nn.jpeg)