## Basic Neural Networks: Starting small, thinking big

There has been a lot of buzz surrounding neural networks these days, and almost everyone is looking for a way to apply them. You might be surprised to learn that a few decades ago, neural networks were not even considered practical tools (and were generally scoffed at). In this notebook, we look at the building blocks of neural networks and then introduce you to your very first, shiny new neural network. Let's dive right in.

### The Perceptron: The building blocks of Neural Networks

 ![title](imgs/mlp.png)

The figure above shows the most basic unit of a neural network: the _perceptron_. This unit is what gives neural networks their power over a wide variety of data. You can see that the input to the unit marked $\Sigma$ is a vector $X = \{x_{1}, x_{2}, \ldots, x_{n}\}$. Remember the data notebook? There we defined something called _dimensions_. Well, $X$ is precisely the vector of dimensions of a single data point.


It's a little difficult to understand a concept like dimensions at first glance, so I'll give an example. Suppose you've had dinner and want ice cream. That decision is determined by a lot of factors. For example, you may ask yourself whether you have enough money or whether the weather is nice. Also note that money and weather are _independent_, i.e. whether you have money does not depend on the weather (unless you work at a weather station). So your decision to get ice cream is based on a collection of _independent_ variables which influence your decision in different ways. In a totally nerdy way, we're going to tabulate some possible choices that lead to good (yay ice cream) or bad (nay ice cream) decisions for clarity.


In [None]:
# there is no need to understand this code, but feel free to muck around 
from IPython.display import HTML, display 
import tabulate 

table = [["Money", "Weather", "Mood", "Friends", "Actions"],
         ["Yes", "warm","awesome", "yes", "Buy"],
         ["Yes","cold","adventurous", "maybe", "Buy"],
         ["No", "warm", "meh", "yes", "Pass"],
         ["Yes", "hot","okay", "no", "Pass"]
        ]
display(HTML(tabulate.tabulate(table, tablefmt='html')))



You can see that all these dimensions influence your decisions in different ways. Moreover, you can choose to put more importance on certain factors than on others. This is represented by a weight vector $W = \{w_{1}, w_{2}, \ldots, w_{n}\}$ in the figure. In addition, the cell marked $\Sigma$ has its own weight $w_{0}$. This is called the _bias_. Essentially, the $\Sigma$ cell performs this operation on $X$ and $W$:


   \begin{equation}
     y = w_{1}x_{1} + w_{2}x_{2} + \cdots + w_{n}x_{n} + w_{0}(t)
    \end{equation}
    
  This can be written more succinctly as $y = \sum_{i=1}^{n} w_{i}x_{i} + w_{0}(t) = w \cdot X^{T} + w_{0}(t)$. This equation is the most fundamental equation in neural networks. It squashes multiple dimensions into _one_ real value which represents a weighted sum of all inputs. The vector $X$ is transposed because data points are usually stored as row vectors.
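
To make the weighted sum concrete, here is a minimal sketch in NumPy. The numeric encoding of the ice-cream factors and the weight values are made up purely for illustration; they do not come from the figure or from any trained model.

```python
import numpy as np

# Hypothetical numeric encoding of one row of the ice-cream table:
# money (1 = yes), weather (1 = warm), mood (0 to 1), friends (1 = yes)
x = np.array([1.0, 1.0, 0.8, 1.0])

# Made-up weights: how much each factor matters to the decision
w = np.array([2.0, 0.5, 1.0, 1.5])
w0 = -3.0  # the bias

# The weighted sum written out term by term
y_loop = sum(w_i * x_i for w_i, x_i in zip(w, x)) + w0

# The same weighted sum written as a dot product
y_dot = np.dot(w, x) + w0

print(y_loop, y_dot)
```

Both lines compute the same number; the dot product is just the compact form of the sum.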
     
     
        
  



### Activations: The secret sauce of success 

So far our world has been linear. We have a couple of inputs, and we take a weighted sum of them to squash them into a single continuous value. But that doesn't translate into a decision. But wait! We have not explored the figure fully yet. That squiggly line you see after the $\Sigma$ cell? That is what gives neural networks their true power. We're going to discuss exactly what it does now.

In [None]:
import matplotlib.pyplot as plt 
from ipywidgets import interactive, FloatSlider, IntSlider
import numpy as np 

%matplotlib inline 

def centered_sigmoid(x, a=1, b=1):
    # a shifts the centre of the curve; b controls how steep it is.
    # The 0.1 offset avoids dividing by zero when b is 0.
    s = 1/(1+np.exp(-(x-a)/(b+0.1)))
    return s 


def plot_sigmoid(a, b):
    figure = plt.figure(figsize=(10,10))
    x = np.linspace(0, 10, num=200)
    sigm = centered_sigmoid(x,a,b)
    # rescale so the plotted curve always spans [0, 1]
    sigm_n = (sigm - sigm.min())/(sigm.max()-sigm.min())
    plt.plot(x,sigm_n)
    plt.xlim(x.min(), x.max())
    plt.title("The Sigmoid nonlinearity")
    plt.ylabel("Output")
    plt.xlabel("Input")
    plt.show()
    



a_slider = FloatSlider(min=-5, value=0, max=10, step=0.3)
b_slider = IntSlider(min=-5, value=0, max=10, step=1)
    
interactive(plot_sigmoid, a=a_slider, b=b_slider)






The squiggly line in the cell after the $\Sigma$ cell is what we call an _activation function_ or simply an _activation_. This does two things:

1. Squashes a value into a certain fixed range. 

2. Converts a linear value into a nonlinear value. 


We saw in the main equation that $y$ was a _linear_ function of the input vector $X$. However, once we pass $y$ through this activation function, it can only take values within a certain fixed range. In the code example above, we introduced the `sigmoid` activation function, which squashes values into the range $(0, 1)$. But before we discuss the sigmoid function, I'd like to point out _why_ we call this an activation function.



When we squash the value $y$ into a range, it doesn't necessarily make a neuron _fire_, i.e. it doesn't necessarily cause the neuron to output any value. A neuron can _only fire_ when the output of this activation is greater than a predefined _threshold_. Since the output of the activation function decides whether a neuron can fire or not, these functions are called activations.

The output of this whole neuron is $y$ which is calculated as follows: 

\begin{equation}
  y = \sigma(w \cdot X^{T} + w_{0}(t))
\end{equation}

This equation is at the heart of deep learning. Here $\sigma$ is an activation function such as the sigmoid, and it does exactly what we just saw. The output $y$ is then compared with a _label_, and if it is wrong we penalize the neural network. Note also that the weight vector $w$ is not _fixed_. Instead, we adjust the weights so that $y$ gets as close as possible to the actual label.
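
As a rough sketch of a single neuron, the snippet below applies a sigmoid to a weighted sum and then compares the result against a threshold. The inputs, weights, bias, and the threshold of 0.5 are all made-up values chosen for illustration, and `neuron` is a hypothetical helper, not a Keras function.

```python
import numpy as np

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, w0):
    """Weighted sum of the inputs plus bias, passed through the activation."""
    return sigmoid(np.dot(w, x) + w0)

# Made-up inputs, weights and bias
x = np.array([1.0, 1.0, 0.8, 1.0])
w = np.array([2.0, 0.5, 1.0, 1.5])
w0 = -3.0

y = neuron(x, w, w0)

# The neuron "fires" only if its activation exceeds the threshold
threshold = 0.5
decision = "Buy" if y > threshold else "Pass"
print(y, decision)
```

Changing the weights or the bias changes `y`, which is exactly what training does when it adjusts them to match the labels.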

### Neural Networks: First contact

 ![title](imgs/fcon_nn.jpeg)

So far, we have built up an understanding of how a single neuron works with a weight vector $w$ and an input vector $X$. If you look at the diagram above, you'll see that it has many circles enclosed in a rectangle. The circles are nothing but our newfound friend, the neuron. The rectangle is what we call a _layer_. Before we define what a layer is, let us try to understand why we need them.

You noticed in the section above that the output of a neuron is computed by an equation. For small vectors $w$ and $X$, this equation can serve all our needs. However, in deep learning there are thousands of dimensions for _millions_ of data points. You can see that it would be very tedious to compute $y$ for each neuron by simply looping through the input. To get around this, we define something that can:

1. Compute  $y$ for each neuron in a single step.
2. Make it unnecessary for us to worry about every single neuron. 

This is the job of a _layer_. A layer is nothing but a collection of neurons, and it guarantees that computing $y$ for the whole layer gives the same result as computing it for each neuron individually. We represent the layer by a __weight matrix__ $W$ of size `num_dimensions x num_units`, with one column of weights per neuron. In the figure, a "fully connected" layer means that every neuron in one layer is connected to every neuron in the next layer. You'll understand this better as we go further in the course.
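
The sketch below illustrates that guarantee with random numbers. It assumes the weight matrix has one column per neuron (shape `num_dimensions x num_units`) and uses a sigmoid activation; the sizes are arbitrary and chosen only to keep the example small.

```python
import numpy as np

rng = np.random.default_rng(0)
num_samples, num_dimensions, num_units = 5, 4, 3

X = rng.normal(size=(num_samples, num_dimensions))  # one data point per row
W = rng.normal(size=(num_dimensions, num_units))    # one column per neuron
w0 = rng.normal(size=num_units)                     # one bias per neuron

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# The tedious way: loop over every data point and every neuron
Y_loop = np.zeros((num_samples, num_units))
for i in range(num_samples):
    for j in range(num_units):
        Y_loop[i, j] = sigmoid(np.dot(X[i], W[:, j]) + w0[j])

# The layer way: one matrix multiplication for all neurons and all data points
Y_layer = sigmoid(X @ W + w0)

print(np.allclose(Y_loop, Y_layer))
```

The single matrix multiplication gives the same numbers as the double loop, which is why we never have to think about individual neurons.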


In this notebook, we're going to build a simple network like this to classify images from a dataset called MNIST (shown below). We'll be using a popular library called Keras.

 ![title](imgs/mnist_plot-800x600.png)

In [None]:
# some necessary imports 

import keras 
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

In [None]:
# Defining our neural network

model = Sequential() # a Sequential model holds a stack of layers that the input passes through in order

# adding some layers 
model.add(Dense(units=1024, activation='relu', input_shape=(784,))) 
model.add(Dense(units=1200, activation='relu'))
model.add(Dense(units=10, activation='softmax'))

There is a lot to take in when you're reading this code. I'll walk you through it step by step.


The first thing that you must realize is that we rarely use just one layer in a neural network. Instead, we build a stack of layers, a bit like a stack of pancakes, that takes in the input and produces an output. The first layer that takes in the input is called the _input layer_. The layer that produces the output is called the _output layer_. In the code above we define `model` as `Sequential()`. What this means is that we're telling the library to pass the input from the input layer sequentially through the other layers.

We then proceed to `add` other layers to the model. The order in which we add the layers determines how the input will be passed through them, so we must be very careful when doing so. The `add` method takes in a layer. Here, we're passing `Dense` to it, which is Keras speak for a fully connected layer. The `units` argument determines _how many_ neurons the layer should hold.

We've already seen what an activation is. `relu` is a much simpler activation: it returns the input if it's greater than 0 and returns 0 otherwise. `softmax` squashes the outputs of the layer into probabilities between 0 and 1 that sum to 1, one per class.
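
Here is a minimal NumPy sketch of these two activations, written from their standard definitions rather than taken from Keras's implementation. The input vector `z` is a made-up example.

```python
import numpy as np

def relu(z):
    """Return z where it is positive and 0 everywhere else."""
    return np.maximum(0.0, z)

def softmax(z):
    """Turn a vector of scores into probabilities that sum to 1."""
    shifted = z - np.max(z)  # subtracting the max avoids overflow in exp
    exps = np.exp(shifted)
    return exps / np.sum(exps)

z = np.array([-2.0, 0.0, 3.0])  # made-up layer outputs

print(relu(z))
print(softmax(z))
```

Notice that the largest score gets the largest probability, which is why the final `softmax` layer has one unit per class.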

In [None]:
# defining the training logic

def train(model, train_data, train_labels, epochs, num_classes, one_hot=True):
    """
    Trains a model defined in Keras.
    
    Args:
    model: A "compiled" model. 
    train_data: The training data
    train_labels: The actual correct labels 
    epochs: The number of "epochs" to train
    num_classes: The number of possible classes 
    one_hot: A boolean indicating if we should "one-hot" 
    the labels
    
    Returns:
    Nothing.
    """
    labels = None
    ################################################################################
    # When you want to train the network, you simply call "fit" on the model you
    # defined earlier. The x is the training data, y are the correct labels.
    # One-hot: This converts a number to a vector of num_classes zeros in which only
    # the "number" index is 1. E.g. 3 -> [0,0,0,1,0,0,0,0,0,0] for 10 classes.
    ################################################################################
    if one_hot: 
        labels = keras.utils.to_categorical(train_labels, num_classes=num_classes)
    else:
        labels = train_labels
        
    model.fit(x=train_data, y=labels, epochs=epochs, batch_size=32)
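
To see what the one-hot step produces, here is a small NumPy sketch. `one_hot` is a hypothetical helper written for illustration; it is not part of Keras, and it only mimics the encoding described in the comment above.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Encode integer labels as rows of zeros with a single 1."""
    labels = np.asarray(labels)
    encoded = np.zeros((labels.size, num_classes))
    # For row i, set the column given by labels[i] to 1
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

print(one_hot([3, 0], num_classes=10))
```

Each row has exactly one 1, placed at the index of the class it represents.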
    

In [None]:
# Compiling the model: This means we tell the model about 3 things 
# 1. The optimizer, 2. The loss function, 3. any metrics we want. 

# For the purpose of this notebook, you can leave these settings as is.

model.compile(optimizer='sgd', loss='categorical_crossentropy', metrics=['accuracy'])

In [None]:
# Let's see what the model looks like
# (summary() prints the table itself, so there is no need to wrap it in print)
model.summary()
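
The parameter counts in the summary can be checked by hand. A fully connected layer with a bias has one weight per input per neuron plus one bias per neuron. The sketch below applies that rule to the layer sizes used in this notebook; `dense_param_count` is a hypothetical helper written for illustration, not a Keras function.

```python
def dense_param_count(layer_sizes):
    """
    Count the trainable parameters of a stack of fully connected layers.

    layer_sizes: [input_dim, units_1, ..., units_k]
    """
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        # n_in * n_out weights plus n_out biases
        total += n_in * n_out + n_out
    return total

# 784 inputs -> 1024 units -> 1200 units -> 10 units
print(dense_param_count([784, 1024, 1200, 10]))  # 2045850
```

The total should match the number of trainable parameters reported for the model defined above.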

In [None]:
# Finally, let's train 
from keras.datasets import mnist

(train_data, train_labels), (test_data, test_labels) = mnist.load_data()

# Flatten each 28x28 image into a 784-dimensional vector and scale pixels from [0, 255] to [0, 1]
rtrain_data = train_data.reshape(train_data.shape[0], -1).astype("float32") / 255.0
rtest_data = test_data.reshape(test_data.shape[0], -1).astype("float32") / 255.0

# train() one-hot encodes the labels itself, so we pass the raw integer labels
train(model, rtrain_data, train_labels, epochs=10, num_classes=10)

In [None]:
# Time to see how well a model does! 

# When we evaluate, we show the model data it has never seen before. This is what test_data represents. When you build
# your own models, always keep some portion of your data aside as test data. To save us work, we'll use Keras's evaluate function


def eval_model(model, test_data, test_labels, num_classes, one_hot=True):
    """
    Evaluates a trained model and returns the loss and accuracy on test set.
    
    Args:
    model: A trained model. You MUST call the train method before using this. 
    test_data: Test data 
    test_labels: The test labels 
    num_classes: The number of possible classes (10 for MNIST)
    one_hot: One hot encode the labels.
    
    Returns:
    A List containing the test loss and test accuracy
    """
    labels = None
    if one_hot:
        labels = keras.utils.to_categorical(test_labels, num_classes=num_classes)
    else:
        labels = test_labels 
        
    score = model.evaluate(x=test_data, y=labels)
    return score 

In [None]:
score = eval_model(model, rtest_data, test_labels, num_classes=10)
print("Test Loss: {}".format(score[0]))
print("Test Accuracy: {}".format(score[1]))

## Assignment: Get your hands dirty 

The purpose of this assignment is to let you get your hands dirty with neural networks. We define 4 tasks that need to be done. If you want to learn more, feel free to read through and change other things.


1. Refer to Keras documentation [here](https://keras.io/layers/core/). Read and understand the "Dense" Layer. 

2. Define a function called `model_builder` as given in the next cell. The function should take in a list with the number of units in each layer. It should return a Keras model. 

3. Create 3 different versions of your model:

    i. A 2-layer network with units [1024, 1200]

    ii. A 3-layer network with units [32, 1024, 1200]

    iii. A 4-layer network with units [32, 1024, 1024, 1200]

4. Prepare a table of the following from your experiments in task 3:

    i. Number of trainable parameters for each config

    ii. Training accuracy at the end of your chosen number of epochs

    iii. Test accuracy at eval

    iv. Test loss at eval
    
    
Food for thought: Do you see the accuracy increase as more layers are added? Why do you think that happens? 

In [None]:
def model_builder(config=None):
    """
    Builds a model using the sequential API of Keras. 
    
    Args:
    config: A list containing the number of units in each Dense layer. 
    
    Returns:
    A Sequential model 
    
    Your code will be called as such:
    model = model_builder(config=[32,1024,10])
    """
    
    model = Sequential()
    
    # your code here 
    
    return model 