# Lab 21

Today, we continue our journey into deep learning by networking our perceptrons. Today's goals are: 

0. Define a network in the context of deep learning
1. Code an activation function
2. Detail the differences between the activation and threshold functions

Today we are going to be following aspects of this [blog post](https://towardsdatascience.com/how-to-build-your-own-neural-network-from-scratch-in-python-68998a08e4f6).

### Imports for today 

In [None]:
%matplotlib inline

import matplotlib.pyplot as plt
import seaborn as sns; sns.set()

import numpy as np
from numpy import linalg as LA
import pandas as pd

import random 

In [None]:
# Load the Lab 14 data to use as an example

new_data = pd.read_csv("lab14data.csv", sep = ",")
new_data_np = np.genfromtxt("lab14data.csv", delimiter=',', skip_header=1)

## Networks 

Along with our artificial neurons, _networks_ are the other central ingredient to our neural networks. Before we discuss what a neural network is, we begin with a few basic network terms:

* Nodes - These are objects that we want to form connections between
* Edges - These are the connections between the nodes
* Directional Edges - Edges can have a direction, that is, a "from" side and a "to" side. These are often used to represent the flow of information. 

With networks that only have directional edges, we can sometimes arrange the nodes into groups where the first group has edges directly connected only to the second group, and then the second group is connected to the third and so on. This grouping is a _partitioning_ of the nodes. (Networks with this kind of partitioning have special properties, but details about these special networks are outside the scope of this course.) 
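
To make the partitioning idea concrete, here is a small sketch in plain Python. The node names, the layer assignment, and the helper `is_layered` are hypothetical, chosen only for illustration. A partition into layers works in the sense described above when every directed edge goes from one layer to the very next one.

```python
# Hypothetical example network: each node name maps to the layer (group) it belongs to.
layer_of = {"x1": 0, "x2": 0, "x3": 0, "h1": 1, "h2": 1, "out": 2}

# Directed edges written as (from_node, to_node).
edges = [("x1", "h1"), ("x1", "h2"), ("x2", "h1"), ("x2", "h2"),
         ("x3", "h1"), ("x3", "h2"), ("h1", "out"), ("h2", "out")]

def is_layered(edges, layer_of):
    """Return True if every directed edge goes from layer k to layer k + 1."""
    return all(layer_of[dst] == layer_of[src] + 1 for src, dst in edges)

print(is_layered(edges, layer_of))                    # True
print(is_layered(edges + [("x1", "out")], layer_of))  # False: this edge skips the middle group
```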

Neural networks have such a structure. In deep learning, we call each of these groups _layers._ 
* The first layer is the **input layer**, where each variable is its own node.
* The last layer is the **output layer**, where we get our final predictions.
* All the other layers are **hidden layers**.

In the case of neural networks, the strengths of the connections between nodes in adjacent layers are the weights. 
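
A minimal sketch of this idea, assuming hypothetical layer sizes (3 inputs, 4 hidden nodes, 1 output) and random weights: the weights between two adjacent layers can be stored as a matrix whose entry `[i, j]` is the strength of the edge from node `i` in one layer to node `j` in the next. The input layer has one node per variable, so its size matches the length of one observation.

```python
import numpy as np

rng = np.random.default_rng(0)

n_inputs, n_hidden, n_outputs = 3, 4, 1   # hypothetical layer sizes

# W1[i, j] is the weight on the edge from input node i to hidden node j.
W1 = rng.normal(size=(n_inputs, n_hidden))
# W2[j, k] is the weight on the edge from hidden node j to output node k.
W2 = rng.normal(size=(n_hidden, n_outputs))

x = np.array([1.0, 0.5, -2.0])   # one observation: one value per input node

# Weighted sum arriving at each hidden node j: sum over i of x[i] * W1[i, j]
hidden_sums = x @ W1
print(W1.shape, W2.shape, hidden_sums.shape)   # (3, 4) (4, 1) (4,)
```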

Consider the below image from [this website](http://www.texample.net/tikz/examples/neural-network/):

![neural-network.png](attachment:neural-network.png)


## First Perceptron as a NN

So what do we mean when we say the weights are the strengths of the connections between nodes? Let's look at an example we already know: the perceptron. 

![perceptron.png](attachment:perceptron.png)

So we have our inputs (in green) and our output. The arrows represent the weights that get used within the red dot to determine the output. In this view, there are no "hidden" parts. We know what goes in (green) and have a prediction (in red). There's no mystery within; hence no hidden layer. 

Within the red node, we have all the actions of the perceptron from last time: 1) gathering and 2) thresholding. If the inputs are denoted as $(x_1,x_2,...,x_n)$, then the output of this red node would be given by (assuming no bias term):

\begin{equation}
\textrm{Red} = \phi\left(\sum_{i=1}^n w_i x_i\right)
\end{equation}
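
The equation above can be sketched in NumPy as follows. The helper names `step` and `perceptron_output` are hypothetical, and we assume the threshold outputs 1 for a non-negative weighted sum and 0 otherwise; your perceptron from last time may use a different convention (for example, -1 and +1).

```python
import numpy as np

def step(z):
    """Threshold (step) function: 1 if z >= 0, else 0."""
    return np.where(z >= 0, 1, 0)

def perceptron_output(x, w, phi=step):
    """Output of the red node: phi applied to the weighted sum of the inputs (no bias)."""
    return phi(np.dot(w, x))

x = np.array([1.0, 2.0, -1.0])
w = np.array([0.5, -0.25, 0.1])
print(perceptron_output(x, w))   # weighted sum = 0.5 - 0.5 - 0.1 = -0.1, so the output is 0
```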

## Adding a Layer

Let's consider the following network. This one has a hidden layer: 

![perceptron2.png](attachment:perceptron2.png)

For this network, what is happening at the purple and red nodes? (In other words, what is the output of the red node and of the purple node?)

\begin{equation}
Red = ???
\end{equation}

\begin{equation}
Purple = ???
\end{equation}

### Feed-Forward

There are two parts to a neural network: a forward part and a backward part. The first is called **feed-forward** and refers to how information is pushed forward from one layer to the next; the second, **backpropagation**, is where the weights get updated. 

The last piece of feed-forward that we did not talk about last time is _activation_. Instead of thresholding at the interior (hidden) layers, we use an _activation_ function: a softer version of thresholding that tells each node how much to pass forward as a function of its weighted sum. So rather than thresholding the weighted sum immediately, we use this relaxation _until_ the last layer, where we do threshold into one class or the other. 
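
A minimal feed-forward sketch for a network with one hidden layer, assuming sigmoid activation at the hidden nodes, no bias terms, and a threshold at the single output node. The function name `feed_forward` and the example weights are hypothetical.

```python
import numpy as np

def sigmoid(z):
    """Soft activation: maps any weighted sum into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def feed_forward(x, W_hidden, w_out):
    """Push one observation x forward through one hidden layer (no bias terms)."""
    hidden = sigmoid(x @ W_hidden)            # activation: how much each hidden node passes forward
    output_sum = hidden @ w_out               # weighted sum arriving at the output node
    prediction = 1 if output_sum >= 0 else 0  # threshold only at the last layer
    return hidden, output_sum, prediction

x = np.array([1.0, -1.0])                     # two input nodes
W_hidden = np.array([[0.5, -0.5],
                     [0.5,  0.5]])            # W_hidden[i, j]: input i -> hidden node j
w_out = np.array([1.0, -1.0])                 # hidden node j -> output

hidden, output_sum, prediction = feed_forward(x, W_hidden, w_out)
print(hidden)      # sigmoid([0, -1]), roughly [0.5, 0.269]
print(prediction)  # 1, since 0.5 - 0.269 > 0
```

Only the last layer makes a hard yes/no decision; every earlier layer passes forward a graded amount.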

There are many kinds of activation functions $\phi()$: 
* Identity function: $\phi(x) = x$
* Sigmoid function: $\phi(x) = \dfrac{1}{1+e^{-x}}$
* ReLU: $\phi(x) = \max(0, x)$, that is, $x$ if $x \geq 0$ and $0$ otherwise

In the code blocks below, write implementations of the sigmoid and ReLU functions.

In [None]:
# Implementation for sigmoid
def sigmoid(x):
    pass

In [None]:
# Implementation for ReLU
def ReLU(x):
    pass
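
For reference, here is one possible vectorized sketch of the two activation functions, shown alongside the hard threshold (step) function so the difference between activation and thresholding is visible on the same inputs. `threshold` is a hypothetical helper name. SciPy also provides a numerically stable sigmoid as `scipy.special.expit`.

```python
import numpy as np

def sigmoid(x):
    """Logistic sigmoid: squashes any real input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def ReLU(x):
    """Rectified linear unit: x where x >= 0, and 0 elsewhere."""
    return np.maximum(0, x)

def threshold(x):
    """Hard threshold (step) used by the perceptron: 1 where x >= 0, else 0."""
    return np.where(x >= 0, 1, 0)

z = np.array([-2.0, -0.1, 0.0, 0.1, 2.0])
print(threshold(z))   # [0 0 1 1 1]
print(sigmoid(z))     # smooth values strictly between 0 and 1; sigmoid(0) = 0.5
print(ReLU(z))        # 0 for the three non-positive entries, then 0.1 and 2.0
```

Note how -0.1 and 0.1 land in different classes under the threshold, while the sigmoid gives them nearly equal values (about 0.475 and 0.525): the activation passes forward *how much* signal there is, not just which side of zero it falls on.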

### Adding to the perceptron

Return to your perceptron from last time and replace its threshold function with the sigmoid function. 

In [None]:
# Add your edited perceptron here:
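
Since the perceptron from last time is not reproduced here, the following is a hypothetical minimal version: the names `sigmoid_perceptron` and `predict`, the bias column, and the hyperparameters are all assumptions. It keeps the classic update $w \leftarrow w + \eta (y - \hat{y}) x$ but computes $\hat{y}$ with the sigmoid instead of the threshold; with 0/1 labels this update is exactly a stochastic gradient step of logistic regression on the cross-entropy loss. The threshold survives only in `predict`, where the sigmoid output is cut at 0.5.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid, used here in place of the perceptron's threshold."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_perceptron(X, y, eta=0.1, epochs=50, seed=0):
    """Hypothetical perceptron with a sigmoid instead of a hard threshold.

    X: (n_samples, n_features) array; y: array of labels in {0, 1}.
    Returns the weight vector w, with the bias stored in w[0].
    """
    rng = np.random.default_rng(seed)
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])   # prepend a column of 1s for the bias
    w = rng.normal(scale=0.01, size=Xb.shape[1])    # small random starting weights
    for _ in range(epochs):
        for i in rng.permutation(len(Xb)):          # visit samples in a random order
            y_hat = sigmoid(Xb[i] @ w)              # soft output in (0, 1) instead of 0/1
            w += eta * (y[i] - y_hat) * Xb[i]       # perceptron-style update
    return w

def predict(X, w):
    """Threshold the sigmoid output at 0.5 to get class labels."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    return (sigmoid(Xb @ w) >= 0.5).astype(int)

# Toy, linearly separable example.
X = np.array([[-2.0], [-1.0], [1.0], [2.0]])
y = np.array([0, 0, 1, 1])
w = sigmoid_perceptron(X, y)
print(predict(X, w))   # [0 0 1 1]
```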

### Backpropagation 

Backpropagation is where we update our weights. As with our perceptron, we do this using a derivative: we compute how the loss changes with respect to each weight, then move each weight in the direction that decreases the loss. In the basic setup we use here, the loss function we want to minimize is the mean squared error (MSE). Considering our example network with one hidden layer, let's work through the details of how we would take this derivative for each of the weights in our network. 

**Pencil derivative time!**
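
One way the pencil work could come out, under assumptions of our own (the figure does not fix them): the hidden layer is a single node $h$ with input weights $w_1, \dots, w_n$, there is a single hidden-to-output weight $v$, both nodes use the sigmoid $\sigma$ during training (the threshold is applied only when making the final class prediction), there are no bias terms, and the per-sample loss is $L = \frac{1}{2}(y - \hat{y})^2$ (the factor $\frac{1}{2}$ only rescales the gradient).

\begin{equation}
z_h = \sum_{i=1}^n w_i x_i, \qquad h = \sigma(z_h), \qquad z_o = v\,h, \qquad \hat{y} = \sigma(z_o)
\end{equation}

Using $\sigma'(z) = \sigma(z)\,(1 - \sigma(z))$ and the chain rule:

\begin{equation}
\frac{\partial L}{\partial v} = \frac{\partial L}{\partial \hat{y}}\,\frac{\partial \hat{y}}{\partial z_o}\,\frac{\partial z_o}{\partial v} = (\hat{y} - y)\,\hat{y}(1 - \hat{y})\,h
\end{equation}

\begin{equation}
\frac{\partial L}{\partial w_i} = \frac{\partial L}{\partial \hat{y}}\,\frac{\partial \hat{y}}{\partial z_o}\,\frac{\partial z_o}{\partial h}\,\frac{\partial h}{\partial z_h}\,\frac{\partial z_h}{\partial w_i} = (\hat{y} - y)\,\hat{y}(1 - \hat{y})\,v\,h(1 - h)\,x_i
\end{equation}

Each weight is then updated by gradient descent, $w \leftarrow w - \eta\,\partial L / \partial w$, with learning rate $\eta$.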

Next, let's code the partial derivatives for our input weights and our hidden weight: 

In [None]:
# Implementation for partial derivatives for input weights

In [None]:
# Implementation for partial derivatives for hidden weight
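
A sketch of both partial derivatives for a hypothetical one-hidden-node network: input weights `w`, a single hidden-to-output weight `v`, sigmoid at both nodes, no bias, and per-sample loss $\frac{1}{2}(y - \hat{y})^2$. These setup choices and all function names are assumptions. The last lines compare each formula to a centered finite-difference estimate, a standard way to catch mistakes in hand-derived gradients.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, w, v):
    """Hypothetical network: h = sigmoid(w . x), y_hat = sigmoid(v * h)."""
    h = sigmoid(np.dot(w, x))
    y_hat = sigmoid(v * h)
    return h, y_hat

def loss(x, y, w, v):
    _, y_hat = forward(x, w, v)
    return 0.5 * (y - y_hat) ** 2

def grad_input_weights(x, y, w, v):
    """dL/dw_i = (y_hat - y) * y_hat * (1 - y_hat) * v * h * (1 - h) * x_i, for every i."""
    h, y_hat = forward(x, w, v)
    return (y_hat - y) * y_hat * (1 - y_hat) * v * h * (1 - h) * x

def grad_hidden_weight(x, y, w, v):
    """dL/dv = (y_hat - y) * y_hat * (1 - y_hat) * h."""
    h, y_hat = forward(x, w, v)
    return (y_hat - y) * y_hat * (1 - y_hat) * h

# Check the formulas against a centered finite-difference approximation.
x = np.array([0.5, -1.0, 2.0])
y = 1.0
w = np.array([0.1, 0.2, -0.3])
v = 0.7
eps = 1e-6

numeric_w = np.array([
    (loss(x, y, w + eps * e, v) - loss(x, y, w - eps * e, v)) / (2 * eps)
    for e in np.eye(len(w))
])
numeric_v = (loss(x, y, w, v + eps) - loss(x, y, w, v - eps)) / (2 * eps)

print(np.allclose(grad_input_weights(x, y, w, v), numeric_w))  # True
print(np.isclose(grad_hidden_weight(x, y, w, v), numeric_v))    # True
```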

### Putting it all together

Keeping the idea of stochastic gradient descent in mind, code `firstNN` that takes in:

0. A dataset
1. The maximum number of epochs

It should:

0. Determine the right number of nodes at the input layer from the input data
1. Set initial weights for the edges from the input layer to the hidden layer and from the hidden layer to the output layer
2. Use a threshold function at the output layer to assign the positive and negative classes
3. Update the weights after each training example using the derivative functions

This function should eventually output all of the weights and the predictions. 
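
One possible sketch of `firstNN`, under several assumptions the lab leaves open: the dataset is passed as a feature array `X` and a 0/1 label array `y` (if the labels sat in the last column of a NumPy array `data`, `X, y = data[:, :-1], data[:, -1]` would split them); the hidden layer size `n_hidden`, learning rate `eta`, and weight initialization are hypothetical choices; and, unlike the no-bias equations earlier, it includes bias terms. Training uses sigmoid activations with loss $\frac{1}{2}(y - \hat{y})^2$, and the threshold is applied only when producing the final predictions. With `n_hidden=1` and the biases removed, the gradients reduce to the single-hidden-node partial derivatives.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def firstNN(X, y, max_epochs, n_hidden=3, eta=0.5, seed=0):
    """Hypothetical sketch: one hidden layer trained with stochastic gradient descent.

    X: (n_samples, n_features) array; y: array of labels in {0, 1}.
    Returns (W1, v, b_out, predictions).
    """
    rng = np.random.default_rng(seed)
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # constant-1 column acts as the hidden-layer bias
    n_inputs = Xb.shape[1]                         # input layer size is read off the data

    # Initial weights: input -> hidden (matrix), hidden -> output (vector), output bias.
    W1 = rng.normal(scale=0.5, size=(n_inputs, n_hidden))
    v = rng.normal(scale=0.5, size=n_hidden)
    b_out = 0.0

    for _ in range(max_epochs):
        for i in rng.permutation(len(Xb)):         # stochastic: update after every sample
            x = Xb[i]
            h = sigmoid(x @ W1)                    # hidden activations
            y_hat = sigmoid(h @ v + b_out)         # soft output used during training
            # Shared chain-rule factor for the loss 0.5 * (y - y_hat)^2.
            delta_out = (y_hat - y[i]) * y_hat * (1 - y_hat)
            grad_v = delta_out * h                 # hidden -> output weights
            grad_b = delta_out                     # output bias
            grad_W1 = np.outer(x, delta_out * v * h * (1 - h))  # input -> hidden weights
            v -= eta * grad_v
            b_out -= eta * grad_b
            W1 -= eta * grad_W1

    # Threshold the output layer to assign the positive (1) and negative (0) classes.
    scores = sigmoid(sigmoid(Xb @ W1) @ v + b_out)
    predictions = (scores >= 0.5).astype(int)
    return W1, v, b_out, predictions

# Toy, linearly separable example.
X = np.array([[-2.0], [-1.0], [1.0], [2.0]])
y = np.array([0, 0, 1, 1])
W1, v, b_out, predictions = firstNN(X, y, max_epochs=1000)
print(predictions)
```

All three gradients are computed before any weight changes, so each update uses the same forward pass.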

In [None]:
# Add your firstNN here

### Final Thoughts

To finish up this lab, apply your `firstNN` implementation to our … Share your thoughts, along with your answer, in a post on the **#lab_submission** channel on Slack. Your post must start with **Lab20** to get credit.  

If you have questions about this lab, post them to **#lab_questions** with the same preamble (i.e., starting with **Lab20**). If you have the same question as someone else, please use one of the emojis to upvote it. If you would like to answer someone's question, please use the thread function; this ties your answer to their question. 


#### Resources consulted 

0. _Python Machine Learning_
1. [Images based on this site](http://www.texample.net/tikz/examples/neural-network/)
2. [How to build your own Neural Network from scratch in Python](https://towardsdatascience.com/how-to-build-your-own-neural-network-from-scratch-in-python-68998a08e4f6)
3. [Building Neural Network from scratch](https://towardsdatascience.com/building-neural-network-from-scratch-9c88535bf8e9)
4. [Sigmoid](https://stackoverflow.com/questions/3985619/how-to-calculate-a-logistic-sigmoid-function-in-python)
5. [Activation functions](https://ml-cheatsheet.readthedocs.io/en/latest/activation_functions.html#relu)