## Lab 9: Neural Networks

In this lab, we will explore how to construct a standard feed-forward neural network using only basic NumPy and Python functions.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline



##  Some good materials for understanding neural networks
[Neural Networks, Manifolds, and Topology - Colah's blog](http://colah.github.io/posts/2014-03-NN-Manifolds-Topology/)<br>
[How the backpropagation algorithm works - Neural Networks and Deep Learning](http://neuralnetworksanddeeplearning.com/chap2.html)

## The structure and mathematical forms of neural networks
![Neural network](http://colah.github.io/posts/2014-03-NN-Manifolds-Topology/img/example_network.svg)
<br><br><br>
## About activation functions
### tanh
$y=\tanh(x)$
<br>$y\in(-1,1)$
<br>$y'=1-y^2$

In [None]:
x=np.linspace(-5,5,2000)
y=np.tanh(x)
plt.plot(x,y)

### sigmoid
$y={\displaystyle \frac{1}{1+e^{-x}} }$
<br><br>$y\in(0,1)$
<br>$y'=y(1-y)$


In [None]:
x=np.linspace(-5,5,2000)
y=1/(1+np.exp(-x))
plt.plot(x,y)
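
The derivative identities quoted for tanh and sigmoid can be checked numerically. The sketch below compares each closed form against a central finite difference; the helper names `central_diff` and `sigmoid`, and the step size `h`, are assumptions made for this illustration only.

```python
import numpy as np

def central_diff(f, x, h=1e-5):
    # Symmetric finite-difference approximation of f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x_check = np.linspace(-5, 5, 11)

# Closed forms from the text: tanh' = 1 - y^2, sigmoid' = y(1 - y)
tanh_identity = 1 - np.tanh(x_check) ** 2
sigmoid_identity = sigmoid(x_check) * (1 - sigmoid(x_check))

# Largest disagreement between the closed form and the numerical derivative
tanh_gap = np.max(np.abs(central_diff(np.tanh, x_check) - tanh_identity))
sigmoid_gap = np.max(np.abs(central_diff(sigmoid, x_check) - sigmoid_identity))
print(tanh_gap, sigmoid_gap)
```

Both gaps should be tiny compared with the derivative values themselves, which is what one would expect if the identities hold.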

### ReLU
$y={\displaystyle \begin{cases}
x & x\geqslant0 \\
0 & x<0
\end{cases}}$
<br><br>$y\in[0,\infty)$
<br><br>$y'=\begin{cases}
1 & x\geqslant0 \\
0 & x<0
\end{cases}$


In [None]:
x=np.linspace(-5,5,2000)
y=x*(x>=0)
plt.plot(x,y)
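
For use inside a network, ReLU and its derivative can be wrapped as functions. This is a minimal sketch: the names `relu` and `relu_grad` are hypothetical, and the derivative at exactly `x = 0` is taken to be 1, following the piecewise formula above (the true derivative is undefined there).

```python
import numpy as np

def relu(x):
    # Element-wise max(x, 0)
    return np.maximum(x, 0.0)

def relu_grad(x):
    # 1 where x >= 0, 0 elsewhere (convention at x = 0 follows the formula above)
    return (np.asarray(x) >= 0).astype(float)

sample = np.array([-2.0, 0.0, 3.0])
relu_values = relu(sample)       # values 0, 0, 3
relu_slopes = relu_grad(sample)  # values 0, 1, 1
print(relu_values, relu_slopes)
```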

### softmax
$y_i=f_i(\vec{x})={\displaystyle \frac{e^{x_i}}{\sum_{j=1}^J e^{x_j}}}$
<br>$y_i\in(0,1)$, $\sum_{i=1}^J y_i=1$
<br><br>${\displaystyle \frac{\partial y_i}{\partial x_j}=y_i(\delta_{ij}-y_j)}$
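
The softmax formula and its Jacobian translate fairly directly into NumPy. The sketch below is one possible version, not part of the lab code: `softmax` and `softmax_jacobian` are hypothetical names, and subtracting the maximum before exponentiating is a common stability trick that leaves the result unchanged.

```python
import numpy as np

def softmax(x):
    # Shifting by the max avoids overflow in exp and does not change the output
    shifted = x - np.max(x)
    e = np.exp(shifted)
    return e / e.sum()

def softmax_jacobian(x):
    # J[i, j] = y_i * (delta_ij - y_j)
    y = softmax(x)
    return np.diag(y) - np.outer(y, y)

logits = np.array([1.0, 2.0, 3.0])
probs = softmax(logits)
jac = softmax_jacobian(logits)
print(probs, probs.sum())
print(jac)
```

Because the outputs always sum to 1, each column of the Jacobian sums to 0, which is a quick sanity check on any implementation.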

<br><br><br><br>
## Understanding back propagation
http://neuralnetworksanddeeplearning.com/chap2.html
$$\begin{eqnarray} C_{MSE}&=&\frac{1}{2n}\lVert a^L-y^{GT}\rVert^2 \\
\delta^L=\frac{\partial C}{\partial z^L}&=&\nabla_aC\odot\sigma'(z^L) \\
\delta^l=\frac{\partial C}{\partial z^l}&=&((w^{l+1})^T\delta^{l+1})\odot\sigma'(z^l) \\
\frac{\partial C}{\partial b^l}&=&\delta^l \\
\frac{\partial C}{\partial w^l}&=&\delta^l (a^{l-1})^T
\end{eqnarray}$$
<br>
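
To make the back propagation equations concrete, here is a small worked sketch on a two-layer tanh network with a single training example (so $n=1$). The layer sizes, random seed, and all variable names are assumptions chosen for illustration. The weight gradient is computed as the outer product of $\delta^l$ with $a^{l-1}$, so that it has the same shape as $w^l$, and one entry is compared against a finite-difference estimate.

```python
import numpy as np

rng = np.random.default_rng(0)
sizes = [3, 4, 2]  # hypothetical layer sizes
Ws = [rng.standard_normal((sizes[i + 1], sizes[i])) for i in range(2)]
bs = [rng.standard_normal(sizes[i + 1]) for i in range(2)]
x_in = rng.standard_normal(3)
y_gt = np.array([0.5, -0.5])

def forward(Ws, bs, x):
    # Returns pre-activations z^l and activations a^l (a^0 is the input)
    a_s, z_s = [x], []
    for W, b in zip(Ws, bs):
        z_s.append(W @ a_s[-1] + b)
        a_s.append(np.tanh(z_s[-1]))
    return z_s, a_s

def cost(Ws, bs):
    # C_MSE with n = 1
    return 0.5 * np.sum((forward(Ws, bs, x_in)[1][-1] - y_gt) ** 2)

def tanh_prime(z):
    return 1 - np.tanh(z) ** 2

z_s, a_s = forward(Ws, bs, x_in)
delta2 = (a_s[-1] - y_gt) * tanh_prime(z_s[1])    # output-layer error
delta1 = (Ws[1].T @ delta2) * tanh_prime(z_s[0])  # error propagated back one layer
grad_b = [delta1, delta2]
grad_W = [np.outer(delta1, a_s[0]), np.outer(delta2, a_s[1])]

# Finite-difference check of a single weight entry
eps = 1e-6
Ws_plus = [W.copy() for W in Ws]
Ws_minus = [W.copy() for W in Ws]
Ws_plus[0][1, 2] += eps
Ws_minus[0][1, 2] -= eps
numeric = (cost(Ws_plus, bs) - cost(Ws_minus, bs)) / (2 * eps)
backprop_gap = abs(numeric - grad_W[0][1, 2])
print(backprop_gap)
```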
## Question 1: Implementation of NN in a class

Finish the `init_weights` method below by filling in randomly sampled weights and biases using `np.random.random`. The weights of a given layer should be a 2D array and the biases a 1D array. What dimensions should the weight array have, given the architecture of the two layers it connects?
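
One way to reason about the shapes is to try a single layer in isolation. The sketch below assumes, as `feed_forward` does, that a layer computes `weights.dot(a) + biases`; the sizes `prev_n` and `cur_n` are hypothetical and the snippet is a shape check, not the required solution.

```python
import numpy as np

prev_n, cur_n = 6, 2  # hypothetical sizes of two adjacent layers

# np.random.random draws uniform samples from [0, 1)
W = np.random.random((cur_n, prev_n))  # one row per neuron in the current layer
b = np.random.random(cur_n)            # one bias per neuron in the current layer

a_prev = np.ones(prev_n)  # stand-in for the previous layer's activations
z = W.dot(a_prev) + b     # result has one entry per neuron in the current layer
print(W.shape, b.shape, z.shape)
```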

In [None]:
import numpy as np

class NN():
    def __init__(self,architecture,learning_rate=0.01,activation=lambda x:x):
        '''The architecture is a list, with each element specifying the number of nodes in each layer'''
        self.arch=architecture
        self.activation=activation
        self.lr=learning_rate
        self.ffcount=len(self.arch)-1
        self.initialized=False
        
    def init_weights(self):
        self.weights=[]
        self.biases=[]
        for n in range(self.ffcount):
            prev_layer_neurons=self.arch[n]
            current_layer_neurons=self.arch[n+1]
            self.weights.append(...)
            self.biases.append(...)
        self.initialized=True
        
    def feed_forward(self,x):
        if self.initialized:
            a_n=np.array(x)
            z_s=[]
            a_s=[a_n]
            for n in range(self.ffcount):
                z_n=self.weights[n].dot(a_n)+self.biases[n]
                a_n=self.activation(z_n)
                z_s.append(z_n)
                a_s.append(a_n)
            self.z_s=z_s
            self.a_s=a_s
            self.pred=a_n
            return a_n
        else:
            print("Please initialize the weights first!")
            
    def calc_error(self,y,activation_grad):
        y=np.array(y)
        errors=[(self.pred-y)*activation_grad(self.z_s[-1])]
        for n in range(self.ffcount-1):
            errors.append(self.weights[-n-1].T.dot(errors[-1])*activation_grad(self.z_s[-n-2]))
        errors.reverse()
        self.errors=errors
        return errors
            
    def calc_grad(self):
        weights_grad=[]
        for n in range(self.ffcount):
            weights_grad.append(self.errors[n].reshape(-1,1).dot(self.a_s[n].reshape(1,-1)))
        biases_grad=self.errors
        self.weights_grad=weights_grad
        self.bias_grad=biases_grad
        return weights_grad,biases_grad
    
    def back_prop(self):
        for n in range(self.ffcount):
            self.weights[n]=self.weights[n]-self.lr*self.weights_grad[n]
            self.biases[n]=self.biases[n]-self.lr*self.bias_grad[n]
            
    def fit(self,x,y,activation_grad):
        self.feed_forward(x)
        self.calc_error(y,activation_grad)
        self.calc_grad()
        self.back_prop()
        
    def predict(self,x):
        return self.feed_forward(x)

## Question 2: Activation Functions

Implement the tanh activation function and its derivative, as set up below, so that we can use them in our NN model.

In [None]:
def tanh(x):
    return 

def tanh_grad(x):
    return 



### Test Your Work
The cell below should run without errors.

In [None]:
nn=NN([6,2,2],activation=tanh)
nn.init_weights()
print("Initialized prediction:",nn.predict([-1,1,-1,-1,1,-1]))
nn.fit([-1,1,-1,-1,1,-1],[-1,-1],tanh_grad)
print("Error in nodes:",nn.errors)
print("Prediction after fitting once:",nn.predict([-1,1,-1,-1,1,-1]))

## Question 3: Regularization, L1 and L2
<br>
$$C=\frac{1}{2n}\lVert a^L-y^{GT}\rVert^2+\lambda\sum\lVert \theta_i\rVert ^p$$

Copy your working `NN()` class code from the previous question (apart from the `__init__` method) and use it as a starting point for the `NN_reg()` class set up below. Your goal is to modify the `calc_error()` method so that the error computed in that method includes either an L1 (Lasso) or an L2 (Ridge) penalty, depending on user input.

**Hint: You can use the .flatten() method on an array of weights in a given layer to get everything into a 1d array**
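
As a rough illustration of the penalty term $\lambda\sum\lVert\theta_i\rVert^p$, the sketch below computes the L1 ($p=1$) and L2 ($p=2$) penalties over a list of weight arrays, together with the corresponding per-weight gradients. The function names `reg_penalty` and `reg_grad`, and the `'L1'` string, are assumptions for this example; how the penalty is wired into `calc_error()` is left to the exercise.

```python
import numpy as np

def reg_penalty(weights, kind, reg_lambda):
    # weights: list of 2D arrays, one per layer
    if kind is None:
        return 0.0
    flat = np.concatenate([w.flatten() for w in weights])
    if kind == 'L1':
        return reg_lambda * np.sum(np.abs(flat))
    if kind == 'L2':
        return reg_lambda * np.sum(flat ** 2)
    raise ValueError("kind must be None, 'L1' or 'L2'")

def reg_grad(w, kind, reg_lambda):
    # Derivative of the penalty with respect to each entry of w
    if kind == 'L1':
        return reg_lambda * np.sign(w)
    if kind == 'L2':
        return 2 * reg_lambda * w
    return np.zeros_like(w)

demo_weights = [np.array([[1.0, -2.0]]), np.array([[3.0], [0.0]])]
l1_value = reg_penalty(demo_weights, 'L1', 0.1)  # 0.1 * (1 + 2 + 3 + 0)
l2_value = reg_penalty(demo_weights, 'L2', 0.1)  # 0.1 * (1 + 4 + 9 + 0)
print(l1_value, l2_value)
```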

In [None]:
class NN_reg():
    def __init__(self,architecture,regularization=None,learning_rate=0.01,activation=lambda x:x,reg_lambda=0.1):
        '''The architecture is a list, with each element specifying the number of nodes in each layer'''
        self.arch=architecture
        self.activation=activation
        self.lr=learning_rate
        self.ffcount=len(self.arch)-1
        self.regularization=regularization
        self.reg_lambda=reg_lambda
        self.initialized=False
...

## Question 4: Testing the Fitting Process

Using your modified `NN_reg()` code and the one-off training example above, create a new model and train it on the same single data point for 1000 epochs. Save the model predictions and relative errors at each epoch, then plot the results in a way that demonstrates the progressive improvement of the model fit.

In [None]:
nn=NN_reg([6,2,2],'L2',activation=tanh)
nn.init_weights()
print("Initialized prediction:",nn.predict([-1,1,-1,-1,1,-1]))

...
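
The bookkeeping for such a training run can be sketched independently of the `NN_reg()` class. The example below is a stand-in only: it trains a single tanh layer (no hidden layer, no regularization) on the same data point and records the prediction and relative error at every epoch. All names here are hypothetical, and the relative error is assumed to mean `|pred - y| / |y|`.

```python
import numpy as np

rng = np.random.default_rng(0)
x_one = np.array([-1, 1, -1, -1, 1, -1], dtype=float)
y_one = np.array([-1.0, -1.0])

# Single tanh layer used as a stand-in for the full model
W = rng.random((2, 6))
b = rng.random(2)
lr = 0.01

pred_history, err_history = [], []
for epoch in range(1000):
    pred = np.tanh(W @ x_one + b)
    # Record the state before this epoch's update
    pred_history.append(pred)
    err_history.append(np.abs((pred - y_one) / y_one))
    # Gradient step on the squared error for this one example
    delta = (pred - y_one) * (1 - pred ** 2)
    W -= lr * np.outer(delta, x_one)
    b -= lr * delta

pred_history = np.array(pred_history)  # shape (1000, 2)
err_history = np.array(err_history)    # shape (1000, 2)
print(err_history[0], err_history[-1])
```

Plotting each column of `pred_history` and `err_history` against the epoch index with `plt.plot` would then show how the fit evolves.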


## Some guidelines for designing a reasonable NN
* The first hidden layer should usually expand the dimension
* The following hidden layers should gradually contract (forming a funnel-shaped network)
* The change in dimension between adjacent layers should not be too abrupt
* Keep an eye on the total number of parameters: ideally fewer than 1/10 of the number of training examples
* Start with shallow networks; if both training error and test error are high, consider increasing network complexity
* ReLU activation is usually suggested for hidden layers (tanh is also worth trying)
* Choose the right type of activation for the output layer!
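
To apply the parameter-count guideline, it helps to count the weights and biases implied by an architecture list. The helper below is a small sketch with a hypothetical name, `count_parameters`, assuming fully connected layers with one bias per neuron as in the `NN` class.

```python
def count_parameters(architecture):
    # Each pair of adjacent layers contributes prev*cur weights and cur biases
    return sum(prev * cur + cur
               for prev, cur in zip(architecture[:-1], architecture[1:]))

n_params = count_parameters([6, 2, 2])  # (6*2 + 2) + (2*2 + 2) = 20
print(n_params)
```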
