# Tutorial 2: Neural Network Training
This tutorial builds an understanding of how neural networks learn the relationship between features `x` and a target `y`, with a focus on gradient descent. The exercises assume that students are familiar with the general form of neural networks, so network architecture is not part of them. Our [feedforward neural network demo](https://github.com/Humboldt-WI/delta/blob/master/demos/fnn/nn_foundations.ipynb) covers both the architecture of neural networks and the learning procedure. Here, we focus only on the latter.

For this exercise, we restrict the architecture of the neural network to the form
$f(x)=\beta\cdot \mathrm{sigmoid}(x)$. This corresponds to a very simple neural network with one hidden layer containing a single sigmoid unit (input weight fixed to one), a linear output function, and all biases (constants) forced to zero. Considering this simple network keeps the code short and should help you build a better intuition for how neural networks learn.

We will go through further exercises covering back-propagation and stochastic gradient descent during the session.

In [1]:
## required libraries
import numpy as np
import matplotlib.pyplot as plt

In [193]:
## define sigmoid function
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

## create data: 20 integer inputs from -10 to 9; targets are generated with beta = 2
x = np.arange(-10, 10)
y = 2 * sigmoid(x)

## Exercises
Our goal is to find the coefficient $\beta$ such that the function $f(x)=\beta\cdot \mathrm{sigmoid}(x)$ fits the data best (according to the mean squared error).
Your task is to implement gradient descent in order to find $\beta$. In detail, this means:


### Part A.
You need to calculate the derivative of the loss function $L(Y,f(X))=\frac{1}{n}\sum_{i}(y_{i}-f(x_{i}))^{2}$ with respect to $\beta$. For simplicity, we provide an implementation of this derivative, called `grad_beta`.

In [194]:
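def func_f(beta, x):
    # model prediction: beta * sigmoid(x)
    return beta * sigmoid(x)

def grad_beta(beta, y, x):
    # derivative of the mean squared error with respect to beta
    return np.mean(-2 * (y - func_f(beta, x)) * sigmoid(x))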
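
The expression in `grad_beta` follows from applying the chain rule to the loss defined above:

$$\frac{\partial L}{\partial \beta} = \frac{1}{n}\sum_{i} 2\,(y_{i}-\beta\cdot \mathrm{sigmoid}(x_{i}))\cdot(-\mathrm{sigmoid}(x_{i})) = -\frac{2}{n}\sum_{i}(y_{i}-f(x_{i}))\cdot \mathrm{sigmoid}(x_{i})$$

As a sanity check, the sketch below compares this analytic gradient with a central finite-difference approximation. The helpers `mse_loss` and `numeric_grad` and the step size `eps` are illustrative assumptions and not part of the tutorial code. The data and model definitions are repeated so that the block runs on its own.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def func_f(beta, x):
    return beta * sigmoid(x)

def grad_beta(beta, y, x):
    # analytic derivative of the mean squared error with respect to beta
    return np.mean(-2 * (y - func_f(beta, x)) * sigmoid(x))

def mse_loss(beta, y, x):
    # hypothetical helper: mean squared error of the model for a given beta
    return np.mean((y - func_f(beta, x)) ** 2)

def numeric_grad(beta, y, x, eps=1e-6):
    # hypothetical helper: central finite-difference approximation of dL/dbeta
    return (mse_loss(beta + eps, y, x) - mse_loss(beta - eps, y, x)) / (2 * eps)

x = np.arange(-10, 10)
y = 2 * sigmoid(x)

for beta in (0.0, 1.0, 2.0, 3.5):
    print(beta, grad_beta(beta, y, x), numeric_grad(beta, y, x))
```

The two gradient columns should agree closely for every `beta`, and the analytic gradient vanishes at `beta = 2`, the value used to generate `y`.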

### Part B. 
Implement a function `grad_desc(beta_ini, lrate, n_epochs)` that takes an initial value for $\beta$, the learning rate, and the number of iterations (called epochs) as parameters. The function should find the $\beta$ that minimizes the loss.
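
One possible sketch of such a function is shown here. It assumes the plain update rule `beta = beta - lrate * gradient`, applied once per epoch to the full data set, and reads `x` and `y` from the enclosing scope, as the three-argument signature suggests. Returning the history of intermediate values alongside the final `beta` is an assumption made for easier inspection; the exercise itself only asks for the final value. The data and helper functions are repeated so that the block runs on its own.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def func_f(beta, x):
    return beta * sigmoid(x)

def grad_beta(beta, y, x):
    return np.mean(-2 * (y - func_f(beta, x)) * sigmoid(x))

x = np.arange(-10, 10)
y = 2 * sigmoid(x)

def grad_desc(beta_ini, lrate, n_epochs):
    # start from the initial guess and keep track of every intermediate value
    beta = beta_ini
    history = [beta]
    for _ in range(n_epochs):
        # move against the gradient, scaled by the learning rate
        beta = beta - lrate * grad_beta(beta, y, x)
        history.append(beta)
    return beta, history

beta_hat, betas = grad_desc(beta_ini=0, lrate=1.0, n_epochs=20)
print(beta_hat)
```

With this learning rate, the estimate should end up very close to 2, the coefficient used to generate the data.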

### Part C. 
Apply your function with `beta_ini=0`, `n_epochs=20`, and several learning rates of your choice. Which learning rate works best? What happens for particularly high or low learning rates?
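
A minimal sketch of such a comparison follows. The learning rates in the loop are arbitrary choices for illustration, and `grad_desc` is the hypothetical implementation with the plain update rule, not a reference solution. Because the loss is quadratic in $\beta$ for this data, each update multiplies the error `beta - 2` by the factor `1 - 2 * lrate * m`, where `m` is the mean of `sigmoid(x)**2`. The sketch prints this factor next to the final estimate.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def grad_beta(beta, y, x):
    return np.mean(-2 * (y - beta * sigmoid(x)) * sigmoid(x))

x = np.arange(-10, 10)
y = 2 * sigmoid(x)

def grad_desc(beta_ini, lrate, n_epochs):
    # hypothetical implementation: plain gradient descent on the full data set
    beta = beta_ini
    history = [beta]
    for _ in range(n_epochs):
        beta = beta - lrate * grad_beta(beta, y, x)
        history.append(beta)
    return beta, history

# mean of sigmoid(x)^2; it determines how strongly each step shrinks the error
m = np.mean(sigmoid(x) ** 2)

results = {}
for lrate in (0.01, 0.1, 1.0, 2.0, 3.0):
    beta, _ = grad_desc(0, lrate, 20)
    factor = 1 - 2 * lrate * m  # per-epoch multiplier of the error (beta - 2)
    results[lrate] = beta
    print(f"lrate={lrate:<5} error factor={factor:+.3f} final beta={beta:.4f}")
```

Under these assumptions, `m` is roughly 0.425. Very small learning rates barely move `beta` within 20 epochs, rates around `1 / (2 * m)` (roughly 1.18) converge almost immediately, and rates above `1 / m` (roughly 2.35) make the error grow in magnitude with every epoch, so the estimate diverges.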