# Regression with Neural Networks

Artificial neural networks are used to solve a wide variety of problems. In this chapter we focus on regression, since it is the easiest problem to begin with. We explore the foundations of neural networks by using them to solve example regression tasks. First, the single artificial neuron is introduced. Next, activation functions are introduced through interactive tasks. Finally, backpropagation is explained at the end of the chapter.


## The Artificial Neuron in Theory


An artificial neuron is a mathematical function that is inspired by the information processing of a biological neural cell. Each neuron $k$ accepts one or multiple values as inputs (X) and outputs one value (Y). It thereby performs a very simple mathematical operation:

- **Weights** $w_{ki}$ multiply the input values
- A **summation $v_{k}$** of the weighted inputs is calculated
- A **constant value** $b_k$ is added to the sum (the so-called **bias**)
- An **activation function $\phi$** is applied to the result

The resulting output of the activation function is the output (also called "activation") $y$ of the neuron. 
Even though this mechanism is very simple, a multitude of simple neurons is able to solve very complex problems.

An artificial neuron with $N$ inputs can be described as follows (1):
\begin{align}
y_k & = \phi\left(v_k + b_k\right) = \phi\left(\sum_{i=1}^N {w_{ki} \cdot x_i} + b_k\right)       \;\;\;\;\;\;\;\;\;\;\;        (1)
\end{align}


<img src="images/neural_network.png" />
<p style="text-align: center;">
    Fig. 1 - General artificial neuron
</p>


**Note:** An artificial neuron is just an abstract concept. Even though we will create _neuron objects_ in this Jupyter Notebook for code reusability, a neuron is not bound to any specific shape. As long as we have a mathematical function that can be expressed in the form of (1), we can view it as a "neuron".
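As a minimal sketch of this idea, the neuron can be written as a single Python function: a weighted sum of the inputs, plus a bias, passed through an activation function. The function name `neuron_output`, the example numbers, and the choice of `np.tanh` as a sample nonlinearity are assumptions made for illustration only.

```python
import numpy as np


def neuron_output(x, w, b, phi=lambda v: v):
    """Weighted sum of the inputs plus bias, passed through the activation phi.

    phi defaults to the identity, i.e. a neuron without activation function.
    """
    v = np.dot(w, x)      # summation of the weighted inputs
    return phi(v + b)     # add the bias, then apply the activation


x = np.array([1.0, 2.0, 3.0])     # example inputs (made up)
w = np.array([0.5, -1.0, 0.25])   # example weights (made up)
b = 0.5                           # example bias (made up)

print(neuron_output(x, w, b))               # no activation function
print(neuron_output(x, w, b, phi=np.tanh))  # tanh as an example nonlinearity
```

Any function of this form counts as a neuron, whether or not it is wrapped in an object.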

## A Simple Neuron in Practice
For the sake of explanation, we will now examine a neuron with only one single input and without any activation function. This neuron is already able to model functions with the form (2). The neuron can be visualized as seen in Figure 2.

\begin{align}
y & = w \cdot x + b \;\;\;\;\;\;\;\;\;\;\;        (2)
\end{align}



<img src="images/single_neuron_no_activation.png" />
<p style="text-align: center;">
    Fig. 2 - Simple artificial neuron
</p>




Our neuron class will have just one input, one weight and one bias.
- Upon initialization, it is connected to an interactive plot.
- Its weight and bias can be changed using the set_values method.
- Its weight and bias can be read using the get_weight and get_bias methods.
- When the neuron is changed, it notifies the interactive plot to redraw its output.
- It has a compute method that computes the activation based on the weight, the bias and the input.

Run the cell below to define a neuron class.

In [1]:
# do not change
class SimpleNeuron:
    def __init__(self, plot):
        self.plot = plot #I am assigned the following plot
        self.plot.register_neuron(self) #hey plot, remember me
        
    def set_values(self, weight, bias):
        self.weight = weight
        self.bias = bias
        self.plot.update() #hey plot, I have changed, redraw my output
        
    def get_weight(self):
        return self.weight
    
    def get_bias(self):
        return self.bias

    def compute(self, x):
        self.activation = self.weight * x + self.bias
        return self.activation

##  The Problem of Regression
In regression analysis, we look for a model function that matches a given set of $N$ data points as **accurately** as possible. A commonly used measure for the accuracy of the approximation is the **least squares approach**. The vertical distance between each data point $(x_n|y_n)$ and the model function is obtained by subtracting the y-value $y_n$ of the data point from the predicted y-value $m(x_n)$ of the model function (3).

\begin{align}
d_n & = m(x_n)-y_n \;\;\;\;\;\;\;\;\;\;\;        (3)
\end{align}

The distances are then squared and summed up. Since we want to compare the quality of an approximation with other approximations that use a different number of data points, we also divide the sum by the total number of data points (_Mean Squared Error_). This will be our **Loss** function (4). Our goal is to keep this metric as low as possible: the lower the loss, the better the approximation.

\begin{align}
Loss & = \frac{1}{N} \sum_{n=1}^N {[m(x_n)-y_n]}^2 \;\;\;\;\;\;\;\;\;\;\;        (4)
\end{align}

If we have achieved an accurate regression, we can make **predictions** with it. We will train our neurons to match a given set of points and then use them to predict new points. That is, we will give the neuron new x-values and it will predict y-values.
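A small worked example may help here. The sketch below computes the mean squared error of a straight line $y = w \cdot x + b$ for three example points and then uses the line to predict a new point. The helper name `mse`, the candidate parameters, and the new x-value are assumptions chosen for illustration.

```python
def mse(weight, bias, xs, ys):
    """Mean squared error of the line y = weight * x + bias on the given points."""
    total = 0.0
    for x, y in zip(xs, ys):
        distance = (weight * x + bias) - y   # vertical distance, as in (3)
        total += distance ** 2               # square and accumulate
    return total / len(xs)                   # divide by the number of points


xs = [1, 2, 3]          # example x-values
ys = [1.5, 0.7, 1.2]    # example y-values

print(round(mse(0.0, 0.0, xs, ys), 2))    # flat line y = 0: loss 1.39
print(round(mse(-0.2, 1.6, xs, ys), 2))   # a much better fit: loss 0.1

# Once the fit is acceptable, the same line predicts y for a new x-value.
new_x = 4
print(-0.2 * new_x + 1.6)
```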


<img src="images/least_squares_explanation.png" />
<p style="text-align: center;">
    Fig. 3 - Distance to model function visualized
</p>





We will create a function "loss" that performs the operation (4). It receives a neuron object and a set of points as arguments.
- For each point that we give it, it first separates the x- and y-values.
- It hands the neuron an x-value and asks the neuron to compute a prediction for the y-value (see $m(x_n)$).
- Then it computes the distance between the real y-value and the predicted y-value, as in operation (3). The sign of the distance does not matter, because it is squared in the next step.
- It then squares the distance and accumulates the squared distances.
- In the last step, it divides the sum of squared distances by the number of compared points.

Run the cell below to define a loss function.

In [2]:
# do not change
def loss(neuron, points):
    sum_squared_dist = 0

    for point_x, point_y in zip(points["x"], points["y"]):  # zip merges both points["x"] and points["y"]

        predicted_point_y = neuron.compute(point_x)
        dist = point_y - predicted_point_y
        squared_dist = dist ** 2
        sum_squared_dist += squared_dist

    loss = sum_squared_dist / len(points["y"])
    return loss


### Preparing an Interactive Plot

After importing the necessary libraries, we will set up an interactive plot class. It plots the output of a neuron by asking it to compute a set of x-values, which results in a set of predicted y-values that can be drawn on a plane. If the weight or bias of a neuron is changed, the neuron calls the "update" method of its plot, which prints the current loss and redraws the output. The plot can also plot fixed points. Interactive sliders will be used to directly modify the weights and biases of neuron objects.

**Note:** The plot classes are not part of the subject matter for this lab.

Run the cells below to import libraries and define an interactive plot.

In [3]:
# do not change
import numpy as np
import plotly.offline as plotly
import plotly.graph_objs as go
from ipywidgets import interact, Layout, HBox, FloatSlider
import time
import threading

In [4]:
# do not change
# an Interactive Plot monitors the activation of a neuron or a neural network
class Interactive2DPlot:
    def __init__(self, points, ranges, width=800, height=400, margin=dict(t=0, l=170), draw_time=0.05):
        self.idle = True
        self.points = points
        self.x = np.arange(ranges["x"][0], ranges["x"][1], 0.1)
        self.y = np.arange(ranges["y"][0], ranges["y"][1], 0.1)
        self.draw_time = draw_time
        self.layout = go.Layout(
            xaxis=dict(title="x", range=ranges["x"], fixedrange=True),
            yaxis=dict(title="y", range=ranges["y"], fixedrange=True),
            width=width,
            height=height,
            showlegend=False,
            autosize=False,
            margin=margin,
        )
        self.trace = go.Scatter(x=self.x, y=self.y)
        self.plot_points = go.Scatter(x=points["x"], y=points["y"], mode="markers")
        self.data = [self.trace, self.plot_points]
        self.plot = go.FigureWidget(self.data, self.layout)
        # self.plot = plotly.iplot(self.data, self.layout,config={"displayModeBar": False})

    def register_neuron(self, neuron):
        self.neuron = neuron

    def redraw(self):
        self.idle = False
        time.sleep(self.draw_time)
        self.plot.data[0].y = self.neuron.compute(self.x)
        self.idle = True

    def update(self):
        print("Loss: {:0.2f}".format(loss(self.neuron, self.points)))
        if self.idle:
            thread = threading.Thread(target=self.redraw)
            thread.start()

### Task: Train Neuron
You are given a set of 3 points and one neuron to do a curve fit. Run the cell below.

**Change the weight and bias of the neuron using the sliders to minimize the loss.**

Hint: you can also change the sliders with the arrow keys on your keyboard after clicking on the slider



In [5]:
# do not change
points_lr = dict(x=[1, 2, 3], y=[1.5, 0.7, 1.2])
ranges_lr = dict(x=[-4, 4], y=[-4, 4])

linreg_plot = Interactive2DPlot(points_lr, ranges_lr)
simple_neuron = SimpleNeuron(linreg_plot)

slider_layout = Layout(width="90%")


interact(
    simple_neuron.set_values, 
    weight=FloatSlider(min=-3, max=3, step=0.1, value = 0, layout=slider_layout),
    bias=FloatSlider(min=-3, max=3, step=0.1, value = 0, layout=slider_layout)
)

linreg_plot.plot



### Task: Questions

**Question:** What is the optimal weight and bias combination?

**Answer:** The optimal combination is weight = -0.2 and bias = 1.60, resulting in a loss of 0.10.
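The slider result can be cross-checked against the closed-form least squares fit. The sketch below uses `np.polyfit` with degree 1, which returns the slope and intercept of the best-fitting line; the variable names are chosen here for illustration. The exact optimum lies between slider steps, which is why the slider answer is only approximately optimal.

```python
import numpy as np

xs = np.array([1.0, 2.0, 3.0])
ys = np.array([1.5, 0.7, 1.2])

# Degree-1 polynomial fit: coefficients are returned highest power first,
# so the first value is the slope (weight), the second the intercept (bias).
weight, bias = np.polyfit(xs, ys, deg=1)

residuals = weight * xs + bias - ys
best_loss = np.mean(residuals ** 2)

print(weight, bias)   # roughly -0.15 and 1.43
print(best_loss)      # roughly 0.094
```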

### Preparing a 3D-Plot
We can see that searching for the lowest loss is a **parameter optimization problem**. For now, the problem can be solved manually, but if we want to use neural networks to solve more complex problems, we have to find a way to automate this process.

The loss depends on both the weight and the bias. This relationship can be visualized three-dimensionally, which gives us further insight for constructing an algorithm that solves the optimization problem.
In this 3D view, a logarithmic scale is used for the loss to emphasize the topography. We will define a new function to compute the logarithmic loss for a set of points.

The plot will be defined as follows:
- the **Y axis** represents the bias
- the **X axis** represents the weights 
- the **Z axis** (height) represents the corresponding loss value at a given weight/bias configuration. For illustration purposes, the logarithm of the MSE Loss is displayed.
- the **black ball** represents the current weight/bias configuration. Its height represents the loss of that configuration
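
The surface described above can be sketched without any plotting code: evaluate the loss for every weight/bias combination on a grid and take the logarithm. The grid resolution, the use of `np.meshgrid`, and all variable names below are assumptions for illustration; the plot class in this notebook obtains the same kind of matrix through array broadcasting inside the neuron.

```python
import numpy as np

xs = np.array([1.0, 2.0, 3.0])
ys = np.array([1.5, 0.7, 1.2])

weights = np.linspace(-2.5, 2.5, 51)   # X axis, grid step 0.1
biases = np.linspace(-2.5, 2.5, 51)    # Y axis, grid step 0.1

# W and B have shape (51, 51): rows follow the biases, columns the weights.
W, B = np.meshgrid(weights, biases)

# Predictions for every grid cell and every data point: shape (51, 51, 3).
preds = W[..., np.newaxis] * xs + B[..., np.newaxis]
loss_grid = np.mean((preds - ys) ** 2, axis=-1)   # MSE per weight/bias pair
log_loss_grid = np.log10(loss_grid)               # Z axis of the 3D plot

# The lowest point of the surface is the best grid configuration.
row, col = np.unravel_index(np.argmin(loss_grid), loss_grid.shape)
print("weight:", weights[col], "bias:", biases[row], "loss:", loss_grid[row, col])
```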

Run the cells below to define a 3D plot.

In [6]:
# do not change
def log_mse(neuron, points):
    least_squares_loss = loss(neuron, points)
    return np.log10(least_squares_loss)

In [7]:
# do not change
class Interactive3DPlot:
    def __init__(self, points, ranges, width=600, height=600, draw_time=0.1):
        self.idle = True
        self.points = points
        self.draw_time = draw_time
        self.threading = threading

        self.range_weights = np.arange(  # Array with all possible weight values in the given range
            ranges["x"][0], ranges["x"][1], 0.1
        )
        self.range_biases = np.arange(  # Array with all possible bias values in the given range
            ranges["y"][0], ranges["y"][1], 0.1
        )
        self.range_biases_t = self.range_biases[:, np.newaxis]  # Bias array transposed
        self.range_losses = []  # initialize z axis for 3D surface

        self.ball = go.Scatter3d(  # initialize ball
            x=[], y=[], z=[], hoverinfo="none", mode="markers", marker=dict(size=12, color="black")
        )

        self.layout = go.Layout(
            width=width,
            height=height,
            showlegend=False,
            autosize=False,
            margin=dict(t=0, l=0),
            scene=dict(
                xaxis=dict(title="weight", range=ranges["x"], autorange=False, showticklabels=True),
                yaxis=dict(title="bias", range=ranges["y"], autorange=False, showticklabels=True),
                zaxis=dict(title="log(MSE)", range=ranges["z"], autorange=True, showticklabels=False),
            ),
        )

        self.data = [
            go.Surface(
                z=self.range_losses,
                x=self.range_weights,
                y=self.range_biases,
                colorscale="Viridis",
                opacity=0.9,
                showscale=False,
                hoverinfo="none",
            ),
            self.ball,
        ]

        self.plot = go.FigureWidget(self.data, self.layout)

    def register_neuron(self, neuron):
        self.neuron = neuron
        self.calc_surface()

    def calc_surface(self):  # height of 3d surface represents loss of weight/bias combination
        self.neuron.weight = (  #instead of 1 weight and 1 bias, let Neuron have an array of all weights and biases
            self.range_weights
        )
        self.neuron.bias = self.range_biases_t
        self.range_losses = log_mse(  # result: matrix of losses of all weight/bias combinations in the given range
            self.neuron, self.points
        )
        self.plot.data[0].z = self.range_losses

    def update(self):
        if self.idle:
            thread = threading.Thread(target=self.redraw)
            thread.start()

    def redraw(self):  # when updating, only the ball is redrawn
        self.idle = False
        time.sleep(self.draw_time)
        self.ball.x = [self.neuron.weight]
        self.ball.y = [self.neuron.bias]
        self.ball.z = [log_mse(self.neuron, self.points)]
        self.plot.data[1].x = self.ball.x
        self.plot.data[1].y = self.ball.y
        self.plot.data[1].z = self.ball.z
        self.idle = True


In [8]:
# do not change
class DualPlot:
    def __init__(self, points, ranges_3d, ranges_2d):
        self.plot_3d = Interactive3DPlot(points, ranges_3d)
        self.plot_2d = Interactive2DPlot(points, ranges_2d, width=400, height=500, margin=dict(t=200, l=30))

    def register_neuron(self, neuron):
        self.plot_3d.register_neuron(neuron)
        self.plot_2d.register_neuron(neuron)

    def update(self):
        self.plot_3d.update()
        self.plot_2d.update()

### Task: Train Neuron
You are given the same set of 3 points and again one neuron to do a curve fit. Run the cell below.

- **Change the weight and bias of the neuron using the sliders to minimize the loss**
- **Observe all changes**

**Note**: you can turn the 3D-Plot by clicking on it and moving your cursor, but you have to stay inside the widget with your cursor.

In [9]:
# do not change
ranges_3d = dict(x=[-2.5, 2.5], y=[-2.5, 2.5], z=[-1, 2.5])  # set up ranges for the 3d plot
plot_task2 = DualPlot(points_lr, ranges_3d, ranges_lr)  # create a DualPlot object to manage plotting on two plots
neuron_task2 = SimpleNeuron(plot_task2)  # create a new neuron for this task

interact(
    neuron_task2.set_values,
    weight=FloatSlider(min=-2, max=2, step=0.2, layout=slider_layout),
    bias=FloatSlider(min=-2, max=2, step=0.2, layout=slider_layout),
)

HBox((plot_task2.plot_3d.plot, plot_task2.plot_2d.plot))


### Task: Questions

**Question:** What does the optimal weight and bias combination correspond to in the 3D plot?

**Answer:** Since the z axis represents the loss which we try to minimize, the optimal weight and bias combination corresponds to the lowest point in the direction of z in the plot.

**Question:** What is the steepness of the valley at the point of optimal weight and bias combination?

**Answer:** The optimal combination is the minimum in the plot, therefore the gradient is zero at that point.

***
## Activation Functions
We can only go so far in approximating functions with a neuron that only has weights and biases, since all we can do is linear regression. Activation functions expand our capabilities by introducing an additional nonlinearity into the neuron. With it, we can model more complex functions. The most commonly used activation function nowadays is **ReLU**. It just outputs the input value, as long as it's greater than 0. If it's lower than zero, it outputs 0. We can conveniently describe this function by taking the maximum of the input value and of 0. The greater value of both will be chosen as the output (5).

\begin{align}
f_{relu}(x) & = \max(0,x)  \;\;\;\;\;\;\;\;\;\;\;    (5)
\end{align}




Run the cell below to define a ReLU function.

In [10]:
# do not change
def relu(input_val):
    return np.where(input_val > 0, input_val, 0.0)

We can draw a neuron with a ReLU activation function as follows:
<img src="images/single_neuron_relu.png" />
<p style="text-align: center;">
    Fig. 5 - Neuron with ReLU activation function visualized
</p>


Let's create a new class to implement this neuron in Python. We will inherit all properties of a neuron from SimpleNeuron.
We only change the output by first feeding it through our ReLU function:

### Task: Implement Activation Function
Complete the code below by adding a relu function to the neuron, like in Figure 5.

In [11]:
class ReluNeuron(SimpleNeuron): #inherit from SimpleNeuron class
    
    def compute(self, inputs):
        # STUDENT CODE HERE
        
        self.activation = relu(self.weight * inputs + self.bias)

        # STUDENT CODE until HERE
        return self.activation

***
### Task: Nonlinear Climate Control

You find yourself as an engineer at the company "ClimaTronics". Your company wants to implement AI technology to regulate their new air conditioning system "Perfect Climate 9000". Even though the problem can be solved easily with conventional programming, the management department wants you to implement AI to attract investors. You have to fulfill the following requirements that are visualized in the datasheet excerpt:


`The climate control shall remain off for temperatures under 25°C. At a temperature of 30°C, it shall reach 10% of its cooling power. Between 30°C and 40°C, the cooling power shall rise quadratically with the temperature. Cooling power shall reach its maximum at 40°C.`
<img src="images/datasheet.png" />



Run the cell below to display an interactive plot.

In [12]:
# do not change
points_climate = dict(x=[25.0, 27.5, 30.0, 32.5, 35, 37.5, 40.0], y=[0.0, 2.0, 10.0, 23.7, 43, 68.7, 100.0])

ranges_climate = dict(x=[-4, 45], y=[-4, 105])
climate_plot = Interactive2DPlot(points_climate, ranges_climate)
our_relu_neuron = ReluNeuron(climate_plot)

interact(
    our_relu_neuron.set_values,
    weight=FloatSlider(min=-10, max=10, step=0.1, value=0, layout=slider_layout),
    bias=FloatSlider(min=-300.0, max=200.0, step=1, value=0, layout=slider_layout),
)

climate_plot.plot


### Task: Questions


**Question:** When setting the bias to 0.00, how does changing the weight affect the output function?

**Answer:** With a bias of 0.00, the weight controls the slope of the output function. For positive weights, the output is 0 for x<0 and rises with a slope equal to the weight for x>0. For negative weights, the output is 0 for x>0 and rises toward negative x instead.

**Question:** How does changing the bias affect the output function?

**Answer:** The bias shifts the temperature at which the output starts to rise. For a positive weight, the climate control starts at the temperature x = -bias / weight.

**Question:** When setting the weight to 1.00 and the bias to -10, at what temperature does the climate control start?

**Answer:** It starts at 10°C.

**Question:** When setting the weight to 1.00 and the bias to -20, at what temperature does the climate control start?

**Answer:** It starts at 20°C.

**Question:** When setting the weight to 2.00 and the bias to -20, at what temperature does the climate control start?

**Answer:** It starts at 10°C.
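
The three answers above follow one rule: the ReLU neuron starts to respond where its weighted input crosses zero, that is, where weight * x + bias = 0. A minimal sketch, with the hypothetical helper name `relu_onset`:

```python
def relu_onset(weight, bias):
    """x-position of the kink of relu(weight * x + bias).

    For a positive weight, the output is 0 below this x-value and rises above it.
    """
    if weight == 0:
        raise ValueError("a weight of 0 gives a constant output without a kink")
    return -bias / weight


print(relu_onset(1.0, -10.0))   # 10.0
print(relu_onset(1.0, -20.0))   # 20.0
print(relu_onset(2.0, -20.0))   # 10.0
```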

**Question:** What's the best weight/bias configuration that you could find?

**Answer:** With the bias restricted to values of -200 or above, the best combination is weight = 7.1, bias = -200, resulting in a loss of 50.71.

With a lower bias (the slider above reaches down to -300), you can get a better result of 18.24 with the combination weight = 9.0, bias = -266.
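
These loss values can be reproduced without the interactive plot. The sketch below re-implements the ReLU neuron and the mean squared error in a few lines; the helper name `relu_neuron_loss` is an assumption for illustration, while the data points are the ones from the datasheet task.

```python
import numpy as np

# Temperature (x) and cooling power (y) from the climate control task.
xs = np.array([25.0, 27.5, 30.0, 32.5, 35.0, 37.5, 40.0])
ys = np.array([0.0, 2.0, 10.0, 23.7, 43.0, 68.7, 100.0])


def relu_neuron_loss(weight, bias):
    """Mean squared error of a single ReLU neuron on the climate data."""
    preds = np.maximum(0.0, weight * xs + bias)   # ReLU activation
    return np.mean((preds - ys) ** 2)


print(round(relu_neuron_loss(7.1, -200.0), 2))   # 50.71
print(round(relu_neuron_loss(9.0, -266.0), 2))   # 18.24
```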

### Conclusion
Using just one neuron, we can easily understand and retrace the influence of weight and bias.
But our one-neuron-approximation is not enough to closely approximate the needed quadratic relationship.

***
##  Neural Networks

The approximation can be improved by using multiple neurons. Instead of just one neuron for our approximation, we construct a neural network. We will use two ReLU neurons and one output neuron that has weights as well. This lets us choose freely how strongly each of the two ReLU neurons in the middle contributes to the result.

### Hidden Layers
In this neural network, the two neurons in the middle form a **hidden layer**. It is called hidden because its values are neither inputs nor outputs of the network, so its calculations no longer have a concrete, directly observable meaning.

In the last task, the weight and bias had an easily traceable influence on the output.
But by adding more neurons, the relationship between each weight or bias and the output becomes hard to trace.
We obtain the weights and biases by simply adapting them until the result turns out to be correct. In this process, we quickly lose track of what exactly we are calculating. It becomes very hard to untangle a neuron and describe its responsibility in the system.

The input value is multiplied by the first weights and after adding biases and running it through the activation function the values are multiplied again by the second weights. Hidden layers can be stacked multiple times after one another. This gives room for multiple calculation steps, allowing more complex functions.

Neural networks using at least one hidden layer have an interesting property: They can approximate any continuous function to any desired accuracy on a bounded input range, provided the hidden layer has enough neurons. _(See "Further Reading")_

<img src="images/hidden_layer.png" />



We will create a class for neural networks. The network will have four weights and two biases.

**Note:** For the sake of simplicity and code reusability, we will treat neural networks the same way we treat individual neurons in the past examples. Remember that an artificial neuron is only a mathematical function? A whole neural network can be also fully described by just one single function, as is done here when calculating the activation. The neurons don't have to take the concrete shape of individual data objects.

Run the cell below to define a neural network.

In [13]:
# do not change
class NeuralNetwork:
    def __init__(self, plot):
        self.plot = plot #I am assigned the following plot
        self.plot.register_neuron(self) #hey plot, remember me
        
    def set_config(self, w_i1, w_o1, b1, w_i2, w_o2, b2):
        self.w_i1 = w_i1
        self.w_o1 = w_o1
        self.b1 = b1
        self.w_i2 = w_i2
        self.w_o2 = w_o2
        self.b2 = b2
        self.show_config()
        self.plot.update()  # please redraw my output

    def show_config(self):
        print("w_i1:", self.w_i1, "\t| ", "w_o1:", self.w_o1,"\n")
        print("b1:", self.b1, "\t| ", "w_i2:", self.w_i2,"\n")
        print("w_o2:", self.w_o2, "\t| ", "b2:", self.b2,"\n")

    def compute(self, x):
        self.prediction = (relu(self.w_i1 * x + self.b1) * self.w_o1
                         + relu(self.w_i2 * x + self.b2) * self.w_o2)
        return self.prediction

***
###  Task: Nonlinear Climate Control with Neural Network

Run the cell below and adapt the weights and biases to reach a better approximation of the desired curve than in the previous task.

In [14]:
# do not change
climate_plot_adv = Interactive2DPlot(points_climate, ranges_climate)
our_neural_net = NeuralNetwork(climate_plot_adv)

interact(
    our_neural_net.set_config,
    w_i1=FloatSlider(min=-10, max=10, step=0.1, layout=slider_layout),
    w_o1=FloatSlider(min=-10, max=10, step=0.1,  layout=slider_layout),
    b1=FloatSlider(min=-200.0, max=200.0, step=1,  layout=slider_layout),
    w_i2=FloatSlider(min=-10, max=10, step=0.1, layout=slider_layout),
    w_o2=FloatSlider(min=-10, max=10, step=0.1,  layout=slider_layout),
    b2=FloatSlider(min=-200.0, max=200.0, step=1,layout=slider_layout),
)
climate_plot_adv.plot


### Task: Question

**Question:** What is the best configuration you could find? (Copy from above the plot)

**Answer:** 

w_i1: 1.1 	|  w_o1: 6.3 

b1: -37.0 	|  w_i2: 4.5 

w_o2: 1.0 	|  b2: -124.0 

Loss: 2.16
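
The reported configuration can be checked with a standalone version of the network function. The sketch below mirrors the compute formula of the network (two ReLU neurons whose outputs are weighted and summed); the function name `network` and the variable names are assumptions for illustration. It also prints where each ReLU neuron switches on, which is where the resulting curve bends.

```python
import numpy as np

xs = np.array([25.0, 27.5, 30.0, 32.5, 35.0, 37.5, 40.0])
ys = np.array([0.0, 2.0, 10.0, 23.7, 43.0, 68.7, 100.0])


def relu(v):
    return np.maximum(0.0, v)


def network(x, w_i1, w_o1, b1, w_i2, w_o2, b2):
    """Two hidden ReLU neurons, combined by the output weights."""
    return relu(w_i1 * x + b1) * w_o1 + relu(w_i2 * x + b2) * w_o2


config = dict(w_i1=1.1, w_o1=6.3, b1=-37.0, w_i2=4.5, w_o2=1.0, b2=-124.0)

loss_value = np.mean((network(xs, **config) - ys) ** 2)
print(round(loss_value, 2))   # 2.16

# Each hidden neuron switches on at x = -b / w_i, producing one bend each.
bend_1 = -config["b1"] / config["w_i1"]
bend_2 = -config["b2"] / config["w_i2"]
print(bend_1, bend_2)         # roughly 33.6 and 27.6 degrees Celsius
```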

### Conclusion
We can conclude that the quadratic relationship can be better approximated by using additional weights and biases. Using two ReLU Neurons, we can create a function with two bends.
However, the complexity of finding the optimal weights/biases increases drastically with each variable. The more powerful our neural networks should be, the harder the optimization becomes.

***
##  Backpropagation

The solution to our optimization problem is called backpropagation. It automates the process of adjusting weights and biases. In this example, we return to the basics and use a simple neuron without an activation function. Backpropagation works by taking the partial derivatives of the loss function with respect to weight and bias; following these derivatives downhill step by step is known as gradient descent. At each point, the gradient, made up of the weight and bias derivatives, points in the direction of increasing loss. The magnitude of the gradient represents the increase in loss for a given step length in that direction.
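
For the simple neuron (2) and the mean squared error (4), these partial derivatives can be written out explicitly. The following is a sketch of the derivation using the chain rule, with $N$ denoting the number of data points:

\begin{align}
\frac{\partial Loss}{\partial w} & = \frac{1}{N} \sum_{n=1}^N 2 \cdot [w \cdot x_n + b - y_n] \cdot x_n \\
\frac{\partial Loss}{\partial b} & = \frac{1}{N} \sum_{n=1}^N 2 \cdot [w \cdot x_n + b - y_n]
\end{align}

Both expressions contain the same distance term $w \cdot x_n + b - y_n$. The weight derivative additionally multiplies each distance by $x_n$, because the weight's influence on the prediction grows with the input value.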

Suppose we were to _maximize_ the loss in Fig. 6: All we need to do is follow the partial derivatives by adding them to our current weight/bias point. That means decreasing the weight a lot (see the axes in Fig. 6) and decreasing the bias by a smaller amount, since its derivative has a smaller magnitude.

However, because we want to go down, we will _subtract_ the gradient from our current point. This will move us closer to the minimum. After this step, we are further down the valley, but not yet close enough to the minimum. So we just repeat the steps until we reach the minimum.

Every step we take is called one **epoch**. (In this case _training steps_ and _epochs_ are equivalent). Because it is hard to determine whether the minimum is reached, we will specify the number of epochs before our descent and simply let the program run.

If the magnitude of the gradient is too big, we will never reach the minimum. This is because our algorithm moves the ball too far at each step. It will oscillate around the minimum, but never arrive at it. In extreme cases, the oscillation even grows without bound. To give us control over the amount of movement, the gradient is multiplied by a factor called the **learning rate**. By setting it to a suitable value, we can prevent oscillations. However, if the learning rate is too small, the network will take forever to "learn", since the weights and biases change only very slowly.
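
The effect of the learning rate can be sketched in a few self-contained lines. The code below runs the described descent for a simple neuron on three example points and compares a small, a well-chosen, and a too-large learning rate. The starting point (weight 2.0, bias -1.0), the chosen learning rates, and all function names are assumptions for illustration, not the notebook's own training code.

```python
import numpy as np

xs = np.array([1.0, 2.0, 3.0])
ys = np.array([1.5, 0.7, 1.2])


def mse(weight, bias):
    return np.mean((weight * xs + bias - ys) ** 2)


def gradient(weight, bias):
    """Partial derivatives of the MSE with respect to weight and bias."""
    error = weight * xs + bias - ys
    return 2 * np.mean(error * xs), 2 * np.mean(error)


def descend(learning_rate, epochs, weight=2.0, bias=-1.0):
    """Repeatedly subtract the scaled gradient; return the final loss."""
    for _ in range(epochs):
        grad_w, grad_b = gradient(weight, bias)
        weight -= learning_rate * grad_w
        bias -= learning_rate * grad_b
    return mse(weight, bias)


start_loss = mse(2.0, -1.0)
print("start:", start_loss)
for lr in (0.01, 0.16, 0.182):
    # 0.01 learns slowly, 0.16 gets close to the minimum, 0.182 diverges
    print("learning rate", lr, "-> loss after 100 epochs:", descend(lr, 100))
```

With the too-large learning rate, the loss after 100 epochs ends up higher than where it started, which is the runaway oscillation described above.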

The number of epochs and the learning rate are so-called **hyperparameters**. They influence the training process but are not part of the network itself.



<img src="images/backprop.png" />
<p style="text-align: center;">
    Fig. 6 - Partial derivatives of Loss function
</p>


### Preparing Backpropagation Plot
We will create a new 3D-Plot that tracks our past weight/bias/loss values as we try to optimize the loss step by step. The black ball will leave a trace of its past values. Run the cell below to enable plotting the backpropagation steps.

In [15]:
# do not change
plot_backprop = DualPlot(points_lr, ranges_3d, ranges_lr)
trace_to_plot = go.Scatter3d(x=[], y=[], z=[], hoverinfo="none", mode="lines", line=dict(width=10, color="grey"))

plot_backprop.plot_3d.data.append(trace_to_plot)  # Expand 3D Plot to also plot traces
plot_backprop.plot_3d.plot = go.FigureWidget(plot_backprop.plot_3d.data, plot_backprop.plot_3d.layout)
plot_backprop.plot_3d.draw_time = 0


def redraw_with_traces(plot_to_update, neuron, trace_list, points):  # executed every update step
    plot_to_update.plot_3d.plot.data[2].x = trace_list["x"]
    plot_to_update.plot_3d.plot.data[2].y = trace_list["y"]
    plot_to_update.plot_3d.plot.data[2].z = trace_list["z"]
    plot_to_update.plot_3d.plot.data[1].x = [neuron.weight]
    plot_to_update.plot_3d.plot.data[1].y = [neuron.bias]
    plot_to_update.plot_3d.plot.data[1].z = [log_mse(neuron, points)]
    plot_to_update.update()


def add_traces(neuron, points, trace_list):  # executed every epoch
    trace_list["x"].extend([neuron.weight])
    trace_list["y"].extend([neuron.bias])
    trace_list["z"].extend([log_mse(neuron, points)])

***
## DIY Backpropagation

To do backpropagation, you first have to determine the partial derivatives of the loss function of the "simple neuron" with respect to weight and bias. After that, you have to figure out how to adjust the weight and bias using the gradient scaled by the learning rate.
At the end of the document, you can verify your results by training. If you hit the benchmark, your algorithm is correct.


### Task: Determine the Gradient


**Finish the function below by yourself.**

There are multiple solutions to this. Note that your algorithm may adjust the weight and bias in the right direction even if the gradient calculation is wrong, so always check it against the benchmark.

**Benchmark:** If you can reach a loss of 0.22 after 100 epochs with a learning rate of 0.03, your solution is correct

In [20]:
def simple_neuron_loss_gradient(neuron, points):

    gradient_sum = dict(weight=0, bias=0)
    for point_x, point_y in zip(points["x"], points["y"]):  # gradient is calculated for each point and summed up
        gradient_sum["weight"] += (
            # hint: point_x and point_y are the current point values
            # STUDENT CODE HERE

            2 * (neuron.get_weight() * (point_x) + neuron.get_bias() - point_y) * point_x          
            
            # STUDENT CODE until HERE
        )

        gradient_sum["bias"] += (
            # STUDENT CODE HERE     
            
            2 * (neuron.get_weight() * (point_x) + neuron.get_bias() - point_y)

            # STUDENT CODE until HERE
        )

    gradient = dict(weight=gradient_sum["weight"] / len(points["x"]), bias=gradient_sum["bias"] / len(points["x"]))
    return gradient

### Task: Adjust the Neuron
After finding the gradient you have to adjust the weight and bias of the neuron, based on the partial derivatives and the learning rate. You have to verify your results by training the net down below.
**Finish the function below by yourself.**

In [21]:
def adjust_neuron(neuron, gradient):
    # STUDENT CODE HERE
    
    neuron.weight -= learning_rate * gradient["weight"]
    neuron.bias -= learning_rate * gradient["bias"]

    # STUDENT CODE until HERE

### Defining training process

In [22]:
# do not change
def train(neuron, points, trace_list):
    redraw_with_traces(neuron.plot, neuron, trace_list, points)
    for i in range(1, epochs + 1):  # first Epoch is Epoch no.1
        add_traces(neuron, points, trace_list)
        gradient = simple_neuron_loss_gradient(neuron, points)
        adjust_neuron(neuron, gradient)

        if i % redraw_step == 0:
            print("Epoch:{} \t".format(i), end="")
            redraw_with_traces(neuron.plot, neuron, trace_list, points)  # use the neuron passed in, not a global

### Task: Choose Hyperparameters and Train
**Choose an optimal learning rate and number of epochs by trying out values and running the two cells below**


In [27]:
# STUDENT CODE HERE

learning_rate = 0.16
epochs = 100

# STUDENT CODE until HERE

redraw_step = 10 # update plot every n'th epoch. too slow? set this to a higher value (e.g. 100)

neuron_backprop = SimpleNeuron(plot_backprop)
HBox((plot_backprop.plot_3d.plot, plot_backprop.plot_2d.plot))


In [28]:
#run this cell to test algorithm

np.random.seed(4) # keep this for benchmarking, remove to play around

neuron_backprop.set_values(  # set weight and bias randomly
    (5 * np.random.random() - 2.5), (5 * np.random.random() - 2.5)
)
trace_list1 = dict(x=[], y=[], z=[])
train(neuron_backprop, points_lr, trace_list1)

Loss: 18.45
Loss: 18.45
Epoch:10 	Loss: 0.44
Epoch:20 	Loss: 0.20
Epoch:30 	Loss: 0.14
Epoch:40 	Loss: 0.12
Epoch:50 	Loss: 0.10
Epoch:60 	Loss: 0.10
Epoch:70 	Loss: 0.10
Epoch:80 	Loss: 0.09
Epoch:90 	Loss: 0.09
Epoch:100 	Loss: 0.09


**Benchmark:** If you can reach a loss of 0.22 after 100 epochs and a learning rate of 0.03, your solution is correct

### Task: Questions

**only answer this after your algorithm has hit the benchmark**

**Question:** What happens when you set the learning rate to 0.18? Explain this behavior.

**Answer:** The ball bounces back and forth between the walls of the valley but hardly goes down, because the learning rate is too high.

**Question:** What happens when you set the learning rate to 0.182? Explain this behavior.

**Answer:** With an even higher learning rate, the amplitude of the oscillation increases and the ball goes up instead of down.

**Question:** What is the fastest learning rate you could find? (Using the other benchmark criteria)

**Answer:** With a learning rate of 0.16 we can reach a loss of 0.1 after 50 epochs and a loss of 0.09 after 80 epochs.

***
### Further Reading: Neural Networks are Universal Function Approximators

It can be mathematically proven that neural networks can approximate any continuous function on a bounded input range to any desired accuracy, as long as they have at least one hidden layer, use nonlinear activation functions, and use a sufficient (but finite) number of hidden layer neurons.

Kurt Hornik, "Approximation capabilities of multilayer feedforward networks", Neural Networks, Volume 4, Issue 2, 1991, Pages 251-257.
https://www.sciencedirect.com/science/article/pii/089360809190009T?via%3Dihub