# Welcome to Machine Learning -- An Interdisciplinary Introduction

## Linear Regression


The task in linear regression is to fit a line:
$$y = w_1\cdot x + w_0$$
through a list of $N$ data points:
$$X = \{x_n \mid 0\leq n < N\}$$
with corresponding target values:
$$T = \{t_n \mid 0\leq n < N\}$$
by adapting the parameters $w_0$ and $w_1$ such that they minimize the mean squared error between the line outputs $y_n = w_1\cdot x_n + w_0$ and the targets $t_n$:
$$\arg\min\limits_{w_0,w_1} \mathcal J(w_0,w_1) \qquad\text{where}\qquad \mathcal J(w_0,w_1) = \frac1N\sum\limits_{n=0}^{N-1} (y_n - t_n)^2 = \frac1N\sum\limits_{n=0}^{N-1} (w_1\cdot x_n + w_0 - t_n)^2$$
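The sketch below makes the loss concrete by evaluating $\mathcal J$ for a given pair of parameters. It uses plain Python lists instead of PyTorch tensors. The function name `mse_loss` and the three toy points are made up for illustration.

```python
def mse_loss(X, T, w_0, w_1):
    """Mean squared error J between the line w_1*x + w_0 and the targets T."""
    N = len(X)
    return sum((w_1 * x + w_0 - t) ** 2 for x, t in zip(X, T)) / N

# three toy points that lie exactly on the line t = 2x + 1
X = [0.0, 1.0, 2.0]
T = [1.0, 3.0, 5.0]
print(mse_loss(X, T, 1.0, 2.0))  # → 0.0 (the line passes through every point)
print(mse_loss(X, T, 1.0, 1.5))  # → 0.4166666666666667 (= (0 + 0.25 + 1) / 3)
```

Any other choice of $w_0, w_1$ yields a positive loss, so here the minimizer is exactly the generating line.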

Analytically, this minimum can be found by computing the partial derivatives of $\mathcal J$ with respect to $w_0$ and $w_1$ and setting both to 0.
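Spelled out, the two partial derivatives of $\mathcal J$ are:
$$\frac{\partial\mathcal J}{\partial w_0} = \frac2N\sum\limits_{n=0}^{N-1} (w_1\cdot x_n + w_0 - t_n) \qquad \frac{\partial\mathcal J}{\partial w_1} = \frac2N\sum\limits_{n=0}^{N-1} (w_1\cdot x_n + w_0 - t_n)\cdot x_n$$
Setting the first one to 0 gives $w_0 = \overline t - w_1\cdot\overline x$, with the means $\overline x$ and $\overline t$ defined below. Substitute this into the second condition and use $\sum_n (x_n - \overline x)\,\overline x = 0$ and $\sum_n (t_n - \overline t)\,\overline x = 0$. This yields a single linear equation in $w_1$ with centered sums:
$$w_1 \sum\limits_{n=0}^{N-1} (x_n - \overline x)^2 = \sum\limits_{n=0}^{N-1}(x_n - \overline x) (t_n - \overline t)$$
Since $\mathcal J$ is a convex quadratic in $(w_0, w_1)$, this stationary point is the minimum, provided not all $x_n$ are equal.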
Solving the resulting two equations gives $w_0$ and $w_1$ as follows:
$$w_1 = \frac{\sum\limits_{n=0}^{N-1}(x_n - \overline x) (t_n - \overline t)}{\sum\limits_{n=0}^{N-1} (x_n - \overline x)^2} \qquad w_0 = \overline t - w_1\cdot\overline x$$
where $\overline x$ and $\overline t$ are the simple arithmetic means of $x_n$ and $t_n$, respectively:
$$\overline x = \frac1N \sum\limits_{n=0}^{N-1} x_n \qquad \overline t = \frac1N \sum\limits_{n=0}^{N-1} t_n$$
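The closed-form solution above translates almost line by line into code. This minimal sketch works on plain Python lists rather than the PyTorch tensors used in the exercises. `fit_line` is a hypothetical name, not the `regression` function from Task 1.

```python
def fit_line(X, T):
    """Closed-form least-squares line fit on plain Python lists.

    Returns (w_0, w_1) for the line y = w_1 * x + w_0.
    """
    N = len(X)
    x_bar = sum(X) / N
    t_bar = sum(T) / N
    # numerator and denominator of the slope formula, both using centered values
    num = sum((x - x_bar) * (t - t_bar) for x, t in zip(X, T))
    den = sum((x - x_bar) ** 2 for x in X)
    w_1 = num / den
    w_0 = t_bar - w_1 * x_bar
    return w_0, w_1

# noise-free points on y = 2x + 1 are recovered exactly
print(fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]))  # → (1.0, 2.0)
```

If all $x_n$ are equal, the denominator is 0 and Python raises `ZeroDivisionError`. This matches the fact that the slope is undefined in that case.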


## Task 1: Regression Function

Implement a function that takes a list of samples $X$ and their targets $T$ and computes the parameters $w_0$ and $w_1$ of the regression line.


In [None]:
import torch

def regression(X, T):
  """Performs linear regression for the given input and target values.
  For the optimal line y=w_1*x+w_0, it returns the parameters w_0 and w_1"""
  # compute means of the inputs and the targets
  x_bar = ...
  t_bar = ...

  # compute variables w_0 and w_1 according to the above equations
  w_1 = ...
  w_0 = ...

  # return the two variables
  return w_0, w_1


## Task 2: Linear Data

Generate some noisy linear data:
$$t_n = w_1^* \cdot x_n + w_0^* + \xi_n$$
where each noise value $\xi_n$ is drawn independently and uniformly from $[-0.4,0.4]$, and $w_0^*$ and $w_1^*$ can be chosen arbitrarily.

In total, select $N=50$ samples with $x_n$ uniformly distributed in the range $[-5,5]$.
Compute the corresponding noisy targets $t_n$.
You can choose $w_0^*=20$ and $w_1^*=0.3$, or parameters of your own choice.
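Random number generators typically return values in $[0,1)$. This holds for both Python's `random.random` and `torch.rand`. The affine rescaling $a + (b-a)\,u$ maps such a value $u$ to $[a, b)$, which covers both the inputs and the noise. A minimal sketch with the standard `random` module follows. `uniform_samples` and the seed are hypothetical choices; with PyTorch, the same rescaling can be applied to the tensor returned by `torch.rand`.

```python
import random

def uniform_samples(n, low, high, rng):
    """Draws n samples from U[low, high) by rescaling U[0, 1) samples."""
    return [low + (high - low) * rng.random() for _ in range(n)]

rng = random.Random(0)                      # fixed seed for reproducibility
X = uniform_samples(50, -5.0, 5.0, rng)     # inputs x_n in [-5, 5)
xi = uniform_samples(50, -0.4, 0.4, rng)    # one independent noise value per sample
T = [0.3 * x + 20.0 + e for x, e in zip(X, xi)]  # w_1* = 0.3, w_0* = 20
```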

In [None]:
def line(x, w_0, w_1):
  """Returns the output w_1 * x + w_0 for the given parameters"""
  return ...


def noisy_data(x, w_0_star, w_1_star, noise=.4):
  """Returns the noisy target data by
  - first computing the line according to parameters w_0_star and w_1_star,
  - and second adding noise drawn uniformly from [-noise, noise] to each value"""
  return line(x, w_0_star, w_1_star) + ...

# sample N=50 values for X uniformly from [-5, 5]
X = ...
# generate the noisy target data for these input samples
T = noisy_data(X, 20, .3)

## Task 3: Obtain Line Parameters

Compute the regression line for our data and print the resulting values for $w_0$ and $w_1$.
How much do these values deviate from the values $w_0^*$ and $w_1^*$ used to generate the noisy data above?

In [None]:
w_0, w_1 = ...

print(f"The optimal line has w_0={w_0:2.5f} and w_1={w_1:2.5f}")
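The noise is random, so the deviation from $w_0^*$ and $w_1^*$ changes from run to run. The minimal sketch below repeats the Task 2/Task 3 experiment 200 times and reports the spread of the estimation errors. It uses plain Python rather than PyTorch. `fit_line`, `one_run`, the seed, and the number of repetitions are illustrative choices, not part of the exercise.

```python
import random
import statistics

def fit_line(X, T):
    """Closed-form least-squares fit; returns (w_0, w_1) for y = w_1*x + w_0."""
    x_bar, t_bar = sum(X) / len(X), sum(T) / len(T)
    w_1 = (sum((x - x_bar) * (t - t_bar) for x, t in zip(X, T))
           / sum((x - x_bar) ** 2 for x in X))
    return t_bar - w_1 * x_bar, w_1

def one_run(rng, N=50, w_0_star=20.0, w_1_star=0.3, noise=0.4):
    """Generates one noisy data set as in Task 2 and returns the
    estimation errors (w_0 - w_0_star, w_1 - w_1_star)."""
    X = [rng.uniform(-5.0, 5.0) for _ in range(N)]
    T = [w_1_star * x + w_0_star + rng.uniform(-noise, noise) for x in X]
    w_0, w_1 = fit_line(X, T)
    return w_0 - w_0_star, w_1 - w_1_star

rng = random.Random(1)
errors = [one_run(rng) for _ in range(200)]
err_w0 = [e0 for e0, _ in errors]
err_w1 = [e1 for _, e1 in errors]
print(f"w_0 error: mean {statistics.mean(err_w0):+.4f}, std {statistics.stdev(err_w0):.4f}")
print(f"w_1 error: mean {statistics.mean(err_w1):+.4f}, std {statistics.stdev(err_w1):.4f}")
```

Averaged over many runs, the errors should be centered near zero. This reflects that the least-squares estimates are unbiased when the noise has zero mean.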

## Task 4: Plot Data and Lines

Evaluate the line with the estimated parameters $w_0$ and $w_1$ at the input samples $X$.
Plot the line and the data points together in one plot.

In [None]:
from matplotlib import pyplot

# evaluate the line at the input samples X using the estimated parameters w_0 and w_1
Y = ...

# plot the optimized line
...

# plot the data points
...

## Task 5: Non-linear Data

Create target values that do not follow a line, for example:

$$t_n = \sin(x_n)$$

Compute the line parameters and plot the data and the estimated line in one plot.

In [None]:
# define new non-linear target values
T = ...

# perform linear regression and obtain the line parameters
w_0, w_1 = ...

# compute the line of the obtained parameters
Y = ...

# plot the line
...

# plot the points
...
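The outcome of Task 5 can be anticipated. $\sin$ is an odd function, so on inputs placed symmetrically around 0 both $\overline x$ and $\overline t$ vanish, which gives $w_0 = \overline t - w_1\cdot\overline x = 0$. The slope only captures the best linear trend of $\sin$ on the interval, so the remaining loss stays large. The sketch below assumes an evenly spaced grid of 50 inputs in $[-5,5]$, so the symmetry holds exactly, instead of the random samples from Task 2. It uses plain Python, and `fit_line` is a hypothetical helper.

```python
import math

def fit_line(X, T):
    """Closed-form least-squares fit; returns (w_0, w_1) for y = w_1*x + w_0."""
    x_bar, t_bar = sum(X) / len(X), sum(T) / len(T)
    w_1 = (sum((x - x_bar) * (t - t_bar) for x, t in zip(X, T))
           / sum((x - x_bar) ** 2 for x in X))
    return t_bar - w_1 * x_bar, w_1

N = 50
X = [-5.0 + 10.0 * n / (N - 1) for n in range(N)]  # evenly spaced, symmetric around 0
T = [math.sin(x) for x in X]                       # non-linear targets
w_0, w_1 = fit_line(X, T)
# residual mean squared error of the best line
J = sum((w_1 * x + w_0 - t) ** 2 for x, t in zip(X, T)) / N
print(f"w_0={w_0:.5f}, w_1={w_1:.5f}, J={J:.5f}")
```

With the random inputs from Task 2, the symmetry holds only approximately, so $w_0$ will be small but not exactly 0.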