# Welcome to Machine Learning -- An Interdisciplinary Introduction

## Linear Regression


The task in linear regression is to fit a line:
$$y = a\cdot x + b$$
through a list of $N$ data points:
$$X = \{(x_n, t_n) \mid 0\leq n < N\}$$
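by adapting the parameters $a$ and $b$ such that they minimize the mean squared error between the line outputs $y_n = a\cdot x_n + b$ and the targets $t_n$:
$$\operatorname*{argmin}_{a,b} \frac1N\sum\limits_{n=0}^{N-1} (y_n - t_n)^2 = \operatorname*{argmin}_{a,b} \frac1N\sum\limits_{n=0}^{N-1} (a\cdot x_n + b - t_n)^2$$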
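The loss being minimized can be written directly in PyTorch. This is a minimal sketch: `mse_loss` is a hypothetical helper name, not part of the exercise, and the three sample points are chosen to lie exactly on the line $y = 2x + 1$.

```python
import torch

def mse_loss(a, b, x, t):
    """Mean squared error between the line outputs a*x + b and the targets t."""
    y = a * x + b
    return torch.mean((y - t) ** 2)

# three points lying exactly on the line y = 2x + 1
x = torch.tensor([0.0, 1.0, 2.0])
t = torch.tensor([1.0, 3.0, 5.0])

print(mse_loss(2.0, 1.0, x, t).item())  # perfect fit: 0.0
print(mse_loss(2.0, 0.0, x, t).item())  # every prediction is off by 1: 1.0
```

Any other choice of $a$ and $b$ gives a loss above zero for these points, which is exactly what the minimization exploits.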

Analytically, the minimum can be found by differentiating the loss with respect to $a$ and $b$ and setting both partial derivatives to zero.
Solving the resulting equations yields:
$$a = \frac{\sum\limits_{n=0}^{N-1}(x_n - \overline x) (t_n - \overline t)}{\sum\limits_{n=0}^{N-1} (x_n - \overline x)^2} \qquad b = \overline t - a\overline x$$
where $\overline x$ and $\overline t$ are the simple arithmetic means of $x_n$ and $t_n$, respectively:
$$\overline x = \frac1N \sum\limits_{n=0}^{N-1} x_n \qquad \overline t = \frac1N \sum\limits_{n=0}^{N-1} t_n$$
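For reference, the closed form follows in two steps. Writing $\mathcal J(a,b)$ for the loss (a symbol introduced here only for brevity), the derivative with respect to $b$ yields the intercept:
$$\mathcal J(a,b) = \frac1N\sum\limits_{n=0}^{N-1} (a\cdot x_n + b - t_n)^2 \qquad \frac{\partial \mathcal J}{\partial b} = \frac2N\sum\limits_{n=0}^{N-1} (a\cdot x_n + b - t_n) = 2\,(a\overline x + b - \overline t) = 0 \quad\Rightarrow\quad b = \overline t - a\overline x$$
Inserting this $b$ into the derivative with respect to $a$, and using $\sum_n (x_n - \overline x) = \sum_n (t_n - \overline t) = 0$ to replace the factor $x_n$ by $x_n - \overline x$, yields the slope:
$$\frac{\partial \mathcal J}{\partial a} = \frac2N\sum\limits_{n=0}^{N-1} \big(a\,(x_n - \overline x) - (t_n - \overline t)\big)\,x_n = \frac2N\sum\limits_{n=0}^{N-1} \big(a\,(x_n - \overline x) - (t_n - \overline t)\big)(x_n - \overline x) = 0$$
$$\Rightarrow\quad a = \frac{\sum\limits_{n=0}^{N-1}(x_n - \overline x) (t_n - \overline t)}{\sum\limits_{n=0}^{N-1} (x_n - \overline x)^2}$$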


## Task 1: Regression Function

Implement a function that takes the inputs $x_n$ and the targets $t_n$ of the samples in $X$ and computes the parameters $a$ and $b$ of the regression line.


```python
import torch

def regression(X, T):
  """Performs linear regression for the given input and target values.
  For the optimal line y=a*x+b, it returns the parameters a and b"""
  # compute means of the inputs and the targets
  x_bar = ...
  t_bar = ...

  # compute variables a and b according to the above equations
  a = ...
  b = ...

  # return the two variables
  return a, b
```
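Once `regression` is implemented, one way to sanity-check it is to compare its output with a generic least-squares solver. The sketch below assumes PyTorch's `torch.linalg.lstsq`, which solves $\min_w \|Aw - t\|^2$ for a design matrix $A$; here $A$ has one column holding the $x_n$ (for $a$) and one column of ones (for $b$). The helper name `check_regression`, the seed, and the tolerance are assumptions chosen for illustration.

```python
import torch

def check_regression(regression_fn, n_samples=50, seed=0):
    """Hypothetical self-check: compares regression_fn(X, T) -> (a, b)
    with torch.linalg.lstsq on reproducible noisy linear data."""
    generator = torch.Generator().manual_seed(seed)
    # inputs uniform in [-5, 5), noise uniform in [-0.4, 0.4), as in Task 2
    X = torch.rand(n_samples, generator=generator, dtype=torch.float64) * 10 - 5
    noise = torch.rand(n_samples, generator=generator, dtype=torch.float64) * 0.8 - 0.4
    T = 0.3 * X + 20 + noise

    # design matrix: column of x_n for the slope a, column of ones for the intercept b
    A = torch.stack([X, torch.ones_like(X)], dim=1)
    solution = torch.linalg.lstsq(A, T.unsqueeze(1)).solution.squeeze(1)
    a_ref, b_ref = solution[0].item(), solution[1].item()

    a, b = regression_fn(X, T)
    return abs(float(a) - a_ref) < 1e-6 and abs(float(b) - b_ref) < 1e-6
```

For example, `check_regression(regression)` should return `True` for a correct implementation of Task 1.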


## Task 2: Linear Data

Generate some noisy linear data:
$$t_n = a^* \cdot x_n + b^* + \xi_n$$
where the noise $\xi_n$ is drawn independently for each sample from a uniform distribution on $[-0.4, 0.4]$, and the parameters $a^*$ and $b^*$ can be chosen arbitrarily.

In total, select $N=50$ samples with $x_n$ uniformly distributed in the range $[-5,5]$, and compute the corresponding noisy targets $t_n$.
You can choose $a^*=0.3$ and $b^*=20$, or parameters of your own choice.
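PyTorch's `torch.rand` draws values uniformly from $[0, 1)$; scaling and shifting the result maps them to any interval $[\text{low}, \text{high})$. The helper `uniform` below is a hypothetical name used for illustration; the in-place method `Tensor.uniform_(low, high)` is an alternative.

```python
import torch

def uniform(low, high, size):
    """Draws `size` values uniformly from [low, high) by rescaling samples from [0, 1)."""
    return torch.rand(size) * (high - low) + low

x = uniform(-5.0, 5.0, 50)    # 50 inputs from [-5, 5)
xi = uniform(-0.4, 0.4, 50)   # one independent noise value per input
```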

```python
def line(x, a, b):
  """Returns the output a * x + b for the given parameters"""
  return ...


def noisy_data(x, a_star, b_star, noise=.4):
  """Returns the noisy target data by
  - first computing the line according to parameters a_star and b_star,
  - and second adding noise uniformly distributed in [-noise, noise]"""
  return line(x, a_star, b_star) + ...

# sample the inputs X uniformly from the range [-5, 5]
X = ...
# generate the noisy target data for these input samples
T = noisy_data(X, .3, 20)
```
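Because both the inputs and the noise are random, the estimates in Task 3 differ from run to run. If reproducible numbers are preferred, one option (an addition, not required by the task) is to seed PyTorch's global random number generator with `torch.manual_seed` before sampling; the seed value 0 below is arbitrary.

```python
import torch

torch.manual_seed(0)      # fix the state of the global generator
first = torch.rand(5)
torch.manual_seed(0)      # reset it to the same state
second = torch.rand(5)
print(torch.equal(first, second))  # True: the same seed reproduces the same samples
```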

## Task 3: Obtain Line Parameters

Compute the regression line for our data and print the resulting values for $a$ and $b$.
How much do these values deviate from the values $a^*$ and $b^*$ selected to compute the noisy data above?

```python
# compute the line parameters for the noisy data
a, b = ...

print(f"The optimal line has a={a:.5f} and b={b:.5f}")
```

## Task 4: Plot Data and Lines

Compute the line values $y_n = a\cdot x_n + b$ for the estimated parameters $a$ and $b$.
Plot the line and the data points together in one plot.
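One possible way to draw both in one figure is sketched below with Matplotlib's object-oriented interface: `ax.plot` draws the line and `ax.scatter` the data points. Since a straight line is fully determined by two points, the sketch evaluates it only at the smallest and largest input. The inputs are placeholders, and the targets lie exactly on the line with $a^*=0.3$ and $b^*=20$, so no noise is involved.

```python
import torch
from matplotlib import pyplot

# placeholder inputs; targets placed exactly on the line a*x + b for illustration
a, b = 0.3, 20.0
X = torch.linspace(-5, 5, 11)
T = a * X + b

# two points suffice for a straight line: the smallest and the largest input
x_line = torch.stack([X.min(), X.max()])
y_line = a * x_line + b

fig, ax = pyplot.subplots()
ax.plot(x_line.tolist(), y_line.tolist(), "r-", label="regression line")
ax.scatter(X.tolist(), T.tolist(), label="data points")
ax.set_xlabel("x")
ax.set_ylabel("t")
ax.legend()
```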

```python
from matplotlib import pyplot

# obtain the points of the line according to the estimated parameters a and b
Y = ...

# plot the optimized line
...

# plot the data points
...
```

## Task 5: Non-linear Data

Create target values that do not follow a line, for example:

$$t_n = \sin(x_n)$$

Compute the line parameters and plot the data and the estimated line in one plot.
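A line cannot follow the curvature of $\sin(x_n)$, and this shows up as a loss that stays clearly above zero even at the optimal $a$ and $b$. The sketch below compares the remaining mean squared error for linear and for sine targets; the helper `line_fit_mse` is hypothetical, it assumes `torch.linalg.lstsq` for the fit, and it uses evenly spaced inputs instead of random ones.

```python
import torch

def line_fit_mse(x, t):
    """Fits t ~ a*x + b by least squares and returns the remaining mean squared error."""
    A = torch.stack([x, torch.ones_like(x)], dim=1)  # columns: x_n (for a) and 1 (for b)
    a, b = torch.linalg.lstsq(A, t.unsqueeze(1)).solution.squeeze(1)
    return torch.mean((a * x + b - t) ** 2).item()

x = torch.linspace(-5, 5, 50, dtype=torch.float64)
print(line_fit_mse(x, 0.3 * x + 20))  # targets on a line: loss close to 0
print(line_fit_mse(x, torch.sin(x)))  # sine targets: a clearly positive loss remains
```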

```python
# define new non-linear target values
T = ...

# perform linear regression and obtain the line parameters
a, b = ...

# compute the line values for the estimated parameters
Y = ...

# plot the line
...

# plot the points
...
```