# Linear Regression

In [None]:
import numpy as np 
import pandas as pd
import matplotlib.pyplot as plt 

## Numerical Solution - Gradient Descent


### Practical Part
- Implement gradient descent for the simple functions defined in this notebook.
- Vary the learning rate.
- Consider the simultaneous optimization problem. What do your findings imply?

### Understanding the learning rate
Consider the following functions:
- $f_1(x) = x^2$
- $f_2(x) = x^3$
- $f_3(x) = \sin (x) + 0.01\cdot (x- 1.5\pi )^2$

Use gradient descent to search for their absolute minima (check whether each function actually has one). Choose different starting values $x_0$ and vary the learning rate from very small to very large values.
Plot the function together with each gradient descent step, i.e. the sequence $x_0$, $x_1$, ... up to the final iterate $x_\text{min}$.
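
The update rule behind this exercise is $x_{k+1} = x_k - \alpha f'(x_k)$ with learning rate $\alpha$. The following is a minimal sketch of that rule, assuming a hypothetical helper `gd_path` (not part of the notebook) that returns the iterates instead of plotting them:

```python
def gd_path(f_prime, x0, lr, epochs):
    """Return the iterates [x_0, x_1, ..., x_epochs] of gradient descent.

    Hypothetical helper for illustration only; it is not the notebook's plot_gd.
    """
    xs = [x0]
    x = x0
    for _ in range(epochs):
        x = x - lr * f_prime(x)  # step against the gradient
        xs.append(x)
    return xs


# Example on f(x) = x**2, whose derivative is 2*x.
small = gd_path(lambda x: 2 * x, x0=-3.0, lr=0.1, epochs=100)
large = gd_path(lambda x: 2 * x, x0=-3.0, lr=1.1, epochs=100)

print(small[-1])  # each step multiplies x by (1 - 2*lr) = 0.8, so x shrinks towards 0
print(large[-1])  # each step multiplies x by (1 - 2*lr) = -1.2, so |x| grows
```

For this quadratic, the iterates contract only while $|1 - 2\alpha| < 1$, which is one way to see why the learning rate should be searched over orders of magnitude.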



In [None]:
import math 

def f1(x):
    return x**2

def f1_prime(x):
    return 2*x

def f2(x):
    return x**3

def f2_prime(x):
    return 3*x**2

def f3(x):
    return np.sin(x) + 0.01*(x- 1.5*math.pi)**2

def f3_prime(x):
    return np.cos(x) + 2.*0.01*(x-1.5*math.pi)


# TODO
def plot_gd(f, f_prime, x0, lr, epochs):
    """Run gradient descent on the function f and plot f together with
    all gradient descent steps.

    Inputs:
     - f: function to minimize
     - f_prime: first derivative of f
     - x0: initial value to start gradient descent from
     - lr: learning rate
     - epochs: number of iterations
    """
    # left and right limits of the plot: symmetric around 0,
    # with a fallback half-width of 1 so that x0 == 0 does not give an empty range
    hi = abs(x0) if x0 != 0 else 1.0
    lo = -hi
    # 101 evenly spaced x-values across the plotting range
    xvals = np.linspace(lo, hi, 101)
    plt.plot(xvals, f(xvals))
    x = x0
    for i in range(epochs):
        plt.plot(x, f(x), 'r*')

        # TODO insert the gradient descent step


    print()
    print("Final value =", x)

    


In [None]:
lr = # TODO - try different values (remember, search a suitable learning rate over orders of magnitude!)
EPOCHS = 100
x = -3 # initial value - change, if needed

plot_gd(f1, f1_prime, x, lr, EPOCHS)

In [None]:
lr = # TODO
EPOCHS = 100
x = -3 # initial value

plot_gd(f1, f1_prime, x, lr, EPOCHS)

In [None]:
lr = # TODO
EPOCHS = 100
x = 3 # initial value

plot_gd(f2, f2_prime, x, lr, EPOCHS)

In [None]:
lr = # TODO
EPOCHS = 1000
x = # TODO try various initial values

plot_gd(f3, f3_prime, x, lr, EPOCHS)

### Implement GD for linear regression
Implement gradient descent for the following toy problem:
You have two features $x_1$ and $x_2$. The target is $y=0.5x_1 + 0.3x_2 + \epsilon$ with noise $\epsilon\sim \mathcal{N}(0, \sigma^2)$.

Solve the linear regression with __gradient descent__.


Find a good learning rate $\alpha$.
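
One possible setup is sketched below. It minimizes the mean squared error $\frac{1}{n}\sum_i (w^\top x_i - y_i)^2$ with the gradient $\frac{2}{n} X^\top (Xw - y)$. The sample size, the noise level `sigma`, the seed, the learning rate, and the names `X_toy`, `y_toy` are all assumptions made for illustration, and no intercept is fitted because the toy model has none:

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed, chosen arbitrarily
n = 500                          # assumed sample size
sigma = 0.1                      # assumed noise standard deviation

X_toy = rng.normal(size=(n, 2))  # features x_1 and x_2
y_toy = 0.5 * X_toy[:, 0] + 0.3 * X_toy[:, 1] + rng.normal(scale=sigma, size=n)

w = np.zeros(2)   # weights to learn, started at zero
alpha = 0.1       # learning rate (an assumption; try other orders of magnitude)

for _ in range(500):
    residual = X_toy @ w - y_toy
    grad = 2.0 / n * X_toy.T @ residual  # gradient of the mean squared error
    w -= alpha * grad

print(w)  # should land close to the true coefficients 0.5 and 0.3
```

Rerunning the loop with `alpha` changed by factors of ten shows the same slow, fast, and divergent regimes as in the one-dimensional examples.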

### Simultaneous optimization

Minimize both components of $f: \mathbb{R}^2\to\mathbb{R}^2$, $f(x_1, x_2) = (x_1^2, (a\cdot x_2)^2)$, simultaneously.  
Look at different values for $a$, e.g. $a=2$, $a=5$, $a=10$.
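
A small experiment along these lines is sketched below, assuming that one learning rate is shared by both coordinates. The helper `gd_two_components`, the start point, and the tested learning rates are illustrative choices, not part of the notebook:

```python
def gd_two_components(a, lr, epochs=200, start=(1.0, 1.0)):
    """Gradient descent on x1**2 and (a*x2)**2 with one shared learning rate.

    Hypothetical helper for illustration only.
    """
    x1, x2 = start
    for _ in range(epochs):
        x1 -= lr * 2.0 * x1           # derivative of x1**2 is 2*x1
        x2 -= lr * 2.0 * a**2 * x2    # derivative of (a*x2)**2 is 2*a**2*x2
    return x1, x2


for a in (2, 5, 10):
    for lr in (0.9 / a**2, 1.1 / a**2):
        x1, x2 = gd_two_components(a, lr)
        print(f"a={a:2d}  lr={lr:.4f}  x1={x1:.3e}  x2={x2:.3e}")
```

The $x_2$ update multiplies $x_2$ by $1 - 2\alpha a^2$ in every step, so it only converges for $\alpha < 1/a^2$. With a learning rate that small, $x_1$ is multiplied by $1 - 2\alpha$, which approaches 1 as $a$ grows, so the steeper coordinate dictates the step size and the flatter one converges slowly.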

## Linear regression examples



In [None]:
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression, Lasso, Ridge
from sklearn.model_selection import train_test_split


In [None]:
data = load_diabetes(return_X_y=False, as_frame=True, scaled=False)

In [None]:
data.data

In [None]:
X = data.data.values
y = data.target.values

### Practical Part 1 - Implement Linear Regression 
- Split the data randomly into train and test sets (75%/25% is fine).
- Predict the target variable with linear regression.
- Use the scikit-learn implementation.
- Measure the errors of your prediction and try to figure out where the errors occur. 
- Ablation study: Study the effect of data normalization on the quality of your model.

### Practical Part 2 - Regularization
- Until now, only the features themselves have been considered. 
- Add non-linear terms representing the interactions between the features.
- Second-order terms are $x_ix_j$ for features $x_i$ and $x_j$, third-order terms are $x_ix_jx_k$, and so on.
- Go up to third order and use interaction terms only ($i,j,k$ pairwise distinct).
- Re-do linear regression with these additional features.
- What do you observe?


### Practical Part 3 - Ridge Regression and Lasso
- Use Ridge and Lasso for the extended diabetes dataset (including the interaction terms).
- Look into the scikit-learn documentation to learn about the parameter `alpha`.
- For which values of alpha do you get the best model?
- When applying Lasso, figure out which features have a non-zero coefficient. 