# Exercises

There are three exercises in this notebook:

1. Use cross-validation to test the linear regression with at least three different $\alpha$ values.
2. Implement an SGD method that trains the Lasso regression for 10 epochs.
3. Extend Fisher's classifier to work with two features. Use the class as $y$.

## 1. Cross-validation linear regression

Change the variable ``alpha`` to a list of alpha values, loop over the list, and compare the results.

In [2]:
import numpy as np
x1 = np.array([188, 181, 197, 168, 167, 187, 178, 194, 140, 176, 168, 192, 173, 142, 176]).reshape(-1, 1)
y = np.array([141, 106, 149, 59, 79, 136, 65, 136, 52, 87, 115, 140, 82, 69, 121]).reshape(-1, 1)

x = np.asmatrix(np.c_[np.ones((15,1)),x1])

I = np.identity(2)
alpha = [0.1, 0.001, 0.01] # change here

# add 1-3 lines of code here
wa = [np.array(np.linalg.inv(x.T*x + a * I)*x.T*y).ravel() for a in alpha]

# add 1-3 lines to compare the results
from sklearn.metrics import mean_squared_error
mse = [mean_squared_error(y, x1 * w[1] + w[0]) for w in wa]  # training MSE for each alpha
print(alpha[np.argmin(mse)])  # alpha with the lowest training MSE



0.001
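
The cell above compares the alphas by their error on the training data. A minimal sketch of how the same comparison could be done with k-fold cross-validation follows. The helper names `ridge_fit` and `cv_mse`, the choice of 5 folds, and the fixed random seed are assumptions made for illustration; they are not part of the original exercise.

```python
import numpy as np
from sklearn.model_selection import KFold

x1 = np.array([188, 181, 197, 168, 167, 187, 178, 194, 140, 176, 168, 192, 173, 142, 176], dtype=float)
y = np.array([141, 106, 149, 59, 79, 136, 65, 136, 52, 87, 115, 140, 82, 69, 121], dtype=float)
X = np.c_[np.ones_like(x1), x1]  # bias column first, as in the cell above


def ridge_fit(X, y, alpha):
    """Closed-form ridge solution w = (X^T X + alpha I)^-1 X^T y (hypothetical helper)."""
    return np.linalg.solve(X.T @ X + alpha * np.identity(X.shape[1]), X.T @ y)


def cv_mse(X, y, alpha, n_splits=5, seed=0):
    """Mean held-out MSE over k folds (hypothetical helper; 5 folds is an assumption)."""
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    errors = []
    for train_idx, test_idx in kf.split(X):
        w = ridge_fit(X[train_idx], y[train_idx], alpha)  # fit on the training folds only
        residuals = y[test_idx] - X[test_idx] @ w         # evaluate on the held-out fold
        errors.append(np.mean(residuals ** 2))
    return float(np.mean(errors))


alphas = [0.1, 0.001, 0.01]
cv_scores = {a: cv_mse(X, y, a) for a in alphas}
best_alpha = min(cv_scores, key=cv_scores.get)
print(cv_scores)
print(best_alpha)
```

With only 15 samples the fold errors are noisy, so the selected alpha may change with the seed or the number of folds.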


## 2. Implement the Lasso regression based on the Ridge regression example

Implement the SGD method and compare its results with those of the sklearn Lasso regression.
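
For reference, with a single feature $x$, a weight $w$, and an intercept $b$, the objective that sklearn's `Lasso` minimizes can be written as below. The symbols $w$, $b$, $n$, $\hat{y}_i$, and the learning rate $\eta$ are notation introduced here for illustration, and the per-sample update is one possible subgradient step, not necessarily the form intended by the exercise.

$$
J(w, b) = \frac{1}{2n} \sum_{i=1}^{n} \left(y_i - w x_i - b\right)^2 + \alpha \lvert w \rvert
$$

$$
w \leftarrow w - \eta \left( -(y_i - \hat{y}_i)\, x_i + \alpha \operatorname{sign}(w) \right), \qquad
b \leftarrow b - \eta \left( -(y_i - \hat{y}_i) \right), \qquad \hat{y}_i = w x_i + b
$$

The intercept is not penalized, so its update contains no $\alpha$ term.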

In [431]:
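def sgd(xi, yi, wi, alpha, lr=0.001):
    # wi[0] is the weight, wi[1] is the bias
    haty = xi.dot(wi[0]) + wi[1]
    # gradient of the squared error with respect to the weight wi[0]
    intermediate = -2 * np.matmul((yi - haty).T, xi)
    # L1 penalty: add or subtract alpha depending on the sign of the weight
    if wi[0] > 0:
        wi[0] -= lr*(intermediate + alpha)
    else:
        wi[0] -= lr*(intermediate - alpha)
    # note: the bias update reuses the weight gradient
    wi[1] -= lr*intermediate
    return wi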

In [432]:
x = np.array([188, 181, 197, 168, 167, 187, 178, 194, 140, 176, 168, 192, 173, 142, 176]).reshape(-1, 1)
y = np.array([141, 106, 149, 59, 79, 136, 65, 136, 52, 87, 115, 140, 82, 69, 121]).reshape(-1, 1)

w = np.zeros(2)
alpha = 0.1

for k in range(10):
    w = sgd(x, y, w, alpha)
print(w)

[-2.93142408e+29 -2.93142408e+29]
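
The weights above grow to about $10^{29}$, which indicates that the updates diverge on the unscaled feature. A minimal sketch of a variant that stays bounded follows. It makes several assumptions that are not in the original cell: the feature is standardized first, the update is done per sample in a shuffled order, the bias has its own gradient, and the loss is scaled as in sklearn's `Lasso` objective. The names `lasso_sgd` and `lasso_objective` and the learning rate of 0.05 are hypothetical choices.

```python
import numpy as np
from sklearn.linear_model import Lasso

x = np.array([188, 181, 197, 168, 167, 187, 178, 194, 140, 176, 168, 192, 173, 142, 176], dtype=float)
y = np.array([141, 106, 149, 59, 79, 136, 65, 136, 52, 87, 115, 140, 82, 69, 121], dtype=float)

# Assumption: standardize the feature so a moderate learning rate is stable.
x_std = (x - x.mean()) / x.std()


def lasso_sgd(x, y, alpha=0.1, lr=0.05, epochs=10, seed=0):
    """Per-sample subgradient descent for 1-D Lasso (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for i in rng.permutation(len(x)):        # visit samples in random order
            err = y[i] - (w * x[i] + b)
            # squared-error gradient plus the L1 subgradient alpha * sign(w)
            w -= lr * (-err * x[i] + alpha * np.sign(w))
            # the bias is not penalized and has its own gradient
            b -= lr * (-err)
    return w, b


def lasso_objective(x, y, w, b, alpha):
    """Value of 1/(2n) * sum of squared errors + alpha * |w|."""
    return 0.5 * np.mean((y - (w * x + b)) ** 2) + alpha * abs(w)


w_sgd, b_sgd = lasso_sgd(x_std, y)
ref = Lasso(alpha=0.1).fit(x_std.reshape(-1, 1), y)

print("SGD:    ", w_sgd, b_sgd, lasso_objective(x_std, y, w_sgd, b_sgd, 0.1))
print("sklearn:", ref.coef_[0], ref.intercept_,
      lasso_objective(x_std, y, ref.coef_[0], ref.intercept_, 0.1))
```

Because the learning rate is constant, the SGD estimate keeps fluctuating around the minimum, so it should be expected to land near the sklearn coefficients rather than match them exactly.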


## 3. Extend the Fisher's classifier

Extend the example to use the targets (classes) of the ``iris_data`` variable as $y$.
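
One possible reading of this exercise is sketched below: a two-class Fisher linear discriminant on two features, with the class label as $y$. Restricting the data to the first two classes (setosa and versicolor), picking sepal width and petal length as the two features, and placing the threshold midway between the projected class means are all assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris

iris_data = load_iris()

# Assumption: keep only classes 0 and 1 so the problem is binary.
mask = iris_data.target < 2
X = iris_data.data[mask][:, [1, 2]]   # columns: sepal width (cm), petal length (cm)
y = iris_data.target[mask]            # the class is used as y

X0, X1 = X[y == 0], X[y == 1]
m0, m1 = X0.mean(axis=0), X1.mean(axis=0)          # class means

# Within-class scatter matrix: sum of the per-class scatter matrices
S_w = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)

# Fisher direction: w is proportional to S_w^-1 (m1 - m0)
w = np.linalg.solve(S_w, m1 - m0)

# Assumption: threshold halfway between the projected class means
threshold = 0.5 * (m0 + m1) @ w

y_pred = (X @ w > threshold).astype(int)
accuracy = np.mean(y_pred == y)
print(w, threshold, accuracy)
```

Since $S_w$ is positive definite, the projection of `m1` is always larger than that of `m0`, which is why samples above the threshold are assigned to class 1.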

In [440]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

iris_data = load_iris()
iris_df = pd.DataFrame(iris_data.data,columns=iris_data.feature_names)
#iris_df.head()

x = iris_df['sepal width (cm)'].values # change here
y = iris_df['sepal length (cm)'].values # change here
y1 = iris_df['petal length (cm)'].values

dataset_size = np.size(x)

mean_x, mean_y, mean_y1 = np.mean(x), np.mean(y), np.mean(y1)

SS_xy = (np.sum(y * x) - dataset_size * mean_y * mean_x)  + (np.sum(y1 * x) - dataset_size * mean_y1 * mean_x)
SS_xx = np.sum(x * x) - dataset_size * mean_x * mean_x

a = SS_xy / SS_xx
b = mean_y + mean_y1 - a * mean_x


y_pred = a * x + b
print(y_pred)

[ 8.73433411  9.7136254   9.32190888  9.51776714  8.53847585  7.95090107
  8.93019237  8.93019237  9.90948366  9.51776714  8.34261759  8.93019237
  9.7136254   9.7136254   7.75504282  6.97160978  7.95090107  8.73433411
  8.14675933  8.14675933  8.93019237  8.34261759  8.53847585  9.12605063
  8.93019237  9.7136254   8.93019237  8.73433411  8.93019237  9.32190888
  9.51776714  8.93019237  7.55918456  7.3633263   9.51776714  9.32190888
  8.73433411  8.53847585  9.7136254   8.93019237  8.73433411 11.08463321
  9.32190888  8.73433411  8.14675933  9.7136254   8.14675933  9.32190888
  8.34261759  9.12605063  9.32190888  9.32190888  9.51776714 11.08463321
 10.10534192 10.10534192  9.12605063 10.88877495  9.90948366 10.30120018
 11.67220799  9.7136254  11.28049147  9.90948366  9.90948366  9.51776714
  9.7136254  10.30120018 11.28049147 10.69291669  9.32190888 10.10534192
 10.69291669 10.10534192  9.90948366  9.7136254  10.10534192  9.7136254
  9.90948366 10.49705844 10.88877495 10.88877495 10.