# Exercises

There are three exercises in this notebook:

1. Use cross-validation to test the linear regression with at least three different $\alpha$ values.
2. Implement an SGD method that trains the Lasso regression for 10 epochs.
3. Extend Fisher's classifier to work with two features. Use the class as $y$.

## 1. Cross-validation linear regression

Change the variable ``alpha`` into a list of alpha values, loop over the list, and compare the resulting weights.

In [108]:
import numpy as np
import pandas as pd

x = np.array([188, 181, 197, 168, 167, 187, 178, 194, 140, 176, 168, 192, 173, 142, 176]).reshape(-1, 1)
y = np.array([141, 106, 149, 59, 79, 136, 65, 136, 52, 87, 115, 140, 82, 69, 121]).reshape(-1, 1)

x = np.asmatrix(np.c_[np.ones((15,1)),x])

I = np.identity(2)
alpha = [1, 10, 100]
results = []

for a in alpha:
    # Closed-form ridge solution: w = (X^T X + alpha * I)^-1 X^T y
    w = np.linalg.inv(x.T * x + a * I) * x.T * y
    results.append(np.asarray(w).ravel())

# One row of weights per alpha value.
results = np.asarray(results)
table = pd.DataFrame(data=results, index=[f"alpha = {a}" for a in alpha], columns=["w1", "w2"])
table

| | w1 | w2 |
|---|---|---|
| alpha = 1 | -20.590447 | 0.710486 |
| alpha = 10 | -2.291063 | 0.606881 |
| alpha = 100 | -0.22873 | 0.595091 |
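The cell above compares the fitted weights for each alpha but never scores them on held-out data. The following is a minimal k-fold cross-validation sketch on the same data, using NumPy only. The helper names `ridge_fit` and `cv_mse`, the choice of 5 folds, and the fixed seed are assumptions made for illustration and are not part of the original notebook.

```python
import numpy as np

# Same data as in the cell above.
x = np.array([188, 181, 197, 168, 167, 187, 178, 194, 140, 176, 168, 192, 173, 142, 176], dtype=float)
y = np.array([141, 106, 149, 59, 79, 136, 65, 136, 52, 87, 115, 140, 82, 69, 121], dtype=float)
X = np.c_[np.ones(len(x)), x]  # prepend a bias column of ones


def ridge_fit(X, y, alpha):
    """Closed-form ridge solution w = (X^T X + alpha * I)^-1 X^T y (hypothetical helper)."""
    return np.linalg.solve(X.T @ X + alpha * np.identity(X.shape[1]), X.T @ y)


def cv_mse(X, y, alpha, k=5, seed=0):
    """Mean squared error on the held-out fold, averaged over k folds (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    errors = []
    for fold in folds:
        # Train on every sample that is not in the current fold.
        train = np.setdiff1d(np.arange(len(y)), fold)
        w = ridge_fit(X[train], y[train], alpha)
        errors.append(np.mean((X[fold] @ w - y[fold]) ** 2))
    return float(np.mean(errors))


# Score each alpha on held-out data and pick the one with the lowest error.
cv_scores = {alpha: cv_mse(X, y, alpha) for alpha in [1, 10, 100]}
best_alpha = min(cv_scores, key=cv_scores.get)
```

With only 15 samples each fold holds 3 points, so the scores are noisy. Repeating the split with several seeds would give a steadier comparison.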


## 2. Lasso regression, based on the Ridge regression example

Implement the SGD method and compare its results with those of the sklearn Lasso regression.

In [109]:
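def sgd(coeffs, x, y, epochs, rate, l1):
    # Squared column norms scale the gradient of each coefficient.
    norm = np.linalg.norm(x, axis=0)
    # coeffs holds [bias, weight], the same order in which it is written back below.
    b, w = coeffs[0], coeffs[1]
    # Column 0 of x is the bias column of ones; column 1 is the feature.
    x_in = x[:, 1].reshape(-1, 1)
    for _ in range(epochs):
        y_pred = x_in * w + b
        # Subgradient of the L1 penalty: +l1 when w > 0, -l1 otherwise.
        sign = 1 if w > 0 else -1
        dW = (-2 * x_in.T.dot(y - y_pred) + sign * l1) / norm[1] ** 2
        # Note: `//` is floor division, so the bias gradient is rounded down to a whole number.
        db = -2 * np.sum(y - y_pred) // norm[0] ** 2
        w = w - rate * dW
        b = b - rate * db
    coeffs[0] = b
    coeffs[1] = w
    return coeffs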

In [110]:
x = np.array([188, 181, 197, 168, 167, 187, 178, 194, 140, 176, 168, 192, 173, 142, 176]).reshape(-1, 1)
y = np.array([141, 106, 149, 59, 79, 136, 65, 136, 52, 87, 115, 140, 82, 69, 121]).reshape(-1, 1)

x = np.asmatrix(np.c_[np.ones((15,1)),x])

I = np.identity(2)
alpha = 0.1
init_c = np.zeros((2,1))
results = []

w2 = sgd(init_c, x, y, 1600, 0.1, alpha)
w2 = w2.ravel()
results.append(w2)

w1 = np.linalg.inv(x.T * x + alpha * I) * x.T * y
w1 = w1.ravel()
w1 = np.squeeze(np.asarray(w1))
results.append(w1)

results = np.asarray(results).flatten().reshape(2,2)
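# Row order follows the order of the appends above: SGD first, then the closed-form ridge solution.
table = pd.DataFrame(data=results, index=["sgd", "closed-form ridge"], columns=["w1", "w2"])
table

| | w1 | w2 |
|---|---|---|
| sgd | -101.9 | 1.169511 |
| closed-form ridge | -101.723971 | 1.169788 |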
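The exercise asks for a comparison with sklearn, so a reference fit with `sklearn.linear_model.Lasso` on the same data could look like the sketch below. Two caveats are worth stating. sklearn minimizes $\frac{1}{2n}\lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_1$, so its `alpha` is scaled differently from the `l1` argument of `sgd`. It also fits the intercept separately without penalizing it. For both reasons the numbers need not match the table above. The names `sk_bias` and `sk_weight` are illustrative.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Same data as above; sklearn expects a 2-D feature array and a 1-D target.
x = np.array([188, 181, 197, 168, 167, 187, 178, 194, 140, 176, 168, 192, 173, 142, 176], dtype=float).reshape(-1, 1)
y = np.array([141, 106, 149, 59, 79, 136, 65, 136, 52, 87, 115, 140, 82, 69, 121], dtype=float)

# fit_intercept=True (the default) estimates an unpenalized bias,
# so no column of ones is needed here.
lasso = Lasso(alpha=0.1, max_iter=10_000)
lasso.fit(x, y)

# Bias and weight, in the same order as the w1 and w2 columns of the table.
sk_bias, sk_weight = lasso.intercept_, lasso.coef_[0]
```

Appending `[sk_bias, sk_weight]` to `results` would add the sklearn fit as a third row of the comparison table.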


## 3. Extend the Fisher's classifier

Extract the targets from the ``iris_data`` variable and use them as $y$.

In [111]:
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

iris_data = load_iris()
iris_df = pd.DataFrame(iris_data.data,columns=iris_data.feature_names)
iris_target_df = pd.DataFrame(iris_data.target)

x = iris_df[['sepal width (cm)', 'sepal length (cm)']].values
y = iris_target_df.values

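dataset_size = np.size(y)
# Per-feature means (axis=0), so the two features are not averaged together.
mean_x, mean_y = np.mean(x, axis=0), np.mean(y)

# Scatter of the features with the target (2x1) and with each other (2x2).
SS_xy = x.T @ y - dataset_size * mean_y * mean_x.reshape(-1, 1)
SS_xx = x.T @ x - dataset_size * np.outer(mean_x, mean_x)

# Solve SS_xx @ a = SS_xy: the two-feature analogue of a = SS_xy / SS_xx.
a = np.linalg.solve(SS_xx, SS_xy)
b = mean_y - mean_x @ a

y_pred = x @ a + b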
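The fit above produces a continuous value for each sample, while the target is a class label. One possible way to turn that output into a classifier is to round each prediction to the nearest label, as sketched below. The rounding rule and the use of `np.linalg.lstsq` for the two-feature least-squares fit are assumptions made for illustration, not a method taken from the original notebook.

```python
import numpy as np
from sklearn.datasets import load_iris

iris_data = load_iris()
# Columns 1 and 0 are 'sepal width (cm)' and 'sepal length (cm)'.
x = iris_data.data[:, [1, 0]]
y = iris_data.target.astype(float)

# Least-squares fit y ~ x @ a + b, with the bias handled by a column of ones.
X = np.c_[x, np.ones(len(y))]
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
a, b = coef[:2], coef[2]

# Map the continuous output to the nearest class label in {0, 1, 2}.
y_pred = x @ a + b
labels = np.clip(np.rint(y_pred), 0, 2).astype(int)

# Fraction of samples whose rounded prediction equals the true class.
accuracy = np.mean(labels == iris_data.target)
```

Rounding treats the three classes as ordered points on a line, which is a strong assumption. A one-vs-rest fit per class would avoid it at the cost of three separate regressions.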