# Exercises

There are three exercises in this notebook:

1. Use cross-validation to evaluate the linear regression with at least three different $\alpha$ values.
2. Implement an SGD method that trains the Lasso regression for 10 epochs.
3. Extend Fisher's classifier to work with two features. Use the class as $y$.

## 1. Cross-validation linear regression

Change the variable ``alpha`` into a list of alpha values, loop over that list, and compare the results.

In [1]:
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge

In [2]:
x = np.array([188, 181, 197, 168, 167, 187, 178, 194, 140, 176, 168, 192, 173, 142, 176]).reshape(-1, 1)
y = np.array([141, 106, 149, 59, 79, 136, 65, 136, 52, 87, 115, 140, 82, 69, 121]).reshape(-1, 1)

# Prepend a column of ones so that w0 acts as the intercept.
x = np.asmatrix(np.c_[np.ones((15, 1)), x])

I = np.identity(2)
# The ridge penalty assumes alpha >= 0; negative values can make
# x.T * x + alpha * I nearly singular.
alphas_list = [-1, -0.5, -0.1, 0, 0.1, 0.5, 1]

results = []
for alpha in alphas_list:
    # Closed-form ridge solution; `*` is matrix multiplication for np.matrix.
    w = np.linalg.inv(x.T * x + alpha * I) * x.T * y
    w = w.ravel()

    # Same model with scikit-learn; the intercept is already in the ones column.
    ridge = Ridge(alpha=alpha, fit_intercept=False).fit(X=x, y=y)
    results.append([alpha, w.item(0), w.item(1), ridge.coef_.item(0), ridge.coef_.item(1)])

df = pd.DataFrame(results, columns=["alpha", "w0 scratch", "w1 scratch", "w0 sklearn", "w1 sklearn"])
df

| | alpha | w0 scratch | w1 scratch | w0 sklearn | w1 sklearn |
|---|---|---|---|---|---|
| 0 | -1.0 | 26.667097 | 0.442962 | 26.667097 | 0.442962 |
| 1 | -0.5 | 62.54823 | 0.239837 | 62.54823 | 0.239837 |
| 2 | -0.1 | -817.017374 | 5.219094 | -817.017374 | 5.219094 |
| 3 | 0.0 | -180.924018 | 1.618142 | -180.924018 | 1.618142 |
| 4 | 0.1 | -101.723971 | 1.169788 | -101.723971 | 1.169788 |
| 5 | 0.5 | -36.97522 | 0.803242 | -36.97522 | 0.803242 |
| 6 | 1.0 | -20.590447 | 0.710486 | -20.590447 | 0.710486 |
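
The comparison above fits every alpha on the full data set. A minimal sketch of how the alphas could instead be scored with k-fold cross-validation is given here. It assumes scikit-learn's `KFold` and `cross_val_score`, five shuffled folds, and a fixed `random_state`; none of these choices come from the exercise text. It also lets `Ridge` fit the intercept itself instead of using the explicit column of ones.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score

x = np.array([188, 181, 197, 168, 167, 187, 178, 194, 140, 176, 168, 192, 173, 142, 176]).reshape(-1, 1)
y = np.array([141, 106, 149, 59, 79, 136, 65, 136, 52, 87, 115, 140, 82, 69, 121])

# Assumed candidate values; the exercise only asks for at least three.
alphas_list = [0.1, 0.5, 1.0]

# Assumption: 5 shuffled folds with a fixed seed, so the split is reproducible.
cv = KFold(n_splits=5, shuffle=True, random_state=0)

cv_mse = {}
for alpha in alphas_list:
    # scikit-learn reports the negated MSE, so flip the sign back.
    scores = cross_val_score(Ridge(alpha=alpha), x, y, cv=cv, scoring="neg_mean_squared_error")
    cv_mse[alpha] = -scores.mean()

# The alpha with the lowest mean validation error.
best_alpha = min(cv_mse, key=cv_mse.get)
print(cv_mse)
print("best alpha:", best_alpha)
```

With only 15 samples each validation fold holds three points, so the scores are noisy; treat the selected alpha as indicative only.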


## 2. Implement the Lasso regression based on the Ridge regression example

Please implement the SGD method and compare its results with those of the sklearn Lasso regression.

In [3]:
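def sgd(x, y, alpha, epoch):
    """Gradient-descent fit of y = x * weights + bias.

    Each update uses the full data set and the squared-error loss only:
    no L1 penalty is applied, and ``alpha`` acts as the step size.
    """
    rows, cols = x.shape
    weights = np.ones(cols)
    bias = 1

    for i in range(epoch):
        prediction = x * weights + bias
        delta = y - prediction  # residuals, shape (rows, 1)

        # Squared-error gradient for the weights, scaled by ||x||^2.
        gradient_weights = -2 * sum(x * delta) / (np.linalg.norm(x) ** 2)
        # Mean-squared-error gradient for the bias.
        gradient_bias = -(2 / y.size) * sum(delta)

        weights = weights - alpha * gradient_weights
        bias = bias - alpha * gradient_bias

    return bias, weights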

In [4]:
from sklearn.linear_model import Lasso

x = np.array([188, 181, 197, 168, 167, 187, 178, 194, 140, 176, 168, 192, 173, 142, 176]).reshape(-1, 1)
y = np.array([141, 106, 149, 59, 79, 136, 65, 136, 52, 87, 115, 140, 82, 69, 121]).reshape(-1, 1)

alpha = 0.1

lasso_regression = Lasso(alpha=alpha)
lasso_regression.fit(X=x, y=y)
print(f"Sklearn intercept: [{lasso_regression.intercept_[0]}] coef: [{lasso_regression.coef_[0]}]")

# Run the from-scratch solver for an increasing number of epochs.
for epochs in (10, 1000, 5000, 10000):
    bias, weights = sgd(x, y, alpha, epochs)
    print(f"SGD {epochs} epoch intercept: [{bias[0]}] coef: [{weights[0]}]")

Sklearn intercept: [-180.85790859980537] coef: [1.6177649901016677]
SGD 10 epoch intercept: [-36.617890183861505] coef: [0.8002520152421093]
SGD 1000 epoch intercept: [-119.3377951917765] coef: [1.2679979955681466]
SGD 5000 epoch intercept: [-178.93760641987433] coef: [1.606848856303508]
SGD 10000 epoch intercept: [-180.89686575615042] coef: [1.617988098170569]
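
The `sgd` function above updates on the whole data set and has no L1 term, so it converges towards the unpenalized least-squares fit. A variant with per-sample updates and an L1 penalty could be sketched as follows. The names `lasso_sgd` and `soft_threshold`, the decaying step size, the feature standardization, and the choice of 200 epochs are all assumptions made for this illustration and are not part of the exercise. The L1 term is handled with a soft-thresholding (proximal) step after each gradient update.

```python
import numpy as np
from sklearn.linear_model import Lasso


def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold (proximal step for the L1 penalty)."""
    return np.sign(value) * np.maximum(np.abs(value) - threshold, 0.0)


def lasso_sgd(x, y, alpha, epochs, lr0=0.1, seed=0):
    """Hypothetical per-sample SGD for 0.5 * (y - x @ w - b)^2 + alpha * |w|_1."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = x.shape
    weights = np.zeros(n_features)
    bias = 0.0
    for epoch in range(epochs):
        lr = lr0 / (1 + epoch)  # assumed decaying step size
        for i in rng.permutation(n_samples):  # one sample per update
            error = x[i] @ weights + bias - y[i]
            # Gradient step on the squared error, then shrink for the L1 term.
            weights = soft_threshold(weights - lr * error * x[i], lr * alpha)
            bias -= lr * error  # the bias is not penalized
    return bias, weights


x = np.array([188, 181, 197, 168, 167, 187, 178, 194, 140, 176, 168, 192, 173, 142, 176], dtype=float).reshape(-1, 1)
y = np.array([141, 106, 149, 59, 79, 136, 65, 136, 52, 87, 115, 140, 82, 69, 121], dtype=float)

# Standardize the feature so a single step size works for weight and bias.
x_std = (x - x.mean(axis=0)) / x.std(axis=0)

alpha = 0.1
bias, weights = lasso_sgd(x_std, y, alpha, epochs=200)
reference = Lasso(alpha=alpha).fit(x_std, y)

print("SGD     intercept:", bias, "coef:", weights[0])
print("sklearn intercept:", reference.intercept_, "coef:", reference.coef_[0])
```

Because the feature is standardized here, the coefficients are on a different scale from the ones printed above and should only be compared with the `reference` model fitted on the same standardized data.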


## 3. Extend Fisher's classifier

Please take the targets from the ``iris_data`` variable and use them as $y$.

In [5]:
import pandas as pd
import numpy as np
from sklearn.datasets import load_iris

iris_data = load_iris()
iris_df = pd.DataFrame(iris_data.data, columns=iris_data.feature_names)
iris_df['target'] = iris_data.target

x = iris_df[['sepal width (cm)', 'sepal length (cm)']].values
y = iris_df['target'].values.reshape(-1, 1)

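# np.size counts every element of x (rows * columns), so both features are pooled.
dataset_size = np.size(x)

# Means over all pooled feature values and over the class labels.
mean_x, mean_y = np.mean(x), np.mean(y)

# Pooled sums of squares; y (150, 1) broadcasts against x (150, 2).
SS_xy = np.sum(y * x) - dataset_size * mean_y * mean_x
SS_xx = np.sum(x * x) - dataset_size * mean_x * mean_x

# One slope and one intercept shared by both feature columns.
a = SS_xy / SS_xx
b = mean_y - a * mean_x

# One prediction per feature column, shape (150, 2).
y_pred = a * x + b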

df = pd.DataFrame(y_pred, columns=["", ""])
df

| | | |
|---|---|---|
| 0 | 0.924785 | 1.051418 |
| 1 | 0.885212 | 1.035589 |
| 2 | 0.901042 | 1.019760 |
| 3 | 0.893127 | 1.011845 |
| 4 | 0.932700 | 1.043504 |
| ... | ... | ... |
| 145 | 0.885212 | 1.178051 |
| 146 | 0.845640 | 1.146393 |
| 147 | 0.885212 | 1.162222 |
| 148 | 0.916871 | 1.138479 |
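
The cell above applies a single pooled slope to both feature columns. For comparison, a two-feature Fisher linear discriminant could be sketched as follows. It assumes the two-class form of the method, so only the first two iris classes (targets 0 and 1) are used, and it places the decision threshold at the midpoint of the projected class means; both choices are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris

iris_data = load_iris()

# Assumption: restrict to classes 0 and 1 for the two-class discriminant.
mask = iris_data.target < 2
# Columns 1 and 0 are sepal width and sepal length, as in the cell above.
x = iris_data.data[mask][:, [1, 0]]
y = iris_data.target[mask]

x0, x1 = x[y == 0], x[y == 1]
m0, m1 = x0.mean(axis=0), x1.mean(axis=0)

# Within-class scatter matrix: sum of the per-class scatter matrices.
s_w = (x0 - m0).T @ (x0 - m0) + (x1 - m1).T @ (x1 - m1)

# Fisher direction: w is proportional to S_w^{-1} (m1 - m0).
w = np.linalg.solve(s_w, m1 - m0)

# Assumed threshold: midpoint between the projected class means.
threshold = w @ (m0 + m1) / 2

y_pred = (x @ w > threshold).astype(int)
accuracy = (y_pred == y).mean()
print("direction:", w)
print("training accuracy:", accuracy)
```

Handling all three iris classes would need a multi-class extension, which this sketch does not cover.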
