# Exercises

There are three exercises in this notebook:

1. Use cross-validation to test the linear (ridge) regression with at least three different $\alpha$ values.
2. Implement an SGD method that trains a Lasso regression model for 10 epochs.
3. Extend Fisher's classifier to work with two features, using the class label as $y$.

## 1. Cross-validation linear regression

Change the variable ``alpha`` into a list of alpha values, loop over them, and compare the results.

In [175]:
import numpy as np
import pandas as pd

x = np.array([188, 181, 197, 168, 167, 187, 178, 194, 140, 176, 168, 192, 173, 142, 176]).reshape(-1, 1)
y = np.array([141, 106, 149, 59, 79, 136, 65, 136, 52, 87, 115, 140, 82, 69, 121]).reshape(-1, 1)

X = np.c_[np.ones((15, 1)), x]  # design matrix: a column of ones (intercept), then the feature
I = np.identity(2)

alphas = [round(val * 0.1, 1) for val in range(-10, 11)]  # -1.0, -0.9, ..., 1.0 (only alpha >= 0 is a valid ridge penalty)

rows = []
for alpha in alphas:
    # Closed-form ridge solution: w = (X^T X + alpha*I)^(-1) X^T y.
    # X^T X + alpha*I becomes singular near alpha = -0.13, which is why the
    # coefficients blow up and flip sign between alpha = -0.2 and alpha = -0.1.
    w = np.linalg.inv(X.T @ X + alpha * I) @ X.T @ y
    rows.append(w.ravel())

final_result = pd.DataFrame(data=np.array(rows), index=alphas, columns=["w1", "w2"])  # w1: intercept, w2: slope
final_result


| alpha | w1 | w2 |
|---:|---:|---:|
| -1.0 | 26.667097 | 0.442962 |
| -0.9 | 30.122934 | 0.423398 |
| -0.8 | 34.607973 | 0.398008 |
| -0.7 | 40.662425 | 0.363733 |
| -0.6 | 49.284735 | 0.314922 |
| -0.5 | 62.54823 | 0.239837 |
| -0.4 | 85.580223 | 0.109451 |
| -0.3 | 135.462841 | -0.172936 |
| -0.2 | 324.767138 | -1.244596 |
| -0.1 | -817.017374 | 5.219094 |
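The cell above fits the ridge weights on all 15 samples. It therefore compares coefficients but does not yet cross-validate anything.

Below is a minimal K-fold sketch in NumPy. It rests on these assumptions:

- contiguous folds without shuffling, with k = 5;
- mean squared error on the held-out fold as the score;
- only non-negative alphas.

`ridge_fit` and `kfold_cv_mse` are hypothetical helper names. scikit-learn's `KFold` and `cross_val_score` in `sklearn.model_selection` provide the same loop ready-made.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge weights (X^T X + alpha*I)^(-1) X^T y, via a linear solve."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.identity(n_features), X.T @ y)

def kfold_cv_mse(X, y, alpha, k=5):
    """Mean validation MSE over k contiguous folds (assumption: no shuffling)."""
    indices = np.arange(len(y))
    errors = []
    for val_idx in np.array_split(indices, k):
        train_idx = np.setdiff1d(indices, val_idx)       # everything outside the fold
        w = ridge_fit(X[train_idx], y[train_idx], alpha)
        residual = y[val_idx] - X[val_idx] @ w           # errors on the held-out fold
        errors.append(np.mean(residual ** 2))
    return float(np.mean(errors))

x = np.array([188, 181, 197, 168, 167, 187, 178, 194, 140, 176, 168, 192, 173, 142, 176], dtype=float)
y = np.array([141, 106, 149, 59, 79, 136, 65, 136, 52, 87, 115, 140, 82, 69, 121], dtype=float)
X = np.c_[np.ones_like(x), x]   # intercept column, then the feature

alphas = [0.0, 0.1, 1.0, 10.0]  # non-negative penalties only
cv_scores = {alpha: kfold_cv_mse(X, y, alpha, k=5) for alpha in alphas}
for alpha, mse in cv_scores.items():
    print(f"alpha={alpha:>5}: mean validation MSE = {mse:.2f}")

best_alpha = min(cv_scores, key=cv_scores.get)   # alpha with the lowest validation error
print("best alpha:", best_alpha)
```

Using `np.linalg.solve` instead of an explicit inverse gives the same weights with better numerical behaviour.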


## 2. Implement Lasso regression, based on the Ridge regression example

Please implement the SGD method and compare its results with those of the sklearn Lasso regression.

In [176]:
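import numpy as np

MAX_ITER = 700

def sgd(x, y, alpha):
    # Runs MAX_ITER steps of (sub)gradient descent on ||y - (w*x1 + b)||^2 + alpha*|w|.
    # Every step uses all samples (full batch), and alpha serves both as the
    # L1 penalty and as the step size. Returns [b, w].
    x = np.asarray(x)
    col_norms = np.linalg.norm(x, axis=0)  # per-column norms, used to scale each step
    x_feat = x[:, 1].reshape(-1, 1)        # the feature column (column 0 holds the ones)
    w = 1.0
    b = 1.0

    for _ in range(MAX_ITER):
        y_pred = x_feat * w + b
        delta = y - y_pred

        # Subgradient of alpha*|w|: +alpha for w > 0, -alpha otherwise
        l1_grad = alpha if w > 0 else -alpha
        dW = (-2 * (x_feat.T @ delta).item() + l1_grad) / col_norms[1] ** 2
        db = (-2 * np.sum(delta)) / col_norms[0] ** 2   # the intercept is not penalized

        w = w - alpha * dW
        b = b - alpha * db

    return np.array([b, w])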
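The `sgd` function above takes 700 steps that each use all 15 samples. It is therefore full-batch (sub)gradient descent, and it reuses `alpha` as the step size. Exercise 2 asks for SGD over 10 epochs, so below is a minimal per-sample sketch.

It is an illustration under assumptions, not the notebook's method. The following are all choices made here:

- the name `sgd_lasso`;
- the learning rate `lr`;
- standardizing the feature;
- splitting the L1 term evenly over the samples;
- the fixed `seed`.

Because the feature is standardized, the penalty acts on the standardized slope. On the original scale this corresponds to $\alpha\sigma|w|$.

```python
import numpy as np

def sgd_lasso(x, y, alpha, lr=0.05, epochs=10, seed=0):
    """Per-sample SGD on  sum_i (y_i - w*z_i - b)^2 + alpha*|w|,  where z is the
    standardized feature and the intercept b is not penalized.

    Returns (b, w) mapped back to the original feature scale.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    n = len(x)
    mu, sigma = x.mean(), x.std()
    z = (x - mu) / sigma            # standardized feature, so one lr suits both parameters
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for i in rng.permutation(n):        # one epoch = one pass over the samples in random order
            err = y[i] - (w * z[i] + b)
            # Per-sample subgradient; the L1 term is split evenly across the n samples
            w_grad = -2 * err * z[i] + (alpha / n) * np.sign(w)
            b_grad = -2 * err
            w -= lr * w_grad
            b -= lr * b_grad
    # Undo the standardization: w*z + b = (w/sigma)*x + (b - w*mu/sigma)
    return b - w * mu / sigma, w / sigma

x = np.array([188, 181, 197, 168, 167, 187, 178, 194, 140, 176, 168, 192, 173, 142, 176])
y = np.array([141, 106, 149, 59, 79, 136, 65, 136, 52, 87, 115, 140, 82, 69, 121])
b, w = sgd_lasso(x, y, alpha=0.1, epochs=10)
print(f"b = {b:.3f}, w = {w:.3f}")
```

Standardizing is what lets a single learning rate work for both parameters. On the raw scale, each per-sample slope gradient is multiplied by an $x_i$ around 170, while the intercept gradient is not.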

In [177]:
import numpy as np
import pandas as pd

x = np.array([188, 181, 197, 168, 167, 187, 178, 194, 140, 176, 168, 192, 173, 142, 176]).reshape(-1, 1)
y = np.array([141, 106, 149, 59, 79, 136, 65, 136, 52, 87, 115, 140, 82, 69, 121]).reshape(-1, 1)

X = np.c_[np.ones((15, 1)), x]  # design matrix: a column of ones (intercept), then the feature
I = np.identity(2)
alpha = 0.1

sgd_result = sgd(X, y, alpha).ravel()  # [b, w] from gradient descent

closed_form = (np.linalg.inv(X.T @ X + alpha * I) @ X.T @ y).ravel()  # closed-form *ridge* solution; Lasso has no general closed form

final_result = pd.DataFrame(data=[sgd_result, closed_form], index=["sgd", "lasso"], columns=["b", "w"])  # the "lasso" row holds the ridge closed form
final_result

|  | b | w |
|---|---:|---:|
| sgd | -101.244801 | 1.165131 |
| lasso | -101.723971 | 1.169788 |
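The row labelled `lasso` above comes from the closed-form ridge formula, so it is not a Lasso reference. Consider a single feature with an unpenalized intercept. The objective that `sgd` descends, $\sum_i (y_i - w x_i - b)^2 + \alpha |w|$, can then be minimized exactly.

Define the centred sums $S_{xy} = \sum_i (x_i - \bar x)(y_i - \bar y)$ and $S_{xx} = \sum_i (x_i - \bar x)^2$. The slope is then the soft-thresholded least-squares slope:

$$w = \frac{\operatorname{sign}(S_{xy})\,\max\left(|S_{xy}| - \alpha/2,\ 0\right)}{S_{xx}}, \qquad b = \bar y - w \bar x.$$

The sketch below computes this; `lasso_1d_exact` is a hypothetical name. It gives a reference for checking whether the 700 gradient steps have converged.

For the scikit-learn comparison, note that `sklearn.linear_model.Lasso` minimizes $\frac{1}{2n}\lVert y - Xw\rVert_2^2 + \alpha_{\text{sk}}\lVert w\rVert_1$. The equivalent setting is therefore $\alpha_{\text{sk}} = \alpha / (2n)$.

```python
import numpy as np

def lasso_1d_exact(x, y, alpha):
    """Exact minimizer of sum_i (y_i - w*x_i - b)^2 + alpha*|w| for a single feature.

    The intercept b is not penalized, so b = mean(y) - w*mean(x).
    Returns np.array([b, w]), in the same order as the table above.
    """
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    xc, yc = x - x.mean(), y - y.mean()      # centred data
    s_xy, s_xx = xc @ yc, xc @ xc
    # Soft-thresholding: shrink |s_xy| by alpha/2 and clip at zero
    w = np.sign(s_xy) * max(abs(s_xy) - alpha / 2, 0.0) / s_xx
    b = y.mean() - w * x.mean()
    return np.array([b, w])

x = np.array([188, 181, 197, 168, 167, 187, 178, 194, 140, 176, 168, 192, 173, 142, 176])
y = np.array([141, 106, 149, 59, 79, 136, 65, 136, 52, 87, 115, 140, 82, 69, 121])
print(lasso_1d_exact(x, y, alpha=0.1))
```

With `alpha=0` this reduces to ordinary least squares. A large enough `alpha` sets the slope exactly to zero, which ridge never does.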


## 3. Extend Fisher's classifier

Please use the targets (class labels) of the ``iris_data`` variable as $y$.

In [178]:
import pandas as pd
import numpy as np
from sklearn.datasets import load_iris

iris_data = load_iris()
iris_df = pd.DataFrame(iris_data.data, columns=iris_data.feature_names)

x = iris_df[['sepal width (cm)', 'sepal length (cm)']].values  # two features, shape (150, 2)
y = iris_data.target.reshape(-1, 1)  # class labels 0, 1, 2, shape (150, 1)

dataset_size = np.size(y)  # 150 rows, although x holds 300 values

mean_x, mean_y = np.mean(x), np.mean(y)  # a single mean pooled over both feature columns

SS_xy = np.sum(y * x) - dataset_size * mean_y * mean_x
SS_xx = np.sum(x * x) - dataset_size * mean_x * mean_x

a = SS_xy / SS_xx  # one slope shared by both features
b = mean_y - a * mean_x

y_pred = a * x + b  # the same line applied to each feature column, shape (150, 2)

final_result = pd.DataFrame(data=y_pred, columns=["first", "second"])
final_result

|  | first | second |
|---:|---:|---:|
| 0 | 0.813219 | 1.127687 |
| 1 | 0.714948 | 1.088379 |
| 2 | 0.754257 | 1.049070 |
| 3 | 0.734602 | 1.029416 |
| 4 | 0.832873 | 1.108033 |
| ... | ... | ... |
| 145 | 0.714948 | 1.442155 |
| 146 | 0.616677 | 1.363538 |
| 147 | 0.714948 | 1.402846 |
| 148 | 0.793565 | 1.343884 |
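The cell above fits one shared least-squares slope and applies it to each feature column, so it is a regression rather than Fisher's classifier. Below is a minimal sketch of Fisher's linear discriminant for two (or more) features. It works in three steps:

1. Build the within-class scatter $S_W$ and the between-class scatter $S_B$.
2. Take the leading eigenvectors of $S_W^{-1} S_B$ as projection directions.
3. Assign each sample to the class whose projected mean is nearest.

The nearest-projected-mean rule and the names `fisher_fit` and `fisher_predict` are assumptions made here. The hand-made toy data only checks the mechanics. scikit-learn's `sklearn.discriminant_analysis.LinearDiscriminantAnalysis` is a ready-made alternative.

```python
import numpy as np

def fisher_fit(X, y, n_components=1):
    """Fisher directions (leading eigenvectors of S_W^-1 S_B) and projected class means."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y).ravel()
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    d = X.shape[1]
    S_W = np.zeros((d, d))   # within-class scatter
    S_B = np.zeros((d, d))   # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        S_W += (Xc - mc).T @ (Xc - mc)
        diff = (mc - overall_mean).reshape(-1, 1)
        S_B += len(Xc) * (diff @ diff.T)
    # Directions that maximize between-class over within-class scatter
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(S_W, S_B))
    order = np.argsort(eigvals.real)[::-1]
    W = eigvecs[:, order[:n_components]].real
    projected_means = np.array([(X[y == c] @ W).mean(axis=0) for c in classes])
    return W, classes, projected_means

def fisher_predict(X, W, classes, projected_means):
    """Assign each sample to the class whose projected mean is nearest."""
    Z = np.asarray(X, dtype=float) @ W
    dists = ((Z[:, None, :] - projected_means[None, :, :]) ** 2).sum(axis=2)
    return classes[np.argmin(dists, axis=1)]

# Toy data: three well-separated groups of four points in two features
X_toy = np.array([[0, 0], [1, 0], [0, 1], [1, 1],
                  [5, 5], [6, 5], [5, 6], [6, 6],
                  [10, 0], [11, 0], [10, 1], [11, 1]], dtype=float)
y_toy = np.repeat([0, 1, 2], 4)

W, classes, means = fisher_fit(X_toy, y_toy, n_components=1)
print(fisher_predict(X_toy, W, classes, means))
```

You can apply it to the iris cell above in the same way. Call `fisher_fit(x, y, n_components=2)`, then compare `fisher_predict(x, W, classes, means)` with `y.ravel()`. With three classes there are at most two useful directions.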
