# Exercises

There are three exercises in this notebook:

1. Use cross-validation to test the (ridge-regularized) linear regression with at least three different $\alpha$ values.
2. Implement an SGD method that trains the Lasso regression for 10 epochs.
3. Extend Fisher's classifier to work with two features, using the class label as $y$.

## 1. Cross-validation linear regression

Change the variable ``alpha`` into a list of values, fit the model once per value in a loop, and compare the resulting weights.
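Each pass of the loop evaluates the ridge closed-form solution for the design matrix $X$ (a column of ones followed by the 15 values of $x$). Because ``I`` is the full identity, the intercept $w_1$ is penalized along with the slope $w_2$:

$$
w_\alpha = \arg\min_w \; \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_2^2 = \left(X^\top X + \alpha I\right)^{-1} X^\top y
$$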

In [45]:
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
```

In [46]:
```python
# 15 samples of a single feature x and a target y, as column vectors
x = np.array([188, 181, 197, 168, 167, 187, 178, 194, 140, 176, 168, 192, 173, 142, 176]).reshape(-1, 1)
y = np.array([141, 106, 149, 59, 79, 136, 65, 136, 52, 87, 115, 140, 82, 69, 121]).reshape(-1, 1)

# Prepend a column of ones so that the first weight is the intercept
X = np.c_[np.ones((x.shape[0], 1)), x]

I = np.identity(X.shape[1])
alpha = [0.001, 0.01, 0.1, 0.2, 0.3, 0.4, 0.5, 1]
results = []

# Ridge closed form for each alpha (I penalizes the intercept as well)
for a in alpha:
    w = np.linalg.solve(X.T @ X + a * I, X.T @ y)
    results.append(w.ravel())

# One row of (w1, w2) per alpha, so the weights can be compared side by side
df = pd.DataFrame(data=np.asarray(results),
                  index=[f"alpha = {a}" for a in alpha],
                  columns=["w1", "w2"])

print(df)
```

```
                       w1        w2
alpha = 0.001 -179.526286  1.610230
alpha = 0.01  -167.855340  1.544160
alpha = 0.1   -101.723971  1.169788
alpha = 0.2    -70.751422  0.994451
alpha = 0.3    -54.237043  0.900962
alpha = 0.4    -43.972861  0.842856
alpha = 0.5    -36.975220  0.803242
alpha = 1      -20.590447  0.710486
```
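The loop above fits every $\alpha$ on all 15 samples, so it compares weights rather than generalization. A minimal k-fold cross-validation sketch follows. It assumes 5 folds, one fixed random shuffle shared by every $\alpha$, and mean squared error on the held-out fold as the score; these are choices of this sketch, not of the exercise, and `ridge_fit` / `kfold_mse` are hypothetical helper names.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Ridge closed form; like the loop above, the intercept is penalized too."""
    return np.linalg.solve(X.T @ X + alpha * np.identity(X.shape[1]), X.T @ y)

def kfold_mse(X, y, alpha, k=5, seed=0):
    """Mean held-out MSE over k folds (same seed, so same folds for every alpha)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    errors = []
    for test_idx in np.array_split(idx, k):
        train_idx = np.setdiff1d(idx, test_idx)
        w = ridge_fit(X[train_idx], y[train_idx], alpha)
        residual = y[test_idx] - X[test_idx] @ w
        errors.append(np.mean(residual ** 2))
    return float(np.mean(errors))

x = np.array([188, 181, 197, 168, 167, 187, 178, 194, 140, 176, 168, 192, 173, 142, 176], dtype=float)
y = np.array([141, 106, 149, 59, 79, 136, 65, 136, 52, 87, 115, 140, 82, 69, 121], dtype=float)
X = np.c_[np.ones(len(x)), x]  # leading column of ones for the intercept

alphas = [0.001, 0.01, 0.1, 0.2, 0.3, 0.4, 0.5, 1]
cv_scores = {a: kfold_mse(X, y, a) for a in alphas}
for a, mse in cv_scores.items():
    print(f"alpha = {a}: CV MSE = {mse:.2f}")

best_alpha = min(cv_scores, key=cv_scores.get)
print("best alpha:", best_alpha)
```

The $\alpha$ with the lowest held-out error is the one cross-validation would pick; with only 15 samples the scores can shift noticeably with a different `seed`.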


## 2. Implement Lasso regression based on the Ridge regression example

Please implement the SGD method and compare its results with those of scikit-learn's ``Lasso`` regression.
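The ``sgd`` function below does not take gradient steps: it cycles over the coordinates and minimizes the Lasso objective exactly in one weight at a time. With $x_j$ the $j$-th column of $X$ and the intercept $w_0$ left unpenalized, the objective and the update the function applies are:

$$
\min_w \; \tfrac{1}{2} \lVert y - Xw \rVert_2^2 + \lambda \sum_{j \ge 1} \lvert w_j \rvert,
\qquad
r_j = x_j^\top \left( y - Xw + x_j w_j \right),
$$

$$
w_j \leftarrow \frac{S(r_j, \lambda)}{\lVert x_j \rVert_2^2} \quad (j \ge 1),
\qquad
w_0 \leftarrow \frac{r_0}{\lVert x_0 \rVert_2^2},
\qquad
S(r, \lambda) = \operatorname{sign}(r) \max\left(\lvert r \rvert - \lambda, 0\right).
$$

When comparing with scikit-learn's ``Lasso``, note that it minimizes $\frac{1}{2n} \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_1$ over $n$ samples with an unpenalized intercept, so its ``alpha`` corresponds to $\lambda / n$ here (about $0.0067$ for $\lambda = 0.1$ and $n = 15$).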

In [47]:
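```python
def sgd(arr_coef, x, y, lam, iterations):
    """Fit Lasso regression by cyclic coordinate descent.

    Despite its name, this is coordinate descent with soft-thresholding,
    not stochastic gradient descent. It minimizes
    0.5 * ||y - x @ w||^2 + lam * sum(|w[1:]|), assuming column 0 of x
    is the (unpenalized) intercept column of ones.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float).ravel()
    coef = np.asarray(arr_coef, dtype=float).ravel().copy()

    n = x.shape[1]
    norm_sq = np.sum(x ** 2, axis=0)  # squared column norms ||x_j||^2

    for _ in range(iterations):
        for j in range(n):
            x_j = x[:, j]
            # Correlation of x_j with the residual that leaves feature j out
            r = x_j @ (y - x @ coef + x_j * coef[j])

            if j == 0:
                # Intercept: plain least-squares update, no penalty
                coef[j] = r / norm_sq[j]
            elif r < -lam:
                coef[j] = (r + lam) / norm_sq[j]
            elif r > lam:
                coef[j] = (r - lam) / norm_sq[j]
            else:
                # |r| <= lam: soft-thresholding sets the weight exactly to zero
                coef[j] = 0.0

    return coef
```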

In [48]:
```python
x = np.array([188, 181, 197, 168, 167, 187, 178, 194, 140, 176, 168, 192, 173, 142, 176]).reshape(-1, 1)
y = np.array([141, 106, 149, 59, 79, 136, 65, 136, 52, 87, 115, 140, 82, 69, 121]).reshape(-1, 1)

# Prepend a column of ones so that w[0] is the intercept
x = np.asmatrix(np.c_[np.ones((x.shape[0], 1)), x])
n = x.shape[1]
alpha = 0.1  # L1 penalty strength
initial_coefficients = np.zeros((n, 1))

w = sgd(initial_coefficients, x, y, alpha, 1000)

# Slope first, then intercept
print(f"{w[1]}, {w[0]}")
```

```
1.6178194139934838, -180.86698943443562
```
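The exercise asks for SGD trained for 10 epochs, whereas the cell above runs 1000 sweeps of coordinate descent. A minimal proximal-SGD sketch is shown below. It assumes the features are standardized and $y$ is centered first (with raw values around 180 a fixed step size would either diverge or crawl), a step size of 0.05, one shuffled sample per update, and the $\lambda \lVert w \rVert_1$ term split evenly as $\lambda / n$ per sample. These are choices of this sketch, not of the original notebook; the penalty acts on the standardized weights, and after 10 epochs the result only approximates the coordinate-descent solution. `soft_threshold` and `lasso_sgd` are hypothetical names.

```python
import numpy as np

def soft_threshold(w, t):
    """Proximal operator of t * ||w||_1 (elementwise soft-thresholding)."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def lasso_sgd(x, y, lam, epochs=10, lr=0.05, seed=0):
    """Proximal SGD for 0.5*||y_c - z @ w||^2 + lam*||w||_1 on standardized z.

    The intercept is not learned: y is centered and the intercept is
    recovered in closed form afterwards. Returns (w, b) on the raw scale.
    """
    y = np.asarray(y, dtype=float).ravel()
    x = np.asarray(x, dtype=float).reshape(len(y), -1)
    n, d = x.shape
    x_mean, x_std = x.mean(axis=0), x.std(axis=0)
    y_mean = y.mean()
    z = (x - x_mean) / x_std
    y_c = y - y_mean

    rng = np.random.default_rng(seed)
    w = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):  # one shuffled pass over the data = one epoch
            residual = y_c[i] - z[i] @ w
            w = w + lr * residual * z[i]           # gradient step on the squared error of sample i
            w = soft_threshold(w, lr * lam / n)    # proximal step for this sample's share of the L1 term

    # Undo the standardization to get raw-scale slope(s) and intercept
    w_raw = w / x_std
    b = y_mean - x_mean @ w_raw
    return w_raw, b

x = np.array([188, 181, 197, 168, 167, 187, 178, 194, 140, 176, 168, 192, 173, 142, 176], dtype=float)
y = np.array([141, 106, 149, 59, 79, 136, 65, 136, 52, 87, 115, 140, 82, 69, 121], dtype=float)

w, b = lasso_sgd(x, y, lam=0.1, epochs=10)
print(f"{w[0]}, {b}")  # slope first, then intercept, as in the cell above
```

Standardizing keeps every per-sample step well scaled, which is what lets a constant step size get close to the optimum within 10 epochs.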


## 3. Extend Fisher's classifier

Please use the targets of the iris dataset (``iris.target``) as $y$.

In [49]:
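```python
iris = load_iris()

iris_df = pd.DataFrame(iris.data, columns=iris.feature_names)
iris_target_df = pd.DataFrame(iris.target)

# Two features, and the class label (0, 1, 2) as the target
x = iris_df[['sepal width (cm)', 'sepal length (cm)']].values
y = iris_target_df.values.ravel()
dataset_size = np.size(y)

# The single-feature formulas a = SS_xy / SS_xx and b = mean_y - a * mean_x
# need one mean per feature (np.mean(x) would pool both columns). With two
# features SS_xx becomes the 2x2 scatter matrix and a holds one weight per feature.
mean_x, mean_y = x.mean(axis=0), y.mean()
x_c = x - mean_x
SS_xy = x_c.T @ (y - mean_y)
SS_xx = x_c.T @ x_c

a = np.linalg.solve(SS_xx, SS_xy)  # shape (2,): one weight per feature
b = mean_y - a @ mean_x            # intercept

y_pred = x @ a + b
```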
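The cell above fits a linear regression of the class label on the two features. For comparison, a minimal sketch of Fisher's linear discriminant for two features is shown below. It assumes the multi-class form (project onto the leading eigenvector of $S_W^{-1} S_B$) and assigns each sample to the class whose projected mean is closest; the nearest-mean rule and the names `fisher_fit` / `fisher_predict` are choices of this sketch. To keep it runnable without scikit-learn it uses small synthetic data; with the iris cell above it would be called as `fisher_fit(x, y.ravel())`.

```python
import numpy as np

def fisher_fit(x, y):
    """Direction maximizing between-class over within-class scatter,
    plus each class mean projected onto that direction."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y).ravel()
    classes = np.unique(y)
    overall_mean = x.mean(axis=0)
    d = x.shape[1]
    s_w = np.zeros((d, d))  # within-class scatter S_W
    s_b = np.zeros((d, d))  # between-class scatter S_B
    for c in classes:
        x_c = x[y == c]
        m_c = x_c.mean(axis=0)
        s_w += (x_c - m_c).T @ (x_c - m_c)
        diff = (m_c - overall_mean).reshape(-1, 1)
        s_b += len(x_c) * (diff @ diff.T)
    # Leading eigenvector of S_W^{-1} S_B (its eigenvalues are real in theory)
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(s_w, s_b))
    w = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
    projected_means = np.array([x[y == c].mean(axis=0) @ w for c in classes])
    return w, classes, projected_means

def fisher_predict(x, w, classes, projected_means):
    """Assign each sample to the class with the nearest projected mean."""
    proj = np.asarray(x, dtype=float) @ w
    nearest = np.argmin(np.abs(proj[:, None] - projected_means[None, :]), axis=1)
    return classes[nearest]

# Synthetic two-feature data with three separated classes (illustration only)
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [3.0, 3.0], [6.0, 6.0]])
x_demo = np.vstack([rng.normal(c, 0.5, size=(50, 2)) for c in centers])
y_demo = np.repeat([0, 1, 2], 50)

w, classes, projected_means = fisher_fit(x_demo, y_demo)
accuracy = np.mean(fisher_predict(x_demo, w, classes, projected_means) == y_demo)
print("direction:", w, "training accuracy:", accuracy)
```

Unlike regressing on the label values 0, 1, 2, this projection does not assume the classes are ordered or evenly spaced; it only uses the class means and the within-class scatter.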