# Exercises

There are three exercises in this notebook:

1. Use cross-validation to evaluate the linear regression for at least three different $\alpha$ values.
2. Implement an SGD method that trains the Lasso regression for 10 epochs.
3. Extend Fisher's classifier to work with two features. Use the class as $y$.

In [1]:
import numpy as np

In [2]:
%load_ext autoreload
%autoreload 2

## 1. Cross-validation linear regression

Change the variable ``alpha`` into a list of alphas, fit one model per value in a loop, and compare the results.
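
For reference, the closed-form ridge solution that the cell below evaluates for each $\alpha$ is

$$
w = (X^\top X + \alpha I)^{-1} X^\top y,
$$

where $X$ is the input with a leading column of ones. Because $I$ is the full $2 \times 2$ identity, the intercept is penalized together with the slope.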

In [31]:
x = np.array(
    [188, 181, 197, 168, 167, 187, 178, 194, 140, 176, 168, 192, 173, 142, 176]
).reshape(-1, 1)
y = np.array(
    [141, 106, 149, 59, 79, 136, 65, 136, 52, 87, 115, 140, 82, 69, 121]
).reshape(-1, 1)

# Prepend a column of ones so the first weight acts as the intercept
x = np.asmatrix(np.c_[np.ones((15, 1)), x])

I = np.identity(2)
alphas = [0.1, 0.15, 0.02]

# Fit one closed-form ridge model per alpha: w = (X^T X + alpha * I)^-1 X^T y
models = []
for alpha in alphas:
    w = np.linalg.inv(x.T * x + alpha * I) * x.T * y
    models.append(w.ravel())

# Compare the models by their mean squared error on the training data
y_array = np.asarray(y)
for alpha, model in zip(alphas, models):
    _y = (model @ x.T).T
    _y_array = np.asarray(_y)
    mse = ((_y_array - y_array) ** 2).mean()
    print(f"alpha={alpha}\tMSE={mse}")

alpha=0.1	MSE=426.04507708317476
alpha=0.15	MSE=453.6802014401018
alpha=0.02	MSE=377.4196885131722
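
The loop above scores each model on the same 15 points it was fitted on. A minimal sketch of how a k-fold cross-validation could replace that comparison, assuming 5 shuffled folds and a fixed seed; the helper names `ridge_fit` and `cross_val_mse` are hypothetical and not part of the notebook:

```python
import numpy as np

heights = np.array(
    [188, 181, 197, 168, 167, 187, 178, 194, 140, 176, 168, 192, 173, 142, 176],
    dtype=float,
)
targets = np.array(
    [141, 106, 149, 59, 79, 136, 65, 136, 52, 87, 115, 140, 82, 69, 121],
    dtype=float,
)
X = np.c_[np.ones(len(heights)), heights]  # intercept column + feature


def ridge_fit(X, y, alpha):
    """Closed-form ridge weights; as in the cell above, the intercept is penalized too."""
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)


def cross_val_mse(X, y, alpha, k=5, seed=0):
    """Mean held-out MSE over k shuffled folds (k and seed are assumptions)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    scores = []
    for fold in folds:
        train = np.setdiff1d(np.arange(len(y)), fold)  # indices not in this fold
        w = ridge_fit(X[train], y[train], alpha)
        scores.append(np.mean((X[fold] @ w - y[fold]) ** 2))
    return float(np.mean(scores))


cv_scores = {alpha: cross_val_mse(X, targets, alpha) for alpha in [0.02, 0.1, 0.15]}
for alpha, score in cv_scores.items():
    print(f"alpha={alpha}\tCV MSE={score:.2f}")
```

Every alpha is scored on the same folds because the seed is fixed, so differences between the scores come from the penalty rather than from the split.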


## 2. Lasso regression based on the Ridge regression example

Implement the SGD method and compare its results with those of the sklearn Lasso regression.

In [None]:
import numpy as np
from sklearn.preprocessing import StandardScaler

x = np.array(
    [188, 181, 197, 168, 167, 187, 178, 194, 140, 176, 168, 192, 173, 142, 176]
).reshape(-1, 1)
y = np.array(
    [141, 106, 149, 59, 79, 136, 65, 136, 52, 87, 115, 140, 82, 69, 121]
).reshape(-1, 1)

# Add intercept column
x_new = np.c_[np.ones((15, 1)), x]

scaler = StandardScaler()
x_new[:, 1:] = scaler.fit_transform(x_new[:, 1:])


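def sgd(x, y, alpha, epochs):
    """Per-sample SGD on the squared error; ``alpha`` is the learning rate.

    Note: the update below contains no L1 penalty term, so it minimizes
    the unregularized squared error.
    """
    rows, cols = x.shape
    weights = np.zeros((cols, 1))  # Initialize weights to zeros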
    for epoch in range(epochs):
        # Shuffle the data to make it stochastic
        indices = np.arange(rows)
        np.random.shuffle(indices)
        x_shuffled = x[indices]
        y_shuffled = y[indices]
        for i in range(rows):
            xi = x_shuffled[i : i + 1]  # Ensuring 2D array structure
            yi = y_shuffled[i, 0]  # Extracting single target value
            prediction = xi.dot(weights)
            error = yi - prediction
            gradient = -2 * xi.T.dot(error)  # Gradient calculation per sample
            weights -= alpha * gradient  # Update the weights

    return weights


weights = sgd(x_new, y, alpha=1e-2, epochs=10000)

print(weights)

[[102.52635232]
 [ 26.30569287]]


In [33]:
from sklearn.linear_model import Lasso

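lasso_regressor = Lasso(alpha=1.0)
# x_new still contains the column of ones; Lasso fits its own intercept,
# so that constant column receives a zero coefficient
lasso_regressor.fit(np.asarray(x_new), np.asarray(y))
print(lasso_regressor.coef_, lasso_regressor.intercept_)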

[ 0.         25.33725365] [102.46666667]
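
The `sgd` function above updates the weights with the squared-error gradient only. A minimal sketch of a Lasso variant, assuming the sklearn objective $\frac{1}{2n}\lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_1$, an unpenalized intercept, and a decaying learning rate; the names `lasso_sgd`, `l1_strength`, `lr0` and `decay` are illustrative choices, not part of the notebook:

```python
import numpy as np

heights = np.array(
    [188, 181, 197, 168, 167, 187, 178, 194, 140, 176, 168, 192, 173, 142, 176],
    dtype=float,
)
targets = np.array(
    [141, 106, 149, 59, 79, 136, 65, 136, 52, 87, 115, 140, 82, 69, 121],
    dtype=float,
)
# Same scaling as StandardScaler: zero mean, unit population standard deviation
z = (heights - heights.mean()) / heights.std()


def lasso_sgd(z, y, l1_strength=1.0, lr0=0.05, decay=0.01, epochs=200, seed=0):
    """Stochastic subgradient descent for a one-feature Lasso (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    coef, intercept = 0.0, 0.0
    step = 0
    for _ in range(epochs):
        for i in rng.permutation(len(y)):  # reshuffle the samples every epoch
            lr = lr0 / (1.0 + decay * step)  # assumed decay schedule to damp SGD noise
            residual = coef * z[i] + intercept - y[i]
            # Squared-error gradient plus the L1 subgradient (np.sign(0) is 0)
            coef -= lr * (residual * z[i] + l1_strength * np.sign(coef))
            intercept -= lr * residual  # the intercept is not penalized
            step += 1
    return coef, intercept


coef, intercept = lasso_sgd(z, targets)
print(coef, intercept)
```

With a single standardized feature, the exact Lasso coefficient is the least-squares coefficient shrunk toward zero by $\alpha$, so the result should land near the sklearn values printed above, up to SGD noise. Passing `epochs=10` matches the exercise statement but leaves the estimate noisier.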


## 3. Extend the Fisher's classifier

Please extend the targets of the ``iris_data`` variable and use it as the $y$.

In [34]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

iris_data = load_iris()
iris_df = pd.DataFrame(iris_data.data, columns=iris_data.feature_names)
iris_df["target"] = iris_data.target
iris_df.head()

independent_variables = [
    "sepal length (cm)",
    "sepal width (cm)",
    "petal length (cm)",
    "petal width (cm)",
]
dependent_variable = "target"
x = iris_df[independent_variables].values  # change here
y = iris_df[dependent_variable].values.reshape(-1, 1)  # change here

# np.size counts every element of x (rows * columns), so the statistics
# below pool all four feature columns into a single variable
dataset_size = np.size(x)

mean_x, mean_y = np.mean(x), np.mean(y)

# y has shape (150, 1) and broadcasts across the columns of x
SS_xy = np.sum(y * x) - dataset_size * mean_y * mean_x
SS_xx = np.sum(x * x) - dataset_size * mean_x * mean_x

# Slope and intercept of the simple least-squares line
a = SS_xy / SS_xx
b = mean_y - a * mean_x

y_pred = a * x + b
df = pd.DataFrame(y_pred)
df


|     | 0        | 1        | 2        | 3        |
|-----|----------|----------|----------|----------|
| 0   | 1.244804 | 1.005314 | 0.690983 | 0.511365 |
| 1   | 1.214867 | 0.930473 | 0.690983 | 0.511365 |
| 2   | 1.184931 | 0.960409 | 0.676015 | 0.511365 |
| 3   | 1.169963 | 0.945441 | 0.705951 | 0.511365 |
| 4   | 1.229836 | 1.020282 | 0.690983 | 0.511365 |
| ... | ...      | ...      | ...      | ...      |
| 145 | 1.484294 | 0.930473 | 1.259772 | 0.825696 |
| 146 | 1.424421 | 0.855632 | 1.229836 | 0.765824 |
| 147 | 1.454357 | 0.930473 | 1.259772 | 0.780792 |
| 148 | 1.409453 | 0.990346 | 1.289708 | 0.825696 |
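
The cell above applies one slope and intercept to every feature column. A minimal sketch of a two-feature, two-class Fisher discriminant is given here for comparison, assuming the usual formulation $w = S_W^{-1}(m_1 - m_0)$ with the threshold at the midpoint of the projected class means. The helpers `fisher_fit` and `fisher_predict` are hypothetical names, and the demo uses synthetic points rather than the iris data:

```python
import numpy as np


def fisher_fit(X, y):
    """Return (w, threshold) for a two-class Fisher discriminant; y holds 0/1 labels."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter: sum of the two per-class scatter matrices
    S_w = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    w = np.linalg.solve(S_w, m1 - m0)  # projection direction
    threshold = w @ (m0 + m1) / 2  # midpoint of the projected class means
    return w, threshold


def fisher_predict(X, w, threshold):
    """Assign class 1 to points whose projection exceeds the threshold."""
    return (X @ w > threshold).astype(int)


# Synthetic two-feature demo data (an assumption, not the iris measurements)
rng = np.random.default_rng(0)
X_demo = np.vstack(
    [
        rng.normal([1.5, 0.3], 0.2, size=(50, 2)),
        rng.normal([4.5, 1.4], 0.3, size=(50, 2)),
    ]
)
y_demo = np.repeat([0, 1], 50)

w, threshold = fisher_fit(X_demo, y_demo)
accuracy = (fisher_predict(X_demo, w, threshold) == y_demo).mean()
print(w, threshold, accuracy)
```

In the notebook, `X_demo` could be replaced by two columns of `iris_df` and `y_demo` by `iris_data.target` restricted to two of the three classes; handling all three classes at once would need a multi-class extension that this sketch does not cover.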
