<a href="https://colab.research.google.com/github/danielbauer1979/ML_656/blob/main/Module4_ExampleOfLassoInTelData.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# LASSO Example in Linear Regression Context

Here is a quick example of how to run a LASSO regression with a numerical outcome (the linear regression context), using the data from Assignment 1.

### Import Packages

In [15]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, Lasso
from sklearn.metrics import mean_squared_error

Note that I am importing `StandardScaler` to scale our data. The `normalize` option of the Lasso estimator has been deprecated (and removed in recent scikit-learn releases), so it is now necessary to scale the data manually as a preprocessing step.

## Data

Let's get the data and modify it in the same way we did in Assignment 1.

In [None]:
!git clone https://github.com/danielbauer1979/ML_656.git

In [None]:
data = pd.read_csv('ML_656/tel.csv')
# Build 0/1 dummy variables for the weekdays (Day == 1 is the omitted baseline)
data['Tuesday'] = (data['Day'] == 2).astype(int)
data['Wednesday'] = (data['Day'] == 3).astype(int)
data['Thursday'] = (data['Day'] == 4).astype(int)
data['Friday'] = (data['Day'] == 5).astype(int)
data = data.drop(columns=['Day'])
data.head()

In [8]:
X = data.drop(columns=['Hours'])
y = data['Hours']

As noted above, we need to scale (i.e., normalize) the data. Essentially, we center each column and divide it by its standard deviation so that all variables are on the same scale.

In [None]:
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
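To make the centering-and-dividing step concrete, here is a minimal sketch on a made-up toy matrix (`X_toy` is hypothetical, not the tel data). It checks that `StandardScaler` gives the same result as subtracting the column mean and dividing by the population standard deviation:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy matrix: two columns on very different scales
X_toy = np.array([[1.0, 100.0],
                  [2.0, 300.0],
                  [3.0, 500.0],
                  [4.0, 700.0]])

X_toy_scaled = StandardScaler().fit_transform(X_toy)

# Manual version: subtract the column mean, divide by the column
# standard deviation (population version, ddof=0, NumPy's default)
X_manual = (X_toy - X_toy.mean(axis=0)) / X_toy.std(axis=0)

print(np.allclose(X_toy_scaled, X_manual))
print(X_toy_scaled.mean(axis=0), X_toy_scaled.std(axis=0))
```

After scaling, every column has mean zero and unit standard deviation, so the LASSO penalty treats all coefficients on a comparable footing.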

It may be better to separate the variables into categorical and numerical variables. But given that the packages treat all the variables as numbers, we arguably don't make a big mistake here.
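If one did want to treat the two kinds of variables differently, one possible approach is scikit-learn's `ColumnTransformer`. The sketch below uses a made-up frame; the column names `num_a` and `num_b`, the values, and the choice of which columns count as numerical are all assumptions for illustration:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

# Hypothetical toy frame: two numerical columns and one 0/1 dummy
toy = pd.DataFrame({
    'num_a': [10.0, 20.0, 30.0, 40.0],
    'num_b': [1.0, 3.0, 5.0, 7.0],
    'Friday': [0, 1, 0, 1],
})
numeric_cols = ['num_a', 'num_b']  # assumption: these are the numerical columns

# Scale only the numerical columns; pass the dummy through unchanged
preprocess = ColumnTransformer(
    transformers=[('num', StandardScaler(), numeric_cols)],
    remainder='passthrough',
)
toy_scaled = preprocess.fit_transform(toy)

# Output order: the scaled numerical columns first, then the passthrough columns
print(toy_scaled)
```

Keep in mind that leaving the dummies unscaled changes how strongly the LASSO penalty acts on them relative to the scaled columns, so this is a modeling choice rather than a pure improvement.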

We split the data into training and test sets (we use the test set for tuning our Lasso model).

In [18]:
X_train, X_test , y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=1)
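A variation worth knowing about: here the scaler was fit on all rows before the split, so the test rows influence the scaling. A minimal sketch of the alternative, which fits the scaler on the training rows only by wrapping both steps in a pipeline, is shown below. The data are synthetic, and `X_demo`, `y_demo`, and the value `alpha=0.1` are made up for illustration:

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (not the tel data)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 5))
y_demo = 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 1] + rng.normal(size=100)

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=1)

# The pipeline fits the scaler on the training rows only and reuses
# those means and standard deviations when predicting on the test rows
pipe = make_pipeline(StandardScaler(), Lasso(alpha=0.1, max_iter=10000))
pipe.fit(X_tr, y_tr)

scaler_fitted = pipe.named_steps['standardscaler']
print(np.allclose(scaler_fitted.mean_, X_tr.mean(axis=0)))
print(pipe.score(X_te, y_te))  # R^2 on the held-out rows
```

With a dataset of this size the difference is usually small, which is why scaling everything up front is a common shortcut.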

## OLS Model

We start with OLS, using the features we selected in homework 1:

In [None]:
model_ols = LinearRegression(fit_intercept=True)
X_red = data[['SOA','Hot','Friday']]
model_ols.fit(X_red, y)
print(model_ols.intercept_)
print(model_ols.coef_)

The *Root-Mean-Squared-Error*:
$$
RMSE = \sqrt{\frac{1}{n}\sum_i (y_i - \hat{y}_i)^2}
$$
across the total dataset is:

In [None]:
y_sel = model_ols.predict(X_red)
# RMSE over the full dataset (the OLS model was fit on all rows, so this is not a test-set error)
FullRMSE_ols = np.sqrt(mean_squared_error(y, y_sel))
print(FullRMSE_ols)
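As a check on what `np.sqrt(mean_squared_error(...))` computes, here is a small worked example with made-up numbers (`y_true` and `y_pred` are hypothetical). The RMSE is the square root of the average squared error:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical outcomes and predictions
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.0, 5.0, 8.0, 11.0])

# Errors are 1, 0, -1, -2; squared errors are 1, 0, 1, 4; their mean is 1.5
rmse_manual = np.sqrt(np.mean((y_true - y_pred) ** 2))
rmse_sklearn = np.sqrt(mean_squared_error(y_true, y_pred))

print(rmse_manual, rmse_sklearn)
```

Both numbers equal the square root of 1.5, so the RMSE is expressed in the same units as the outcome.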

Note that one advantage of using AIC or ANOVA in a simple regression context is that we don't need to split the dataset; we can use all of the data.
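To illustrate why no split is needed, here is a rough sketch of an AIC comparison computed on the full sample. It assumes Gaussian errors, for which the AIC equals n·log(RSS/n) + 2k up to an additive constant. The helper `gaussian_aic`, the synthetic data, and the convention of counting k as the coefficients plus the intercept are my own assumptions, not part of the assignment:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def gaussian_aic(model, X, y):
    """AIC up to an additive constant: n*log(RSS/n) + 2k (hypothetical helper)."""
    n = len(y)
    rss = np.sum((y - model.predict(X)) ** 2)
    k = X.shape[1] + 1  # slope coefficients plus the intercept
    return n * np.log(rss / n) + 2 * k

# Synthetic data: only the first two of four features matter
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 4))
y_demo = 2.0 * X_demo[:, 0] - 1.0 * X_demo[:, 1] + rng.normal(size=200)

X_small = X_demo[:, :2]
aic_small = gaussian_aic(LinearRegression().fit(X_small, y_demo), X_small, y_demo)
aic_full = gaussian_aic(LinearRegression().fit(X_demo, y_demo), X_demo, y_demo)

# The model with the lower AIC is preferred
print(aic_small, aic_full)
```

The 2k term plays the role that the test set plays for the LASSO: it penalizes the extra features, whose in-sample fit can only improve.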

## LASSO Regression

We now turn to LASSO regression. One question is whether we can build a model that beats the performance of our OLS model with three features.
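For reference, the objective that scikit-learn's `Lasso` minimizes can be written as follows, where $\alpha$ is the penalty weight and $n$ is the number of observations (the symbols $\beta_0$, $\beta_j$, and $x_{ij}$ are notation introduced here for the intercept, the coefficients, and the scaled features):

$$
\min_{\beta_0, \beta} \; \frac{1}{2n}\sum_i \Big(y_i - \beta_0 - \sum_j x_{ij}\beta_j\Big)^2 + \alpha \sum_j |\beta_j|
$$

With $\alpha = 0$ this reduces to OLS; larger values of $\alpha$ push more coefficients toward, and eventually exactly to, zero.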

Let's run a LASSO regression with some predefined values of lambda (called alpha here), i.e., the penalty parameter in the LASSO regression. In full disclosure, you often have to experiment a bit to find the right parameter range:

In [69]:
alphas = np.array([0.1,0.5,1.0,1.5,2])
model_lasso = Lasso(max_iter = 10000)
MSE = []
for a in alphas:  # go through all alphas...
    model_lasso.set_params(alpha=a)  # set the penalty...
    model_lasso.fit(X_train, y_train)  # fit the model on the training set...
    MSE.append(mean_squared_error(y_test, model_lasso.predict(X_test)))  # ...and record the MSE on the test set

Let's take the square root of our MSE to get the RMSE:

In [None]:
RMSE = np.sqrt(MSE)
RMSE

and let's plot:

In [None]:
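plt.plot(alphas, RMSE, marker='o')  # plot against alpha rather than the list index
plt.xlabel('alpha'); plt.ylabel('Test RMSE')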

So we get a nice U-shaped error curve, as we expect based on the bias-variance tradeoff. The "sweet spot", i.e., our best-performing model, is at alpha = 2. So we have a winner!
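Rather than reading the winner off the plot, the best alpha can also be picked programmatically. The sketch below uses made-up RMSE values purely to show the mechanics (`alphas_demo` and `rmse_demo` are hypothetical and are not results from the tel data):

```python
import numpy as np

alphas_demo = np.array([0.1, 0.5, 1.0, 1.5, 2.0])
rmse_demo = np.array([9.4, 9.1, 8.9, 9.0, 9.3])  # hypothetical values

# Position of the smallest RMSE and the corresponding alpha
best_idx = int(np.argmin(rmse_demo))
best_alpha = alphas_demo[best_idx]

# Flag the case where the minimum sits on the boundary of the grid
on_edge = best_idx in (0, len(alphas_demo) - 1)

print(best_alpha, on_edge)
```

If the minimum lands on the first or last grid value, the true optimum may lie outside the grid, and it is usually worth extending the range before settling on a value.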

OK, as a last step, let's refit our best model using the entire dataset (that is, not only our training set) and look at the coefficients:

In [None]:
model_lasso.set_params(alpha=2.0)
model_lasso.fit(X_scaled, y)
model_lasso.coef_

So we see that the LASSO sets the coefficients of ByDa, SOB, SOC, Field, and the Wednesday dummy to zero: that's SELECTION. For the other parameters, it applies some degree of SHRINKAGE.
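Since `coef_` comes back as a bare array, it helps to attach the feature names before reading off which coefficients were dropped. Here is a minimal sketch on synthetic data; the feature names `x1` to `x4`, the data, and `alpha=0.5` are made up for illustration:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso, LinearRegression

# Synthetic data: only x1 and x2 drive the outcome
rng = np.random.default_rng(0)
cols = ['x1', 'x2', 'x3', 'x4']  # hypothetical feature names
X_demo = pd.DataFrame(rng.normal(size=(200, 4)), columns=cols)
y_demo = 5.0 * X_demo['x1'] - 3.0 * X_demo['x2'] + rng.normal(size=200)

ols = LinearRegression().fit(X_demo, y_demo)
lasso = Lasso(alpha=0.5, max_iter=10000).fit(X_demo, y_demo)

# Side-by-side coefficients, labeled by feature name
coefs = pd.DataFrame({'ols': ols.coef_, 'lasso': lasso.coef_}, index=cols)
coefs['dropped'] = coefs['lasso'] == 0  # selection: coefficient set exactly to zero

print(coefs)
```

In this notebook, the analogous call would be `pd.Series(model_lasso.coef_, index=X.columns)`, since `X_scaled` keeps the column order of `X`.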

Let's see how the model performs relative to our OLS model:

In [None]:
y_lasso = model_lasso.predict(X_scaled)
FullRMSE_lasso = np.sqrt(mean_squared_error(y,y_lasso))
print(FullRMSE_lasso)

So it beats the OLS model, whose RMSE was 8.3.

Let's look at the predictions between the two models relative to the actual outcomes:

In [None]:
df = pd.DataFrame({'y':y, 'y_hat_sel':y_sel, 'y_hat_lasso':y_lasso})
df

So in some instances one model performs better, and in other instances the other does. But the overall performance of the LASSO model is superior.
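One way to quantify the instance-by-instance comparison is to compute the absolute error of each model per row and count how often the LASSO prediction is closer. The numbers below are made up to show the mechanics (`demo` is a hypothetical frame with the same column names as `df`, not values from the tel data):

```python
import pandas as pd

# Hypothetical outcomes and predictions
demo = pd.DataFrame({
    'y':           [10.0, 20.0, 30.0, 40.0],
    'y_hat_sel':   [12.0, 19.0, 33.0, 41.0],
    'y_hat_lasso': [11.0, 22.0, 31.0, 40.0],
})

# Absolute error of each model, row by row
demo['abs_err_sel'] = (demo['y'] - demo['y_hat_sel']).abs()
demo['abs_err_lasso'] = (demo['y'] - demo['y_hat_lasso']).abs()

# True where the LASSO prediction is strictly closer to the actual outcome
demo['lasso_closer'] = demo['abs_err_lasso'] < demo['abs_err_sel']

print(demo)
print(demo['lasso_closer'].mean())  # share of rows where LASSO is closer
```

Applying the same three lines to `df` would give the share of observations on which each model wins.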