# Lab 13: Lasso & Ridge Regression, GridSearchCV

# Question 1

- Install the ISLP package, load the `Carseats` [dataset](https://islp.readthedocs.io/en/latest/datasets/Carseats.html), assign the data to the variable `seats`, and check the shape of the data.

In [None]:
! pip install ISLP

In [1]:
from ISLP import load_data

seats = load_data('Carseats')
print(seats.shape)
seats.head()


(400, 11)


,Sales,CompPrice,Income,Advertising,Population,Price,ShelveLoc,Age,Education,Urban,US
0,9.5,138,73,11,276,120,Bad,42,17,Yes,Yes
1,11.22,111,48,16,260,83,Good,65,10,Yes,Yes
2,10.06,113,35,10,269,80,Medium,59,12,Yes,Yes
3,7.4,117,100,4,466,97,Medium,55,14,Yes,Yes
4,4.15,141,64,3,340,128,Bad,38,13,Yes,No


# Question 2: Feature Engineering

- We are going to use `Urban`, `ShelveLoc`, and `US` as features. It is your job to convert the text values in these columns into numeric values a model can use for prediction. Convert the data and assign the new values back to the original column names.

In [2]:
mapping = {"No": 0, "Yes": 1}
quality_mapping = {"Bad": -1, "Medium": 0, "Good": 1}

seats['US'] = seats['US'].map(mapping)
seats['Urban'] = seats['Urban'].map(mapping)

seats['ShelveLoc'] = seats['ShelveLoc'].map(quality_mapping)
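
One thing worth checking after a dictionary-based `.map()` is that no label was left unmapped, because pandas returns `NaN` for any value missing from the dictionary. The following is a minimal sketch on a made-up toy frame (the `toy` data, and the names `yes_no`, `quality`, and `n_missing`, are illustrative assumptions, not part of the lab):

```python
import pandas as pd

# Toy frame standing in for the three text columns of `seats` (values are made up).
toy = pd.DataFrame({
    "Urban": ["Yes", "No", "Yes"],
    "US": ["No", "No", "Yes"],
    "ShelveLoc": ["Bad", "Good", "Medium"],
})

yes_no = {"No": 0, "Yes": 1}
quality = {"Bad": -1, "Medium": 0, "Good": 1}

encoded = toy.copy()
encoded["Urban"] = encoded["Urban"].map(yes_no)
encoded["US"] = encoded["US"].map(yes_no)
encoded["ShelveLoc"] = encoded["ShelveLoc"].map(quality)

# .map() yields NaN for any label missing from the dictionary, so count them.
n_missing = int(encoded.isna().sum().sum())
print(encoded)
print("unmapped values:", n_missing)
```

Encoding `ShelveLoc` as -1/0/1 treats the step from Bad to Medium as equal to the step from Medium to Good. That is a modelling choice; one-hot encoding would be the alternative if that assumption seems too strong.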

# Question 3

- Using the `scikit-learn` `train_test_split` function, split the `seats` dataframe with a test size of 8% and shuffle the data. The features should be every variable except `Sales`. The `Sales` column should be the target variable. Set the `random_state` to 583.

In [3]:
from sklearn.model_selection import train_test_split

X = seats.drop('Sales', axis=1)
y = seats['Sales']
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=.08,
                                                    shuffle=True,
                                                    random_state=583)
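
As a quick sanity check on the split sizes, the sketch below runs the same call on a stand-in array of 400 row indices (the array `rows` and the names `train_idx` and `test_idx` are assumptions for illustration; the real call above operates on `X` and `y`). With 400 rows and `test_size=.08`, the test set should hold 32 rows and the training set 368.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in for the 400 rows of `seats`; only the row count matters here.
rows = np.arange(400)

train_idx, test_idx = train_test_split(rows,
                                       test_size=0.08,
                                       shuffle=True,
                                       random_state=583)

print(len(train_idx), len(test_idx))
```

A 32-row test set is small, so test-set metrics computed on it can move noticeably with a different `random_state`.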

# Question 4: Lasso vs. Ridge vs. ElasticNet

- Fit three different estimators using sklearn: Lasso, Ridge, and ElasticNet:
  - Set `alpha` to .3 for Ridge and Lasso.
  - Set `l1_ratio` to .5 for ElasticNet.
- Predict on `X_test` with each model. Calculate the mean absolute error for all three estimators. Which model performed best?
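
For reference, these are the objectives the three scikit-learn estimators minimize, as given in the scikit-learn documentation. The symbols are notation chosen here: $n$ is the number of training rows, $w$ the coefficient vector, and $\rho$ stands for `l1_ratio`.

```latex
\begin{aligned}
\text{Lasso:} \quad & \min_w \; \frac{1}{2n}\lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_1 \\
\text{Ridge:} \quad & \min_w \; \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_2^2 \\
\text{ElasticNet:} \quad & \min_w \; \frac{1}{2n}\lVert y - Xw \rVert_2^2 + \alpha \rho \lVert w \rVert_1 + \frac{\alpha (1 - \rho)}{2} \lVert w \rVert_2^2
\end{aligned}
```

With `alpha=0.3` and `l1_ratio=0.5`, the ElasticNet penalty weights are $\alpha\rho = 0.15$ on the L1 term and $\alpha(1-\rho)/2 = 0.075$ on the L2 term. Note that the Ridge loss is not divided by $2n$, so the same `alpha` value does not imply the same amount of regularization for Ridge as for Lasso or ElasticNet.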

In [12]:
from sklearn.linear_model import Lasso, Ridge, ElasticNet

lasso = Lasso(alpha=0.3)
lasso.fit(X_train, y_train)

ridge = Ridge(alpha=0.3)
ridge.fit(X_train, y_train)

elastic_net = ElasticNet(alpha=0.3, l1_ratio=0.5)
elastic_net.fit(X_train, y_train)

In [13]:
from sklearn.metrics import mean_absolute_error

lasso_pred = lasso.predict(X_test)
ridge_pred = ridge.predict(X_test)
elastic_pred = elastic_net.predict(X_test)

lasso_mae = mean_absolute_error(y_test, lasso_pred)
ridge_mae = mean_absolute_error(y_test, ridge_pred)
elastic_mae = mean_absolute_error(y_test, elastic_pred)

print(f"Lasso MAE was {lasso_mae}")
print(f"Ridge MAE was {ridge_mae}")
print(f"ElasticNet MAE was {elastic_mae}")

Lasso MAE was 0.9084900058406228
Ridge MAE was 0.7881579832486895
ElasticNet MAE was 0.972946936036626


Ridge performed best.
- Take the Lasso result first. `Sales` is recorded in thousands of units sold, so an MAE of 0.90849 means the Lasso predictions are off by approximately 908 units on average.
- Similarly, an MAE of 0.78816 for the Ridge model means that its predictions are off by approximately 788 units on average.
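
The arithmetic behind that reading can be sketched in plain Python. The four target and prediction values below are made up for illustration, and `mean_abs_error` is a hypothetical helper that mirrors the definition of MAE rather than scikit-learn's implementation.

```python
def mean_abs_error(y_true, y_pred):
    """Average of the absolute differences between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Made-up targets and predictions, both recorded in thousands.
y_true = [9.5, 11.0, 10.0, 7.5]
y_pred = [9.0, 12.0, 10.5, 7.5]

mae = mean_abs_error(y_true, y_pred)  # absolute errors: 0.5, 1.0, 0.5, 0.0
mae_rescaled = mae * 1000             # undo the "in thousands" scaling

print(mae, mae_rescaled)
```

Because MAE is expressed in the same unit as the target, an MAE of 0.5 on a target recorded in thousands corresponds to an average miss of 500 on the original scale.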

# Question 5: GridSearchCV

- But did we choose the optimal parameters? Run a `GridSearchCV` on Lasso, Ridge, and ElasticNet, this time with 5-fold cross-validation per model.
- Lasso and Ridge should have `alpha` = [.1, .3, .5, .7, .9] as options.
- ElasticNet should have `alpha` = [.1, .3, .5, .7, .9] and `l1_ratio` = [.1, .3, .5, .7, .9] as options.
- Which model performs best now?

In [14]:
import warnings
from sklearn.model_selection import GridSearchCV
from sklearn.exceptions import ConvergenceWarning

warnings.simplefilter("ignore", category=ConvergenceWarning)


In [15]:
# Elastic
param_grid = {
    'alpha': [.1, .3, .5, .7, .9],
    'l1_ratio': [.1, .3, .5, .7, .9]
}

enet_model = ElasticNet()

grid_search = GridSearchCV(enet_model,
                           param_grid,
                           scoring='neg_mean_squared_error',
                           cv=5,
                           verbose=1)

grid_search.fit(X_train, y_train)
elastic_grid_pred = grid_search.predict(X_test)


Fitting 5 folds for each of 25 candidates, totalling 125 fits


In [17]:
# Ridge
param_grid = {
    'alpha': [.1, .3, .5, .7, .9]
}

ridge_model = Ridge()

grid_search = GridSearchCV(ridge_model,
                           param_grid,
                           scoring='neg_mean_squared_error',
                           cv=5,
                           verbose=1)

grid_search.fit(X_train, y_train)
ridge_grid_pred = grid_search.predict(X_test)


Fitting 5 folds for each of 5 candidates, totalling 25 fits


In [18]:
# Lasso
param_grid = {
    'alpha': [.1, .3, .5, .7, .9]
}

lasso_model = Lasso()

grid_search = GridSearchCV(lasso_model,
                           param_grid,
                           scoring='neg_mean_squared_error',
                           cv=5,
                           verbose=1)

grid_search.fit(X_train, y_train)
lasso_grid_pred = grid_search.predict(X_test)


Fitting 5 folds for each of 5 candidates, totalling 25 fits


In [20]:
grid_lasso_mae = mean_absolute_error(y_test, lasso_grid_pred)
grid_ridge_mae = mean_absolute_error(y_test, ridge_grid_pred)
grid_elastic_mae = mean_absolute_error(y_test, elastic_grid_pred)

print(f"Lasso Grid MAE was {grid_lasso_mae}")
print(f"Ridge Grid MAE was {grid_ridge_mae}")
print(f"ElasticNet Grid MAE was {grid_elastic_mae}")

Lasso Grid MAE was 0.8288934213127658
Ridge Grid MAE was 0.7895682765993483
ElasticNet Grid MAE was 0.8342399783867769
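
The three searches above reuse the name `grid_search`, so the selected hyperparameters of the earlier searches are no longer accessible afterwards. They also rank candidates by mean squared error, while the final comparison uses mean absolute error. The sketch below shows one possible way to keep each search's `best_params_` and to score with `neg_mean_absolute_error` instead. It runs on synthetic data (`X_demo`, `y_demo`), and the `searches` and `results` dictionaries are illustrative names, not part of the lab.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, Ridge
from sklearn.model_selection import GridSearchCV

# Synthetic regression data so the sketch runs on its own (not the Carseats data).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(120, 5))
true_coef = np.array([2.0, -1.0, 0.0, 0.5, 0.0])
y_demo = X_demo @ true_coef + rng.normal(scale=0.5, size=120)

alphas = [0.1, 0.3, 0.5, 0.7, 0.9]
searches = {
    "lasso": (Lasso(), {"alpha": alphas}),
    "ridge": (Ridge(), {"alpha": alphas}),
    "elasticnet": (ElasticNet(), {"alpha": alphas, "l1_ratio": alphas}),
}

results = {}
for name, (estimator, grid) in searches.items():
    # Score candidates with the same metric used for the final comparison (MAE).
    search = GridSearchCV(estimator, grid,
                          scoring="neg_mean_absolute_error", cv=5)
    search.fit(X_demo, y_demo)
    results[name] = {
        "best_params": search.best_params_,
        "cv_mae": -search.best_score_,  # flip the sign back to a positive error
    }
    print(name, results[name])
```

Storing one entry per model makes it possible to report which `alpha` (and `l1_ratio`) each search selected alongside its cross-validated error.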


Ridge still performs better, but the error for Lass