# MATH 3375 Examples Notebook #10

# Dimension Reduction and Regularization

## Elastic Net

We continue using the 2004 cars data set to examine Elastic Net, and compare it to LASSO and Ridge Regression. 


In [None]:
#Look at data set
car_data <- read.csv("cars2004.csv", stringsAsFactors=TRUE)
head(car_data,3)
tail(car_data,3)

In [None]:
#Convert Length and Width from factors to integers; any label that is not a number becomes NA
car_data$Length <- as.integer(as.character(car_data$Length))
car_data$Width <- as.integer(as.character(car_data$Width))
tail(car_data,3)

In [None]:
#Replace missing Length and Width values with the median of the observed values
car_data$Length[is.na(car_data$Length)] <- as.integer(median(car_data$Length, na.rm=TRUE))
car_data$Width[is.na(car_data$Width)] <- as.integer(median(car_data$Width, na.rm=TRUE))
head(car_data,3)
tail(car_data,3)

## 1. Quick Review of LASSO and Ridge Regression

LASSO minimizes:

$$SSE + \lambda\sum_{j=1}^{p}\left | \beta_j \right |$$

Ridge Regression computes model coefficients that minimize:

$$SSE + \lambda\sum_{j=1}^{p} {\beta_j}^2$$
    
* LASSO is used for variable selection; Ridge Regression is not. 
* The penalty in LASSO _**shrinks**_ coefficients until some are exactly zero, eliminating those variables from the model.
* The penalty in Ridge Regression _accounts for multicollinearity_ by shrinking coefficients toward zero, but it does not eliminate any variables.
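
To see why LASSO can set coefficients exactly to zero while Ridge cannot, the sketch below uses a simplified special case. It assumes the predictors are orthonormal, so each coefficient can be solved separately from its OLS estimate. The OLS estimates `b_ols` and the value of `lambda` are made-up numbers for illustration only, not results from the cars data.

```r
# Assumption: orthonormal predictors, so the penalized problems have closed forms.
b_ols  <- c(3.0, 1.0, -0.4, 0.1)   # hypothetical OLS estimates
lambda <- 1                        # hypothetical penalty weight

# LASSO (SSE + lambda * sum|beta|): soft-threshold each estimate at lambda/2
b_lasso <- sign(b_ols) * pmax(abs(b_ols) - lambda / 2, 0)

# Ridge (SSE + lambda * sum beta^2): rescale each estimate by 1/(1 + lambda)
b_ridge <- b_ols / (1 + lambda)

data.frame(OLS = b_ols, LASSO = b_lasso, Ridge = b_ridge)
```

In this sketch the two smallest OLS estimates fall below the threshold and become zero under LASSO, while Ridge shrinks every estimate by the same factor and leaves all of them nonzero.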


### Implementations with _glmnet_ Library

Below we review basic implementations of LASSO and Ridge Regression with the **glmnet** library.

_Only un-comment the install line if the library fails to load. Run the install once and re-comment the line to avoid running the install again._

In [None]:
#install.packages("glmnet")
library(glmnet)

#### LASSO Implementation

In [None]:
set.seed(3375)
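#alpha=1 selects the LASSO penalty; columns 4-7 and 9-14 are the predictors, column 8 (HP) is the response
hp_model_lasso <- cv.glmnet(x=as.matrix(car_data[c(4:7,9:14)]),y=as.matrix(car_data[8]),alpha=1,nfolds=5)
#coef() on a cv.glmnet object defaults to s="lambda.1se"
coef(hp_model_lasso)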

#### Using the LASSO Result

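By default, the coefficients reported for a cross-validated LASSO model are those at the largest acceptable $\lambda$ (the largest $\lambda$ whose cross-validated error is within one standard error of the minimum), which is the model with the most shrinkage. At this $\lambda$, LASSO has set the coefficients of all predictors to zero except MSRP, EngineSize, Cylinders, and City.MPG.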

To use these results, we create an OLS model with the predictors that were not dropped by the LASSO model.  

In [None]:
hp_model_ols_lasso <- lm(HP ~ MSRP+EngineSize+Cylinders+City.MPG, data=car_data)
summary(hp_model_ols_lasso)
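
The formula above was typed by hand from the LASSO output. As a rough sketch, the same step can be automated by reading the nonzero coefficients and building the formula from their names. The example below runs on simulated stand-in data (the data frame `sim`, its columns `X1` to `X6`, and the response `y` are hypothetical), so it can be run without the cars file.

```r
library(glmnet)

# Simulated stand-in data (hypothetical): only X1 and X2 truly affect y
set.seed(3375)
n <- 200
sim <- data.frame(matrix(rnorm(n * 6), nrow = n))   # columns are named X1..X6
sim$y <- 4 * sim$X1 - 3 * sim$X2 + rnorm(n)

# Cross-validated LASSO, as in the cell above
fit <- cv.glmnet(x = as.matrix(sim[paste0("X", 1:6)]), y = sim$y,
                 alpha = 1, nfolds = 5)

# Names of the predictors whose coefficient at lambda.1se is nonzero
cf <- as.matrix(coef(fit, s = "lambda.1se"))
selected <- setdiff(rownames(cf)[cf[, 1] != 0], "(Intercept)")
selected

# Refit an OLS model using only the selected predictors
ols_refit <- lm(reformulate(selected, response = "y"), data = sim)
coef(ols_refit)
```

With the cars data, the same pattern would use `hp_model_lasso` in place of `fit` and `"HP"` as the response.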

#### Ridge Regression Implementation


In [None]:
set.seed(3375)
hp_model_ridge <- cv.glmnet(x=as.matrix(car_data[c(4:7,9:14)]),y=as.matrix(car_data[8]),alpha=0,nfolds=5)
coef(hp_model_ridge)


## 2. Elastic Net

Elastic Net is a model that combines the dimension reduction features of LASSO with the improved bias-variance balance of Ridge Regression.

Elastic Net coefficients are computed to minimize:

$$SSE +  \alpha \left(\lambda\sum_{j=1}^{p}\left | \beta_j \right |\right) + (1-\alpha) \left(\lambda\sum_{j=1}^{p} {\beta_j}^2 \right)$$

Notice that this is a _combination_ of the LASSO and Ridge penalties, **_weighted_** by the $\alpha$ value and its complement.

In fact, when $\alpha = 1$, Elastic Net becomes LASSO, and when $\alpha = 0$, Elastic Net becomes Ridge.
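
The weighting can be checked numerically with a small helper that evaluates the penalty term from the formula above. This is only an illustration: the function name `enet_penalty` and the values of `beta` and `lambda` are made up for this sketch.

```r
# Penalty term from the Elastic Net formula above (hypothetical helper)
enet_penalty <- function(beta, lambda, alpha) {
  lasso_part <- lambda * sum(abs(beta))   # LASSO penalty
  ridge_part <- lambda * sum(beta^2)      # Ridge penalty
  alpha * lasso_part + (1 - alpha) * ridge_part
}

beta   <- c(2, -1, 0.5)   # hypothetical coefficients
lambda <- 0.1             # hypothetical penalty weight

enet_penalty(beta, lambda, alpha = 1)     # equals the LASSO penalty
enet_penalty(beta, lambda, alpha = 0)     # equals the Ridge penalty
enet_penalty(beta, lambda, alpha = 0.5)   # halfway between the two
```

Note that the objective documented for **glmnet** scales these terms differently (it divides the squared error by $2n$ and the Ridge term by 2), so a $\lambda$ reported by **glmnet** is not directly comparable to a $\lambda$ in the formula above. The roles of $\alpha = 1$ and $\alpha = 0$ are the same.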



### Using glmnet for Elastic Net Model

Recall that **alpha** was 1 for LASSO and 0 for Ridge.  For Elastic Net, the alpha parameter is somewhere between 0 and 1. The closer **alpha** is to zero, the more heavily weighted the Ridge penalty; the closer **alpha** is to 1, the more heavily weighted the LASSO penalty.  We use values of 0.2, 0.5, and 0.8 below for comparison.  


In [None]:
set.seed(3375)
hp_model_elastic1 <- cv.glmnet(x=as.matrix(car_data[c(4:7,9:14)]),y=as.matrix(car_data[8]),alpha=0.2,nfolds=5)
plot(hp_model_elastic1)

In [None]:
set.seed(3375)
hp_model_elastic2 <- cv.glmnet(x=as.matrix(car_data[c(4:7,9:14)]),y=as.matrix(car_data[8]),alpha=0.5,nfolds=5)
plot(hp_model_elastic2)

In [None]:
set.seed(3375)
hp_model_elastic3 <- cv.glmnet(x=as.matrix(car_data[c(4:7,9:14)]),y=as.matrix(car_data[8]),alpha=0.8,nfolds=5)
plot(hp_model_elastic3)

### Coefficients of the Three Elastic Net Models

In [None]:
coef(hp_model_elastic1)

In [None]:
coef(hp_model_elastic2)

In [None]:
coef(hp_model_elastic3)

### How to Select $\alpha$?

Some possibilities for selecting $\alpha$ include:

* Use 0.5 and equally weight the LASSO and Ridge components.
* Use cross-validation with multiple $\alpha$ values and select the $\alpha$ that results in the lowest MSE. (Note that the **glmnet** library does NOT do this for you.)
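
A minimal sketch of the cross-validation option is shown below. It is one possible approach, not a built-in **glmnet** feature: we fix the fold assignment once with the `foldid` argument of `cv.glmnet`, so that every $\alpha$ is scored on the same splits, and then compare the lowest cross-validated MSE for each $\alpha$. The data here are simulated stand-ins; the grid of $\alpha$ values is an arbitrary choice.

```r
library(glmnet)

# Simulated stand-in data (hypothetical); in this notebook x and y would be
# as.matrix(car_data[c(4:7,9:14)]) and car_data$HP
set.seed(3375)
n <- 200
x <- matrix(rnorm(n * 10), nrow = n, ncol = 10)
y <- 3 * x[, 1] - 2 * x[, 2] + rnorm(n)

# Assign each row to one of 5 folds once, so all alphas share the same splits
foldid <- sample(rep(1:5, length.out = n))

# Candidate alpha values (an arbitrary grid for illustration)
alphas <- c(0, 0.2, 0.5, 0.8, 1)

# For each alpha, record the lowest cross-validated MSE along the lambda path
cv_mse <- sapply(alphas, function(a) {
  fit <- cv.glmnet(x, y, alpha = a, foldid = foldid)
  min(fit$cvm)
})

data.frame(alpha = alphas, cv_mse = cv_mse)

# The alpha with the lowest cross-validated MSE
best_alpha <- alphas[which.min(cv_mse)]
best_alpha
```

Differences in cross-validated MSE between neighboring $\alpha$ values are often small, so the selected $\alpha$ can change with a different seed or fold assignment.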

### Predictions from All Models

Below we generate predictions of HP for the first 5 rows of our data set, using the Ridge model, all 3 of the Elastic Net models, and the LASSO model. These are displayed (left to right) in increasing order of alpha (0 for Ridge up to 1 for LASSO). 

For Ridge and Elastic Net, we keep the original model computed by **glmnet**. Recall that for LASSO, it is recommended to use the OLS model with the same predictors that were selected by LASSO. Below, we show the predictions for the original LASSO **_and_** the OLS model with features selected by LASSO, so we can see how they compare.



In [None]:
#Predict HP for first 5 rows of original data set

pred_ridge <- predict(hp_model_ridge, newx=as.matrix(car_data[1:5,c(4:7,9:14)]))
pred_elastic1 <- predict(hp_model_elastic1, newx=as.matrix(car_data[1:5,c(4:7,9:14)]))
pred_elastic2 <- predict(hp_model_elastic2, newx=as.matrix(car_data[1:5,c(4:7,9:14)]))
pred_elastic3 <- predict(hp_model_elastic3, newx=as.matrix(car_data[1:5,c(4:7,9:14)]))
pred_lasso <- predict(hp_model_lasso, newx=as.matrix(car_data[1:5,c(4:7,9:14)]))
pred_ols_lasso <- predict(hp_model_ols_lasso, newdata=car_data[1:5,c(4:7,9:14)])

df_preds <- data.frame(pred_ridge,pred_elastic1,pred_elastic2,pred_elastic3,pred_lasso,pred_ols_lasso)
colnames(df_preds) <- c("Ridge","Elastic Net 1","Elastic Net 2","Elastic Net 3","LASSO (Original)","OLS from LASSO")

df_preds

#### Model Comparison

Below we compute the MSE for the 5 predictions using each of the 6 models above. Note that **_this is in-sample error only_**. For a better estimate of model performance, we would need to reserve some data in advance as a test set, and create the models with a training data set that excludes the test data.

In [None]:
MSE_ridge <- mean((car_data$HP[1:5] - pred_ridge)^2)
MSE_elastic1 <- mean((car_data$HP[1:5] - pred_elastic1)^2)
MSE_elastic2 <- mean((car_data$HP[1:5] - pred_elastic2)^2)
MSE_elastic3 <- mean((car_data$HP[1:5] - pred_elastic3)^2)
MSE_lasso <- mean((car_data$HP[1:5] - pred_lasso)^2)
MSE_ols_lasso <- mean((car_data$HP[1:5] - pred_ols_lasso)^2)

df_MSE <- data.frame(MSE_ridge,MSE_elastic1,MSE_elastic2,MSE_elastic3,MSE_lasso,MSE_ols_lasso)
colnames(df_MSE) <- c("Ridge","Elastic Net 1","Elastic Net 2","Elastic Net 3","LASSO (Original)","OLS from LASSO")

df_MSE
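
As noted above, a test set gives a better estimate of model performance than in-sample error. The sketch below shows one way this could look. It uses simulated stand-in data, holds out 20% of the rows (an arbitrary choice), and fits an Elastic Net model with `alpha=0.5` on the remaining rows; none of these settings come from the cars analysis.

```r
library(glmnet)

# Simulated stand-in data (hypothetical), in place of the cars data
set.seed(3375)
n <- 200
x <- matrix(rnorm(n * 10), nrow = n, ncol = 10)
y <- 3 * x[, 1] - 2 * x[, 2] + rnorm(n)

# Reserve 20% of the rows as a test set before fitting anything
test_idx <- sample(n, size = 40)

# Fit on the training rows only
fit <- cv.glmnet(x[-test_idx, ], y[-test_idx], alpha = 0.5, nfolds = 5)

# Predict the held-out rows (uses lambda.1se by default) and compute test MSE
pred <- predict(fit, newx = x[test_idx, ])
test_mse <- mean((y[test_idx] - pred)^2)
test_mse
```

The same split could be reused for the Ridge, LASSO, and other Elastic Net models so that their test MSE values are comparable.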