# Regularization and `glmnet`

Let's load our libraries:

In [None]:
library(ISLR)
library(glmnet)

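`glmnet` takes a numeric matrix of predictors rather than a formula, so let’s drop the rows with missing values and construct one with `model.matrix`, leaving out the `Salary` variable that we’re trying to predict (the `[, -1]` removes the intercept column). We also define a binary target that indicates whether each salary is at or above the median: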

In [None]:
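# Keep only players with no missing values (Salary is NA for some rows)
Hitters <- Hitters[complete.cases(Hitters), ]
# Predictor matrix: factors become dummy columns; [, -1] drops the intercept
X <- model.matrix(Salary ~ ., data = Hitters)[, -1]
# Binary target: 1 if the salary is at or above the median, 0 otherwise
median_salary <- median(Hitters$Salary)
y_median <- as.numeric(Hitters$Salary >= median_salary)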
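To see what `model.matrix` does to a factor column, here is a small base-R sketch on a made-up data frame. The names `toy` and `X_toy` and the values in them are purely illustrative and are not taken from `Hitters`.

```r
# Illustrative toy data (not from Hitters): one numeric and one factor predictor
toy <- data.frame(
  Salary = c(100, 200, 300, 400),
  Hits   = c(10, 20, 15, 30),
  League = factor(c("A", "N", "A", "N"))
)

# model.matrix expands the factor into a 0/1 dummy column and adds an
# "(Intercept)" column; [, -1] drops that intercept column
X_toy <- model.matrix(Salary ~ ., data = toy)[, -1]

colnames(X_toy)  # "Hits" "LeagueN"
dim(X_toy)       # 4 rows, 2 columns
```

The response `Salary` does not appear among the columns, and the factor `League` is represented by a single dummy column for its second level.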

`glmnet` penalizes the coefficients with $ \lambda $ times a term of the “elastic net” form

$$ \frac{1-\alpha}{2} \|\beta\|_2^2 + \alpha \|\beta\|_1 $$

so $ \alpha=1 $ corresponds to the lasso penalty and $ \alpha=0 $ corresponds to the ridge penalty. Let’s fit a lasso classification model for whether the salary is at or above the median:

In [None]:
c.model.lasso<-glmnet(X, y_median, family="binomial", alpha=1)
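To make the roles of $ \alpha $ concrete, here is a minimal base-R sketch that evaluates the elastic-net term for a given coefficient vector. The helper name `elastic_net_penalty` is hypothetical (it is not a `glmnet` function), and it computes only the penalty term, without the $ \lambda $ multiplier or the loss.

```r
# Hypothetical helper (not part of glmnet): evaluates
# (1 - alpha)/2 * ||beta||_2^2 + alpha * ||beta||_1
elastic_net_penalty <- function(beta, alpha) {
  (1 - alpha) / 2 * sum(beta^2) + alpha * sum(abs(beta))
}

beta <- c(3, -4, 0)
lasso_pen <- elastic_net_penalty(beta, alpha = 1)    # L1 norm: 3 + 4 = 7
ridge_pen <- elastic_net_penalty(beta, alpha = 0)    # half the squared L2 norm: 25 / 2 = 12.5
mixed_pen <- elastic_net_penalty(beta, alpha = 0.5)  # 0.25 * 25 + 0.5 * 7 = 9.75
```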

Plotting the fitted model shows each coefficient’s path as a function of the $ \ell_1 $ norm of the coefficient vector, which grows as $ \lambda $ shrinks. Notice the number of non-zero coefficients along the top axis:

In [None]:
plot(c.model.lasso)

Similarly, let’s fit a lasso regression model for the salary itself:

In [None]:
r.model.lasso<-glmnet(X, Hitters$Salary, family="gaussian", alpha=1)

In [None]:
plot(r.model.lasso)
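The counts on the top axis of these plots can also be read off the fitted object. The sketch below uses simulated data (the names `X_sim`, `y_sim`, and `fit` are illustrative, not part of the analysis above) and compares the `df` component of a `glmnet` fit with a direct count of the non-zero entries of `beta`.

```r
library(glmnet)

# Simulated data (an assumption for illustration): 10 predictors, 2 with signal
set.seed(1)
n <- 100; p <- 10
X_sim <- matrix(rnorm(n * p), n, p)
y_sim <- 2 * X_sim[, 1] - 1.5 * X_sim[, 2] + rnorm(n)

fit <- glmnet(X_sim, y_sim, family = "gaussian", alpha = 1)

# Number of non-zero coefficients at each lambda, computed two ways
nonzero_from_beta <- colSums(as.matrix(fit$beta) != 0)
nonzero_from_df   <- fit$df

# lambda decreases along the path, and the non-zero count tends to grow with it
head(cbind(lambda = fit$lambda, nonzero = nonzero_from_df))
```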

We can use the `cv.glmnet` function to do cross-validation and select the best value of $ \lambda $ for us:

In [None]:
c.cv.model.lasso<-cv.glmnet(X, y_median, family="binomial", alpha=1, nfolds=10) # can also try type.measure="class"
r.cv.model.lasso<-cv.glmnet(X, Hitters$Salary, family="gaussian", alpha=1, nfolds=10)

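We can see the cross-validated error (by default, binomial deviance for the classification model and mean squared error for the regression model) as a function of $ \lambda $. Again, notice the number of non-zero coefficients along the top axis: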

In [None]:
plot(c.cv.model.lasso)

In [None]:
plot(r.cv.model.lasso)
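The selected values of $ \lambda $ are stored on the object returned by `cv.glmnet`. The following sketch uses simulated data (the names `X_sim`, `y_bin`, and `cv_fit` are illustrative) and the `type.measure="class"` option mentioned in the comment above, so the error being minimized is the misclassification rate.

```r
library(glmnet)

# Simulated binary outcome (an assumption for illustration)
set.seed(2)
n <- 200; p <- 8
X_sim <- matrix(rnorm(n * p), n, p)
y_bin <- rbinom(n, 1, plogis(1.5 * X_sim[, 1] - X_sim[, 2]))

cv_fit <- cv.glmnet(X_sim, y_bin, family = "binomial", alpha = 1,
                    nfolds = 10, type.measure = "class")

cv_fit$lambda.min   # lambda with the lowest cross-validated error
cv_fit$lambda.1se   # largest lambda within one standard error of that minimum
min(cv_fit$cvm)     # lowest cross-validated misclassification rate

# Coefficients (including the intercept) at the more conservative choice
coef(cv_fit, s = "lambda.1se")
```

Choosing `lambda.1se` instead of `lambda.min` trades a little cross-validated accuracy for a sparser model.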

Finally, let’s see how to get the coefficients out. Let’s take the classification model as an example:

In [None]:
attributes(c.model.lasso)

In [None]:
c.model.lasso$lambda

In [None]:
c.model.lasso$beta

The columns of this matrix are ordered by decreasing $ \lambda $. On the right, $ \lambda $ is smallest, so more variables are included, with larger coefficients. On the left, $ \lambda $ is largest, so fewer variables are included, with smaller coefficients.

Let’s take the 25th value of $ \lambda $ as an example:

In [None]:
c.model.lasso$beta[, 25]
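Indexing `beta` returns the slopes only. As a sketch of an alternative, `coef()` with the `s` argument returns the intercept together with the slopes at a chosen $ \lambda $. The example below uses simulated data, and the names `X_sim`, `y_bin`, `fit`, and `k` are illustrative.

```r
library(glmnet)

# Simulated binary outcome (an assumption for illustration)
set.seed(3)
n <- 200; p <- 8
X_sim <- matrix(rnorm(n * p), n, p)
y_bin <- rbinom(n, 1, plogis(1.5 * X_sim[, 1] - X_sim[, 2]))
fit <- glmnet(X_sim, y_bin, family = "binomial", alpha = 1)

k <- min(25, length(fit$lambda))         # guard in case the path is shorter than 25
coef_k <- coef(fit, s = fit$lambda[k])   # (p + 1) x 1 sparse matrix, intercept first

slopes_from_coef <- as.numeric(coef_k)[-1]   # drop the intercept
slopes_from_beta <- as.numeric(fit$beta[, k])
intercept_k      <- as.numeric(coef_k)[1]    # also stored in fit$a0[k]
```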

Finally, let’s fit a classification model with the ridge penalty. Unlike the lasso, ridge keeps every feature in the model at every value of $ \lambda $, but shrinks the coefficients toward zero as $ \lambda $ grows:

In [None]:
c.model.ridge<-glmnet(X, y_median, family="binomial", alpha=0)

As before, the plot shows the coefficient paths as a function of the $ \ell_1 $ norm of the coefficient vector, which grows as $ \lambda $ shrinks:

In [None]:
plot(c.model.ridge)

In [None]:
c.cv.model.ridge<-cv.glmnet(X, y_median, family="binomial", alpha=0, nfolds=10)

In [None]:
plot(c.cv.model.ridge)
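To compare the two penalties side by side, the sketch below fits cross-validated lasso and ridge classifiers on simulated data, counts the non-zero slopes each one keeps, and produces predictions from the ridge model. All object names here (`X_sim`, `y_bin`, `cv_lasso`, `cv_ridge`, and so on) are illustrative, and the data are simulated rather than taken from `Hitters`.

```r
library(glmnet)

# Simulated binary outcome (an assumption for illustration)
set.seed(4)
n <- 200; p <- 8
X_sim <- matrix(rnorm(n * p), n, p)
y_bin <- rbinom(n, 1, plogis(1.5 * X_sim[, 1] - X_sim[, 2]))

cv_lasso <- cv.glmnet(X_sim, y_bin, family = "binomial", alpha = 1, nfolds = 10)
cv_ridge <- cv.glmnet(X_sim, y_bin, family = "binomial", alpha = 0, nfolds = 10)

# Slope coefficients (intercept dropped) at each model's lambda.1se
b_lasso <- as.numeric(coef(cv_lasso, s = "lambda.1se"))[-1]
b_ridge <- as.numeric(coef(cv_ridge, s = "lambda.1se"))[-1]
n_lasso <- sum(b_lasso != 0)   # the lasso may zero out some predictors
n_ridge <- sum(b_ridge != 0)   # ridge typically keeps every predictor

# Predicted probabilities and class labels for the first five rows
p_hat <- predict(cv_ridge, newx = X_sim[1:5, ], s = "lambda.min", type = "response")
y_hat <- predict(cv_ridge, newx = X_sim[1:5, ], s = "lambda.min", type = "class")
```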