## Lasso, Ridge, & Elastic Net Regression

Based on this [YouTube Video](https://www.youtube.com/watch?v=ctmNq7FgbvI). Code is [HERE](https://github.com/StatQuest/ridge_lasso_elastic_net_demo/blob/master/ridge_lass_elastic_net_demo.R)

In [1]:
library(glmnet)
set.seed(42)

Loading required package: Matrix

Loading required package: foreach

Loaded glmnet 2.0-16




### Create a dataset for testing

In [2]:
n = 1000 # 1000 samples
p = 5000 # 5000 parameters to estimate
real_p = 15 # 15 params will help predict the outcome, the others will just be random noise

x = matrix(rnorm(n*p), nrow=n, ncol=p) # Randome matrix with n*p values, spread across n rows and p cols

In [3]:
# Apply will return a vector of 1,000 values that are the sums of the first 15 columns in x
# This way only the first 15 params have anything to do with the outcome of interest
y = apply(x[,1:real_p], 1, sum) + rnorm(n) # + rnorm(n) adds a little noise to the sums

### Train-test split

In [4]:
# First param gives range to sample from (from 1 to n), second gives number of sample to draw (2/3 of n)
train_rows = sample(1:n, .66*n)

x.train = x[train_rows,] # Apply mask to x for test
x.test = x[-train_rows,] # Apply opposite of mask to x for train

# Repeat with y
y.train = y[train_rows]
y.test = y[-train_rows]

### Ridge Regression

###### Fit

[Documentation](https://www.rdocumentation.org/packages/glmnet/versions/4.1-1/topics/cv.glmnet) for `cv.glmnet()` and [documentation](https://www.rdocumentation.org/packages/glmnet/versions/4.1-1/topics/glmnet) for `glmnet()` for which `cv.glmnet()` wraps a cv function around in order to get the best Lambda.

In [5]:
# When alpha is set to 0, cv.glmnet() does a Ridge regression

alpha0.fit = cv.glmnet(
    x=x.train,
    y=y.train,
    type.measure='mse',
    nfolds=10,
    alpha=0.1,
    family='gaussian' # This arg is passed through to glmnet()
)

alpha0.fit

$lambda
 [1] 13.4844914 12.8716005 12.2865663 11.7281229 11.1950617 10.6862288
 [7] 10.2005232  9.7368937  9.2943369  8.8718949  8.4686536  8.0837402
[13]  7.7163217  7.3656030  7.0308250  6.7112631  6.4062259  6.1150530
[19]  5.8371144  5.5718086  5.3185613  5.0768245  4.8460749  4.6258134
[25]  4.4155630  4.2148689  4.0232966  3.8404315  3.6658780  3.4992582
[31]  3.3402115  3.1883937  3.0434763  2.9051456  2.7731023  2.6470605
[37]  2.5267475  2.4119029  2.3022782  2.1976361  2.0977502  2.0024042
[43]  1.9113918  1.8245161  1.7415890  1.6624311  1.5868711  1.5147453
[49]  1.4458978  1.3801795  1.3174482  1.2575682  1.2004098  1.1458493
[55]  1.0937687  1.0440552  0.9966013  0.9513042  0.9080660  0.8667930
[61]  0.8273959  0.7897895  0.7538923  0.7196268  0.6869186  0.6556971
[67]  0.6258946  0.5974468  0.5702919  0.5443712  0.5196287  0.4960108
[73]  0.4734663  0.4519466  0.4314049  0.4117969  0.3930801  0.3752140
[79]  0.3581599  0.3418810  0.3263420  0.3115092  0.2973507  0.283835

Note the coefficients drop off in value at V16 and beyond.

In [6]:
coef(alpha0.fit)

5001 x 1 sparse Matrix of class "dgCMatrix"
                        1
(Intercept)  4.636008e-02
V1           6.685313e-01
V2           7.386210e-01
V3           8.374333e-01
V4           7.901204e-01
V5           6.946227e-01
V6           7.747022e-01
V7           7.548833e-01
V8           6.394083e-01
V9           7.138466e-01
V10          7.635185e-01
V11          6.999739e-01
V12          7.361838e-01
V13          6.706379e-01
V14          6.851706e-01
V15          7.367221e-01
V16          .           
V17          .           
V18          .           
V19          .           
V20          .           
V21          .           
V22          .           
V23          .           
V24          .           
V25          .           
V26          .           
V27          .           
V28         -1.903046e-02
V29          .           
V30          .           
V31          .           
V32          .           
V33          .           
V34          .           
V35          .      

###### Predict
[Documentation](https://www.rdocumentation.org/packages/glmnet/versions/1.1-1/topics/predict.glmnet) for `predict()`

In [7]:
alpha0.predicted = predict(
    object=alpha0.fit,
    newx=x.test,
    s=alpha0.fit$lambda.1se
)

###### Evaluate

In [8]:
mean((y.test - alpha0.predicted)^2)

### Lasso Regression

###### Fit

In [9]:
# When alpha is set to 1, glmnet() does a Lasso regression

alpha1.fit = cv.glmnet(
    x=x.train,
    y=y.train,
    type.measure='mse',
    nfolds=10,
    alpha=1,
    family='gaussian' # This arg is passed through to glmnet()
)

alpha1.fit

$lambda
 [1] 1.34844914 1.28716005 1.22865663 1.17281229 1.11950617 1.06862288
 [7] 1.02005232 0.97368937 0.92943369 0.88718949 0.84686536 0.80837402
[13] 0.77163217 0.73656030 0.70308250 0.67112631 0.64062259 0.61150530
[19] 0.58371144 0.55718086 0.53185613 0.50768245 0.48460749 0.46258134
[25] 0.44155630 0.42148689 0.40232966 0.38404315 0.36658780 0.34992582
[31] 0.33402115 0.31883937 0.30434763 0.29051456 0.27731023 0.26470605
[37] 0.25267475 0.24119029 0.23022782 0.21976361 0.20977502 0.20024042
[43] 0.19113918 0.18245161 0.17415890 0.16624311 0.15868711 0.15147453
[49] 0.14458978 0.13801795 0.13174482 0.12575682 0.12004098 0.11458493
[55] 0.10937687 0.10440552 0.09966013 0.09513042 0.09080660 0.08667930
[61] 0.08273959 0.07897895 0.07538923 0.07196268 0.06869186 0.06556971
[67] 0.06258946 0.05974468 0.05702919 0.05443712 0.05196287 0.04960108
[73] 0.04734663 0.04519466 0.04314049 0.04117969 0.03930801 0.03752140
[79] 0.03581599 0.03418810 0.03263420 0.03115092 0.02973507 0.0283835

Note the coefficients are mostly zero from V16 onward.

In [10]:
coef(alpha1.fit)

5001 x 1 sparse Matrix of class "dgCMatrix"
                        1
(Intercept)  0.0441443180
V1           0.8521641533
V2           0.8549434246
V3           0.9697882113
V4           0.9269022850
V5           0.8060840652
V6           0.9105423646
V7           0.8674874541
V8           0.8139070711
V9           0.8566926218
V10          0.8899915194
V11          0.8641560730
V12          0.8785025424
V13          0.8200853823
V14          0.8174555874
V15          0.8911272143
V16          .           
V17          .           
V18          .           
V19          .           
V20          .           
V21          .           
V22          .           
V23          .           
V24          .           
V25          .           
V26          .           
V27          .           
V28          .           
V29          .           
V30          .           
V31          .           
V32          .           
V33          .           
V34          .           
V35          .      

###### Predict

In [11]:
alpha1.predicted = predict(
    object=alpha1.fit,
    newx=x.test,
    s=alpha1.fit$lambda.1se
)

###### Evaluate

In [12]:
mean((y.test - alpha1.predicted)^2)

### ElasticNet Regression

###### Fit

In [13]:
# When alpha is set to 1, glmnet() does a Lasso regression

alpha0.5.fit = cv.glmnet(
    x=x.train,
    y=y.train,
    type.measure='mse',
    nfolds=10,
    alpha=0.5,
    family='gaussian' # This arg is passed through to glmnet()
)

alpha0.5.fit

$lambda
 [1] 2.69689828 2.57432009 2.45731327 2.34562459 2.23901233 2.13724577
 [7] 2.04010465 1.94737874 1.85886737 1.77437898 1.69373072 1.61674804
[13] 1.54326435 1.47312060 1.40616500 1.34225263 1.28124518 1.22301061
[19] 1.16742289 1.11436171 1.06371225 1.01536489 0.96921499 0.92516267
[25] 0.88311260 0.84297377 0.80465931 0.76808630 0.73317560 0.69985163
[31] 0.66804230 0.63767874 0.60869526 0.58102912 0.55462045 0.52941210
[37] 0.50534950 0.48238058 0.46045564 0.43952722 0.41955003 0.40048084
[43] 0.38227836 0.36490322 0.34831781 0.33248623 0.31737421 0.30294907
[49] 0.28917956 0.27603591 0.26348965 0.25151364 0.24008195 0.22916986
[55] 0.21875373 0.20881104 0.19932026 0.19026084 0.18161319 0.17335859
[61] 0.16547918 0.15795789 0.15077846 0.14392535 0.13738372 0.13113942
[67] 0.12517893 0.11948935 0.11405838 0.10887425 0.10392575 0.09920216
[73] 0.09469327 0.09038931 0.08628098 0.08235938 0.07861602 0.07504280
[79] 0.07163199 0.06837620 0.06526840 0.06230185 0.05947013 0.0567671

###### Predict

In [14]:
alpha0.5.predicted = predict(
    object=alpha0.5.fit,
    newx=x.test,
    s=alpha0.5.fit$lambda.1se
)

###### Evaluate

In [15]:
mean((y.test - alpha0.5.predicted)^2)

### Hyperparamter Tuning for `alpha`

In [16]:
# Initialize an empty list to store information
list.of.fits = list()

###### Fit

In [17]:
# Loop through 11 values

for (i in 0:10) {
    print(paste0("Fitting at alpha = ", i/10))
    
    # Name the element
    fit.name = paste0("alpha", i/10)
    
    # Train the model
    list.of.fits[[fit.name]] = cv.glmnet(
        x=x.train,
        y=y.train,
        type.measure='mse',
        nfolds=10,
        alpha=i/10,
        family='gaussian' # This arg is passed through to glmnet()
    )
}

[1] "Fitting at alpha = 0"
[1] "Fitting at alpha = 0.1"
[1] "Fitting at alpha = 0.2"
[1] "Fitting at alpha = 0.3"
[1] "Fitting at alpha = 0.4"
[1] "Fitting at alpha = 0.5"
[1] "Fitting at alpha = 0.6"
[1] "Fitting at alpha = 0.7"
[1] "Fitting at alpha = 0.8"
[1] "Fitting at alpha = 0.9"
[1] "Fitting at alpha = 1"


###### Predict

In [18]:
# Loop through 11 values

results = data.frame() # Initialize empty df

for (i in 0:10) {
    print(paste0("Predicting at alpha = ", i/10))
    
    # Name the element
    fit.name = paste0("alpha", i/10)
    
    # Predict
    predicted = predict(
        object=list.of.fits[[fit.name]],
        newx=x.test,
        s=list.of.fits[[fit.name]]$lambda.1se
    )
        
    mse = mean((y.test - predicted)^2)
        
    temp = data.frame(alpha=i/10, mse=mse, fit.name=fit.name)
    print(temp)
    
    results = rbind(results, temp)
    
}

[1] "Predicting at alpha = 0"
  alpha      mse fit.name
1     0 14.95281   alpha0
[1] "Predicting at alpha = 0.1"
  alpha      mse fit.name
1   0.1 2.256924 alpha0.1
[1] "Predicting at alpha = 0.2"
  alpha      mse fit.name
1   0.2 1.472927 alpha0.2
[1] "Predicting at alpha = 0.3"
  alpha      mse fit.name
1   0.3 1.362394 alpha0.3
[1] "Predicting at alpha = 0.4"
  alpha      mse fit.name
1   0.4 1.259794 alpha0.4
[1] "Predicting at alpha = 0.5"
  alpha      mse fit.name
1   0.5 1.252103 alpha0.5
[1] "Predicting at alpha = 0.6"
  alpha     mse fit.name
1   0.6 1.25333 alpha0.6
[1] "Predicting at alpha = 0.7"
  alpha      mse fit.name
1   0.7 1.212927 alpha0.7
[1] "Predicting at alpha = 0.8"
  alpha      mse fit.name
1   0.8 1.184028 alpha0.8
[1] "Predicting at alpha = 0.9"
  alpha      mse fit.name
1   0.9 1.182919 alpha0.9
[1] "Predicting at alpha = 1"
  alpha      mse fit.name
1     1 1.184701   alpha1


In [19]:
print(results)

   alpha       mse fit.name
1    0.0 14.952815   alpha0
2    0.1  2.256924 alpha0.1
3    0.2  1.472927 alpha0.2
4    0.3  1.362394 alpha0.3
5    0.4  1.259794 alpha0.4
6    0.5  1.252103 alpha0.5
7    0.6  1.253330 alpha0.6
8    0.7  1.212927 alpha0.7
9    0.8  1.184028 alpha0.8
10   0.9  1.182919 alpha0.9
11   1.0  1.184701   alpha1


### Conclusion

Since `mse` is lowest at `alpha=1`, **lasso** is still our best model! Might vary from time to time due to randomness, but `alpha=1` should be lowest or within just a few fractions of a point.