<font size = 5> Regularization Techniques in Logistic Regression Models </font>

Let's center and scale our predictors, and fit both a lasso and a ridge model. <br>

Do we have any multicollinear predictor variables or irrelevant predictors? What are their coefficients?
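For reference, the penalized objective that glmnet minimizes for a binomial model can be written as below. The notation is introduced here for illustration and does not appear in the notebook: $N$ is the number of observations, $\ell$ the binomial log-likelihood, $\beta_0$ the intercept, $\beta$ the coefficient vector, $\lambda \ge 0$ the penalty strength and $\alpha \in [0, 1]$ the mixing parameter.

```latex
\min_{\beta_0,\,\beta}\; -\frac{1}{N}\,\ell(\beta_0, \beta)
  \;+\; \lambda \left[ \frac{1-\alpha}{2}\,\lVert \beta \rVert_2^2 + \alpha\,\lVert \beta \rVert_1 \right]
```

In this form `alpha = 1` gives the lasso, `alpha = 0` gives ridge, and values in between give the elastic net, while `lambda` only controls how strong the penalty is; `lambda = 0` removes the penalty altogether.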

In [1]:
#Load Libraries
library(dplyr)
library(fastDummies)


Attaching package: 'dplyr'


The following objects are masked from 'package:stats':

    filter, lag


The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union


Thank you for using fastDummies!

To acknowledge our work, please cite the package:

Kaplan, J. & Schlegel, B. (2023). fastDummies: Fast Creation of Dummy (Binary) Columns and Rows from Categorical Variables. Version 1.7.1. URL: https://github.com/jacobkap/fastDummies, https://jacobkap.github.io/fastDummies/.



In [2]:
#load data
df <- read.csv("stroke_data.csv")

In [3]:
head(df)

,id,gender,age,hypertension,heart_disease,ever_married,work_type,Residence_type,avg_glucose_level,bmi,smoking_status,stroke
,<int>,<chr>,<dbl>,<int>,<int>,<chr>,<chr>,<chr>,<dbl>,<dbl>,<chr>,<int>
1,9046,Male,67,0,1,Yes,Private,Urban,228.69,36.6,formerly smoked,1
2,31112,Male,80,0,1,Yes,Private,Rural,105.92,32.5,never smoked,1
3,60182,Female,49,0,0,Yes,Private,Urban,171.23,34.4,smokes,1
4,1665,Female,79,1,0,Yes,Self-employed,Rural,174.12,24.0,never smoked,1
5,56669,Male,81,0,0,Yes,Private,Urban,186.21,29.0,formerly smoked,1
6,53882,Male,74,1,1,Yes,Private,Rural,70.09,27.4,never smoked,1


In [4]:
#Transform categorical columns to dummy (0/1) columns, dropping the first level of each
df_dummy <- df %>% dummy_cols(remove_first_dummy = TRUE, remove_selected_columns = TRUE)

In [5]:
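#Lasso Regression
#alpha is left at its default of 1, which is what makes this a lasso fit
lasso_glmnet <-     glmnet::glmnet(x = df_dummy %>% select(-stroke) %>% scale() %>% as.matrix(),
                                   y = df_dummy %>% pull(stroke) %>% factor(),
                                   family = binomial(),
                                   lambda = 1   #penalty strength (very strong); lambda does not choose lasso vs ridge
                                  )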

#coefficients of Lasso Regression
coef(lasso_glmnet)

18 x 1 sparse Matrix of class "dgCMatrix"
                                   s0
(Intercept)                 -3.112984
id                           .       
age                          .       
hypertension                 .       
heart_disease                .       
avg_glucose_level            .       
bmi                          .       
gender_Male                  .       
gender_Other                 .       
ever_married_Yes             .       
work_type_Govt_job           .       
work_type_Never_worked       .       
work_type_Private            .       
work_type_Self-employed      .       
Residence_type_Urban         .       
smoking_status_never smoked  .       
smoking_status_smokes        .       
smoking_status_Unknown       .       

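With lambda = 1 every coefficient has been set to zero and only the intercept remains. This reflects the size of the penalty rather than the data: a penalty this strong removes every predictor, so we cannot conclude from this fit alone that none of the predictors help explain the target variable (having a stroke). A smaller lambda is needed to see which predictors the lasso keeps.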
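To make the point about penalty size concrete, here is a minimal base R sketch on simulated data (the data, `n`, and the variable names are assumptions for illustration, not the stroke dataset). It computes the smallest lambda at which a logistic lasso keeps every coefficient at zero, using the standard optimality condition at the intercept-only model. glmnet standardizes internally with a slightly different variance convention, so treat the number as approximate.

```r
# Simulated data (an assumption for illustration; not the stroke dataset)
set.seed(1)
n <- 500
x <- scale(matrix(rnorm(n * 5), nrow = n))            # centered and scaled predictors
y <- rbinom(n, size = 1, prob = plogis(-2 + x[, 1]))  # binary outcome driven by column 1

# At beta = 0 the fitted probability is mean(y) for every row, so the lasso
# keeps all coefficients at zero whenever
#   lambda >= max_j |x_j' (y - mean(y))| / n
lambda_max <- max(abs(crossprod(x, y - mean(y)))) / n
lambda_max

# Cauchy-Schwarz bounds this by sd(x_j) * sd(y) <= 1 * 0.5 for a 0/1 outcome
lambda_max <= 0.5
```

Because the bound is 0.5 for any standardized predictors and a 0/1 outcome, a lasso fit at lambda = 1 should return only an intercept regardless of how informative the predictors are.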

In [6]:
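#Ridge Regression (as intended)
#Note: alpha is still at its default of 1, and lambda = 0 switches the penalty off,
#so this call fits an essentially unpenalized logistic regression
ridge_glmnet <-     glmnet::glmnet(x = df_dummy %>% select(-stroke) %>% scale() %>% as.matrix(),
                                   y = df_dummy %>% pull(stroke) %>% factor(),
                                   family = binomial(),
                                   lambda = 0    #no penalty; a ridge fit needs alpha = 0 and lambda > 0
                                  )

#coefficients of the fit above
coef(ridge_glmnet)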

18 x 1 sparse Matrix of class "dgCMatrix"
                                      s0
(Intercept)                 -4.224246607
id                           0.008309876
age                          1.665103530
hypertension                 0.151321829
heart_disease                0.075317558
avg_glucose_level            0.206412835
bmi                          0.034322675
gender_Male                 -0.007469772
gender_Other                -0.104028779
ever_married_Yes            -0.051870391
work_type_Govt_job          -0.284425117
work_type_Never_worked      -0.338920857
work_type_Private           -0.340214298
work_type_Self-employed     -0.406077320
Residence_type_Urban         0.002324600
smoking_status_never smoked -0.032265833
smoking_status_smokes        0.112745330
smoking_status_Unknown      -0.127515370

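Looking at these coefficients, we observe that none of them are set to zero. Note that `lambda = 0` means no penalty is applied, so this fit is essentially an ordinary (unregularized) logistic regression; a ridge fit in glmnet needs `alpha = 0` together with a positive lambda. <br>
The coefficient of the predictor "age" is by far the largest (1.67), while coefficients such as `id` (0.008) and `Residence_type_Urban` (0.002) are close to zero, which marks them as candidates for irrelevant predictors.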
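A ridge fit in glmnet is selected with `alpha = 0`. The following is a minimal sketch on simulated data; the data, the predictor names `x1` to `x5`, and the two lambda values are assumptions chosen for illustration, not values tuned for the stroke dataset.

```r
library(glmnet)

# Simulated data (an assumption for illustration; not the stroke dataset)
set.seed(1)
n <- 500
x <- scale(matrix(rnorm(n * 5), nrow = n))
colnames(x) <- paste0("x", 1:5)                   # hypothetical predictor names
y <- rbinom(n, size = 1, prob = plogis(-1 + x[, 1]))

# alpha = 0 selects the ridge (L2) penalty; lambda > 0 sets its strength
ridge_weak   <- glmnet(x, y, family = "binomial", alpha = 0, lambda = 0.01)
ridge_strong <- glmnet(x, y, family = "binomial", alpha = 0, lambda = 1)

# Coefficients side by side, dropping the unpenalized intercept
b_weak   <- as.matrix(coef(ridge_weak))[-1, 1]
b_strong <- as.matrix(coef(ridge_strong))[-1, 1]
cbind(weak = b_weak, strong = b_strong)
```

With a ridge penalty the coefficients shrink as lambda grows but are not set exactly to zero, which is the behaviour the text attributes to ridge.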

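Now let's use cross-validation to find the optimal lambda and coefficients for our predictors. Note that `cv.glmnet` below uses the default `alpha = 1`, so the penalty is still a lasso penalty; elastic net regularization needs an `alpha` strictly between 0 and 1.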

In [7]:
#Using cross validation to find the optimal lambda value for regression
glmnet::cv.glmnet(x = df_dummy %>% select(-stroke) %>% scale() %>% as.matrix(),
                  y = df_dummy %>% pull(stroke) %>% factor(),
                  family = binomial()
                  )


Call:  glmnet::cv.glmnet(x = df_dummy %>% select(-stroke) %>% scale() %>%      as.matrix(), y = df_dummy %>% pull(stroke) %>% factor(),      family = binomial()) 

Measure: GLM Deviance 

      Lambda Index Measure      SE Nonzero
min 0.001647    37  0.2834 0.01505       8
1se 0.015360    13  0.2974 0.01508       4
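The two lambdas in the summary above do not need to be copied by hand, because they are stored on the object returned by `cv.glmnet`. The sketch below shows this on simulated data and also sets `alpha = 0.5` to obtain an actual elastic net penalty; the data, the predictor names, and the choice of 0.5 are assumptions for illustration only.

```r
library(glmnet)

# Simulated data (an assumption for illustration; not the stroke dataset)
set.seed(1)
n <- 500
x <- scale(matrix(rnorm(n * 5), nrow = n))
colnames(x) <- paste0("x", 1:5)                   # hypothetical predictor names
y <- rbinom(n, size = 1, prob = plogis(-1 + x[, 1]))

# alpha = 0.5 mixes the lasso and ridge penalties equally (an arbitrary choice here)
cv_fit <- cv.glmnet(x, y, family = "binomial", alpha = 0.5)

# The two lambdas reported in the printed summary are stored on the object
cv_fit$lambda.min
cv_fit$lambda.1se

# Coefficients at the more conservative 1se lambda
coef(cv_fit, s = "lambda.1se")
```

Cross-validation folds are assigned at random, so the selected lambdas change from run to run unless a seed is set first.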

In [8]:
#Refit at lambda = 0.02, slightly above the 1se lambda (0.01536) found by cross-validation
#alpha is still at its default of 1; set 0 < alpha < 1 for an elastic net penalty
elastic_glmnet <-   glmnet::glmnet(x = df_dummy %>% select(-stroke) %>% scale() %>% as.matrix(),
                                  y = df_dummy %>% pull(stroke) %>% factor(),
                                  family = binomial(),
                                  lambda = 0.02
                                  )
#coefficients at lambda = 0.02
coef(elastic_glmnet)

18 x 1 sparse Matrix of class "dgCMatrix"
                                     s0
(Intercept)                 -3.35000478
id                           .         
age                          0.73418821
hypertension                 0.01153639
heart_disease                .         
avg_glucose_level            0.02237766
bmi                          .         
gender_Male                  .         
gender_Other                 .         
ever_married_Yes             .         
work_type_Govt_job           .         
work_type_Never_worked       .         
work_type_Private            .         
work_type_Self-employed      .         
Residence_type_Urban         .         
smoking_status_never smoked  .         
smoking_status_smokes        .         
smoking_status_Unknown       .         

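We prefer this result because it sits between the two earlier fits: it did not set every coefficient to zero, yet it still dropped most of the predictors. Note, however, that the call leaves `alpha` at its default of 1, so it is a lasso fit with a moderate lambda; a true elastic net requires `0 < alpha < 1`. <br>

It is worth noting that, apart from `age` (0.73), the non-zero coefficients are still very small (`hypertension` 0.012, `avg_glucose_level` 0.022). This may be because lambda = 0.02 is above the 1se value of 0.0154, so the penalty still shrinks the coefficients strongly towards zero. <br>

Also, if there is multicollinearity among the predictors, a penalty with a ridge component (elastic net or ridge) tends to shrink the coefficients of correlated predictors towards each other, resulting in smaller individual coefficients, whereas a pure lasso tends to keep one of them and drop the rest.

It is also possible that most predictors of the model are only weakly associated with the target value. In that case any of these penalties would shrink their coefficients towards zero, and the lasso would set them exactly to zero.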
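The multicollinearity question can be checked directly before fitting. The following base R sketch flags highly correlated predictor pairs; it uses simulated predictors, and the names `x1`, `x2`, `x3` and the 0.8 cutoff are assumptions for illustration rather than anything taken from the stroke data.

```r
# Simulated predictors (an assumption for illustration; not the stroke dataset)
set.seed(1)
n <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)   # built to be almost a copy of x1
x3 <- rnorm(n)                  # unrelated to the others
x <- scale(cbind(x1, x2, x3))

# Pairwise correlations; keep each pair once via the upper triangle
cors <- cor(x)
high <- which(abs(cors) > 0.8 & upper.tri(cors), arr.ind = TRUE)

# Report the flagged pairs by name (0.8 is an arbitrary cutoff)
data.frame(var1 = rownames(cors)[high[, "row"]],
           var2 = colnames(cors)[high[, "col"]],
           r    = cors[high])
```

Pairwise correlations only reveal collinearity between two predictors at a time, so this is a first screen rather than a complete diagnostic. The same idea could be applied to the scaled predictor matrix used in the fits above.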