### Lasso and Elastic Net

The _Lasso_ ("least absolute shrinkage and selection operator") is a regression method that performs variable selection and regularization at the same time. It does this by constraining the sum of the absolute values of the regression coefficients to be no greater than a fixed value. As a result, the coefficients are "shrunk" toward zero, and some are set exactly to zero, which effectively selects those variables "out" of the model.
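For reference, here is the least-squares Lasso written out. The first line is the constrained form described above. The second line is the equivalent penalized (Lagrangian) form, which is what software typically solves. The notation is ours, not the notebook's: $t$ is the bound, $\lambda$ is the penalty weight, and there are $p$ predictors and $n$ observations. In this textbook form the intercept $\beta_0$ is left unpenalized.

$$
\min_{\beta_0,\,\beta}\ \sum_{i=1}^{n}\Big(y_i-\beta_0-\sum_{j=1}^{p}x_{ij}\beta_j\Big)^2
\quad\text{subject to}\quad \sum_{j=1}^{p}\lvert\beta_j\rvert\le t
$$

$$
\min_{\beta_0,\,\beta}\ \frac{1}{2n}\sum_{i=1}^{n}\Big(y_i-\beta_0-\sum_{j=1}^{p}x_{ij}\beta_j\Big)^2+\lambda\sum_{j=1}^{p}\lvert\beta_j\rvert
$$

For each bound $t$ there is a corresponding $\lambda \ge 0$ that gives the same solution. A smaller $t$ (equivalently, a larger $\lambda$) means more shrinkage and typically more coefficients at exactly zero.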

Originally formulated for least squares, the Lasso has since been extended to generalized linear models (GLMs), to proportional hazards models (e.g., for survival analysis), and to other types of models.

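_Elastic Net_ extends the Lasso's penalty by adding a ridge-regression (squared-coefficient, L2) term to its absolute-value (L1) term. When the L2 term has positive weight, it makes the objective strictly convex, so the optimization has a unique solution. The Lasso's solution, by contrast, need not be unique when predictors are linearly dependent. The Lasso (pure L1 penalty) and ridge regression (pure L2 penalty) are special cases of the Elastic Net.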
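To make the special cases concrete, here is a minimal NumPy sketch of cyclic coordinate descent for the elastic-net objective. It uses the parameterization that the `statsmodels` documentation gives for OLS, where $w$ plays the role of its `L1_wt` argument:

$$\frac{1}{2n}\lVert y - X\beta\rVert_2^2 + \alpha\left(\frac{1-w}{2}\lVert\beta\rVert_2^2 + w\lVert\beta\rVert_1\right)$$

Setting $w = 1$ gives the Lasso, and setting $w = 0$ gives ridge regression. Several things here are illustrative assumptions: the function names (`soft_threshold`, `elastic_net_cd`), the synthetic data, and the requirement of centered data with no intercept. This is not the `statsmodels` implementation itself.

```python
import numpy as np

def soft_threshold(z, gamma):
    """S(z, gamma) = sign(z) * max(|z| - gamma, 0); this is what yields exact zeros."""
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

def elastic_net_cd(X, y, alpha, l1_wt, n_iter=500, tol=1e-10):
    """Cyclic coordinate descent for
    (1/(2n))*||y - Xb||^2 + alpha*((1 - l1_wt)/2*||b||^2 + l1_wt*||b||_1).
    Assumes y and the columns of X are already centered (no intercept)."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n        # (1/n) * x_j^T x_j per column
    resid = y - X @ b                         # current full residual
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            # (1/n) * x_j^T (partial residual with b_j added back)
            rho = X[:, j] @ resid / n + col_sq[j] * b[j]
            new_bj = soft_threshold(rho, alpha * l1_wt) / (col_sq[j] + alpha * (1 - l1_wt))
            resid -= X[:, j] * (new_bj - b[j])   # keep residual in sync with b
            max_change = max(max_change, abs(new_bj - b[j]))
            b[j] = new_bj
        if max_change < tol:
            break
    return b

# Synthetic example: two real effects, three noise predictors
rng = np.random.default_rng(1)
n = 200
X = rng.standard_normal((n, 5))
y = X @ np.array([3.0, -2.0, 0.0, 0.0, 0.0]) + rng.standard_normal(n)
X -= X.mean(axis=0)
y -= y.mean()

lasso = elastic_net_cd(X, y, alpha=0.3, l1_wt=1.0)   # pure L1
ridge = elastic_net_cd(X, y, alpha=0.3, l1_wt=0.0)   # pure L2
print("lasso:", np.round(lasso, 3))
print("ridge:", np.round(ridge, 3))
```

Whenever a coordinate's correlation with the partial residual is smaller in magnitude than $\alpha w$, its update is exactly zero. With $w = 0$ that threshold vanishes, so ridge shrinks the coefficients but does not select variables.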

Here's the original Elastic Net reference:

[Zou, H. & Hastie, T. (2005) "Regularization and variable selection via the
elastic net"](http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.124.4696)

The `statsmodels` package implements the Elastic Net through the `fit_regularized` method of its model classes. Let's use it to estimate a Lasso regression model for the patient satisfaction data.

In [1]:
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

In [46]:
import pandas as pd
import numpy as np
import statsmodels.api as sm
import statsmodels.formula.api as smf

In [47]:
patSatDF=pd.read_csv('DECART-patSat.csv')

In [48]:
patSatDF.columns

Index(['caseID', 'patSat', 'q2', 'q3', 'q4', 'q5', 'q6', 'q7', 'q8', 'q9',
       'ptCat'],
      dtype='object')

In [52]:
mod1formula = "patSat ~ q2 + q3 + q4 + q5 + q6 + q7 + q8 + q9"

mod1 = sm.OLS.from_formula(mod1formula, data=patSatDF)
mod1Result=mod1.fit()
print(mod1Result.summary())

                            OLS Regression Results                            
Dep. Variable:                 patSat   R-squared:                       0.685
Model:                            OLS   Adj. R-squared:                  0.684
Method:                 Least Squares   F-statistic:                     490.2
Date:                Tue, 31 Jul 2018   Prob (F-statistic):               0.00
Time:                        20:30:58   Log-Likelihood:                -3218.3
No. Observations:                1811   AIC:                             6455.
Df Residuals:                    1802   BIC:                             6504.
Df Model:                           8                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     -0.2116      0.141     -1.504      0.1

In [57]:
EEmod1Result = mod1.fit_regularized(method='elastic_net', alpha=0.5, L1_wt=1.0)  # L1_wt=1.0 (the default) is a pure Lasso; try changing alpha

In [58]:
print(EEmod1Result.params)

Intercept    0.000000
q2           0.217778
q3           0.153341
q4           0.126520
q5           0.486177
q6           0.000000
q7           0.000000
q8           0.000000
q9           0.000000
dtype: float64


In [67]:
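# Iterate over a grid of alpha values and save the fitted coefficients for each
alphList = []
alphas = np.arange(0, 0.55, 0.05)  # 0.00, 0.05, ..., 0.50
mod = sm.OLS.from_formula(mod1formula, data=patSatDF)
for alpha in alphas:
    # L1_wt=1.0 (the default) keeps each fit a pure Lasso
    Result = mod.fit_regularized(method='elastic_net', alpha=alpha, L1_wt=1.0)
    alphList.append(Result.params)
# Row labels are the alpha values as strings; np.arange accumulates floating-point
# error, which is why labels such as 0.15000000000000002 appear below
Results = pd.DataFrame(alphList, index=[str(x) for x in alphas])
print(Results)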

                     Intercept        q2        q3        q4        q5  \
0.0                   0.092279  0.136626  0.167548  0.138489  0.529250   
0.05                  0.000000  0.239431  0.000000  0.062438  0.499130   
0.1                   0.000000  0.243807  0.000000  0.037775  0.473152   
0.15000000000000002   0.000000  0.340779  0.000000  0.136884  0.519547   
0.2                   0.000000  0.299611  0.000000  0.075122  0.474417   
0.25                  0.000000  0.222406  0.148862  0.123075  0.495403   
0.30000000000000004   0.000000  0.202805  0.121089  0.085791  0.467058   
0.35000000000000003   0.000000  0.203001  0.115374  0.066651  0.443893   
0.4                   0.000000  0.220283  0.151797  0.125697  0.488505   
0.45                  0.000000  0.219049  0.152602  0.126192  0.487211   
0.5                   0.000000  0.217778  0.153341  0.126520  0.486177   

                           q6        q7        q8        q9  
0.0                  0.246529 -0.052442 -0.138334