## Workshop - Regularization

In this workshop, we are going to:

1. Tune an elastic-net regression 
2. Compare the following models:
    1. The null model
    2. The tuned elastic-net model
    3. The trimmed non-regularized model with standardized features
    4. The trimmed non-regularized model with non-standardized features
    
# Preliminaries

- Load any necessary packages and/or functions
- Load in and prepare the class data
- Create x and y with a label of `pct_d_rgdp`
- Create `x_train`, `x_test`, `y_train`, `y_test` with
    * training size of two-thirds
    * random state of 490
- Standardize the features
- Add constants

In [1]:
import numpy as np
import pandas as pd
import statsmodels.api as sm
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn import linear_model as lm

In [2]:
df = pd.read_pickle("C:/Users/dp846/OneDrive/Desktop/ECON490ML/class data/class_data.pkl")
df

| fips | year | GeoName | pct_d_rgdp | urate_bin | pos_net_jobs | emp_estabs | estabs_entry_rate | estabs_exit_rate | pop | pop_pct_black | pop_pct_hisp | lfpr | density | year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1001 | 2002 | Autauga, AL | 3.202147 | lower | 1 | 12.531208 | 11.268 | 9.256 | 45909.0 | 17.386569 | 1.611884 | 74.841638 | 77.231178 | 2002 |
| 1001 | 2003 | Autauga, AL | 1.434404 | lower | 1 | 12.598415 | 10.603 | 9.940 | 46800.0 | 17.493590 | 1.692308 | 75.093851 | 78.730077 | 2003 |
| 1001 | 2004 | Autauga, AL | 15.061365 | lower | 1 | 12.780078 | 11.140 | 8.519 | 48366.0 | 17.584667 | 1.796717 | 74.459624 | 81.364507 | 2004 |
| 1001 | 2005 | Autauga, AL | 0.333105 | higher | 1 | 12.856784 | 11.735 | 8.673 | 49676.0 | 17.612127 | 1.986875 | 74.920228 | 83.568276 | 2005 |
| 1001 | 2006 | Autauga, AL | 7.440034 | higher | 1 | 12.832506 | 10.645 | 8.766 | 51328.0 | 17.898613 | 2.032029 | 73.641001 | 86.347380 | 2006 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 56045 | 2014 | Weston, WY | 2.055429 | lower | 1 | 8.410000 | 9.694 | 5.612 | 7138.0 | 1.204819 | 3.950686 | 87.627044 | 2.976537 | 2014 |
| 56045 | 2015 | Weston, WY | 12.558802 | lower | 0 | 8.415385 | 6.076 | 8.608 | 7208.0 | 1.054384 | 3.953940 | 86.978480 | 3.005727 | 2015 |
| 56045 | 2016 | Weston, WY | -10.381257 | similar | 0 | 7.644231 | 13.896 | 7.444 | 7220.0 | 1.038781 | 4.099723 | 87.816245 | 3.010731 | 2016 |
| 56045 | 2017 | Weston, WY | -0.153371 | lower | 0 | 7.808081 | 5.941 | 9.901 | 6968.0 | 1.248565 | 4.118829 | 87.065369 | 2.905647 | 2017 |


In [3]:
df_prepped = df.drop(columns = ['urate_bin', 'year']).join([
    pd.get_dummies(df['urate_bin'], drop_first = True),
    pd.get_dummies(df.year, drop_first = True)    
])

In [4]:
y = df_prepped['pct_d_rgdp']
x = df_prepped.drop(columns = 'pct_d_rgdp')

x_train, x_test, y_train, y_test = train_test_split(x, y, train_size = 2/3, random_state = 490)

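# Standardize both sets with the training-set mean and standard deviation,
# so the test features are scaled exactly like the training features.
train_mean, train_sd = x_train.mean(), x_train.std(ddof = 0)
x_train_std = (x_train - train_mean)/train_sd
x_test_std  = (x_test - train_mean)/train_sd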

x_train_std = sm.add_constant(x_train_std)
x_test_std  = sm.add_constant(x_test_std)
x_train     = sm.add_constant(x_train)
x_test      = sm.add_constant(x_test)

Take a look at `lm.ElasticNet?` and 
```
fit = sm.OLS(y_train, x_train)
fit.fit_regularized?
```
Determine which parameters are the same, but named differently.
Specifically, the penalty strength $\alpha$ and the weight on the two penalty terms (i.e. $||\beta||_2$ and $||\beta||_1$).
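For reference while reading the two docstrings, the objectives stated in the scikit-learn and statsmodels documentation can be written side by side. This is a sketch in this workshop's notation: $n$ (number of observations), $\rho$, and $w$ are shorthand introduced here, not names used by either library.

$$
\begin{aligned}
\texttt{lm.ElasticNet:}\quad & \min_{\beta}\ \frac{1}{2n}\lVert y - X\beta\rVert_2^2 + \alpha\,\rho\,\lVert\beta\rVert_1 + \frac{\alpha\,(1-\rho)}{2}\lVert\beta\rVert_2^2, \qquad \rho = \texttt{l1\_ratio} \\
\texttt{fit\_regularized:}\quad & \min_{\beta}\ \frac{1}{2n}\lVert y - X\beta\rVert_2^2 + \alpha\left(w\,\lVert\beta\rVert_1 + \frac{1-w}{2}\lVert\beta\rVert_2^2\right), \qquad w = \texttt{L1\_wt}
\end{aligned}
$$

One practical difference to keep in mind: `lm.ElasticNet` fits its intercept outside the penalty, whereas a scalar `alpha` in `fit_regularized` applies to every column of the design matrix, including an added constant.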

In [5]:
# lm.ElasticNet?
fit = sm.OLS(y_train, x_train)
fit.fit_regularized?

Signature:
fit.fit_regularized(
    method='elastic_net',
    alpha=0.0,
    L1_wt=1.0,
    start_params=None,
    profile_scale=False,
    refit=False,
    **kwargs,
)
Docstring:
Return a regularized fit to a linear regression model.

Parameters
----------
method : str
    Either 'elastic_net' or 'sqrt_lasso'.
alpha : scalar or array_like
    The penalty weight.  If a scalar, the same penalty weight
    applies to all variables in the model.  If a vector, it
    must have the same length as `params`, and contains a
    penalty weigh

Perform a 5-fold cross-validation grid search with a random state of 490. 
Identify the optimally tuned hyperparameters.
Use this grid:
```
param_grid = {'alpha': 10.**np.arange(-5, -1, 1), 
              'l1_ratio': np.arange(0, 1, 0.1)}
```
You will get a warning message about convergence.
We will discuss it after the workshop.
Think about why it is occurring.
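As a rough illustration of the mechanics, here is a minimal sketch of the grid search on synthetic data. The names `x_demo`, `y_demo`, and `cv` are hypothetical stand-ins, not the class data. `GridSearchCV` itself has no `random_state` argument, so this sketch assumes the random state is meant for a shuffled `KFold` splitter; the RMSE scoring choice is also an assumption.

```python
import numpy as np
from sklearn import linear_model as lm
from sklearn.model_selection import GridSearchCV, KFold

# Synthetic stand-in for the standardized training data (hypothetical)
rng = np.random.default_rng(490)
x_demo = rng.normal(size=(300, 6))
y_demo = x_demo @ np.array([2.0, -1.0, 0.0, 0.0, 0.5, 0.0]) + rng.normal(size=300)

param_grid = {'alpha': 10.**np.arange(-5, -1, 1),
              'l1_ratio': np.arange(0, 1, 0.1)}

# Assumption: the random state of 490 is applied to shuffled 5-fold splits
cv = KFold(n_splits=5, shuffle=True, random_state=490)

grid_search = GridSearchCV(lm.ElasticNet(), param_grid, cv=cv,
                           scoring='neg_root_mean_squared_error')
grid_search.fit(x_demo, y_demo)  # may emit convergence warnings, as noted above

# Hyperparameter combination with the best mean cross-validated score
print(grid_search.best_params_)
```

With the class data, `x_demo` and `y_demo` would be replaced by `x_train_std` and `y_train`.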

****
# Question

How many models did we just fit?
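One way to check your answer is to count directly from the grid. This sketch assumes the default `refit=True` in `GridSearchCV`, which adds one final fit on the full training set; `n_folds`, `n_combinations`, `n_cv_fits`, and `n_total_fits` are names introduced here for illustration.

```python
import numpy as np

param_grid = {'alpha': 10.**np.arange(-5, -1, 1),
              'l1_ratio': np.arange(0, 1, 0.1)}
n_folds = 5

# Number of hyperparameter combinations is the product of the grid lengths
n_combinations = int(np.prod([len(values) for values in param_grid.values()]))

# Each combination is fit once per fold
n_cv_fits = n_combinations * n_folds

# Assumption: refit=True, so the best combination is fit once more on all training data
n_total_fits = n_cv_fits + 1

print(n_combinations, n_cv_fits, n_total_fits)
```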

***
Using the tuned hyperparameters, fit your elastic net model with `statsmodels`

Using the selected features, refit:

- the non-regularized model with standardized features
- the non-regularized model with non-standardized features

Compare the percent improvement from the null model RMSE to the elastic-net and OLS models.
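A minimal sketch of the comparison, assuming the null model predicts the training-set mean for every test observation and that improvement is measured relative to the null RMSE. The helper names `rmse` and `pct_improvement` and the toy arrays are hypothetical.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

def pct_improvement(rmse_null, rmse_model):
    """Percent reduction in RMSE relative to the null model."""
    return 100.0 * (rmse_null - rmse_model) / rmse_null

# Toy values standing in for y_train, y_test, and a model's test predictions
y_train_demo = np.array([1.0, 2.0, 3.0, 4.0])
y_test_demo = np.array([2.0, 4.0])
y_hat_demo = np.array([2.5, 3.5])

# Null model: predict the training mean for every test observation
y_hat_null = np.full_like(y_test_demo, y_train_demo.mean())

rmse_null = rmse(y_test_demo, y_hat_null)
rmse_model = rmse(y_test_demo, y_hat_demo)
print(rmse_null, rmse_model, pct_improvement(rmse_null, rmse_model))
```

The same two helpers can be applied to the test predictions of the elastic-net model and of each trimmed OLS model.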