## Workshop - Regularization

In this workshop, we are going to:

1. Tune an elastic-net regression 
2. Compare the following models:
    1. The null model
    2. The tuned elastic-net model
    3. The trimmed non-regularized model with standardized features
    4. The trimmed non-regularized model with non-standardized features
    
# Preliminaries

- Load any necessary packages and/or functions
- Load in and prepare the class data
- Create x and y with a label of `pct_d_rgdp`
- Create `x_train`, `x_test`, `y_train`, `y_test` with
    * training size of two-thirds
    * random state of 490
- Standardize the features
- Add constants

In [1]:
import numpy as np
import pandas as pd
import statsmodels.api as sm
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn import linear_model as lm

In [2]:
df = pd.read_pickle('C:/Users/johnj/Documents/Data/aml in econ 02 spring 2021/class data/class_data.pkl')
df.columns

Index(['pct_d_rgdp', 'urate_bin', 'pos_net_jobs', 'emp_estabs',
       'estabs_entry_rate', 'estabs_exit_rate', 'pop', 'pop_pct_black',
       'pop_pct_hisp', 'lfpr', 'density', 'year'],
      dtype='object')

In [3]:
df

| fips | year | GeoName | pct_d_rgdp | urate_bin | pos_net_jobs | emp_estabs | estabs_entry_rate | estabs_exit_rate | pop | pop_pct_black | pop_pct_hisp | lfpr | density | year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1001 | 2002 | Autauga, AL | 3.202147 | lower | 1 | 12.531208 | 11.268 | 9.256 | 45909.0 | 17.386569 | 1.611884 | 74.841638 | 77.231178 | 2002 |
| 1001 | 2003 | Autauga, AL | 1.434404 | lower | 1 | 12.598415 | 10.603 | 9.940 | 46800.0 | 17.493590 | 1.692308 | 75.093851 | 78.730077 | 2003 |
| 1001 | 2004 | Autauga, AL | 15.061365 | lower | 1 | 12.780078 | 11.140 | 8.519 | 48366.0 | 17.584667 | 1.796717 | 74.459624 | 81.364507 | 2004 |
| 1001 | 2005 | Autauga, AL | 0.333105 | higher | 1 | 12.856784 | 11.735 | 8.673 | 49676.0 | 17.612127 | 1.986875 | 74.920228 | 83.568276 | 2005 |
| 1001 | 2006 | Autauga, AL | 7.440034 | higher | 1 | 12.832506 | 10.645 | 8.766 | 51328.0 | 17.898613 | 2.032029 | 73.641001 | 86.347380 | 2006 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 56045 | 2014 | Weston, WY | 2.055429 | lower | 1 | 8.410000 | 9.694 | 5.612 | 7138.0 | 1.204819 | 3.950686 | 87.627044 | 2.976537 | 2014 |
| 56045 | 2015 | Weston, WY | 12.558802 | lower | 0 | 8.415385 | 6.076 | 8.608 | 7208.0 | 1.054384 | 3.953940 | 86.978480 | 3.005727 | 2015 |
| 56045 | 2016 | Weston, WY | -10.381257 | similar | 0 | 7.644231 | 13.896 | 7.444 | 7220.0 | 1.038781 | 4.099723 | 87.816245 | 3.010731 | 2016 |
| 56045 | 2017 | Weston, WY | -0.153371 | lower | 0 | 7.808081 | 5.941 | 9.901 | 6968.0 | 1.248565 | 4.118829 | 87.065369 | 2.905647 | 2017 |


In [4]:
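# Replace the categorical columns with dummy variables (first level dropped).
# dtype = float keeps every feature numeric for statsmodels; casting year to str
# keeps all column names as strings, which some scikit-learn versions require.
df_prepped = df.drop(columns = ['urate_bin', 'year']).join([
    pd.get_dummies(df['urate_bin'], drop_first = True, dtype = float),
    pd.get_dummies(df['year'].astype(str), drop_first = True, dtype = float)
])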

In [5]:
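# Label and features
y = df_prepped['pct_d_rgdp']
x = df_prepped.drop(columns = 'pct_d_rgdp')

# Two-thirds of the observations go to the training set
x_train, x_test, y_train, y_test = train_test_split(x, y, train_size = 2/3, random_state = 490)

# Standardize each column; each split uses its own column mean and standard deviation
x_train_std = x_train.apply(lambda col: (col - np.mean(col))/np.std(col), axis = 0)
x_test_std  = x_test.apply(lambda col: (col - np.mean(col))/np.std(col), axis = 0)

# Add the constant after standardizing so the column of ones is left unscaled
x_train_std = sm.add_constant(x_train_std)
x_test_std  = sm.add_constant(x_test_std)
x_train     = sm.add_constant(x_train)
x_test      = sm.add_constant(x_test)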
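
The cell above standardizes the test features with the test set's own mean and standard deviation. A common alternative is to reuse the training-set statistics so that both splits sit on the same scale. The sketch below illustrates that alternative with NumPy only; the arrays `toy_train` and `toy_test` are hypothetical stand-ins for `x_train` and `x_test`, not the class data.

```python
import numpy as np

# Hypothetical toy arrays standing in for x_train and x_test
rng = np.random.default_rng(490)
toy_train = rng.normal(loc=5.0, scale=2.0, size=(200, 3))
toy_test = rng.normal(loc=5.0, scale=2.0, size=(100, 3))

# Estimate the scaling parameters on the training split only
train_mean = toy_train.mean(axis=0)
train_sd = toy_train.std(axis=0)  # ddof=0, NumPy's default

# Apply the same transformation to both splits
toy_train_std = (toy_train - train_mean) / train_sd
toy_test_std = (toy_test - train_mean) / train_sd

print(toy_train_std.mean(axis=0).round(6))  # zero by construction
print(toy_test_std.mean(axis=0).round(3))   # close to, but not exactly, zero
```

With this approach the test columns will generally not have mean exactly 0 or standard deviation exactly 1, which is expected.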

Take a look at `lm.ElasticNet?` and 
```
fit = sm.OLS(y_train, x_train)
fit.fit_regularized?
```
Determine which hyperparameters are the same but named differently.
Specifically, compare the overall penalty strength $\alpha$ and the weight placed on each penalty term (i.e., the ridge term $||\beta||_2^2$ and the lasso term $||\beta||_1$).
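
As a reference for this comparison, the objectives described in the two docstrings can be written side by side. This is a sketch transcribed from the library documentation, so check it against the versions you have installed. For `lm.ElasticNet`:

$$
\frac{1}{2n}||y - X\beta||_2^2 + \alpha\,\rho\,||\beta||_1 + \frac{\alpha(1-\rho)}{2}||\beta||_2^2
$$

For `fit_regularized`:

$$
\frac{1}{2n}||y - X\beta||_2^2 + \alpha\left[\frac{(1-w)}{2}||\beta||_2^2 + w\,||\beta||_1\right]
$$

Here $\rho$ stands for `l1_ratio` and $w$ stands for `L1_wt`. Expanding the bracket in the second expression gives the first term by term, so `alpha` plays the same role in both and `l1_ratio` corresponds to `L1_wt`.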

In [5]:
# lm.ElasticNet?

In [6]:
fit = sm.OLS(y_train, x_train)
# fit.fit_regularized?

Perform a 5-fold cross-validation grid search with a random state of 490. 
Identify the optimally tuned hyperparameters.
Use this grid:
```
param_grid = {'alpha': 10.**np.arange(-5, -1, 1), 
              'l1_ratio': np.arange(0, 1, 0.1)}
```
You will get a warning message about convergence.
We will discuss it after the workshop.
Think about why it is occurring.

In [6]:
param_grid = {'alpha': 10.**np.arange(-5, -1, 1), 
              'l1_ratio': np.arange(0, 1, 0.1)}

# fit_intercept = False because x_train_std already contains a constant column
cv_enet = lm.ElasticNet(fit_intercept = False, random_state = 490)
# 5-fold CV over every (alpha, l1_ratio) pair, scored by negative RMSE
grid_search = GridSearchCV(cv_enet, param_grid, cv = 5,
                           scoring = 'neg_root_mean_squared_error',
                           n_jobs = 8)
grid_search.fit(x_train_std, y_train)
print(grid_search.best_params_)
best = grid_search.best_params_
best

{'alpha': 0.01, 'l1_ratio': 0.0}


  model = cd_fast.enet_coordinate_descent(


{'alpha': 0.01, 'l1_ratio': 0.0}
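
The selected `l1_ratio` of 0 removes the $||\beta||_1$ term, leaving a pure ridge penalty. Under the scikit-learn-style objective with no separate intercept, ridge has a closed-form solution. The sketch below works through it with NumPy on hypothetical simulated data (`X`, `y`, and the true coefficients are assumptions for illustration, not the class data).

```python
import numpy as np

# Hypothetical simulated data standing in for standardized features
rng = np.random.default_rng(490)
n, p = 300, 4
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, -0.5, 0.0, 2.0]) + rng.normal(size=n)
alpha = 0.01

# Minimize (1 / (2n)) * ||y - Xb||^2 + (alpha / 2) * ||b||^2.
# Setting the gradient to zero gives (X'X + n * alpha * I) b = X'y.
beta_ridge = np.linalg.solve(X.T @ X + n * alpha * np.eye(p), X.T @ y)

# The gradient of the objective at the solution should be numerically zero
gradient = -(X.T @ (y - X @ beta_ridge)) / n + alpha * beta_ridge

print(beta_ridge)
print(np.abs(gradient).max())
```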

****
# Question

How many models did we just fit?

In [7]:
len(np.arange(-5, -1, 1)) * len(np.arange(0, 1, 0.1)) * 5

200
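
The count above covers the cross-validation fits only. A sketch of the same arithmetic that enumerates the candidates explicitly and, assuming `GridSearchCV` was left at its default `refit=True`, adds the one final fit of the best candidate on the full training set:

```python
from itertools import product
import numpy as np

param_grid = {'alpha': 10.**np.arange(-5, -1, 1),
              'l1_ratio': np.arange(0, 1, 0.1)}
n_folds = 5

# Every (alpha, l1_ratio) pair is one candidate model
candidates = list(product(param_grid['alpha'], param_grid['l1_ratio']))
cv_fits = len(candidates) * n_folds

# Assumption: refit=True (the default) adds one fit on the whole training set
total_fits = cv_fits + 1

print(len(candidates), cv_fits, total_fits)  # 40 200 201
```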

***
Using the tuned hyperparameters, fit your elastic-net model with `statsmodels`.

In [9]:
fit_enet = sm.OLS(y_train, x_train_std).fit_regularized(alpha = best['alpha'], L1_wt = best['l1_ratio'])
fit_enet.params

const                1.983286
pos_net_jobs         0.572045
emp_estabs          -0.183756
estabs_entry_rate    0.876108
estabs_exit_rate    -0.539253
pop                 -0.102356
pop_pct_black        0.023722
pop_pct_hisp         0.315638
lfpr                 0.489344
density             -0.007976
lower                0.611242
similar              0.261767
2003                -0.091758
2004                -0.163169
2005                -0.019092
2006                 0.356963
2007                -0.415763
2008                -0.447180
2009                -0.635916
2010                 0.080947
2011                -0.184657
2012                -0.479680
2013                -0.061949
2014                -0.409004
2015                -0.328123
2016                -0.679548
2017                -0.310168
2018                -0.158566
dtype: float64

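Using the selected features, refit the two models below. Because the tuned `l1_ratio` is 0, the penalty is pure ridge and no coefficient above is shrunk exactly to zero, so all features are kept.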

- the non-regularized model with standardized features
- the non-regularized model with non-standardized features

In [10]:
fit_ols_std = sm.OLS(y_train, x_train_std).fit()
fit_ols     = sm.OLS(y_train, x_train).fit()
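
Without a penalty, OLS with a constant gives the same predictions under any rescaling of the features, provided the identical transformation is applied to the data used for prediction. The NumPy sketch below illustrates that property on hypothetical simulated data. It assumes both splits are scaled with the training-set statistics, which differs from the standardization cell in this workshop; `add_const` and `ols_predict` are helper names introduced here for illustration.

```python
import numpy as np

# Hypothetical simulated data, not the class data
rng = np.random.default_rng(490)
X_tr = rng.normal(loc=10.0, scale=3.0, size=(150, 2))
X_te = rng.normal(loc=10.0, scale=3.0, size=(50, 2))
y_tr = 1.0 + X_tr @ np.array([0.5, -0.2]) + rng.normal(size=150)

def add_const(a):
    """Prepend a column of ones."""
    return np.column_stack([np.ones(len(a)), a])

def ols_predict(x_fit, y_fit, x_new):
    """Fit OLS with a constant by least squares and predict on x_new."""
    beta, *_ = np.linalg.lstsq(add_const(x_fit), y_fit, rcond=None)
    return add_const(x_new) @ beta

# Scaling parameters estimated on the training split only
mu, sd = X_tr.mean(axis=0), X_tr.std(axis=0)

pred_raw = ols_predict(X_tr, y_tr, X_te)
pred_std = ols_predict((X_tr - mu) / sd, y_tr, (X_te - mu) / sd)

# Largest gap between the two sets of predictions (floating-point noise only)
print(np.max(np.abs(pred_raw - pred_std)))
```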

Compare the percent improvement over the null model RMSE for the elastic-net model and the two OLS models.

In [11]:
rmse_null = np.sqrt(np.mean(  (y_test - np.mean(y_train))**2  ))

In [12]:
rmse_enet = np.sqrt(np.mean(  (y_test - fit_enet.predict(x_test_std))**2  ))
print(rmse_enet)
round((rmse_enet-rmse_null)/rmse_null*100, 2)

9.215449872590813


-2.0

In [13]:
rmse_ols_std = np.sqrt(np.mean(  (y_test - fit_ols_std.predict(x_test_std))**2  ))
print(rmse_ols_std)
round((rmse_ols_std - rmse_null)/rmse_null*100, 2)

9.215449872590813


-2.0

In [14]:
rmse_ols = np.sqrt(np.mean(  (y_test - fit_ols.predict(x_test))**2  ))
print(rmse_ols)
round((rmse_ols - rmse_null)/rmse_null*100, 2)

9.215378381791668


-2.0
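
The three comparisons above repeat the same calculation. One way to avoid the repetition is a pair of small helpers, sketched below; `rmse` and `pct_change_vs_null` are hypothetical names, and the toy arrays are made up for illustration.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def pct_change_vs_null(rmse_model, rmse_null):
    """Percent change in RMSE relative to the null model (negative = improvement)."""
    return round((rmse_model - rmse_null) / rmse_null * 100, 2)

# Hypothetical toy values, not the class data
y_obs = np.array([1.0, 2.0, 3.0, 4.0])
null_pred = np.full(4, 2.5)                   # null model: predict the mean
model_pred = np.array([1.5, 2.0, 2.5, 4.0])

toy_rmse_null = rmse(y_obs, null_pred)
toy_rmse_model = rmse(y_obs, model_pred)
toy_pct = pct_change_vs_null(toy_rmse_model, toy_rmse_null)
print(toy_pct)
```

Read this way, the `-2.0` values reported above mean each fitted model's test RMSE is about 2 percent lower than the null model's.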