## Workshop - Regularization

In this workshop, we are going to:

1. Tune an elastic-net regression 
2. Compare the following models:
    1. The null model
    2. The tuned elastic-net model
    3. The trimmed non-regularized model with standardized features
    4. The trimmed non-regularized model with non-standardized features
    
# Preliminaries

- Load any necessary packages and/or functions
- Load in and prepare the class data
- Create `x` and `y`, using `pct_d_rgdp` as the label
- Create `x_train`, `x_test`, `y_train`, `y_test` with
    * training size of two-thirds
    * random state of 490
- Standardize the features
- Add constants

In [37]:
import numpy as np
from numpy.linalg import inv
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn import linear_model as lm
from sklearn.model_selection import GridSearchCV, train_test_split

In [22]:
df = pd.read_pickle('class_data.pkl')
df_prepped = df.drop(columns = ['urate_bin', 'year', 'GeoName']).join([
    pd.get_dummies(df['urate_bin'], drop_first = True),
    pd.get_dummies(df.year, drop_first = True)    
])


In [23]:
y = df_prepped['pct_d_rgdp']
x = df_prepped.drop(columns = 'pct_d_rgdp')

In [24]:
x_train, x_test, y_train, y_test = train_test_split(x, y, train_size = 2/3, random_state = 490)

In [25]:

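# Standardize both splits with the training-set mean and standard deviation,
# so the test features are on the same scale the model was fit on
x_train_std = x_train.apply(lambda col: (col - np.mean(col))/np.std(col), axis = 0)
x_test_std  = x_test.apply(lambda col: (col - np.mean(x_train[col.name]))/np.std(x_train[col.name]), axis = 0)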

x_train_std = sm.add_constant(x_train_std)
x_test_std  = sm.add_constant(x_test_std)
x_train     = sm.add_constant(x_train)
x_test      = sm.add_constant(x_test)
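
A common convention is to estimate the mean and standard deviation on the training split and reuse them for the test split. Below is a minimal sketch with scikit-learn's `StandardScaler`, assuming hypothetical toy frames (`toy_train` and `toy_test` are not part of the class data). `StandardScaler` uses the population standard deviation (`ddof = 0`), which is also the `np.std` default.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data standing in for x_train and x_test
toy_train = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0], "b": [10.0, 20.0, 30.0, 40.0]})
toy_test = pd.DataFrame({"a": [2.0, 6.0], "b": [0.0, 50.0]})

# Learn the mean and standard deviation on the training split only
scaler = StandardScaler().fit(toy_train)

# Apply the same training statistics to both splits
toy_train_std = pd.DataFrame(scaler.transform(toy_train),
                             columns=toy_train.columns, index=toy_train.index)
toy_test_std = pd.DataFrame(scaler.transform(toy_test),
                            columns=toy_test.columns, index=toy_test.index)

# The same test-set result computed by hand from the training statistics
manual = (toy_test - toy_train.mean()) / toy_train.std(ddof=0)
print(toy_test_std)
```

Wrapping the transformed arrays back into data frames keeps the column names, which `sm.add_constant` and the `statsmodels` output rely on.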

Take a look at `lm.ElasticNet?` and 
```
fit = sm.OLS(y_train, x_train)
fit.fit_regularized?
```
Determine which arguments are the same but named differently.
Specifically, look at $\alpha$ and the weight on the different constraints (i.e. $||\beta||_2$ and $||\beta||_1$).
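
As a reference point for that comparison, here are the objectives the two docstrings state, rewritten in a common notation. The symbols $n$ (number of observations), $\rho$ (standing in for scikit-learn's `l1_ratio`) and $w$ (standing in for statsmodels' `L1_wt`) are notation introduced here, not names from either library.

scikit-learn's `ElasticNet` minimizes

$$\frac{1}{2n}||y - X\beta||_2^2 + \alpha\rho\,||\beta||_1 + \frac{\alpha(1-\rho)}{2}\,||\beta||_2^2$$

and statsmodels' `fit_regularized` minimizes

$$\frac{1}{2n}||y - X\beta||_2^2 + \alpha\left(\frac{1-w}{2}\,||\beta||_2^2 + w\,||\beta||_1\right)$$

Note that the ridge part of both penalties uses the squared norm $||\beta||_2^2$.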

In [28]:
fit_ridge = sm.OLS(y_train, x_train_std).fit_regularized(alpha = 10, L1_wt = 0)
fit_ridge.params

array([ 0.18029875,  0.07906755, -0.02013349,  0.09076773, -0.02174276,
       -0.0005661 , -0.03749642,  0.03020956, -0.02024074, -0.00095107,
        0.05522591, -0.00858626,  0.02204336,  0.01622809,  0.01651888,
        0.05921041, -0.01055714, -0.02435106, -0.05832944,  0.01867119,
        0.00071573, -0.02285545,  0.01496746, -0.01367612, -0.00468585,
       -0.03847915, -0.00920576,  0.00800213])

In [46]:
fit_lasso = sm.OLS(y_train, x_train_std).fit_regularized(alpha = 10, L1_wt = 1)
fit_lasso.params

const                0.0
pos_net_jobs         0.0
emp_estabs           0.0
estabs_entry_rate    0.0
estabs_exit_rate     0.0
pop                  0.0
pop_pct_black        0.0
pop_pct_hisp         0.0
lfpr                 0.0
density              0.0
lower                0.0
similar              0.0
2003                 0.0
2004                 0.0
2005                 0.0
2006                 0.0
2007                 0.0
2008                 0.0
2009                 0.0
2010                 0.0
2011                 0.0
2012                 0.0
2013                 0.0
2014                 0.0
2015                 0.0
2016                 0.0
2017                 0.0
2018                 0.0
dtype: float64
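
The all-zero output is consistent with a general property of the lasso: for an objective scaled by $1/(2n)$, every coefficient is exactly zero once $\alpha$ reaches $\max_j |x_j^\top y| / n$. The sketch below illustrates this with scikit-learn's `Lasso` on synthetic data. The toy arrays `X` and `y` and the threshold name `alpha_max` are assumptions made for illustration, not part of the workshop data.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(490)
n, p = 200, 5
X = rng.normal(size=(n, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)      # standardized toy features
y = X @ np.array([0.5, -0.3, 0.0, 0.0, 0.1]) + rng.normal(scale=0.5, size=n)
y = y - y.mean()                              # centered, so no intercept is needed

# Smallest penalty at which the lasso solution is entirely zero
alpha_max = np.max(np.abs(X.T @ y)) / n

# Just above the threshold every coefficient is zero; below it some are not
coef_above = Lasso(alpha=1.01 * alpha_max, fit_intercept=False).fit(X, y).coef_
coef_below = Lasso(alpha=0.5 * alpha_max, fit_intercept=False).fit(X, y).coef_

print(alpha_max)
print(coef_above)
print(coef_below)
```

With standardized features and a label measured in percent changes, a penalty of `alpha = 10` is plausibly far above this threshold, which would explain why nothing survives.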

Perform a 5-fold cross-validation grid search with a random state of 490. 
Identify the optimally tuned hyperparameters.
Use this grid:
```
param_grid = {'alpha': 10.**np.arange(-5, -1, 1), 
              'l1_ratio': np.arange(0, 1, 0.1)}
```
You will get a warning message about convergence.
We will discuss it after the workshop.
Think about why it is occurring.

In [48]:
param_grid = [
    {'alpha': 10.**np.arange(-5, -1, 1), 'l1_ratio': np.arange(0, 1, 0.1)}
]

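# We supply the intercept manually (the constant column) and have already
# standardized the features, so ElasticNet should not fit its own intercept.
# The `normalize` argument was removed in scikit-learn 1.2, so it is omitted.
cv_lasso = lm.ElasticNet(fit_intercept = False, random_state = 490)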
grid_search = GridSearchCV(cv_lasso, param_grid, cv = 5,
                         scoring = 'neg_root_mean_squared_error')
grid_search.fit(x_train_std, y_train)
print(grid_search.best_params_)
best = grid_search.best_params_
best

  model = cd_fast.enet_coordinate_descent(

****
# Question

How many models did we just fit?
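
One way to check your answer is to count the pieces directly. This is a minimal sketch, assuming the grid and fold count given above and the `GridSearchCV` default of `refit=True`. The names `n_candidates`, `n_cv_fits` and `n_total` are made up for illustration.

```python
import numpy as np
from sklearn.model_selection import ParameterGrid

param_grid = {'alpha': 10.**np.arange(-5, -1, 1),
              'l1_ratio': np.arange(0, 1, 0.1)}
n_folds = 5

n_candidates = len(ParameterGrid(param_grid))  # hyperparameter combinations
n_cv_fits = n_candidates * n_folds             # one fit per combination per fold
n_total = n_cv_fits + 1                        # plus the final refit on all training data

print(n_candidates, n_cv_fits, n_total)
```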

***
Using the tuned hyperparameters, fit your elastic-net model with `statsmodels`.

Using the selected features, refit:

- the non-regularized model with standardized features
- the non-regularized model with non-standardized features

Compare the percent improvement in RMSE over the null model for the elastic-net and OLS models.
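
A minimal sketch of that comparison, assuming the null model predicts the training mean of the label for every test observation. The helper names `rmse` and `pct_improvement` and all `toy_` values are hypothetical.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def pct_improvement(rmse_null, rmse_model):
    """Percent reduction in RMSE relative to the null model."""
    return 100 * (rmse_null - rmse_model) / rmse_null

# Hypothetical toy values, not the class data
toy_y_train = np.array([1.0, 2.0, 3.0, 4.0])
toy_y_test = np.array([2.0, 4.0])
toy_pred = np.array([2.5, 3.5])   # stand-in for a fitted model's test predictions

# Null model: predict the training mean for every test observation
null_pred = np.full_like(toy_y_test, toy_y_train.mean())
rmse_null = rmse(toy_y_test, null_pred)
rmse_model = rmse(toy_y_test, toy_pred)
print(pct_improvement(rmse_null, rmse_model))
```

A positive value means the model beats the null model on the test set, while a negative value means it does worse than predicting the mean.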