## Workshop - Regression-Based Classification

Does the `statsmodels` marginal effect use the average of the covariates or the average over the predicted values?
- Use the class data.
- Show your work.

Load the necessary packages and data:

In [1]:
import pandas as pd
import numpy as np
import statsmodels.formula.api as smf

In [2]:
df = pd.read_pickle('C:/Users/johnj/Documents/Data/aml in econ 02 spring 2021/class data/class_data.pkl')
df.head()

| fips | year | GeoName | pct_d_rgdp | urate_bin | pos_net_jobs | emp_estabs | estabs_entry_rate | estabs_exit_rate | pop | pop_pct_black | pop_pct_hisp | lfpr | density | year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1001 | 2002 | Autauga, AL | 3.202147 | lower | 1 | 12.531208 | 11.268 | 9.256 | 45909.0 | 17.386569 | 1.611884 | 74.841638 | 77.231178 | 2002 |
| 1001 | 2003 | Autauga, AL | 1.434404 | lower | 1 | 12.598415 | 10.603 | 9.94 | 46800.0 | 17.49359 | 1.692308 | 75.093851 | 78.730077 | 2003 |
| 1001 | 2004 | Autauga, AL | 15.061365 | lower | 1 | 12.780078 | 11.14 | 8.519 | 48366.0 | 17.584667 | 1.796717 | 74.459624 | 81.364507 | 2004 |
| 1001 | 2005 | Autauga, AL | 0.333105 | higher | 1 | 12.856784 | 11.735 | 8.673 | 49676.0 | 17.612127 | 1.986875 | 74.920228 | 83.568276 | 2005 |
| 1001 | 2006 | Autauga, AL | 7.440034 | higher | 1 | 12.832506 | 10.645 | 8.766 | 51328.0 | 17.898613 | 2.032029 | 73.641001 | 86.34738 | 2006 |


Fit a logistic regression using either `sm.Logit()` or `smf.logit()`.

In [3]:
fit_logit = smf.logit(data = df, formula = 'pos_net_jobs ~ pct_d_rgdp + estabs_entry_rate').fit()

Optimization terminated successfully.
         Current function value: 0.667049
         Iterations 5
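
The per-observation formulas later in this workshop depend on what the fitted results object stores. A minimal sketch, run on simulated data because the class pickle is local (the column names are reused, but the data-generating coefficients are hypothetical), checks that for an `smf.logit` fit `fittedvalues` holds the linear predictor (log-odds) while `predict()` returns probabilities:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated stand-in for the class data; the coefficients are hypothetical
rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "pct_d_rgdp": rng.normal(2.0, 5.0, n),
    "estabs_entry_rate": rng.normal(10.0, 2.0, n),
})
eta = -2.0 + 0.02 * df["pct_d_rgdp"] + 0.2 * df["estabs_entry_rate"]
df["pos_net_jobs"] = rng.binomial(1, 1 / (1 + np.exp(-eta)))

fit = smf.logit("pos_net_jobs ~ pct_d_rgdp + estabs_entry_rate", data=df).fit(disp=0)

X = fit.model.exog                       # design matrix: Intercept, pct_d_rgdp, estabs_entry_rate
linear_pred = X @ fit.params.to_numpy()  # x_i'beta, the fitted log-odds
prob = np.asarray(fit.predict())         # predicted probabilities for the estimation sample

print(np.allclose(fit.fittedvalues, linear_pred))         # fittedvalues holds log-odds
print(np.allclose(prob, 1 / (1 + np.exp(-linear_pred))))  # predict() applies the logistic CDF
```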


Get the marginal effects (`.get_margeff()`). Print the summary (`.summary()`).

In [4]:
fit_logit.get_margeff().summary()

| Dep. Variable: | pos_net_jobs |
|---|---|
| Method: | dydx |
| At: | overall |

| | dy/dx | std err | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| pct_d_rgdp | 0.0048 | 0.0 | 18.276 | 0.0 | 0.004 | 0.005 |
| estabs_entry_rate | 0.0282 | 0.001 | 37.372 | 0.0 | 0.027 | 0.03 |
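
The `At: overall` line reports which summary `get_margeff()` used. Its `at` argument also accepts `'mean'`, `'median'`, `'zero'`, and `'all'`. A sketch on simulated data (hypothetical coefficients, class column names) that requests both the default and the at-means version so they can be compared side by side:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated stand-in for the class data; the coefficients are hypothetical
rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "pct_d_rgdp": rng.normal(2.0, 5.0, n),
    "estabs_entry_rate": rng.normal(10.0, 2.0, n),
})
eta = -2.0 + 0.02 * df["pct_d_rgdp"] + 0.2 * df["estabs_entry_rate"]
df["pos_net_jobs"] = rng.binomial(1, 1 / (1 + np.exp(-eta)))
fit = smf.logit("pos_net_jobs ~ pct_d_rgdp + estabs_entry_rate", data=df).fit(disp=0)

overall = fit.get_margeff()            # default: at="overall", method="dydx"
at_means = fit.get_margeff(at="mean")  # derivative evaluated at the covariate means

print(overall.margeff)   # one dy/dx per slope; no entry for the Intercept
print(at_means.margeff)
```

Because the logit is nonlinear, the two generally differ; the gap grows with the spread of the fitted log-odds.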


***
# Covariate Averages
$$
\frac{\partial p(x)}{\partial x_j} \approx \frac{e^{\hat{\beta}_0 + \bar{x}_1\hat{\beta}_1 + \bar{x}_2\hat{\beta}_2}}{(1 + e^{\hat{\beta}_0 + \bar{x}_1\hat{\beta}_1 + \bar{x}_2\hat{\beta}_2})^2}\hat{\beta}_j
$$

where $\bar{x}_1$ and $\bar{x}_2$ are the sample means of `pct_d_rgdp` and `estabs_entry_rate`.
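
As a sketch of where the weight on $\hat{\beta}_j$ comes from (a standard chain-rule step, written in this notebook's notation), let $z = \hat{\beta}_0 + x_1\hat{\beta}_1 + x_2\hat{\beta}_2$ and $p(x) = \frac{e^{z}}{1 + e^{z}}$. Then

$$
\frac{\partial p(x)}{\partial x_j} = \frac{e^{z}(1 + e^{z}) - e^{z}e^{z}}{(1 + e^{z})^2}\hat{\beta}_j = \frac{e^{z}}{(1 + e^{z})^2}\hat{\beta}_j = p(x)\bigl(1 - p(x)\bigr)\hat{\beta}_j
$$

so the weight is the predicted probability times its complement. It peaks at $1/4$ when $p(x) = 0.5$, so a logit marginal effect is never larger in magnitude than $|\hat{\beta}_j|/4$.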

In [5]:
beta = fit_logit.params  # Intercept, pct_d_rgdp, estabs_entry_rate
avgs = np.array([1., np.mean(df.pct_d_rgdp), np.mean(df.estabs_entry_rate)])  # leading 1. pairs with the Intercept

In [6]:
xb_bar = np.dot(beta, avgs)  # linear predictor at the covariate means (a scalar)
np.exp(xb_bar) / (1 + np.exp(xb_bar))**2 * beta  # the Intercept row is not a marginal effect
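
To check the covariate-average formula against `statsmodels` directly, the sketch below (simulated data, hypothetical coefficients) evaluates the logistic density at the column means of the design matrix and compares the slopes with `get_margeff(at="mean")`:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated stand-in for the class data; the coefficients are hypothetical
rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "pct_d_rgdp": rng.normal(2.0, 5.0, n),
    "estabs_entry_rate": rng.normal(10.0, 2.0, n),
})
eta = -2.0 + 0.02 * df["pct_d_rgdp"] + 0.2 * df["estabs_entry_rate"]
df["pos_net_jobs"] = rng.binomial(1, 1 / (1 + np.exp(-eta)))
fit = smf.logit("pos_net_jobs ~ pct_d_rgdp + estabs_entry_rate", data=df).fit(disp=0)

beta = fit.params.to_numpy()
x_bar = fit.model.exog.mean(axis=0)  # covariate means; the Intercept column averages to 1
z_bar = x_bar @ beta                 # linear predictor at the means
mem = np.exp(z_bar) / (1 + np.exp(z_bar)) ** 2 * beta[1:]  # skip the Intercept

sm_mem = fit.get_margeff(at="mean").margeff
print(mem)
print(sm_mem)
```

Slicing off `beta[0]` mirrors `statsmodels`, which reports no marginal effect for the constant.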

***
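# Predicted Value Averages
$$
\frac{\partial p(x)}{\partial x_j} \approx \frac{1}{n} \sum_{i=1}^{n} \frac{e^{\hat{y}_i}}{(1 + e^{\hat{y}_i})^2}\hat{\beta}_j
$$

where $\hat{y}_i = \hat{\beta}_0 + x_{1i}\hat{\beta}_1 + x_{2i}\hat{\beta}_2$ is the fitted log-odds for observation $i$ (for a logit model, this is what `fittedvalues` returns).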

In [7]:
yhat = fit_logit.fittedvalues  # for a logit fit this is the linear predictor (log-odds), not a probability

In [8]:
# average of the logistic density across observations, times each coefficient
np.mean(np.exp(yhat) / (1 + np.exp(yhat))**2) * beta

Intercept           -0.209411
pct_d_rgdp           0.004793
estabs_entry_rate    0.028197
dtype: float64
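
The same check for the predicted-value average, again on simulated data with hypothetical coefficients. Since $e^{\hat{y}_i}/(1 + e^{\hat{y}_i})^2 = p_i(1 - p_i)$, it can be computed from the `predict()` probabilities, and it should agree with the default `get_margeff()` output:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated stand-in for the class data; the coefficients are hypothetical
rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "pct_d_rgdp": rng.normal(2.0, 5.0, n),
    "estabs_entry_rate": rng.normal(10.0, 2.0, n),
})
eta = -2.0 + 0.02 * df["pct_d_rgdp"] + 0.2 * df["estabs_entry_rate"]
df["pos_net_jobs"] = rng.binomial(1, 1 / (1 + np.exp(-eta)))
fit = smf.logit("pos_net_jobs ~ pct_d_rgdp + estabs_entry_rate", data=df).fit(disp=0)

beta = fit.params.to_numpy()
p = np.asarray(fit.predict())          # per-observation predicted probabilities
ame = np.mean(p * (1 - p)) * beta[1:]  # average of p_i(1 - p_i), times each slope

sm_ame = fit.get_margeff().margeff     # default at="overall"
print(ame)
print(sm_ame)
```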

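The hand-computed values for `pct_d_rgdp` (0.004793) and `estabs_entry_rate` (0.028197) match the `dy/dx` column from `.get_margeff()` (0.0048 and 0.0282). By default (`At: overall`), `statsmodels` therefore averages the marginal effect over all observations rather than evaluating it at the covariate means. The `Intercept` row is not a marginal effect and can be ignored.

***
# Interpretation

Interpret the marginal effect of one feature.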
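
For example, the `dy/dx` of 0.0282 on `estabs_entry_rate` says that, averaged over the sample, a one-unit increase in the establishment entry rate is associated with an increase of about 0.028 (roughly 2.8 percentage points) in the probability that `pos_net_jobs` equals 1, holding `pct_d_rgdp` fixed. Because `dy/dx` is a slope, it approximates rather than equals the average change from a full one-unit step. The numpy-only sketch below compares the two; the covariates are simulated and `b0`, `b1`, `b2` are hypothetical coefficients, not the class estimates:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
x1 = rng.normal(2.0, 5.0, n)   # stands in for pct_d_rgdp
x2 = rng.normal(10.0, 2.0, n)  # stands in for estabs_entry_rate
b0, b1, b2 = -2.0, 0.02, 0.2   # hypothetical coefficients

def prob(x1, x2):
    """Logistic probability for the hypothetical coefficients."""
    z = b0 + b1 * x1 + b2 * x2
    return 1 / (1 + np.exp(-z))

p = prob(x1, x2)
ame_x2 = np.mean(p * (1 - p)) * b2           # derivative-based average marginal effect
discrete_x2 = np.mean(prob(x1, x2 + 1) - p)  # average change for a one-unit step in x2

print(round(ame_x2, 4), round(discrete_x2, 4))
```

When the coefficient is small relative to the curvature of the logistic function, the two numbers are close, which is what justifies reading `dy/dx` as a "per one-unit" change in probability.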