## Transformations in Political Economy - Technological Change and Populism (POL63102)
### Coding Session 5: Fixed Effects

---
This document guides you through coding session 5. Please try to follow the instructions on your own PC and feel free to ask questions if something is unclear. After this session you should be able to do the following:

- Implement Fixed Effects Regression in Python
- Change table formatting to omit fixed effect coefficients and add notes
---

This time we will load the full "house_2002_2016.dta" data set from Autor et al. (2020):

```python
import numpy as np
import scipy.stats as st
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import statsmodels.api as sm
import statsmodels.formula.api as smf
import os
from pathlib import Path
from stargazer.stargazer import Stargazer

# Adjust this path to the location of the Autor et al. (2020) replication package on your PC
data_path = Path('C:/Users/felix/Dropbox/HfP/Teaching/SoSe21/Populism_Course/Autor_et_al_2020/2-FinalDataPackage/dta/house_2002_2016.dta')
df = pd.read_stata(data_path)
```

```python
df.info()
```

```
<class 'pandas.core.frame.DataFrame'>
Int64Index: 3772 entries, 0 to 3771
Columns: 120 entries, congressionaldistrict to redistrict_2016
dtypes: float32(116), float64(3), object(1)
memory usage: 1.8+ MB
```


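There are 120 variables and 3,772 observations (the index runs from 0 to 3771).
Each observation is a county-district cell, and the data contain variables identifying the county (*cty_fips*) and the congressional district (*congressionaldistrict*) that each cell lies in. Let's look at the first five entries of these variables using the *head()* method: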

```python
df[["cty_fips", "congressionaldistrict"]].head()
```

| | cty_fips | congressionaldistrict |
|---|---|---|
| 0 | 1001.0 | AL 2 |
| 1 | 1003.0 | AL 1 |
| 2 | 1005.0 | AL 2 |
| 3 | 1007.0 | AL 6 |
| 4 | 1009.0 | AL 4 |


Note that the values of *congressionaldistrict* are strings (a state abbreviation plus a district number), not numbers. How many unique counties and districts are there in the data?

```python
# nunique(dropna=False) counts distinct values in the same way as len(unique())
print("There are", df["cty_fips"].nunique(dropna=False),
      "unique counties and",
      df["congressionaldistrict"].nunique(dropna=False),
      "unique districts in the data.")
```

```
There are 3108 unique counties and 432 unique districts in the data.
```
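The data have far fewer counties than rows, so some counties must be split across several districts. A minimal sketch for checking this with pandas, shown on a small made-up data set (the toy values and the helper name `districts_per_county` are hypothetical, not from Autor et al.); on the course data you would call the same function on `df`:

```python
import pandas as pd

def districts_per_county(data: pd.DataFrame) -> pd.Series:
    """Count how many distinct congressional districts each county appears in."""
    return data.groupby("cty_fips")["congressionaldistrict"].nunique()

# Hypothetical toy data: county 1001 is split between two districts
toy = pd.DataFrame({
    "cty_fips": [1001.0, 1001.0, 1003.0, 1005.0],
    "congressionaldistrict": ["AL 2", "AL 3", "AL 1", "AL 2"],
})

counts = districts_per_county(toy)
print(counts)
# Number of counties that lie in more than one district
print((counts > 1).sum())

# On the course data, for example: districts_per_county(df).value_counts()
```

A county that appears in more than one district contributes one row per county-district cell, which is why the number of rows exceeds the number of counties.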


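One way to implement district fixed effects is to add a dummy variable for each district to the regression. Wrapping the variable in `C()` tells the formula to treat it as categorical, so statsmodels creates the district dummies automatically (leaving out one reference district to avoid perfect collinearity with the intercept):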

```python
# District fixed effects via one dummy per district; HC1 = heteroskedasticity-robust standard errors
reg_fe = smf.ols('d2_shnr_2002_2016 ~ d_imp_usch_pd + l_shind_manuf_cbp + C(congressionaldistrict)',
                 data=df).fit(cov_type='HC1')
print(reg_fe.summary())
```

```
                            OLS Regression Results                            
Dep. Variable:      d2_shnr_2002_2016   R-squared:                       0.804
Model:                            OLS   Adj. R-squared:                  0.779
Method:                 Least Squares   F-statistic:                     722.6
Date:                Fri, 11 Jun 2021   Prob (F-statistic):               0.00
Time:                        18:48:34   Log-Likelihood:                -14547.
No. Observations:                3767   AIC:                         2.996e+04
Df Residuals:                    3333   BIC:                         3.267e+04
Df Model:                         433                                         
Covariance Type:                  HC1                                         
                                        coef    std err          z      P>|z|      [0.025      0.975]
-----------------------------------------------------------------------------------------------------
Interc
```
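Adding one dummy per district gives the same slope coefficients as first subtracting the district mean from the outcome and from every regressor and then running OLS on the demeaned data (the "within" transformation). A minimal sketch on simulated data to check that the two approaches agree; the variable names `x1`, `x2`, `district`, the 20 groups and all coefficient values are hypothetical:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Simulated data: 20 hypothetical districts with 30 cells each
n_groups, n_per_group = 20, 30
district = np.repeat([f"D{g}" for g in range(n_groups)], n_per_group)
alpha = np.repeat(rng.normal(0, 5, n_groups), n_per_group)  # district effects
x1 = rng.normal(size=district.size) + alpha / 5  # regressor correlated with the district effect
x2 = rng.normal(size=district.size)
y = 1.5 * x1 - 0.8 * x2 + alpha + rng.normal(size=district.size)
sim = pd.DataFrame({"y": y, "x1": x1, "x2": x2, "district": district})

# (1) Dummy-variable approach: C() creates one dummy per district (minus a reference)
lsdv = smf.ols("y ~ x1 + x2 + C(district)", data=sim).fit()

# (2) Within transformation: demean each variable by district, then OLS without intercept
cols = ["y", "x1", "x2"]
demeaned = sim[cols] - sim.groupby("district")[cols].transform("mean")
within = smf.ols("y ~ x1 + x2 - 1", data=demeaned).fit()

print(lsdv.params[["x1", "x2"]])
print(within.params[["x1", "x2"]])
```

The point estimates match, but the standard errors from the naive demeaned regression are slightly too small because it does not subtract the degrees of freedom used up by the district means. The same counting explains the summary above: 432 districts give 431 dummies, which together with the two regressors yields Df Model = 433.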



The output is very long because it lists a coefficient for every district dummy. As we are not interested in these coefficients, let's produce a stargazer table that reports only the coefficients of interest and notes that district fixed effects are included:

```python
# help(Stargazer)
```

```python
stargazer_tab = Stargazer([reg_fe])
# Show only the two regressors of interest (this hides the district dummies and the intercept)
stargazer_tab.covariate_order(['d_imp_usch_pd', 'l_shind_manuf_cbp'])
stargazer_tab.add_custom_notes(['With District Fixed Effects', 'Robust Standard Errors'])
stargazer_tab.dependent_variable_name("Delta Republican Vote Share 2002-2016")
stargazer_tab.rename_covariates({'d_imp_usch_pd': 'Import Competition',
                                 'l_shind_manuf_cbp': 'Share Employed in Manufacturing'})
stargazer_tab
```

| | Delta Republican Vote Share 2002-2016 (d2_shnr_2002_2016) |
|---|:---:|
| | (1) |
| Import Competition | -0.475 |
| | (0.540) |
| Share Employed in Manufacturing | 17.453*** |
| | (3.978) |
| Observations | 3767 |
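The table reports 3,767 observations, a few fewer than the 3,772 rows listed by `df.info()`. By default, statsmodels' formula interface drops every row that has a missing value in any of the model's variables. A minimal sketch on hypothetical toy data (the columns `y`, `x`, `group` are made up) to see this; on the course data you could compare `reg_fe.nobs` with the number of complete rows in the four model variables:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical toy data with one missing outcome and one missing regressor
toy = pd.DataFrame({
    "y": [1.0, 2.0, np.nan, 4.0, 5.0, 7.0, 6.0, 9.0],
    "x": [0.5, 1.0, 1.5, np.nan, 2.5, 3.0, 3.5, 4.0],
    "group": ["A", "A", "A", "A", "B", "B", "B", "B"],
})

# Number of rows with no missing value in any model variable
model_vars = ["y", "x", "group"]
n_complete = toy[model_vars].dropna().shape[0]

# The formula interface silently drops the incomplete rows
fit = smf.ols("y ~ x + C(group)", data=toy).fit()
print(n_complete, int(fit.nobs))

# On the course data, for example:
# model_vars = ["d2_shnr_2002_2016", "d_imp_usch_pd", "l_shind_manuf_cbp", "congressionaldistrict"]
# print(df[model_vars].dropna().shape[0], int(reg_fe.nobs))
```

To see which variables cause the drop, `df[model_vars].isna().sum()` lists the number of missing values per column.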


**Exercise:** How does the coefficient on import competition change compared with a regression without district fixed effects? What is the reason for this?

---
**Congratulations! This is the end of coding session 5.**