Created by: Revekka Gershovich Date: Oct 21 Purpose: To merge the cleaned data files containing federal and state partisan composition data. 

Note that the Republican and Democratic parties, as coded here, each combine several ancestor and closely aligned parties that political scientists commonly treat as one. That is why there is data for the Republican party before 1854. Democrats (1 in pres_gov_party) also include the Jackson (Democrats) party (1829-1854). Republicans (2 in pres_gov_party) also include the Whig party (1834-1856), the Anti-Jacksonian party (1824-1837), and the Opposition Coalition, which comprises the Whig, Republican, and Free Soil parties (1850-1856). 

For more details on the state partisan data, see the ICPSR 16 documentation (in the 2_data/1_raw/political_data folder). For more details on the federal partisan data, see CleaningFederalPartisanData.ipynb in 1_code/econ_geog_poli_similarity_code/political_similarity_code. 
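
As a small illustration of the party coding described above, the numeric codes can be mapped to readable labels. This is only a sketch on made-up rows: the `PARTY_LABELS` dictionary and the `gov_party_label` column are hypothetical names, and the column is called `gov_party` here because that is the name that appears in the merged output.

```python
import pandas as pd

# Hypothetical label lookup based on the coding described above:
# 1 = Democrats (incl. Jackson party), 2 = Republicans (incl. Whig,
# Anti-Jacksonian, and Opposition Coalition).
PARTY_LABELS = {1.0: "Democratic", 2.0: "Republican"}

# Made-up rows for illustration only; not taken from the real data files.
toy = pd.DataFrame({"gov_party": [1.0, 2.0, None]})

# Codes missing from the lookup (including NaN) stay missing after the map.
toy["gov_party_label"] = toy["gov_party"].map(PARTY_LABELS)
print(toy)
```

Missing party codes are left as missing values rather than being assigned to either party.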

In [19]:
import os
import os.path as path
import pandas as pd
import numpy as np

In [20]:
# parent_dir = os.path.abspath("/Users/rivka666/Dropbox (MIT)/StateLaws")
parent_dir = os.path.abspath("/Users/revekkagershovich/Dropbox (MIT)/StateLaws")
# Check that the directory exists before changing into it; otherwise os.chdir raises first
assert os.path.exists(parent_dir), "parent_dir does not exist"
os.chdir(parent_dir)
data_dir = "./2_data/2_intermediate/political_data"
assert os.path.exists(data_dir), "Data directory does not exist"

In [21]:
# Loading data
state_df = pd.read_stata(path.join(data_dir, "political_composition_no_fed.dta"))
fed_df = pd.read_csv(path.join(data_dir, "federal_political_composition.csv"))

In [22]:
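# Store the year as an integer
state_df['yr_rd2'] = state_df['yr_rd2'].astype('int64')
# Rename the president's party column so it lines up with the governor's party column
fed_df.rename(columns={'president_party': 'gov_party'}, inplace=True)
# Note: this rename maps 'gov_party' to itself, so it has no effect
state_df.rename(columns={'gov_party': 'gov_party'}, inplace=True)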


In [23]:
# Add a variable called 'state_abbrev' to fed_df, and populate it with the value "FD" (federal)
fed_df['state_abbrev'] = "FD"

In [24]:
# Concatenate the two DataFrames along rows (axis=0), stacking the federal rows with the state rows
df = pd.concat([state_df, fed_df], axis=0, ignore_index=True)
df = df.sort_values(by=['yr_rd2', 'state_abbrev']).reset_index(drop=True)
# df['index'] = df['index'].astype('int64')
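
A quick sanity check after stacking can confirm that no rows were lost and that each state-year pair appears only once. The sketch below runs on small made-up frames (`toy_state`, `toy_fed`) rather than the real data, and the check names are hypothetical; the same checks could be applied to `state_df`, `fed_df`, and `df`.

```python
import pandas as pd

# Made-up frames for illustration only; column names follow the merged output.
toy_state = pd.DataFrame({
    "state_abbrev": ["CT", "AL"],
    "yr_rd2": [1834, 1834],
    "gov_party": [None, 1.0],
})
toy_fed = pd.DataFrame({
    "state_abbrev": ["FD"],
    "yr_rd2": [1834],
    "gov_party": [1.0],
})

# Same stacking and sorting steps as in the cell above.
toy = pd.concat([toy_state, toy_fed], axis=0, ignore_index=True)
toy = toy.sort_values(by=["yr_rd2", "state_abbrev"]).reset_index(drop=True)

# Row-wise concatenation should preserve the total number of rows.
n_rows_ok = len(toy) == len(toy_state) + len(toy_fed)
# Mismatched column names would show up as extra, mostly-NaN columns.
same_columns = set(toy_state.columns) == set(toy_fed.columns)
# Each (state_abbrev, yr_rd2) pair should appear exactly once.
n_dupes = int(toy.duplicated(subset=["state_abbrev", "yr_rd2"]).sum())

print(n_rows_ok, same_columns, n_dupes)
```

If the column sets differ, `pd.concat` still succeeds but fills the unmatched columns with missing values, so comparing column names first makes that failure visible.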

In [25]:
print(df.head())

  state_abbrev  yr_rd2  shr_dem_in_sess  shr_rep_in_sess  dem_upphse  \
0           AL    1834         0.527778         0.472222    0.555556   
1           CT    1834         0.216597         0.783403    0.190476   
2           DE    1834         0.333333         0.666667    0.333333   
3           FD    1834         0.565972         0.309028    0.416667   
4           GA    1834         0.704487         0.295513    0.692308   

   dem_lowhse  rep_upphse  rep_lowhse  gov_party  
0    0.500000    0.444444    0.500000        1.0  
1    0.242718    0.809524    0.757282        NaN  
2    0.333333    0.666667    0.666667        1.0  
3    0.595833    0.541667    0.262500        1.0  
4    0.716667    0.307692    0.283333        NaN  


In [26]:
print(df["state_abbrev"].unique())

['AL' 'CT' 'DE' 'FD' 'GA' 'IL' 'IN' 'KY' 'LA' 'MA' 'MD' 'ME' 'MO' 'MS'
 'NC' 'NH' 'NJ' 'NY' 'OH' 'PA' 'RI' 'SC' 'TN' 'VA' 'VT' 'AR' 'MI' 'FL'
 'IA' 'TX' 'WI' 'CA' 'MN' 'OR' 'KS' 'NV' 'WV' 'NE' 'CO' 'ID' 'MT' 'ND'
 'SD' 'WA' 'WY' 'UT' 'OK' 'AZ' 'NM' 'HI' 'AK']


In [27]:
df.to_stata(path.join(data_dir, "political_composition.dta"), write_index=False)
df.to_csv(path.join(data_dir, "political_composition.csv"), index=False)