**Created by:** Revekka Gershovich **Date:** Oct 21 **Purpose:** To merge the cleaned data files containing federal and state partisan composition data.

Note that the Republican and Democratic parties, as coded here, each combine a number of ancestor and closely aligned parties that political scientists commonly treat together. That is why there is data for the Republican party before 1854. Democrats (1 in pres_gov_party) also include the Jackson (Democrats) party (1829-1854). Republicans (2 in pres_gov_party) also include the Whig party (1834-1856), the Anti-Jacksonian party (1824-1837), and the Opposition Coalition, which comprises the Whig, Republican, and Free Soil parties (1850-1856).
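The grouping described above can be sketched as a simple lookup. This is only an illustration: the label strings and the `party_code` helper are hypothetical and may not match the spellings used in the raw ICPSR or Wikipedia sources. Only the codes 1 and 2 come from the note above.

```python
# Hypothetical label-to-code mapping; the label strings are illustrative,
# not necessarily the exact spellings found in the raw data.
PARTY_CODES = {
    "Democratic": 1,
    "Jackson": 1,               # Jackson (Democrats) party
    "Republican": 2,
    "Whig": 2,
    "Anti-Jacksonian": 2,
    "Opposition Coalition": 2,  # Whig + Republican + Free Soil
}


def party_code(label):
    """Return 1 for the Democratic lineage, 2 for the Republican lineage, else None."""
    return PARTY_CODES.get(label)
```

For example, `party_code("Whig")` returns 2, while a label outside both lineages returns `None`.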

For more details, see the ICPSR 16 documentation for state partisan data (in the 2_data/1_raw/political_data folder); for federal partisan data, see cleanWikiFedData.ipynb in 1_code/econ_geog_poli_similarity_code/political_similarity_code.

**Previous files:** cleanWikiFedData.ipynb and merging_Klaner_and_ICPSR.ipynb

In [36]:
import os
import os.path as path
import pandas as pd
import numpy as np

In [37]:
parent_dir = os.path.abspath("/Users/revekkagershovich/Dropbox (MIT)/StateLaws")
# Check that the project root exists before changing into it
assert os.path.exists(parent_dir), "parent_dir does not exist"
os.chdir(parent_dir)
intermed_data_dir = "./2_data/2_intermediate/political_data"
assert os.path.exists(intermed_data_dir), "Data directory does not exist"
raw_data_dir = "./2_data/1_raw/political_data/all_partisanComposition"
assert os.path.exists(raw_data_dir), "Data directory does not exist"

In [38]:
# Loading data
state_df = pd.read_csv(path.join(intermed_data_dir, 'state_politicalComposition.csv'))
fed_df = pd.read_csv(path.join(intermed_data_dir, "federal_political_composition.csv"))

In [39]:
state_df['year']

0       1834
1       1834
2       1834
3       1834
4       1834
        ... 
8219    2020
8220    2020
8221    2020
8222    2020
8223    1974
Name: year, Length: 8224, dtype: int64

In [40]:
state_df['year'] = state_df['year'].astype('int64')
# Align the federal column name with the state data so the two frames stack cleanly
fed_df.rename(columns={'president_party': 'gov_party'}, inplace=True)
# state_df already uses 'gov_party', so no rename is needed there


In [41]:
# Add a variable called 'state_abbrev' to fed_df and set it to "FD" to mark federal rows
fed_df['state_abbrev'] = "FD"

In [42]:
# Stack the two DataFrames row-wise (axis=0)
df = pd.concat([state_df, fed_df], axis=0, ignore_index=True)
df = df.sort_values(by=['year', 'state_abbrev']).reset_index(drop=True)
# df['index'] = df['index'].astype('int64')

In [43]:
print(df.head())

   year state_abbrev  gov_party  dem_upphse  rep_upphse  dem_lowhse  \
0  1833           FD        1.0    0.416667    0.541667    0.595833   
1  1834           AL        1.0         NaN         NaN         NaN   
2  1834           CT        NaN    0.190476    0.809524    0.242718   
3  1834           DE        1.0    0.333333    0.666667    0.333333   
4  1834           FD        1.0    0.500000    0.461538    0.590909   

   rep_lowhse  shr_dem_in_sess  shr_rep_in_sess  
0    0.262500         0.565972         0.309028  
1         NaN              NaN              NaN  
2    0.757282         0.237885         0.762115  
3    0.666667         0.333333         0.666667  
4    0.309917         0.574830         0.336735  


In [44]:
print(df["state_abbrev"].unique())

['FD' 'AL' 'CT' 'DE' 'GA' 'IL' 'IN' 'KY' 'LA' 'MA' 'MD' 'ME' 'MO' 'MS'
 'NC' 'NH' 'NJ' 'NY' 'OH' 'PA' 'RI' 'SC' 'TN' 'VA' 'VT' 'AR' 'MI' 'FL'
 'TX' 'IA' 'WI' 'CA' 'MN' 'OR' 'KS' 'WV' 'NV' 'NE' 'CO' 'MT' 'ND' 'SD'
 'WA' 'ID' 'WY' 'UT' 'OK' 'AZ' 'NM' 'AK' 'HI']
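
Before saving, it may be worth checking that each (year, state_abbrev) pair appears only once in the stacked data, since the last row of the `state_df['year']` output above is out of order (1974 after 2020). The sketch below repeats the concatenation and sort on toy frames with made-up values. The names `toy_state`, `toy_fed`, `toy`, and `n_dupes` are hypothetical and not part of the notebook.

```python
import pandas as pd

# Toy stand-ins for state_df and fed_df; all values are made up for illustration.
toy_state = pd.DataFrame({
    "year": [1834, 1834],
    "state_abbrev": ["AL", "CT"],
    "gov_party": [1.0, None],
})
toy_fed = pd.DataFrame({"year": [1833, 1834], "gov_party": [1.0, 1.0]})
toy_fed["state_abbrev"] = "FD"  # mark federal rows, as in the notebook

# Stack rows, then sort by year and state, mirroring the cell above.
toy = pd.concat([toy_state, toy_fed], axis=0, ignore_index=True)
toy = toy.sort_values(by=["year", "state_abbrev"]).reset_index(drop=True)

# Count rows that repeat an earlier (year, state_abbrev) pair.
n_dupes = int(toy.duplicated(subset=["year", "state_abbrev"]).sum())
print(n_dupes)
```

Running the same `duplicated` call on `df` would show whether any state-year is repeated. A nonzero count would suggest inspecting the source files before the merge is used downstream.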


In [45]:
df.to_stata(path.join(intermed_data_dir, "political_composition.dta"), write_index=False)
df.to_csv(path.join(intermed_data_dir, "political_composition.csv"), index=False)