**Created by:** Revekka Gershovich
**When:** Nov 25, 2024
**Purpose:** 
**Question from Nicolas:** "Also a quick question regarding the political composition. By looking at the data I noticed that the presidency/governor and chamber compositions seem to have the wrong years (e.g., in 2008 the Dems hold both chambers and the Presidency but it is coded as 2010). I assume it's coming from a rounding but I wanted to make sure. Could you double check?"
**Answer:** It turned out that the state-level data is coded by year of election, whereas I had coded by the middle year.
**Question from Nicolas:** 
When you have the time, could you look at which legislatures/governors we are missing using the biennium approach as we talked about? 
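One way to approach the biennium check, sketched under assumptions: build the full grid of states and even years, then keep the grid cells that have no matching row in the panel. Only the column names `state_abbrev` and `yr_rd2` come from the data; the helper `find_missing_bienniums`, the year range, and the toy frame are hypothetical. The sketch assumes `yr_rd2` has no missing values, since it is cast to integer before the merge.

```python
import pandas as pd

def find_missing_bienniums(panel, states, start, end):
    """Return the (state, biennium) pairs that have no row in `panel`."""
    # Full grid of every state crossed with every even year in [start, end]
    grid = pd.MultiIndex.from_product(
        [states, range(start, end + 1, 2)],
        names=["state_abbrev", "yr_rd2"],
    ).to_frame(index=False)
    # yr_rd2 is stored as float in the .dta file, so cast it to match the grid
    observed = panel[["state_abbrev", "yr_rd2"]].drop_duplicates()
    observed = observed.astype({"yr_rd2": "int64"})
    merged = grid.merge(
        observed, on=["state_abbrev", "yr_rd2"], how="left", indicator=True
    )
    absent = merged.loc[merged["_merge"] == "left_only", ["state_abbrev", "yr_rd2"]]
    return absent.reset_index(drop=True)

# Toy panel (hypothetical values): WY has no row for 2016
toy = pd.DataFrame({
    "state_abbrev": ["MA", "MA", "WY"],
    "yr_rd2": [2016.0, 2018.0, 2018.0],
})
missing = find_missing_bienniums(toy, ["MA", "WY"], 2016, 2018)
print(missing)
```

This only catches bienniums that are absent altogether; rows that exist but hold `NaN` in the chamber or governor columns need a separate null check.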



In [27]:
import os
import os.path as path
import pandas as pd
import numpy as np

In [28]:
parent_dir = os.path.abspath("/Users/revekkagershovich/Dropbox (MIT)/StateLaws")
# Check that the project directory exists before changing into it
assert os.path.exists(parent_dir), "parent_dir does not exist"
os.chdir(parent_dir)
data_dir = "./2_data/2_intermediate/political_data"
assert os.path.exists(data_dir), "Data directory does not exist"

In [29]:
df = pd.read_csv(path.join(data_dir, "political_composition.csv"))
state_df = pd.read_stata(path.join(data_dir, "political_composition_no_fed.dta"))
fed_df = pd.read_csv(path.join(data_dir, "federal_political_composition.csv"))

In [43]:
state_df.tail(2)

Unnamed: 0,state_abbrev,yr_rd2,shr_dem_in_sess,shr_rep_in_sess,dem_upphse,dem_lowhse,rep_upphse,rep_lowhse,gov_party
3902,WY,2018.0,0.125,0.875,0.1,0.15,0.9,0.85,2.0
3903,MS,2020.0,0.379414,0.620586,0.365385,0.393443,0.634615,0.606557,2.0


In [47]:
key_cols = ['dem_upphse', 'dem_lowhse', 'gov_party']
print(state_df[key_cols].isnull().sum())  # missing values per column
# Rows where at least one of the chamber or governor columns is missing
missing_rows = state_df[state_df[key_cols].isnull().any(axis=1)]

Two reasons why data on governors is missing:
1. Elections held in odd years
2. Governors who are neither Republican nor Democrat are not coded
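To see how the gaps split between the legislature columns and the governor column, one could list which columns are missing in each incomplete row. This is a minimal sketch on a toy frame: the helper `summarize_missing` and the toy states and values are hypothetical, and only the column names are taken from `state_df`.

```python
import numpy as np
import pandas as pd

def summarize_missing(panel, cols):
    """For each row with any NaN in `cols`, list the columns that are missing."""
    mask = panel[cols].isnull()
    rows = panel.loc[mask.any(axis=1), ["state_abbrev", "yr_rd2"]].copy()
    # Build a comma-separated label of the missing columns for each flagged row
    rows["missing_cols"] = [
        ",".join(c for c in cols if flags[c])
        for _, flags in mask.loc[rows.index].iterrows()
    ]
    return rows

# Toy data (hypothetical): AA lacks chamber shares, BB lacks the governor's party
toy = pd.DataFrame({
    "state_abbrev": ["AA", "BB", "CC"],
    "yr_rd2": [2014.0, 2014.0, 2016.0],
    "dem_upphse": [np.nan, 0.5, 0.3],
    "dem_lowhse": [np.nan, 0.4, 0.4],
    "gov_party": [2.0, np.nan, 1.0],
})
summary = summarize_missing(toy, ["dem_upphse", "dem_lowhse", "gov_party"])
print(summary)
```

Grouping the result by `state_abbrev` would then show whether a state's gaps recur every cycle, which is what either of the two reasons above would predict.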

In [49]:
missing_rows.tail()

Unnamed: 0,state_abbrev,yr_rd2,shr_dem_in_sess,shr_rep_in_sess,dem_upphse,dem_lowhse,rep_upphse,rep_lowhse,gov_party
3824,NE,2014.0,,,,,,,2.0
3838,VA,2014.0,0.408377,0.591623,0.49359,0.323163,0.50641,0.676837,
3842,AK,2016.0,0.355128,0.644872,0.3,0.410256,0.7,0.589744,
3863,ND,2016.0,0.292553,0.707447,0.340426,0.244681,0.659574,0.755319,
3878,AK,2018.0,0.373684,0.626316,0.3,0.447368,0.7,0.552632,


In [30]:
fed_df.tail(10)

Unnamed: 0,yr_rd2,shr_dem_in_sess,shr_rep_in_sess,dem_lowhse,rep_lowhse,dem_upphse,rep_upphse,president_party
86,2006,0.527103,0.469159,0.535632,0.464368,0.49,0.49,2.0
87,2008,0.588785,0.411215,0.590805,0.409195,0.58,0.42,1.0
88,2010,0.456075,0.540187,0.443678,0.556322,0.51,0.47,1.0
89,2012,0.474766,0.521495,0.462069,0.537931,0.53,0.45,1.0
90,2014,0.433645,0.562617,0.432184,0.567816,0.44,0.54,1.0
91,2016,0.450467,0.547664,0.445977,0.554023,0.47,0.52,2.0
92,2018,0.525234,0.471028,0.54023,0.45977,0.46,0.52,2.0
93,2020,0.504673,0.491589,0.510345,0.489655,0.48,0.5,1.0
94,2022,0.48785,0.504673,0.489655,0.508046,0.48,0.49,1.0
95,2024,0.480374,0.506542,0.487356,0.501149,0.45,0.53,2.0


In [39]:
fed_df.tail(3)

Unnamed: 0,yr_rd2,shr_dem_in_sess,shr_rep_in_sess,dem_lowhse,rep_lowhse,dem_upphse,rep_upphse,president_party
93,2020,0.504673,0.491589,0.510345,0.489655,0.48,0.5,1.0
94,2022,0.48785,0.504673,0.489655,0.508046,0.48,0.49,1.0
95,2024,0.480374,0.506542,0.487356,0.501149,0.45,0.53,2.0


In [32]:
print(f"Percentage of Democrats in Senate in 2010 according to Wikipedia: {56/100} or {58/100}.")
print(f"Percentage of Democrats in Senate in 2010 according to fed_df: {fed_df.loc[fed_df['yr_rd2'] == 2010, 'dem_upphse'].values[0]}.")
print(f"Percentage of Democrats in Senate in 2010 according to df: {df.loc[(df['yr_rd2'] == 2010) & (df['state_abbrev'] == 'FD'), 'dem_upphse'].values[0]}.")

Percentage of Democrats in Senate in 2010 according to Wikipedia: 0.56 or 0.58.
Percentage of Democrats in Senate in 2010 according to fed_df: 0.51.
Percentage of Democrats in Senate in 2010 according to df: 0.51.


In [33]:
print(f"Percentage of Democrats in Senate in 2008 according to Wikipedia: {49/100}.")
print(f"Percentage of Democrats in Senate in 2008 according to fed_df: {fed_df.loc[fed_df['yr_rd2'] == 2008, 'dem_upphse'].values[0]}.")
print(f"Percentage of Democrats in Senate in 2008 according to df: {df.loc[(df['yr_rd2'] == 2008) & (df['state_abbrev'] == 'FD'), 'dem_upphse'].values[0]}.")

Percentage of Democrats in Senate in 2008 according to Wikipedia: 0.49.
Percentage of Democrats in Senate in 2008 according to fed_df: 0.58.
Percentage of Democrats in Senate in 2008 according to df: 0.58.


In [41]:
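# Note: the rows below are looked up at yr_rd2 == 2006, because the House sitting in 2008 was elected in 2006
print(f"Percentage of Democrats in House in 2008 according to Wikipedia: {233/435}.")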
print(f"Percentage of Democrats in House in 2008 according to fed_df: {fed_df.loc[fed_df['yr_rd2'] == 2006, 'dem_lowhse'].values[0]}.")
print(f"Percentage of Democrats in House in 2008 according to df: {df.loc[(df['yr_rd2'] == 2006) & (df['state_abbrev'] == 'FD'), 'dem_lowhse'].values[0]}.")

Percentage of Democrats in House in 2008 according to Wikipedia: 0.535632183908046.
Percentage of Democrats in House in 2008 according to fed_df: 0.535632183908046.
Percentage of Democrats in House in 2008 according to df: 0.535632183908046.


In [40]:
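# president_party appears to be coded 1.0 = Democrat, 2.0 = Republican (inferred from the fed_df rows shown earlier)
print("President's party in 2008 according to Wikipedia: Republican.")
print(f"President's party in 2008 according to fed_df: {fed_df.loc[fed_df['yr_rd2'] == 2008, 'president_party'].values[0]}.")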

President's party in 2008 according to Wikipedia: Republican.
President's party in 2008 according to fed_df: 1.0.
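The spot checks above compare one year at a time. A sketch of how the same comparison could be run across several candidate year offsets at once, to see which shift lines the coded series up with a set of reference values. The helper `best_offset`, the `reference` dictionary, and all toy numbers are hypothetical; only the column names `yr_rd2` and `dem_lowhse` come from the data.

```python
import pandas as pd

def best_offset(coded, reference, col, offsets=(-2, 0, 2), tol=1e-6):
    """Count, per offset, how many reference values equal coded[col] at year + offset."""
    series = coded.set_index("yr_rd2")[col]
    counts = {}
    for off in offsets:
        hits = 0
        for year, value in reference.items():
            coded_value = series.get(year + off)  # None if that year is not coded
            if coded_value is not None and abs(coded_value - value) <= tol:
                hits += 1
        counts[off] = hits
    return counts

# Toy series (hypothetical values) in which each reference year matches the row two years earlier
toy = pd.DataFrame({
    "yr_rd2": [2004, 2006, 2008, 2010],
    "dem_lowhse": [0.40, 0.50, 0.60, 0.45],
})
reference = {2008: 0.50, 2010: 0.60}
print(best_offset(toy, reference, "dem_lowhse"))
```

An offset that matches every reference value would suggest that the coded year and the reference year follow different conventions, for example election year versus a year in which the chamber is sitting.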


In [None]:
print(f"Percentage of Democrats in Senate in 2008 election in MA according to Wikipedia: {35/40}.")
print(f"Percentage of Democrats in Senate in 2008 election in MA according to state_df: {state_df.loc[(state_df['yr_rd2'] == 2008) & (state_df['state_abbrev'] == 'MA'), 'dem_upphse'].values[0]}.")

Percentage of Democrats in Senate in 2008 election in MA according to Wikipedia: 0.875.
Percentage of Democrats in Senate in 2008 election in MA according to state_df: 0.875.


In [36]:
filt_state_df = state_df.loc[state_df['state_abbrev'] == 'MA']
print(filt_state_df.tail(10))

     state_abbrev  yr_rd2  shr_dem_in_sess  shr_rep_in_sess  dem_upphse  \
3376           MA  1996.0         0.764062         0.232813    0.750000   
3425           MA  1998.0         0.800000         0.190625    0.787500   
3474           MA  2000.0         0.810937         0.182813    0.812500   
3523           MA  2002.0         0.850000         0.134375    0.850000   
3572           MA  2004.0         0.850000         0.146875    0.850000   
3621           MA  2006.0         0.856250         0.140625    0.850000   
3670           MA  2008.0         0.876563         0.121875    0.875000   
3719           MA  2010.0         0.886541         0.113459    0.873397   
3768           MA  2012.0         0.849984         0.150016    0.898718   
3815           MA  2014.0         0.859375         0.140625    0.900000   

      dem_lowhse  rep_upphse  rep_lowhse  gov_party  
3376    0.778125    0.250000    0.215625        2.0  
3425    0.812500    0.200000    0.181250        2.0  
3474    0.80