Created by: Revekka Gershovich
Date: Oct 2024
Purpose: Clean the federal partisan data and convert it into Democratic and Republican shares

In [65]:
import os
import os.path as path
import pandas as pd
import numpy as np

In [66]:
# parent_dir = os.path.abspath("/Users/rivka666/Dropbox (MIT)/StateLaws")
parent_dir = os.path.abspath("/Users/revekkagershovich/Dropbox (MIT)/StateLaws")
# Check that the directory exists before changing into it
assert os.path.exists(parent_dir), "parent_dir does not exist"
os.chdir(parent_dir)

The data is downloaded from https://voteview.com/data

Here is the dictionary of all the variables: https://voteview.com/articles/data_help_members

Here is the dictionary specifically of congressional parties: https://voteview.com/articles/data_help_parties

In [67]:
fedPartDt = pd.read_csv("2_data/1_raw/political_data/FederalPartisanData.csv")
fedPartDt = fedPartDt.loc[:, ['congress', 'chamber', 'party_code']]
# Map each congress number to the second (even) year of its term, e.g. congress 23 -> 1834.
# Even years keep this dataset in line with the state partisan composition data.
congress_years = {i: 1789 + 2 * (i - 1) + 1 for i in range(1, 119)}
fedPartDt['yr_rd2'] = fedPartDt['congress'].map(congress_years)
fedPartDt.drop('congress', axis=1, inplace=True)
# Keep 1834 onward because the main political composition data starts then.
# .copy() makes the filtered frame independent, so later column assignments act on it directly.
fedPartDt = fedPartDt[fedPartDt['yr_rd2'] >= 1834].copy()
print(fedPartDt.head(10))

        chamber  party_code  yr_rd2
4543  President         555    1834
4544      House         555    1834
4545      House          44    1834
4546      House         555    1834
4547      House         555    1834
4548      House         555    1834
4549      House        1275    1834
4550      House        1275    1834
4551      House        1275    1834
4552      House        1275    1834
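
As a quick check of the year mapping above, the sketch below recomputes the even (second) year for a few Congresses. The helper name `congress_to_even_year` is hypothetical and not part of this notebook; it only restates the formula used in `congress_years`.

```python
def congress_to_even_year(congress: int) -> int:
    """Return the second (even) calendar year of a Congress (hypothetical helper)."""
    # Congress 1 began in 1789, so its second year is 1790; each later Congress adds two years.
    return 1789 + 2 * (congress - 1) + 1

# Congress 23 is the first one kept by the 1834 filter; 118 is the last key in congress_years.
for n in (1, 23, 118):
    print(n, congress_to_even_year(n))
```

Under this formula, Congress 23 maps to 1834 and Congress 118 maps to 2024, which matches the first and last years in the outputs of this notebook.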


In the next two cells, I first assert that each party code appears only in years when that party existed. I then remap several historical parties to the Democratic or Republican party: parties that political scientists commonly treat as ancestors of the two major modern congressional parties are assigned to those parties. I do this partly to simplify the dataset and make the political similarity measure possible to generate, and partly to keep this federal partisan composition dataset in line with the ICPSR state partisan composition dataset. The mapping I use is decoded below.

Mapping to Democrats (1 in the resulting dataset): 100: Dem (1828-present), 555: Jackson party (1829-1854)
Mapping to Republicans (2 in the resulting dataset): 200: Rep (1854-present), 29: Whig (1834-1856), 1275: Anti-Jacksonian (1824-1837), 3333: Opposition Coalition (Whig + Republican + Free Soil: 1850-1856)
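
To illustrate the mapping above, here is a minimal sketch on a toy Series. The names `toy_codes` and `party_map` are illustrative only; the code 328 stands in for any code that is not part of the mapping.

```python
import pandas as pd

# Toy party codes: 100 (Dem), 555 (Jackson), 29 (Whig), 200 (Rep), 328 (a code not in the mapping)
toy_codes = pd.Series([100, 555, 29, 200, 328])

# Same mapping as described above: 1 = Democratic lineage, 2 = Republican lineage
party_map = {100: 1, 555: 1, 200: 2, 29: 2, 1275: 2, 3333: 2}

# Codes found in the dictionary are recoded; every other code is left unchanged
remapped = toy_codes.replace(party_map)
print(remapped.tolist())
```

Codes outside the mapping keep their original value, so they stay distinguishable from 1 and 2 after the remap.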

In [68]:
assert (fedPartDt.loc[fedPartDt['party_code'] == 555, 'yr_rd2'].between(1829, 1854)).all(), "Jackson (democrats) party appears in the period when it did not exist"
assert (fedPartDt.loc[fedPartDt['party_code'] == 100, 'yr_rd2'] >= 1829).all(), "Democratic party appears in the period when it did not exist"
assert (fedPartDt.loc[fedPartDt['party_code'] == 200, 'yr_rd2'] >= 1854).all(), "Republican party appears in the period when it did not exist"
assert (fedPartDt.loc[fedPartDt['party_code'] == 29, 'yr_rd2'].between(1834,1856)).all(), "Whig party appears in the period when it did not exist"
assert (fedPartDt.loc[fedPartDt['party_code'] == 1275, 'yr_rd2'].between(1824,1837)).all(), "Anti-Jacksonian party appears in the period when it did not exist"
assert (fedPartDt.loc[fedPartDt['party_code'] == 3333, 'yr_rd2'].between(1850,1856)).all(), "Opposition coalition appears in the period when it did not exist"
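
The same checks could also be written as one table-driven loop. This is only a sketch on toy data: `valid_years` and `out_of_range_rows` are hypothetical names, and the year ranges are copied from the assertions above.

```python
import pandas as pd

# Toy data with hypothetical rows; the notebook itself uses fedPartDt
toy = pd.DataFrame({
    "party_code": [555, 555, 29, 200, 200],
    "yr_rd2": [1834, 1850, 1840, 1856, 2020],
})

# (first year, last year) per party code, as in the assertions above; None means "present"
valid_years = {
    555: (1829, 1854),
    100: (1829, None),
    200: (1854, None),
    29: (1834, 1856),
    1275: (1824, 1837),
    3333: (1850, 1856),
}

def out_of_range_rows(df, ranges):
    """Return the rows whose year falls outside the range allowed for their party code."""
    bad_idx = []
    for code, (start, end) in ranges.items():
        years = df.loc[df["party_code"] == code, "yr_rd2"]
        ok = years >= start if end is None else years.between(start, end)
        bad_idx.extend(years.index[~ok.to_numpy()])
    return df.loc[bad_idx]

violations = out_of_range_rows(toy, valid_years)
assert violations.empty, f"party codes outside their valid years:\n{violations}"
```

Returning the offending rows rather than a single boolean makes a failed check easier to diagnose.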

In [69]:
# Recode to 1 = Democratic lineage and 2 = Republican lineage; all other codes are left unchanged
fedPartDt['party_code'] = fedPartDt['party_code'].replace({100: 1, 555: 1, 200: 2, 29: 2, 1275: 2, 3333: 2})
# fedPartDt.loc[~fedPartDt['party_code'].isin([1, 2]), 'party_code'] = pd.NA
print(fedPartDt['party_code'].value_counts())

party_code
1       23782
2       21403
310        89
340        83
328        67
206        60
26         55
370        48
203        40
329        40
537        39
44         28
326        28
300        26
331        23
3334       19
208        16
354        14
112        13
4444       12
1060       11
114         9
108         8
522         7
380         7
347         3
46          3
403         3
603         2
37          2
356         2
355         2
117         2
1111        1
1116        1
213         1
523         1
402         1
Name: count, dtype: int64


In [70]:
president = fedPartDt[fedPartDt['chamber'] == 'President']
print(president.tail(5))

         chamber  party_code  yr_rd2
47733  President           1    2016
48275  President           2    2018
48831  President           2    2020
49385  President           2    2022
49386  President           1    2022
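
The tail above shows two president rows for 2022. When the president's party is later taken with `groupby('yr_rd2')['party_code'].last()`, the row listed last within each year is the one kept. A minimal sketch on toy rows (the frame `toy_president` is illustrative, not the real data):

```python
import pandas as pd

# Toy rows mirroring the tail shown above: two president rows share yr_rd2 == 2022
toy_president = pd.DataFrame({
    "yr_rd2": [2020, 2022, 2022],
    "party_code": [2, 2, 1],
})

# .last() keeps the final non-null value per year, in the order the rows appear in the frame
last_party = toy_president.groupby("yr_rd2")["party_code"].last()
print(last_party)
```

The result therefore depends on row order in the source file, so any re-sorting before this step could change which president is recorded for such a year.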


In [71]:
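# Count member rows by party code for each chamber and year; codes absent in a year become 0.
# Counts cover every member row for that year, so a chamber's total can exceed its number of seats
# (e.g. 48 + 55 Senate rows in 2018).
senate_totals = fedPartDt[fedPartDt['chamber'] == 'Senate'].groupby('yr_rd2')['party_code'].value_counts().unstack().fillna(0)
house_totals = fedPartDt[fedPartDt['chamber'] == 'House'].groupby('yr_rd2')['party_code'].value_counts().unstack().fillna(0)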

senate_dems_totals = senate_totals[1]
senate_reps_totals = senate_totals[2]

house_dems_totals = house_totals[1]
house_reps_totals = house_totals[2]

totals_df = pd.concat([senate_dems_totals, senate_reps_totals, house_dems_totals, house_reps_totals], axis=1)
totals_df.columns = ['Senate Democrats Totals', 'Senate Republicans Totals', 'House Democrats Totals', 'House Republicans Totals']
totals_df.index.name = 'Year'
totals_df.reset_index(inplace=True)

print(totals_df.tail())

    Year  Senate Democrats Totals  Senate Republicans Totals  \
91  2016                     44.0                       54.0   
92  2018                     48.0                       55.0   
93  2020                     46.0                       54.0   
94  2022                     49.0                       51.0   
95  2024                     49.0                       49.0   

    House Democrats Totals  House Republicans Totals  
91                   190.0                     251.0  
92                   200.0                     250.0  
93                   241.0                     208.0  
94                   232.0                     223.0  
95                   219.0                     229.0  


In [72]:
# Share of member rows in the given chamber(s) whose party_code equals `code`, by year.
# The denominator is every member row, including parties other than 1 and 2.
def calculate_party_share(df, chambers, code):
    filtered_df = df[df['chamber'].isin(chambers)]
    return filtered_df.groupby('yr_rd2')['party_code'].apply(lambda x: (x == code).mean())

def calculate_proportion_of_dems(df, chamber):
    return calculate_party_share(df, [chamber], 1)

def calculate_proportion_of_reps(df, chamber):
    return calculate_party_share(df, [chamber], 2)

# Calculate proportions for the House and Senate
dem_lowhouse = calculate_proportion_of_dems(fedPartDt, 'House')
dem_upphouse = calculate_proportion_of_dems(fedPartDt, 'Senate')
rep_lowhouse = calculate_proportion_of_reps(fedPartDt, 'House')
rep_upphouse = calculate_proportion_of_reps(fedPartDt, 'Senate')

# President's party: if a year has more than one president row, keep the last one listed
president_party = fedPartDt[fedPartDt['chamber'] == 'President'].groupby('yr_rd2')['party_code'].last()

# Calculate the shares across both chambers combined (House and Senate)
shr_dem_in_session = calculate_party_share(fedPartDt, ['House', 'Senate'], 1)
shr_rep_in_session = calculate_party_share(fedPartDt, ['House', 'Senate'], 2)

# Combine all proportions into a single DataFrame by aligning indices on 'yr_rd2'
fed_affiliations = pd.concat([
    shr_dem_in_session.rename('shr_dem_in_sess'),
    shr_rep_in_session.rename('shr_rep_in_sess'),
    dem_lowhouse.rename('dem_lowhse'),
    rep_lowhouse.rename('rep_lowhse'),
    dem_upphouse.rename('dem_upphse'),
    rep_upphouse.rename('rep_upphse'),
    president_party.rename('president_party')  # President's party taken directly from the data
], axis=1)

# Reset the index to ensure 'yr_rd2' is a column
fed_affiliations = fed_affiliations.reset_index()
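
Because the denominators above include every member row, not only codes 1 and 2, the Democratic and Republican shares for a year need not sum to one. A minimal sketch on a toy chamber-year (the frame `toy` and its counts are made up for illustration):

```python
import pandas as pd

# Toy chamber-year: 5 Democrats (1), 4 Republicans (2), 1 member with an unmapped code (328)
toy = pd.DataFrame({
    "yr_rd2": [1834] * 10,
    "party_code": [1] * 5 + [2] * 4 + [328],
})

by_year = toy.groupby("yr_rd2")["party_code"]
dem_share = by_year.apply(lambda codes: (codes == 1).mean())
rep_share = by_year.apply(lambda codes: (codes == 2).mean())

# The shares are 5/10 and 4/10; the remaining 1/10 belongs to the unmapped code
print(dem_share.loc[1834], rep_share.loc[1834])
```

This is why, for example, the 1834 shares in the output of this notebook add up to less than one.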


In [73]:
# Print the proportions dataframe
print(fed_affiliations.head(5))

   yr_rd2  shr_dem_in_sess  shr_rep_in_sess  dem_lowhse  rep_lowhse  \
0    1834         0.557325         0.321656    0.581395    0.279070   
1    1836         0.575472         0.342767    0.584314    0.321569   
2    1838         0.548896         0.406940    0.519380    0.426357   
3    1840         0.504732         0.470032    0.498054    0.470817   
4    1842         0.411950         0.581761    0.409266    0.583012   

   dem_upphse  rep_upphse  president_party  
0    0.446429    0.517857              1.0  
1    0.539683    0.428571              1.0  
2    0.677966    0.322034              1.0  
3    0.533333    0.466667              1.0  
4    0.423729    0.576271              2.0  


In [74]:
# Build the output path from parent_dir so it matches the directory the raw data was read from
output_dir = path.join(parent_dir, "2_data/2_intermediate/political_data")
output_file = path.join(output_dir, "federal_political_composition.csv")
fed_affiliations.to_csv(output_file, index=False)