# Milestone 1: Congressional Voting Patterns and Trends
Author: Leonardo Matone 
Webpage: https://v993.github.io/

## Project Description:
Congress has become increasingly polarized and complicated in recent decades, and these preliminary investigations bear that out. Quantifying politicians relative to one another could be a useful tool for understanding where politicians lie (no pun intended) and how they can be described.


### Data/Resources:

I have found three data sources to investigate so far, and I hope to find more. All three are from Kaggle and can be loaded into a dataframe with ease:

1. [Political Polarization of Congress](https://www.kaggle.com/code/justin2028/political-polarization-us-congress-data-analysis/input)
- Is there a relationship between the age of a representative and their polarity?
- Is there a particular demographic which a highly polarized representative may appeal to?
2. [Congress’ Voting Patterns for/against Trump](https://www.kaggle.com/datasets/fivethirtyeight/trump-score)
- Can we predict party or polarization from voting patterns for/against Trump? 
- Can we estimate polarity from voting patterns?
3. [Election, COVID, and Demographic Data by County](https://www.kaggle.com/datasets/etsc9287/2020-general-election-polls)
- Is there a correlation between counties which voted for Trump and representatives who vote with him?

#### ETL Political Polarization of Congress:
Load Kaggle data into a dataframe and do preliminary cleaning and data examination:

In [2]:
import pandas as pd
import matplotlib.pyplot as plt


### [Political Polarization in Congress](https://www.kaggle.com/code/justin2028/political-polarization-us-congress-data-analysis)

In [17]:
political_polarization_df = pd.read_csv("all_congress_polarization.csv")

In [41]:
# Keep only the 116th Congress and exclude the President rows, leaving House and Senate members
mask = (political_polarization_df["congress"] == 116) & (political_polarization_df["chamber"] != "President")
# Drop the troublesome columns (e.g. "died" should not apply to a recent Congress)
single_congress_df = political_polarization_df[mask].drop(columns=["died", "conditional", "occupancy", "last_means"])
display(single_congress_df.head())  # [single_congress_df.isna().any(axis=1)]
# display(single_congress_df.dtypes)
# display(single_congress_df.isna().sum())
len(single_congress_df)

Unnamed: 0,congress,chamber,icpsr,state_icpsr,district_code,state_abbrev,party_code,bioname,bioguide_id,born,nominate_dim1,nominate_dim2,nominate_log_likelihood,nominate_geo_mean_probability,nominate_number_of_votes,nominate_number_of_errors,nokken_poole_dim1,nokken_poole_dim2
48832,116,House,20301,41,3,AL,200,"ROGERS, Mike Dennis",R000575,1958.0,0.361,0.462,-166.79214,0.80364,763.0,69.0,0.52,0.388
48833,116,House,21102,41,7,AL,100,"SEWELL, Terri",S001185,1965.0,-0.393,0.398,-28.40094,0.96464,789.0,11.0,-0.43,0.384
48834,116,House,21192,41,2,AL,200,"ROBY, Martha",R000591,1976.0,0.362,0.658,-90.42097,0.88244,723.0,31.0,0.346,0.672
48835,116,House,21193,41,5,AL,200,"BROOKS, Mo",B001274,1954.0,0.652,-0.417,-140.71682,0.83962,805.0,57.0,0.772,-0.337
48836,116,House,21376,41,1,AL,200,"BYRNE, Bradley",B001289,1955.0,0.61,0.25,-107.81607,0.85611,694.0,42.0,0.702,0.194


553

In [42]:
display(len(single_congress_df))
single_congress_df["chamber"].unique()

553

array(['House', 'Senate'], dtype=object)

The dataframe we have constructed (after some trial and error) contains 553 members of the House and Senate, with several descriptors of political polarity. A few NaNs remain after we remove troublesome columns ("died" doesn't apply to a recent session of Congress, or at least it shouldn't), but most politicians have a score. There's definitely a lot to work with here. The most challenging part of loading this data was understanding the NaN values: for a while I thought most of the data was unusable, until I read into specific columns and why they might contain NaNs.
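
One way to get a handle on the NaN values described above is to count them per column before deciding what to drop. The following is a minimal sketch, not the exact check used in this notebook: `summarize_missing` is a hypothetical helper, and the tiny `sample_df` (including where its NaNs fall) is made up purely for illustration.

```python
import numpy as np
import pandas as pd

# Tiny stand-in for single_congress_df; the NaN placements are invented for illustration.
sample_df = pd.DataFrame({
    "bioname": ["ROGERS, Mike Dennis", "SEWELL, Terri", "ROBY, Martha"],
    "nominate_dim1": [0.361, -0.393, np.nan],
    "nokken_poole_dim1": [0.52, np.nan, np.nan],
})


def summarize_missing(df):
    """Return per-column NaN counts and fractions, most-missing column first."""
    counts = df.isna().sum()
    summary = pd.DataFrame({
        "n_missing": counts,
        "frac_missing": counts / len(df),
    })
    return summary.sort_values("n_missing", ascending=False)


missing_summary = summarize_missing(sample_df)
print(missing_summary)

# Rows with at least one NaN, useful for reading into *why* a value is missing.
rows_with_nan = sample_df[sample_df.isna().any(axis=1)]
print(rows_with_nan)
```

In the notebook itself, calling `summarize_missing(single_congress_df)` would presumably give the same kind of overview for the real data.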

In [None]:
political_polarization_df

Central Questions:
- Is there a relationship between the age of a representative and their polarity?
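
The age question could be approached roughly as follows. This is only a sketch under two assumptions of mine: that "polarity" can be approximated by the absolute value of `nominate_dim1`, and that age can be approximated as the year the 116th Congress convened (2019) minus `born`. The helper `age_polarity_correlation` is hypothetical, and the five rows are the ones shown in the `head()` output above, so the resulting number says nothing about Congress as a whole.

```python
import pandas as pd

# The five rows from the head() output above (116th Congress, Alabama House members).
sample_df = pd.DataFrame({
    "bioname": ["ROGERS, Mike Dennis", "SEWELL, Terri", "ROBY, Martha",
                "BROOKS, Mo", "BYRNE, Bradley"],
    "born": [1958.0, 1965.0, 1976.0, 1954.0, 1955.0],
    "nominate_dim1": [0.361, -0.393, 0.362, 0.652, 0.61],
})

CONGRESS_START_YEAR = 2019  # the 116th Congress convened in January 2019


def age_polarity_correlation(df, start_year=CONGRESS_START_YEAR):
    """Add approximate age and |nominate_dim1| columns; return (df, Pearson r)."""
    out = df.dropna(subset=["born", "nominate_dim1"]).copy()
    out["age"] = start_year - out["born"]          # approximate: birth year only
    out["polarity"] = out["nominate_dim1"].abs()   # assumption: distance from 0
    return out, out["age"].corr(out["polarity"])


scored, r = age_polarity_correlation(sample_df)
print(scored[["bioname", "age", "polarity"]])
print(f"Pearson r between age and polarity: {r:.3f}")
```

Running the same function on the full `single_congress_df` would be the natural next step; a scatter plot of `age` against `polarity` would show whether any correlation is driven by a few outliers.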

This dataset is interesting because it rates politicians in terms of polarity. What immediately springs to mind is examining trends over time: when political polarity (extremism) surges and when it eases. In the graph below, you can see the distance from the mean increase in recent years, whereas both parties trended closer to the mean during the Second World War and through the 60s until the 90s, when both dip away from the mean:

![Alt text](image.png)
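
A trend like the one in the graph could be computed by averaging `nominate_dim1` per party within each Congress. The sketch below is not necessarily how the image above was produced: `party_means_by_congress` is a hypothetical helper, the numbers in `toy_df` are made up, and it assumes party codes 100 and 200 denote Democrats and Republicans (consistent with the rows shown earlier).

```python
import pandas as pd

# Made-up scores for two Congresses, only to illustrate the computation.
toy_df = pd.DataFrame({
    "congress": [80, 80, 80, 80, 116, 116, 116, 116],
    "party_code": [100, 200, 100, 200, 100, 200, 100, 200],
    "nominate_dim1": [-0.2, 0.2, -0.3, 0.3, -0.4, 0.5, -0.4, 0.6],
})


def party_means_by_congress(df, parties=(100, 200)):
    """Mean nominate_dim1 per Congress and party, plus the gap between the two parties."""
    major = df[df["party_code"].isin(parties)]
    means = (
        major.groupby(["congress", "party_code"])["nominate_dim1"]
        .mean()
        .unstack("party_code")  # one column per party code
    )
    means["gap"] = means[200] - means[100]  # wider gap = parties further apart
    return means


trend = party_means_by_congress(toy_df)
print(trend)
```

With the real data, `party_means_by_congress(political_polarization_df)[[100, 200]].plot()` should draw one line per party across Congresses, and the `gap` column gives a single number per Congress to track over time.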

Can we estimate polarity from voting patterns?


#### Project Goals:

#### Work plan:

I plan to meet (with myself) twice a week to dive into the project. I'll be using Visual Studio Code, Jupyter Notebook, and Python to investigate my chosen data sources.