# Legislative Mismatch

### Friendly Cities Lab @ Georgia Tech

Analyzes discrepancies between U.S. Congress Representatives’ voting records and their districts’ Social Vulnerability Index.


## Working with Agriculture Data

In [6]:
import chardet

file_path = "./data/cdp.csv"
# Read the raw bytes and let chardet guess the encoding before loading the file with pandas
with open(file_path, "rb") as f:
    result = chardet.detect(f.read())

encoding = result["encoding"]
print(result)

{'encoding': 'Windows-1252', 'confidence': 0.73, 'language': ''}
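
A confidence of 0.73 means the detected encoding is a guess, so it can be worth confirming that the bytes decode cleanly before relying on it. The following is a minimal sketch using only the standard library; the helper name `first_strict_decoding` and the sample bytes are hypothetical and not part of the original notebook.

```python
from typing import Iterable, Optional


def first_strict_decoding(raw: bytes, candidates: Iterable[str]) -> Optional[str]:
    """Return the first candidate encoding that decodes `raw` without errors."""
    for name in candidates:
        try:
            raw.decode(name, errors="strict")
        except UnicodeDecodeError:
            continue
        return name
    return None


# Hypothetical sample bytes standing in for the contents of the CSV file.
sample = "São Tomé, 1½ acres".encode("cp1252")

# UTF-8 is tried first because it rejects most input that is not UTF-8;
# cp1252 is Python's name for the Windows-1252 encoding chardet reported.
chosen = first_strict_decoding(sample, ["utf-8", "cp1252"])
print(chosen)
```

A strict decode only shows that the bytes are valid in that encoding, not that the resulting characters are the intended ones, so spot-checking a few non-ASCII values after loading is still advisable.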


Reshape the agriculture data so that each statistic is a column and each row is a congressional district in a state.

In [15]:
import pandas as pd

file_path = "./data/cdp.csv"
df = pd.read_csv(file_path, encoding=encoding)

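# keep=False removes every row whose (State, Congressional District, Label) key is repeated,
# so pivot() sees unique keys and does not raise on duplicate entries
df = df.drop_duplicates(subset=["State", "Congressional District", "Label"], keep=False)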
reshaped_df = df.pivot(index=["State", "Congressional District"], columns="Label", values="Value").reset_index()
reshaped_df.to_csv("./data/reshaped_agriculture_data.csv", index=False)

num_reshaped_columns = len(reshaped_df.columns)
print(f"Reshaped data saved to reshaped_agriculture_data.csv with {num_reshaped_columns} columns.")

Reshaped data saved to reshaped_agriculture_data.csv with 206 columns.
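
The reshape can be illustrated on a toy table. This is a sketch with made-up labels and values, not the real agriculture data. It shows that `keep=False` discards both copies of a repeated key, which can remove a district entirely, and that labels missing for a district become `NaN` after the pivot.

```python
import pandas as pd

# Made-up long-format data: one row per (state, district, statistic).
long_df = pd.DataFrame({
    "State": [1, 1, 1, 1, 2],
    "Congressional District": [1, 1, 2, 2, 1],
    "Label": ["Farms", "Acres", "Farms", "Farms", "Farms"],
    "Value": [10, 200, 30, 31, 50],
})

keys = ["State", "Congressional District", "Label"]

# Both "Farms" rows for state 1, district 2 share a key, so both are dropped.
deduped = long_df.drop_duplicates(subset=keys, keep=False)

# One row per district, one column per statistic.
wide_df = deduped.pivot(
    index=["State", "Congressional District"],
    columns="Label",
    values="Value",
).reset_index()

print(wide_df)
```

In this toy case district 2 of state 1 disappears from the result because its only rows were duplicates, and state 2 has no `Acres` value. If silently losing rows is a concern, comparing `len(long_df)` with `len(deduped)` before pivoting gives a quick check.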


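Drop the districts that have no data at all, then keep only the columns that have a value for every remaining congressional district. This leaves 27 columns. Because the count comes from `len(filtered_df.columns)`, it includes the two identifier columns, `State` and `Congressional District`.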

In [17]:
# Identifier columns; every other column is a statistic
id_cols = ["State", "Congressional District"]
stat_cols = [col for col in reshaped_df.columns if col not in id_cols]
# Drop districts with no statistics at all (i.e. State no. 36 districts 4-8)
filtered_df = reshaped_df.dropna(subset=stat_cols, how="all")
# Drop every column with a missing value, so each remaining column has data for every remaining district
filtered_df = filtered_df.dropna(axis=1, how="any")
filtered_df.to_csv("./data/filtered_agriculture_data.csv", index=False)

num_filtered_columns = len(filtered_df.columns)
print(f"Filtered data saved to filtered_agriculture_data.csv with {num_filtered_columns} columns.")

Filtered data saved to filtered_agriculture_data.csv with 27 columns.
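
The two filtering passes can be sketched on a small wide table. The column names and values below are invented for illustration. The first pass removes rows where every statistic is missing, and the second removes any column that still has a gap.

```python
import pandas as pd

nan = float("nan")

# Made-up wide-format data: one row per district, one column per statistic.
wide = pd.DataFrame({
    "State": [1, 1, 2],
    "Congressional District": [1, 2, 1],
    "Acres": [200.0, nan, 150.0],
    "Farms": [10.0, nan, 50.0],
    "Sales": [5.0, nan, nan],
})

id_cols = ["State", "Congressional District"]
stat_cols = [c for c in wide.columns if c not in id_cols]

# Pass 1: drop districts where every statistic is missing (state 1, district 2).
non_empty = wide.dropna(subset=stat_cols, how="all")

# Pass 2: drop statistics that are missing for any remaining district ("Sales").
complete = non_empty.dropna(axis=1, how="any")

# The column count includes the identifier columns, so subtract them
# to get the number of statistics that survived.
n_stats = len(complete.columns) - len(id_cols)
print(list(complete.columns), n_stats)
```

The order of the passes matters: running the column filter first would drop every statistic here, because the empty district has a gap in all of them.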
