The purpose of this notebook is to summarize the data at the district level and state level.

Some of the work is done by helper functions in congress_tools.py.

1. Calculate the expected number of representatives based on popular vote (in congress_tools.py: reapportionSeats_state).

2. Calculate the difference between the expected number from step 1 and the actual number of seats (delta seats).

3. Pull in other district-level measures such as compactness (these can later be summarized per state, e.g., as mean compactness).

4. Summarize state-level data

5. Save district and state data into pkl files for later access.

The results of steps 2 and 3 can then be correlated with each other, for example delta seats against compactness (e.g., in plot_state_expected_reps_2013.ipynb).
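
To make steps 1 and 2 concrete, here is a minimal sketch of one way to turn a state's popular vote into an expected seat count and a delta. It assumes a largest-remainder allocation and a sign convention of actual minus expected; both are illustrative assumptions, and the functions `expected_seats` and `delta_seats` are hypothetical names, not the implementation in congress_tools.py. The vote totals are made up.

```python
def expected_seats(votes, n_seats):
    """Split n_seats between parties in proportion to their votes.

    ASSUMPTION: uses the largest-remainder rule for illustration only;
    congress_tools.reapportionSeats may work differently.
    """
    total = sum(votes.values())
    # exact (fractional) share of the seats each party would deserve
    quotas = {party: n_seats * float(v) / total for party, v in votes.items()}
    # every party first gets the whole-number part of its quota
    seats = {party: int(q) for party, q in quotas.items()}
    leftover = n_seats - sum(seats.values())
    # remaining seats go to the parties with the largest fractional parts
    by_remainder = sorted(quotas, key=lambda p: quotas[p] - seats[p], reverse=True)
    for party in by_remainder[:leftover]:
        seats[party] += 1
    return seats


def delta_seats(actual, expected):
    # ASSUMPTION: delta is defined here as actual minus expected seats
    return {party: actual.get(party, 0) - expected[party] for party in expected}


# hypothetical state with 13 seats and made-up vote totals
votes = {'D': 2200000, 'R': 2100000, 'Other': 100000}
expected = expected_seats(votes, 13)
delta = delta_seats({'D': 4, 'R': 9}, expected)
print(expected, delta)
```

A negative delta for a party then means it holds fewer seats than its vote share would suggest under this particular rule.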

Reads data such as:

- the popular vote dictionary 113_2012_house_popular_vote.json (created by save_113_2012_house_popular_vote.ipynb)
- the state redistricting summary redistricting_2010.json (created by get_redistricting_authorities.ipynb)
- compactness scores per district in compactness113_byGEO.json (created by Get_compactness_score_113.ipynb)

In [1]:
from __future__ import print_function, division
import json
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import congress_tools as cong

%matplotlib inline

In [2]:
# Read the data we need for creating a dataframe
#
# Run for both 2012 and 2014

year = 2012
# year = 2014

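if year == 2014:
    congNum = 114
elif year == 2012:
    congNum = 113
else:
    # fail early instead of hitting a NameError on congNum below
    raise ValueError('year must be 2012 or 2014, got %r' % year)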

# reapportion seats to expected values
methodReapportion = 'state'
# methodReapportion = 'country'

# Read the popular vote dictionary for the chosen year, e.g. 113_2012_house_popular_vote.json (created by save_113_2012_house_popular_vote.ipynb)
filename = '../data/%d_%d_house_popular_vote.json' % (congNum, year)
with open(filename,'rb') as f1:
    popv = json.load(f1)

popv = cong.reapportionSeats(popv,method=methodReapportion)

In [3]:
# load data for creating dataframe

# state PVI from Cook Political Report;
# manually copied from http://cookpolitical.com/file/filename.pdf into ../data/pvi_state.csv
dfstatepvi = pd.read_csv('../data/pvi_state.csv', header=0, names = ['state','pvi2010','pvi2014'])
dfstatepvi['state'] = dfstatepvi['state'].map(str.strip)
dfstatepvi['pvi2010'] = dfstatepvi['pvi2010'].map(str.strip)
dfstatepvi['pvi2014'] = dfstatepvi['pvi2014'].map(str.strip)

# Read compactness scores per district from compactness113_byGEO.json;
# districts did not change between 2012 and 2014, so the 113th Congress scores apply to both elections
filename = '../data/compactness113_byGEO.json'
with open(filename,"rb") as f3:
    comp = json.load(f3)

# Read the state redistricting summary redistricting_2010.json (created by get_redistricting_authorities.ipynb);
# per state: this contains the number of seats, redistricting method, redistricting control
with open("../data/redistricting_2010.json","rb") as f2:
    dist = json.load(f2)

# get census data per district and use it to calculate the percentage of the population per race per state;
# could instead use the voting-age population
usecols = [1, 49, 50, 51, 52, 53, 54, 55]
names = ['state', 'total', 'white', 'black', 'hispa', 'asian', 'nativ', 'other']
dfcensus = pd.read_csv('../data/114_2014_house_election_2010census.csv', header=None, skiprows=[0,1,2], usecols=usecols, names=names)
dfcensus['state'] = dfcensus['state'].map(str.strip)

# IPV113 contains district-level info.
# we created it from data in ../data/compactness113_byGEO.json and ../data/pvi_district.csv.
# per district: compactness, PVI ("score" key in each district), presidential election info, etc.
with open('../data/IPV113.json','rb') as f:
    pviDict = json.load(f)
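
The census comment above mentions turning district-level counts into per-state percentages by race. A minimal sketch of that calculation, using a toy frame with made-up counts and a subset of the `dfcensus` column names; how `cong.create_df_dist` actually does this is not shown here, so treat the approach as an assumption.

```python
import pandas as pd

# toy district-level rows with made-up counts; 'AA' and 'BB' are placeholder states
toy_census = pd.DataFrame({
    'state': ['AA', 'AA', 'BB'],
    'total': [1000, 3000, 2000],
    'white': [600, 1800, 1000],
    'black': [200, 600, 500],
    'hispa': [200, 600, 500],
})

# sum the district counts within each state
state_counts = toy_census.groupby('state').sum()

# divide each race column by the state total to get percentages
race_cols = [c for c in state_counts.columns if c != 'total']
state_pct = state_counts[race_cols].div(state_counts['total'], axis=0) * 100
print(state_pct)
```

Summing counts before dividing weights each district by its population, which differs from averaging the district-level percentages.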

In [4]:
# put the data in a dataframe
df = cong.create_df_dist(popv,comp,dist,dfcensus,dfstatepvi,pviDict)

In [5]:
# save the district-level dataframe

filename = '../data/df_distSummary_%d_%sReapportion.pkl' % (year, methodReapportion)
df.to_pickle(filename)

In [6]:
# summarize the states into their own dataframe
dfstate = cong.summary_dist2state(df)
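
For readers without congress_tools.py at hand, a district-to-state summary of this kind can be sketched with a pandas groupby. The column names (`compactness`, `winner_is_dem`) and the chosen aggregates are hypothetical; the real `summary_dist2state` may compute different quantities.

```python
import pandas as pd

# toy district-level frame; column names and values are hypothetical
toy_df = pd.DataFrame({
    'state': ['AA', 'AA', 'AA', 'BB', 'BB'],
    'compactness': [0.20, 0.30, 0.40, 0.50, 0.70],
    'winner_is_dem': [1, 0, 0, 1, 1],
})

grouped = toy_df.groupby('state')

# one row per state: mean compactness and number of seats won by one party
toy_state = grouped.agg({'compactness': 'mean', 'winner_is_dem': 'sum'})
toy_state = toy_state.rename(columns={'compactness': 'mean_compactness',
                                      'winner_is_dem': 'dem_seats'})

# number of districts per state
toy_state['n_districts'] = grouped.size()
print(toy_state)
```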

In [7]:
# save the state-level dataframe

filename = '../data/df_stateSummary_%d_%sReapportion.pkl' % (year, methodReapportion)
dfstate.to_pickle(filename)
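
Later notebooks can load the saved frames back with `pd.read_pickle`. The sketch below shows the round trip with a toy frame written to a temporary directory, reusing the file-naming pattern from the cell above; the frame contents are made up.

```python
import os
import tempfile

import pandas as pd

# toy stand-in for dfstate; values are made up
toy_state = pd.DataFrame({'delta_seats': [-3, 1]}, index=['AA', 'BB'])

# same naming pattern as above, but written to a temporary directory
year, methodReapportion = 2012, 'state'
tmpdir = tempfile.mkdtemp()
filename = os.path.join(
    tmpdir, 'df_stateSummary_%d_%sReapportion.pkl' % (year, methodReapportion))
toy_state.to_pickle(filename)

# read the frame back, as a downstream notebook would
reloaded = pd.read_pickle(filename)
print(reloaded)
```

Pickle files are tied to the Python and pandas versions that wrote them, so a frame saved here may not load in a much newer or older environment.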