# Compute gerrymandering metrics

This notebook shows how to load election results and build three DataFrames from them:
1. Election results
2. Metrics
3. Percentiles for metrics

First, we load the data for both Congressional elections and state legislative (lower house) elections.

Then we compute each metric (called a "test" in the code) and generate percentile rankings, using a few parameters set below.

In [1]:
```python
import utils as u
import metrics as m
import IPython.display as ipd

from collections import defaultdict

impute_val = 1  # impute uncontested races at voteshare=1; in other words, don't impute
competitiveness_threshold = .55  # only examine races where D voteshare is between .45 and .55
min_districts = 7  # only examine races in states with at least 7 districts
min_year = 1972  # only examine races from 1972 onward

chambers = defaultdict(lambda: defaultdict(list))
chambers['State Legislative']['filepath'] = 'election_data/state_legislative/state_legislative_election_results_post1971.csv'
chambers['Congressional']['filepath'] = 'election_data/congressional_election_results_post1948.csv'

metric_dict = {'t_test_diff': m.t_test_diff,
               'mean_median_diff': m.mean_median,
               'declination': m.declination,
               'declination_buffered': m.bdec,
               'efficiency_gap': m.EG,
               'loss_gap': m.EG_loss_only,
               'difference_gap': m.EG_difference,
               'surplus_gap': m.EG_surplus_only,
               'vote_centric_gap': m.EG_vote_centric,
               'vote_centric_gap_two': m.EG_vote_centric_two,
               'partisan_bias': m.partisan_bias,
               'equal_vote_weight_bias': m.equal_vote_weight}

# For each chamber: parse the results, run every metric, then rank by percentile.
for chamber in chambers:
    chambers[chamber]['elections_df'] = u.parse_results(chambers[chamber]['filepath'])
    chambers[chamber]['tests_df'] = u.tests_df(u.run_all_tests(chambers[chamber]['elections_df'],
                                                               impute_val=impute_val,
                                                               metrics=metric_dict))
    chambers[chamber]['percentile_df'] = u.generate_percentiles(chambers[chamber]['tests_df'],
                                                                metric_dict.keys(),
                                                                competitiveness_threshold=competitiveness_threshold,
                                                                min_districts=min_districts,
                                                                min_year=min_year)
```

100%|██████████| 47/47 [00:01<00:00, 29.48it/s]
100%|██████████| 35/35 [00:02<00:00, 15.22it/s]
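The imputation setting and the three filter parameters can be illustrated on a single election. The sketch below is a hypothetical stand-in for what `u.run_all_tests` and `u.generate_percentiles` might do with them: the names `impute_uncontested` and `passes_filters` are made up here, the symmetric treatment of Republican walkovers (a D voteshare of 0 imputed as `1 - impute_val`) is an assumption, and so are the inclusive bounds. The competitiveness check uses the weighted voteshare because the State Legislative results further down include WI 2012, whose `voteshare` (0.562973) lies outside 0.45 to 0.55 while its `weighted_voteshare` (0.531605) lies inside.

```python
# Hypothetical sketch: how impute_val and the three filter parameters from
# the first cell might act on one election. Not the notebook's utils code.

def impute_uncontested(district_shares, impute_val=1.0):
    """Replace uncontested results (D voteshare of exactly 0 or 1).

    Assumption: a Democratic walkover (1.0) becomes impute_val and a
    Republican walkover (0.0) becomes 1 - impute_val, so impute_val=1
    leaves every result unchanged.
    """
    imputed = []
    for share in district_shares:
        if share == 1.0:
            imputed.append(impute_val)
        elif share == 0.0:
            imputed.append(1.0 - impute_val)
        else:
            imputed.append(share)
    return imputed


def passes_filters(weighted_voteshare, ndists, year,
                   competitiveness_threshold=0.55, min_districts=7, min_year=1972):
    """True if competitive, in a large enough state, and recent enough (bounds assumed inclusive)."""
    lo, hi = 1 - competitiveness_threshold, competitiveness_threshold
    return (lo <= weighted_voteshare <= hi
            and ndists >= min_districts
            and year >= min_year)


print(impute_uncontested([0.0, 0.4, 1.0], impute_val=0.75))  # [0.25, 0.4, 0.75]
print(passes_filters(0.483817, 9, 2016))   # AZ 2016: True
print(passes_filters(0.664096, 53, 2016))  # CA 2016: False
```

With the 2016 Congressional numbers, AZ passes, while AK (1 district), AL (weighted voteshare 0.338584), AR (4 districts) and CA (0.664096) fail; none of those four states appear in the 2016 percentile rankings.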


## 1. Election results
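The first DataFrame holds the raw election results: each district's Democratic voteshare and district number, plus a statewide weighted voteshare. Let's look at a few 2016 Congressional elections: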

In [2]:
```python
chambers['Congressional']['elections_df'].loc[2016].head()
```

| State | D Voteshare | District Numbers | Weighted Voteshare |
|---|---|---|---|
| AK | [0.417196842] | [1] | 0.417197 |
| AL | [0.0, 0.453847555, 0.329768793, 0.0, 0.3320568... | [1, 2, 3, 4, 5, 6, 7] | 0.338584 |
| AR | [0.0, 0.386864661, 0.0, 0.0] | [1, 2, 3, 4] | 0.096716 |
| AZ | [0.538781804, 0.430415074, 1.0, 0.285466478, 0... | [1, 2, 3, 4, 5, 6, 7, 8, 9] | 0.483817 |
| CA | [0.409468978, 0.768500995, 0.593514317, 0.3728... | [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14... | 0.664096 |
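The summary columns that appear in the metrics table below (`ndists`, `dseats`, `voteshare`) can be recomputed from a row of this table. A minimal sketch, assuming a district counts as a Democratic seat when its D voteshare exceeds 0.5; `summarize` is a hypothetical helper. For the AR row, the unweighted mean of the four district voteshares reproduces the 0.096716 shown above.

```python
from statistics import mean


def summarize(district_shares):
    """Derive per-state summary values from district-level D voteshares.

    Hypothetical helper: assumes a D voteshare above 0.5 means a Democratic seat.
    """
    return {
        "ndists": len(district_shares),
        "dseats": sum(1 for share in district_shares if share > 0.5),
        "voteshare": mean(district_shares),  # unweighted mean over districts
    }


# AR, 2016 Congressional, copied from the table above
ar_2016 = summarize([0.0, 0.386864661, 0.0, 0.0])
print(ar_2016["ndists"], ar_2016["dseats"], round(ar_2016["voteshare"], 6))  # 4 0 0.096716
```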


## 2. Metrics
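The second DataFrame holds the output of each metric specified in the first cell, alongside summary columns such as the number of districts (`ndists`) and the seats won by Democrats (`dseats`). Here are the same elections again: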

In [3]:
```python
chambers['Congressional']['tests_df'].loc[2016].head()
```

| State | voteshare | dseats | seats | ndists | state | year | weighted_voteshare | t_test_diff | mean_median_diff | declination | declination_buffered | efficiency_gap | loss_gap | difference_gap | surplus_gap | vote_centric_gap | vote_centric_gap_two | partisan_bias | equal_vote_weight_bias |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AK | 0.417197 | 0.0 | 0.0 | 1.0 | AK | 2016.0 | 0.417197 | NaN | 0.0 | NaN | NaN | 0.334394 | 0.417197 | 0.251591 | -0.082803 | 0.857923 | 0.715845 | 0.5 | 0.0 |
| AL | 0.338584 | 1.0 | 1.0 | 7.0 | AL | 2016.0 | 0.338584 | NaN | 0.008816 | 0.550013 | 0.238453 | 0.034312 | 0.195727 | -0.127104 | -0.161416 | 0.436999 | 0.295922 | 0.214286 | 0.0 |
| AR | 0.096716 | 0.0 | 0.0 | 4.0 | AR | 2016.0 | 0.096716 | NaN | 0.096716 | NaN | NaN | -0.306568 | 0.096716 | -0.709852 | -0.403284 | 0.553536 | 0.107072 | 0.25 | 0.0 |
| AZ | 0.483817 | 4.0 | 4.0 | 9.0 | AZ | 2016.0 | 0.483817 | 0.015907 | 0.053402 | 0.093168 | 0.044567 | 0.02319 | 0.039373 | 0.007007 | -0.016183 | 0.078828 | 0.060448 | 0.055556 | 0.0 |
| CA | 0.664096 | 39.0 | 39.0 | 53.0 | CA | 2016.0 | 0.664096 | 0.163943 | -0.005997 | -0.000673 | 0.083101 | 0.092343 | -0.071753 | 0.256439 | 0.164096 | -0.160829 | 0.050463 | -0.028302 | 0.0 |
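Two of these columns can be checked by hand. For every row above, `efficiency_gap` agrees, up to rounding, with `2*(V - 1/2) - (S - 1/2)`, where `V` is `voteshare` and `S` is `dseats / ndists`; for AK and AR, `mean_median_diff` equals the mean minus the median of the district voteshares. The sketch below re-implements both, plus the declination as it is commonly defined (Warrington). These are illustrative functions, not the `metrics` module; the declination in particular is an assumption and has only been checked against the NaN shown for AK and AR, where Democrats won no seats and the declination is undefined.

```python
import math
from statistics import mean, median


def efficiency_gap(voteshare, dseats, ndists):
    """2*(V - 1/2) - (S - 1/2), with V the D voteshare and S the D seat share.

    Positive values mean Democrats won fewer seats than 2*(V - 1/2) + 1/2.
    """
    seat_share = dseats / ndists
    return 2 * (voteshare - 0.5) - (seat_share - 0.5)


def mean_median_diff(district_shares):
    """Mean minus median of the district-level D voteshares."""
    return mean(district_shares) - median(district_shares)


def declination(district_shares):
    """Declination as commonly defined; None if one party wins every seat.

    Assumption: sign and details may differ from the notebook's m.declination.
    """
    n = len(district_shares)
    d_wins = [s for s in district_shares if s > 0.5]
    r_wins = [s for s in district_shares if s <= 0.5]
    if not d_wins or not r_wins:
        return None  # undefined, like the NaN shown for AK and AR
    # Angles from the point (fraction of R-won seats, 1/2) to each party's centre of mass
    theta_d = math.atan((2 * mean(d_wins) - 1) / (len(d_wins) / n))
    theta_r = math.atan((1 - 2 * mean(r_wins)) / (len(r_wins) / n))
    return 2 * (theta_d - theta_r) / math.pi


print(round(efficiency_gap(0.664096, 39, 53), 6))                 # CA: 0.092343
print(round(mean_median_diff([0.0, 0.386864661, 0.0, 0.0]), 6))  # AR: 0.096716
print(declination([0.0, 0.386864661, 0.0, 0.0]))                 # AR: None
```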


## 3. Percentiles for metrics
The third DataFrame holds percentile rankings for all of those metrics. It only includes elections that pass the filters set in the first cell (competitiveness, minimum number of districts, and minimum year).

In [4]:
```python
chambers['Congressional']['percentile_df'].loc[2016].head()
```

| State | voteshare | dseats | seats | ndists | state | year | weighted_voteshare | t_test_diff | mean_median_diff | declination | declination_buffered | efficiency_gap | loss_gap | difference_gap | surplus_gap | vote_centric_gap | vote_centric_gap_two | partisan_bias | equal_vote_weight_bias |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AZ | 0.483817 | 4.0 | 4.0 | 9.0 | AZ | 2016.0 | 0.483817 | 17.73399 | 66.009852 | 37.438424 | 29.55665 | 18.719212 | 33.497537 | 6.896552 | 35.960591 | 33.497537 | 23.152709 | 43.103448 | 37.684729 |
| CO | 0.491466 | 3.0 | 3.0 | 7.0 | CO | 2016.0 | 0.491466 | 20.197044 | 45.812808 | 45.320197 | 27.586207 | 45.320197 | 46.79803 | 36.945813 | 16.748768 | 46.305419 | 46.79803 | 61.576355 | 37.684729 |
| FL | 0.484317 | 11.0 | 11.0 | 27.0 | FL | 2016.0 | 0.484317 | 24.137931 | 59.605911 | 53.694581 | 54.679803 | 51.724138 | 57.635468 | 36.453202 | 34.482759 | 57.142857 | 54.679803 | 67.241379 | 37.684729 |
| MI | 0.503371 | 5.0 | 5.0 | 14.0 | MI | 2016.0 | 0.503371 | 80.788177 | 83.743842 | 86.206897 | 88.669951 | 86.699507 | 87.192118 | 85.714286 | 8.374384 | 87.192118 | 85.714286 | 87.684729 | 91.625616 |
| MN | 0.522439 | 5.0 | 5.0 | 8.0 | MN | 2016.0 | 0.522439 | 5.418719 | 25.615764 | 33.497537 | 13.300493 | 66.009852 | 73.891626 | 45.812808 | 45.320197 | 73.891626 | 67.487685 | 79.802956 | 37.684729 |
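The value 37.684729 repeats down the `equal_vote_weight_bias` column, which suggests that elections with tied metric values share one percentile. A minimal sketch of a rank-based percentile that averages tied ranks, in the style of `scipy.stats.percentileofscore` with `kind='rank'`; whether `u.generate_percentiles` works this way, and exactly which pool of elections it ranks against, are assumptions. `percentile_rank` is a hypothetical name and the pool is toy data.

```python
def percentile_rank(pool, value):
    """Percentage rank of value within pool, averaging the ranks of tied entries.

    Hypothetical helper mirroring scipy.stats.percentileofscore(pool, value, kind='rank').
    """
    n = len(pool)
    below = sum(1 for x in pool if x < value)
    ties = sum(1 for x in pool if x == value)
    # Tied entries occupy ranks below+1 .. below+ties; use their average rank.
    # If value is absent from the pool (ties == 0), this reduces to below / n.
    avg_rank = below + (ties + 1) / 2 if ties else below
    return 100.0 * avg_rank / n


pool = [0.0, 0.0, 0.0, 0.02, 0.05]  # toy metric values, not real data
print(percentile_rank(pool, 0.0))   # 40.0
print(percentile_rank(pool, 0.05))  # 100.0
```

Averaging ties keeps a block of identical scores (such as many zeros) from being pushed to either extreme of the ranking.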


All of the above DataFrames are indexed by a (Year, State) MultiIndex, so you can select any single state and year like so:

In [5]:
```python
chambers['Congressional']['percentile_df'].loc[2016, 'VA']
```

```
voteshare                 0.514767
dseats                           4
seats                            4
ndists                          11
state                           VA
year                          2016
weighted_voteshare        0.514767
t_test_diff                89.6552
mean_median_diff           89.6552
declination                  88.67
declination_buffered       89.6552
efficiency_gap             88.1773
loss_gap                   88.1773
difference_gap             89.6552
surplus_gap                32.5123
vote_centric_gap           88.1773
vote_centric_gap_two         88.67
partisan_bias               85.468
equal_vote_weight_bias     94.0887
Name: (2016, VA), dtype: object
```

## Which elections in the ongoing cycle seem to be most gerrymandered according to these metrics?

Finally, we can look for elections in the ongoing 2012-2021 cycle that seem particularly gerrymandered.

In [6]:
```python
min_percentile = 95
min_n_tests = 3
cycle_start_year = 2012

print(
'''
Shown below are elections since {start} that rank, for
at least {min_tests} of {total_tests} metrics, in at least the {pctile}th
percentile of all elections since {min_year_overall}.

Only showing elections for states that have at least {n_districts} districts,
and for which the statewide total voteshare was competitive,
i.e., between {comp_lo:.2g} and {comp_hi:.2g}.
'''.format(start=cycle_start_year,
    min_tests=min_n_tests,
    total_tests=len(metric_dict),
    pctile=min_percentile,
    min_year_overall=min_year,
    n_districts=min_districts,
    comp_lo=1-competitiveness_threshold,
    comp_hi=competitiveness_threshold))
```


Shown below are elections since 2012 that rank, for
at least 3 of 12 metrics, in at least the 95th
percentile of all elections since 1972.

Only showing elections for states that have at least 7 districts,
and for which the statewide total voteshare was competitive,
i.e., between 0.45 and 0.55.
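For a single election, the selection rule in the next cell reduces to counting the metrics whose percentile exceeds `min_percentile`. The cell compares with a strict `>`, so an election sitting exactly at the 95th percentile on a metric would not count toward the total, despite the "at least" wording printed above. A stdlib sketch of that rule, using the Congressional VA 2016 row from the `.loc[2016, 'VA']` output and the State Legislative NC 2012 row from the table that follows; `is_flagged` is a hypothetical name.

```python
def is_flagged(percentiles, min_percentile=95, min_n_tests=3):
    """True if at least min_n_tests metric percentiles are strictly above min_percentile."""
    n_extreme = sum(1 for p in percentiles.values() if p > min_percentile)
    return n_extreme >= min_n_tests


metric_names = ['t_test_diff', 'mean_median_diff', 'declination', 'declination_buffered',
                'efficiency_gap', 'loss_gap', 'difference_gap', 'surplus_gap',
                'vote_centric_gap', 'vote_centric_gap_two', 'partisan_bias',
                'equal_vote_weight_bias']

# Congressional VA 2016 percentiles
va_2016 = dict(zip(metric_names, [89.6552, 89.6552, 88.67, 89.6552, 88.1773, 88.1773,
                                  89.6552, 32.5123, 88.1773, 88.67, 85.468, 94.0887]))
# State Legislative NC 2012 percentiles
nc_2012 = dict(zip(metric_names, [93.841642, 70.674487, 97.653959, 98.240469, 81.818182,
                                  97.360704, 46.041056, 41.642229, 95.601173, 91.495601,
                                  85.337243, 40.322581]))

print(is_flagged(va_2016))  # False: no metric above 95
print(is_flagged(nc_2012))  # True: four metrics above 95
```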



In [7]:
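```python
for chamber in chambers:
    print('\n' + chamber + ' elections:')
    df = chambers[chamber]['percentile_df']
    # Count, for each election, how many metrics rank above min_percentile,
    # and keep elections where that count reaches min_n_tests.
    n_extreme = (df.loc[:, list(metric_dict)] > min_percentile).sum(axis=1)
    cut = df[n_extreme >= min_n_tests]
    # Slice the (Year, State) MultiIndex from the start of the current cycle onward.
    ipd.display(cut.loc[cycle_start_year:])
```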


State Legislative elections:


| Year | State | voteshare | dseats | seats | ndists | state | year | weighted_voteshare | t_test_diff | mean_median_diff | declination | declination_buffered | efficiency_gap | loss_gap | difference_gap | surplus_gap | vote_centric_gap | vote_centric_gap_two | partisan_bias | equal_vote_weight_bias |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2012 | NC | 0.47443 | 43.0 | 43.0 | 120.0 | NC | 2012.0 | 0.484107 | 93.841642 | 70.674487 | 97.653959 | 98.240469 | 81.818182 | 97.360704 | 46.041056 | 41.642229 | 95.601173 | 91.495601 | 85.337243 | 40.322581 |
| 2012 | OH | 0.522491 | 39.0 | 39.0 | 99.0 | OH | 2012.0 | 0.510077 | 94.721408 | 89.73607 | 98.533724 | 99.41349 | 95.601173 | 98.240469 | 89.442815 | 38.123167 | 96.480938 | 97.947214 | 94.428152 | 97.360704 |
| 2012 | WI | 0.562973 | 39.0 | 39.0 | 99.0 | WI | 2012.0 | 0.531605 | 99.706745 | 96.187683 | 100.0 | 100.0 | 99.41349 | 100.0 | 96.480938 | 80.351906 | 98.240469 | 98.240469 | 97.653959 | 100.0 |
| 2014 | MI | 0.529284 | 47.0 | 47.0 | 110.0 | MI | 2014.0 | 0.512203 | 90.029326 | 85.630499 | 95.307918 | 96.480938 | 93.255132 | 94.721408 | 86.803519 | 47.214076 | 93.255132 | 95.307918 | 80.351906 | 95.601173 |
| 2016 | NV | 0.500161 | 27.0 | 27.0 | 42.0 | NV | 2016.0 | 0.46386 | 95.307918 | 79.178886 | 99.41349 | 99.120235 | 95.014663 | 99.120235 | 82.991202 | 0.879765 | 97.067449 | 97.653959 | 97.947214 | 40.322581 |
| 2016 | WI | 0.493231 | 35.0 | 35.0 | 99.0 | WI | 2016.0 | 0.467866 | 99.41349 | 95.894428 | 99.120235 | 99.706745 | 93.841642 | 98.533724 | 78.592375 | 11.143695 | 96.774194 | 97.067449 | 98.240469 | 40.322581 |
| 2017 | VA | 0.557503 | 49.0 | 49.0 | 100.0 | VA | 2017.0 | 0.547926 | 95.014663 | 86.803519 | 76.539589 | 84.164223 | 90.909091 | 77.419355 | 91.495601 | 75.953079 | 75.953079 | 89.442815 | 96.480938 | 95.894428 |



Congressional elections:


| Year | State | voteshare | dseats | seats | ndists | state | year | weighted_voteshare | t_test_diff | mean_median_diff | declination | declination_buffered | efficiency_gap | loss_gap | difference_gap | surplus_gap | vote_centric_gap | vote_centric_gap_two | partisan_bias | equal_vote_weight_bias |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2012 | MI | 0.530037 | 5.0 | 5.0 | 14.0 | MI | 2012.0 | 0.530037 | 95.073892 | 76.35468 | 91.133005 | 96.059113 | 93.596059 | 90.147783 | 96.551724 | 58.128079 | 90.147783 | 91.625616 | 87.684729 | 87.684729 |
| 2012 | NC | 0.510456 | 4.0 | 4.0 | 13.0 | NC | 2012.0 | 0.510456 | 90.640394 | 81.773399 | 94.08867 | 95.073892 | 95.073892 | 93.596059 | 95.566502 | 21.674877 | 93.596059 | 94.581281 | 99.261084 | 90.640394 |
| 2012 | OH | 0.484074 | 4.0 | 4.0 | 16.0 | OH | 2012.0 | 0.484074 | 97.536946 | 75.862069 | 98.522167 | 99.507389 | 96.059113 | 97.044335 | 92.610837 | 35.46798 | 97.044335 | 97.044335 | 98.522167 | 37.684729 |
| 2012 | PA | 0.504637 | 5.0 | 5.0 | 18.0 | PA | 2012.0 | 0.504637 | 96.551724 | 80.788177 | 97.536946 | 99.014778 | 98.029557 | 96.059113 | 97.044335 | 10.344828 | 96.059113 | 98.029557 | 96.79803 | 90.147783 |
| 2012 | VA | 0.492911 | 3.0 | 3.0 | 11.0 | VA | 2012.0 | 0.492911 | 87.192118 | 72.906404 | 97.044335 | 92.118227 | 94.581281 | 95.566502 | 94.08867 | 14.285714 | 95.566502 | 96.059113 | 97.783251 | 37.684729 |
| 2016 | NC | 0.46371 | 3.0 | 3.0 | 13.0 | NC | 2016.0 | 0.46371 | 76.847291 | 60.591133 | 96.551724 | 90.640394 | 92.610837 | 96.551724 | 87.192118 | 74.876847 | 96.551724 | 93.596059 | 99.261084 | 37.684729 |
