# Partisan Voter Index (PVI) calculations for Massachusetts Legislative Districts

The Partisan Voter Index (PVI) was developed by the Cook Political Report as a common metric for comparing the partisanship, or Democratic/Republican lean, of legislative districts. The Cook Political Report primarily uses the PVI to compare U.S. House districts, but it can also be used to compare state legislative districts.

PVI is calculated by taking the Democratic percentage of the two-party (Democratic plus Republican) vote over the last two presidential elections for the area you are considering and then subtracting the corresponding percentage for the U.S. as a whole.
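Written out, the calculation as implemented in the code below pools the votes from both elections before taking the percentage. The symbols are introduced here only for illustration: $D$ and $R$ are the Democratic and Republican vote totals for the area, the subscripts mark the election year, and the superscript US marks the national totals.

$$
\mathrm{PVI} = 100 \times \left( \frac{D_{12} + D_{16}}{D_{12} + D_{16} + R_{12} + R_{16}} - \frac{D^{\mathrm{US}}_{12} + D^{\mathrm{US}}_{16}}{D^{\mathrm{US}}_{12} + D^{\mathrm{US}}_{16} + R^{\mathrm{US}}_{12} + R^{\mathrm{US}}_{16}} \right)
$$

A positive value means the area is more Democratic than the country as a whole, and a negative value means it is more Republican.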

In [0]:
from __future__ import division

US_VOTES = {
    "Obama_12": 65915796,
    "Romney_12": 60933500,
    "Clinton_16": 65853516,
    "Trump_16": 62984824,
    }

def dem_pct(d, r):
    """The Democratic percentage of the vote, given Dem and GOP vote totals."""
    return (d / (d + r))

US_DEM_PCT = dem_pct(US_VOTES["Obama_12"] + US_VOTES["Clinton_16"],
                     US_VOTES["Romney_12"] + US_VOTES["Trump_16"])

def calc_pvi(df):
    """Numeric PVI given vote totals from last two presidential elections."""
    dem = df["Obama_12"] + df["Clinton_16"]
    gop = df["Romney_12"] + df["Trump_16"]
    pvi = (dem_pct(dem, gop) - US_DEM_PCT) * 100
    return pvi

def pvi_string(pvi):
    """String representation of numeric PVI value."""
    if pvi <= -0.5:
        s = "R+{:.0f}".format(abs(pvi))
    elif pvi >= 0.5:
        s = "D+{:.0f}".format(abs(pvi))
    else:
        s = "EVEN"
    return s

For example, we can calculate the PVI for Massachusetts as a whole using the vote totals from the 2012 and 2016 presidential elections.

In [2]:
MA_VOTES = {
    "Obama_12": 1921290,
    "Romney_12": 1188314,
    "Clinton_16": 1995196,
    "Trump_16": 1090893,
    }

MA_PVI_N = calc_pvi(MA_VOTES)
MA_PVI_N

11.677772318663537

So Massachusetts is about 11.7 points more Democratic than the country as a whole using the PVI metric.

By convention, PVI is rounded to an integer, with negative (GOP-leaning) values shown with an "R+" prefix and positive (Dem-leaning) values with a "D+" prefix. In the code above, values within half a point of zero are shown as "EVEN".

In [3]:
MA_PVI = pvi_string(MA_PVI_N)
MA_PVI

'D+12'
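To see how the formatting rule treats each case, the sketch below repeats `pvi_string` so it runs on its own and applies it to the Massachusetts value plus a few made-up values chosen to hit each branch.

```python
def pvi_string(pvi):
    """String representation of a numeric PVI value (same logic as above)."""
    if pvi <= -0.5:
        return "R+{:.0f}".format(abs(pvi))
    elif pvi >= 0.5:
        return "D+{:.0f}".format(abs(pvi))
    return "EVEN"

# First value is the Massachusetts PVI from above; the rest are made up
# purely to exercise the R+ branch and both sides of the EVEN band.
samples = [11.677772318663537, -3.2, 0.3, -0.49]
labels = [pvi_string(v) for v in samples]
print(labels)
```

Anything strictly between -0.5 and 0.5 falls into the "EVEN" band, so the sign only appears once the lean is at least half a point.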

## Precinct-level presidential results

In order to calculate PVI across various legislative districts in Massachusetts, we will start by downloading the precinct-level presidential election data for 2012 and 2016 from the awesome [electionstats.state.ma.us](http://electionstats.state.ma.us) website created by Adam Friedman and supported by the Massachusetts Secretary of the Commonwealth.

Once we have the precinct-level data, we can map the precincts to legislative districts and then do the PVI calculations.

In [4]:
import pandas as pd

def read_pd43(url, set_index=True):
    """Read a precinct-level CSV file from PD43 site."""
    # Force Ward and Pct to string for consistency.
    p = pd.read_csv(url, dtype={"Ward": str, "Pct": str}, thousands=",")
    # Remove TOTALS row
    p = p[p["City/Town"] != "TOTALS"]
    # Optionally set the index to three precinct-identifying columns
    if set_index:
        p = p.set_index(["City/Town", "Ward", "Pct"])
    return p

p12 = read_pd43("http://electionstats.state.ma.us/elections/download/22515/precincts_include:1/")
p12.head(20)

City/Town,Ward,Pct,Obama/ Biden,Romney/ Ryan,Johnson/ Gray,Stein/ Honkala,All Others,No Preference,Blank Votes,Total Votes Cast
Abington,-,1,844,705,6,7,4,0,5,1571
Abington,-,2,742,741,18,8,4,0,7,1520
Abington,-,3,815,801,13,11,2,0,6,1648
Abington,-,4,899,904,8,9,5,0,5,1830
Abington,-,5,844,910,8,2,5,0,5,1774
Acton,-,1,1228,583,18,5,3,0,8,1845
Acton,-,2,1216,722,19,14,3,0,4,1978
Acton,-,3,1378,699,34,24,0,0,8,2143
Acton,-,4,1462,636,25,20,0,0,6,2149
Acton,-,5,1454,565,25,11,8,0,8,2071
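The options used in `read_pd43` are easier to see on a small in-memory file. This is a minimal sketch: the CSV text is a hypothetical sample (two rows borrowed from the output above plus an invented TOTALS row), and it assumes that large counts in the download are quoted with thousands separators, which is what the `thousands=","` argument suggests.

```python
import io

import pandas as pd

# Hypothetical sample in the same shape as the downloaded file.
SAMPLE_CSV = '''City/Town,Ward,Pct,Obama/ Biden,Romney/ Ryan
Abington,-,1,844,705
Acton,-,1,"1,228",583
TOTALS,,,"2,072","1,288"
'''

sample = pd.read_csv(
    io.StringIO(SAMPLE_CSV),
    dtype={"Ward": str, "Pct": str},  # keep precinct identifiers as text
    thousands=",",                    # parse "1,228" as the integer 1228
)
# Drop the summary row so it is not counted as a precinct.
sample = sample[sample["City/Town"] != "TOTALS"]
sample = sample.set_index(["City/Town", "Ward", "Pct"])
print(sample)
```

Keeping `Ward` and `Pct` as strings matters later, because the three identifying columns are used as the join key between files.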


We are only interested in the Democratic and Republican vote totals.

In [5]:
p12 = p12.rename(columns={"Obama/ Biden": "Obama_12", "Romney/ Ryan": "Romney_12"})[["Obama_12", "Romney_12"]]
p12.head(20)

City/Town,Ward,Pct,Obama_12,Romney_12
Abington,-,1,844,705
Abington,-,2,742,741
Abington,-,3,815,801
Abington,-,4,899,904
Abington,-,5,844,910
Acton,-,1,1228,583
Acton,-,2,1216,722
Acton,-,3,1378,699
Acton,-,4,1462,636
Acton,-,5,1454,565


Repeat the same process for 2016 presidential result data.

In [6]:
p16 = read_pd43("http://electionstats.state.ma.us/elections/download/40060/precincts_include:1/")
p16 = p16.rename(columns={"Clinton/ Kaine": "Clinton_16", "Trump/ Pence": "Trump_16"})[["Clinton_16", "Trump_16"]]
p16.head(20)

City/Town,Ward,Pct,Clinton_16,Trump_16
Abington,-,1,818,717
Abington,-,2,739,785
Abington,-,3,773,808
Abington,-,4,877,878
Abington,-,5,908,829
Acton,-,1,1459,438
Acton,-,2,1550,480
Acton,-,3,1580,435
Acton,-,4,1675,414
Acton,-,5,1580,387


Now we combine the 2012 and 2016 data, joining on the precinct index.

In [7]:
p = p12.join(p16)
p.head(20)

City/Town,Ward,Pct,Obama_12,Romney_12,Clinton_16,Trump_16
Abington,-,1,844,705,818,717
Abington,-,2,742,741,739,785
Abington,-,3,815,801,773,808
Abington,-,4,899,904,877,878
Abington,-,5,844,910,908,829
Acton,-,1,1228,583,1459,438
Acton,-,2,1216,722,1550,480
Acton,-,3,1378,699,1580,435
Acton,-,4,1462,636,1675,414
Acton,-,5,1454,565,1580,387
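Because `DataFrame.join` performs a left join by default, any 2012 precinct with no matching 2016 precinct would be kept with missing 2016 values. The sketch below shows one way to check for that on toy data. "Exampletown" is a made-up precinct added only to trigger the mismatch, and whether the real files contain any such cases is not something this example establishes.

```python
import pandas as pd

idx_cols = ["City/Town", "Ward", "Pct"]

# Toy 2012 frame: two real rows from above plus one invented precinct.
toy12 = pd.DataFrame({
    "City/Town": ["Abington", "Abington", "Exampletown"],
    "Ward": ["-", "-", "-"],
    "Pct": ["1", "2", "1"],
    "Obama_12": [844, 742, 500],
    "Romney_12": [705, 741, 400],
}).set_index(idx_cols)

# Toy 2016 frame: the invented precinct is absent.
toy16 = pd.DataFrame({
    "City/Town": ["Abington", "Abington"],
    "Ward": ["-", "-"],
    "Pct": ["1", "2"],
    "Clinton_16": [818, 739],
    "Trump_16": [717, 785],
}).set_index(idx_cols)

joined = toy12.join(toy16)  # left join on the shared index
# Rows with any missing value had no 2016 counterpart.
unmatched = joined[joined.isna().any(axis=1)]
print(unmatched)
```

Running the same `isna()` check on the real joined frame would flag precincts whose names or numbering changed between the two elections.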


## Mapping precincts to legislative districts

To map the precincts to legislative districts, we download the 2016 results for every election for a given office (State Representative, State Senate, or U.S. House) and tag each precinct in those results with the district name. Finding the elections requires some web page scraping using the BeautifulSoup HTML parsing library.

In [8]:
import bs4
import requests

ELECTION_QUERY = "http://electionstats.state.ma.us/elections/search/year_from:{year}/year_to:{year}/office_id:{office_id}/stage:{stage}"
STATE_REP_ID = 8
STATE_SEN_ID = 9
US_CONG_ID = 5

def get_election_ids(year, office_id, stage):
    """Return a list of (ELECTION_ID, DISTRICT) tuples for the given query parameters."""
    r = requests.get(ELECTION_QUERY.format(year=year, office_id=office_id, stage=stage))
    # Fail early on an HTTP error instead of parsing an error page.
    r.raise_for_status()
    bs = bs4.BeautifulSoup(r.text, "html5lib")
    elec_trs = bs.find_all("tr", {"class": "election_item general_party"})
    elec_ids = [district_and_id(elec_tr) for elec_tr in elec_trs]
    return elec_ids

def district_and_id(elec_tr):
    """Parse the ELECTION_ID and DISTRICT from the HTML element."""
    # The election ID is the third dash-separated piece of the row's id attribute.
    election_id = elec_tr["id"].split("-")[2]
    # The district name is the text of the second <td> after the row's first one.
    district = elec_tr.td.find_next("td").find_next("td").text
    return (election_id, district)

CSV_QUERY = "http://electionstats.state.ma.us/elections/download/{election_id}/precincts_include:1/"

def get_office_precincts(year, office_id, stage):
    """Query for the matching elections, return all of the precincts tagged with the district name."""
    election_list = []
    election_ids = get_election_ids(year, office_id, stage)
    for election_id, district in election_ids:
        p = read_pd43(CSV_QUERY.format(election_id=election_id), set_index=False)
        # Copy the selection so that adding a column does not write to a slice.
        p = p[["City/Town", "Ward", "Pct"]].copy()
        p["District"] = district
        election_list.append(p)
    all_precincts = pd.concat(election_list, ignore_index=True).drop_duplicates()
    all_precincts = all_precincts.set_index(["City/Town", "Ward", "Pct"])
    return all_precincts

sr_pcts = get_office_precincts(2016, STATE_REP_ID, "General").rename(columns={"District": "State_Rep"})
ss_pcts = get_office_precincts(2016, STATE_SEN_ID, "General").rename(columns={"District": "State_Sen"})
ush_pcts = get_office_precincts(2016, US_CONG_ID, "General").rename(columns={"District": "US_House"})

# Combine the presidential result data with all of the district tags
pvi = p.join(ush_pcts).join(ss_pcts).join(sr_pcts).reset_index()
pvi.head(20)

,City/Town,Ward,Pct,Obama_12,Romney_12,Clinton_16,Trump_16,US_House,State_Sen,State_Rep
0,Abington,-,1,844,705,818,717,8th Congressional,Norfolk and Plymouth,7th Plymouth
1,Abington,-,2,742,741,739,785,8th Congressional,Norfolk and Plymouth,7th Plymouth
2,Abington,-,3,815,801,773,808,8th Congressional,Norfolk and Plymouth,7th Plymouth
3,Abington,-,4,899,904,877,878,8th Congressional,Norfolk and Plymouth,7th Plymouth
4,Abington,-,5,844,910,908,829,8th Congressional,Norfolk and Plymouth,7th Plymouth
5,Acton,-,1,1228,583,1459,438,3rd Congressional,Middlesex and Worcester,14th Middlesex
6,Acton,-,2,1216,722,1550,480,3rd Congressional,Middlesex and Worcester,14th Middlesex
7,Acton,-,3,1378,699,1580,435,3rd Congressional,Middlesex and Worcester,37th Middlesex
8,Acton,-,4,1462,636,1675,414,3rd Congressional,Middlesex and Worcester,37th Middlesex
9,Acton,-,5,1454,565,1580,387,3rd Congressional,Middlesex and Worcester,37th Middlesex
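The scraping step depends on how each results row is marked up, so it may help to see the parsing on a single row. The HTML below is a hypothetical fragment, not copied from the site: the `election-id-12345` format and the cell contents are assumptions that are merely consistent with what `district_and_id` expects. The sketch also uses Python's built-in `html.parser` rather than `html5lib` to avoid an extra dependency.

```python
import bs4

# Hypothetical row; the real page's markup may differ in detail.
SAMPLE_HTML = """
<table>
  <tr class="election_item general_party" id="election-id-12345">
    <td>2016</td><td>State Representative</td><td>7th Plymouth</td>
  </tr>
</table>
"""

def district_and_id(elec_tr):
    """Parse the ELECTION_ID and DISTRICT from one results row."""
    election_id = elec_tr["id"].split("-")[2]
    district = elec_tr.td.find_next("td").find_next("td").text
    return (election_id, district)

soup = bs4.BeautifulSoup(SAMPLE_HTML, "html.parser")
# A class string with a space matches the full class attribute exactly.
rows = soup.find_all("tr", {"class": "election_item general_party"})
parsed = [district_and_id(tr) for tr in rows]
print(parsed)
```

If the site changes its row classes or id format, this is the part of the pipeline most likely to return an empty list, so checking that `parsed` is non-empty is a cheap safeguard.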


## Calculate PVI for each office/district

For each office (State Representative, State Senate, and U.S. House), we group the precincts by district, sum the presidential votes, and then calculate the numeric PVI and the PVI string.

Begin with State Representative.

In [9]:
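# Sum only the numeric vote columns; recent pandas versions would otherwise
# try to aggregate the text columns as well.
sr_pvi = pvi.groupby("State_Rep").sum(numeric_only=True).reset_index()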
sr_pvi["PVI_N"] = calc_pvi(sr_pvi)
sr_pvi["PVI"] = sr_pvi["PVI_N"].map(pvi_string)
sr_pvi.sort_values("PVI_N", ascending=False)

,State_Rep,Obama_12,Romney_12,Clinton_16,Trump_16,PVI_N,PVI
133,6th Suffolk,15073,644,15327,744,44.098303,D+44
125,5th Suffolk,13404,681,13467,736,43.455539,D+43
14,11th Suffolk,17136,1036,18292,920,43.232546,D+43
141,7th Suffolk,10657,806,10230,574,42.267219,D+42
72,25th Middlesex,16802,1778,17350,846,41.329643,D+41
22,12th Suffolk,17172,1923,16953,1673,38.931579,D+39
2,10th Hampden,10931,1200,10464,1266,38.129875,D+38
39,15th Suffolk,15597,2258,16631,1556,37.882632,D+38
74,27th Middlesex,17118,2580,19929,1990,37.483641,D+37
73,26th Middlesex,14454,2212,16361,1911,36.663826,D+37


Next, State Senate.

In [10]:
# Sum only the numeric vote columns, as for State Representative.
ss_pvi = pvi.groupby("State_Sen").sum(numeric_only=True).reset_index()
ss_pvi["PVI_N"] = calc_pvi(ss_pvi)
ss_pvi["PVI"] = ss_pvi["PVI_N"].map(pvi_string)
ss_pvi.sort_values("PVI_N", ascending=False)

,State_Sen,Obama_12,Romney_12,Clinton_16,Trump_16,PVI_N,PVI
17,2nd Suffolk,61429,5511,64278,4208,41.288118,D+41
7,1st Suffolk,55653,12790,59805,11511,31.076942,D+31
29,Middlesex and Suffolk,48613,12190,53276,10312,30.374997,D+30
14,2nd Middlesex,65521,18456,74590,15118,29.134334,D+29
5,1st Middlesex and Norfolk,59893,22633,68176,13300,26.554632,D+27
18,2nd Suffolk and Middlesex,48768,18089,55191,12163,25.924104,D+26
28,"Hampshire, Franklin and Worcester",58698,17217,55922,17212,25.365616,D+25
27,Hampden,42046,12097,38681,13463,24.416634,D+24
24,"Berkshire, Hampshire and Franklin",58586,18448,52576,21547,22.005486,D+22
8,1st Suffolk and Middlesex,44155,18341,49333,17278,20.875986,D+21


And finally, U.S. House.

In [11]:
# Sum only the numeric vote columns, as for the state legislative districts.
ush_pvi = pvi.groupby("US_House").sum(numeric_only=True).reset_index()
ush_pvi["PVI_N"] = calc_pvi(ush_pvi)
ush_pvi["PVI"] = ush_pvi["PVI_N"].map(pvi_string)
ush_pvi.sort_values("PVI_N", ascending=False)

,US_House,Obama_12,Romney_12,Clinton_16,Trump_16,PVI_N,PVI
6,7th Congressional,233382,44275,254037,36018,34.321468,D+34
4,5th Congressional,235984,119934,258908,95922,18.094473,D+18
0,1st Congressional,213423,114339,194036,123953,11.5632,D+12
7,8th Congressional,213364,150825,231356,131624,9.622453,D+10
3,4th Congressional,211423,152699,225976,133705,8.8954,D+9
1,2nd Congressional,199549,133195,197492,129437,8.652278,D+9
2,3rd Congressional,189461,137869,202952,123347,8.500776,D+9
5,6th Congressional,212003,169966,224858,153244,5.941072,D+6
8,9th Congressional,212701,165212,205581,163643,4.449376,D+4
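The three groupings above follow the same pattern, so they could be folded into one function. The sketch below is one possible way to do that. `district_pvi` and `VOTE_COLS` are hypothetical names, not part of the original notebook, and the toy frame holds only the five Acton precincts shown earlier, so the resulting values are not real district PVIs.

```python
import pandas as pd

US_VOTES = {
    "Obama_12": 65915796, "Romney_12": 60933500,
    "Clinton_16": 65853516, "Trump_16": 62984824,
}
VOTE_COLS = ["Obama_12", "Romney_12", "Clinton_16", "Trump_16"]

def dem_pct(d, r):
    """Democratic share of the two-party vote."""
    return d / (d + r)

US_DEM_PCT = dem_pct(US_VOTES["Obama_12"] + US_VOTES["Clinton_16"],
                     US_VOTES["Romney_12"] + US_VOTES["Trump_16"])

def pvi_string(pvi):
    """String representation of a numeric PVI value."""
    if pvi <= -0.5:
        return "R+{:.0f}".format(abs(pvi))
    elif pvi >= 0.5:
        return "D+{:.0f}".format(abs(pvi))
    return "EVEN"

def district_pvi(precincts, district_col):
    """Sum votes by district_col and attach PVI_N and PVI (hypothetical helper)."""
    out = precincts.groupby(district_col)[VOTE_COLS].sum().reset_index()
    dem = out["Obama_12"] + out["Clinton_16"]
    gop = out["Romney_12"] + out["Trump_16"]
    out["PVI_N"] = (dem_pct(dem, gop) - US_DEM_PCT) * 100
    out["PVI"] = out["PVI_N"].map(pvi_string)
    return out.sort_values("PVI_N", ascending=False)

# Toy input: only the five Acton precincts from the sample output above.
toy = pd.DataFrame({
    "State_Rep": ["14th Middlesex"] * 2 + ["37th Middlesex"] * 3,
    "Obama_12": [1228, 1216, 1378, 1462, 1454],
    "Romney_12": [583, 722, 699, 636, 565],
    "Clinton_16": [1459, 1550, 1580, 1675, 1580],
    "Trump_16": [438, 480, 435, 414, 387],
})
result = district_pvi(toy, "State_Rep")
print(result[["State_Rep", "PVI_N", "PVI"]])
```

With the full `pvi` frame, the same call with `"State_Sen"` or `"US_House"` as the second argument would cover the other two offices.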
