![](../additional_materials/logos/darden_rice_logo_SM.png)

### 2017 Municipal Primary (MP) Election Day Doc Processing

This notebook processes and formats data to match Adrienne Bogen's [E Day Doc](https://docs.google.com/spreadsheets/d/1M6EKaDWyVTHzpNTi2cdLXDYZfKgGVtChcbCmEbIla4k/edit#gid=0), a Google Sheets spreadsheet for the 2017 Pinellas County municipal primary election.

Data sources:
* [Pinellas County SOE](https://www.votepinellas.com/Election-Results) (specifically, the [Pinellas County SOE 2017 Municipal Primary Reports](https://enr.votepinellas.com/FL/Pinellas/71078/188313/en/reports.html))
* NGP VAN

---
---

In [1]:
import pandas as pd
from pandas.tseries.offsets import BDay
pd.set_option('display.max_columns', None)

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import seaborn as sns

import datetime

In [2]:
vbm_df = pd.read_csv('../data/raw_eday_2017/VAN/2017_mp_turnout_vbm.csv')
polls_df = pd.read_csv('../data/raw_eday_2017/VAN/2017_mp_turnout_polls.csv')

In [3]:
vbm_df.head(3)

Unnamed: 0,Precinct,Democrats,Independent,Republicans,Unknown,Totals
0,101,459,61,79,0.0,599
1,102,259,28,56,0.0,343
2,103,107,36,60,0.0,203


In [4]:
vbm_df.tail(3)

Unnamed: 0,Precinct,Democrats,Independent,Republicans,Unknown,Totals
90,275,50.0,30.0,60.0,0.0,140.0
91,Total People,16440.0,4805.0,10048.0,0.0,31293.0
92,,,,,,


In [5]:
polls_df.head(3)

Unnamed: 0,Precinct,Democrats,Independent,Republicans,Unknown,Totals
0,101,297,29,52,0.0,378
1,102,156,19,37,0.0,212
2,103,63,14,31,0.0,108


In [6]:
polls_df.tail(3)

Unnamed: 0,Precinct,Democrats,Independent,Republicans,Unknown,Totals
89,275,27.0,11.0,23.0,0.0,61.0
90,Total People,9240.0,1915.0,4489.0,0.0,15644.0
91,,,,,,


In [7]:
# Drop last 2 rows of each df
vbm_df.drop(vbm_df.tail(2).index, inplace=True)
polls_df.drop(polls_df.tail(2).index, inplace=True)
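
Dropping a fixed number of trailing rows relies on the export always ending with exactly one `Total People` row and one empty row. As a minimal alternative sketch, assuming precinct identifiers are always numeric, the rows could instead be filtered by whether `Precinct` parses as a number. The helper name `keep_precinct_rows` and the toy frame below are hypothetical illustrations, not part of the original workflow.

```python
import pandas as pd

def keep_precinct_rows(df, col="Precinct"):
    """Keep only rows whose `col` value parses as a number (hypothetical helper)."""
    numeric = pd.to_numeric(df[col], errors="coerce")  # non-numeric and missing values become NaN
    return df[numeric.notna()].copy()

# Toy frame shaped like the VAN export: two precinct rows, a totals row, an empty row
toy = pd.DataFrame({
    "Precinct": ["101", "102", "Total People", None],
    "Totals": [599.0, 343.0, 942.0, None],
})

cleaned = keep_precinct_rows(toy)
print(cleaned)
```

This keeps working if the export gains or loses a trailing summary row, at the cost of assuming that no real precinct has a non-numeric label.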

In [8]:
# Merge VBM and polls dfs on precinct
van_df = vbm_df.merge(polls_df, how='left', on='Precinct', suffixes=('_vbm', '_polls'))

In [9]:
print(vbm_df.shape)
print(polls_df.shape)
print(van_df.shape)

(91, 6)
(90, 6)
(91, 11)


In [10]:
van_df

Unnamed: 0,Precinct,Democrats_vbm,Independent_vbm,Republicans_vbm,Unknown_vbm,Totals_vbm,Democrats_polls,Independent_polls,Republicans_polls,Unknown_polls,Totals_polls
0,101,459,61,79,0.0,599,297,29,52,0.0,378
1,102,259,28,56,0.0,343,156,19,37,0.0,212
2,103,107,36,60,0.0,203,63,14,31,0.0,108
3,104,261,41,41,0.0,343,212,23,18,0.0,253
4,105,514,80,91,0.0,685,303,29,37,0.0,369
...,...,...,...,...,...,...,...,...,...,...,...
86,237,23,8,29,0.0,60,18,1,4,0.0,23
87,239,246,96,178,0.0,520,137,37,113,0.0,287
88,240,104,53,148,0.0,305,66,15,76,0.0,157
89,241,174,58,162,0.0,394,80,30,62,0.0,172


In [11]:
van_df.isnull().sum()

Precinct             0
Democrats_vbm        0
Independent_vbm      0
Republicans_vbm      0
Unknown_vbm          0
Totals_vbm           0
Democrats_polls      1
Independent_polls    1
Republicans_polls    1
Unknown_polls        1
Totals_polls         1
dtype: int64

In [12]:
# Precinct 165 is not represented in polls_df. Impute precinct 165 values as 0 for polls.
van_df.fillna(0, inplace=True)
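
The precinct missing from the polls file can also be located programmatically rather than by inspection. The sketch below uses the `indicator=True` option of `DataFrame.merge`, which adds a `_merge` column marking where each row came from. The toy frames and the value shown for precinct 165 are made up for illustration.

```python
import pandas as pd

# Toy stand-ins for vbm_df and polls_df; the precinct 165 total is invented
vbm_toy = pd.DataFrame({"Precinct": ["101", "102", "165"], "Totals": [599, 343, 10]})
polls_toy = pd.DataFrame({"Precinct": ["101", "102"], "Totals": [378, 212]})

merged = vbm_toy.merge(
    polls_toy,
    how="left",
    on="Precinct",
    suffixes=("_vbm", "_polls"),
    indicator=True,  # adds a `_merge` column: 'both', 'left_only' or 'right_only'
)

# Precincts present in the VBM file but absent from the polls file
missing = merged.loc[merged["_merge"] == "left_only", "Precinct"].tolist()
print(missing)
```

Listing the unmatched precincts before filling with 0 makes it explicit which rows are being imputed.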

In [13]:
# Casting all numeric values as integers for easy operation
van_df = van_df.astype(int).copy()

In [14]:
van_df.dtypes

Precinct             int64
Democrats_vbm        int64
Independent_vbm      int64
Republicans_vbm      int64
Unknown_vbm          int64
Totals_vbm           int64
Democrats_polls      int64
Independent_polls    int64
Republicans_polls    int64
Unknown_polls        int64
Totals_polls         int64
dtype: object

In [15]:
# Sum VBM and polls totals for overall 2017 turnout (TO), all parties
van_df['Total TO'] = van_df['Totals_vbm'] + van_df['Totals_polls']

In [16]:
# Drop parties other than Dem and Rep
van_df.drop(['Independent_vbm', 'Unknown_vbm', 'Independent_polls', 'Unknown_polls'], axis=1, inplace=True)

In [17]:
# Sum VBM and polls turnout for Democrats and Republicans only
van_df['Total Dem & Rep TO'] = van_df['Democrats_vbm'] + van_df['Democrats_polls'] + van_df['Republicans_vbm'] + van_df['Republicans_polls']

# Drop individual totals
van_df.drop(columns=['Totals_vbm', 'Totals_polls'], inplace=True)

# Reorder columns to match e day spreadsheet
van_df = van_df[['Precinct', 'Democrats_vbm', 'Republicans_vbm', 'Democrats_polls', 'Republicans_polls', 'Total Dem & Rep TO', 'Total TO']]

In [18]:
# Share of total turnout that came from Democrats and Republicans (a fraction, not a percentage)
van_df['pct_of_total'] = round(van_df['Total Dem & Rep TO'] / van_df['Total TO'], 4)
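
As a worked check of this ratio, the sketch below recomputes it for precinct 101 using the figures shown earlier in this notebook (VBM total 599 plus polls total 378 gives a total turnout of 977). The one-row frame is a stand-in for `van_df`, not the full data.

```python
import pandas as pd

# Precinct 101 values taken from the tables above
row = pd.DataFrame({
    "Democrats_vbm": [459],
    "Republicans_vbm": [79],
    "Democrats_polls": [297],
    "Republicans_polls": [52],
    "Total TO": [599 + 378],
})

party_cols = ["Democrats_vbm", "Republicans_vbm", "Democrats_polls", "Republicans_polls"]
dem_rep = row[party_cols].sum(axis=1)          # 459 + 79 + 297 + 52 = 887
share = (dem_rep / row["Total TO"]).round(4)   # 887 / 977

print(dem_rep.iloc[0], share.iloc[0])
```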

In [19]:
van_df

Unnamed: 0,Precinct,Democrats_vbm,Republicans_vbm,Democrats_polls,Republicans_polls,Total Dem & Rep TO,Total TO,pct_of_total
0,101,459,79,297,52,887,977,0.9079
1,102,259,56,156,37,508,555,0.9153
2,103,107,60,63,31,261,311,0.8392
3,104,261,41,212,18,532,596,0.8926
4,105,514,91,303,37,945,1054,0.8966
...,...,...,...,...,...,...,...,...
86,237,23,29,18,4,74,83,0.8916
87,239,246,178,137,113,674,807,0.8352
88,240,104,148,66,76,394,462,0.8528
89,241,174,162,80,62,478,566,0.8445


In [20]:
van_df.to_csv('../data/processed_eday_2017/2017_mp_turnout.csv', index=False)

---
---

#### SOE Vote Breakdown

This breakdown is based on each candidate's party and does not necessarily mean that a voter is registered as a Democrat or Republican. Only the Democratic and Republican candidates are included, as they received 96.59% of the votes cast in the 2017 Municipal Primary election.

Candidate party affiliation for the 2017 Municipal Primary Mayoral race is as follows:
* **Rick Baker:** Republican
* **Rick Kriseman:** Democrat

In [21]:
soe_df = pd.read_csv('../data/raw_eday_2017/SOE/2017_primary_detail.csv')

In [22]:
soe_df

Unnamed: 0.1,Unnamed: 0,Unnamed: 1,Rick Baker,Unnamed: 3,Unnamed: 4,Unnamed: 5,Anthony Cates III,Unnamed: 7,Unnamed: 8,Unnamed: 9,"Paul ""The Truth"" Congemi",Unnamed: 11,Unnamed: 12,Unnamed: 13,Rick Kriseman,Unnamed: 15,Unnamed: 16,Unnamed: 17,"Theresa ""Momma Tee"" Lassiter",Unnamed: 19,Unnamed: 20,Unnamed: 21,Jesse Nevel,Unnamed: 23,Unnamed: 24,Unnamed: 25,Unnamed: 26
0,County,Registered Voters,Election Day Votes,Mail Ballot Votes,Provisional Votes,Total Votes,Election Day Votes,Mail Ballot Votes,Provisional Votes,Total Votes,Election Day Votes,Mail Ballot Votes,Provisional Votes,Total Votes,Election Day Votes,Mail Ballot Votes,Provisional Votes,Total Votes,Election Day Votes,Mail Ballot Votes,Provisional Votes,Total Votes,Election Day Votes,Mail Ballot Votes,Provisional Votes,Total Votes,Total
1,101,3646,165,299,0,464,8,6,0,14,0,0,0,0,219,381,0,600,5,12,0,17,18,12,0,30,1125
2,102,1768,94,165,0,259,1,3,0,4,0,0,0,0,132,190,0,322,4,4,0,8,7,8,0,15,608
3,103,691,42,107,0,149,1,2,0,3,0,1,0,1,75,140,0,215,0,1,0,1,2,3,0,5,374
4,104,1700,82,175,0,257,8,4,0,12,1,0,0,1,172,200,0,372,7,5,0,12,4,5,0,9,663
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
89,240,1812,96,219,0,315,0,4,0,4,1,2,0,3,70,129,0,199,1,4,0,5,3,5,0,8,534
90,241,1954,112,268,0,380,0,1,0,1,2,1,0,3,99,224,0,323,4,1,0,5,2,2,0,4,716
91,275,784,30,107,0,137,0,0,0,0,0,5,0,5,34,72,0,106,0,1,0,1,0,0,0,0,249
92,401,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
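
The display above shows that the SOE export carries a two-row header: candidate names sit in the column labels (with `Unnamed: N` placeholders in between), while the vote types sit in row 0. One possible way to flatten this into single descriptive column names is sketched below. The helper `flatten_header`, the `" - "` separator, and the rule that only labels ending in `Votes` belong to a candidate are all assumptions made for illustration. The toy frame uses the precinct 101 figures for two candidates only.

```python
import pandas as pd

def flatten_header(df):
    """Combine candidate labels with the vote-type labels in row 0 (hypothetical helper)."""
    candidate = None
    names = []
    for top, sub in zip(df.columns, df.iloc[0]):
        if not str(top).startswith("Unnamed"):
            candidate = top  # a named column starts a new candidate block
        if candidate is not None and str(sub).endswith("Votes"):
            names.append(f"{candidate} - {sub}")
        else:
            names.append(str(sub))  # e.g. County, Registered Voters, Total
    out = df.iloc[1:].reset_index(drop=True)  # drop the sub-header row
    out.columns = names
    return out.apply(pd.to_numeric)  # values arrive as strings because of the header row

# Toy subset of the SOE layout (other candidates omitted)
raw = pd.DataFrame(
    [
        ["County", "Registered Voters",
         "Election Day Votes", "Mail Ballot Votes", "Provisional Votes", "Total Votes",
         "Election Day Votes", "Mail Ballot Votes", "Provisional Votes", "Total Votes",
         "Total"],
        ["101", "3646", "165", "299", "0", "464", "219", "381", "0", "600", "1125"],
    ],
    columns=["Unnamed: 0", "Unnamed: 1",
             "Rick Baker", "Unnamed: 3", "Unnamed: 4", "Unnamed: 5",
             "Rick Kriseman", "Unnamed: 15", "Unnamed: 16", "Unnamed: 17",
             "Unnamed: 26"],
)

flat = flatten_header(raw)
print(flat.columns.tolist())
```

With flat names, per-candidate columns such as `Rick Baker - Total Votes` can be selected directly instead of by position.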


In [23]:
# Drop last row of soe_df that represents totals
soe_df.drop(index=soe_df.tail(1).index, inplace=True)

In [25]:
soe_df.tail(3)

Unnamed: 0.1,Unnamed: 0,Unnamed: 1,Rick Baker,Unnamed: 3,Unnamed: 4,Unnamed: 5,Anthony Cates III,Unnamed: 7,Unnamed: 8,Unnamed: 9,"Paul ""The Truth"" Congemi",Unnamed: 11,Unnamed: 12,Unnamed: 13,Rick Kriseman,Unnamed: 15,Unnamed: 16,Unnamed: 17,"Theresa ""Momma Tee"" Lassiter",Unnamed: 19,Unnamed: 20,Unnamed: 21,Jesse Nevel,Unnamed: 23,Unnamed: 24,Unnamed: 25,Unnamed: 26
90,241,1954,112,268,0,380,0,1,0,1,2,1,0,3,99,224,0,323,4,1,0,5,2,2,0,4,716
91,275,784,30,107,0,137,0,0,0,0,0,5,0,5,34,72,0,106,0,1,0,1,0,0,0,0,249
92,401,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
