# Parsing Clearfield County Election Results

This notebook parses the archived 2018 precinct-level election results for Clearfield County from plain text into a dataframe (a spreadsheet-like data structure) to be used for matching.

Steps:

1. Copy the plain text data from Clearfield County's [website](https://clearfieldco.org/Election_Files/Archive/18GEPBP.HTM)
2. Store the plain text locally at `clearfield_raw.txt`
3. Use Excel's column parsing feature to create `clearfield_county.csv`
4. Use this script to parse the primitive CSV into a CSV matching the Open Elections style for its statewide precinct-level results.
5. Store the resulting file locally at `clearfield_county_parsed.csv`

In [1]:
import pandas as pd
import numpy as np
import math
import os
import re
os.getcwd()

'/Users/baxterdemers/pa-2018/parsing_election_results/clearfield'

In [2]:
df = pd.read_csv('clearfield_county.csv')
df.head()

Unnamed: 0,candidate,votes
0,RUN DATE:11/16/18,
1,RUN TIME:10:51 AM,
2,,
3,0001 Brisbin Borough,
4,,VOTES


### Main parsing script

In [3]:
d = {
    'Straight Party':'Straight Party', 
    'United States Senator':'U.S. Senate',
    'Governor and Lieutenant Governor':'Governor',
    'REPRESENTATIVE IN CONG':'U.S. House', 
    'REPRESENTATIVE IN THE':'State House',
}
output = pd.DataFrame(columns=['county', 'precinct', 'office', 'district', 'candidate', 'party','votes', 'absentee', 'election_day'])
prev_blank = False
lst = []
first = True
for idx, row in df.iterrows():
    can = row.candidate
    if type(can) != str and math.isnan(can):
        prev_blank = True
        continue
    elif str(can)[:4].isnumeric():
        prec = can
    elif can.split()[0] in {'REGISTERED', 'PRECINCT', 'BALLOTS', 'VOTER', 'VOTE','TOTAL', 'Total', 'DISTRICT', 'Vote', 'WRITE-IN.', 'PREC', 'Run','RUN'}:
        continue
    elif prev_blank:
        if "CONGRESS" in can.upper():
            office = 'U.S. House'
            temp = re.findall(r'\d+', can.upper().split('CONGRESS')[1]) 
            district = list(map(int, temp))[0]
        elif "GENERAL ASSEMBLY" in can.upper():
            office = 'State House'
            temp = re.findall(r'\d+', can.upper().split('GENERAL ASSEMBLY')[1]) 
            district = list(map(int, temp))[0]
        else:
            district = np.nan
            office = d[can.strip()]
        prev_blank = False
    else:
        splits = can.split('(')
        can_name = splits[0].strip()
        party = splits[1].split(')')[0]
        if can_name == 'EBERT G BILL BEEMAN':
            party = 'LIB'
        res = {
            'county':'Clearfield',
            'precinct':prec,
            'office':office,
            'district':district,
            'candidate':can_name,
            'party':party,
            'votes':row.votes, 
            'absentee':np.nan, 
            'election_day':np.nan,
        }
        lst.append(res)
        if first:
            print(res)
            first = False
            
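# DataFrame.append was removed in pandas 2.0, so build the frame from the collected rows.
output = pd.DataFrame(lst, columns=output.columns)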

{'county': 'Clearfield', 'precinct': '0001 Brisbin Borough', 'office': 'Straight Party', 'district': nan, 'candidate': 'Democratic', 'party': 'DEM', 'votes': '12', 'absentee': nan, 'election_day': nan}
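
The two string-handling steps in the loop above (reading the office and district from a contest header, and splitting a candidate line into name and party) can be sketched in isolation. This is a minimal illustration, not the notebook's code: `parse_office_header`, `split_candidate`, and `OFFICE_NAMES` are hypothetical names, the sample header strings are made up to resemble the source file, and the sketch returns `None` rather than `np.nan` for offices without a district.

```python
import re

# Hypothetical lookup mirroring the non-district entries of `d` in the loop above.
OFFICE_NAMES = {
    'Straight Party': 'Straight Party',
    'United States Senator': 'U.S. Senate',
    'Governor and Lieutenant Governor': 'Governor',
}


def parse_office_header(header):
    """Return (office, district) for a contest header line."""
    upper = header.upper()
    for keyword, office in (('CONGRESS', 'U.S. House'),
                            ('GENERAL ASSEMBLY', 'State House')):
        if keyword in upper:
            # The district is the first run of digits after the keyword.
            match = re.search(r'\d+', upper.split(keyword, 1)[1])
            district = int(match.group()) if match else None
            return office, district
    # Statewide contests have no district (None here, np.nan in the notebook).
    return OFFICE_NAMES[header.strip()], None


def split_candidate(line):
    """Split 'NAME (PARTY)' into (name, party); party is '' if no parentheses."""
    name, _, rest = line.partition('(')
    return name.strip(), rest.split(')')[0].strip()


# Assumed sample inputs; the real header wording may differ.
print(parse_office_header('REPRESENTATIVE IN CONGRESS 15TH DISTRICT'))
print(split_candidate('BOB CASEY JR (DEM)'))
```

Unlike the loop, which raises an `IndexError` on a candidate line without parentheses, this sketch returns an empty party string, which is the case the first validation cell looks for.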


### Validation

In [4]:
output[output.party==''].candidate.unique()

array([], dtype=object)

In [5]:
output.party.unique()

array(['DEM', 'REP', 'GR', 'LIB', 'REP/DEM'], dtype=object)
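
The record printed by the parsing cell shows `votes` stored as a string (`'12'`), so one further check worth considering is whether every vote count is actually numeric. The sketch below assumes a small stand-in frame named `sample` instead of the real `output`, and its last row is a deliberately invalid, made-up value.

```python
import pandas as pd

# Stand-in for `output`; the 'n/a' entry is invented to trigger the check.
sample = pd.DataFrame({
    'candidate': ['BOB CASEY JR', 'LOU BARLETTA', 'NEAL GALE'],
    'votes': ['52', '88', 'n/a'],
})

# errors='coerce' converts anything that is not a number into NaN.
numeric_votes = pd.to_numeric(sample['votes'], errors='coerce')

# Rows whose vote count could not be parsed as a number.
bad_rows = sample[numeric_votes.isna()]
print(bad_rows)
```

An empty `bad_rows` on the real data would suggest the `votes` column can be safely converted to integers before export.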

In [6]:
output.head(20)

Unnamed: 0,county,precinct,office,district,candidate,party,votes,absentee,election_day
0,Clearfield,0001 Brisbin Borough,Straight Party,,Democratic,DEM,12,,
1,Clearfield,0001 Brisbin Borough,Straight Party,,Republican,REP,64,,
2,Clearfield,0001 Brisbin Borough,Straight Party,,Green,GR,0,,
3,Clearfield,0001 Brisbin Borough,Straight Party,,Libertarian,LIB,0,,
4,Clearfield,0001 Brisbin Borough,U.S. Senate,,BOB CASEY JR,DEM,52,,
5,Clearfield,0001 Brisbin Borough,U.S. Senate,,LOU BARLETTA,REP,88,,
6,Clearfield,0001 Brisbin Borough,U.S. Senate,,NEAL GALE,GR,2,,
7,Clearfield,0001 Brisbin Borough,U.S. Senate,,DALE R KERNS JR,LIB,0,,
8,Clearfield,0001 Brisbin Borough,Governor,,TOM WOLF,DEM,41,,
9,Clearfield,0001 Brisbin Borough,Governor,,SCOTT R WAGNER,REP,99,,


In [7]:
output.to_csv('clearfield_county_parsed.csv',index=False)