## Georgia 2022 Primary Election Returns

### Sections
- <a href="#ETL">Cleaning Election Results</a><br>
- <a href="#check">Vote Totals Check</a><br>
- <a href="#readme">Creating README</a><br>
- <a href="#exp">Exporting Cleaned Datasets</a><br>


### Sources
- [Georgia Secretary of State Certified Results by County - XML format](https://results.enr.clarityelections.com/GA/113667/web.285569/#/access-to-races)
- [Georgia Secretary of State Statewide Summary Report - CSV format](https://results.enr.clarityelections.com//GA//113667/294374/reports/summary.zip) - #TODO: replace this with statewide xml or xls, csv is inaccurate
- [Georgia Secretary of State County Summary Report - XML format](https://results.enr.clarityelections.com//GA//113667/294374/reports/detailxml.zip)

In [2]:
import geopandas as gp
import pandas as pd
import os
import xml.etree.ElementTree as et
import numpy as np
import re
import GA22_primary_helper as hlp
pd.set_option("display.max_rows", None)

<p><a name="ETL"></a></p>

## Cleaning Election Results

In [3]:
df_primary = hlp.ph_clarityelec_xml("./raw-from-source/counties/", 'primary')

Concatenate into DF, check for 159 counties

In [4]:
ga_22_primary = pd.concat([df_primary])
#check for 159 counties
print(len(ga_22_primary["county"].unique()) == 159)

True


Check for duplicates

In [5]:
#ga_22_primary[ga_22_primary.duplicated(keep=False)]

Check data types of df, cast votes column as an integer

In [6]:
#ga_22_primary.info()
ga_22_primary.num_votes = ga_22_primary.num_votes.astype(int)
ga_22_primary.num_votes.dtype

dtype('int32')

## Subset dataframe by contests of interest

__Statewide primary contests__
  1. US Senate
  2. US House
  3. Governor
  4. Lieutenant Governor
  5. Attorney General
  6. Secretary of State
  7. Superintendent of Public Instruction
  8. Agricultural Commissioner
  9. Labor Commissioner
  10. Insurance Commissioner
  11. Public Service Commissioners
  12. State Court of Appeals

__Statewide Special General Election__
  1. State Supreme Court

__Primary contests for state legislative bodies__
  1. State Senate
  2. State House

In [7]:
keywords_list = ['US Senate', 'US House', 'Governor', 'Attorney General', 'Secretary of State', 'State School Superintendent', 'Agriculture', 'Labor', 'Insurance', 'PSC', 'Public Service', 'Supreme Court', 'Court of Appeals', 'State Senate', 'State House']

Create list of all contests, and subset by keywords

In [8]:
all_contests = list(ga_22_primary["contest"].unique())

In [9]:
keep_contests = hlp.contests_to_keep(all_contests, keywords_list)
print('All contests:'+ str(len(all_contests)))
print('Subset of contests to keep:' + str(len(keep_contests)))

All contests:1367
Subset of contests to keep:578
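
`hlp.contests_to_keep` lives in the helper module and is not shown in this notebook. As a rough illustration only, keyword filtering of this kind could look like the sketch below. The function name `filter_by_keywords`, the case-insensitive substring matching, and the sample contest names are all assumptions, not the helper's actual behavior.

```python
def filter_by_keywords(contests, keywords):
    """Return the contests whose name contains any keyword (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    return [c for c in contests if any(k in c.lower() for k in lowered)]

# Invented sample names, for illustration only
sample_contests = [
    "US Senate - Rep",
    "Governor - Dem",
    "Sheriff - Rep",
    "Board of Education District 2",
]
kept = filter_by_keywords(sample_contests, ["US Senate", "Governor"])
print(kept)
```

Substring matching is deliberately broad (for example, `Governor` also matches `Lieutenant Governor`), which is why the visual inspection of the remaining contests is a useful follow-up.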


Visual inspection of remaining contests, as a check

In [10]:
# #check remaining contests
# other_contests = set(all_contests) - set(keep_contests)
# other_contests

Subset dataframe by contests of interest only

In [11]:
ga_22_primary_sw = ga_22_primary[ga_22_primary['contest'].isin(keep_contests)].copy()

Check number of unique contests in new dataframe

In [12]:
# of contests in new df == len of keep_contests list
ga_22_primary_sw['contest'].nunique() == len(keep_contests)

True

## Add FIPS Column, Create Unique ID Column

Add county FIPS

In [13]:
def create_fips_col(csv_path, state_name_string, df, county_col_string):
    fips_file = pd.read_csv(csv_path)
    fips_file = fips_file[fips_file["State"] == state_name_string]
    fips_file["FIPS County"] = fips_file["FIPS County"].astype(str)
    fips_file["FIPS County"] = fips_file["FIPS County"].str.zfill(3)
    fips_file['County Name'] = fips_file['County Name'].apply(lambda x: x.replace(' ', ''))
    fips_file['County Name'] = fips_file['County Name'].apply(lambda x: str(x).lower())
    fips_dict = dict(zip(fips_file['County Name'], fips_file['FIPS County']))
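    # Normalize county names the same way as the FIPS lookup keys (no spaces, lowercase)
    df['COUNTYFP'] = df[county_col_string].apply(lambda x: str(x).replace(' ', '').lower())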
    df['COUNTYFP'] = df['COUNTYFP'].map(fips_dict).fillna(df[county_col_string])
    df['COUNTYFP'] = df['COUNTYFP'].astype(str)
    df['COUNTYFP'] = df['COUNTYFP'].str.zfill(3)
    return df

In [15]:
ga_22_primary_sw = create_fips_col("./FIPS/US_FIPS_Codes.csv", 'Georgia', ga_22_primary_sw, 'county')
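
Because `create_fips_col` falls back to the county name when no FIPS code is found, an unmapped county could go unnoticed. A minimal check is sketched below on a toy frame; in the notebook the frame would be `ga_22_primary_sw`, and the toy values here are invented.

```python
import pandas as pd

# Toy frame standing in for the real results (values are illustrative)
toy = pd.DataFrame({
    "county": ["Appling", "Ben Hill"],
    "COUNTYFP": ["001", "Ben Hill"],
})

# A mapped value is exactly three digits; anything else fell through fillna()
is_fips = toy["COUNTYFP"].str.fullmatch(r"\d{3}")
unmapped = toy.loc[~is_fips, "county"].unique().tolist()
print(unmapped)
```

An empty list would suggest that every county received a three-digit code.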

Create the UNIQUE_ID column, then check for 2707 unique values

In [16]:
ga_22_primary_sw['UNIQUE_ID'] = ga_22_primary_sw['COUNTYFP'] + '-' + ga_22_primary_sw['precinct']

Looking for 2707 unique precincts in GA

In [17]:
ga_22_primary_sw['UNIQUE_ID'].nunique()

2707

## Standardize contests

In [18]:
orig_contest_names = ga_22_primary_sw.contest.unique()

Create dictionary to rename contests

In [19]:
contest_dict = {}
dem = ' - Dem'
rep = ' - Rep'

Create a list of contests for each contest type, based on the keywords list

In [20]:
#TODO: simplify this by looping over keywords_list, and then unpacking into separate contest lists?!
senate = hlp.contests_to_keep(keep_contests, [keywords_list[0]])
house = hlp.contests_to_keep(keep_contests, [keywords_list[1]])
governor = hlp.contests_to_keep(keep_contests, [keywords_list[2]])
attorneygen = hlp.contests_to_keep(keep_contests, [keywords_list[3]])
secofstate = hlp.contests_to_keep(keep_contests, [keywords_list[4]])
stateschoolsuper = hlp.contests_to_keep(keep_contests, [keywords_list[5]])
c_ag = hlp.contests_to_keep(keep_contests, [keywords_list[6]])
c_labor = hlp.contests_to_keep(keep_contests, [keywords_list[7]])
c_insurance = hlp.contests_to_keep(keep_contests, [keywords_list[8]])
psc = hlp.contests_to_keep(keep_contests, [keywords_list[9], keywords_list[10]])
statesupremecourt = hlp.contests_to_keep(keep_contests, [keywords_list[11]])
statecourtofappeals = hlp.contests_to_keep(keep_contests, [keywords_list[12]])
statesenate = hlp.contests_to_keep(keep_contests, [keywords_list[13]])
statehouse = hlp.contests_to_keep(keep_contests, [keywords_list[14]])

In [21]:
##TESTING simplified code
# contests = []

# for keyword in keywords_list:
#     contests.append(hlp.contests_to_keep(keep_contests, [keyword]))

# senate, house, governor, attorneygen, secofstate, stateschoolsuper, c_ag, c_labor, c_insurance, psc, statesupremecourt, statecourtofappeals, statesenate, statehouse = contests
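
The commented-out loop builds one list per keyword, which gives 15 lists for the 14 names being unpacked, because PSC contests are matched by two keywords. One way around this, sketched here under assumptions, is a dictionary keyed by contest type. `contests_matching` is a hypothetical stand-in for `hlp.contests_to_keep`, and the sample contest names are invented.

```python
# Hypothetical stand-in for hlp.contests_to_keep
# (assumed: case-insensitive substring match)
def contests_matching(contests, keywords):
    return [c for c in contests if any(k.lower() in c.lower() for k in keywords)]

# One entry per contest type; PSC keeps both of its keywords
contest_keywords = {
    "senate": ["US Senate"],
    "psc": ["PSC", "Public Service"],
    "statehouse": ["State House"],
}

# Invented sample names, for illustration only
sample = [
    "US Senate - Dem",
    "PSC District 2 - Rep",
    "Public Service Commission District 3 - Dem",
    "State House District 5 - Rep",
]

contest_lists = {
    name: contests_matching(sample, kws) for name, kws in contest_keywords.items()
}
print(contest_lists["psc"])
```

A dictionary also avoids relying on positional indexes into `keywords_list`.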


In [22]:
def clean_contest_simple(contest_list, cleaned_contest_str, contest_dict):
    for i in contest_list:
        cleaned_contest = cleaned_contest_str
        if 'dem' in i.lower():
            cleaned_contest += dem
        elif 'rep' in i.lower():
            cleaned_contest += rep
        #print(cleaned_contest)
        contest_dict[i] = cleaned_contest

In [23]:
clean_contest_simple(senate, keywords_list[0], contest_dict)
clean_contest_simple(attorneygen, keywords_list[3], contest_dict)
clean_contest_simple(secofstate, keywords_list[4], contest_dict)
clean_contest_simple(stateschoolsuper, keywords_list[5], contest_dict)
clean_contest_simple(c_ag, 'Commissioner of Agriculture', contest_dict)
clean_contest_simple(c_labor, 'Commissioner of Labor', contest_dict)
clean_contest_simple(c_insurance, 'Commissioner of Insurance', contest_dict)

In [24]:
def clean_contest_districts(contest_list, cleaned_contest_str, contest_dict):
    for i in contest_list:
        # Keep only the text before any '/' so digits after it are ignored
        if '/' in i:
            k = i.split('/')[0]
        else:
            k = i
        # Default to no party suffix so `party` is always defined
        party = ''
        if 'dem' in i.lower():
            party = dem
        elif 'rep' in i.lower():
            party = rep
        dist = ''.join(filter(str.isdigit, k))
        cleaned_contest = cleaned_contest_str + dist + party
        #print(cleaned_contest)
        contest_dict[i] = cleaned_contest

In [25]:
clean_contest_districts(house, 'US House - District ', contest_dict)
clean_contest_districts(psc, 'PSC - District ', contest_dict)
clean_contest_districts(statesenate, 'State Senate - District ', contest_dict)
clean_contest_districts(statehouse, 'State House - District ', contest_dict)
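
The district number is taken from the digits that appear before the first `/` in the raw contest name. The sketch below isolates that step; `district_number` is a hypothetical helper and the contest names are invented examples, not names from the source files.

```python
def district_number(contest_name):
    """Digits found before the first '/' in the contest name, joined together."""
    head = contest_name.split('/')[0]
    return ''.join(filter(str.isdigit, head))

print(district_number("State House District 12 - Dem"))
print(district_number("US House District 2 - Rep/Unexpired Term 2023"))
```

Note that any other digits before the `/` would be concatenated into the district number, so this approach assumes the district is the only number in that part of the name.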

In [26]:
for i in statecourtofappeals:
    cleaned_contest = 'Court of Appeals -'
    if 'barnes' in i.lower():
        cleaned_contest += 'Barnes'
    elif 'mcfadden' in i.lower():
        cleaned_contest += 'McFadden'
    elif 'pipkin' in i.lower():
        cleaned_contest += 'Pipkin'
    contest_dict[i] = cleaned_contest

In [27]:
for i in statesupremecourt:
    cleaned_contest = 'Supreme Court - '
    if 'colvin' in i.lower():
        cleaned_contest += 'Colvin'
    elif 'lagrua' in i.lower() or 'lagura' in i.lower():
        cleaned_contest += 'LaGrua'
    elif 'mcmillian' in i.lower():
        cleaned_contest += 'McMillian'
    contest_dict[i] = cleaned_contest

In [28]:
for i in governor:
    cleaned_contest = 'Governor'
    if 'lieutenant' in i.lower() or 'liutenant' in i.lower():
        if 'dem' in i.lower():
            cleaned_contest = 'Lieutenant ' + cleaned_contest + dem
        elif 'rep' in i.lower():
            cleaned_contest = 'Lieutenant ' + cleaned_contest + rep
    elif 'lieutenant' not in i.lower():
        if 'dem' in i.lower():
            cleaned_contest += dem
        elif 'rep' in i.lower():
            cleaned_contest+= rep
    contest_dict[i] = cleaned_contest

In [29]:
ga_22_primary_sw['contest'] = ga_22_primary_sw['contest'].map(contest_dict).fillna(ga_22_primary_sw['contest'])

Check that all contest name variations were captured

In [30]:
ga_22_primary_sw.contest.nunique()

406

Visually inspect standardized contest names

In [31]:
#ga_22_primary_sw['contest'].value_counts()

## Clean columns, create pivot column

Add column to indicate incumbency, remove incumbency status from candidate name

In [32]:
incumbency_mask = ga_22_primary_sw['choice'].str.contains(r'\(I\)')
ga_22_primary_sw['Incumbent'] = 0
ga_22_primary_sw.loc[incumbency_mask, 'Incumbent'] = 1

In [33]:
# removing incumbent status from candidate name
ga_22_primary_sw['choice'] = ga_22_primary_sw['choice'].apply(lambda x: x.replace('(I)', ''))
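
The flag-then-strip steps can also be written with vectorized string methods. This is an alternative sketch on a toy frame with invented candidate names, not the notebook's own code.

```python
import pandas as pd

toy = pd.DataFrame({"choice": ["Jane Doe (I)", "John Roe"]})  # invented names

# regex=False treats "(I)" as a literal string, so no escaping is needed
toy["Incumbent"] = toy["choice"].str.contains("(I)", regex=False).astype(int)
toy["choice"] = toy["choice"].str.replace("(I)", "", regex=False).str.strip()
print(toy)
```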

In [35]:
def create_pivot_col(df, name_string, contest_string, pivot_string):
    df[name_string] = df[name_string].apply(lambda x: str(x).strip())
    df[contest_string] = df[contest_string].apply(lambda x: str(x).strip())
    df[name_string] = df[name_string].apply(lambda x:' '.join(str(x).split())) # This removes extra spaces between first and last name
    substrings_to_remove = ['.', "'", '"', ',', '(I)']
    for substring in substrings_to_remove:
        df[name_string] = df[name_string].apply(lambda x: x.replace(substring, ''))
        df[contest_string] = df[contest_string].apply(lambda x: x.replace(substring, ''))
    #Anomalies specific to this election
    df[name_string] = df[name_string].apply(lambda x: x.replace('Deloach', 'DeLoach'))
    df[name_string] = df[name_string].apply(lambda x: str(x).strip())
    df[contest_string] = df[contest_string].apply(lambda x: str(x).strip())
    df[pivot_string]= df[name_string]+ ' -:- ' + df[contest_string]
    return df

In [36]:
ga_22_primary_sw = create_pivot_col(ga_22_primary_sw, 'choice', 'contest', 'pivot')

In [37]:
ga_22_primary_sw.head(1)

Unnamed: 0,county,contest,choice,voting_method,precinct,num_votes,election,COUNTYFP,UNIQUE_ID,Incumbent,pivot
0,Appling,US Senate - Rep,Gary W Black,Election Day Votes,1B,41,primary,1,001-1B,0,Gary W Black -:- US Senate - Rep


## Pivot Data

Create pivot table

In [39]:
ga_22_primary_sw_pvt =pd.pivot_table(ga_22_primary_sw,index=["UNIQUE_ID","county","COUNTYFP","precinct"],columns=["pivot"],values=['num_votes'],aggfunc=sum).fillna(0)

Clean up index

In [40]:
ga_22_primary_sw_pvt.columns = ga_22_primary_sw_pvt.columns.droplevel(0)
ga_22_primary_sw_pvt.reset_index(inplace = True)

In [41]:
#check
ga_22_primary_sw_pvt.head()

pivot,UNIQUE_ID,county,COUNTYFP,precinct,Adam Petty -:- State Senate - District 38 - Dem,Afoma Eguh Okafor -:- State House - District 71 - Dem,Al Williams -:- State House - District 168 - Dem,Al Wynn -:- State House - District 153 - Dem,Alan Powell -:- State House - District 33 - Rep,Alan Sims -:- US House - District 10 - Rep,...,William C Harris -:- State House - District 126 - Rep,William Harris -:- State House - District 74 - Dem,William Park Freeman -:- State House - District 88 - Rep,William Will Boddie Jr -:- Commissioner of Labor - Dem,Willie Mae Oyogoa -:- State House - District 44 - Dem,Winfred Dukes -:- Commissioner of Agriculture - Dem,Yasmin Neal -:- State House - District 79 - Dem,Yg Nyghtstorm -:- US House - District 7 - Rep,Zach Procter -:- State House - District 101 - Rep,Zeph Baker -:- State House - District 140 - Dem
0,001-1B,Appling,1,1B,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,10.0,0.0,5.0,0.0,0.0,0.0,0.0
1,001-1C,Appling,1,1C,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,3.0,0.0,6.0,0.0,0.0,0.0,0.0
2,001-2,Appling,1,2,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,48.0,0.0,86.0,0.0,0.0,0.0,0.0
3,001-3A1,Appling,1,3A1,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,5.0,0.0,3.0,0.0,0.0,0.0,0.0
4,001-3C,Appling,1,3C,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,18.0,0.0,33.0,0.0,0.0,0.0,0.0
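
To make the reshaping concrete, here is a small sketch of the same long-to-wide step on invented rows. Votes for the same precinct and candidate (for example, from different voting methods) are summed, and missing combinations become 0.

```python
import pandas as pd

# Invented long-format rows: one row per precinct, candidate, and voting method
long_df = pd.DataFrame({
    "UNIQUE_ID": ["001-1B", "001-1B", "001-1C"],
    "pivot": ["A -:- Governor - Rep", "A -:- Governor - Rep", "B -:- Governor - Dem"],
    "num_votes": [41, 9, 7],
})

wide = pd.pivot_table(
    long_df,
    index="UNIQUE_ID",
    columns="pivot",
    values="num_votes",
    aggfunc="sum",
    fill_value=0,
)
print(wide)
```

Passing `values` as a single string (rather than a list) yields single-level columns, so no `droplevel` call is needed in this variant.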


Write functions to rename columns.

Georgia primary-specific modifications:
- general election for Supreme Court and Court of Appeals races
- 180 seats in the State House, requiring three-digit district numbers
- at-large Public Service Commissioner race included

In [45]:
#functions to rename columns
def get_election_type_year(race_string):
    if "supreme court" in race_string.lower():
        electype = "S"
    else:
        electype = "P"
    if any(word in race_string.lower() for word in ['us house', 'state house', 'state senate', 'public service commissioner', 'psc']):
        return electype
    else:
        return electype +"22"
    
def get_race(race_string):
    race_string = race_string.lower()
    if '/' in race_string:
        race_string = race_string.split('/')[0]
    race = ''
    if "u.s. house" in race_string or 'us house' in race_string:
        race = "CON"
    elif "state house" in race_string:
        race =  "SL"
    elif "state senate" in race_string:
        race = "SU"
    elif "president" in race_string:
        race = "PRE"
    elif "us senate" in race_string or "u.s. senate" in race_string:
        race = "USS"
    elif "public service" in race_string:
        race = "PSC"
    elif "attorney general" in race_string:
        race = "ATG"
    elif "auditor general" in race_string:
        race = "AUD"
    elif "treasurer" in race_string:
        race = "TRE"
    elif "superintendent" in race_string:
        race = "SUP"
    elif "secretary of state" in race_string:
        race = "SOS"
    elif "lieutenant governor" in race_string:
        race = "LTG"
    elif "governor" in race_string:
        race = "GOV"
    elif "commissioner of labor" in race_string:
        race = "LAB"
    elif "commissioner of agriculture" in race_string:
        race = "AGR"
    elif "commissioner of insurance" in race_string:
        race = "INS"
    elif "state school superintendent" in race_string:
        race = "SUP"
    elif "public service commissioner" in race_string or 'psc' in race_string:
        race = "PSC"
    elif "supreme court" in race_string:
        race = "SSC"
    elif "court of appeals" in race_string:
        race = "COA"
    if any(word in race_string for word in ['us house', 'state senate', 'public service commissioner', 'psc']):
        district = ''.join(filter(str.isdigit, race_string)).zfill(2)
    elif 'state house' in race_string:
        district = ''.join(filter(str.isdigit, race_string)).zfill(3)
    else:
        district = ''
    return race + district

def get_party(race_string):
    if "rep" in race_string.lower():
        return "R"
    elif "dem" in race_string.lower():
        return "D"
    elif "supreme court" in race_string.lower() or "court of appeals" in race_string.lower():
        return "N"
    
def get_name(name_string):
    name_string = name_string.split("-:-")[0]
    name_string = name_string.replace("'","")
    name_string = name_string.replace('"','')
    name_string = name_string.replace(',','')
    name_string = name_string.strip()
    if name_string.split(" ")[-1] in ['II', 'III', 'Jr', 'Jr.', 'Sr.', 'JR.', "JR", "IV"]:
        likely_last = name_string.split(" ")[-2]
    else:
        likely_last = name_string.split(" ")[-1]
    return likely_last[:3].upper()

def get_VEST(race_string):
    electype = get_election_type_year(race_string)
    contest = get_race(race_string)
    party = get_party(race_string)
    candidate = get_name(race_string)
    vest_name = electype+contest+party+candidate
    if len(vest_name) > 10:
        print(vest_name)
    return vest_name
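
As a worked example of the naming layout, the sketch below assembles a column name from its parts in the same order `get_VEST` concatenates them. `assemble_code` is a hypothetical helper for illustration; the two results correspond to column names that appear in the notebook's output.

```python
def assemble_code(electype, race, party, last_name, district=""):
    """Concatenate the pieces in the order get_VEST uses:
    election type, race plus district, party, first three letters of the name."""
    return f"{electype}{race}{district}{party}{last_name[:3].upper()}"

statewide = assemble_code("P22", "USS", "R", "Black")
state_house = assemble_code("P", "SL", "D", "Okafor", district="071")
print(statewide, len(statewide))
print(state_house, len(state_house))
```

Dropping the year for district contests leaves room for the district digits, which keeps both forms within the 10-character limit checked in `get_VEST`.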

Create a function that builds dictionaries to rename columns in the pivoted df

In [48]:
def create_column_rename_dicts(df, exclude_columns):
    contest_columns = [i for i in df.columns if i not in exclude_columns]

    contest_updates_dict = {}
    contest_updates_reversed = {}
    clean_dups = {}
    new_names = []
    
    for val in contest_columns:
        new_name = get_VEST(val)  # get_VEST
        contest_updates_dict[val] = new_name
        
        if new_name not in new_names:
            new_names.append(new_name)
            contest_updates_reversed[new_name] = val
        else:
            print("Duplicate", new_name)
            print(contest_updates_reversed[new_name])
            print(val)
            clean_dups[val] = contest_updates_reversed[new_name]
    
    return contest_updates_dict, contest_updates_reversed, clean_dups

Create a renaming dictionary for pivoted df

In [49]:
exclude_columns = ['UNIQUE_ID', 'county', 'COUNTYFP', 'precinct']
contest_updates_dict, contest_updates_reversed, clean_dups = create_column_rename_dicts(ga_22_primary_sw_pvt, exclude_columns)

Duplicate P22LTGDBRO
Tony Brown -:- Lieutenant Governor - Dem
Tyrone Brooks Jr -:- Lieutenant Governor - Dem


Manually correct naming convention for candidates with similar last name in Lieutenant Governor - Dem contest

In [50]:
contest_updates_dict['Tony Brown -:- Lieutenant Governor - Dem'] = 'P22LTGDBTO'
contest_updates_dict['Tyrone Brooks Jr -:- Lieutenant Governor - Dem'] = 'P22LTGDBTY'
contest_updates_reversed['P22LTGDBTO'] = 'Tony Brown -:- Lieutenant Governor - Dem'
contest_updates_reversed['P22LTGDBTY'] = 'Tyrone Brooks, Jr -:- Lieutenant Governor - Dem'
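
Collisions like this one can also be collected in a single pass by grouping column names under their short code. The sketch below uses a small mapping shaped like `contest_updates_dict`; it is an illustration, not a replacement for the manual fix above.

```python
from collections import defaultdict

# Mapping shaped like contest_updates_dict (column name -> short code)
rename_map = {
    "Tony Brown -:- Lieutenant Governor - Dem": "P22LTGDBRO",
    "Tyrone Brooks Jr -:- Lieutenant Governor - Dem": "P22LTGDBRO",
    "Brian Kemp -:- Governor - Rep": "P22GOVRKEM",
}

# Group the original column names under each short code
by_code = defaultdict(list)
for column, code in rename_map.items():
    by_code[code].append(column)

# Any code with more than one column is a collision to resolve by hand
collisions = {code: cols for code, cols in by_code.items() if len(cols) > 1}
print(collisions)
```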

Check that all dictionary values are between 8 and 10 characters long

In [51]:
for item in contest_updates_dict.values():
    if len(item) > 10 or len(item) < 8:
        print(item)
        print(contest_updates_reversed[item])

Rename columns using the dictionary

In [52]:
ga_22_primary_sw_pvt.rename(columns = contest_updates_dict, inplace = True)
ga_22_primary_sw_pvt.reset_index(inplace = True, drop = True)

Set columns with votes as integer type

In [53]:
for item in contest_updates_dict.values():
    ga_22_primary_sw_pvt[item] = ga_22_primary_sw_pvt[item].astype(int)

In [54]:
ga_22_primary_sw_pvt.head(2)

pivot,UNIQUE_ID,county,COUNTYFP,precinct,PSU38DPET,PSL071DOKA,PSL168DWIL,PSL153DWYN,PSL033RPOW,PCON10RSIM,...,PSL126RHAR,PSL074DHAR,PSL088RFRE,P22LABDBOD,PSL044DOYO,P22AGRDDUK,PSL079DNEA,PCON07RNYG,PSL101RPRO,PSL140DBAK
0,001-1B,Appling,1,1B,0,0,0,0,0,0,...,0,0,0,10,0,5,0,0,0,0
1,001-1C,Appling,1,1C,0,0,0,0,0,0,...,0,0,0,3,0,6,0,0,0,0


In [55]:
precinct_names = list(ga_22_primary_sw_pvt["precinct"].unique())
precinct_names.sort()

Export to CSV

In [56]:
ga_22_primary_sw_pvt.to_csv("./ga_22_primary_prec.csv", index = False)

<p><a name="check"></a></p>

## Vote Totals Check
### Statewide

<b> #TODO : leave this in for now to show Spencer discrepancies, then delete this section </b>

Read in statewide csv summary file

In [57]:
primary_sos_totals = pd.read_csv("./raw-from-source/summary/summary.csv")
combined_sos_totals = pd.concat([primary_sos_totals])

Filter for races of interest

In [58]:
combined_sos_totals['contest name'] = combined_sos_totals['contest name'].apply(lambda x: str(x).strip().replace('(Vote For 1)', ''))

In [1]:
filtered_sos_totals = combined_sos_totals[~combined_sos_totals['contest name'].str.contains('superior court|party question|state court judge|district attorney|board of education', case=False, na=False)].copy()

NameError: name 'combined_sos_totals' is not defined

Clean columns to match precinct-wise df

In [60]:
sos_totals = create_pivot_col(filtered_sos_totals, 'choice name', 'contest name', 'pivot_col')

In [61]:
sos_totals['pivot_col'].nunique()

655

Use function to rename VEST columns

In [62]:
sos_totals

Unnamed: 0,line number,contest name,choice name,party name,total votes,percent of votes,registered voters,ballots cast,num Precinct total,num Precinct rptg,over votes,under votes,pivot_col
0,1,US Senate - Rep,Gary W Black,REP,157370,13.35,0,0,159,159,0,100,Gary W Black -:- US Senate - Rep
1,2,US Senate - Rep,Josh Clark,REP,46693,3.96,0,0,159,159,0,100,Josh Clark -:- US Senate - Rep
2,3,US Senate - Rep,Kelvin King,REP,37930,3.22,0,0,159,159,0,100,Kelvin King -:- US Senate - Rep
3,4,US Senate - Rep,Jonathan Jon McColumn,REP,28601,2.43,0,0,159,159,0,100,Jonathan Jon McColumn -:- US Senate - Rep
4,5,US Senate - Rep,Latham Saddler,REP,104471,8.86,0,0,159,159,0,100,Latham Saddler -:- US Senate - Rep
5,6,US Senate - Rep,Herschel Junior Walker,REP,803560,68.18,0,0,159,159,0,100,Herschel Junior Walker -:- US Senate - Rep
6,7,US Senate - Dem,Tamara Johnson-Shealey,DEM,28984,3.96,0,0,159,159,0,23,Tamara Johnson-Shealey -:- US Senate - Dem
7,8,US Senate - Dem,Raphael Warnock,DEM,702610,96.04,0,0,159,159,0,23,Raphael Warnock -:- US Senate - Dem
8,9,Governor - Rep,Catherine Davis,REP,9788,0.81,0,0,159,159,0,22,Catherine Davis -:- Governor - Rep
9,10,Governor - Rep,Brian Kemp,REP,888078,73.72,0,0,159,159,0,22,Brian Kemp -:- Governor - Rep


In [63]:
sos_totals['VEST'] = sos_totals['pivot_col'].apply(lambda x: get_VEST(str(x).strip()))

In [64]:
sos_totals['VEST'].nunique()

654

Check RDH data against state summary

In [65]:
statewide_check_list = []
doesnt_check = []
for item in contest_updates_dict.values():
    official_ls = list(sos_totals.loc[sos_totals["VEST"] == item, "total votes"])
    if len(official_ls)<1:
        doesnt_check.append(item)
#         print(item)
#         print(contest_updates_reversed[item])
    else:
        official = official_ls[0]
    rdh = ga_22_primary_sw_pvt[item].sum()
    if official != rdh:
        statewide_check_list.append(item)
        print(contest_updates_reversed[item])
        print(f"{item}\n\tOfficial: {official}\n\tRDH: {rdh}")

Al Wynn -:- State House - District 153 - Dem
PSL153DWYN
	Official: 3983
	RDH: 1315
Angela Moore -:- State House - District 91 - Dem
PSL091DMOO
	Official: 989
	RDH: 8255
Anne Allen Westbrook -:- State House - District 163 - Dem
PSL163DWES
	Official: 4209
	RDH: 4021
Anne Elizabeth Barnes -:- Court of Appeals -Barnes
P22COANBAR
	Official: 1626523
	RDH: 1629284
Anthony Dickson -:- State House - District 134 - Dem
PSL134DDIC
	Official: 1626523
	RDH: 2283
Ariel Phillips -:- State House - District 147 - Dem
PSL147DPHI
	Official: 4040
	RDH: 3078
Becky Evans -:- State House - District 89 - Dem
PSL089DEVA
	Official: 1812
	RDH: 10065
Benjamin Stahl -:- State House - District 43 - Dem
PSL043DSTA
	Official: 21070
	RDH: 1383
Bentley Hudgins -:- State House - District 90 - Dem
PSL090DHUD
	Official: 21070
	RDH: 1627
Billie Boyd-Cox -:- State House - District 113 - Dem
PSL113DBOY
	Official: 8406
	RDH: 1555
Billy Mitchell -:- State House - District 88 - Dem
PSL088DMIT
	Official: 21331
	RDH: 4576
Brian L

PCON10DJOH
	Official: 63646
	RDH: 15015
Tamarre Pierre -:- State House - District 39 - Dem
PSL039DPIE
	Official: 28984
	RDH: 1355
Teri Anulewicz -:- State House - District 42 - Dem
PSL042DANU
	Official: 4311
	RDH: 3535
Terry Cummings -:- State House - District 39 - Dem
PSL039DCUM
	Official: 4311
	RDH: 1752
Thomas Casez -:- State House - District 40 - Dem
PSL040DCAS
	Official: 2110
	RDH: 2281
Tony Brown -:- Lieutenant Governor - Dem
P22LTGDBTO
	Official: 3228
	RDH: 27905
Traci Acree George -:- State House - District 132 - Dem
PSL132DGEO
	Official: 18681
	RDH: 2624
Trea Pipkin -:- Court of Appeals -Pipkin
P22COANPIP
	Official: 1606449
	RDH: 1609183
Tyrone Brooks, Jr -:- Lieutenant Governor - Dem
P22LTGDBTY
	Official: 1692
	RDH: 74855
Verda M Colvin -:- Supreme Court - Colvin
S22SSCNCOL
	Official: 1168175
	RDH: 1170137
Veronica Brinson -:- Supreme Court - Colvin
S22SSCNBRI
	Official: 541628
	RDH: 542561
Viola Davis -:- State House - District 87 - Dem
PSL087DDAV
	Official: 9108
	RDH: 6739


Manually checked the discrepant vote totals against the official Excel file. RDH totals match the official Excel totals in all but 8 contests. In some contests, the official CSV undercounts because it excludes early and provisional ballots. In many contests, I was unable to trace the discrepancies.
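
Several "Official" values in the printed output repeat across unrelated contests (for example 21070, 4311, and 1626523). That pattern is what would happen if `official` kept its value from the previous iteration whenever a code had no match in the summary file, since the comparison in the loop runs either way. The sketch below shows one way to report unmatched codes separately from true mismatches, using a merge with `indicator=True`. All totals here are invented.

```python
import pandas as pd

# Invented totals, for illustration only
official = pd.DataFrame({
    "VEST": ["P22GOVRKEM", "P22USSRBLA"],
    "total votes": [100, 50],
})
rdh = pd.Series({"P22GOVRKEM": 100, "P22USSRBLA": 48, "PSL153DWYN": 12}, name="rdh")

# Left merge keeps every RDH code; _merge records whether a match was found
merged = (
    rdh.rename_axis("VEST")
    .reset_index()
    .merge(official, on="VEST", how="left", indicator=True)
)

unmatched = merged.loc[merged["_merge"] == "left_only", "VEST"].tolist()
both = merged[merged["_merge"] == "both"]
mismatched = both.loc[both["rdh"] != both["total votes"], "VEST"].tolist()
print(unmatched, mismatched)
```

With this split, codes missing from the summary file never get compared against another contest's total.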

- Tabitha Johnson-Green: manually combine PCON10DJOH and PCON10DGRE
- Keith L Jenkins Sr: coded as PSL173DSR, but should end with JEN, with 650 votes
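
The `SR` ending arises because `create_pivot_col` strips periods, so the name ends in `Sr`, while the suffix list in `get_name` only contains `Sr.`. A hedged sketch of a suffix check that ignores case and trailing periods follows; `last_name_prefix` and `SUFFIXES` are hypothetical names, not part of the notebook.

```python
SUFFIXES = {"II", "III", "IV", "JR", "SR"}

def last_name_prefix(full_name):
    """First three letters of the last name, skipping a trailing generational suffix."""
    parts = full_name.replace(",", "").split()
    # Compare the final token without periods and in upper case
    if len(parts) > 1 and parts[-1].rstrip(".").upper() in SUFFIXES:
        parts = parts[:-1]
    return parts[-1][:3].upper()

print(last_name_prefix("Keith L Jenkins Sr"))
print(last_name_prefix("Tyrone Brooks, Jr."))
```

Hyphenated surnames such as Johnson-Green would still need a manual decision about which part to use.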

In [66]:
minus_sw_list = set(statewide_check_list)
#Sanford Bishop, CON2D, rdh overcount by 2,405
rdh_error = ['PCON02DBIS', 'PSL173RTAY', 'PSL173DSR','P22COANBAR', 'P22COANMCF', 'P22COANPIP', 'S22SSCNMCM','S22SSCNBRI', 'S22SSCNCOL', 'S22SSCNLAG',]

In [67]:
len(statewide_check_list)
#len(doesnt_check)

132

In [68]:
set(statewide_check_list) - set(doesnt_check)

{'P22COANBAR',
 'P22COANMCF',
 'P22COANPIP',
 'PSL173RTAY',
 'S22SSCNBRI',
 'S22SSCNCOL',
 'S22SSCNLAG',
 'S22SSCNMCM'}

### Checking against official County level results

Next, compare against the county totals XML file from the Georgia SOS office.

In [69]:
loaded_counties = os.listdir("./raw-from-source/summary/county_checks")
z=[]
for locale in loaded_counties:
    if locale.endswith('.xml'):
        file_string = "./raw-from-source/summary/county_checks/"+locale
        xtree = et.parse(file_string)
        xroot = xtree.getroot()
        state_area = xroot.findall(".//Region")
        for i in state_area:
            state = i.text
        contests = xroot.findall(".//Contest")
        for i in contests:
            contest = i.attrib.get('text')
            lower = i.findall("./Choice")
            for j in lower:
                choice = j.attrib.get('text')
                lower_2 = j.findall("./VoteType")
                for k in lower_2:
                    voting_method = k.attrib.get('name')
                    lower_3 = k.findall("./County")
                    for l in lower_3:
                        county_name = l.attrib.get('name')
                        num_votes = l.attrib.get('votes')
                        if locale == "detail 2.xml":
                            elec_type = "special"
                        else:
                            elec_type = "primary"
                        z.append([state,contest,choice,voting_method,county_name,num_votes, elec_type])
dfcols = ['state','contest','choice','voting_method','county','num_votes',"type"]
df_county = pd.DataFrame(z,columns=dfcols)
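
The nested loops walk a Contest > Choice > VoteType > County hierarchy and read the `text`, `name`, and `votes` attributes. The sketch below runs the same traversal on a small in-memory fragment. The fragment is invented to mirror those element and attribute names; the root element name `ElectionResult` in particular is an assumption.

```python
import xml.etree.ElementTree as et

# Invented fragment mirroring the element and attribute names the loop reads
sample_xml = """
<ElectionResult>
  <Region>GA</Region>
  <Contest text="US Senate - Rep">
    <Choice text="Gary W. Black">
      <VoteType name="Election Day Votes">
        <County name="Appling" votes="401"/>
        <County name="Atkinson" votes="102"/>
      </VoteType>
    </Choice>
  </Contest>
</ElectionResult>
"""

root = et.fromstring(sample_xml)

# One tuple per (contest, choice, voting method, county) combination
rows = [
    (
        contest.get("text"),
        choice.get("text"),
        vote_type.get("name"),
        county.get("name"),
        int(county.get("votes")),
    )
    for contest in root.iter("Contest")
    for choice in contest.findall("./Choice")
    for vote_type in choice.findall("./VoteType")
    for county in vote_type.findall("./County")
]
print(rows)
```

Converting `votes` to `int` while parsing would make the later `astype(int)` step unnecessary.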

In [70]:
filtered_county_totals = df_county[~df_county['contest'].str.contains('superior court|party question|state court judge|district attorney|board of education', case=False, na=False)].copy()

In [71]:
filtered_county_totals.head()

Unnamed: 0,state,contest,choice,voting_method,county,num_votes,type
0,GA,US Senate - Rep,Gary W. Black,Election Day Votes,Appling,401,primary
1,GA,US Senate - Rep,Gary W. Black,Election Day Votes,Atkinson,102,primary
2,GA,US Senate - Rep,Gary W. Black,Election Day Votes,Bacon,158,primary
3,GA,US Senate - Rep,Gary W. Black,Election Day Votes,Baker,104,primary
4,GA,US Senate - Rep,Gary W. Black,Election Day Votes,Baldwin,279,primary


In [72]:
filtered_county_totals["num_votes"] = filtered_county_totals["num_votes"].astype(int)

In [73]:
sos_county_fips = create_fips_col("./FIPS/US_FIPS_Codes.csv", 'Georgia', filtered_county_totals, 'county')

In [74]:
sos_county_pvt = create_pivot_col(sos_county_fips, 'choice', 'contest', 'pivot')

In [75]:
sos_county_pvt.head()

Unnamed: 0,state,contest,choice,voting_method,county,num_votes,type,COUNTYFP,pivot
0,GA,US Senate - Rep,Gary W Black,Election Day Votes,Appling,401,primary,1,Gary W Black -:- US Senate - Rep
1,GA,US Senate - Rep,Gary W Black,Election Day Votes,Atkinson,102,primary,3,Gary W Black -:- US Senate - Rep
2,GA,US Senate - Rep,Gary W Black,Election Day Votes,Bacon,158,primary,5,Gary W Black -:- US Senate - Rep
3,GA,US Senate - Rep,Gary W Black,Election Day Votes,Baker,104,primary,7,Gary W Black -:- US Senate - Rep
4,GA,US Senate - Rep,Gary W Black,Election Day Votes,Baldwin,279,primary,9,Gary W Black -:- US Senate - Rep


In [76]:
sos_county_pvt['VEST'] = sos_county_pvt['pivot'].apply(lambda x: get_VEST(str(x).strip()))

In [77]:
sos_county_pvt['VEST'].nunique()

654

In [79]:
sos_county_totals_pvt =pd.pivot_table(sos_county_pvt,index=['county'],columns=['VEST'],values=['num_votes'],aggfunc=sum)
sos_county_totals_pvt = sos_county_totals_pvt.fillna(0)
sos_county_totals_pvt.columns = sos_county_totals_pvt.columns.droplevel(0)
sos_county_totals_pvt.reset_index(inplace = True)

In [80]:
#looking for 159
sos_county_totals_pvt.shape

(159, 655)

In [81]:
sos_county_totals_pvt.head()

VEST,county,P22AGRDDUK,P22AGRDHEM,P22AGRDSWA,P22AGRRHAR,P22ATGDJOR,P22ATGDSMI,P22ATGRCAR,P22ATGRGOR,P22COANBAR,...,PSU54RKEE,PSU54RPAY,PSU55DBUT,PSU55DODI,PSU56DTHO,PSU56RALB,S22SSCNBRI,S22SSCNCOL,S22SSCNLAG,S22SSCNMCM
0,Appling,153.0,200.0,119.0,3161.0,307.0,173.0,2310.0,1206.0,3438.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1113.0,2578.0,3439.0,3452.0
1,Atkinson,76.0,83.0,51.0,888.0,145.0,69.0,628.0,278.0,993.0,...,0.0,0.0,0.0,0.0,0.0,0.0,335.0,704.0,983.0,993.0
2,Bacon,45.0,64.0,46.0,2147.0,109.0,42.0,1468.0,669.0,2021.0,...,0.0,0.0,0.0,0.0,0.0,0.0,504.0,1660.0,2019.0,2024.0
3,Baker,230.0,18.0,15.0,370.0,138.0,87.0,369.0,75.0,573.0,...,0.0,0.0,0.0,0.0,0.0,0.0,197.0,416.0,570.0,573.0
4,Baldwin,735.0,1659.0,387.0,4171.0,1956.0,919.0,3527.0,1002.0,6638.0,...,0.0,0.0,0.0,0.0,0.0,0.0,2244.0,4731.0,6586.0,6623.0


In [None]:
cols_list = ga_22_primary_sw_pvt.columns.tolist()[4:]
cols_list_temp = cols_list[:3]
cols_list_temp

In [None]:
print("***Countywide Totals Check***")
print("")
print("")
diff_counties=[]
for race in cols_list_temp:
    rdh = ga_22_primary_sw_pvt.groupby('county')[race].sum().sum()
    partner = sos_county_pvt
    
#         diff = partner_df.groupby([county_col]).sum()[race]-source_df.groupby([county_col]).sum()[race]
#         for val in diff[diff != 0].index.values.tolist():
#             if val not in diff_counties:
#                 diff_counties.append(val)
#         if len(diff[diff != 0]!=0):   
#             print(race + " contains differences in these counties:")
#             for val in diff[diff != 0].index.values.tolist():
#                 county_differences = diff[diff != 0]
#                 print("\t"+val+" has a difference of "+str(county_differences[val])+" votes")
#                 print("\t\t"+ partner_name + ": "+str(partner_df.groupby([county_col]).sum().loc[val,race])+" votes")
#                 print("\t\t"+ source_name +": "+str(source_df.groupby([county_col]).sum().loc[val,race])+" votes")
#             if (full_print):
#                 for val in diff[diff == 0].index.values.tolist():
#                     county_similarities = diff[diff == 0]
#                     print("\t"+val + ": "+ str(partner_df.groupby([county_col]).sum().loc[val,race])+" votes")
#         else:
#             print(race + " is equal across all counties")
#             if (full_print):
#                 for val in diff[diff == 0].index.values.tolist():
#                     county_similarities = diff[diff == 0]
#                     print("\t"+val + ": "+ str(partner_df.groupby([county_col]).sum().loc[val,race])+" votes")
#     if (len(diff_counties)>0):
#         print()
#         print(diff_counties)
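
The county-level comparison sketched in the commented-out code can be reduced to a frame subtraction once both sources are indexed by county. The example below uses invented totals for a single race column; in the notebook the inputs would be the grouped RDH frame and `sos_county_totals_pvt`.

```python
import pandas as pd

# Invented county totals for a single race column
rdh_by_county = pd.DataFrame(
    {"county": ["Appling", "Bacon"], "P22USSRBLA": [401, 160]}
).set_index("county")
sos_by_county = pd.DataFrame(
    {"county": ["Appling", "Bacon"], "P22USSRBLA": [401, 158]}
).set_index("county")

# Subtract aligned on county and race; keep only counties with a nonzero difference
diff = rdh_by_county.sub(sos_by_county, fill_value=0)
differing = diff.loc[(diff != 0).any(axis=1)]
print(differing)
```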

In [82]:
ga_22_primary_sw_pvt.groupby('county')['PSU38DPET'].sum()

county
Appling            0
Atkinson           0
Bacon              0
Baker              0
Baldwin            0
Banks              0
Barrow             0
Bartow             0
Ben Hill           0
Berrien            0
Bibb               0
Bleckley           0
Brantley           0
Brooks             0
Bryan              0
Bulloch            0
Burke              0
Butts              0
Calhoun            0
Camden             0
Candler            0
Carroll            0
Catoosa            0
Charlton           0
Chatham            0
Chattahoochee      0
Chattooga          0
Cherokee           0
Clarke             0
Clay               0
Clayton            0
Clinch             0
Cobb             483
Coffee             0
Colquitt           0
Columbia           0
Cook               0
Coweta             0
Crawford           0
Crisp              0
Dade               0
Dawson             0
DeKalb             0
Decatur            0
Dodge              0
Dooly              0
Dougherty          0
Dougla

In [83]:
sos_county_pvt.head()

Unnamed: 0,state,contest,choice,voting_method,county,num_votes,type,COUNTYFP,pivot,VEST
0,GA,US Senate - Rep,Gary W Black,Election Day Votes,Appling,401,primary,1,Gary W Black -:- US Senate - Rep,P22USSRBLA
1,GA,US Senate - Rep,Gary W Black,Election Day Votes,Atkinson,102,primary,3,Gary W Black -:- US Senate - Rep,P22USSRBLA
2,GA,US Senate - Rep,Gary W Black,Election Day Votes,Bacon,158,primary,5,Gary W Black -:- US Senate - Rep,P22USSRBLA
3,GA,US Senate - Rep,Gary W Black,Election Day Votes,Baker,104,primary,7,Gary W Black -:- US Senate - Rep,P22USSRBLA
4,GA,US Senate - Rep,Gary W Black,Election Day Votes,Baldwin,279,primary,9,Gary W Black -:- US Senate - Rep,P22USSRBLA


In [85]:
# #sos_county_pvt.groupby('VEST', 'county')[num_votes].agg('sum')
# sosgroup = sos_county_pvt.groupby('county').sum()['']
# sosgroup.head()

In [None]:
df = ga_22_primary_sw_pvt
sos = sos_county_totals_pvt
grouped_rdh = df.groupby('county')[cols_list].agg('sum') 

In [None]:
sos = sos_county_totals_pvt
sos.head()

In [None]:
sos.columns.to_list()

In [None]:
# Sum only the vote column; summing 'county' would concatenate the county names
sos[['P22DJAC']].sum()

In [None]:
grouped_rdh[['P22DJAC']].sum()
#PSL132DPRI

In [116]:
def county_totals_check(partner_df, partner_name, source_df, source_name, column_list, county_col, full_print=False, method="race"):
    """Compares the totals of two election result dataframes at the county level

    Args:
      partner_df: DataFrame of election results we are comparing against
      partner_name: String of what to call the partner in the print statement
      source_df: DataFrame of election results we are comparing to
      source_name: String of what to call the source in the print statement
      column_list: List of races that there are votes for
      county_col: String of the column name that contains county information
      full_print: Boolean specifying whether to print out everything, including counties w/ similarities

    Returns:
      Nothing, only prints out an analysis
    """
    
    print("***Countywide Totals Check***")
    print("")
    
    if method == "race":
        diff_counties=[]
        for race in column_list:
            diff = partner_df.groupby([county_col]).sum()[race]-source_df.groupby([county_col]).sum()[race]
            for val in diff[diff != 0].index.values.tolist():
                if val not in diff_counties:
                    diff_counties.append(val)
            if len(diff[diff != 0]) != 0:
                print(race + " contains differences in these counties:")
                for val in diff[diff != 0].index.values.tolist():
                    county_differences = diff[diff != 0]
                    print("\t"+val+" has a difference of "+str(county_differences[val])+" votes")
                    print("\t\t"+ partner_name + ": "+str(partner_df.groupby([county_col]).sum().loc[val,race])+" votes")
                    print("\t\t"+ source_name +": "+str(source_df.groupby([county_col]).sum().loc[val,race])+" votes")
                if (full_print):
                    for val in diff[diff == 0].index.values.tolist():
                        county_similarities = diff[diff == 0]
                        print("\t"+val + ": "+ str(partner_df.groupby([county_col]).sum().loc[val,race])+" votes")
            else:
                print(race + " is equal across all counties")
                if (full_print):
                    for val in diff[diff == 0].index.values.tolist():
                        county_similarities = diff[diff == 0]
                        print("\t"+val + ": "+ str(partner_df.groupby([county_col]).sum().loc[val,race])+" votes")
        if (len(diff_counties)>0):
            print()
            diff_counties.sort()
            print(diff_counties)
    elif method == "county":
        if set(source_df[county_col].unique()) != set(partner_df[county_col].unique()):
            raise ValueError("Not all counties will be checked")
        diff_counties=[]
        good_counties=[]
        holder_1 = partner_df.groupby(county_col).sum()
        holder_2 = source_df.groupby(county_col).sum()
        for county in list(partner_df[county_col].unique()):
            no_diff = True
            for race in column_list:
                partner_val = holder_1.loc[county][race]
                source_val =  holder_2.loc[county][race]
                diff = partner_val - source_val
                if diff != 0:
                    if no_diff:
                        print(f"{county} contains differences in these races:")
                        no_diff = False
                    print(f"\t{race} has a difference of {diff} vote(s)")
                    print(f"\t\t{partner_name}: {partner_val} vote(s)")
                    print(f"\t\t{source_name}: {source_val} vote(s)")
            if no_diff:
                good_counties.append(county)
            else:
                diff_counties.append(county)
        if (len(diff_counties)>0):
            print()
            diff_counties.sort()
            print(diff_counties)
        print("Counties that match:")
        if (len(good_counties)>0):
            print()
            good_counties.sort()
            print(good_counties)
    else:
        raise ValueError("Enter a correct method: race or county")

In [118]:
rdh = ga_22_primary_sw_pvt
sos = sos_county_totals_pvt
partner_name = 'SOS'
source_name = 'RDH'
county_col = 'county'
county_totals_check(sos,partner_name, rdh, source_name, both_cols, county_col,full_print=False, method='race')

***Countywide Totals Check***

PSU44DDAV is equal across all counties
PSL174RCOR is equal across all counties
PSU43DAND is equal across all counties
PSL014RSCO is equal across all counties
PSU22RDAN is equal across all counties
PSL050RTRA is equal across all counties
PSL017RWOL is equal across all counties
PCON08RSCO is equal across all counties
PSL051DPAN is equal across all counties
PCON14RSTR is equal across all counties
PCON09RLON is equal across all counties
PSL030RSAN is equal across all counties
PSL077DBUR is equal across all counties
PSL074RMAT is equal across all counties
PPSC02DEDW is equal across all counties
PCON10RBRO is equal across all counties
PSU06DEST is equal across all counties
PSL164RSTE is equal across all counties
PSU47RCHA is equal across all counties
PSL096RLOW is equal across all counties
PSL089RSHE is equal across all counties
PSU11RBUR is equal across all counties
PSL175RLAH is equal across all counties
PSL068DNAG is equal across all counties
PCON07DMCB is e

PSL006RCOK is equal across all counties
PCON14RLUT is equal across all counties
P22LTGRMIL is equal across all counties
PSU25RSUL is equal across all counties
PSL018DRHU is equal across all counties
P22LTGRJON is equal across all counties
PSU06RMOO is equal across all counties
PSU32RKIR is equal across all counties
PSU34RSMI is equal across all counties
PSL139RSMI is equal across all counties
PSL006RRID is equal across all counties
PSL050DAU is equal across all counties
PSL178RCAR is equal across all counties
PCON04RCHA is equal across all counties
PCON14RSYN is equal across all counties
PSU03RJON is equal across all counties
PSL011RJAS is equal across all counties
PSL116DHOL is equal across all counties
PSL135RBRU is equal across all counties
P22SUPDMOR is equal across all counties
PCON13RHAW is equal across all counties
P22SOSDDAW is equal across all counties
PSL158DSMI is equal across all counties
PSL086RKIN is equal across all counties
P22LTGDHAY is equal across all counties
P22USS

PSL166RPET is equal across all counties
PSL061DKEM is equal across all counties
PSL024DWAL is equal across all counties
PSU14RHAU is equal across all counties
PSL049DGIL is equal across all counties
PSU27DBIN is equal across all counties
PSL157RWER is equal across all counties
P22GOVRKEM is equal across all counties
PSL056DCHA is equal across all counties
PSU18RVAN is equal across all counties
PCON10RSIM is equal across all counties
P22LABRBHA is equal across all counties
PPSC03DEDW is equal across all counties
PSL070RNUN is equal across all counties
PCON02RCHI is equal across all counties
PSL079DNEA is equal across all counties
PCON02RWES is equal across all counties
PSL018RSMI is equal across all counties
PSU37DPAR is equal across all counties
PSL005RBAR is equal across all counties
PSL009RWAD is equal across all counties
P22LABRCOA is equal across all counties
P22USSDWAR is equal across all counties
PSU29DWRI is equal across all counties
PSU38DCAR is equal across all counties
PSL169

In [122]:
rdh = ga_22_primary_sw_pvt
sos = sos_county_totals_pvt
partner_name = 'SOS'
source_name = 'RDH'
county_col = 'county'
county_totals_check(sos,partner_name, rdh, source_name, precincts_cols, county_col,full_print=False, method='county')

***Countywide Totals Check***

Cook contains differences in these races:
	S22SSCNMCM has a difference of -2773.0 vote(s)
		SOS: 0.0 vote(s)
		RDH: 2773 vote(s)
	S22SSCNCOL has a difference of -1962.0 vote(s)
		SOS: 0.0 vote(s)
		RDH: 1962 vote(s)
	P22COANMCF has a difference of -2762.0 vote(s)
		SOS: 0.0 vote(s)
		RDH: 2762 vote(s)
	S22SSCNBRI has a difference of -933.0 vote(s)
		SOS: 0.0 vote(s)
		RDH: 933 vote(s)
	P22COANPIP has a difference of -2734.0 vote(s)
		SOS: 0.0 vote(s)
		RDH: 2734 vote(s)
	S22SSCNLAG has a difference of -2763.0 vote(s)
		SOS: 0.0 vote(s)
		RDH: 2763 vote(s)
	P22COANBAR has a difference of -2761.0 vote(s)
		SOS: 0.0 vote(s)
		RDH: 2761 vote(s)
Pickens contains differences in these races:
	S22SSCNMCM has a difference of 9.0 vote(s)
		SOS: 6704.0 vote(s)
		RDH: 6695 vote(s)
	S22SSCNLAG has a difference of -9.0 vote(s)
		SOS: 6695.0 vote(s)
		RDH: 6704 vote(s)
Thomas contains differences in these races:
	PSL173RTAY has a difference of -5016.0 vote(s)
		SOS: 0.0

In [121]:
contest_updates_reversed['PSL105DMUG']

'Farooq Mughal -:- State House - District 105 - Dem'

In [99]:
rdh_cols = rdh.columns.to_list()[4:]

In [107]:
both_cols = set(rdh_cols) - set(diff)
len(both_cols)

532

In [101]:
len(rdh_cols)

656

In [102]:
len(sos_cols)

654

In [105]:
diff = set(rdh_cols) - set(sos_cols)
len(diff)

124
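
The column bookkeeping in these cells (shared columns, columns only in RDH) can be gathered into one helper. This is a sketch only: `compare_columns` is a hypothetical name, not something defined in this notebook. If the names in each list are unique, the counts shown here (656 RDH columns, 654 SOS columns, 124 only in RDH) would also imply 532 shared columns and 122 columns found only in SOS.

```python
def compare_columns(left_cols, right_cols):
    """Split two collections of column names into shared and one-sided groups."""
    left, right = set(left_cols), set(right_cols)
    return {
        "both": sorted(left & right),        # present in both sources
        "left_only": sorted(left - right),   # e.g. only in the RDH pivot
        "right_only": sorted(right - left),  # e.g. only in the SOS pivot
    }


# Hypothetical column names for illustration.
groups = compare_columns(["a", "b", "c"], ["b", "c", "d"])
print({name: len(cols) for name, cols in groups.items()})
```

Sorting each group gives a stable order, so passing `groups["both"]` to a totals check would print races in the same order on every run, unlike iterating over a raw `set`.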

In [100]:
sos_cols = sos.columns.to_list()[1:]

In [89]:
sos.head()

VEST,county,P22AGRDDUK,P22AGRDHEM,P22AGRDSWA,P22AGRRHAR,P22ATGDJOR,P22ATGDSMI,P22ATGRCAR,P22ATGRGOR,P22COANBAR,...,PSU54RKEE,PSU54RPAY,PSU55DBUT,PSU55DODI,PSU56DTHO,PSU56RALB,S22SSCNBRI,S22SSCNCOL,S22SSCNLAG,S22SSCNMCM
0,Appling,153.0,200.0,119.0,3161.0,307.0,173.0,2310.0,1206.0,3438.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1113.0,2578.0,3439.0,3452.0
1,Atkinson,76.0,83.0,51.0,888.0,145.0,69.0,628.0,278.0,993.0,...,0.0,0.0,0.0,0.0,0.0,0.0,335.0,704.0,983.0,993.0
2,Bacon,45.0,64.0,46.0,2147.0,109.0,42.0,1468.0,669.0,2021.0,...,0.0,0.0,0.0,0.0,0.0,0.0,504.0,1660.0,2019.0,2024.0
3,Baker,230.0,18.0,15.0,370.0,138.0,87.0,369.0,75.0,573.0,...,0.0,0.0,0.0,0.0,0.0,0.0,197.0,416.0,570.0,573.0
4,Baldwin,735.0,1659.0,387.0,4171.0,1956.0,919.0,3527.0,1002.0,6638.0,...,0.0,0.0,0.0,0.0,0.0,0.0,2244.0,4731.0,6586.0,6623.0


In [None]:
print('Number of unique contest+candidates in SOS statewide CSV: ' + str(sos_totals['VEST'].nunique()))
print('Number of unique contest+candidates in SOS counties XML: ' + str(sos_county_pvt['VEST'].nunique()))
print('Number of unique contests+candidates in RDH compiled precincts XML: ' + str(ga_22_primary_sw['VEST'].nunique()))
print('Number of unique contests in RDH compiled precincts XML: ' + str(ga_22_primary_sw['contest'].nunique()))
print('Number of unique candidates in RDH compiled precincts XML: ' + str(ga_22_primary_sw['choice'].nunique()))

In [None]:
ga_22_primary_sw['VEST'] = ga_22_primary_sw['pivot'].apply(lambda x: get_VEST(str(x).strip()))

In [None]:
set((sos_totals['VEST'].unique())) - set((sos_county_pvt['VEST'].unique()))

In [None]:
set(ga_22_primary_sw_pvt['VEST'].unique()) - set((sos_totals['VEST'].unique()))

<p><a name="readme"></a></p>

## Create README

## Export Dataset

Double-checking the statewide summary CSV

In [123]:
check_state = pd.read_csv("./raw-from-source/summary/summary.csv")

In [130]:
check_sos = pd.concat([primary_sos_totals])

In [131]:
sos_totals = create_pivot_col(check_sos, 'choice name', 'contest name', 'pivot_col')

In [134]:
sos_totals.shape

(803, 13)

In [137]:
choice_list = df_county['choice'].to_list()

In [139]:
'Darlene Taylor' in choice_list
'Darlene Taylor (I)' in choice_list

True

In [160]:
df173 = df_county[df_county['choice'] == 'Darlene Taylor (I)']

In [161]:
df173

Unnamed: 0,state,contest,choice,voting_method,county,num_votes,type
53692,GA,State House of Representatives - District 173 ...,Darlene Taylor (I),Election Day Votes,Grady,969,primary
53693,GA,State House of Representatives - District 173 ...,Darlene Taylor (I),Absentee by Mail Votes,Grady,43,primary
53694,GA,State House of Representatives - District 173 ...,Darlene Taylor (I),Advanced Voting Votes,Grady,716,primary
53695,GA,State House of Representatives - District 173 ...,Darlene Taylor (I),Provisional Votes,Grady,2,primary


In [154]:
ga_22_primary_sw_pvt[ga_22_primary_sw_pvt['county'] == 'Pickens']

pivot,UNIQUE_ID,county,COUNTYFP,precinct,PSU38DPET,PSL071DOKA,PSL168DWIL,PSL153DWYN,PSL033RPOW,PCON10RSIM,...,PSL126RHAR,PSL074DHAR,PSL088RFRE,P22LABDBOD,PSL044DOYO,P22AGRDDUK,PSL079DNEA,PCON07RNYG,PSL101RPRO,PSL140DBAK
2266,227-Appalachian,Pickens,227,Appalachian,0,0,0,0,0,0,...,0,0,0,13,0,10,0,0,0,0
2267,227-Hill City,Pickens,227,Hill City,0,0,0,0,0,0,...,0,0,0,8,0,11,0,0,0,0
2268,227-Hinton,Pickens,227,Hinton,0,0,0,0,0,0,...,0,0,0,1,0,5,0,0,0,0
2269,227-Jasper,Pickens,227,Jasper,0,0,0,0,0,0,...,0,0,0,9,0,12,0,0,0,0
2270,227-Jerusalem,Pickens,227,Jerusalem,0,0,0,0,0,0,...,0,0,0,1,0,5,0,0,0,0
2271,227-Ludville,Pickens,227,Ludville,0,0,0,0,0,0,...,0,0,0,4,0,3,0,0,0,0
2272,227-Nelson,Pickens,227,Nelson,0,0,0,0,0,0,...,0,0,0,11,0,8,0,0,0,0
2273,227-Refuge,Pickens,227,Refuge,0,0,0,0,0,0,...,0,0,0,6,0,8,0,0,0,0
2274,227-Sharptop,Pickens,227,Sharptop,0,0,0,0,0,0,...,0,0,0,11,0,13,0,0,0,0
2275,227-Talking Rock,Pickens,227,Talking Rock,0,0,0,0,0,0,...,0,0,0,7,0,8,0,0,0,0


In [149]:
ga_22_primary_sw_pvt[['county', 'precinct', 'S22SSCNMCM']]

pivot,county,precinct,S22SSCNMCM
0,Appling,1B,460
1,Appling,1C,345
2,Appling,2,451
3,Appling,3A1,299
4,Appling,3C,500
5,Appling,4B,324
6,Appling,4D,533
7,Appling,5A,267
8,Appling,5B,273
9,Atkinson,Axson,164


In [167]:
prec_173 = ga_22_primary_sw_pvt[ga_22_primary_sw_pvt['PSL173RTAY'] != 0]

In [173]:
prec_173[['county','precinct','PSL173RTAY']]

pivot,county,precinct,PSL173RTAY
1663,Grady,Cairo 4th District,120
1664,Grady,Cairo 5th District,407
1665,Grady,Duncanville,77
1668,Grady,Midway,328
1669,Grady,Pine Park,160
1674,Grady,Woodland,638
2484,Thomas,Barwick,156
2485,Thomas,Boston,287
2486,Thomas,Central,691
2488,Thomas,Douglass,107


In [175]:
prec_173.groupby('county')['PSL173RTAY'].sum()

county
Grady     1730
Thomas    5016
Name: PSL173RTAY, dtype: int32