# C11 get C11 voter data

For the city-only data, I hand'cleaned the file.
* I removed these partial cities: Boonton NJ (34027066, Chatham NJ (34027121, Mendham NJ (34027453, Rockaway NJ (34027640

I cleaned up placenames (all corrections were done in the voter data preprocessing so we could reuse the C11 lists): these places were in the C11 list, but not on the list of towns in the votes file: 
* ESSEX: totally clean
* MORRIS: Missing Hanover, Jefferson, Morris, Victory Gardens.  No idea why; will look for them later. 
* PASSAIC: totally clean
* SUSSEX: Missing Byram.  No idea why; will look for it later.

Totals for all parties: 506961
* Found 199255 people in Essex
* Found 187874 people in Morris
* Found 88037 people in Passaic
* Found 31795 people in Sussex

Totals for just DEMs: 160993
* Found 83302 people in Essex
* Found 47634 people in Morris
* Found 23301 people in Passaic
* Found 6756 people in Sussex

In [1]:
import pandas as pd
import csv
import zipfile
import os
import glob
import csv 
import math
import njvotes
pd.set_option("display.max_columns", 999)


# Get data headers
nametypes = njvotes.get_voteheadertypes()

In [12]:
# Set these flags to change what comes out.  
parties = 'DEM' # Parties = "DEM" gives only democrats; ditto "REP" for republicans
areas = 'town' # Areas=town gives only towns (that's the only option at the moment)
phones = '10digit' # Set to 'all' if you want all phone numbers; 10 if you want only 10digit ones

# Create master list of name, phone number, county for this one district
dfcities = pd.read_csv('../2017 Other data/ncjd-2011-district11cities-handcleaned.csv')
dfcities['City'] = dfcities['City'].str.title().str.strip()

allvoters = pd.DataFrame([])
for county in dfcities['County'].unique():
    
    # Get voting data from file
    print('Getting voter data for {}'.format(county))
    df = pd.read_csv('../2017 Voting data/'+county.upper()+'/ElectionHistory_cleaned.csv', 
                     dtype=nametypes)
    df = df.fillna('')
    df['county'] = county.title()

    # Only keep data for the cities we're interested in
    getcities = dfcities[dfcities['County'] == county]['City'].unique().tolist()
    df = df[df['city'].isin(getcities)]
    print('{}. Looking for these cities: {}\nFound: {}'.format(county, getcities, df['city'].unique()))
    
    # Only keep one record per voter (we don't care about how they voted at this point)
    df.drop_duplicates(subset='voter id', inplace=True, keep='last')
    print('{} Found {} people'.format(county, len(df)))

    # Do any asked-for filtering
    if parties != 'all':
        df = df[df['party code'] == parties]

    if phones == '10digit':
        df.loc[df['phone number'].str.len() < 10, ['phone number']] = ''
    
    # Format and add to stack
    df = njvotes.format_for_nationbuilder(df)
    allvoters = allvoters.append(df)

outfile = '../2017 voter data/C11voters_{}parties_{}area_{}phone.csv'.format(parties, areas, phones)
allvoters.to_csv(outfile, index=False)

Getting voter data for Essex
Essex. Looking for these cities: ['Bloomfield', 'Caldwell', 'Cedar Grove', 'Essex Fells', 'Fairfield', 'Livingston', 'Montclair', 'North Caldwell', 'Nutley', 'Roseland', 'Verona', 'West Caldwell', 'West Orange']
Found: ['Bloomfield' 'Caldwell' 'Cedar Grove' 'Essex Fells' 'Fairfield'
 'Livingston' 'Montclair' 'North Caldwell' 'Nutley' 'Roseland' 'Verona'
 'West Caldwell' 'West Orange']
Essex Found 199256 people
Getting voter data for Morris
Morris. Looking for these cities: ['Boonton', 'Butler', 'Chatham', 'Denville', 'East Hanover', 'Florham Park', 'Hanover', 'Harding', 'Jefferson', 'Kinnelon', 'Lincoln Park', 'Madison', 'Mendham', 'Montville', 'Morris', 'Morris Plains', 'Morristown', 'Mountain Lakes', 'Parsippany-Troy Hills', 'Pequannock', 'Randolph', 'Riverdale', 'Rockaway', 'Victory Gardens']
Found: ['Boonton' 'Denville' 'Butler' 'Kinnelon' 'Chatham' 'Morristown' 'Madison'
 'Randolph' 'Mendham' 'Rockaway' 'Mountain Lakes' 'Morris Plains'
 'East Hanover' 

# Code used to check and clean for code above

In [13]:
dfcities = pd.read_csv('../2017 Other data/ncjd-2011-district11cities-handcleaned.csv')
counties = dfcities['County'].unique()
print('All counties in C11: {}'.format(counties))

for county in counties:
    
    # Show data for this county
    print('{}'.format(dfcities[dfcities['County'].isin([county])]))
    
    # Get list of towns in C11 for this county
    c11towns = dfcities[dfcities['County'] == 'Sussex']['City'].unique().tolist()
    print('{} towns in C11: {}'.format(county, c11towns))
    
    # Get list of towns in this dataset
    dfcounty = pd.read_csv('../2017 Voting data/'+county.upper()+'/ElectionHistory_cleaned.csv', 
                           dtype=nametypes)
    towns = dfcounty['city'].astype(str).unique().tolist()
    towns.sort()
    print('{} towns in voting data: {}'.format(county, towns))

All counties in C11: ['Essex' 'Morris' 'Passaic' 'Sussex']
    District County             City  Population
0         11  Essex       Bloomfield       22835
1         11  Essex         Caldwell        7822
2         11  Essex      Cedar Grove       12411
3         11  Essex      Essex Fells        2113
4         11  Essex        Fairfield        7466
5         11  Essex       Livingston       29366
6         11  Essex        Montclair       11299
7         11  Essex   North Caldwell        6183
8         11  Essex           Nutley       28370
9         11  Essex         Roseland        5819
10        11  Essex           Verona       13332
11        11  Essex    West Caldwell       10759
12        11  Essex      West Orange       28085
Essex towns in C11: [' Byram', ' Hopatcong', ' Ogdensburg', ' Sparta', ' Stanhope']
Essex towns in voting data: ['Avenel', 'Belleville', 'Bloomfield', 'Caldwell', 'Cedar Grove', 'Clark', 'Columbia', 'Denville', 'East Orange', 'Eatontown', 'Elizabeth', 'En

In [11]:
df2 = pd.read_csv('../2017 Voting data/MORRIS/ElectionHistory_cleaned.csv', dtype=nametypes)
print('{}'.format(df2.columns))
df2.head()

Index(['voter id', 'status code', 'party code', 'last name', 'first name',
       'middle name', 'prefix', 'suffix', 'sex', 'street number', 'suffix a',
       'suffix b', 'street name', 'apt/unit no.', 'address line 1',
       'address line 2', 'city', 'state', 'zip5', 'zip4',
       'mailing street number', 'mailing suffix a', 'mailing suffix b',
       'mailing street name', 'mailing apt/unit no.', 'mailing address line 1',
       'mailing address line 2', 'mailing city', 'mailing state',
       'mailing country', 'mailing zip code', 'birth date', 'date registered',
       'county precinct', 'municipality', 'ward', 'district', 'phone number',
       'election date', 'election name', 'election type', 'election category',
       'ballot type'],
      dtype='object')


Unnamed: 0,voter id,status code,party code,last name,first name,middle name,prefix,suffix,sex,street number,suffix a,suffix b,street name,apt/unit no.,address line 1,address line 2,city,state,zip5,zip4,mailing street number,mailing suffix a,mailing suffix b,mailing street name,mailing apt/unit no.,mailing address line 1,mailing address line 2,mailing city,mailing state,mailing country,mailing zip code,birth date,date registered,county precinct,municipality,ward,district,phone number,election date,election name,election type,election category,ballot type
0,116002492,A,REP,Afonso,Angela,,,,N,3,,,Park Rd,,,,Boonton Township,NJ,7005,,,,,,,,,,,,,04/22/1964,08/10/2004,11600040,Boonton Township,0,1,9733165874,11/06/2007,State-General Election,GEN,S,M
1,116002492,A,REP,Afonso,Angela,,,,N,3,,,Park Rd,,,,Boonton Township,NJ,7005,,,,,,,,,,,,,04/22/1964,08/10/2004,11600040,Boonton Township,0,1,9733165874,11/04/2008,General Election,GEN,S,M
2,116002492,A,REP,Afonso,Angela,,,,N,3,,,Park Rd,,,,Boonton Township,NJ,7005,,,,,,,,,,,,,04/22/1964,08/10/2004,11600040,Boonton Township,0,1,9733165874,04/15/2008,State-School Election,ANS,S,M
3,116002492,A,REP,Afonso,Angela,,,,N,3,,,Park Rd,,,,Boonton Township,NJ,7005,,,,,,,,,,,,,04/22/1964,08/10/2004,11600040,Boonton Township,0,1,9733165874,11/03/2009,State General Election,GEN,S,M
4,116002492,A,REP,Afonso,Angela,,,,N,3,,,Park Rd,,,,Boonton Township,NJ,7005,,,,,,,,,,,,,04/22/1964,08/10/2004,11600040,Boonton Township,0,1,9733165874,04/20/2010,State-School Election,ANS,S,M
