# Election 2016
___
2018 | Bernard Kung
___

A fun exploration of 2016 US election turnout data using Voting-Age Population (VAP) and building some choropleths!

Here I explore both the number and percentage of VAP that cast ballots with a vote for President.

### Initializing Workspace
___

In [83]:
import pandas as pd
import numpy as np
import plotly.graph_objs as go
from plotly.offline import init_notebook_mode, iplot

In [84]:
init_notebook_mode(connected=True) 

### Loading and Cleaning Data
___

Data for this project comes from the United States Elections Project [[1](#Sources)]. As a piece of trivia, Jose Portilla uses the 2012 version of this data in the choropleth lectures of his Python for Data Science and Machine Learning Bootcamp on [Udemy](https://www.udemy.com/python-for-data-science-and-machine-learning-bootcamp/).


When reading in the data:

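* The file has a two-row header; only the second row is needed, hence _header=1_.
* Only the first 52 rows are read (_nrows=52_): the national total, the 50 states, and the District of Columbia.
* Commas are stripped from numeric columns by the _thousands_ argument of _read\_csv()_.
* The State column name is added manually because its cell is blank in the second header row.
* Whitespace is removed from column names.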

In [85]:
election_data = pd.read_csv(r'..\data\2016_November_General_Election.csv',
                            header= 1, nrows= 52, thousands=r',')
election_data.rename(columns={'Unnamed: 0':'State'}, inplace= True)
election_data.rename(columns=lambda x: x.replace(' ',''), inplace= True)
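
To see what these _read\_csv()_ arguments do without the original file, here is a minimal sketch on a made-up CSV with the same two-row header layout. The sample text and the names `raw` and `sample` are illustrative assumptions, not the real data.

```python
import io
import pandas as pd

# Made-up CSV: a group-header row, the real header row, two data rows, a footnote.
raw = (
    ',Turnout Rates,Numerators\n'
    ',VAP Highest Office,Highest Office\n'
    'Alabama,56.30%,"2,123,372"\n'
    'Alaska,57.40%,"318,608"\n'
    'Footnote row that should not be read,,\n'
)

# header=1 takes column names from the second row, nrows=2 stops before the
# footnote, and thousands=',' parses "2,123,372" as the integer 2123372.
sample = pd.read_csv(io.StringIO(raw), header=1, nrows=2, thousands=',')

# The blank first header cell is named 'Unnamed: 0' by pandas; rename it,
# then strip spaces from every column name.
sample = sample.rename(columns={'Unnamed: 0': 'State'})
sample = sample.rename(columns=lambda name: name.replace(' ', ''))
```

The percentage column is still text at this point, which is why the cleanup step that follows is needed.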

In [86]:
election_data.head()

Unnamed: 0,State,StateResultsWebsite,Status,VEPTotalBallotsCounted,VEPHighestOffice,VAPHighestOffice,TotalBallotsCounted(Estimate),HighestOffice,Voting-EligiblePopulation(VEP),Voting-AgePopulation(VAP),%Non-citizen,Prison,Probation,Parole,TotalIneligibleFelon,OverseasEligible,StateAbv
0,United States,,,60.20%,59.30%,54.70%,138846571.0,136700729,230585915,250055734,8.40%,1456032,2254727,508576,3249802,4739596.0,
1,Alabama,http://www.alabamavotes.gov/downloads/election...,Official,59.30%,59.00%,56.30%,2134061.0,2123372,3601361,3770142,2.60%,30627,56700,8138,71084,,AL
2,Alaska,http://www.elections.alaska.gov/results/16GENR/,Official,61.80%,61.30%,57.40%,321271.0,318608,519849,555367,4.30%,5338,7077,2210,11582,,AK
3,Arizona,http://apps.azsos.gov/election/2016/General/Of...,Official,56.20%,55.00%,48.90%,2661497.0,2604657,4734313,5331034,9.50%,38068,76005,7379,88770,,AZ
4,Arkansas,http://results.enr.clarityelections.com/AR/639...,Official,53.10%,52.80%,49.40%,1137772.0,1130635,2142571,2286625,3.80%,17405,28900,23093,56971,,AR


The turnout and non-citizen columns (VEPTotalBallotsCounted, VEPHighestOffice, VAPHighestOffice, %Non-citizen) are read in as strings with a trailing % sign, which has to be removed before they can be converted to numbers. NaN values complicate this, because they are not strings and cannot be handled by string methods.

My strategy to do so involves:

1. Select the columns needed into a dataframe to improve legibility. 
2. Use _.notnull()_ to avoid NaN entries.
3. Use _.apply()_ to apply _.replace()_ to replace % signs. 

In [87]:
percent_data = election_data[['VEPTotalBallotsCounted',
                              'VEPHighestOffice',
                              'VAPHighestOffice',
                              '%Non-citizen']].copy()  # explicit copy avoids SettingWithCopyWarning

In [88]:
for col in percent_data.columns:
    # Strip '%' only from non-null entries; NaN is a float and has no .replace()
    has_value = percent_data[col].notnull()
    percent_data.loc[has_value, col] = percent_data.loc[has_value, col].apply(lambda x: x.replace('%', ''))

In [89]:
percent_data = percent_data.astype(np.float64)

In [90]:
election_data[percent_data.columns] = percent_data
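
As an alternative to the loop and mask used here, pandas string methods pass NaN through untouched, so the same cleanup can be sketched in one step. The frame `demo` below is made-up data, and this is only one possible approach, not the method used in this notebook.

```python
import numpy as np
import pandas as pd

# Made-up frame with percent strings and missing values.
demo = pd.DataFrame({'VAPHighestOffice': ['54.70%', '56.30%', np.nan],
                     '%Non-citizen': ['8.40%', np.nan, '4.30%']})

# .str.rstrip('%') leaves NaN as NaN, so no notnull() mask is needed;
# pd.to_numeric then converts the remaining strings to floats.
cleaned = demo.apply(lambda col: pd.to_numeric(col.str.rstrip('%')))
```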

In [91]:
election_data.dtypes

State                              object
StateResultsWebsite                object
Status                             object
VEPTotalBallotsCounted            float64
VEPHighestOffice                  float64
VAPHighestOffice                  float64
TotalBallotsCounted(Estimate)     float64
HighestOffice                       int64
Voting-EligiblePopulation(VEP)      int64
Voting-AgePopulation(VAP)           int64
%Non-citizen                      float64
Prison                              int64
Probation                           int64
Parole                              int64
TotalIneligibleFelon                int64
OverseasEligible                  float64
StateAbv                           object
dtype: object

In [92]:
election_data.head()

Unnamed: 0,State,StateResultsWebsite,Status,VEPTotalBallotsCounted,VEPHighestOffice,VAPHighestOffice,TotalBallotsCounted(Estimate),HighestOffice,Voting-EligiblePopulation(VEP),Voting-AgePopulation(VAP),%Non-citizen,Prison,Probation,Parole,TotalIneligibleFelon,OverseasEligible,StateAbv
0,United States,,,60.2,59.3,54.7,138846571.0,136700729,230585915,250055734,8.4,1456032,2254727,508576,3249802,4739596.0,
1,Alabama,http://www.alabamavotes.gov/downloads/election...,Official,59.3,59.0,56.3,2134061.0,2123372,3601361,3770142,2.6,30627,56700,8138,71084,,AL
2,Alaska,http://www.elections.alaska.gov/results/16GENR/,Official,61.8,61.3,57.4,321271.0,318608,519849,555367,4.3,5338,7077,2210,11582,,AK
3,Arizona,http://apps.azsos.gov/election/2016/General/Of...,Official,56.2,55.0,48.9,2661497.0,2604657,4734313,5331034,9.5,38068,76005,7379,88770,,AZ
4,Arkansas,http://results.enr.clarityelections.com/AR/639...,Official,53.1,52.8,49.4,1137772.0,1130635,2142571,2286625,3.8,17405,28900,23093,56971,,AR


### VAP Highest Office Choropleth
___
This plot shows the percentage of the Voting-Age Population that cast a ballot for President (VAPHighestOffice).

In [131]:
# Named percent_trace so the percent_data DataFrame above is not overwritten
percent_trace = dict(type = 'choropleth',
            colorscale = 'Blues',
            locations = election_data['StateAbv'],
            locationmode= 'USA-states',
            text= election_data['State'],
            z= election_data['VAPHighestOffice'],
            reversescale = True,
            colorbar = {'title': '% VAP'})
percent_layout = dict(geo= {'scope':'usa'}, title= '2016 US Presidential Election Turnout Rate')
percent_choromap = go.Figure(data = [percent_trace], layout = percent_layout)

In [132]:
iplot(percent_choromap)

### Highest Office Vote Count Choropleth
___
By comparison, this plot shows the actual number of ballots cast for President (HighestOffice) in each state.

The data contains a national total row, United States, which is removed before plotting. Optionally, District of Columbia can also be removed.

In [95]:
remove_rows = ['United States']
election_data2 = election_data[~election_data['State'].isin(remove_rows)]
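
Dropping the District of Columbia as well only requires one more entry in the list. Here is a minimal sketch on a made-up frame; the names `states` and `states_only` and the sample values are assumptions for illustration.

```python
import pandas as pd

# Made-up stand-in for election_data with just the columns needed here.
states = pd.DataFrame({'State': ['United States', 'Alabama', 'District of Columbia'],
                       'StateAbv': [None, 'AL', 'DC']})

# isin() marks the rows to drop; ~ inverts the mask to keep everything else.
remove_rows = ['United States', 'District of Columbia']
states_only = states[~states['State'].isin(remove_rows)].copy()
```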

In [137]:
VAP_data = dict(type = 'choropleth',
            colorscale = 'Reds',
            reversescale = False,
            locations = election_data2['StateAbv'],
            locationmode= 'USA-states',
            text= election_data2['State'],
            z= election_data2['HighestOffice'],
            colorbar = {'title': 'Votes'})
VAP_layout = dict (geo= {'scope':'usa'}, title= '2016 US Presidential Election Turnout Numbers')
VAP_choromap = go.Figure(data = [VAP_data],layout = VAP_layout)

In [138]:
iplot(VAP_choromap)
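
The two trace dictionaries differ only in the column plotted, the color scale, and the colorbar title, so they could be built by one small function. `make_choropleth_trace` below is a hypothetical helper, not part of plotly, and the frame `demo` is made-up data.

```python
import pandas as pd

def make_choropleth_trace(df, z_col, colorscale, colorbar_title, reversescale=False):
    """Build a US-states choropleth trace dict (hypothetical helper)."""
    return dict(type='choropleth',
                colorscale=colorscale,
                reversescale=reversescale,
                locations=df['StateAbv'],
                locationmode='USA-states',
                text=df['State'],
                z=df[z_col],
                colorbar={'title': colorbar_title})

# Made-up two-row frame standing in for election_data.
demo = pd.DataFrame({'State': ['Alabama', 'Alaska'],
                     'StateAbv': ['AL', 'AK'],
                     'VAPHighestOffice': [56.3, 57.4]})

trace = make_choropleth_trace(demo, 'VAPHighestOffice', 'Blues', '% VAP',
                              reversescale=True)
# To render, pass the dict to go.Figure(data=[trace], layout=...) and call iplot().
```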

### Archive Code
___

The original data has a partial multi-index header. Its grouping is preserved here for reference, as a guide to what each column means.

In [98]:
column_key = {'Turnout Rates':['VEPTotalBallotsCounted','VEPHighestOffice','VAPHighestOffice'],
              'Numerators':['TotalBallotsCounted(Estimate)','HighestOffice'],
              'Denominators':['Voting-EligiblePopulation(VEP)','Voting-AgePopulation(VAP)'],
              'VEPComponents':['%Non-citizen','Prison','Probation','Parole','TotalIneligibleFelon','OverseasEligible']}
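
The grouping suggests that each turnout rate is a numerator divided by a denominator. A quick sketch under that assumption, using values copied from the first rows of the table shown earlier, reproduces the VAPHighestOffice figures for those states (56.3, 57.4, 48.9). The column name `VAPHighestOfficeCheck` is made up for this check.

```python
import pandas as pd

# Values copied from the election_data.head() output above.
rows = pd.DataFrame({'State': ['Alabama', 'Alaska', 'Arizona'],
                     'HighestOffice': [2123372, 318608, 2604657],
                     'Voting-AgePopulation(VAP)': [3770142, 555367, 5331034]})

# Assumed relationship: rate (%) = numerator / denominator * 100.
rows['VAPHighestOfficeCheck'] = (
    rows['HighestOffice'] / rows['Voting-AgePopulation(VAP)'] * 100
).round(1)
```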

### Sources 
___
[1] McDonald, Michael P. "2016 November General Election Turnout Rates." United States Elections Project. http://www.electproject.org/2016g


