# U.S. Presidential Election–Choropleth

## <span style='color:red'>NOTE: In order to upload this file to GitHub, I had to suppress the outputs of the choropleth plots to reduce the file size. Please feel free to download the notebook and run it on your own.</span>

* **Author:** Brian P. Josey
* **Date Created:** 2021-01-06
* **Date Modified:** 2021-05-06
* **Language:** Python 3.8.3

This is a quick data visualization script that generates county-level choropleths of the vote margins in the U.S. presidential elections from 2008 through 2020. A positive margin means that the Republican candidate received more votes than the Democratic candidate. These visualizations make extensive use of the `choropleth()` function from Plotly Express, for which a good tutorial is located [here](https://plotly.com/python/choropleth-maps/).

In [1]:
# Import essential packages
import numpy as np
import pandas as pd
from urllib.request import urlopen
import json

# Data visualization and plotting
import matplotlib.pyplot as plt
import plotly.express as px

# Filter warnings
import warnings
warnings.filterwarnings('ignore')

## <span style='color:blue'>Choropleth for the 2020 Election</span>

The vote margins were calculated in the "Data Wrangling" notebook, where I defined them as

$$
\mathrm{Margin} = \frac{V_{\mathrm{GOP}} - V_{\mathrm{DNC}}}{V_{\mathrm{GOP}} + V_{\mathrm{DNC}}}
$$

where $V_{\mathrm{GOP}}$ is the number of votes for the Republican candidate and $V_{\mathrm{DNC}}$ is the number of votes for the Democratic candidate. The margin therefore ranges from $-1$ (every two-party vote went to the Democrat) to $+1$ (every two-party vote went to the Republican). Note that I discounted third-party candidates because the last election in which a third-party candidate had a significant impact was 1992, when Ross Perot siphoned votes from George H.W. Bush, allowing Bill Clinton to win. That election is outside the scope of this project.
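As a quick check of the definition above, here is a minimal sketch of the margin calculation. The function name `two_party_margin` and the vote counts are made up for illustration; the actual calculation lives in the "Data Wrangling" notebook and may be written differently.

```python
def two_party_margin(votes_gop, votes_dem):
    """Two-party vote margin; positive values favor the Republican candidate."""
    total = votes_gop + votes_dem
    if total == 0:
        # No two-party votes recorded, so the margin is undefined
        raise ValueError("margin is undefined when both vote counts are zero")
    return (votes_gop - votes_dem) / total


# Hypothetical county: 6,000 Republican votes and 4,000 Democratic votes
example_margin = two_party_margin(6000, 4000)
print(example_margin)  # 0.2
```

Swapping the two vote counts flips the sign, which is why blue counties show up as negative margins in the table below.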

First I will create a choropleth for the 2020 election as a way to troubleshoot the process, and then generalize it for any election.

In [2]:
# Read Data
vote_margin = pd.read_csv('../data/processed/margins.csv')

# Left-pad FIPS codes with zeros to five digits (e.g. 1001 -> '01001')
vote_margin['fips'] = vote_margin['fips'].map("{:05}".format)

vote_margin.sample(15)

Unnamed: 0,fips,margin_08,margin_12,margin_16,margin_20
208,39041,0.196124,0.237195,0.160598,0.068383
177,47077,0.42902,0.487713,0.62151,0.641015
1336,48113,-0.153006,-0.154164,-0.262413,-0.316285
2060,48227,0.462793,0.581825,0.55994,0.584473
265,48051,0.374194,0.459489,0.550756,0.57556
2493,51735,0.492711,0.511834,0.490582,0.451954
2944,45033,-0.114333,-0.162748,-0.016672,0.011144
2899,18135,0.087805,0.240479,0.484793,0.523848
1126,21235,0.476576,0.577671,0.671631,0.652188
2626,47137,0.348933,0.407195,0.568417,0.633231
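The zero-padding step in the cell above is needed because pandas reads the FIPS column as integers, which drops the leading zero from codes in states such as Alabama (`01001` becomes `1001`). A minimal sketch of two equivalent ways to recover the five-character codes follows. The CSV text here is a made-up stand-in for `margins.csv`, and reading the column with `dtype=str` is an alternative I am suggesting, not something the original notebook does.

```python
import io

import pandas as pd

# Hypothetical two-row stand-in for margins.csv
csv_text = "fips,margin_20\n1001,0.45\n39041,0.07\n"

# Option 1 (as in the notebook): read as integers, then left-pad with zeros
as_int = pd.read_csv(io.StringIO(csv_text))
padded = as_int['fips'].map('{:05}'.format)

# Option 2 (an assumption, not the author's code): read as strings, then zfill
as_str = pd.read_csv(io.StringIO(csv_text), dtype={'fips': str})
zfilled = as_str['fips'].str.zfill(5)

print(padded.tolist())   # ['01001', '39041']
print(zfilled.tolist())  # ['01001', '39041']
```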


In [3]:
# Create Choropleth for a single election
# Load map of counties
with urlopen('https://raw.githubusercontent.com/plotly/datasets/master/geojson-counties-fips.json') as response:
    counties = json.load(response)


# Plot election results

## COMMENTED OUT BELOW HERE!
#fig = px.choropleth(vote_margin, geojson = counties, locations = "fips", color = "margin_20",
#                   color_continuous_scale="bluered",
#                   range_color = (-1, 1),
#                   scope = "usa",
#                   labels = {"margin_20":"Margin"}
#                   )
#fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
#fig.show()

There are four blank spots on the map that represent geographic or political anomalies. Two of the spots, the convex hexagon in Utah and the square in Florida, are the Great Salt Lake and Lake Okeechobee, respectively, which Plotly treats as being separate from the counties that contain them. The other blank spot in the contiguous United States is Oglala Lakota County, SD. This county changed its name from Shannon County in 2015 to reflect its inhabitants; the entire county is within the Pine Ridge Indian Reservation and was renamed after a band of the Sioux that live there. With the name change, Oglala Lakota County received a new FIPS code (46102) to replace the one for Shannon County (46113). Of the three raw datasets, only the one from 2020 reflects this change, while the other two use the old name and FIPS code. It would be valuable to trace the data in the other notebooks to correct this issue, which I will return to later.

The final blank spot is the entire state of Alaska. At 17.9% of the area of the entire United States, this is no small error. The source of this error is the unique way the state reports its election results. While most states tally their votes by county, Alaska tallies them by district of the Alaska House of Representatives. These districts do not correspond with the state's county-equivalent boroughs, so organizing the data by FIPS code is impossible. Instead, it requires the more onerous task of creating a new choropleth and feature engineering to reflect the districts.
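One way to find blank spots like these without inspecting the map by eye is to compare the FIPS codes in the GeoJSON with the FIPS codes in the data. The sketch below uses a tiny made-up GeoJSON and a made-up set of data codes rather than the real files. It assumes that each GeoJSON feature stores its FIPS code under the `id` key, which is how the Plotly counties file is laid out as far as I know.

```python
# Toy stand-in for the counties GeoJSON (assumption: FIPS code stored under "id")
toy_counties = {
    "features": [
        {"id": "46102"},  # Oglala Lakota County, SD (new code)
        {"id": "02013"},  # an Alaska borough
        {"id": "39041"},  # a county present in both sources
    ]
}

# Hypothetical FIPS codes found in an older margins table
data_fips = {"39041", "46113"}

map_fips = {feature["id"] for feature in toy_counties["features"]}

# Shapes on the map with no data row: these render as blank spots
blank_on_map = sorted(map_fips - data_fips)

# Data rows with no matching shape: these are silently dropped from the plot
unmatched_data = sorted(data_fips - map_fips)

print(blank_on_map)    # ['02013', '46102']
print(unmatched_data)  # ['46113']
```

With the real files, `map_fips` would come from the loaded `counties` dictionary and `data_fips` from `set(vote_margin['fips'])`.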

Whelp, I better get cracking! But first, I'll generalize the choropleth to a function.

In [4]:
def plot_choropleth(year='20'):
    '''
    Function plots a county-level choropleth of the vote margin for
    the US Presidential Election in the specified year.
    
    Args:
        year (str): The last two digits of the election year as a
            zero-padded string, e.g. '08' or '20'
    Outputs:
        figure: interactive choropleth of the county-level margins
    '''
    # Load data into a dataframe
    margin = pd.read_csv('../data/processed/margins.csv')
    margin['fips'] = margin['fips'].map('{:05}'.format)
    
    # Load map of counties
    with urlopen('https://raw.githubusercontent.com/plotly/datasets/master/geojson-counties-fips.json') as response:
        counties = json.load(response)
    
    # Create plot
    fig = px.choropleth(margin, geojson=counties, locations='fips',
                       color=f'margin_{year}', color_continuous_scale='bluered',
                       range_color=(-1,1), scope='usa',
                       labels = {f'margin_{year}':'Margin'}
                       )
    fig.update_layout(margin={'r':0, 't':0, 'l':0,'b':0})
    fig.show()
    

In [5]:
#plot_choropleth(year='08')
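Because `plot_choropleth` builds the column name directly from `year`, passing an integer such as `8` would look for a column named `margin_8` and fail. A small helper could normalize the argument first. This is only a sketch: `margin_column` and `AVAILABLE_YEARS` are hypothetical names, and the list of years is taken from the column names shown in the sample output above.

```python
# Years available in margins.csv, per the columns margin_08 ... margin_20
AVAILABLE_YEARS = ('08', '12', '16', '20')


def margin_column(year):
    """Return the margins.csv column name for a two-digit election year.

    Accepts 8, '8', or '08' and normalizes all of them to '08'.
    """
    normalized = f'{int(year):02d}'
    if normalized not in AVAILABLE_YEARS:
        raise ValueError(f'no margin data for year {year!r}')
    return f'margin_{normalized}'


print(margin_column(8))     # margin_08
print(margin_column('20'))  # margin_20
```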

## <span style='color:blue'>Takeaways and Next Steps</span>

Creating a choropleth for county-level data is fairly straightforward if you use the Plotly library. This type of plot has one major issue, however: votes scale with population, but a choropleth shades each county according to its land area. This is deceptive because Democratic candidates tend to do better in urban areas, which are small on the map, than in rural ones. For instance, Nevada in the 2020 choropleth is primarily red, yet the state voted for Joe Biden because about 2.3 million of Nevada's 3.1 million people live in Clark County, which had a margin of about 10% in favor of Biden. This is a well-known issue with these types of maps, and nothing new.
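The area-versus-population effect is easy to reproduce with made-up numbers. In the sketch below, a hypothetical state has two small rural counties and one large urban county. All vote counts are invented for illustration and are not Nevada's actual results.

```python
# Hypothetical three-county state: (Republican votes, Democratic votes)
county_votes = {
    "Rural A": (8_000, 2_000),
    "Rural B": (7_000, 3_000),
    "Urban C": (400_000, 600_000),
}


def margin(gop, dem):
    """Two-party margin; positive values favor the Republican candidate."""
    return (gop - dem) / (gop + dem)


# What the map shows: one color per county, regardless of how many people voted
county_margins = {name: margin(g, d) for name, (g, d) in county_votes.items()}
unweighted_mean = sum(county_margins.values()) / len(county_margins)

# What decides the state: the margin over all votes pooled together
total_gop = sum(g for g, _ in county_votes.values())
total_dem = sum(d for _, d in county_votes.values())
statewide = margin(total_gop, total_dem)

print(round(unweighted_mean, 2))
print(round(statewide, 2))
```

Two of the three counties are red and the unweighted average of the county margins is positive, yet the pooled statewide margin is negative because the urban county casts most of the votes.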

What is new is that Oglala Lakota County and Alaska need special treatment before they can be accurately represented in the plots. Dealing with Oglala Lakota County should be as simple as matching its current FIPS code with its predecessor. How I will address the Alaska data, which is keyed to both FIPS codes and districts in the Alaska House of Representatives, is still up in the air.
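For the Oglala Lakota County fix, a minimal sketch of the FIPS remapping might look like the following. The DataFrame, its margin values, and the name `fips_updates` are placeholders standing in for the older raw datasets; only the two FIPS codes come from the discussion above.

```python
import pandas as pd

# Hypothetical rows standing in for an older raw dataset (margins are placeholders)
older = pd.DataFrame({
    'fips': ['46113', '39041'],
    'margin_16': [-0.5, 0.16],
})

# Map retired FIPS codes to their replacements: Shannon County -> Oglala Lakota County
fips_updates = {'46113': '46102'}
older['fips'] = older['fips'].replace(fips_updates)

print(older['fips'].tolist())  # ['46102', '39041']
```

Applying the mapping before the datasets are merged would let all four elections line up on the same county code.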