# WAYS demo #2:
## <u>USA Presidential Poll Choropleth</u>

In this demo, we load a presidential polling dataset and generate a choropleth map using the `altair` package in Python. We then use the WAYS package to add meta-visualisation and interactive widgets to the plots.

### Install requirements

In [None]:
%%capture
# capture suppresses output of the below:
# Install a pip package in the current Jupyter kernel
import sys
!{sys.executable} -m pip install cartopy

### Import WAYS and required packages for this notebook

In [None]:
import geopandas as gpd 
import pandas as pd
import altair as alt

# Import WAYS
import sys; sys.path.insert(0, '..')
from ways_py.ways import meta_hist, WAlt, altair_widgets

### Load data

Here we load two data files. The first is a geometry dataset for the USA by state; the second is polling data for candidates in the 2020 presidential election, also by state.

In [None]:
geo_states = gpd.read_file('choropleth_teaching/gz_2010_us_040_00_500k.json')
df_polls = pd.read_csv('choropleth_teaching/presidential_poll_averages_2020.csv')

In [None]:
geo_states.head()

In [None]:
df_polls.head()

Filter our poll data to remove third party candidates:

In [None]:
trump_biden_data = df_polls[
    df_polls['candidate_name'].isin(['Donald Trump', 'Joseph R. Biden Jr.'])
]

Our spatial and poll data have the name of the state in common. We will rename the poll data's columns so that the state column is called NAME, matching our geospatial dataframe.

In [None]:
trump_biden_data.columns = ['cycle', 'NAME', 'modeldate', 'candidate_name', 'pct_estimate', 'pct_trend_adjusted']

We can join the geospatial and poll data using the NAME column (the name of the state).

In [None]:
geo_states_trump_biden = geo_states.merge(trump_biden_data, on='NAME')
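
Because `merge` performs an inner join by default, any state whose name differs between the two tables is dropped without warning. The sketch below shows one way to check for this using `indicator=True`. The two small frames and their values are made up for illustration and stand in for `geo_states` and `trump_biden_data`.

```python
import pandas as pd

# Made-up stand-ins for the geometry and polling tables (placeholder values)
geo = pd.DataFrame({'NAME': ['Ohio', 'Texas', 'Puerto Rico']})
polls = pd.DataFrame({
    'NAME': ['Ohio', 'Texas', 'National'],
    'pct_estimate': [40.0, 50.0, 45.0],
})

# An outer merge with indicator=True records which table each row came from
check = geo.merge(polls, on='NAME', how='outer', indicator=True)

# Rows not present in both tables would be lost by the default inner join
unmatched = check[check['_merge'] != 'both']
print(unmatched[['NAME', '_merge']])
```

Any rows listed here would be missing from the choropleth, so it is worth running a check like this before plotting.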

Now we have a geographical polling dataset that looks like this:

In [None]:
geo_states_trump_biden.head()

## Plotting the data

Here is an example of a choropleth plot we can generate with the `altair` plotting package:

In [None]:
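# Get Biden's data for a specific date (MM/DD/YYYY, so this is 3 November 2020)
# Combine both conditions into one mask rather than chaining two selections
biden_march = geo_states_trump_biden[
    (geo_states_trump_biden.modeldate == '11/03/2020')
    & (geo_states_trump_biden.candidate_name == 'Joseph R. Biden Jr.')
]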

### Create a choropleth with Altair

In [None]:
# Choose some options for color binning:
color_obj = alt.Color('pct_estimate',
                      bin=alt.Bin(maxbins=10, extent=[0,100]),
                      scale=alt.Scale(scheme='blues')
                     )
# Plot a choropleth of the US states with Biden's poll percentage mapped to the color
biden_choropleth = alt.Chart(biden_march, title='Poll estimate for Biden on 11/03/2020').mark_geoshape().encode(
    color=color_obj,
    tooltip=['NAME', 'pct_estimate']
).properties(
    width=500,
    height=300
).project(
    type='albersUsa'
)
display(biden_choropleth)
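
To make the effect of `maxbins=10` with `extent=[0, 100]` concrete, the sketch below reproduces the binning in plain Python. It assumes these settings result in ten equal-width bins with a step of 10. `bin_range` is a hypothetical helper written for this illustration and is not part of Altair.

```python
import math

def bin_range(value, extent=(0, 100), step=10):
    """Return the (start, end) of the bin containing `value`.

    Hypothetical helper: assumes equal-width bins of `step` over `extent`.
    """
    lo, hi = extent
    if not lo <= value <= hi:
        raise ValueError(f"{value} is outside the extent {extent}")
    # The top edge belongs to the last bin, so clamp it down one step
    start = min(lo + math.floor((value - lo) / step) * step, hi - step)
    return (start, start + step)

for pct in (38.2, 49.9, 50.0, 100.0):
    print(pct, '->', bin_range(pct))
```

Under this assumption, estimates of 49.9 and 50.0 fall into different colour classes even though they are nearly identical, which is the kind of binning effect the meta-visualisation is meant to expose.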

## WAYS

### Create choropleth plot function, decorated with WAYS meta histogram

The meta histogram clearly shows the color binning options chosen.

In [None]:
@meta_hist  # the decorator wraps usa_choro so that calling it produces more than just the returned chart
def usa_choro(data):
    # We'll display Biden's data for 11/03/2020 again
    chart = alt.Chart(data, title='Poll estimate for Biden on 11/03/2020').mark_geoshape()
    chart = chart.encode(
        alt.Color('pct_estimate',
                  bin=alt.Bin(maxbins=10, extent=[0, 100]),
                  legend=None,
                  scale=alt.Scale(scheme='blues')
                  ),
        tooltip=['NAME', 'pct_estimate']
    ).properties(
        width=500,
        height=300
    ).project(
        type='albersUsa'
    )
    return chart

In [None]:
display(usa_choro(biden_march))
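
For readers new to decorators, here is a minimal sketch of the general pattern: a decorator wraps a plotting function and returns extra information alongside its result. This is not how `meta_hist` is implemented. `with_summary` and `describe_plot` are hypothetical names, and a string stands in for a real chart.

```python
import functools

def with_summary(plot_func):
    """Hypothetical decorator: return the plot alongside a summary of its input."""
    @functools.wraps(plot_func)
    def wrapper(data, *args, **kwargs):
        plot = plot_func(data, *args, **kwargs)
        summary = {'n': len(data), 'min': min(data), 'max': max(data)}
        return plot, summary
    return wrapper

@with_summary
def describe_plot(data):
    # Stand-in for a function that would build and return a chart
    return f'chart of {len(data)} values'

plot, summary = describe_plot([41.5, 52.3, 60.1])
print(plot)
print(summary)
```

The plotting function itself stays unchanged; the extra output is added entirely by the wrapper.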

## Adding interactivity with Jupyter interact widgets

Here the user working on the US presidential poll choropleth can control the colour-binning options via widgets defined in WAYS.

Re-create the chart function, this time with widgets:

In [None]:
@altair_widgets()  # will modify the usa_choro func to take args: data, column
@meta_hist
def usa_choro(data, color):
    chart = alt.Chart(data, title='Poll estimate for Biden on 11/03/2020').mark_geoshape()
    chart = chart.encode(
        color, # Add color as an arg, widgets will pass in an alt.Color object to the function
        tooltip=['NAME', 'pct_estimate']
    ).properties(
        width=500,
        height=300
    ).project(
        type='albersUsa' # TODO: use this choice to determine which scale/color options get included in dropdown
    )
    return chart

In [None]:
usa_choro(biden_march, 'pct_estimate')

## Create custom Jupyter widgets to control which parts of the data to visualise

In [None]:
from ipywidgets import widgets
import datetime

# Simple dropdown to switch between Biden and Trump's data
candidate = widgets.Dropdown(value='Biden', options=['Trump', 'Biden'], description='Candidate')

# Get an ordered list of the dates (as strings) on which polling occurred
unsorted_datestrings = list(set(geo_states_trump_biden['modeldate']))
dates = sorted(unsorted_datestrings, key=lambda x: datetime.datetime.strptime(x, '%m/%d/%Y'))

# Choose the polling date to visualise
date = widgets.SelectionSlider(value='11/03/2020', options=dates, description='Date', continuous_update=False)

# create dictionary for widgets (we'll need this later)
data_widgets = {
        'candidate': candidate,
        'date': date
}
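
The dates are sorted with a `strptime` key rather than as plain strings because string comparison does not follow calendar order. A short sketch with made-up date strings (not taken from the polling file) shows the difference:

```python
import datetime

# Made-up date strings in month/day/year form
datestrings = ['11/03/2020', '2/27/2020', '10/15/2020']

# Plain string sorting compares character by character
lexicographic = sorted(datestrings)

# Parsing each string first gives true chronological order
chronological = sorted(
    datestrings,
    key=lambda x: datetime.datetime.strptime(x, '%m/%d/%Y'),
)

print('As strings:', lexicographic)
print('As dates:  ', chronological)
```

With the chronological ordering, moving the slider to the right always moves forward in time.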

### Display the interactive plot with the added data widgets

In [None]:
@altair_widgets(custom_widgets=data_widgets)  # will modify the usa_choro func to take args: data, column
@meta_hist
def usa_choro(data, color):
    # Add in some logic for the defined widgets:
    # Select the data for the candidate chosen
    if candidate.value == 'Trump':
        data = data[(data.candidate_name=='Donald Trump')]
    elif candidate.value == 'Biden':
        data = data[(data.candidate_name=='Joseph R. Biden Jr.')]

    # Choose which polling date to display
    data = data[
        (data.modeldate == date.value)
    ]

    # Give the choropleth plot a title that reflects the widget selections
    title = 'Poll estimate for ' + candidate.value + ' on ' + date.value

    chart = alt.Chart(data, title=title).mark_geoshape()
    chart = chart.encode(
        color,
        tooltip=['NAME', 'pct_estimate']
    ).properties(
        width=500,
        height=300
    ).project(
        type='albersUsa' # TODO: use this choice to determine which scale/color options get included in dropdown
    )
    return chart

In [None]:
usa_choro(geo_states_trump_biden, 'pct_estimate')
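
The selection logic inside `usa_choro` can also be tested on its own, away from the widgets. The sketch below is one possible way to factor it out, assuming the column names used above. `select_polls`, `make_title`, and `CANDIDATE_NAMES` are hypothetical names, and the toy rows hold placeholder values rather than real polling figures.

```python
import pandas as pd

# Map the short dropdown labels to the names used in the polling data
CANDIDATE_NAMES = {'Trump': 'Donald Trump', 'Biden': 'Joseph R. Biden Jr.'}

def select_polls(data, candidate, date):
    """Hypothetical helper: rows for one candidate on one model date."""
    mask = (
        (data['candidate_name'] == CANDIDATE_NAMES[candidate])
        & (data['modeldate'] == date)
    )
    return data[mask]

def make_title(candidate, date):
    """Build the chart title from the current selections."""
    return f'Poll estimate for {candidate} on {date}'

# Toy rows with placeholder percentages, for illustration only
toy = pd.DataFrame({
    'NAME': ['Ohio', 'Ohio', 'Ohio'],
    'candidate_name': ['Donald Trump', 'Joseph R. Biden Jr.', 'Joseph R. Biden Jr.'],
    'modeldate': ['11/03/2020', '11/03/2020', '11/02/2020'],
    'pct_estimate': [40.0, 50.0, 45.0],
})

subset = select_polls(toy, 'Biden', '11/03/2020')
print(make_title('Biden', '11/03/2020'))
print(subset)
```

Inside the decorated function, `candidate.value` and `date.value` would be passed in place of the literal arguments.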

Notes:
1. Setting the scale to `log` doesn't work unless `bin` is **unchecked**
2. Because the `extent` data range plugs into the `alt.Bin` object, this will only work if `bin` is **checked**

TODO:
1. Add a way of choosing the number of colors for the color range