# WAYS: What Aren't You Seeing?
## <u>A demo of the WAYS philosophy and Python package</u>

This demo illustrates the WAYS philosophy for exploring the decisions made while plotting a dataset, using the WAYS Python package developed at The Alan Turing Institute.

In this notebook, we load a polling dataset from the 2020 USA presidential race and generate a choropleth map using the `altair` package in Python. We then use the WAYS package to add metavisualisations and interactive widgets to the plots. These features encourage the developer of the plot to question and evaluate the design choices made and the story being told before sharing the plot more widely.

### Install requirements for this notebook

In [None]:
%%capture
!poetry install

### Import WAYS and required packages for this notebook

In [None]:
import sys

import altair as alt
import geopandas as gpd
import pandas as pd

# Import the function that loads and pre-processes the data
from process_USA_polling import get_choropleth_data

# Make the ways_py package in the parent directory importable
sys.path.insert(0, '..')
from ways_py.ways import altair_color_viz, AltairColorWidgets, altair_color_widgets

### Load presidential polling dataset

The dataset we plot here has been created from polls taken in the 2020 USA presidential race, for candidates Joe Biden and Donald Trump, along with geospatial data for the United States.

In [None]:
geo_states_trump_biden = get_choropleth_data()

In [None]:
geo_states_trump_biden.head()

In the first instance, we'll look at Biden's polling data for a particular date and use that to plot a choropleth map.

In [None]:
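# Get Biden's data for a specific date.
# Dates are month/day/year strings (parsed later with '%m/%d/%Y'),
# so '11/03/2020' is 3 November 2020.
# Both conditions are combined into a single boolean mask with `&`.
biden_march = geo_states_trump_biden[
    (geo_states_trump_biden.modeldate == '11/03/2020')
    & (geo_states_trump_biden.candidate_name == 'Joseph R. Biden Jr.')
]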
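
The filter above keeps only the rows that match both the date and the candidate. As a minimal sketch of how such a selection works in `pandas`, the example below builds two boolean masks on a toy DataFrame and combines them with `&`. The rows and values are made up for illustration; only the column names mirror those used in this notebook.

```python
import pandas as pd

# Made-up rows for illustration; not taken from the polling dataset
toy = pd.DataFrame({
    'modeldate': ['11/03/2020', '11/03/2020', '11/02/2020'],
    'candidate_name': ['Joseph R. Biden Jr.', 'Donald Trump', 'Joseph R. Biden Jr.'],
    'pct_estimate': [50.0, 45.0, 49.0],
})

# Each comparison yields a boolean Series aligned with the rows of `toy`
on_date = toy['modeldate'] == '11/03/2020'
is_biden = toy['candidate_name'] == 'Joseph R. Biden Jr.'

# `&` combines the masks element-wise, so a row is kept only if both are True
selected = toy[on_date & is_biden]
print(selected)
```

Building both masks from the same DataFrame keeps them aligned with its index, which is why a single combined mask is usually preferred over two chained bracket selections.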

## Plotting the data

Here is an example of a choropleth plot to visualise this data that can be generated with the `altair` plotting package, showing Biden's poll percentage estimate for each state. Note the choices made in the code here. Is this the most informative way of visualising this data?

In [None]:
biden_choropleth = alt.Chart(biden_march, title='Poll estimate for Biden on 11/03/2020').mark_geoshape().encode(
    alt.Color('pct_estimate'),
    tooltip=['NAME', 'pct_estimate']
).properties(
    width=500,
    height=300
).project(
    type='albersUsa'
)
display(biden_choropleth)

## How could this plot be improved?

One modification we could make to the plot above would be to acknowledge the fact that we are plotting percentages and extend the colour scheme over a 0-100% range, rather than between the lowest and highest data points.

Another would be to choose a colour scale more appropriate for US political polling, such as red to blue (though for this dataset, the assumption that subtracting Biden's percentage from 100% gives you Trump's percentage is incorrect).

Perhaps we'd also like to bin the colour scale rather than keep the continuous scale, so there are a smaller number of colours for the states in the choropleth?

Altair gives us the option to do all of these things:

In [None]:
biden_choropleth = alt.Chart(biden_march, title='Poll estimate for Biden on 11/03/2020').mark_geoshape().encode(
    alt.Color('pct_estimate',
              bin=alt.Bin(maxbins=10, extent=[0,100]),
              scale=alt.Scale(scheme='redblue')
              ),
    tooltip=['NAME', 'pct_estimate']
).properties(
    width=500,
    height=300
).project(
    type='albersUsa'
)
display(biden_choropleth)

## But... What Aren't You Seeing?

As we continue to work on this plot, the changes we make will influence how the information the plot seeks to convey is interpreted by the viewer. If our goal is to represent the data in the most informative way we can think of, it's possible that more iterations will be needed before we reach that point. What can we do to help us decide what the best options are for the various parameters of the plotting package?

With the WAYS (What Aren't You Seeing) Python package, we have attempted to implement some tools that help data scientists and other professionals creating such plots to understand the decisions they are making when setting plot parameters.

This proof-of-concept package focuses on one specific case: exploring colour scale choices for plots in `altair`. That makes it applicable to our USA polling choropleth example.

## Creating a choropleth plot function, decorated with WAYS "metavisualisation"

Let's create a function out of our choropleth chart code and use the `altair_color_viz` decorator, which provides a more in-depth look at how our colour scale is being used in the plot than the default legend does.

As you can see from the histogram metavisualisation added to the left of the original plot, much of the colour scale is currently unused. Would seeing this information make you reconsider the choices made for the colour scale and binning used by the plot?

In [None]:
@altair_color_viz 
def usa_choro(data):
    chart = alt.Chart(data, title='Poll estimate for Biden on 11/03/2020').mark_geoshape()
    chart = chart.encode(
        alt.Color('pct_estimate',
              bin=alt.Bin(maxbins=10, extent=[0,100]),
              scale=alt.Scale(scheme='redblue'),
              legend=None
              ),
        tooltip=['NAME', 'pct_estimate']
    ).properties(
        width=500,
        height=300
    ).project(
        type='albersUsa'
    )
    return chart
# We'll display Biden's data for 11/03/2020 again
display(usa_choro(biden_march))
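
The idea behind the histogram can be sketched without WAYS or Altair: count how many values fall into each colour bin and see how many bins are occupied at all. This is only an illustration of the idea, not how `altair_color_viz` is implemented, and the values are made up rather than taken from the polling data.

```python
# Made-up percentages for illustration (not the real polling estimates)
values = [31.5, 38.2, 41.0, 44.7, 47.3, 49.9, 52.4, 55.8, 61.2, 66.0]

# Ten equal-width bins over the 0-100 range, as in the chart above
n_bins, lo, hi = 10, 0, 100
width = (hi - lo) / n_bins

counts = [0] * n_bins
for v in values:
    # Clamp so that a value equal to the upper bound lands in the last bin
    index = min(int((v - lo) // width), n_bins - 1)
    counts[index] += 1

# Print a simple text histogram, one row per colour bin
for i, count in enumerate(counts):
    start = lo + i * width
    print(f"{start:5.0f}-{start + width:<5.0f} {'#' * count}")

used = sum(1 for c in counts if c > 0)
print(f"{used} of {n_bins} bins are used")
```

When most values cluster in a few middle bins, most colours of the scale never appear on the map, which is exactly what the legend alone does not reveal.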

## Exploring the possibilities for the colour scale via WAYS

In order to explore further what the best options are for the plot parameters, we could spend a lot of time editing the plot code in our notebook, each time re-running (or duplicating) the code cell to compare and contrast different options.

The WAYS package gives us the option to load some pre-defined Jupyter widgets that control some of the key parameters for the `altair.Color` object that we use in our plot.

Let's re-create the choropleth function, this time with the `altair_color_widgets` decorator, which gives us some interactivity to explore Altair's colour scale options:

In [None]:
@altair_color_widgets() # will modify the usa_choro func to take args: data, column
@altair_color_viz
def usa_choro(data, color):
    chart = alt.Chart(data, title='Poll estimate for Biden on 11/03/2020').mark_geoshape()
    chart = chart.encode(
        color, # Add color as an arg, widgets will pass in an alt.Color object to the function
        tooltip=['NAME', 'pct_estimate']
    ).properties(
        width=500,
        height=300
    ).project(
        type='albersUsa'
    )
    return chart
usa_choro(biden_march, 'pct_estimate')

Take some time to explore these options for the colour scale and find a combination that seems like the most informative way to represent the data. These widgets do not cover the full scope of Altair's functionality, but do you think that as the creator of this plot you would have explored as much of Altair's functionality without them?

The idea here is to expose functionality that would otherwise only be noticed if the plot creator took the time to trawl through the plotting package's documentation.
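
To illustrate the general pattern of a decorator that changes a plotting function's signature and passes in a colour object, here is a minimal sketch in plain Python. `inject_color` and `describe_plot` are hypothetical names invented for this example; this is not how WAYS is implemented. A plain dictionary stands in for `alt.Color`, and a keyword argument stands in for the widgets.

```python
import functools


def inject_color(default_scheme='redblue'):
    """Hypothetical decorator factory, for illustration only (not part of WAYS)."""
    def decorator(plot_func):
        @functools.wraps(plot_func)
        def wrapper(data, column, scheme=default_scheme):
            # Build a stand-in "colour spec" from the column name and chosen option
            color = {'field': column, 'scheme': scheme}
            # Call the original function with the colour spec in place of the column
            return plot_func(data, color)
        return wrapper
    return decorator


@inject_color()
def describe_plot(data, color):
    # The decorated function receives a ready-made colour spec
    return f"{len(data)} rows coloured by {color['field']} using {color['scheme']}"


# The caller passes (data, column), even though the function is written as (data, color)
print(describe_plot([1, 2, 3], 'pct_estimate'))
```

The caller supplies the data and a column name, while the wrapped function only has to accept whatever colour object it is handed.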

## Create custom Jupyter widgets to control which parts of the data to visualise

To finish off this demo of the WAYS Python package, we'll create some additional Jupyter widgets to control which parts of the full dataset to visualise.

In [None]:
from ipywidgets import widgets
import datetime

# Simple dropdown to switch between Biden and Trump's data
candidate = widgets.Dropdown(value='Biden', options=['Trump', 'Biden'], description='Candidate')

# Get a chronologically ordered list of the dates (as strings) on which polling occurred
unique_datestrings = set(geo_states_trump_biden['modeldate'])
dates = sorted(unique_datestrings, key=lambda x: datetime.datetime.strptime(x, '%m/%d/%Y'))

# Choose the polling date to visualise
date = widgets.SelectionSlider(value='11/03/2020', options=dates, description='Date', continuous_update=False)

# Create a dictionary of the widgets (we'll need this later)
data_widgets = {
    'candidate': candidate,
    'date': date
}
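
The date strings are sorted with a parsing key rather than as plain text. A small sketch of why this matters, using made-up date strings (not taken from the polling dataset): plain string sorting compares character by character, so it can disagree with chronological order, for example when months are not zero-padded or when dates span more than one year.

```python
from datetime import datetime

# Made-up date strings for illustration, including one duplicate
date_strings = ['11/3/2020', '2/27/2020', '10/15/2020', '2/27/2020']

# Deduplicate, then sort chronologically by parsing each string as month/day/year
unique_dates = set(date_strings)
chronological = sorted(unique_dates, key=lambda s: datetime.strptime(s, '%m/%d/%Y'))

# Plain string sorting puts '10/...' and '11/...' before '2/...'
lexicographic = sorted(unique_dates)

print(chronological)
print(lexicographic)
```

If every date in a dataset is zero-padded and falls within a single year, the two orderings coincide, but parsing avoids relying on that.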

These data widgets can be passed into the `altair_color_widgets` decorator function, which will automatically add them to the widget grid. This lets us try out the different colour scale options on both Biden's and Trump's polling data and switch between different polling dates.

In [None]:
@altair_color_widgets(custom_widgets=data_widgets) # will modify the usa_choro func to take args: data, column
@altair_color_viz
def usa_choro(data, color):
    # Add in some logic for the defined widgets:
    # Select the data for the candidate chosen
    if candidate.value == 'Trump':
        data = data[(data.candidate_name=='Donald Trump')]
    elif candidate.value == 'Biden':
        data = data[(data.candidate_name=='Joseph R. Biden Jr.')]

    # Choose which polling date to display
    data = data[
        (data.modeldate == date.value)
    ]

    # Give the choropleth plot a title
    title = f'Poll estimate for {candidate.value} on {date.value}'

    chart = alt.Chart(data, title=title).mark_geoshape()
    chart = chart.encode(
        color,
        tooltip=['NAME', 'pct_estimate']
    ).properties(
        width=500,
        height=300
    ).project(
        type='albersUsa' # TODO: use this choice to determine which scale/color options get included in dropdown
    )
    return chart
usa_choro(geo_states_trump_biden, 'pct_estimate')