## Data Visualization Strategies - Harvard DataFest 2021

#### Elizabeth Piette, PhD MPH
#### Research Computing Services, Harvard Business School

This session provides a brief introduction to plotting with seaborn and plotly, using both libraries to visualize temporal and spatial trends in COVID-19 case data.

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_context("poster")
import plotly.express as px
import ipywidgets as widgets
from ipywidgets import interact

### Visualizing case rates over time with line plots

In [2]:
# read in data cleaned previously



In [3]:
# line plot of covid case rates 



In [4]:
# grid of covid case rates by state



In [5]:
# interactive plot of covid case rates



In [6]:
# interactive plots of covid case rates with a dropdown menu for selecting state



### Visualizing case rates by state with choropleth maps

In [7]:
# read previously cleaned weekly dataset



In [8]:
# choropleth maps in plotly take two-letter state codes, so here's a dictionary of names to abbreviations

state_codes = {'Alabama': 'AL', 'Alaska': 'AK', 'Arizona': 'AZ', 'Arkansas': 'AR', 'California': 'CA',
               'Colorado': 'CO', 'Connecticut': 'CT', 'Delaware': 'DE', 'District of Columbia': 'DC', 'Florida': 'FL',
               'Georgia': 'GA', 'Hawaii': 'HI', 'Idaho': 'ID', 'Illinois': 'IL', 'Indiana': 'IN',
               'Iowa': 'IA', 'Kansas': 'KS', 'Kentucky': 'KY', 'Louisiana': 'LA', 'Maine': 'ME',
               'Maryland': 'MD', 'Massachusetts': 'MA', 'Michigan': 'MI', 'Minnesota': 'MN', 'Mississippi': 'MS',
               'Missouri': 'MO', 'Montana': 'MT', 'Nebraska': 'NE', 'Nevada': 'NV', 'New Hampshire': 'NH',
               'New Jersey': 'NJ', 'New Mexico': 'NM', 'New York': 'NY', 'North Carolina': 'NC', 'North Dakota': 'ND',
               'Ohio': 'OH', 'Oklahoma': 'OK', 'Oregon': 'OR', 'Pennsylvania': 'PA', 'Rhode Island': 'RI',
               'South Carolina': 'SC', 'South Dakota': 'SD', 'Tennessee': 'TN', 'Texas': 'TX', 'Utah': 'UT',
               'Vermont': 'VT', 'Virginia': 'VA', 'Washington': 'WA', 'West Virginia': 'WV', 'Wisconsin': 'WI',
               'Wyoming': 'WY'}
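
Applying the dictionary is a one-liner with `Series.map`. To stay self-contained, the sketch below uses a three-entry stand-in called `codes_subset` rather than the full `state_codes` dictionary above; the column names and the extra "Guam" row are assumptions chosen to show what happens to an unmatched name.

```python
import pandas as pd

# Three-entry stand-in for the full state_codes dictionary
codes_subset = {"Massachusetts": "MA", "Texas": "TX", "Ohio": "OH"}

# Hypothetical weekly data; "Guam" is deliberately absent from the dictionary
weekly = pd.DataFrame({
    "state": ["Massachusetts", "Texas", "Ohio", "Guam"],
    "case_rate": [280.0, 450.0, 340.0, 90.0],
})

# map() looks each name up in the dictionary and returns NaN when it is missing
weekly["state_code"] = weekly["state"].map(codes_subset)

# List any names that did not match so they can be fixed or dropped
unmatched = weekly.loc[weekly["state_code"].isna(), "state"].tolist()
print(unmatched)
```

Checking for unmatched names is worthwhile because rows with a missing code are silently left off the map.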




In [9]:
# interactive choropleth map - let's just look at the most recent week for now



In [10]:
# interactive choropleth map with a slider for selecting the week



In [11]:
# animated choropleth map



### Visualizing relationships between variables

In [12]:
# read in data with demographic info



In [13]:
# let's focus on the most recent week



In [14]:
# interactive scatter plot of population count vs cumulative cases



In [15]:
# animated interactive scatter plot of population count vs cumulative cases over time



In [16]:
# unfortunately, our data set is rather lacking in binary and categorical variables
# let's create some just so we can practice using them to add more information to plots

# we'll create a variable for 'older' vs. 'younger' states based on the median of percent_age65over
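
The median split can be sketched with `numpy.where`. Only `percent_age65over` comes from the comment above; the placeholder states, the values, and the new column name `age_group` are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one row per state; values are made up.
demo = pd.DataFrame({
    "state": ["State A", "State B", "State C", "State D"],
    "percent_age65over": [12.0, 21.0, 16.0, 18.0],
})

# States above the median share of residents aged 65+ are labeled 'older'
median_age65 = demo["percent_age65over"].median()
demo["age_group"] = np.where(
    demo["percent_age65over"] > median_age65, "older", "younger"
)
print(demo)
```

If the table has one row per state per week, compute the median on a single week (or on one row per state) so that every state counts once.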



In [17]:
# animated interactive scatter plot of population count vs cumulative cases over time, 'older' vs. 'younger' states



In [18]:
# let's also create a categorical variable grouping states into regions. here's a handy dictionary

state_regions = {'Alabama': 'south', 'Alaska': 'west', 'Arizona': 'west', 'Arkansas': 'south', 'California': 'west',
                 'Colorado': 'west', 'Connecticut': 'northeast', 'Delaware': 'south', 'District of Columbia': 'south', 'Florida': 'south',
                 'Georgia': 'south', 'Hawaii': 'west', 'Idaho': 'west', 'Illinois': 'midwest', 'Indiana': 'midwest',
                 'Iowa': 'midwest', 'Kansas': 'midwest', 'Kentucky': 'south', 'Louisiana': 'south', 'Maine': 'northeast',
                 'Maryland': 'south', 'Massachusetts': 'northeast', 'Michigan': 'midwest', 'Minnesota': 'midwest', 'Mississippi': 'south',
                 'Missouri': 'midwest', 'Montana': 'west', 'Nebraska': 'midwest', 'Nevada': 'west', 'New Hampshire': 'northeast',
                 'New Jersey': 'northeast', 'New Mexico': 'west', 'New York': 'northeast', 'North Carolina': 'south', 'North Dakota': 'midwest',
                 'Ohio': 'midwest', 'Oklahoma': 'south', 'Oregon': 'west', 'Pennsylvania': 'northeast', 'Rhode Island': 'northeast',
                 'South Carolina': 'south', 'South Dakota': 'midwest', 'Tennessee': 'south', 'Texas': 'south', 'Utah': 'west',
                 'Vermont': 'northeast', 'Virginia': 'south', 'Washington': 'west', 'West Virginia': 'south', 'Wisconsin': 'midwest',
                 'Wyoming': 'west'}



In [19]:
# let's look at the distributions of case rates by region for the most recent week



In [20]:
# we can also combine scatter plots with marginal distribution plots to produce a very information-rich graphic



### Practice session

For the remainder of the session, we will practice making our own visualizations. At the end, we will regroup to share what we've made and discuss any questions that came up.

Here is a bit of inspiration - if we look at line plots of case rates grouped by region, we can clearly see differences in the case rates and timing of surges by region.

In [21]:
# interactive plots of covid case rates with a dropdown menu for selecting regions

