# Introduction to [Plotly](https://plotly.com/)

Plotly is a versatile interactive plotting package that can be used with Python and JavaScript, and also through an online editor (without the need for coding).

## Why/When to use Plotly (my 2 cents)

If you already know Python and you don't really want to learn another coding language, but you do want to create interactive figures (e.g., within a Jupyter notebook and/or for use on a website), you should look into Plotly.  

In particular, [Plotly express](https://plotly.com/python/plotly-express/) is a fantastic tool for generating quick interactive figures without much code.  Plotly express covers a good amount of ground, and you may be able to do all or most of your work within Plotly express, depending on your specific needs.  In this workshop, I'll show you Plotly express, but then move beyond it for the majority of the content.  

Though you can do a lot with Plotly, it definitely has limitations (some of which we'll see in this workshop). Also, as with all of the ready-made interactive plot solutions (e.g., [Bokeh](https://docs.bokeh.org/en/latest/), [Altair](https://altair-viz.github.io/), [Glue](https://glueviz.org/), etc.), Plotly has a specific look, which can only be tweaked to a certain extent.  If you like the look well enough and you don't mind the limitations, then it's a good choice. 

##  In this tutorial... 

We will explore the basics of the Python version, using COVID-19 and GDP data from the following sources:

- COVID-19 data from the WHO: https://covid19.who.int/info/ 
- GDP Data from the World Bank: https://data.worldbank.org/indicator/NY.GDP.MKTP.CD

I will make two plots, one comparing COVID-19 data to GDPs and another showing COVID-19 data as a function of time.

## Installation

I recommend installing Python using [miniforge](https://github.com/conda-forge/miniforge).  Then you can create and activate a new environment for this workshop by typing the following commands into your (bash) terminal.

```
$ conda create -n plotly-env python=3.9 jupyter pandas plotly statsmodels
$ conda activate plotly-env
```

## Import the relevant packages that we will use.

In [None]:
import pandas as pd
import numpy as np
import scipy.stats

import plotly.graph_objects as go
from plotly.subplots import make_subplots
import plotly.express as px

## 1. Read in the data.



### 1.1 GDP and vaccine data

I will join multiple data tables together, using the `pandas` package so that I have one DataFrame containing all values for a given country.
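
As a minimal sketch of this kind of join (toy values and hypothetical frame names, not the WHO or World Bank tables), the snippet below first keeps one row per country code in the right-hand table and then left-joins it onto the left-hand table.  Making the join key unique first matters because a repeated key on the right would duplicate rows on the left.

```python
import pandas as pd

# Toy stand-ins for two of the tables (all values are made up for illustration).
left = pd.DataFrame({
    'ISO3': ['AAA', 'BBB', 'CCC'],
    'TOTAL_VACCINATIONS_PER100': [120.0, 45.5, 210.3],
})
right = pd.DataFrame({
    'ISO3': ['AAA', 'AAA', 'BBB'],
    'START_DATE': pd.to_datetime(['2021-03-01', '2020-12-15', '2021-01-10']),
})

# Keep only the earliest START_DATE per country so the join key is unique.
right_unique = (
    right.sort_values(['START_DATE', 'ISO3'])
         .drop_duplicates(subset='ISO3', keep='first')
)

# Left join: every row of `left` is kept; countries missing from `right` get NaT.
joined = left.join(right_unique.set_index('ISO3'), on='ISO3', how='left')
print(joined)
```

With `how = 'left'`, the country `CCC` stays in the result even though it has no start date in the right-hand table.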

In [None]:
# for pandas to read from url
storage_options = {'User-Agent': 'Mozilla/5.0'}

In [None]:
# Current cumulative COVID-19 data from the WHO. 
# dfCT = pd.read_csv('../data/WHO-COVID/WHO-COVID-19-global-table-data.csv') # in case the WHO server goes down
dfCT = pd.read_csv('https://covid19.who.int/WHO-COVID-19-global-table-data.csv', storage_options=storage_options)
dfCT

In [None]:
# Current vaccination data from the WHO
# dfV = pd.read_csv('../data/WHO-COVID/vaccination-data.csv') # in case the WHO server goes down
dfV = pd.read_csv('https://covid19.who.int/who-data/vaccination-data.csv', storage_options=storage_options)
dfV

In [None]:
# Vaccination metadata from the WHO; this file contains the start dates (and end dates) for vaccines for each country. 
# dfVM = pd.read_csv('../data/WHO-COVID/vaccination-metadata.csv') # in case the WHO server goes down
dfVM = pd.read_csv('https://covid19.who.int/who-data/vaccination-metadata.csv', storage_options=storage_options)

# drop rows without a start date
dfVM.dropna(subset = ['START_DATE'], inplace = True)

# convert the date columns to datetime objects for easier plotting and manipulation later on
dfVM['AUTHORIZATION_DATE'] = pd.to_datetime(dfVM['AUTHORIZATION_DATE'])
dfVM['START_DATE'] = pd.to_datetime(dfVM['START_DATE'])
dfVM['END_DATE'] = pd.to_datetime(dfVM['END_DATE'])

# I will simplify this table to just take the earliest start date for a given country
# sort by the start date and country code
dfVM.sort_values(['START_DATE', 'ISO3'], ascending = (True, True), inplace = True)
# take only the first entry for a given country
dfVM.drop_duplicates(subset = 'ISO3', keep = 'first', inplace = True)

dfVM

In [None]:
# GDP data from the World Bank (the first three rows do not contain data)
# I don't think there's a direct link to this data on their server (but I didn't look very hard)
dfM = pd.read_csv('data/WorldBank/API_NY.GDP.MKTP.CD_DS2_en_csv_v2_6011335.csv', skiprows = 3)
dfM

In [None]:
# Join these 4 tables so that I have one DataFrame with all values for a given country.
# I will start by joining the two vaccination data tables.
dfJ1 = dfV.join(dfVM.set_index('ISO3'), on = 'ISO3', how = 'left', rsuffix = '_meta')

# Next I will join this with the COVID-19 data table.
# First rename this column in the COVID-19 data so that it is the same as the vaccine data.  Then I will join on that column.
dfCT.rename(columns = {'Name':'COUNTRY'}, inplace = True)
dfJ2 = dfJ1.join(dfCT.set_index('COUNTRY'), on = 'COUNTRY', how = 'left')

# Finally, I will join in the GDP data from the World Bank.
# I will rename a column in the World Bank data to match a column in the joined data above.
dfM.rename(columns = {'Country Code':'ISO3'}, inplace = True)
dfJoinedCOVID = dfJ2.join(dfM.set_index('ISO3'), on = 'ISO3', how = 'left')

dfJoinedCOVID

In [None]:
print(dfJoinedCOVID.columns.tolist())

### 1.2 COVID cases and deaths as a function of time 

In [None]:
# COVID-19 cases and deaths as a function of time for multiple countries
# dfC = pd.read_csv('../data/WHO-COVID/WHO-COVID-19-global-data.csv') # in case the WHO server goes down
dfC = pd.read_csv('https://covid19.who.int/WHO-COVID-19-global-data.csv', storage_options=storage_options)

# convert the date column to datetime objects for easier plotting and manipulation later on
dfC['Date_reported'] = pd.to_datetime(dfC['Date_reported'])
dfC['Date_reported_yr'] = dfC['Date_reported'].dt.year + dfC['Date_reported'].dt.dayofyear/365
dfC

In [None]:
# Select only the data that is from a single country
country = 'United States of America'
dfCCountry = dfC.loc[dfC['Country'] == country]
dfCCountry

## 2. Create a few figures using [Plotly express](https://plotly.com/python/plotly-express/).

Plotly express is a simplified version of the Plotly interface for Python that allows users to create many types of Plotly figures with single lines of code.  This greatly simplifies the workflow for some kinds of Plotly figures.  We will start with Plotly express (and for some of your use cases, that may be enough), but we will move on to full-blown Plotly for the rest of this workshop.

In the following plots, I will show total vaccinations vs. GDP, first as a plain scatter plot and then with the point size scaled by the cumulative COVID-19 cases.  

In [None]:
# Note: We imported plotly.express as px
# I will create a simple scatter plot using the DataFrame I created above, 
# x will be the total vaccinations per 100 people
# y will be the 2020 GDP, and since it spans a very wide range in values, I will plot y on a log scale
fig = px.scatter(dfJoinedCOVID, x = 'TOTAL_VACCINATIONS_PER100', y = '2020', log_y = True)
fig.show()

There are a lot of options that you can apply to a Plotly express scatter plot (e.g., see [here](https://plotly.com/python/line-and-scatter/)).  I will do the following:
- size each data point by the number of COVID-19 cases.
- add a trend line.  (A nice part of Plotly express is that you can add a trend line very easily.)
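
To see what a trend line on a logarithmic y axis amounts to, here is a small sketch that assumes the fit is a straight line in log10(y) versus x.  It uses `np.polyfit` on made-up data (not Plotly's own trendline code, and not the COVID-19 table), so treat it as an illustration of the idea only.

```python
import numpy as np

# Made-up data that follows an exact exponential trend: y = 10**(0.01*x + 9).
x = np.linspace(0, 300, 50)
y = 10.0 ** (0.01 * x + 9.0)

# Fit a straight line to log10(y) versus x (ordinary least squares).
slope, intercept = np.polyfit(x, np.log10(y), 1)

# Convert the fitted line back to the original units for plotting on a log axis.
x_fit = np.linspace(0, 300, 5)
y_fit = 10.0 ** (slope * x_fit + intercept)
print(slope, intercept)
```

Because the fit is done in log space, the resulting curve appears as a straight line when the y axis is logarithmic.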

In [None]:
# The sizes will behave better if I set a minimum value (using np.clip)
# I also want to remove any nan values
size =  np.clip(np.nan_to_num(dfJoinedCOVID['Cases - cumulative total per 100000 population']/500.), 5, None)

fig = px.scatter(dfJoinedCOVID, x = 'TOTAL_VACCINATIONS_PER100', y = '2020', log_y = True, 
    size = size,
    trendline = 'ols', trendline_options = dict(log_y = True)
)
fig.show()

Let's also plot the first vaccination start date vs. GDP, with the size based on the total vaccinations.  In this example, I will also modify the hover and axis attributes.

In [None]:
# The command is similar to that from the previous cell, but here I'm also defining the data shown on hover in the tooltips.
# (It's not quite as easy to add a trendline here when plotting dates, though it is possible.)

size = np.clip(np.nan_to_num(dfJoinedCOVID['TOTAL_VACCINATIONS_PER100']),5,None)

fig = px.scatter(dfJoinedCOVID, x = 'START_DATE', y = '2020', log_y = True, 
    size = size,
    hover_name = 'COUNTRY', 
    hover_data = ['2020', 
      'START_DATE', 
      'TOTAL_VACCINATIONS_PER100',
      'Cases - cumulative total per 100000 population'
    ]
)

# a few manipulations to the axes 
fig.update_xaxes(title = 'Vaccine Start Date', range = [np.datetime64('2020-07-01'), np.datetime64('2023-08-01')])
fig.update_yaxes(title = '2020 GDP (USD)')

fig.show()

As an alternative example, let's also [create a histogram using plotly express](https://plotly.com/python/histograms/).

I will plot a histogram of the total vaccinations per 100 people, separated (colored) by vaccine name.

*Note that this automatically comes with an interactive legend.*

In [None]:
fig = px.histogram(dfJoinedCOVID.fillna('None'), x = 'TOTAL_VACCINATIONS_PER100', nbins = 20, 
                   color = 'VACCINE_NAME', barmode = 'stack')
fig.show()

### *Exercise 1: Create your own plot using Plotly express.*

Use the data we read in above (or your own data).  You can start with one of the commands above or choose a different style of plot.  Whichever format you use, choose different columns to plot than above.  Try to also add a new option to the command to change the plot.  

Hint: Go to the [Plotly express homepage](https://plotly.com/python/plotly-express/), and click on a link to see many examples (e.g., [here's the page for the scatter plot](https://plotly.com/python/line-and-scatter/))

In [None]:
# Create a plot using Plotly express


## 3. Create a Plotly express [animation](https://plotly.com/python/animations/)

Let's show new cases vs. new deaths over time, with the marker size set by the cumulative deaths.

In [None]:
px.scatter(dfC, x  ="New_cases", y = "New_deaths", animation_frame = "Date_reported_yr", animation_group = "Country",
    size = "Cumulative_deaths", color = "WHO_region", hover_name = "Country", size_max = 70,
    log_x = True, log_y = True,  range_x = [10,1e7], range_y = [1,1e5], height = 600
)

## 4. Create the plot using the standard Plotly [Graph Object](https://plotly.com/python/graph-objects/).

For the remainder of the workshop we will use Graph Objects for our Plotly figures.  Plotly Graph Objects give you much more flexibility in how the figure looks and the type of interactions you can include.  For instance, Plotly express will only make an individual figure, and does not support arbitrary subplots.  If you want a multi-panel figure (e.g., to share an axis and zoom together), you can do this with Graph Objects.  Also, Graph Objects enable more complex interaction with custom widgets, e.g., buttons and dropdowns. 

First you create a <b>"trace"</b>, which holds the data.  There are many kinds of traces available in Plotly (e.g., bar, scatter, etc.).  For this first example, we will use a scatter trace.  (Interestingly, the scatter trace object also draws lines, which you select by changing the "mode" key.)

Then you create a figure and add the trace to that figure.  A single figure can have multiple traces.

In [None]:
# Create a plot using Plotly Graph Object(s)

# Note: We imported the plotly.graph_objects as go.
# create the trace
trace1 = go.Scatter(x = dfJoinedCOVID['TOTAL_VACCINATIONS_PER100'], y = dfJoinedCOVID['2020'], # x and y values for the plot
    mode = 'markers', # setting mode to markers produces a typical scatter plot
)

# create the figure 
fig = go.Figure()

# add the trace and update a few parameters for the axes
fig.add_trace(trace1)
fig.update_xaxes(title = 'Total Vaccinations Per 100 People', range=[0,400])
fig.update_yaxes(title = 'GDP (USD)', type = 'log')
fig.update_layout(height = 600)

fig.show()

### *Exercise 2: Create your own plot using Plotly Graph Object(s).*

Use the data we read in above (or your own data).  You can start with one of the commands above or choose a different style of plot.  Whichever format you use, choose different columns to plot than above.  Try to also add a new option to the command to change the plot.  

*Hint*: The Plotly help pages usually contain examples for both Plotly express and Graph Object.  If you go to the [Plotly express homepage](https://plotly.com/python/plotly-express/) and click on a link (e.g., [the page for the scatter plot](https://plotly.com/python/line-and-scatter/)), you can scroll down to see Graph Object examples.

In [None]:
# Create a plot using Plotly Graph Object(s)

# First, create the trace

# Second, create the figure and show it


### Let's add more customizations to the figure above.

In [None]:
# Note: We imported the plotly.graph_objects as go.
# create the trace and set various parameters

trace1 = go.Scatter(x = dfJoinedCOVID['TOTAL_VACCINATIONS_PER100'], y = dfJoinedCOVID['2020'], # x and y values for the plot
    mode = 'markers', # setting mode to markers produces a typical scatter plot
    showlegend = False, # I do not need a legend
    # set various parameters for the markers in the following dict, e.g., color, opacity, size, outline, etc.
    marker = dict( 
        color = 'rgba(0, 0, 0, 0.2)',
        opacity = 1,
        size = np.nan_to_num(np.clip(dfJoinedCOVID['Cases - cumulative total per 100000 population']/1000., 5, 100)),
        line = dict(
            color = 'rgba(0, 0, 0, 1)',
            width = 1
        ),
    ),
    # set a template for the tooltips below.  
    # hovertemplate can accept the x and y data and additional "text" as defined by a separate input
    # Note: the empty "<extra></extra>" removes the secondary box that plotly would otherwise show with the trace name
    hovertemplate = '%{text}' + 
        'Total Vaccinations / 100 people: %{x}<br>' +
        'GDP: $%{y}<br>' +
        '<extra></extra>',
    # additional text to add to the hovertemplate.  This needs to be a list with the same length as the x and y data.
    text = ['Country: {}<br>Total COVID Cases / 100,000 people: {}<br>Vaccine start date: {}<br>'.format(x1, x2, x3) 
        for (x1, x2, x3) in zip(dfJoinedCOVID['COUNTRY'], 
            dfJoinedCOVID['Cases - cumulative total per 100000 population'], 
            dfJoinedCOVID['START_DATE'].dt.strftime('%b %Y'))
    ],
    # style the tooltip as desired                
    hoverlabel = dict(
        bgcolor = 'white',
    )
)

In [None]:
# create the figure
fig = go.Figure()

# add the trace and update a few parameters for the axes
fig.add_trace(trace1)
fig.update_xaxes(title = 'Total Vaccinations Per 100 People', range=[0,400])
fig.update_yaxes(title = 'GDP (USD)', type = 'log')
fig.update_layout(height = 600)

fig.show()

In [None]:
# Add a trendline
# I will use scipy.stats.linregress (and fit to the log of the GDP)
dfFit1 = dfJoinedCOVID.dropna(subset = ['TOTAL_VACCINATIONS_PER100', '2020'])
slope1, intercept1, r1, p1, se1 = scipy.stats.linregress(dfFit1['TOTAL_VACCINATIONS_PER100'], np.log10(dfFit1['2020']))
xFit1 = np.linspace(0, 400, 100)
yFit1 = 10.**(slope1*xFit1 + intercept1)
trace1F = go.Scatter(x = xFit1, y = yFit1, 
    mode = 'lines', # Set the mode to lines (rather than markers) to show a line.
    opacity = 1, 
    marker_color = 'black',
    showlegend = False,
    hoverinfo='skip' # Don't show anything on hover.  (We could show the trendline info, but I'll leave that out for now.)
)

In [None]:
# create the figure
fig = go.Figure()

# add the trace and update a few parameters for the axes
fig.add_trace(trace1)
fig.add_trace(trace1F)
fig.update_xaxes(title = 'Total Vaccinations Per 100 People', range=[0,400])
fig.update_yaxes(title = 'GDP (USD)', type = 'log')
fig.update_layout(
    autosize = False,
    width = 500,
    height = 500,
    margin = dict(l = 50, r = 10, b = 50, t = 30, pad = 4)
)
fig.show()

## 5. Create a plot showing COVID-19 cases and deaths vs. time for a given country.

Recall that we read in this dataset above. 

I will also include [custom buttons](https://plotly.com/python/custom-buttons/) to toggle between various ways of viewing the data and a [custom dropdown](https://plotly.com/python/dropdowns/) to select the country.

In [None]:
# Create the trace.
# In this example I will use a bar chart.
trace3 = go.Bar(x = dfCCountry['Date_reported'], y = dfCCountry['New_cases'], 
    opacity = 1, 
    marker_color = 'black',
    showlegend = False,
    name = 'COVID Cases'
)

# Create the figure.
fig = go.Figure()

# Add the trace and update a few parameters for the axes.
fig.add_trace(trace3)
fig.update_xaxes(title = 'Date')
fig.update_yaxes(title = 'New COVID-19 Cases')
fig.show()

#### Let's improve this plot.

- I want to take a rolling average (this is easily done with `pandas`).
- I'd prefer a filled region rather than bars.
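
As a quick illustration of the rolling average (made-up counts and a 3-day window, rather than the 7-day window used below), `Series.rolling(n).mean()` averages each value with the `n - 1` values before it.

```python
import pandas as pd

# Made-up daily counts.
daily = pd.Series([0, 3, 6, 9, 12, 15])

# A 3-day rolling mean; the first two entries are NaN because their windows are incomplete.
smoothed = daily.rolling(3).mean()
print(smoothed.tolist())
```

The leading NaN values simply leave a short gap at the start of the plotted line.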

In [None]:
# Define the number of days to use for the rolling average.
rollingAve = 7

In [None]:
# Create the trace, using Scatter to create lines and fill the region between the line and y=0.
trace3 = go.Scatter(x = dfCCountry['Date_reported'], y = dfCCountry['New_cases'].rolling(rollingAve).mean(), 
    mode = 'lines', # Set the mode to lines (rather than markers) to show a line.
    opacity = 1, 
    marker_color = 'black',
    fill = 'tozeroy',  # This will fill between the line and y=0.
    showlegend = False,
    name = 'COVID Count',
    hovertemplate = 'Date: %{x}<br>Number: %{y}<extra></extra>', #Note: the <extra></extra> removes the trace label.
)

# Create the figure.
fig = go.Figure()

# Add the trace and update a few parameters for the axes.
fig.add_trace(trace3)
fig.update_xaxes(title = 'Date')
fig.update_yaxes(title = 'New COVID-19 Cases')
fig.show()

### *Exercise 3: Create your own plot showing COVID-19 deaths vs. time.*

You can use either Plotly express or Graph Objects.  Try to pick a different country than I used above.  Also try to use a different style than I plotted above.  

In [None]:
# Create a Plotly figure showing COVID-19 deaths vs. time


### 5.1. Add buttons to interactively change the plot.

I want to be able to toggle between new vs. cumulative counts as well as cases vs. deaths.  We can do this with [custom buttons](https://plotly.com/python/custom-buttons/) that will "restyle" the plot.  

You can also create interactions with buttons and other "widgets" using [dash](https://plotly.com/dash/), but we won't go there in this workshop. 
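
The core of the button logic is a list of True/False values, one per trace, that says which trace is visible.  The sketch below builds those button definitions as plain Python dicts for three hypothetical columns; no figure is needed to inspect them.

```python
# Hypothetical column names, one trace per column (in the same order).
cols = ['New_cases', 'New_deaths', 'Cumulative_cases']

# One button per column; button i makes only trace i visible.
buttons = [
    dict(
        label=c.replace('_', ' '),
        method='restyle',
        args=[{'visible': [i == j for j in range(len(cols))]}],
    )
    for i, c in enumerate(cols)
]

for b in buttons:
    print(b['label'], b['args'][0]['visible'])
```

The order of the `visible` list must match the order in which the traces were added to the figure.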

In [None]:
columns = ['New_cases', 'New_deaths', 'Cumulative_cases', 'Cumulative_deaths']

#I'm going to write this as a function so that I can reuse it below
def createTraces(columns):
    # For this scenario, I am going to add each of the 4 traces to the plot but only show one at a time
    # Add traces for each column
    
    traces = [
        go.Scatter(x = dfCCountry['Date_reported'], y = dfCCountry[c].rolling(rollingAve).mean(), 
            mode = 'lines', # Set the mode to lines (rather than markers) to show a line.
            opacity = 1, 
            marker_color = 'black',
            fill = 'tozeroy',  # This will fill between the line and y=0.
            showlegend = False,
            name = 'COVID Count',
            hovertemplate = 'Date: %{x}<br>Number: %{y}<extra></extra>', #Note: the <extra></extra> removes the trace label.
            visible = i == 0
        ) for i, c in enumerate(columns)
    ]

    
    return traces


# I'm going to write this as a function so that I can reuse it below
# x,y args to position the buttons
def createButtons(columns, x = 0.0, y = 1.17):
    # create an "updatemenu" with buttons for choosing the data to plot that I will add to the figure later

    updatemenu = dict(
            type = 'buttons',
            direction = 'left', # This sets the orientation of the button group.  'left' shows them in one row.
            buttons = list([
                dict(
                    # 'args' tells the button what to do when clicked.  
                    #     In this case it will change the visibility of the traces
                    # 'label' is the text that will be displayed on the button
                    # 'method' is the type of action the button will take.
                    #    method = 'restyle' allows you to redefine certain preset plot styles (including the visible key).  
                    #    See  https://plotly.com/python/custom-buttons/ for different methods and their uses
                    args = [{'visible': [i == j for j in range(len(columns))]}], 
                    label = label.replace('_',' '),
                    method = 'restyle' 
                ) for i, label in enumerate(columns)]),
        
            showactive = True, # Highlight the active button
            # Below is for positioning
            x = x, 
            xanchor = 'left',
            y = y,
            yanchor = 'top'
        )
    
    return updatemenu

In [None]:
# Create the figure.
fig = go.Figure()

# create the traces
traces = createTraces(columns)

# add the traces to the figure
for t in traces:
    fig.add_trace(t)
    
# create the buttons and add them to the figure below
buttons = createButtons(columns)

# Update a few parameters for the axes and add the buttons
#   Note: I added a margin to the top ('t') of the plot within fig.update_layout to make room for the buttons.
fig.update_xaxes(title = 'Date')#, range = [np.datetime64('2020-03-01'), np.datetime64('2022-01-12')])
fig.update_yaxes(title = 'COVID-19 Count')
fig.update_layout(
    title_text = 'COVID-19 Data Explorer : '+ country + '<br>(' + str(rollingAve) +'-day rolling average)',
    margin = dict(t = 200),
    height = 600,
    updatemenus = [buttons]
)

fig.show()

### 5.2. Create a dropdown menu to choose the country.

[Here are examples of how to include dropdown menus in Plotly](https://plotly.com/python/dropdowns/).  

The procedure will be similar to the buttons, but we will use the "update" method (rather than "restyle") for the dropdown menu.  Update lets us change the data being plotted (and, if needed, the layout) in one step.
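
As a sketch of what each dropdown entry carries (made-up numbers, two hypothetical countries), the snippet below builds one entry per country.  Each entry holds a list of x arrays and a list of y arrays, one per trace, that replace the data currently in the figure.

```python
import pandas as pd

# Made-up long-format table with two hypothetical countries.
toy = pd.DataFrame({
    'Country': ['A', 'A', 'A', 'B', 'B', 'B'],
    'Date_reported': pd.to_datetime(['2021-01-01', '2021-01-02', '2021-01-03'] * 2),
    'New_cases': [1, 2, 3, 10, 20, 30],
    'New_deaths': [0, 1, 0, 2, 3, 4],
})
cols = ['New_cases', 'New_deaths']

# One dropdown entry per country; each entry replaces x and y for every trace.
entries = []
for name, grp in toy.groupby('Country'):
    entries.append(dict(
        label=name,
        method='update',
        args=[{
            'x': [grp['Date_reported'].tolist()] * len(cols),
            'y': [grp[c].tolist() for c in cols],
        }],
    ))

print([e['label'] for e in entries])
```

Note that every country's data ends up stored inside the figure, so the figure grows with the number of dropdown entries.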

In [None]:
# I am going to create the dropdown list here and then add it to the figure below
# I will need to update the x and y data for the time series plot 

# Identify the countries to use 
# I will put the United States of America first so that it can be the default country on load (the first dropdown entry)
availableCountries = dfC['Country'].unique().tolist()
availableCountries.insert(0, availableCountries.pop(availableCountries.index('United States of America'))) 

# I will write this as a function as well and then create a new figure in the next cell that uses this function
# x,y args to position the dropdown
def createDropdown(availableCountries, columns, x = 0.0, y = 1.1):
    # create an "updatemenu" with a dropdown for choosing the data to plot that I will add to the figure later

    dropdown = []
    for c in availableCountries:
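        dropdown.append(dict(
            # 'update' takes two dicts: the first changes the trace data, the second changes the layout
            args = [{'x': [dfC.loc[dfC['Country'] == c]['Date_reported']]*len(columns), # the same x values for each trace
                     'y': [dfC.loc[dfC['Country'] == c][col].rolling(rollingAve).mean() for col in columns],
            },
            # keep the country name in the title in sync with the selection
            {'title.text': 'COVID-19 Data Explorer : '+ c + '<br>(' + str(rollingAve) +'-day rolling average)'}],
            label = c,
            method = 'update'
        ))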

    updatemenu = dict(
        buttons = dropdown,
        direction = 'down',
        showactive = True,
        x = x,
        xanchor = 'left',
        y = y,
        yanchor = 'top'
    )

    return updatemenu

In [None]:
# Create the figure.
fig = go.Figure()

# create the traces
traces = createTraces(columns)

# add the traces to the figure
for t in traces:
    fig.add_trace(t)

# generate the menus to be added to the figure below
updatemenus = [createButtons(columns, 0, 1.35), createDropdown(availableCountries, columns, 0, 1.2)]

# Update a few parameters for the axes and add the buttons and dropdown
fig.update_xaxes(title = 'Date')#, range = [np.datetime64('2020-03-01'), np.datetime64('2022-01-12')])
fig.update_yaxes(title = 'COVID-19 Count')
fig.update_layout(
    title_text = 'COVID-19 Data Explorer : '+ country + '<br>(' + str(rollingAve) +'-day rolling average)',
    title_y = 0.95,
    margin = dict(t = 200),
    height = 600,
    updatemenus = updatemenus
)

fig.show()

In [None]:
# You can save the plotly figure as an html file to use on your website.
fig.write_html('plotly_graph.html')

# Extensions

Developing UI elements in `plotly` is cumbersome.  Here are a couple of alternative options that can be used with `plotly` to streamline the code and enable a more robust user experience.

- [dash](https://dash.plotly.com/)
- [Shiny for Python](https://shiny.posit.co/py/)