# Visualizations with Plotly

In this chapter, we introduce the plotly library, which creates interactive data visualizations for the web. All previous chapters used matplotlib, which is better suited to static visualizations.

## Plotly vs Dash

Both the plotly and dash libraries are products of the [company Plotly][1]. Both libraries are free and open source, with an enterprise version available for extra features and services. The [plotly python library][2] is closely related to the [dash python library][3], but each has a different purpose. The plotly library creates the visualizations, producing them as independent HTML and JavaScript that can be embedded on any page, including Jupyter Notebooks.

The dash library creates the dashboards with tools such as data tables, tabs, dropdowns, radio buttons, and many more widgets. It also runs the application, allowing an interactive experience for the users. All graphs in a dash application are created from the plotly library. We will build our dashboard with dash, but must learn enough plotly first to make our visualizations.

## Introduction to Plotly

The [plotly python library][2] is enormous, and covering all of its details is out of scope for this course. This chapter presents the components of the library most relevant to our specific application. I suggest keeping the documentation open so that you can refer to the official tutorials on all parts of the library. Before we get started, let's read in the `all_data.csv` file, which has all of the historical and predicted data for all areas. The exact data that you have will depend on the last time you ran `python update.py` in the notebooks directory.

[1]: https://plotly.com/
[2]: https://plotly.com/python/
[3]: https://plotly.com/dash/

In [1]:
import pandas as pd
df_all = pd.read_csv('data/all_data.csv', parse_dates=['date'])
df_all.tail()

Unnamed: 0,group,date,area,Daily Deaths,Daily Cases,Deaths,Cases
80064,usa,2020-12-05,Virginia,16,894,4189,217527
80065,usa,2020-12-05,Washington,13,762,2768,142733
80066,usa,2020-12-05,West Virginia,6,319,662,36850
80067,usa,2020-12-05,Wisconsin,66,5996,4335,444479
80068,usa,2020-12-05,Wyoming,5,443,226,28675


We'll select the state of Texas for our plotting examples and place the date in the index.

In [2]:
df_texas = df_all.query('group == "usa" and area == "Texas"')
df_texas = df_texas.set_index('date')
df_texas.tail()

date,group,area,Daily Deaths,Daily Cases,Deaths,Cases
2020-12-01,usa,Texas,140,7282,22872,1206684
2020-12-02,usa,Texas,140,7250,23012,1213934
2020-12-03,usa,Texas,140,7216,23152,1221150
2020-12-04,usa,Texas,139,7181,23291,1228331
2020-12-05,usa,Texas,139,7143,23430,1235474


We'll also read in the summary table which has a column containing the last date of known data.

In [3]:
df_summary = pd.read_csv('data/summary.csv', parse_dates=['date'])
df_summary.head()

Unnamed: 0,group,area,Daily Deaths,Daily Cases,Deaths,Cases,code,population,Deaths per Million,Cases per Million,date
0,world,Afghanistan,4,86,1548,41814,AFG,38.928341,40.0,1070.0,2020-11-05
1,world,Albania,7,421,543,22721,ALB,2.8778,189.0,7900.0,2020-11-05
2,world,Algeria,12,642,2011,60169,DZA,43.851043,46.0,1370.0,2020-11-05
3,world,Andorra,0,90,75,5135,AND,0.077265,971.0,66460.0,2020-11-05
4,world,Angola,3,289,299,12102,AGO,32.866268,9.0,370.0,2020-11-05


We assign this last known date to its own variable and calculate the first predicted date. These values will be useful when graphing the actual and predicted values separately.

In [4]:
last_date = df_summary['date'].iloc[0]
first_pred_date = last_date + pd.Timedelta('1D')
last_date, first_pred_date

(Timestamp('2020-11-05 00:00:00'), Timestamp('2020-11-06 00:00:00'))
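To see why these two dates split the data cleanly, here is a minimal sketch on a made-up DataFrame (`df_demo` is a hypothetical stand-in for the real data). It uses `.loc` for the label-based slices and assumes a sorted daily `DatetimeIndex`, in which case both endpoints of a slice are included.

```python
import pandas as pd

# Hypothetical stand-in for the real data: ten days of made-up cumulative deaths
dates = pd.date_range("2020-11-01", periods=10, freq="D")
df_demo = pd.DataFrame({"Deaths": range(100, 200, 10)}, index=dates)

last_date = pd.Timestamp("2020-11-05")
first_pred_date = last_date + pd.Timedelta("1D")

# Label-based slices on a DatetimeIndex include both endpoints, so starting
# the second slice one day later keeps the two pieces from overlapping
df_actual = df_demo.loc[:last_date]
df_pred = df_demo.loc[first_pred_date:]

print(len(df_actual), len(df_pred))
```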

### General steps to create a plotly graph

There are multiple ways to create graphs in plotly, but since this is not a comprehensive tutorial, we will show just a single straightforward path and use it for all of our graphs. The following three steps will be used to create our graphs:

1. Create Figure - with `go.Figure` or `make_subplots`
2. Add trace - with `fig.add_*`
3. Update layout - with `fig.update_layout` or `fig.update_*`

## Plotly Figure Object

All of our plots begin with the creation of a plotly figure, which is done by importing the `graph_objects` module. Here, it is imported and aliased as `go`. We then create an empty figure, assign it to a variable, and output it to the screen.

In [6]:
import plotly.graph_objects as go
fig = go.Figure()
fig

### Adding traces

All "traces" can be added to the figure with one of the `add_*` methods, where the `*` references one of the trace names. In plotly, a **trace** is one of several dozen different kinds of visualizations able to be added to a figure (scatter, bar, pie, histogram, etc...). In as few words as possible, a trace is a "type of plot". [Visit this reference page][1] to see a list of all possible traces in the left margin. Click on one of the traces to view a description of each parameter. 

Here, we create a scatter (and line) plot using the `add_scatter` method. We set `x` to be the index (containing the date) and `y` to be the column for deaths from our DataFrame. The `mode` parameter has three common settings:

* `"lines"` - connect the points without showing the markers
* `"markers"` - show just the markers
* `"lines+markers"` - connect the points and show the markers

There is no `add_line` method in plotly. Instead, use `add_scatter` with `mode` set to `"lines"` to create a line plot.

[1]: https://plotly.com/python/reference/index/

In [10]:
x = df_texas.index
y = df_texas['Deaths']
fig = go.Figure()
fig.add_scatter(x=x, y=y, mode="lines+markers")

### Updating the layout

In plotly, the **layout** consists of the following graph properties plus several more:

* height
* width
* title
* xaxis/yaxis
* legend
* margin
* annotations

Here, we plot the same trace as above, but change the height and width (given in pixels) of the figure and provide a title.

In [11]:
fig = go.Figure()
fig.add_scatter(x=x, y=y, mode="lines+markers")
fig.update_layout(height=400, 
                  width=800,
                  title="COVID-19 Deaths in Texas")

### Finding all of the layout properties

The `update_layout` method does not show any of its properties in its docstrings. To view all of the layout properties, visit [this layout reference page][1]. You'll notice that many of the properties are **nested**, meaning that these properties have properties themselves that can be set using a dictionary.

Another way to find the layout properties (while in a Jupyter Notebook) is to access the layout object directly using `fig.layout`. Place a single `.` after it and then **press tab**. A list of all properties will appear in a dropdown menu, as seen in the image below.

![2]

From here, choose one of the properties and press **shift + tab + tab** to reveal the docstrings. Below, the docstrings for the `title` property are shown.

![3]

Let's set a more specific title using several of its properties with a dictionary. Notice that `font` is a further nested property with three more properties (color, family, and size). Find more information with `fig.layout.title.font` (pressing **shift + tab + tab**). The coordinates for `x` and `y` use the range 0 to 1 (relative position left to right and bottom to top).

[1]: https://plotly.com/python/reference/layout/
[2]: images/layout_props.png
[3]: images/layout_docs.png

In [12]:
fig.update_layout(title={
    "text": "COVID-19 Deaths in Texas",
    "x": .5,
    "y": .85,
    "font": {
        "color": "blue",
        "family": "dejavu sans",
        "size": 25
    }
})

## Creating a figure with multiple traces

Any number of traces may be added to the same figure. Here, we split the DataFrame into actual and predicted values and make two separate calls to the `add_scatter` method. The `name` parameter is used as a label in the legend. When no color is specified, plotly automatically gives each successive trace a different color from its default sequence, which is titled "Plotly" and is [found here][1]. Below, we instead set the line colors explicitly using the first two colors of the T10 qualitative sequence.

[1]: https://plotly.com/python/discrete-color/#color-sequences-in-plotly-express

In [19]:
from plotly.colors import qualitative
COLORS = qualitative.T10

In [21]:
last_date = df_summary['date'].iloc[0]
first_pred_date = last_date + pd.Timedelta('1D')
df_texas_actual = df_texas[:last_date]
df_texas_pred = df_texas[first_pred_date:]
fig = go.Figure()
fig.add_scatter(x=df_texas_actual.index, 
                y=df_texas_actual['Deaths'], 
                mode="lines+markers", 
                line={'color': COLORS[0]},
                name='actual')
fig.add_scatter(x=df_texas_pred.index, 
                y=df_texas_pred['Deaths'], 
                mode="lines+markers", 
                line={'color': COLORS[1]},
                name='prediction')
fig.update_layout(height=400, width=800)

### Exercise 26

<span style="color:green; font-size:16px">Write a function that accepts a group, area, and kind and returns a bar plot of the actual and predicted kind for that area.</span>

In [40]:
def area_bar_plot(df, group, area, kind, last_date, first_pred_date):
    """
    Creates a bar plot of actual and predicted values for given kind 
    from one area
    
    Parameters
    ----------
    df - All data DataFrame
    
    group - "world" or "usa"
    
    area - A country or US state
    
    kind - "Daily Deaths", "Daily Cases", "Deaths", "Cases"

    last_date - last known date of data

    first_pred_date - first predicted date
    """
    #Selects a specific group and area
    df = df.query("group == @group and area == @area").set_index("date")
    #Splits the trace where the predictions begin
    df_actual = df[:last_date]
    df_pred = df[first_pred_date:]
    #Creates a figure and assigns it a bar trace
    fig = go.Figure()
    fig.add_bar(x=df_actual.index, y=df_actual[kind], name="actual")
    fig.add_bar(x=df_pred.index, y=df_pred[kind], name="prediction")
    return fig

In [41]:
area_bar_plot(df_all, 'world', 'Italy', 'Cases', last_date, first_pred_date)

## Creating subplots

Multiple plots within a single figure can be created with the `make_subplots` function from the `subplots` module. It creates a rectangular grid of subplots using the provided `rows` and `cols` parameters. To add a trace to a specific subplot, use the `row` and `col` parameters in the `add_*` methods. Here, we plot the actual and predicted traces for both deaths and cases.

In [46]:
from plotly.subplots import make_subplots
fig = make_subplots(rows=2, cols=1)

# top subplot
fig.add_scatter(x=df_texas_actual.index, 
                y=df_texas_actual['Deaths'], 
                mode="lines+markers", 
                name='actual',
                row=1,
                col=1)
fig.add_scatter(x=df_texas_pred.index, 
                y=df_texas_pred['Deaths'], 
                mode="lines+markers", 
                name='prediction',
                row=1,
                col=1)

# bottom subplot
fig.add_scatter(x=df_texas_actual.index, 
                y=df_texas_actual['Cases'], 
                mode="lines+markers", 
                name='actual',
                row=2,
                col=1)
fig.add_scatter(x=df_texas_pred.index, 
                y=df_texas_pred['Cases'], 
                mode="lines+markers", 
                name='prediction',
                row=2,
                col=1)

### Cleaning up the subplots

While we have our traces plotted correctly, there are a few changes we can make to improve this graph. The colors for the actual and predicted values should be the same in each subplot, and repeated names in the legend should be removed. Below, we write a nested for-loop to iterate over the kinds ("Deaths" and "Cases") and again over the actual and predicted DataFrames, which are stored in a dictionary. We choose the first two colors from the T10 qualitative color sequence (these are Tableau's default colors).

To prevent the legend from repeating the same names, we use the `update_traces` method, whose `row` and `col` parameters let us hide the legend entries for the traces in one specific subplot. The `update_layout` method also has a `showlegend` parameter, but it applies to the legend of the entire figure. Several other `update_*` methods also let you target a specific subplot. Use the `update_layout` method when you want to change a property for the entire figure.

In [51]:
df_texas_actual.tail()

date,group,area,Daily Deaths,Daily Cases,Deaths,Cases
2020-11-01,usa,Texas,52,32867,18840,975717
2020-11-02,usa,Texas,42,6812,18882,982529
2020-11-03,usa,Texas,105,9721,18987,992250
2020-11-04,usa,Texas,140,11092,19127,1003342
2020-11-05,usa,Texas,128,10749,19255,1014091


In [52]:
df_texas_pred.tail()

date,group,area,Daily Deaths,Daily Cases,Deaths,Cases
2020-12-01,usa,Texas,140,7282,22872,1206684
2020-12-02,usa,Texas,140,7250,23012,1213934
2020-12-03,usa,Texas,140,7216,23152,1221150
2020-12-04,usa,Texas,139,7181,23291,1228331
2020-12-05,usa,Texas,139,7143,23430,1235474


In [47]:
from plotly.colors import qualitative
#Takes the first two colors of the T10 qualitative sequence
COLORS = qualitative.T10[:2]
#The "kinds" of data to plot, one per subplot row
KINDS = 'Deaths', 'Cases'
#Maps each legend name to its DataFrame of actual or predicted values
dfs = {'actual': df_texas_actual, 'prediction': df_texas_pred}
#Builds the same figure as above in a nested loop, with less repeated code:
#the outer loop picks the subplot row, the inner loop adds one trace per DataFrame
fig = make_subplots(rows=2, cols=1, vertical_spacing=.1)
for row, kind in enumerate(KINDS, start=1):
    for i, (name, df) in enumerate(dfs.items()):
        fig.add_scatter(x=df.index, 
                        y=df[kind], 
                        mode="lines+markers", 
                        name=name,
                        line={"color": COLORS[i]},
                        row=row,
                        col=1)

fig.update_traces(showlegend=False, row=2, col=1)
fig.update_layout(title={"text": "Texas", "x": 0.5, "y": 0.97, "font": {"size": 20}})
fig

## Adding annotations

The `make_subplots` function allows you to set the titles with the `subplot_titles` parameter, but does not give you control over any of its properties (color, size, font, etc...). You can only provide it text. To create titles with any non-default properties, you'll need to make an annotation using either the `add_annotation` method or the `update_layout` method. We choose the latter below to add two annotations (they act as titles for our subplots).

You must set the `annotations` parameter within `update_layout` to be a list of dictionaries, with each dictionary representing a single annotation. If all annotations share some properties, you can provide all of the shared properties to the `update_annotations` method instead of repeating them in the `update_layout` method. 

The `xref`/`yref` properties set the coordinate system used for `x` and `y`. When set to "paper", the values are proportions of the plotting area, with 0 marking its left (or bottom) edge and 1 its right (or top) edge. Since plotly produces HTML, to make the text bold, we wrap the text in `<b></b>` tags.

The margins are the spaces between the four edges of the plotting area and the figure. They default to 80 pixels for the left, right, and bottom margins and 100 for the top. We decrease this space so that the plots fill out more of the figure. We also move the legend below the bottom subplot. This graph should now look almost exactly like the one in the dashboard.

In [55]:
fig.update_layout(
    #Sets the annotations parameter to a list of dictionaries, one per annotation
    annotations=[
        {"y": 0.95, "text": "<b>Deaths</b>"},
        {"y": 0.3, "text": "<b>Cases</b>"},
    ],
    #Sets the top, left, right, and bottom margins in pixels
    margin={"t": 40, "l": 50, "r": 10, "b": 0},
    #Centers a horizontal legend just below the bottom subplot and sets its font size
    legend={
        "x": 0.5,
        "y": -0.05,
        "xanchor": "center",
        "orientation": "h",
        "font": {"size": 15},
    },
)
#Properties shared by every annotation
annot_props = {
    "x": 0.1,
    "xref": "paper",
    "yref": "paper",
    "xanchor": "left",
    "showarrow": False,
    "font": {"size": 18},
}
#Applies the shared properties to all annotations in the figure
fig.update_annotations(annot_props)
fig

## Choropleth maps

The [choropleth trace][1] creates a variety of polygons (countries and US states for our project) colored by the value of a given numeric variable. Let's create the default (base) map by creating a figure and then calling `add_choropleth` with no arguments.

[1]: https://plotly.com/python/reference/choropleth/

In [56]:
fig = go.Figure()
fig.add_choropleth()

### Coloring countries by deaths

Let's take the summary table that we read in earlier and select the world group to get a single row of data per country. We also filter for countries with more than 1 million in population (the `population` column is given in millions).

In [57]:
df_world = df_summary.query("group == 'world' and population > 1")
df_world.head(3)

Unnamed: 0,group,area,Daily Deaths,Daily Cases,Deaths,Cases,code,population,Deaths per Million,Cases per Million,date
0,world,Afghanistan,4,86,1548,41814,AFG,38.928341,40.0,1070.0,2020-11-05
1,world,Albania,7,421,543,22721,ALB,2.8778,189.0,7900.0,2020-11-05
2,world,Algeria,12,642,2011,60169,DZA,43.851043,46.0,1370.0,2020-11-05


Each country has a [standardized ISO-3 code][1] that plotly understands. Let's assign these codes and the deaths column as their own variables.

[1]: https://en.wikipedia.org/wiki/ISO_3166-1_alpha-3

In [58]:
locations = df_world['code']
z = df_world['Deaths']

Let's recreate the choropleth with this information, setting the parameter `z` to the total number of deaths. We select a continuous color scale called "orrd". Find [all continuous color scales here][1].

[1]: https://plotly.com/python/builtin-colorscales/

In [59]:
fig = go.Figure()
fig.add_choropleth(locations=locations, z=z, zmin=0, colorscale="orrd")
fig.update_layout(margin={"t": 0, "l": 10, "r": 10, "b": 0})

### Selecting a better range and projection

It's unnecessary to show the very northern and southern areas of the world as well as the swath of emptiness in the Pacific Ocean. There are also a large number of [projections][1] to choose from. Projection "robinson" is chosen below, but feel free to experiment with others. We can select the latitude and longitude range, and the projection by setting the `geo` parameter in `update_layout`.

[1]: https://plotly.com/python/map-configuration/#map-projections

In [61]:
fig = go.Figure()
fig.add_choropleth(locations=locations, z=z, zmin=0, colorscale="orrd",  marker_line_width=0.5)
fig.update_layout(
    geo={
        "showframe": False,
        "lataxis": {"range": [-37, 68]},
        "lonaxis": {"range": [-130, 150]},
        "projection": {"type": "robinson"}
    },
    margin={"t": 0, "l": 10, "r": 10, "b": 0})

### Customizing the hover text

Hovering over each country shows only the value of `z` and the country code like in the image below.

![1]

We can customize this text to be anything we desire by supplying a sequence containing the exact string to display for each country. The `hover_text` function below builds one long string from a single row of data, nicely formatted with line breaks (`<br>`) between each statistic. The DataFrame `apply` method with `axis=1` is used to call this function on each row of `df_world`. The string for each of the first few rows is shown below.

[1]: images/hovertext.png

In [64]:
def hover_text(x):
    name = x["area"]
    deaths = x["Deaths"]
    cases = x["Cases"]
    deathsm = x["Deaths per Million"]
    casesm = x["Cases per Million"]
    pop = x["population"]
    return (
        f"<b>{name}</b><br>"
        f"Deaths - {deaths:,.0f}<br>"
        f"Cases - {cases:,.0f}<br>"
        f"Deaths per Million - {deathsm:,.0f}<br>"
        f"Cases per Million - {casesm:,.0f}<br>"
        f"Population - {pop:,.0f}M"
    )

text = df_world.apply(hover_text, axis=1)
text.head()

0    <b>Afghanistan</b><br>Deaths - 1,548<br>Cases ...
1    <b>Albania</b><br>Deaths - 543<br>Cases - 22,7...
2    <b>Algeria</b><br>Deaths - 2,011<br>Cases - 60...
4    <b>Angola</b><br>Deaths - 299<br>Cases - 12,10...
6    <b>Argentina</b><br>Deaths - 32,766<br>Cases -...
dtype: object
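The row-wise `apply` pattern is easier to follow on a tiny table. In the sketch below, `df_demo` and `short_hover_text` are hypothetical names, and the two rows are made up but use the same column names as the summary table.

```python
import pandas as pd

# Two made-up rows with the same column names as the summary table
df_demo = pd.DataFrame({
    "area": ["Alpha", "Beta"],
    "Deaths": [1548, 543],
    "Cases": [41814, 22721],
})

def short_hover_text(row):
    # `row` is a Series holding one row's values, indexed by column name
    return f"<b>{row['area']}</b><br>Deaths - {row['Deaths']:,.0f}"

# axis=1 passes each row (rather than each column) to the function
text_demo = df_demo.apply(short_hover_text, axis=1)
print(text_demo.tolist())
```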

Set the hover text with the `text` parameter, and force plotly to just use this provided text by setting `hoverinfo` to "text".

In [65]:
fig = go.Figure()
fig.add_choropleth(locations=locations, z=z, zmin=0, colorscale="orrd", 
                   marker_line_width=0.5, text=text, hoverinfo="text")
fig.update_layout(
    geo={
        "showframe": False,
        "lataxis": {"range": [-37, 68]},
        "lonaxis": {"range": [-130, 150]},
        "projection": {"type": "robinson"}
    },
    margin={"t": 0, "l": 10, "r": 10, "b": 0})

### USA Choropleth

There are two differences when making a similar map for the USA. Set the `locationmode` parameter to "USA-states" so that plotly recognizes the two-character state code and choose the projection to be "albers usa" which moves Alaska and Hawaii near the other 48 states. Here, we color by "Cases per Million".

In [66]:
df_states = df_summary.query("group == 'usa'")
locations = df_states['code']
z = df_states['Cases per Million']
text = df_states.apply(hover_text, axis=1)

fig = go.Figure()
fig.add_choropleth(locations=locations, locationmode='USA-states', z=z, zmin=0, 
                   colorscale="orrd", marker_line_width=0.5, text=text, hoverinfo="text")
fig.update_layout(
    geo={
        "showframe": False,
        "projection": {"type": "albers usa"}
    },
    margin={"t": 0, "l": 10, "r": 10, "b": 0})

## Plotly Summary

Plotly is a great tool for creating interactive data visualizations for the web. The three main steps for creating a visualization are:

1. Create Figure - with `go.Figure` or `make_subplots`
2. Add trace - with `fig.add_*`
3. Update layout - with `fig.update_layout` or `fig.update_*`

### Traces

* A trace is plotly terminology for a "kind of plot" (scatter, bar, pie, box, choropleth, etc...)
* Find the trace you want on [the left side of this page][1]
    * Or type `fig.add_` and press tab
* Read documentation for a specific trace once selected e.g. `fig.add_scatter` -> shift + tab + tab
* Add as many traces as you want to one figure

### Layout

* The layout is where properties such as height, width, title, xaxis/yaxis, legend, annotations, etc... are set
* Use `fig.update_layout` to set properties for entire figure
* Documentation does NOT show parameters with `fig.update_layout`
    * Discover them with `fig.layout.` + tab
    * Read documentation on specific property `fig.layout.title` -> shift + tab + tab
    
### Subplots

* Create grid of subplots with `make_subplots` using `rows` and `cols`
* All trace methods, `fig.add_*`, have `row` and `col` to specify subplot
* Use `fig.update_layout` to change properties on entire figure
* Other `fig.update_*` methods exist that have `row` and `col` parameters to change specific subplot

### Choropleth

* Colored polygons (countries and states for our project)
* Some properties are in `fig.add_choropleth`, others are in `fig.update_layout` using `geo` parameter
* Set `locations` to be code (ISO-3 for countries and two-character abbreviation for states)
* Set `locationmode` to be "USA-states" for USA
* Set projection and range (`lataxis`/`lonaxis`) for world
* Set projection to be "albers usa" for usa


## More to Plotly

The purpose of this chapter was to provide you with a simple and straightforward approach to using plotly for our project. There is much more to the library and multiple ways to interface with it. One newer and popular way of creating plotly graphs is with [plotly express][2], which is similar to the seaborn library in that it automatically groups and aggregates values for you. If you are interested in learning more about plotly, I would recommend waiting until after the completion of this course, as a tremendous number of items are already covered and getting sidetracked on the details of plotly will not help. The methods taught in this chapter (create figure, add trace, update layout) should give you the power to create nearly any plot and style it as you desire.

[1]: https://plotly.com/python/reference/index/
[2]: https://plotly.com/python/plotly-express/