# Plotting with Seaborn and Plotly

Jupyter Notebooks & Pandas Workshop  
HackFrost NL, February 20, 2021

In this notebook, we are going to look at our data using two libraries:
1.  Seaborn
2.  Plotly Express

Seaborn and Plotly are both visualization libraries. 

Both libraries allow you to create good-looking plots and figures with relatively little effort, while still offering rich customization so you can modify the plot however you need.

**Seaborn** is built on matplotlib. If you need specific customization, you can always drop down to matplotlib, where you have complete control over all aspects of the image.

**Plotly Express** is a high-level interface to Plotly. It interfaces with the full power of Plotly if you need more specific control.

## Load Libraries

Seaborn is conventionally imported under the name ``sns``.

For Plotly, we are going to focus on using Plotly Express, but will load the full plotly package in case we need it.

In [None]:
import pandas as pd

import seaborn as sns

import plotly
import plotly.express as px
import plotly.graph_objects as go  # graph_objs is the older, deprecated name for this module

## Read Data

Load the COVID-19 dataset from Our World in Data.

In [None]:
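# parse_dates turns the date column into real datetimes, so time axes are spaced and labeled sensibly
df = pd.read_csv('owid-covid-data.csv', parse_dates=['date'])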

# Seaborn

Let's first look at Seaborn.

Seaborn API: https://seaborn.pydata.org/api.html

Seaborn offers a variety of plot types, including:
- Line plots,
- Scatter plots,
- Histograms,
- Box plots.

**Line plot** of the number of new cases per day.

In [None]:
sns.lineplot(x="date", y="new_cases", data=df[df.iso_code == 'CAN'])

**Scatter plot** of the number of new tests vs. the number of new cases.

In [None]:
sns.scatterplot(x="new_tests", y="new_cases", data=df)

**Histogram** of the positivity rate of tests. Each bar shows a count, that is, the number of rows (one per country per day) whose positivity rate falls in that bar's range.

In [None]:
sns.histplot(x = 'positive_rate', data=df)
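
To make the idea of "each bar is a count" concrete, here is a small sketch that bins a handful of made-up positivity rates by hand with pandas. The values and the bin edges are invented for illustration; ``sns.histplot`` chooses its own bins automatically, so its bars will not match these.

```python
import pandas as pd

# Made-up positivity rates (not taken from the workshop dataset).
rates = pd.Series([0.01, 0.02, 0.02, 0.08, 0.11, 0.12], name="positive_rate")

# Hypothetical bin edges: 0-5%, 5-10%, 10-15%.
bins = [0.0, 0.05, 0.10, 0.15]

# Assign each rate to a bin, then count how many rates landed in each bin.
counts = pd.cut(rates, bins=bins).value_counts().sort_index()
print(counts)
```

Each printed count corresponds to the height of one histogram bar for that range.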

**Box plot** of the life expectancy per continent. 

The box spans the first to third *quartile*, that is from the 25th percentile to the 75th percentile. Half your data will be in that box.

The line across the box is the median, that is, the 50th percentile. Half of your data will be above this line, the other half below it.

In [None]:
sns.boxplot(x="continent", y="life_expectancy", data=df)
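
The quartiles that define the box can also be computed directly. The following is a minimal sketch using pandas ``quantile`` on made-up numbers (the series below is not real life expectancy data), showing that half of the values fall between the first and third quartile.

```python
import pandas as pd

# Made-up values standing in for a life_expectancy column: 41, 42, ..., 80.
life = pd.Series(range(41, 81), name="life_expectancy")

q1 = life.quantile(0.25)      # first quartile: bottom edge of the box
median = life.quantile(0.50)  # median: the line across the box
q3 = life.quantile(0.75)      # third quartile: top edge of the box

# Fraction of the values that lie inside the box.
in_box = life.between(q1, q3).mean()
print(q1, median, q3, in_box)
```

On real data with repeated values the fraction inside the box is only approximately one half.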

### Saving Seaborn plots to File

Saving a plot to file requires using matplotlib, because the Seaborn functions used here return a matplotlib Axes object. You can read more about matplotlib on its website: https://matplotlib.org/stable/

Without going deep into details, you can get the matplotlib Figure object from the output of the Seaborn call and then save it to file, as in the code example below.

In [None]:
sns_plot = sns.lineplot(x="date", y="new_cases", data=df[df.iso_code == 'CAN'])
sns_fig = sns_plot.get_figure()
sns_fig.savefig('cases.png')

### Increasing Figure Size in Seaborn

As with saving to file, increasing the figure size requires using matplotlib. The code sample below will accomplish this.

We will need to import matplotlib to set up the figure size.

In [None]:
import matplotlib.pyplot as plt

In [None]:
plt.figure(figsize=(9, 6))  # width and height of the figure, in inches
sns.lineplot(x="date", y="new_cases", data=df[df.iso_code == 'CAN'])

### Seaborn Summary

One-line code! Almost all plots follow the call pattern ``sns.plottype(x = 'column', y = 'column', data = dataframe)``.

Most things you want to plot can be accomplished using
- ``sns.lineplot()``
- ``sns.scatterplot()``
- ``sns.histplot()`` (or ``sns.distplot()`` in Seaborn versions before 0.11)
- ``sns.boxplot()``

Seaborn is built on matplotlib. The Seaborn functions used here return a matplotlib Axes object, which is part of a Figure. Increasing the figure size requires using matplotlib, as does saving to file (see the examples above).


# Plotly Express

Let's next explore the functionality of Plotly Express.

Plotly Express overview: https://plotly.com/python/plotly-express/  
The API is available here: https://plotly.com/python-api-reference/

Plotly Express lets you make a number of plot types using one line of code, such as:
- Line plots,
- Scatter plots,
- Histograms,
- Box plots.

Let's explore our data using some of these plots.

**Line plot** of the number of new cases per day.

In [None]:
px.line(df[df.iso_code == 'CAN'], x="date", y="new_cases")

Plotly has two big advantages.

One is that the plot is interactive. You can hover over the plot to see values at specific points, and you can zoom in and pan around.

The second is that you can export the plot as HTML, which can then be embedded in a webpage or hosted online.

**Scatter plot** of the number of new tests vs. the number of new cases.

In [None]:
px.scatter(df, x="new_tests", y="new_cases")

**Histogram** of the positivity rate of tests. Each bar shows a count, that is, the number of rows (one per country per day) whose positivity rate falls in that bar's range.

In [None]:
px.histogram(df, x="positive_rate")

**Box plot** of the life expectancy per continent. 

The box spans the first to third *quartile*, that is from the 25th percentile to the 75th percentile. Half your data will be in that box.

The line across the box is the median, that is, the 50th percentile. Half of your data will be above this line, the other half below it.

In [None]:
px.box(df, x="continent", y="life_expectancy")

### Saving Plotly plots to File

Just click the "Download plot as png" button (the camera icon in the toolbar that appears when you hover over the plot)!

### Saving Plotly plots as HTML

One of the great things about Plotly is that its plots can be saved as HTML. These files can be hosted online, outside of Jupyter, and embedded in web pages.

In [None]:
fig = px.line(df[df.iso_code == 'CAN'], x="date", y="new_cases")

plotly.offline.plot(fig, filename='cases.html')

### Plotly Express Summary

Plotly Express works very similarly to Seaborn, except that it creates interactive plots. Most common types of plots can be created using:
- ``px.line()``
- ``px.scatter()``
- ``px.histogram()``
- ``px.box()``

These all use the same syntax of ``px.plottype(data, x = 'column', y = 'column')``.

# Plotting Multiple Data to the Same Plot

So far we have plotted one column vs. another column. Let's look at plotting multiple data sets on the same plot, for example, multiple lines on one plot.

### Multiple Plots on Seaborn and Plotly

With Seaborn, you can just call plotting functions multiple times. Seaborn will draw them all on the same axes!

In [None]:
plt.figure(figsize = (9,6))
sns.lineplot(x="date", y="new_cases", data=df[df.iso_code == 'CAN'], label="new_cases")
sns.lineplot(x="date", y="new_cases_smoothed", data=df[df.iso_code == 'CAN'], label="new_cases_smoothed")

Plotting multiple lines this way with Plotly takes a bit more effort, because it requires the full Plotly library (not just Plotly Express). The code snippet below accomplishes this by adding a Plotly graph object to the existing figure.

Rather than diving into the full Plotly library, further below we will focus on changing the way our data is structured, which allows plotting multiple variables using just Plotly Express.

In [None]:
can = df[df.iso_code == 'CAN']  # filter once and reuse
fig = px.line(can, x="date", y="new_cases")
fig.add_trace(
    go.Scatter(x=can.date, y=can.new_cases_smoothed, name="new_cases_smoothed"))


Below is another approach, where we change the structure of how the data is stored to accomplish this in a simpler way.

### Melting Data

Right now, our ``new_cases`` and ``new_cases_smoothed`` data are in separate columns. Seaborn and Plotly Express work better when these can be combined into a single column (that would be twice as long), with a second column that labels whether that row is of the new_cases or new_cases_smoothed type.

In essence, we convert these two columns into key-value pairs.

An example will help if this doesn't make sense yet.

Here is the current dataframe.

In [None]:
df[df.iso_code == 'CAN'].head()

In [None]:
df.loc[df.iso_code == 'CAN', ['date', 'new_cases', 'new_cases_smoothed']]

Now let's convert the new_cases and new_cases_smoothed columns into a key-value pair of columns.

To do this, we will use Pandas ``melt`` functionality.  
The API for this can be found here: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.melt.html

In [None]:
df_cases = df[df.iso_code == 'CAN'].melt(id_vars='date', value_vars=['new_cases', 'new_cases_smoothed'])

In [None]:
df_cases.sample(10)

We can see that instead of one column for new_cases and a second column for new_cases_smoothed, we now have a ``variable`` column (holding either new_cases or new_cases_smoothed) and a ``value`` column (the value of that variable on that date).
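
If the reshaping is still hard to picture, the following minimal sketch applies ``melt`` to a tiny made-up dataframe (two dates, invented numbers), so the whole before and after fits on screen.

```python
import pandas as pd

# Tiny wide-format table: one row per date, one column per measurement.
toy = pd.DataFrame({
    "date": ["2021-01-01", "2021-01-02"],
    "new_cases": [10, 20],
    "new_cases_smoothed": [12.0, 18.0],
})

# Long format: one row per (date, measurement) pair.
toy_long = toy.melt(id_vars="date", value_vars=["new_cases", "new_cases_smoothed"])

print(toy)
print(toy_long)
```

The two wide rows become four long rows: each date appears once per measurement, with the measurement name in ``variable`` and its number in ``value``.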

We can now plot the values, and Seaborn and Plotly Express will use the key (variable) to distinguish multiple lines.

In [None]:
sns.lineplot(x = 'date', y = 'value', hue = 'variable', data = df_cases)

In [None]:
px.line(df_cases, x = 'date', y = 'value', color = 'variable')

# Summary

Seaborn and Plotly Express let you look at your data with one-line pieces of code. Who doesn't love one-line code?

Their strength is that you can still customize all aspects of your plot using matplotlib (which Seaborn is built on) and the full Plotly library (which Plotly Express is built on).

One great thing about Plotly is its interactivity and its ability to export to HTML.