# Plotly Walkthrough
HCDE 411 - Fall 2020

Plotly is an open source graphing library for Python. This module shows basic visualizations that can be done using Plotly. 

>_Built on top of the Plotly JavaScript library (plotly.js), plotly enables Python users to create beautiful interactive web-based visualizations that can be displayed in Jupyter notebooks, saved to standalone HTML files, or served as part of pure Python-built web applications using Dash. The plotly Python library is sometimes referred to as "plotly.py" to differentiate it from the JavaScript library._

The specific implementation we are using is Plotly Express ([see info here](https://medium.com/plotly/introducing-plotly-express-808df010143d)), a wrapper around the plotly Python library that makes it even easier to use!

## Setup

### Install plotly
To install the library on your server, use the pip tool. Open a terminal session on your server (it is in the Launcher tab; you may need to start a new Launcher from the File menu). In the terminal session, type: `pip install --user plotly`.


### Rebuild the labextensions
Plotly requires you to rebuild your JupyterLab extensions with this command:
`jupyter labextension install jupyterlab-plotly --minimize=False`

Check that your installation succeeded with `jupyter labextension list`:

`jovyan@jupyter-agreatstudent:~$ 
JupyterLab v1.2.16
Known labextensions:
   app dir: /opt/conda/share/jupyter/lab
        @jupyter-widgets/jupyterlab-manager v1.1.0  enabled  OK
        jupyterlab-plotly v4.12.0  enabled  OK
        jupyterlab_bokeh v1.0.0  enabled  OK`

Plotly offers a variety of [visualizations](https://plotly.com/python/), from basic graphs to advanced interactive visualizations. We will go through the different types of visualizations Plotly is capable of.

Read the [Plotly Fundamentals tutorials](https://plotly.com/python/plotly-fundamentals/) or dive straight in to some [Basic Charts tutorials.](https://plotly.com/python/basic-charts/)

Plotly Express commands are listed in [this reference](https://plotly.com/python-api-reference/plotly.express.html).

Here are some simple examples for you to try. Play around with them to get a sense of how they work.

We will be using numpy and pandas as well, so they'll be imported after plotly:

In [None]:
import plotly.express as px # plotly library
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

## Bar graph

With px.bar, each row of the DataFrame is represented as a rectangular mark. Customization documentation can be found [here](https://plotly.com/python/styling-plotly-express/). 

Every visualization created with plotly offers interactivity right away. If you hover over the bars, a tooltip with information appears. Also, in the top right corner there are multiple options to zoom, pan, select, and more! You can even download a PNG version of the visualization. Try it yourself!

In [None]:
px.bar(y=[1, 1, 2, 3, 5, 8, 13, 21], title = "Example Bar graph", labels = {"x":"X axis", "y":"Y axis"})

You can also create a figure object to have more flexibility later on. 

In [None]:
fig = px.bar(y=[1, 1, 2, 3, 5, 8, 13, 21], title = "Example Bar graph", labels = {"x":"X axis", "y":"Y axis"})
fig.show()

## Scatter Plot

With px.scatter, each data point is represented as a marker point, whose location is given by the x and y columns. Here we are using the [Iris Dataset](https://archive.ics.uci.edu/ml/datasets/iris) to plot sepal width vs length.

In [None]:
df = px.data.iris()
fig = px.scatter(df, 
                 x="sepal_width", 
                 y="sepal_length", 
                 color="species",
                 size='petal_length', 
                 hover_data=['petal_width'], # We can add more data to the tooltip
                 labels={"sepal_length": "Sepal length", 
                         "sepal_width": "Sepal width",
                         "species": "Species"}, 
                 title ="Sepal width vs Sepal length")
fig.show()

## Line graphs

With px.line, each data point is represented as a vertex (whose location is given by the x and y columns) of a polyline mark in 2D space. Here we are showing the life expectancy of every country in the Americas.

In [None]:
df = px.data.gapminder().query("continent=='Americas'")
fig = px.line(df, x="year", y="lifeExp", color='country', title="Life expectancy in countries of the Americas")
fig.show()

## Histograms

In plotly, a histogram is an aggregated bar chart, with several possible aggregation functions (e.g. sum, average, count...). Also, the data to be binned can be numerical data but also categorical or date data. By default, the number of bins is chosen so that this number is comparable to the typical number of samples in a bin. This number can be customized, as well as the range of values.

In [None]:
x = np.random.randn(500)
fig = px.histogram(x=x, histnorm='probability')

fig.show()

## 3D surface Plot

There is a wide variety of visualizations possible with plotly. We will be using mostly Plotly Express. `plotly.graph_objects` contains more complex visualizations such as this cool 3D surface plot of Mt. Bruno. You can interact with it to see it from all angles.

In [None]:
import plotly.graph_objects as go

z_data = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/api_docs/mt_bruno_elevation.csv')

fig = go.Figure(data=[go.Surface(z=z_data.values)])

fig.update_layout(title='Mt Bruno Elevation', autosize=False,
                  width=500, height=500,
                  margin=dict(l=65, r=50, b=65, t=90))

fig.show()

# UFO Sightings dataset

Let's create some visualizations using plotly with a dataset of [UFO sightings](https://www.kaggle.com/thaddeussegura/ufo-sightings) in the US.

In [None]:
df = pd.read_csv("clean_ufo_data.csv")
df.tail()

The dates seem to have different formats. Let's change them all to datetime and put any unrecognized format as "Not a Time" (NaT) using the `errors='coerce'` option. This can take some time to finish.

In [None]:
df["occurred_date_time"] = pd.to_datetime(df["occurred_date_time"], errors='coerce')
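
To see what `errors='coerce'` does in isolation, here is a small sketch on made-up values (the strings are invented for this example and are not taken from the UFO dataset). Anything that cannot be parsed becomes `NaT` instead of raising an error.

```python
import pandas as pd

# Two parseable timestamps and one value that is not a date at all.
raw = pd.Series(["2005-06-01 21:30", "2010-11-15 03:00", "unknown"])

parsed = pd.to_datetime(raw, errors="coerce")

# Count how many values could not be parsed.
n_unparsed = parsed.isna().sum()
print(parsed)
print("Unparsed values:", n_unparsed)
```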

In [None]:
df.tail()

They seem to have datetime format now, but a lot of them are NaT. We will drop any rows where date time is NaT.

In [None]:
df = df.dropna(subset=['occurred_date_time'])
df.tail()

Now it looks much better!

Let's group the data by year and state. Using `groupby` we can count how many occurrences there are for any combination of year and state.

In [None]:
df_grouped = df.groupby([df["occurred_date_time"].dt.year, df["state"]])["id"].count().reset_index(name="sightings")
df_grouped.rename(columns={'occurred_date_time':'year'}, inplace=True) # we will rename this column to be more representative (this is not necessary)
df_grouped.head()
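
If the grouping step feels opaque, the sketch below applies the same idea to a tiny invented table. It is one possible variation: it names the year key up front with `rename` and counts rows with `size()`, so no separate column rename is needed.

```python
import pandas as pd

# Invented sample: four sightings across two states and two years.
toy = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "occurred_date_time": pd.to_datetime(
        ["2005-01-10", "2005-07-04", "2005-03-02", "2006-09-30"]
    ),
    "state": ["WA", "WA", "OR", "WA"],
})

# Group by the year extracted from the timestamp and by state, then count rows.
year = toy["occurred_date_time"].dt.year.rename("year")
counts = toy.groupby([year, "state"]).size().reset_index(name="sightings")
print(counts)
```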

Now we have a dataframe including year, state and number of sightings. Let's take a look into the data for Washington state.

In [None]:
df_wa = df_grouped[df_grouped["state"]=="WA"]
df_wa.tail(5)

Let's create a bar chart of the sightings per year for Washington state.

In [None]:
fig = px.bar(df_wa,x="year",y="sightings", title="UFO sightings in Washington state")
fig.show()

Creating a scatter plot of the same data is easy!

In [None]:
fig = px.scatter(df_wa, x="year", y="sightings", title="UFO sightings in Washington state")
fig.show()

We want to see this data for all of the states at the same time. Let's create a scatter plot that includes all of the states.

In [None]:
fig = px.scatter(df_grouped, x="year", y="sightings", color="state", title="UFO sightings in USA per state")
fig.show()

We can also create a new column that includes the cumulative sum of all the sightings.

In [None]:
df_grouped['total'] = df_grouped.groupby('state')['sightings'].cumsum() # running total of sightings within each state
df_grouped.tail()
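
A per-group cumulative sum is easier to follow on a small table. The sketch below uses invented numbers, already sorted by year, and keeps a separate running total for each state.

```python
import pandas as pd

# Invented counts for two states over two years, sorted by year.
toy = pd.DataFrame({
    "year": [2000, 2000, 2001, 2001],
    "state": ["OR", "WA", "OR", "WA"],
    "sightings": [1, 4, 2, 3],
})

# The running total restarts for every state.
toy["total"] = toy.groupby("state")["sightings"].cumsum()
print(toy)
```

The result only reads as a total over time if the rows are in chronological order within each state, so sort by year first when the data is not already ordered.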

Using this data we can plot total UFO sightings per state using a line graph. You can filter data in the visualization using the state key.

In [None]:
fig = px.line(df_grouped, x="year", y="total", color='state', title = "Total UFO Sightings per state")
fig.show()

There's too much occlusion. Let's do the same visualization but with the top 5 states with most sightings.

In [None]:
top5 = df_grouped.groupby("state")["sightings"].sum().sort_values(ascending=False).head(5).reset_index()["state"]
top5
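
An alternative way to pick the top states, sketched here on invented numbers, is `nlargest`, which combines the sort and the `head` step. The result can be passed to `isin` in the same way.

```python
import pandas as pd

# Invented per-row counts; CA appears twice so its rows are summed.
toy = pd.DataFrame({
    "state": ["CA", "CA", "WA", "OR", "TX"],
    "sightings": [5, 7, 4, 1, 6],
})

# Sum per state, then keep the two states with the largest totals.
top2 = toy.groupby("state")["sightings"].sum().nlargest(2).index.tolist()

# Filter the original rows down to those states.
subset = toy[toy["state"].isin(top2)]
print(top2)
```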

In [None]:
fig = px.line(df_grouped[df_grouped["state"].isin(top5)], 
              x="year", 
              y="total", 
              color='state', 
              title = "Top 5 states with most UFO sightings")
fig.show()

There is still some occlusion. Since we have data per state, a map may be a better way to show all the information at the same time.

## Choropleth map

Using plotly we will create a choropleth map that shows sightings per state. Let's show all the sightings in the year 2005. We will be using a color scale. You can read more about plotly's built-in color scales [here](https://plotly.com/python/builtin-colorscales/).

In [None]:
year_df = df_grouped[df_grouped["year"] == 2005]

fig = px.choropleth(year_df, 
                    locations = "state",  
                    locationmode="USA-states", 
                    color='sightings',      
                    color_continuous_scale="bluyl",
                    labels={'sightings':'UFO Sightings', "state":"State"},
                    scope="usa")

fig.update_layout(title= "UFO Sightings in 2005", 
                  margin={"r":20,"t":100,"l":20,"b":20})

fig.show()

Let's also show the year. We can do that by adding a slider using the `animation_frame` option.

In [None]:
fig = px.choropleth(df_grouped, 
                    locations=df_grouped['state'],  
                    locationmode="USA-states", 
                    color='sightings',      
                    color_continuous_scale="bluyl",
                    scope="usa",
                    animation_frame="year", 
                    title= "UFO Sightings per year",
                    labels={'sightings':'UFO Sightings', "state":"State"})
fig.show()

Now we can move through the years! You can see that the range of UFO sightings changes for each year. Comparing between years is more difficult that way. To fix this we can add a fixed range for all of the years with the option `range_color`.

In [None]:
fig = px.choropleth(df_grouped, 
                    locations=df_grouped['state'],  
                    locationmode="USA-states", 
                    color='sightings',      
                    color_continuous_scale="bluyl",
                    range_color=(0, df_grouped["sightings"].max()), # Fixed color range
                    scope="usa",
                    animation_frame="year", 
                    title= "UFO Sightings per year",
                    labels={'sightings':'UFO Sightings', "state":"State"})
fig.update_layout(margin={"r":20,"t":100,"l":20,"b":20})
fig.show()

Now we have an interactive visualization that shows all the sightings per year and state at the same time.

# Exercises

Refer to the examples above to guide you in completing the following exercises. You may need to do some research in the Plotly or Pandas documentation to help you out.

For this set of exercises you will be using the [Bird Strikes](https://www.kaggle.com/breana/bird-strikes) dataset. The dataset contains a record of each reported wildlife strike of a military, commercial, or civil aircraft between 1990 and 2015. Each row contains the incident date, aircraft operator, aircraft make and model, airport name and location, species name and quantity, and aircraft damage.

First make sure you can load the dataset:

In [None]:
bird_df = pd.read_csv("bird_strikes.csv")

In [None]:
bird_df.head()

## Exercise 1

1. Plot a bar graph of incidents per year.
2. Plot a bar graph of incidents per year for Air Canada, Delta Air Lines, American Airlines and Hawaiian Airlines in the same visualization.

## Exercise 2
 
1. Make a scatter plot of the top 10 airlines with the most incidents.
2. Plot a line graph of cumulative incidents per year for Air Canada, Delta, American Airlines and Hawaiian Airlines in the same visualization.

## Exercise 3

1. Create a Choropleth map that shows the total number of incidents per state.
2. Add a slider to change years to your Choropleth map.

## Exercise 4 (extra credit)

Create a new (fun) additional visualization of a type that we did not cover in class using the same dataset. You can look for inspiration [here](https://plotly.com/python/)! Explain your chart, including your choice of dimensions, values, and encodings, in comments or a markdown text box.