# Interactive Data Graphics with Plotly

In this lecture, we'll use the Plotly library to construct engaging interactive graphics through a high-level interface, and we'll use it to build out the standard data visualization tools. 

There are a number of plot types not shown here: check the [Plotly Express overview](https://plotly.com/python/plotly-express/) for many other interesting and useful plot types. 

For this lecture, we're going to take a break from the NOAA climate data set. You'll use Plotly to construct visualizations using this data in Homework 1. For now, we're going to use the #BestDataSet: Palmer Penguins! 

<figure class="image" style="width:50%">
  <img src="https://raw.githubusercontent.com/allisonhorst/palmerpenguins/master/man/figures/lter_penguins.png" alt="">
  <figcaption><i>Artwork by @allison_horst</i></figcaption>
</figure>

First, let's retrieve and clean up the data a little. These are all standard pandas operations, so we're not going to spend much time here. 

In [None]:
import pandas as pd
filename = "palmer_penguins.csv"
penguins = pd.read_csv(filename)
penguins = penguins.dropna(subset = ["Body Mass (g)", "Sex"])
penguins["Species"] = penguins["Species"].str.split().str.get(0)
penguins = penguins[penguins["Sex"] != "."]

cols = ["Species", "Island", "Sex", "Culmen Length (mm)", "Culmen Depth (mm)", "Flipper Length (mm)", "Body Mass (g)"]
penguins = penguins[cols]
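
To see what these cleaning steps do, here is a minimal sketch on a tiny made-up data frame. The rows and the name `toy` are hypothetical; the species strings only imitate the long "common name (Latin name)" format of the raw file.

```python
import pandas as pd

# Hypothetical rows imitating the raw file's format (not real measurements)
toy = pd.DataFrame({
    "Species": ["Adelie Penguin (Pygoscelis adeliae)",
                "Gentoo penguin (Pygoscelis papua)",
                "Chinstrap penguin (Pygoscelis antarctica)"],
    "Sex": ["MALE", ".", "FEMALE"],
    "Body Mass (g)": [3750.0, 5000.0, None],
})

# drop rows with a missing body mass or sex (removes the Chinstrap row)
toy = toy.dropna(subset = ["Body Mass (g)", "Sex"])
# keep only the first word of the species name
toy["Species"] = toy["Species"].str.split().str.get(0)
# drop rows whose sex was recorded as "." (removes the Gentoo row)
toy = toy[toy["Sex"] != "."]

print(toy)
```

Only the Adelie row survives, with its species shortened to a single word.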

Let's take a look at the simplified data set: 

In [None]:
penguins.head()

As you know from Homework 0, each row corresponds to an individual penguin. The penguin's species, island of encounter, and sex are recorded as qualitative variables. There are also measurements of the penguin's bill (the culmen is its upper ridge), as well as its flipper length and body mass. The original file has some additional columns which we're ignoring for today. 

<figure class="image" style="width:50%">
  <img src="https://allisonhorst.github.io/palmerpenguins/reference/figures/culmen_depth.png" alt="">
  <figcaption><i>Artwork by @allison_horst</i></figcaption>
</figure>

## Getting Started with Plotly 

Plotly includes a very large catalog of interesting plotting capabilities. We are only going to scratch the surface, using the Plotly Express module, which allows us to create several of the most important kinds of plots using convenient, high-level functions. 

Let's run an example before breaking it down. 

In [None]:
import plotly
plotly.__version__

In [None]:
import plotly.io as pio
pio.templates.default

In [None]:
from plotly import express as px

fig = px.scatter(data_frame = penguins,
                 x = "Culmen Length (mm)",
                 y = "Culmen Depth (mm)",
                 color = "Species",
                 width = 500,
                 height = 300,
                )

#reduce whitespace
fig.update_layout(margin={"r":0, "t":0, "l":0, "b":0})
# show the plot
fig.show()

Let's break this down a bit. 

The first line imports the `express` module of `plotly`, which provides a high-level interface to a variety of Plotly tools. One can also work directly with the low-level `graph_objects` module, which allows one a finer level of control over the settings of visualizations. We won't use `graph_objects` in this course. 

```python
fig = px.scatter(data_frame = penguins,    # data set
                 x = "Culmen Length (mm)", # column for x axis
                 y = "Culmen Depth (mm)",  # column for y axis
                 color = "Species",        # column for dot color
                 width = 500,              # width of figure
                 height = 300)             # height of figure
```

Recall that our standard syntax from Matplotlib usually requires us to pass `x` and `y` as *arrays* or *lists*. Here, we do something different: we start by specifying a *data frame*. Then, `x`, `y`, and several other arguments that we'll see in a moment are interpreted as *columns* of the supplied data frame. So, a call like 

```python
fig = px.scatter(data_frame = penguins,    
                 x = "Culmen Length (mm)", 
                 y = "Culmen Depth (mm)") 
```

is somewhat similar to 

```python
ax.scatter(penguins["Culmen Length (mm)"], 
           penguins["Culmen Depth (mm)"])
```

using familiar Matplotlib tools. We'll see that the Plotly approach makes it much easier to construct complex data graphics in situations in which our data is in the form of a data frame. 

*Side note*: the syntax of Plotly Express is similar to that of the [Seaborn package](https://seaborn.pydata.org/), which is a non-interactive library for constructing complex graphics from data frames. 

Let's fancy up our plot a little: 

In [None]:
fig = px.scatter(data_frame = penguins,
                 x = "Culmen Length (mm)",
                 y = "Culmen Depth (mm)",
                 color = "Species",
                 hover_name = "Species",
                 hover_data = ["Island", "Sex"],
                 size = "Body Mass (g)",
                 size_max = 8,
                 width = 500,
                 height = 300,
                 opacity = 0.5
                )

#reduce whitespace
fig.update_layout(margin={"r":0, "t":0, "l":0, "b":0})
# show the plot
fig.show()

There are nice marginal plots for the statistically-inclined: 

In [None]:
fig = px.scatter(data_frame = penguins,
                 x = "Culmen Length (mm)",
                 y = "Culmen Depth (mm)",
                 color = "Species",
                 hover_name = "Species",
                 hover_data = ["Island", "Sex"],
                 size = "Body Mass (g)",
                 size_max = 8,
                 width = 500,
                 height = 300,
                 opacity = 0.5,
                 marginal_y = "box",
                 marginal_x = "rug"
                )

#reduce whitespace
fig.update_layout(margin={"r":0, "t":0, "l":0, "b":0})
# show the plot
fig.show()

It's possible to exercise fine-grained control over the plot appearance, but an easier way is through themes: 

In [None]:
import plotly.io as pio

# pio.templates.default = "ggplot2"
pio.templates.default = "plotly_white"


fig = px.scatter(data_frame = penguins,
                 x = "Culmen Length (mm)",
                 y = "Culmen Depth (mm)",
                 color = "Species",
                 hover_name = "Species",
                 hover_data = ["Island", "Sex"],
                 size = "Body Mass (g)",
                 size_max = 8,
                 width = 500,
                 height = 300,
                 opacity = 0.5,
                 marginal_y = "box",
                 marginal_x = "rug")

#reduce whitespace
fig.update_layout(margin={"r":0, "t":0, "l":0, "b":0})
# show the plot
fig.show()

Modifying the theme this way will change the appearance of all future plots. 

## Facets

*Faceting* refers to creating multiple small plots, each of which displays a subset of the data. Plotly supports the easy creation of facets using the `facet_col` and `facet_row` arguments.

In [None]:
fig = px.scatter(data_frame = penguins,
                 x = "Culmen Length (mm)",
                 y = "Culmen Depth (mm)",
                 color = "Species",
                 hover_name = "Species",
                 hover_data = ["Island", "Sex"],
                 size = "Body Mass (g)",
                 size_max = 8,
                 width = 500,
                 height = 300,
                 opacity = 0.5,
                 facet_col = "Sex")

#reduce whitespace
fig.update_layout(margin={"r":0, "t":50, "l":0, "b":0})
# show the plot
fig.show()

You can even use both at the same time! 

In [None]:
fig = px.scatter(data_frame = penguins,
                 x = "Culmen Length (mm)",
                 y = "Culmen Depth (mm)",
                 color = "Species",
                 hover_name = "Species",
                 hover_data = ["Island", "Sex"],
                 size = "Body Mass (g)",
                 size_max = 8,
                 width = 500,
                 height = 300,
                 opacity = 0.5,
                 facet_col = "Island",
                 facet_row = "Sex"
                )
fig.update_xaxes(tickangle=45, tickfont=dict(family='Rockwell', color='crimson', size=14))
# add titles and set fonts
fig.update_layout(
    title="Plot Title",
    xaxis_title="X Axis Title",
    yaxis_title="Y Axis Title",
    legend_title="Legend Title",
    font=dict(
        family="Courier New, monospace",
        size=10,
        color="RebeccaPurple"
    )
)
# reduce whitespace, leaving room at the top for the title
fig.update_layout(margin={"r":0,"t":50,"l":0,"b":0})
# show the plot
fig.show()

Faceting is an easy way to create complex, scientifically interesting plots using minimal effort. 
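
Each panel in a faceted plot shows one subset of the rows, so it can help to count the rows per panel before plotting. Here is a minimal pandas sketch, using a made-up data frame (`toy` and its rows are hypothetical), for a grid with `facet_col = "Island"` and `facet_row = "Sex"`.

```python
import pandas as pd

# Hypothetical toy data with the two faceting columns
toy = pd.DataFrame({
    "Island": ["Biscoe", "Biscoe", "Dream", "Dream", "Dream", "Torgersen"],
    "Sex":    ["MALE", "FEMALE", "MALE", "MALE", "FEMALE", "FEMALE"],
})

# one cell per panel: rows are islands, columns are sexes
panel_counts = toy.groupby(["Island", "Sex"]).size().unstack(fill_value = 0)
print(panel_counts)
```

A zero in this table corresponds to a panel that would be drawn empty.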

## Geographic Data Visualization

How about some interactive *geographic visualization*? 


### Creating Basemaps

In [None]:
import pandas as pd
coords = pd.DataFrame({
    "lon" : [-118.44300984639733], 
    "lat" : [34.0696449790177],
    "message" : ["Wish you were here"]
})
coords

In [None]:
from plotly import express as px

fig = px.scatter_mapbox(coords, 
                        lat = "lat",
                        lon = "lon", 
                        hover_name = "message",
                        zoom = 15,
                        height = 300,
                        mapbox_style="open-street-map")

fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig.show()

Let's break this down a bit. The first line is importing the `express` module of `plotly`.

The magic happens starting on the third line, when we call `px.scatter_mapbox()`. The first argument must be a data frame. The `lat` and `lon` arguments tell `px` which columns contain the latitude and longitude coordinates. The `hover_name` specifies what should appear when we hover over the plotted point with our mouse. `zoom` controls the initial zoom level of the map, which can subsequently be modified by the user. `height` allows one to control the aspect ratio. There are many [other parameters](https://plotly.github.io/plotly.py-docs/generated/plotly.express.scatter_mapbox.html) to `px.scatter_mapbox()`. 

The `mapbox_style` argument controls which *map tiles* are used in the visualization, and the call to `fig.update_layout()` controls the amount of whitespace around it. The final line actually displays the map. 

Now let's try changing up the zoom level and the map tiles. The `positron` tiles from CartoDB are very low-contrast, which is very helpful when creating plots that use these tiles as backgrounds. 

In [None]:
# different zoom level, use cartoDB tiles
fig = px.scatter_mapbox(coords, 
                        lat = "lat",
                        lon = "lon", 
                        hover_name = "message",
                        zoom = 18,
                        height = 300,
                        mapbox_style="carto-positron")

fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig.show()

In [None]:
# different zoom level, use CartoDB dark tiles
fig = px.scatter_mapbox(coords, 
                        lat = "lat",
                        lon = "lon", 
                        hover_name = "message",
                        zoom = 10,
                        height = 300,
                        mapbox_style="carto-darkmatter")

fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig.show()

Summing up, Plotly makes it unreasonably easy to create attractive, interactive maps in Python. Let's now go from "pretty maps" to "informative, scientific data graphics." 

### Visualizing Climate Measurement Stations

Let's now use our GHCN data on global temperatures to create some interesting visualizations. As a first step, we'll create a set of markers for different climate stations. First, let's grab the data on stations: 

In [None]:
import os
import urllib.request
# download station-metadata.csv if it does not exist
url = "https://raw.githubusercontent.com/PIC16B-ucla/24F/refs/heads/main/datasets/noaa-ghcn/station-metadata.csv"
if not os.path.exists("station-metadata.csv"): 
    urllib.request.urlretrieve(url, "station-metadata.csv")

In [None]:
import numpy as np

filename = "station-metadata.csv"
stations = pd.read_csv(filename)
stations.head()

For the purposes of geographic plotting, the key columns here are the `LATITUDE` and `LONGITUDE` columns. Let's try plotting! 

Note that it might take a little while for the map to render. There are 27.5k points, which is kind of a lot! 

In [None]:
fig = px.scatter_mapbox(stations,
                        lat = "LATITUDE",
                        lon = "LONGITUDE",
                        hover_name = "NAME",
                        zoom = 1,
                        height = 300)

fig.update_layout(mapbox_style="carto-positron")
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig.show()

This is cool and interactive, but there are a few shortcomings if we want to display scientific information. It's hard to make comparisons -- for example, it looks like there might be a higher density of stations in the US than in many other areas, but it's hard to be sure from the map above. For comparing densities, *heatmaps* provide a useful approach. Plotly again makes this unreasonably easy. 

In [None]:
fig = px.density_mapbox(stations,
                        lat = "LATITUDE",
                        lon = "LONGITUDE",
                        radius = 1,
                        zoom = 0,
                        height = 300)

fig.update_layout(mapbox_style="carto-positron")
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig.show()

The colors get brighter and more intense the more stations there are in that area. We can notice a few things, such as the very high density of measurement stations in the US and Germany. 

However, it's harder to see patterns when we zoom in much more. If we want to look at patterns within Europe, for example, we might want to increase the radius. 

Experimentation with the arguments of `px.density_mapbox()`, especially `radius` and `zoom`, is usually necessary to obtain a good result. 

In [None]:
fig = px.density_mapbox(stations,
                        lat = "LATITUDE",
                        lon = "LONGITUDE",
                        radius = 5,
                        zoom = 2,
                        height = 300,
                        center = {"lat": 51, "lon": 10})

fig.update_layout(mapbox_style="carto-positron")
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig.show()

### Geographic Scatterplots

Another thing we might want to do is color code the climate stations according to some quantitative measure. Let's compute the average temperature in March for each one over the most recent decade, and use this to color code them. 

In [None]:
interval = "2011-2020"
url = f"https://raw.githubusercontent.com/pic16b-ucla/24F/main/datasets/noaa-ghcn/decades/{interval}.csv"
temps = pd.read_csv(url)

First we'll compute the average in March for each station. 

In [None]:
march_avgs = temps.groupby("ID")[["VALUE3"]].mean() / 100

Next, we'll *merge* the latitude/longitude data from the `stations` data frame. 

In [None]:
march_avgs = march_avgs.reset_index()

march_avgs = pd.merge(march_avgs, stations, on="ID").dropna()
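
The aggregate-then-merge pattern above is easier to follow on a toy example. The data frames below are made up (`toy_temps` and `toy_stations` are hypothetical), and the sketch assumes, as the division by 100 suggests, that values are stored in hundredths of a degree.

```python
import pandas as pd

# Hypothetical temperature readings: two for station A, one for station B
toy_temps = pd.DataFrame({
    "ID":     ["A", "A", "B"],
    "VALUE3": [1000, 1200, 500],   # assumed to be hundredths of a degree
})

# Hypothetical station metadata, including a station C with no readings
toy_stations = pd.DataFrame({
    "ID":        ["A", "B", "C"],
    "LATITUDE":  [10.0, 20.0, 30.0],
    "LONGITUDE": [-5.0, 0.0, 5.0],
})

# average per station, converted to degrees
toy_avgs = toy_temps.groupby("ID")[["VALUE3"]].mean() / 100
# turn the ID index back into a regular column so we can merge on it
toy_avgs = toy_avgs.reset_index()
# the default inner merge keeps only stations present in both frames
toy_merged = pd.merge(toy_avgs, toy_stations, on = "ID")
print(toy_merged)
```

Station C disappears from the result because `pd.merge` performs an inner join by default.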

In [None]:
march_avgs[["VALUE3", "LATITUDE", "LONGITUDE"]].head()

Great! This is the data we need. Now we can supply this data to `px.scatter_mapbox`, passing as the value of `color` the name of the variable that we want to use to shade the points. 

In [None]:
fig = px.scatter_mapbox(march_avgs,
                        lat = "LATITUDE",
                        lon = "LONGITUDE",
                        hover_name = "NAME",
                        color = "VALUE3",
                        zoom = 1,
                        opacity = 0.2,
                        height = 300,
                       )
                        
fig.update_layout(mapbox_style="carto-positron")
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig.show()

This plot makes it easy to see that stations near the equator tend to be warmer (at least in March). 

## Saving and Sharing

To save your visualization as HTML, just use `write_html` from `plotly.io`. 

In [None]:
from plotly.io import write_html
write_html(fig, "geo_scatter.html")

In [None]:
import plotly.io as pio
pio.renderers.default

You can then send this file to people you'd like to impress! 

For Quarto blogs, the most reliable approach so far has been to set the default renderer to `iframe` and keep the auto-generated `iframe_figures/` folder with your notebook. That folder contains one generated `.html` file per cell number, and the figures load automatically as long as `iframe_figures/` is in the same folder as the `.ipynb` file.  

In [None]:
import plotly.io as pio
pio.renderers.default="iframe"

In [None]:
fig.show()

## Statistical Graphics: Histograms, Boxplots, and Densities

Scatterplots are probably the most universally useful plot type, but Plotly enables the creation of many other useful plot types. Here, let's focus on plot types for estimating univariate and bivariate densities. 

### Histograms

In [None]:
fig = px.histogram(penguins,
                   x = "Culmen Length (mm)",
                   color = "Species",
                   opacity = 0.5,
                   nbins = 30, 
                   barmode = "overlay", # also: "stack", "group"
                   width = 600,
                   height = 300)

# reduce whitespace
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
# show the plot
fig.show()

### Boxplots

In [None]:
fig = px.box(penguins,
             x = "Species",
             y = "Body Mass (g)",
             color = "Sex",
             width = 600,
             height = 300)

# reduce whitespace
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
# show the plot
fig.show()

### Bivariate Distributions

A density heatmap is just a bivariate histogram: 

In [None]:
fig = px.density_heatmap(penguins,
                         x = "Body Mass (g)",
                         y = "Flipper Length (mm)",
                         facet_row = "Sex",
                         nbinsx = 25, 
                         nbinsy = 25)
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig.show()
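
To make the "bivariate histogram" idea concrete, here is a minimal sketch that computes the bin counts directly with NumPy. The points are made up, and this illustrates the idea rather than how Plotly computes its bins internally.

```python
import numpy as np

# Hypothetical points: two in the lower-left, three in the upper-right
x = np.array([0.1, 0.2, 0.8, 0.9, 0.85])
y = np.array([0.1, 0.3, 0.7, 0.9, 0.95])

# split the unit square into a 2 x 2 grid and count the points in each cell
counts, xedges, yedges = np.histogram2d(x, y, bins = 2, range = [[0, 1], [0, 1]])

# counts[i, j] is the number of points in x-bin i and y-bin j
print(counts)
```

A density heatmap colors each cell of such a grid according to its count.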

Density *contours* provide a nice alternative: 

In [None]:
fig = px.density_contour(penguins,
                         x = "Body Mass (g)",
                         y = "Flipper Length (mm)",
                         facet_row = "Sex", 
                         color = "Species")
fig.update_layout(margin={"r":50,"t":0,"l":0,"b":0})
fig.show()

## Fancy Stuff

Sometimes, we'd like to show relationships between many variables at once. In such cases, standard 2d plots can feel restrictive, and we might seek more complicated plot types. This is sometimes productive, but it's important not to chase complexity for its own sake. 

### 3d Scatterplots

In [None]:
fig = px.scatter_3d(penguins,
                    x = "Body Mass (g)",
                    y = "Culmen Length (mm)",
                    z = "Culmen Depth (mm)",
                    color = "Species",
                    opacity = 0.5)

fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig.show()

### Alluvial Diagrams

Alluvial diagrams can be used to compare tabulations of categorical variables. 

In [None]:
colors = {"Adelie"    : "#2a9d8f",
          "Chinstrap" : "#e9c46a",
          "Gentoo"    : "#e76f51"}
color_hex = penguins["Species"].map(colors)

fig = px.parallel_categories(penguins, 
                             dimensions = ["Species", "Island", "Sex"],
                             color = color_hex,
                             height = 300)


fig.update_layout(margin={"r":20,"t":0,"l":20,"b":0})
fig.show()

This diagram makes it clear that Gentoo penguins are only found on Biscoe Island, and Chinstraps are only found on Dream Island, while the sexes of penguins are approximately balanced for each species on each island. 
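
An alluvial diagram is a picture of a cross-tabulation, so a quick way to check a reading like this one is `pd.crosstab`. The sketch below uses a made-up data frame (`toy` and its rows are hypothetical, not the real counts).

```python
import pandas as pd

# Hypothetical toy data with two categorical columns
toy = pd.DataFrame({
    "Species": ["Adelie", "Adelie", "Gentoo", "Chinstrap", "Adelie", "Gentoo"],
    "Island":  ["Biscoe", "Dream", "Biscoe", "Dream", "Torgersen", "Biscoe"],
})

# rows are species, columns are islands, entries are counts
table = pd.crosstab(toy["Species"], toy["Island"])
print(table)
```

Zeros in the table correspond to ribbons that are missing from the diagram.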



### Parallel Coordinates

One can construct similar visualizations for quantitative variables. These are interesting and support quite entertaining filtering operations, but can also be somewhat challenging to give a clean "look."

In [None]:
spec_ids = penguins["Species"].map({"Adelie"    : 1,
                                    "Chinstrap" : 2,
                                    "Gentoo"    : 3})

fig = px.parallel_coordinates(penguins,
                              dimensions = ["Culmen Depth (mm)",
                                            "Culmen Length (mm)",
                                            "Flipper Length (mm)",
                                            "Body Mass (g)"],
                              color = spec_ids,
                              color_continuous_scale=px.colors.diverging.Tealrose,
                              color_continuous_midpoint=2,
                              height = 400)


fig.show()
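
The hand-written dictionary used for `spec_ids` is one way to turn species names into numbers for the `color` argument. As an alternative sketch, pandas category codes give the same numbering here, because categories are sorted alphabetically by default. The series below is made up for illustration.

```python
import pandas as pd

# Hypothetical species column
species = pd.Series(["Adelie", "Gentoo", "Chinstrap", "Adelie"])

# category codes start at 0 in alphabetical order, so add 1 to start at 1
codes = species.astype("category").cat.codes + 1
print(codes.tolist())

# the same numbering, written out by hand
manual = species.map({"Adelie": 1, "Chinstrap": 2, "Gentoo": 3})
print(manual.tolist())
```

The dictionary is more explicit, while category codes avoid retyping the labels when there are many categories.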

## Takeaways For Today

- Plotly Express makes it unreasonably easy to create attractive, sophisticated, and interactive data graphics. 
- Amidst all these tools, it's important to choose the one that's right for the story you want to tell -- if your story is simple, use a simple visualization. 
- Penguins are very good birds. 