# Interactive data visualization with Plotly

Notebook developed by Sam Maurer

**Plotly** is a company that makes a suite of tools for building interactive data visualizations. You can work with the visualizations inside notebooks, publish the notebooks for other people to manipulate, or even embed the visualizations in web pages. https://plot.ly

### Resources

Plotly tutorials: https://plot.ly/python/plotly-fundamentals/

Plotly API reference: https://plot.ly/python-api-reference/

### Background

Matplotlib and Seaborn work by generating an image file representing your visualization. (It's a PNG by default.) **Plotly generates JavaScript and HTML code instead**, which are the same components that most interactive web pages use. This allows visualizations to be interactive, as you'll see below.

If you're interested in more of the technical details: Plotly is first and foremost a *JavaScript* visualization library. The Python library generates a JSON representation of your chart, which is passed along to the JavaScript library (Plotly.js) for rendering. The rendered output is what's embedded in the notebook.

How does it actually turn into an image? Plotly is built on top of a library called [D3](https://d3js.org) that provides low-level tools for creating data-driven web visualizations. So Plotly builds your chart out of D3 components, and then D3 generates dynamic SVG images.

### Displaying Plotly widgets in GitHub

If you upload a notebook to GitHub, Matplotlib and Seaborn charts will show up automatically, because the image itself is embedded in the notebook file. Plotly charts will **not show up** -- the information is there, but GitHub doesn't render the JavaScript and HTML. 

There's an easy solution, though. The Jupyter team has a web tool that will fully render and display any notebook with a public URL: https://nbviewer.jupyter.org. So just copy a notebook's GitHub URL and paste it into that form, and you'll be able to use the Plotly widgets. (You can paste a Dropbox link, too!)
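If you'd rather build the link yourself than use the form, the sketch below assumes nbviewer's usual URL pattern for GitHub notebooks (the GitHub path appended after `/github`). The helper name `nbviewer_url` and the example repository are hypothetical.

```python
from urllib.parse import urlparse

def nbviewer_url(github_url):
    """Translate a github.com notebook URL into an nbviewer URL (assumed pattern)."""
    parts = urlparse(github_url)
    if parts.netloc != "github.com":
        raise ValueError("expected a github.com URL")
    return "https://nbviewer.jupyter.org/github" + parts.path

# Hypothetical repository, for illustration only
link = nbviewer_url("https://github.com/example-user/example-repo/blob/master/demo.ipynb")
print(link)
```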

## 1. Data prep

This notebook uses a data file called `trips.csv`, created in `chts-data-prep.ipynb`.

In [None]:
import pandas as pd

In [None]:
trips = pd.read_csv('trips.csv')
len(trips)

In [None]:
trips.head()

In [None]:
# Filter out some missing values to make later analysis easier:

trips = trips.loc[trips.trip_distance_miles.notnull()].copy()  # copy, so adding columns later doesn't trigger a pandas warning
len(trips)

In [None]:
# Label some travel modes (see Seaborn demo for full list)

trips['mode_label'] = ''

trips.loc[trips['mode'].isin([1]), 'mode_label'] = 'walk'
trips.loc[trips['mode'].isin([2]), 'mode_label'] = 'cycle'
trips.loc[trips['mode'].isin([5]), 'mode_label'] = 'drive'
trips.loc[trips['mode'].isin([15,16,17]), 'mode_label'] = 'bus'
trips.loc[trips['mode'].isin([24,26,27]), 'mode_label'] = 'train'
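The same labeling can also be written as a dictionary lookup, which may be easier to extend when there are many codes. This is a sketch on a small made-up table standing in for `trips`; the code-to-label pairs are the ones used above, and code 99 is an invented example of an unlabeled mode.

```python
import pandas as pd

# Made-up mode codes standing in for the real `trips` table
demo = pd.DataFrame({"mode": [1, 2, 5, 16, 24, 99]})

mode_labels = {1: "walk", 2: "cycle", 5: "drive",
               15: "bus", 16: "bus", 17: "bus",
               24: "train", 26: "train", 27: "train"}

# Codes missing from the dictionary become NaN, so fill them with ''
demo["mode_label"] = demo["mode"].map(mode_labels).fillna("")
print(demo)
```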

## 2. Plotly Express

The easiest-to-use interface to Plotly is called "Plotly Express": https://plot.ly/python/plotly-express/

Similar to Seaborn, this has templates for generating a variety of common chart types. Here's the full list: https://plot.ly/python-api-reference/plotly.express.html

In [None]:
import plotly.express as px

In [None]:
# Let's start with a scatter plot. It takes a few seconds to generate -- this is
# because Plotly is building an interactive widget where we can pan and zoom
# through all 350,000 data points!

fig = px.scatter(trips, x='trip_distance_miles', y='prev_trip_duration_min')

# Set the chart title (px.scatter also accepts it directly as a title= argument)
fig.update_layout(title='Trip distance vs. duration, California Household Travel Survey')

fig.show()

In [None]:
# Let's focus on a much smaller subset of the data, and color dots by travel mode

data = trips.loc[(trips.city == 'BERKELEY') &
                 trips.mode_label.isin(['cycle','drive'])]

fig = px.scatter(data, 
                 x = 'trip_distance_miles', 
                 y = 'prev_trip_duration_min',
                 color = 'mode_label')

fig.update_layout(title='Trip distance vs. duration, by travel mode')
fig.show()

### Exercise

Try changing the city or the travel modes. Or, try creating a different kind of Plotly Express visualization: histograms are pretty straightforward too. https://plot.ly/python-api-reference/generated/plotly.express.histogram.html

## 3. Plotly Graph Objects

Underneath Plotly Express, there's a more detailed representation of each kind of chart in Plotly's `graph_objects` library. We have to work with these directly for certain kinds of customizations. For example, here we'll overlay two histograms in the same figure.

The full list of "Graph Object" chart types is here (scroll down a bit): https://plot.ly/python-api-reference/

In [None]:
import plotly.graph_objects as go

In [None]:
fig = go.Figure()  # Set up an empty figure

# A Plotly "trace" is a collection of data to be plotted
fig.add_trace(go.Histogram(x=trips.loc[trips.shopping == 1].trip_distance_miles,
                           histnorm = 'probability',  # normalize so the bar heights sum to 1
                           name = 'shopping'))

fig.add_trace(go.Histogram(x=trips.loc[trips.work == 1].trip_distance_miles,
                           histnorm = 'probability',
                           name = 'commute'))

fig.update_traces(opacity=0.75)  # reduce opacity so we can see both distributions
fig.update_xaxes(range=[0,25])  # trim outliers

fig.update_layout(barmode='overlay',  # Overlay the histograms on top of each other
                  title='Distribution of shopping trip vs. commute trip distances')

fig.show()
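The `histnorm` setting is easy to mix up: `'probability'` makes the bar *heights* sum to 1, while `'probability density'` makes the bar *areas* sum to 1. The sketch below reproduces both calculations by hand with NumPy on made-up distances and arbitrary bin edges; it illustrates the arithmetic and is not Plotly's own code.

```python
import numpy as np

# Made-up trip distances and arbitrary 5-mile bins
distances = np.array([0.5, 1.2, 1.8, 2.5, 3.1, 3.3, 7.9, 12.4])
counts, edges = np.histogram(distances, bins=[0, 5, 10, 15])
widths = np.diff(edges)

# 'probability': share of observations in each bin
probability = counts / counts.sum()

# 'probability density': share of observations per unit of x
density = counts / (counts.sum() * widths)

print(probability, probability.sum())
print(density, (density * widths).sum())
```

Because both histograms above use the same normalization, their shapes can be compared directly even though there are different numbers of shopping and commute trips.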

## 4. Adding control: data selection, sliders, text input

You can also customize Plotly's widgets to provide even more control: selecting data with drop-down menus, adjusting variables with sliders, and so on. Unfortunately, this is not so easy to set up.

Here's an example of using a slider to toggle the visibility of lines on a chart: https://plot.ly/python/sliders/

Here's an example using drop-down menus to control which data is displayed: https://plot.ly/python/figurewidget-app/

These features are better suited for publishing an interactive visualization on the web than for exploratory data analysis, because you have to write code describing exactly what you want to happen when the input arrives. (And potentially, also set up hidden visualization elements to create the illusion of more interactivity than really exists!)

### Jupyter widgets

Related to this, there's also a library from the Jupyter team that provides general-purpose widgets designed specifically for Jupyter notebooks: https://ipywidgets.readthedocs.io/en/latest/examples/Using%20Interact.html

You can link these to pretty much anything in your notebook, from simple text displays to Seaborn visualizations. These can be published as part of a notebook, but can't easily be embedded into other web pages.

In [None]:
from ipywidgets import interact

In [None]:
# Define a function to run based on interactive input

def f(x):
    return x*2

In [None]:
# Display a slider widget. 'x' is initially set to 10, and changes as the slider moves.
# interact() runs the function 'f()' each time the input changes, and displays the output.

interact(f, x=10);
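When you pass a plain number, `interact` picks the slider's range for you. To control the range yourself, you can pass a slider widget instead. This is a minimal sketch; the function `describe_cutoff` and the specific range are hypothetical.

```python
from ipywidgets import IntSlider, interact

def describe_cutoff(max_miles):
    # Hypothetical function: in practice this could filter data or redraw a chart
    return f"Showing trips up to {max_miles} miles"

# Explicit range, step size, and starting value
slider = IntSlider(value=10, min=0, max=50, step=5, description="max miles")

# In a notebook, this renders the slider and re-runs the function on each change
interact(describe_cutoff, max_miles=slider);
```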

## Exercises

Use the remainder of class to work on loading your own data and creating some visualizations, using either Seaborn or Plotly!