# Interactive plots with `plotly`

[Plotly](https://plotly.com) has become one of the most popular graphing libraries in the data community.
The library enables feature-rich, interactive plotting from several programming languages, including Python.

Recommended sources to learn from:

* Official documentation: https://plotly.com/python/
* [Plotly Tutorial for Beginners](https://www.kaggle.com/kanncaa1/plotly-tutorial-for-beginners) on Kaggle
* [Visualization with Plotly.Express: Comprehensive guide](https://towardsdatascience.com/visualization-with-plotly-express-comprehensive-guide-eb5ee4b50b57)


Installation is best described in https://plotly.com/python/getting-started/

## Plotly vs. Plotly Express (vs. Dash)

The recommended way to interact with Plotly is via the high-level
[Plotly Express](https://plotly.com/python/plotly-express/) interface.
The "low-level" [Graph Objects](https://plotly.com/python/graph-objects/) interface contains
components for tree representations of graphical objects, such as [`Figure`](https://plotly.com/python/figure-structure/).
Plotly Express uses these components internally and returns instances of `plotly.graph_objects.Figure`.
Certain kinds of figures are cumbersome or not yet possible to create with Plotly Express, though.

[Dash](https://dash.plotly.com) is a Python framework for building web (data analytic) applications
with almost no required knowledge of HTML, CSS or JavaScript.
Plotly graphs are natively supported and rich interactions are possible in Dash apps.

In this lecture, we focus on Plotly Express.

## Plotly and Jupyter

It is possible to *interact from Jupyter notebooks* with Plotly figures 
using the [FigureWidget](https://plotly.com/python/figurewidget/) class.
This enables turning notebooks into even more powerful interactive applications.

## Exporting Plotly plots

There are two common / recommended ways to export Plotly figures.

1. Export the whole notebook using [`jupyter nbconvert`](https://nbconvert.readthedocs.io/) to HTML. This way, you can distribute the static version of your notebook with Plotly *interactivity preserved*. Ideal for exporting reports that do not require printing.

2. Export individual figures to static, high-quality files using [Kaleido](https://github.com/plotly/Kaleido). See the plotly static image export documentation for more information: https://plotly.com/python/static-image-export/.

## Plotly Express

*Quoting the documentation: https://plotly.com/python/plotly-express/*

Plotly Express is a terse, consistent, high-level API for creating
figures.


### Overview

The `plotly.express` module (usually imported as `px`) contains
functions that can create entire figures at once, and is referred to as
Plotly Express or PX. Plotly Express is a built-in part of the `plotly`
library, and is the recommended starting point for creating most common
figures. Every Plotly Express function uses [graph
objects](https://plotly.com/python/graph-objects/) internally and returns a
`plotly.graph_objects.Figure` instance. Throughout the `plotly`
documentation, you will find the Plotly Express way of building figures
at the top of any applicable page, followed by a section on how to use
graph objects to build similar figures. Any figure created in a single
function call with Plotly Express could be created using graph objects
alone, but with between 5 and 100 times more code.

Plotly Express provides [more than 30 functions for creating different
types of
figures](https://plotly.com/python-api-reference/plotly.express.html).
The API for these functions was carefully designed to be as consistent
and easy to learn as possible, making it easy to switch from a scatter
plot to a bar chart to a histogram to a sunburst chart throughout a data
exploration session. *Scroll down for a gallery of Plotly Express plots,
each made in a single function call.*

Plotly Express currently includes the following functions:

-   **Basics**: [`scatter`](https://plotly.com/python/line-and-scatter/),
    [`line`](https://plotly.com/python/line-charts/),
    [`area`](https://plotly.com/python/filled-area-plots/), [`bar`](https://plotly.com/python/bar-charts/),
    [`funnel`](https://plotly.com/python/funnel-charts/),
    [`timeline`](https://plotly.com/python/gantt/)
-   **Part-of-Whole**: [`pie`](https://plotly.com/python/pie-charts/),
    [`sunburst`](https://plotly.com/python/sunburst-charts/),
    [`treemap`](https://plotly.com/python/treemaps/),
    [`funnel_area`](https://plotly.com/python/funnel-charts/)
-   **1D Distributions**: [`histogram`](https://plotly.com/python/histograms/),
    [`box`](https://plotly.com/python/box-plots/), [`violin`](https://plotly.com/python/violin/),
    [`strip`](https://plotly.com/python/strip-charts/)
-   **2D Distributions**: [`density_heatmap`](https://plotly.com/python/2D-Histogram/),
    [`density_contour`](https://plotly.com/python/2d-histogram-contour/)
-   **Matrix Input**: [`imshow`](https://plotly.com/python/imshow/)
-   **3-Dimensional**: [`scatter_3d`](https://plotly.com/python/3d-scatter-plots/),
    [`line_3d`](https://plotly.com/python/3d-line-plots/)
-   **Multidimensional**: [`scatter_matrix`](https://plotly.com/python/splom/),
    [`parallel_coordinates`](https://plotly.com/python/parallel-coordinates-plot/),
    [`parallel_categories`](https://plotly.com/python/parallel-categories-diagram/)
-   **Tile Maps**: [`scatter_mapbox`](https://plotly.com/python/scattermapbox/),
    [`line_mapbox`](https://plotly.com/python/lines-on-mapbox/),
    [`choropleth_mapbox`](https://plotly.com/python/mapbox-county-choropleth/),
    [`density_mapbox`](https://plotly.com/python/mapbox-density-heatmaps/)
-   **Outline Maps**: [`scatter_geo`](https://plotly.com/python/scatter-plots-on-maps/),
    [`line_geo`](https://plotly.com/python/lines-on-maps/),
    [`choropleth`](https://plotly.com/python/choropleth-maps/)
-   **Polar Charts**: [`scatter_polar`](https://plotly.com/python/polar-chart/),
    [`line_polar`](https://plotly.com/python/polar-chart/),
    [`bar_polar`](https://plotly.com/python/wind-rose-charts/)
-   **Ternary Charts**: [`scatter_ternary`](https://plotly.com/python/ternary-plots/),
    [`line_ternary`](https://plotly.com/python/ternary-plots/)

### High-Level Features

The Plotly Express API in general offers the following features:

-   **A single entry point into `plotly`**: just
    `import plotly.express as px` and get access to [all the plotting
    functions](https://plotly.com/python-api-reference/plotly.express.html),
    plus [built-in demo datasets under
    `px.data`](https://plotly.com/python-api-reference/generated/plotly.data.html#module-plotly.data)
    and [built-in color scales and sequences under
    `px.colors`](https://plotly.com/python-api-reference/generated/plotly.colors.html#module-plotly.colors).
    Every PX function returns a `plotly.graph_objects.Figure` object, so
    you can edit it using all the same methods like [`update_layout` and
    `add_trace`](https://plotly.com/python/creating-and-updating-figures/#updating-figures).
-   **Sensible, Overrideable Defaults**: PX functions will infer
    sensible defaults wherever possible, and will always let you
    override them.
-   **Flexible Input Formats**: PX functions [accept input in a variety
    of formats](https://plotly.com/python/px-arguments/), from `list`s and `dict`s to
    [long-form or wide-form Pandas `DataFrame`s](https://plotly.com/python/wide-form/) to
    [`numpy` arrays and `xarrays`](https://plotly.com/python/imshow/) to [GeoPandas
    `GeoDataFrames`](https://plotly.com/python/maps/).
-   **Automatic Trace and Layout configuration**: PX functions will
    create one [trace](https://plotly.com/python/figure-structure) per animation frame for
    each unique combination of data values mapped to discrete color,
    symbol, line-dash, facet-row and/or facet-column. Traces'
    `legendgroup` and `showlegend` attributes are set such that only one
    legend item appears per unique combination of discrete color, symbol
    and/or line-dash. Traces are automatically linked to a
    correctly-configured [subplot of the appropriate
    type](https://plotly.com/python/figure-structure).
-   **Automatic Figure Labelling**: PX functions label axes, legends and
    colorbars based on the input `DataFrame` or `xarray`, and provide
    [extra control with the `labels`
    argument](https://plotly.com/python/styling-plotly-express/).
-   **Automatic Hover Labels**: PX functions populate the hover-label
    using the labels mentioned above, and provide [extra control with
    the `hover_name` and `hover_data`
    arguments](https://plotly.com/python/hover-text-and-formatting/).
-   **Styling Control**: PX functions [read styling information from the
    default figure template](https://plotly.com/python/styling-plotly-express/), and
    support commonly-needed [cosmetic controls like `category_orders`
    and `color_discrete_map`](https://plotly.com/python/styling-plotly-express/) to
    precisely control categorical variables.
-   **Uniform Color Handling**: PX functions automatically switch
    between [continuous](https://plotly.com/python/colorscales/) and [categorical
    color](https://plotly.com/python/discrete-color/) based on the input type.
-   **Faceting**: the 2D-cartesian plotting functions support [row,
    column and wrapped faceting with `facet_row`, `facet_col` and
    `facet_col_wrap` arguments](https://plotly.com/python/facet-plots/).
-   **Marginal Plots**: the 2D-cartesian plotting functions support
    [marginal distribution plots](https://plotly.com/python/marginal-plots/) with the
    `marginal`, `marginal_x` and `marginal_y` arguments.
-   **A Pandas backend**: the 2D-cartesian plotting functions are
    available as [a Pandas plotting backend](https://plotly.com/python/pandas-backend/) so
    you can call them via `df.plot()`.
-   **Trendlines**: `px.scatter` supports [built-in trendlines with
    accessible model output](https://plotly.com/python/linear-fits/).
-   **Animations**: many PX functions offer [simple animation support
    via the `animation_frame` and `animation_group`
    arguments](https://plotly.com/python/animations/).


In [None]:
import pandas as pd

import plotly.express as px
import plotly.graph_objects as go


## Prepare data

We will use data from 

* previous lectures
* https://www.kaggle.com/mylesoneill/world-university-rankings
* https://github.com/kelvins/US-Cities-Database


### Read and preprocess laser incidents data

In [None]:
territories = pd.read_csv("data/us_state_population.csv")

In [None]:
# You can skip executing this cell. The file is included in the repository.

available_reports = (
    "https://www.faa.gov/about/initiatives/lasers/laws/media/Laser_Report_2020.xlsx",
    "https://www.faa.gov/about/initiatives/lasers/laws/media/Laser_Report_2019_final.xlsx",
    "https://www.faa.gov/about/initiatives/lasers/laws/media/Laser_Report_2018_final.xlsx",
    "https://www.faa.gov/about/initiatives/lasers/laws/media/reported_laser_illumination_incidents_CY_2017.xlsx",
    "https://www.faa.gov/about/initiatives/lasers/laws/media/reported_laser_illumination_incidents_CY_2016.xlsx",
    "https://www.faa.gov/about/initiatives/lasers/laws/media/reported_laser_illumination_incidents_CY_2015.xls",
    # the columns here are very different, skip it for this analysis
    # "https://www.faa.gov/about/initiatives/lasers/laws/media/laser_incidents_2010-2014.xls",
)

# *** Uncomment to regenerate the source file:
# laser_incidents_raw = pd.concat((pd.read_excel(url) for url in available_reports), axis=0, ignore_index=True)
# laser_incidents_raw.to_csv("data/laser_incidents_2015-2020_raw.csv")

In [None]:
laser_incidents_raw = pd.read_csv("data/laser_incidents_2015-2020_raw.csv")
# use only meaningful columns, not Unnamed ...
laser_incidents = laser_incidents_raw[
    [column for column in laser_incidents_raw.columns if "Unnamed" not in column]
]
# there are "State" and "State " columns: merge them into a single one
laser_incidents = laser_incidents.assign(
    State=laser_incidents["State"].where(
        laser_incidents["State"].notna(), laser_incidents["State "]
    )
)
# strip white space from state names
laser_incidents = laser_incidents.assign(State=laser_incidents["State"].str.strip())
# drop columns we do not need any more ("Aviation Altitude" are all NA values)
laser_incidents = laser_incidents.drop(columns=["State ", "Aviation Altitude"])
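
The column merge above can be checked on a toy frame. This sketch uses `fillna`, which should behave like the `where(notna, other)` pattern for this purpose; the state values are made up for the illustration.

```python
import pandas as pd

# Toy frame with the same problem: "State" and "State " (trailing space).
toy = pd.DataFrame(
    {
        "State": ["Texas", None, " Ohio "],
        "State ": [None, "Utah ", None],
    }
)

# Take "State" where present, otherwise fall back to "State ", then strip.
merged = toy["State"].fillna(toy["State "]).str.strip()
toy = toy.assign(State=merged).drop(columns=["State "])

print(toy)
```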

In [None]:
import collections

# needed below for mapping string values to bool
# "yes" and "no" are defined, anything else becomes NA
value_to_bool = collections.defaultdict(lambda: pd.NA)
value_to_bool["yes"] = True
value_to_bool["no"] = False


# try to convert to better dtypes
laser_incidents = laser_incidents.convert_dtypes()
# convert some columns manually with some preprocessing
laser_incidents = laser_incidents.assign(
    **{
        "Incident Time": laser_incidents["Incident Time"].astype("string"),
        "Altitude": pd.to_numeric(laser_incidents["Altitude"], errors="coerce"),
        "Laser Color": laser_incidents["Laser Color"].str.strip().str.lower(),
        "Injury": laser_incidents["Injury"]
            .str.lower()
            .str.strip()
            .map(value_to_bool)
            .astype("boolean"),
    }
)

# make the suspicious times NA
laser_incidents.loc[
    laser_incidents["Incident Time"].astype("string").str.len() > 4, "Incident Time"
] = pd.NA

# using string manipulation and time deltas to construct full time stamps (date + time)
laser_incidents = laser_incidents.assign(
    timestamp = pd.to_datetime(laser_incidents["Incident Date"])
    + pd.to_timedelta(
        laser_incidents["Incident Time"].str[:-2]
        + "h"
        + laser_incidents["Incident Time"].str[-2:]
        + "min",
        errors="coerce",
    )
)
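
The timestamp construction can also be sketched on toy data. Note that this variant is a swapped-in alternative, not the approach used in the cell above: instead of building a string such as `09h30min`, it converts hours and minutes to numbers and adds them as separate time deltas. The dates and times are invented for the illustration.

```python
import pandas as pd

# Toy data: times are "HHMM" strings, one of them is missing.
toy = pd.DataFrame(
    {
        "Incident Date": ["2020-01-01", "2020-01-02", "2020-01-03"],
        "Incident Time": ["0930", "2315", None],
    }
)

# Split "HHMM" into numeric hours and minutes; invalid values become NaN.
hours = pd.to_numeric(toy["Incident Time"].str[:-2], errors="coerce")
minutes = pd.to_numeric(toy["Incident Time"].str[-2:], errors="coerce")

# Date + hours + minutes; a missing time propagates to a missing timestamp.
timestamp = (
    pd.to_datetime(toy["Incident Date"])
    + pd.to_timedelta(hours, unit="h")
    + pd.to_timedelta(minutes, unit="min")
)

print(timestamp)
```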

In [None]:
laser_incidents = pd.merge(
    laser_incidents, territories, left_on="State", right_on="Territory", how="inner"
)

### Merge with US cities for geo location data

In [None]:
us_cities = pd.read_csv("us_cities.csv")
us_cities.sample(10)

In [None]:
laser_incidents_geo = pd.merge(
    laser_incidents,
    us_cities,
    left_on=["City", "State"],
    right_on=["CITY", "STATE_NAME"],
    how="left",
).drop(columns=["ID", "STATE_CODE", "STATE_NAME", "COUNTY", "CITY"])
laser_incidents_geo.sample(10)
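
With a left merge, incidents whose city is not found keep missing coordinates. One way to see how many rows are affected is the `indicator` argument of `pd.merge`, sketched here on toy frames (the city names and the approximate coordinates are assumptions made for the example). The `validate` argument additionally guards against duplicated city rows multiplying incidents.

```python
import pandas as pd

# Toy stand-ins for the incidents and cities tables.
incidents = pd.DataFrame(
    {"City": ["Austin", "Austin", "Nowhere"], "State": ["Texas", "Texas", "Texas"]}
)
cities = pd.DataFrame(
    {
        "CITY": ["Austin"],
        "STATE_NAME": ["Texas"],
        "LATITUDE": [30.27],  # approximate, for illustration
        "LONGITUDE": [-97.74],  # approximate, for illustration
    }
)

merged = pd.merge(
    incidents,
    cities,
    left_on=["City", "State"],
    right_on=["CITY", "STATE_NAME"],
    how="left",
    indicator=True,  # adds a "_merge" column: "both" or "left_only"
    validate="many_to_one",  # fails if a city appears twice on the right
)

# Rows that found no matching city.
unmatched = merged.loc[merged["_merge"] == "left_only", ["City", "State"]]
print(unmatched)
```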

In [None]:
laser_incidents_geo = laser_incidents_geo.assign(
    YearMonth=laser_incidents_geo.apply(
        lambda row: "{year:.0f}-{month:02.0f}".format(
            year=row["timestamp"].year, month=row["timestamp"].month
        ),
        axis="columns",
    )
)

incidents_w_time_index = laser_incidents_geo.set_index(["timestamp"])

incidents_w_time_index.head()
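
As a possible alternative to the row-wise `apply` above, the year-month label can be produced with the vectorized `dt.strftime` accessor. This is only a sketch on invented timestamps; one difference to keep in mind is that missing timestamps stay missing here, whereas the formatted string above would presumably read `nan-nan`.

```python
import pandas as pd

# Toy timestamps, one of them missing.
timestamps = pd.Series(
    pd.to_datetime(["2020-01-15 09:30", "2019-12-31 23:15", None])
)

# Vectorized formatting; NaT yields a missing value instead of a string.
year_month = timestamps.dt.strftime("%Y-%m")

print(year_month)
```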

### Read university rating data

In [None]:
universities_times_data = pd.read_csv("data/universities/timesData.csv")
universities_times_data = universities_times_data.assign(
    total_score=pd.to_numeric(universities_times_data["total_score"], errors="coerce"),
    international=pd.to_numeric(universities_times_data["international"], errors="coerce"),
    income=pd.to_numeric(universities_times_data["income"], errors="coerce"),
    num_students=pd.to_numeric(
        universities_times_data["num_students"].str.replace(",", ""), errors="coerce"
    ),
    year=universities_times_data["year"].astype("category"),
)
universities_times_data

## Basic plots

### Relations via scatter plots

In [None]:
px.scatter(
    universities_times_data.loc[universities_times_data["year"] == 2016],
    x="total_score",
    y="world_rank",
#     hover_data=universities_times_data.columns,
)

In [None]:
px.scatter(
    universities_times_data,
    x="total_score", 
    y=["citations", "research", "teaching", "international", "income"],
)

### Distribution plots

In [None]:
px.histogram(universities_times_data, x="citations")

In [None]:
px.box(universities_times_data, x="year", y="citations")

In [None]:
px.scatter(
    universities_times_data.dropna(subset=["num_students"]), 
    x="citations", 
    y="total_score",
    animation_frame="year",
    size="num_students",
    color="research",
)

### Using geo locations

In [None]:
px.scatter_geo(
    incidents_w_time_index.dropna(subset=["Laser Color"]),
    lon="LONGITUDE",
    lat="LATITUDE",
#     scope="usa",
#     opacity=0.4,
#     color="Laser Color",
)

### Timeseries plots

In [None]:
by_loc = incidents_w_time_index.groupby(
    [
        "City",
        "State",
        incidents_w_time_index.index.year,
        incidents_w_time_index.index.month,
    ]
)

by_loc_agg = by_loc.agg(
    count=("Incident Date", "count"),
    injuries=("Injury", "sum"),
    YearMonth=("YearMonth", "first"),
    LATITUDE=("LATITUDE", "first"),
    LONGITUDE=("LONGITUDE", "first"),
    Population=("Population", "first"),
    AltitudeMean=("Altitude", "mean"),
    AltitudeStd=("Altitude", "std"),
).rename_axis(index=["City", "State", "Year", "Month"]).reset_index().convert_dtypes()

by_loc_agg

In [None]:
incidents_hourly = (
    incidents_w_time_index.notna()
    .any(axis="columns")
    .resample("1h")
    .count()
    .rename("incidents per hour")
).convert_dtypes()
incidents_daily = incidents_hourly.resample("1D").mean().convert_dtypes()
incidents_daily_filtered = incidents_daily.rolling("28D").mean().convert_dtypes()
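
The three steps above (hourly counts, daily mean of the hourly counts, rolling mean over a time window) can be followed on a tiny series. The timestamps are invented, and a 2-day window replaces the 28-day one so the toy data is enough to see the effect.

```python
import pandas as pd

# Four toy events over two days.
stamps = pd.to_datetime(
    [
        "2020-01-01 00:10",
        "2020-01-01 00:50",
        "2020-01-01 02:30",
        "2020-01-02 23:05",
    ]
)
events = pd.Series(1, index=stamps)

# Number of events in each hour (empty hours count as 0).
hourly = events.resample("1h").count()

# Mean number of events per hour, computed for each day.
daily = hourly.resample("1D").mean()

# Time-based rolling mean over the daily values.
smoothed = daily.rolling("2D").mean()

print(daily)
print(smoothed)
```

Note that the daily value is an average of hourly counts, not a daily total; `resample("1D").sum()` would give the latter.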

In [None]:
px.histogram(incidents_hourly, log_y=True)

In [None]:
px.line(incidents_hourly.sort_index())

In [None]:
px.line(incidents_daily.to_frame().join(incidents_daily_filtered.rename("incidents filtered")))

### Exporting figures and notebooks

In [None]:
fig = px.line(incidents_daily.to_frame().join(incidents_daily_filtered.rename("incidents filtered")))
# static image export requires the Kaleido package (see "Exporting Plotly plots" above)
fig.write_image("incidents_per_hour.pdf", width=2800, height=1600, scale=5)

In [None]:
fig.write_html("incidents_per_hour.html")

In [None]:
!jupyter nbconvert 190_plotly.ipynb --to html --no-input

## Challenge of the day

1. Choose either of the datasets we worked with today.
2. Create one shiny, representative figure using whatever technology you prefer.
3. Export it as a stand-alone file (HTML for Plotly) and send it privately via Slack to Jakub or Jan.
4. Vote anonymously for the best one :)
