# Interactive Visualization in Jupyter Notebooks

Jorge Piazentin Ono, Juliana Freire, Claudio Silva

New York University

Jupyter Notebook is a fantastic tool for data exploration, enabling analysts to write documents that contain software code, computational output, formatted text and data visualizations. Visualization is a major component of the data exploration process and appears frequently in Jupyter Notebooks: for example, a recent study of public GitHub repositories found that *matplotlib* was the second most imported package in the notebook environment \cite{pimentel2019large}.

# Interactive Visualization in Jupyter Notebooks

When datasets are too large or too complex, interactive visualization becomes a useful tool in exploratory data analysis. Interactive visualizations can enable, among many others, the display of information at multiple levels of detail, the exploration of data using coordinated views, and the dynamic change of the charts to focus on the user's interests. While Jupyter Notebooks can contain interactive visualizations, the vast majority of charts produced in the environment are static.

In this paper, we present three simple yet powerful approaches that data scientists can use to create interactive visualizations in Jupyter Notebooks: *matplotlib* callbacks, visualization toolkits and custom HTML embedding. Each approach has benefits and drawbacks that the developer needs to weigh in order to make an informed decision about a visualization project. By the end of this paper, the reader will have a good understanding of the three methods, and will be able to select an implementation approach depending on the level of interaction, customization and data flow desired.

**Matplotlib Callbacks.** The *matplotlib* library \cite{hunter2007matplotlib} is the most popular general purpose visualization package for Jupyter Notebooks \cite{pimentel2019large}. This tool enables the creation of static, animated, and interactive visualizations that can be rendered directly as the output of notebook cells. However, the available user interactions are limited: there is support for click and keypress events, but drag-and-drop, tooltips, and cross-filtering, which are frequently supported in visualization tools, are not directly provided. To expand the possible user interactions, [*ipywidgets*](https://ipywidgets.readthedocs.io) can be used. *ipywidgets* is a library that provides HTML form inputs in the Jupyter interface, including drop-down menus, text boxes, and sliders.

**Visualization Toolkits.** In order to enable the creation of more interactive visualizations in Python and Jupyter Notebooks, many open source visualization toolkits have been developed. Among those, Perkel and others \cite{perkel2018data} highlight [*Plotly*](https://plotly.com/), [*Bokeh*](https://bokeh.org/) and [*Altair*](https://altair-viz.github.io/). These libraries are built on top of web technologies, and create visualizations that can be seen in web browsers. Syntax-wise, *Plotly* and *Bokeh* are very similar to *matplotlib*. However, both libraries have been developed with a focus on user interaction, enabling the creation of web-based dashboards that combine interactive widgets and charts, and support multiple user inputs, including click, drag-and-drop, tooltips, selection, crossfilter, and bidirectional communication with Python via callbacks. *Altair* \cite{vanderplas2018altair} differs from the aforementioned libraries in the way visualizations are defined: it uses a declarative specification that ports Vega-Lite \cite{satyanarayan2016vega}, a data visualization grammar, to Python. A wide range of interactive visualizations can be expressed using a small number of *Altair* primitives, making this library very flexible. However, the produced visualizations cannot communicate with Python, and therefore the results of user interactions cannot be used in further computations.

**Custom HTML Embedding.** There might be cases when a visualization cannot be created using any off-the-shelf Python libraries. When this happens, the developer/researcher has the option to write their own visualization using a web framework, and embed this visualization in Jupyter Notebooks. This option offers the most flexibility, as the visualization can be fully customized and interactions can be scripted on demand. JavaScript libraries such as [React](https://reactjs.org/) and [D3](https://d3js.org/) can be used to facilitate the implementation of custom visualizations.

Table 1 summarizes the different approaches for adding interactive visualizations to Jupyter Notebooks. The approaches are classified in terms of interaction, type of output, level of customization, support for dashboards, and data flow. We believe these properties should be taken into consideration when creating a new visualization.

**Table 1**:  Summary of Interactive Visualization Approaches in Jupyter Notebook 

| Library        | Interaction   | Output   | Customization   | Dashboard   | Data Flow                 |
|:---------------|:--------------|:---------|:----------------|:------------|:--------------------------|
| matplotlib     | Low           | Flexible | Low             | Limited     | Bidirectional             |
| Plotly         | High          | HTML     | Low             | Yes         | Bidirectional             |
| Bokeh          | High          | HTML     | Low             | Yes         | Bidirectional             |
| Altair         | High          | HTML     | Low             | Yes         | Python &#8594; JavaScript |
| HTML Embedding | High          | HTML     | High            | Yes         | Bidirectional             |

# Interactive Visualizations in Action

In this section, we show how to create interactive visualizations in Jupyter Notebooks using three of the approaches discussed in the previous section: *matplotlib* charts, *Altair* specifications, and custom HTML visualizations. Since the syntax of *Plotly* and *Bokeh* is very similar to that of *matplotlib*, we do not cover them in this paper. We refer the interested reader to their online documentation.

## Matplotlib with Callbacks

In order to enable interactive *matplotlib* charts in the notebook environment, users need to activate this option using the *"%matplotlib notebook"* magic command [^1]. The produced charts natively support pan and zoom operations, and can be configured to receive other types of user input, such as mouse clicks and key presses, which trigger user-defined callback functions [^2].

After a chart is created, for example, using *pyplot.scatter*, the user events can be captured by setting callback functions on the *canvas* using the method *mpl_connect*. Multiple events are available, including *button_press_event*, *button_release_event*, *key_press_event* and *key_release_event*.

[^1]: https://matplotlib.org/3.3.3/users/interactive.html
[^2]: https://matplotlib.org/3.3.3/users/event_handling.html

We show a minimal example below, where the visualization draws a point at the position of each user click. The resulting visualization is shown in Figure 1.

In [None]:
%matplotlib notebook
import matplotlib.pyplot as plt

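fig, ax = plt.subplots()  # Create an empty chart
ax.set_xlim(0, 10)  # Set the X axis limits
ax.set_ylim(0, 10)  # Set the Y axis limits

def onclick(event):  # Callback function
    if event.inaxes is not ax:  # Ignore clicks outside the axes, where xdata and ydata are None
        return
    ax.scatter(event.xdata, event.ydata, color='steelblue')  # Draw a point at the click position
    fig.canvas.draw_idle()  # Request a redraw so that the new point appears

cid = fig.canvas.mpl_connect('button_press_event', onclick)  # Callback setup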

![Matplotlib visualization where user can annotate points](Images/InteractiveMatplotlib.png)

**Figure 1**: Interactive Matplotlib chart, where the user can click on the canvas in order to add a point at that position. The interactive chart also enables pan and zoom operations by default. 
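
Because the callback runs in the Python kernel, it can also store what the user did so that later cells can use it. The following is a minimal sketch of that idea, assuming the interactive backend has been enabled as described above; the helper name `attach_click_recorder` is hypothetical and not part of *matplotlib*.

```python
import matplotlib.pyplot as plt


def attach_click_recorder(fig, ax):
    """Record click positions (in data coordinates) in a Python list."""
    clicks = []

    def on_click(event):
        # Clicks outside the axes carry no data coordinates, so skip them.
        if event.inaxes is not ax:
            return
        clicks.append((event.xdata, event.ydata))
        ax.scatter(event.xdata, event.ydata, color="steelblue")
        fig.canvas.draw_idle()  # Request a redraw so the new point appears

    # mpl_connect returns a connection id that can be used to disconnect later.
    cid = fig.canvas.mpl_connect("button_press_event", on_click)
    return clicks, cid


fig, ax = plt.subplots()
ax.set_xlim(0, 10)
ax.set_ylim(0, 10)
clicks, cid = attach_click_recorder(fig, ax)
```

After a few clicks on the chart, the list `clicks` can be inspected in another cell, and `fig.canvas.mpl_disconnect(cid)` stops the recording.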

As we saw, this approach can add *click* interactions to a chart with a few lines of code. However, we are limited to the types of charts and interactions supported by *matplotlib*. When these options are not enough, the developer might need to consider other libraries, such as *Altair*, or create their own visualization in JavaScript.

## Altair Specification

Altair enables the creation of interactive visualizations by using a pythonic port of the Vega-Lite specification \cite{vanderplas2018altair}. Altair uses a declarative visualization paradigm: instead of telling the library every step of how to draw a chart, the programmer specifies the data and the visual encodings, and the library takes care of the rest.

In order to create a chart, the developer needs a *Pandas DataFrame* containing the data to be visualized. An *altair.Chart* object is created, with the corresponding *DataFrame* passed as a parameter. Next, a *mark* and an *encoding* need to be selected. *Encodings* tell *Altair* how the *DataFrame* columns should be mapped to visual attributes. Meanwhile, *marks* specify how the attributes should be represented on the plot (for example, as a circle, line, area chart, etc.).

We show a basic example of an *Altair* scatter plot with the Iris dataset (Figure 2). The dataset contains information regarding 150 Iris flowers, with measurements of the length and width of their sepals and petals, as well as the flower species. Data points can be hovered to show additional information as a tooltip (note that tooltips are not directly provided by *matplotlib*). In the code below, *mark_circle* indicates the type of chart desired (a scatter plot with circles) and the *encode* function specifies the chart encoding, in this case, which columns are mapped to the *x* and *y* positions, the *color* of the circle, and the *tooltip* shown on hover.

In [None]:
import altair as alt
from vega_datasets import data

df = data.iris()

alt.Chart(df).mark_circle().encode(
    x='petalLength',
    y='petalWidth',
    color='species',
    tooltip=['sepalLength', 'sepalWidth', 'petalLength', 'petalWidth', 'species']
).interactive()

![Altair visualization of the iris dataset](Images/AltairIris.png)

**Figure 2**: Interactive Altair scatter plot of the Iris dataset. The chart displays a tooltip with flower information on mouse hover. The library also enables pan and zoom.
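
To make the declarative idea concrete, the sketch below writes by hand a Vega-Lite-style specification similar in spirit to the one behind Figure 2, using only the standard library. The helper names `infer_type` and `scatter_spec` and the three inline records are illustrative assumptions; the JSON that *Altair* actually generates contains additional fields and may differ in detail.

```python
import json


def infer_type(value):
    """Pick a Vega-Lite data type from a sample value (illustrative helper)."""
    return "nominal" if isinstance(value, str) else "quantitative"


def scatter_spec(records, x, y, color, tooltip):
    """Describe a scatter plot as data, one mark, and a set of encodings."""
    sample = records[0]

    def field(name):
        return {"field": name, "type": infer_type(sample[name])}

    return {
        "data": {"values": records},
        "mark": "circle",
        "encoding": {
            "x": field(x),
            "y": field(y),
            "color": field(color),
            "tooltip": [field(name) for name in tooltip],
        },
    }


# Three illustrative rows standing in for the Iris DataFrame.
records = [
    {"petalLength": 1.4, "petalWidth": 0.2, "species": "setosa"},
    {"petalLength": 4.7, "petalWidth": 1.4, "species": "versicolor"},
    {"petalLength": 6.0, "petalWidth": 2.5, "species": "virginica"},
]

spec = scatter_spec(
    records,
    x="petalLength",
    y="petalWidth",
    color="species",
    tooltip=["petalLength", "petalWidth", "species"],
)
print(json.dumps(spec["encoding"], indent=2))
```

Nothing in this dictionary says how to draw the chart; it only states what is mapped to what. With *Altair* installed, calling `to_dict()` on a chart object returns the generated specification for comparison.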

For more complex examples, please see the *Altair* documentation. Many chart types are available, and charts can be combined to create interactive dashboards with multiple views. For example, Figure 3 shows an *Altair* dashboard that visualizes a flight dataset (example taken from the online documentation [^3]). The dashboard contains histograms of flight distance, delay and time (1). The user can select flights based on delay (in hours) and see how delay correlates with the other variables, distance and time (2).

![Altair Dashboard](Images/AltairDashboard.png)

**Figure 3**: Altair Dashboard showing a flight dataset. (1) Histograms for flight distance, delay and time. (2) The user selected a range of delay values and the system automatically updates the other views. 

One disadvantage of *Altair* is that data generated by user interactions cannot be accessed from Python. For example, we would not be able to receive data points selected in *Altair* in the next Jupyter cell. Such a capability exists in *matplotlib* and in custom JavaScript visualizations, because we can set up callbacks between JavaScript and Python.

[^3]: https://altair-viz.github.io/gallery/interactive_layered_crossfilter.html

In [None]:
# Altair Dashboard with crossfilter
# Example taken from the Altair documentation
# https://altair-viz.github.io/gallery/interactive_layered_crossfilter.html

import altair as alt
from vega_datasets import data

source = alt.UrlData(
    data.flights_2k.url,
    format={'parse': {'date': 'date'}}
)

brush = alt.selection(type='interval', encodings=['x'])

# Define the base chart, with the common parts of the
# background and highlights
base = alt.Chart().mark_bar().encode(
    x=alt.X(alt.repeat('column'), type='quantitative', bin=alt.Bin(maxbins=20)),
    y='count()'
).properties(
    width=160,
    height=130
)

# gray background with selection
background = base.encode(
    color=alt.value('#ddd')
).add_selection(brush)

# blue highlights on the transformed data
highlight = base.transform_filter(brush)

# layer the two charts & repeat
alt.layer(
    background,
    highlight,
    data=source
).transform_calculate(
    "time",
    "hours(datum.date)"
).repeat(column=["distance", "delay", "time"])

## HTML Embedding

Displaying custom visualizations in Jupyter Notebook can be done in a few lines of code. 
The module [*IPython.display*](https://ipython.readthedocs.io/en/stable/api/generated/IPython.display.html) can be used to embed any HTML code in notebook cells. The HTML may contain both CSS and JavaScript, which allows flexible, interactive and customizable visualizations to be created.

In order to embed the visualization in a cell, one needs to create a string variable containing all the HTML, JavaScript, data and CSS code needed for the visualization. Since writing everything in a Jupyter cell can be cumbersome, one can write the visualization in a code editor and then load the resulting file in Python. JavaScript bundlers, such as [Webpack](https://webpack.js.org/), can combine multiple JavaScript and CSS files into a single file, facilitating this process.

In the following, we show a minimal example of HTML embedding in a Jupyter cell. The code adds a single button to the page, which when clicked displays an alert box with the message "Hello World".

In [None]:
from IPython.display import display, HTML
html_string = """<button onclick="alert('Hello World')">Hello World</button>"""
display(HTML(html_string))

In order to create the HTML string, formatting methods can be used. For example, a base string may contain the container *div* where the visualization is going to be inserted, a *script* tag where the bundled code is going to be added, and a function call to plot the visualization with the provided data in JSON format. The *str.format* method can then be used to fill in the placeholders with the remaining information.
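
The placeholder-filling step can be sketched with the standard library alone. In the sketch below, the function name `build_page`, the JavaScript entry point `renderChart` and the sample records are hypothetical. It uses `string.Template` rather than `str.format` as one possible design choice, because its `$name` placeholders do not clash with the curly braces that appear in JavaScript and CSS.

```python
import json
from string import Template

# Base string: a container div, a script tag for the bundled code, and a
# call that renders the visualization with the data in JSON format.
PAGE = Template("""
<div id="$container_id"></div>
<script type="application/javascript">
    $bundled_code
    renderChart("#$container_id", $data_json);
</script>
""")


def build_page(bundled_code, records, container_id="container"):
    """Fill the placeholders of PAGE and return the HTML string."""
    # json.dumps produces a valid JavaScript literal. Escaping "</" keeps a
    # string value such as "</script>" from closing the script tag early.
    data_json = json.dumps(records).replace("</", "<\\/")
    return PAGE.substitute(
        container_id=container_id,
        bundled_code=bundled_code,
        data_json=data_json,
    )


# A stand-in for the bundled JavaScript code (hypothetical).
bundled_code = "function renderChart(selector, data) { console.log(selector, data.length); }"
page = build_page(bundled_code, [{"x": 1, "y": 2}, {"x": 3, "y": 4}])
print(page)
```

In a notebook, the resulting string can be shown with `display(HTML(page))`, exactly as in the "Hello World" cell.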

The following code snippet shows how to embed a JavaScript library and CSV data in the HTML string. This example visualization shows an interactive chart that displays baseball game trajectories (Figure 4). The user can control the progress of the play using a slider. Furthermore, the user can select a player or the ball and edit its trajectory, either by clicking on the field or by pressing the "Clear trajectory" button. This visualization is an adaptation of the baseball annotation system HistoryTracker \cite{ono2019historytracker}.

In [None]:
from IPython.display import display, HTML
import pandas as pd

with open("./BaseballVisualizer/build/baseballvisualizer.js", "r") as f:
    bundled_code = f.read()

play = pd.read_csv("./play_annotated.csv")
data = {'tracking': play.to_json(orient="records")}

html = """
<html>
<body>
<div id="container"></div>
<script type="application/javascript">
    {bundled_code}
    baseballvisualizer.renderBaseballAnnotator("#container", {data});
</script>
</body>
</html>
""".format(bundled_code=bundled_code, data=data)

display(HTML(html))

![Baseball Visualization and Annotation tool for Jupyter Notebook](Images/CustomJavascriptHistorytracker.png)

**Figure 4**: Custom JavaScript visualization of baseball plays. The user can (1) animate the play using the slider, select a position to edit (in the picture, the BALL is selected), (2) clear its trajectory, and annotate the positions of the ball when it is thrown (3) and when it is hit to center field (4).

Callbacks can be set up in both JavaScript and Python using the *comm* API [^4] in order to send data from one to the other. For example, a sports scientist who is interested in modifying a baseball trajectory and running further analysis in Python might set up this bidirectional communication.

A minor change needs to happen in both the JavaScript and the Python code. In JavaScript, a new *comm* object needs to be created with an identifier (in this example, *submit_trajectory*). Then, the *comm* object is used to *send* a message to Python, containing the edited trajectory data. Finally, when Python acknowledges the message, we display an alert.

```javascript
function submitTrajectoryToServer(trajectory){
    let comm = window.Jupyter.notebook.kernel.comm_manager.new_comm('submit_trajectory')
    // Send trajectory to Python
    comm.send({'trajectory': trajectory})

    // Receive message from Python
    comm.on_msg(function(msg) {
        alert("Trajectory received by Jupyter Notebook.")
    });
}
```

The Python code needs to expect a message from JavaScript. In order to set this up, we use the *register_target* function, passing to it the communication identifier and the Python callback function. In the following code snippet, this callback will store the trajectory in the variable *received_trajectory*.

[^4]: https://jupyter-notebook.readthedocs.io/en/stable/comms.html

In [None]:
received_trajectory = []
def receive_trajectory(comm, open_msg):
    # comm is the kernel Comm instance
    # Register handler for future messages
    @comm.on_msg
    def _recv(msg):
        global received_trajectory 
        # Use msg['content']['data'] for the data in the message
        received_trajectory = msg['content']['data']['trajectory']
        print(received_trajectory)
        comm.send({'received': True})

get_ipython().kernel.comm_manager.register_target('submit_trajectory', receive_trajectory)
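
The body of a handler such as `_recv` can be factored into a plain function, which makes it possible to check the message handling without a running kernel. The sketch below shows one way to do this; the helper name `extract_trajectory` and the sample point are illustrative assumptions, since the actual message contents are defined by the JavaScript side.

```python
def extract_trajectory(msg):
    """Return the trajectory list from a comm message, or raise ValueError."""
    try:
        # Same nesting as in the handler above: msg['content']['data'].
        trajectory = msg["content"]["data"]["trajectory"]
    except (KeyError, TypeError) as err:
        raise ValueError("message has no content.data.trajectory field") from err
    if not isinstance(trajectory, list):
        raise ValueError("trajectory should be a list of records")
    return trajectory


# A hand-built message with the nesting that the comm handler receives.
fake_msg = {"content": {"data": {"trajectory": [{"x": 10.0, "y": 42.5}]}}}
print(extract_trajectory(fake_msg))
```

Inside `_recv`, the assignment would then read `received_trajectory = extract_trajectory(msg)`, so that a malformed message raises an explicit error instead of silently storing unexpected data.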

Finally, after the user clicks the "Submit" button, the trajectory can be retrieved in the Jupyter notebook, analyzed and saved to disk.

In [None]:
received_trajectory_df = pd.DataFrame(received_trajectory)
received_trajectory_df.to_csv("edited_trajectory.csv")

# To Look Further

There are many domain-specific visualization libraries for Jupyter Notebook that use the techniques described in this paper. Figure 5 shows examples in three different domains, which illustrate how diverse and flexible these visualizations can be. The examples belong to the fields of scientific visualization \cite{breddels2020ipygany}, sports analytics \cite{lage2016statcast} and machine learning \cite{nori2019interpretml, ono2020pipelineprofiler}. 1) *ipygany* \cite{breddels2020ipygany} enables the visualization of 3D meshes in Jupyter Notebooks. Users can zoom, rotate and apply effects to 3D meshes interactively using this library. 2) *StatCast Dashboard* \cite{lage2016statcast} supports the interactive query, filter, and visualization of spatiotemporal baseball trajectories and statistics. The library communicates with a baseball play database in order to execute complex queries involving players, teams, game dates and events. 3) *InterpretML* \cite{nori2019interpretml} is a Python package that contains a collection of algorithms for explaining and visualizing Machine Learning (ML) models, including LIME, SHAP and Partial Dependence Plots. Finally, 4) *PipelineProfiler* \cite{ono2020pipelineprofiler} contains visualizations that enable the exploration and comparison of ML pipelines produced by Automatic Machine Learning systems.

In this paper, we have presented three ways to create interactive visualizations in Jupyter Notebooks: *matplotlib* charts, *Altair* specifications and custom HTML visualizations. We hope that this document will help developers to create their own interactive charts. This paper is written entirely in a Jupyter Notebook, which the interested reader can run in order to interact with the visualizations and explore the source code in more detail. The notebook is available on the paper's GitHub page: https://github.com/jorgehpo/PaperInteractiveJupyterVisualization

![Four domain-specific visualization libraries for Jupyter Notebooks](Images/CustomLibraries.png)

**Figure 5**: Domain-specific visualization libraries for Jupyter Notebook. 1) *ipygany*: visualization of 3D meshes. 2) *StatCast Dashboard*: visualization of Baseball trajectories and game statistics. 3) *InterpretML*: visualization of machine learning model explanations. 4) *PipelineProfiler*: visualization of machine learning pipelines produced by AutoML systems.