In [None]:
# Install dependencies. Need to make sure all dependencies are listed
!pip install numpy matplotlib pandas tabulate altair vega_datasets

# Introduction

Jupyter Notebook is a fantastic tool for data exploration, enabling analysts to write documents that contain software code, computational output, formatted text and even multimedia \cite{perkel2018jupyter}. Visualization is a major component of the data exploration process and is frequently found in Jupyter Notebooks: for example, a recent study of public GitHub repositories found that *matplotlib* was the second most imported package in the notebook environment \cite{pimentel2019large}.

When datasets are too large or too complex, interactive visualization becomes a useful tool in exploratory data analysis. Among other things, interactive visualizations can display information at multiple levels of detail, support the exploration of data through coordinated views, and change dynamically to focus on the user's interests \cite{munzner2014visualization}. While Jupyter Notebooks can contain interactive visualizations, the vast majority of charts produced in the environment are static.

In this paper, we present three simple and powerful approaches that data scientists can use to create interactive visualizations in Jupyter Notebooks: *Matplotlib Callbacks*, *Visualization Toolkits* and *Custom HTML Embedding*. Each approach has benefits and drawbacks that the developer needs to weigh in order to make an informed decision about their visualization project. By the end of this paper, the reader will have a good understanding of the three methods, and will be able to select an implementation approach depending on the level of interaction, customization and data flow desired.

This paper is written entirely in a Jupyter Notebook, which can be run by the interested reader in order to interact with the visualizations and explore the source code in more detail.

# Interactive Visualization in Jupyter Notebooks

*matplotlib* \cite{hunter2007matplotlib} is the most popular general purpose visualization library for Jupyter Notebooks \cite{pimentel2019large}. This tool enables the creation of static, animated, and interactive visualizations that can be rendered directly as the output of notebook cells. The library can render visualizations in different formats, including static (raster, SVG, etc.) and interactive. Although user interaction is supported in *matplotlib*, it is not the focus of the library: while there is support for click and keypress events, interactions such as drag-and-drop, tooltips, and cross-filtering, which are frequently supported in visualization tools \cite{munzner2014visualization}, are not directly supported.

In order to enable the creation of more interactive visualizations in Python and Jupyter Notebooks, many open source Visualization Toolkits have been developed. Among those, Perkel and others \cite{perkel2018data} highlight *Plotly* \cite{plotly2020}, *Bokeh* \cite{bokeh2020} and *Altair* \cite{vanderplas2018altair}. These libraries are built on top of web technologies, and create visualizations that can be seen in web browsers. Syntax-wise, *Plotly* and *Bokeh* are very similar to *matplotlib*. However, both libraries have been developed with a focus on user interaction, enabling the creation of web-based dashboards that combine interactive widgets and charts, and support multiple user inputs, including click, drag-and-drop, tooltips, selection, crossfilter, and bidirectional communication with Python via callbacks. *Altair* differs from the aforementioned libraries in the way visualizations are defined: it uses a declarative specification that ports the Vega-Lite \cite{satyanarayan2016vega} grammar to Python. A wide range of interactive visualizations can be expressed using a small number of Altair primitives. However, the visualizations cannot communicate back with Python, so the results of user interactions cannot be used in further computations.

There might be cases when a visualization cannot be created using any off-the-shelf Python libraries. When this happens, the developer/researcher has the option to write their own visualization using a web framework, and embed this visualization in Jupyter Notebooks. This option offers the most flexibility, as the visualization can be fully customized, interactions can be scripted on demand, and even animations can be implemented. Javascript libraries such as React \cite{fedosejev2015react} and D3 \cite{bostock2011d3} can be used to facilitate the implementation of custom visualizations.

Some examples of custom Javascript visualizations in Jupyter Notebooks include libraries for scientific visualization \cite{breddels2020ipygany, breddels2020ipyvolume}, sports analytics \cite{lage2016statcast} and machine learning \cite{nori2019interpretml, ono2020pipelineprofiler}. IPyGany \cite{breddels2020ipygany} and IPyVolume \cite{breddels2020ipyvolume} enable the visualization of 3D meshes and volumes in notebooks, respectively. StatCast Dashboard \cite{lage2016statcast} supports the interactive query, filter, and visualization of spatiotemporal baseball trajectories. InterpretML \cite{nori2019interpretml} is a Python package that contains a collection of algorithms for explaining and visualizing Machine Learning (ML) models, including LIME, SHAP and Partial Dependency Plots. Finally, PipelineProfiler \cite{ono2020pipelineprofiler} is a tool that enables users to explore ML pipelines produced by Automatic Machine Learning systems.

Table 1 summarizes the different approaches to add interactive visualizations in Jupyter Notebooks. The approaches are classified in terms of interaction, type of output, level of customization, support for dashboards, and data flow. When creating a new visualization, we believe these properties should be taken into consideration.

In [None]:
import pandas as pd
from IPython.display import display, Markdown
pd.set_option('display.notebook_repr_html', True)
librarySummary = pd.DataFrame([
    {
        'Library': 'matplotlib',
        'Interaction': 'Low',
        'Output': 'Flexible',
        'Customization': 'Low',
        'Dashboard': 'No',
        'Data Flow': 'Bidirectional'
    },
    {
        'Library': 'Plotly',
        'Interaction': 'High',
        'Output': 'HTML',
        'Customization': 'Low',
        'Dashboard': 'Yes',
        'Data Flow': 'Bidirectional'
    },
    {
        'Library': 'Bokeh',
        'Interaction': 'High',
        'Output': 'HTML',
        'Customization': 'Low',
        'Dashboard': 'Yes',
        'Data Flow': 'Bidirectional'
    },
    {
        'Library': 'Altair',
        'Interaction': 'High',
        'Output': 'HTML',
        'Customization': 'Low',
        'Dashboard': 'Yes',
        'Data Flow': 'Python => Javascript'
    },
    {
        'Library': 'Custom JS',
        'Interaction': 'High',
        'Output': 'HTML',
        'Customization': 'High',
        'Dashboard': 'Yes',
        'Data Flow': 'Bidirectional'
    }
])
#display(Markdown("**Table 1**:  Summary of Interactive Visualization Approaches in Jupyter Notebook "))
#display(Markdown(librarySummary.to_markdown(index=False)))
# Note: pandas >= 1.1 uses `index=False`; the older tabulate keyword `showindex` is deprecated there
print(librarySummary.to_markdown(index=False))

**Table 1**:  Summary of Interactive Visualization Approaches in Jupyter Notebook 

| Library    | Interaction   | Output   | Customization   | Dashboard   | Data Flow            |
|:-----------|:--------------|:---------|:----------------|:------------|:---------------------|
| matplotlib | Low           | Flexible | Low             | No          | Bidirectional        |
| Plotly     | High          | HTML     | Low             | Yes         | Bidirectional        |
| Bokeh      | High          | HTML     | Low             | Yes         | Bidirectional        |
| Altair     | High          | HTML     | Low             | Yes         | Python => Javascript |
| Custom JS  | High          | HTML     | High            | Yes         | Bidirectional        |

# Interactive Visualizations 3 Ways

In this section, we show how to create interactive visualizations in Jupyter notebooks using the three approaches discussed in the previous section: embedded *matplotlib* charts, *Altair* specifications, and custom Javascript libraries. Since the syntax of *Plotly* and *Bokeh* is so similar to that of *matplotlib*, we do not cover them in this paper. We encourage the interested reader to consult their respective online documentation.

## Embedded Matplotlib Charts

In order to enable interactive *matplotlib* charts in the notebook environment, users need to activate this option using the *"%matplotlib notebook"* magic command [^1]. The produced charts will natively support pan and zoom operations, but can be configured to receive other types of user input, such as mouse click and key press, which can signal the run of user-defined callback functions [^2]. 

After a chart is created, for example, using *pyplot.scatter*, the user events can be captured by setting callback functions on the *canvas* using the method *mpl_connect*. Multiple events are available, including *button_press_event*, *button_release_event*, *key_press_event* and *key_release_event*.

[^1]: https://matplotlib.org/3.3.3/users/interactive.html
[^2]: https://matplotlib.org/3.3.3/users/event_handling.html

We show a minimal example below, where the visualization draws points on top of the user clicks.

In [None]:
%matplotlib notebook
import matplotlib.pyplot as plt

fig, ax = plt.subplots()  # Create an empty chart
ax.set_xlim(0, 10); ax.set_ylim(0, 10)  # Set the X and Y axis limits

def onclick(event):  # Callback function
    if event.inaxes is not ax:  # Ignore clicks outside the axes (xdata/ydata are None there)
        return
    ax.scatter(event.xdata, event.ydata, color='steelblue')  # Draw a point at the click position

cid = fig.canvas.mpl_connect('button_press_event', onclick)  # Callback setup

![](Images/InteractiveMatplotlib.png)

**Figure 1**: Interactive Matplotlib chart, where the user can click on the canvas in order to add a point at that position. The interactive chart also enables pan and zoom operations by default. 
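
The other events listed above can be connected in the same way, and a callback can also store what the user did so that later cells can use it. The following is a minimal sketch rather than part of the original example: the names `clicked_points`, `on_click` and `on_key`, and the choice of the "c" key to clear the chart, are assumptions made for illustration. In a notebook, the *"%matplotlib notebook"* magic command would be run first.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.set_xlim(0, 10)
ax.set_ylim(0, 10)

clicked_points = []  # (x, y) data coordinates collected from the user

def on_click(event):
    # xdata/ydata are None when the click lands outside the axes
    if event.inaxes is not ax:
        return
    clicked_points.append((event.xdata, event.ydata))
    ax.scatter(event.xdata, event.ydata, color='steelblue')
    fig.canvas.draw_idle()  # Request a redraw so the new point appears

def on_key(event):
    # Hypothetical shortcut: pressing "c" removes all points drawn so far
    if event.key == 'c':
        clicked_points.clear()
        for artist in list(ax.collections):
            artist.remove()
        fig.canvas.draw_idle()

click_cid = fig.canvas.mpl_connect('button_press_event', on_click)
key_cid = fig.canvas.mpl_connect('key_press_event', on_key)

# A callback can be detached later with its connection id, for example:
# fig.canvas.mpl_disconnect(key_cid)
```

After interacting with the chart, `clicked_points` holds the clicked coordinates and can be read from any later cell, which is what makes the data flow bidirectional in this approach.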

As we saw, this approach can add *click* interactions to a chart with a few lines of code. However, we are limited to the types of charts and interactions supported by matplotlib. When these options are not enough, the developer might need to consider other libraries, such as *Altair*, or creating their own visualization in Javascript. 

## Altair Specification

Altair enables the creation of interactive visualizations by using a pythonic port of the Vega-Lite specification \cite{vanderplas2018altair}. Altair uses a declarative visualization paradigm: instead of telling the library every step of how to draw a chart, the programmer specifies the data and the visual encodings, and the library takes care of the rest.

In order to create a chart, the developer needs to have a *Pandas DataFrame* containing the data to be visualized. An *altair.Chart* object needs to be created, with the corresponding *DataFrame* passed as a parameter. Next, an *encoding* and a *mark* need to be selected. *Encodings* tell *Altair* how the *DataFrame* columns should be mapped to visual attributes. Meanwhile, *marks* specify how the attributes should be represented on the plot (for example, as a circle, line, area chart, etc.).

We show a basic example of an Altair scatter plot with the Iris dataset. The dataset contains information regarding 150 Iris flowers, with measurements of the length and width of the sepals and petals, as well as the flower species. In the chart below, the scatter plot shows petalLength and petalWidth in the Cartesian plane. Color is used to show the flower species. Finally, data points can be hovered to show additional information as a tooltip (notice that this was not possible in *matplotlib*). In the code below, *mark_circle* is used to indicate the type of chart desired (scatter plot with circles) and the *encode* function specifies the chart encoding, in this case, which columns are mapped to the *x* and *y* positions, the *color* of the circle, and the *tooltip* on hover.

In [None]:
import altair as alt
from vega_datasets import data

df = data.iris()

alt.Chart(df).mark_circle().encode(
    x='petalLength',
    y='petalWidth',
    color='species',
    tooltip=['sepalLength', 'sepalWidth', 'petalLength', 'petalWidth', 'species']
).interactive()

![](Images/AltairIris.png)

**Figure 2**: Interactive Altair scatter plot of the Iris dataset. The chart displays a tooltip with flower information on mouse hover. The library also enables pan and zoom.

For more complex examples, please see the Altair documentation. There are many chart possibilities, and charts can be combined to create interactive dashboards. As an example of dashboard, this page[^3] shows how to create a chart with three distinct views and crossfilter capabilities.

One disadvantage of *Altair* is that data generated by user interaction cannot be accessed from Python. For example, we would not be able to receive data points selected in Altair in the next Jupyter cell. Such a capability exists in *matplotlib* and in custom Javascript visualizations, because we can set up callbacks between Javascript and Python.

[^3]: https://altair-viz.github.io/gallery/interactive_layered_crossfilter.html

## Custom Javascript Visualizations

Displaying custom Javascript visualizations in Jupyter Notebook can be done in a few lines of code. 
The package *IPython.display*[^4] can be used to embed any HTML code in notebook cells. The HTML may contain both CSS and Javascript, which allows flexible, interactive and customizable visualizations to be created.

[^4]: https://ipython.readthedocs.io/en/stable/api/generated/IPython.display.html

In order to embed the visualization in a cell, one needs to create a string variable containing all the HTML, Javascript, data and CSS code needed for the visualization. Since writing everything in a Jupyter cell can be cumbersome, one can write the visualization in a code editor and then load the document in Python. Javascript bundlers, such as Webpack \cite{webpack2020}, can convert multiple HTML, Javascript and CSS files into a single file, facilitating this process.

In the following, we show a minimal example of HTML embedding in a Jupyter cell. The code adds a single button to the page, which displays an alert box with the message "Hello World" when clicked.

In [None]:
from IPython.display import display, HTML
html_string = """<button onclick="alert('Hello World')">Hello World</button>"""
display(HTML(html_string))

In order to create the HTML string, formatting methods can be used. For example, a base string may contain the container *div* where the visualization is going to be inserted, a *script* tag where the bundled code is going to be added, and a function call to plot the visualization with the provided data in JSON format. The *string.format* function can be used to add the remaining information to the string, filling in the placeholders.
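
As a small, self-contained sketch of this templating step, the code below fills a base string with a container identifier, a piece of Javascript and JSON-encoded data. The `renderChart` function, the container id and the records are hypothetical stand-ins for a real bundle and dataset, not part of any library.

```python
import json

# Hypothetical stand-in for a bundled Javascript library
bundled_code = (
    "function renderChart(selector, data) {"
    " document.querySelector(selector).textContent ="
    " 'Loaded ' + data.length + ' records'; }"
)

# Hypothetical data; json.dumps turns it into a valid Javascript literal
records = [{"x": 1, "y": 2}, {"x": 3, "y": 4}]

# Base string with placeholders. Any literal brace written directly in the
# template would have to be doubled ({{ and }}) to survive str.format.
template = """
<div id="{container_id}"></div>
<script type="application/javascript">
    {bundled_code}
    renderChart("#{container_id}", {data});
</script>
"""

html_string = template.format(
    container_id="chart-container",
    bundled_code=bundled_code,
    data=json.dumps(records),
)
```

The resulting `html_string` can then be shown in a cell with `display(HTML(html_string))`. Braces inside the substituted values are left untouched by `format`; only braces in the template itself are interpreted as placeholders.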

The following code snippet shows how to embed a Javascript library and CSV data in the HTML string. This example visualization shows an interactive chart that displays baseball game trajectories. The user can control the progress of the play using a slider. Furthermore, the user can select a player or the ball to edit its trajectory (either by clicking on the field or by pressing the "Clear trajectory" button). This visualization is an adaptation of the baseball annotation system HistoryTracker \cite{ono2019historytracker}.

In [None]:
from IPython.display import display, HTML
import pandas as pd

with open("./BaseballVisualizer/build/baseballvisualizer.js", "r") as f:
    baseball_visualizer_bundle = f.read()

play = pd.read_csv("./play_annotated.csv", sep=";")
    
html = """
<html>
<body>
<div id="container"></div>
<script type="application/javascript">
    {bundled_code}
    baseballvisualizer.renderBaseballAnnotator("#container", {data});
</script>
</body>
</html>
""".format(bundled_code=baseball_visualizer_bundle, data={'tracking': play.to_json(orient="records")})

display(HTML(html))

![](Images/CustomJavascriptHistorytracker.png)

**Figure 3**: Custom Javascript visualization of baseball plays. The user can (1) animate the play using the slider; select a position to edit (in the picture, the BALL is selected) and (2) clear its trajectory; and annotate the positions of the ball when it is thrown (3) and hit to center field (4).

Callbacks can be set up in both Javascript and Python in order to send data from one to the other. The *comm* API [^5] can be used to do so. For example, a sports scientist who is interested in modifying a baseball trajectory and running further analysis in Python might set up this bidirectional communication.

A minor change needs to happen in both the Javascript and the Python code. In Javascript, a new *comm* object needs to be created with an identifier (in this example, *submit_trajectory*). Then, the *comm* object is used to *send* a message to Python, containing the edited trajectory data. Finally, when Python acknowledges the message, we display an alert.

```javascript
function submitTrajectoryToServer(trajectory){
    let comm = window.Jupyter.notebook.kernel.comm_manager.new_comm('submit_trajectory')
    // Send trajectory to Python
    comm.send({'trajectory': trajectory})

    // Receive message from Python
    comm.on_msg(function(msg) {
        alert("Trajectory received by Jupyter Notebook.")
    });
}
```

The Python code needs to expect a message from Javascript. In order to set this up, we use the *register_target* function, passing to it the communication identifier and the Python callback function. In the following code snippet, this callback will store the trajectory in the variable *received_trajectory*.

[^5]: https://jupyter-notebook.readthedocs.io/en/stable/comms.html

In [None]:
received_trajectory = []
def receive_trajectory(comm, open_msg):
    # comm is the kernel Comm instance
    # Register handler for future messages
    @comm.on_msg
    def _recv(msg):
        global received_trajectory 
        # Use msg['content']['data'] for the data in the message
        received_trajectory = msg['content']['data']['trajectory']
        print(received_trajectory)
        comm.send({'received': True})

get_ipython().kernel.comm_manager.register_target('submit_trajectory', receive_trajectory)
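
Because the handler only relies on the `on_msg` and `send` methods of the comm object, its logic can be exercised without a browser. The sketch below is an illustration under stated assumptions: `StubComm` is a hypothetical stand-in written for this example (it is not part of Jupyter), and the handler is a variation of the one above that also checks the payload before storing it.

```python
class StubComm:
    """Hypothetical stand-in for the kernel Comm object (not part of Jupyter)."""

    def __init__(self):
        self.handler = None  # Callback registered through on_msg
        self.sent = []       # Messages that would be sent back to Javascript

    def on_msg(self, callback):
        self.handler = callback
        return callback

    def send(self, data):
        self.sent.append(data)


received = {}

def receive_trajectory(comm, open_msg):
    @comm.on_msg
    def _recv(msg):
        trajectory = msg['content']['data'].get('trajectory')
        if not isinstance(trajectory, list):
            # Report malformed payloads instead of storing them
            comm.send({'received': False, 'error': 'trajectory must be a list'})
            return
        received['trajectory'] = trajectory
        comm.send({'received': True})


# Simulate one valid and one malformed message from the Javascript side
stub = StubComm()
receive_trajectory(stub, open_msg={})
stub.handler({'content': {'data': {'trajectory': [{'x': 1, 'y': 2}]}}})
stub.handler({'content': {'data': {}}})
```

After these two calls, `received['trajectory']` holds the valid payload and `stub.sent` records one acknowledgement and one error reply, in that order.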

Finally, after the user clicks the "Submit" button, the trajectory can be retrieved in the Jupyter notebook, analyzed and saved to disk.

In [None]:
pd.DataFrame(received_trajectory).to_csv("edited_trajectory.csv", index=False)
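
As an example of such further analysis, the sketch below computes the total distance covered by an edited trajectory. The record layout (`frame`, `x`, `y`) and the sample values are assumptions made for illustration; the actual fields depend on what the Javascript visualization sends.

```python
import pandas as pd

# Hypothetical payload; the field names and values are assumptions
example_trajectory = [
    {"frame": 0, "x": 0.0, "y": 0.0},
    {"frame": 1, "x": 3.0, "y": 4.0},
    {"frame": 2, "x": 6.0, "y": 8.0},
]

trajectory = pd.DataFrame(example_trajectory).sort_values("frame")

# Euclidean distance between consecutive positions; the first row has no
# predecessor, so its step length evaluates to 0
step_length = trajectory[["x", "y"]].diff().pow(2).sum(axis=1).pow(0.5)
total_distance = step_length.sum()
```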

# Conclusion

In this paper, we have presented three ways to create interactive visualizations in Jupyter Notebooks: *matplotlib* charts, *Altair* specifications and custom Javascript visualizations. We hope that this document will help developers to select the most appropriate method when they create their own interactive charts.