In [None]:
# Install dependencies
!pip install numpy matplotlib pandas tabulate

# Abstract



# Introduction

Jupyter Notebook is a fantastic tool for data exploration, enabling analysts to write documents that contain software code, computational output, formatted text and even multimedia \cite{perkel2018jupyter}. Visualization is a big component of the data exploration process, and can be frequently found in Jupyter Notebooks: for example, a recent study of public GitHub repositories found that *matplotlib* was the second most imported package in the notebook environment \cite{pimentel2019large}.

When datasets are too large or too complex, interactive visualization becomes a useful tool in exploratory data analysis. Interactive visualizations can enable, among many others, the display of information at multiple levels of detail, the exploration of data using coordinated views, and the dynamic change of the charts to focus on the user's interests \cite{munzner2014visualization}. While Jupyter Notebooks can contain interactive visualizations, the vast majority of charts produced in the environment are static.

In this paper, we present three simple and powerful approaches that data scientists can use to create interactive visualizations in Jupyter Notebooks: *Matplotlib Callbacks*, *Visualization Toolkits* and *Custom HTML Embedding*. Each approach has benefits and drawbacks that developers need to weigh in order to make an informed decision about their visualization project. By the end of this paper, the reader will have a good understanding of the three methods, and will be able to select an implementation approach depending on the level of interaction, customization and data flow desired.

# Interactive Visualization in Jupyter Notebooks

*matplotlib* \cite{hunter2007matplotlib} is the most popular general purpose visualization library for Jupyter Notebooks \cite{pimentel2019large}. This tool enables the creation of static, animated, and interactive visualizations that can be rendered directly as the output of notebook cells. The library can render visualizations in different formats, including static (raster, SVG, etc.) and interactive. In order to enable interactive charts in the notebook environment, users need to activate this option using the *"%matplotlib notebook"* magic command [^1]. The produced charts natively support pan and zoom operations, and can be configured to receive other types of user input, such as mouse clicks and key presses, which can run user-defined callback functions [^2]. Although user interaction is supported in *matplotlib*, it is not the focus of the library: interactions such as drag-and-drop, tooltips, and cross-filtering are not directly supported, and need to be coded from scratch.

[^1]: https://matplotlib.org/3.3.3/users/interactive.html
[^2]: https://matplotlib.org/3.3.3/users/event_handling.html
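
As a minimal sketch of such a callback, a handler can be registered on the figure canvas with `mpl_connect`. The names `on_click` and `clicked_points` are illustrative assumptions, and the `Agg` backend is selected here only so that the snippet also runs outside a notebook; in a notebook, the *"%matplotlib notebook"* magic command would be used instead.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; not needed in a notebook
import matplotlib.pyplot as plt

# Illustrative container for the coordinates collected from the user
clicked_points = []

def on_click(event):
    """Store the data coordinates of every click that lands inside the axes."""
    if event.inaxes is not None:
        clicked_points.append((event.xdata, event.ydata))

fig, ax = plt.subplots()
ax.plot([0, 1, 2, 3], [0, 1, 4, 9])

# mpl_connect returns a connection id that can later be passed to mpl_disconnect
connection_id = fig.canvas.mpl_connect("button_press_event", on_click)
```

Each mouse click on the chart then appends one `(x, y)` pair to `clicked_points`, which remains available to later notebook cells.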

In order to enable the creation of more interactive visualizations in Python and Jupyter Notebooks, many open source libraries have been developed. Among those, Perkel and others \cite{perkel2018data} highlight *Plotly* \cite{plotly2020}, *Bokeh* \cite{bokeh2020} and *Altair* \cite{vanderplas2018altair}. These libraries are built on top of web technologies, and create visualizations that can be seen in web browsers. Syntax-wise, *Plotly* and *Bokeh* are very similar to *matplotlib*. However, both libraries have been developed with a focus on user interaction, enabling the creation of web-based dashboards that combine interactive widgets and charts, and support multiple user inputs, including click, drag-and-drop, tooltips, selection, crossfilter, and bidirectional communication with Python via callbacks. *Altair* differs from the aforementioned libraries in the way visualizations are defined: it uses a declarative specification that ports the Vega-Lite \cite{satyanarayan2016vega} grammar to Python. A wide range of interactive visualizations can be expressed using a small number of Altair primitives. However, the visualizations cannot communicate back with Python, therefore the results of user interactions cannot be used in further computations.

There might be cases when a visualization cannot be created using any off-the-shelf Python libraries. When this happens, the developer/researcher has the option to write their own visualization using a web framework, and embed this visualization in Jupyter Notebooks. This option offers the most flexibility, as the visualization can be fully customized, interactions can be scripted on demand, and even animations can be implemented. Javascript libraries such as React \cite{fedosejev2015react} and D3 \cite{bostock2011d3} can be used to facilitate the implementation of custom visualizations.

Some examples of custom Javascript visualizations in Jupyter Notebooks include libraries for scientific visualization \cite{breddels2020ipygany, breddels2020ipyvolume}, sports analytics \cite{lage2016statcast} and machine learning \cite{nori2019interpretml, ono2020pipelineprofiler}. IPyGany \cite{breddels2020ipygany} and IPyVolume \cite{breddels2020ipyvolume} enable the visualization of 3D meshes and volumes in notebooks, respectively. StatCast Dashboard \cite{lage2016statcast} supports the interactive query, filter, and visualization of spatiotemporal baseball trajectories. InterpretML \cite{nori2019interpretml} is a Python package that contains a collection of algorithms for explaining and visualizing Machine Learning (ML) models, including LIME, SHAP and Partial Dependence Plots. Finally, PipelineProfiler \cite{ono2020pipelineprofiler} is a tool that enables users to explore ML pipelines produced by Automatic Machine Learning systems.

Table 1 summarizes the different approaches for adding interactive visualizations to Jupyter Notebooks. The approaches are classified in terms of interaction, type of output, level of customization and support for dashboards.

In [None]:
import pandas as pd
from IPython.display import display, Markdown
pd.set_option('display.notebook_repr_html', True)
librarySummary = pd.DataFrame([
    {
        'Library': 'matplotlib',
        'Interaction': 'Low',
        'Output': 'Flexible',
        'Customization': 'Low',
        'Dashboard': 'No'
    },
    {
        'Library': 'Plotly',
        'Interaction': 'High',
        'Output': 'HTML',
        'Customization': 'Low',
        'Dashboard': 'Yes'
    },
    {
        'Library': 'Bokeh',
        'Interaction': 'High',
        'Output': 'HTML',
        'Customization': 'Low',
        'Dashboard': 'Yes'
    },
    {
        'Library': 'Altair',
        'Interaction': 'High',
        'Output': 'HTML',
        'Customization': 'Low',
        'Dashboard': 'Yes'
    },
    {
        'Library': 'Custom JS',
        'Interaction': 'High',
        'Output': 'HTML',
        'Customization': 'High',
        'Dashboard': 'Yes'
    }
])
display(Markdown("**Table 1**: Interactive Visualization in Jupyter Notebook"))
# `index=False` replaces the tabulate-style `showindex=False`, deprecated since pandas 1.1
display(Markdown(librarySummary.to_markdown(index=False)))

# Interactive Visualizations 3 Ways

## Matplotlib Interactive Callbacks

## Javascript-based packages

## Custom Javascript Visualizations

Displaying custom Javascript visualizations in Jupyter Notebook can be done in a few lines of code.
The package *IPython.display*[^1] contains the function *display* and the class *HTML*, which together can embed any HTML code in notebook cells. The HTML may contain both CSS and Javascript, which allows flexible, interactive and customizable visualizations to be created.

[^1]: https://ipython.readthedocs.io/en/stable/api/generated/IPython.display.html

In order to embed the visualization in a cell, one needs to create a string variable containing all the HTML, Javascript and CSS code needed for the visualization. Since writing everything in a Jupyter cell can be cumbersome, one can write the visualization in a code editor, then load the document in Python and display it. Javascript bundlers, such as Webpack \cite{webpack_webpack_2020}, can convert multiple HTML, Javascript and CSS files into a single file, facilitating this process.

In the following, we show a minimal example of HTML embedding in a Jupyter cell:

In [None]:
from IPython.display import display, HTML
html_string = """<button onclick="alert('Hello World')">Hello World</button>"""
display(HTML(html_string))

Two popular frameworks for writing Javascript visualizations are D3 \cite{bostock2011d3} and React \cite{fedosejev2015react}. Both frameworks manipulate the HTML page based on the data, which facilitates the design of new visualizations. These third-party frameworks can be integrated into custom Jupyter Notebook visualizations in two ways: 1) using Webpack \cite{webpack2020} to bundle all the dependencies and the visualization source code together into a new library file, and 2) importing the framework directly in a notebook cell using the *Require* command.

The first approach has a higher setup cost, as it requires setting up a new NPM project and configuring Webpack to bundle the code and dependencies into a single Javascript file. However, it allows the visualization to be divided into multiple files, which often results in more organized code. For detailed instructions on how to compile a Javascript library using Webpack, please see the documentation page[^2].

[^2]: https://webpack.js.org/guides/author-libraries/

Once the code is bundled into a single file, it can be used in Jupyter Notebooks by reading the file as a text document and displaying it using the *display(HTML(source))* command.
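
The second approach, importing a framework with *Require*, avoids the bundling step. The following is a minimal sketch rather than a definitive recipe: it assumes the classic Jupyter Notebook front end, which exposes RequireJS as a global `require` function, and it assumes that D3 is loaded from the public URL shown. The element id `d3-demo` and the variable name `require_html` are illustrative.

```python
# Sketch only: the CDN URL, the element id and the variable name are assumptions.
require_html = """
<div id="d3-demo"></div>
<script>
  // Tell RequireJS where to find D3 (the ".js" suffix is appended automatically).
  require.config({paths: {d3: "https://d3js.org/d3.v5.min"}});
  // Load D3 asynchronously and draw only once it is available.
  require(["d3"], function (d3) {
    d3.select("#d3-demo")
      .append("p")
      .text("D3 version " + d3.version + " loaded through Require");
  });
</script>
"""
```

In a notebook cell, the string can then be rendered with `display(HTML(require_html))`, exactly as in the minimal embedding example. Front ends that do not ship RequireJS would need the bundling approach instead.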

Data can be passed from Python to Javascript using JSON strings. First, the data is converted to JSON. Then, the string is written into the HTML as the value of a Javascript variable, where it can be used normally.
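
A minimal sketch of this data flow, using only the standard library, is shown here. The records, the helper `to_script_safe_json`, and the element id `json-demo` are hypothetical and serve only as an illustration.

```python
import json

# Hypothetical records standing in for real data
records = [{"frame": 0, "x": 1.5, "y": 2.0}, {"frame": 1, "x": 1.7, "y": 2.4}]

def to_script_safe_json(obj):
    """Serialize obj to JSON that can be pasted inside a <script> tag."""
    # Escaping "</" keeps a value such as "</script>" from closing the tag early;
    # "<\/" is still valid JSON and valid Javascript.
    return json.dumps(obj).replace("</", "<\\/")

html_template = """
<div id="json-demo"></div>
<script>
  // The JSON text doubles as a Javascript literal, so it is assigned directly.
  var records = {records_json};
  document.getElementById("json-demo").textContent =
      "Received " + records.length + " records";
</script>
"""

# str.replace is used instead of str.format because Javascript code usually
# contains curly braces that str.format would try to interpret.
json_demo_html = html_template.replace("{records_json}", to_script_safe_json(records))
```

The resulting `json_demo_html` string can be rendered with `display(HTML(json_demo_html))`.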

The following code snippet shows how to embed a custom library that enables the display and editing of baseball play trajectories. The trajectory data comes from \cite{ono2019historytracker}.

In [None]:
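import json

import pandas as pd
from IPython.display import display, HTML

# Read the Webpack bundle (library code plus dependencies) as plain text.
with open("./BaseballVisualizer/build/baseballvisualizer.js", "r") as f:
    baseball_visualizer_bundle = f.read()

play = pd.read_csv("./play_annotated.csv", sep=";")

# 'tracking' is passed as a JSON string of records; json.dumps turns the
# wrapper dict into a valid Javascript object literal with proper escaping.
data = json.dumps({'tracking': play.to_json(orient="records")})

html = """
<html>
<body>
<div id="container"></div>

<script type="application/javascript">
{baseball_visualizer_bundle}
</script>

<script>
     baseballvisualizer.renderBaseballAnnotator("#container", {data});
</script>

</body>
</html>
""".format(baseball_visualizer_bundle=baseball_visualizer_bundle, data=data)

display(HTML(html))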

In [None]:
received_trajectory = []
def receive_trajectory(comm, open_msg):
    # comm is the kernel Comm instance
    # Register handler for later messages
    @comm.on_msg
    def _recv(msg):
        global received_trajectory 
        # Use msg['content']['data'] for the data in the message
        received_trajectory = msg['content']['data']['trajectory']
        print(received_trajectory)
        comm.send({'received': True})

get_ipython().kernel.comm_manager.register_target('submit_trajectory', receive_trajectory)

In [None]:
pd.DataFrame(received_trajectory)

In [None]:
%matplotlib notebook
import matplotlib.pyplot as plt
img = plt.imread("/Users/jorgehpo/Downloads/newplot.png")
from matplotlib_annotator import MatplotlibAnnotator
mpl_annotator = MatplotlibAnnotator()
mpl_annotator.plot_annotate(img, ["1", "2", "3"])