# Abstract



# Introduction

Jupyter Notebook is a widely used tool for data exploration, enabling analysts to write documents that combine software code, computational output, formatted text and even multimedia \cite{perkel2018jupyter}. Visualization is a major component of the data exploration process and is frequently found in Jupyter Notebooks: for example, a recent study of public GitHub repositories found that *matplotlib* was the second most imported package in the notebook environment \cite{pimentel2019large}.

When datasets are too large or too complex, interactive visualization becomes a useful tool in exploratory data analysis. Interactive visualizations can enable, among other things, the display of information at multiple levels of detail, the exploration of data using coordinated views, and the dynamic adjustment of charts to focus on the user's interests \cite{munzner2014visualization}. While Jupyter Notebooks can contain interactive visualizations, the vast majority of charts produced in the environment are static.

In this paper, we present three simple and powerful approaches with which data scientists can create interactive visualizations in Jupyter Notebooks: *Matplotlib Callbacks*, *Visualization Toolkits* and *Custom HTML Embedding*. Each approach has benefits and drawbacks that developers should weigh in order to make an informed decision about their visualization project. By the end of this paper, the reader will understand the three methods and will be able to select an implementation approach based on the desired level of interaction, customization and data flow.

A common pattern in Jupyter workflows is the execution of code cells, followed by the visualization of the results and formatted text describing the experiments \cite{pimentel2019large}.

These notebooks are widely used in both science and industry: one analysis of public GitHub repositories in 2018 counted more than 2.5 million Jupyter notebooks \cite{perkel2018jupyter}.

Data visualization allows people to analyze data when they do not know in advance which questions they need to ask \cite{munzner2014visualization}.

# Related Work

Several libraries support 3D volume visualizations in Jupyter notebooks, including ipygany[^3] and ipyvolume[^4].

[^3]: https://blog.jupyter.org/ipygany-jupyter-into-the-third-dimension-29a97597fc33

[^4]: https://github.com/maartenbreddels/ipyvolume

Altair \cite{vanderplas2018altair} is a declarative statistical visualization library for Python built on the Vega-Lite specification. It allows a wide range of interactive statistical visualizations to be expressed using a small number of grammar primitives. Multiple views can be combined and cross-filtered, and all visualizations can be used interactively in Jupyter Notebooks.

PipelineProfiler \cite{ono2020pipelineprofiler} is a Python library that enables the visualization and comparison of Machine Learning (ML) pipelines produced by AutoML systems. Users can compare pipelines based on graph structure (how pipeline nodes, inputs and outputs are combined), hyperparameters, and node importance. The library is integrated with Jupyter Notebooks and Google Colab Notebooks. Users can subset, sort, and export pipelines back to Python for further analysis.

StatCast Dashboard \cite{lage2016statcast} is a tool that enables baseball experts and fans to query, filter and analyze tracking data gathered by Major League Baseball's StatCast, a spatiotemporal player tracking system. Users can query MLB games based on play metadata (teams, players, dates, etc.), play statistics (ball velocity, runner top speed, route efficiency, etc.), and trajectories (by brushing on a baseball field diagram). Queried plays can be exported to CSV and loaded back into the notebook. The tool is integrated with Jupyter notebooks to facilitate the data science workflow.

VisJS2jupyter \cite{rosenthal2017interactive} is a library designed to enable the exploration of biological networks in Jupyter notebooks. The library consumes networks in the NetworkX \cite{hagberg2008exploring} format and renders the interactive visualizations using Javascript.

InterpretML \cite{nori2019interpretml} is a Python package that contains a collection of algorithms for explaining machine learning models, including LIME, SHAP and partial dependence plots. Every explanation can be displayed as a visualization integrated with Jupyter Notebooks, and the package also provides dashboards that enable the exploration of these explanations.

LIME \cite{ribeiro2016should} is a Python library that explains the predictions of machine learning classifiers by fitting a local, interpretable surrogate model to the original model's predictions. The prediction explanations can be plotted in Jupyter notebooks.

The SHAP library \cite{lundberg2018explainable} also provides methods to explain machine learning models and to display interactive visualizations in Jupyter Notebooks.

# Interactive Visualizations: Three Approaches

## Matplotlib Interactive Callbacks

## Javascript-based Packages

## Custom Javascript Visualizations

Displaying custom Javascript visualizations in Jupyter Notebook can be done in a few lines of code.
The module *IPython.display*[^1] contains the function *display* and the class *HTML*, which together can embed arbitrary HTML code in the output of a notebook cell. The HTML may contain both CSS and Javascript, which allows flexible, interactive and customizable visualizations to be created.

[^1]: https://ipython.readthedocs.io/en/stable/api/generated/IPython.display.html

To embed a visualization in a cell, one needs to create a string variable containing all the HTML, Javascript and CSS code the visualization requires. Since writing everything in a Jupyter cell can be cumbersome, one can instead write the visualization in a code editor, then read the file into Python and display it. Javascript bundlers, such as Webpack \cite{webpack_webpack_2020}, can combine multiple HTML, Javascript and CSS files into a single file, which simplifies this process.
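A minimal sketch of the "write in an editor, then load in Python" workflow follows. It assumes the visualization is split into three hypothetical files (an HTML fragment, a stylesheet and a script). The helper name *build_visualization_html* is ours and is not part of any library. It reads the files and inlines the CSS and Javascript into a single HTML string.

```python
from pathlib import Path
from typing import Optional


def build_visualization_html(html_path: str,
                             css_path: Optional[str] = None,
                             js_path: Optional[str] = None) -> str:
    """Combine an HTML fragment, a CSS file and a Javascript file into one string.

    Hypothetical helper: the file layout is an assumption, not a convention
    required by Jupyter.
    """
    parts = [Path(html_path).read_text(encoding="utf-8")]
    if css_path is not None:
        # Inline the stylesheet so it travels with the cell output.
        parts.append("<style>\n" + Path(css_path).read_text(encoding="utf-8") + "\n</style>")
    if js_path is not None:
        # Appended last, so the markup the script targets comes before it.
        parts.append("<script>\n" + Path(js_path).read_text(encoding="utf-8") + "\n</script>")
    return "\n".join(parts)
```

In a notebook, the result would then be shown with *display(HTML(build_visualization_html("chart.html", "chart.css", "chart.js")))*, where the three file names are placeholders.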

The following cell shows a minimal example of HTML embedding in a Jupyter cell:

```python
from IPython.display import display, HTML
html_string = """<button onclick="alert('Hello World')">Hello World</button>"""
display(HTML(html_string))
```

Two popular frameworks for writing Javascript visualizations are D3 \cite{bostock2011d3} and React \cite{fedosejev2015react}. Both frameworks update the HTML page based on the data, which facilitates the design of new visualizations. These third-party frameworks can be integrated into custom Jupyter Notebook visualizations in two ways: (1) using Webpack \cite{webpack2020} to bundle all the dependencies and the visualization source code together into a new library file, or (2) importing the framework directly in a notebook cell using the *require* function of RequireJS, which is available in the classic Notebook interface.
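The second approach can be sketched as follows. The sketch assumes the classic Notebook interface, where *require* is available to output scripts, and network access to the public D3 CDN. The helper name and the bar chart itself are illustrative. The Python code builds an HTML string whose script configures RequireJS to load D3 version 5 and then draws one bar per value.

```python
import json
from string import Template

# Illustrative template: RequireJS loads D3 v5 from its CDN (the ".js"
# extension is omitted in RequireJS paths), then D3 binds the data to divs.
_D3_TEMPLATE = Template("""
<div id="$div_id"></div>
<script>
require.config({paths: {d3: "https://d3js.org/d3.v5.min"}});
require(["d3"], function (d3) {
  var data = $data;
  // One horizontal bar per value, with width proportional to the value.
  d3.select("#$div_id")
    .selectAll("div")
    .data(data)
    .enter()
    .append("div")
    .style("width", function (d) { return (d * 20) + "px"; })
    .style("background", "steelblue")
    .style("color", "white")
    .style("margin", "2px")
    .text(function (d) { return d; });
});
</script>
""")


def d3_bar_chart_html(values, div_id="d3-bars"):
    """Return an HTML string that draws a minimal D3 bar chart (hypothetical helper)."""
    # json.dumps turns the Python list into a Javascript array literal.
    return _D3_TEMPLATE.substitute(div_id=div_id, data=json.dumps(list(values)))
```

Calling *display(HTML(d3_bar_chart_html([3, 5, 8])))* in a classic Notebook cell should render three bars. *string.Template* is used instead of *str.format* so that the many curly braces in the Javascript do not need to be escaped.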

The first approach has a higher setup cost, as it requires creating a new NPM project and configuring Webpack to bundle the code and dependencies into a single Javascript file. However, it allows the visualization to be split into multiple files, which often results in better-organized code. For detailed instructions on how to build a Javascript library using Webpack, see the documentation page[^2].

[^2]: https://webpack.js.org/guides/author-libraries/

Once the code is bundled into a single file, it can be used in Jupyter notebooks by reading the file as a text document, wrapping it in a *script* tag, and displaying the result using the *display(HTML(source))* command.

Data can be passed from Python to Javascript as JSON. First, the data is serialized to a JSON string in Python. Then, the string is inserted into the generated Javascript code, where it can be assigned to a variable and used normally.
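The sketch below shows this JSON hand-off under some assumptions. The helper name is hypothetical, and the escaping step is our addition rather than something the approach requires. pandas serializes the DataFrame to an array of row objects. The literal is then made safe for inlining: a value such as "</script>" in the data would otherwise close the enclosing script element early, and "<\/" is an equivalent escape in both JSON and Javascript.

```python
import json

import pandas as pd


def dataframe_to_js_literal(df: pd.DataFrame) -> str:
    """Serialize a DataFrame as a JSON array of records for use inside a script tag."""
    # One JSON object per row, e.g. [{"x": 1, "label": "a"}, ...];
    # round-tripping through json normalizes pandas' own escaping.
    records = json.loads(df.to_json(orient="records"))
    literal = json.dumps(records)
    # Prevent "</" inside string values from terminating the <script> element.
    return literal.replace("</", "<\\/")
```

Because the returned string is a valid Javascript array literal, it can be substituted directly into the script template. The Javascript side then receives an array of row objects without an explicit parsing step.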

The following code snippet shows how to embed a custom library that enables displaying and editing baseball play trajectories. The trajectory data come from \cite{ono2019historytracker}.

```python
import json

import pandas as pd
from IPython.display import display, HTML

# Read the Webpack bundle (a single Javascript file) as plain text
with open("./BaseballVisualizer/build/baseballvisualizer.js", "r") as f:
    baseball_visualizer_bundle = f.read()

play = pd.read_csv("./play_annotated.csv", sep=";")

html = """
<html>
<body>
<div id="container"></div>

<script type="application/javascript">
{baseball_visualizer_bundle}
</script>

<script>
     baseballvisualizer.renderBaseballAnnotator("#container", {data});
</script>

</body>
</html>
""".format(
    baseball_visualizer_bundle=baseball_visualizer_bundle,
    # json.dumps yields a properly escaped Javascript object literal; as before,
    # the tracking records are passed as a JSON string
    data=json.dumps({'tracking': play.to_json(orient="records")}),
)

display(HTML(html))
```

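Data can also flow in the opposite direction, from Javascript back to Python, through the kernel's Comm messaging API. The following cell registers a Comm target named *submit_trajectory*. When the front end opens a Comm with this target and sends the edited trajectory, the handler stores it in *received_trajectory* and replies with an acknowledgement:

```python
received_trajectory = []

def receive_trajectory(comm, open_msg):
    # comm is the kernel Comm instance opened by the front end
    # Register a handler for the messages sent later on this Comm
    @comm.on_msg
    def _recv(msg):
        global received_trajectory
        # The payload sent from Javascript is in msg['content']['data']
        received_trajectory = msg['content']['data']['trajectory']
        print(received_trajectory)
        # Acknowledge receipt back to the Javascript side
        comm.send({'received': True})

# Make the handler available under the target name used by the visualization
get_ipython().kernel.comm_manager.register_target('submit_trajectory', receive_trajectory)
```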

Once a trajectory has been submitted, it can be loaded into a DataFrame for further analysis:

```python
pd.DataFrame(received_trajectory)
```