
<div class="alert alert-info">

**Warning:** This notebook needs a running kernel to be fully interactive. Please run it locally or on [mybinder](https://mybinder.org/v2/gh/vaexio/vaex/master?filepath=docs%2Fsource%2Ftutorial_jupyter.ipynb).

</div>

[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/vaexio/vaex/latest?filepath=docs%2Fsource%2Ftutorial_jupyter.ipynb)


# Jupyter integration: interactivity

Vaex can process about 1 billion rows per second. In combination with the Jupyter notebook, this allows for interactive exploration of large datasets.

## Introduction
The `vaex-jupyter` package contains the building blocks to interactively define an N-dimensional grid, which is then used for visualizations.

We start by defining the building blocks (`vaex.jupyter.model.Axis`, `vaex.jupyter.model.DataArray` and `vaex.jupyter.view.DataArray`) used to define and visualize our N-dimensional grid.

Let us first import the relevant packages, and open the example DataFrame:

In [1]:
import vaex
import vaex.jupyter.model as vjm

import numpy as np
import matplotlib.pyplot as plt

df = vaex.example()

We want to build a 2-dimensional grid with the number counts in each bin. To do this, we first define two axis objects:

In [2]:
E_axis = vjm.Axis(df=df, expression=df.E, shape=140)
Lz_axis = vjm.Axis(df=df, expression=df.Lz, shape=100)
Lz_axis

Axis(_debug=False, _status_change_delay=0.0, calculation=None, centers=None, exception=None, expression=Lz, max=None, min=None, shape=100, shape_default=64, slice=None, status=Status.NO_LIMITS)

_COMMENT: This needs updating_

We inspect the `Lz_axis` object and see that its `min`, `max`, and `centers` are all `None`. We can force the computation by calling [calculate_limits](api.html#???). Note that this is for demonstration purposes only; the limits are usually computed automatically by `vaex-jupyter`.

In [3]:
# COMMENT: This cell needs an explanation or it needs to be modified?
await vaex.jupyter.gather()
Lz_axis

Axis(_debug=False, _status_change_delay=0.0, calculation=None, centers=[-2877.11808899 -2830.27174744 -2783.42540588 -2736.57906433
 -2689.73272278 -2642.88638123 -2596.04003967 -2549.19369812
 -2502.34735657 -2455.50101501 -2408.65467346 -2361.80833191
 -2314.96199036 -2268.1156488  -2221.26930725 -2174.4229657
 -2127.57662415 -2080.73028259 -2033.88394104 -1987.03759949
 -1940.19125793 -1893.34491638 -1846.49857483 -1799.65223328
 -1752.80589172 -1705.95955017 -1659.11320862 -1612.26686707
 -1565.42052551 -1518.57418396 -1471.72784241 -1424.88150085
 -1378.0351593  -1331.18881775 -1284.3424762  -1237.49613464
 -1190.64979309 -1143.80345154 -1096.95710999 -1050.11076843
 -1003.26442688  -956.41808533  -909.57174377  -862.72540222
  -815.87906067  -769.03271912  -722.18637756  -675.34003601
  -628.49369446  -581.64735291  -534.80101135  -487.9546698
  -441.10832825  -394.26198669  -347.41564514  -300.56930359
  -253.72296204  -206.87662048  -160.03027893  -113.18393738
   -66.33759583 
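
The `centers` printed above are evenly spaced, which is consistent with them being the midpoints of equal-width bins between `min` and `max`. A minimal NumPy sketch of that relationship, assuming this is how the centers are derived (the limits below are illustrative round numbers, not the values Vaex computes for `df.Lz`):

```python
import numpy as np

# Illustrative limits only, NOT the values vaex computes for df.Lz
lz_min, lz_max, shape = -2900.0, 1800.0, 100

edges = np.linspace(lz_min, lz_max, shape + 1)  # shape + 1 bin edges
centers = 0.5 * (edges[:-1] + edges[1:])         # midpoint of each bin
bin_width = (lz_max - lz_min) / shape            # constant spacing between centers

print(centers[:3])
print(bin_width)
```

With `shape=100`, the axis has 100 centers, and the first center sits half a bin width above `min`.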

Note that the Axis is a [traitlets HasTraits object](https://traitlets.readthedocs.io), like all ipywidget objects. This means we can link any of its properties to an ipywidget, which is what makes interactivity possible. We can also use [observe](https://traitlets.readthedocs.io/en/stable/using_traitlets.html#observe) to listen for changes to our model.
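
As a rough illustration of what observing and linking traits means, here is a small sketch using plain traitlets. `ToyAxis` and `ToyControl` are hypothetical stand-ins, not Vaex classes; only `HasTraits`, `observe` and `link` come from traitlets itself.

```python
import traitlets


class ToyAxis(traitlets.HasTraits):
    """Hypothetical stand-in for an axis model (not vaex's Axis)."""
    expression = traitlets.Unicode("Lz")


class ToyControl(traitlets.HasTraits):
    """Hypothetical stand-in for a widget with a `value` trait."""
    value = traitlets.Unicode("Lz")


axis = ToyAxis()
control = ToyControl()
changes = []


def on_change(change):
    # `change` carries the trait name plus its old and new values
    changes.append((change["name"], change["old"], change["new"]))


axis.observe(on_change, names=["expression"])

# Keep control.value and axis.expression in sync in both directions
traitlets.link((control, "value"), (axis, "expression"))

control.value = "abs(Lz)"  # propagates to axis.expression via the link
print(axis.expression)
print(changes)
```

Setting the value on either side updates the other, and the observer is notified once per actual change.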

## An interactive xarray DataArray display

_COMMENT: I think this text below is crucial to the whole thing, and needs to be explained better, especially the "what is returned" part. I will try to think of how to do this._

Now that we have defined our two axes, we use the [widget accessor](api.html#vaex.jupyter.DataFrameAccessorWidget) to create a [vaex.jupyter.model.DataArray](api.html#vaex.jupyter.model.DataArray) (the model) together with a [vaex.jupyter.view.DataArray](api.html#vaex.jupyter.view.DataArray) (the view), and to link the two. What is returned is the view, which is an ipywidget and can therefore be displayed in the Jupyter notebook.

In [4]:
data_array_widget = df.widget.data_array(axes=[Lz_axis, E_axis], selections=[None, 'default'])
data_array_widget

DataArray(children=[Container(children=[ProgressCircularNoAnimation(color='#9ECBF5', size=30, text='', value=1…

From the specification of the axes and the selections, Vaex computes a 3-dimensional histogram, the first dimension being the selections. Internally this is simply a numpy array, but we wrap it in an [xarray](http://xarray.pydata.org/) [DataArray](http://xarray.pydata.org/en/stable/data-structures.html#dataarray) object. An xarray DataArray object can be seen as a labeled N-dimensional array, i.e. a numpy array with extra metadata to make it fully self-describing.

Notice that in the above code cell, we specified the `selections` argument with a list containing two elements, `None` and `'default'`. The `None` selection simply shows all the data, while `'default'` refers to any selection made without explicitly naming it. Even though the latter has not been defined at this point, we can still pre-emptively include it, in case we want to modify it later.
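
To make the shape of such a grid concrete, here is a sketch that builds a similar labeled array by hand with NumPy and xarray. It does not use Vaex; the random data, the limits, and the selection labels (`"all"` and `"x > 0"`, standing in for `None` and `'default'`) are all assumptions made for illustration.

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = rng.normal(size=1000)

limits = [[-3.0, 3.0], [-3.0, 3.0]]  # illustrative [min, max] per axis
shape = (4, 5)                        # number of bins per axis

# One 2d histogram per selection: all rows, and rows with x > 0
counts_all, x_edges, y_edges = np.histogram2d(x, y, bins=shape, range=limits)
mask = x > 0
counts_sel, _, _ = np.histogram2d(x[mask], y[mask], bins=shape, range=limits)

grid = np.stack([counts_all, counts_sel])  # shape (2, 4, 5)
x_centers = 0.5 * (x_edges[:-1] + x_edges[1:])
y_centers = 0.5 * (y_edges[:-1] + y_edges[1:])

da = xr.DataArray(
    grid,
    dims=("selection", "x", "y"),
    coords={
        "selection": ["all", "x > 0"],
        # (dims, data, attrs): store min/max as metadata on each coordinate
        "x": ("x", x_centers, {"min": -3.0, "max": 3.0}),
        "y": ("y", y_centers, {"min": -3.0, "max": 3.0}),
    },
)
print(da.dims, da.shape)
```

The first dimension indexes the selections, and each remaining dimension carries its bin centers and limits with it.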

The most important properties of the `data_array` are printed out below:

In [5]:
### COMMENT: are the 1st two print lines necessary? They just print out as empty lists

# NOTE: since the computations are done in the background, data_array_widget.model.grid is initially None.
# We can force vaex-jupyter to wait till all executions are done using:
print(vaex.jupyter.utils._debounced_futures)  # COMMENT <-
await vaex.jupyter.gather()
print(vaex.jupyter.utils._debounced_futures)   # COMMENT <-
# get a reference to the xarray DataArray object
data_array = data_array_widget.model.grid
print("type:", type(data_array))
print("dims:", data_array.dims)
print("data:", data_array.data)
print("coords:", data_array.coords)
print("Lz's data:", data_array.coords['Lz'].data)
print("Lz's attrs:", data_array.coords['Lz'].attrs)
print("And displaying the xarray DataArray:")
display(data_array)  # this is what the vaex.jupyter.view.DataArray uses

[<Future pending>]
[]
type: <class 'xarray.core.dataarray.DataArray'>
dims: ('selection', 'Lz', 'E')
data: [[[0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  ...
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]]]
coords: Coordinates:
  * selection  (selection) object None
  * Lz         (Lz) float64 -2.877e+03 -2.83e+03 ... 1.714e+03 1.761e+03
  * E          (E) float64 -2.414e+05 -2.394e+05 ... 3.296e+04 3.495e+04
Lz's data: [-2877.11808899 -2830.27174744 -2783.42540588 -2736.57906433
 -2689.73272278 -2642.88638123 -2596.04003967 -2549.19369812
 -2502.34735657 -2455.50101501 -2408.65467346 -2361.80833191
 -2314.96199036 -2268.1156488  -2221.26930725 -2174.4229657
 -2127.57662415 -2080.73028259 -2033.88394104 -1987.03759949
 -1940.19125793 -1893.34491638 -1846.49857483 -1799.65223328
 -1752.80589172 -1705.95955017 -1659.11320862 -1612.26686707
 -1565.42052551 -1518.57418396 -1471.72784241 -1424.88150085
 -1378.0351593  -1331.18881775 -1284.3424762  -1237.49613464


Note that `data_array.coords['Lz'].data` is the same as `Lz_axis.centers` and `data_array.coords['Lz'].attrs` contains the same `min/max` as the `Lz_axis`.

Also, we see that displaying the xarray.DataArray object (`data_array_widget.model.grid`) gives us the same output as the `data_array_widget` above. There is a big difference, however. If we change a selection:

In [6]:
df.select(df.x > 0)

and scroll back, we see that the `data_array_widget` has updated itself and now contains two selections! This is a very powerful feature that allows us to make interactive visualizations.


## Interactive plots

To make interactive plots we can pass a custom `display_function` when creating the `data_array_widget`. This overrides the default notebook behaviour, which is a call to `display(data_array_widget)`. In the following example we create a function that displays a matplotlib figure:

In [7]:
# NOTE: da is short for 'data array'
def plot2d(da):
    plt.figure(figsize=(8, 8))
    ar = da.data[1]  # take the numpy data and pick the second selection ('default')
    print(f'imshow of a numpy array of shape: {ar.shape}')
    plt.imshow(np.log1p(ar.T), origin='lower')

df.widget.data_array(axes=[Lz_axis, E_axis], display_function=plot2d)

DataArray(children=[Container(children=[ProgressCircularNoAnimation(color='#9ECBF5', size=30, text='', value=1…

In the above figure, we choose index 1 along the selection axis, which is the `default` selection; an index of 0 would correspond to the `None` selection. If we now change the selection, the figure will update itself:
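
The plotting function above applies `np.log1p` and a transpose before calling `imshow`. A small sketch with made-up counts shows what each step does: `log1p` compresses the dynamic range while keeping empty bins finite, and the transpose puts the second grid dimension along the image rows, which is what `imshow` draws vertically.

```python
import numpy as np

# Made-up counts: 2 bins along the first axis, 3 bins along the second
counts = np.array([[0, 1, 9],
                   [99, 0, 999]])

scaled = np.log1p(counts)  # log(1 + n): empty bins map to 0 instead of -inf
image = scaled.T           # shape (3, 2): rows now run along the second axis

print(scaled)
print(image.shape)
```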

In [8]:
df.select(df.id < 10)

As xarray's DataArray is fully self describing, we can improve the plot by using the dimension names for labeling, and setting the extent of the figure's axes.

Note that we don't need any information from the Axis objects created above, and in fact, we should not use them, since they may not be in sync with the xarray DataArray object. Later on, we will create a widget that will edit the Axis' expression. 

_COMMENT: This should come later, when this topic is discussed_
>If we change the expression of the Axis object (e.g. `Lz_axis.expression = np.abs(df.Lz)`), it can take a while before the computation finishes, so be sure to only use the information in the xarray DataArray object.

Our improved visualization with proper axes and labeling:

In [9]:
def plot2d_with_labels(da):
    plt.figure(figsize=(8, 8))
    grid = da.data[0]  # take the numpy data, and select the first selection (no selection)
    dim_x = da.dims[1]
    dim_y = da.dims[2]
    plt.title(f'{dim_y} vs {dim_x} - shape: {grid.shape}')
    extent = [
        da.coords[dim_x].attrs['min'], da.coords[dim_x].attrs['max'],
        da.coords[dim_y].attrs['min'], da.coords[dim_y].attrs['max']
    ]
    plt.imshow(np.log1p(grid.T), origin='lower', extent=extent, aspect='auto')
    plt.xlabel(da.dims[1])
    plt.ylabel(da.dims[2])

da_plot_view_nicer = df.widget.data_array(axes=[Lz_axis, E_axis], display_function=plot2d_with_labels)
da_plot_view_nicer

DataArray(children=[Container(children=[ProgressCircularNoAnimation(color='#9ECBF5', size=30, text='', value=1…

We can also create more sophisticated plots, for example one where we show all of the selections. Note that we can pre-emptively expect a selection and define it later:

In [10]:
def plot2d_with_selections(da):
    grid = da.data
    # Create 1 row and #selections of columns of matplotlib axes
    fig, axgrid = plt.subplots(1, grid.shape[0], sharey=True, squeeze=False)
    for selection_index, ax in enumerate(axgrid[0]):
        ax.imshow(np.log1p(grid[selection_index].T), origin='lower')

df.widget.data_array(axes=[Lz_axis, E_axis], display_function=plot2d_with_selections,
                     selections=[None, 'default', 'rest'])

DataArray(children=[Container(children=[ProgressCircularNoAnimation(color='#9ECBF5', size=30, text='', value=1…

Modifying a selection will update the figure.

In [11]:
df.select(df.id < 10)  # select 10 objects
df.select(df.id >= 10, name='rest')  # and the rest

Another advantage of using xarray is its excellent plotting capabilities. It handles a lot of the boring stuff like axis labeling, and also provides a nice interface for slicing the data even more.
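
As a quick taste of that slicing interface, here is a sketch on a small hand-made DataArray. The dimension names mirror the ones used in this tutorial, but the values are arbitrary and do not come from the example DataFrame.

```python
import numpy as np
import xarray as xr

da = xr.DataArray(
    np.arange(2 * 3 * 4).reshape(2, 3, 4),
    dims=("E", "Lz", "FeH"),
    coords={
        "E": [-1.0, 1.0],
        "Lz": [-10.0, 0.0, 10.0],
        "FeH": [-2.5, -1.5, -0.5, 0.5],
    },
)

one_slice = da.sel(FeH=-1.4, method="nearest")  # label-based: picks FeH = -1.5
summed = da.sum(dim="FeH")                       # collapse a dimension by name
first_two = da.isel(FeH=slice(0, 2))             # position-based slicing

print(one_slice.dims, summed.shape, first_two.sizes["FeH"])
```

Because every operation refers to dimensions by name, the code does not need to know the order of the axes in the underlying numpy array.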

Let us introduce another axis, FeH (fun fact: FeH, or metallicity, is a property of stars that tells us how many heavy elements they contain, and is an indicator of their origin):

In [12]:
FeH_axis = vjm.Axis(df=df, expression='FeH', min=-3, max=1, shape=5)
da_view = df.widget.data_array(axes=[E_axis, Lz_axis, FeH_axis])
da_view

DataArray(children=[Container(children=[ProgressCircularNoAnimation(color='#9ECBF5', size=30, text='', value=1…

We can see that we now have a 4-dimensional grid, which we would like to visualize.

And [xarray's plot](http://xarray.pydata.org/en/stable/plotting.html#two-dimensions) makes our life much easier:

In [13]:
def plot_with_xarray(da):
    da_log = np.log1p(da)  # Note that an xarray DataArray is like a numpy array
    da_log.plot(x='Lz', y='E', col='FeH', row='selection', cmap='viridis')

plot_view = df.widget.data_array([E_axis, Lz_axis, FeH_axis], display_function=plot_with_xarray,
                                 selections=[None, 'default', 'rest'])
plot_view

DataArray(children=[Container(children=[ProgressCircularNoAnimation(color='#9ECBF5', size=30, text='', value=1…

_COMMENT: Would be nice if we can link Grammar of Graphics to something? For the unfamiliar_

We only have to tell xarray which axis it should map to which 'aesthetic', speaking in Grammar of Graphics terms.

## Selection widgets
Although we can change the selection in the notebook (e.g. `df.select(df.id > 20)`), if we create a dashboard ([using Voila](https://voila.readthedocs.io/en/stable/)) we cannot execute arbitrary code. Vaex-jupyter also comes with many widgets, and one of them is a `selection_expression` widget:

In [14]:
selection_widget = df.widget.selection_expression()
selection_widget

ExpressionSelectionTextArea(label='Filter by custom expression', placeholder='Enter a custom (boolean) express…

The `counter_selection` creates a widget which keeps track of the number of rows in a selection. In this case we ask it to be 'lazy', which means that it will not cause extra passes over the data, but will ride along if some user action triggers a calculation.

In [15]:
await vaex.jupyter.gather()
w = df.widget.counter_selection('default', lazy=True)
w

Counter(characters=['&nbsp;', '&nbsp;', '&nbsp;', '&nbsp;', '&nbsp;', '&nbsp;', '&nbsp;', '&nbsp;', '9', '9', …

## Axis control widgets


Let us create new axis objects using the same expressions as before, but give them more general names (x_axis and y_axis), because we want to change the expressions interactively.

In [16]:
x_axis = vjm.Axis(df=df, expression=df.Lz)
y_axis = vjm.Axis(df=df, expression=df.E)

da_xy_view = df.widget.data_array(axes=[x_axis, y_axis], display_function=plot2d_with_labels, shape=180)
da_xy_view

DataArray(children=[Container(children=[ProgressCircularNoAnimation(color='#9ECBF5', size=30, text='', value=1…

Again, we can change the expressions of the axes programmatically:

In [17]:
# wait for the previous plot to finish
await vaex.jupyter.gather()
# Change both the x and y axis
x_axis.expression = np.log(df.x**2)
y_axis.expression = df.y
# Note that both assignments together trigger a single computation in the background (minimal number of passes over the data)
await vaex.jupyter.gather()
# vaex computed the new min/max, and the xarray DataArray
# x_axis.min, x_axis.max, da_xy_view.model.grid
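
The idea that several quick changes result in a single background computation can be sketched with plain asyncio. This is only a toy model of debouncing under our own assumptions; `ToyDebouncedModel` and its methods are hypothetical and are not how `vaex.jupyter` is implemented.

```python
import asyncio


class ToyDebouncedModel:
    """Hypothetical toy: coalesce rapid changes into one pretend computation."""

    def __init__(self, delay=0.05):
        self.delay = delay
        self.state = {}
        self.passes = 0  # how many "passes over the data" we performed
        self._task = None

    def set(self, name, value):
        self.state[name] = value
        # A newer change supersedes the computation that was still waiting
        if self._task is not None and not self._task.done():
            self._task.cancel()
        self._task = asyncio.ensure_future(self._compute_later())

    async def _compute_later(self):
        await asyncio.sleep(self.delay)  # wait for further changes
        self.passes += 1                 # pretend to compute the grid once

    async def gather(self):
        # Wait until the pending computation (if any) has finished
        if self._task is not None:
            try:
                await self._task
            except asyncio.CancelledError:
                pass


async def demo():
    model = ToyDebouncedModel()
    model.set("x", "log(x**2)")
    model.set("y", "y")
    await model.gather()
    return model.passes


# In a notebook, use `await demo()` instead of asyncio.run(demo())
passes = asyncio.run(demo())
print(passes)
```

Both assignments are made before the delay expires, so only the last scheduled computation actually runs.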



But, if we want to create a dashboard with Voila, we need to have a widget that controls them:

In [18]:
x_widget = df.widget.expression(x_axis.expression, label='X axis')
x_widget

Expression(label='X axis', placeholder='Enter a custom expression', prepend_icon='functions', success_messages…

This widget will allow us to edit an expression, which will be validated by Vaex. How do we 'link' the value of the widget to the axis expression? Because both the Axis and the `x_widget` are [HasTraits objects](https://traitlets.readthedocs.io/en/stable/using_traitlets.html), we can link their traits together:

In [19]:
from ipywidgets import link
link((x_widget, 'value'), (x_axis, 'expression'))

<traitlets.traitlets.link at 0x11e8c8250>

Since this operation is so common, we can also directly pass the Axis object, and Vaex will set up the linking for us:

In [20]:
y_widget = df.widget.expression(y_axis, label='Y axis')
# vaex now does this for us, much shorter
# link((y_widget, 'value'), (y_axis, 'expression'))
y_widget

Expression(label='Y axis', placeholder='Enter a custom expression', prepend_icon='functions', success_messages…

In [21]:
### COMMENT: why is this here?
await vaex.jupyter.gather()

## A nice container

If you are familiar with the [ipyvuetify](https://github.com/mariobuikhuizen/ipyvuetify/) components, you can combine them to create very pretty widgets. Vaex-jupyter comes with a nice container:

In [22]:
from vaex.jupyter.widgets import ContainerCard

ContainerCard(title='My plot',
              subtitle="using vaex-jupyter",
              main=da_xy_view,
              controls=[x_widget, y_widget], show_controls=True)

ContainerCard(controls=[Expression(label='X axis', placeholder='Enter a custom expression', prepend_icon='func…

We can directly assign a Vaex expression to the `x_axis.expression`, or to `x_widget.value` since they are linked.

In [23]:
y_axis.expression = df.vx

## Interactive plots

So far we have been using interactive widgets to control the axes of the view. The figure itself, however, was not interactive: we could not pan or zoom, for example.

Vaex has a few built-in visualizations, most notably a heatmap and a histogram using [bqplot](https://github.com/bqplot/bqplot/):

In [24]:
heatmap_xy = df.widget.heatmap(df.x, df.y)
heatmap_xy

Heatmap(children=[ToolsToolbar(interact_value=None, supports_normalize=False, template='<template>\n  <v-toolb…

The heatmap above is itself a widget. Thus we can combine it with other widgets to create a more sophisticated interface.

In [25]:
x_widget = df.widget.expression(heatmap_xy.model.x, label='X axis')
y_widget = df.widget.expression(heatmap_xy.model.y, label='Y axis')

ContainerCard(title='My plot',
              subtitle="using vaex-jupyter and bqplot",
              main=heatmap_xy,
              controls=[x_widget, y_widget, selection_widget],
              show_controls=True,
              card_props={'style': 'min-width: 800px;'})

ContainerCard(card_props={'style': 'min-width: 800px;'}, controls=[Expression(label='X axis', placeholder='Ent…

_COMMENT: The sentence is not finished_

By switching the tool in the toolbar (click <i aria-hidden="true" class="v-icon notranslate material-icons theme--light">pan_tool</i>, or change it programmatically in the next cell), we can zoom in. The plot's axis bounds are directly synced to the 



In [26]:
heatmap_xy.tool = 'pan-zoom'

In [27]:
### COMMENT: This needs to be described I guess, perhaps as part of the missing text above.
await vaex.jupyter.gather()
heatmap_xy.model.x.expression = np.log10(df.x**2)

And here is an example of a histogram using multiple selections:

In [28]:
### COMMENT: why is this needed? If it is needed we need to explain it.
await vaex.jupyter.gather()

In [29]:
histogram_Lz = df.widget.histogram(df.Lz)
histogram_Lz

Histogram(children=[ToolsToolbar(interact_value=None, supports_transforms=False, template='<template>\n  <v-to…

## Creating your own visualizations

If you want to create your own visualization on this framework, go to the [Examples](examples.html) page, or go directly to: [add missing link]


### ipyvolume example

[![](screenshot/example_jupyter_ipyvolume.png)](example_jupyter_ipyvolume.html)

### plotly example

[![](screenshot/example_jupyter_plotly.png)](example_jupyter_plotly.html)