# Interactive Visualization with Bokeh, HoloViews, and Datashader

<br>Owner: **Keith Bechtol** ([@bechtol](https://github.com/LSSTScienceCollaborations/StackClub/issues/new?body=@bechtol))
<br>Last Verified to Run: **2018-08-10**
<br>Verified Stack Release: **v16.0, w201831**

This notebook demonstrates a few of the interactive features of the Bokeh, HoloViews, and Datashader plotting packages in the notebook environment. These packages are part of the [PyViz](http://pyviz.org/) set of Python tools intended for visualization use cases in a web browser, and can be used to create quite sophisticated dashboard-like interactive displays and widgets. The goal of this notebook is to provide an introduction and starting point from which to create more advanced, custom interactive visualizations. As a source of inspiration, check out this beautiful [example notebook](https://github.com/timothydmorton/qa_explorer) using HSC data created with the [qa_explorer](https://github.com/timothydmorton/qa_explorer) tools.

### Learning Objectives
After working through and studying this notebook, you should be able to:
   1. Use `bokeh` to create interactive figures with brushing and linking between multiple plots
   2. Use `holoviews` and `datashader` to create two-dimensional histograms with dynamic binning to efficiently explore large datasets   

Other techniques that are demonstrated, but not emphasized, in this notebook are:
   1. Use `parquet` to efficiently access large amounts of data

### Logistics
This notebook is intended to be runnable on `lsst-lspdev.ncsa.illinois.edu` from a local git clone of https://github.com/LSSTScienceCollaborations/StackClub.

## Setup
You can find the Stack version by using `eups list -s` on the terminal command line.

In [None]:
# What version of the Stack am I using?
! echo $HOSTNAME
! eups list -s | grep lsst_distrib

In [None]:
import numpy as np
import astropy.io.fits as pyfits

import bokeh
from bokeh.io import output_file, output_notebook, show
from bokeh.layouts import gridplot
from bokeh.models import ColumnDataSource, Range1d, HoverTool, Selection
from bokeh.plotting import figure

import holoviews as hv
from holoviews import streams
from holoviews.operation.datashader import datashade, dynspread, rasterize
hv.extension('bokeh')

In [None]:
output_notebook()

## Prelude: Data Sample

The data in the following example come from the Dark Energy Survey Data Release 1 (DES DR1). The input data for this example were obtained with the M2 database query in Appendix C of the [DES DR1 paper](https://arxiv.org/abs/1801.03181) from the [DES Data Release page](https://des.ncsa.illinois.edu/releases/dr1/dr1-access).

In [None]:
infile = '/project/kbechtol/des/dr1/dr1_m2_dered_test.fits'
reader = pyfits.open(infile)
data = reader[1].data
reader.close()

data = data[data['MAG_AUTO_G_DERED'] < 26.]
print(len(data))

## Part 1: Brushing and linking between scatter plots with Bokeh

This example is based on the [linked brushing example](http://bokeh.pydata.org/en/latest/docs/user_guide/interaction/linking.html#linked-brushing) in the Bokeh user guide.

In [None]:
ra_target, dec_target = 323.36, -0.82

mag = data['MAG_AUTO_G_DERED']
color = data['MAG_AUTO_G_DERED'] - data['MAG_AUTO_R_DERED']

# create a column data source for the plots to share
source = ColumnDataSource(data=dict(x0=data['RA'] - ra_target,
                                    y0=data['DEC'] - dec_target,
                                    x1=color,
                                    y1=mag,
                                    ra=data['RA'],
                                    dec=data['DEC'],
                                    coadd_object_id=data['COADD_OBJECT_ID']))

In [None]:
# A little bit of trickery: each panel gets its own HoverTool instance (with identical tooltips)
# so that both panels have a custom hover tool
hover_left = HoverTool(tooltips=[("(RA,DEC)", "(@ra, @dec)"),
                                 ("(g-r,g)", "(@x1, @y1)"),
                                 ("coadd_object_id", "@coadd_object_id")])
hover_right = HoverTool(tooltips=[("(RA,DEC)", "(@ra, @dec)"),
                                  ("(g-r,g)", "(@x1, @y1)"),
                                  ("coadd_object_id", "@coadd_object_id")])
TOOLS = "box_zoom,box_select,lasso_select,reset,help"
TOOLS_LEFT = [hover_left, TOOLS]
TOOLS_RIGHT = [hover_right, TOOLS]

In [None]:
width = 300

# create a new plot and add a renderer
left = figure(tools=TOOLS_LEFT, plot_width=width, plot_height=width, output_backend="webgl",
              title='Spatial: Centered on (RA, Dec) = (%.2f, %.2f)'%(ra_target, dec_target))
left.circle('x0', 'y0', hover_color='firebrick', source=source,
            selection_fill_color='steelblue', selection_line_color='steelblue',
            nonselection_fill_color='silver', nonselection_line_color='silver')
left.x_range = Range1d(0.3, -0.3)
left.y_range = Range1d(-0.3, 0.3)
left.xaxis.axis_label = 'Delta RA'
left.yaxis.axis_label = 'Delta DEC'

# create another new plot and add a renderer
right = figure(tools=TOOLS_RIGHT, plot_width=width, plot_height=width, output_backend="webgl",
               title='CMD')
right.circle('x1', 'y1', hover_color='firebrick', source=source,
             selection_fill_color='steelblue', selection_line_color='steelblue',
             nonselection_fill_color='silver', nonselection_line_color='silver')
right.x_range = Range1d(-0.5, 2.5)
right.y_range = Range1d(26., 16.)
right.xaxis.axis_label = 'g - r'
right.yaxis.axis_label = 'g'

p = gridplot([[left, right]])

#output_file("bokeh_m2_example.html", title="M2 Example")

#selection = streams.Selection1D(source=left)
#selection = source.selected
#print(selection.indices)
#selection = Selection(source=source)

# Declare points as source of selection stream
selection = streams.Selection1D(source=source)

show(p)

Use the hover tool to see information about individual datapoints (e.g., the `coadd_object_id`). Notice that the data points highlighted in one panel with the hover tool are also highlighted in the other panel. Next, use the box select and lasso select tools to make various selections in either panel. The selected data points will also be highlighted in the other panel.

> Note that the default active tool is "Box Zoom", not "Box Select"! The "Reset" button restores the original view, which is useful after zooming.

**Open Issue:** Bonus points for anyone who can suggest how to access the indices of the selected points!
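Conceptually, linked brushing works because both panels draw from the same `ColumnDataSource`: a selection is recorded as a list of row indices, and the same row indices pick out the same objects in the spatial panel and in the CMD. The numpy-only sketch below mimics that idea with a plain dictionary standing in for the shared source. It does not read the selection back from the browser (so it does not resolve the open issue); the `box_select` helper and the toy values are hypothetical illustrations, not Bokeh API.

```python
import numpy as np

# Toy stand-in for the shared ColumnDataSource: every column has one entry per object
toy_source = {
    "x0": np.array([-0.2, -0.05, 0.0, 0.1, 0.25]),   # Delta RA
    "y0": np.array([0.1, 0.02, -0.01, 0.05, -0.2]),  # Delta Dec
    "x1": np.array([0.3, 0.6, 0.65, 1.2, 0.9]),      # g - r
    "y1": np.array([20.1, 21.5, 21.7, 23.0, 24.2]),  # g
}

def box_select(x, y, left, bottom, right, top):
    """Hypothetical helper: row indices of points strictly inside a rectangle."""
    inside = (x > left) & (x < right) & (y > bottom) & (y < top)
    return np.flatnonzero(inside)

# "Brush" the spatial panel: select objects near the center of the field
indices = box_select(toy_source["x0"], toy_source["y0"], -0.1, -0.1, 0.15, 0.1)

# The same row indices identify those objects in the CMD panel
cmd_color = toy_source["x1"][indices]
cmd_mag = toy_source["y1"][indices]
print(indices, cmd_color, cmd_mag)
```

Because the selection is just a set of row indices, any number of additional panels built on the same source would highlight the same objects.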

In [None]:
selection.index
#selection.indices
#type(selection)

In [None]:
s = Selection()
s.indices

### Experimental Section: Creating the same plot with HoloViews

In [None]:
%%output size=150
%%opts Points [tools=['box_select']]
points = hv.Points((data['RA'] - ra_target, data['DEC'] - dec_target))
#points = hv.Points(np.random.multivariate_normal((0, 0), [[1, 0.1], [0.1, 1]], (1000,)))

# Declare points selection:
sel = streams.Selection1D(source=points)

#boundsxy = (0, 0, 0, 0)
#box = streams.BoundsXY(source=points, bounds=boundsxy)

#dynspread(datashade(points, cmap=bokeh.palettes.Viridis256))
#datashade(points, cmap=bokeh.palettes.Viridis256)
points + points

In [None]:
len(sel.index)

In [None]:
# help(hv.Points)

In [None]:
# Here I had a re-sized hv Points with custom colors for the selection
# hv.Points()

In [None]:
# help(sel)

In [None]:
# print(sel.index)

In [None]:
import pandas as pd
df = pd.DataFrame.from_records(data)
df['x'] = data.RA - ra_target
df['y'] = data.DEC - dec_target

ds = hv.Dataset(df)

In [None]:
points_xy = hv.Points((ds.data['x'], ds.data['y']))
points_xy

In [None]:
%%output size=150
#%%opts Points [tools=['box_select']]
points = hv.Points((data['RA'] - ra_target, data['DEC'] - dec_target))
#points = hv.Points(np.random.multivariate_normal((0, 0), [[1, 0.1], [0.1, 1]], (1000,)))

# Declare a selection stream on the points
sel = streams.Selection1D(source=points)

boundsxy = (0, 0, 0, 0)
box = streams.BoundsXY(source=points, bounds=boundsxy)
bounds = hv.DynamicMap(lambda bounds: hv.Bounds(bounds), streams=[box]) 

#dynspread(datashade(points, cmap=bokeh.palettes.Viridis256))
datashade(points, cmap=bokeh.palettes.Viridis256) * bounds

In [None]:
box

In [None]:
selection = (points.data.x > box.bounds[0]) \
    & (points.data.y > box.bounds[1]) \
    & (points.data.x < box.bounds[2]) \
    & (points.data.y < box.bounds[3])
np.sum(selection)

In [None]:
selected_points = points[selection.values]

In [None]:
len(selected_points)

In [None]:
selected_points

> For more help on selecting points in HoloViews, see the [user guide](http://build.holoviews.org/User_Guide/Indexing_and_Selecting_Data.html).

## Intermission: Rapid Data Access with Parquet

For the next example, we want to use a much larger dataset. Let's use Gaia Data Release 2 (Gaia DR2). 

In [None]:
import glob
import pandas as pd
import pyarrow.parquet as pq

In [None]:
infiles = sorted(glob.glob('/project/shared/data/gaia_dr2_1am/*.parquet'))
print(len(infiles))

In [None]:
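%%time
# Only read the columns we need; other available columns include 'phot_bp_mean_mag' and 'phot_rp_mean_mag'
columns = ['ra', 'dec', 'phot_g_mean_mag']
df_array = []
for parquet_file in infiles[:10]:
    print(parquet_file)
    df_array.append(pq.read_table(parquet_file, columns=columns).to_pandas())
# ignore_index=True gives the combined dataframe a unique 0..N-1 index instead of repeated per-file indices
df = pd.concat(df_array, ignore_index=True)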

In [None]:
print('Dataframe contains %.2f M rows'%(len(df) / 1.e6))
print(df.columns.values)

## Part 2: Visualizing Larger Datasets with Datashader

In the examples below, notice that as one zooms in on the datashaded two-dimensional histograms, the bin sizes are dynamically adjusted to show finer or coarser granularity in the distribution. This allows one to interactively explore large datasets without having to manually adjust the bin sizes while panning and zooming.
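One way to picture the dynamic binning is that the data are re-aggregated onto a fixed grid of screen pixels every time the visible range changes, so the effective bin width is the visible range divided by the number of pixels. The numpy sketch below illustrates that idea with `np.histogram2d`; it is a simplified stand-in for what Datashader does, and the `aggregate` function and the 400 x 400 pixel canvas are illustrative assumptions.

```python
import numpy as np

def aggregate(x, y, x_range, y_range, width=400, height=400):
    """Count points on a width x height grid spanning the visible ranges.

    Simplified stand-in for a Datashader aggregation: the grid always has the
    same number of pixels, so zooming in (a smaller range) means smaller bins.
    """
    counts, x_edges, y_edges = np.histogram2d(
        x, y, bins=(width, height), range=(x_range, y_range))
    return counts, x_edges[1] - x_edges[0], y_edges[1] - y_edges[0]

rng = np.random.default_rng(42)
x_demo = rng.normal(0.0, 1.0, 100_000)
y_demo = rng.normal(0.0, 1.0, 100_000)

# Full view versus a 10x zoom around the origin
full, dx_full, dy_full = aggregate(x_demo, y_demo, (-5, 5), (-5, 5))
zoom, dx_zoom, dy_zoom = aggregate(x_demo, y_demo, (-0.5, 0.5), (-0.5, 0.5))
print(dx_full, dx_zoom)   # the bin width shrinks by the zoom factor
```

The grid size stays constant, so the cost of each re-aggregation depends on the number of points and pixels rather than on how far one has zoomed in.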

In [None]:
%%output size=150
#%%opts Points [tools=['box_select']]
points = hv.Points((df.ra, df.dec))
#points = hv.Points(np.random.multivariate_normal((0, 0), [[1, 0.1], [0.1, 1]], (1000,)))

# Optionally declare a selection stream on the points (disabled here)
#sel = streams.Selection1D(source=points)

boundsxy = (0, 0, 0, 0)
box = streams.BoundsXY(source=points, bounds=boundsxy)
bounds = hv.DynamicMap(lambda bounds: hv.Bounds(bounds), streams=[box]) 

#dynspread(datashade(points, cmap=bokeh.palettes.Viridis256))
datashade(points, cmap=bokeh.palettes.Viridis256) * bounds

Here is an example in which we use the current bounds of the `BoundsXY` stream to select datapoints from the plot above. First, use the box select tool to draw a selection box on the two-dimensional histogram above. Then run the cell below to count the number of datapoints within the selection region.

In [None]:
selection = (points.data.x > box.bounds[0]) \
    & (points.data.y > box.bounds[1]) \
    & (points.data.x < box.bounds[2]) \
    & (points.data.y < box.bounds[3])
print('The selection box contains %i datapoints'%(np.sum(selection)))

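Now we make a linked plot: whenever the selection box on the sky map changes, the histogram of `phot_g_mean_mag` for the sources inside the box is recomputed.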

In [None]:
# First, create a holoviews dataset instance
kdims = [('ra', 'RA(deg)'), ('dec', 'Dec(deg)')]
vdims = [('phot_g_mean_mag', 'G(mag)')]
ds = hv.Dataset(df, kdims, vdims)
ds

In [None]:
points = hv.Points(ds)

#boundsxy = (0, 0, 0, 0)
boundsxy = (np.min(ds.data['ra']), np.min(ds.data['dec']), np.max(ds.data['ra']), np.max(ds.data['dec']))
box = streams.BoundsXY(source=points, bounds=boundsxy)
box_plot = hv.DynamicMap(lambda bounds: hv.Bounds(bounds), streams=[box])

In [None]:
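def update_histogram(bounds):
    """Return a histogram of G-band magnitudes for sources inside `bounds`.

    `bounds` is the (left, bottom, right, top) tuple supplied by the BoundsXY stream.
    """
    selection = (ds.data['ra'] > bounds[0]) & \
                (ds.data['dec'] > bounds[1]) & \
                (ds.data['ra'] < bounds[2]) & \
                (ds.data['dec'] < bounds[3])

    selected_mag = ds.data.loc[selection, 'phot_g_mean_mag']

    frequencies, edges = np.histogram(selected_mag)

    # Natural log of the counts; empty bins become -inf
    hist = hv.Histogram((np.log(frequencies), edges))
    return hist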

In [None]:
%%output size=150
dmap = hv.DynamicMap(update_histogram, streams=[box])
datashade(points, cmap=bokeh.palettes.Viridis256) * box_plot + dmap
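The `update_histogram` callback above recomputes the magnitude histogram whenever the box changes. A standalone numpy version of that logic is sketched below so it can be checked without a live plot. It assumes bounds ordered `(left, bottom, right, top)`, as used by `BoundsXY`, and, as a design choice not in the original, it stores `NaN` instead of the `-inf` that `np.log(0)` produces for empty bins. The `selected_histogram` name and the toy arrays are hypothetical.

```python
import numpy as np

def selected_histogram(ra, dec, mag, bounds, bins=10):
    """Histogram of `mag` for sources strictly inside `bounds`.

    `bounds` is assumed to be (left, bottom, right, top). Returns
    (edges, log_counts), where empty bins hold NaN rather than -inf,
    which also avoids numpy's divide-by-zero warning from np.log(0).
    """
    left, bottom, right, top = bounds
    inside = (ra > left) & (ra < right) & (dec > bottom) & (dec < top)
    counts, edges = np.histogram(mag[inside], bins=bins)
    log_counts = np.full(counts.shape, np.nan)
    nonzero = counts > 0
    log_counts[nonzero] = np.log(counts[nonzero])
    return edges, log_counts

toy_ra = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
toy_dec = np.zeros(5)
toy_mag = np.array([18.0, 18.5, 19.0, 22.0, 25.0])

edges, log_counts = selected_histogram(toy_ra, toy_dec, toy_mag, (1.5, -1.0, 4.5, 1.0), bins=2)
print(edges, log_counts)
```

In the notebook, the returned arrays would still need to be wrapped in an `hv.Histogram`, using whichever argument order the installed HoloViews version expects.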

## Part 3: Images

In [None]:
%%opts Image [height=600 width=650]
#%%output size=200
# Adjust the plot size with %%opts above; use a log-scaled viridis colormap below

# Build a 4000 x 4000 test image: Poisson noise plus a gradient that increases with x
zz = np.random.poisson(100, size=(4000, 4000))
xx, yy = np.meshgrid(np.arange(4000), np.arange(4000))
zz += xx

img_bounds = (0, 0, 4000, 4000)   # Coordinate system: (left, bottom, right, top)
img = hv.Image(zz, bounds=img_bounds).options(colorbar=True, cmap=bokeh.palettes.Viridis256, logz=True)

boundsxy = (0, 0, 0, 0)
box = streams.BoundsXY(source=img, bounds=boundsxy)
bounds = hv.DynamicMap(lambda bounds: hv.Bounds(bounds), streams=[box])

rasterize(img) * bounds


In [None]:
box

In [None]:
%%opts Points (color='black' marker='x' size=20)
# Snap a position to the nearest pixel center, then look up the image value there
position = (1000, 1000)
closest = img.closest(position)
print('The value at position %s is %s' % (closest, img[closest]))
rasterize(img) * hv.Points([closest])
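The `bounds` passed to `hv.Image` map array pixels onto coordinates, and `img.closest` snaps a position to the nearest pixel center. The numpy sketch below reproduces that mapping for a tiny image under two stated assumptions: bounds are ordered `(left, bottom, right, top)`, and, following the usual image convention, row 0 of the array is displayed at the top. The `pixel_centers` and `closest_pixel` helpers are hypothetical, not HoloViews API.

```python
import numpy as np

def pixel_centers(bounds, shape):
    """Coordinates of pixel centers for an array of `shape` spanning `bounds`.

    Assumes bounds = (left, bottom, right, top) and that row 0 is the top row.
    """
    left, bottom, right, top = bounds
    nrows, ncols = shape
    dx = (right - left) / ncols
    dy = (top - bottom) / nrows
    xs = left + (np.arange(ncols) + 0.5) * dx
    ys = top - (np.arange(nrows) + 0.5) * dy   # y decreases as the row index grows
    return xs, ys

def closest_pixel(bounds, shape, x, y):
    """Return the nearest pixel center (x, y) and its (row, col) array index."""
    xs, ys = pixel_centers(bounds, shape)
    col = int(np.argmin(np.abs(xs - x)))
    row = int(np.argmin(np.abs(ys - y)))
    return (float(xs[col]), float(ys[row])), (row, col)

toy = np.arange(16).reshape(4, 4)
center, (row, col) = closest_pixel((0, 0, 4, 4), toy.shape, 0.1, 0.1)
print('The value at position %s is %s' % (center, toy[row, col]))
```

Under these assumptions, a position near the bottom-left corner maps to the last row and first column of the array.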

In [None]:
img.sample(x=0) + img.reduce(x=np.mean)
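`img.sample(x=0)` extracts a profile along y at the x position closest to 0, and `img.reduce(x=np.mean)` collapses the x dimension by averaging, leaving the mean as a function of y. Assuming row 0 of the array is the top of the image and the leftmost column is closest to x=0 (true for the bounds used above), these correspond roughly to the plain numpy operations sketched below on a toy array. The resulting HoloViews Curves are indexed by y coordinate rather than by row, so their ordering may differ from the row order shown here.

```python
import numpy as np

# 4 rows (y) by 3 columns (x); row 0 assumed to be the top of the image
toy_img = np.array([[1, 2, 3],
                    [4, 5, 6],
                    [7, 8, 9],
                    [10, 11, 12]])

# Analogue of img.sample(x=0): values in the leftmost column, one per y
profile_at_x0 = toy_img[:, 0]

# Analogue of img.reduce(x=np.mean): average over x for each row, one value per y
mean_over_x = toy_img.mean(axis=1)

print(profile_at_x0, mean_over_x)
```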

In [None]:
#hv.help(hv.Image)

In [None]:
%output?