# Visualization

Visualizing large amounts of data presents challenges for both speed and fidelity. Several of the [PyViz](https://pyviz.org/index.html) libraries are specifically designed for high performance and high data volume, and have additional plugins to support interactivity via Bokeh.

### Datashader

__Datashader__ is an open-source library for analyzing and visualizing large datasets. 
* designed to "rasterize" or "aggregate" datasets into regular grids that can be viewed as images
* can plot a billion points in a second or so on a laptop with 16 GB of RAM
* scales up easily to out-of-core, distributed, or GPU processing for even larger datasets

The computation-intensive steps in this process are written in Python but transparently compiled to machine code using Numba. They can be flexibly distributed across CPU cores and processors using Dask, or run on GPUs using CUDA. The result is a highly optimized rendering pipeline that makes it practical to work with extremely large datasets even on standard hardware, while exploiting distributed and GPU systems when available.
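To make the "rasterize" or "aggregate" idea concrete, here is a minimal pure-Python sketch of binning points into a regular grid of counts. It is a conceptual illustration only, not Datashader's implementation: the function name `rasterize_counts` and the toy data are hypothetical, and the real library runs this kind of loop as Numba-compiled code.

```python
def rasterize_counts(xs, ys, x_range, y_range, width, height):
    """Count how many (x, y) points fall into each cell of a width x height grid."""
    (x_min, x_max), (y_min, y_max) = x_range, y_range
    grid = [[0] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Skip points that fall outside the requested data range.
        if not (x_min <= x <= x_max and y_min <= y <= y_max):
            continue
        # Map data coordinates to integer cell indices; clamp the upper edge
        # so that a point exactly on x_max or y_max lands in the last cell.
        col = min(int((x - x_min) / (x_max - x_min) * width), width - 1)
        row = min(int((y - y_min) / (y_max - y_min) * height), height - 1)
        grid[row][col] += 1
    return grid


# Hypothetical toy data: five points in the unit square.
xs = [0.1, 0.15, 0.9, 0.5, 0.95]
ys = [0.1, 0.2, 0.9, 0.5, 0.85]

grid = rasterize_counts(xs, ys, (0.0, 1.0), (0.0, 1.0), width=2, height=2)
print(grid)  # [[2, 0], [0, 3]]
```

The output grid has a fixed size no matter how many points go in, which is why the result can be displayed as an image even when the input has far more points than the screen has pixels.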

In [None]:
import datashader as ds
import pandas as pd
from colorcet import fire
from datashader import transfer_functions as tf

df = pd.read_csv('data/nyc_taxi.csv', usecols=['dropoff_x', 'dropoff_y'])
df.head()

Here, we're visualizing only a subset of the NYC taxi data, so this is not in itself a large-data demo. We can swap in the full dataset if we have the local memory and storage to do so.

In [None]:
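# Aggregate dropoff locations into per-pixel counts on a default-size canvas
agg = ds.Canvas().points(df, 'dropoff_x', 'dropoff_y')
# Map counts to colors with the `fire` colormap and render on a black background
tf.set_background(tf.shade(agg, cmap=fire), "black")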

We can create an interactive rendering that recomputes the aggregated view as we change the viewport, but this requires a JupyterLab plugin compatible with JupyterLab 2.x. For now, we'll look at the static version in Lab.

In [None]:
import holoviews as hv
from holoviews.element.tiles import EsriImagery
from holoviews.operation.datashader import datashade
hv.extension('bokeh')

map_tiles  = EsriImagery().opts(alpha=0.5, width=900, height=480, bgcolor='black')
points     = hv.Points(df, ['dropoff_x', 'dropoff_y'])
taxi_trips = datashade(points, x_sampling=1, y_sampling=1, cmap=fire, width=900, height=480)

map_tiles * taxi_trips

Now reload the notebook in non-Lab mode: in a new tab, remove `lab` from the URL, add `tree`, and select this notebook.

Now pan and zoom, and note how the image is recomputed for each new viewport.
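Why the recalculation matters can be sketched in plain Python. The pixel grid stays the same size, but after a zoom it covers a smaller data range, so points that previously shared one pixel separate into several. This is an illustration under stated assumptions, not the HoloViews or Datashader code path: `aggregate_viewport` and the toy points are hypothetical.

```python
from collections import Counter


def aggregate_viewport(points, x_range, y_range, width=4, height=4):
    """Bin the points inside the viewport onto a fixed width x height pixel grid."""
    (x0, x1), (y0, y1) = x_range, y_range
    counts = Counter()
    for x, y in points:
        # Only points inside the current viewport contribute to the image.
        if x0 <= x < x1 and y0 <= y < y1:
            col = int((x - x0) / (x1 - x0) * width)
            row = int((y - y0) / (y1 - y0) * height)
            counts[(row, col)] += 1
    return counts


# Hypothetical toy data: a tight cluster plus one distant point.
points = [(1.0, 1.0), (1.2, 1.1), (1.4, 1.3), (1.6, 1.7), (9.0, 9.0)]

# Full view: the whole cluster collapses into a single pixel.
full = aggregate_viewport(points, x_range=(0, 10), y_range=(0, 10))

# Zoomed view: same pixel budget over a smaller range, so the cluster spreads out.
zoomed = aggregate_viewport(points, x_range=(1, 2), y_range=(1, 2))

print("occupied pixels, full view:", len(full))
print("occupied pixels, zoomed view:", len(zoomed))
```

A static image can only be magnified, which enlarges the existing pixels. Re-aggregating for the new viewport recovers detail that the wider view had merged.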

### hvPlot

Also noteworthy is [__hvPlot__](https://hvplot.holoviz.org/index.html), which uses a plug-in/hook style integration to supplement the plotting capabilities of other SciPy projects, including Dask.

Combined with Dask and Datashader (see the examples in the [hvPlot plotting guide](https://hvplot.holoviz.org/user_guide/Plotting.html) to get started), we can assemble a high-performance data workbench with a user experience nearly identical to that of the basic, single-node Python tools.