# Visualising and plotting data <img align="right" src="../resources/csiro_easi_logo.png">

_Work in progress_ 

#### Index
- [Overview](#Overview)
- [HoloViz](#HoloViz)
- [Setup](#Setup)
   - [Imports](#Imports)
   - [Dask](#Dask)
   - [Example query](#Example-query)
- [Display a map with area of interest](#Display-a-map-with-area-of-interest)
- [HoloViews example](#HoloViews-example)
- [More examples](#More-examples)
- [Help for plot functions](#Help-for-plot-functions)
- [Options for hvPlot and Holoviews](#Options-for-hvPlot-and-Holoviews)
- [Options for Bokeh](#Options-for-Bokeh)
- [GeoViews](#GeoViews)
- [Appendix](#Appendix)

<div class="alert alert-warning">
    <p>This notebook is under development</p>
</div>

## Overview

Broadly, there are two plotting "backends" available to Python notebooks. A backend is the library that renders the plots.
- Matplotlib: static plots
- Bokeh: dynamic plots

Static vs dynamic plots relate to the data zoom level and extent seen in the plot. Static plots require the plot code to be re-run to change the zoom or extent, while dynamic plots allow the user to zoom and scroll within the available data range.

Matplotlib is widely used both directly (e.g., `import matplotlib.pyplot as plt`) and indirectly (via another plotting library). Bokeh is usually used indirectly (via another plotting library).

> The EASI Training notebooks and many other Python notebooks use Matplotlib.
> This notebook introduces dynamic data plotting with the HoloViz set of libraries.

Plotting libraries follow this pattern:
1. Create a figure to annotate and attach axes, labels, colorbars etc.
1. Map a data set to the figure viewport pixel space and render a colour for each pixel.
1. Send the figure to the screen or browser.

Large data arrays can challenge this pattern because step 2 requires the data to be in memory on the JupyterLab node.
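
A rough back-of-envelope calculation shows the scale of the problem. The array shape, data type and viewport size below are illustrative assumptions, not values taken from this notebook's query.

```python
# Illustrative sizes only: a hypothetical single-band array and a typical plot viewport
rows, cols = 20_000, 20_000      # pixels in the data array (assumed)
bytes_per_value = 4              # float32
viewport = (600, 600)            # plot width and height in screen pixels (assumed)

array_bytes = rows * cols * bytes_per_value
viewport_pixels = viewport[0] * viewport[1]
values_per_screen_pixel = (rows * cols) / viewport_pixels

print(f"Array size in memory: {array_bytes / 1e9:.1f} GB")
print(f"Viewport pixels: {viewport_pixels:,}")
print(f"Data values per screen pixel: {values_per_screen_pixel:,.0f}")
```

Only a few hundred thousand values can be shown at once, so reducing the data before it reaches the notebook is where most of the saving lies.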

## HoloViz

[HoloViz.org](https://holoviz.org) is a set of Python libraries that address this challenge. The website has information and tutorials for the HoloViz family of libraries.

The HoloViews and Datashader visualisation libraries have been designed to work with the xarray, pandas and dask data libraries.
 - http://holoviews.org/user_guide/Large_Data.html - How HoloViews works with Datashader and Bokeh

The HoloViz libraries provide a visualisation workflow that efficiently scales with data size.
- __[HoloViz.org](https://holoviz.org)__ - Information and tutorials for a set of browser-based Python visualisation tools
  - [PyHEP2019_slides.pdf](https://indico.cern.ch/event/833895/contributions/3577846/attachments/1928191/3205023/PyHEP2019_slides.pdf) - Summary presentation from an Anaconda developer
- __Panel__ - Create apps and dashboards from any supported plotting library
- __[hvPlot](https://hvplot.pyviz.org)__ - High-level plotting and layouts interface for holoviews objects, renders with Bokeh
- __[HoloViews](https://holoviews.org)__ - High-level library adds semantic 'intuition' to create and label plots and layouts. Uses datashader, and renders with either Bokeh or Matplotlib
- __[GeoViews](http://geoviews.org)__ - Extends HoloViews for geographic data using Cartopy
- __[Datashader](https://datashader.org/getting_started/Introduction.html)__ - Manages large datasets with dask, and fast 2D histogramming/binning prior to the data being rendered
- __Param__ - Create declarative user-configurable objects
- __Colorcet__ - Provide perceptually uniform colormaps
- __[Bokeh](https://docs.bokeh.org/en/latest/docs/user_guide.html)__ - Interactive plotting library (JavaScript)
- __Matplotlib__ - The standard Python plotting library, built on NumPy

> _Panel_ and _Param_ are not used in this tutorial (yet), and _Bokeh_ is preferred over _Matplotlib_.

## We recommend:
- Use Datashader and Bokeh for all plots to provide scalability and interactivity.
- Any combination of hvPlot, HoloViews and GeoViews depending on the task.
- Use ```rasterize()``` (from ```holoviews.operation.datashader```) as a convenient wrapper for Datashader.

## Setup

#### Imports

In [None]:
# Data tools
import numpy as np
import xarray as xr
import pandas as pd
from datetime import datetime

# Datacube
import datacube
from datacube.utils import masking  # https://github.com/opendatacube/datacube-core/blob/develop/datacube/utils/masking.py
from odc.algo import enum_to_bool   # https://github.com/opendatacube/odc-tools/blob/develop/libs/algo/odc/algo/_masking.py
from datacube.utils.rio import configure_s3_access

# Holoviews, Datashader and Bokeh
import hvplot.pandas
import hvplot.xarray
import holoviews as hv
import panel as pn
import colorcet as cc
import cartopy.crs as ccrs
from datashader import reductions
from holoviews import opts
# import geoviews as gv
from holoviews.operation.datashader import rasterize
hv.extension('bokeh', logo=False)

# Python
import sys, os, re
os.environ['USE_PYGEOS'] = '0'

# Optional EASI tools
sys.path.append(os.path.expanduser('../scripts'))
from easi_tools import EasiDefaults
import notebook_utils
easi = EasiDefaults()
from app_utils import display_map

#### Dask

In [None]:
cluster, client = notebook_utils.initialize_dask(use_gateway=False)
display(cluster if cluster else client)
print(notebook_utils.localcluster_dashboard(client, server=easi.hub))

#### Example query

In [None]:
# This configuration is read from the defaults for this system. 
# Examples are provided in a commented line to show how to set these manually.

study_area_lat = easi.latitude
# study_area_lat = (39.2, 39.3)

study_area_lon = easi.longitude
# study_area_lon = (-76.7, -76.6)

product = easi.product('landsat')
# product = 'landsat8_c2l2_sr'

set_time = easi.time
# set_time = ('2020-08-01', '2020-12-01')

set_crs = easi.crs('landsat')
# set_crs = 'EPSG:32618'

set_resolution = easi.resolution('landsat')
# set_resolution = (-30, 30)

In [None]:
query = {
    'product': product,
    'longitude': study_area_lon,
    'latitude': study_area_lat,
    'time': set_time,
    'output_crs': set_crs,
    'resolution': set_resolution,
    'group_by': 'solar_day',
    'dask_chunks': {'x': 2048, 'y': 2048}
}

dc = datacube.Datacube()
data = dc.load(**query)
data

In [None]:
# Masking

# Measurements for the selected product
measurements = dc.list_measurements().loc[query['product']]

# Separate lists of measurement data names and flag names
data_names = measurements[ pd.isnull(measurements.flags_definition)].index
flag_names = measurements[pd.notnull(measurements.flags_definition)].index

# Select one for use below
flag_name = flag_names[0]

# Simple masking by nodata value
# valid_data = masking.mask_invalid_data(data[data_names], keep_attrs=True)

good_pixel_flags = {
    'nodata': False,
    'cloud': 'not_high_confidence',
    'cloud_shadow': 'not_high_confidence',
}

mask = masking.make_mask(data[flag_name], **good_pixel_flags)  # expand dictionary of pixel flags

good_data = data[data_names].where(mask)
# good_data.plot(col='time', robust=True, col_wrap=4)

display(good_data)
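
The effect of `.where(mask)` can be seen on a tiny example. The band values and mask below are made up for illustration and are not Landsat data.

```python
import numpy as np
import xarray as xr

# Tiny stand-ins for one measurement band and its boolean good-pixel mask (assumed values)
band = xr.DataArray(np.array([[100, 200], [300, 400]], dtype="uint16"), dims=("y", "x"))
mask = xr.DataArray(np.array([[True, False], [True, True]]), dims=("y", "x"))

good = band.where(mask)   # pixels where mask is False become NaN

print(good.values)
print(band.dtype, "->", good.dtype)
```

The integer band is promoted to a floating-point type so that NaN can be stored, which increases memory use for large arrays.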

## Display a map with area of interest

The [DEA notebooks](https://github.com/GeoscienceAustralia/dea-notebooks) include a convenient tool for displaying the area of interest on a map. The background is Google Maps satellite imagery. The red square shows the bounding box of the area of interest.

Other [dea-notebooks plotting tools](https://github.com/GeoscienceAustralia/dea-notebooks/blob/develop/Scripts/dea_plotting.py) include:
- ```rgb()``` - uses ```xarray.plot.imshow```
- ```display_map()``` - uses ```folium.Map``` (leaflet.js) and Google Maps imagery
- ```map_shapefile()``` - uses ```ipyleaflet.Map``` and [Geopandas](https://geopandas.org)
- ```xr_animation()``` - uses ```matplotlib.animation``` to animate an xarray dataset by time and save the result to mp4

These functions should be sufficiently generic to use directly. Note however that ```rgb()``` and ```xr_animation()``` could potentially send a large amount of data to the user node for rendering, which could be partly mitigated by using a dask-enabled xarray. We can expect to define comparable HoloViews versions of these functions as required.

In [None]:
aoi_map = display_map(x=query['longitude'], y=query['latitude'])
# End the cell with `aoi_map` instead to display the map at its default size

# Optionally change the display size
# from https://github.com/python-visualization/folium/issues/229#issuecomment-162010967
import folium
width = '80%'  # Scale or number of pixels. Height as well but then scaling is relatively slow
fig = folium.Figure(width=width)
fig.add_child(aoi_map)
fig

## HoloViews example

_Work in progress_: Adapt examples from the Products notebooks.

Xarray dataset. Use this as a reference example.
1. Create an xarray dataset in your own notebook. You can subset and preprocess your data any way you like to get to a `selected_data` dataset. The xarray data can be a [Dataset or a DataArray](http://xarray.pydata.org/en/stable/data-structures.html).
1. Use `persist()` because you are likely to be re-running the visualisation part (editing layout etc).
1. Wrap a HoloViews `hv.Dataset` around your data.
1. `rasterize()` is a [HoloViews wrapper for Datashader](http://holoviews.org/user_guide/Large_Data.html).
1. Pass to `rasterize()` a HoloViews object to be applied to your data. Specify the:
   - HoloViews object type `hv.Image()`
   - `x` and `y` dimension names (data dimension names to map to Image `x` and `y` dimensions)
   - Layer name (DataArray name), even if only one.
   - `z` dimension name (e.g., time), if applicable.
1. Set plotting options, including special options passed to the HoloViews element, datashader and bokeh. Options can also be added separately.
1. `precompute=True` instructs Datashader to pre-compute what it can.
1. `hist()` optionally adds a dynamic histogram linked to the image.
1. If the `z` dimension is time, `rasterize()` will add a Bokeh time slider to the layout.

The data limits and units are passed through from the xarray dataset. Colors are mapped dynamically by Bokeh for each image in the viewport to match the viewport data range.

Use the toolbar to scroll, zoom and hover etc.
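
The `clim` option used in the cells below fixes the colour range instead of letting it follow the viewport. The sketch below shows the arithmetic in plain NumPy, assuming a linear colour mapping that clamps out-of-range values to the end colours. It illustrates the idea and is not Bokeh's implementation; the sample values are made up.

```python
import numpy as np

clim = (5000, 20000)  # same limits as the options dict used later in this notebook
values = np.array([0, 5000, 12500, 20000, 30000], dtype=float)  # made-up samples

low, high = clim
# Scale to 0..1 across the colour limits, then clamp anything outside them
position = np.clip((values - low) / (high - low), 0.0, 1.0)
print(position)
```

A position of 0 corresponds to the first colour in the colormap and 1 to the last, so values below 5000 or above 20000 are indistinguishable from the limits themselves.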

In [None]:
### THIS SECTION NEEDS FURTHER WORK ###

In [None]:
good_data.coastal.isel(time=0).plot(robust=True)

In [None]:
good_data

In [None]:
# Just select 5 time slices to make things quicker
time_ind = np.linspace(1, good_data.sizes['time'], 5, dtype='int') - 1   # select some time slices to display
selected_data = good_data.isel(time=time_ind)
selected_data = selected_data.dropna('time',how="all") # drop any scenes that have no data
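
The index calculation above can be checked in isolation. `spaced_indices` is a hypothetical helper written for this illustration. It adds one step that the cell above does not have: `np.unique`, which removes the repeated indices that appear when there are fewer time steps than requested slices.

```python
import numpy as np

def spaced_indices(n_times, n_select=5):
    """Zero-based indices spread evenly from the first to the last time step."""
    idx = np.linspace(1, n_times, n_select, dtype="int") - 1
    # np.unique removes the repeats that appear when n_times < n_select
    return np.unique(idx)

print(spaced_indices(23))   # 23 time steps, 5 slices
print(spaced_indices(3))    # only 3 time steps available
```

Without the `np.unique` step, a dataset with three time steps would yield the indices 0, 0, 1, 1, 2 and the same scene would be selected more than once.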

In [None]:
layer_name = 'nir08'
plot_data = selected_data[layer_name]
options = {
    'title': f'{query["product"]}: {layer_name}',
    'width': 600,
    'height': 600,
    'aspect': 'equal',
    'cmap': cc.rainbow,
    'clim': (5000, 20000),                          # Limit the color range depending on the layer_name
    'colorbar': True,
    'tools': ['hover'],
}

# Set the Dataset CRS
plot_crs = set_crs

# Native data and coastline overlay:
# - Comment `crs`, `projection`, `coastline` to plot in native_crs coords
# TODO: Update the axis labels to 'longitude', 'latitude' if `coastline` is used
    
layer_plot = plot_data.hvplot.image(
    x = 'x', y = 'y',                        # Dataset x,y dimension names
    rasterize = True,                        # Use Datashader
    # aggregator = reductions.mean(),          # Datashader selects mean value
    precompute = True,                       # Datashader precomputes what it can
    # crs = plot_crs,                          # Dataset crs
    # projection = ccrs.PlateCarree(),         # Output projection (use ccrs.PlateCarree() when coastline=True)
    # coastline='50m',                         # Coastline = '10m'/'50m'/'110m'
).options(opts.Image(**options)).hist(bin_range = options['clim'])

# display(layer_plot)
# Optional: Change the default time slider to a dropdown list, https://stackoverflow.com/a/54912917
fig = pn.panel(layer_plot, widgets={'time': pn.widgets.Select})  # widget_location='top_left'
display(fig)

In [None]:
# Compute the data fully...
# Use a slider instead of a dropdown
# Convert data to a hv.Dataset

plot_data = selected_data.compute()
hv_ds = hv.Dataset(plot_data)

rasterize(
    hv_ds.to(
        hv.Image,     # hv element
        ["x", "y"],   # x, y dimension names (key dimensions)
        layer_name,   # array name (value dimension)
        ["time"]      # z dimension name (groupby dimension, shown as a widget)
    ).opts(
        **options
    ),
    precompute=True
).hist()

## More examples

_Work in progress_

Consider an adapted version of https://examples.pyviz.org/landsat/landsat.html

## Help for plot functions

Find help for HoloViews functions (uncomment as desired)

In [None]:
# Show the primary options for an Element, using HoloViews hv.help()
# hv.help(hv.Image)
hv.help(rasterize)

# Show the full options and methods for an Element, using Python help()
# help(hv.Image)
# help(rasterize)

## Options for hvPlot and Holoviews

Add options to functions. There are two ways to pass options to elements, which can be used interchangeably:
1. Pass `option=value` pairs to the `.opts()` method. Pairs can be stored in a dict and passed with the `**dict` notation.
1. Pass `opts._Element_()` objects to the `.opts()` method. This gives finer-grained control when applying options to elements.

In [None]:
# These examples are not meant to run

# Add options to an element
hv.Image().opts(option='value', ...)

# Specify options for elements with a compound object
hv.Image().opts(
    opts.Image(option='value', ...),
    ...
)

## Options for Bokeh

Dive into Bokeh's options for fine-grained control

The Python interface is imported by HoloViews (edit for element of interest)
- https://github.com/bokeh/bokeh/blob/main/bokeh/models/widgets/tables.py

Which in turn calls (typed) javascript (edit for element of interest)
- https://github.com/bokeh/bokeh/blob/main/bokehjs/src/lib/models/widgets/tables/data_table.ts

Selected configuration modules
- https://github.com/bokeh/bokeh/blob/main/examples/integration/widgets/data_table_customization.py - Column formatters
- https://github.com/bokeh/bokeh/blob/main/bokeh/core/enums.py - Options validator

## GeoViews

_Work in progress_

If reprojecting data onto a GeoViews grid, consider ```gv.operation.project``` to pre-project data before GeoViews/Datashader

## Appendix

#### hvPlot

- Create HoloViews and GeoViews Layouts.

#### HoloViews

- Data is wrapped into HoloViews _Elements_ that can be combined into a _Layout_. Elements are passed to Datashader or Bokeh/Matplotlib for plotting
- ```rasterize()``` (from ```holoviews.operation.datashader```) sends the data object to Datashader and instructs Bokeh to render each image.
- ```datashade()``` (same module) sends the data object to Datashader and instructs Datashader to render each image (== ```shade(rasterize())```).
- Finer-grained plotting options can (mostly) be passed through to Datashader and Bokeh.

#### Datashader

- How Datashader works - https://datashader.org/getting_started/Pipeline.html
- How Datashader works with HoloViews and Bokeh - https://datashader.org/getting_started/Interactivity.html
- Techniques for making plots that better represent the data - https://datashader.org/user_guide/Plotting_Pitfalls.html
- Data are binned to viewport pixels.
- Fine-grained control of the binning calculation is available, as well as the option to render the bins into an image.
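
The binning idea can be sketched in plain NumPy. `bin_to_viewport` is a hypothetical helper written for this illustration and is not part of Datashader, whose aggregation also handles arbitrary ranges and many more reduction types. The sketch assumes the data dimensions are exact multiples of the output dimensions.

```python
import numpy as np

def bin_to_viewport(data, out_shape, reducer=np.mean):
    """Reduce a 2D array to out_shape by applying reducer over equal-sized blocks.

    Assumes each data dimension is an exact multiple of the output dimension.
    """
    out_rows, out_cols = out_shape
    block_rows = data.shape[0] // out_rows
    block_cols = data.shape[1] // out_cols
    # Split each axis into (output index, position within block)
    blocks = data.reshape(out_rows, block_rows, out_cols, block_cols)
    return reducer(blocks, axis=(1, 3))

data = np.arange(16, dtype=float).reshape(4, 4)
print(bin_to_viewport(data, (2, 2)))                   # block means
print(bin_to_viewport(data, (2, 2), reducer=np.max))   # block maxima
```

Swapping the reducer changes what each screen pixel represents, which is the role played by the `aggregator` argument (for example `reductions.mean()`) in the hvPlot call earlier in this notebook.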

#### Dask

- Use `persist()` on data that will be used multiple times in your following analyses, including running a plot multiple times
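
A minimal sketch of the behaviour, using a small synthetic dask array in place of datacube data (an assumption for illustration). Without a distributed client this runs on the local threaded scheduler.

```python
import numpy as np
import dask.array as da

# A lazy computation: nothing is calculated yet
lazy = da.ones((1000, 1000), chunks=(250, 250)).mean(axis=0)

# persist() runs the task graph now and keeps the chunked results in memory
# (on the cluster workers when a distributed client is active)
persisted = lazy.persist()

# Later steps start from the stored chunks instead of repeating the work
total = float(persisted.sum().compute())
print(type(persisted).__name__, total)
```

The persisted object is still a dask array, so it can be passed to the same plotting code as the lazy version.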

#### Bokeh

- https://docs.bokeh.org/en/latest/docs/user_guide.html
- Explore the user guide for customisations. Follow examples to see how options can be passed through from HoloViews or Datashader.