# Python data science ecosystem <img width="260px" align="right" src="../resources/csiro_easi_logo.png">
 
#### Index
- [Numpy, Pandas, Scipy](#Numpy,-Pandas,-Scipy)
- [Dask](#Dask)
- [Xarray](#Xarray)
- [HoloViews, GeoViews and DataShader](#HoloViews,-GeoViews-and-DataShader)
    - [HoloViews](#HoloViews)
    - [GeoViews](#GeoViews)
    - [DataShader](#DataShader)
    - [Using HoloViews and Bokeh](#Using-HoloViews-and-Bokeh)

### Numpy, Pandas, Scipy

Core Python data analysis libraries

See:
- https://numpy.org/doc/stable/
- https://pandas.pydata.org/docs/user_guide/
- https://www.scipy.org/

### Dask

Dask scales Python libraries such as NumPy and Pandas by running work in parallel, from multiple cores on a single machine up to a cluster.

See:
- https://docs.dask.org/
- https://docs.dask.org/en/latest/array-best-practices.html (Chunk size considerations)
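As a rough illustration of what chunking means in practice, the sketch below builds a Dask array split into blocks and runs a reduction over it. The array shape and chunk size are arbitrary values chosen for the example, not recommendations.

```python
import dask.array as da

# A 4000 x 4000 array of ones, split into 1000 x 1000 chunks (illustrative sizes)
x = da.ones((4000, 4000), chunks=(1000, 1000))

# Number of chunks along each axis
num_blocks = x.numblocks
print(num_blocks)  # (4, 4)

# Building an expression only records a task graph; no values are computed yet
total = (x + 1).sum()

# compute() executes the graph, processing chunks in parallel where possible
result = total.compute()
print(result)
```

Each chunk is processed as an independent task, so chunk size trades scheduling overhead (many small chunks) against memory use per worker (few large chunks).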

### Xarray

`Datacube.load()` returns an Xarray object. Datacube objects contain additional context for use with datacube operations, but most, if not all, Xarray operations will work with them.

http://xarray.pydata.org/
- Xarray can wrap Dask arrays as well as NumPy arrays, so you can switch between the two through a consistent API

http://xarray.pydata.org/en/stable/dask.html#what-is-a-dask-array

-  Dask uses a lazy computation model that divides large arrays into smaller blocks called chunks. Performing an operation on a dask array queues up a series of lazy computations that map across each chunk. These computations aren’t actually performed until values from a chunk are accessed. (http://stephanhoyer.com/2015/06/11/xray-dask-out-of-core-labeled-arrays/)

http://xarray.pydata.org/en/stable/dask.html#using-dask-with-xarray

Convert an xarray data structure from lazy Dask arrays into eager, in-memory NumPy arrays
- `ds.load()`

Load into distributed memory and keep as Dask arrays distributed across a cluster
- `ds.persist()`

Access the data as a NumPy array
- `ds.variable.values`: always returns a NumPy array
- `np.asarray(ds.variable)`: explicit conversion by wrapping a DataArray with `np.asarray`
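The conversions listed above can be sketched on a small, made-up dataset. Here `red` is a hypothetical variable name standing in for `variable`, and the dataset is built by hand rather than returned by `Datacube.load()`. Without a distributed cluster, `persist()` simply uses the default local scheduler.

```python
import numpy as np
import xarray as xr

# Hypothetical 4 x 4 dataset standing in for a datacube result
ds = xr.Dataset(
    {"red": (("y", "x"), np.arange(16.0).reshape(4, 4))},
    coords={"y": np.arange(4), "x": np.arange(4)},
).chunk({"y": 2, "x": 2})  # chunking makes the variable Dask-backed (needs dask installed)

# A Dask-backed variable reports its chunks; a NumPy-backed one reports None
is_lazy = ds.red.chunks is not None

# Operations on Dask-backed data are lazy until compute() is called
mean_value = float(ds.red.mean().compute())

# Both of these return NumPy arrays
values = ds.red.values
as_array = np.asarray(ds.red)

# persist() computes the chunks but keeps them as Dask arrays
persisted = ds.persist()

# load() converts to in-memory NumPy arrays, modifying ds in place
loaded = ds.load()
```

Note that `load()` changes the dataset it is called on, so any check of laziness needs to happen before that call.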

Optimisation tips
- http://xarray.pydata.org/en/stable/dask.html#optimization-tips

### HoloViews, GeoViews and DataShader

HoloViews, GeoViews and DataShader are all part of the HoloViz suite of tools - https://holoviz.org/

<img width="600px" src="../resources/holoviz-tools.jpg">

These tools provide a range of visualisation options for working with scientific data and several are particularly useful for working with the datacube.

See the notebook [05 - Visualising Data](05%20-%20Visualising%20Data%20-%20DRAFT.ipynb) for examples of working with these libraries.

#### HoloViews

https://holoviews.org/getting_started/Introduction.html
- HoloViews supports numpy, xarray, and dask arrays when working with array data

HoloViews integrates several common data analysis and visualisation libraries so that it is easier to do both. The data objects themselves contain all of the information required for visualisation, so that information does not have to be provided separately to a plotting library. HoloViews integrates directly with plotting libraries such as [Bokeh](https://bokeh.org/), [Plotly](https://plotly.com/python/) and [Matplotlib](https://matplotlib.org/). See https://holoviews.org/getting_started/Introduction.html for an introduction to HoloViews.

HoloViews examples (https://holoviews.org/gallery/):

<img width="400px" src="../resources/holoviews-examples.jpg">

To learn more about using HoloViews to visualise gridded datasets, see: 
https://holoviews.org/getting_started/Gridded_Datasets.html

> **Bokeh** is a powerful data visualisation tool which can be used via HoloViews. It enables the creation of interactive data visualisations. Bokeh contains a wide variety of plot types for both tabular and gridded data.
> 
> See https://bokeh.org/ and https://docs.bokeh.org/en/latest/docs/gallery.html

#### GeoViews
GeoViews is a geographic visualisation tool. It can plot geospatial, gridded and multidimensional data as geographic plots. As with HoloViews, GeoViews leverages plotting libraries like [Bokeh](https://bokeh.org/) and [Matplotlib](https://matplotlib.org/).

<img width="600px" src="../resources/geoviews-examples.jpg">

#### DataShader
DataShader is designed to handle very large datasets. It "rasterizes" or "aggregates" datasets into regular grids that can then be analysed or visualised. DataShader can also work in conjunction with [Bokeh](https://bokeh.org/) and [GeoViews](http://geoviews.org/) for interactive plots and maps.

See https://datashader.org/ for more information and examples.

<img width="400px" src="../resources/datashader-examples.jpg">

#### Using HoloViews and Bokeh

```python
# Basic imports for HoloViews

import holoviews as hv
from holoviews import opts
hv.extension('bokeh')
```