# Python Plotting Showcase 3 - Stats

In this notebook we demonstrate some simple statistical plots provided by hvplot.pandas.

Currently based on the PyViz user guide on statistical plots: https://hvplot.pyviz.org/user_guide/Statistical_Plots.html

**TODO** Use an IFU dataset and expand

In [1]:
%matplotlib inline
%load_ext autoreload
%autoreload 2

In [2]:
import pandas as pd
import hvplot.pandas
import bokeh
#bokeh.sampledata.download()
from bokeh.sampledata import iris, stocks 

iris = iris.flowers

In [3]:
hvplot.scatter_matrix(iris, c="species")

In [4]:
hvplot.parallel_coordinates(iris, "species")

A similar approach is to visualize the dimensions using Andrews curves,
which use the features of each observation as the coefficients of a Fourier
series, so that differences between classes show up as differences between bundles of curves.

The hvplot.andrews_curves() function provides a simple API to generate Andrews
curves from a dataframe, closely matching the API of pandas.plotting.andrews_curves().

Once again the setosa species stands out clearly from the other two. However, unlike the parallel coordinates plot, the Andrews plot gives no direct quantitative insight into which features drive those differences.

In [5]:
hvplot.andrews_curves(iris, "species")
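
The Fourier-series construction described above can be sketched in a few lines of NumPy. This is a minimal illustration of the classic Andrews definition, f(t) = x1/sqrt(2) + x2 sin(t) + x3 cos(t) + x4 sin(2t) + ..., and not hvplot's own code; that hvplot uses exactly this scaling is an assumption. The helper name `andrews_curve` and the sample values are hypothetical.

```python
import numpy as np

def andrews_curve(row, t):
    """Evaluate the Andrews curve of one observation at the angles t."""
    row = np.asarray(row, dtype=float)
    t = np.asarray(t, dtype=float)
    # First feature contributes the constant term x1 / sqrt(2).
    result = np.full_like(t, row[0] / np.sqrt(2.0))
    # Remaining features alternate between sin and cos terms,
    # moving to the next harmonic after every sin/cos pair.
    for i, coeff in enumerate(row[1:]):
        harmonic = i // 2 + 1
        if i % 2 == 0:
            result += coeff * np.sin(harmonic * t)
        else:
            result += coeff * np.cos(harmonic * t)
    return result

t = np.linspace(-np.pi, np.pi, 200)
# Hypothetical four-feature observation (illustrative values only).
sample = [5.1, 3.5, 1.4, 0.2]
curve = andrews_curve(sample, t)
```

Each observation becomes one curve over the interval from -pi to pi. Observations with similar feature vectors produce similar curves, which is why classes appear as bundles, and also why individual features are hard to read back from the plot.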

Lastly, for the analysis of time series, hvplot offers a so-called lag plot, which plots each value against the value a fixed number of observations later. It is implemented by the hvplot.lag_plot()
function and modelled on the matching pandas function.

In [7]:
index = pd.DatetimeIndex(stocks.AAPL['date'])
stock_df = pd.DataFrame({'IBM': stocks.IBM['close'], 'AAPL': stocks.AAPL['close']}, index=index)

hvplot.lag_plot(stock_df, lag=365, alpha=0.3) + stock_df.hvplot.line(width=400)
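
To make explicit what a lag plot shows, the sketch below builds the (y(t), y(t + lag)) pairs by hand with NumPy. It assumes the pandas convention that the lag counts observations (rows), not calendar days; the helper `lag_pairs` and the toy series are hypothetical and are not part of hvplot.

```python
import numpy as np

def lag_pairs(values, lag=1):
    """Return (y(t), y(t + lag)) arrays for a 1-D series."""
    values = np.asarray(values, dtype=float)
    if not 0 < lag < len(values):
        raise ValueError("lag must be between 1 and len(values) - 1")
    # Drop the last `lag` points for y(t) and the first `lag` for y(t + lag).
    return values[:-lag], values[lag:]

# Made-up series, for illustration only.
series = np.array([1.0, 2.0, 4.0, 7.0, 11.0])
y_t, y_lagged = lag_pairs(series, lag=2)

# Correlation between the two columns: a simple summary of the scatter.
r = np.corrcoef(y_t, y_lagged)[0, 1]
```

Plotting `y_t` against `y_lagged` as a scatter gives the lag plot. A tight diagonal cloud indicates strong autocorrelation at that lag, while a structureless cloud suggests the series is close to random at that lag.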

## Dive into holoviews? MOVE TO EXTRA NOTEBOOK !!!

http://holoviews.org/getting_started/Gridded_Datasets.html

**TODO**:
- check if we have time for this
- maybe one example to illustrate how hvplot wraps holoviews and if you want something custom you need to use holoviews directly?

In [8]:
import numpy as np
import holoviews as hv
from holoviews import opts
hv.extension('bokeh', 'matplotlib')

In [10]:
import numpy as np
# Load the two-photon dataset; the 'Calcium' array has axes (y, x, Time)
data = np.load('data/twophoton.npz')
calcium_array = data['Calcium']
calcium_array.shape

(62, 111, 50)

In [20]:
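# Coordinates are listed in key-dimension order (Time, x, y);
# the value array itself is indexed in the reverse order (y, x, Time).
ds = hv.Dataset((np.arange(50), np.arange(111), np.arange(62), calcium_array),
                ['Time', 'x', 'y'], 'Fluorescence')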
ds

:Dataset   [Time,x,y]   (Fluorescence)

In [12]:
type(ds.data), list(ds.data.keys())

(dict, ['Time', 'x', 'y', 'Fluorescence'])

In [21]:
# Convert the dict-backed Dataset to an xarray-backed one
ds = ds.clone(datatype=['xarray'])

In [23]:
ds.data

<xarray.Dataset>
Dimensions:       (Time: 50, x: 111, y: 62)
Coordinates:
  * Time          (Time) int64 0 1 2 3 4 5 6 7 8 ... 41 42 43 44 45 46 47 48 49
  * x             (x) int64 0 1 2 3 4 5 6 7 ... 103 104 105 106 107 108 109 110
  * y             (y) int64 0 1 2 3 4 5 6 7 8 9 ... 53 54 55 56 57 58 59 60 61
Data variables:
    Fluorescence  (y, x, Time) uint16 386 441 196 318 525 ... 801 899 583 774

In [24]:
opts.defaults(
    opts.GridSpace(shared_xaxis=True, shared_yaxis=True),
    opts.Image(cmap='viridis', width=400, height=400),
    opts.Labels(text_color='white', text_font_size='8pt', text_align='left', text_baseline='bottom'),
    opts.Path(color='white'),
    opts.Spread(width=600),
    opts.Overlay(show_legend=False))

In [25]:
ds.to(hv.Image, ['x', 'y']).hist()

In [26]:
ROIs = data['ROIs']
roi_bounds = hv.Path([hv.Bounds(tuple(roi)) for roi in ROIs])
print(ROIs.shape)

(147, 4)


In [27]:
labels = hv.Labels([(roi[0], roi[1], i) for i, roi in enumerate(ROIs)])
(ds[21].to(hv.Image, ['x', 'y']) * roi_bounds * labels).relabel('Time: 21')

In [28]:
# Select the spatial extent of ROI 60 across all time frames
x0, y0, x1, y1 = ROIs[60]
roi = ds.select(x=(x0, x1), y=(y0, y1)).relabel('ROI #60')
roi.to(hv.Image, ['x', 'y'])

In [29]:
roi.to(hv.Curve, 'Time').grid()

In [30]:
agg = roi.aggregate('Time', np.mean, spreadfn=np.std)
hv.Spread(agg) * hv.Curve(agg)
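
As a rough picture of what the aggregation above computes, the NumPy sketch below collapses the two spatial axes of an ROI cut-out and keeps one mean and one standard deviation per time frame. The array is synthetic, and the (y, x, Time) axis order is assumed from the xarray output shown earlier; it stands in for the real ROI data and does not reproduce its values.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for an ROI cut-out with axes (y, x, Time).
roi_array = rng.integers(0, 1000, size=(4, 5, 50)).astype(float)

# Collapse both spatial axes, leaving one value per time frame.
mean_trace = roi_array.mean(axis=(0, 1))
std_trace = roi_array.std(axis=(0, 1))

# Band of one standard deviation around the mean curve.
lower = mean_trace - std_trace
upper = mean_trace + std_trace
```

In the cell above, the `hv.Curve` element corresponds to `mean_trace` and the `hv.Spread` element to the band between `lower` and `upper`.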