# StatArray Class
### Extends the numpy ndarray class with extra attributes such as names and units, and allows statistical descriptors to be attached to the array.  Extending numpy directly maintains the speed and functionality of numpy arrays.

In [None]:
%matplotlib notebook
%load_ext autoreload
%autoreload 2

In [None]:
from geobipy import StatArray
import numpy as np
import matplotlib.pyplot as plt
import h5py
from geobipy import hdfRead

## Instantiating a new StatArray class
A StatArray can be created from the output of any numpy function that returns an array.  A name and units can be assigned to the StatArray when it is created.

In [None]:
# Use a raw string so that "\f" in "\frac" is not read as a form feed escape
Velocity=StatArray(np.random.randn(3),name="Velocity",units=r"$\frac{m}{s}$")
Velocity.summary()
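
The idea of carrying a name and units on an array while keeping numpy behaviour can be sketched with a small ndarray subclass. This is a minimal illustration, not geobipy's implementation: the class `LabelledArray` and its attributes are hypothetical names chosen for this example.

```python
import numpy as np


class LabelledArray(np.ndarray):
    """Hypothetical ndarray subclass carrying a name and units (not geobipy code)."""

    def __new__(cls, values, name=None, units=None):
        # View the input data as an instance of this subclass
        obj = np.asarray(values).view(cls)
        obj.name = name
        obj.units = units
        return obj

    def __array_finalize__(self, obj):
        # Called for views and slices as well, so copy the labels across
        if obj is None:
            return
        self.name = getattr(obj, "name", None)
        self.units = getattr(obj, "units", None)


speed = LabelledArray(np.arange(3.0), name="Velocity", units="m/s")
first_two = speed[:2]   # slicing keeps the labels
doubled = speed * 2.0   # numpy arithmetic still works on the subclass
```

Because the subclass is still an ndarray, numpy functions operate on it directly, which is the property the StatArray relies on.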

## Attaching Prior and Proposal Distributions to a StatArray
The StatArray class has been built so that we may easily attach not only names and units, but statistical distributions too.  We won't go into too much detail about the different distribution classes here, so check out [This Notebook](Distributions.ipynb) for a better description.

Two types of distributions can be attached to the StatArray.
* Prior Distribution
    The prior represents how the user believes the variable should behave from a statistical standpoint.  The values of the variable can be evaluated against the attached prior to determine how likely they are to have occurred ([Wiki Page](https://en.wikipedia.org/wiki/Prior_probability)).
* Proposal Distribution
    The proposal describes a probability distribution from which to sample when we wish to perturb the variable [Wiki Page](https://en.wikipedia.org/wiki/Metropolis%E2%80%93Hastings_algorithm).

### Attach a univariate normal distribution as the prior

In [None]:
# Create an instance of a random number generator
prng = np.random.RandomState()
mean = 0.0
variance = 1.0
Velocity.setPrior('Normal',mean, variance, prng=prng)
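
When reproducible draws are needed, `np.random.RandomState` accepts a seed. The sketch below uses only numpy and shows that two generators created with the same seed produce the same samples. The seed value 42 is arbitrary.

```python
import numpy as np

# Two generators created with the same seed yield identical draws
prng_a = np.random.RandomState(42)
prng_b = np.random.RandomState(42)

draws_a = prng_a.randn(3)
draws_b = prng_b.randn(3)

same_draws = np.array_equal(draws_a, draws_b)
print("Identical draws from identical seeds:", same_draws)
```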

#### We can also attach a proposal distribution

In [None]:
Velocity.setProposal('Normal',mean,variance, prng=prng)
Velocity.summary()
print("Class type of the prior: ",type(Velocity.prior))
print("Class type of the proposal: ",type(Velocity.proposal))

#### The values in the variable can be evaluated against the prior
In this case, we have 3 elements in the variable, and a univariate Normal for the prior. Therefore each element is evaluated to get 3 probabilities, one for each element.

In [None]:
Velocity.probability()
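
The element-wise evaluation described above can be sketched with numpy alone. This is an illustration of the univariate normal density, not geobipy's code. The helper `normal_pdf` is a hypothetical name, and whether geobipy returns densities or their logarithms is not assumed here.

```python
import numpy as np


def normal_pdf(x, mean, variance):
    """Density of a univariate normal, evaluated independently for each element."""
    x = np.asarray(x, dtype=float)
    return np.exp(-0.5 * (x - mean) ** 2 / variance) / np.sqrt(2.0 * np.pi * variance)


values = np.array([-1.0, 0.0, 1.0])
densities = normal_pdf(values, mean=0.0, variance=1.0)
print(densities)  # one density per element
```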

The univariate proposal distribution can generate random samples from itself.

In [None]:
Velocity.proposal.rng()

We can perturb the variable by drawing from the attached proposal distribution.

In [None]:
Velocity.perturb()
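
The proposal is linked above to the Metropolis-Hastings algorithm. How a proposal draw is typically used in one step of that algorithm can be sketched as follows. This is a generic random-walk Metropolis example under stated assumptions: the target is a standard normal standing in for the prior, and `metropolis_step` and `log_normal_pdf` are hypothetical helpers, not geobipy functions.

```python
import numpy as np


def log_normal_pdf(x, mean, variance):
    """Log density of a univariate normal."""
    return -0.5 * np.log(2.0 * np.pi * variance) - 0.5 * (x - mean) ** 2 / variance


def metropolis_step(current, prng, step_variance=1.0):
    """Propose a perturbed value and accept or reject it."""
    # Symmetric random-walk proposal centred on the current value
    candidate = current + prng.normal(0.0, np.sqrt(step_variance))
    # Assumed target: a standard normal
    log_ratio = log_normal_pdf(candidate, 0.0, 1.0) - log_normal_pdf(current, 0.0, 1.0)
    if np.log(prng.uniform()) < log_ratio:
        return candidate, True
    return current, False


prng = np.random.RandomState(0)
state = 0.0
chain = []
n_accepted = 0
for _ in range(2000):
    state, accepted = metropolis_step(state, prng)
    chain.append(state)
    n_accepted += accepted

chain = np.array(chain)
acceptance_rate = n_accepted / chain.size
print("Acceptance rate:", acceptance_rate)
```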

### Attach multivariate normal distributions as the prior and proposal
Attach the multivariate prior

In [None]:
mean = np.random.randn(Velocity.size)
variance = np.ones(Velocity.size)
Velocity.setPrior('MvNormal', mean, variance, prng=prng)

Since the prior is multivariate, the elements of the StatArray are evaluated jointly against the multivariate normal distribution rather than one at a time.

In [None]:
Velocity.probability()
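
For reference, a multivariate normal density can be written in a few lines of numpy. The sketch below is independent of geobipy, and `mv_normal_pdf` is a hypothetical helper. It also checks that, with a diagonal covariance, the joint density equals the product of the univariate densities.

```python
import numpy as np


def mv_normal_pdf(x, mean, covariance):
    """Joint density of a multivariate normal for the whole vector x."""
    x = np.asarray(x, dtype=float)
    mean = np.asarray(mean, dtype=float)
    k = x.size
    residual = x - mean
    # Solve a linear system instead of explicitly inverting the covariance
    mahalanobis = residual @ np.linalg.solve(covariance, residual)
    _, log_det = np.linalg.slogdet(covariance)
    log_pdf = -0.5 * (k * np.log(2.0 * np.pi) + log_det + mahalanobis)
    return np.exp(log_pdf)


x = np.array([0.5, -1.0, 2.0])
mean = np.zeros(3)
variance = np.ones(3)

# A single value for the whole vector
joint_density = mv_normal_pdf(x, mean, np.diag(variance))

# Product of independent univariate densities, for comparison
univariate = np.exp(-0.5 * (x - mean) ** 2 / variance) / np.sqrt(2.0 * np.pi * variance)
product_of_univariate = univariate.prod()
print(joint_density, product_of_univariate)
```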

Attach the multivariate proposal

In [None]:
mean = np.random.randn(Velocity.size)
variance = np.ones(Velocity.size)
Velocity.setProposal('MvNormal', mean, variance, prng=prng)

Perturb the variables using the multivariate proposal.

In [None]:
Velocity.perturb()

### Plotting
#### We can easily plot the StatArray with its built-in plotting functions.  All plotting functions can take the matplotlib keywords.

The simplest is to just plot the array

In [None]:
# Use a raw string so that "\f" in "\frac" is not read as a form feed escape
Velocity=StatArray(np.random.randn(1000),name="Velocity",units=r"$\frac{m}{s}$")
Time = StatArray(np.linspace(0, 100, Velocity.size), name='Time', units='s')
Distance = StatArray(np.random.exponential(size=Velocity.size), name='Distance', units='m')

In [None]:
plt.figure()
Velocity.plot(linewidth=0.5, marker='x', markersize=1.0)

We can quickly plot a bar graph.

In [None]:
plt.figure()
Velocity.bar()

#### We can scatter the contents of the StatArray if it is 1D

In [None]:
plt.figure()
Velocity.scatter(alpha=0.7)

#### Histogram Equalization
A neat trick with colourmaps is histogram equalization.  This approach remaps the values so that every colour in the colourmap is used by roughly the same number of points.  This distorts the colour bar, but can really highlight the lower and higher ends of whatever you are plotting. Just add the equalize keyword!

In [None]:
plt.figure()
Velocity.scatter(alpha=0.7, equalize=True)
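
One common way to equalize a histogram is to replace each value by its rank, which spreads the values uniformly over the colour range. The sketch below shows that idea with numpy only. It is an assumption about the general technique, not a description of how geobipy implements `equalize`, and the helper name `equalize` is hypothetical.

```python
import numpy as np


def equalize(values):
    """Map values onto (0, 1) through their ranks, preserving their order."""
    values = np.asarray(values, dtype=float)
    # argsort of argsort gives the rank of each value (0 = smallest)
    ranks = np.argsort(np.argsort(values))
    return (ranks + 0.5) / values.size


prng = np.random.RandomState(1)
skewed = prng.exponential(size=1000)
equalized = equalize(skewed)

# Every one of the 10 colour bins now holds the same number of points
counts, _ = np.histogram(equalized, bins=10, range=(0.0, 1.0))
print(counts)
```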

#### Take the log base x of the data
We can also take the logarithm of the data: natural log, log10, log2, or any custom base!

In [None]:
plt.figure()
Velocity.scatter(alpha=0.7,edgecolor='k',log='e') # could also use log=10, log=2, or log=x, where x is the base you require
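
A logarithm in an arbitrary base follows from the change-of-base identity, log_b(x) = ln(x) / ln(b). The sketch below is a minimal numpy illustration. The helper `log_base` is hypothetical and only mirrors the `'e'`, 2, 10, or custom-base choices described above.

```python
import numpy as np


def log_base(values, base):
    """Logarithm of values in the given base; base may be 'e' or a positive number."""
    values = np.asarray(values, dtype=float)
    if base == 'e':
        return np.log(values)
    # Change of base: log_b(x) = ln(x) / ln(b)
    return np.log(values) / np.log(float(base))


data = np.array([1.0, 8.0, 100.0])
natural = log_base(data, 'e')
base_two = log_base(data, 2)
base_ten = log_base(data, 10)
print(base_two, base_ten)
```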

#### X and Y axes
We can specify the x axis of the scatter plot.

In [None]:
plt.figure()
Velocity.scatter(x=Time, alpha=0.7, edgecolor='k')

Notice that I never specified the y axis, so the y axis defaulted to the values in the StatArray. In this case, any operation applied to the colours is also applied to the y axis, e.g. log=10.  When I take the log base 10 of the values of Velocity without specifying the y plotting locations, those locations are transformed in the same way.

I can however force the y co-ordinates by specifying it as input. In the second subplot I explicitly plot distance on the y axis. In the first subplot, the y axis is the same as the colourbar.

In [None]:
plt.figure()
ax1 = plt.subplot(211)
Velocity.scatter(x=Time, alpha=0.7, edgecolor='k', log=10)
plt.subplot(212, sharex=ax1)
Velocity.scatter(x=Time, y=Distance, alpha=0.7, edgecolor='k', log=10)

#### Point sizes
Since the plotting functions take matplotlib keywords, I can also specify the size of each point.

In [None]:
# Random point sizes, one per element
s = np.ceil(100*(np.abs(np.random.randn(Velocity.size))))
plt.figure()
ax1 = plt.subplot(211)
Velocity.scatter(x=Time, y=Distance, s=s, alpha=0.7,edgecolor='k', sizeLegend=2)
plt.subplot(212, sharex=ax1)
Velocity.scatter(x=Time, y=Distance, s=s, alpha=0.7,edgecolor='k', sizeLegend=[1.0, 100, 200, 300], log=10)
# Adjust the layout once both subplots exist
plt.tight_layout()

Of course we can still take the log, or equalize the colour histogram.

In [None]:
plt.figure()
Velocity.scatter(x=Time, y=Distance, s=s, alpha=0.7,edgecolor='k',equalize=True,log=10)

#### Typically pcolor only works with 2D arrays. The StatArray has a pcolor method that will pcolor a 1D array

In [None]:
plt.figure()
plt.subplot(221)
Velocity.pcolor()
plt.subplot(222)
Velocity.pcolor(y=Time)
plt.subplot(223)
Velocity.pcolor(y=Time, flipy=True)
plt.subplot(224)
Velocity.pcolor(y=Time, log=10, equalize=True)
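
A pseudocolour plot needs a 2D grid of cells, so one way to handle a 1D array is to treat it as a single column and derive cell edges from the axis values. The sketch below shows only that array preparation with numpy. It is an assumption about the general approach, not geobipy's implementation, and `cell_edges` is a hypothetical helper.

```python
import numpy as np


def cell_edges(centres):
    """Edges of cells whose centres are given, extrapolating at both ends."""
    centres = np.asarray(centres, dtype=float)
    mid = 0.5 * (centres[:-1] + centres[1:])
    first = centres[0] - (mid[0] - centres[0])
    last = centres[-1] + (centres[-1] - mid[-1])
    return np.concatenate(([first], mid, [last]))


values = np.array([1.0, 2.0, 4.0, 8.0])
time = np.array([0.0, 10.0, 20.0, 30.0])

column = values[:, np.newaxis]  # shape (4, 1): one column of coloured cells
y_edges = cell_edges(time)      # 5 edges bound the 4 cells
x_edges = np.array([0.0, 1.0])  # a single cell across
print(column.shape, y_edges)
```

Arrays shaped like this can be handed to matplotlib as `plt.pcolormesh(x_edges, y_edges, column)`.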

#### We can plot a histogram of the StatArray

In [None]:
plt.figure()
Velocity.hist(100)
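
The binning behind a histogram can be reproduced with `np.histogram`. The sketch below assumes that the argument 100 above is the number of bins, as it is for matplotlib's `hist`. It uses numpy only.

```python
import numpy as np

prng = np.random.RandomState(3)
samples = prng.randn(1000)

# 100 equal-width bins spanning the range of the data
counts, edges = np.histogram(samples, bins=100)
print(counts.sum(), edges.size)
```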

#### We can write the StatArray to a HDF5 file.  HDF5 files are binary files that can include compression.  They allow quick and easy access to parts of the file, and can also be written to and read from in parallel!

In [None]:
with h5py.File('1Dtest.h5','w') as f:
    Velocity.toHdf(f,'test')

#### We can then read the StatArray from the file
Here x is a new variable that is read in from the HDF5 file we just wrote.

In [None]:
x = hdfRead.readKeyFromFiles('1Dtest.h5','/','test')
print('x has the same values as Velocity? ',np.all(x == Velocity))
x[2] = 5.0 # Change one of the values in x
print('x has its own memory allocated (not a reference/pointer)? ',not np.all(x == Velocity))
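
The difference between a reference and an independent copy can also be checked directly with `np.shares_memory`. The sketch below uses plain numpy arrays rather than a StatArray read from file.

```python
import numpy as np

original = np.arange(5.0)
view = original[:]             # a view shares the original's memory
independent = original.copy()  # a copy owns its own memory

independent[2] = 50.0          # changing the copy leaves the original untouched

print(np.shares_memory(original, view))
print(np.shares_memory(original, independent))
```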

## We can also define a 2D array

In [None]:
# Use a raw string so that "\f" in "\frac" is not read as a form feed escape
Velocity=StatArray(np.random.randn(50,100),"Velocity",r"$\frac{m}{s}$")
Velocity.summary()

### The StatArray class's methods work whether the array is 1D or 2D

#### We can still do a histogram

In [None]:
plt.figure()
Velocity.hist()

#### And we can use pcolor to plot the 2D array

In [None]:
plt.figure()
ax = Velocity.pcolor()

#### The StatArray comes with extra plotting options
Here we specify the x and y axes for the 2D array using two other 1D StatArrays

In [None]:
plt.figure()
x = StatArray(np.arange(100),name='x Axis',units = 'mm')
y = StatArray(np.arange(50),name='y Axis',units = 'elephants')
ax=Velocity.pcolor(x=x,y=y)

We can plot using a log scale (base 2 in the cell below).  In this case, some of the values are less than or equal to 0.0.  Plotting with the log option masks those values by default, and lets you know that it has done so!

In [None]:
plt.figure()
ax=Velocity.pcolor(x=x,y=y,log=2)
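
Masking non-positive values before taking a logarithm can be sketched with numpy's masked arrays. This illustrates the general idea only and is not geobipy's implementation.

```python
import numpy as np

prng = np.random.RandomState(2)
data = prng.randn(50, 100)

# Hide every value that has no real logarithm
masked = np.ma.masked_less_equal(data, 0.0)
log_values = np.ma.log2(masked)

n_masked = np.ma.count_masked(log_values)
print("Masked", n_masked, "of", data.size, "values")
```

Plotting functions that understand masked arrays, such as matplotlib's `pcolormesh`, leave the masked cells blank.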

A neat trick with colourmaps is histogram equalization.  This approach remaps the values so that every colour in the colourmap covers roughly the same number of cells in the image.  This distorts the colours, but can really highlight the lower and higher ends of whatever you are plotting.

In [None]:
plt.figure()
ax=Velocity.pcolor(x=x,y=y,equalize=True)

We can equalize the log10 plot too :)

In [None]:
plt.figure()
ax=Velocity.pcolor(x=x,y=y,equalize=True, log=10)

### Create a stacked area plot of a 2D StatArray

In [None]:
A=StatArray(np.abs(np.random.randn(13,100)), name='Variable', units="units")
x = StatArray(np.arange(100),name='x Axis',units = 'mm')
plt.figure()
ax1 = plt.subplot(211)
A.stackedAreaPlot(x=x, axis=1)
plt.subplot(212, sharex=ax1)
A.stackedAreaPlot(x=x, i=np.s_[[1,3,4],:], axis=1, labels=['a','b','c'])
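
The geometry of a stacked area plot comes down to a cumulative sum: each layer is drawn between the running total below it and the running total including it. The sketch below assumes that each of the 13 rows is one layer plotted against the 100 x positions. It uses numpy only and does not reproduce geobipy's plotting code.

```python
import numpy as np

prng = np.random.RandomState(4)
layers = np.abs(prng.randn(13, 100))

# Upper boundary of each layer is the running total down the stack
upper = np.cumsum(layers, axis=0)
# Lower boundary is the upper boundary of the layer beneath it
lower = upper - layers

# Selecting rows 1, 3 and 4, as the index expression above does
selected = layers[np.s_[[1, 3, 4], :]]
print(upper.shape, selected.shape)
```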