# Descriptive Statistics

NumPy offers many statistical functions, but obtaining several statistics from
the same array requires a separate pass over the data for each one. This example
shows how to use the `DescriptiveStatistics` class to obtain several statistics
in a single pass. The underlying algorithm is also incremental and numerically
stable; see the reference below.
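
To illustrate what "incremental" means here, the sketch below updates a mean and a variance one sample at a time using a Welford-style recurrence. It is a simplified illustration, not pyinterp's implementation: the function name `online_mean_variance` is hypothetical, and it covers only the first two moments of unweighted data.

```python
import statistics


def online_mean_variance(samples):
    """Single-pass (Welford-style) mean and population variance.

    Hypothetical helper for illustration; not part of pyinterp.
    """
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in samples:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    variance = m2 / count if count else float("nan")
    return count, mean, variance


data = [0.2, 0.4, 0.4, 0.9, 0.1]
count, mean, variance = online_mean_variance(data)

# Cross-check against the two-pass functions of the standard library.
assert abs(mean - statistics.fmean(data)) < 1e-12
assert abs(variance - statistics.pvariance(data)) < 1e-12
```

Each sample is read once, and the update never forms the difference of two large sums, which is where naive one-pass formulas tend to lose precision.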

---
**Reference**

Pébay, P., Terriberry, T.B., Kolla, H. et al. Numerically stable, scalable
formulas for parallel and online computation of higher-order multivariate
central moments with arbitrary weights. Comput Stat 31, 1305–1325 (2016).
https://doi.org/10.1007/s00180-015-0637-z

---


In [1]:
import dask.array
import numpy
import pyinterp

Create a random array.

In [17]:
values = numpy.random.random_sample((2, 4, 6, 8))

Create a DescriptiveStatistics object.

In [18]:
ds = pyinterp.DescriptiveStatistics(values)

The constructor computes the statistics from the provided data. The results are
stored in the instance and can be accessed with the following methods:
- mean
- var
- std
- skewness
- kurtosis
- min
- max
- sum
- sum_of_weights
- count

In [19]:
ds.count()

array([384], dtype=uint64)

In [20]:
ds.mean()

array([0.51912064])

The `array` method returns a structured NumPy array containing all the computed
statistics.

In [21]:
ds.array()

array([(384, -1.18435337, 0.99856285, 0.51912064, 0.00063736, -0.09977946, 384., 199.34232616, 0.08335215)],
      dtype=[('count', '<u8'), ('kurtosis', '<f8'), ('max', '<f8'), ('mean', '<f8'), ('min', '<f8'), ('skewness', '<f8'), ('sum_of_weights', '<f8'), ('sum', '<f8'), ('var', '<f8')])
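
Fields of a structured array are read by name. The sketch below rebuilds a reduced, stand-in record with three of the fields and values shown in the output above (it does not call pyinterp) and derives the standard deviation from the `var` field.

```python
import numpy

# Stand-in for the record returned by ds.array(), restricted to three
# fields; the values are copied from the output shown above.
stats = numpy.array(
    [(384, 0.51912064, 0.08335215)],
    dtype=[("count", "<u8"), ("mean", "<f8"), ("var", "<f8")],
)

# Field names are available from the dtype.
field_names = stats.dtype.names

# Indexing by field name returns a regular array for that field.
mean = stats["mean"]
std = numpy.sqrt(stats["var"])

assert field_names == ("count", "mean", "var")
assert stats["count"][0] == 384
```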

As in NumPy, statistics can be computed along one or more axes.

In [28]:
ds = pyinterp.DescriptiveStatistics(values, axis=(1, 2))
ds.mean()

array([[0.62106639, 0.55990265, 0.49600494, 0.59198591, 0.38853371,
        0.4836088 , 0.52754268, 0.43650409],
       [0.38897503, 0.47861264, 0.47846479, 0.5845428 , 0.55372859,
        0.55683471, 0.61509492, 0.5445276 ]])
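
For readers less familiar with multi-axis reductions, the following NumPy-only sketch shows what reducing over `axis=(1, 2)` does to the shape of a `(2, 4, 6, 8)` array. It uses `numpy.mean` rather than pyinterp, and the seeded generator is an assumption made only for reproducibility.

```python
import numpy

# Seeded generator (assumption, for reproducibility only).
rng = numpy.random.default_rng(0)
values = rng.random((2, 4, 6, 8))

# Axes 1 and 2 (sizes 4 and 6) are collapsed; axes 0 and 3 remain.
reduced = values.mean(axis=(1, 2))
assert reduced.shape == (2, 8)

# Equivalent formulation: move the reduced axes last, flatten them into
# one axis of 4 * 6 = 24 elements, then average over that axis.
flattened = values.transpose(0, 3, 1, 2).reshape(2, 8, -1)
assert numpy.allclose(reduced, flattened.mean(axis=-1))
```

Each of the 16 output elements is therefore the mean of 24 input samples, which matches the `(2, 8)` shape of the result above.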

The class can also process a dask array. In this case, the call to the
constructor triggers the calculation.

In [29]:
ds = pyinterp.DescriptiveStatistics(
    dask.array.from_array(values, chunks=(2, 2, 2, 2)),
    axis=(1, 2))
ds.mean()

array([[0.62106639, 0.55990265, 0.49600494, 0.59198591, 0.38853371,
        0.4836088 , 0.52754268, 0.43650409],
       [0.38897503, 0.47861264, 0.47846479, 0.5845428 , 0.55372859,
        0.55683471, 0.61509492, 0.5445276 ]])
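
Chunked data can be handled because partial statistics computed on separate chunks can be merged exactly. The sketch below shows the pairwise update for the count, mean, and sum of squared deviations of two non-empty chunks. The helpers `summarize` and `merge` are hypothetical, and this is not a description of how pyinterp or Dask schedule the work internally.

```python
import statistics


def summarize(chunk):
    """Return (count, mean, m2) for one non-empty chunk.

    m2 is the sum of squared deviations from the chunk mean.
    """
    n = len(chunk)
    mean = statistics.fmean(chunk)
    m2 = sum((x - mean) ** 2 for x in chunk)
    return n, mean, m2


def merge(a, b):
    """Combine two partial summaries into the summary of their union."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta * delta * n_a * n_b / n
    return n, mean, m2


data = [0.2, 0.4, 0.4, 0.9, 0.1, 0.7]
n, mean, m2 = merge(summarize(data[:2]), summarize(data[2:]))

# The merged summary matches a direct computation on the full data.
assert n == len(data)
assert abs(mean - statistics.fmean(data)) < 1e-12
assert abs(m2 / n - statistics.pvariance(data)) < 1e-12
```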

Finally, it's possible to calculate weighted statistics.

In [33]:
weights = numpy.random.random_sample((2, 4, 6, 8))
ds = pyinterp.DescriptiveStatistics(values, weights=weights, axis=(1, 2))
ds.mean()

array([[0.64582051, 0.59698443, 0.51044621, 0.52356292, 0.36105208,
        0.48819244, 0.56269849, 0.4328734 ],
       [0.42408451, 0.43184485, 0.44069451, 0.57962569, 0.53435349,
        0.56716452, 0.65048573, 0.54651728]])
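
As a rough illustration of how weights enter an incremental computation, the sketch below maintains a running sum of weights and a weighted mean in a single pass. The function name `weighted_online_mean` is hypothetical; the code only illustrates the idea and makes no claim about pyinterp's internals.

```python
def weighted_online_mean(samples, weights):
    """Single-pass weighted mean; returns (sum_of_weights, mean).

    Hypothetical helper for illustration; not part of pyinterp.
    """
    sum_w = 0.0
    mean = 0.0
    for x, w in zip(samples, weights):
        if w == 0:
            continue  # a zero weight contributes nothing
        sum_w += w
        mean += (w / sum_w) * (x - mean)
    return sum_w, mean


samples = [1.0, 2.0, 3.0]
weights = [1.0, 1.0, 2.0]
sum_w, mean = weighted_online_mean(samples, weights)

# Cross-check against the direct definition sum(w * x) / sum(w).
direct = sum(w * x for x, w in zip(samples, weights)) / sum(weights)
assert abs(mean - direct) < 1e-12
```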