# deeptrack.statistics

<a href="https://colab.research.google.com/github/DeepTrackAI/DeepTrack2/blob/develop/tutorials/3-advanced-topics/DTAT385_statistics.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
# !pip install deeptrack  # Uncomment if running on Colab/Kaggle.

This advanced tutorial introduces the `deeptrack.statistics` module.

In [2]:
import numpy as np

from deeptrack import statistics

## 1. What is the `statistics` module?

The `statistics` module provides features that apply statistical operations to input data, typically to reduce its dimensionality. These operations include common tasks such as summing, averaging, and computing standard deviations along specified axes. The features are built around NumPy functions, so their syntax and behavior will be familiar to NumPy users. Each feature also supports a `distributed` option, which determines whether each image in the input list is processed individually or the whole list is processed together.

## 2. The `Reducer` class

The `Reducer` class is the base class for features that reduce the input dimensionality using a statistical function. This class handles most of the core logic for the operations in the statistics module, including specifying the function (e.g., sum, mean) and the axis along which to reduce. Users typically won't interact with `Reducer` directly but will instead use its subclasses (e.g., `Sum`, `Mean`, `Std`) that provide the specific statistical function to apply.
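To make the division of labor concrete, here is a minimal sketch in plain NumPy of a base class that stores a function and an axis, with subclasses that only choose the function. The names `SketchReducer`, `SketchSum`, and `SketchMean` are hypothetical and are not part of DeepTrack; the real `Reducer` also handles feature properties and the `distributed` option, which this sketch leaves out.

```python
import numpy as np


class SketchReducer:
    """Hypothetical stand-in for a reducer base class (not DeepTrack's code)."""

    def __init__(self, function, axis=None):
        self.function = function  # statistical function, e.g., np.sum
        self.axis = axis          # axis along which to reduce

    def __call__(self, array):
        # Apply the stored function along the stored axis.
        return self.function(array, axis=self.axis)


class SketchSum(SketchReducer):
    """Subclass that only fixes the function to np.sum."""

    def __init__(self, axis=None):
        super().__init__(np.sum, axis=axis)


class SketchMean(SketchReducer):
    """Subclass that only fixes the function to np.mean."""

    def __init__(self, axis=None):
        super().__init__(np.mean, axis=axis)


image = np.arange(6.0).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]
column_sums = SketchSum(axis=0)(image)  # reduces the 2 rows away
row_means = SketchMean(axis=1)(image)   # reduces the 3 columns away
```

In this sketch, `column_sums` has shape `(3,)` and `row_means` has shape `(2,)`: the reduced axis disappears from the output.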

## 3. Statistical Operations

### Sum

The `Sum` operation calculates the sum of the input values along the specified axis.

In [3]:
# Example data.

input_values = [np.ones((2,)), np.zeros((2,))]

print(input_values)

[array([1., 1.]), array([0., 0.])]


In [4]:
sum_operation = statistics.Sum(axis=0, distributed=True)

sum_result = sum_operation(input_values)

print(sum_result)

[2.0, 0.0]


Above, the sum is computed along axis 0 with `distributed=True`, so each input array is reduced individually: the two ones sum to 2.0 and the two zeros sum to 0.0.
With `distributed=False`, the inputs are handled together instead, so axis 0 runs across the list of arrays:

In [5]:
sum_operation = statistics.Sum(axis=0, distributed=False)

sum_result = sum_operation(input_values)

print(sum_result)

[1. 1.]
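The two results can be reproduced with plain NumPy calls. This is a sketch of the equivalent computation inferred from the outputs shown in this tutorial, not a copy of DeepTrack's implementation.

```python
import numpy as np

input_values = [np.ones((2,)), np.zeros((2,))]

# Counterpart of distributed=True: reduce each array in the list on its own.
per_array = [float(np.sum(values, axis=0)) for values in input_values]

# Counterpart of distributed=False: reduce once over the whole list,
# so axis 0 indexes the list and the arrays are added element by element.
together = np.sum(input_values, axis=0)

print(per_array)  # [2.0, 0.0]
print(together)   # [1. 1.]
```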


### Product

The `Prod` operation calculates the product of the input values along the specified axis.

In [6]:
prod_operation = statistics.Prod(axis=0, distributed=True)

prod_result = prod_operation(input_values)

print(prod_result)

[1.0, 0.0]


### Other operations

Other statistical operations in the module include:

`Mean`: Computes the arithmetic mean along the specified axis.

`Median`: Computes the median along the specified axis.

`Std`: Computes the standard deviation along the specified axis.

`Variance`: Computes the variance along the specified axis.

`Cumsum`: Computes the cumulative sum along the specified axis.

`Min` / `Max`: Compute the minimum / maximum values along the specified axis.

`PeakToPeak`: Computes the range (max - min) along the specified axis.

`Quantile`: Computes the q-th quantile along the specified axis.

`Percentile`: Computes the q-th percentile along the specified axis.
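Since the module is built around NumPy, the operations listed above can be previewed with their NumPy counterparts. The mapping below is an assumption based on the operation names; check the module source for the exact function each class uses and for how `Quantile` and `Percentile` receive `q`.

```python
import numpy as np

data = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]])

# Assumed NumPy counterpart of each operation, all applied along axis 0.
results = {
    "Mean": np.mean(data, axis=0),
    "Median": np.median(data, axis=0),
    "Std": np.std(data, axis=0),
    "Variance": np.var(data, axis=0),
    "Cumsum": np.cumsum(data, axis=0),
    "Min": np.min(data, axis=0),
    "Max": np.max(data, axis=0),
    "PeakToPeak": np.ptp(data, axis=0),
    "Quantile": np.quantile(data, 0.5, axis=0),       # q in [0, 1]
    "Percentile": np.percentile(data, 50, axis=0),    # q in [0, 100]
}

for name, value in results.items():
    print(f"{name}: shape {value.shape}")
```

In NumPy, every function here except `np.cumsum` removes the reduced axis, giving shape `(3,)` for this input; `np.cumsum` keeps the input shape `(2, 3)`.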

## 4. Adding Reducers to the Pipeline

Reducers, such as `Sum`, can be added to a pipeline in two ways: by chaining with the `>>` operator, or by passing the preceding feature as the first argument. In both cases, the reducer is applied to the output of the preceding feature or sequence of features.

`summed_pipeline = some_pipeline_of_features >> Sum(axis=0)`

`summed_pipeline = Sum(some_pipeline_of_features, axis=0)`

However, combining these two methods is not supported and may lead to unpredictable behavior.
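To see why the two forms are equivalent, here is a toy model in plain Python. The classes `ToyFeature`, `ToyChain`, and `ToySum` are hypothetical and only imitate the two calling styles; DeepTrack's own feature classes are more involved and may work differently internally.

```python
class ToyFeature:
    """Toy stand-in for a pipeline feature (not DeepTrack's Feature class)."""

    def __init__(self, function, parent=None):
        self.function = function
        self.parent = parent  # feature whose output feeds this one, if any

    def __call__(self, value):
        # Run the parent first (if given), then this feature's function.
        if self.parent is not None:
            value = self.parent(value)
        return self.function(value)

    def __rshift__(self, other):
        # `a >> b` builds a chain that runs `a`, then `b`.
        return ToyChain(self, other)


class ToyChain(ToyFeature):
    """Runs two features in sequence."""

    def __init__(self, first, second):
        self.first = first
        self.second = second

    def __call__(self, value):
        return self.second(self.first(value))


class ToySum(ToyFeature):
    """Toy reducer that sums a list of numbers."""

    def __init__(self, parent=None):
        super().__init__(sum, parent=parent)


double = ToyFeature(lambda values: [2 * v for v in values])

chained = double >> ToySum()  # first form: chaining with >>
wrapped = ToySum(double)      # second form: feature as first argument

print(chained([1, 2, 3]))  # 12
print(wrapped([1, 2, 3]))  # 12
```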