# ajcr/rolling

Computationally efficient rolling window iterators for Python (including sum, min/max, median and more...)
Switch branches/tags
Nothing to show
Fetching latest commit…
Cannot retrieve the latest commit at this time.
 Failed to load latest commit information. doc May 12, 2018 rolling Aug 11, 2018 .gitignore Dec 29, 2017 .travis.yml Mar 1, 2018 LICENSE Dec 29, 2017 README.md Aug 9, 2018 conftest.py Dec 30, 2017 requirements.txt Feb 25, 2018 setup.cfg Feb 25, 2018 setup.py May 12, 2018

# rolling

A collection of computationally efficient rolling window iterators and operations for Python.

This module implements useful arithmetical, logical and statistical functions on rolling/moving/sliding windows, including Sum, Min, Max, Median and Standard Deviation. There's also a flexible 'Apply' iterator whereby any function can be applied to the window. Both fixed-length and variable-length window iteration is supported.

To get started, see the Quickstart section below, or have a look at the some Recipes.

## Overview

Suppose we have a list of integer values `x` and we want to find the maximum of each window of 3 values. We could use Python's `max()` function and write `[max(x[i:i+3]) for i in range(len(x) - 2)]` to do this, for example.

But applying builtin Python's functions (like `max()` and `sum()`) to a window becomes increasingly slow as the window gets larger. The complexity is typically linear (i.e. O(k) where k is the size of the window).

For for many operations there are algorithms that return the value for the next window in sublinear time (e.g. O(log k)) or constant time (i.e. O(1)). The algorithms implemented so far in this module are summarised below:

Operation Update Memory Comments
Sum O(1) O(k) Sum of window values
Product O(1) O(k) Product of window values
Nunique O(1) O(k) Number of unique window values
Mean O(1) O(k) Arithmetic mean of window values
Median O(log k) O(k) Median, uses an indexable skiplist to maintain sorted order
Mode O(1) O(k) Set of most common values, tracked using a bi-directional counter
Var O(1) O(k) Variance, uses Welford's algorithm for better numerical stability
Std O(1) O(k) Standard deviation, uses Welford's algorithm
Skew O(1) O(k) Skewness of the window
Kurtosis O(1) O(k) Kurtosis of the window
Any O(1) O(1) True if any value in the window is True, else False
All O(1) O(1) True if all values in the window are True, else False
Min O(1) O(k) Minimum value, tracks ascending minima using a deque
MinHeap O(1) O(k) Minimum value, tracks ascending minima using a heap
Max O(1) O(k) Maximum value, tracks descending maxima using a deque
Entropy O(1) O(k) Shannon entropy of the window (for fixed-size windows only)

See the References section below for more details about the algorithms and links to other resources.

## Installation

You can install the latest release of the module using pip:

``````pip install rolling
``````

Alternatively, you can install from source on GitHub to include the very latest changes. For example:

``````git clone https://github.com/ajcr/rolling.git
cd rolling/
pip install .
``````

There are no external library dependencies for running this module.

The module is tested with Python 3.5 and above, and Python 3.4 is also known to work. Python 2 is not currently supported.

If you want to run the tests you'll need to install pytest. Once done, just run `pytest` from the base directory.

## Quickstart

Import the `rolling` module:

`>>> import rolling`

Now suppose we have this list:

`>>> counts = [1, 5, 2, 0, 3]`

The `rolling` module allows us to can create an iterator object over this list that performs a reduction operation for a given window size (we'll use a window size of 3 for this example).

Now let's create three iterator objects that roll over this list, each performing a different operation:

```>>> r_sum = rolling.Sum(counts, 3)
>>> r_all = rolling.All(counts, 3)
>>> r_max = rolling.Max(counts, 3)```

Here's the representation of the `rolling.Sum` iterator object. Note that the window type is 'fixed' by default, meaning that only full windows of the specified size are computed (the window does not roll on/off the list):

```>>> r_sum
Rolling(operation='Sum', window_size=3, window_type='fixed')```

The result of iterating over each of these rolling iterator objects using `list()` is shown below.

```>>> list(r_sum)
[8, 7, 5] # i.e. [1+5+2, 5+2+0, 2+0+3]

>>> list(r_all)
[True, False, False]

>>> list(r_max)
[5, 5, 3]```

As well as the built-in efficient algorithms provided by this module, any callable Python object can be applied to a rolling window using the `Apply()` class. For instance, Python's `tuple()` function:

```>>> r_list = rolling.Apply(counts, 3, operation=tuple)
>>> list(r_list)
[(1, 5, 2), (5, 2, 0), (2, 0, 3)]```

Variable-length windows can be specified using the `window_type` argument. This allows windows smaller than the specified size to be evaluated at the beginning and end of the iterable. For instance:

```>>> r_list = rolling.Apply([1, 5, 2, 0, 3], 3, operation=list, window_type='variable')
>>> list(r_list)
[[1],
[1, 5],
[1, 5, 2],
[5, 2, 0],
[2, 0, 3],
[0, 3],
[3]]```

## References and resources

Some rolling algorithms are widely known (e.g. 'Sum') and I am not sure which source to cite. Some algorithms I made up as I was putting the module together (e.g. 'Any', 'All'), but these are relatively simple and probably exist elsewhere.

Other rolling algorithms are very cleverly designed and I learned a lot by reading about them and seeing other peoples' implementations. Here are the main resources that I used:

• Max and Min are implemented using the Ascending Minima and Descending Maxima algorithms described by Richard Harter here. This algorithm is also used in pandas and bottleneck. My attention was first drawn to this algorithm by Jaime Fernandez del Rio's excellent talk The Secret Life Of Rolling Pandas. The algorithm is also described by Keegan Carruthers-Smith here, along with code examples.

• Median uses the indexable skiplist approach presented by Raymond Hettinger here.

• Var and Std use Welford's algorithm. I referred to the rolling variance implementation in pandas as well as an older edit of the Wikipedia page Algorithms for calculating variance.

## Discussion and future work

The algorithms implemented by this module are chosen to be efficient in the sense that the cost of computing each new return value scales efficiently with the size of window.

In practice you might find that it is quicker not to use the the 'efficient' algorithm, and instead apply a function to the window. This is especially true for very small window sizes where the cost of updating a window is relatively complex. For instance, while the window size `k` is less than approximately 50, it may quicker to use `rolling.Apply(array, k, min)` (apply Python's builtin minimum function `min()`) rather than using `rolling.Min(array, k)`.

With this in mind, in future it might be worth implementing some of the algorithms here in compiled code (e.g. as a C extension module, or using Cython) to improve speed.