Computationally efficient rolling window iterators for Python (including sum, min/max, median and more...)
Switch branches/tags
Nothing to show
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
doc Version 0.2.0 May 12, 2018
rolling Fix code formatting Aug 11, 2018
.gitignore Initial commit Dec 29, 2017
.travis.yml Add test coverage metic via codecov Mar 1, 2018
LICENSE Initial commit Dec 29, 2017
README.md Add rolling product Aug 9, 2018
conftest.py initial commit Dec 30, 2017
requirements.txt Add requirements.txt Feb 25, 2018
setup.cfg Add setup.cfg Feb 25, 2018
setup.py Version 0.2.0 May 12, 2018

README.md

rolling

PyPI version travis-ci codecov

A collection of computationally efficient rolling window iterators and operations for Python.

This module implements useful arithmetical, logical and statistical functions on rolling/moving/sliding windows, including Sum, Min, Max, Median and Standard Deviation. There's also a flexible 'Apply' iterator whereby any function can be applied to the window. Both fixed-length and variable-length window iteration is supported.

To get started, see the Quickstart section below, or have a look at the some Recipes.

Overview

Suppose we have a list of integer values x and we want to find the maximum of each window of 3 values. We could use Python's max() function and write [max(x[i:i+3]) for i in range(len(x) - 2)] to do this, for example.

But applying builtin Python's functions (like max() and sum()) to a window becomes increasingly slow as the window gets larger. The complexity is typically linear (i.e. O(k) where k is the size of the window).

For for many operations there are algorithms that return the value for the next window in sublinear time (e.g. O(log k)) or constant time (i.e. O(1)). The algorithms implemented so far in this module are summarised below:

Operation Update Memory Comments
Sum O(1) O(k) Sum of window values
Product O(1) O(k) Product of window values
Nunique O(1) O(k) Number of unique window values
Mean O(1) O(k) Arithmetic mean of window values
Median O(log k) O(k) Median, uses an indexable skiplist to maintain sorted order
Mode O(1) O(k) Set of most common values, tracked using a bi-directional counter
Var O(1) O(k) Variance, uses Welford's algorithm for better numerical stability
Std O(1) O(k) Standard deviation, uses Welford's algorithm
Skew O(1) O(k) Skewness of the window
Kurtosis O(1) O(k) Kurtosis of the window
Any O(1) O(1) True if any value in the window is True, else False
All O(1) O(1) True if all values in the window are True, else False
Min O(1) O(k) Minimum value, tracks ascending minima using a deque
MinHeap O(1) O(k) Minimum value, tracks ascending minima using a heap
Max O(1) O(k) Maximum value, tracks descending maxima using a deque
Entropy O(1) O(k) Shannon entropy of the window (for fixed-size windows only)

See the References section below for more details about the algorithms and links to other resources.

Installation

You can install the latest release of the module using pip:

pip install rolling

Alternatively, you can install from source on GitHub to include the very latest changes. For example:

git clone https://github.com/ajcr/rolling.git
cd rolling/
pip install .

There are no external library dependencies for running this module.

The module is tested with Python 3.5 and above, and Python 3.4 is also known to work. Python 2 is not currently supported.

If you want to run the tests you'll need to install pytest. Once done, just run pytest from the base directory.

Quickstart

Import the rolling module:

>>> import rolling

Now suppose we have this list:

>>> counts = [1, 5, 2, 0, 3]

The rolling module allows us to can create an iterator object over this list that performs a reduction operation for a given window size (we'll use a window size of 3 for this example).

Now let's create three iterator objects that roll over this list, each performing a different operation:

>>> r_sum = rolling.Sum(counts, 3)
>>> r_all = rolling.All(counts, 3)
>>> r_max = rolling.Max(counts, 3)

Here's the representation of the rolling.Sum iterator object. Note that the window type is 'fixed' by default, meaning that only full windows of the specified size are computed (the window does not roll on/off the list):

>>> r_sum
Rolling(operation='Sum', window_size=3, window_type='fixed')

The result of iterating over each of these rolling iterator objects using list() is shown below.

>>> list(r_sum)
[8, 7, 5] # i.e. [1+5+2, 5+2+0, 2+0+3]

>>> list(r_all)
[True, False, False]

>>> list(r_max)
[5, 5, 3]

As well as the built-in efficient algorithms provided by this module, any callable Python object can be applied to a rolling window using the Apply() class. For instance, Python's tuple() function:

>>> r_list = rolling.Apply(counts, 3, operation=tuple)
>>> list(r_list)
[(1, 5, 2), (5, 2, 0), (2, 0, 3)]

Variable-length windows can be specified using the window_type argument. This allows windows smaller than the specified size to be evaluated at the beginning and end of the iterable. For instance:

>>> r_list = rolling.Apply([1, 5, 2, 0, 3], 3, operation=list, window_type='variable')
>>> list(r_list)
[[1],
 [1, 5],
 [1, 5, 2],
 [5, 2, 0],
 [2, 0, 3],
 [0, 3],
 [3]]

References and resources

Some rolling algorithms are widely known (e.g. 'Sum') and I am not sure which source to cite. Some algorithms I made up as I was putting the module together (e.g. 'Any', 'All'), but these are relatively simple and probably exist elsewhere.

Other rolling algorithms are very cleverly designed and I learned a lot by reading about them and seeing other peoples' implementations. Here are the main resources that I used:

  • Max and Min are implemented using the Ascending Minima and Descending Maxima algorithms described by Richard Harter here. This algorithm is also used in pandas and bottleneck. My attention was first drawn to this algorithm by Jaime Fernandez del Rio's excellent talk The Secret Life Of Rolling Pandas. The algorithm is also described by Keegan Carruthers-Smith here, along with code examples.

  • Median uses the indexable skiplist approach presented by Raymond Hettinger here.

  • Var and Std use Welford's algorithm. I referred to the rolling variance implementation in pandas as well as an older edit of the Wikipedia page Algorithms for calculating variance.

Discussion and future work

The algorithms implemented by this module are chosen to be efficient in the sense that the cost of computing each new return value scales efficiently with the size of window.

In practice you might find that it is quicker not to use the the 'efficient' algorithm, and instead apply a function to the window. This is especially true for very small window sizes where the cost of updating a window is relatively complex. For instance, while the window size k is less than approximately 50, it may quicker to use rolling.Apply(array, k, min) (apply Python's builtin minimum function min()) rather than using rolling.Min(array, k).

With this in mind, in future it might be worth implementing some of the algorithms here in compiled code (e.g. as a C extension module, or using Cython) to improve speed.