Advanced Experiments with striding
==================================

A few examples of manipulating strides in numpy arrays.

Numpy arrays are wrappers around a single contiguous block of data. The "strides" describe how that single block is interpreted as an n-dimensional array: there is one stride per axis, giving the number of bytes to step to reach the next element along that axis.

For the most part, this is all an implementation detail, happening under the hood at the C level. But numpy lets you manipulate the strides, and this allows some really powerful tricks.

In [1]:
import numpy as np

In [11]:
# A basic 2d array:
a = np.zeros((3,4,), dtype=np.int64)
a

array([[0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0]])

In [12]:
#flatten it:
a.flatten()

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [13]:
# flatten() returned a copy, so the original is still 2d
a

array([[0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0]])

In [14]:
# the strides:
a.strides

(32, 8)

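This means that you need to skip 32 bytes to get from the beginning of one row to the beginning of the next (4 values per row, times 8 bytes per value), and 8 bytes to get from one element to the next within a row (notice the **dtype** is 8 bytes per value).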

In [10]:
a.itemsize

8

In [17]:
# reshape the array
a.shape = (2,6)
a

array([[0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0]])

In [18]:
# now look at the strides
a.strides

(48, 8)

Same bytes in the data block, but the strides define how it is to be interpreted.

You need to skip 48 bytes to get from the beginning of one row to the beginning of the next row (8 bytes per value, times 6 values per row)

And still 8 bytes from one element to the next in one row.
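
The arithmetic above can be written down as a small function. This is only a sketch: ``c_contiguous_strides`` is a made-up helper name, not part of numpy, and it assumes the default C (row-major) layout that all the arrays in these examples use.

```python
import numpy as np

def c_contiguous_strides(shape, itemsize):
    """Byte strides of a C-ordered (row-major) array with this shape."""
    strides = []
    step = itemsize
    # Walk the axes from last to first: the last axis moves by one item,
    # each earlier axis moves by a whole block of the axes after it.
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))

a = np.zeros((3, 4), dtype=np.int64)
print(c_contiguous_strides(a.shape, a.itemsize))  # (32, 8)
print(c_contiguous_strides((2, 6), 8))            # (48, 8)
```

The results match what ``a.strides`` reports for the (3,4) and (2,6) shapes.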

In [18]:
# now a different data type
a = np.arange(12, dtype=np.uint8)
print(a)

[ 0  1  2  3  4  5  6  7  8  9 10 11]


In [19]:
a.strides

(1,)

Only one byte per element, so the stride is 1.

In [20]:
# reshape again
a.shape = (3,4)
a

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]], dtype=uint8)

In [21]:
a.strides

(4, 1)

So 4 bytes to get from one row to the next, and one from one element to the next.

In [22]:
# make it 3-d
a.shape = (2,3,2)
a

array([[[ 0,  1],
        [ 2,  3],
        [ 4,  5]],

       [[ 6,  7],
        [ 8,  9],
        [10, 11]]], dtype=uint8)

In [23]:
a.strides

(6, 2, 1)

So six bytes to get from one "slab" to the next, 2 to get from one row to the next in that slab, and still 1 to get from item to item.
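
Put another way, the byte position of any element is the sum of index times stride over all the axes. A minimal sketch to check that against the raw bytes (``byte_offset`` is a hypothetical helper, not a numpy function):

```python
import numpy as np

a = np.arange(12, dtype=np.uint8).reshape(2, 3, 2)

def byte_offset(index, strides):
    """Offset in bytes from the start of the data block to a[index]."""
    return sum(i * s for i, s in zip(index, strides))

index = (1, 2, 0)
offset = byte_offset(index, a.strides)   # 1*6 + 2*2 + 0*1
raw = a.tobytes()                        # a copy of the data block, row-major
print(offset, raw[offset], a[index])     # 10 10 10
```

Because the array holds ``arange`` values one byte each, the value stored at byte 10 is itself 10, which makes the check easy to read.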

The fancy stuff
================

The ``stride_tricks`` module provides utilities to manipulate the strides of numpy arrays to do tricky things.

In [24]:
a = np.arange(10, dtype=np.uint8)
a.shape = (2,5)
a

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]], dtype=uint8)

In [25]:
a2 = np.lib.stride_tricks.as_strided(a, (8, 3), (1,1) )
a2

array([[0, 1, 2],
       [1, 2, 3],
       [2, 3, 4],
       [3, 4, 5],
       [4, 5, 6],
       [5, 6, 7],
       [6, 7, 8],
       [7, 8, 9]], dtype=uint8)

huh?

``as_strided(a, (8, 3), (1,1) )``

means: take the data in array, ``a``, and make a new array of shape (8,3) with strides (1,1).

So the array is 8\*3 == 24 elements in size, but the underlying data has only 10 elements in it. This works because the strides are set to (1,1), so to find the first element of the nth row, you go n*1 bytes from the beginning:

 * the zeroth row starts at byte 0
 * the first row starts at byte 1
 * etc.

Then to get the elements in each row you go one byte more:

 * the zeroth element in the zeroth row is at byte 0.
 * the first element in the zeroth row is at byte 1
 * the second element in the zeroth row is at byte 2
 
 * the zeroth element in the first row is at byte 1
 * the first element in the first row is at byte 2
 * the second element in the first row is at byte 3

So that's how we get a bigger array than the number of elements -- elements are re-used in multiple parts of the array.
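
To make the byte counting concrete, the same (8,3) array can be rebuilt by hand from the flat data. This is just a sketch to check the reasoning; it relies on the one-byte ``uint8`` items, so that a byte offset and an element offset are the same number.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

a = np.arange(10, dtype=np.uint8).reshape(2, 5)
a2 = as_strided(a, shape=(8, 3), strides=(1, 1))

# Element (i, j) lives at byte i*1 + j*1 from the start of the data.
flat = a.ravel()
by_hand = np.array([[flat[i + j] for j in range(3)] for i in range(8)],
                   dtype=np.uint8)
print(np.array_equal(a2, by_hand))  # True
```

The largest offset used is 7 + 2 = 9, the last of the 10 bytes, so this particular shape never reads past the end of the data. ``as_strided`` does not check that for you.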


In [26]:
a2.strides

(1, 1)

In [27]:
a.shape, a.size

((2, 5), 10)

In [28]:
a2.shape, a2.size

((8, 3), 24)

a and a2 share the same data, so if we change a, a2 changes also:

In [29]:
a[0,2] = 6
a

array([[0, 1, 6, 3, 4],
       [5, 6, 7, 8, 9]], dtype=uint8)

In [30]:
a2

array([[0, 1, 6],
       [1, 6, 3],
       [6, 3, 4],
       [3, 4, 5],
       [4, 5, 6],
       [5, 6, 7],
       [6, 7, 8],
       [7, 8, 9]], dtype=uint8)

Note how the 6 is reused in the first three rows...
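
The sharing goes both ways, so writing through the strided view would change several "elements" at once. One way to guard against that is the ``writeable=False`` argument of ``as_strided``. A minimal sketch, assuming a numpy version recent enough to have that argument:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

a = np.arange(10, dtype=np.uint8).reshape(2, 5)

# Read-only view: writes through a2 are rejected, writes through a still show up.
a2 = as_strided(a, shape=(8, 3), strides=(1, 1), writeable=False)
print(np.shares_memory(a, a2))   # True

a[0, 2] = 6
print(a2[:3])                    # the 6 appears in the first three rows

try:
    a2[0, 0] = 99
except ValueError as err:
    print("write blocked:", err)
```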

Why do this weird stuff?
----------------------------

It lets you do tricks where you want to re-use the same values without actually needing to copy them. For instance, a moving average.

In [31]:
# mean along axis 0 (down each column) of the 2x5 array
a.mean(axis=0)

array([ 2.5,  3.5,  6.5,  5.5,  6.5])

``filter_example.py`` has a couple of simple filters, implemented with stride tricks.

In [32]:
import filter_example

In [33]:
# create some data
a = np.arange(25)
np.random.shuffle(a)
a

array([17,  2, 22,  5, 13,  3, 12, 14, 20, 15, 21,  6,  8,  9,  7, 19,  1,
       23, 16, 10, 18, 24, 11,  0,  4])

In [34]:
filter_example.moving_average(a, 3)

array([ 13.66666667,   9.66666667,  13.33333333,   7.        ,
         9.33333333,   9.66666667,  15.33333333,  16.33333333,
        18.66666667,  14.        ,  11.66666667,   7.66666667,
         8.        ,  11.66666667,   9.        ,  14.33333333,
        13.33333333,  16.33333333,  14.66666667,  17.33333333,
        17.66666667,  11.66666667,   5.        ])
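
The source of ``filter_example.py`` is not shown here, so the following is only a guess at how such a moving average could be written with ``as_strided``. The name ``moving_average_sketch`` is made up for this example, and the real ``moving_average`` may well be implemented differently.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

def moving_average_sketch(x, window):
    """Mean over each run of `window` consecutive items (hypothetical helper)."""
    x = np.ascontiguousarray(x)
    if x.ndim != 1 or window < 1 or x.shape[0] < window:
        raise ValueError("need a 1-d array at least as long as the window")
    n_windows = x.shape[0] - window + 1
    step = x.strides[0]
    # Each row is one window; both axes advance by a single item,
    # so neighbouring rows overlap and no data is copied.
    windows = as_strided(x, shape=(n_windows, window), strides=(step, step),
                         writeable=False)
    return windows.mean(axis=1)

data = np.array([17, 2, 22, 5, 13])
print(moving_average_sketch(data, 3))
```

With the first five values of the shuffled array above, this gives 41/3, 29/3 and 40/3, which agrees with the first three numbers in the ``moving_average`` output.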

In [35]:
filter_example.scaled_by_max(a,4)

array([0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1])