# Advanced NumPy

## `numpy` internals

In [1]:
import numpy as np
np.random.seed(2374)

In [2]:
arr = np.random.randint(10, size=(8,8))

Information about array elements:

In [3]:
arr.itemsize, arr.dtype

(4, dtype('int32'))

In [4]:
arr

array([[5, 7, 6, 8, 2, 9, 2, 0],
       [3, 1, 5, 8, 7, 5, 3, 1],
       [8, 6, 9, 3, 3, 5, 1, 2],
       [0, 7, 5, 7, 1, 2, 8, 1],
       [4, 3, 9, 5, 2, 4, 5, 2],
       [8, 4, 8, 7, 8, 9, 2, 2],
       [9, 8, 7, 4, 7, 3, 8, 2],
       [0, 3, 6, 6, 4, 0, 9, 4]])

How do we step through the array's memory? With the `strides` attribute:

In [5]:
arr.strides

(32, 4)

I.e. `arr[0, 1]` is 4 bytes away from `arr[0, 0]` (one step along axis `1`), while `arr[1, 0]` is 32 bytes away from `arr[0, 0]` (one step along axis `0`).

In [6]:
arr.strides[0] == arr.shape[1] * arr.itemsize

True
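As a minimal sketch of what strides mean, the byte offset of any element can be computed as the sum of `index * stride` over all axes. The array `a` and the helper `byte_offset` are hypothetical names used only for this illustration; the dtype is fixed to `int32` so the numbers do not depend on the platform default.

```python
import numpy as np

# Hypothetical 8x8 int32 array (same shape and itemsize as `arr` above)
a = np.arange(64, dtype=np.int32).reshape(8, 8)

def byte_offset(array, index):
    """Byte offset of array[index] relative to the array's first element."""
    return sum(s * i for s, i in zip(array.strides, index))

raw = a.tobytes()            # C-order copy of the array's bytes
i, j = 3, 5
off = byte_offset(a, (i, j)) # 3 * 32 + 5 * 4
value = np.frombuffer(raw, dtype=a.dtype, count=1, offset=off)[0]

print(a.strides, off, value, a[i, j])
```

Reading one `int32` at that offset from the raw bytes gives the same value as `a[i, j]`.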

But what about views?

In [7]:
arr_view = arr[::2, 1:]

In [8]:
arr

array([[5, 7, 6, 8, 2, 9, 2, 0],
       [3, 1, 5, 8, 7, 5, 3, 1],
       [8, 6, 9, 3, 3, 5, 1, 2],
       [0, 7, 5, 7, 1, 2, 8, 1],
       [4, 3, 9, 5, 2, 4, 5, 2],
       [8, 4, 8, 7, 8, 9, 2, 2],
       [9, 8, 7, 4, 7, 3, 8, 2],
       [0, 3, 6, 6, 4, 0, 9, 4]])

In [9]:
arr_view

array([[7, 6, 8, 2, 9, 2, 0],
       [6, 9, 3, 3, 5, 1, 2],
       [3, 9, 5, 2, 4, 5, 2],
       [8, 7, 4, 7, 3, 8, 2]])

Information about underlying array structure:

In [10]:
arr.flags

  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False

In [11]:
arr_view.flags

  C_CONTIGUOUS : False
  F_CONTIGUOUS : False
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False

Views always have a base array:

In [12]:
arr_view.base

array([[5, 7, 6, 8, 2, 9, 2, 0],
       [3, 1, 5, 8, 7, 5, 3, 1],
       [8, 6, 9, 3, 3, 5, 1, 2],
       [0, 7, 5, 7, 1, 2, 8, 1],
       [4, 3, 9, 5, 2, 4, 5, 2],
       [8, 4, 8, 7, 8, 9, 2, 2],
       [9, 8, 7, 4, 7, 3, 8, 2],
       [0, 3, 6, 6, 4, 0, 9, 4]])

In [None]:
arr.base

In [None]:
arr_view.base is arr

`strides` are given with respect to the **underlying data** (which is shared by the original array `arr` and the view `arr_view`!):

In [None]:
arr_view.strides

In [None]:
arr_view.shape

Since the view is not contiguous, this relation no longer holds:

In [None]:
arr_view.strides[0] == arr_view.shape[1] * arr_view.itemsize

Also, the view does not start at byte 0 of the data, but 4 bytes (one `int32` element) into it:

In [None]:
np.byte_bounds(arr_view)[0] - np.byte_bounds(arr)[0]

In [None]:
np.byte_bounds(arr_view)

In [None]:
np.byte_bounds(arr_view)[1] - np.byte_bounds(arr)[1]
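The start offset of a view can also be read from the data pointer in `__array_interface__`, which may be handy on NumPy versions where `np.byte_bounds` is not available at the top level. This is only a sketch: `a`, `v` and `data_pointer` are hypothetical names, and the dtype is fixed to `int32`.

```python
import numpy as np

a = np.arange(64, dtype=np.int32).reshape(8, 8)
v = a[::2, 1:]               # every second row, skipping the first column

def data_pointer(array):
    """Memory address of the first element of the array."""
    return array.__array_interface__["data"][0]

start_offset = data_pointer(v) - data_pointer(a)

print(start_offset)          # one element (4 bytes) into the data
print(v.strides)             # row stride doubled, column stride unchanged
```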

In [None]:
arr

In [None]:
arr_view

In [None]:
arr_view.strides

In [None]:
arr.T

In [None]:
arr.T.strides

The transpose reports the same strides in reverse order. Is it a view too?

In [None]:
arr_view.T.strides

In [None]:
arr_view.T[::2, 1:].base is arr
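A small sketch that checks these questions directly. It assumes a hypothetical array `a` that owns its data (hence the explicit `.copy()`), and relies on `np.shares_memory` to test whether two arrays overlap in memory.

```python
import numpy as np

a = np.arange(64, dtype=np.int32).reshape(8, 8).copy()  # owns its data
v = a[::2, 1:]

strides_reversed = a.T.strides == a.strides[::-1]  # transpose only swaps strides
transpose_is_view = a.T.base is a                  # no data was copied
view_of_view_base = v.T[::2, 1:].base is a         # base points to the owner
overlap = np.shares_memory(a, v.T)

print(strides_reversed, transpose_is_view, view_of_view_base, overlap)
```

Even a view of a transposed view reports the original owner as its `base`, not the intermediate views.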

## Cache effects

In [None]:
large_arr = np.random.randint(100, size=(1000000,))

In [None]:
STEP = 4
larger_arr = np.random.randint(100, size=(1000000*STEP,),
                               dtype=np.int8)

In [None]:
larger_arr.shape, large_arr.shape

In [None]:
%timeit -n 100 -r 3 large_arr.sum()

In [32]:
%timeit -n 100 -r 3 larger_arr[::STEP].sum()

782 µs ± 105 µs per loop (mean ± std. dev. of 3 runs, 100 loops each)


In [33]:
del large_arr, larger_arr

In [34]:
large_arr = np.random.randint(100, size=(5, 10000000))

In [35]:
large_arr.nbytes // (1024*1024)

190

In [36]:
large_arr

array([[63, 98, 45, ..., 88, 56, 12],
       [90, 62,  5, ..., 37, 64, 20],
       [21, 89, 26, ..., 40, 55, 11],
       [55, 23, 76, ..., 73, 61,  7],
       [20, 52, 51, ..., 91, 47, 35]])

In [37]:
large_arr.T

array([[63, 90, 21, 55, 20],
       [98, 62, 89, 23, 52],
       [45,  5, 26, 76, 51],
       ...,
       [88, 37, 40, 73, 91],
       [56, 64, 55, 61, 47],
       [12, 20, 11,  7, 35]])

In [38]:
large_arr.T.flags

  C_CONTIGUOUS : False
  F_CONTIGUOUS : True
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False

In [39]:
large_arr.flags

  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False

In [40]:
%timeit -n 50 -r 3 large_arr.sum(axis=1).sum(axis=0)

23.4 ms ± 642 µs per loop (mean ± std. dev. of 3 runs, 50 loops each)


In [41]:
%timeit -n 50 -r 3 large_arr.sum(axis=0).sum()

62 ms ± 3.69 ms per loop (mean ± std. dev. of 3 runs, 50 loops each)


In [42]:
%timeit -n 50 -r 3 large_arr.T.sum(axis=0)

23.6 ms ± 443 µs per loop (mean ± std. dev. of 3 runs, 50 loops each)


In [43]:
large_arr.T.base is large_arr

True
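The timings above can be reproduced at a smaller scale with the standard `timeit` module. This is a sketch under assumptions: `big` is a hypothetical, much smaller array, and the actual timings depend on the machine, so none are quoted here. Both reduction orders give the same total; only the memory access pattern differs.

```python
import numpy as np
from timeit import timeit

big = np.ones((5, 200_000), dtype=np.int64)  # C-contiguous, rows are long

row_first = big.sum(axis=1).sum()  # walks each row with an 8-byte stride
col_first = big.sum(axis=0).sum()  # each output column combines 5 distant rows

t_rows = timeit(lambda: big.sum(axis=1), number=20)
t_cols = timeit(lambda: big.sum(axis=0), number=20)

print(big.strides)
print(f"axis=1: {t_rows:.4f} s, axis=0: {t_cols:.4f} s")
```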

## Memory allocations in computations

How long does it take to create a copy?

In [44]:
%timeit -n 20 -r 3 large_arr.copy()

74.2 ms ± 434 µs per loop (mean ± std. dev. of 3 runs, 20 loops each)


Operations create new arrays as well:

In [45]:
%timeit -n 20 -r 3 large_arr + 1

83 ms ± 592 µs per loop (mean ± std. dev. of 3 runs, 20 loops each)


`np.add` and `+` do more or less the same:

In [46]:
%timeit -n 20 -r 3 np.add(large_arr, 1)

84 ms ± 1.98 ms per loop (mean ± std. dev. of 3 runs, 20 loops each)


But in-place operations are faster (no allocations):

In [47]:
%timeit -n 100 -r 3 np.add(large_arr, 1, out=large_arr)

27.4 ms ± 279 µs per loop (mean ± std. dev. of 3 runs, 100 loops each)


In [48]:
large_arr

array([[363, 398, 345, ..., 388, 356, 312],
       [390, 362, 305, ..., 337, 364, 320],
       [321, 389, 326, ..., 340, 355, 311],
       [355, 323, 376, ..., 373, 361, 307],
       [320, 352, 351, ..., 391, 347, 335]])

In [49]:
np.add(large_arr, 1, out=large_arr)

array([[364, 399, 346, ..., 389, 357, 313],
       [391, 363, 306, ..., 338, 365, 321],
       [322, 390, 327, ..., 341, 356, 312],
       [356, 324, 377, ..., 374, 362, 308],
       [321, 353, 352, ..., 392, 348, 336]])
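To see that `out=` really reuses the existing buffer, we can compare data pointers before and after the operation. A minimal sketch with a hypothetical small array `a`:

```python
import numpy as np

a = np.arange(6, dtype=np.int64)
ptr_before = a.__array_interface__["data"][0]

b = a + 1                  # allocates a new array for the result
c = np.add(a, 1, out=a)    # writes the result into a's own buffer
a += 1                     # augmented assignment is in place as well

same_buffer = a.__array_interface__["data"][0] == ptr_before

print(np.shares_memory(a, b))  # False: b has its own memory
print(c is a, same_buffer)     # True True: no new allocation
```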

# Broadcasting

How can we operate on arrays of different shapes? Should we reshape them first to a common shape?

In [50]:
arr_2d = np.random.randint(10, size=(10, 3))
arr_1d_1 = np.random.randint(10, size=(3, ))
arr_1d_2 = np.random.randint(10, size=(10, ))

In [51]:
arr_2d

array([[4, 1, 7],
       [0, 6, 0],
       [3, 8, 0],
       [3, 5, 0],
       [3, 9, 4],
       [7, 6, 3],
       [2, 6, 7],
       [8, 8, 0],
       [4, 6, 4],
       [1, 3, 6]])

In [52]:
arr_1d_1

array([1, 3, 0])

In [53]:
arr_1d_2

array([8, 5, 0, 5, 2, 6, 9, 8, 5, 8])

In [54]:
arr_2d, arr_1d_1

(array([[4, 1, 7],
        [0, 6, 0],
        [3, 8, 0],
        [3, 5, 0],
        [3, 9, 4],
        [7, 6, 3],
        [2, 6, 7],
        [8, 8, 0],
        [4, 6, 4],
        [1, 3, 6]]),
 array([1, 3, 0]))

Can we add the two?

In [55]:
arr_2d + arr_1d_1

array([[ 5,  4,  7],
       [ 1,  9,  0],
       [ 4, 11,  0],
       [ 4,  8,  0],
       [ 4, 12,  4],
       [ 8,  9,  3],
       [ 3,  9,  7],
       [ 9, 11,  0],
       [ 5,  9,  4],
       [ 2,  6,  6]])

But what was really added to `arr_2d`?

In [58]:
(arr_2d + arr_1d_1) - arr_2d

array([[1, 3, 0],
       [1, 3, 0],
       [1, 3, 0],
       [1, 3, 0],
       [1, 3, 0],
       [1, 3, 0],
       [1, 3, 0],
       [1, 3, 0],
       [1, 3, 0],
       [1, 3, 0]])
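The "replicated" array can be made explicit with `np.broadcast_to`, which returns a read-only view of the required shape without copying any data. A sketch with hypothetical arrays `m` and `row` (dtype fixed to `int64` so the strides are predictable):

```python
import numpy as np

m = np.arange(30, dtype=np.int64).reshape(10, 3)
row = np.array([1, 3, 0], dtype=np.int64)

stretched = np.broadcast_to(row, m.shape)  # view, no copy

print(stretched.shape)                     # (10, 3)
print(stretched.strides)                   # (0, 8): axis 0 never advances
print(np.array_equal(m + row, m + stretched))
```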

Can we do the same with `arr_1d_2`?

In [59]:
arr_2d + arr_1d_2

ValueError: operands could not be broadcast together with shapes (10,3) (10,) 

We need to change the shape of `arr_1d_2` first:

In [60]:
arr_2d + arr_1d_2.reshape((10,1))

array([[12,  9, 15],
       [ 5, 11,  5],
       [ 3,  8,  0],
       [ 8, 10,  5],
       [ 5, 11,  6],
       [13, 12,  9],
       [11, 15, 16],
       [16, 16,  8],
       [ 9, 11,  9],
       [ 9, 11, 14]])

Alternatively, we can do:

In [61]:
np.expand_dims(arr_1d_2, axis=1)

array([[8],
       [5],
       [0],
       [5],
       [2],
       [6],
       [9],
       [8],
       [5],
       [8]])

In [64]:
arr_2d

array([[4, 1, 7],
       [0, 6, 0],
       [3, 8, 0],
       [3, 5, 0],
       [3, 9, 4],
       [7, 6, 3],
       [2, 6, 7],
       [8, 8, 0],
       [4, 6, 4],
       [1, 3, 6]])

In [65]:
arr_1d_2

array([8, 5, 0, 5, 2, 6, 9, 8, 5, 8])

In [63]:
arr_2d + np.expand_dims(arr_1d_2, axis=1)

array([[12,  9, 15],
       [ 5, 11,  5],
       [ 3,  8,  0],
       [ 8, 10,  5],
       [ 5, 11,  6],
       [13, 12,  9],
       [11, 15, 16],
       [16, 16,  8],
       [ 9, 11,  9],
       [ 9, 11, 14]])

It seems `arr_1d_2` was "replicated" in the same way as `arr_1d_1`, but along a different axis:

In [None]:
(arr_2d + np.expand_dims(arr_1d_2, axis=1)) - arr_2d
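There are several equivalent spellings for adding a trailing unit axis. The sketch below uses a hypothetical vector `col` and checks that all variants produce the same shape and are views of the same data:

```python
import numpy as np

col = np.arange(10)

variants = [
    col.reshape((10, 1)),
    col.reshape(-1, 1),            # -1 lets NumPy infer the length
    np.expand_dims(col, axis=1),
    col[:, np.newaxis],
    col[:, None],                  # np.newaxis is an alias for None
]

shapes = [v.shape for v in variants]
all_views = all(np.shares_memory(col, v) for v in variants)

print(shapes, all_views)
```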

To reveal the pattern, let's try a `3D` array:

In [None]:
arr_3d = np.random.randint(10, size=(7, 10, 3))

In [None]:
arr_1d_1.shape

In [None]:
arr_3d

In [None]:
arr_3d + arr_1d_1

In [None]:
(arr_3d + arr_1d_1) - arr_3d

In [None]:
arr_3d.shape, arr_1d_1.shape

Can we do the same with `arr_1d_2`?

In [None]:
arr_3d + arr_1d_2

In [None]:
arr_3d.shape, arr_1d_2.shape, np.expand_dims(arr_1d_2, axis=1).shape

In [None]:
(arr_3d + np.expand_dims(arr_1d_2, axis=1)) - arr_3d

Broadcasting rules:

- All input arrays whose `ndim` is smaller than that of the input array with the largest `ndim` have 1’s **prepended** to their shapes.
- The size in each dimension of the output shape is the **maximum** of all the input sizes in that dimension.
- An input can be used in the calculation if its size in a particular dimension either **matches** the output size in that dimension, or **is exactly 1**.
- If an input has a dimension of size 1 in its shape, the first data entry in that dimension will be used for all calculations along that dimension. In other words, the stepping machinery of a `ufunc` will simply not step along that dimension (stride will be 0 for that dimension).
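The rules above can be sketched in pure Python. `broadcast_shape` is a hypothetical helper written for this illustration, not a NumPy function; its result is compared against `np.broadcast(...).shape`. Instead of taking the maximum, it picks the single non-1 size in each dimension, which gives the same result for non-empty dimensions.

```python
import numpy as np

def broadcast_shape(*shapes):
    """Output shape for the given input shapes, following the rules above."""
    ndim = max(len(s) for s in shapes)
    # Rule 1: prepend 1's to the shorter shapes
    padded = [(1,) * (ndim - len(s)) + tuple(s) for s in shapes]
    result = []
    for sizes in zip(*padded):
        # Rules 2 and 3: every size must be 1 or match the one non-1 size
        non_one = {n for n in sizes if n != 1}
        if len(non_one) > 1:
            raise ValueError(f"shapes {shapes} cannot be broadcast together")
        result.append(non_one.pop() if non_one else 1)
    return tuple(result)

ok_2d = broadcast_shape((10, 3), (3,))
ok_3d = broadcast_shape((7, 10, 3), (10, 1))
reference = np.broadcast(np.empty((7, 10, 3)), np.empty((10, 1))).shape

print(ok_2d, ok_3d, reference)
```

Calling `broadcast_shape((10, 3), (10,))` raises `ValueError`, matching the failed addition of `arr_2d` and `arr_1d_2`.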

### How broadcasting really works

What happens when we add a unit dimension somewhere?

In [68]:
arr_1d_1[np.newaxis, :].shape

(1, 3)

In [None]:
arr_1d_1[np.newaxis, :].strides

`strides[0]` is `0`, which means we can use dimension `0` of `arr_1d_1[np.newaxis, :]` in any (underlying, C) loop with any number of iterations. Let's emulate this in pure Python:

In [69]:
arr_2d

array([[4, 1, 7],
       [0, 6, 0],
       [3, 8, 0],
       [3, 5, 0],
       [3, 9, 4],
       [7, 6, 3],
       [2, 6, 7],
       [8, 8, 0],
       [4, 6, 4],
       [1, 3, 6]])

In [None]:
# arr_1d_1 with a prepended unit axis; its stride along axis 0 is 0
arr_1d_1_bc = arr_1d_1[np.newaxis, :]

for i in range(arr_2d.shape[0]):
    print(f"Adding elements of row {i}")

    for j in range(arr_2d.shape[1]):
        # Byte offsets relative to the first element of each array
        arr_2d_address = arr_2d.strides[1] * j + arr_2d.strides[0] * i
        arr_1d_address = arr_1d_1_bc.strides[1] * j + arr_1d_1_bc.strides[0] * i

        print(f"\tarr_2d address: {arr_2d_address}")
        print(f"\tarr_1d_1_bc address: {arr_1d_address}")
    print("-" * 80)
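The loop above only prints byte offsets. As a sketch, the same offsets can be used to actually read the operands and perform the addition. The arrays `a` and `b` are hypothetical, and the stride of the broadcast axis is set to 0 by hand (`b_strides`) so the example does not depend on what a particular NumPy version reports for a unit axis.

```python
import numpy as np

a = np.arange(12, dtype=np.int32).reshape(4, 3)
b = np.array([10, 20, 30], dtype=np.int32)

# Assumed strides of b viewed as shape (1, 3): stride 0 along the unit axis
b_strides = (0, b.strides[0])

a_flat = a.ravel()   # flat view of a's contiguous data
b_flat = b.ravel()

out = np.empty_like(a)
for i in range(a.shape[0]):
    for j in range(a.shape[1]):
        # Convert byte offsets to element positions in the flat data
        a_pos = (a.strides[0] * i + a.strides[1] * j) // a.itemsize
        b_pos = (b_strides[0] * i + b_strides[1] * j) // b.itemsize
        out[i, j] = a_flat[a_pos] + b_flat[b_pos]

print(np.array_equal(out, a + b))
```

Because the stride along axis 0 is 0, `b_pos` depends only on `j`: every row of `a` is paired with the same three elements of `b`.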