# NumPy basics

## Import convention

In [1]:
import numpy as np
np.random.seed(657743)

In [2]:
%pylab inline
plt.style.use("bmh")

Populating the interactive namespace from numpy and matplotlib


## Creating arrays from Python sequences

The core data structure in NumPy is the `ndarray`:

In [3]:
?np.ndarray

Creating a $2\times 2\times 5$ array of floating point numbers (note the garbage in the resulting array: the memory is allocated but not initialized):

In [8]:
np.ndarray((2,2,5), dtype=float)

array([[[9.72500563e-312, 6.27463370e-322, 0.00000000e+000,
         0.00000000e+000, 1.89146896e-307],
        [5.30276956e+180, 1.02941932e-071, 4.57222660e-071,
         5.06793390e-086, 3.35960335e-143]],

       [[6.01433264e+175, 6.93885958e+218, 5.56218858e+180,
         3.94356143e+180, 9.77382138e+165],
        [3.99840503e+175, 2.31655091e-056, 4.27933834e-033,
         5.25322566e-144, 1.50008929e+248]]])
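
Calling the `np.ndarray` constructor directly is rarely needed. A minimal sketch of the more common way to request uninitialized memory, `np.empty`, follows; the name `buf` is only illustrative:

```python
import numpy as np

# np.empty allocates memory without initializing it, so the contents
# are arbitrary until they are written.
buf = np.empty((2, 2, 5), dtype=float)
print(buf.shape, buf.dtype)

# Fill the buffer before reading from it.
buf[...] = 0.0
print(buf.sum())
```

If defined initial values are needed from the start, `np.zeros` or `np.ones` are usually the safer choice.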

By default, `float` is a 64-bit floating point number:

In [10]:
np.ndarray((2,2)).dtype

dtype('float64')

Calling `np.ndarray` directly works, but is not very convenient. A more convenient high-level option is

In [11]:
?np.array

In [14]:
arr = np.array([[7, 2, 3.], [3, 9, 6]])

In [25]:
arr=np.concatenate([arr,arr])

In [26]:
arr

array([[7., 2., 3.],
       [3., 9., 6.],
       [7., 2., 3.],
       [3., 9., 6.]])

Each array has known `shape`, `size` and `ndim`:

In [21]:
arr.shape[0]

4

In [27]:
print("Array shape is", arr.shape)
print("Array size is", arr.size)
print(f"Array has {arr.ndim} dimensions")

Array shape is (4, 3)
Array size is 12
Array has 2 dimensions


And `dtype`, `itemsize` and `nbytes`:

In [28]:
print("Array dtype is", arr.dtype)
print(f"Each item takes {arr.itemsize} bytes")
print(f"Array takes {arr.nbytes} bytes")

Array dtype is float64
Each item takes 8 bytes
Array takes 96 bytes
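
The three memory attributes are related by `nbytes == size * itemsize`. A small sketch on a fresh array (the names `a` and `a32` are illustrative, not part of the notebook above):

```python
import numpy as np

a = np.array([[7, 2, 3.], [3, 9, 6]])

# nbytes is the element count times the size of one element.
print(a.nbytes == a.size * a.itemsize)

# The same values stored as float32 take half the memory.
a32 = a.astype(np.float32)
print(a32.itemsize, a32.nbytes)
```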


## Creating arrays of special shape and/or type

Array with specific `shape` and `dtype`, filled with `0`'s:

In [29]:
zeros_array = np.zeros((2,6), dtype=bool)
zeros_array

array([[False, False, False, False, False, False],
       [False, False, False, False, False, False]])

Array of `0`'s with the same shape as `arr`, but of different `dtype`:

In [35]:
zeros_like_array = np.zeros_like(arr, dtype=np.complex128)
zeros_like_array

array([[0.+0.j, 0.+0.j, 0.+0.j],
       [0.+0.j, 0.+0.j, 0.+0.j],
       [0.+0.j, 0.+0.j, 0.+0.j],
       [0.+0.j, 0.+0.j, 0.+0.j]])

Array with specific `shape` and `dtype`, filled with `1`'s:

In [36]:
ones_array = np.ones((3,9), dtype=np.float32)
ones_array

array([[1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1.]], dtype=float32)

Array of `1`'s with the same shape as `zeros_array`, and of different `dtype`:

In [37]:
zeros_array

array([[False, False, False, False, False, False],
       [False, False, False, False, False, False]])

In [38]:
ones_like_array = np.ones_like(zeros_array, dtype=np.float32)
ones_like_array

array([[1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1.]], dtype=float32)

Range arrays are very common for indexing and as an array counterpart of the built-in `range`.

The simplest form is `np.arange(n)`: **start** at the default `0`, **increment** by the default `1`, **end** at `n` (exclusive):

In [40]:
range_array = np.arange(10)
range_array*10

array([ 0, 10, 20, 30, 40, 50, 60, 70, 80, 90])

Or you can specify both **starting** (inclusive) and **ending** points (exclusive):

In [41]:
range_array = np.arange(-5, 5)
range_array

array([-5, -4, -3, -2, -1,  0,  1,  2,  3,  4])

Or all three:

In [49]:
range_array = np.arange(0, 5, .3)
range_array

array([0. , 0.3, 0.6, 0.9, 1.2, 1.5, 1.8, 2.1, 2.4, 2.7, 3. , 3.3, 3.6,
       3.9, 4.2, 4.5, 4.8])

Negative increment (or *step*) works as usual, but beware of bounds ordering:

In [45]:
range_array = np.arange(0, 10, -2)

In [46]:
range_array

array([], dtype=int32)

In [47]:
range_array = np.arange(10, 0, -2)
range_array

array([10,  8,  6,  4,  2])

`np.arange` is not limited to integers (hence, it is a generalization of `range`):

In [50]:
range_float_array = np.arange(-0.5, 5., 0.5)
range_float_array

array([-0.5,  0. ,  0.5,  1. ,  1.5,  2. ,  2.5,  3. ,  3.5,  4. ,  4.5])
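
With a non-integer step, the number of elements returned by `np.arange` depends on floating point rounding. A hedged sketch of the usual alternative, `np.linspace`, which takes the number of points instead of a step (the names `points` and `steps` are illustrative):

```python
import numpy as np

# np.linspace takes the number of points and includes the end point
# by default.
points = np.linspace(-0.5, 4.5, 11)
print(points)

# The same grid built with a float step; here the step is exactly
# representable, so both approaches agree.
steps = np.arange(-0.5, 5.0, 0.5)
print(np.allclose(points, steps))
```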

## Basic indexing of numpy arrays

Integer and slicing notations:

In [52]:
range_float_array

array([-0.5,  0. ,  0.5,  1. ,  1.5,  2. ,  2.5,  3. ,  3.5,  4. ,  4.5])

Get the second element (indexing is zero-based, so index `1` is the second item):

In [53]:
range_float_array[1]

0.0

Get a slice (right index is not included):

In [56]:
range_float_array[1:7]

array([0. , 0.5, 1. , 1.5, 2. , 2.5])

Get a slice from the start (the left index can be omitted):

In [57]:
range_float_array[:5]

array([-0.5,  0. ,  0.5,  1. ,  1.5])

Get a slice with negative indices:

In [58]:
range_float_array[-5:-2]

array([2.5, 3. , 3.5])

Indexing 2D arrays:

In [59]:
arr

array([[7., 2., 3.],
       [3., 9., 6.],
       [7., 2., 3.],
       [3., 9., 6.]])

In [60]:
arr[:1, 1:]

array([[2., 3.]])

In [63]:
arr[0, 1:]

array([2., 3.])

In [65]:
arr[0, ::2]

array([7., 3.])

Generally, basic indexing works much like indexing of ordinary Python lists, but across several dimensions at once.
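
To make the comparison with lists concrete, here is a small sketch; `nested` and `grid` are illustrative names, not objects from the cells above:

```python
import numpy as np

nested = [[7.0, 2.0, 3.0], [3.0, 9.0, 6.0]]
grid = np.array(nested)

# A list of lists needs one pair of brackets per nesting level...
print(nested[0][1:])

# ...while an array takes all indices in a single pair of brackets.
print(grid[0, 1:])

# Slicing out a column has no direct list equivalent.
print(grid[:, 1])
```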

## Boolean and fancy indexing

In [66]:
random_array = np.random.randn(10)
random_array

array([-1.11204182,  0.46910808, -0.68408549,  0.69242926,  1.15223867,
        1.17339163, -0.11879617,  0.06423215, -1.2978598 , -0.64957899])

Most operations (arithmetic, logical, etc.) are vectorized for NumPy arrays, so explicit loops are rarely needed. For example, the comparison `>` is a **vectorized** operation and produces a boolean **mask**:

In [67]:
random_array>0

array([False,  True, False,  True,  True,  True, False,  True, False,
       False])

Boolean masks can be used for indexing (including logical operations on the masks themselves, as they are vectorized as well!):

In [69]:
random_array[random_array>0]

array([0.46910808, 0.69242926, 1.15223867, 1.17339163, 0.06423215])

In [73]:
random_array[((random_array>0) | (random_array<-1))]

array([-1.11204182,  0.46910808,  0.69242926,  1.15223867,  1.17339163,
        0.06423215, -1.2978598 ])

In [74]:
random_array>0

array([False,  True, False,  True,  True,  True, False,  True, False,
       False])

In [75]:
random_array<-1

array([ True, False, False, False, False, False, False, False,  True,
       False])

In [76]:
random_array

array([-1.11204182,  0.46910808, -0.68408549,  0.69242926,  1.15223867,
        1.17339163, -0.11879617,  0.06423215, -1.2978598 , -0.64957899])

In [77]:
random_array[(random_array>0) & (random_array<1)]

array([0.46910808, 0.69242926, 0.06423215])
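
The parentheses around each comparison above are required because `&` and `|` bind more tightly than `>` and `<`. A short sketch on a fixed array (`values`, `inside` and `same` are illustrative names):

```python
import numpy as np

values = np.array([-1.5, 0.5, -0.7, 0.7, 1.2])

# Each comparison is parenthesized before combining with &.
inside = (values > 0) & (values < 1)
print(values[inside])

# ~ negates a mask element-wise.
print(values[~inside])

# np.logical_and is the function form of &.
same = np.logical_and(values > 0, values < 1)
print(np.array_equal(inside, same))
```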

As an alternative to boolean masks, fancy indexing selects elements with arrays of integer indices:

In [78]:
np.where(random_array>0)

(array([1, 3, 4, 5, 7], dtype=int64),)

In [79]:
ix0, = np.where(random_array>0)

In [80]:
ix0

array([1, 3, 4, 5, 7], dtype=int64)

In [81]:
random_array[random_array>0]

array([0.46910808, 0.69242926, 1.15223867, 1.17339163, 0.06423215])

In [82]:
random_array[ix0]

array([0.46910808, 0.69242926, 1.15223867, 1.17339163, 0.06423215])
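
Index arrays can do things a mask cannot, such as repeating or reordering elements. A minimal sketch with illustrative names (`values`, `picked`, `mask`):

```python
import numpy as np

values = np.array([10, 20, 30, 40, 50])

# An index array may repeat and reorder positions.
picked = values[[4, 0, 0, 2]]
print(picked)

# np.flatnonzero(mask) gives the same indices as np.where(mask)[0].
mask = values > 25
print(np.flatnonzero(mask))
```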

In [83]:
random_array = np.random.randn(3, 4)
random_array

array([[ 1.26308259,  1.10781119, -0.96620663,  0.7825883 ],
       [ 1.22047832,  1.28593266, -0.12777788,  0.36335958],
       [ 0.3168423 , -1.32085224, -0.91976381, -0.11364747]])

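You can use other iterables (e.g. lists) as indexers, but note the difference: a tuple of integers such as `0, 2` addresses a single element, whereas a list `[0, 2]` would select rows `0` and `2`: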

In [85]:
random_array[0,2]

-0.9662066290103926

In [86]:
random_array[0, 2]

-0.9662066290103926
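
The difference between tuple and list indexers can be sketched on a small array where the values are easy to follow (`table` is an illustrative name):

```python
import numpy as np

table = np.arange(12).reshape(3, 4)

# A tuple of integers addresses one element: row 0, column 2.
print(table[0, 2])

# A list is an index array along the first axis: rows 0 and 2.
print(table[[0, 2]])

# One index array per axis picks elements pairwise: (0, 1) and (2, 3).
print(table[[0, 2], [1, 3]])
```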

## View vs. copy

We'll look at this in more detail later, but we can already test one of the main sources of bugs in numerical code:

In [87]:
arr

array([[7., 2., 3.],
       [3., 9., 6.],
       [7., 2., 3.],
       [3., 9., 6.]])

Creating a **view** and a **copy**:

In [88]:
arr_view = arr[:, :]
arr_copy = arr.copy()

In [89]:
arr_view

array([[7., 2., 3.],
       [3., 9., 6.],
       [7., 2., 3.],
       [3., 9., 6.]])

In [90]:
arr_copy

array([[7., 2., 3.],
       [3., 9., 6.],
       [7., 2., 3.],
       [3., 9., 6.]])

Changing elements through a view (the slice assignment writes into the buffer shared with `arr`, so `arr` changes as well):

In [91]:
arr_view[1:, 1:] = 41

In [92]:
arr_new = arr_view[1:, 1:]

In [112]:
arr_view

array([[ 7.,  2.,  3.],
       [ 3., 41., 41.],
       [ 7., 41., 41.],
       [ 3., 41., 41.]])

In [115]:
arr_view[:,np.arange(2,-1,-1)]

array([[ 3.,  2.,  7.],
       [41., 41.,  3.],
       [41., 41.,  7.],
       [41., 41.,  3.]])

In [94]:
arr

array([[ 7.,  2.,  3.],
       [ 3., 41., 41.],
       [ 7., 41., 41.],
       [ 3., 41., 41.]])

Changing elements in a copy (`arr` is not affected):

In [95]:
arr_copy[1:, 1:] = 32

In [96]:
arr_copy

array([[ 7.,  2.,  3.],
       [ 3., 32., 32.],
       [ 7., 32., 32.],
       [ 3., 32., 32.]])

In [97]:
arr

array([[ 7.,  2.,  3.],
       [ 3., 41., 41.],
       [ 7., 41., 41.],
       [ 3., 41., 41.]])

How to check whether an array is a view of another array:

In [98]:
arr_view.base is arr

True

In [102]:
print(arr_view.base)

[[ 7.  2.  3.]
 [ 3. 41. 41.]
 [ 7. 41. 41.]
 [ 3. 41. 41.]]
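
Besides the `base` attribute, `np.shares_memory` can be used for the same check. The sketch below also shows that basic slicing gives a view while fancy indexing gives a copy; the names `base`, `sliced` and `picked` are illustrative:

```python
import numpy as np

base = np.arange(6.0).reshape(2, 3)

sliced = base[:, 1:]        # basic slicing: a view
picked = base[:, [1, 2]]    # fancy indexing: a copy

print(np.shares_memory(base, sliced))
print(np.shares_memory(base, picked))

# Writing through the view changes the original; writing to the copy does not.
sliced[0, 0] = -1.0
picked[0, 1] = 99.0
print(base)
```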


## Changing array shape

In [116]:
arr

array([[ 7.,  2.,  3.],
       [ 3., 41., 41.],
       [ 7., 41., 41.],
       [ 3., 41., 41.]])

We can simply reshape it:

In [118]:
arr.reshape((12,1))

array([[ 7.],
       [ 2.],
       [ 3.],
       [ 3.],
       [41.],
       [41.],
       [ 7.],
       [41.],
       [41.],
       [ 3.],
       [41.],
       [41.]])

In [120]:
arr.reshape((12,))

array([ 7.,  2.,  3.,  3., 41., 41.,  7., 41., 41.,  3., 41., 41.])

In [121]:
arr.reshape((2,1,6))

array([[[ 7.,  2.,  3.,  3., 41., 41.]],

       [[ 7., 41., 41.,  3., 41., 41.]]])
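
One dimension of the new shape can be given as `-1`, in which case NumPy infers it from the total size. A short sketch (`data` is an illustrative name):

```python
import numpy as np

data = np.arange(12)

# -1 lets NumPy infer that dimension from the total number of elements.
print(data.reshape(3, -1).shape)
print(data.reshape(-1, 2, 3).shape)

# A size that does not divide evenly raises ValueError.
try:
    data.reshape(5, -1)
except ValueError as err:
    print("cannot reshape:", err)
```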

Or expand dimensions (adding dimensions of shape `1`):

In [122]:
np.expand_dims(arr, axis=1)

array([[[ 7.,  2.,  3.]],

       [[ 3., 41., 41.]],

       [[ 7., 41., 41.]],

       [[ 3., 41., 41.]]])

Or transpose it:

In [123]:
arr

array([[ 7.,  2.,  3.],
       [ 3., 41., 41.],
       [ 7., 41., 41.],
       [ 3., 41., 41.]])

In [124]:
arr.T

array([[ 7.,  3.,  7.,  3.],
       [ 2., 41., 41., 41.],
       [ 3., 41., 41., 41.]])

In [127]:
arr.T.shape

(3, 4)

By default, transposing an array with three or more dimensions reverses the order of its axes:

In [130]:
np.expand_dims(arr, axis=-1).T.shape, np.expand_dims(arr, axis=-1).shape,np.expand_dims(arr,axis=0).shape

((1, 3, 4), (4, 3, 1), (1, 4, 3))

In [131]:
np.expand_dims(arr, axis=(-1, -2)).T.shape, np.expand_dims(arr, axis=(-1, -2)).shape

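TypeError: '>' not supported between instances of 'tuple' and 'int'

The error comes from an older NumPy release: passing a tuple of axes to `np.expand_dims` is supported starting with NumPy 1.18.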

Generic transpose of an array (an arbitrary permutation of axes):

In [132]:
arr_t = np.transpose(np.expand_dims(arr, axis=-1), axes=(1,2,0))

In [133]:
arr_t.shape

(3, 1, 4)

Note that `arr_t` (and other arrays created with similar operations) is a **view** into the original array:

In [134]:
arr_t.base is arr

True
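
How the `axes` argument maps old axes to new ones can be sketched on an array whose dimensions all differ (`cube` and `moved` are illustrative names):

```python
import numpy as np

cube = np.zeros((4, 3, 1))

# .T reverses the axis order.
print(cube.T.shape)

# axes=(1, 2, 0): new axis i is old axis axes[i].
moved = np.transpose(cube, axes=(1, 2, 0))
print(moved.shape)

# No data is copied, so the result shares memory with the input.
print(np.shares_memory(cube, moved))
```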

## Changing array type

It is very simple in general:

In [136]:
arr.dtype

dtype('float64')

In [137]:
arr>2

array([[ True, False,  True],
       [ True,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True]])

In [142]:
(arr>2).astype(np.float64)

array([[1., 0., 1.],
       [1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

In [143]:
arr

array([[ 7.,  2.,  3.],
       [ 3., 41., 41.],
       [ 7., 41., 41.],
       [ 3., 41., 41.]])

There are some peculiarities, though:

In [144]:
arr[1,2] = -67

In [145]:
arr

array([[  7.,   2.,   3.],
       [  3.,  41., -67.],
       [  7.,  41.,  41.],
       [  3.,  41.,  41.]])

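Note how `-67` turns into `189` when cast to `uint8` (`189 + 67 = 256`, the number of distinct values a `uint8` can hold, so negative integers wrap around modulo 256; see also [Integer numbers storage in computer memory](https://medium.com/@luischaparroc/integer-numbers-storage-in-computer-memory-47af4b59009)). Casts to complex or to other floating point types keep the value: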

In [151]:
arr.astype(complex)

array([[  7.+0.j,   2.+0.j,   3.+0.j],
       [  3.+0.j,  41.+0.j, -67.+0.j],
       [  7.+0.j,  41.+0.j,  41.+0.j],
       [  3.+0.j,  41.+0.j,  41.+0.j]])

In [152]:
arr.astype(np.float32)

array([[  7.,   2.,   3.],
       [  3.,  41., -67.],
       [  7.,  41.,  41.],
       [  3.,  41.,  41.]], dtype=float32)

In [156]:
arr.astype(np.complex128)+17.j

array([[  7.+17.j,   2.+17.j,   3.+17.j],
       [  3.+17.j,  41.+17.j, -67.+17.j],
       [  7.+17.j,  41.+17.j,  41.+17.j],
       [  3.+17.j,  41.+17.j,  41.+17.j]])
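
The wraparound for unsigned integers can be reproduced with a small integer array. This is a sketch with an illustrative array `signed`; it starts from integers because casting negative floats to unsigned types may behave differently across platforms:

```python
import numpy as np

signed = np.array([-67, -1, 0, 200], dtype=np.int64)

# astype between integer types wraps around modulo 2**8 for uint8.
wrapped = signed.astype(np.uint8)
print(wrapped)

# np.iinfo reports the representable range of an integer dtype.
info = np.iinfo(np.uint8)
print(info.min, info.max)
```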

## Stacking arrays

In [157]:
arr_1 = np.random.randint(10, size=(10,))
arr_2 = np.random.randint(10, size=(10,))

In [161]:
arr_1, arr_2

(array([9, 8, 2, 2, 1, 5, 3, 4, 2, 8]), array([0, 7, 6, 0, 2, 6, 6, 7, 0, 3]))

Stacking arrays vertically:

In [159]:
arr_1.reshape((2,5))

array([[9, 8, 2, 2, 1],
       [5, 3, 4, 2, 8]])

In [160]:
arr_2.reshape((2,5))

array([[0, 7, 6, 0, 2],
       [6, 6, 7, 0, 3]])

In [162]:
np.vstack([arr_1.reshape((2,5)), arr_2.reshape((2,5))])

array([[9, 8, 2, 2, 1],
       [5, 3, 4, 2, 8],
       [0, 7, 6, 0, 2],
       [6, 6, 7, 0, 3]])

Stacking arrays horizontally:

In [163]:
np.hstack([arr_1, arr_2])

array([9, 8, 2, 2, 1, 5, 3, 4, 2, 8, 0, 7, 6, 0, 2, 6, 6, 7, 0, 3])

Stacking along additional dimension:

In [164]:
np.hstack([np.expand_dims(arr_1, 1), np.expand_dims(arr_2, 1)])

array([[9, 0],
       [8, 7],
       [2, 6],
       [2, 0],
       [1, 2],
       [5, 6],
       [3, 6],
       [4, 7],
       [2, 0],
       [8, 3]])

Stacking `1D` arrays:

In [168]:
np.vstack([arr_1, arr_2])

array([[9, 8, 2, 2, 1, 5, 3, 4, 2, 8],
       [0, 7, 6, 0, 2, 6, 6, 7, 0, 3]])

In [169]:
np.vstack([arr_1, arr_2]).T

array([[9, 0],
       [8, 7],
       [2, 6],
       [2, 0],
       [1, 2],
       [5, 6],
       [3, 6],
       [4, 7],
       [2, 0],
       [8, 3]])

All of these cost about the same: transposing and expanding dimensions only create views, while `np.vstack` and `np.hstack` each copy the data once into a new array. See also `np.dstack` and `np.column_stack`.
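
A more general alternative is `np.stack`, which joins arrays along a new axis. The sketch below uses illustrative arrays `a` and `b` rather than `arr_1` and `arr_2`:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# np.stack joins along a new axis chosen by `axis`.
rows = np.stack([a, b], axis=0)   # same result as np.vstack for 1D inputs
cols = np.stack([a, b], axis=1)   # same result as np.column_stack for 1D inputs
print(rows.shape, cols.shape)

print(np.array_equal(cols, np.column_stack([a, b])))

# Stacking allocates a new array, so the result is not a view of the inputs.
print(np.shares_memory(rows, a))
```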

## Universal functions

For a full list of universal functions, see [ufunc reference](https://docs.scipy.org/doc/numpy-1.15.1/reference/ufuncs.html).

In [176]:
arr_1

array([9, 8, 2, 2, 1, 5, 3, 4, 2, 8])

In [177]:
arr_2

array([0, 7, 6, 0, 2, 6, 6, 7, 0, 3])

In [178]:
arr

array([[  7.,   2.,   3.],
       [  3.,  41., -67.],
       [  7.,  41.,  41.],
       [  3.,  41.,  41.]])

Sum all elements:

In [179]:
arr.sum()

163.0

Sum elements along specific axis:

In [183]:
arr.sum(axis=0)

array([ 20., 125.,  18.])
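
Reductions like this are built on ufuncs: every binary ufunc has a `.reduce` method, and `sum` corresponds to `np.add.reduce`. A minimal sketch on an illustrative array `m`:

```python
import numpy as np

m = np.array([[7.0, 2.0], [3.0, 41.0]])

# Summing along axis 0 via the ufunc and via the array method.
print(np.add.reduce(m, axis=0))
print(m.sum(axis=0))

# The same pattern with another ufunc: column-wise maximum.
print(np.maximum.reduce(m, axis=0))
```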

In [184]:
arr

array([[  7.,   2.,   3.],
       [  3.,  41., -67.],
       [  7.,  41.,  41.],
       [  3.,  41.,  41.]])

Sum elements along a specific axis, but preserve the number of dimensions:

In [185]:
arr.sum(axis=1, keepdims=True)

array([[ 12.],
       [-23.],
       [ 89.],
       [ 85.]])
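
Keeping the reduced axis is mostly useful for broadcasting the result back against the original array. A sketch that normalizes each row by its sum (`counts`, `row_sums` and `fractions` are illustrative names):

```python
import numpy as np

counts = np.array([[1.0, 3.0], [2.0, 2.0], [5.0, 15.0]])

# With keepdims=True the row sums have shape (3, 1) and broadcast
# against the (3, 2) array, so each row is divided by its own sum.
row_sums = counts.sum(axis=1, keepdims=True)
fractions = counts / row_sums
print(fractions)
print(fractions.sum(axis=1))
```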

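`mean` accepts the same `axis` argument (strictly speaking, `sum` and `mean` are reductions rather than ufuncs; `sum` corresponds to `np.add.reduce`):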

In [None]:
arr.mean(axis=0)

In [186]:
arr_1, arr_2

(array([9, 8, 2, 2, 1, 5, 3, 4, 2, 8]), array([0, 7, 6, 0, 2, 6, 6, 7, 0, 3]))

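Using masking with the `where` argument (without `out`, positions where the mask is `False` are left uninitialized, hence the garbage values in the masked result below):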

In [187]:
arr_1 + arr_2

array([ 9, 15,  8,  2,  3, 11,  9, 11,  2, 11])

In [188]:
np.add(arr_1, arr_2, where=(arr_2<6))

array([         9, 1599226198, 1230196560,          2,          3,
       1509949440, 1380013579, 1497713503,          2,         11])

In [191]:
np.add(arr_1, arr_2, where=(arr_2<6), out=arr_1)-arr_1

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
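
To get well-defined values at the masked-out positions, pass an `out` array that already holds the desired defaults. A minimal sketch with illustrative arrays `x`, `y` and `result`:

```python
import numpy as np

x = np.array([9, 8, 2, 2])
y = np.array([1, 7, 6, 3])

# Positions where the mask is False keep whatever `out` already holds,
# so start from a defined array (here a copy of x).
result = x.copy()
np.add(x, y, where=(y < 6), out=result)
print(result)
```

Using a copy as `out` also leaves `x` itself unchanged.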

In [None]:
arr_1

In [None]:
arr_2

In-place operations are straightforward: pass one of the inputs as `out`:

In [211]:
np.add(arr_1, arr_2, where=(arr_2<16), out=arr_2)

array([18, 23, 16, 16, 17, 16, 18, 19, 16, 17])

In [None]:
arr_2