# NumPy basics

## Import convention

In [2]:
import numpy as np
np.random.seed(657743)

In [3]:
%pylab inline
plt.style.use("bmh")

Populating the interactive namespace from numpy and matplotlib


## Creating arrays from Python sequences

The core data structure in NumPy is `np.ndarray`:

In [14]:
?np.ndarray

Calling the constructor directly creates a $2\times 2$ array of floating-point numbers without initializing its memory (note the garbage values in the resulting array):

In [7]:
np.ndarray((2,2), dtype=float)

array([[2.12199579e-314, 6.01346930e-154],
       [4.76279283e-321, 1.08433985e-311]])

By default, `float` is a 64-bit floating-point number:

In [8]:
np.ndarray((2,2), dtype=float).dtype

dtype('float64')

This works, but it is not very convenient. A more convenient high-level option is `np.array`:

In [9]:
?np.array

In [17]:
arr = np.array([[7, 2, 3.], [3, 9, 6]])
arr

array([[7., 2., 3.],
       [3., 9., 6.]])

In [12]:
type(arr)

numpy.ndarray

Each array has a known `shape`, `size` and `ndim`:

In [15]:
print("Array shape is", arr.shape)
print("Array size is", arr.size)
print(f"Array has {arr.ndim} dimensions")

Array shape is (2, 3)
Array size is 6
Array has 2 dimensions


And `dtype`, `itemsize` and `nbytes`:

In [16]:
print("Array dtype is", arr.dtype)
print(f"Each item takes {arr.itemsize} bytes")
print(f"Array takes {arr.nbytes} bytes")

Array dtype is float64
Each item takes 8 bytes
Array takes 48 bytes
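
These attributes are tied together: `size` is the product of the entries of `shape`, and `nbytes` is `size * itemsize`. A minimal check of that relationship, rebuilding the same `arr` as above (the name `arr32` is only used for this illustration):

```python
import numpy as np

arr = np.array([[7, 2, 3.], [3, 9, 6]])

# size is the product of the shape entries
assert arr.size == arr.shape[0] * arr.shape[1]
# nbytes is the number of elements times the bytes per element
assert arr.nbytes == arr.size * arr.itemsize

# A smaller dtype shrinks the memory footprint accordingly
arr32 = arr.astype(np.float32)
print(arr32.itemsize, arr32.nbytes)  # 4 24
```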


## Creating arrays of special shape and/or type

Array with specific `shape` and `dtype`, filled with `0`'s:

In [17]:
zeros_array = np.zeros((2,6), dtype=bool)
zeros_array

array([[False, False, False, False, False, False],
       [False, False, False, False, False, False]])

In [None]:
arr

Array of `0`'s with the same shape as `arr`, but of different `dtype`:

In [None]:
zeros_like_array = np.zeros_like(arr, dtype=np.complex128)
zeros_like_array

Array with specific `shape` and `dtype`, filled with `1`'s:

In [25]:
ones_array = np.ones((3,9), dtype=np.float32)
ones_array

array([[1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1.]], dtype=float32)

Array of `1`'s with the same shape as `zeros_array`, and of different `dtype`:

In [18]:
zeros_array

array([[False, False, False, False, False, False],
       [False, False, False, False, False, False]])

In [None]:
ones_like_array = np.ones_like(zeros_array, dtype=np.float32)
ones_like_array
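
The `*_like` functions take the shape from the template array and, unless `dtype` is passed, its type as well. A small sketch of that behavior, where `template`, `same_dtype` and `new_dtype` are names made up for this example:

```python
import numpy as np

template = np.zeros((2, 6), dtype=bool)

same_dtype = np.ones_like(template)                    # inherits dtype bool
new_dtype = np.ones_like(template, dtype=np.float32)   # overrides the dtype

print(same_dtype.dtype, new_dtype.dtype)   # bool float32
print(new_dtype.shape)                     # (2, 6)
```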

Range arrays are very common for indexing and as a drop-in replacement for the built-in `range`.

The simplest form is `np.arange(n)`: **start** at the default `0`, **increment** by the default `1`, **end** at `n` (exclusive):

In [19]:
range_array = np.arange(10)
range_array

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Or you can specify both **starting** (inclusive) and **ending** points (exclusive):

In [20]:
range_array = np.arange(-5, 5)
range_array

array([-5, -4, -3, -2, -1,  0,  1,  2,  3,  4])

Or all three:

In [21]:
range_array = np.arange(0, 5, 2)
range_array

array([0, 2, 4])

A negative increment (or *step*) works as usual, but beware of the ordering of the bounds: if **start** is smaller than **end**, the result is empty:

In [22]:
range_array = np.arange(0, 10, -2)

In [None]:
range_array

In [23]:
range_array = np.arange(10, 0, -2)
range_array

array([10,  8,  6,  4,  2])
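
A quick check of the bounds-ordering pitfall: with a negative step and a start below the end, `np.arange` returns an empty array, just as the built-in `range` yields nothing. The variable names here are illustrative only:

```python
import numpy as np

empty_range = np.arange(0, 10, -2)   # start < end with a negative step
print(empty_range.size)              # 0
print(list(range(0, 10, -2)))        # []

descending = np.arange(10, 0, -2)    # start > end: counts down as expected
print(descending)                    # [10  8  6  4  2]
```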

`np.arange` is not limited to integers, which makes it a generalization of `range`:

In [9]:
range_float_array = np.arange(-0.5, 5., 0.5)
range_float_array

array([-0.5,  0. ,  0.5,  1. ,  1.5,  2. ,  2.5,  3. ,  3.5,  4. ,  4.5])
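
With non-integer steps, the number of elements produced by `np.arange` depends on floating-point rounding, so `np.linspace`, which takes the number of points instead of the step, is often a safer choice. The sketch below reproduces the same grid; the name `grid` is an assumption of this example:

```python
import numpy as np

# Same grid as np.arange(-0.5, 5., 0.5), specified by the number of points
grid = np.linspace(-0.5, 4.5, 11)
print(grid)

# With linspace the endpoint is included by default and the length is explicit
assert grid.shape == (11,)
assert np.allclose(grid, np.arange(-0.5, 5., 0.5))
```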

## Basic indexing of numpy arrays

Integer and slicing notations:

In [10]:
range_float_array

array([-0.5,  0. ,  0.5,  1. ,  1.5,  2. ,  2.5,  3. ,  3.5,  4. ,  4.5])

Get the second element (indices start at `0`, so `[1]` is not the first element):

In [11]:
range_float_array[1]

0.0

Get a slice (right index is not included):

In [12]:
range_float_array[1:2]

array([0.])

Get a slice:

In [13]:
range_float_array[:5]

array([-0.5,  0. ,  0.5,  1. ,  1.5])

Get a slice with negative indices:

In [None]:
range_float_array[-5:-2]

Indexing 2D arrays:

In [None]:
arr

In [None]:
arr[:1, 1:]

In [None]:
arr[0, 1:]

In [None]:
arr[0, ::2]

Generally, basic indexing works much like indexing ordinary Python lists, but it extends to many dimensions.
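
As an illustration of the 2D cases above, the sketch below rebuilds the same `arr` and prints each result; the nested list `nested` is added only for comparison:

```python
import numpy as np

arr = np.array([[7, 2, 3.], [3, 9, 6]])

print(arr[:1, 1:])    # [[2. 3.]]  slices on both axes keep 2 dimensions
print(arr[0, 1:])     # [2. 3.]    an integer index drops that axis
print(arr[0, ::2])    # [7. 3.]    every second element of row 0

# A nested list needs chained indexing instead of a tuple of indices
nested = [[7, 2, 3.], [3, 9, 6]]
print(nested[0][1:])  # [2, 3.0]
```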

## Boolean and fancy indexing

In [46]:
random_array = np.random.randn(10)
random_array

array([ 1.18956295, -0.85700108,  0.53938533, -0.88600821,  1.75043633,
       -1.6240769 , -1.21640932,  1.12313518, -0.41146246, -0.97928363])

Most operations (arithmetic, logical, etc.) are vectorized for NumPy arrays, so we do not need loops at all. For example, `>` is a **vectorized** operation that creates a boolean **mask**:

In [47]:
random_array>0

array([ True, False,  True, False,  True, False, False,  True, False,
       False])

Boolean masks can be used for indexing, and masks can be combined with logical operations, as these are vectorized as well:

In [48]:
random_array[random_array>0]

array([1.18956295, 0.53938533, 1.75043633, 1.12313518])

In [36]:
random_array[(random_array>0) | (random_array<-1)]

array([ 1.18956295,  0.53938533,  1.75043633, -1.6240769 , -1.21640932,
        1.12313518])

In [37]:
random_array>0

array([ True, False,  True, False,  True, False, False,  True, False,
       False])

In [7]:
random_array<-1

array([False, False, False, False, False,  True,  True, False, False,
       False])

In [9]:
random_array

array([ 1.18956295, -0.85700108,  0.53938533, -0.88600821,  1.75043633,
       -1.6240769 , -1.21640932,  1.12313518, -0.41146246, -0.97928363])

In [50]:
random_array[(random_array>0) & (random_array<1)]

array([0.53938533])
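
Masks are combined with the elementwise operators `&`, `|` and `~`; the parentheses around each comparison are needed because these operators bind more tightly than `>` and `<`. A short sketch on made-up data (`values` is hypothetical, not the random array above):

```python
import numpy as np

values = np.array([1.2, -0.9, 0.5, -1.6, 1.8])  # hypothetical sample data

positive = values > 0
small = values < 1

print(values[positive & small])    # [0.5]          elementwise AND
print(values[positive | ~small])   # [1.2 0.5 1.8]  OR with a negated mask
print(values[~positive])           # [-0.9 -1.6]    elementwise NOT

# Python's `and`/`or` do not work elementwise and raise ValueError here
try:
    values[(values > 0) and (values < 1)]
except ValueError as err:
    print("ValueError:", err)
```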

Instead of boolean masks, fancy indexing selects elements with arrays of integer indices. `np.where` converts a mask into such index arrays:

In [49]:
np.where(random_array>0)

(array([0, 2, 4, 7], dtype=int64),)

In [40]:
ix0, = np.where(random_array>0)

In [41]:
ix0

array([0, 2, 4, 7], dtype=int64)

In [51]:
random_array[random_array>0]

array([1.18956295, 0.53938533, 1.75043633, 1.12313518])

In [52]:
random_array[ix0]

array([1.18956295, 0.53938533, 1.75043633, 1.12313518])
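
When the index array comes from `np.where` applied to a mask, indexing with it selects the same elements as the mask itself. The sketch below checks this on made-up data (`values`, `mask` and `ix` are names assumed for this example) and shows something only index arrays can do:

```python
import numpy as np

values = np.array([1.2, -0.9, 0.5, -1.6, 1.8])  # hypothetical sample data

mask = values > 0
ix, = np.where(mask)          # np.where returns one index array per dimension

print(ix)                     # [0 2 4]
assert np.array_equal(values[ix], values[mask])

# Index arrays may repeat or reorder elements, which masks cannot do
print(values[[4, 4, 0]])      # [1.8 1.8 1.2]
```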

In [43]:
random_array = np.random.randn(3, 4)
random_array

array([[-0.91976381, -0.11364747,  1.14403016,  1.55033902],
       [ 0.27006441,  0.98792807, -1.33433559,  1.06917692],
       [ 1.34406605,  1.35150088, -0.9779534 ,  0.08241911]])

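You can use other iterables (such as lists) as indexers, but note the difference: lists trigger fancy indexing and return an array, while plain integers return a scalar: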

In [44]:
random_array[[0],[2]]

array([1.14403016])

In [45]:
random_array[0, 2]

1.1440301606002883
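
With several index lists, fancy indexing pairs the entries up element by element rather than selecting a rectangular block. A sketch on a small made-up array (`grid` is hypothetical):

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)   # hypothetical 3x4 example array

print(grid[0, 2])                    # 2, a scalar
print(grid[[0], [2]])                # [2], a 1-D array with one element
print(grid[[0, 2], [1, 3]])          # [ 1 11], elements (0, 1) and (2, 3)

# To take rows 0 and 2 and columns 1 and 3 as a block, index in two steps
block = grid[[0, 2]][:, [1, 3]]
print(block.shape)                   # (2, 2)
```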

## View vs. copy

We will cover this in more detail later, but we can already look at one of the main sources of bugs in numerical code:

In [20]:
arr

array([[7., 2., 3.],
       [3., 9., 6.]])

Creating a **view** and a **copy**:

In [21]:
arr_view = arr[:, :]
arr_copy = arr.copy()

In [25]:
arr_view

array([[7., 2., 3.],
       [3., 9., 6.]])

In [23]:
arr_copy

array([[7., 2., 3.],
       [3., 9., 6.]])

Changing elements through a view (the slice assignment writes into memory shared with `arr`, so `arr` changes as well):

In [26]:
arr_view[1:, 1:] = 41

In [27]:
arr_new = arr_view[1:, 1:]

In [28]:
arr_view

array([[ 7.,  2.,  3.],
       [ 3., 41., 41.]])

In [29]:
arr

array([[ 7.,  2.,  3.],
       [ 3., 41., 41.]])

Changing elements in a copy (the original `arr` is not affected):

In [30]:
arr_copy[1:, 1:] = 32

In [31]:
arr_copy

array([[ 7.,  2.,  3.],
       [ 3., 32., 32.]])

In [None]:
arr

To check whether an array is a view of another array, inspect its `base` attribute:

In [None]:
arr_view.base is arr

In [None]:
arr_copy.base is arr
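
Besides `base`, `np.shares_memory` reports directly whether two arrays overlap in memory. A minimal sketch, rebuilding `arr` from scratch; `arr_fancy` is an extra name introduced here to show that fancy indexing returns a copy:

```python
import numpy as np

arr = np.array([[7, 2, 3.], [3, 9, 6]])
arr_view = arr[:, :]      # basic slicing gives a view
arr_copy = arr.copy()     # explicit copy
arr_fancy = arr[[0, 1]]   # fancy indexing gives a copy

print(arr_view.base is arr)               # True
print(arr_copy.base is None)              # True
print(np.shares_memory(arr, arr_view))    # True
print(np.shares_memory(arr, arr_fancy))   # False

arr_view[1:, 1:] = 41     # writes through to arr
print(arr[1, 1])          # 41.0
```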

## Changing array shape

In [13]:
arr

array([[ 7.,  2.,  3.],
       [ 3., 41., 41.]])

We can simply reshape it:

In [32]:
arr.reshape((6,1))

array([[ 7.],
       [ 2.],
       [ 3.],
       [ 3.],
       [41.],
       [41.]])

In [33]:
arr.reshape((6,))

array([ 7.,  2.,  3.,  3., 41., 41.])

In [34]:
arr.reshape((2,1,3))

array([[[ 7.,  2.,  3.]],

       [[ 3., 41., 41.]]])

Or expand dimensions (adding axes of size `1`):

In [None]:
np.expand_dims(arr, axis=1)
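
A few shape checks for the operations above. Passing `-1` to `reshape` lets NumPy infer that dimension, and indexing with `np.newaxis` is an equivalent way to add an axis; this sketch rebuilds the original `arr`:

```python
import numpy as np

arr = np.array([[7, 2, 3.], [3, 9, 6]])

print(arr.reshape(-1).shape)               # (6,)   -1 lets NumPy infer the size
print(arr.reshape(3, -1).shape)            # (3, 2)
print(np.expand_dims(arr, axis=1).shape)   # (2, 1, 3)
print(arr[:, np.newaxis, :].shape)         # (2, 1, 3), same result via indexing
```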

Or transpose it:

In [None]:
arr

In [None]:
arr.T

In [None]:
arr

For arrays with three or more dimensions, `.T` reverses the order of all axes by default:

In [None]:
np.expand_dims(arr, axis=-1).T.shape, np.expand_dims(arr, axis=-1).shape

In [None]:
np.expand_dims(arr, axis=(-1, -2)).T.shape, np.expand_dims(arr, axis=(-1, -2)).shape

Generic transpose of an array with an explicit axis order:

In [None]:
arr_t = np.transpose(np.expand_dims(arr, axis=-1), axes=(1,2,0))

In [None]:
arr_t.shape

Note that `arr_t` (like other arrays created with similar operations) is a **view** into the original array:

In [None]:
arr_t.base is arr
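
The shapes produced by the transposes above can be checked as follows. This is a sketch on a freshly built `arr`; `arr_3d` is a name introduced only for this example:

```python
import numpy as np

arr = np.array([[7, 2, 3.], [3, 9, 6]])
arr_3d = np.expand_dims(arr, axis=-1)

print(arr_3d.shape)                     # (2, 3, 1)
print(arr_3d.T.shape)                   # (1, 3, 2)  all axes reversed

arr_t = np.transpose(arr_3d, axes=(1, 2, 0))
print(arr_t.shape)                      # (3, 1, 2)
print(np.shares_memory(arr_t, arr))     # True: no data was copied
```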

## Changing array type

It is very simple in general:

In [None]:
arr.dtype

In [None]:
arr>2

In [None]:
(arr>2).astype(np.int8)

In [None]:
arr

There are some peculiarities, though:

In [None]:
arr[1,2] = -67

In [None]:
arr

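Note how `-67` becomes `189` when cast to `uint8`: the value wraps around modulo `256` (`-67 + 256 = 189`), where `256 = 2**8` is the number of distinct `uint8` values and `255` is the largest one. Casting out-of-range floating-point values to integers may be platform-dependent, so treat this result as typical rather than guaranteed. See also [Integer numbers storage in computer memory](https://medium.com/@luischaparroc/integer-numbers-storage-in-computer-memory-47af4b59009):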

In [None]:
arr.astype(np.uint8)

In [None]:
arr.astype(np.float32)

In [None]:
arr.astype(np.complex128)
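
The sketch below illustrates two common casting effects on made-up values: integer wrap-around and truncation of floats. The arrays `ints` and `floats` are hypothetical; `np.can_cast` is used to ask whether a conversion preserves values:

```python
import numpy as np

ints = np.array([-67, 189, 300])            # hypothetical sample values
print(ints.astype(np.uint8))                # [189 189  44]  wraps modulo 256

floats = np.array([1.9, -1.9])
print(floats.astype(np.int64))              # [ 1 -1]  truncates toward zero

# np.can_cast reports whether a conversion is value-preserving ("safe")
print(np.can_cast(np.float32, np.float64))  # True
print(np.can_cast(np.float64, np.uint8))    # False
```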

## Stacking arrays

In [53]:
arr_1 = np.random.randint(10, size=(10,))
arr_2 = np.random.randint(10, size=(10,))

In [61]:
arr_1, arr_2

(array([5, 8, 5, 6, 3, 4, 2, 1, 9, 4]), array([7, 0, 7, 7, 6, 3, 9, 5, 7, 2]))

Stacking arrays vertically:

In [55]:
arr_1.reshape((2,5))

array([[5, 8, 5, 6, 3],
       [4, 2, 1, 9, 4]])

In [56]:
arr_2.reshape((2,5))

array([[7, 0, 7, 7, 6],
       [3, 9, 5, 7, 2]])

In [57]:
np.vstack([arr_1.reshape((2,5)), arr_2.reshape((2,5))])

array([[5, 8, 5, 6, 3],
       [4, 2, 1, 9, 4],
       [7, 0, 7, 7, 6],
       [3, 9, 5, 7, 2]])

Stacking arrays horizontally:

In [58]:
np.hstack([arr_1, arr_2])

array([5, 8, 5, 6, 3, 4, 2, 1, 9, 4, 7, 0, 7, 7, 6, 3, 9, 5, 7, 2])

Stacking along an additional dimension:

In [59]:
np.hstack([np.expand_dims(arr_1, 1), np.expand_dims(arr_2, 1)])

array([[5, 7],
       [8, 0],
       [5, 7],
       [6, 7],
       [3, 6],
       [4, 3],
       [2, 9],
       [1, 5],
       [9, 7],
       [4, 2]])

In [60]:
arr_1.T

array([5, 8, 5, 6, 3, 4, 2, 1, 9, 4])

Stacking `1D` arrays:

In [62]:
np.vstack([arr_1, arr_2])

array([[5, 8, 5, 6, 3, 4, 2, 1, 9, 4],
       [7, 0, 7, 7, 6, 3, 9, 5, 7, 2]])

In [63]:
np.vstack([arr_1, arr_2]).T

array([[5, 7],
       [8, 0],
       [5, 7],
       [6, 7],
       [3, 6],
       [4, 3],
       [2, 9],
       [1, 5],
       [9, 7],
       [4, 2]])

All of these cost about the same: transposing and expanding dimensions only create views, so the real work is the copy made by the stacking function itself. See also `np.dstack` and `np.column_stack`.
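
The equivalences between these stacking functions can be checked directly. A sketch with two short made-up arrays (`a` and `b` are hypothetical); `np.stack` and `np.concatenate` are the general-purpose functions shown here for comparison:

```python
import numpy as np

a = np.array([5, 8, 5])   # hypothetical short arrays
b = np.array([7, 0, 7])

rows = np.vstack([a, b])          # shape (2, 3)
cols = np.column_stack([a, b])    # shape (3, 2)

assert np.array_equal(cols, rows.T)
assert np.array_equal(cols, np.stack([a, b], axis=1))
assert np.array_equal(np.hstack([a, b]), np.concatenate([a, b]))
print(rows.shape, cols.shape)     # (2, 3) (3, 2)
```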

## Universal functions

For a full list of universal functions, see [ufunc reference](https://docs.scipy.org/doc/numpy-1.15.1/reference/ufuncs.html).

In [70]:
arr_1

array([5, 8, 5, 6, 3, 4, 2, 1, 9, 4])

In [71]:
arr_2

array([7, 0, 7, 7, 6, 3, 9, 5, 7, 2])

In [69]:
arr

array([[ 7.,  2.,  3.],
       [ 3., 41., 41.]])

Sum all elements:

In [72]:
arr.sum()

97.0

Sum elements along a specific axis:

In [74]:
arr.sum(axis=0)

array([10., 43., 44.])

In [75]:
arr

array([[ 7.,  2.,  3.],
       [ 3., 41., 41.]])

Sum elements along a specific axis, but preserve the number of dimensions:

In [78]:
arr.sum(axis=1, keepdims=True)

array([[12.],
       [85.]])

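`mean` works along an axis in the same way (strictly speaking, `sum` and `mean` are array methods rather than ufuncs; `sum` corresponds to the reduction `np.add.reduce`):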

In [None]:
arr.mean(axis=0)
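
`arr.sum(axis=...)` corresponds to reducing the `np.add` ufunc along that axis, and `keepdims=True` keeps the result broadcastable against the original array. A sketch using the current values of `arr`; `row_sums` and `normalized` are names assumed for this example:

```python
import numpy as np

arr = np.array([[7, 2, 3.], [3, 41, 41]])

# Reducing the np.add ufunc along axis 0 matches arr.sum(axis=0)
assert np.array_equal(np.add.reduce(arr, axis=0), arr.sum(axis=0))

print(arr.mean(axis=0))                     # column means: 5.0, 21.5, 22.0

row_sums = arr.sum(axis=1, keepdims=True)   # shape (2, 1)
normalized = arr / row_sums                 # broadcasts row-wise
print(normalized.sum(axis=1))               # each row now sums to 1
```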

In [64]:
arr_1, arr_2

(array([5, 8, 5, 6, 3, 4, 2, 1, 9, 4]), array([7, 0, 7, 7, 6, 3, 9, 5, 7, 2]))

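Using masking with the `where` argument. Without `out`, positions where the mask is `False` are left uninitialized (note the garbage values in the second result); passing `out` fixes this: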

In [65]:
arr_1 + arr_2

array([12,  8, 12, 13,  9,  7, 11,  6, 16,  6])

In [66]:
np.add(arr_1, arr_2, where=(arr_2<6))

array([      0,       8,       0,     567,    1156,       7,     768,
             6, 6881280,       6])

In [67]:
np.add(arr_1, arr_2, where=(arr_2<6), out=np.zeros_like(arr_1))

array([0, 8, 0, 0, 0, 7, 0, 6, 0, 6])
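
Positions where the mask is `False` simply keep whatever `out` already contained, which is why a pre-filled `out` array gives predictable results. A sketch with short made-up arrays and a sentinel value of `-1` (all names are hypothetical):

```python
import numpy as np

a = np.array([5, 8, 5, 6])   # hypothetical short arrays
b = np.array([7, 0, 7, 3])

# Positions where the mask is False keep whatever `out` already contains
out = np.full_like(a, -1)
np.add(a, b, where=(b < 6), out=out)
print(out)                   # [-1  8 -1  9]
```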

In [None]:
arr_1

In [None]:
arr_2

In-place operations are straightforward: pass the array itself as `out`:

In [68]:
np.add(arr_1, arr_2, where=(arr_2<6), out=arr_2)

array([7, 8, 7, 7, 6, 7, 9, 6, 7, 6])

In [None]:
arr_2
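
Augmented assignment such as `b += a` is another way to write an in-place ufunc call. One caveat is that the result must fit the dtype of the existing array. A sketch with short made-up arrays (`a`, `b` and `b_id` are hypothetical names):

```python
import numpy as np

a = np.array([5, 8, 5, 6])   # hypothetical short arrays
b = np.array([7, 0, 7, 3])

b_id = id(b)
b += a                       # equivalent to np.add(b, a, out=b)
print(b, id(b) == b_id)      # [12  8 12  9] True

# In-place results must fit the existing dtype
try:
    b += 0.5                 # a float result cannot be stored in an int array
except TypeError as err:
    print("TypeError:", err)
```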