# NumPy basics

## Import convention

In [2]:
import numpy as np
np.random.seed(657743) # seed the random number generator so that others can reproduce the results

In [3]:
%pylab inline
plt.style.use("bmh")

Populating the interactive namespace from numpy and matplotlib


## Creating arrays from Python sequences

Core data structure in NumPy:

In [None]:
?np.ndarray 

Creating a $2\times 2$ array of floating point numbers (note the garbage in the resulting array):

In [5]:
np.ndarray((2,2), dtype=float) # Uninitialized memory: whatever bytes were there are interpreted as floating-point numbers

array([[2.12199579e-314, 1.03765594e-311],
       [6.10665138e-321, 1.03766423e-311]])

By default, `float` is a 64-bit floating-point number:

In [6]:
np.ndarray((2,2), dtype=float).dtype # the dtype is fixed, so the array cannot hold, e.g., a string

dtype('float64')

It works, but is not very convenient. A more convenient high-level option is

In [7]:
?np.array 
# np.array builds an ndarray from a Python sequence, such as a (nested) list.

In [10]:
arr = np.array([[7, 2, 3.], [3, 9, 6]]) # The shape is inferred from the nested list; one float makes the whole array float64.

In [9]:
type(arr)

numpy.ndarray

Each array has a known `shape`, `size` and `ndim`:

In [12]:
arr

array([[7., 2., 3.],
       [3., 9., 6.]])

In [13]:
print("Array shape is", arr.shape) # Length of the array along each dimension
print("Array size is", arr.size) # Total number of elements in the array
print(f"Array has {arr.ndim} dimensions") # Number of dimensions (axes)

Array shape is (2, 3)
Array size is 6
Array has 2 dimensions


And `dtype`, `itemsize` and `nbytes`:

In [11]:
print("Array dtype is", arr.dtype) # Type of the elements
print(f"Each item takes {arr.itemsize} bytes") # how many bytes each item of this array takes
print(f"Array takes {arr.nbytes} bytes") # how many bytes the array has

Array dtype is float64
Each item takes 8 bytes
Array takes 48 bytes
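
These attributes are tied together: `nbytes` equals `size * itemsize`. A minimal sketch of that relation (the names `demo` and `demo32` are illustrative, not part of the notebook):

```python
import numpy as np

# Same data as `arr` above; the name `demo` is only for illustration.
demo = np.array([[7, 2, 3.], [3, 9, 6]])

# Total size of the data buffer = number of elements * bytes per element.
assert demo.nbytes == demo.size * demo.itemsize

# A smaller dtype shrinks the footprint without changing the shape.
demo32 = demo.astype(np.float32)
print(demo32.shape, demo32.itemsize, demo32.nbytes)   # (2, 3) 4 24
```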


In [None]:
#Python list vs. ndarray: (slides)

## Creating arrays of special shape and/or type

Array with specific `shape` and `dtype`, filled with `0`'s:

In [14]:
zeros_array = np.zeros((2,6), dtype=bool)
zeros_array

array([[False, False, False, False, False, False],
       [False, False, False, False, False, False]])

In [15]:
arr

array([[7., 2., 3.],
       [3., 9., 6.]])

Array of `0`'s with the same shape as `arr`, but of different `dtype`:

In [17]:
zeros_like_array = np.zeros_like(arr, dtype=np.complex128)
zeros_like_array

array([[0.+0.j, 0.+0.j, 0.+0.j],
       [0.+0.j, 0.+0.j, 0.+0.j]])

Array with specific `shape` and `dtype`, filled with `1`'s:

In [18]:
ones_array = np.ones((3,9), dtype=np.float32)
ones_array

array([[1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1.]], dtype=float32)

Array of `1`'s with the same shape as `zeros_array`, and of different `dtype`:

In [19]:
zeros_array

array([[False, False, False, False, False, False],
       [False, False, False, False, False, False]])

In [20]:
ones_like_array = np.ones_like(zeros_array, dtype=np.float32)
ones_like_array

array([[1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1.]], dtype=float32)

Range arrays are very common for indexing and as a drop-in replacement for the built-in `range`.

The simplest form is `np.arange(n)`: **start** at the default `0`, **increment** by the default `1`, **end** at `n` (exclusive):

In [21]:
range_array = np.arange(10)
range_array

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Or you can specify both **starting** (inclusive) and **ending** points (exclusive):

In [24]:
range_array = np.arange(-5, 5)
range_array

array([-5, -4, -3, -2, -1,  0,  1,  2,  3,  4])

Or all three:

In [25]:
range_array = np.arange(0, 5, 2)
range_array

array([0, 2, 4])

A negative increment (or *step*) works as usual, but beware of the order of the bounds: the start must be greater than the end, otherwise the result is empty:

In [26]:
range_array = np.arange(0, 10, -2)

In [27]:
range_array

array([], dtype=int32)

In [30]:
range_array = np.arange(10, 0, -2)
range_array

array([10,  8,  6,  4,  2])
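
For integer arguments, `np.arange` should produce the same values as the built-in `range`, including the empty result when the bounds are in the wrong order for the step. A small sketch to check this on the cases above:

```python
import numpy as np

# For integer arguments, np.arange yields the same values as range.
for args in [(10,), (-5, 5), (0, 5, 2), (10, 0, -2), (0, 10, -2)]:
    assert np.arange(*args).tolist() == list(range(*args))

# With a negative step, start must be greater than stop;
# otherwise the result is empty, exactly as with range.
empty = np.arange(0, 10, -2)
print(empty.size)   # 0
```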

It is not limited to integers (hence, it is a generalization of `range`):

In [32]:
range_float_array = np.arange(-0.5, 5., 0.5)
range_float_array

array([-0.5,  0. ,  0.5,  1. ,  1.5,  2. ,  2.5,  3. ,  3.5,  4. ,  4.5])

## Basic indexing of numpy arrays

Integer and slicing notations:

In [33]:
range_float_array

array([-0.5,  0. ,  0.5,  1. ,  1.5,  2. ,  2.5,  3. ,  3.5,  4. ,  4.5])

Get a single element by integer index (indices start at `0`, so index `1` is the second element):

In [None]:
range_float_array[1]

Get a slice (right index is not included):

In [None]:
range_float_array[1:2]

Get a slice from the start (left index omitted):

In [None]:
range_float_array[:5]

Get a slice with negative indices:

In [35]:
range_float_array[-5:-2]

array([2.5, 3. , 3.5])

Indexing 2D arrays:

In [36]:
arr

array([[7., 2., 3.],
       [3., 9., 6.]])

In [37]:
arr[:1, 1:]

array([[2., 3.]])

In [40]:
arr[0, 1:]

array([2., 3.])

In [41]:
arr[0, ::2]

array([7., 3.])

Generally, basic indexing works very similarly to usual Python lists, but in many dimensions.
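
One detail worth checking is the shape of the result: a slice keeps its axis, while an integer index drops it. A short sketch (the name `demo` is illustrative):

```python
import numpy as np

demo = np.array([[7., 2., 3.], [3., 9., 6.]])

# A slice keeps the axis (possibly with length 1) ...
print(demo[:1, 1:].shape)   # (1, 2)
# ... while an integer index removes it.
print(demo[0, 1:].shape)    # (2,)

# Nested Python lists need chained indexing instead of a comma-separated index.
nested = demo.tolist()
print(nested[0][1:])        # [2.0, 3.0]
```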

## Boolean and fancy indexing

In [42]:
random_array = np.random.randn(10)
random_array

array([-1.11204182,  0.46910808, -0.68408549,  0.69242926,  1.15223867,
        1.17339163, -0.11879617,  0.06423215, -1.2978598 , -0.64957899])

Most operations (arithmetic, logical, etc.) are vectorized for NumPy arrays, so we do not need loops at all. For example, `>` is a **vectorized** operation that creates a boolean **mask**:

In [43]:
random_array>0

array([False,  True, False,  True,  True,  True, False,  True, False,
       False])

Boolean masks can be used for indexing (including logical operations on the masks themselves, as they are vectorized as well!):

In [None]:
random_array[random_array>0] # Select elements greater than 0

In [44]:
random_array[(random_array>0) | (random_array<-1)] # this is using the "or" operation

array([-1.11204182,  0.46910808,  0.69242926,  1.15223867,  1.17339163,
        0.06423215, -1.2978598 ])

In [None]:
random_array>0

In [None]:
random_array<-1

In [None]:
random_array

In [None]:
random_array[(random_array>0) & (random_array<1)]
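
The parentheses in these expressions matter: `&` and `|` bind more tightly than comparisons, so each condition has to be wrapped. A minimal sketch on a small fixed array (the values are made up for illustration):

```python
import numpy as np

values = np.array([-1.5, 0.2, 0.9, 1.7, -0.3])

# Element-wise "and"/"or" use & and |, not the Python keywords.
# Parentheses are required because & and | bind tighter than comparisons.
inside = (values > 0) & (values < 1)
outside = (values > 1) | (values < -1)

print(values[inside].tolist())    # [0.2, 0.9]
print(values[outside].tolist())   # [-1.5, 1.7]

# ~ negates a mask element-wise.
print(values[~inside].tolist())   # [-1.5, 1.7, -0.3]
```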

Instead of using boolean masks, fancy indexing provides an alternative way with index arrays:

In [45]:
np.where(random_array>0)

(array([1, 3, 4, 5, 7], dtype=int64),)

In [46]:
ix0, = np.where(random_array>0)

In [47]:
ix0

array([1, 3, 4, 5, 7], dtype=int64)

In [48]:
random_array[random_array>0]

array([0.46910808, 0.69242926, 1.15223867, 1.17339163, 0.06423215])

In [49]:
random_array[ix0]

array([0.46910808, 0.69242926, 1.15223867, 1.17339163, 0.06423215])

In [None]:
#Using multiple dimensions

In [50]:
random_array = np.random.randn(3, 4)
random_array

array([[ 1.26308259,  1.10781119, -0.96620663,  0.7825883 ],
       [ 1.22047832,  1.28593266, -0.12777788,  0.36335958],
       [ 0.3168423 , -1.32085224, -0.91976381, -0.11364747]])

You can use other iterables as indexers, but note the difference:

In [51]:
random_array[[0],[2]] #Fancy Indexing

array([-0.96620663])

In [52]:
random_array[0, 2] #Basic indexing

-0.9662066290103926

In [None]:
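#Even though the values are the same, the results differ: fancy indexing returns a new array (a copy) of shape (1,), while basic integer indexing returns a scalar. Basic slicing, in contrast, returns a view (as we will see).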
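
A small sketch of how the two results differ in shape, and of how fancy indexing pairs up its index lists (the array `demo` is illustrative):

```python
import numpy as np

demo = np.arange(12.0).reshape(3, 4)

fancy = demo[[0], [2]]   # index lists -> a new 1D array with one element
basic = demo[0, 2]       # two integers -> a single scalar

print(fancy.shape)       # (1,)
print(np.ndim(basic))    # 0

# Fancy indexing pairs the index lists element by element:
# this picks demo[0, 2] and demo[1, 3], not a 2x2 block.
print(demo[[0, 1], [2, 3]].tolist())   # [2.0, 7.0]
```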

## View vs. copy

We'll figure this out a bit later, but can already test one of the main sources of bugs in numerical code:

In [53]:
arr

array([[7., 2., 3.],
       [3., 9., 6.]])

Creating a **view** and a **copy**:

In [54]:
arr_view = arr[:, :]
arr_copy = arr.copy()
#Basic slicing always creates a view that points to the same memory. Boolean and fancy indexing always make a copy.

In [None]:
arr_view

In [None]:
arr_copy

Changing elements in a view (note how the slice assignment writes through to the original array):

In [55]:
arr_view[1:, 1:] = 41

In [56]:
arr_new = arr_view[1:, 1:]

In [57]:
arr_view

array([[ 7.,  2.,  3.],
       [ 3., 41., 41.]])

In [58]:
arr

array([[ 7.,  2.,  3.],
       [ 3., 41., 41.]])

Changing elements in a copy:

In [59]:
arr_copy[1:, 1:] = 32

In [60]:
arr_copy

array([[ 7.,  2.,  3.],
       [ 3., 32., 32.]])

In [61]:
arr

array([[ 7.,  2.,  3.],
       [ 3., 41., 41.]])

How to check whether an array is a view of another array:

In [62]:
arr_view.base is arr

True

In [63]:
arr_copy.base is arr

False
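
As an alternative to inspecting `.base`, `np.shares_memory` reports whether two arrays overlap in memory. A sketch that contrasts basic slicing with fancy and boolean indexing (the names are illustrative):

```python
import numpy as np

base = np.array([[7., 2., 3.], [3., 9., 6.]])

sliced = base[1:, 1:]      # basic slicing -> view
fancy = base[[1], 1:]      # index list -> copy
masked = base[base > 2]    # boolean mask -> copy

print(np.shares_memory(base, sliced))   # True
print(np.shares_memory(base, fancy))    # False
print(np.shares_memory(base, masked))   # False

# Writing through the view changes the original; the copies stay independent.
sliced[...] = 0
print(base.tolist())    # [[7.0, 2.0, 3.0], [3.0, 0.0, 0.0]]
print(fancy.tolist())   # [[9.0, 6.0]]
```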

## Changing array shape

In [109]:
arr

array([[ 7.,  2.,  3.],
       [ 3., 41., 41.]])

We can simply reshape it:

In [65]:
arr.reshape((6,1))

array([[ 7.],
       [ 2.],
       [ 3.],
       [ 3.],
       [41.],
       [41.]])

In [70]:
arr.reshape((6,))

array([ 7.,  2.,  3.,  3., 41., 41.])

In [112]:
arr.reshape((2,2, 3)) 

ValueError: cannot reshape array of size 6 into shape (2,2,3)
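
The error above follows from a simple rule: the product of the new shape must equal the number of elements. A minimal sketch, which also shows that one dimension can be left as `-1` to be inferred (the array `demo` is illustrative):

```python
import numpy as np

demo = np.arange(6.0)

# The product of the new shape must equal the number of elements (6 here).
print(demo.reshape((2, 3)).shape)    # (2, 3)

# One dimension may be given as -1 and is then inferred from the size.
print(demo.reshape((-1, 2)).shape)   # (3, 2)

# 2 * 2 * 3 = 12 != 6, so this raises ValueError.
try:
    demo.reshape((2, 2, 3))
    failed = False
except ValueError as err:
    failed = True
    print("reshape failed:", err)
```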

Or expand dimensions (adding axes of length `1`):

In [80]:
np.expand_dims(arr, axis=1)

array([[[ 7.,  2.,  3.]],

       [[ 3., 41., 41.]]])

Or transpose it:

In [81]:
arr

array([[ 7.,  2.,  3.],
       [ 3., 41., 41.]])

In [82]:
arr.T

array([[ 7.,  3.],
       [ 2., 41.],
       [ 3., 41.]])

In [83]:
arr

array([[ 7.,  2.,  3.],
       [ 3., 41., 41.]])

Default transposing of `3+D` arrays (`.T` reverses the order of the axes):

In [84]:
np.expand_dims(arr, axis=-1).T.shape, np.expand_dims(arr, axis=-1).shape
#For 3D arrays, the default transpose reverses the order of the axes. The first element is the transposed shape, (1, 3, 2).
#The second element is the shape before transposing, (2, 3, 1).

((1, 3, 2), (2, 3, 1))

In [85]:
#similarly for 4d
np.expand_dims(arr, axis=(-1, -2)).T.shape, np.expand_dims(arr, axis=(-1, -2)).shape 


((1, 1, 3, 2), (2, 3, 1, 1))

Generic transpose of an array:

In [86]:
arr_t = np.transpose(np.expand_dims(arr, axis=-1), axes=(1,2,0))
#Old axis 1 becomes new axis 0, old axis 2 becomes new axis 1, and old axis 0 moves to the end

In [87]:
arr_t.shape

(3, 1, 2)

Note that `arr_t` (like other arrays created with a similar operation) is a **view** into the original array:

In [88]:
arr_t.base is arr

True
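
The `axes` argument can be read as "axis `i` of the result is axis `axes[i]` of the input". A short sketch that checks this rule on shapes only (the array `demo` is illustrative):

```python
import numpy as np

demo = np.zeros((2, 3, 1))
axes = (1, 2, 0)
moved = np.transpose(demo, axes=axes)

# Axis i of the result is axis axes[i] of the input.
expected = tuple(demo.shape[a] for a in axes)
print(moved.shape, expected)   # (3, 1, 2) (3, 1, 2)

# With no axes given (or with .T), the axis order is simply reversed.
print(demo.T.shape)            # (1, 3, 2)

# No data is copied: the result shares memory with the input.
print(np.shares_memory(demo, moved))   # True
```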

## Changing array type

It is very simple in general:

In [89]:
arr.dtype

dtype('float64')

In [90]:
arr>2

array([[ True, False,  True],
       [ True,  True,  True]])

In [91]:
(arr>2).astype(np.int8)

array([[1, 0, 1],
       [1, 1, 1]], dtype=int8)

In [92]:
arr

array([[ 7.,  2.,  3.],
       [ 3., 41., 41.]])

There are some peculiarities, though:

In [93]:
arr[1,2] = -67

In [94]:
arr

array([[  7.,   2.,   3.],
       [  3.,  41., -67.]])

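Note how `-67` turns into `189`: the value wraps around modulo $2^8 = 256$, the number of distinct `uint8` values (`189 + 67 = 256`; the largest `uint8` value is `255`). See also [Integer numbers storage in computer memory](https://medium.com/@luischaparroc/integer-numbers-storage-in-computer-memory-47af4b59009). The result of casting out-of-range floats to integers can be platform-dependent: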

In [95]:
arr.astype(np.uint8)

array([[  7,   2,   3],
       [  3,  41, 189]], dtype=uint8)

In [96]:
arr.astype(np.float32)

array([[  7.,   2.,   3.],
       [  3.,  41., -67.]], dtype=float32)

In [97]:
arr.astype(np.complex128)

array([[  7.+0.j,   2.+0.j,   3.+0.j],
       [  3.+0.j,  41.+0.j, -67.+0.j]])
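
The wrap-around is easiest to see with an integer-to-integer cast, where it is plain modular arithmetic. A minimal sketch, including a simple range check before casting (the `fits` mask is an illustrative guard, not something the notebook uses):

```python
import numpy as np

info = np.iinfo(np.uint8)
print(info.min, info.max)   # 0 255

# Integer-to-integer casts wrap around modulo 2**8 = 256.
signed = np.array([-67, 3, 41], dtype=np.int16)
print(signed.astype(np.uint8).tolist())   # [189, 3, 41]
print(-67 % 256)                          # 189

# Illustrative guard: check which values fit before casting.
fits = (signed >= info.min) & (signed <= info.max)
print(fits.tolist())                      # [False, True, True]
```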

## Stacking arrays

In [98]:
arr_1 = np.random.randint(10, size=(10,))
arr_2 = np.random.randint(10, size=(10,))

In [99]:
arr_1, arr_2

(array([9, 8, 2, 2, 1, 5, 3, 4, 2, 8]), array([0, 7, 6, 0, 2, 6, 6, 7, 0, 3]))

Stacking arrays vertically:

In [100]:
arr_1.reshape((2,5))

array([[9, 8, 2, 2, 1],
       [5, 3, 4, 2, 8]])

In [101]:
arr_2.reshape((2,5))

array([[0, 7, 6, 0, 2],
       [6, 6, 7, 0, 3]])

In [102]:
np.vstack([arr_1.reshape((2,5)), arr_2.reshape((2,5))])

array([[9, 8, 2, 2, 1],
       [5, 3, 4, 2, 8],
       [0, 7, 6, 0, 2],
       [6, 6, 7, 0, 3]])

Stacking arrays horizontally:

In [103]:
np.hstack([arr_1, arr_2])

array([9, 8, 2, 2, 1, 5, 3, 4, 2, 8, 0, 7, 6, 0, 2, 6, 6, 7, 0, 3])

Stacking along an additional dimension:

In [104]:
np.hstack([np.expand_dims(arr_1, 1), np.expand_dims(arr_2, 1)])

array([[9, 0],
       [8, 7],
       [2, 6],
       [2, 0],
       [1, 2],
       [5, 6],
       [3, 6],
       [4, 7],
       [2, 0],
       [8, 3]])

In [105]:
arr_1.T #Transposing a one-dimensional array is a no-op: with a single axis there is nothing to swap

array([9, 8, 2, 2, 1, 5, 3, 4, 2, 8])

Stacking `1D` arrays:

In [106]:
np.vstack([arr_1, arr_2])

array([[9, 8, 2, 2, 1, 5, 3, 4, 2, 8],
       [0, 7, 6, 0, 2, 6, 6, 7, 0, 3]])

In [107]:
np.vstack([arr_1, arr_2]).T #This is shorter than the np.expand_dims version above, so we should prefer it

array([[9, 0],
       [8, 7],
       [2, 6],
       [2, 0],
       [1, 2],
       [5, 6],
       [3, 6],
       [4, 7],
       [2, 0],
       [8, 3]])

All of these cost about the same: transpose and expand operations only create views, and the stacking functions copy the data into a new array once in either case. Check also `np.dstack` and `np.column_stack`.
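
A quick sketch comparing the column-wise variants on two short arrays (the names `a` and `b` are illustrative; `np.stack` with `axis=1` is one more way to get the same result):

```python
import numpy as np

a = np.array([9, 8, 2])
b = np.array([0, 7, 6])

as_columns = np.vstack([a, b]).T

# np.column_stack and np.stack(..., axis=1) give the same (3, 2) result.
print(np.array_equal(as_columns, np.column_stack([a, b])))    # True
print(np.array_equal(as_columns, np.stack([a, b], axis=1)))   # True

# Stacking allocates a new array; the result does not share memory with its inputs.
print(np.shares_memory(a, np.vstack([a, b])))                 # False
```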

## Universal functions

If you are writing a loop over a NumPy array, you are probably doing something wrong: NumPy already implements many functions that operate element-wise in vectorized form and are very fast.

For a full list of universal functions, see [ufunc reference](https://docs.scipy.org/doc/numpy-1.15.1/reference/ufuncs.html).
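
To make the contrast concrete, here is a minimal sketch that computes the same result with an explicit loop and with vectorized arithmetic (the array `x` and the formula are made up for illustration):

```python
import numpy as np

x = np.arange(5, dtype=float)

# Explicit Python loop: one interpreted iteration per element.
looped = np.empty_like(x)
for i in range(x.size):
    looped[i] = x[i] ** 2 + 1

# Vectorized form: the same arithmetic applied to the whole array at once.
vectorized = x ** 2 + 1

print(np.array_equal(looped, vectorized))   # True
print(vectorized.tolist())                  # [1.0, 2.0, 5.0, 10.0, 17.0]
```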

In [113]:
arr_1

array([9, 8, 2, 2, 1, 5, 3, 4, 2, 8])

In [114]:
arr_2

array([0, 7, 6, 0, 2, 6, 6, 7, 0, 3])

In [115]:
arr

array([[  7.,   2.,   3.],
       [  3.,  41., -67.]])

Sum all elements:

In [116]:
arr.sum()

-11.0

Sum elements along specific axis:

In [119]:
arr.sum(axis=1)

array([ 12., -23.])

In [120]:
arr

array([[  7.,   2.,   3.],
       [  3.,  41., -67.]])

Sum elements along a specific axis, but preserve the number of dimensions:

In [121]:
arr.sum(axis=1, keepdims=True)

array([[ 12.],
       [-23.]])
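
One common reason to keep the dimensions is that the result then broadcasts against the original array. A sketch that normalizes each row by its total (the array `counts` is illustrative); without `keepdims`, the totals would have shape `(2,)`, which does not broadcast against `(2, 3)`:

```python
import numpy as np

counts = np.array([[1., 1., 2.], [3., 4., 1.]])

row_totals = counts.sum(axis=1, keepdims=True)   # shape (2, 1)
shares = counts / row_totals                     # broadcasts across the columns

print(row_totals.shape)              # (2, 1)
print(shares.sum(axis=1).tolist())   # [1.0, 1.0]
```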

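`mean` accepts the same `axis` argument (strictly speaking, `sum` and `mean` are reductions rather than ufuncs; `sum` corresponds to the ufunc reduction `np.add.reduce`):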

In [122]:
arr.mean(axis=0)

array([  5. ,  21.5, -32. ])

In [128]:
arr_1, arr_2

(array([9, 8, 2, 2, 1, 5, 3, 4, 2, 8]), array([0, 7, 6, 0, 2, 6, 6, 7, 0, 3]))

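Using masking with the `where` argument (note that without `out`, entries where the condition is `False` are left uninitialized, so they contain junk):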

In [124]:
arr_1 + arr_2

array([ 9, 15,  8,  2,  3, 11,  9, 11,  2, 11])

In [129]:
np.add(arr_1, arr_2, where=(arr_2<6))

array([  9,   1,   0,   2,   3,   0, 768, 489,   2,  11])

In [130]:
np.add(arr_1, arr_2, where=(arr_2<6), out=np.zeros_like(arr_1))

array([ 9,  0,  0,  2,  3,  0,  0,  0,  2, 11])
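
A minimal sketch of the same pattern with an explicit fill value, plus an `np.where` formulation that gives the same result without relying on `out` (the arrays and the fill value `-1` are illustrative):

```python
import numpy as np

a = np.array([9, 8, 2, 2])
b = np.array([0, 7, 6, 0])
mask = b < 6

# Pre-fill the output so the masked-out positions have a defined value.
result = np.add(a, b, where=mask, out=np.full_like(a, -1))
print(result.tolist())   # [9, -1, -1, 2]

# np.where picks a + b where the mask is True and -1 elsewhere.
alt = np.where(mask, a + b, -1)
print(np.array_equal(result, alt))   # True
```

Note that the `np.where` version computes `a + b` for every element and then selects, while `where=` on the ufunc skips the masked-out positions.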

In [131]:
arr_1

array([9, 8, 2, 2, 1, 5, 3, 4, 2, 8])

In [132]:
arr_2

array([0, 7, 6, 0, 2, 6, 6, 7, 0, 3])

In-place operations are straightforward:

In [134]:
np.add(arr_1, arr_2, where=(arr_2<6), out=arr_2) 
#Writes the sums into arr_2 itself wherever the condition holds; all other entries of arr_2 are left unchanged. This is an in-place operation.

array([ 9,  7,  6,  2,  3,  6,  6,  7,  2, 11])

In [135]:
arr_2

array([ 9,  7,  6,  2,  3,  6,  6,  7,  2, 11])