# NumPy

<img src="../logos/numpy.png" width="256px" />

NumPy is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays.

In this tutorial, I cover the most useful functions provided by NumPy for numerical and mathematical operations.

## Part I: Introduction

In [1]:
import numpy as np
import sys
import time

np.random.seed(313)

---

### Why NumPy Arrays?

NumPy arrays are preferred to traditional Python lists for three reasons:

1. Less Memory Usage
2. Faster Computation/Runtime
3. Convenience

**I. Less Memory Usage**

In [2]:
lst = range(1000)
# rough estimate: size of one Python int object times the number of elements
print(sys.getsizeof(1) * len(lst))

28000


In [3]:
arr = np.arange(1000)
# bytes in the array's data buffer (the same value as arr.nbytes)
print(arr.size * arr.itemsize)

8000
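The estimate above multiplies the size of a single `int` object by the number of elements. A slightly fuller accounting, sketched below assuming 64-bit CPython, also counts the list's own array of pointers. Exact byte counts vary by Python version and platform, and small integers (-5 to 256) are cached and shared, so the element total is an upper bound rather than an exact figure.

```python
import sys
import numpy as np

n = 1000
lst = list(range(n))
arr = np.arange(n)

container = sys.getsizeof(lst)                  # list header + one pointer per element
elements = sum(sys.getsizeof(x) for x in lst)   # the int objects those pointers refer to

print('list (pointers only):', container, 'bytes')
print('list + int objects:  ', container + elements, 'bytes')
print('NumPy data buffer:   ', arr.nbytes, 'bytes')
```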


**II. Faster Computation**

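NumPy arrays are faster than Python lists for mathematical operations, because the arithmetic runs in compiled code over a contiguous block of memory instead of looping over individual Python objects.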

In [4]:
SIZE = 100000

lst_1 = range(SIZE)
lst_2 = range(SIZE)

arr_1 = np.arange(SIZE)
arr_2 = np.arange(SIZE)

# time.time() returns seconds, so the printed values below are in milliseconds
start_time = time.time()
res = [(x + y) for x, y in zip(lst_1, lst_2)]
print("Python list took: ", (time.time() - start_time) * 1000)

start_time = time.time()
res = arr_1 + arr_2
print("Numpy array took: ", (time.time() - start_time) * 1000)

Python list took:  8.744955062866211
Numpy array took:  1.7895698547363281
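A single `time.time()` measurement is noisy. The sketch below uses the standard-library `timeit` module instead, which repeats each statement and uses `time.perf_counter` by default; the run and repeat counts (`RUNS`, `repeat=3`) are arbitrary choices, and the timings will differ from machine to machine.

```python
import timeit
import numpy as np

SIZE = 100_000
lst_1 = list(range(SIZE))
lst_2 = list(range(SIZE))
arr_1 = np.arange(SIZE)
arr_2 = np.arange(SIZE)

RUNS = 10
# timeit.repeat returns one total per repeat (each over RUNS runs); keep the smallest
t_list = min(timeit.repeat(lambda: [x + y for x, y in zip(lst_1, lst_2)], number=RUNS, repeat=3))
t_numpy = min(timeit.repeat(lambda: arr_1 + arr_2, number=RUNS, repeat=3))

print(f'Python list: {t_list / RUNS * 1000:.3f} ms per run')
print(f'NumPy array: {t_numpy / RUNS * 1000:.3f} ms per run')
```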


**III. Convenience**

It is easy to do mathematical operations on NumPy arrays, as below:

In [5]:
arr_1, arr_2 = np.array([1, 2, 3]), np.array([4, 5, 6])

In [6]:
print('Element-wise Sum: {}, Sub: {}, Mul: {}.'.format(arr_1 + arr_2, 
                                                       arr_1 - arr_2, 
                                                       arr_1 * arr_2))

Element-wise Sum: [5 7 9], Sub: [-3 -3 -3], Mul: [ 4 10 18].


---

### Generating NumPy Arrays

#### 1. From Python List

NumPy arrays can be created from ordinary Python lists (nested lists give 2D arrays), as follows:

In [7]:
arr_1d = np.asarray([2, 3, 6, 1, 5])
arr_2d = np.asarray([[4, 5, 6], [2, 0, 8]], dtype=np.float16)

print('The 1D array: {} \n'.format(arr_1d))
print('The 2D array:\n {}'.format(arr_2d))

The 1D array: [2 3 6 1 5] 

The 2D array:
 [[4. 5. 6.]
 [2. 0. 8.]]
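The cell above uses `np.asarray`. Its close relative `np.array` also accepts lists; the difference shows up when the input is already an ndarray. A minimal sketch (the names `existing`, `a` and `b` are illustrative):

```python
import numpy as np

existing = np.arange(5)

a = np.array(existing)     # np.array copies its input by default
b = np.asarray(existing)   # np.asarray returns the input itself if it is already a matching ndarray

print(a is existing)       # False
print(b is existing)       # True

a[0] = 99                  # modifying the copy leaves `existing` untouched
print(existing[0])         # 0
```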


#### 2. Zeros and Ones

We can generate arrays of ones and zeros as below:

In [8]:
ones = np.ones((2, 10))
zeros = np.zeros((2, 10))

print(ones)
print(zeros)

[[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]]
[[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]
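`np.ones` and `np.zeros` return `float64` arrays unless a `dtype` is given. The sketch below also shows the related helpers `np.full` and `np.zeros_like`; they are standard NumPy functions, though they go slightly beyond the cell above.

```python
import numpy as np

print(np.zeros(3).dtype)                 # float64, the default

ones_int = np.ones((2, 3), dtype=int)    # request an integer array instead
sevens = np.full((2, 3), 7)              # fill with any constant value
blank = np.zeros_like(sevens)            # zeros with the same shape and dtype as `sevens`

print(ones_int)
print(sevens)
print(blank.shape, blank.dtype == sevens.dtype)
```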


#### 3. Range

To create a NumPy array from a range, we do:

In [9]:
arr = np.arange(10, 25)
arr

array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24])
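Like Python's `range`, `np.arange(start, stop, step)` excludes `stop`, and `step` defaults to 1. A short sketch of the `step` argument follows; with a floating-point step, rounding can affect how many elements are produced, which is why `np.linspace` is usually preferred for fractional spacing.

```python
import numpy as np

print(np.arange(5))           # 0 to 4: start defaults to 0, stop is excluded
print(np.arange(10, 25, 5))   # 10, 15, 20
print(np.arange(10, 0, -3))   # 10, 7, 4, 1: a negative step counts down
```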

#### 4. linspace

It returns evenly spaced numbers over a specified interval.

**Signature**

`np.linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None)`

Please note that the stopping point is included by default (`endpoint=True`); pass `endpoint=False` to exclude it.

In [10]:
np.linspace(0, 5, 20)

array([0.        , 0.26315789, 0.52631579, 0.78947368, 1.05263158,
       1.31578947, 1.57894737, 1.84210526, 2.10526316, 2.36842105,
       2.63157895, 2.89473684, 3.15789474, 3.42105263, 3.68421053,
       3.94736842, 4.21052632, 4.47368421, 4.73684211, 5.        ])

In [11]:
np.linspace(0, 5, 21)

array([0.  , 0.25, 0.5 , 0.75, 1.  , 1.25, 1.5 , 1.75, 2.  , 2.25, 2.5 ,
       2.75, 3.  , 3.25, 3.5 , 3.75, 4.  , 4.25, 4.5 , 4.75, 5.  ])
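With `endpoint=True`, the spacing between samples is `(stop - start) / (num - 1)`, which is why 21 points over `[0, 5]` give a step of exactly 0.25 while 20 points do not. The sketch below checks this with `retstep=True` and shows `endpoint=False`, where the spacing becomes `(stop - start) / num`.

```python
import numpy as np

samples, step = np.linspace(0, 5, 21, retstep=True)
print(step)                                  # 0.25 = (5 - 0) / (21 - 1)

no_end = np.linspace(0, 5, 5, endpoint=False)
print(no_end)                                # 0, 1, 2, 3, 4: the stop value 5 is left out
```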

#### 5. eye

The **eye** function returns a 2-D array with ones on one diagonal and zeros elsewhere. With the default `k=0` and a square shape, this is the identity matrix.

**Signature**

`np.eye(N, M=None, k=0, dtype=<class 'float'>, order='C')`

**Parameters**

* N: int - Number of rows in the output.
* M: int, optional - Number of columns in the output. If None, defaults to `N`.
* k: int, optional - Index of the diagonal: 0 (the default) refers to the main diagonal, a positive value refers to an upper diagonal, and a negative value to a lower diagonal.

In [12]:
np.eye(4)

array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.]])

In [13]:
np.eye(N=4, M=3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.],
       [0., 0., 0.]])

In [14]:
np.eye(N=4, k=1)

array([[0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.],
       [0., 0., 0., 0.]])
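A negative `k` moves the ones below the main diagonal. For the plain square case, NumPy also offers `np.identity(n)`, which gives the same array as `np.eye(n)`:

```python
import numpy as np

lower = np.eye(3, k=-1)      # ones on the first diagonal below the main one
print(lower)

print(np.array_equal(np.eye(3), np.identity(3)))   # True
```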

#### 6. Random Functions

The `np.random` module lets us generate arrays of random numbers. The most popular random functions are:

* **random()**: It returns a random float from the half-open interval `[0.0, 1.0)`.

* **rand(d0, d1, ..., dn):** It generates an array of shape (d0, d1, ..., dn), filled with random samples drawn "uniformly" from `[0.0, 1.0)`.

* **randn(d0, d1, ..., dn):** It generates an array of shape (d0, d1, ..., dn), filled with random samples from the "standard" normal distribution (mean 0, standard deviation 1).

* **randint(low, high=None, size=None, dtype='l'):** It generates an array of shape `size`, filled with random integers from the interval `[low, high)`. If `high` is omitted, the integers are drawn from `[0, low)`.

In [15]:
np.random.random()

0.1655396242835081

In [16]:
# uniform distribution
np.random.rand(3, 4)

array([[0.55010437, 0.8608718 , 0.61793372, 0.94624639],
       [0.56085797, 0.86958749, 0.17240047, 0.45221703],
       [0.52613267, 0.43633415, 0.78297911, 0.5921082 ]])

In [17]:
# standard normal distribution (mean 0, standard deviation 1)
np.random.randn(3, 4)

array([[ 0.79920493, -0.45609488,  0.04940156, -0.74214685],
       [-1.09281066,  2.30604849, -1.51152898, -0.84547472],
       [ 0.42579229, -1.16183798,  0.13640745,  1.50104789]])

In [18]:
np.random.randint(0, 100, (3, 4))

array([[87, 62, 27, 35],
       [78, 53, 84, 37],
       [22, 70, 27, 84]])
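The functions above use NumPy's legacy global random state (seeded here with `np.random.seed`). Since NumPy 1.17, the documentation recommends the `Generator` API created by `np.random.default_rng`. A sketch of rough equivalents follows; the two APIs produce different numbers even with the same seed, and `mu` and `sigma` are arbitrary values that illustrate shifting and scaling standard-normal samples.

```python
import numpy as np

rng = np.random.default_rng(313)          # a Generator with its own seed

u = rng.random((3, 4))                    # uniform on [0.0, 1.0), cf. np.random.rand(3, 4)
z = rng.standard_normal((3, 4))           # standard normal, cf. np.random.randn(3, 4)
k = rng.integers(0, 100, (3, 4))          # integers in [0, 100), cf. np.random.randint(0, 100, (3, 4))

# Normal samples with mean mu and standard deviation sigma: shift and scale N(0, 1)
mu, sigma = 10.0, 2.0
samples = mu + sigma * rng.standard_normal(100_000)
print(round(samples.mean(), 2), round(samples.std(), 2))   # close to 10 and 2
```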

---

### NumPy Array Functions

NumPy arrays come with several useful methods and attributes:

1. **min()** and **max()** return the minimum and maximum values of an array.
2. **argmin()** and **argmax()** return the indices of the minimum and maximum values. Without an `axis` argument, the index refers to the flattened array.
3. **shape** returns the shape of an array.
4. **dtype** returns the data type of an array's elements.
5. **reshape(n, m)** returns an array with the same data and the new shape (n, m); n * m must equal the number of elements.

In [19]:
arr = np.random.randint(0, 100, (4, 9))
arr

array([[88, 82,  6, 42, 51, 28, 92, 66, 80],
       [ 3, 42, 73, 19, 56, 44, 43, 61, 60],
       [63, 28, 34, 81, 91, 44, 59, 23, 94],
       [26, 86, 47,  1, 13, 58, 51, 53, 66]])

In [20]:
print('The min and max numbers are {} and {}, respectively.'.format(arr.min(), arr.max()))

The min and max numbers are 1 and 94, respectively.


In [21]:
print('The index of min and max numbers are {} and {}, respectively.'.format(arr.argmin(), arr.argmax()))

The index of min and max numbers are 30 and 26, respectively.


In [22]:
print('The shape of array is {}, and the data type of array is {}.'.format(arr.shape, arr.dtype))

The shape of array is (4, 9), and the data type of array is int64.


In [23]:
arr = arr.reshape(3, 12)
arr

array([[88, 82,  6, 42, 51, 28, 92, 66, 80,  3, 42, 73],
       [19, 56, 44, 43, 61, 60, 63, 28, 34, 81, 91, 44],
       [59, 23, 94, 26, 86, 47,  1, 13, 58, 51, 53, 66]])
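One dimension passed to `reshape` may be `-1`, in which case NumPy infers it from the total size; a shape whose product does not match the number of elements raises a `ValueError`. A short sketch, using `np.arange(36)` in place of the random array above:

```python
import numpy as np

arr = np.arange(36)
print(arr.reshape(3, -1).shape)   # (3, 12): -1 lets NumPy infer that dimension
print(arr.reshape(-1, 9).shape)   # (4, 9)

try:
    arr.reshape(5, 7)             # 5 * 7 = 35 != 36, so this is rejected
except ValueError as err:
    print('ValueError:', err)
```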

---

### Indexing and Slicing of NumPy Arrays

#### 1. Indexing and Slicing of 1D Arrays

Let's create a regular Python list and a NumPy array:

In [24]:
lst = list(range(0, 10))
lst

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [25]:
arr = np.arange(0, 10)
arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [26]:
print(arr[...])

[0 1 2 3 4 5 6 7 8 9]


Indexing with an ellipsis, `arr[...]`, selects all elements of a NumPy array. The same does not work for Python lists: `lst[...]` raises a `TypeError`.

In [27]:
print('{} vs. {}'.format(lst[5], arr[5]))

5 vs. 5


In [28]:
print('{} vs. {}'.format(lst[1:5], arr[1:5]))

[1, 2, 3, 4] vs. [1 2 3 4]


In [29]:
print('{} vs. {}'.format(lst[5:], arr[5:]))

[5, 6, 7, 8, 9] vs. [5 6 7 8 9]


#### 2. Slicing and Broadcasting

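For a Python list, `lst[0:5] = 100` is not possible: it raises a `TypeError`, because a list slice can only be assigned an iterable. With a NumPy array, on the other hand, the scalar is broadcast to every element of the slice.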

In [30]:
arr = np.arange(0, 10)

In [31]:
arr[:5] = 100
arr

array([100, 100, 100, 100, 100,   5,   6,   7,   8,   9])
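The list-versus-array difference can be checked directly, as in the following sketch; assigning a list of the right length is the list equivalent of the broadcast assignment:

```python
import numpy as np

lst = list(range(10))
try:
    lst[0:5] = 100                # a list slice can only be assigned an iterable
except TypeError as err:
    print('TypeError:', err)

lst[0:5] = [100] * 5              # the list equivalent: supply every element
arr = np.arange(10)
arr[0:5] = 100                    # NumPy broadcasts the scalar over the slice
print(lst == arr.tolist())        # True
```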

In [32]:
sliced = arr[5:7]
sliced

array([5, 6])

Note that a basic slice is a view of the original array, not a copy: if we change the elements of the sliced array, the elements of the original array change too.

In [33]:
sliced[:] = 80
sliced

array([80, 80])

In [34]:
arr

array([100, 100, 100, 100, 100,  80,  80,   7,   8,   9])
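One way to confirm that a slice shares memory with its source is the array's `base` attribute or `np.shares_memory`, sketched below (the names `view` and `independent` are illustrative):

```python
import numpy as np

arr = np.arange(10)
view = arr[5:7]                 # basic slicing returns a view
independent = arr[5:7].copy()   # an explicit copy

print(view.base is arr)                      # True: the view borrows arr's memory
print(np.shares_memory(arr, view))           # True
print(np.shares_memory(arr, independent))    # False
```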

To avoid this, we can use the `copy()` method, which returns an independent array:

In [35]:
arr_copy = arr[:2].copy()
arr_copy

array([100, 100])

In [36]:
arr_copy[:] = 75
arr_copy

array([75, 75])

The original array is unchanged:

In [37]:
arr

array([100, 100, 100, 100, 100,  80,  80,   7,   8,   9])

#### 3. Indexing and Slicing of 2D Arrays

Let's create a 2D array, i.e., a matrix, as follows:

In [38]:
mat = np.random.rand(3, 4)
mat

array([[0.17682077, 0.38666533, 0.73258622, 0.42644438],
       [0.54252846, 0.62054132, 0.56371885, 0.97001606],
       [0.84377151, 0.87853951, 0.54671285, 0.35772691]])

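Elements can be retrieved with either single-bracket (`mat[0, 2]`) or double-bracket (`mat[0][2]`) notation. However, the **single-bracket** notation is recommended: the double-bracket form first creates the intermediate row `mat[0]` and then indexes into it.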

In [39]:
print('{} and {}'.format(mat[0, 2], mat[0][2]))

0.7325862165956133 and 0.7325862165956133


A matrix can be sliced as follows:

In [40]:
mat[:, 0:2]

array([[0.17682077, 0.38666533],
       [0.54252846, 0.62054132],
       [0.84377151, 0.87853951]])

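Double-bracket notation does not work for slicing the columns of a matrix. For instance, in the following example `mat[:][0:2]` is equal to `mat[0:2]`: `mat[:]` returns all rows, and `[0:2]` then selects the first two rows, not the first two columns. So, the result is not what we want.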

In [41]:
mat[:][0:2]

array([[0.17682077, 0.38666533, 0.73258622, 0.42644438],
       [0.54252846, 0.62054132, 0.56371885, 0.97001606]])

In [42]:
mat[0:2]

array([[0.17682077, 0.38666533, 0.73258622, 0.42644438],
       [0.54252846, 0.62054132, 0.56371885, 0.97001606]])

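Notice the difference between the two slicing approaches below: the slice `1:2` keeps the column axis and returns a 2-D array of shape `(3, 1)`, while the integer index `1` drops that axis and returns a 1-D array of shape `(3,)`.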

In [43]:
mat[:, 1:2]

array([[0.38666533],
       [0.62054132],
       [0.87853951]])

In [44]:
mat[:, 1]

array([0.38666533, 0.62054132, 0.87853951])
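The shapes can be checked directly. In this sketch a fresh random matrix stands in for `mat`, since only the shapes matter:

```python
import numpy as np

mat = np.random.rand(3, 4)
print(mat[:, 1:2].shape)   # (3, 1): a slice keeps the column axis
print(mat[:, 1].shape)     # (3,): an integer index removes it
print(mat[1].shape)        # (4,): a single index on the first axis selects a whole row
```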

---

## Part II: Basic Operations

NumPy operations can be placed into three groups:

1. Array with Array
2. Array with Scalars
3. Universal Array Functions

I give examples of each group below.

---

### 1. Array with Array

Arithmetic operators between two arrays of the same shape are applied element-wise:

In [45]:
arr_1 = np.arange(10, 20)
arr_2 = np.arange(20, 30)
print(arr_1, arr_2)

[10 11 12 13 14 15 16 17 18 19] [20 21 22 23 24 25 26 27 28 29]


In [46]:
sum_ew = arr_1 + arr_2
sub_ew = arr_1 - arr_2
mul_ew = arr_1 * arr_2
div_ew = arr_1 / arr_2
mod_ew = arr_1 % arr_2

In [47]:
print('Sum: {}\nSub: {}\nMul: {}\nDiv: {}\nMod: {}'.format(sum_ew, sub_ew, 
                                                           mul_ew, np.round(div_ew, 3), 
                                                           mod_ew))

Sum: [30 32 34 36 38 40 42 44 46 48]
Sub: [-10 -10 -10 -10 -10 -10 -10 -10 -10 -10]
Mul: [200 231 264 299 336 375 416 459 504 551]
Div: [0.5   0.524 0.545 0.565 0.583 0.6   0.615 0.63  0.643 0.655]
Mod: [10 11 12 13 14 15 16 17 18 19]
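Other arithmetic operators, such as floor division `//`, work element-wise in the same way. If the two shapes cannot be matched, NumPy raises a `ValueError` instead of silently truncating, as this sketch shows:

```python
import numpy as np

arr_1 = np.arange(10, 20)
arr_2 = np.arange(20, 30)

print(arr_2 // arr_1)        # floor division: 2 for the first pair, 1 for the rest

try:
    np.arange(3) + np.arange(4)   # shapes (3,) and (4,) cannot be matched element-wise
except ValueError as err:
    print('ValueError:', err)
```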


---

### 2. Array with Scalars

We can also apply mathematical operations between NumPy arrays and scalars. For example, if we add a scalar to a NumPy array, the scalar is added to every element of the array.

In [48]:
arr = np.arange(10, 20)

In [49]:
arr + 100

array([110, 111, 112, 113, 114, 115, 116, 117, 118, 119])

In [50]:
arr - 2

array([ 8,  9, 10, 11, 12, 13, 14, 15, 16, 17])

In [51]:
arr ** 2

array([100, 121, 144, 169, 196, 225, 256, 289, 324, 361])
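Scalar operations are the simplest case of NumPy's broadcasting rules. The same mechanism lets a 1-D array be combined with every row of a 2-D array whose last dimension matches, as in this sketch (the arrays `mat` and `row` are illustrative):

```python
import numpy as np

mat = np.arange(6).reshape(2, 3)      # [[0, 1, 2], [3, 4, 5]]
row = np.array([10, 20, 30])

print(mat + row)   # row is added to each row of mat: [[10, 21, 32], [13, 24, 35]]
print(mat * 2)     # the scalar case: every element doubled
```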

---

### 3. Universal Functions

Universal functions (ufuncs) apply an operation to every element of an array. Here, we look at common ones, together with some related array attributes (`ndim`, `dtype`, `size`, ...) and statistical functions. Please note that a complete list of universal functions is available at: https://docs.scipy.org/doc/numpy-1.15.1/reference/ufuncs.html

In [52]:
arr = np.arange(10)
arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

The number of dimensions:

In [53]:
arr.ndim

1

Retrieving the indices of non-zero elements:

In [54]:
arr.nonzero()

(array([1, 2, 3, 4, 5, 6, 7, 8, 9]),)

The data type of elements inside an array:

In [55]:
arr.dtype

dtype('int64')

The size of each element, in bytes:

In [56]:
arr.itemsize

8

The total number of elements in an array:

In [57]:
arr.size

10

The shape of an array:

In [58]:
arr.shape

(10,)

**Other Functions:**

In [59]:
np.sqrt(arr)

array([0.        , 1.        , 1.41421356, 1.73205081, 2.        ,
       2.23606798, 2.44948974, 2.64575131, 2.82842712, 3.        ])

In [60]:
np.exp(arr)

array([1.00000000e+00, 2.71828183e+00, 7.38905610e+00, 2.00855369e+01,
       5.45981500e+01, 1.48413159e+02, 4.03428793e+02, 1.09663316e+03,
       2.98095799e+03, 8.10308393e+03])

We can shuffle the contents of an array in place, as follows:

In [61]:
np.random.shuffle(arr)
arr

array([7, 5, 9, 2, 1, 0, 8, 3, 4, 6])

In [62]:
print('min: {}, argmin: {}'.format(np.min(arr), np.argmin(arr)))
print('max: {}, argmax: {}'.format(np.max(arr), np.argmax(arr)))

min: 0, argmin: 5
max: 9, argmax: 2


In [63]:
print('mean: {}, median: {}, std: {}.'.format(np.mean(arr), np.median(arr), np.std(arr)))

mean: 4.5, median: 4.5, std: 2.8722813232690143.


In [64]:
np.sin(arr)

array([ 0.6569866 , -0.95892427,  0.41211849,  0.90929743,  0.84147098,
        0.        ,  0.98935825,  0.14112001, -0.7568025 , -0.2794155 ])

In [65]:
np.sum(arr)

45

In [66]:
np.round(np.sqrt(arr), 2)

array([2.65, 2.24, 3.  , 1.41, 1.  , 0.  , 2.83, 1.73, 2.  , 2.45])
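To make the element-wise behavior concrete, the sketch below compares `np.sqrt` with an explicit Python loop over `math.sqrt`, and checks that the `+` operator is backed by the ufunc `np.add`. The comparison uses `np.allclose` to allow for floating-point rounding.

```python
import math
import numpy as np

arr = np.arange(10)

# Same values as a Python loop, computed in one vectorized call
loop = [math.sqrt(x) for x in arr]
print(np.allclose(np.sqrt(arr), loop))                 # True

# Arithmetic operators dispatch to ufuncs: a + b computes np.add(a, b)
print(np.array_equal(arr + arr, np.add(arr, arr)))    # True

# Ufuncs keep the input's shape
print(np.sqrt(arr.reshape(2, 5)).shape)                # (2, 5)
```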

---

## Part III: Matrix Operations

### 1. Basics

In [67]:
mat = np.arange(1, 11).reshape(2, 5)
mat

array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10]])

In [68]:
mat.ndim

2

In [69]:
mat.size

10

In [70]:
mat.shape

(2, 5)

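We can flatten a matrix with the `ravel()` or the `flatten()` method. `flatten()` always returns a copy, whereas `ravel()` returns a view of the original data when possible: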

In [71]:
mat.ravel()

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [72]:
print('Row: {}'.format(mat.flatten(order='C')))
print('Col: {}'.format(mat.flatten(order='F')))

Row: [ 1  2  3  4  5  6  7  8  9 10]
Col: [ 1  6  2  7  3  8  4  9  5 10]
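The view-versus-copy difference between `ravel()` and `flatten()` can be observed by writing through the result, as in this sketch. For a contiguous array like `mat`, `ravel()` returns a view; for non-contiguous input it may have to copy.

```python
import numpy as np

mat = np.arange(1, 11).reshape(2, 5)
r = mat.ravel()      # view: shares memory with mat
f = mat.flatten()    # copy: independent of mat

r[0] = 100
f[1] = -1
print(mat[0, 0], mat[0, 1])   # 100 2: only the write through the view reached mat
```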


In [73]:
print('sum: by column is {}, by row is {}.'.format(mat.sum(axis=0), mat.sum(axis=1)))

sum: by column is [ 7  9 11 13 15], by row is [15 40].


In [74]:
print('mean: by column is {}, by row is {}.'.format(mat.mean(axis=0), mat.mean(axis=1)))

mean: by column is [3.5 4.5 5.5 6.5 7.5], by row is [3. 8.].


The matrix transpose operation can be done in two forms:

In [75]:
mat.T

array([[ 1,  6],
       [ 2,  7],
       [ 3,  8],
       [ 4,  9],
       [ 5, 10]])

In [76]:
mat.transpose()

array([[ 1,  6],
       [ 2,  7],
       [ 3,  8],
       [ 4,  9],
       [ 5, 10]])
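Besides transposition, a common matrix operation is the matrix product. Note that `*` is element-wise; the matrix product is written with the `@` operator, which is equivalent to `np.matmul`. A sketch using the same `mat`:

```python
import numpy as np

mat = np.arange(1, 11).reshape(2, 5)

print(mat * mat)      # element-wise squares, shape (2, 5)
print(mat @ mat.T)    # (2, 5) @ (5, 2) -> (2, 2): [[55, 130], [130, 330]]
```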

### 2. Matrix Inversion

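Please recall that a matrix must be square (and non-singular) to be invertible. The example below uses `np.linalg.pinv`, which computes the Moore-Penrose pseudo-inverse; for a square, non-singular matrix this equals the ordinary inverse returned by `np.linalg.inv`.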

In [77]:
mat = np.random.rand(3, 3)
mat

array([[0.52946674, 0.65505593, 0.72953871],
       [0.24214079, 0.81405912, 0.71775051],
       [0.84653142, 0.48948247, 0.38435432]])

In [78]:
mat_inv = np.linalg.pinv(mat)
mat_inv

array([[ 0.38600138, -1.05763933,  1.24239066],
       [-5.1668537 ,  4.15808775,  2.04225678],
       [ 5.72992661, -2.96597722, -2.73542298]])

In [79]:
np.matmul(mat, mat_inv).round()

array([[ 1.,  0., -0.],
       [-0.,  1.,  0.],
       [-0.,  0.,  1.]])
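For a square matrix that is known to be invertible, `np.linalg.inv` computes the inverse directly, and `np.allclose` is a more robust check than rounding the product. A singular matrix (here a hypothetical example whose second row is twice the first) makes `np.linalg.inv` raise `LinAlgError`, whereas `pinv` still returns a pseudo-inverse:

```python
import numpy as np

mat = np.random.rand(3, 3)                       # almost surely invertible
mat_inv = np.linalg.inv(mat)
print(np.allclose(mat @ mat_inv, np.eye(3)))     # True, up to floating-point tolerance

singular = np.array([[1.0, 2.0],
                     [2.0, 4.0]])                # second row = 2 * first row
try:
    np.linalg.inv(singular)
except np.linalg.LinAlgError as err:
    print('LinAlgError:', err)

print(np.linalg.pinv(singular))                  # the pseudo-inverse still exists
```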

---

## Part IV: Handling Exceptions

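NumPy does not raise exceptions for invalid arithmetic on arrays. Instead, it emits a `RuntimeWarning` and returns a special floating-point value: `0/0` gives `nan` (not a number), and `1/0` gives `inf`. The warnings are silenced in the next cell to keep the output clean.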

In [80]:
import warnings
# hide the RuntimeWarnings for this demo; note that this silences ALL warnings globally
warnings.filterwarnings('ignore')

In [81]:
arr = np.arange(0, 6)

In [82]:
arr / arr

array([nan,  1.,  1.,  1.,  1.,  1.])

In [83]:
1 / arr

array([       inf, 1.        , 0.5       , 0.33333333, 0.25      ,
       0.2       ])
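Silencing every warning with `warnings.filterwarnings('ignore')` can hide unrelated problems. NumPy's `np.errstate` context manager limits the change to a block of code, and can also turn these conditions into real exceptions. A sketch:

```python
import numpy as np

arr = np.arange(0, 6)

# Suppress the two warnings only inside this block
with np.errstate(divide='ignore', invalid='ignore'):
    ratios = arr / arr      # 0/0 is an "invalid" operation -> nan
    recips = 1 / arr        # 1/0 is a division by zero -> inf

print(np.isnan(ratios[0]), np.isinf(recips[0]))   # True True

# Alternatively, turn division by zero into an exception
try:
    with np.errstate(divide='raise'):
        1 / arr
except FloatingPointError as err:
    print('FloatingPointError:', err)
```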

---

## Part V: Advanced Operations

### 1. Stacking

NumPy lets us stack arrays vertically (row-wise) or horizontally (column-wise).

In [84]:
arr_1 = np.array([[1, 2], [3, 4]])
arr_1

array([[1, 2],
       [3, 4]])

In [85]:
arr_2 = np.array([[5, 6], [8, 5]])
arr_2

array([[5, 6],
       [8, 5]])

**Vertical Stacking**

In [86]:
print(np.vstack((arr_1, arr_2)))

[[1 2]
 [3 4]
 [5 6]
 [8 5]]


**Horizontal Stacking**

In [87]:
print(np.hstack((arr_1, arr_2)))

[[1 2 5 6]
 [3 4 8 5]]
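For 2-D arrays, `vstack` and `hstack` are equivalent to `np.concatenate` along axis 0 and axis 1, respectively, while `np.stack` joins the arrays along a new axis. A sketch that checks this with the arrays above:

```python
import numpy as np

arr_1 = np.array([[1, 2], [3, 4]])
arr_2 = np.array([[5, 6], [8, 5]])

v = np.concatenate((arr_1, arr_2), axis=0)   # same as np.vstack for 2-D input
h = np.concatenate((arr_1, arr_2), axis=1)   # same as np.hstack for 2-D input
print(np.array_equal(v, np.vstack((arr_1, arr_2))),
      np.array_equal(h, np.hstack((arr_1, arr_2))))   # True True

s = np.stack((arr_1, arr_2))                 # new leading axis: shape (2, 2, 2)
print(s.shape)
```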


### 2. Splitting

In [88]:
arr = np.arange(30).reshape(2, 15)
arr

array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29]])

**Vertical Splitting**

In [89]:
v_splits = np.vsplit(arr, 2)

for i, split in enumerate(v_splits):
    print('Split {}: \n{}.\n'.format(i + 1, split))

Split 1: 
[[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14]].

Split 2: 
[[15 16 17 18 19 20 21 22 23 24 25 26 27 28 29]].



**Horizontal Splitting**

In [90]:
h_splits = np.hsplit(arr, 5)

for i, split in enumerate(h_splits):
    print('Split {}: \n{}.\n'.format(i + 1, split))

Split 1: 
[[ 0  1  2]
 [15 16 17]].

Split 2: 
[[ 3  4  5]
 [18 19 20]].

Split 3: 
[[ 6  7  8]
 [21 22 23]].

Split 4: 
[[ 9 10 11]
 [24 25 26]].

Split 5: 
[[12 13 14]
 [27 28 29]].
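`np.hsplit(arr, 5)` requires the 15 columns to divide evenly into 5 parts. The sketch below shows two alternatives: passing a list of split positions, and `np.array_split`, which allows pieces of unequal size.

```python
import numpy as np

arr = np.arange(30).reshape(2, 15)

# Split before columns 3 and 10: pieces arr[:, :3], arr[:, 3:10], arr[:, 10:]
parts = np.hsplit(arr, [3, 10])
print([p.shape for p in parts])          # shapes (2, 3), (2, 7), (2, 5)

# np.hsplit(arr, 4) would raise ValueError, since 15 columns do not split into 4 equal parts;
# np.array_split accepts the unequal division
uneven = np.array_split(arr, 4, axis=1)
print([p.shape[1] for p in uneven])      # [4, 4, 4, 3]
```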



---

## Part VI: Iteration

In [91]:
arr = np.arange(11, 23).reshape(3, 4)
arr

array([[11, 12, 13, 14],
       [15, 16, 17, 18],
       [19, 20, 21, 22]])

In [92]:
for row in arr:
    for cell in row:
        print(cell, end=' ')

11 12 13 14 15 16 17 18 19 20 21 22 

In [93]:
for cell in arr.flatten():
    print(cell, end=' ')

11 12 13 14 15 16 17 18 19 20 21 22 

### Iteration by nditer

**nditer** is an efficient multi-dimensional iterator object for iterating over arrays: <br/>
https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.nditer.html

In [94]:
arr = np.arange(11, 23).reshape(3, 4)
arr

array([[11, 12, 13, 14],
       [15, 16, 17, 18],
       [19, 20, 21, 22]])

In [95]:
for cell in np.nditer(arr, order='C'):
    print(cell, end=' ')

11 12 13 14 15 16 17 18 19 20 21 22 

In [96]:
for cell in np.nditer(arr, order='F'):
    print(cell, end=' ')

11 15 19 12 16 20 13 17 21 14 18 22 

In [97]:
for column in np.nditer(arr, order='F', flags=['external_loop']):
    print(column)

[11 15 19]
[12 16 20]
[13 17 21]
[14 18 22]


In [98]:
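# each item is a writable 0-d view of one element; the with-block writes any buffered changes back to arr
with np.nditer(arr, op_flags=['readwrite']) as it:
    for cell in it:
        cell[...] = cell + 100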

print(arr)

[[111 112 113 114]
 [115 116 117 118]
 [119 120 121 122]]
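For a simple element-wise update like this one, the explicit iterator is not needed: in-place arithmetic does the same work without a Python-level loop, as in this sketch:

```python
import numpy as np

arr = np.arange(11, 23).reshape(3, 4)
arr += 100          # in-place update of every element, no explicit loop
print(arr)
```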


**Iteration through two arrays:**

In [99]:
arr = np.arange(11, 23).reshape(3, 4)
arr

array([[11, 12, 13, 14],
       [15, 16, 17, 18],
       [19, 20, 21, 22]])

In [100]:
arr_ = np.arange(101, 113).reshape(3, 4)
arr_

array([[101, 102, 103, 104],
       [105, 106, 107, 108],
       [109, 110, 111, 112]])

In [101]:
for x, y in np.nditer((arr, arr_), order='F'):
    print(x, y)

11 101
15 105
19 109
12 102
16 106
20 110
13 103
17 107
21 111
14 104
18 108
22 112
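When the position of each element matters as well as its value, `np.ndenumerate` yields `(index, value)` pairs in C order. A sketch that collects the multiples of 5 (the filtering condition and the name `hits` are arbitrary examples):

```python
import numpy as np

arr = np.arange(11, 23).reshape(3, 4)

hits = [(idx, int(value)) for idx, value in np.ndenumerate(arr) if value % 5 == 0]
print(hits)   # positions (1, 0) and (2, 1), holding 15 and 20
```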


---

## Part VII: Conditional Selection

Let's create a matrix as follows:

In [102]:
mat = np.random.rand(3, 5)
mat

array([[0.06693328, 0.84674717, 0.81242223, 0.0687165 , 0.2030992 ],
       [0.31703253, 0.61917921, 0.93238904, 0.50418195, 0.79852153],
       [0.9521861 , 0.05560746, 0.91190262, 0.59652453, 0.75316748]])

Comparing an array with a scalar gives a Boolean array of the same shape that shows where the condition is met:

In [103]:
mat_b = mat > 0.5
mat_b

array([[False,  True,  True, False, False],
       [False,  True,  True,  True,  True],
       [ True, False,  True,  True,  True]])

We can then use this Boolean array as an index to retrieve the elements that meet the condition (the result is a 1-D array):

In [104]:
mat[mat_b]

array([0.84674717, 0.81242223, 0.61917921, 0.93238904, 0.50418195,
       0.79852153, 0.9521861 , 0.91190262, 0.59652453, 0.75316748])

Or, in one step:

In [105]:
mat[mat > 0.5]

array([0.84674717, 0.81242223, 0.61917921, 0.93238904, 0.50418195,
       0.79852153, 0.9521861 , 0.91190262, 0.59652453, 0.75316748])

We can also modify the elements of the matrix as below:

In [106]:
mat[mat > 0.5] = 2
mat

array([[0.06693328, 2.        , 2.        , 0.0687165 , 0.2030992 ],
       [0.31703253, 2.        , 2.        , 2.        , 2.        ],
       [2.        , 0.05560746, 2.        , 2.        , 2.        ]])
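Several conditions can be combined with `&` (and), `|` (or) and `~` (not). Each comparison must be wrapped in parentheses, because these operators bind more tightly than `>` and `<`. `np.where` selects between two values element-wise without modifying the array. A sketch on a small integer array:

```python
import numpy as np

arr = np.arange(10)

print(arr[(arr > 2) & (arr < 6)])     # 3, 4, 5
print(arr[(arr < 2) | (arr > 7)])     # 0, 1, 8, 9

print(np.where(arr > 4, arr, 0))      # keep values above 4, replace the rest with 0
print(np.count_nonzero(arr > 4))      # 5 elements satisfy the condition
```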