# Chapter 10 - NumPy
## Building Machine Learning and Deep Learning Models on Google Cloud Platform
### Ekaba Bisong

In [0]:
# load numpy package
import numpy as np

## NumPy 1-D Array

In [0]:
# create a simple 1-D NumPy array
my_array = np.array([2,4,6,8,10])
my_array

array([ 2,  4,  6,  8, 10])

In [0]:
# the type of a NumPy array is ndarray
type(my_array)

numpy.ndarray

In [0]:
# a NumPy 1-D array can also be seen as a vector with 1 dimension
my_array.ndim

1

In [0]:
# check the shape to get the size of the array along each axis
# a 1-D array has a single axis, so the shape reads as (n,)
my_array.shape

(5,)

### Create an array from a Python list

In [0]:
# create an array from a Python list
my_list = [9, 5, 2, 7]

In [0]:
type(my_list)

list

In [0]:
# convert a list to a numpy array
list_to_array = np.array(my_list) # or np.asarray(my_list)

In [0]:
type(list_to_array)

numpy.ndarray

### Other useful methods for creating arrays

In [0]:
# create an array from a range of numbers
np.arange(10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [0]:
# create an array from start to end (exclusive) via a step size - (start, stop, step)
np.arange(2, 10, 2)

array([2, 4, 6, 8])

In [0]:
# create 5 evenly spaced points between 2 and 10 (both endpoints included)
np.linspace(2, 10, 5)

array([ 2.,  4.,  6.,  8., 10.])

In [0]:
# create an array of ones
np.ones(5)

array([1., 1., 1., 1., 1.])

In [0]:
# create an array of zeros
np.zeros(5)

array([0., 0., 0., 0., 0.])

## NumPy Datatypes
Let’s explore a bit with NumPy datatypes:

In [0]:
# ints
my_ints = np.array([3, 7, 9, 11])
my_ints.dtype

dtype('int64')

In [0]:
# floats
my_floats = np.array([3., 7., 9., 11.])
my_floats.dtype

dtype('float64')

In [0]:
# mixed types - ints are upcast to float
my_array = np.array([3., 7., 9, 11])
my_array.dtype

dtype('float64')

In [0]:
# manually assigning datatypes
my_array = np.array([3, 7, 9, 11], dtype="float64")
my_array.dtype

dtype('float64')
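
As a small illustrative sketch of the upcasting behaviour above (the names `mixed` and `as_ints` are made up for this example), mixing ints and floats yields a float array, and `astype` can convert it back:

```python
import numpy as np

# mixing ints and floats upcasts every element to float64
mixed = np.array([3, 7.5, 9])
print(mixed.dtype)           # float64

# astype returns a converted copy; the fractional part is truncated
as_ints = mixed.astype("int64")
print(as_ints.tolist())      # [3, 7, 9]
```

Note that `astype` leaves `mixed` itself unchanged.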

## Indexing + Fancy Indexing (1-D)

We can index a single element of a NumPy 1-D array similar to how we index a Python list.

In [0]:
# create a random numpy 1-D array
my_array = np.random.rand(10)
my_array

array([0.78883692, 0.7565035 , 0.35089127, 0.60050156, 0.13770897,
       0.22563364, 0.50824592, 0.9852307 , 0.27490702, 0.52787594])

In [0]:
# index the first element
my_array[0]

0.7888369208826349

In [0]:
# index the last element
my_array[-1]

0.5278759407455007

### Boolean Mask

Let’s index all the even integers in the array using a boolean mask.

In [0]:
# create 10 random integers from 1 to 19 (the upper bound, 20, is exclusive)
my_array = np.random.randint(1, 20, 10)
my_array

array([ 9, 11,  5, 18, 15,  5,  8, 18, 17, 19])

In [0]:
# index all even integers in the array using a boolean mask
my_array[my_array % 2 == 0]

array([18,  8, 18])

Observe that the code `my_array % 2 == 0` outputs an array of booleans.

In [0]:
my_array % 2 == 0

array([False, False, False,  True, False, False,  True,  True, False,
       False])
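
Boolean masks can also be combined. The following is a minimal sketch, using the hypothetical name `values` for a fixed copy of the array above, of how conditions might be joined with `&` and negated with `~`:

```python
import numpy as np

values = np.array([9, 11, 5, 18, 15, 5, 8, 18, 17, 19])

# combine conditions with & (and) or | (or); each condition needs parentheses
even_and_large = values[(values % 2 == 0) & (values > 10)]
print(even_and_large.tolist())   # [18, 18]

# ~ negates a mask: select the odd integers
odd = values[~(values % 2 == 0)]
print(odd.tolist())              # [9, 11, 5, 15, 5, 17, 19]
```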

### Integer Mask

Let’s select all elements with odd indices in the array.

In [0]:
# create 10 random integers from 1 to 19 (the upper bound, 20, is exclusive)
my_array = np.random.randint(1, 20, 10)
my_array

array([ 1, 18,  8, 12, 10,  2, 17,  4, 17, 17])

In [0]:
my_array[np.arange(1,10,2)]

array([18, 12,  2,  4, 17])

Remember that array indices start from 0, so the second element, 18, is at index 1.

In [0]:
np.arange(1,10,2)

array([1, 3, 5, 7, 9])
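
For regularly spaced positions like these, a stepped slice gives the same result. A short sketch (the name `values` is only illustrative) comparing the two, and showing something only an index array can do:

```python
import numpy as np

values = np.array([1, 18, 8, 12, 10, 2, 17, 4, 17, 17])

# integer-array indexing and a stepped slice select the same odd positions
by_index = values[np.arange(1, 10, 2)]
by_slice = values[1::2]
print(np.array_equal(by_index, by_slice))   # True

# an index array may repeat or reorder positions, which a slice cannot do
print(values[[0, 0, -1]].tolist())          # [1, 1, 17]
```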

## Slicing a 1-D Array

Slicing a NumPy array is also similar to slicing a Python list.

In [0]:
my_array = np.array([14,  9,  3, 19, 16,  1, 16,  5, 13,  3])
my_array

array([14,  9,  3, 19, 16,  1, 16,  5, 13,  3])

In [0]:
# slice the first 2 elements
my_array[:2]

array([14,  9])

In [0]:
# slice the last 3 elements
my_array[-3:]

array([ 5, 13,  3])

## Basic Math Operations on Arrays: Universal Functions
We’ll explore some basic arithmetic operations on NumPy 1-D arrays.

In [0]:
# create an array of even numbers between 2 and 10
my_array = np.arange(2,11,2)
my_array

array([ 2,  4,  6,  8, 10])

In [0]:
# sum of array elements
np.sum(my_array) # or my_array.sum()

30

In [0]:
# square root
np.sqrt(my_array)

array([1.41421356, 2.        , 2.44948974, 2.82842712, 3.16227766])

In [0]:
# log
np.log(my_array)

array([0.69314718, 1.38629436, 1.79175947, 2.07944154, 2.30258509])

In [0]:
# exponent
np.exp(my_array)

array([7.38905610e+00, 5.45981500e+01, 4.03428793e+02, 2.98095799e+03,
       2.20264658e+04])
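
To make the element-wise nature of these functions concrete, here is a minimal sketch comparing `np.sqrt` with an explicit Python loop (the names `vectorized` and `looped` are chosen only for illustration):

```python
import math
import numpy as np

values = np.arange(2, 11, 2)

# the universal function processes the whole array in one call
vectorized = np.sqrt(values)

# the equivalent explicit Python loop over each element
looped = np.array([math.sqrt(v) for v in values])

print(np.allclose(vectorized, looped))   # True
```

Both give the same numbers; the vectorized call avoids looping in Python.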

## Higher-Dimensional Arrays
Previously, we covered the creation of 1-D arrays (or vectors) in NumPy to get a feel of how NumPy works.  
This section will now consider working with 2-D and 3-D arrays. 2-D arrays are ideal for storing data for analysis.  
Also, other data forms like images are adequately represented using 3-D arrays.

### Creating 2-D arrays (Matrices)

Let’s construct a simple 2-D array.

In [0]:
# construct a 2-D array
my_2D = np.array([[2,4,6],
                    [8,10,12]])
my_2D

array([[ 2,  4,  6],
       [ 8, 10, 12]])

In [0]:
# check the number of dimensions
my_2D.ndim

2

In [0]:
# get the shape of the 2-D array - this example has 2 rows and 3 columns: (r, c)
my_2D.shape

(2, 3)

Let’s explore common methods in practice for creating 2-D NumPy arrays, which are also matrices.

In [0]:
# create a 3x3 array of ones
np.ones([3,3])

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

In [0]:
# create a 3x3 array of zeros
np.zeros([3,3])

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

In [0]:
# create a 3x3 array of a particular scalar - full(shape, fill_value)
np.full([3,3], 2)

array([[2, 2, 2],
       [2, 2, 2],
       [2, 2, 2]])

In [0]:
# create a 3x3 empty (uninitialized) array - its contents are arbitrary and may differ from the output shown
np.empty([3,3])

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

In [0]:
# create a 4x4 identity matrix - i.e., a matrix with 1's on its diagonal
np.eye(4) # or np.identity(4)

array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.]])

### Creating 3-D arrays

Let’s construct a basic 3-D array.

In [0]:
# construct a 3-D array
my_3D = np.array([[
                     [2,4,6],
                     [8,10,12]
                    ],[
                     [1,2,3],
                     [7,9,11]
                    ]])
my_3D

array([[[ 2,  4,  6],
        [ 8, 10, 12]],

       [[ 1,  2,  3],
        [ 7,  9, 11]]])

In [0]:
# check the number of dimensions
my_3D.ndim

3

In [0]:
# get the shape of the 3-D array - this example has 2 pages, 2 rows and 3 columns: (p, r, c)
my_3D.shape

(2, 2, 3)

We can also create 3-D arrays with methods such as `ones`, `zeros`, `full`, and `empty` by passing the configuration for `[page, row, columns]` into the shape parameter of the methods. For example:

In [0]:
# create a 2-page, 3x3 array of ones
np.ones([2,3,3])

array([[[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]],

       [[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]]])

In [0]:
# create a 2-page, 3x3 array of zeros
np.zeros([2,3,3])

array([[[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]],

       [[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]]])

### Indexing/Slicing of Matrices

Let’s see some examples of indexing and slicing two-dimensional arrays. The concepts extend naturally from doing the same with 1-D arrays.

In [0]:
# create a 3x3 array containing random numbers from the standard normal distribution
# (despite its name, my_3D is a 2-D array here)
my_3D = np.random.randn(3,3)
my_3D

array([[-1.02147655, -0.79475226, -0.99177578],
       [ 0.54051695, -0.45740188, -0.13316795],
       [ 0.65666835, -0.63720061, -0.25384248]])

In [0]:
# select a particular cell (or element) from a 2-D array.
# In this case, the cell at the 2nd row and column
my_3D[1,1]    

-0.4574018774910409

In [0]:
# slice the last 2 columns
my_3D[:,1:3]

array([[-0.79475226, -0.99177578],
       [-0.45740188, -0.13316795],
       [-0.63720061, -0.25384248]])

In [0]:
# slice the first 2 rows and columns
my_3D[0:2, 0:2]

array([[-1.02147655, -0.79475226],
       [ 0.54051695, -0.45740188]])

## Matrix Operations: Linear Algebra
Linear Algebra is a convenient and powerful system for manipulating a set of data features and is one of the strong points of NumPy.

### Matrix Multiplication (dot product)
First, let’s create random integers using the method `np.random.randint(low, high=None, size=None)`, which returns random integers from low (inclusive) to high (exclusive).

In [0]:
# create two 3x3 matrices of random integers from 1 to 49 (the upper bound, 50, is exclusive)
A = np.random.randint(1, 50, size=[3,3])
B = np.random.randint(1, 50, size=[3,3])

In [0]:
# print the array A
A

array([[38,  4, 20],
       [40, 17, 49],
       [19, 46, 40]])

In [0]:
# print the array B
B

array([[ 9, 10,  5],
       [45, 30, 32],
       [27, 45, 21]])

We can use the following routines for matrix multiplication: `np.matmul(a,b)`, or `a @ b` if using Python 3.5 or later.  
Remember that when multiplying matrices, the inner matrix dimensions must agree. For example, if A is an `m×n` matrix and B is an `n×p` matrix, the product of the matrices will be an `m×p` matrix, with the inner dimension `n` of the respective matrices agreeing.  
![Matrix multiplication](https://ekababisong.org/assets/seminar_IEEE/matrix_mul.png)

In [0]:
# multiply the two matrices A and B (dot product)
A @ B    # or np.matmul(A,B)

array([[1062, 1400,  738],
       [2448, 3115, 1773],
       [3321, 3370, 2407]])

In [0]:
np.matmul(A,B)

array([[1062, 1400,  738],
       [2448, 3115, 1773],
       [3321, 3370, 2407]])
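
The inner-dimension rule is easier to see with non-square matrices. The sketch below uses two made-up arrays, `M` and `N`, filled with ones purely to show the shapes involved:

```python
import numpy as np

M = np.ones([2, 3])     # m x n = 2 x 3
N = np.ones([3, 4])     # n x p = 3 x 4

# inner dimensions (3 and 3) agree, so the product is m x p = 2 x 4
print((M @ N).shape)    # (2, 4)

# reversing the order makes the inner dimensions 4 and 2, which do not agree
try:
    N @ M
except ValueError as err:
    print("cannot multiply:", err)
```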

### Element-wise operations
Element-wise matrix operations combine two matrices element by element. The operation can be an addition, subtraction, division or multiplication (which is commonly called the Hadamard product). The matrices must be of the same shape. **Please note** that while a matrix is of shape `m×n`, a vector is of shape `n×1`. These concepts easily apply to vectors as well.
![Element-wise matrix operations](https://ekababisong.org/assets/seminar_IEEE/element-wise.png)

Let’s have some examples.

In [0]:
# Hadamard multiplication of A and B
A * B

array([[ 342,   40,  100],
       [1800,  510, 1568],
       [ 513, 2070,  840]])

In [0]:
# add A and B
A + B

array([[47, 14, 25],
       [85, 47, 81],
       [46, 91, 61]])

In [0]:
# subtract A from B
B - A

array([[-29,   6, -15],
       [  5,  13, -17],
       [  8,  -1, -19]])

In [0]:
# divide A by B
A / B

array([[4.22222222, 0.4       , 4.        ],
       [0.88888889, 0.56666667, 1.53125   ],
       [0.7037037 , 1.02222222, 1.9047619 ]])

### Scalar Operation

A matrix can be acted upon by a scalar (i.e., a single numeric entity) in the same element-wise fashion. This time the scalar operates upon each element of the matrix or vector.
![Scalar operations](https://ekababisong.org/assets/seminar_IEEE/scalar-op.png)

Let’s look at some examples.

In [0]:
# multiply A by a scalar, 0.5
A * 0.5

array([[19. ,  2. , 10. ],
       [20. ,  8.5, 24.5],
       [ 9.5, 23. , 20. ]])

In [0]:
# add A and a scalar, 0.5
A + 0.5

array([[38.5,  4.5, 20.5],
       [40.5, 17.5, 49.5],
       [19.5, 46.5, 40.5]])

In [0]:
# subtract a scalar 0.5 from B
B - 0.5

array([[ 8.5,  9.5,  4.5],
       [44.5, 29.5, 31.5],
       [26.5, 44.5, 20.5]])

In [0]:
# divide A by a scalar, 0.5
A / 0.5

array([[76.,  8., 40.],
       [80., 34., 98.],
       [38., 92., 80.]])

### Matrix Transposition

Transposition is a vital matrix operation that swaps the rows and columns of a matrix by exchanging the row and column indices of each element. The transpose of a matrix is denoted as $A^T$. Observe that the diagonal elements remain unchanged.
![Matrix transpose](https://ekababisong.org/assets/seminar_IEEE/matrix-transpose.png)

Let’s see an example.

In [0]:
A = np.array([[15, 29, 24],
                [ 5, 23, 26],
                [30, 14, 44]])
# transpose A
A.T   # or A.transpose()

array([[15,  5, 30],
       [29, 23, 14],
       [24, 26, 44]])
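
The properties described above can be checked directly. This is only an illustrative sketch using the same matrix `A`:

```python
import numpy as np

A = np.array([[15, 29, 24],
              [ 5, 23, 26],
              [30, 14, 44]])

# element (i, j) of A becomes element (j, i) of the transpose
print(A[0, 2] == A.T[2, 0])                        # True

# the main diagonal is unchanged
print(np.array_equal(np.diag(A), np.diag(A.T)))    # True

# transposing a non-square array swaps its shape
print(np.ones([2, 3]).T.shape)                     # (3, 2)
```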

### The Inverse of a Matrix
**Note:** A square matrix is a matrix that has the same number of rows and columns. Only square matrices can have an inverse.

Let’s use NumPy to get the inverse of a matrix. Some linear algebra modules are found in a sub-module of NumPy called `linalg`.

In [0]:
A = np.array([[15, 29, 24],
                [ 5, 23, 26],
                [30, 14, 44]])
# find the inverse of A
np.linalg.inv(A)

array([[ 0.05848375, -0.08483755,  0.01823105],
       [ 0.05054152, -0.00541516, -0.02436823],
       [-0.05595668,  0.05956679,  0.01805054]])

NumPy also implements the ***Moore-Penrose pseudo inverse***, which generalizes the inverse to degenerate (singular) matrices. Here, we use `pinv` on an invertible matrix, for which it gives the same result as `inv`.

In [0]:
# using pinv()
np.linalg.pinv(A)

array([[ 0.05848375, -0.08483755,  0.01823105],
       [ 0.05054152, -0.00541516, -0.02436823],
       [-0.05595668,  0.05956679,  0.01805054]])
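
As a sanity check, a matrix multiplied by its inverse should give the identity matrix. The sketch below verifies this for `A` and then contrasts `inv` and `pinv` on a small singular matrix `S`, which is made up for this example:

```python
import numpy as np

A = np.array([[15, 29, 24],
              [ 5, 23, 26],
              [30, 14, 44]])

# A multiplied by its inverse gives the identity, up to floating-point error
print(np.allclose(A @ np.linalg.inv(A), np.eye(3)))   # True

# S is singular (its second row is twice the first), so inv raises LinAlgError
S = np.array([[1., 2.],
              [2., 4.]])
try:
    np.linalg.inv(S)
except np.linalg.LinAlgError as err:
    print("inv failed:", err)

# pinv still returns a pseudo-inverse P satisfying S @ P @ S == S
P = np.linalg.pinv(S)
print(np.allclose(S @ P @ S, S))                      # True
```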

## Reshaping

A NumPy array can be restructured to take on a different shape. Let’s convert a 1-D array to an `m×n` matrix.

In [0]:
# make 20 elements evenly spaced between 0 and 5
a = np.linspace(0,5,20)
a

array([0.        , 0.26315789, 0.52631579, 0.78947368, 1.05263158,
       1.31578947, 1.57894737, 1.84210526, 2.10526316, 2.36842105,
       2.63157895, 2.89473684, 3.15789474, 3.42105263, 3.68421053,
       3.94736842, 4.21052632, 4.47368421, 4.73684211, 5.        ])

In [0]:
# observe that a is a 1-D array
a.shape

(20,)

In [0]:
# reshape into a 5 x 4 matrix
A = a.reshape(5, 4)
A

array([[0.        , 0.26315789, 0.52631579, 0.78947368],
       [1.05263158, 1.31578947, 1.57894737, 1.84210526],
       [2.10526316, 2.36842105, 2.63157895, 2.89473684],
       [3.15789474, 3.42105263, 3.68421053, 3.94736842],
       [4.21052632, 4.47368421, 4.73684211, 5.        ]])

In [0]:
# The vector a has been reshaped into a 5 by 4 matrix A
A.shape

(5, 4)

### Reshape vs. Resize Method

NumPy arrays have the `reshape` and `resize` methods. The `reshape` method returns an ndarray with a modified shape without changing the original array, whereas the `resize` method changes the original array in place. Let’s see an example.

In [0]:
# generate 9 elements evenly spaced between 0 and 5
a = np.linspace(0,5,9)
a

array([0.   , 0.625, 1.25 , 1.875, 2.5  , 3.125, 3.75 , 4.375, 5.   ])

In [0]:
# the original shape
a.shape

(9,)

In [0]:
# call the reshape method
a.reshape(3,3)

array([[0.   , 0.625, 1.25 ],
       [1.875, 2.5  , 3.125],
       [3.75 , 4.375, 5.   ]])

In [0]:
# the original array maintained its shape
a.shape

(9,)

In [0]:
# call the resize method - resize works in place and returns None
a.resize(3,3)

In [0]:
# the resize method has changed the shape of the original array
a.shape

(3, 3)
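
Two related points are worth a quick sketch: `reshape` requires the new shape to hold exactly the same number of elements, and the module-level function `np.resize` (as opposed to the array method used above) returns a new array, repeating the data if the new shape is larger. The arrays here are made up for illustration:

```python
import numpy as np

a = np.arange(6)

# reshape needs the same number of elements; 6 values cannot fill 4 x 4
try:
    a.reshape(4, 4)
except ValueError as err:
    print("reshape failed:", err)

# the function np.resize returns a new array and repeats the data to fill it
b = np.resize(a, (2, 4))
print(b.tolist())     # [[0, 1, 2, 3], [4, 5, 0, 1]]
print(a.shape)        # (6,) - the original is untouched
```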

### Stacking Arrays

NumPy has methods for concatenating arrays - also called stacking. The methods `hstack` and `vstack` are used to stack several arrays along the horizontal and vertical axis respectively.

In [0]:
# create two 3x3 matrices of random integers from 1 to 49 (the upper bound, 50, is exclusive)
A = np.random.randint(1, 50, size=[3,3])
B = np.random.randint(1, 50, size=[3,3])

In [0]:
# print out the array A
A

array([[16, 29, 20],
       [12,  3, 33],
       [ 1, 36, 29]])

In [0]:
# print out the array B
B

array([[12, 39, 31],
       [36, 21,  3],
       [49,  2, 34]])

Let’s stack `A` and `B` horizontally using `hstack`. To use `hstack`, the arrays must have the same number of rows. Also, the arrays to be stacked are passed as a tuple to the `hstack` method.

In [0]:
# arrays are passed as tuple to hstack
np.hstack((A,B))

array([[16, 29, 20, 12, 39, 31],
       [12,  3, 33, 36, 21,  3],
       [ 1, 36, 29, 49,  2, 34]])

To stack `A` and `B` vertically using `vstack` the arrays must have the same number of columns. The arrays to be stacked are also passed as a tuple to the `vstack` method.

In [0]:
# arrays are passed as tuple to vstack
np.vstack((A,B))

array([[16, 29, 20],
       [12,  3, 33],
       [ 1, 36, 29],
       [12, 39, 31],
       [36, 21,  3],
       [49,  2, 34]])
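
For 2-D arrays, both stacking methods can be seen as special cases of `np.concatenate` along a chosen axis. A brief sketch with two small, made-up arrays:

```python
import numpy as np

A = np.array([[16, 29, 20],
              [12,  3, 33]])
B = np.array([[12, 39, 31],
              [36, 21,  3]])

# vstack joins along axis 0 (rows); hstack joins along axis 1 (columns)
v = np.vstack((A, B))
h = np.hstack((A, B))
print(np.array_equal(v, np.concatenate((A, B), axis=0)))   # True
print(np.array_equal(h, np.concatenate((A, B), axis=1)))   # True
print(v.shape, h.shape)                                    # (4, 3) (2, 6)
```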

## Broadcasting
NumPy has an elegant mechanism for arithmetic operations on arrays with different dimensions or shapes. As an example, when a scalar is added to a vector (or 1-D array), the scalar value is conceptually broadcast, or stretched, across the elements of the array and added element-wise.
![Broadcasting example of adding a scalar to a vector (or 1-D array)](https://ekababisong.org/assets/seminar_IEEE/scalar-broadcast.png)

Matrices with different shapes can be broadcast to perform arithmetic operations by stretching the dimensions of the smaller array. Broadcasting is another vectorized operation for speeding up matrix processing. However, not all arrays with different shapes can be broadcast. **For broadcasting to occur, the shapes are compared starting from the trailing axes, and each pair of axes must either be the same size or one of them must be 1.**

See the figure below for more illustration.
![Matrix broadcasting example](https://ekababisong.org/assets/seminar_IEEE/matrix-broadcast.png)
Let’s see this in code.

In [0]:
# create a 4 X 3 matrix of random integers from 1 to 9 (the upper bound, 10, is exclusive)
A = np.random.randint(1, 10, [4, 3])
A

array([[1, 9, 8],
       [3, 6, 5],
       [4, 5, 2],
       [1, 2, 7]])

In [0]:
# create a 4 X 1 matrix of random integers from 1 to 9 (the upper bound, 10, is exclusive)
B = np.random.randint(1, 10, [4, 1])
B

array([[6],
       [8],
       [2],
       [2]])

In [0]:
# add A and B
A + B

array([[ 7, 15, 14],
       [11, 14, 13],
       [ 6,  7,  4],
       [ 3,  4,  9]])

The example below cannot be broadcast and will result in a `ValueError: operands could not be broadcast together with shapes (4,3) (4,2)` because the matrices `A` and `B` have different numbers of columns, which violates the aforementioned broadcasting rule that the trailing axes of the arrays must be the same size or 1.

In [0]:
A = np.random.randint(1, 10, [4, 3])
B = np.random.randint(1, 10, [4, 2])
A + B

ValueError: operands could not be broadcast together with shapes (4,3) (4,2) 
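
The broadcasting rule can be summarized by checking a few shape combinations side by side. This sketch uses arrays of ones only to show which shapes are compatible, and catches the error rather than letting it stop execution:

```python
import numpy as np

A = np.ones([4, 3])

# shapes are compared from the last axis backwards
print((A + np.ones([4, 1])).shape)   # (4, 3): 3 vs 1, then 4 vs 4
print((A + np.ones(3)).shape)        # (4, 3): 3 vs 3, the missing axis is treated as 1

# (4, 3) with (4, 2) fails: 3 and 2 are unequal and neither is 1
try:
    A + np.ones([4, 2])
except ValueError as err:
    print("broadcast failed:", err)
```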

## Loading Data
Loading data is an important process in the data analysis/machine learning pipeline. Data usually comes in `.csv` format. `csv` files can be loaded into Python by using the `loadtxt` method. Setting the parameter `skiprows=1` skips the first row of the dataset, which is usually the header row of the data.

In [0]:
# loadtxt accepts a file name directly and closes the file when done
np.loadtxt("the_file_name.csv", delimiter=",", skiprows=1)
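
Since the file name above is only a placeholder, here is a self-contained sketch that can be run as-is. It assumes a hypothetical two-column CSV held in memory via `io.StringIO`, which `loadtxt` reads like a file:

```python
import io
import numpy as np

# hypothetical CSV content: a header row followed by two numeric rows
csv_text = "height,weight\n1.5,60\n1.8,75\n"

# skiprows=1 drops the header so only numeric rows are parsed
data = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1)
print(data.shape)       # (2, 2)
print(data.tolist())    # [[1.5, 60.0], [1.8, 75.0]]
```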

Pandas is a preferred package for loading data in Python. More about Pandas for data manipulation in the notebook `pandas.ipynb`.