# MSBD5001 - Foundations of Data Analytics
# Tutorial 3
# NumPy


NumPy is a fundamental package for scientific computing in Python.

NumPy is short for "Numerical Python".

https://numpy.org/

To install NumPy in Jupyter Notebook using pip, run the following cell:

In [None]:
!pip install numpy 

Import the NumPy package

In [2]:
import numpy as np

## Arrays
- [Standard Python Library array](https://docs.python.org/3/library/array.html) only handles one-dimensional arrays and offers less functionality.
- NumPy’s array class is called ndarray (the N-dimensional array). 
- It provides a powerful N-dimensional array object. 
- It is a table of elements (usually numbers), all of the same type, indexed by a tuple of non-negative integers.
- In NumPy, dimensions are called axes.
- The shape of an ndarray is a tuple of integers, in which each integer is the size of the array along the corresponding dimension.

### Create and Initialize a NumPy Array (ndarray)

- To create an array, we can use the numpy.array() method.
- We can initialize an ndarray from a Python list of elements. 

In [3]:
list1 = [5, 3, 6]
array1 = np.array(list1)
print (array1)

[5 3 6]


- To obtain the shape of an array, we can use the [numpy.ndarray.shape](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.shape.html) attribute.

- In the above example, array1 has only 1 dimension, so its shape is (3,), which indicates that its single axis (axis 0) has 3 elements.

In [4]:
print(array1.shape)

(3,)


- To access each element in the array, we use square brackets and the corresponding indices for each dimension.

In [5]:
print(array1[0], array1[1], array1[2], sep=",")

5,3,6


#### One-dimensional Array

In [6]:
a = np.array([1, 2, 3, 4]) # Create an array with 1 axis and a length of 4
print (a)
print (type(a)) # Return the type of the object, a

[1 2 3 4]
<class 'numpy.ndarray'>


In [7]:
print (a.ndim) # Return the number of axes (dimensions) of the array
print (a.shape) # Return the dimensions of the array
print (a.size) # Return the number of elements
print (a.dtype.name) # Return the type of the elements in the array

1
(4,)
4
int32


In [8]:
print(a[0], a[1], a[2], a[3])
print(a[-1])

1 2 3 4
4


#### Two-dimensional Array

In [9]:
b = np.array([[ 1., 0., 0.], 
              [ 0., 1., 2.]]) # Create an array with 2 axes
                              # 1st axis has a length of 2, 
                              # 2nd axis has a length of 3 
print (b)
print (type(b))

[[1. 0. 0.]
 [0. 1. 2.]]
<class 'numpy.ndarray'>


In [10]:
print (b.ndim)
print (b.shape)
print (b.size)
print (b.dtype.name)

2
(2, 3)
6
float64


In [11]:
print (b[0], b[1], sep="\t")
print (b[0, 0], b[0, 1], b[0, 2], b[1, 0], b[1, 1], b[1, 2], sep=", ")

[1. 0. 0.]	[0. 1. 2.]
1.0, 0.0, 0.0, 0.0, 1.0, 2.0


#### Three-dimensional Array

In [12]:
c = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
print (c)

[[[ 1  2  3]
  [ 4  5  6]]

 [[ 7  8  9]
  [10 11 12]]]


In [13]:
print (c.ndim)
print (c.shape)
print (c.size)

3
(2, 2, 3)
12


In [14]:
print (c[0, 0, 0], c[1, 1, 1])

1 11


### More about Creating Arrays

NumPy offers several functions to create arrays with initial placeholder content.
- numpy.zeros(shape), which creates an array of zeros with specified shape 
- numpy.ones(shape), which creates an array of ones with specified shape
- numpy.full(shape, constant), which creates a constant array with specified shape
- numpy.eye(i), which creates an i x i identity matrix
- numpy.random.random(shape), which creates an array of random values in [0, 1) with specified shape
- numpy.arange(), which creates an array with a sequence of evenly spaced numbers within a given interval 
- numpy.linspace(), which creates an array with a specified number of evenly spaced values over a given interval

In [15]:
np.zeros( (3, 4) ) # Create a 3x4 array full of zeros 
                   # (i.e. a two dimensional array with 3 rows and 4 elements each row)

array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

In [16]:
np.ones( (3, 4) ) # Create an array full of ones

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])

In [17]:
np.full( (3, 4), 5) # Create a constant array

array([[5, 5, 5, 5],
       [5, 5, 5, 5],
       [5, 5, 5, 5]])

In [18]:
np.eye(3) # Return a 3x3 identity matrix

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [19]:
np.random.random((3, 4)) # Return a 3x4 array filled with random values

array([[0.95807152, 0.02277773, 0.10667925, 0.36269869],
       [0.51258402, 0.22810189, 0.03542813, 0.14546954],
       [0.45486006, 0.79664748, 0.79413054, 0.78954938]])
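The random values above change on every run. As a hedged aside, one way to make them reproducible is NumPy's Generator API with a fixed seed. The seed value 42 and the names `rng1`, `rng2`, `x` and `y` below are arbitrary choices for illustration and are not part of the tutorial.

```python
import numpy as np

# Two generators built from the same (arbitrary) seed produce the same values
rng1 = np.random.default_rng(42)
rng2 = np.random.default_rng(42)

x = rng1.random((3, 4))  # 3x4 array of floats in [0, 1)
y = rng2.random((3, 4))

print(x.shape)               # (3, 4)
print(np.array_equal(x, y))  # True: same seed, same sequence
```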

- arange(end): the values run from 0 to end (exclusive) 
- arange(start, end): the values run from start to end (exclusive)
- arange(start, end, step): the values run from start to end (exclusive), with a spacing of step between consecutive values

In [20]:
np.arange(10) # Return an array with a sequence of numbers from 0 to 10 (exclusive)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [21]:
np.arange(1, 10) # Return an array with a sequence of numbers from 1 to 10 (exclusive)

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

In [22]:
np.arange(1, 10, 2) # Return an array with a sequence of numbers from 1 to 10 (exclusive) 
                    # and the step is 2 between two numbers in the sequence.

array([1, 3, 5, 7, 9])

- linspace(start, end, num=50): returns num evenly spaced values from start to end (end is included by default)

In [23]:
print (np.linspace(0, 9)) # Return an array with 50 evenly spaced numbers from start to end (inclusive)

[0.         0.18367347 0.36734694 0.55102041 0.73469388 0.91836735
 1.10204082 1.28571429 1.46938776 1.65306122 1.83673469 2.02040816
 2.20408163 2.3877551  2.57142857 2.75510204 2.93877551 3.12244898
 3.30612245 3.48979592 3.67346939 3.85714286 4.04081633 4.2244898
 4.40816327 4.59183673 4.7755102  4.95918367 5.14285714 5.32653061
 5.51020408 5.69387755 5.87755102 6.06122449 6.24489796 6.42857143
 6.6122449  6.79591837 6.97959184 7.16326531 7.34693878 7.53061224
 7.71428571 7.89795918 8.08163265 8.26530612 8.44897959 8.63265306
 8.81632653 9.        ]


In [24]:
print (np.linspace(0, 9, 3)) # Return an array with 3 evenly spaced numbers
                             # between 0 and 9 (both included)

[0.  4.5 9. ]
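The difference between the two functions is easy to miss: arange takes a step and excludes the end value, while linspace takes a count and includes the end value unless told otherwise. A small sketch of this contrast follows; the variable names are illustrative only.

```python
import numpy as np

with_end = np.linspace(0, 1, 5)                     # end value 1 is included
without_end = np.linspace(0, 1, 5, endpoint=False)  # end value 1 is left out
stepped = np.arange(0, 10, 2.5)                     # step of 2.5, end value 10 is excluded

print(with_end)     # [0.   0.25 0.5  0.75 1.  ]
print(without_end)  # [0.  0.2 0.4 0.6 0.8]
print(stepped)      # [0.  2.5 5.  7.5]
```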


### Changing the shape of the array

#### Reshaping
- Through reshaping, we can create a new array by adding or removing dimensions or changing the number of elements in each dimension, as long as the total number of elements stays the same
- [numpy.ndarray.reshape(new_shape)](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.reshape.html?highlight=reshape#numpy.ndarray.reshape)

In [25]:
a = np.array([ [0, 1, 2, 3], [4, 5, 6, 7]]) # creates a 2x4 2d array
print ("a =", a)
b = a.reshape(4, 2) # change the shape to a 4x2 array
print ("b =", b)

a = [[0 1 2 3]
 [4 5 6 7]]
b = [[0 1]
 [2 3]
 [4 5]
 [6 7]]
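As a supplementary sketch (not from the original notebook), reshape also accepts -1 for one dimension, in which case NumPy infers that dimension from the total number of elements. A shape whose element count does not match raises a ValueError.

```python
import numpy as np

a = np.arange(12)     # 12 elements

b = a.reshape(3, -1)  # -1 is inferred as 4, since 3 * 4 == 12
c = a.reshape(-1, 6)  # -1 is inferred as 2, since 2 * 6 == 12
print(b.shape)        # (3, 4)
print(c.shape)        # (2, 6)

try:
    a.reshape(5, 3)   # 5 * 3 != 12, so this cannot work
except ValueError as err:
    print("reshape failed:", err)
```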


#### Resizing
- We can change the shape and size of an array in-place.
- [numpy.ndarray.resize(new_shape)](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.resize.html?highlight=resize#numpy.ndarray.resize)

In [26]:
print("a = ", a)
a.resize(4, 2) 
print ("after resizing, a =", a)

a =  [[0 1 2 3]
 [4 5 6 7]]
after resizing, a = [[0 1]
 [2 3]
 [4 5]
 [6 7]]


#### Flatten an Array
- [numpy.ndarray.ravel()](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.ravel.html) returns a flattened (one-dimensional) array

In [27]:
print ("a = ", a)
c = a.ravel()
print ("c =", c)

a =  [[0 1]
 [2 3]
 [4 5]
 [6 7]]
c = [0 1 2 3 4 5 6 7]
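One detail worth knowing is that ravel() returns a view of the original data whenever possible, whereas the related method flatten() always returns a copy. The sketch below illustrates this for a small contiguous array; the names `r` and `f` are illustrative.

```python
import numpy as np

a = np.array([[0, 1], [2, 3]])

r = a.ravel()    # a view here, because a is contiguous in memory
f = a.flatten()  # always an independent copy

r[0] = 99        # writes through to a
f[1] = -1        # does not affect a

print(a)                       # [[99  1]
                               #  [ 2  3]]
print(np.shares_memory(a, r))  # True
print(np.shares_memory(a, f))  # False
```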


### More about printing an array

If an array is too large to be printed, NumPy automatically skips the central part of the array and only prints the corners.

In [28]:
print (np.arange(10000))

[   0    1    2 ... 9997 9998 9999]


In [29]:
print (np.arange(10000).reshape(500, 20)) # reshape() returns an array 
                                          # with a modified shape

[[   0    1    2 ...   17   18   19]
 [  20   21   22 ...   37   38   39]
 [  40   41   42 ...   57   58   59]
 ...
 [9940 9941 9942 ... 9957 9958 9959]
 [9960 9961 9962 ... 9977 9978 9979]
 [9980 9981 9982 ... 9997 9998 9999]]


### Transposing the array

In [30]:
a = np.array([ [0, 1, 2, 3], [4, 5, 6, 7]]) # 2 x 4 array
print ("a =", a)
print ("shape = ", a.shape)

b1 = a.transpose() # standard transpose of the 2D array
print ("b1 =", b1)

b2 = a.transpose(1, 0) # transpose of the 2D array with permutation of axes [1, 0]
                       # i.e. from (2, 4) to (4, 2)
print ("b2 =", b2)

b3 = a.transpose(0, 1) # transpose of the 2D array with permutation of axes [0, 1]
                       # i.e. from (2, 4) to (2, 4), so the array is unchanged
print ("b3 =", b3)

c = a.reshape(4, 2) # reshape the array a to a 2D array with 4 rows and 2 cols
print ("c =", c)


a = [[0 1 2 3]
 [4 5 6 7]]
shape =  (2, 4)
b1 = [[0 4]
 [1 5]
 [2 6]
 [3 7]]
b2 = [[0 4]
 [1 5]
 [2 6]
 [3 7]]
b3 = [[0 1 2 3]
 [4 5 6 7]]
c = [[0 1]
 [2 3]
 [4 5]
 [6 7]]


### Flip the array
- Reverse the order of the elements in the array with the shape unchanged (by default, along every axis).

In [31]:
a = np.array([ [0, 1, 2, 3], [4, 5, 6, 7]]) # 2 x 4 array
print ("a =", a)
print ("shape = ", a.shape)

b = np.flip(a)
print ("b = ", b)
print ("shape = ", b.shape)

a = [[0 1 2 3]
 [4 5 6 7]]
shape =  (2, 4)
b =  [[7 6 5 4]
 [3 2 1 0]]
shape =  (2, 4)
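np.flip also takes an optional axis argument, so the reversal can be limited to rows or to columns. A brief sketch with the same 2 x 4 array follows; the names `rows_flipped` and `cols_flipped` are illustrative.

```python
import numpy as np

a = np.array([[0, 1, 2, 3], [4, 5, 6, 7]])

rows_flipped = np.flip(a, axis=0)  # reverse the order of the rows only
cols_flipped = np.flip(a, axis=1)  # reverse the elements within each row only

print(rows_flipped)  # [[4 5 6 7]
                     #  [0 1 2 3]]
print(cols_flipped)  # [[3 2 1 0]
                     #  [7 6 5 4]]
```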


### Reshape vs Transpose vs Flip

In [32]:
a = np.arange(12).reshape(3, 4) # numpy.ndarray.reshape() equivalent to numpy.reshape()
print ("a =", a)
print ("shape = ", a.shape)

b = a.reshape(4, 3)
print ("b =", b)

c = a.transpose() # numpy.ndarray.transpose() equivalent to numpy.transpose()
print ("c =", c)

d = np.flip(a) # numpy.flip()
print ("d = ", d)


a = [[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
shape =  (3, 4)
b = [[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]]
c = [[ 0  4  8]
 [ 1  5  9]
 [ 2  6 10]
 [ 3  7 11]]
d =  [[11 10  9  8]
 [ 7  6  5  4]
 [ 3  2  1  0]]
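For arrays with more than two axes, the permutation passed to transpose matters more. The following sketch, using an arbitrary 2 x 3 x 4 array, shows how the axes are reordered and how individual elements move.

```python
import numpy as np

c = np.arange(24).reshape(2, 3, 4)

# New axis 0 is old axis 2, new axis 1 is old axis 0, new axis 2 is old axis 1
t = c.transpose(2, 0, 1)

print(t.shape)    # (4, 2, 3)
print(c.T.shape)  # (4, 3, 2): with no arguments, the axes are fully reversed

# Element t[i, j, k] corresponds to c[j, k, i]
print(t[3, 1, 2], c[1, 2, 3])  # 23 23
```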


## Array Arithmetic

Arithmetic operators on arrays apply elementwise. 

In [33]:
a = np.arange(0, 12).reshape(3, 4)
print (a)
print (a * 10)
print (a < 10)

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
[[  0  10  20  30]
 [ 40  50  60  70]
 [ 80  90 100 110]]
[[ True  True  True  True]
 [ True  True  True  True]
 [ True  True False False]]
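The same elementwise rule applies when both operands are arrays of the same shape. A minimal sketch with two small illustrative arrays:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[10, 20], [30, 40]])

total = a + b        # elementwise sum
ratio = b / a        # elementwise true division, result is float
squares = a ** 2     # elementwise power
roots = np.sqrt(squares)  # NumPy functions such as sqrt also work elementwise

print(total)    # [[11 22]
                #  [33 44]]
print(ratio)    # [[10. 10.]
                #  [10. 10.]]
print(squares)  # [[ 1  4]
                #  [ 9 16]]
```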


### Matrix product
- Matrix multiplication can be done by:
  - numpy.ndarray.dot()
  - numpy.matmul() (@ operator)

In [34]:
a = np.array([[1, 0], [0, 1]])
b = np.array([[1, 2], [3, 4]])

print (a.dot(b)) # Return matrix product

print (a @ b) # Return matrix product
print (np.matmul(a, b)) # Return matrix product

[[1 2]
 [3 4]]
[[1 2]
 [3 4]]
[[1 2]
 [3 4]]


### Array Multiplication vs Matrix Multiplication

In [35]:
a = np.array([[1, 0], [0, 1]])
b = np.array([[1, 2], [3, 4]])

print (a * b) # Return elementwise product
print (a @ b) # Return matrix product
print (a.dot(b)) # Return matrix product

[[1 0]
 [0 4]]
[[1 2]
 [3 4]]
[[1 2]
 [3 4]]
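Unlike the elementwise product, the matrix product requires the inner dimensions to agree: an (n, k) array can be multiplied by a (k, m) array or by a vector of length k. The sketch below uses illustrative names `m` and `v`.

```python
import numpy as np

m = np.arange(6).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]], shape (2, 3)
v = np.array([1, 0, 2])         # shape (3,)

print(m @ v)    # [ 4 13]: matrix-vector product, shape (2,)
print(m @ m.T)  # [[ 5 14]
                #  [14 50]]: (2, 3) @ (3, 2) gives shape (2, 2)

try:
    m @ m       # (2, 3) @ (2, 3): inner dimensions 3 and 2 do not match
except ValueError as err:
    print("matmul failed:", err)
```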


## Other NumPy Functions
NumPy also provides many functions for doing computations on arrays, for example,

### [Mathematical functions](https://numpy.org/doc/stable/reference/routines.math.html)
  - numpy.ndarray.sum(axis=None)
    - Return the sum of the array elements over a given axis
    
### [Statistics](https://numpy.org/doc/stable/reference/routines.statistics.html)
  - numpy.mean(a, axis=None)
    - Return the average along a given axis
  - numpy.average(a, axis=None, weights=None)
    - Return the weighted average along a given axis
  - numpy.median(a, axis=None)
    - Compute the median along the specified axis
  - numpy.percentile(a, q, axis=None)
    - Compute the q-th percentile of array a along the given axis

In [36]:
a = np.arange(0, 12).reshape(3, 4)
print(a)

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]


In [37]:
print (a.sum()) # sum all elements
print (a.sum(0)) # sum the elements along axis 0 (i.e. each column)
print (a.sum(1)) # sum the elements along axis 1 (i.e. each row)

66
[12 15 18 21]
[ 6 22 38]


In [38]:
print (np.mean(a))
print (np.mean(a, 0))
print (np.percentile(a, 50))
print (np.median(a))

5.5
[4. 5. 6. 7.]
5.5
5.5
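The axis and weights parameters listed earlier are not exercised in the cell above. A short sketch of how they behave on the same 3 x 4 array follows; the weights [1, 1, 2] are an arbitrary choice for illustration.

```python
import numpy as np

a = np.arange(0, 12).reshape(3, 4)

row_medians = np.median(a, axis=1)  # median of each row
q25 = np.percentile(a, 25)          # 25th percentile of all elements
# Weighted average down each column; the last row counts twice as much
weighted = np.average(a, axis=0, weights=[1, 1, 2])

print(row_medians)  # [1.5 5.5 9.5]
print(q25)          # 2.75
print(weighted)     # [5. 6. 7. 8.]
```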


## Indexing and Slicing

- Similar to Python lists, NumPy arrays can be indexed and sliced.
- If an array is multi-dimensional, we can specify an index or a slice for each dimension, separated by commas. Any trailing dimensions that are left out are selected in full.

For example, if a is a NumPy array,
### Indexing:
- a[i]
  - Select the element at index i
- a[-i]
  - Select the i-th element counting from the end (a[-1] is the last element)

### Slicing
- a[i:j]
  - Select the elements from index i to j-1
- a[:]
  - Select all elements in the corresponding dimension (axis)
- a[0:]
  - Select all elements in the corresponding dimension (axis)
- a[i:]
  - Select all elements from index i to the end (inclusive)
- a[:j]
  - Select all elements from index 0 to index j-1
- a[i:j:n]
  - Select the elements from index i to j-1, with a step of n
- a[::-1]
  - Select all elements in the reversed order

#### Indexing and slicing on one-dimensional array

In [39]:
a = np.arange(0, 8, 2) # creates a 1d array with 4 even numbers
print (a)
print (a[2]) # Indexing, access the 3rd element with index 2
print (a[:]) # Slicing, access all elements
print (a[1:3]) # Slicing, access the elements with index starting at 1 and ending at 3-1
print (a[::-1]) # Slicing, access all elements in the reversed order

[0 2 4 6]
4
[0 2 4 6]
[2 4]
[6 4 2 0]
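The remaining slice forms from the list above (negative index, open-ended slices, and a step) can be sketched on a longer array. The array below is illustrative and its values equal their indices, which makes the results easy to read.

```python
import numpy as np

a = np.arange(10)  # [0 1 2 3 4 5 6 7 8 9]

print(a[-2])     # 8: second element from the end
print(a[7:])     # [7 8 9]: from index 7 to the end
print(a[:3])     # [0 1 2]: from index 0 to index 2
print(a[1:8:3])  # [1 4 7]: indices 1, 4, 7 (index 8 is excluded)
print(a[::-2])   # [9 7 5 3 1]: every second element, in reversed order
```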


#### Indexing and slicing on two-dimensional array

In [40]:
# Indexing and slicing on multi-dimensional array
b = np.arange(0, 24, 2).reshape(3, 4)
print (b)
print (b[1, 2])  # Indexing, indices are given in a list of numbers separated by commas
print (b[0:2, 1]) # Slicing, returns the first two rows in the second column
print (b[:, 1]) # Slicing, returns each row in the second column
print (b[::-1, ::-1]) # Slicing, access all rows in reversed order, and also all elements in each row in reversed order
print (b[:, ::-1]) # Slicing, access all rows, and in each row, access the elements in reversed order

[[ 0  2  4  6]
 [ 8 10 12 14]
 [16 18 20 22]]
12
[ 2 10]
[ 2 10 18]
[[22 20 18 16]
 [14 12 10  8]
 [ 6  4  2  0]]
[[ 6  4  2  0]
 [14 12 10  8]
 [22 20 18 16]]


#### Mixing Indexing with Slicing

In [41]:
a = np.arange(0, 12).reshape(3, 4)
print (a)

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]


In [42]:
print(a[1, :]) # Returns the second row of a as a 1d array
print(a[0:2, :]) # Returns the 1st and 2nd rows of a as a 2d array

[4 5 6 7]
[[0 1 2 3]
 [4 5 6 7]]


In [43]:
print(a[:, 3]) # Returns the last element in each row of a as a 1d array
print(a[:, 0:2]) # Returns the first two elements in each row of a as a 2d array

[ 3  7 11]
[[0 1]
 [4 5]
 [8 9]]
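The cells above show that an integer index removes a dimension while a slice keeps it. A small sketch makes the resulting shapes explicit; the variable names are illustrative.

```python
import numpy as np

a = np.arange(0, 12).reshape(3, 4)

row_1d = a[1, :]    # integer index on axis 0: that axis is dropped
row_2d = a[1:2, :]  # slice of length 1 on axis 0: that axis is kept
col_1d = a[:, 1]
col_2d = a[:, 1:2]

print(row_1d.shape, row_2d.shape)  # (4,) (1, 4)
print(col_1d.shape, col_2d.shape)  # (3,) (3, 1)
```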


### Indexing with Array of Indices
- We can create an array using data from another array

In [44]:
a = np.arange(9).reshape(3, 3) # [[0,1,2],[3,4,5],[6,7,8]]
print (a[[0, 1, 2], [0, 1, 0]]) # a[0, 0], a[1, 1], a[2, 0] are selected
print (a[[0, 2], [0, 1]]) # a[0, 0], a[2, 1] are selected
print (a[[0, 0], [2, 2]]) # a[0, 2] is selected twice

[0 4 6]
[0 7]
[2 2]
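A common use of index arrays is picking (or updating) one element from each row. The sketch below assumes a hypothetical array `cols` holding the chosen column for each row; it is not part of the original notebook.

```python
import numpy as np

a = np.arange(12).reshape(4, 3)  # [[0,1,2],[3,4,5],[6,7,8],[9,10,11]]
cols = np.array([0, 2, 0, 1])    # hypothetical column choice for each row

# Pairs (0, 0), (1, 2), (2, 0), (3, 1) are selected; the result is a copy
picked = a[np.arange(4), cols]
print(picked)  # [ 0  5  6 10]

# The same index arrays can be used to modify those elements in place
a[np.arange(4), cols] += 100
print(a[1])    # [  3   4 105]
```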


### Indexing with Boolean Array
- We can create an array by selecting the elements of another array that satisfy a certain condition

In [45]:
a = np.arange(9).reshape(3, 3) # [[0,1,2],[3,4,5],[6,7,8]]
print (a)
print (a > 0) # Returns an ndarray of Boolean values with the same shape as a
              # If the element in a is larger than 0 (>0), the corresponding position in the returned array is True

[[0 1 2]
 [3 4 5]
 [6 7 8]]
[[False  True  True]
 [ True  True  True]
 [ True  True  True]]


In [46]:
a = np.arange(9).reshape(3, 3) # [[0,1,2],[3,4,5],[6,7,8]]
print (a[ a<5 ]) # Returns a 1d array with elements that are smaller than 5
print (a[ a%2 == 0 ]) # Returns a 1d array with elements that are even

[0 1 2 3 4]
[0 2 4 6 8]
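Boolean masks can also be combined and used for assignment. In the sketch below, note that conditions are combined with `&` and `|` (with parentheses around each condition) rather than Python's `and` and `or`. The names `mask`, `b` and `replaced` are illustrative.

```python
import numpy as np

a = np.arange(9).reshape(3, 3)  # [[0,1,2],[3,4,5],[6,7,8]]

mask = (a > 2) & (a % 2 == 0)   # larger than 2 AND even
print(a[mask])                  # [4 6 8]

b = a.copy()
b[b < 5] = 0                    # assign to all elements that satisfy the condition
print(b)                        # [[0 0 0]
                                #  [0 0 5]
                                #  [6 7 8]]

# np.where keeps the shape: take a where the condition holds, -1 elsewhere
replaced = np.where(a % 2 == 0, a, -1)
print(replaced[1])              # [-1  4 -1]
```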


## Iterating

In [47]:
b = np.arange(0, 24, 2).reshape(3, 4)
for row in b:
    print (row)

[0 2 4 6]
[ 8 10 12 14]
[16 18 20 22]


In [48]:
for x in b.flat:
    print (x)

0
2
4
6
8
10
12
14
16
18
20
22


## Copying

In [49]:
a = np.arange(6).reshape(2, 3) # [[0,1,2],[3,4,5]]
print (a)

b = a # b is another name for the same array object, not a new array object
b[0, 2] = 100 # Modifying one of b's elements
print (a) # a is modified 

[[0 1 2]
 [3 4 5]]
[[  0   1 100]
 [  3   4   5]]


In [50]:
a = np.arange(6).reshape(2, 3) # [[0,1,2],[3,4,5]]
b = a.copy() # b is a new array object

b[0, 2] = 100 # Modifying one of b's elements
print (a) # Only b is modified, but not a

[[0 1 2]
 [3 4 5]]


In [51]:
a = np.arange(6).reshape(2, 3) # [[0,1,2],[3,4,5]]
b = a.view() # b is a new array object that shares the same data as a

b[0, 2] = 100 # Modifying one of b's elements
b.resize(3, 2) # b's shape is changed
print (a) # a's data is changed but shape is not changed

[[  0   1 100]
 [  3   4   5]]


In [52]:
a = np.arange(9).reshape(3, 3) # [[0,1,2],[3,4,5],[6,7,8]]
b = a[:2] # b is a view of the first two rows of a: [[0,1,2],[3,4,5]]
b[0, 2] = 100 # Modifying one of b's elements
print (a) # a is also modified

[[  0   1 100]
 [  3   4   5]
 [  6   7   8]]
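The four cases above (plain assignment, copy, view, slice) can be told apart programmatically. The following sketch uses `is`, the `base` attribute and `np.shares_memory`; the variable names are illustrative.

```python
import numpy as np

a = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])

alias = a          # same object
view = a.view()    # new object, shared data
sliced = a[:2]     # slicing also returns a view
copied = a.copy()  # new object, separate data

print(alias is a)                    # True
print(view is a, view.base is a)     # False True
print(np.shares_memory(a, sliced))   # True
print(np.shares_memory(a, copied))   # False
print(copied.base is None)           # True: a copy owns its own data
```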


## Data Types
- NumPy tries to guess the datatype when we are creating an array.
- We can also create an array specifying the datatype explicitly.
- For more about the datatypes, see https://numpy.org/doc/stable/reference/arrays.scalars.html

In [53]:
x = np.double([1, 2, 3]) # double-precision floating-point number
print (x)
print (type(x[0]))

[1. 2. 3.]
<class 'numpy.float64'>


In [54]:
y = np.intc([0.5, 1.5, -2.5]) # signed integers
print (y)
print (type(y[0]))

[ 0  1 -2]
<class 'numpy.intc'>


In [55]:
z = np.arange(5, dtype=np.int64) # create an array of integer values
print (z)

[0 1 2 3 4]


In [56]:
w = np.array([0.5, 1.5, 2.5], dtype=np.double) # create an array of double datatype values
print (w)

[0.5 1.5 2.5]


In [57]:
a = w.astype(int) # Return a copy of the array cast to the given type (fractional parts are truncated)
print (a)

[0 1 2]
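To make the "guessing" of data types concrete, the sketch below shows how NumPy picks a type from the input values, how an explicit dtype affects memory use, and how rounding differs from the truncation done by astype(int). The variable names are illustrative.

```python
import numpy as np

ints = np.array([1, 2, 3])     # all integers: an integer dtype is chosen
mixed = np.array([1, 2.5, 3])  # one float is enough to make the whole array float
print(ints.dtype.kind)         # i (the exact integer width depends on the platform)
print(mixed.dtype)             # float64

small = np.array([1, 2, 3], dtype=np.int8)  # explicit 8-bit integers
print(small.itemsize, small.nbytes)         # 1 3

w = np.array([0.5, 1.5, 2.5])
print(w.astype(int))           # [0 1 2]: truncation toward zero
print(np.rint(w).astype(int))  # [0 2 2]: rounds to the nearest integer, ties go to even
```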


## Broadcasting

- NumPy broadcasting allows us to work with arrays of different shapes when performing arithmetic operations.

In [58]:
a = np.arange(1, 13).reshape(4, 3)
print (a)
b = np.array([1,0,1])
print (b)

[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]
[1 0 1]


In [59]:
x = a + b # add the vector b to each row vector in matrix a with broadcasting
print (x)

[[ 2  2  4]
 [ 5  5  7]
 [ 8  8 10]
 [11 11 13]]


- The broadcasting above gives the same result as the explicit loop below.
- Broadcasting is usually more efficient than the explicit loop, especially when the matrix is very large.

In [60]:
y = np.zeros((4, 3), dtype=int)
for i in range(4):
    y[i, :] = a[i, :] + b
print (y)    

[[ 2  2  4]
 [ 5  5  7]
 [ 8  8 10]
 [11 11 13]]
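Broadcasting compares shapes from the last axis backwards, and two dimensions are compatible when they are equal or one of them is 1. As a sketch under that rule, adding a value to each row (rather than each column) needs a column vector of shape (4, 1); the name `col` is illustrative.

```python
import numpy as np

a = np.arange(1, 13).reshape(4, 3)                # shape (4, 3)
col = np.array([10, 20, 30, 40]).reshape(4, 1)    # shape (4, 1)

print(a + col)  # col is stretched across the 3 columns
# [[11 12 13]
#  [24 25 26]
#  [37 38 39]
#  [50 51 52]]

try:
    a + np.array([10, 20, 30, 40])  # shape (4,) does not match the last axis of (4, 3)
except ValueError as err:
    print("broadcast failed:", err)
```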
