# Introduction

Before we start, let's have a quick recap of the lecture.

## Numpy

Numpy is a very popular open-source Python package for scientific computing and is used very often in machine learning research. By using vectorization and pre-compiled binaries, Numpy greatly accelerates computations that would otherwise take a long time in plain Python. Here's an example:


In [1]:
import time

x = list(range(1000000))
y = list(range(1000000))
count = 0
start_time = time.time()
for xi, yi in zip(x, y):  # use separate loop names so the lists x and y are not overwritten
    count += xi * yi
print(count)
end_time = time.time()
print(f'the operation took {end_time - start_time} seconds')

333332833333500000
the operation took 0.1813645362854004 seconds


In [2]:
import numpy as np

x = np.arange(1000000)
y = np.arange(1000000)
start_time = time.time()
count = (x * y).sum()
print(count)
end_time = time.time()
print(f'the operation took {end_time - start_time} seconds')

333332833333500000
the operation took 0.006233692169189453 seconds


While we don't often need to sum up arrays of a million elements, Numpy also provides many useful functions for computing statistics, for example the standard deviation, mean, and median. It also has many functions for generating random numbers, which are very important and useful for scientific research, especially machine learning.
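As a rough illustration of the statistics and random-number functions mentioned above, here is a minimal sketch. The seed, the distribution parameters, and the variable names `rng`, `samples` and `data` are arbitrary choices made for this example, not something taken from the lecture.

```python
import numpy as np

# A seeded generator makes the "random" numbers reproducible between runs.
rng = np.random.default_rng(seed=0)  # the seed value 0 is an arbitrary choice

# Draw 1000 samples from a normal distribution with mean 5 and standard deviation 2.
samples = rng.normal(loc=5.0, scale=2.0, size=1000)

print(f'mean:               {samples.mean():.3f}')
print(f'median:             {np.median(samples):.3f}')
print(f'standard deviation: {samples.std():.3f}')

# Statistics of a small array that is easy to check by hand.
data = np.array([1, 2, 3, 4, 10])
print(data.mean())      # (1 + 2 + 3 + 4 + 10) / 5 = 4.0
print(np.median(data))  # the middle value of the sorted data, 3.0
```

The sample statistics should land close to 5 and 2, but not exactly on them, because they are estimated from a finite number of random draws.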


# Array

Every Numpy array is an instance of the class `numpy.ndarray`. At first glance it behaves much like the built-in Python `list`, except that a `numpy.ndarray` has a fixed size once it is constructed. Like a `list`, a Numpy array can store anything we assign to it (arbitrary Python objects are held with the `object` dtype). However, most Numpy operations require the array to have a numerical type. Another problem that comes with this versatility is that some operations require a specific data type. If we don't take extra precautions with the data types of our arrays, the results can be catastrophic. (Imagine training a model for 2 days only to find that the data type was wrong and the result is garbage.)


In [15]:
class EmptyClass:
    pass


print(f'The type of numpy array is {type(x := np.array([1, 2, 3, EmptyClass()]))}')  # := requires Python 3.8 or later
print(f'The data type of the array is {x.dtype}')
x + 1  # This line raises a TypeError!


The type of numpy array is <class 'numpy.ndarray'>
The data type of the array is object


TypeError: unsupported operand type(s) for +: 'EmptyClass' and 'int'

## Creating a numpy array

There are many functions provided by numpy to create arrays. Here are some of the examples:


In [24]:
def print_array(arr):
    print(f'Content of the array\n{arr}\nShape of the array {arr.shape}')


a = np.array([[1, 2, 3], [2, 3, 4]])  # Create array from other iterables (e.g. List)
b = np.ones_like(a)  # Create array with same shape and same type as a
c = np.zeros(a.shape)  # Create array with shape of other array
d = np.ones((3, 3, 3))  # Create array with provided shape (3, 3, 3)
e = np.arange(10)  # Create array with 10 consecutive numbers starting from 0
print_array(a)
print_array(b)
print_array(c)
print_array(d)
print_array(e)

Content of the array
[[1 2 3]
 [2 3 4]]
Shape of the array (2, 3)
Content of the array
[[1 1 1]
 [1 1 1]]
Shape of the array (2, 3)
Content of the array
[[0. 0. 0.]
 [0. 0. 0.]]
Shape of the array (2, 3)
Content of the array
[[[1. 1. 1.]
  [1. 1. 1.]
  [1. 1. 1.]]

 [[1. 1. 1.]
  [1. 1. 1.]
  [1. 1. 1.]]

 [[1. 1. 1.]
  [1. 1. 1.]
  [1. 1. 1.]]]
Shape of the array (3, 3, 3)
Content of the array
[0 1 2 3 4 5 6 7 8 9]
Shape of the array (10,)


## Attributes of numpy array

As shown above, every array has a `shape` attribute, which gives the number of elements along each axis. When we mention the first axis or the nth axis later in the lab, we are referring to positions in this attribute. It also gives the order of the dimensions when you index the array, which will be discussed later.

Other attributes of the array include:

-   `ndim` (the number of axes)
-   `dtype` (the data type of the elements)
-   et cetera


In [87]:
a = np.arange(6)
print(a.shape)
print(a.ndim)
print(a.dtype)

(6,)
1
int64


There are many other functions that can create new arrays. You can refer to the lecture notes, or look into the [official tutorial of Numpy](https://numpy.org/doc/stable/user/basics.creation.html).
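For instance, a few more creation functions could be used as in the sketch below. The variable names `f`, `g`, `h` and `i` simply continue the lettering of the earlier cell, and the shapes and values are arbitrary.

```python
import numpy as np

f = np.full((2, 2), 7)        # 2x2 array filled with the value 7
g = np.eye(3)                 # 3x3 identity matrix (float64 by default)
h = np.linspace(0.0, 1.0, 5)  # 5 evenly spaced values from 0 to 1, both ends included
i = np.arange(2, 10, 2)       # like range(2, 10, 2): start, stop (excluded), step

print(f)
print(g)
print(h)  # the values 0, 0.25, 0.5, 0.75 and 1
print(i)  # the values 2, 4, 6 and 8
```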

## Indexing

Numpy supports the Pythonic way of indexing `ndarray` objects.


In [49]:
a = np.arange(100)
print(a[2], 'the element at index 2 (the third element)')
print(a[-1], 'the last element')
print(a[10:20], 'select from index 10 to 20 (excluding 20)')
print(a[:10], 'select from index 0 to 10 (excluding 10)')
print(a[90:], 'select from index 90 to the end of the array')
print(a[-10:], 'select the last 10 elements (from index -10 to the end of the array)')
print(a[:], 'select from the start of the array to the end of the array')
print(a[1:7:2], 'select from index 1 to 7 (excluding 7 itself) with increment 2')
print(a[7:1:-1], 'select from index 7 to 1 (excluding 1 itself) with increment -1')


2 the element at index 2 (the third element)
99 the last element
[10 11 12 13 14 15 16 17 18 19] select from index 10 to 20 (excluding 20)
[0 1 2 3 4 5 6 7 8 9] select from index 0 to 10 (excluding 10)
[90 91 92 93 94 95 96 97 98 99] select from index 90 to the end of the array
[90 91 92 93 94 95 96 97 98 99] select the last 10 elements (from index -10 to the end of the array)
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95
 96 97 98 99] select from the start of the array to the end of the array
[1 3 5] select from index 1 to 7 (excluding 7 itself) with increment 2
[7 6 5 4 3 2] select from index 7 to 1 (excluding 1 itself) with increment -1


When there is more than one dimension, we can either write `a[0][1]` like normal Python code, or `a[0, 1]`. The axes here are in the same order as `shape` shows.


In [92]:
a = np.arange(1000).reshape(10, 10, 10)
print(a[5, 2:6, 1:3])
print(a[3:5, ...].shape)  # We can use ... to select all other dimensions


[[521 522]
 [531 532]
 [541 542]
 [551 552]]
(2, 10, 10)


## Integer Array Indexing

Besides the Pythonic way of indexing (slicing) numpy arrays, there's a more advanced way to index them using **integer** arrays (either a list or a `numpy.ndarray` with an integer dtype).


In [54]:
a = np.arange(6).reshape(3, 2)
print(a)
print(a[[0, 1, 2], [0, 1, 0]])  # the two index lists are paired up: selects a[0, 0], a[1, 1] and a[2, 0]


[[0 1]
 [2 3]
 [4 5]]
[0 3 4]


With the indexing methods above, we can copy and amend values easily without writing much code.
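As a small sketch of what copying and amending can look like, the example below assigns through a slice and through integer arrays, then takes a copy. The values `-1` and `100` and the name `picked` are arbitrary choices for illustration.

```python
import numpy as np

a = np.arange(6).reshape(3, 2)

# Amend values in place with a slice: set the whole second column to -1.
a[:, 1] = -1

# Amend values in place with integer array indexing: a[0, 0] and a[2, 0].
a[[0, 2], [0, 0]] = 100

# Integer array indexing returns a copy, so editing `picked` leaves `a` untouched.
picked = a[[0, 2]]
picked[0, 0] = 0

print(a)       # rows: [100, -1], [2, -1], [100, -1]
print(picked)  # rows: [0, -1], [100, -1]
```

Note that plain slices behave differently from integer arrays here: a slice such as `a[0:2]` is a view onto the same data, so writing to it would also change `a`.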

## Boolean Array Indexing

We can also use arrays of `True` and `False` values to choose individual elements. This is probably most useful when we combine it with the [logic functions](https://numpy.org/doc/stable/reference/routines.logic.html) from Numpy.


In [63]:
a = np.arange(12).reshape(3, 4)
selection = np.array([[True, True, True, False], [True, True, True, True], [False, True, False, False]])
print(a[selection])  # Boolean Array Indexing


[0 1 2 4 5 6 7 9]


In [67]:
selection1 = a < 6  # Comparison
selection2 = a > 10  # Comparison
selection = np.logical_or(selection1, selection2)  # Logical OR
selection = selection1 | selection2 # same as previous line
print(selection)

a[selection]

[[ True  True  True  True]
 [ True  True False False]
 [False False False  True]]


array([ 0,  1,  2,  3,  4,  5, 11])

## Data types

Every Numpy array has a data type (`dtype`). Numpy usually decides the type of the array for us, but sometimes we might want to declare or convert the data type ourselves, for example when we want to use one array to index another.


In [75]:
class empty:
    pass


print(np.array([1, 2]).dtype)
print(np.array([1, 2, 3.]).dtype)  # the third element is of type float, float is a more general type than int
print(np.array([1, 2, 3., "https://www.youtube.com/watch?v=dQw4w9WgXcQ", empty()]).dtype)  # <object> is the most general type in Python


int64
float64
object


In [79]:
a = np.arange(10)
b = np.arange(10, dtype=np.float32)

print(a[b[:5].astype(np.int32)])  # take the first 5 elements of b and convert them to int32 so they can be used as indices
a[b]  # index a using b, but b is of type float32, so an IndexError is raised


[0 1 2 3 4]


IndexError: arrays used as indices must be of integer (or boolean) type

## Array Arithmetic

Numpy supports elementwise arithmetic operations (+, -, \*, /) and basic matrix operations (e.g. dot product and transpose; it's okay if you don't know what these are).


In [81]:
a = np.arange(6).reshape(2, 3)
b = np.arange(10, 16).reshape(2, 3)
print(a + b)
print(a @ b.T)  # For 2-D arrays, a @ b.T is equivalent to np.dot(a, b.T). This is called matrix multiplication, where b.T is the transpose of b. (Again, you don't need to know exactly what this means in this course)


[[10 12 14]
 [16 18 20]]
[[ 35  44]
 [134 170]]


## Numpy Functions

Numpy provides many different functions for performing computations on arrays. One of them is `numpy.sum()`.


In [94]:
a = np.arange(6).reshape(2, 3)
print(a)
print(np.sum(a))
print(a.sum())  # this is equivalent to the previous line
print(a.sum(axis=0))  # sum along the first axis (0+3), (1+4), (2+5)
print(a.sum(axis=1))  # sum along the second axis (0+1+2), (3+4+5)
print(a.min())  # minimum over all elements of the array
print(a.min(axis=0))  # min along the first axis
print(a.min(axis=1))  # min along the second axis


[[0 1 2]
 [3 4 5]]
15
15
[3 5 7]
[ 3 12]
0
[0 1 2]
[0 3]


## Broadcasting

Broadcasting allows us to perform operations on arrays with different shapes. This feature greatly reduces redundancy in our code and makes it shorter and much more readable.

However, there are a few rules (and steps) for this to work. (You can also refer to the notes if it's unclear.)
1. Dimensions are matched from the last dimension to the first dimension (according to the attribute `shape`). If one of the arrays has fewer dimensions, it is still broadcastable as long as every one of its dimensions can be matched to the corresponding trailing dimension of the larger array.
2. During matching, two dimensions can be matched if and only if,
   1. Both of them are equal, or
   2. One of them is equal to 1
3. The content is copied across that specific dimension when the array is broadcast.
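The rules above can be sketched as a small function that works out the resulting shape. `broadcast_shape` is a hypothetical helper written only for this illustration; it is not part of Numpy. Numpy's own `np.broadcast_shapes` (available in Numpy 1.20 and later) is called at the end for comparison.

```python
import numpy as np


def broadcast_shape(shape_a, shape_b):
    """Return the broadcast result shape, or raise ValueError (hypothetical helper)."""
    result = []
    # Rule 1: walk both shapes from the last dimension to the first.
    rev_a, rev_b = shape_a[::-1], shape_b[::-1]
    for i in range(max(len(rev_a), len(rev_b))):
        # A missing leading dimension behaves like a dimension of size 1.
        dim_a = rev_a[i] if i < len(rev_a) else 1
        dim_b = rev_b[i] if i < len(rev_b) else 1
        # Rule 2: the sizes must be equal, or one of them must be 1.
        if dim_a != dim_b and dim_a != 1 and dim_b != 1:
            raise ValueError(f'cannot broadcast {shape_a} with {shape_b}')
        # Rule 3: a dimension of size 1 is stretched to the other size.
        result.append(dim_b if dim_a == 1 else dim_a)
    return tuple(reversed(result))


print(broadcast_shape((2, 4, 3, 3), (4, 3, 1)))      # (2, 4, 3, 3)
print(np.broadcast_shapes((2, 4, 3, 3), (4, 3, 1)))  # Numpy's own answer, for comparison
```

Both calls should report the same shape. Calling the helper with shapes that break rule 2, such as `(3, 4)` and `(3,)`, raises a `ValueError`, just as Numpy does when adding such arrays.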

In [110]:
a = np.ones((2, 4, 3, 3))
b = np.ones((4, 3, 3))

_ = a + b  # Ok! Rule 1, b can match all of its dimensions to a from right to left

c = np.ones((4, 3, 1))
_ = a + c  # Ok! Rule 2.2, dimensions can be matched if they're equal or one of them is 1

d = np.ones((2, 4, 3))
# _ = a + d  # Error! Rule 1, dimensions are matched from right to left: the last pair (3 and 3) matches, but the next pair (3 and 4) does not. Commented out so that the next example can run

e = np.ones((2, 2, 3, 3))
_ = a + e  # Error! Rule 2, at the second dimension the sizes (4 and 2) are not equal and neither of them is 1

ValueError: operands could not be broadcast together with shapes (2,4,3,3) (2,2,3,3) 

In [2]:
import numpy as np
x = np.arange(12).reshape(3, 4)
offset = np.array([1, 0, 0, 1])
offset2 = np.array([1, 0, 1])

print(x + offset)  # offset is broadcast across every row of x
print(x + offset2.reshape(3, 1))  # this line adds a new axis so that the shape of offset2 becomes (3, 1)
print(x + offset2[:, np.newaxis])  # Serves the same purpose as the previous line here, although reshape and np.newaxis are not interchangeable in general. Please read the docs if you're interested
print(offset2.reshape(3, -1).shape)
print(x + offset2)  # Error! offset2 has shape (3,), which is matched against the last axis of x (size 4)


[[ 1  1  2  4]
 [ 5  5  6  8]
 [ 9  9 10 12]]
[[ 1  2  3  4]
 [ 4  5  6  7]
 [ 9 10 11 12]]
[[ 1  2  3  4]
 [ 4  5  6  7]
 [ 9 10 11 12]]
(3, 1)


ValueError: operands could not be broadcast together with shapes (3,4) (3,) 

# Conclusion

Numpy provides a wide variety of functions that enable efficient scientific research and engineering calculations. It keeps the simplicity of Python while allowing users to perform complex computations. It is a very important and useful tool if you want to dive deeper into the field of machine learning. Even though you might use other libraries in the future, many of the functions in Numpy are transferable.
