This Jupyter notebook was prepared by [Chun-Kit Yeung](https://ckyeungac.com).

# Tutorial 2
In the last tutorial, we introduced the `list` data type in Python. As you may remember, a `list` can conveniently store heterogeneous types of data together. This flexibility comes at a cost: computation on lists is expensive. That is one reason why Python has long been criticized as slow compared with C++ or Java, in which the data type has to be declared up front. Even though Python is not a high-performance programming language, people (including me) love it because it allows a rapid development cycle. For example, what takes 20 lines of C++ can sometimes be done in just 1 line of Python.

To enable fast computation in Python, many libraries (or packages, if we want to be more Pythonic) have been devised. `Numpy` is one of them: it provides fast computation on list-like array objects. This package is widely used in many machine learning and deep learning libraries, such as scikit-learn, tensorflow and mxnet. Thus, in this tutorial, I would like to introduce the Numpy package and show you its basic usage.

First of all, let's import the package. These are the ways to import a package in Python.
```python
import xxx
import xxx as x
from xxx import yyy
from xxx import yyy as y
```
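
As a concrete illustration of the four forms above, here is a small sketch that uses the standard-library `math` module together with Numpy. The alias `arr` and the variable names are chosen for this example only.

```python
import math                      # import xxx
import numpy as np               # import xxx as x
from math import pi              # from xxx import yyy
from numpy import array as arr   # from xxx import yyy as y

# The different forms only change the name you type, not the object you get.
same_pi = (math.pi == pi)        # the same constant, reached two ways
same_fn = (arr is np.array)      # the alias points at the same function
print(same_pi, same_fn)
```

The `import numpy as np` form is the one used in the rest of this tutorial.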

In [1]:
import numpy as np # we import the numpy package and name it as np

## Start playing with numpy

In [2]:
a = [1,2,3]
print(a, type(a))

[1, 2, 3] <class 'list'>


In [3]:
b = np.array(a)
print(b, type(b))

[1 2 3] <class 'numpy.ndarray'>
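
To make the difference between the two types more tangible, the sketch below applies the same operations to a list and to an `ndarray`. The variable names are illustrative only.

```python
import numpy as np

py_list = [1, 2, 3]
np_arr = np.array(py_list)

# `*` repeats a list, but multiplies an array element by element.
list_times_two = py_list * 2
arr_times_two = np_arr * 2
print(list_times_two, arr_times_two)

# A list keeps each element's own type; an array converts everything
# to one common dtype (here the integers are upcast to float).
mixed = np.array([1, 2.5, 3])
print(mixed, mixed.dtype)
```

Having a single dtype for the whole array is what lets Numpy run its operations in compiled code instead of checking the type of every element.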


### Useful attributes of `ndarray`

In [4]:
# Let's create a rank-2 array
# rank-2 means it is a two-dimensional array
a = np.array(
    [[1,2,3,4], 
     [2,3,4,5],
     [3,4,5,6]]
)
print(a)

[[1 2 3 4]
 [2 3 4 5]
 [3 4 5 6]]


In [5]:
# the dimensions of the array
print('a.shape:', a.shape)

# the number of axes (dimensions) of the array
print('a.ndim:', a.ndim)

# an object describing the type of the elements in the array
print('a.dtype:', a.dtype)

# the total number of elements of the array.
print('a.size:', a.size) 

a.shape: (3, 4)
a.ndim: 2
a.dtype: int64
a.size: 12
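
These attributes are related to each other, which the following sketch checks on the same array. It also uses `itemsize` and `nbytes`, two further attributes that are not covered above. Note that the default integer dtype can differ between platforms, so the byte counts may vary.

```python
import numpy as np

a = np.array([[1, 2, 3, 4],
              [2, 3, 4, 5],
              [3, 4, 5, 6]])

# ndim is the length of the shape tuple.
print(a.ndim == len(a.shape))

# size is the product of the entries in shape.
print(a.size == a.shape[0] * a.shape[1])

# itemsize is the number of bytes per element, so the raw data
# occupies size * itemsize bytes in total.
print(a.itemsize, a.nbytes)
```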


### Creating ndarray

In [6]:
# Create a rank-2 array with shape (2, 3) filled up with 0.
a = np.zeros(shape=(2,3))
print(a)
print('shape:', a.shape)
print('dtype:', a.dtype)

[[ 0.  0.  0.]
 [ 0.  0.  0.]]
shape: (2, 3)
dtype: float64


In [7]:
# Create a rank-3 array filled with 1, using a 16-bit integer dtype.
a = np.ones(shape=(2,3,4), dtype=np.int16)
print(a)
print('shape:', a.shape)
print('dtype:', a.dtype)

[[[1 1 1 1]
  [1 1 1 1]
  [1 1 1 1]]

 [[1 1 1 1]
  [1 1 1 1]
  [1 1 1 1]]]
shape: (2, 3, 4)
dtype: int16


In [8]:
# Create a rank-3 array filled up with 2.8.
a = np.full(shape=(2,3,4), fill_value=2.8)
print(a)
print('shape:', a.shape)
print('dtype:', a.dtype)

[[[ 2.8  2.8  2.8  2.8]
  [ 2.8  2.8  2.8  2.8]
  [ 2.8  2.8  2.8  2.8]]

 [[ 2.8  2.8  2.8  2.8]
  [ 2.8  2.8  2.8  2.8]
  [ 2.8  2.8  2.8  2.8]]]
shape: (2, 3, 4)
dtype: float64


In [9]:
# Create a rank-2 identity matrix
a = np.eye(3)
print(a)
print('shape:', a.shape)
print('dtype:', a.dtype)

[[ 1.  0.  0.]
 [ 0.  1.  0.]
 [ 0.  0.  1.]]
shape: (3, 3)
dtype: float64


In [10]:
# create a rank-2 array with random values
a = np.random.random((2,3))
print(a)
print('shape:', a.shape)
print('dtype:', a.dtype)

[[ 0.06767046  0.25211727  0.33678748]
 [ 0.42879528  0.49077682  0.57845785]]
shape: (2, 3)
dtype: float64


In [11]:
# create an array of consecutive integers, similar to range()
a = np.arange(100)
print(a)
print('shape:', a.shape)
print('dtype:', a.dtype)

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49
 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74
 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99]
shape: (100,)
dtype: int64


In [12]:
# reshape the array from rank-1 to rank-2
a = np.arange(100).reshape(5,20)
print(a)
print('shape:', a.shape)
print('dtype:', a.dtype)

[[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]
 [20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39]
 [40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59]
 [60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79]
 [80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99]]
shape: (5, 20)
dtype: int64


In [13]:
# it accepts float arguments
a = np.arange( 0, 2, 0.3 )
print(a)
print('shape:', a.shape)
print('dtype:', a.dtype)

[ 0.   0.3  0.6  0.9  1.2  1.5  1.8]
shape: (7,)
dtype: float64


When `arange` is used with floating point arguments, it is generally not possible to predict the number of elements obtained, due to the finite floating point precision. For this reason, it is usually better to use the function `linspace` that receives as an argument the number of elements that we want, instead of the step:

In [14]:
# create 9 numbers from 0 to 2 with equal spacing
a = np.linspace( 0, 2, 9 )
print(a)
print('shape:', a.shape)
print('dtype:', a.dtype)

[ 0.    0.25  0.5   0.75  1.    1.25  1.5   1.75  2.  ]
shape: (9,)
dtype: float64
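
To see the difference for yourself, here is a small sketch that builds the same kind of range both ways. The particular start, stop and step values are just an example; the length printed for `arange` depends on floating point rounding, which is exactly the point.

```python
import numpy as np

# With a float step, the length of the result depends on rounding,
# so it is hard to tell in advance whether the stop value sneaks in.
stepped = np.arange(0.1, 0.4, 0.1)
print(len(stepped), stepped)

# linspace takes the number of points directly, so its length is fixed.
spaced = np.linspace(0.1, 0.4, 4)
print(len(spaced), spaced)

# endpoint=False leaves out the stop value, like arange's half-open range.
half_open = np.linspace(0, 2, 8, endpoint=False)
print(half_open)
```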


In [15]:
# create 100 numbers from 0 to pi with equal spacing
a = np.linspace( 0, np.pi, 100 )
print(a)
print('shape:', a.shape)
print('dtype:', a.dtype)

[ 0.          0.03173326  0.06346652  0.09519978  0.12693304  0.1586663
  0.19039955  0.22213281  0.25386607  0.28559933  0.31733259  0.34906585
  0.38079911  0.41253237  0.44426563  0.47599889  0.50773215  0.53946541
  0.57119866  0.60293192  0.63466518  0.66639844  0.6981317   0.72986496
  0.76159822  0.79333148  0.82506474  0.856798    0.88853126  0.92026451
  0.95199777  0.98373103  1.01546429  1.04719755  1.07893081  1.11066407
  1.14239733  1.17413059  1.20586385  1.23759711  1.26933037  1.30106362
  1.33279688  1.36453014  1.3962634   1.42799666  1.45972992  1.49146318
  1.52319644  1.5549297   1.58666296  1.61839622  1.65012947  1.68186273
  1.71359599  1.74532925  1.77706251  1.80879577  1.84052903  1.87226229
  1.90399555  1.93572881  1.96746207  1.99919533  2.03092858  2.06266184
  2.0943951   2.12612836  2.15786162  2.18959488  2.22132814  2.2530614
  2.28479466  2.31652792  2.34826118  2.37999443  2.41172769  2.44346095
  2.47519421  2.50692747  2.53866073  2.57039399  2.6

### Array indexing
Numpy offers several ways to index into arrays.

#### Integer indexing

In [16]:
a = np.arange(100).reshape(5,20)
print(a)

[[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]
 [20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39]
 [40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59]
 [60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79]
 [80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99]]


In [17]:
print('a[0]:', a[0])
print('a[-1]:', a[-1])

a[0]: [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]
a[-1]: [80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99]


In [18]:
print('a[0,0]:', a[0,0])
print('a[1,2]:', a[1,2])
print('a[-1,14]:', a[-1,14])

a[0,0]: 0
a[1,2]: 22
a[-1,14]: 94


In [19]:
# assign a value 
a[-1,-1] = 1234
a[0,0] = 4321
print(a)

[[4321    1    2    3    4    5    6    7    8    9   10   11   12   13
    14   15   16   17   18   19]
 [  20   21   22   23   24   25   26   27   28   29   30   31   32   33
    34   35   36   37   38   39]
 [  40   41   42   43   44   45   46   47   48   49   50   51   52   53
    54   55   56   57   58   59]
 [  60   61   62   63   64   65   66   67   68   69   70   71   72   73
    74   75   76   77   78   79]
 [  80   81   82   83   84   85   86   87   88   89   90   91   92   93
    94   95   96   97   98 1234]]


#### Slicing

Similar to Python lists, numpy arrays can be sliced. Since arrays may be multidimensional, you can give one slice per dimension; any trailing dimensions you leave out are taken in full.

In [20]:
a = np.arange(100).reshape(5,20)
print(a)

[[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]
 [20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39]
 [40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59]
 [60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79]
 [80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99]]


In [21]:
print('a[:3, 1:10]:\n', a[:3, 1:10])
print('shape:', a[:3, 1:10].shape)

a[:3, 1:10]:
 [[ 1  2  3  4  5  6  7  8  9]
 [21 22 23 24 25 26 27 28 29]
 [41 42 43 44 45 46 47 48 49]]
shape: (3, 9)


In [22]:
# reminder: a[start_ix:end_ix:step_size]
# e.g. a[1:4:2] starts at index 1, stops before index 4, and steps by 2

print('a[::3, 1:10]:\n', a[::3, 1:10])
print('shape:', a[::3, 1:10].shape)

a[::3, 1:10]:
 [[ 1  2  3  4  5  6  7  8  9]
 [61 62 63 64 65 66 67 68 69]]
shape: (2, 9)


In [23]:
# assign a value 
a[0,:] = -1
print(a)

[[-1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1]
 [20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39]
 [40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59]
 [60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79]
 [80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99]]


In [24]:
# assign a value 
a[:,0] = -2
print(a)

[[-2 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1]
 [-2 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39]
 [-2 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59]
 [-2 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79]
 [-2 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99]]
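
One thing worth knowing about slicing, which differs from Python lists: a slice of an array is a view on the same data, not a copy. The sketch below shows this on a small array (the names `view` and `independent` are mine), and how `.copy()` gives you an independent array when you need one.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

view = a[:2, 1:3]                 # basic slicing returns a view
view[0, 0] = -1                   # ...so this also changes a[0, 1]

independent = a[:2, 1:3].copy()   # an explicit copy owns its own data
independent[0, 0] = 99            # a is left untouched

print(a)
print(np.shares_memory(a, view), np.shares_memory(a, independent))
```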


#### Boolean indexing

Boolean array indexing lets you pick out arbitrary elements of an array. Frequently this type of indexing is used to select the elements of an array that satisfy some condition. 

In [25]:
a = np.arange(100).reshape(5,20)
print(a)

[[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]
 [20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39]
 [40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59]
 [60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79]
 [80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99]]


In [26]:
# build a boolean mask that marks the even elements
target_idx = (a % 2 == 0)
print(target_idx)

[[ True False  True False  True False  True False  True False  True False
   True False  True False  True False  True False]
 [ True False  True False  True False  True False  True False  True False
   True False  True False  True False  True False]
 [ True False  True False  True False  True False  True False  True False
   True False  True False  True False  True False]
 [ True False  True False  True False  True False  True False  True False
   True False  True False  True False  True False]
 [ True False  True False  True False  True False  True False  True False
   True False  True False  True False  True False]]


In [27]:
print(a[target_idx])

[ 0  2  4  6  8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48
 50 52 54 56 58 60 62 64 66 68 70 72 74 76 78 80 82 84 86 88 90 92 94 96 98]


In [28]:
# we can also combine them together
print(a[a % 2 == 0])

[ 0  2  4  6  8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48
 50 52 54 56 58 60 62 64 66 68 70 72 74 76 78 80 82 84 86 88 90 92 94 96 98]
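
Masks can also be combined and used for assignment. The following is a minimal sketch on a smaller array; the thresholds are arbitrary and only serve as an example.

```python
import numpy as np

a = np.arange(20).reshape(4, 5)

# Combine conditions with & (and), | (or), ~ (not).
# The parentheses are required because & binds tighter than == and >.
even_and_large = a[(a % 2 == 0) & (a > 10)]
print(even_and_large)

# A mask can also sit on the left-hand side of an assignment.
b = a.copy()
b[b % 2 == 1] = 0     # set every odd element to zero
print(b)
```

Note that Python's `and` / `or` keywords do not work elementwise on arrays, which is why the bitwise operators are used here.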


### Array math
Arithmetic operators on arrays apply elementwise. A new array is created and filled with the result.

In [29]:
a = np.arange(1,5)
b = np.arange(11,15)
print('a:', a)
print('b:', b)

a: [1 2 3 4]
b: [11 12 13 14]


In [30]:
print('a+b:', a+b)
print('a-b:', a-b)
print('a*b:', a*b)
print('a/b:', a/b)
print('a**b:', a**b)

a+b: [12 14 16 18]
a-b: [-10 -10 -10 -10]
a*b: [11 24 39 56]
a/b: [ 0.09090909  0.16666667  0.23076923  0.28571429]
a**b: [        1      4096   1594323 268435456]


In [31]:
# creating 2x2 matrix
A = np.array(
    [[1,2],
     [3,4]]
)
B = np.array(
    [[5,6],
     [7,8]]
)

print('A:\n', A)
print('B:\n', B)

A:
 [[1 2]
 [3 4]]
B:
 [[5 6]
 [7 8]]


In [32]:
A.T # transpose

array([[1, 3],
       [2, 4]])

In [33]:
A*B # elementwise product

array([[ 5, 12],
       [21, 32]])

In [34]:
A.dot(B) # matrix product

array([[19, 22],
       [43, 50]])

In [35]:
np.dot(A, B) # the same matrix product, written as a function call

array([[19, 22],
       [43, 50]])
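
In Python 3.5 and later, the matrix product can also be written with the `@` operator, which calls `np.matmul`. A short sketch with the same two matrices:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

product = A @ B          # operator form of np.matmul(A, B)
print(product)

# For 2-D arrays all three spellings give the same result.
print(np.array_equal(product, A.dot(B)))
print(np.array_equal(product, np.matmul(A, B)))
```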

In [36]:
# sum all the elements of the array
print('A:\n', A)
print('sum:',np.sum(A))

A:
 [[1 2]
 [3 4]]
sum: 10


In [37]:
print('A:\n', A)
print()
print(np.sum(A, axis=0)) # sum over axis 0 (down the rows), giving one sum per column

A:
 [[1 2]
 [3 4]]

[4 6]


In [38]:
print('A:\n', A)
print()
# sum over axis 1 (across the columns), giving one sum per row
print(np.sum(A, axis=1)) 

A:
 [[1 2]
 [3 4]]

[3 7]
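
A handy way to remember the `axis` argument: the axis you name is the one that disappears from the shape. The sketch below illustrates this, and also shows the `keepdims` option, which keeps the summed axis with length 1. The array `T` is made up for this example.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])

col_sums = np.sum(A, axis=0)     # axis 0 is collapsed: one value per column
row_sums = np.sum(A, axis=1)     # axis 1 is collapsed: one value per row
print(col_sums, row_sums)

# keepdims=True keeps the summed axis as a length-1 dimension.
row_sums_2d = np.sum(A, axis=1, keepdims=True)
print(row_sums_2d.shape)

# The same rule extends to higher ranks.
T = np.ones((2, 3, 4))
print(T.sum(axis=1).shape)
```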


### Broadcasting
Broadcasting describes how numpy treats arrays with different shapes during arithmetic operations. Subject to certain constraints, the smaller array is “broadcast” across the larger array so that they have compatible shapes.

In [39]:
a = np.arange(10).reshape(2,5)
print(a)

[[0 1 2 3 4]
 [5 6 7 8 9]]


In [40]:
10*a

array([[ 0, 10, 20, 30, 40],
       [50, 60, 70, 80, 90]])

It is broadcasting :)

10 is just a single integer. It is broadcast to an array of shape (2,5) that is filled with 10.
```
[[10, 10, 10, 10, 10],
 [10, 10, 10, 10, 10]]
```
Then, the elementwise operation is applied.
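
If you want to look at the broadcast array explicitly, `np.broadcast_to` can build it. This is only a sketch to illustrate the idea; when you write `10*a`, Numpy does not need you to create this array yourself.

```python
import numpy as np

a = np.arange(10).reshape(2, 5)

# The array that the scalar 10 conceptually becomes.
tens = np.broadcast_to(10, a.shape)
print(tens)

# Multiplying by the scalar and by the expanded array give the same result.
print(np.array_equal(10 * a, tens * a))
```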

In [41]:
a = np.arange(100).reshape(25,4)
a

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23],
       [24, 25, 26, 27],
       [28, 29, 30, 31],
       [32, 33, 34, 35],
       [36, 37, 38, 39],
       [40, 41, 42, 43],
       [44, 45, 46, 47],
       [48, 49, 50, 51],
       [52, 53, 54, 55],
       [56, 57, 58, 59],
       [60, 61, 62, 63],
       [64, 65, 66, 67],
       [68, 69, 70, 71],
       [72, 73, 74, 75],
       [76, 77, 78, 79],
       [80, 81, 82, 83],
       [84, 85, 86, 87],
       [88, 89, 90, 91],
       [92, 93, 94, 95],
       [96, 97, 98, 99]])

In [42]:
b = np.array([4,2,1,1])
b

array([4, 2, 1, 1])

In [43]:
a+b

array([[  4,   3,   3,   4],
       [  8,   7,   7,   8],
       [ 12,  11,  11,  12],
       [ 16,  15,  15,  16],
       [ 20,  19,  19,  20],
       [ 24,  23,  23,  24],
       [ 28,  27,  27,  28],
       [ 32,  31,  31,  32],
       [ 36,  35,  35,  36],
       [ 40,  39,  39,  40],
       [ 44,  43,  43,  44],
       [ 48,  47,  47,  48],
       [ 52,  51,  51,  52],
       [ 56,  55,  55,  56],
       [ 60,  59,  59,  60],
       [ 64,  63,  63,  64],
       [ 68,  67,  67,  68],
       [ 72,  71,  71,  72],
       [ 76,  75,  75,  76],
       [ 80,  79,  79,  80],
       [ 84,  83,  83,  84],
       [ 88,  87,  87,  88],
       [ 92,  91,  91,  92],
       [ 96,  95,  95,  96],
       [100,  99,  99, 100]])

### Broadcasting occurs only when two arrays are compatible

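Two arrays are compatible when, comparing their shapes from the trailing (rightmost) dimension backwards, each pair of dimensions is either

1. the same, or
2. one of them is 1.

A dimension that is missing on the left is treated as 1.

For example, the following can be broadcast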
```python
A      (2d array):  5 x 4
B      (1d array):      1
Result (2d array):  5 x 4

A      (2d array):  5 x 4
B      (1d array):      4
Result (2d array):  5 x 4

A      (3d array):  15 x 3 x 5
B      (3d array):  15 x 1 x 5
Result (3d array):  15 x 3 x 5

A      (3d array):  15 x 3 x 5
B      (2d array):       3 x 5
Result (3d array):  15 x 3 x 5

A      (3d array):  15 x 3 x 5
B      (2d array):       3 x 1
Result (3d array):  15 x 3 x 5

A      (4d array):  8 x 1 x 6 x 1
B      (3d array):      7 x 1 x 5
Result (4d array):  8 x 7 x 6 x 5
```

The following cannot be broadcast
```python
A      (1d array):  3
B      (1d array):  4 # trailing dimensions do not match

A      (2d array):      2 x 1
B      (3d array):  8 x 4 x 3 # second-to-last dimensions do not match (2 vs 4)
```
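
You can check the shape tables above without doing any arithmetic. The sketch below defines a small helper, `broadcast_shape`, which is a hypothetical name for this tutorial and not part of Numpy; it relies on `np.broadcast`, which raises a `ValueError` for incompatible shapes.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Return the broadcast result shape, or None if the shapes are incompatible.

    Hypothetical helper for this tutorial; it is not part of Numpy.
    """
    try:
        # np.broadcast only looks at shapes, so uninitialised arrays are enough.
        return np.broadcast(np.empty(shape_a), np.empty(shape_b)).shape
    except ValueError:
        return None

# Compatible pairs from the first table.
print(broadcast_shape((5, 4), (4,)))
print(broadcast_shape((15, 3, 5), (3, 1)))
print(broadcast_shape((8, 1, 6, 1), (7, 1, 5)))

# Incompatible pairs from the second table.
print(broadcast_shape((3,), (4,)))
print(broadcast_shape((2, 1), (8, 4, 3)))
```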

In [44]:
a = np.arange(2*3*4).reshape((2,3,4))
b = np.random.randint(low=0, high=2, size=(2,1,4)) # create a random array with 0 and 1 of shape (2,1,4)
print('a:\n{} \n\nb:\n{}'.format(a,b))

a:
[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]] 

b:
[[[0 1 0 1]]

 [[1 0 0 1]]]


In [45]:
a * b

array([[[ 0,  1,  0,  3],
        [ 0,  5,  0,  7],
        [ 0,  9,  0, 11]],

       [[12,  0,  0, 15],
        [16,  0,  0, 19],
        [20,  0,  0, 23]]])
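
To convince yourself of what happened here, you can stretch `b` by hand and compare. In the sketch below, `b` is hard-coded to the values printed in the run above (your random values will differ), and `np.repeat` copies its single row 3 times along the length-1 axis.

```python
import numpy as np

a = np.arange(2 * 3 * 4).reshape((2, 3, 4))
b = np.array([[[0, 1, 0, 1]],
              [[1, 0, 0, 1]]])           # shape (2, 1, 4), values from the run above

# Broadcasting stretches b along its length-1 axis; np.repeat does the
# same thing explicitly by copying that row 3 times.
b_expanded = np.repeat(b, 3, axis=1)     # shape (2, 3, 4)

print(b_expanded.shape)
print(np.array_equal(a * b, a * b_expanded))
```

The broadcast version gives the same answer without building the larger copy of `b` first.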