# NumPy Basics

In this course, we will use NumPy extensively for the coding components. This tutorial gives a brief overview of what we can do with NumPy.

NumPy is a numerical computing library for Python. "Nearly every scientist working in Python draws on the power of NumPy." It is widely used in research and industry wherever machine learning, data processing, or data analytics is required.

Reference: https://numpy.org/doc/1.19/

**NOTE:** Make sure you have the environment set up correctly. Please refer to the course website: https://www.cs.toronto.edu/~fleet/courses/C11/index.html.

In [1]:
import numpy as np

## N-D Arrays
NumPy's main object is the homogeneous multidimensional array. In other words, it is a "table" of elements of the same type (usually numbers) that can be indexed by integers. In NumPy, dimensions are known as axes. For this course, you will mostly deal with arrays of up to 3 dimensions.

In [2]:
a = np.array([[1, 2, 3],
              [2, 3, 4]])

print(a)

# This checks the type of the array
print(a.dtype)

# This checks the dimensions (shape) of the array
print(a.shape)

# This checks the number of dimensions
print(a.ndim)

# This checks the total number of entries in the array
print(a.size)

[[1 2 3]
 [2 3 4]]
int64
(2, 3)
2
6
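To make "homogeneous" concrete, here is a minimal sketch (the names `mixed` and `cube` are illustrative, not part of the course code). It shows that NumPy converts mixed inputs to one common type, and how `ndim`, `shape`, and `size` relate for a 3-D array.

```python
import numpy as np

# Mixing ints and floats: NumPy picks one common type for the whole array.
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)   # float64

# A 3-D array has three axes; shape lists the length of each axis,
# and size is the product of those lengths.
cube = np.zeros((2, 3, 4))
print(cube.ndim, cube.shape, cube.size)   # 3 (2, 3, 4) 24
```

Note that the default integer type (`int64` in the output above) can depend on the platform and NumPy version.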


## Array Creation

In [3]:
# Create zero array
zero_array = np.zeros(shape=(5, 2))

# Create one array
one_array = np.ones(shape=(5, 2))

# Creating arrays filled with specific values
fill_array = np.full(shape=(5, 2), fill_value=10, dtype=np.float64)

# Create Identity matrix
identity = np.eye(10)

# Create an array from 1 to 10
seq = np.arange(1, 11)

# Create an array from 0 to 1 spaced by 0.1
seq_2 = np.arange(0, 1.1, 0.1)

print(zero_array)
print(one_array)
print(fill_array)
print(identity)
print(seq)
print(seq_2)

[[0. 0.]
 [0. 0.]
 [0. 0.]
 [0. 0.]
 [0. 0.]]
[[1. 1.]
 [1. 1.]
 [1. 1.]
 [1. 1.]
 [1. 1.]]
[[10. 10.]
 [10. 10.]
 [10. 10.]
 [10. 10.]
 [10. 10.]]
[[1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]]
[ 1  2  3  4  5  6  7  8  9 10]
[0.  0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1. ]
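As a complement to the creation functions above, the sketch below uses `np.linspace` and `np.zeros_like`. These are standard NumPy functions but were not part of the original cell, and the variable names are illustrative. `np.linspace` can be a safer choice than `np.arange` with a fractional step, because the number of points is given explicitly instead of being derived from floating-point arithmetic.

```python
import numpy as np

# 11 evenly spaced points from 0 to 1, with the endpoint included.
seq_lin = np.linspace(0, 1, num=11)
print(seq_lin)

# zeros_like copies the shape and dtype of an existing array.
template = np.ones((5, 2), dtype=np.float32)
zeros_same = np.zeros_like(template)
print(zeros_same.shape, zeros_same.dtype)   # (5, 2) float32
```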


## Shape Manipulation and Indexing
### 2D arrays

In [4]:
seq = np.arange(1, 13)
print(seq)

# Reshape to a 2D array of shape (4, 3)
# Note that the new shape needs to match the number of elements in the array
reshaped_seq = seq.reshape((4, 3))
print(reshaped_seq)

# Transpose a 2D array
reshaped_seq = reshaped_seq.T
print(reshaped_seq)

# Take first row, second column of reshaped_seq
print(reshaped_seq[0, 1])

# Take first row
print(reshaped_seq[0])

# Take third column
print(reshaped_seq[:, 2])

# Take columns 0, 2, 3 of all rows
print(reshaped_seq[:, [0, 2, 3]])

# You can take elements based on conditions (boolean indexing, a form of advanced indexing)
# Take elements greater than 4
print(reshaped_seq[reshaped_seq > 4])

[ 1  2  3  4  5  6  7  8  9 10 11 12]
[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]
[[ 1  4  7 10]
 [ 2  5  8 11]
 [ 3  6  9 12]]
4
[ 1  4  7 10]
[7 8 9]
[[ 1  7 10]
 [ 2  8 11]
 [ 3  9 12]]
[ 7 10  5  8 11  6  9 12]
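The note about matching the number of elements can be checked directly. The following is a small sketch with illustrative variable names: `-1` lets NumPy infer one axis length, a mismatched shape raises `ValueError`, and a boolean mask always returns a flat 1-D result.

```python
import numpy as np

seq = np.arange(1, 13)            # 12 elements

# -1 asks NumPy to infer that axis length: 12 / 3 = 4 rows.
inferred = seq.reshape((-1, 3))
print(inferred.shape)             # (4, 3)

# A shape whose product is not 12 is rejected.
try:
    seq.reshape((5, 3))
    reshape_failed = False
except ValueError:
    reshape_failed = True
print(reshape_failed)             # True

# A boolean mask has the same shape as the array; indexing with it
# returns the selected elements as a flat 1-D array.
mask = inferred > 4
print(mask.shape, inferred[mask].shape)   # (4, 3) (8,)
```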


### Higher Dimensional Arrays
It is important to get comfortable with multidimensional arrays that have more than 2 dimensions.
In this case, we will refer to each dimension (axis) using index notation:
```
(d_0, d_1, d_2, ..., d_n)
```

Please make sure you understand how these work! You will be able to write elegant and efficient code with this!

In [5]:
# Reshape to a 3D array of shape (2, 2, 3)
reshaped_seq = seq.reshape((2, 2, 3))
print(reshaped_seq)

# Transpose an array. You can consider this as swapping axes
# Swap first and last axes
reshaped_seq = reshaped_seq.transpose(2, 1, 0)
print(reshaped_seq)

# Take first d_0
print(reshaped_seq[0])

# Take first d_1
print(reshaped_seq[:, 0])

# Take first d_2
print(reshaped_seq[..., 0])

# Take third d_0 and second d_2
print(reshaped_seq[2, :, 1])

# You can take elements based on conditions (boolean indexing, a form of advanced indexing)
# Take elements greater than 4
print(reshaped_seq[reshaped_seq > 4])

[[[ 1  2  3]
  [ 4  5  6]]

 [[ 7  8  9]
  [10 11 12]]]
[[[ 1  7]
  [ 4 10]]

 [[ 2  8]
  [ 5 11]]

 [[ 3  9]
  [ 6 12]]]
[[ 1  7]
 [ 4 10]]
[[1 7]
 [2 8]
 [3 9]]
[[1 4]
 [2 5]
 [3 6]]
[ 9 12]
[ 7 10  8  5 11  9  6 12]
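One way to keep track of the axes is to look at the shape each indexing expression returns. The sketch below rebuilds the same `(3, 2, 2)` array under the illustrative name `x`. The comparison with `swapaxes` is an added illustration, not part of the original cell.

```python
import numpy as np

x = np.arange(1, 13).reshape((2, 2, 3)).transpose(2, 1, 0)   # shape (3, 2, 2)

# An integer index removes that axis; a slice (:) keeps it.
print(x[0].shape)          # (2, 2): axis d_0 removed
print(x[:, 0].shape)       # (3, 2): axis d_1 removed
print(x[..., 0].shape)     # (3, 2): axis d_2 removed
print(x[2, :, 1].shape)    # (2,): d_0 and d_2 removed
print(x[0:1].shape)        # (1, 2, 2): slicing keeps d_0 with length 1

# For a 3-D array, transpose(2, 1, 0) is the same as swapping axes 0 and 2.
original = np.arange(1, 13).reshape((2, 2, 3))
same = np.array_equal(x, original.swapaxes(0, 2))
print(same)                # True
```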


## Math Operations

In [50]:
a = np.arange(0, 10).reshape(5, 2)
b = np.arange(0, 1, 0.1).reshape(5, 2)

# Element-wise addition
print(a + b)

# Element-wise multiplication
print(a * b)

# Element-wise division
print(a / b)

# Element-wise exponentiation (each entry of a raised to the matching entry of b)
print(a ** b)

# Matrix multiplication
print(a @ b.T)
print(np.matmul(a, b.T))

[[0.  1.1]
 [2.2 3.3]
 [4.4 5.5]
 [6.6 7.7]
 [8.8 9.9]]
[[0.  0.1]
 [0.4 0.9]
 [1.6 2.5]
 [3.6 4.9]
 [6.4 8.1]]
[[nan 10.]
 [10. 10.]
 [10. 10.]
 [10. 10.]
 [10. 10.]]
[[1.         1.        ]
 [1.14869835 1.39038917]
 [1.74110113 2.23606798]
 [2.93015605 3.90452878]
 [5.27803164 7.22467406]]
[[ 0.1  0.3  0.5  0.7  0.9]
 [ 0.3  1.3  2.3  3.3  4.3]
 [ 0.5  2.3  4.1  5.9  7.7]
 [ 0.7  3.3  5.9  8.5 11.1]
 [ 0.9  4.3  7.7 11.1 14.5]]
[[ 0.1  0.3  0.5  0.7  0.9]
 [ 0.3  1.3  2.3  3.3  4.3]
 [ 0.5  2.3  4.1  5.9  7.7]
 [ 0.7  3.3  5.9  8.5 11.1]
 [ 0.9  4.3  7.7 11.1 14.5]]
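The `nan` in the division output comes from `0 / 0` in the first entry. One possible way to avoid it, sketched here as an option and not as the course's recommended approach, is `np.divide` with the `where` and `out` arguments. The sketch also contrasts the result shapes of element-wise and matrix multiplication.

```python
import numpy as np

a = np.arange(0, 10).reshape(5, 2)
b = np.arange(0, 1, 0.1).reshape(5, 2)

# Divide only where b is non-zero; positions where b == 0 keep the value
# already stored in `out` (here 0.0).
safe = np.divide(a, b, out=np.zeros(b.shape), where=(b != 0))
print(safe)

# Element-wise operations keep the operand shape, while matrix
# multiplication follows (5, 2) @ (2, 5) -> (5, 5).
print((a * b).shape, (a @ b.T).shape)   # (5, 2) (5, 5)
```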




### Higher Dimensional Arrays
Most operations work the same way on higher dimensional arrays. But you might be curious about how we can perform matrix multiplication (if we can).

In [6]:
a = np.arange(0, 24).reshape(4, 2, 3)
b = np.arange(0, 2.4, 0.1).reshape(4, 2, 3)

# You can perform batch matrix multiplication
# This treats each 3D array as a stack of d_0 2D arrays and performs matrix multiplication for each index i = 0, ..., d_0 - 1
print(a @ b.transpose(0, 2, 1))
print(np.matmul(a, b.transpose(0, 2, 1)))

# What if we want to do the same element-wise multiplication on the last two axes of a 3D array given a 2D array?
# There is a notion of broadcasting. See here: https://numpy.org/doc/1.19/user/basics.broadcasting.html
c = np.arange(0, 6).reshape(2, 3)
print(a * c)

[[[  0.5   1.4]
  [  1.4   5. ]]

 [[ 14.9  21.2]
  [ 21.2  30.2]]

 [[ 50.9  62.6]
  [ 62.6  77. ]]

 [[108.5 125.6]
  [125.6 145.4]]]
[[[  0.5   1.4]
  [  1.4   5. ]]

 [[ 14.9  21.2]
  [ 21.2  30.2]]

 [[ 50.9  62.6]
  [ 62.6  77. ]]

 [[108.5 125.6]
  [125.6 145.4]]]
[[[  0   1   4]
  [  9  16  25]]

 [[  0   7  16]
  [ 27  40  55]]

 [[  0  13  28]
  [ 45  64  85]]

 [[  0  19  40]
  [ 63  88 115]]]
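The broadcasting rule behind `a * c` can be sketched as follows. The `scale` array and the failing `(4, 2)` example are illustrative additions. Shapes are aligned from the last axis backwards, and an axis is compatible if the lengths match or one of them is 1.

```python
import numpy as np

a = np.arange(0, 24).reshape(4, 2, 3)
c = np.arange(0, 6).reshape(2, 3)

# (4, 2, 3) vs (2, 3): c is treated as if it had shape (1, 2, 3)
# and is reused for each of the 4 matrices in a.
prod = a * c
print(prod.shape)                           # (4, 2, 3)
print(np.array_equal(prod[1], a[1] * c))    # True

# An axis of length 1 also stretches: scale each matrix by its own factor.
scale = np.array([1, 10, 100, 1000]).reshape(4, 1, 1)
print((a * scale).shape)                    # (4, 2, 3)

# Incompatible trailing axes raise an error: (4, 2, 3) vs (4, 2).
try:
    a * np.ones((4, 2))
    broadcast_failed = False
except ValueError:
    broadcast_failed = True
print(broadcast_failed)                     # True
```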


## Some Other Dirty Tricks?

In [7]:
a = np.arange(10)
b = np.arange(20).reshape(2, 10, 1)

# Say you want to do matrix multiplication between b and a,
# but clearly a is an axis short. We can "expand dimensions" quickly
#print(b @ a.reshape((1, 10)))
#print(b @ a[None, ...])

# If you want to raise a single scalar to each exponent in a vector
a = 10
b = np.arange(3)
print(b)
print(a ** b)

# Create one-hot encoding
# Say you want a one-hot vector in R^5 whose fourth element is 1
print(np.eye(5)[3])

[0 1 2]
[  1  10 100]
[0. 0. 0. 1. 0.]
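The two commented-out lines in the cell above can be checked by looking at shapes only. The sketch below does that and also extends the one-hot trick to several labels at once. The `labels` array is a hypothetical example, not course data.

```python
import numpy as np

a = np.arange(10)
b = np.arange(20).reshape(2, 10, 1)

# a[None, ...] (same as a[np.newaxis, :]) turns shape (10,) into (1, 10).
row = a[None, ...]
print(row.shape)                 # (1, 10)

# (2, 10, 1) @ (1, 10): the 2-D operand is broadcast across the batch axis.
out = b @ row
print(out.shape)                 # (2, 10, 10)

# One-hot encode several labels at once by indexing rows of the identity matrix.
labels = np.array([3, 0, 4])     # hypothetical class labels
one_hot = np.eye(5)[labels]
print(one_hot)
```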


## Common types
Depending on the situation, you may prefer one type over the other. Here is the list of most commonly used types:
```
np.uint8
np.int32
np.int64
np.float32
np.float64
np.bool_
```

Understanding which type to use based on the context is important when you deal with large datasets. Imagine an 84x84 RGB image (i.e. `shape=(84, 84, 3)`). Storing 10000 such images in `np.float64` takes approximately 1.58 GiB of memory (84 × 84 × 3 × 10000 values at 8 bytes each), whereas `np.uint8` takes only approximately 0.2 GiB (about 202 MiB).

In [8]:
# You can enforce the type of a numpy array
a = np.array([[1, 2, 3],
              [2, 3, 4]],
             np.float32)

print(a)
print(a.dtype)

[[1. 2. 3.]
 [2. 3. 4.]]
float32
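The memory figures for the image example can be reproduced without allocating the full dataset. The sketch below multiplies the number of values by each type's `itemsize`; the variable names are illustrative.

```python
import numpy as np

n_images, shape = 10000, (84, 84, 3)
n_values = n_images * int(np.prod(shape))          # 211,680,000 values

for dtype in (np.uint8, np.float32, np.float64):
    n_bytes = n_values * np.dtype(dtype).itemsize  # bytes per value: 1, 4, 8
    print(np.dtype(dtype).name, round(n_bytes / 2**30, 3), "GiB")

# astype converts an existing array to another type (this makes a copy).
img = np.zeros(shape, dtype=np.uint8)
print(img.nbytes, img.astype(np.float64).nbytes)   # 21168 169344
```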


## NaN and Inf
For different scenarios, you may encounter `nan` (Not a Number) and `inf` (Infinity) values.
`nan` usually occurs when your function input is outside the function's domain (e.g. the log of a negative value) or when the result is undefined (e.g. 0 / 0).
`inf` usually occurs when your function computes a number that is too large to represent (e.g. e^x for large x).

In [8]:
print(np.log(-1))
print(np.array(0.) / np.array(0.))
print(np.exp(1000))

nan
nan
inf
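In practice you often need to detect these values, not just produce them. Here is a minimal sketch using `np.isnan`, `np.isinf`, and `np.isfinite`. The `np.errstate` context manager is used only to silence the runtime warnings these operations normally emit.

```python
import numpy as np

# errstate silences the RuntimeWarnings these operations would otherwise emit.
with np.errstate(invalid="ignore", divide="ignore", over="ignore"):
    values = np.array([np.log(-1.0), np.exp(1000.0), np.log(0.0), 1.0])

print(np.isnan(values))      # True only for log(-1)
print(np.isinf(values))      # True for exp(1000) (+inf) and log(0) (-inf)
print(np.isfinite(values))   # True only for 1.0

# nan is not equal to anything, including itself, so == cannot be used to find it.
print(np.nan == np.nan)      # False
```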


# Final Words
See the [NumPy quickstart](https://numpy.org/doc/1.19/user/quickstart.html) for more tutorials. They are short and concise.