# ECBM E4040 2024 Fall Recitations

## Session 1.2 - NumPy Tutorial

## References:
* https://docs.scipy.org/doc/numpy/user/quickstart.html
* https://realpython.com/numpy-tensorflow-performance/
* http://cs231n.github.io/python-numpy-tutorial/

**NumPy is a core library for scientific computing in Python. It provides a high-performance multidimensional array object and tools for working with these arrays.**

## Array

NumPy’s main object is the homogeneous multidimensional **array**. NumPy’s array class is called `ndarray`. It is also known by the alias `array`. Note that `numpy.array` is not the same as the Standard Python Library class `array.array`, which only handles one-dimensional arrays and offers less functionality. The more important attributes of an `ndarray` object are:
- `ndarray.ndim`

    The number of axes (dimensions) of the array.

- `ndarray.shape`

    The dimensions of the array. This is a tuple of integers indicating the size of the array in each dimension. For a matrix with *n* rows and *m* columns, `shape` will be `(n,m)`. The length of the `shape` tuple is therefore the number of axes, `ndim`.

- `ndarray.size`

    The total number of elements of the array. This is equal to the product of the elements of `shape`.

- `ndarray.dtype`

    An object describing the type of the elements in the array. One can create or specify dtype using standard Python types. Additionally NumPy provides types of its own. `numpy.int32`, `numpy.int16`, and `numpy.float64` are some examples.

- `ndarray.itemsize`

    The size in bytes of each element of the array. For example, an array of elements of type `float64` has `itemsize` *8 (=64/8)*, while one of type `int32` has `itemsize` *4 (=32/8)*. It is equivalent to `ndarray.dtype.itemsize`.

- `ndarray.data`

    The buffer containing the actual elements of the array. Normally, we won’t need to use this attribute because we will access the elements in an array using indexing facilities.

> More information at https://numpy.org/doc/stable/index.html
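
The attributes listed above can be inspected directly. The following is a minimal sketch; the array `arr` and its contents are hypothetical values chosen only for illustration.

```python
import numpy as np

# Hypothetical example array: 2 rows, 3 columns of 32-bit integers
arr = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int32)

print(arr.ndim)      # number of axes: 2
print(arr.shape)     # size along each axis: (2, 3)
print(arr.size)      # total number of elements: 6 (= 2 * 3)
print(arr.dtype)     # element type: int32
print(arr.itemsize)  # bytes per element: 4 (= 32 / 8)

# A standard Python type can also be passed as dtype;
# Python's float maps to NumPy's float64
arr_f = np.array([1, 2, 3], dtype=float)
print(arr_f.dtype, arr_f.itemsize)  # float64 8
```

Note that `len(arr.shape)` equals `arr.ndim`, and the product of the entries of `arr.shape` equals `arr.size`.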

In [1]:
import numpy as np

### Array Creation

In [2]:
# In plain Python, we can use nested lists (or tuples) to represent multidimensional arrays
array_list = [[1,2,3],[4,5,6],[7,8,9]]
print(array_list)
print(type(array_list), type(array_list[0]))

[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
<class 'list'> <class 'list'>


In [3]:
# We can initialize numpy arrays from nested Python lists, and access elements using square brackets
array_np = np.array(array_list)
print(array_np)
print(type(array_np))

[[1 2 3]
 [4 5 6]
 [7 8 9]]
<class 'numpy.ndarray'>


In [4]:
# some attributes of ndarray object
print('shape:', array_np.shape)
print('data type:', array_np.dtype)
print('size:', array_np.size)

shape: (3, 3)
data type: int64
size: 9


In [5]:
# A frequent error consists in passing multiple numeric arguments to the array constructor, 
# rather than providing a single list of numbers as an argument.
a = np.array(1,2,3,4)

TypeError: array() takes from 1 to 2 positional arguments but 4 were given

In [6]:
# Numpy also provides many functions to create arrays:
a = np.zeros((2,2))   # Create an array of all zeros with a pre-specified shape
print(a)

b = np.ones((1,2))    # Create an array of all ones with a pre-specified shape
print(b)

c = np.ones((3,3), dtype=np.int16)    # dtype can also be specified
print(c)

[[0. 0.]
 [0. 0.]]
[[1. 1.]]
[[1 1 1]
 [1 1 1]
 [1 1 1]]


### Array Indexing and Slicing

In [7]:
# One-dimensional arrays can be indexed, sliced and iterated over, much like lists and other Python sequences. 
a = np.arange(10)**3

print(a[2])
print(a[2:5])
print(a[-1])
print(a[::-1])

8
[ 8 27 64]
729
[729 512 343 216 125  64  27   8   1   0]


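When fewer indices are provided than the number of axes, the missing indices are treated as complete slices. That is, the expression within brackets in `b[i]` is treated as an `i` followed by as many instances of `:` as needed to represent the remaining axes.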

NumPy also allows you to write this using dots as `b[i,...]`.

The dots (`...`) represent as many colons as needed to produce a complete indexing tuple. 
For example, if `x` is an array with 5 axes, then:

- `x[1,2,...]` is equivalent to `x[1,2,:,:,:]`
- `x[...,3]` is equivalent to `x[:,:,:,:,3]`
- `x[4,...,5,:]` is equivalent to `x[4,:,:,5,:]`
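
The equivalences above can be checked numerically. This is a small sketch; the 5-axis shape `(5, 3, 2, 6, 4)` is an arbitrary assumption, chosen only so that every index used below is in range.

```python
import numpy as np

# Hypothetical 5-axis array; the shape is arbitrary but large enough
# for every index used below
x = np.arange(5 * 3 * 2 * 6 * 4).reshape(5, 3, 2, 6, 4)

# Each pair of expressions selects exactly the same elements
print(np.array_equal(x[1, 2, ...], x[1, 2, :, :, :]))    # True
print(np.array_equal(x[..., 3], x[:, :, :, :, 3]))        # True
print(np.array_equal(x[4, ..., 5, :], x[4, :, :, 5, :]))  # True

# Missing trailing indices behave like full slices
print(np.array_equal(x[1], x[1, ...]))                    # True
print(x[1].shape)                                         # (3, 2, 6, 4)
```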

In [8]:
# Create the following rank 2 array with shape (3, 4)
# [[ 1  2  3  4]
#  [ 5  6  7  8]
#  [ 9 10 11 12]]
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])

# Use slicing to pull out the subarray consisting of the first 2 rows
# and columns 1 and 2; b is the following array of shape (2, 2)
b = a[:2, 1:3]
print(b)

# A slice of an array is a view into the same data, so modifying it
# will modify the original array.
print(a[0, 1])
b[0, 0] = 66    # b[0, 0] is the same piece of data as a[0, 1]
print(a[0, 1])

[[2 3]
 [6 7]]
2
66


In [9]:
# You can also mix integer indexing with slice indexing.
# However, doing so will yield an array with fewer dimensions than the original array.
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])

row_r1 = a[1, :]    # view of the second row of a with dim=1
row_r2 = a[1:2, :]  # view of the second row of a with dim=2
print(row_r1, row_r1.shape)
print(row_r2, row_r2.shape)

col_r1 = a[:, 3]
col_r2 = a[:, 3:4]
print(col_r1, col_r1.shape)
print(col_r2, col_r2.shape)

[5 6 7 8] (4,)
[[5 6 7 8]] (1, 4)
[ 4  8 12] (3,)
[[ 4]
 [ 8]
 [12]] (3, 1)


In [11]:
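# When you index into numpy arrays using slicing,
# the resulting array view will always be a subarray of the original array.
# In contrast, integer array indexing (shown below) lets you construct
# arbitrary arrays using the data from another array.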
# [[1 2]
#  [3 4]
#  [5 6]]
a = np.array([[1,2], [3,4], [5,6]])

print(a[[0,1,2], [0,1,0]])
print(np.array([a[0,0], a[1,1], a[2,0]]))

[1 4 5]
[1 4 5]


### Array Math

In [12]:
# Arithmetic operators on arrays apply elementwise.
a = np.array([20,30,40,50])
b = np.arange(4) # [0, 1, 2, 3]

print(a-b)
print(b**2)

[20 29 38 47]
[0 1 4 9]


In [13]:
# Unlike many matrix languages (like MATLAB), the product operator * operates elementwise on NumPy arrays.
# The matrix product can be performed using the @ operator (in Python >= 3.5) or the matmul function:
A = np.array([[1,1], [0,1]])
B = np.array([[2,0], [3,4]])

print(A*B)
print(A@B)
print(np.matmul(A, B))

[[2 0]
 [0 4]]
[[5 4]
 [3 4]]
[[5 4]
 [3 4]]


**Note:**
- Some resources may recommend using [`np.dot`](https://numpy.org/doc/stable/reference/generated/numpy.dot.html) for matrix multiplication. We, however, discourage the use of this function, since it accepts vector-vector, vector-matrix, matrix-matrix, and even scalar operands, and behaves differently for each. Don't use this function unless you know *exactly* what you're doing.
- When operating with arrays of different types, the type of the resulting array corresponds to the more general or precise one (a behavior known as upcasting). However, it is a good habit to avoid computations between different data types in general.
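
Both points in the note can be illustrated with a short sketch. The arrays and the flag `matmul_failed` below are hypothetical names introduced only for this example.

```python
import numpy as np

A = np.array([[1, 1], [0, 1]])

# np.dot accepts a scalar operand and simply scales the array
print(np.dot(2, A))

# np.matmul (and the @ operator) reject scalar operands instead
matmul_failed = False
try:
    np.matmul(2, A)
except ValueError as err:
    matmul_failed = True
    print('matmul refused a scalar operand:', type(err).__name__)

# Upcasting: int32 combined with float64 yields float64
ints = np.array([1, 2, 3], dtype=np.int32)
floats = np.array([0.5, 0.5, 0.5], dtype=np.float64)
mixed = ints + floats
print(mixed.dtype)  # float64
```

The stricter behavior of `matmul` tends to surface shape mistakes early, which is one reason to prefer it over `np.dot` for matrix products.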

### Changing Shape

In [14]:
# An array has a shape given by the number of elements along each axis:
a = np.floor(10*np.random.random((3,4)))
a.shape

(3, 4)

In [15]:
# The shape of an array can be changed with various commands. 
# Note that the following three commands all return a modified array, but DO NOT change the original array:
# ravel returns the array flattened, reshape returns it with a new shape, and T returns its transpose
print(a.ravel(), a.ravel().shape)
print(a.reshape(6,2))
print(a.T)

[4. 3. 5. 4. 3. 5. 2. 2. 6. 4. 3. 8.] (12,)
[[4. 3.]
 [5. 4.]
 [3. 5.]
 [2. 2.]
 [6. 4.]
 [3. 8.]]
[[4. 3. 6.]
 [3. 5. 4.]
 [5. 2. 3.]
 [4. 2. 8.]]


In [16]:
# The reshape function returns its argument with a modified shape, 
# whereas the ndarray.resize method modifies the array ITSELF:
a.resize((2,6))
print(a)

[[4. 3. 5. 4. 3. 5.]
 [2. 2. 6. 4. 3. 8.]]


In [17]:
# If a dimension is given as -1, the shape along this dimension is automatically calculated:
a.reshape(3,-1)

array([[4., 3., 5., 4.],
       [3., 5., 2., 2.],
       [6., 4., 3., 8.]])

### Stacking and Splitting

In [18]:
# Several arrays can be stacked together along different axes:
a = np.floor(10*np.random.random((2,2)))
b = np.floor(10*np.random.random((2,2)))

print(a)
print(b)
print(np.vstack((a,b)))
print(np.hstack((a,b)))

[[1. 2.]
 [3. 6.]]
[[9. 6.]
 [2. 6.]]
[[1. 2.]
 [3. 6.]
 [9. 6.]
 [2. 6.]]
[[1. 2. 9. 6.]
 [3. 6. 2. 6.]]


In [18]:
# Using hsplit, you can split an array along its horizontal axis, 
# either by specifying the number of equally shaped arrays to return, 
# or by specifying the columns after which the division should occur:
a = np.floor(10*np.random.random((2,12)))
print(a)

# Split a into 3
print(np.hsplit(a,3))

# Split a after the third and the fourth column
print(np.hsplit(a,(3,4)))

[[8. 5. 8. 2. 7. 0. 7. 8. 0. 1. 3. 9.]
 [9. 6. 3. 4. 9. 6. 3. 6. 5. 6. 4. 4.]]
[array([[8., 5., 8., 2.],
       [9., 6., 3., 4.]]), array([[7., 0., 7., 8.],
       [9., 6., 3., 6.]]), array([[0., 1., 3., 9.],
       [5., 6., 4., 4.]])]
[array([[8., 5., 8.],
       [9., 6., 3.]]), array([[2.],
       [4.]]), array([[7., 0., 7., 8., 0., 1., 3., 9.],
       [9., 6., 3., 6., 5., 6., 4., 4.]])]


## Copies and views

When operating on and manipulating arrays, their data is sometimes copied into a new array and sometimes not. This is often a source of confusion for beginners. There are three cases:

### No Copy

In [27]:
# Simple assignments make no copy of array objects or of their data.
a = np.arange(12)
b = a    # no new object is created
print(a)
print(b)

b[0] = 100
print(b)
print(a)

b.shape = 3,4
print(a)

[ 0  1  2  3  4  5  6  7  8  9 10 11]
[ 0  1  2  3  4  5  6  7  8  9 10 11]
[100   1   2   3   4   5   6   7   8   9  10  11]
[100   1   2   3   4   5   6   7   8   9  10  11]
[[100   1   2   3]
 [  4   5   6   7]
 [  8   9  10  11]]


### Shallow Copy - `view`

In [28]:
# Different array objects can share the same data.
# The view method creates a new array object that looks at the same data
c = a.view()

print(c is a) # the keyword `is` checks whether two names refer to the same object
print(c.base is a)

# each view object has its own shape,
# so the same block of data can be viewed with different shapes
c.shape = 2,6
print(c)
print(a.shape)

# but the data themselves are shared between objects
c[0,0] = 999
print(a)

False
True
[[100   1   2   3   4   5]
 [  6   7   8   9  10  11]]
(3, 4)
[[999   1   2   3]
 [  4   5   6   7]
 [  8   9  10  11]]


In [29]:
# Slicing an array returns a view of it:
s = a[:,1:3]
s[:] = 10    # s[:] is a view of s, so this writes into a; `s = 10` would only rebind the name s
print(a)

[[999  10  10   3]
 [  4  10  10   7]
 [  8  10  10  11]]


### Deep Copy

In [30]:
# The copy method makes a complete copy of the array and its data.
d = a.copy()
print(d is a)
print(d.base is a)

d[0,0] = 0
print(d)
print(a)

False
False
[[ 0 10 10  3]
 [ 4 10 10  7]
 [ 8 10 10 11]]
[[999  10  10   3]
 [  4  10  10   7]
 [  8  10  10  11]]
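
The three cases can be told apart programmatically. The sketch below is one possible check, assuming `np.shares_memory` is available in the installed NumPy version; the variable names are hypothetical.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

alias = a             # no copy: the same object under a second name
view = a.view()       # shallow copy: new object, shared data
sliced = a[:, 1:3]    # slicing also returns a view
deep = a.copy()       # deep copy: new object, new data

# For each case, print whether it is the same object as a
# and whether it shares memory with a
for name, other in [('alias', alias), ('view', view),
                    ('sliced', sliced), ('deep', deep)]:
    print(name, other is a, np.shares_memory(a, other))
```

Only the alias is the same object as `a`; the view and the slice are distinct objects that still share data with `a`; the deep copy shares nothing.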


## Broadcasting

Broadcasting is a powerful mechanism that allows numpy to work with arrays of different shapes when performing arithmetic operations. Frequently we have a smaller array and a larger array, and we want to use the smaller array multiple times to perform some operation on the larger array.

Say we wish to add a vector `v` to each row of a matrix `x`. Compare the following 3 algorithms:

In [37]:
import time
x = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
v = np.array([1, 0, 1])
y = np.empty_like(x)   # Create an empty matrix with the same shape as x

In [49]:
# Add the vector v to each row of the matrix x with an explicit loop
start = time.time_ns()
for i in range(4):
    y[i, :] = x[i, :] + v
elapsed = time.time_ns() - start
print(y)
print(elapsed / 1000)  # elapsed time in microseconds

[[ 2  2  4]
 [ 5  5  7]
 [ 8  8 10]
 [11 11 13]]
357.0


In [50]:
# Add the vector v to each row of x by first tiling v into a matrix with the same shape as x
start = time.time_ns()
vv = np.tile(v, (4, 1))   # Stack 4 copies of v on top of each other
y = x + vv  # Add x and vv elementwise
elapsed = time.time_ns() - start
print(y)
print(elapsed / 1000)  # elapsed time in microseconds

[[ 2  2  4]
 [ 5  5  7]
 [ 8  8 10]
 [11 11 13]]
172.0


In [51]:
start = time.time_ns()
y = x + v  # Add v to each row of x using broadcasting
elapsed = time.time_ns() - start
print(y)
print(elapsed / 1000)  # elapsed time in microseconds

[[ 2  2  4]
 [ 5  5  7]
 [ 8  8 10]
 [11 11 13]]
135.0
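
Single-run timings like the ones above are noisy, especially for arrays this small. As a rough sketch, the three approaches could instead be compared with the standard-library `timeit` module, which averages over many calls. The function names below are hypothetical, and the measured numbers will vary by machine.

```python
import timeit
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
v = np.array([1, 0, 1])

def add_with_loop():
    # Explicit Python loop over the rows of x
    y = np.empty_like(x)
    for i in range(x.shape[0]):
        y[i, :] = x[i, :] + v
    return y

def add_with_tile():
    # Materialize a full-size copy of v, then add elementwise
    return x + np.tile(v, (x.shape[0], 1))

def add_with_broadcast():
    # Let NumPy broadcast v across the rows of x
    return x + v

n_calls = 10_000
for fn in (add_with_loop, add_with_tile, add_with_broadcast):
    # timeit returns the total time in seconds for n_calls calls
    seconds = timeit.timeit(fn, number=n_calls)
    print(f'{fn.__name__}: {seconds / n_calls * 1e6:.2f} microseconds per call')
```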


Broadcasting two arrays together follows these **rules**:

- If the arrays do not have the same number of dimensions, wrap the lower-dimensional array in a new outer array (equivalently, prepend 1 to its shape) until both shapes have the same length (`[1, 2] -> [[1, 2]] -> [[[1, 2]]] -> ...`).
- Two arrays are compatible in a dimension if they have the same size in that dimension, or if one of them has size 1 in that dimension (for example, `a.shape = (3,4)`, `b.shape = (3,4)`, and `c.shape = (1,4)` are all compatible with one another).
- The arrays can be broadcast together if they are compatible in ***all*** dimensions, possibly after the wrapping described in the first rule.
- In any dimension where one array has size 1 and the other has size greater than 1, the first array behaves as if it were copied along that dimension (`[1] + [[2],[3]] -> [[1]] + [[2],[3]] -> [[1],[1]] + [[2],[3]]`).
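
The rules above can be tried out on a few shapes. This sketch assumes NumPy 1.20 or later, where `np.broadcast_shapes` is available; the specific shapes and the flag `incompatible` are arbitrary choices for illustration.

```python
import numpy as np

# (3,) is treated as (1, 3), which is compatible with (4, 3)
print(np.broadcast_shapes((4, 3), (3,)))    # (4, 3)

# Size-1 dimensions are stretched on both sides
print(np.broadcast_shapes((3, 1), (1, 4)))  # (3, 4)

# (4,) is treated as (1, 4); 4 and 3 differ and neither is 1
incompatible = False
try:
    np.broadcast_shapes((4, 3), (4,))
except ValueError:
    incompatible = True
    print('shapes (4, 3) and (4,) cannot be broadcast together')

# The last rule, using the example from the list above
row = np.array([1])
col = np.array([[2], [3]])
print(row + col)  # shape (2, 1), values 3 and 4
```

To add a length-4 vector to each column of a `(4, 3)` matrix, one option is to reshape the vector to `(4, 1)` first, for example with `v[:, np.newaxis]`.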

> More information: https://numpy.org/doc/stable/user/basics.broadcasting.html

There is much more to NumPy than we could cover in this short recitation. You are welcome to explore further on your own.