This notebook is a modified version of "04_scipython_notes-Copy1.ipynb" by Sebastian Raschka, which 
was released under a Creative Commons Attribution-Non-Commercial license.

# NumPy Tutorial - Working with Numerical Arrays

### Introduction to NumPy

NumPy, short for Numerical Python, was created by Travis Oliphant in 2005 and is one of the main packages for scientific computing in Python. For this class, NumPy is important because `scikit-learn`, the machine learning package we will be using, uses the NumPy array as its fundamental data structure. This tutorial offers a quick introduction to NumPy. 

NumPy is attractive to the scientific community because it provides a convenient Python interface for working with multi-dimensional array data structures efficiently; the NumPy array data structure is also called `ndarray`, which is short for *n*-dimensional array. For our purposes, we will mostly be working with 1- and 2-dimensional arrays.

Besides being more efficient for numerical computations than native Python code, NumPy can also be more elegant and readable due to vectorized operations and broadcasting. 
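
To make the terms *vectorized operation* and *broadcasting* concrete, here is a minimal sketch. The variable names are illustrative and not part of the original notebook. It contrasts a Python list comprehension with the equivalent NumPy expression, and then adds a one-dimensional array to every row of a two-dimensional array.

```python
import numpy as np

values = [1.0, 2.0, 3.0]

# Plain Python: an explicit loop (here a list comprehension) over the elements
doubled_list = [2 * v for v in values]

# NumPy: the multiplication is applied to every element at once (vectorization)
arr = np.array(values)
doubled_arr = 2 * arr

# Broadcasting: the 1-D array `row` is combined with each row of `matrix`
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])
row = np.array([10, 20, 30])
shifted = matrix + row

print(doubled_list)  # a Python list: [2.0, 4.0, 6.0]
print(doubled_arr)   # a NumPy array with the same values
print(shifted)       # rows become 11, 22, 33 and 14, 25, 36
```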

### N-dimensional Arrays

NumPy is built around [`ndarray`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.html) objects, which are high-performance multi-dimensional array data structures. Intuitively, we can think of a one-dimensional NumPy array as a data structure to represent a vector of elements -- you may think of it as a fixed-size Python list where all elements share the same type (e.g., `[1, 2, 3]` or `["a", "b", "c"]`). Similarly, we can think of a two-dimensional array as a data structure to represent a matrix or a Python list of lists. 

Whenever we want to use NumPy we must first import it. The standard way to do this is as follows:

In [None]:
import numpy as np

Using `import` here is similar to how we used `library()` in R. You only install the NumPy package on your 
computer once, and then, every time you want to use it in a script file (`*.py`) or a Jupyter notebook (`*.ipynb`) 
you use the `import` statement as above. 

Now, let us get started with NumPy by calling the `array` function to create a 1-dimensional NumPy array from a list:

In [None]:
lst_1d = [1, 2, 3]
ary1d = np.array(lst_1d)
ary1d

We can just as easily create a 2-dimensional array:

In [None]:
lst = [[1, 2, 3], [4, 5, 6]]
ary2d = np.array(lst)
ary2d

If we are interested in the number of elements along each array dimension (in the context of NumPy arrays, we may also refer to them as *axes*), we can access the `shape` attribute as shown below:

In [None]:
ary2d.shape

The `shape` is always a tuple; in the code example above, the two-dimensional `ary2d` object has *two* rows and *three* columns, `(2, 3)`, if we think of it as a regular matrix.

Similarly, the `shape` of the one-dimensional array only contains a single value:

In [None]:
np.array([1, 2, 3]).shape
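
Besides `shape`, a few other attributes are handy when inspecting an array. The sketch below goes slightly beyond the cells above: it uses the standard `ndim`, `size`, and `dtype` attributes and shows that all elements of an array share a single type. The name `mixed` is only illustrative.

```python
import numpy as np

ary2d = np.array([[1, 2, 3], [4, 5, 6]])

print(ary2d.ndim)   # number of axes: 2
print(ary2d.size)   # total number of elements: 6
print(ary2d.dtype)  # an integer type; the exact width depends on the platform

# Mixing integers and floats yields one floating-point type for all elements
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64
```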

### Array Indexing

In this section, we will go over the basics of retrieving NumPy array elements via different indexing methods. Simple NumPy indexing and slicing work similarly to indexing and slicing Python lists, as the following code snippet demonstrates by retrieving the first element of a one-dimensional array. Remember that Python starts indexing from 0!

In [None]:
ary = np.array([1, 2, 3])
print("The first element in this 1-dimensional array is:", ary[0])

In [None]:
print("The last element in this 1-dimensional array is:", ary[-1])

Also, the same Python semantics apply to slicing operations. The following example shows how to fetch the first two elements in `ary`:

In [None]:
ary[:2] # equivalent to ary[0:2]; ary[[0, 1]] selects the same elements
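
The slice `ary[:2]` and the list index `ary[[0, 1]]` select the same elements, but they are not fully interchangeable. Basic slicing returns a view onto the original data, while indexing with a list returns a new copy. A minimal sketch of the difference, with illustrative variable names:

```python
import numpy as np

ary = np.array([1, 2, 3])

sliced = ary[:2]      # basic slicing: a view onto the same data
picked = ary[[0, 1]]  # indexing with a list: a new, independent copy

ary[0] = 99           # modify the original array

print(sliced)  # the view sees the change: 99 and 2
print(picked)  # the copy is unchanged: 1 and 2
```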

If we work with arrays that have more than one dimension or axis, we separate our indexing or slicing operations by commas as shown in the series of examples below:

In [None]:
ary = np.array([[1, 2, 3],
                [4, 5, 6]])
ary

In [None]:
ary[0, 0] # upper left element

In [None]:
ary[1, 0] # second row, first column

In [None]:
ary[0] # entire first row

In [None]:
ary[:, 0] # entire first column - note that this returns a one-dimensional array of shape (2,), not a column vector
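
Selecting a column with an integer index drops that axis, so the result is one-dimensional. If a two-dimensional column of shape `(2, 1)` is needed, one option is to index with a slice or a list instead. The names `col_1d`, `col_2d`, and `col_2d_alt` are illustrative only.

```python
import numpy as np

ary = np.array([[1, 2, 3],
                [4, 5, 6]])

col_1d = ary[:, 0]        # integer index drops the axis -> shape (2,)
col_2d = ary[:, 0:1]      # a slice keeps the axis       -> shape (2, 1)
col_2d_alt = ary[:, [0]]  # a list index keeps it too    -> shape (2, 1)

print(col_1d.shape)
print(col_2d.shape)
print(col_2d_alt.shape)
```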

In [None]:
ary[:, :2] # first two columns

In [None]:
ary[-1, -1] # lower right element: an index of -1 selects the last element along an axis

In [None]:
print(ary)  # our original array
print()
print(ary.T) # the transpose of an array

### Reshaping Arrays

In practice, we often run into situations where existing arrays do not have the *right* shape to perform certain computations. As you might remember from the beginning of this lecture, the size of NumPy arrays is fixed. Fortunately, this does not mean that we have to create new arrays and copy values from the old array to the new one if we want arrays of different shapes -- the size is fixed, but the shape is not. NumPy provides a `reshape` method that allows us to obtain a view of an array with a different shape. 

For example, we can reshape a one-dimensional array into a two-dimensional one using `reshape` as follows:

In [None]:
ary1d = np.array([1, 2, 3, 4, 5, 6])
ary1d

In [None]:
ary2d_view = ary1d.reshape(2, 3)
ary2d_view
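
Because `reshape` returns a view here, the reshaped array and the original share the same underlying data. The following sketch illustrates the consequence: writing through the view also changes the original, whereas an explicit `.copy()` does not. The name `ary2d_copy` is illustrative.

```python
import numpy as np

ary1d = np.array([1, 2, 3, 4, 5, 6])
ary2d_view = ary1d.reshape(2, 3)

# The reshaped array is a view: both names refer to the same data buffer
print(np.shares_memory(ary1d, ary2d_view))  # True

# Writing through the view therefore changes the original array as well
ary2d_view[0, 0] = 100
print(ary1d[0])  # 100

# An explicit copy is independent of the original
ary2d_copy = ary1d.reshape(2, 3).copy()
ary2d_copy[0, 0] = -1
print(ary1d[0])  # still 100
```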

We need to make sure that the reshaped array has the same number of elements as the original one. However, we do not need to specify the number of elements along every axis; NumPy can infer how many elements to put along an axis if exactly one axis is left unspecified (by using the placeholder `-1`). So, the following code will give us the same result as before:

In [None]:
ary1d.reshape(2, -1)

In [None]:
ary1d.reshape(-1, 3)
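
The `-1` placeholder only works when the remaining dimensions divide the total number of elements evenly. As a small sketch of both cases (the flag `reshape_failed` is illustrative), six elements can be arranged into three rows, but not into four:

```python
import numpy as np

ary1d = np.array([1, 2, 3, 4, 5, 6])

# 6 elements split into 3 rows -> NumPy infers 2 columns
inferred = ary1d.reshape(3, -1)
print(inferred.shape)  # (3, 2)

# 6 elements cannot be split evenly into 4 rows, so NumPy raises a ValueError
reshape_failed = False
try:
    ary1d.reshape(4, -1)
except ValueError as err:
    reshape_failed = True
    print("reshape failed:", err)
```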

We can, of course, also use `reshape` to flatten an array:

In [None]:
ary = np.array([[1, 2, 3],
                [4, 5, 6]])

ary.reshape(-1)
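
As an aside that is not covered in the cells above, NumPy also offers the `ravel` and `flatten` methods for the same task. A brief sketch of how they differ for a simple array like this one: `ravel` returns a view where possible, while `flatten` always returns a copy.

```python
import numpy as np

ary = np.array([[1, 2, 3],
                [4, 5, 6]])

flat_reshape = ary.reshape(-1)  # a view for this contiguous array
flat_ravel = ary.ravel()        # also a view where possible
flat_copy = ary.flatten()       # always a new copy

print(flat_reshape.shape)                # (6,)
print(np.shares_memory(ary, flat_ravel))  # True
print(np.shares_memory(ary, flat_copy))   # False
```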

## Why use NumPy?

In the following two code cells, we compare multiplying two matrices, first using NumPy and then using nested `for` loops. You should see a significant time difference. Many machine learning applications require repeated mathematical operations over arrays, so this difference in speed is one of the main reasons to use the NumPy package, as this is what it is optimized for. 

In [None]:
# Multiplying two matrices using NumPy
# code in this cell adapted from https://gist.github.com/markus-beuckelmann/8bc25531b11158431a5b09a45abd6276
from time import time

# Let's take the randomness out of random numbers (for reproducibility)
np.random.seed(0)

size = 128
A, B = np.random.random((size, size)), np.random.random((size, size))

N = 20 # do the matrix multiplication N times and then take the average to get better estimate of time

t = time() # store the start time

for i in range(N):
    np.dot(A, B)
    
delta = time() - t # duration: end time minus the start time

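# divide 'delta' by 'N' to get the average time for one matrix multiplication;
# six decimal places are shown because a single NumPy product typically takes far less than 0.01 s
print('Multiplied two %dx%d matrices in %0.6f s.' % (size, size, delta / N))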

In [None]:
# Multiplying two matrices using nested loops
# code in this cell adapted from https://www.programiz.com/python-programming/examples/multiply-matrix

t = time()

for trial in range(N):
    result = np.zeros((size, size)) # reset the accumulator so each trial computes one product
    for i in range(len(A)):
        for j in range(len(B[0])):
            for k in range(len(B)):
                result[i][j] += A[i][k] * B[k][j]

delta = time() - t
print('Multiplied two %dx%d matrices in %0.2f s.' % (size, size, delta / N))

del A, B
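
As a complement to the timing cells above, the sketch below checks that the nested-loop product and `np.dot` give the same result, and times both with the standard-library `timeit` module. The helper `matmul_loops`, the seeded generator `rng`, and the reduced matrix size of 16 are assumptions made for this illustration so that it runs quickly; they are not part of the original notebook.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)  # seeded generator for reproducibility
size = 16                       # small on purpose so the loop version finishes quickly
A = rng.random((size, size))
B = rng.random((size, size))

def matmul_loops(X, Y):
    """Multiply two 2-D arrays with explicit nested loops."""
    out = np.zeros((X.shape[0], Y.shape[1]))
    for i in range(X.shape[0]):
        for j in range(Y.shape[1]):
            for k in range(Y.shape[0]):
                out[i, j] += X[i, k] * Y[k, j]
    return out

# Both approaches should agree up to floating-point rounding
print(np.allclose(matmul_loops(A, B), np.dot(A, B)))

# timeit runs each callable `number` times and returns the total time in seconds
t_numpy = timeit.timeit(lambda: np.dot(A, B), number=20) / 20
t_loops = timeit.timeit(lambda: matmul_loops(A, B), number=20) / 20
print("NumPy: %.6f s, loops: %.6f s" % (t_numpy, t_loops))
```

The exact timings depend on the machine, so no particular numbers are shown here.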