# NumPy Learning

NumPy is implemented largely in C, with Python serving as the glue layer, which makes it very fast for numerical computation. NumPy is built around ndarray objects, which are high-performance multidimensional array data structures. Unless NumPy is recompiled from modified source, arrays are limited to a fixed maximum number of dimensions (32 in NumPy 1.x, 64 as of NumPy 2.0).

In [83]:
import numpy as np
lst = [[1, 2, 3], [4, 5, 6]]
ary2d = np.array(lst)  # a nested list produces a two-dimensional array
ary2d

array([[1, 2, 3],
       [4, 5, 6]])

NumPy infers the element type of an array upon construction. If a specific type is needed, it can be passed through the dtype argument:

In [84]:
ary2d = np.array([[1, 2, 3], 
                  [4, 5, 6]], dtype='int64')
ary2d.itemsize # size of one element in bytes

8
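As a minimal sketch of how dtype inference behaves (the variable names below are illustrative and not part of the original notebook): integer input yields an integer dtype whose width depends on the platform, a single float promotes the whole array to floating point, and astype converts an existing array.

```python
import numpy as np

ints = np.array([1, 2, 3])          # inferred as a platform-dependent integer type
floats = np.array([1, 2.5, 3])      # one float promotes every element to float64
as_f32 = ints.astype(np.float32)    # astype returns a converted copy

print(ints.dtype, floats.dtype, as_f32.dtype)
print(as_f32.itemsize)              # 4 bytes per float32 element
```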

In [85]:
# Number of elements in an array
ary2d.size # the flat size of the array (2 sets of 3)

6

In [86]:
# Number of dimensions
ary2d.ndim

2

In [87]:
# Shape of the array (essentially a matrix when applicable)
ary2d.shape # returned as a tuple (number of rows, number of columns)

(2, 3)

In [88]:
# shape of a single dimensional array will contain only 1 value
np.array([1, 2, 3]).shape

(3,)

## Array Construction

The array function works with most iterables in Python, including lists, tuples, and range objects; however, array does not consume generators (it wraps the generator object in a zero-dimensional object array instead of iterating over it). If we want to build an array from a generator directly, we can use the fromiter function as demonstrated below:

In [89]:
def oddIteratorGenerator():
    for i in range(10):
        if i % 2:
            yield i

gen = oddIteratorGenerator()
np.fromiter(gen, dtype=int) # dtype is required 

array([1, 3, 5, 7, 9])

In [90]:
# shorthand using a generator expression rather than a generator function
generator_expression = (i for i in range(10) if i % 2)
np.fromiter(generator_expression, dtype=int)

array([1, 3, 5, 7, 9])
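To make the difference between array and fromiter concrete, here is a small sketch (the names wrapped and odds are illustrative). It also passes the optional count argument to fromiter, which, assuming the number of items is known in advance, lets NumPy allocate the output once.

```python
import numpy as np

# np.array does not iterate over a generator; it stores the generator object itself.
wrapped = np.array(i for i in range(10) if i % 2)
print(wrapped.shape, wrapped.dtype)   # a 0-dimensional array of dtype object

# np.fromiter consumes the generator; count is optional and enables preallocation.
odds = np.fromiter((i for i in range(10) if i % 2), dtype=int, count=5)
print(odds)
```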

In [91]:
# create ndarrays of all ones or zeros when specifying size
# this will be really useful for lin alg routines
np.ones((3,3)) # param is a tuple (3,3)

array([[ 1.,  1.,  1.],
       [ 1.,  1.,  1.],
       [ 1.,  1.,  1.]])

In [92]:
np.zeros((3,3))

array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])

In [93]:
# another foundational thing in lin alg is the identity matrix
# identity matrix is always n x n
np.eye(3)

array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.]])

In [94]:
# create a square matrix with the given values on its diagonal
# adding more elements to the tuple results in a larger n x n matrix
np.diag((3,4,3))

array([[3, 0, 0],
       [0, 4, 0],
       [0, 0, 3]])
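As a brief sketch of two related behaviours (variable names are illustrative): np.diag extracts the diagonal when it is given a two-dimensional array, and np.eye accepts an offset k that shifts the diagonal of ones.

```python
import numpy as np

d = np.diag((3, 4, 3))    # 1-D input: build a 3x3 diagonal matrix
values = np.diag(d)       # 2-D input: extract the main diagonal again
upper = np.eye(3, k=1)    # k=1 places the ones on the first superdiagonal

print(values)
print(upper)
```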

## Array Indexing

The following cells show the basics of accessing the elements of a NumPy array.

In [95]:
ary = np.array([1, 2, 3])
ary[0]

1

Slicing an array works like slicing a Python list. The syntax is [start:stop], written inside square brackets with the two indices separated by a colon; the element at index stop is excluded.

In [96]:
ary[:2] # equivalent to ary[0:2]

array([1, 2])

Accessing values in a multidimensional array:

In [None]:
ary2d = np.array([[1, 2, 3],
                 [4, 5, 6]])
# top left
ary2d[0,0]

In [None]:
# lower right, useful when the shape is not known in advance
ary2d[-1,-1] # == ary2d[1,2] for this 2 x 3 array

In [None]:
ary2d[0,1] # first row second column

Accessing an entire row:

In [None]:
ary2d[0]

Accessing an entire column:

In [None]:
ary2d[:, 0] # get the entire first column

In [None]:
ary2d[:,:2] # access the first two columns
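Since the indexing cells above were not executed, the following sketch collects them with their results. It also illustrates one point the cells do not show: basic slices are views, so writing through a slice modifies the original array (the name first_col is illustrative).

```python
import numpy as np

ary2d = np.array([[1, 2, 3],
                  [4, 5, 6]])

print(ary2d[0, 0])      # 1, top left
print(ary2d[-1, -1])    # 6, lower right (same element as ary2d[1, 2])
print(ary2d[0])         # [1 2 3], the first row
print(ary2d[:, 0])      # [1 4], the first column
print(ary2d[:, :2])     # the first two columns, shape (2, 2)

# Basic slices are views: writing through the slice changes the original array.
first_col = ary2d[:, 0]
first_col[0] = 99
print(ary2d[0, 0])      # 99
```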

## Array Math

In [None]:
ary = np.array([[1,2,3],
              [4,5,6]])

Element-wise scalar addition: NumPy has functions for addition, subtraction, division, multiplication, and exponentiation (np.add, np.subtract, np.divide, np.multiply, and np.power). It also overloads the corresponding mathematical operators +, -, /, *, and **.

In [None]:
np.add(ary,1)

In [None]:
ary + 1

In [None]:
ary ** 2
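A short sketch of the correspondence between operators and functions (the names same_add, same_pow, and product are illustrative). It also shows that * between two arrays of equal shape multiplies element by element rather than performing matrix multiplication.

```python
import numpy as np

ary = np.array([[1, 2, 3],
                [4, 5, 6]])

# Each operator dispatches to the corresponding NumPy function.
same_add = np.array_equal(ary + 1, np.add(ary, 1))
same_pow = np.array_equal(ary ** 2, np.power(ary, 2))
print(same_add, same_pow)

# Between two arrays of equal shape, * works element by element.
product = ary * ary
print(product)
```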

Computations along an axis (over columns, rows, and so on) can be expressed by applying the reduce method of a numerical operation such as np.add:

In [None]:
# without an explicit axis, reduce uses axis=0 and adds down each column
np.add.reduce(ary)

In [None]:
# specifying axis=1 sums across each row
np.add.reduce(ary, axis=1)

In [None]:
# method shorthand for common reductions, e.g. sum (np.add.reduce) and prod (np.multiply.reduce)
ary.sum(axis=0) # sum down the columns

In [None]:
# without an axis, sum adds up every element of the array
ary.sum()
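The reduction cells above were not executed, so here is a sketch with the values they produce for the 2 x 3 example array (the variable names are illustrative).

```python
import numpy as np

ary = np.array([[1, 2, 3],
                [4, 5, 6]])

col_sums = np.add.reduce(ary)           # axis=0 by default -> [5 7 9]
row_sums = np.add.reduce(ary, axis=1)   # -> [ 6 15]
total = ary.sum()                       # no axis: every element -> 21
col_prod = ary.prod(axis=0)             # same as np.multiply.reduce(ary) -> [ 4 10 18]

print(col_sums, row_sums, total, col_prod)
```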

Other useful functions:
- np.mean (computes the arithmetic average)
- np.std (computes the standard deviation)
- np.var (computes the variance)
- np.sort (returns a sorted copy of an array)
- np.argsort (returns the indices that would sort an array)
- np.min (returns the minimum value of an array)
- np.max (returns the maximum value of an array)
- np.argmin (returns the index of the minimum value)
- np.argmax (returns the index of the maximum value)
- np.array_equal (checks if two arrays have the same shape and elements)
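A quick sketch of several of these functions on a small, illustrative array. Note that np.var and np.std compute the population statistics by default (ddof=0).

```python
import numpy as np

data = np.array([3, 1, 2])

print(np.mean(data))                       # 2.0
print(np.var(data))                        # population variance, 2/3
print(np.sort(data))                       # [1 2 3]
print(np.argsort(data))                    # [1 2 0]
print(np.argmin(data), np.argmax(data))    # 1 0

# Indexing with the argsort result reproduces the sorted array.
print(np.array_equal(data[np.argsort(data)], np.sort(data)))   # True
```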

## Reshaping Arrays

Sometimes an array has the wrong shape for the task at hand, or we want to combine two arrays into a single row or into multiple rows.

In [None]:
# turn a one dimensional array into a two dimensional array
ary1d = np.array([1, 2, 3, 4, 5, 6])
ary2d_view = ary1d.reshape(2, 3)
ary2d_view

You can specify the length of one axis and let NumPy infer the other by passing -1 as the length of that axis:

In [None]:
ary1d.reshape(2, -1)

In [None]:
ary1d.reshape(-1, 2)

Flattening an array is just as easy: reshape with a single inferred axis.

In [None]:
ary2d = np.array([[1, 2, 3],
                  [4, 5, 6]])

ary2d.reshape(-1)

NumPy also has a shorthand for flattening an array, ravel:

In [None]:
ary2d.ravel()
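The following sketch shows what reshape and ravel return, using illustrative names. It assumes a contiguous array, for which reshape and ravel return views; flatten, by contrast, always returns a copy.

```python
import numpy as np

ary1d = np.array([1, 2, 3, 4, 5, 6])
view = ary1d.reshape(2, -1)      # -1 is inferred as 3

view[0, 0] = 100                 # the reshape is a view, so ary1d changes too
print(ary1d[0])                  # 100

flat_view = view.ravel()         # a view when the memory layout allows it
flat_copy = view.flatten()       # always a copy
print(np.shares_memory(view, flat_view), np.shares_memory(view, flat_copy))
```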

In some cases arrays must be combined. This is not efficient: NumPy arrays have a fixed size, so combining arrays always allocates a new array and copies the data into it. For computational efficiency, combining arrays repeatedly (for example inside a loop) should be avoided. When it is needed, concatenate does the job:

In [None]:
ary = np.array([1,2,3])
# concatenate takes a sequence (here a tuple) of all arrays to be combined
np.concatenate((ary,ary))

In [None]:
# stack along an axis when concatenating
ary2d = np.array([[1, 2, 3]])
np.concatenate((ary2d,ary2d), axis=0)
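A small sketch of how the axis argument changes the result, and of the fact that concatenate returns a new array (the names a, rows, and cols are illustrative).

```python
import numpy as np

a = np.array([[1, 2, 3]])                 # shape (1, 3)

rows = np.concatenate((a, a), axis=0)     # stack as rows     -> shape (2, 3)
cols = np.concatenate((a, a), axis=1)     # join side by side -> shape (1, 6)
print(rows.shape, cols.shape)

# The result is a new array; modifying it leaves the inputs untouched.
rows[0, 0] = -1
print(a[0, 0])                            # still 1
```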

## Linear Algebra with NumPy Arrays

Most of the operations in machine learning and deep learning are based on concepts from linear algebra. In this section, we will take a look at how to perform basic linear algebra operations using NumPy arrays.

NumPy also has a special matrix type. NumPy matrix objects are analogous to NumPy arrays but are restricted to two dimensions, and several operators behave differently on them (for example, * performs matrix multiplication instead of element-wise multiplication). Because of the limit on dimensions, the matrix type is not commonly used in the machine learning and data science community, where many data sets require more than two dimensions; the NumPy documentation itself no longer recommends using it.

Intuitively, we can think of one-dimensional NumPy arrays as data structures that represent row vectors:

In [98]:
row_vector = np.array([1, 2, 3])
row_vector

array([1, 2, 3])

Similarly, we can use two-dimensional arrays to create column vectors:

In [99]:
column_vector = np.array([[1, 2, 3]]).reshape(-1, 1)
column_vector

array([[1],
       [2],
       [3]])

Instead of reshaping a one-dimensional array into a two-dimensional one, we can simply add a new axis as shown below:

In [100]:
row_vector[:, np.newaxis]

array([[1],
       [2],
       [3]])

Note that np.newaxis is simply an alias for None, so None can be used in its place:

In [101]:
row_vector[:, None]

array([[1],
       [2],
       [3]])
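The column-vector constructions shown so far can be compared directly. The sketch below (with illustrative names) checks that all three give the same shape and values and that each shares memory with row_vector, which means each is a view rather than a copy.

```python
import numpy as np

row_vector = np.array([1, 2, 3])

via_reshape = row_vector.reshape(-1, 1)
via_newaxis = row_vector[:, np.newaxis]
via_none = row_vector[:, None]

print(np.newaxis is None)                          # True
print(via_reshape.shape, via_newaxis.shape, via_none.shape)

# All three share memory with row_vector, so none of them is a copy.
print(all(np.shares_memory(row_vector, v)
          for v in (via_reshape, via_newaxis, via_none)))
```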

All three approaches listed above (reshape(-1, 1), np.newaxis, and None) yield the same result, and each creates a view of the underlying array rather than a copy.

As we remember from the Linear Algebra appendix, we can think of a column vector as a matrix consisting of only one column. To perform matrix multiplication between matrices, we learned that the number of columns of the left matrix must match the number of rows of the matrix to the right. In NumPy, we can perform matrix multiplication via the matmul function:

In [102]:
matrix = np.array([[1, 2, 3], 
                   [4, 5, 6]])
np.matmul(matrix, column_vector)

array([[14],
       [32]])

However, NumPy is forgiving when a matrix is multiplied by a one-dimensional array: matmul treats a one-dimensional right-hand operand as a column vector for the multiplication and then removes the added axis from the result. The following example therefore yields the same values as the matrix-column vector multiplication, except that it returns a one-dimensional array instead of a two-dimensional one:

In [104]:
np.matmul(matrix, row_vector)

array([14, 32])
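This leniency only concerns the number of axes; the lengths still have to agree. A minimal sketch (the flag mismatch_raised is illustrative) showing that a vector of the wrong length raises a ValueError:

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])                          # shape (2, 3)

print(np.matmul(matrix, np.array([1, 2, 3])).shape)     # (2,)

try:
    np.matmul(matrix, np.array([1, 2]))   # length 2 does not match the 3 columns
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print(mismatch_raised)                                  # True
```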

Similarly, we can compute the dot product between two vectors (here, the dot product of a vector with itself, which is its squared Euclidean norm):

In [105]:
np.matmul(row_vector, row_vector)

14

NumPy also has a dot function that behaves similarly to matmul on pairs of one- or two-dimensional arrays. Its underlying implementation is different, though, and one or the other can be slightly faster on specific machines and versions of BLAS:

In [106]:
np.dot(row_vector, row_vector)

14

In [107]:
np.dot(matrix, row_vector)

array([14, 32])

In [108]:
np.dot(matrix, column_vector)

array([[14],
       [32]])
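The two functions agree for one- and two-dimensional inputs but diverge for arrays with more axes. As a sketch with illustrative arrays of ones: matmul treats the leading axis as a batch of matrices, whereas dot combines every matrix in the first array with every matrix in the second.

```python
import numpy as np

a = np.ones((2, 3, 4))
b = np.ones((2, 4, 5))

print(np.matmul(a, b).shape)   # (2, 3, 5): the leading axis is treated as a batch
print(np.dot(a, b).shape)      # (2, 3, 2, 5): all pairs of matrices are combined

# The @ operator is shorthand for matmul.
print(np.array_equal(a @ b, np.matmul(a, b)))
```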

Similar to the examples above we can use matmul or dot to multiply two matrices (here: two-dimensional arrays). In this context, NumPy arrays have a handy transpose method to transpose matrices if necessary:

In [109]:
matrix = np.array([[1, 2, 3], 
                   [4, 5, 6]])

matrix.transpose()

array([[1, 4],
       [2, 5],
       [3, 6]])

In [110]:
matrix.T

array([[1, 4],
       [2, 5],
       [3, 6]])

In [111]:
np.matmul(matrix, matrix.transpose())

array([[14, 32],
       [32, 77]])
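The same product can be written more compactly with the @ operator. The sketch below (the name gram is illustrative) also checks two properties that follow from the example: the product of a matrix with its own transpose is symmetric, and .T returns a view rather than a copy.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

gram = matrix @ matrix.T                      # same as np.matmul(matrix, matrix.transpose())
print(gram)

print(np.array_equal(gram, gram.T))           # True: the product is symmetric
print(np.shares_memory(matrix, matrix.T))     # True: transposing does not copy data
```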

## Random Number Generators

In machine learning and deep learning, we often have to generate arrays of random numbers -- for example, the initial values of our model parameters before optimization. NumPy has a random subpackage to create random numbers and samples from a variety of distributions conveniently.

In [None]:
# seed the random number generator; this ensures that the results are reproducible
np.random.seed(123) 
# now draw 3 samples from a uniform distribution over [0, 1)
np.random.rand(3)

We can also use a RandomState object to reproduce the results that we obtained via np.random.rand in the previous code snippet:

In [None]:
rng1 = np.random.RandomState(seed=123)
rng1.rand(3)
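The equivalence of the two seeding styles can be checked directly, as sketched below with illustrative names. For comparison, the sketch also draws from np.random.default_rng, the newer generator interface in NumPy; it uses a different algorithm, so the same seed is not expected to give the same numbers as RandomState.

```python
import numpy as np

np.random.seed(123)
from_global = np.random.rand(3)             # global, seeded generator

rng1 = np.random.RandomState(seed=123)
from_state = rng1.rand(3)                   # independent generator, same seed

print(np.array_equal(from_global, from_state))   # True

# Newer interface: uniform samples over [0, 1) via Generator.random.
rng_new = np.random.default_rng(123)
sample = rng_new.random(3)
print(sample.shape)
```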

Another useful function that we will often use in practice is randn, which returns a random sample of floats from the standard normal distribution $N(\mu, \sigma^2)$ with mean $\mu = 0$ and unit variance ($\sigma^2 = 1$). The example below creates a two-dimensional array of such z-scores:

In [None]:
rng2 = np.random.RandomState(seed=123)
z_scores = rng2.randn(10,2)
z_scores
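Samples from a normal distribution with another mean and standard deviation can be obtained by shifting and scaling z-scores, $x = \mu + \sigma z$. The sketch below uses illustrative values ($\mu = 5$, $\sigma = 2$) and a large sample so that the sample statistics land close to the targets.

```python
import numpy as np

rng = np.random.RandomState(seed=123)
z = rng.randn(100_000)        # standard normal samples (mean 0, variance 1)

mu, sigma = 5.0, 2.0          # illustrative target mean and standard deviation
x = mu + sigma * z            # shift and scale the z-scores

print(x.mean(), x.std())      # close to 5 and 2
```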