# Exercise 2.1 - Numpy

In [1]:
import numpy as np
import matplotlib.pyplot as plt

# Getting Indices and Masking

You can get positions (indices) where elements of two different arrays match using np.where:

In [49]:
a = np.array([1,2,3,2,3,4,3,4,5,6])
b = np.array([7,2,10,2,7,4,9,4,9,8])
np.where(a == b)[0]

array([1, 3, 5, 7])

We could also create a boolean mask from the comparison of interest and use it to index either array.

In [50]:
mask = a == b
mask
# a[mask]

array([False,  True, False,  True, False,  True, False,  True, False,
       False])
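As a minimal sketch of how such a mask might be used, the snippet below indexes both arrays with the same mask. Because the mask marks the positions where the arrays agree, both selections contain the same values. The names `matches_a` and `matches_b` are illustrative only.

```python
import numpy as np

a = np.array([1, 2, 3, 2, 3, 4, 3, 4, 5, 6])
b = np.array([7, 2, 10, 2, 7, 4, 9, 4, 9, 8])

mask = a == b          # boolean array, True where the elements agree
matches_a = a[mask]    # values of a at the matching positions
matches_b = b[mask]    # values of b at the same positions

print(matches_a)       # [2 2 4 4]
print(mask.sum())      # 4, since True counts as 1 when summed
```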

This type of boolean logic is very powerful. By passing two additional arguments to np.where, you can apply one operation to the elements that meet a criterion and a different one to those that do not:

In [51]:
a = np.arange(10)
np.where(a < 5, a, 10*a)

array([ 0,  1,  2,  3,  4, 50, 60, 70, 80, 90])

The syntax for np.where is np.where(condition, return if True, return if False)
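To contrast the two call styles seen so far, here is a small sketch. With a single argument, np.where returns a tuple of index arrays (one per dimension), which is why the earlier example took element `[0]`. With three arguments it chooses elementwise between two alternatives. The variable names are illustrative.

```python
import numpy as np

a = np.arange(10)

# One argument: a tuple of index arrays, one entry per dimension.
idx = np.where(a < 5)
print(idx[0])                      # [0 1 2 3 4]

# Three arguments: keep a where the condition holds, otherwise use -1.
replaced = np.where(a < 5, a, -1)
print(replaced)                    # [ 0  1  2  3  4 -1 -1 -1 -1 -1]
```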

# Exercises 2.1.1. Filtering Numpy Arrays

Extract all odd numbers from arr (answer: array([1, 3, 5, 7, 9])). Can you do it using both a boolean mask and slicing?

In [52]:
arr = np.arange(10)

mask = arr % 2 == 1
arr[mask]
arr[1::2]

array([1, 3, 5, 7, 9])

Replace all odd numbers in arr with -1

In [53]:
arr = np.arange(10)

arr[arr % 2 == 1] = -1
arr

array([ 0, -1,  2, -1,  4, -1,  6, -1,  8, -1])

Replace all odd numbers in arr with -1 without changing arr (Hint: use np.where)

In [54]:
arr = np.arange(10)

out = np.where(arr % 2 == 1, -1, arr)
out

array([ 0, -1,  2, -1,  4, -1,  6, -1,  8, -1])

Get all items between 5 and 10 (inclusive) from a. Can you do it in three ways? Use the & operator or the np.logical_and() function.

In [55]:
a = np.array([2, 6, 1, 9, 10, 3, 27])

# Method 1
index = np.where((a >= 5) & (a <= 10))
a[index]

# Method 2
index = np.where(np.logical_and(a>=5, a<=10))
a[index]

# Method 3
a[(a >= 5) & (a <= 10)]

array([ 6,  9, 10])

Convert a 1D array to a 2D array with 2 rows. (Hint: Read about the np.reshape function with np.reshape?)

Solution should look like: array([[0, 1, 2, 3, 4],
                [5, 6, 7, 8, 9]])

In [56]:
a = np.arange(10)

np.reshape(a,(2,5))
# np.reshape(a,(2,-1)) # Setting -1 automatically decides the number of columns

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

# Matrices in Numpy

We can also define matrices (two-dimensional tensors)

In [8]:
A = np.array([[1,2],[2,1]], dtype=np.float64)
B = np.array([[5,6],[7,8]], dtype=np.float64)

In [9]:
A = np.array([[1,2],[2,1]])
B = np.array([[5,6],[7,8]])

In [10]:
z = np.zeros((4,4)); print(z)

[[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]


In [11]:
o = np.ones((2,5)); print(o)

[[1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]]


We could create the same matrix using the reshape function if we wanted to.

In [12]:
np.ones(10).reshape(2,5)

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

A numpy array has a set of properties that we can query:

In [13]:
A.ndim

2

In [14]:
A.shape

(2, 2)

In [15]:
A.size

4

In [16]:
A.dtype

dtype('int64')

# Combining arrays and matrices

Concatenation allows you to merge two or more arrays whose shapes match in every dimension except the one being joined.

In [17]:
a1 = np.arange(9).reshape((3,3))
a1

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [18]:
a2 = np.arange(10,19).reshape((3,3))
a2

array([[10, 11, 12],
       [13, 14, 15],
       [16, 17, 18]])

In [19]:
np.concatenate((a1, a2))

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [10, 11, 12],
       [13, 14, 15],
       [16, 17, 18]])

By default, concatenation happens on axis = 0, or along the rows, but we can set it to columns:

In [20]:
np.concatenate((a1, a2), axis=1)

array([[ 0,  1,  2, 10, 11, 12],
       [ 3,  4,  5, 13, 14, 15],
       [ 6,  7,  8, 16, 17, 18]])
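Strictly speaking, np.concatenate only needs the arrays to agree in every dimension other than the one being joined. The sketch below uses two made-up arrays, `tall` and `short`, to show one case that works and one that raises an error.

```python
import numpy as np

tall = np.zeros((3, 3))
short = np.ones((1, 3))      # fewer rows, same number of columns

# Works: only the length along axis 0 differs.
stacked = np.concatenate((tall, short), axis=0)
print(stacked.shape)         # (4, 3)

# Fails: joining along axis 1 requires equal row counts (3 vs 1 here).
try:
    np.concatenate((tall, short), axis=1)
except ValueError as err:
    print("ValueError:", err)
```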

There are two other useful functions that stack vertically (vstack) or horizontally (hstack):

In [21]:
np.vstack((a1, a2, a1))

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [10, 11, 12],
       [13, 14, 15],
       [16, 17, 18],
       [ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8]])

In [22]:
np.hstack((a2, a1, a2))

array([[10, 11, 12,  0,  1,  2, 10, 11, 12],
       [13, 14, 15,  3,  4,  5, 13, 14, 15],
       [16, 17, 18,  6,  7,  8, 16, 17, 18]])

Swapping columns can be done by indexing with a list of column positions. The commented-out line below would swap the first and second columns:

In [23]:
q = np.arange(9).reshape(3,3)
q
# q[:, [1, 0, 2]]

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
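For reference, here is a short sketch of what the column swap produces. Indexing with a list returns a new array, so `q` itself is unchanged unless the result is assigned back. The name `swapped` is illustrative.

```python
import numpy as np

q = np.arange(9).reshape(3, 3)

# Select columns in the order 1, 0, 2; this returns a copy.
swapped = q[:, [1, 0, 2]]
print(swapped)
# [[1 0 2]
#  [4 3 5]
#  [7 6 8]]

# To swap in place, assign the reordered columns back into q.
q[:, [0, 1]] = q[:, [1, 0]]
```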

Now we can perform mathematical operations on matrices A and B

In [24]:
print(A); print("\n"); print(B)

[[1 2]
 [2 1]]


[[5 6]
 [7 8]]


In [25]:
print(A+B)

[[6 8]
 [9 9]]


In [26]:
print(np.add(A,B))

[[6 8]
 [9 9]]


In [27]:
print(A*B)

[[ 5 12]
 [14  8]]


In [28]:
print(A @ B)
print(np.dot(A,B)) # The @ operator for matrix multiplication was added in Python 3.5; np.dot() gives the same result for 2D arrays.

[[19 22]
 [17 20]]
[[19 22]
 [17 20]]
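The difference between `*` and `@` is easy to miss, so here is a small sketch that spells it out: `*` multiplies matching entries, while `@` takes rows of A dotted with columns of B. The names `elementwise`, `matmul`, and `entry_00` are illustrative.

```python
import numpy as np

A = np.array([[1, 2], [2, 1]])
B = np.array([[5, 6], [7, 8]])

elementwise = A * B      # entry-by-entry product
matmul = A @ B           # matrix product

# Entry (0, 0) of the matrix product: first row of A dot first column of B.
entry_00 = A[0, 0] * B[0, 0] + A[0, 1] * B[1, 0]   # 1*5 + 2*7 = 19
print(entry_00 == matmul[0, 0])                     # True
print(np.array_equal(matmul, np.dot(A, B)))         # True
```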


# The zip() function

Python's zip() function takes iterables as arguments and returns an **iterator**.  The iterator generates a series of tuples, each containing one element from every iterable.  It accepts iterables of multiple types, including files, lists, tuples, arrays, etc.

The zip() function becomes very powerful when using it to iterate over multiple arrays, especially when used in combination with enumerate().

In [78]:
a3 = np.arange(1,10)
a4 = np.arange(11,20)
for i3, i4 in zip(a3,a4):
    print(i3,i4)

1 11
2 12
3 13
4 14
5 15
6 16
7 17
8 18
9 19


enumerate() provides an index for all of the elements that are looped (iterated) over.

In [79]:
for i, (i3, i4) in enumerate(zip(a3,a4)):
    print(i, i3,i4)

0 1 11
1 2 12
2 3 13
3 4 14
4 5 15
5 6 16
6 7 17
7 8 18
8 9 19
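Two details of zip() are worth a quick sketch: it stops at the shortest input, and the iterator it returns can only be consumed once. The lists `short` and `long` below are made up for illustration.

```python
short = [1, 2, 3]
long = ["a", "b", "c", "d", "e"]

pairs = zip(short, long)     # an iterator; no tuples are built yet
first_pass = list(pairs)
second_pass = list(pairs)    # the iterator is already exhausted

print(first_pass)            # [(1, 'a'), (2, 'b'), (3, 'c')]
print(second_pass)           # []

# enumerate can start counting from a value other than 0.
for i, (num, letter) in enumerate(zip(short, long), start=1):
    print(i, num, letter)
```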


# Exercises 2.1.2. Creating Arrays

Create the following pattern from the basic array a = np.array([1,2,3]) without hardcoding the result:

    array([1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3])

**Hint:** You need to use the functions np.tile() and np.repeat()

In [80]:
a = np.array([1,2,3])

np.concatenate( (np.repeat(a,3), np.tile(a,3)) )

array([1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3])

Get the common items between a and b using the np.intersect1d() function

In [81]:
a = np.array([1,2,3,2,3,4,3,4,5,6])
b = np.array([7,2,10,2,7,4,9,4,9,8])

np.intersect1d(a,b)

array([2, 4])

Find the unique values contained in a but not b.  Can you do it two different ways using either np.setdiff1d() or np.in1d()?

In [82]:
a = np.array([1,2,3,4,5])
b = np.array([5,6,7,8,9])

# a[~np.in1d(a,b)]
np.setdiff1d(a,b)

array([1, 2, 3, 4])
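As a sketch of the mask-based route, the snippet below uses np.isin, which newer NumPy releases recommend in place of np.in1d. Note one difference between the two approaches: np.setdiff1d returns sorted unique values, whereas the mask keeps the original order and any duplicates in `a`. The names `in_b` and `only_in_a` are illustrative.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])
b = np.array([5, 6, 7, 8, 9])

in_b = np.isin(a, b)         # True where an element of a also appears in b
only_in_a = a[~in_b]         # invert the mask to keep everything else

print(only_in_a)                                       # [1 2 3 4]
print(np.array_equal(only_in_a, np.setdiff1d(a, b)))   # True
```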

Swap the second and third rows of this matrix

In [83]:
w = np.arange(9).reshape(3,3)

w[[0,2,1],:]

array([[0, 1, 2],
       [6, 7, 8],
       [3, 4, 5]])

The function maxx accepts two scalars and returns the maximum value.  Write a new function called pair_max() that accepts two arrays and returns an array containing the elementwise maximum.

**HINT:** Use the zip() function!

In [84]:
def maxx(x, y):
    """Get the maximum of two items"""
    if x >= y:
        return x
    else:
        return y

maxx(1, 5)

5

In [3]:
import numpy as np

In [4]:
a = np.array([5, 7, 9, 8, 6, 4, 5])
b = np.array([6, 3, 4, 8, 9, 7, 1])

def pair_max(x, y):
    z=[]
    for xi, yi in zip(x,y):
        if xi >= yi:
            z.append(xi)
        else:
            z.append(yi)
    return np.array(z, dtype=float)

pair_max(a, b)
#Solution: array([ 6.,  7.,  9.,  8.,  9.,  7.,  5.])

array([6., 7., 9., 8., 9., 7., 5.])
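The loop with zip() is the point of the exercise, but for comparison, NumPy can compute the same result without an explicit loop. This sketch uses np.maximum and, as a second option, the three-argument np.where from earlier. The names `vectorized` and `via_where` are illustrative.

```python
import numpy as np

a = np.array([5, 7, 9, 8, 6, 4, 5])
b = np.array([6, 3, 4, 8, 9, 7, 1])

# Elementwise maximum, converted to float to match pair_max's output dtype.
vectorized = np.maximum(a, b).astype(float)
print(vectorized)                     # [6. 7. 9. 8. 9. 7. 5.]

# The same selection expressed with np.where.
via_where = np.where(a >= b, a, b)
print(np.array_equal(vectorized, via_where))   # True
```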

# Matrix Equations

As an example, take 

$$ A = \begin{pmatrix} 1 & 2 \\2 & 1 \end{pmatrix}$$

and

$$ {\bf x} = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$$

so that 

$$ A {\bf x} =\begin{pmatrix} 1 & 2 \\2 & 1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}= \begin{pmatrix} x_1 + 2 x_2\\ 2 x_1 + x_2 \end{pmatrix}$$

We can now solve: $$ A {\bf x} = {\bf b}$$

for instance, $$ \begin{pmatrix} x_1 + 2 x_2\\ 2 x_1 + x_2 \end{pmatrix} = \begin{pmatrix}2\\1\end{pmatrix}$$

In Python, we do this using NumPy's linear algebra submodule, np.linalg.

In [86]:
A = np.array([[1, 2], [2, 1]])

In [87]:
print(A)

[[1 2]
 [2 1]]


In [88]:
np.linalg.solve(A,np.array([2,1]))

array([0., 1.])

We can do this explicitly, step-by-step, by first forming the inverse of $A$.

In [89]:
np.linalg.inv(A)

array([[-0.33333333,  0.66666667],
       [ 0.66666667, -0.33333333]])

In [90]:
b = np.array([2,1])

And now we take $$ A^{-1} {\bf b}$$

In [91]:
np.dot(np.linalg.inv(A),b)

array([0., 1.])
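Both routes can be checked by substituting the solution back into the equation. The sketch below does that; the names `x_solve` and `x_inv` are illustrative. In practice np.linalg.solve is generally preferred over forming the inverse explicitly, since it avoids an extra step and tends to be more accurate.

```python
import numpy as np

A = np.array([[1, 2], [2, 1]])
b = np.array([2, 1])

x_solve = np.linalg.solve(A, b)      # solves A x = b directly
x_inv = np.linalg.inv(A) @ b         # same system via the explicit inverse

print(np.allclose(A @ x_solve, b))   # True: the solution satisfies A x = b
print(np.allclose(x_solve, x_inv))   # True: both routes agree
```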

<h3>Eigenvalue equation:</h3> $$ A {\bf v} = \lambda {\bf v}$$

where $A$ is a square matrix, ${\bf v}$ a nonzero vector, and $\lambda$ a scalar value (possibly complex-valued).

If we arrange all the vectors ${\bf v}$ that satisfy the eigenvalue equation as the columns of a new matrix $V$, then

$$
A V = V  
  \begin{pmatrix}
    \lambda_{1} & 0 & 0\\
    0 & \ddots & 0  \\
    0 &  0 & \lambda_{n}
  \end{pmatrix}
$$

where we have a diagonal matrix 
$$
\Lambda= \begin{pmatrix}
    \lambda_{1} & 0  & 0 \\
    0 & \ddots & 0  \\
    0 &  0 & \lambda_{n}
  \end{pmatrix}
$$

If $V$ is invertible, we can now multiply by $V^{-1}$ from the left:

$$V^{-1} A V= \Lambda$$

This is called diagonalizing a matrix. 
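The relation $V^{-1} A V = \Lambda$ can be checked numerically. This is a minimal sketch for the 2x2 matrix $A$ used above; `V` and `Lambda` are names chosen here to mirror the symbols in the equations.

```python
import numpy as np

A = np.array([[1, 2], [2, 1]])
evals, V = np.linalg.eig(A)           # columns of V are the eigenvectors

Lambda = np.linalg.inv(V) @ A @ V     # should be diagonal, up to round-off

print(np.allclose(Lambda, np.diag(evals)))      # True
print(np.allclose(A @ V, V @ np.diag(evals)))   # True: A V = V Lambda
```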

In [92]:
evals, evecs = np.linalg.eig(A)

In [93]:
print(evals)

[ 3. -1.]


The eigenvectors are the columns of the following matrix

In [94]:
print(evecs)

[[ 0.70710678 -0.70710678]
 [ 0.70710678  0.70710678]]


In [95]:
print(B)

[[5 6]
 [7 8]]


Indexing of a matrix should be done as matrix[rows, columns].

In [96]:
B[:,0] # Gives us all rows, first column

array([5, 7])

In [97]:
B[1,:] # Gives us second row, all columns

array([7, 8])

We need to address the first eigenvector as follows

In [98]:
evecs[:,0]

array([0.70710678, 0.70710678])

and not as 

In [99]:
evecs[0]

array([ 0.70710678, -0.70710678])

which refers to the <b>first</b> row, not the first column of the eigenvector matrix. 

Let us show that the computed vectors and values satisfy the eigenvalue equation: $$ A {\bf v} = \lambda {\bf v}$$

In [100]:
evecs[:,1]

array([-0.70710678,  0.70710678])

In [101]:
np.dot(A, evecs[:,1])/evals[1] 

array([-0.70710678,  0.70710678])

In [102]:
np.dot(A, evecs[:,1]) - evals[1] * evecs[:,1]

array([7.77156117e-16, 1.11022302e-16])

The function np.allclose() checks whether two arrays are element-wise equal within a small tolerance, which makes it suitable for comparing floating-point results that differ only by round-off:

In [103]:
np.allclose(np.dot(A, evecs[:,1]), evals[1] *evecs[:,1])

True

In [104]:
np.allclose(np.dot(A, evecs[:,1]), np.array([1,2]))

False
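The tolerance used by np.allclose is controlled by its `rtol` and `atol` arguments (defaults 1e-05 and 1e-08). The sketch below, using made-up arrays `x` and `y`, shows how an exact comparison fails where np.allclose succeeds, and how tightening the tolerances changes the answer.

```python
import numpy as np

x = np.array([1.0, 2.0])
y = x + 1e-9                     # tiny perturbation

print(x == y)                    # [False False]: exact comparison fails
print(np.allclose(x, y))         # True with the default tolerances

# With a much stricter absolute tolerance the same comparison fails.
print(np.allclose(x, y, rtol=0, atol=1e-12))   # False
```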

This final part demonstrates one practical use of numpy.  Principal Component Analysis, a technique common in data analysis of all types, can be performed by finding the eigenvectors and eigenvalues of a data set's covariance matrix.
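To make the connection to Principal Component Analysis concrete, here is a rough sketch on synthetic data. Everything in it is an assumption for illustration: the data are random, and np.linalg.eigh is used in place of np.linalg.eig because a covariance matrix is symmetric. It is not a full PCA implementation.

```python
import numpy as np

rng = np.random.default_rng(0)           # fixed seed, synthetic data only
t = rng.normal(size=200)
noise = 0.1 * rng.normal(size=200)
data = np.column_stack((t, 2 * t + noise))   # shape (samples, features)

centered = data - data.mean(axis=0)      # PCA works on mean-centered data
cov = np.cov(centered, rowvar=False)     # 2x2 covariance matrix

# eigh targets symmetric matrices and returns eigenvalues in ascending order.
evals, evecs = np.linalg.eigh(cov)
order = np.argsort(evals)[::-1]          # largest variance first
evals, evecs = evals[order], evecs[:, order]

explained = evals / evals.sum()          # fraction of variance per component
projected = centered @ evecs             # data expressed in the principal axes
print(explained)
```

Because the second feature is roughly twice the first, nearly all of the variance should fall on the first component, whose direction is close to (1, 2) up to sign and normalization.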

# Exercises 2.1.3. Linear Algebra

Use matrix F for the exercises below.

In [55]:
F = np.array([[67, 23, 94, 50], [12, 26, 49, 60], [27, 54, 19, 42], [29, 40, 37, 24]])
print(F)

[[67 23 94 50]
 [12 26 49 60]
 [27 54 19 42]
 [29 40 37 24]]


Use Numpy to calculate the sum of the diagonal elements (the trace) of F.

In [50]:
np.trace(F)
# np.diag(F).sum()

136

Use Numpy to get the QR decomposition of matrix F.

In linear algebra, a QR decomposition, also known as a QR factorization or QU factorization, is a decomposition of a matrix A into a product A = QR of an orthogonal matrix Q and an upper triangular matrix R. Check your answer by computing the dot product of the resulting matrices.

In [51]:
print("Original array:")
print(F)
q, r = np.linalg.qr(F)

Original array:
[[67 23 94 50]
 [12 26 49 60]
 [27 54 19 42]
 [29 40 37 24]]


In [52]:
q @ r

array([[67., 23., 94., 50.],
       [12., 26., 49., 60.],
       [27., 54., 19., 42.],
       [29., 40., 37., 24.]])
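Beyond reconstructing F, we can also check the two defining properties of the factors. This sketch verifies that the columns of `q` are orthonormal and that `r` is upper triangular.

```python
import numpy as np

F = np.array([[67, 23, 94, 50], [12, 26, 49, 60], [27, 54, 19, 42], [29, 40, 37, 24]])
q, r = np.linalg.qr(F)

print(np.allclose(q.T @ q, np.eye(4)))   # True: q is orthogonal
print(np.allclose(r, np.triu(r)))        # True: r is upper triangular
print(np.allclose(q @ r, F))             # True: the product recovers F
```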

Diagonalize matrix F. Use both matrix multiplication and np.allclose to show that the computed vectors and values satisfy the eigenvalue equation:

$$ F {\bf v} = \lambda {\bf v}$$

In [53]:
Fvals, Fvecs = np.linalg.eig(F)

In [57]:
F @ Fvecs[:,1]

array([-26.63567193,  13.70094327,   6.28862982,   1.2489923 ])

In [61]:
Fvals[1] * Fvecs[:,1]

array([-26.63567193,  13.70094327,   6.28862982,   1.2489923 ])

In [63]:
np.allclose(F @ Fvecs[:,0], Fvals[0] * Fvecs[:,0])

True
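The cells above check one eigenpair at a time. As a sketch, all of them can be checked in a single comparison: multiplying `Fvecs` by `Fvals` broadcasts across columns, so column j is scaled by eigenvalue j, which is the right-hand side of $F V = V \Lambda$. The name `all_pairs_ok` is illustrative.

```python
import numpy as np

F = np.array([[67, 23, 94, 50], [12, 26, 49, 60], [27, 54, 19, 42], [29, 40, 37, 24]])
Fvals, Fvecs = np.linalg.eig(F)

# Broadcasting scales column j of Fvecs by Fvals[j].
all_pairs_ok = np.allclose(F @ Fvecs, Fvecs * Fvals)
print(all_pairs_ok)                                      # True

# Equivalent form using an explicit diagonal matrix.
print(np.allclose(F @ Fvecs, Fvecs @ np.diag(Fvals)))    # True
```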