Welcome to your assignment about concepts covered in Chapter 4 of *Essential Math for Data Science* by Thomas Nield. You will be using linear algebra.

In this assignment, you will apply the concepts of linear algebra to solve three real-world problems in data science. The problems are of low to moderate difficulty, and you will need to use Python and the NumPy library to implement the necessary computations.

Please read each question carefully and provide detailed explanations for your answers, including any relevant calculations or work. You are also required to provide Python solutions for the technical problems in each question.

# Problem 1

Principal Component Analysis (PCA) is a widely used technique for dimensionality reduction in data science. It rests on linear algebra, which provides the framework for transforming the data, extracting meaningful information from it, and reducing its dimensionality while preserving its essential characteristics. Understanding these linear algebra concepts is essential for grasping both the theory and the implementation of PCA. In this question, you will use PCA to reduce the dimensionality of a dataset of your choice. For example, the first rows of the iris dataset look like this:

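| sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|
| 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 5.0 | 3.6 | 1.4 | 0.2 | setosa |
| ... | ... | ... | ... | ... |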

The dataset contains 150 rows, each representing a flower. The columns are:

- sepal_length: the length of the sepal
- sepal_width: the width of the sepal
- petal_length: the length of the petal
- petal_width: the width of the petal
- species: the species of iris (setosa, versicolor, or virginica)

We will be following these steps:

1. Load the dataset into a NumPy array.
2. Center the data by subtracting the mean from each feature.
3. Compute the covariance matrix of the centered data.
4. Compute the eigendecomposition of the covariance matrix.
5. Choose the top k eigenvectors that correspond to the k largest eigenvalues,
   where k is the desired number of principal components.
6. Transform the centered data using the selected eigenvectors.
7. Reconstruct the original data using the transformed data and the
   eigenvectors.

Write a Python program that implements the above steps and reduces the dimensionality of the dataset. Print the variance explained by the selected principal components for different values of k (e.g., k=2, k=3, k=4). Explain your results in 100 words.

Step 1. Import the necessary libraries. *Nothing to change in the code below*

In [1]:
import numpy as np
from numpy.linalg import eig, inv
from sklearn.datasets import load_iris


Step 2. Load the dataset into a NumPy array. *Nothing to change in the code below*



In [2]:
iris = load_iris()
X = iris.data

Step 3. Center the data by subtracting the mean from each feature. *Nothing to change in the code below*

In [3]:
X_centered = X - np.mean(X,axis=0)
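
As a quick sanity check on the centering step, the sketch below centers only the five sample rows shown earlier (not the full 150-row dataset) and confirms that every column mean becomes zero up to floating-point rounding. The names `X_sample`, `X_sample_centered`, and `column_means` are illustrative and not part of the assignment code.

```python
import numpy as np

# Five sample rows of iris measurements from the table above
# (an assumption for illustration; the assignment uses all 150 rows)
X_sample = np.array([
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [4.7, 3.2, 1.3, 0.2],
    [4.6, 3.1, 1.5, 0.2],
    [5.0, 3.6, 1.4, 0.2],
])

# axis=0 averages down the rows, giving one mean per feature (column)
X_sample_centered = X_sample - np.mean(X_sample, axis=0)

# After centering, each feature should average to (approximately) zero
column_means = X_sample_centered.mean(axis=0)
print(np.allclose(column_means, 0.0))
```

Using `axis=0` matters here: it produces one mean per column, which NumPy then broadcasts across every row during the subtraction.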


Step 4. Compute the covariance matrix of the centered data. *Nothing to change in the code below*


In [4]:
covariance_matrix = np.cov(X_centered, rowvar=False)
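
To see what `np.cov` computes, the covariance matrix of centered data can also be written directly as a matrix product divided by n - 1. The sketch below compares the two on the five sample rows shown earlier; `cov_manual` and `cov_numpy` are illustrative names, and the small sample is an assumption made to keep the example self-contained.

```python
import numpy as np

# Five sample rows from the table above (illustration only)
X_sample = np.array([
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [4.7, 3.2, 1.3, 0.2],
    [4.6, 3.1, 1.5, 0.2],
    [5.0, 3.6, 1.4, 0.2],
])

n_rows = X_sample.shape[0]
X_c = X_sample - X_sample.mean(axis=0)

# Covariance from its definition: (X_c^T X_c) / (n - 1)
cov_manual = X_c.T @ X_c / (n_rows - 1)

# rowvar=False tells NumPy that each column is a variable (feature)
cov_numpy = np.cov(X_c, rowvar=False)

print(np.allclose(cov_manual, cov_numpy))
```

The result is a symmetric 4x4 matrix, one row and one column per feature.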

Step 5. Compute the eigendecomposition of the covariance matrix to obtain the eigenvectors and eigenvalues. *In the code below, enter the function from linalg and the variable name of the covariance matrix. Follow example 4-20 from the book*

In [5]:
eigenvalues, eigenvectors = eig(covariance_matrix)
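
One caveat worth knowing: `eig` does not promise to return eigenvalues in any particular order, so taking "the first k columns" is only safe once the eigenpairs are sorted. The sketch below sorts them from largest to smallest and checks the defining property Cv = λv. The matrix `C` is a hypothetical 2x2 symmetric matrix standing in for a covariance matrix, and `order`, `sorted_values`, and `sorted_vectors` are illustrative names.

```python
import numpy as np
from numpy.linalg import eig

# Hypothetical symmetric matrix standing in for a covariance matrix
C = np.array([[1.0, 0.5],
              [0.5, 4.0]])

values, vectors = eig(C)

# Indices that order the eigenvalues from largest to smallest
order = np.argsort(values)[::-1]
sorted_values = values[order]
sorted_vectors = vectors[:, order]  # reorder the columns to match

# Each column v of sorted_vectors should satisfy C v = lambda v
for lam, v in zip(sorted_values, sorted_vectors.T):
    print(lam, np.allclose(C @ v, lam * v))
```

Eigenvectors are stored as columns, which is why the reordering is applied to the second axis.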

Step 6. Choose the top k eigenvectors that correspond to the k largest eigenvalues. *Nothing to change in the code below*

In [6]:
k = 2
top_k_eigenvectors = eigenvectors[:, :k]

Step 7. Transform the centered data using the selected eigenvectors. *Follow the code in Examples 4-11 and 4-21 to use the correct operator to perform the transformation.*

In [7]:
transformed_data = X_centered @ top_k_eigenvectors
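
The projection has two properties that are easy to check: it has one column per selected component, and those columns are uncorrelated, with variances equal to the selected eigenvalues. The sketch below illustrates this on the five sample rows shown earlier. It swaps in `eigh` (NumPy's routine for symmetric matrices) instead of `eig`, and the names `Z` and `cov_Z` are illustrative; none of this is the assignment's required code.

```python
import numpy as np
from numpy.linalg import eigh

# Five sample rows from the table above (illustration only)
X_sample = np.array([
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [4.7, 3.2, 1.3, 0.2],
    [4.6, 3.1, 1.5, 0.2],
    [5.0, 3.6, 1.4, 0.2],
])
X_c = X_sample - X_sample.mean(axis=0)
C = np.cov(X_c, rowvar=False)

# eigh returns eigenvalues in ascending order, so reverse them
values, vectors = eigh(C)
order = np.argsort(values)[::-1]
values, vectors = values[order], vectors[:, order]

k = 2
Z = X_c @ vectors[:, :k]  # project onto the top k eigenvectors
print(Z.shape)            # one row per sample, one column per component

# The covariance of the projected data is diagonal:
# the top k eigenvalues on the diagonal, zeros elsewhere
cov_Z = np.cov(Z, rowvar=False)
print(np.allclose(cov_Z, np.diag(values[:k])))
```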

Step 8. Reconstruct the original data using the transformed data and the eigenvectors. *Nothing to change in the code below*

In [8]:
reconstructed_data = transformed_data @ top_k_eigenvectors.T + np.mean(X, axis=0)
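
A reconstruction is only as good as the components kept, so it can help to measure how far it lands from the original. The sketch below, again on the five sample rows shown earlier and using `eigh` in place of `eig`, records the reconstruction error for k = 1 through 4. The `errors` list and the use of the Frobenius norm are illustrative choices, not part of the assignment.

```python
import numpy as np
from numpy.linalg import eigh

# Five sample rows from the table above (illustration only)
X_sample = np.array([
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [4.7, 3.2, 1.3, 0.2],
    [4.6, 3.1, 1.5, 0.2],
    [5.0, 3.6, 1.4, 0.2],
])
mean = X_sample.mean(axis=0)
X_c = X_sample - mean

# Eigenvectors of the covariance matrix, sorted by descending eigenvalue
values, vectors = eigh(np.cov(X_c, rowvar=False))
order = np.argsort(values)[::-1]
vectors = vectors[:, order]

errors = []
for k in range(1, 5):
    V_k = vectors[:, :k]
    # Project down to k dimensions, map back, and restore the mean
    reconstructed = (X_c @ V_k) @ V_k.T + mean
    # Frobenius norm of the difference from the original data
    errors.append(np.linalg.norm(X_sample - reconstructed))
    print(k, errors[-1])
```

The error never increases as k grows, and it vanishes (up to rounding) once all four components are kept.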

Step 9. Print the variance explained by the selected principal components for different values of k. *Nothing to change in the code below*

In [9]:
for k in range(2, 5):
    top_k_eigenvectors = eigenvectors[:, :k]
    transformed_data = X_centered @ top_k_eigenvectors
    explained_variance = sum(eigenvalues[:k]) / sum(eigenvalues)
    print(f"Variance explained by {k} principal components: {explained_variance:.2f}")

Variance explained by 2 principal components: 0.98
Variance explained by 3 principal components: 0.99
Variance explained by 4 principal components: 1.00
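
The same cumulative figures can be computed without a loop over slices by using `np.cumsum`. The sketch below uses hypothetical eigenvalues (`eigenvalues_sorted`, assumed to be already sorted from largest to smallest) rather than the iris results, so its printed numbers will differ from the output above.

```python
import numpy as np

# Hypothetical eigenvalues, assumed sorted from largest to smallest
eigenvalues_sorted = np.array([4.0, 1.0, 0.5, 0.1])

# Share of the total variance carried by each component
ratio = eigenvalues_sorted / eigenvalues_sorted.sum()

# Running total: entry k-1 is the variance explained by the first k components
cumulative = np.cumsum(ratio)

for k, share in enumerate(cumulative, start=1):
    print(f"Variance explained by {k} principal components: {share:.2f}")
```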


Step 10. In this text box, explain what the results mean. What did we do in this problem and why did we do it?

Your response:

To be quite honest, I am struggling to fully grasp the concepts above, so I will explain them the way I understand them. The variance explained by `k` principal components describes how much of the original dataset's variance is captured when we keep only the first `k` components. Here, the first two components already capture about 98% of the variance, so the four features can be reduced to two with little loss of information.

# Problem 2

Solving a System of Linear Equations Using the Matrix Inversion Method

Suppose you are given a system of linear equations of the form Ax=b, where A is a square matrix and x and b are vectors.

A = [[2, 1, -1],[4, -6, 0],[-2, 7, 2]]

b = [5, -2, 9]

Written out, these are the three equations you are solving:

2x + y - z = 5

4x - 6y = -2

-2x + 7y + 2z = 9

Your task is to solve the system of equations using the matrix inversion approach with Python.

Print the solution vector x and verify your solution by computing Ax.

What does your solution vector x represent?




Step 1. Import numpy as np, then import array from numpy. Next, import inv and solve from numpy.linalg. Follow Example 4-18 in the book.

In [11]:
import numpy as np
from numpy.linalg import inv, solve
from numpy import array

Step 2. Create the A array and the b array. Follow the example 4-18 in the book to setup your arrays.

In [12]:
A = np.array([[2, 1, -1], [4, -6, 0], [-2, 7, 2]])
b = np.array([5, -2, 9])

Step 3. Use the solve function (from linalg) to solve Ax=b. This is not in the book, but if you need help you can read about it in the classroom.

In [14]:
# Solve Ax = b directly, without forming the inverse of A
x = np.linalg.solve(A, b)
print("The solution using solve in Python for this system of equations is", x)

The solution using solve in Python for this system of equations is [2.         1.66666667 0.66666667]


Step 4. Use the matrix inversion approach to solve Ax=b. Follow example 4-18 in the book.

In [15]:
x = inv(A).dot(b)
print("The solution using matrix inversion approach for this system of equations is", x)

The solution using matrix inversion approach for this system of equations is [2.         1.66666667 0.66666667]
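
The `solve` function and the matrix inversion approach should agree up to floating-point rounding. A minimal sketch of that comparison follows; `x_inv` and `x_solve` are illustrative names.

```python
import numpy as np
from numpy.linalg import inv, solve

A = np.array([[2, 1, -1], [4, -6, 0], [-2, 7, 2]])
b = np.array([5, -2, 9])

# Matrix inversion approach: x = A^-1 b
x_inv = inv(A) @ b

# Direct solver: solves Ax = b without forming the inverse explicitly
x_solve = solve(A, b)

# Compare within floating-point tolerance rather than with ==
print(np.allclose(x_inv, x_solve))
```

For small systems like this one the two are interchangeable. `solve` is generally the preferred choice in practice because it avoids computing the inverse explicitly.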


Step 5. Print Ax to verify if Ax=b. *Nothing to change in the code below*

In [16]:
Ax = np.dot(A, x)
print("This is to verify if Ax=b, so we print Ax=", Ax)

This is to verify if Ax=b, so we print Ax= [ 5. -2.  9.]
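
Printed arrays are rounded for display, so comparing Ax and b by eye can hide small floating-point differences. The sketch below checks the match programmatically; `residual` is an illustrative name for the difference Ax - b.

```python
import numpy as np
from numpy.linalg import inv

A = np.array([[2, 1, -1], [4, -6, 0], [-2, 7, 2]])
b = np.array([5, -2, 9])

x = inv(A).dot(b)

# The residual should be zero up to floating-point rounding
residual = A @ x - b

print(np.allclose(A @ x, b))    # tolerance-based equality check
print(np.abs(residual).max())   # largest absolute deviation from b
```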


Step 6. What does the solution above represent? What are these values?

Your response:

By solving the system using the matrix inversion method, we find the specific values of x, y, and z (here 2, about 1.667, and about 0.667) that make all three equations true simultaneously.

# Problem 3

Solving a System of Linear Equations Using the Matrix Inversion Method

Suppose you are given a system of linear equations of the form Ax=b, where A is a square matrix and x and b are vectors.

A = [[3, -1, 2, 0],[2, 4, 0, 1],[-1, 3, 5, -2],[0, 2, -1, 4]]

b = [4, 3, 8, -1]

Your task is to solve the system of equations using the matrix inversion approach with Python.

Print the solution vector x and verify your solution by computing Ax.
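
The matrix inversion approach only works when A is invertible, which for a square matrix means a nonzero determinant (equivalently, full rank). A minimal sketch of that check for this matrix follows; `determinant`, `rank`, and `is_invertible` are illustrative names.

```python
import numpy as np
from numpy.linalg import det, matrix_rank

A = np.array([[3, -1, 2, 0],
              [2, 4, 0, 1],
              [-1, 3, 5, -2],
              [0, 2, -1, 4]])

determinant = det(A)    # nonzero means A has an inverse
rank = matrix_rank(A)   # full rank means rank equals the number of rows

is_invertible = rank == A.shape[0]
print(determinant, rank, is_invertible)
```

Checking the rank is often more robust than comparing a floating-point determinant with zero.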

Step 1. Import numpy as np, then import array from numpy. Next, import inv from numpy.linalg. Follow Example 4-18 in the book.

In [17]:
import numpy as np
from numpy import array
from numpy.linalg import inv

Step 2. Define matrix A and vector b as NumPy arrays.

In [18]:

A = array([[3, -1, 2, 0],[2, 4, 0, 1],[-1, 3, 5, -2],[0, 2, -1, 4]])
b = array([4, 3, 8, -1])

Step 3. Use the matrix inversion approach to solve Ax=b. Follow the example in 4-18 as well as Problem 2 in this assignment.

In [19]:
x = inv(A).dot(b)

print("The solution using matrix inversion approach for this system of equations is", x)

The solution using matrix inversion approach for this system of equations is [ 0.59150327  0.49346405  1.35947712 -0.15686275]


Step 4. Verify the solution is correct. Look at problem 2 to perform the dot product.

In [20]:
Ax = np.dot(A, x)
print("This is to verify if Ax=b, so we print Ax=", Ax)

This is to verify if Ax=b, so we print Ax= [ 4.  3.  8. -1.]


Step 5. What does this solution represent? Enter your response in the text box below.

Your response: