In [1]:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.image import imread

# Singular Value Decomposition (SVD) and Principal Component Analysis (PCA)

In this notebook, we'll explore two important techniques used in data science: **Singular Value Decomposition (SVD)** for image compression and **Principal Component Analysis (PCA)** for reducing dimensionality in datasets. By the end of this exercise, you should have a good understanding of how these methods work and how to implement them.

### **Instructions**:
1. Complete each code cell where prompted.
2. Follow the hints provided within each task.
3. Answer reflection questions thoughtfully based on the outputs of your code.

---

## **Part 1: SVD for Image Compression**

In this section, we'll apply SVD to compress an image. You will explore how reducing the number of singular values affects the quality of the image.

### **Task 1: Load and Display the Image**

Using `matplotlib.image.imread`, load an image from a local file path. Then display it with `plt.imshow()` so you can see the original before compressing it.
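
The sketch below shows the load-and-display step. It has to run without your file, so it first writes a small synthetic RGB gradient to a temporary JPEG with Pillow. This assumes Pillow is installed; recent matplotlib releases depend on it. The gradient and the file name `example.jpeg` are hypothetical stand-ins. With your own image, skip the synthetic-file lines and pass your own `filename` to `imread`.

```python
import os
import tempfile

import matplotlib.pyplot as plt
import numpy as np
from matplotlib.image import imread
from PIL import Image

# Hypothetical stand-in for your photo: a 64x96 RGB gradient written to a temporary JPEG
rows, cols = 64, 96
gradient = np.zeros((rows, cols, 3), dtype=np.uint8)
gradient[..., 0] = np.linspace(0, 255, cols).astype(np.uint8)           # red grows left to right
gradient[..., 2] = np.linspace(0, 255, rows).astype(np.uint8)[:, None]  # blue grows top to bottom
filename = os.path.join(tempfile.mkdtemp(), "example.jpeg")
Image.fromarray(gradient).save(filename)

# Load the image: an RGB JPEG comes back as a uint8 array of shape (height, width, 3)
img = imread(filename)
print(img.shape, img.dtype)

# Display the original image without axis ticks
plt.imshow(img)
plt.axis("off")
plt.title("Original image")
plt.show()
```

Check the printed shape before moving on. A third dimension of 3 (or 4, for PNGs with transparency) means the image still has separate color channels.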

In [None]:
# Step 1: Copy and paste the path to your image below
filename = '/content/IMAGE_NAME.jpeg'  # Replace with the path to your local image

# Step 2: Load the image using imread()

# Step 3: Display the image using imshow()


### **Task 2: Convert the Image to Grayscale**

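Using the provided `rgb2gray()` function, convert the loaded image to grayscale. SVD factors a single 2-D matrix. Collapsing the three color channels (an array of shape height × width × 3) into one channel (height × width) therefore lets you decompose the whole image at once.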
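
To see what `rgb2gray()` does to the array's shape, the sketch below applies the same function to a tiny hand-made 2 × 2 RGB "image" containing a red, a green, a blue and a white pixel. The pixel values are illustrative, not taken from any real image.

```python
import numpy as np

def rgb2gray(rgb):
    # Weighted sum over the last axis (R, G, B); [..., :3] drops an alpha channel if present
    return np.dot(rgb[..., :3], [0.2989, 0.5870, 0.1140])

# Tiny 2x2 RGB image: red, green / blue, white
img = np.array([[[255, 0, 0], [0, 255, 0]],
                [[0, 0, 255], [255, 255, 255]]], dtype=np.uint8)

gray = rgb2gray(img)
print(img.shape, "->", gray.shape)  # (2, 2, 3) -> (2, 2)
print(gray)
```

The channel axis disappears, and pure green maps to a brighter gray than pure red or pure blue, because green carries the largest weight.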

In [None]:
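# Convert to grayscale using the provided transformation
def rgb2gray(rgb):
    # Weighted sum of the R, G and B channels (approximately the ITU-R BT.601 luma weights);
    # rgb[..., :3] ignores an alpha channel if the image has one
    return np.dot(rgb[..., :3], [0.2989, 0.5870, 0.1140])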


# Step 4: Apply the rgb2gray function to the image

# Step 5: Display the grayscale image using imshow()


### **Task 3: Perform SVD on the Grayscale Image**

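Using `numpy.linalg.svd()`, perform SVD on the grayscale image $A$ to decompose it into three matrices, $A = U \Sigma V^T$. Note that NumPy does not return the full matrix $\Sigma$. Instead it returns the singular values as a 1-D array `S`, which holds the diagonal of $\Sigma$ sorted in descending order.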
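
The sketch below runs `np.linalg.svd` on a small random 6 × 4 matrix, which stands in for the grayscale image. It shows the shapes you should expect and confirms that $U \Sigma V^T$ rebuilds the input. It passes `full_matrices=False` (the "economy" SVD). With the default `full_matrices=True`, $U$ would be a square $m \times m$ matrix, which is very large for a tall image.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((6, 4))  # stand-in for a 6x4 grayscale image

# Economy SVD: U is (m, r), S has r values, VT is (r, n), with r = min(m, n)
U, S, VT = np.linalg.svd(A, full_matrices=False)
print(U.shape, S.shape, VT.shape)  # (6, 4) (4,) (4, 4)

# S is 1-D; np.diag(S) rebuilds the diagonal matrix Sigma
Sigma = np.diag(S)
print(np.allclose(A, U @ Sigma @ VT))  # True
```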

In [None]:
# Step 6: Decompose the grayscale image with numpy.linalg.svd

# Print shapes of U, S, and VT to understand their structure


### **Task 4: Reconstruct the Image Using Different Numbers of Singular Values**

Define a function `reconstruct_image()` that rebuilds the image from the top $k$ singular values in $\Sigma$, together with the first $k$ columns of $U$ and the first $k$ rows of $V^T$. Comparing several values of $k$ shows how image quality improves as $k$ increases.
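
Here is one possible `reconstruct_image()`. The task names only the function, so the four-argument signature `reconstruct_image(U, S, VT, k)` is an assumption. The function forms the rank-$k$ approximation

$$A_k = \sum_{i=1}^{k} \sigma_i u_i v_i^T,$$

where $u_i$ and $v_i^T$ are the $i$-th column of $U$ and the $i$-th row of $V^T$. By the Eckart-Young theorem, $A_k$ is the closest rank-$k$ matrix to $A$ in the Frobenius norm, with error $\sqrt{\sum_{i>k} \sigma_i^2}$. Storing $A_k$ takes $k(m + n + 1)$ numbers instead of $mn$, which is where the compression comes from. The sketch uses a random 50 × 40 matrix as a stand-in for your grayscale image.

```python
import numpy as np

def reconstruct_image(U, S, VT, k):
    """Rank-k approximation from the top k singular triplets (signature is an assumption)."""
    # Scaling the columns of U[:, :k] by S[:k] equals U_k @ diag(S_k) without building the diagonal matrix
    return (U[:, :k] * S[:k]) @ VT[:k, :]

rng = np.random.default_rng(0)
A = rng.random((50, 40))  # stand-in for a grayscale image
U, S, VT = np.linalg.svd(A, full_matrices=False)

for k in (1, 5, 20, 40):
    A_k = reconstruct_image(U, S, VT, k)
    rel_err = np.linalg.norm(A - A_k) / np.linalg.norm(A)  # Frobenius norm by default
    stored = k * (A.shape[0] + A.shape[1] + 1)  # k columns of U, k rows of VT, k singular values
    print(f"k={k:2d}  relative error={rel_err:.3f}  values stored={stored} vs {A.size}")
```

To use this with your own image, decompose your grayscale array instead of `A`. Then show each result with `plt.imshow(reconstruct_image(U, S, VT, k), cmap='gray')`.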

In [None]:
# Reconstruct the image using k singular values

# Step 7: Test the reconstruction function with different values of k

# Display the compressed images


## **Part 2: PCA on a Dataset**

In this section, you will apply PCA to the famous Iris dataset to reduce its dimensionality and visualize the data in 2D.

### **Task 5: Load the Iris Dataset**

Using `sklearn.datasets.load_iris`, load the Iris dataset and store it in a DataFrame. You will also add the species labels to the DataFrame.
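
The sketch below builds the DataFrame. It uses `load_iris()`'s `feature_names` for the column headers and `target_names` to turn the integer labels 0, 1 and 2 into species names. The column name `species` is a choice made here, not something the dataset fixes.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()

# One column per measurement, named after the dataset's feature names
df = pd.DataFrame(iris.data, columns=iris.feature_names)

# Index target_names with the integer targets to get one species name per row
df["species"] = iris.target_names[iris.target]

print(df.shape)  # (150, 5)
print(df.head())
```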

In [None]:
from sklearn.datasets import load_iris
import pandas as pd

# Step 1: Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target
features = iris.feature_names

# Step 2: Create a DataFrame with the feature data and add the species labels

# Display the first few rows of the dataset

### **Task 6: Standardize the Data**

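Using `sklearn.preprocessing.StandardScaler`, standardize the dataset before applying PCA. PCA looks for the directions of largest variance. A feature whose values are spread over a wider range would therefore dominate the components simply because its numbers are bigger. Rescaling every feature to zero mean and unit variance puts the features on an equal footing.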
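
The sketch below compares per-feature variances before and after `StandardScaler`. `fit_transform` learns each column's mean and standard deviation and then applies $(x - \mu)/\sigma$ column by column.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

X = load_iris().data

# Before scaling, the four features have noticeably different variances
print("variance before:", X.var(axis=0).round(3))

# Learn each column's mean and standard deviation, then apply (x - mean) / std
X_scaled = StandardScaler().fit_transform(X)

# After scaling, every column has mean ~0 and variance 1
print("mean after:     ", X_scaled.mean(axis=0).round(3))
print("variance after: ", X_scaled.var(axis=0).round(3))
```

`StandardScaler` divides by the population standard deviation (`ddof=0`). NumPy's default `var()` also uses `ddof=0`, so the scaled variances come out as 1 rather than slightly below it.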

In [None]:
from sklearn.preprocessing import StandardScaler

# Step 3: Standardize the data using StandardScaler

# Print the mean and variance after scaling

### **Task 7: Perform PCA**

Using `sklearn.decomposition.PCA`, reduce the Iris dataset from its 4 features to 2 principal components. Store the principal components in a DataFrame for easy visualization.
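
The sketch below shows the PCA step on the standardized data. `PCA(n_components=2)` keeps the two directions of largest variance. `fit_transform` then projects each 4-feature row onto them, and each row of `pca.components_` holds the loadings of one component. The column names `PC1`/`PC2` and the `species` column are naming choices, not requirements.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X_scaled = StandardScaler().fit_transform(iris.data)

# Keep the two directions of largest variance and project the 150x4 data onto them
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

pca_df = pd.DataFrame(X_pca, columns=["PC1", "PC2"])
pca_df["species"] = iris.target_names[iris.target]

print(pca.components_.shape)  # (2, 4): one loading vector per component
print(pca_df.head())
```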

In [None]:
from sklearn.decomposition import PCA

# Step 4: Instantiate PCA and reduce to 2 components

# Step 5: Create a DataFrame with the principal components

# Display the first few rows of the principal component DataFrame

### **Task 8: Analyze the Results**

Using the `explained_variance_ratio_` attribute of the PCA object, check what fraction of the total variance each of the two principal components captures, and how much they capture together.
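
The sketch below fits PCA with all four components on the standardized Iris data, so the full breakdown of variance is visible. The running total then shows how much the first two components keep. In your notebook, the `pca` object fitted with `n_components=2` reports only the first two ratios.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X_scaled = StandardScaler().fit_transform(load_iris().data)

# Keep all components so the ratios cover the whole variance
pca_full = PCA().fit(X_scaled)
ratios = pca_full.explained_variance_ratio_

print("per component:", ratios.round(3))
print("cumulative:   ", np.cumsum(ratios).round(3))
print(f"first two components keep {ratios[:2].sum():.1%} of the variance")
```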

In [None]:
# Step 6: Analyze the explained variance ratio

### **Task 9: Visualize the PCA-Transformed Data**

Using `seaborn.scatterplot()`, create a scatter plot of the two principal components and color the points by species (pass the species column as `hue`).
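
The sketch below draws the plot. It rebuilds the two-component DataFrame so it runs on its own. With `data=`, `x=` and `y=` given as column names, seaborn labels the axes from those columns. `hue=` colors each point by species and adds a legend. The title text is illustrative.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X_pca = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(iris.data))
pca_df = pd.DataFrame(X_pca, columns=["PC1", "PC2"])
pca_df["species"] = iris.target_names[iris.target]

# hue= colors points by species; scatterplot returns the Axes it drew on
ax = sns.scatterplot(data=pca_df, x="PC1", y="PC2", hue="species")
ax.set_title("Iris data projected onto the first two principal components")
plt.show()
```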

In [None]:
import seaborn as sns

# Step 7: Visualize the PCA-transformed data