# Week 10: Singular Value Decomposition

In this coding assignment, we will walk through an example of using Singular Value Decomposition (SVD) on a dataset of iris plants. Run the following cell to import the necessary packages. 

In [None]:
from sklearn.datasets import load_iris
from sklearn.decomposition import TruncatedSVD
from sklearn.decomposition import PCA
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

To begin, run the following cell to load the dataset into this notebook:
* `iris_features` will contain a numpy array of 4 attributes for 150 different plants (shape 150 x 4). 
* `iris_target` will contain the class of each plant. There are 3 classes of plants in the dataset: Iris-Setosa, Iris-Versicolour, and Iris-Virginica. The class names will be stored in `iris_target_names`.
* `iris_feature_names` will be a list of 4 names, one for each attribute in `iris_features`. 

Additional information on the dataset will be included in the description printed at the end of the following cell.
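For reference, here is a minimal sketch of how a dataframe could be built directly from the scikit-learn object, assuming only `pandas` and `scikit-learn` are available. The names `iris_bunch` and `sketch_df` are hypothetical and chosen to avoid clashing with the notebook's own variables. Note that scikit-learn's column names (for example `sepal length (cm)`) differ from the ones in seaborn's copy of the dataset.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris_bunch = load_iris()  # hypothetical name for this sketch

# One row per plant, one column per measured attribute.
sketch_df = pd.DataFrame(iris_bunch.data, columns=iris_bunch.feature_names)

# Map the integer class codes (0, 1, 2) to their species names.
sketch_df["species"] = pd.Categorical.from_codes(
    iris_bunch.target, iris_bunch.target_names
)

print(sketch_df.shape)
```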

In [None]:
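iris_data = load_iris()  # Load the dataset (a scikit-learn Bunch)
iris_features = iris_data.data
iris_target = iris_data.target
iris_target_names = iris_data.target_names
iris_feature_names = iris_data.feature_names

# Load seaborn's copy of the iris dataset as a dataframe.
iris_dataframe = sns.load_dataset("iris")

print(iris_data.DESCR)  # Dataset description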


Now, let's have a look at the first few rows of our dataframe. 

In [None]:
iris_dataframe.head()

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width,species
0,5.1,3.5,1.4,0.2,setosa
1,4.9,3.0,1.4,0.2,setosa
2,4.7,3.2,1.3,0.2,setosa
3,4.6,3.1,1.5,0.2,setosa
4,5.0,3.6,1.4,0.2,setosa


Let's explore the data by creating a scatter matrix of our iris features. To do this, we'll create 2D scatter plots for every possible pair of our four features, coloring the datapoints by species. This results in twelve scatter plots in total (the four panels on the diagonal show each feature's own distribution instead), but we only need to consider the six below the diagonal, since the plots above the diagonal show the same pairs with the axes swapped. Complete the following cell to generate the plot.

Hint: you should be using `sns.pairplot` to create the scatter plots. Use only a single line of code. 
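The plot counts mentioned above can be checked with a short sketch using only the standard library. The feature names are taken from the dataframe columns shown earlier; the variable names are illustrative.

```python
from itertools import combinations, permutations

features = ["sepal_length", "sepal_width", "petal_length", "petal_width"]

# Ordered pairs: (x, y) and (y, x) are counted as two different plots.
ordered_pairs = list(permutations(features, 2))

# Unordered pairs: each pair of features is counted once.
unique_pairs = list(combinations(features, 2))

print(len(ordered_pairs))  # 12 off-diagonal panels
print(len(unique_pairs))   # 6 panels below the diagonal
```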

In [None]:
sns.set_theme(style="ticks")
pairplots = ...
pairplots.fig.suptitle("Scatter Matrix of Iris Features", y=1.08)

Next, we will perform SVD on the feature matrix. Recall that SVD decomposes an $m \times n$ matrix $A$ into the matrix product $U\Sigma V^*$, where $V^* = V^T$ because $A$ is real. Enter the dimensions of $U$, $\Sigma$, and $V^*$ below. The dimensions of $A$ have been provided as an example.
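As a sanity check of the factorization itself, the sketch below decomposes a small random matrix with `np.linalg.svd` and rebuilds it from the factors. It is only an illustration on toy data: it uses the reduced form (`full_matrices=False`), whose factor shapes are not the same as those of the full decomposition asked about here, and all variable names are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
A_small = rng.normal(size=(5, 3))  # toy real matrix, not the iris data

# Reduced SVD: the singular values come back as a 1-D array, largest first.
U_s, s, VT_s = np.linalg.svd(A_small, full_matrices=False)

# Rebuild A from the three factors.
A_rebuilt = U_s @ np.diag(s) @ VT_s
print(np.allclose(A_small, A_rebuilt))

# The columns of U and the rows of V^T are orthonormal.
print(np.allclose(U_s.T @ U_s, np.eye(U_s.shape[1])))
print(np.allclose(VT_s @ VT_s.T, np.eye(VT_s.shape[0])))
```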



In [None]:
dimension_A = "m * n"
dimension_U = " "
dimension_Sigma = " "
dimension_VT = " "

For convenience, we will use `scikit-learn`'s `TruncatedSVD` class. Fill in the first line in the cell below to perform SVD on our iris dataset and obtain the singular values. Try changing the number of iterations (`n_iter`) and the random state (`random_state`).
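To get a feel for the class before using it on the iris data, here is a small sketch on a random toy matrix. The parameter values are arbitrary choices for illustration, not the intended answer for the cell below, and the variable names are hypothetical. It compares the singular values from `TruncatedSVD` with the largest ones from NumPy's exact SVD.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

rng = np.random.default_rng(0)
X_toy = rng.normal(size=(20, 4))  # toy data, not the iris measurements

# Arbitrary illustrative settings; try other values of n_iter and random_state.
toy_svd = TruncatedSVD(n_components=2, n_iter=7, random_state=0)
toy_svd.fit(X_toy)

# Reference singular values from an exact SVD, sorted largest first.
reference = np.linalg.svd(X_toy, compute_uv=False)

print(toy_svd.singular_values_)
print(reference[:2])
```

On a matrix this small, the two printed arrays should agree closely regardless of the settings, which suggests that the randomized solver's settings matter mainly for larger problems.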

In [None]:
svd = ...
svd.fit(iris_data.data)  
print(svd.singular_values_)

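Now, obtain the three factors of the decomposition, `U`, `Sigma`, and `VT`, from the fitted `svd` object, then plot the first two columns of `U` colored by species. Check the `sklearn` documentation for `TruncatedSVD` for more information.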
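One identity that may help when reading the documentation: since $A = U\Sigma V^T$ and the columns of $V$ are orthonormal, projecting the rows of $A$ onto the right singular vectors gives $AV = U\Sigma$. The sketch below checks this on a toy matrix with `np.linalg.svd`; it does not use `TruncatedSVD`, and the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
A_toy = rng.normal(size=(6, 3))  # toy matrix, not the iris data

U_t, s_t, VT_t = np.linalg.svd(A_toy, full_matrices=False)

# Coordinates of each row of A in the basis of right singular vectors.
projected = A_toy @ VT_t.T

# Broadcasting scales column j of U by singular value j, i.e. U @ diag(s).
scaled_U = U_t * s_t

print(np.allclose(projected, scaled_U))
```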

In [2]:
U = ...
Sigma = ...
VT = ...

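# First two columns of U: one coordinate pair per plant.
x1 = U[:, 0]
x2 = U[:, 1]

# Map each species name to a plot color.
color_directory = {iris_data.target_names[0]: "purple", iris_data.target_names[1]: "orange", iris_data.target_names[2]: "green"}

plt.figure()
# Plot one species at a time so each gets a single color and legend entry.
for class_index, name in enumerate(iris_data.target_names):
    mask = iris_data.target == class_index
    plt.scatter(x1[mask], x2[mask], color=color_directory[name], label=name)

plt.xlabel("First column of U")
plt.ylabel("Second column of U")
plt.legend()
plt.show()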

You should see the three species as color-coded clusters: purple, orange, and green in the order given by `iris_data.target_names`.

Congratulations on completing an SVD analysis! We will now move on to PCA.