# Exercise 2 - Eigenvalues and Eigenvectors

In this exercise we will use NumPy to compute the eigenvalues and eigenvectors for the Iris dataset.

In [1]:
import pandas as pd
import numpy as np

Load the dataset:

In [2]:
df = pd.read_csv('iris-data.csv')
df.head()

|   | Sepal Length | Sepal Width | Petal Length | Petal Width | Species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |


Select the sepal features:

In [3]:
df = df[['Sepal Length', 'Sepal Width']]
df.head()

|   | Sepal Length | Sepal Width |
|---|---|---|
| 0 | 5.1 | 3.5 |
| 1 | 4.9 | 3.0 |
| 2 | 4.7 | 3.2 |
| 3 | 4.6 | 3.1 |
| 4 | 5.0 | 3.6 |
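
The decomposition in the next step works on a plain array rather than a DataFrame. A minimal sketch of that conversion, rebuilt from only the five rows shown above (the names `sample` and `X` are illustrative and not part of the exercise):

```python
import pandas as pd

# The five rows displayed by df.head() above
sample = pd.DataFrame({
    'Sepal Length': [5.1, 4.9, 4.7, 4.6, 5.0],
    'Sepal Width': [3.5, 3.0, 3.2, 3.1, 3.6],
})

# Double brackets select a list of columns and return a DataFrame;
# to_numpy() then yields the underlying 2-D array (like .values)
X = sample[['Sepal Length', 'Sepal Width']].to_numpy()
print(X.shape)  # (5, 2)
```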


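From NumPy's linear algebra module, use the singular value decomposition (SVD) function to compute the eigenvalues and eigenvectors. Strictly, `np.linalg.svd` returns singular values and singular vectors; the variable names below follow the exercise's terminology.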

In [4]:
eigenvectors, eigenvalues, _ = np.linalg.svd(df.values, full_matrices=False)
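
The link between the SVD and an eigendecomposition can be checked numerically: the squared singular values of a matrix `X` are the eigenvalues of `X.T @ X`, and the rows of the third array returned by `np.linalg.svd` are the matching eigenvectors. A small check on synthetic data (the random matrix `X` is a stand-in, not the Iris data):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 2))  # stand-in for df.values

U, s, Vt = np.linalg.svd(X, full_matrices=False)

# eigh returns eigenvalues in ascending order; flip to match svd's descending order
evals, evecs = np.linalg.eigh(X.T @ X)
evals, evecs = evals[::-1], evecs[:, ::-1]

print(np.allclose(s ** 2, evals))  # True
# Eigenvectors are only defined up to sign, so compare absolute values
print(np.allclose(np.abs(Vt.T), np.abs(evecs)))  # True
```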

Looking at the eigenvalues, we can see that the first value is by far the largest, so the first eigenvector contributes the most information.

In [5]:
eigenvalues

array([81.25483015,  6.96796793])

It is handy to look at the eigenvalues as a share of their total. We will use a cumulative sum function to do this:

In [6]:
eigenvalues = np.cumsum(eigenvalues)
eigenvalues

array([81.25483015, 88.22279808])

Divide by the last (and therefore largest) value to convert the cumulative sums to fractions of the total:

In [7]:
eigenvalues /= eigenvalues.max()
eigenvalues

array([0.92101851, 1.        ])

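We can see here that the first (or principal) component accounts for about 92% of the total and thus carries most of the information. Note that this is a share of the singular values returned by `svd`, not of the variance, which scales with their squares.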
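
For comparison, explained variance is more commonly computed from the squared singular values of mean-centred data. A sketch under that assumption, on synthetic data (`X` and its column scales are made up for illustration, so the numbers will not match the Iris output above):

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in data: two features with different spreads
X = rng.normal(size=(150, 2)) * np.array([3.0, 1.0])

Xc = X - X.mean(axis=0)                   # centre each column
s = np.linalg.svd(Xc, compute_uv=False)   # singular values only

explained = s ** 2 / np.sum(s ** 2)       # share of variance per component
cumulative = np.cumsum(explained)         # running total, ends at 1.0
print(explained, cumulative)
```

Centring matters here: without it, the first component mostly reflects the distance of the data from the origin rather than the spread within the data.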

Looking at the eigenvectors:

In [8]:
eigenvectors

array([[-0.07553027, -0.11068158],
       [-0.07052087, -0.06007995],
       [-0.06946245, -0.09874988],
       [-0.06780439, -0.09257869],
       [-0.07500106, -0.13001654],
       [-0.08106887, -0.14194824],
       [-0.06949767, -0.13083793],
       [-0.07387221, -0.10451038],
       [-0.06448827, -0.0802363 ],
       [-0.07108529, -0.07283303],
       [-0.07994002, -0.11644208],
       [-0.07168494, -0.11767416],
       [-0.06942723, -0.06666184],
       [-0.06395906, -0.09957127],
       [-0.08600784, -0.12837377],
       [-0.08717191, -0.18596798],
       [-0.08106887, -0.14194824],
       [-0.07553027, -0.11068158],
       [-0.08378535, -0.1094495 ],
       [-0.07722355, -0.14894082],
       [-0.07824674, -0.07818284],
       [-0.07665912, -0.13618774],
       [-0.07062652, -0.15634409],
       [-0.07440141, -0.08517542],
       [-0.07168494, -0.11767416],
       [-0.0716145 , -0.05349806],
       [-0.07387221, -0.10451038],
       [-0.0766239 , -0.10409969],
       [-0.07605947,

What is the shape of the eigenvector matrix?

In [10]:
eigenvectors.shape

(150, 2)
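
This shape follows from `full_matrices=False`: for an input of shape `(n, k)` with `n >= k`, the three outputs have shapes `(n, k)`, `(k,)` and `(k, k)`. A quick check with a placeholder array of the same shape as the sepal data (the values in `X` are arbitrary):

```python
import numpy as np

# Placeholder with the same shape as df.values
X = np.arange(300, dtype=float).reshape(150, 2)

U, s, Vt = np.linalg.svd(X, full_matrices=False)
print(U.shape, s.shape, Vt.shape)  # (150, 2) (2,) (2, 2)

# With full_matrices=True, U is square instead
U_full, _, _ = np.linalg.svd(X, full_matrices=True)
print(U_full.shape)  # (150, 150)
```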

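The matrix has one row per sample and one column per component. Its first row gives the first sample's coordinates along the two principal components, each normalised by the corresponding singular value: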

In [13]:
P = eigenvectors[0]
P

array([-0.07553027, -0.11068158])
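
The component directions themselves, expressed in sepal-length/sepal-width space, are the rows of the third array returned by `np.linalg.svd`, which the call in this exercise discards as `_`. A sketch on synthetic data showing how the directions and the per-sample coordinates relate (`X`, `U`, `s`, `Vt`, `directions` and `scores` are illustrative names):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 2))  # stand-in for df.values

U, s, Vt = np.linalg.svd(X, full_matrices=False)

directions = Vt      # each row is a unit-length direction in feature space
scores = X @ Vt.T    # coordinates of every sample along those directions

# The same coordinates are obtained by scaling the columns of U
print(np.allclose(scores, U * s))  # True
print(np.allclose(np.linalg.norm(directions, axis=1), 1.0))  # True
```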