Q1: Load the digits dataset using load_digits from sklearn.datasets.
The dataset contains 1797 images of hand-written digits, each 8x8 pixels. Each training
example therefore has 64 features (the 8x8 pixel values), so the data matrix has shape 1797x64.
Reduce the number of features using Principal Component Analysis (PCA), implemented with a
step-by-step approach.

## Step 1. Importing the dataset

In [1]:
from sklearn.datasets import load_digits
import numpy as np
import pandas as pd

digits = load_digits()

# One row per image, one column per pixel
df1 = pd.DataFrame(data = digits.data)

# Digit label (0-9) for each image
df1['target'] = digits.target

X = digits.data
Y = digits.target
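
Before moving on, it can help to confirm that the data really has the shape described in the question. The following is a minimal sketch of such a check; the variable name `first_image` is introduced here for illustration only.

```python
import numpy as np
from sklearn.datasets import load_digits

digits = load_digits()
X, Y = digits.data, digits.target

print(X.shape)       # (number of images, number of pixel features)
print(np.unique(Y))  # the distinct digit labels

# Each row of X is one 8x8 image flattened into 64 values,
# so reshaping a row should give back the original image.
first_image = X[0].reshape(8, 8)
print(np.array_equal(first_image, digits.images[0]))
```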

## Step 2. Standardizing with z-scores

In [2]:
from sklearn.preprocessing import StandardScaler
X_scaled = StandardScaler().fit_transform(X)
X_scaled[:5]

array([[ 0.        , -0.33501649, -0.04308102,  0.27407152, -0.66447751,
        -0.84412939, -0.40972392, -0.12502292, -0.05907756, -0.62400926,
         0.4829745 ,  0.75962245, -0.05842586,  1.12772113,  0.87958306,
        -0.13043338, -0.04462507,  0.11144272,  0.89588044, -0.86066632,
        -1.14964846,  0.51547187,  1.90596347, -0.11422184, -0.03337973,
         0.48648928,  0.46988512, -1.49990136, -1.61406277,  0.07639777,
         1.54181413, -0.04723238,  0.        ,  0.76465553,  0.05263019,
        -1.44763006, -1.73666443,  0.04361588,  1.43955804,  0.        ,
        -0.06134367,  0.8105536 ,  0.63011714, -1.12245711, -1.06623158,
         0.66096475,  0.81845076, -0.08874162, -0.03543326,  0.74211893,
         1.15065212, -0.86867056,  0.11012973,  0.53761116, -0.75743581,
        -0.20978513, -0.02359646, -0.29908135,  0.08671869,  0.20829258,
        -0.36677122, -1.14664746, -0.5056698 , -0.19600752],
       [ 0.        , -0.33501649, -1.09493684,  0.03864775,  0.
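
The z-score step can also be written out by hand, which makes it easier to see why some columns of the scaled output are exactly 0. The sketch below uses a small made-up matrix (`X_toy` is hypothetical, not the digits data) and mirrors what StandardScaler does: it divides by the population standard deviation and leaves constant columns at zero instead of dividing by zero.

```python
import numpy as np

# Hypothetical toy data: the first column is constant, like a pixel
# that is always blank in every image.
X_toy = np.array([[0.0, 1.0, 10.0],
                  [0.0, 2.0, 20.0],
                  [0.0, 3.0, 60.0]])

mean = X_toy.mean(axis=0)
std = X_toy.std(axis=0)   # population standard deviation (ddof=0)
std[std == 0.0] = 1.0     # avoid division by zero for constant columns

Z = (X_toy - mean) / std
print(Z)
```

After this transformation every non-constant column has mean 0 and standard deviation 1, and the constant column is all zeros.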

## Step 3. Computing the covariance matrix


In [3]:
# np.cov treats each row as a variable, so pass the features as rows
features = X_scaled.T
cov_matrix = np.cov(features)
print(cov_matrix)

[[ 0.          0.          0.         ...  0.          0.
   0.        ]
 [ 0.          1.00055679  0.55692803 ... -0.02988686  0.02656195
  -0.04391324]
 [ 0.          0.55692803  1.00055679 ... -0.04120565  0.07263924
   0.08256908]
 ...
 [ 0.         -0.02988686 -0.04120565 ...  1.00055679  0.64868875
   0.26213704]
 [ 0.          0.02656195  0.07263924 ...  0.64868875  1.00055679
   0.62077355]
 [ 0.         -0.04391324  0.08256908 ...  0.26213704  0.62077355
   1.00055679]]
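
The diagonal entries are 1.00055679 rather than exactly 1 because the two steps use different divisors: the z-scores were computed with the population standard deviation (divide by n), while np.cov divides by n - 1 by default. The ratio n / (n - 1) for n = 1797 is about 1.00055679. A minimal sketch that reproduces this with random stand-in data (the array `data` and the feature count `d` are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1797, 4                       # same sample count as the digits data
data = rng.normal(size=(n, d))       # stand-in data, not the digits images

# Standardize with the population standard deviation (ddof=0)
Z = (data - data.mean(axis=0)) / data.std(axis=0)

# Covariance written out by hand: Z^T Z / (n - 1)
cov_manual = Z.T @ Z / (n - 1)

print(np.diag(cov_manual))           # every entry equals n / (n - 1)
print(n / (n - 1))
```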


## Step 4. Eigenvalues and eigenvectors

In [4]:
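# eig returns the eigenvalues and a matrix whose columns are the matching eigenvectors.
# It does not guarantee any ordering, so check the order before picking the top components.
values, vectors = np.linalg.eig(cov_matrix)
values[:5]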

array([7.34477606, 5.83549054, 5.15396118, 3.96623597, 2.9663452 ])
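
Because a covariance matrix is symmetric, np.linalg.eigh is an alternative worth considering: it always returns real eigenvalues, in ascending order. The sketch below is one possible way to get an explicitly sorted decomposition; the names `values_sorted` and `vectors_sorted` are introduced here and are not part of the notebook above.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.preprocessing import StandardScaler

X_scaled = StandardScaler().fit_transform(load_digits().data)
cov_matrix = np.cov(X_scaled.T)

# eigh is designed for symmetric matrices; eigenvalues come back ascending
values_asc, vectors_asc = np.linalg.eigh(cov_matrix)

# Reorder so the largest eigenvalue and its eigenvector column come first
order = np.argsort(values_asc)[::-1]
values_sorted = values_asc[order]
vectors_sorted = vectors_asc[:, order]

print(values_sorted[:5])
```

Sorting the eigenvector columns together with the eigenvalues keeps each pair aligned, so column 0 is always the first principal direction.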

## Step 4.1. Choose the top K eigenvalues

In [5]:
# Fraction of the total variance explained by each eigenvalue
total_variance = np.sum(values)
explained_variances = [value / total_variance for value in values]
print(explained_variances)


[0.12033916097734891, 0.09561054403097871, 0.08444414892624563, 0.06498407907524185, 0.048601548759664076, 0.04214119869271942, 0.039420828035673955, 0.033893809246383334, 0.029982210116252336, 0.029320025512522052, 0.027818054635503252, 0.025770550925819976, 0.02275303315764252, 0.02227179739514346, 0.02165229431849249, 0.019141666064421324, 0.01775547085168189, 0.016380692742844174, 0.015964601688623428, 0.014891911870878185, 0.013479695658179384, 0.012719313702347555, 0.011658373505919445, 0.010576465985363149, 0.00975315947198111, 0.00944558989732, 0.008630138269707243, 0.00836642853668518, 0.007976932484112432, 0.007464713709260607, 0.007255821513702735, 0.0069191124548118165, 0.006539085355726167, 0.006407925738459864, 0.005913841117223433, 0.005711624052235249, 0.00523636803416635, 0.0008253509448180296, 0.0048180758644514226, 0.004537192598584484, 0.0010369573015571822, 0.004231627532327809, 0.00406053069979038, 0.003970848082758285, 0.0012510074249730168, 0.001351184113370862,
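
One common way to turn these ratios into a choice of K is to accumulate them and keep the smallest number of components that reaches a target share of the variance. The sketch below assumes a 95% target (`threshold` is an illustrative choice, not something the question specifies) and recomputes the eigenvalues in descending order so the cumulative sum is meaningful.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.preprocessing import StandardScaler

X_scaled = StandardScaler().fit_transform(load_digits().data)

# eigvalsh returns ascending eigenvalues of a symmetric matrix; reverse them
values = np.linalg.eigvalsh(np.cov(X_scaled.T))[::-1]
values = np.clip(values, 0.0, None)   # drop tiny negative round-off values

explained_ratio = values / values.sum()
cumulative = np.cumsum(explained_ratio)

threshold = 0.95                      # assumed target share of variance
# First position where the cumulative share reaches the threshold
k = int(np.searchsorted(cumulative, threshold)) + 1
print(k, cumulative[k - 1])
```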

## Step 5. Transform the original n-dimensional data points into k dimensions

In [6]:

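# Columns of `vectors` are eigenvectors, so vectors.T[i] is the i-th eigenvector
projected_1 = X_scaled.dot(vectors.T[0])  # scores on the first principal component
projected_2 = X_scaled.dot(vectors.T[1])  # scores on the second principal component
res = pd.DataFrame(projected_1, columns = ['PC1'])
res['PC2'] = projected_2
res['Y'] = Y  # keep the digit label next to its projection
res.head()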

|   | PC1 | PC2 | Y |
|---|-----------|-----------|---|
| 0 | -1.914214 | 0.954502 | 0 |
| 1 | -0.588980 | -0.924636 | 1 |
| 2 | -1.302039 | 0.317189 | 2 |
| 3 | 3.020770 | 0.868772 | 3 |
| 4 | -4.528949 | 1.093480 | 4 |
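
As a cross-check, the manual projection can be compared with sklearn.decomposition.PCA. The sketch below is one way to do that, under two assumptions: the solver is set to "full" so the library computes an exact decomposition, and the comparison uses absolute values because the sign of each eigenvector is arbitrary (a component may come out flipped). The names `manual` and `library` are introduced for this example.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X_scaled = StandardScaler().fit_transform(load_digits().data)

# Manual route: top-2 eigenvectors of the covariance matrix
values, vectors = np.linalg.eigh(np.cov(X_scaled.T))
top2 = vectors[:, np.argsort(values)[::-1][:2]]   # shape (64, 2)
manual = X_scaled @ top2                          # shape (1797, 2)

# Library route: exact solver requested explicitly
pca = PCA(n_components=2, svd_solver="full")
library = pca.fit_transform(X_scaled)

# Compare up to the sign of each component
print(np.allclose(np.abs(manual), np.abs(library), atol=1e-6))
print(pca.explained_variance_)
```

Projecting with a single matrix product (`X_scaled @ top2`) generalizes directly to any K: select K eigenvector columns instead of two.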
