

<center>
    <img src="https://miro.medium.com/v2/resize:fit:300/1*mgncZaKaVx9U6OCQu_m8Bg.jpeg">
</center>



The goal of PCA is to reduce the number of features in a dataset while retaining as much information as possible, by identifying how the existing features relate to one another. The crux of the algorithm is to find new axes, called principal components, that are combinations of the existing features, and then to quantify how relevant each principal component is. The principal components are used to transform the high-dimensional data into lower-dimensional data while preserving as much information as possible. For a principal component to be relevant, it needs to capture information about the features. We can determine the relationships between features using covariance.

In [None]:
# import the necessary packages

import numpy as np
from scipy import linalg as LA
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

In [None]:

data = np.array([
    [   1,   2,  -1,   4,  10],
    [   3,  -3,  -3,  12, -15],
    [   2,   1,  -2,   4,   5],
    [   5,   1,  -5,  10,   5],
    [   2,   3,  -3,   5,  12],
    [   4,   0,  -3,  16,   2],
])

### Step 1: Standardize the Data along the Features

![image.png](https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcQLxe5VYCBsaZddkkTZlCY24Yov4JJD4-ArTA&usqp=CAU)




Explain why we need to handle the data on the same scale.

**Answer:**

The aim of standardizing the range of the continuous initial variables is so that each one of them contributes equally to the analysis.

To be more specific, standardization matters before PCA because PCA is quite sensitive to the variances of the initial variables. If there are large differences between the ranges of the initial variables, the variables with larger ranges will dominate those with smaller ranges. ***For example:*** a variable that ranges between 0 and 100 will dominate a variable that ranges between 0 and 1, which leads to biased results. Transforming the data to comparable scales prevents this problem.
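
The effect described above can be illustrated with a small sketch. The two features here are synthetic and made up purely for illustration (they are not part of this notebook's dataset), and `first_component_share` is a hypothetical helper name. The sketch compares how much of the total variance the first principal component claims before and after standardization.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Two independent, synthetic features on very different scales (assumption)
wide = rng.uniform(0, 100, size=200)   # values spread over roughly 0..100
narrow = rng.uniform(0, 1, size=200)   # values spread over roughly 0..1
X = np.column_stack([wide, narrow])

def first_component_share(matrix):
    """Fraction of the total variance captured by the first principal component."""
    eigenvalues = np.linalg.eigvalsh(np.cov(matrix, rowvar=False))
    return eigenvalues.max() / eigenvalues.sum()

# Without standardization the wide feature accounts for almost all the variance
raw_share = first_component_share(X)

# After standardization both features contribute on a comparable scale
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
std_share = first_component_share(X_std)

print(f"raw data:          {raw_share:.4f}")
print(f"standardized data: {std_share:.4f}")
```

On the raw data the first component is driven almost entirely by the wide feature, even though the two features are independent and equally uninformative about each other.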

In [None]:
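# column-wise mean of each feature; keepdims=True keeps shape (1, 5) for broadcasting
mean = np.mean(data, axis=0, keepdims=True)

# subtract the mean and divide by the standard deviation of each feature
# (np.std uses the population formula, ddof=0, by default)
standardized_data = (data - mean) / np.std(data, axis=0)

# print the standardized data to inspect it
print(standardized_data)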

[[-1.36438208  0.70710678  1.5109662  -0.99186978  0.77802924]
 [ 0.12403473 -1.94454365 -0.13736056  0.77145428 -2.06841919]
 [-0.62017367  0.1767767   0.68680282 -0.99186978  0.20873955]
 [ 1.61245155  0.1767767  -1.78568733  0.33062326  0.20873955]
 [-0.62017367  1.23743687 -0.13736056 -0.77145428  1.00574511]
 [ 0.86824314 -0.35355339 -0.13736056  1.65311631 -0.13283426]]
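
Printing the matrix shows the values but does not confirm that they are standardized. A minimal check, assuming the same `data` array as above, is to verify that every column has mean 0 and standard deviation 1 (up to floating-point error):

```python
import numpy as np

data = np.array([
    [1,  2, -1,  4,  10],
    [3, -3, -3, 12, -15],
    [2,  1, -2,  4,   5],
    [5,  1, -5, 10,   5],
    [2,  3, -3,  5,  12],
    [4,  0, -3, 16,   2],
])

standardized_data = (data - data.mean(axis=0)) / data.std(axis=0)

# Each feature (column) should now have mean 0 and standard deviation 1
col_means = standardized_data.mean(axis=0)
col_stds = standardized_data.std(axis=0)

print("means are ~0:", np.allclose(col_means, 0.0))
print("stds are ~1: ", np.allclose(col_stds, 1.0))
```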


![cov matrix.webp](https://dmitry.ai/uploads/default/original/1X/9bd2851674ebb55e404cc3ff5e2ffe65b42ff460.png)

We use the pairwise covariances of the features to determine how they relate to each other. With these covariances, our goal is to group features that show similar patterns. Intuitively, two features are related if they have similar covariances with the other features.
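
As a sketch of what the pairwise covariances look like in code, the matrix can be built by hand from the centered data and compared with `np.cov`. The variable names `centered` and `cov_manual` are illustrative, and the data is the same array used in this notebook.

```python
import numpy as np

data = np.array([
    [1,  2, -1,  4,  10],
    [3, -3, -3, 12, -15],
    [2,  1, -2,  4,   5],
    [5,  1, -5, 10,   5],
    [2,  3, -3,  5,  12],
    [4,  0, -3, 16,   2],
])
standardized_data = (data - data.mean(axis=0)) / data.std(axis=0)

n_samples = standardized_data.shape[0]

# Center each feature, then average the products of deviations for every feature pair
centered = standardized_data - standardized_data.mean(axis=0)
cov_manual = centered.T @ centered / (n_samples - 1)

# Entry (i, j) is the covariance between feature i and feature j
print(cov_manual.shape)
print(np.allclose(cov_manual, np.cov(standardized_data, rowvar=False)))
```

The matrix is symmetric because the covariance of feature i with feature j is the same as that of feature j with feature i.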

### Step 2: Calculate the Covariance Matrix



In [None]:
# Calculate the covariance matrix
cov_matrix = np.cov(standardized_data, rowvar=False)

print(cov_matrix)

[[ 1.2        -0.42098785 -1.0835838   0.90219291 -0.37000528]
 [-0.42098785  1.2         0.20397003 -0.77149364  1.18751836]
 [-1.0835838   0.20397003  1.2        -0.59947269  0.22208218]
 [ 0.90219291 -0.77149364 -0.59947269  1.2        -0.70017993]
 [-0.37000528  1.18751836  0.22208218 -0.70017993  1.2       ]]
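
The diagonal of the matrix above is 1.2 rather than 1. A likely explanation is that `np.std` divides by n by default (`ddof=0`), while `np.cov` divides by n - 1 by default, so each variance is scaled by n / (n - 1) = 6 / 5. The sketch below checks this relationship against the correlation matrix of the original data.

```python
import numpy as np

data = np.array([
    [1,  2, -1,  4,  10],
    [3, -3, -3, 12, -15],
    [2,  1, -2,  4,   5],
    [5,  1, -5, 10,   5],
    [2,  3, -3,  5,  12],
    [4,  0, -3, 16,   2],
])
n = data.shape[0]

# Standardize with the population standard deviation (np.std default, ddof=0)
standardized_data = (data - data.mean(axis=0)) / data.std(axis=0)

# np.cov uses the sample formula (divides by n - 1) by default
cov_matrix = np.cov(standardized_data, rowvar=False)
corr_matrix = np.corrcoef(data, rowvar=False)

scale = n / (n - 1)  # 6 / 5 for this dataset
print(np.allclose(np.diag(cov_matrix), scale))
print(np.allclose(cov_matrix, corr_matrix * scale))
```

Because the factor is the same for every entry, it rescales all eigenvalues equally and leaves the eigenvectors and the explained-variance percentages unchanged.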


### Step 3: Eigendecomposition on the Covariance Matrix


In [None]:
eigenvalues, eigenvectors = np.linalg.eig(cov_matrix)

print("Eigenvalues: \n", eigenvalues) # just checking the data
print("Eigenvectors: \n", eigenvectors)

Eigenvalues: 
 [3.80985761e+00 1.73655615e+00 4.94531029e-02 4.74189469e-05
 4.04085720e-01]
Eigenvectors: 
 [[-0.4640131   0.45182808 -0.70733581  0.28128049 -0.03317471]
 [ 0.45019005  0.48800851  0.29051532  0.6706731  -0.15803498]
 [ 0.37929082 -0.55665017 -0.48462321  0.24186072 -0.5029143 ]
 [-0.4976889   0.03162214  0.36999674 -0.03373724 -0.78311558]
 [ 0.43642295  0.49682965 -0.20861365 -0.64143906 -0.32822489]]
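
Since a covariance matrix is symmetric, `np.linalg.eigh` is an alternative to `np.linalg.eig`. It returns real eigenvalues in ascending order, so the ordering (and possibly the signs of the eigenvectors) can differ from the output above. The sketch below also checks the defining property that multiplying the matrix by an eigenvector only rescales it by its eigenvalue.

```python
import numpy as np

data = np.array([
    [1,  2, -1,  4,  10],
    [3, -3, -3, 12, -15],
    [2,  1, -2,  4,   5],
    [5,  1, -5, 10,   5],
    [2,  3, -3,  5,  12],
    [4,  0, -3, 16,   2],
])
standardized_data = (data - data.mean(axis=0)) / data.std(axis=0)
cov_matrix = np.cov(standardized_data, rowvar=False)

# eigh is intended for symmetric matrices; eigenvalues come back in ascending order
eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)

# Each column v satisfies cov_matrix @ v == eigenvalue * v
satisfies_definition = np.allclose(cov_matrix @ eigenvectors, eigenvectors * eigenvalues)

# The eigenvectors of a symmetric matrix form an orthonormal set
is_orthonormal = np.allclose(eigenvectors.T @ eigenvectors, np.eye(5))

print(satisfies_definition, is_orthonormal)
```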


### Step 4: Sort the Principal Components
`np.argsort` sorts in ascending order only; use `[::-1]` to reverse the result.

In [None]:
# np.argsort sorts in ascending order only; use [::-1] to reverse the result
order_of_importance = np.argsort(eigenvalues)[::-1]
print('the order of importance is :\n {}'.format(order_of_importance))

# use the sort order to reorder the eigenvalues
sorted_eigenvalues = eigenvalues[order_of_importance]
print('\n\n sorted eigen values:\n{}'.format(sorted_eigenvalues))

# eigenvectors are stored as columns, so reorder the columns
sorted_eigenvectors = eigenvectors[:, order_of_importance]
print('\n\n The sorted eigen vector matrix is: \n {}'.format(sorted_eigenvectors))

the order of importance is :
 [0 1 4 2 3]


 sorted eigen values:
[3.80985761e+00 1.73655615e+00 4.04085720e-01 4.94531029e-02
 4.74189469e-05]


 The sorted eigen vector matrix is: 
 [[-0.4640131   0.45182808 -0.03317471 -0.70733581  0.28128049]
 [ 0.45019005  0.48800851 -0.15803498  0.29051532  0.6706731 ]
 [ 0.37929082 -0.55665017 -0.5029143  -0.48462321  0.24186072]
 [-0.4976889   0.03162214 -0.78311558  0.36999674 -0.03373724]
 [ 0.43642295  0.49682965 -0.32822489 -0.20861365 -0.64143906]]


Questions:

**1. Why do we order eigenvalues and eigenvectors?**

We order eigenvalues and eigenvectors to identify the principal components that capture the most variance in the dataset. This helps us prioritize the most significant dimensions for further analysis or dimensionality reduction.

**2. Is it true that we would favor the lowest eigenvalue over the highest? Defend your answer.**


No, it is not true. We prioritize the highest eigenvalue over the lowest one. The highest eigenvalue corresponds to the principal component that captures the most variance in the data, making it more important for analysis.
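
One way to make the link between eigenvalues and variance concrete is to project the standardized data onto each eigenvector and measure the variance of the result. In the sketch below (same data as this notebook, eigenvectors computed with `np.linalg.eigh`), each projected variance matches the corresponding eigenvalue, which is why the largest eigenvalue marks the most informative direction.

```python
import numpy as np

data = np.array([
    [1,  2, -1,  4,  10],
    [3, -3, -3, 12, -15],
    [2,  1, -2,  4,   5],
    [5,  1, -5, 10,   5],
    [2,  3, -3,  5,  12],
    [4,  0, -3, 16,   2],
])
standardized_data = (data - data.mean(axis=0)) / data.std(axis=0)
cov_matrix = np.cov(standardized_data, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)

# Project the data onto every eigenvector (one column per component)
projected = standardized_data @ eigenvectors

# Sample variance along each component (ddof=1 matches np.cov's default)
projected_variances = projected.var(axis=0, ddof=1)

print(np.allclose(projected_variances, eigenvalues))
```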

To see what percentage of the information each eigenvalue holds, print each eigenvalue's share of the total using the formula:



> (sorted eigenvalues / sum of all sorted eigenvalues) * 100



In [None]:
# use sorted_eigenvalues to ensure the explained variances correspond to the eigenvectors

explained_variance = (sorted_eigenvalues / np.sum(sorted_eigenvalues)) * 100
explained_variance = ["{:.2f}%".format(value) for value in explained_variance]
print(explained_variance)

['63.50%', '28.94%', '6.73%', '0.82%', '0.00%']
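
These percentages can also be accumulated to help choose how many components to keep. The sketch below copies the sorted eigenvalues from the output above and picks the smallest `k` whose cumulative share reaches a threshold. The 90% threshold is an assumption chosen for illustration, not a rule from this notebook.

```python
import numpy as np

# Sorted eigenvalues copied from the printed output above
sorted_eigenvalues = np.array([
    3.80985761e+00, 1.73655615e+00, 4.04085720e-01,
    4.94531029e-02, 4.74189469e-05,
])

# Share of the total variance per component, then the running total
explained_ratio = sorted_eigenvalues / sorted_eigenvalues.sum()
cumulative_ratio = np.cumsum(explained_ratio)

threshold = 0.90  # hypothetical target share of variance to retain

# Index of the first component at which the running total reaches the threshold
k = int(np.argmax(cumulative_ratio >= threshold)) + 1

print("cumulative share:", np.round(cumulative_ratio, 4))
print("components needed:", k)
```

With this threshold the first two components suffice, which is consistent with the choice of k = 2 used below.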


Initialize the number of principal components `k`, then perform a matrix multiplication between the standardized data and the first `k` columns of the sorted eigenvector matrix. For example, `k = 3` keeps 3 principal components.




> The resulting matrix (with reduced data) = standardized data * the first k columns of the sorted eigenvector matrix

See expected output for k = 2



In [None]:
k = 2  # select the number of principal components

reduced_data = np.matmul(standardized_data, sorted_eigenvectors[:, :k])

In [None]:
print(reduced_data)

[[ 2.3577116  -0.75728867]
 [-2.27171739 -1.81970663]
 [ 1.21259114 -0.50390931]
 [-1.41935914  1.9229856 ]
 [ 1.61562536  0.87541857]
 [-1.49485157  0.28250044]]
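
As a cross-check, the same projection can be compared with scikit-learn's `PCA`, which this notebook imports. This is a sketch that assumes scikit-learn is installed. The sign of each component is arbitrary, so the comparison uses absolute values, and the manual side uses `np.linalg.eigh` instead of `np.linalg.eig`.

```python
import numpy as np
from sklearn.decomposition import PCA

data = np.array([
    [1,  2, -1,  4,  10],
    [3, -3, -3, 12, -15],
    [2,  1, -2,  4,   5],
    [5,  1, -5, 10,   5],
    [2,  3, -3,  5,  12],
    [4,  0, -3, 16,   2],
])
standardized_data = (data - data.mean(axis=0)) / data.std(axis=0)

# Manual PCA: eigendecomposition of the covariance matrix, keep the top 2 components
eigenvalues, eigenvectors = np.linalg.eigh(np.cov(standardized_data, rowvar=False))
order = np.argsort(eigenvalues)[::-1]
manual_reduced = standardized_data @ eigenvectors[:, order[:2]]

# scikit-learn PCA on the same standardized data
pca = PCA(n_components=2)
sklearn_reduced = pca.fit_transform(standardized_data)

# Columns may differ by a factor of -1, so compare magnitudes
print(np.allclose(np.abs(manual_reduced), np.abs(sklearn_reduced)))
print(pca.explained_variance_ratio_)
```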


In [None]:
print(reduced_data.shape)

(6, 2)
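
Reducing from 5 columns to 2 discards some information. One way to quantify this, sketched below with the same data, is to map the reduced data back to the original feature space and measure the reconstruction error. The names `W`, `reconstructed`, and `squared_error` are illustrative.

```python
import numpy as np

data = np.array([
    [1,  2, -1,  4,  10],
    [3, -3, -3, 12, -15],
    [2,  1, -2,  4,   5],
    [5,  1, -5, 10,   5],
    [2,  3, -3,  5,  12],
    [4,  0, -3, 16,   2],
])
standardized_data = (data - data.mean(axis=0)) / data.std(axis=0)
n_samples = standardized_data.shape[0]

eigenvalues, eigenvectors = np.linalg.eigh(np.cov(standardized_data, rowvar=False))
order = np.argsort(eigenvalues)[::-1]

k = 2
W = eigenvectors[:, order[:k]]             # top-k eigenvectors as columns

reduced_data = standardized_data @ W       # shape (6, 2)
reconstructed = reduced_data @ W.T         # back to shape (6, 5)

# Total squared error between the standardized data and its reconstruction
squared_error = np.sum((standardized_data - reconstructed) ** 2)

# It equals (n - 1) times the sum of the discarded eigenvalues
discarded = eigenvalues[order[k:]].sum()
print(np.isclose(squared_error, (n_samples - 1) * discarded))

# Fraction of the total variance lost by keeping only k components
print(round(discarded / eigenvalues.sum(), 4))
```

The lost fraction is the complement of the cumulative explained variance of the kept components.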


# *What are 2 positive effects and 2 negative effects of PCA?*

**Positive Effects (Benefits) of PCA:**

A- Dimensionality Reduction: PCA helps in reducing the number of features (dimensions) in a dataset while retaining most of the important information. This reduction in dimensionality can lead to simpler models and faster training times, and it can improve performance when the discarded components mostly contain noise.

B- Feature Extraction: PCA extracts underlying patterns and relationships between features in high-dimensional data. It identifies the directions (principal components) along which the data varies the most, thereby providing a compact representation of the dataset.

**Negative Effects (Limitations) of PCA:**

A- Loss of Interpretability: after applying PCA, the original features are transformed into new dimensions (principal components) that may not have direct physical or intuitive meanings. This loss of interpretability can make it challenging to explain the results of the analysis.

B- Sensitivity to Outliers: PCA is sensitive to outliers in the data. Outliers can disproportionately influence the covariance matrix and the calculation of principal components, leading to potentially biased results. Preprocessing steps like outlier detection and removal may be necessary to mitigate this issue.
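
The outlier sensitivity mentioned above can be illustrated with a small synthetic example. The points and the outlier below are made up for illustration, and `first_component` is a hypothetical helper. The clean points vary almost entirely along the x-axis, yet a single extreme point is enough to turn the first principal component toward the y-axis.

```python
import numpy as np

def first_component(points):
    """Eigenvector of the covariance matrix with the largest eigenvalue."""
    eigenvalues, eigenvectors = np.linalg.eigh(np.cov(points, rowvar=False))
    return eigenvectors[:, -1]  # eigh sorts ascending, so the last column is the largest

# Synthetic points spread along x with only a tiny alternating wiggle in y
x = np.linspace(-5, 5, 50)
y = 0.1 * np.where(np.arange(50) % 2 == 0, 1.0, -1.0)
clean = np.column_stack([x, y])

# The same points plus one extreme outlier far away in y
with_outlier = np.vstack([clean, [0.0, 100.0]])

pc_clean = first_component(clean)
pc_outlier = first_component(with_outlier)

print("first component, clean data:  ", np.round(np.abs(pc_clean), 3))
print("first component, with outlier:", np.round(np.abs(pc_outlier), 3))
```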


**© Mohamed Ahmed Yasin**