

<center>
    <img src="https://miro.medium.com/v2/resize:fit:300/1*mgncZaKaVx9U6OCQu_m8Bg.jpeg">
</center>



The goal of PCA is to reduce the number of features in a dataset while retaining as much of its information as possible. It does this by identifying how the existing features relate to one another. The crux of the algorithm is to find new directions, called principal components, that summarize the relationships between the existing features, and then to quantify how relevant each principal component is. The principal components are used to transform high-dimensional data into lower-dimensional data while preserving as much information as possible. For a principal component to be relevant, it needs to capture information (variance) about the features. We can determine the relationships between features using covariance.

In [6]:
# import the necessary package
import numpy as np


In [7]:

data = np.array([
    [   1,   2,  -1,   4,  10],
    [   3,  -3,  -3,  12, -15],
    [   2,   1,  -2,   4,   5],
    [   5,   1,  -5,  10,   5],
    [   2,   3,  -3,   5,  12],
    [   4,   0,  -3,  16,   2],
])

### Step 1: Standardize the Data along the Features

![image.png](https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcQLxe5VYCBsaZddkkTZlCY24Yov4JJD4-ArTA&usqp=CAU)




Explain why we need to handle the data on the same scale.

**In PCA, it is essential to standardize data because PCA identifies the directions of maximum variance, which can be biased if features are on different scales. Without standardization, features with larger magnitudes will dominate the principal components, leading to skewed results. Standardization ensures that all features contribute equally by rescaling them to have a mean of 0 and a standard deviation of 1, allowing PCA to capture the true underlying patterns and relationships in the data, rather than being influenced by the differences in scale.**
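A minimal sketch of the scale effect described above, assuming hypothetical synthetic data (the `height_m` and `income_usd` columns and the `first_component` helper are illustrative names, not part of this notebook). It compares the first principal component before and after standardization:

```python
import numpy as np

# Hypothetical two-feature data on very different scales (illustrative only).
rng = np.random.default_rng(0)
height_m = rng.normal(1.7, 0.1, size=200)          # spread of about 0.1
income_usd = rng.normal(50_000, 10_000, size=200)  # spread of about 10,000
X = np.column_stack([height_m, income_usd])


def first_component(samples):
    """Return the unit eigenvector of the covariance matrix with the largest eigenvalue."""
    eigenvalues, eigenvectors = np.linalg.eigh(np.cov(samples, rowvar=False))
    return eigenvectors[:, np.argmax(eigenvalues)]


raw_pc1 = first_component(X)
scaled_pc1 = first_component((X - X.mean(axis=0)) / X.std(axis=0))

# Absolute values are printed because the sign of an eigenvector is arbitrary.
print("PC1 on raw data:         ", np.abs(raw_pc1))
print("PC1 on standardized data:", np.abs(scaled_pc1))
```

On the raw data the first component points almost entirely along the income axis, simply because that feature has the larger numeric spread. After standardization both features receive equal weight.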

In [8]:
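# standardize each feature (column): subtract its mean, then divide by its standard deviation
mean = np.mean(data, axis=0)
std = np.std(data, axis=0)  # population standard deviation (NumPy default, ddof=0)
standardized_data = (data - mean) / std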

![cov matrix.webp](https://dmitry.ai/uploads/default/original/1X/9bd2851674ebb55e404cc3ff5e2ffe65b42ff460.png)

We use the pairwise covariance of the different features to determine how they relate to each other. With these covariances, our goal is to group features that show similar patterns. Intuitively, two features are related if they have similar covariances with the other features.
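As a small illustration of what one pairwise covariance measures, the sketch below computes the sample covariance of the first and third features of `data` by hand and compares it with `np.cov`. The variable names `x`, `y`, and `manual_cov` are illustrative assumptions.

```python
import numpy as np

data = np.array([
    [1,  2, -1,  4,  10],
    [3, -3, -3, 12, -15],
    [2,  1, -2,  4,   5],
    [5,  1, -5, 10,   5],
    [2,  3, -3,  5,  12],
    [4,  0, -3, 16,   2],
])

x = data[:, 0]  # first feature
y = data[:, 2]  # third feature

# Sample covariance: average product of deviations from the mean, divided by n - 1.
n = len(x)
manual_cov = np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)

print("manual covariance:", manual_cov)
print("np.cov covariance:", np.cov(x, y)[0, 1])
```

The value is negative because the third feature tends to decrease when the first one increases.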

### Step 2: Calculate the Covariance Matrix



In [9]:
# rowvar=False: each column is a feature and each row is an observation
cov_matrix = np.cov(standardized_data, rowvar=False)

print(cov_matrix)

[[ 1.2        -0.42098785 -1.0835838   0.90219291 -0.37000528]
 [-0.42098785  1.2         0.20397003 -0.77149364  1.18751836]
 [-1.0835838   0.20397003  1.2        -0.59947269  0.22208218]
 [ 0.90219291 -0.77149364 -0.59947269  1.2        -0.70017993]
 [-0.37000528  1.18751836  0.22208218 -0.70017993  1.2       ]]
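The diagonal entries are 1.2 rather than 1 because of a mismatch in defaults: `np.std` divides by n (`ddof=0`), while `np.cov` divides by n - 1 (`ddof=1`), so every variance comes out as n / (n - 1) = 6 / 5 = 1.2. The sketch below, with the illustrative names `z_population` and `z_sample`, shows both conventions side by side. The constant factor rescales all eigenvalues equally, so it does not change the eigenvectors or the explained-variance percentages.

```python
import numpy as np

data = np.array([
    [1,  2, -1,  4,  10],
    [3, -3, -3, 12, -15],
    [2,  1, -2,  4,   5],
    [5,  1, -5, 10,   5],
    [2,  3, -3,  5,  12],
    [4,  0, -3, 16,   2],
])

centered = data - data.mean(axis=0)
n = data.shape[0]

# Population standard deviation (ddof=0), as used in the standardization cell.
z_population = centered / data.std(axis=0)
# Sample standard deviation (ddof=1), matching the default divisor of np.cov.
z_sample = centered / data.std(axis=0, ddof=1)

print(np.diag(np.cov(z_population, rowvar=False)))  # every entry is n / (n - 1)
print(np.diag(np.cov(z_sample, rowvar=False)))      # every entry is 1
print(n / (n - 1))
```

With `ddof=1` in both places, the covariance matrix of the standardized data is exactly the correlation matrix of the original data.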


### Step 3: Eigendecomposition on the Covariance Matrix


In [11]:
eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)
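A short sketch of the properties this step relies on, rebuilt from `data` so that it runs on its own (the variable names `standardized` and `cov` are illustrative). `np.linalg.eigh` is intended for symmetric matrices such as a covariance matrix, returns eigenvalues in ascending order, and returns the eigenvectors as columns.

```python
import numpy as np

data = np.array([
    [1,  2, -1,  4,  10],
    [3, -3, -3, 12, -15],
    [2,  1, -2,  4,   5],
    [5,  1, -5, 10,   5],
    [2,  3, -3,  5,  12],
    [4,  0, -3, 16,   2],
])

standardized = (data - data.mean(axis=0)) / data.std(axis=0)
cov = np.cov(standardized, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# Each column v satisfies cov @ v = lambda * v.
print(np.allclose(cov @ eigenvectors, eigenvectors * eigenvalues))
# The eigenvectors are orthonormal: V^T V is the identity.
print(np.allclose(eigenvectors.T @ eigenvectors, np.eye(cov.shape[0])))
# eigh returns the eigenvalues from smallest to largest.
print(np.all(np.diff(eigenvalues) >= 0))
# The eigenvalues sum to the trace, i.e. the total variance.
print(np.isclose(eigenvalues.sum(), np.trace(cov)))
```

Because the eigenvalues arrive in ascending order, they still need to be reversed before the largest components can be selected.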

### Step 4: Sort the Principal Components

In [12]:
# np.argsort returns indices in ascending order; [::-1] reverses them to descending

order_of_importance = np.argsort(eigenvalues)[::-1]
print('the order of importance is :\n {}'.format(order_of_importance))

# use the sort order to reorder the eigenvalues and the eigenvector columns
sorted_eigenvalues = eigenvalues[order_of_importance]

print('\n\n sorted eigen values:\n{}'.format(sorted_eigenvalues))
sorted_eigenvectors = eigenvectors[:, order_of_importance]  # sort the columns
print('\n\n The sorted eigen vector matrix is: \n {}'.format(sorted_eigenvectors))

the order of importance is :
 [4 3 2 1 0]


 sorted eigen values:
[3.80985761e+00 1.73655615e+00 4.04085720e-01 4.94531029e-02
 4.74189469e-05]


 The sorted eigen vector matrix is: 
 [[-0.4640131  -0.45182808 -0.03317471  0.70733581  0.28128049]
 [ 0.45019005 -0.48800851 -0.15803498 -0.29051532  0.6706731 ]
 [ 0.37929082  0.55665017 -0.5029143   0.48462321  0.24186072]
 [-0.4976889  -0.03162214 -0.78311558 -0.36999674 -0.03373724]
 [ 0.43642295 -0.49682965 -0.32822489  0.20861365 -0.64143906]]


Questions:

1. Why do we order eigenvalues and eigenvectors?

 We order them so that the principal components are ranked by the amount of variance they explain. Each eigenvalue measures the variance captured along its eigenvector, so sorting in descending order puts the most informative components first and makes it straightforward to keep the top k and discard the rest.

2. Is it true we would consider the lowest eigenvalue compared to the highest? Defend your answer.

  Yes, in some applications the lowest eigenvalue matters more than the highest. In stability analysis of dynamical systems, the smallest eigenvalue can reveal critical points of stability or instability. In optimization, the lowest eigenvalue of the Hessian matrix describes the curvature along the flattest direction: a value near zero suggests a flat region, and a negative value indicates a saddle point or a local maximum. In PCA, however, the largest eigenvalues mark the directions of greatest variance, so those are the components we keep, and the components with the smallest eigenvalues are the first to be discarded. Which end of the spectrum matters therefore depends on the application.


You want to see what percentage of the information (variance) each eigenvalue holds. Print the percentage for each eigenvalue using the formula

> (sorted eigenvalues / sum of all sorted eigenvalues) * 100



In [13]:
# use sorted_eigenvalues to ensure the explained variances correspond to the eigenvectors

explained_variance = (sorted_eigenvalues / np.sum(sorted_eigenvalues)) * 100
explained_variance = ["{:.2f}%".format(value) for value in explained_variance]
print(explained_variance)

['63.50%', '28.94%', '6.73%', '0.82%', '0.00%']
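One common way to use these percentages is to accumulate them and keep the smallest number of components that reaches a target. The sketch below assumes a hypothetical 90% target (the `threshold` value is an illustration, not something this exercise prescribes) and reuses the sorted eigenvalues printed earlier.

```python
import numpy as np

# Sorted eigenvalues copied from the printed output of Step 4.
sorted_eigenvalues = np.array([
    3.80985761e+00, 1.73655615e+00, 4.04085720e-01,
    4.94531029e-02, 4.74189469e-05,
])

explained_ratio = sorted_eigenvalues / sorted_eigenvalues.sum()
cumulative = np.cumsum(explained_ratio)  # running total of explained variance

threshold = 0.90  # hypothetical target: keep at least 90% of the variance
# Index of the first cumulative value that reaches the threshold, plus one.
k = int(np.searchsorted(cumulative, threshold)) + 1

print("cumulative explained variance:", np.round(cumulative * 100, 2))
print("components needed:", k)  # → 2
```

The first two components together explain about 92.4% of the variance, which is why k = 2 already clears a 90% target.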


Initialize the number of principal components, k, then perform a matrix multiplication between the standardized data and the first k columns of the sorted eigenvector matrix. For example, k = 3 keeps 3 principal components.

> The resulting matrix (with reduced data) = standardized data * first k columns of the sorted eigenvector matrix

See expected output for k = 2



In [15]:
k = 2 # select the number of principal components

reduced_data = np.matmul(standardized_data, sorted_eigenvectors[:,:k])
reduced_data = np.round(reduced_data, decimals=2)

In [None]:
print(reduced_data)

[[ 1.07127878 -0.6983307 ]
 [-3.49014682 -0.59870297]
 [ 0.24003422 -0.48244534]
 [-0.14516166 -1.61189378]
 [ 1.34022572 -1.07434063]
 [-0.96573453 -2.23341502]]


In [16]:
print(reduced_data.shape)

(6, 2)
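Two properties can serve as a sanity check on a projection computed this way. Because the standardized data has zero column means, every projected column also averages to zero, and the sample variance of each projected column equals the corresponding eigenvalue. The sketch below rebuilds the pipeline from `data` without rounding; `standardized` and `reduced` are illustrative names.

```python
import numpy as np

data = np.array([
    [1,  2, -1,  4,  10],
    [3, -3, -3, 12, -15],
    [2,  1, -2,  4,   5],
    [5,  1, -5, 10,   5],
    [2,  3, -3,  5,  12],
    [4,  0, -3, 16,   2],
])

standardized = (data - data.mean(axis=0)) / data.std(axis=0)
eigenvalues, eigenvectors = np.linalg.eigh(np.cov(standardized, rowvar=False))

order = np.argsort(eigenvalues)[::-1]
sorted_eigenvalues = eigenvalues[order]
sorted_eigenvectors = eigenvectors[:, order]

k = 2
reduced = standardized @ sorted_eigenvectors[:, :k]

# Check 1: projections of centered data are themselves centered.
print("column means:", np.round(reduced.mean(axis=0), 10))
# Check 2: the variance along each component equals its eigenvalue.
print("column variances:", reduced.var(axis=0, ddof=1))
print("top eigenvalues: ", sorted_eigenvalues[:k])
```

The sign of each eigenvector is arbitrary, so a whole column of the projection may come out negated on another machine or library version without changing either check. Projected columns that do not average to zero would suggest the data was not centered before the multiplication.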


# What are 2 positive effects and 2 negative effects of PCA?

2 Positive Effects (Benefits) of PCA:

1. Dimensionality Reduction: PCA reduces the number of features, making it easier to visualize and process large datasets without losing too much important information.
2. Noise Reduction: By keeping only the most important components, PCA can filter out noise and irrelevant information, which can improve the performance of machine learning models.

2 Negative Effects (Limitations) of PCA:

1. Loss of Interpretability: The new principal components are linear combinations of the original features, making it harder to interpret the transformed data in terms of the original variables.
2. Assumption of Linearity: PCA assumes that the data can be represented well in a linear fashion, which may not hold for complex, non-linear datasets.
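The trade-off between fewer dimensions and lost information can be made concrete by mapping the reduced data back to the original feature space and measuring what is missing. The sketch below is one way to do this under the same setup as the steps above; the names `components`, `reconstructed`, and `errors` are illustrative.

```python
import numpy as np

data = np.array([
    [1,  2, -1,  4,  10],
    [3, -3, -3, 12, -15],
    [2,  1, -2,  4,   5],
    [5,  1, -5, 10,   5],
    [2,  3, -3,  5,  12],
    [4,  0, -3, 16,   2],
])

standardized = (data - data.mean(axis=0)) / data.std(axis=0)
eigenvalues, eigenvectors = np.linalg.eigh(np.cov(standardized, rowvar=False))
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

n_samples, n_features = standardized.shape
errors = []
for k in range(1, n_features + 1):
    components = eigenvectors[:, :k]
    # Project onto k components, then map back to the original feature space.
    reconstructed = standardized @ components @ components.T
    # Total squared error; equals (n_samples - 1) * sum of the discarded eigenvalues.
    errors.append(np.sum((standardized - reconstructed) ** 2))
    print("k =", k, "squared reconstruction error =", round(errors[-1], 4))
```

The error shrinks as k grows and vanishes when all five components are kept, since nothing has been discarded at that point.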