

<center>
    <img src="https://miro.medium.com/v2/resize:fit:300/1*mgncZaKaVx9U6OCQu_m8Bg.jpeg">
</center>



The goal of PCA is to reduce the number of features in a dataset while retaining as much information as possible, by identifying how the existing features relate to one another. The crux of the algorithm is to find new axes, called principal components, that are combinations of the existing features, and then to quantify how relevant each principal component is. The principal components are used to transform high-dimensional data into lower-dimensional data while preserving as much information as possible. For a principal component to be relevant, it needs to capture information (variance) about the features. We can determine the relationships between features using covariance.

In [2]:
# import the necessary packages
import numpy as np
from sklearn.preprocessing import StandardScaler  # StandardScaler was tried first for scaling; the standardization below is done manually with NumPy instead


In [3]:

data = np.array([
    [   1,   2,  -1,   4,  10],
    [   3,  -3,  -3,  12, -15],
    [   2,   1,  -2,   4,   5],
    [   5,   1,  -5,  10,   5],
    [   2,   3,  -3,   5,  12],
    [   4,   0,  -3,  16,   2],
])

### Step 1: Standardize the Data along the Features

![image.png](https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcQLxe5VYCBsaZddkkTZlCY24Yov4JJD4-ArTA&usqp=CAU)




Explain why we need to handle the data on the same scale.

1.   **Common Scale or Equal Contribution:** Features can have different units and scales, so scaling ensures that each feature contributes equally. After standardization each feature has zero mean and unit variance, so no single feature disproportionately affects the results.
2. **Covariance Structure:** PCA relies on the covariance matrix to understand the relationships between features. If the data is not on the same scale, features with larger variance or magnitude will dominate the covariance matrix, leading to biased principal components.
3. **Numerical Stability:** Standardizing keeps all features at comparable magnitudes, which makes the computations in PCA more stable and less prone to numerical errors.
4. **Improved Interpretation:** Features are easier to interpret when they are on the same scale, and it becomes easier to compare the influence of each feature on the model.
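
To make the "Covariance Structure" point concrete, here is a minimal sketch that runs the eigendecomposition on the *unscaled* data from this notebook. The variable names (`raw_variances`, `first_component`, `dominant_feature`) are only illustrative, and `np.linalg.eigh` is used here as one reasonable choice for a symmetric matrix.

```python
import numpy as np

# Same toy dataset as in this notebook (6 samples, 5 features).
data = np.array([
    [1,  2, -1,  4,  10],
    [3, -3, -3, 12, -15],
    [2,  1, -2,  4,   5],
    [5,  1, -5, 10,   5],
    [2,  3, -3,  5,  12],
    [4,  0, -3, 16,   2],
])

# Variance of each raw feature: the last column is much larger than the rest.
raw_variances = np.var(data, axis=0, ddof=1)

# Leading eigenvector of the covariance matrix of the unscaled data.
raw_cov = np.cov(data, rowvar=False)
raw_eigenvalues, raw_eigenvectors = np.linalg.eigh(raw_cov)  # ascending order
first_component = raw_eigenvectors[:, -1]  # eigenvector of the largest eigenvalue

# Index of the feature with the largest absolute weight in that component.
dominant_feature = int(np.argmax(np.abs(first_component)))

print("Raw feature variances:", np.round(raw_variances, 2))
print("First component (unscaled):", np.round(first_component, 3))
print("Dominant feature index:", dominant_feature)
```

Without scaling, the first component is weighted most heavily toward the feature with the largest raw variance, which is the bias that standardization is meant to remove.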

In [6]:


mean = np.mean(data, axis=0)
std_dev = np.std(data, axis=0)

# using the formula z = (xi - μ) / σ
standardized_data = (data - mean) / std_dev

print("Standardized Data:\n", standardized_data)

Standardized Data:
 [[-1.36438208  0.70710678  1.5109662  -0.99186978  0.77802924]
 [ 0.12403473 -1.94454365 -0.13736056  0.77145428 -2.06841919]
 [-0.62017367  0.1767767   0.68680282 -0.99186978  0.20873955]
 [ 1.61245155  0.1767767  -1.78568733  0.33062326  0.20873955]
 [-0.62017367  1.23743687 -0.13736056 -0.77145428  1.00574511]
 [ 0.86824314 -0.35355339 -0.13736056  1.65311631 -0.13283426]]
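
As a quick sanity check on the output above, the sketch below recomputes the standardized data and confirms that every column has (numerically) zero mean and unit standard deviation. It assumes the same population standard deviation (`ddof=0`, the `np.std` default) used in the cell above.

```python
import numpy as np

# Same toy dataset as in this notebook.
data = np.array([
    [1,  2, -1,  4,  10],
    [3, -3, -3, 12, -15],
    [2,  1, -2,  4,   5],
    [5,  1, -5, 10,   5],
    [2,  3, -3,  5,  12],
    [4,  0, -3, 16,   2],
])

# z = (x - mean) / std, computed per feature (column).
standardized = (data - data.mean(axis=0)) / data.std(axis=0)

# After standardization each column should have mean 0 and std 1.
col_means = standardized.mean(axis=0)
col_stds = standardized.std(axis=0)

print("Means close to 0:", np.allclose(col_means, 0.0))
print("Stds close to 1: ", np.allclose(col_stds, 1.0))
```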


![cov matrix.webp](https://dmitry.ai/uploads/default/original/1X/9bd2851674ebb55e404cc3ff5e2ffe65b42ff460.png)

We use the pairwise covariance of the different features to determine how they relate to each other. With these covariances, our goal is to group features that show similar patterns. Intuitively, two features are related if they have similar covariances with the other features.
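
For reference, the sample covariance between two features $X$ and $Y$ over $n$ samples is usually written as follows. The $n - 1$ denominator is the convention that `np.cov` uses by default; $\bar{x}$ and $\bar{y}$ denote the feature means.

$$
\operatorname{cov}(X, Y) = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})
$$

The covariance matrix collects this value for every pair of features, so its diagonal holds each feature's variance.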

### Step 2: Calculate the Covariance Matrix



In [7]:

cov_matrix = np.cov(standardized_data, rowvar=False)

print("Covariance Matrix:\n", cov_matrix)

Covariance Matrix:
 [[ 1.2        -0.42098785 -1.0835838   0.90219291 -0.37000528]
 [-0.42098785  1.2         0.20397003 -0.77149364  1.18751836]
 [-1.0835838   0.20397003  1.2        -0.59947269  0.22208218]
 [ 0.90219291 -0.77149364 -0.59947269  1.2        -0.70017993]
 [-0.37000528  1.18751836  0.22208218 -0.70017993  1.2       ]]
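
The diagonal of the covariance matrix above is 1.2 rather than 1. A likely explanation is a mismatch in conventions: `np.std` divides by `n` by default, while `np.cov` divides by `n - 1`, so each variance is scaled by `n / (n - 1) = 6 / 5`. The sketch below illustrates this under that assumption; `cov_sample` and `cov_population` are illustrative names.

```python
import numpy as np

# Same toy dataset as in this notebook.
data = np.array([
    [1,  2, -1,  4,  10],
    [3, -3, -3, 12, -15],
    [2,  1, -2,  4,   5],
    [5,  1, -5, 10,   5],
    [2,  3, -3,  5,  12],
    [4,  0, -3, 16,   2],
])
n_samples = data.shape[0]

# Standardize with the population standard deviation (ddof=0, np.std default).
standardized = (data - data.mean(axis=0)) / data.std(axis=0)

# np.cov divides by n - 1 by default, so the diagonal becomes n / (n - 1).
cov_sample = np.cov(standardized, rowvar=False)

# With bias=True np.cov divides by n, so the diagonal is exactly 1.
cov_population = np.cov(standardized, rowvar=False, bias=True)

print("Diagonal with default np.cov:", np.diag(cov_sample))
print("Diagonal with bias=True:     ", np.diag(cov_population))
print("Expected ratio n / (n - 1):  ", n_samples / (n_samples - 1))
```

Both matrices differ only by a constant factor, so they share the same eigenvectors and the same explained-variance percentages. Only the absolute eigenvalues change.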


### Step 3: Eigendecomposition on the Covariance Matrix


In [8]:

eigenvalues, eigenvectors = np.linalg.eig(cov_matrix)

print("Eigenvalues:\n", eigenvalues)
print("Eigenvectors:\n", eigenvectors)


Eigenvalues:
 [3.80985761e+00 1.73655615e+00 4.94531029e-02 4.74189469e-05
 4.04085720e-01]
Eigenvectors:
 [[-0.4640131   0.45182808 -0.70733581  0.28128049 -0.03317471]
 [ 0.45019005  0.48800851  0.29051532  0.6706731  -0.15803498]
 [ 0.37929082 -0.55665017 -0.48462321  0.24186072 -0.5029143 ]
 [-0.4976889   0.03162214  0.36999674 -0.03373724 -0.78311558]
 [ 0.43642295  0.49682965 -0.20861365 -0.64143906 -0.32822489]]
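
Because a covariance matrix is symmetric, `np.linalg.eigh` is an alternative worth considering: it is designed for symmetric matrices, always returns real values, and returns eigenvalues in ascending order. The sketch below is not the method used in this notebook. It shows the alternative and checks the defining property `C v = λ v` for every eigenpair.

```python
import numpy as np

# Same toy dataset as in this notebook.
data = np.array([
    [1,  2, -1,  4,  10],
    [3, -3, -3, 12, -15],
    [2,  1, -2,  4,   5],
    [5,  1, -5, 10,   5],
    [2,  3, -3,  5,  12],
    [4,  0, -3, 16,   2],
])
standardized = (data - data.mean(axis=0)) / data.std(axis=0)
cov_matrix = np.cov(standardized, rowvar=False)

# eigh assumes a symmetric matrix and returns eigenvalues in ascending order.
eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)

# Each column v_j of `eigenvectors` should satisfy C v_j = lambda_j v_j.
# Broadcasting scales column j of `eigenvectors` by eigenvalues[j].
residual = cov_matrix @ eigenvectors - eigenvectors * eigenvalues

# The eigenvectors of a symmetric matrix form an orthonormal basis.
gram = eigenvectors.T @ eigenvectors

print("Eigenvalues (ascending):", eigenvalues)
print("C v = lambda v holds:", np.allclose(residual, 0.0))
print("Eigenvectors orthonormal:", np.allclose(gram, np.eye(5)))
```

The signs of individual eigenvectors may differ from the `np.linalg.eig` output, since an eigenvector multiplied by -1 is still a valid eigenvector.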


### Step 4: Sort the Principal Components

In [9]:
# np.argsort can only provide lowest to highest; use [::-1] to reverse the list

order_of_importance = np.argsort(eigenvalues)[::-1]
print('the order of importance is :\n {}'.format(order_of_importance))

# utilize the sort order to sort eigenvalues and eigenvectors
sorted_eigenvalues = eigenvalues[order_of_importance]

print('\n\n sorted eigen values:\n{}'.format(sorted_eigenvalues))
sorted_eigenvectors = eigenvectors[:, order_of_importance] # sort the columns
print('\n\n The sorted eigen vector matrix is: \n {}'.format(sorted_eigenvectors))

the order of importance is :
 [0 1 4 2 3]


 sorted eigen values:
[3.80985761e+00 1.73655615e+00 4.04085720e-01 4.94531029e-02
 4.74189469e-05]


 The sorted eigen vector matrix is: 
 [[-0.4640131   0.45182808 -0.03317471 -0.70733581  0.28128049]
 [ 0.45019005  0.48800851 -0.15803498  0.29051532  0.6706731 ]
 [ 0.37929082 -0.55665017 -0.5029143  -0.48462321  0.24186072]
 [-0.4976889   0.03162214 -0.78311558  0.36999674 -0.03373724]
 [ 0.43642295  0.49682965 -0.32822489 -0.20861365 -0.64143906]]


Question:

1. Why do we order eigenvalues and eigenvectors?

**We order the eigenvalues and eigenvectors because each eigenvalue measures how much variance its principal component captures. Sorting the eigenvalues in descending order gives priority to the principal components that explain the most variance in the dataset, so the components we keep retain the most critical information. The eigenvectors are reordered with the same index so that each one stays paired with its eigenvalue.**



2. Is it true that we would prefer the lowest eigenvalue over the highest? Defend your answer.

**It is false. In PCA we focus on the highest eigenvalues because they correspond to the components that capture the most variance and the most important patterns in the data. Prioritizing the highest eigenvalues gives effective dimensionality reduction and noise reduction while preserving the critical information in the dataset.**


You want to see what percentage of information each eigenvalue holds. Print the percentage for each eigenvalue using the formula



> (sorted eigenvalues / sum of all sorted eigenvalues) * 100



In [10]:
# use sorted_eigenvalues to ensure the explained variances correspond to the eigenvectors
explained_variance = (sorted_eigenvalues / np.sum(sorted_eigenvalues)) * 100
# format each percentage to two decimal places for display
explained_variance = ["{:.2f}%".format(value) for value in explained_variance]
print(explained_variance)

['63.50%', '28.94%', '6.73%', '0.82%', '0.00%']
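
One common way to use these percentages is to accumulate them and keep the smallest number of components that reaches a chosen threshold. The sketch below uses the sorted eigenvalues printed earlier; the 90% threshold is an arbitrary assumption for illustration, not a value prescribed by this notebook.

```python
import numpy as np

# Sorted eigenvalues copied from the output earlier in this notebook.
sorted_eigenvalues = np.array([
    3.80985761e+00, 1.73655615e+00, 4.04085720e-01,
    4.94531029e-02, 4.74189469e-05,
])

# Fraction of the total variance explained by each component.
explained_ratio = sorted_eigenvalues / sorted_eigenvalues.sum()

# Running total: variance explained by the first 1, 2, 3, ... components.
cumulative_ratio = np.cumsum(explained_ratio)

# Hypothetical threshold: keep enough components to explain at least 90%.
threshold = 0.90
k = int(np.argmax(cumulative_ratio >= threshold)) + 1

print("Cumulative explained variance:", np.round(cumulative_ratio * 100, 2))
print("Smallest k reaching {:.0%}: {}".format(threshold, k))  # k = 2 for this data
```

With this data the first two components already explain about 92% of the variance, which is consistent with choosing `k = 2`.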


Initialize the number of principal components `k`, then perform matrix multiplication with the first `k` sorted eigenvectors. For example, `k = 3` keeps 3 principal components.




> The resulting matrix (with reduced data) = standardized data * matrix whose columns are the first k sorted eigenvectors

See expected output for k = 2



In [11]:
k = 2  # number of principal components to keep

top_k_eigenvectors = sorted_eigenvectors[:, :k]

# project the standardized data onto the top k eigenvectors
reduced_data = np.matmul(standardized_data, top_k_eigenvectors)

In [12]:
print(reduced_data)

[[ 2.3577116  -0.75728867]
 [-2.27171739 -1.81970663]
 [ 1.21259114 -0.50390931]
 [-1.41935914  1.9229856 ]
 [ 1.61562536  0.87541857]
 [-1.49485157  0.28250044]]


In [13]:
print(reduced_data.shape)

(6, 2)
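
A useful property to check on the reduced data is that its columns are uncorrelated and that their variances equal the top `k` eigenvalues. The sketch below recomputes the whole pipeline so it can run on its own. It uses `np.linalg.eigh` instead of `np.linalg.eig`, which is an assumption of this example and may flip the sign of some columns without affecting the check.

```python
import numpy as np

# Same toy dataset as in this notebook.
data = np.array([
    [1,  2, -1,  4,  10],
    [3, -3, -3, 12, -15],
    [2,  1, -2,  4,   5],
    [5,  1, -5, 10,   5],
    [2,  3, -3,  5,  12],
    [4,  0, -3, 16,   2],
])
standardized = (data - data.mean(axis=0)) / data.std(axis=0)
cov_matrix = np.cov(standardized, rowvar=False)

# Eigendecomposition, then sort eigenpairs from largest to smallest eigenvalue.
eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)
order = np.argsort(eigenvalues)[::-1]
sorted_eigenvalues = eigenvalues[order]
sorted_eigenvectors = eigenvectors[:, order]

# Project onto the top k components.
k = 2
reduced = standardized @ sorted_eigenvectors[:, :k]

# Covariance of the projected data: diagonal, with the top-k eigenvalues on it.
reduced_cov = np.cov(reduced, rowvar=False)

print("Shape of reduced data:", reduced.shape)
print("Covariance of reduced data:\n", np.round(reduced_cov, 6))
print("Top-k eigenvalues:", sorted_eigenvalues[:k])
```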


# What are 2 positive effects and 2 negative effects of PCA?

**Benefits**


1.   Dimensionality reduction: PCA reduces the number of features in a dataset while retaining most of the variation present. This makes the data easier to visualize and interpret. It also reduces storage requirements and speeds up computation, which is beneficial for large datasets.
2.   Noise reduction: Noise often shows up in the dimensions with the least variance. By keeping only the dimensions with the highest variance, PCA reduces the impact of that noise, which can give a cleaner and more meaningful representation of the data.

**Limitations**


1.   Information Loss: Reducing the dimensionality of the data discards some information. The reduced dimensions may not fully capture all the variation present in the original data, so some detail is lost.
2.   Interpretability: The principal components (new dimensions) are combinations of the original features and may not correspond directly to any single one, which makes it challenging to interpret the meaning of each component.
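
The information-loss limitation can be quantified by mapping the reduced data back to the original feature space and measuring the reconstruction error. The sketch below is one way to do this, assuming the same pipeline as this notebook with `k = 2`; names such as `reconstructed` and `lost_fraction` are illustrative.

```python
import numpy as np

# Same toy dataset as in this notebook.
data = np.array([
    [1,  2, -1,  4,  10],
    [3, -3, -3, 12, -15],
    [2,  1, -2,  4,   5],
    [5,  1, -5, 10,   5],
    [2,  3, -3,  5,  12],
    [4,  0, -3, 16,   2],
])
n_samples = data.shape[0]
standardized = (data - data.mean(axis=0)) / data.std(axis=0)
cov_matrix = np.cov(standardized, rowvar=False)

# Sorted eigendecomposition (largest eigenvalue first).
eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)
order = np.argsort(eigenvalues)[::-1]
sorted_eigenvalues = eigenvalues[order]
top_k = eigenvectors[:, order][:, :2]  # k = 2 components

# Project down to k dimensions, then map back to the 5 original dimensions.
reduced = standardized @ top_k
reconstructed = reduced @ top_k.T

# Total squared error equals (n - 1) times the sum of discarded eigenvalues.
squared_error = np.sum((standardized - reconstructed) ** 2)
expected_error = (n_samples - 1) * sorted_eigenvalues[2:].sum()

# Fraction of the total variation that was lost by keeping only k components.
lost_fraction = squared_error / np.sum(standardized ** 2)

print("Squared reconstruction error:", squared_error)
print("Predicted from discarded eigenvalues:", expected_error)
print("Fraction of variation lost: {:.2%}".format(lost_fraction))
```

The lost fraction matches the explained-variance percentages of the discarded components, so the eigenvalues indicate in advance how much detail a given `k` gives up.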





