__PCA (Principal Component Analysis)__

Link: https://builtin.com/data-science/step-step-explanation-principal-component-analysis

Principal Component Analysis, or PCA, is a dimensionality-reduction method that is often used to reduce the dimensionality of large data sets, by transforming a large set of variables into a smaller one that still contains most of the information in the large set.

![](https://user-images.githubusercontent.com/63338657/159118058-f52b161a-ecee-43f3-bbfe-0f12993d6cf3.png)

What are the applications of dimensionality reduction?

It is typically used for visualization, and in cases where a model must be deployed with limited computational power. In that case we keep only the top few principal components produced by PCA and deploy the model on them.

---

__Geometric Intuition of PCA (simple example)__

Best resource: https://stats.stackexchange.com/questions/2691/making-sense-of-principal-component-analysis-eigenvectors-eigenvalues/140579#140579

Link: https://youtu.be/FgakZw6K1QQ

![](https://user-images.githubusercontent.com/63338657/159118689-3377bd05-b90e-4177-bf73-6d8770929a19.png)

![](https://user-images.githubusercontent.com/63338657/159118793-516f85f7-a430-42b1-9c0a-3b7800d4e91f.png)

Another example

![](https://user-images.githubusercontent.com/63338657/159118913-1fe9f998-c73d-4946-8da1-dd2138d97893.png)

![](https://user-images.githubusercontent.com/63338657/159119261-1d52906c-2ce5-4dea-8b9c-6534e637d37e.png)

![](https://user-images.githubusercontent.com/63338657/159119375-76b7ad2a-8abb-4d6c-92aa-318d447dc38a.png)

Note: PCA never eliminates or drops any feature. Instead, it constructs new features (combinations of the original ones) that explain more of the variation in the data.
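A minimal NumPy sketch of this point. The synthetic data and the `pca_components` helper are illustrative assumptions, not part of the original notes. Each principal component carries one weight per original feature, so features are recombined rather than dropped.

```python
import numpy as np

def pca_components(X, k):
    """Return the top-k principal directions (as rows) of X via SVD."""
    X_centered = X - X.mean(axis=0)          # PCA works on mean-centred data
    # Rows of Vt are unit-length principal directions, ordered by variance.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return Vt[:k]

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))                # 200 samples, 4 original features
X[:, 1] += 0.8 * X[:, 0]                     # make two features correlated

W = pca_components(X, k=2)                   # shape (2, 4): one weight per original feature
Z = (X - X.mean(axis=0)) @ W.T               # the 2 new features, shape (200, 2)

print(W.shape, Z.shape)
```

Every row of `W` mixes all four original columns, which is why the new features can be harder to name than the originals.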

---

__Mathematical Objective Function of PCA__

![](https://i.stack.imgur.com/Q7HIP.gif)

Please refer to the notes on Variance Maximization.

Please refer to the notes on Distance Minimization.
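For reference, here is one standard way to write both objectives for the first principal component. The notation is an assumption (the referenced notes are not reproduced here): $x_1, \dots, x_n \in \mathbb{R}^d$ are mean-centred data points, $u$ is a unit vector, and $S = \frac{1}{n}\sum_{i=1}^{n} x_i x_i^\top$ is the covariance matrix.

Variance maximization:

$$\max_{u} \; \frac{1}{n}\sum_{i=1}^{n} (u^\top x_i)^2 \;=\; \max_{u} \; u^\top S u \quad \text{subject to } u^\top u = 1$$

Distance minimization:

$$\min_{u} \; \frac{1}{n}\sum_{i=1}^{n} \left\lVert x_i - (u^\top x_i)\,u \right\rVert^2 \quad \text{subject to } u^\top u = 1$$

The two are equivalent because, by the Pythagorean theorem, each point satisfies

$$\lVert x_i \rVert^2 = (u^\top x_i)^2 + \left\lVert x_i - (u^\top x_i)\,u \right\rVert^2$$

and $\lVert x_i \rVert^2$ does not depend on $u$, so maximizing the first term is the same as minimizing the second. Setting the gradient of the Lagrangian $u^\top S u - \lambda (u^\top u - 1)$ to zero gives

$$S u = \lambda u$$

so the optimal $u$ is the eigenvector of $S$ with the largest eigenvalue, and the variance it captures is $u^\top S u = \lambda$.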

---

__Eigenvalues and Eigenvectors (PCA): Dimensionality Reduction__

![](https://user-images.githubusercontent.com/63338657/159149853-db2b4091-66b8-4a38-ad1f-a83fef5de52a.png)

![](https://user-images.githubusercontent.com/63338657/159149982-ccb22761-52b1-4a64-a159-686f9662d69d.png)

Every pair of eigenvectors of the covariance matrix is perpendicular (the matrix is symmetric), which means their dot product is $0$.

![](https://user-images.githubusercontent.com/63338657/159150111-59557f5d-7c4e-43fe-aec1-171a0ed18259.png)

![](https://user-images.githubusercontent.com/63338657/159150198-9d88cd3d-4ffa-4a41-9e51-fca34f603ac0.png)

![](https://user-images.githubusercontent.com/63338657/159150301-b668f3e7-5d5f-4b36-89d7-2967a7b7a423.png)

![](https://user-images.githubusercontent.com/63338657/159150406-352443e9-36c1-463c-a838-5524f389ff68.png)

![](https://user-images.githubusercontent.com/63338657/159150581-949b4f2d-7b6c-4700-8ea6-31949f622f6e.png)
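The eigenvector route can be sketched in NumPy as follows. This is a minimal illustration on synthetic data (the data and variable names are assumptions), not necessarily the exact procedure shown in the figures: build the covariance matrix, take its eigenvectors, check that they are mutually perpendicular, and project onto the top ones.

```python
import numpy as np

rng = np.random.default_rng(42)
# Synthetic 3-feature data with correlated columns (illustrative only).
A = rng.normal(size=(3, 3))
X = rng.normal(size=(500, 3)) @ A

X_centered = X - X.mean(axis=0)
S = np.cov(X_centered, rowvar=False)         # 3x3 covariance matrix

# eigh is meant for symmetric matrices; eigenvalues come back in ascending order.
eigvals, eigvecs = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]            # re-sort by descending eigenvalue
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Columns of eigvecs are orthonormal: pairwise dot products are 0, norms are 1.
gram = eigvecs.T @ eigvecs

# Keep the top 2 eigenvectors and project: 3 features -> 2 components.
Z = X_centered @ eigvecs[:, :2]

print(np.round(gram, 6))
print(eigvals)
```

The variance of the data along each eigenvector equals the corresponding eigenvalue, which is why the eigenvectors are ranked by eigenvalue.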

---

__PCA for Dimensionality Reduction and Visualization__

![](https://user-images.githubusercontent.com/63338657/159151032-c283222a-2b6e-40ba-92b5-6e16ea5f21f6.png)
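As a hedged example of PCA for visualization, the sketch below reduces the 4-dimensional Iris dataset to 2 dimensions with scikit-learn. The choice of dataset and of standardizing first are assumptions made for illustration. The plotting lines are left as comments so the block runs without matplotlib.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X, y = iris.data, iris.target                 # X has shape (150, 4)

X_scaled = StandardScaler().fit_transform(X)  # zero mean, unit variance per feature
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)            # shape (150, 2)

print(X_2d.shape)
print(pca.explained_variance_ratio_)          # fraction of variance kept by each component

# To draw the picture (assuming matplotlib is installed):
# import matplotlib.pyplot as plt
# plt.scatter(X_2d[:, 0], X_2d[:, 1], c=y)
# plt.xlabel("PC1"); plt.ylabel("PC2"); plt.show()
```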

---

__Summary__

- Principal Component Analysis (PCA) is a statistical technique used to reduce the dimensionality of the data (the number of features in the dataset) by constructing a small number of new features that capture as much of the information in the dataset as possible.
- The new features, called Principal Components, are linear combinations of the original features and are ranked by the variance they explain in the data. The direction with the highest variance is the first Principal Component, the direction with the second-highest variance (orthogonal to the first) is the second Principal Component, and so on.
- In simple words, Principal Component Analysis is a method of extracting important features (in the form of components) from a large set of features available in a dataset.
- PCA finds the directions of maximum variance in high-dimensional data and projects the data onto a lower-dimensional subspace while retaining most of the information. By projecting our data into a smaller space, we reduce the dimensionality of our feature space.
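To make "retaining most of the information" concrete, here is a minimal NumPy sketch that ranks components by explained variance and picks the smallest number that keeps 95% of it. The synthetic data, the 95% threshold, and the `explained_variance_ratio` helper are all assumptions chosen for illustration.

```python
import numpy as np

def explained_variance_ratio(X):
    """Fraction of total variance captured by each principal component."""
    X_centered = X - X.mean(axis=0)
    # Singular values s relate to component variances: var_i = s_i**2 / (n - 1).
    s = np.linalg.svd(X_centered, compute_uv=False)
    variances = s**2 / (X.shape[0] - 1)
    return variances / variances.sum()

rng = np.random.default_rng(1)
# Synthetic data: 10 observed features driven by only 3 latent factors plus small noise.
latent = rng.normal(size=(300, 3))
X = latent @ rng.normal(size=(3, 10)) + 0.05 * rng.normal(size=(300, 10))

ratio = explained_variance_ratio(X)
cumulative = np.cumsum(ratio)
# Smallest k whose components together keep at least 95% of the variance.
k = int(np.searchsorted(cumulative, 0.95) + 1)

print(np.round(cumulative, 3), k)
```

Because only three latent factors drive the ten columns here, at most three components are needed to pass the threshold.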

---

__Advantages of Principal Component Analysis__

- Removes Correlated Features: Real-world datasets often have thousands of features. Training on all of them can slow the algorithm down and hurt its performance, and that many features cannot be visualized in a single graph, so reducing the number of features is often necessary.
- To remove the redundancy you need to find the correlated variables. Checking correlations by hand across thousands of features is tedious and practically impossible. PCA does this for you efficiently.
- After applying PCA to your dataset, the Principal Components are uncorrelated with one another (although not necessarily statistically independent).
- Improves Algorithm Performance: With too many features, an algorithm can slow down drastically. PCA is a very common way to speed up a Machine Learning algorithm by removing the redundancy among correlated variables, which add little to decision making. Training time drops significantly with fewer features.
- So, if the input dimensionality is too high, using PCA to speed up the algorithm is a reasonable choice.
- Improves Visualization: It is very hard to visualize and understand data in high dimensions. PCA transforms high-dimensional data into low-dimensional data (for example, 2 dimensions) so that it can be visualized easily.
- We can use a 2D scree plot to see which Principal Components explain the most variance and therefore have more impact than the others.
![](https://upload.wikimedia.org/wikipedia/commons/a/ac/Screeplotr.png)
- Even the simple Iris dataset is 4-dimensional, which is hard to visualize. We can use PCA to reduce it to 2 dimensions for better visualization.
- Consider a situation where we have 50 features (p = 50). There are p(p-1)/2 = 1225 possible scatter plots for analyzing the pairwise variable relationships. Exploring all of them would be tedious, which is why PCA is useful here.
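Two of the claims above can be checked with a short NumPy sketch: the pair count for p = 50, and the fact that the principal components come out uncorrelated even when the inputs are strongly correlated. The synthetic data is an assumption for illustration.

```python
import numpy as np
from math import comb

# Number of pairwise scatter plots for p features: p * (p - 1) / 2.
p = 50
n_pairs = comb(p, 2)

rng = np.random.default_rng(7)
# Synthetic, strongly correlated features: five noisy copies of one signal.
base = rng.normal(size=(400, 1))
X = np.hstack([base + 0.1 * rng.normal(size=(400, 1)) for _ in range(5)])

X_centered = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
Z = X_centered @ Vt.T                        # scores on all 5 principal components

corr_before = np.corrcoef(X, rowvar=False)   # large off-diagonal entries
cov_after = np.cov(Z, rowvar=False)          # off-diagonal entries are numerically zero

off_diag = cov_after - np.diag(np.diag(cov_after))
print(n_pairs, np.abs(off_diag).max())
```

Zero covariance means the components are uncorrelated; it does not by itself guarantee statistical independence.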

---

__Disadvantages of Principal Component Analysis__

![](https://user-images.githubusercontent.com/63338657/159152437-fdf5d56a-7ccc-43f5-ad32-d3212bea04c8.png)

- Independent variables become less interpretable: After applying PCA, your original features are replaced by Principal Components, which are linear combinations of the original features. They are not as readable or interpretable as the original features.
- Data standardization is a must before PCA: Standardize your data before applying PCA; otherwise features measured on larger scales will dominate the Principal Components.
- For instance, if features are expressed in units such as kilograms, light years, or millions, their variances differ enormously. If PCA is applied to such a feature set, the loadings for the high-variance features will be large. Hence, the principal components will be biased towards those features, giving misleading results.
- Also, all categorical features must be converted into numerical features before standardization and PCA can be applied.
- PCA is affected by scale, so scale the features in your data before applying it. Use `StandardScaler` from scikit-learn to standardize the features onto unit scale (mean = 0 and standard deviation = 1), which is also a requirement for the optimal performance of many Machine Learning algorithms.
- Information Loss: Although Principal Components try to capture the maximum variance in a dataset, choosing too few of them discards some of the information present in the original features.
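The effect of scale can be seen in a small sketch using scikit-learn's `StandardScaler` and `PCA`. The two synthetic features and their units are hypothetical, chosen only to have very different variances.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
n = 500
# Two independent synthetic features on very different scales (hypothetical units).
weight_kg = rng.normal(70, 10, size=n)           # spread of about 10
income = rng.normal(50_000, 15_000, size=n)      # spread of about 15,000
X = np.column_stack([weight_kg, income])

# Without scaling, the first component is almost entirely the income axis.
pca_raw = PCA(n_components=2).fit(X)
raw_loadings = np.abs(pca_raw.components_[0])

# After standardizing (mean 0, standard deviation 1), both features have
# the same variance, so neither dominates because of its units.
X_scaled = StandardScaler().fit_transform(X)
pca_scaled = PCA(n_components=2).fit(X_scaled)

print(raw_loadings)
print(pca_raw.explained_variance_ratio_)
print(pca_scaled.explained_variance_ratio_)
```

On the raw data the first component mostly reflects the choice of units; after scaling, the two components share the variance far more evenly.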