## Introduction to Principal Component Analysis (PCA)

Principal Component Analysis (PCA) is a foundational technique in data analysis and machine learning for reducing the number of variables (features) in a dataset while retaining most of its essential information. It transforms the original data into new uncorrelated variables called principal components. These components represent the directions in which the data varies the most, allowing you to simplify high-dimensional data for easier analysis and modeling.

PCA is especially valuable for datasets with many features (columns). The reduced data is easier to visualize, analyze, and feed into machine learning models, which saves time and reduces the risk of overfitting.

### Why Use PCA?

- Dimensionality Reduction: Reduces the number of input variables without losing much information.
- Pattern Discovery: Reveals the main patterns and relationships in the data.
- Noise Reduction: Helps remove less important variations (noise).
- Preprocessing: Often used before clustering or classification.

### How PCA Works (Step-by-Step)
Let's break down the main steps so you see how PCA transforms your data:

#### Step 1: Standardize Your Data
Variables may have different units or scales. So that no feature dominates simply because of its scale, each feature is rescaled to have **mean 0** and **standard deviation 1** (called standardizing or z-scoring):

$$Z = \frac{X - \mu}{\sigma}$$

Where $X$ is a value, $\mu$ the mean, and $\sigma$ the standard deviation of the feature.
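As a minimal sketch of the z-score formula above, the snippet below standardizes a small made-up table with NumPy. The data values and the variable names (`X`, `mu`, `sigma`, `Z`) are assumptions chosen for illustration, not part of any particular dataset.

```python
import numpy as np

# Hypothetical toy data: 5 samples, 2 features (age in years, income in thousands).
X = np.array([
    [25.0, 40.0],
    [32.0, 55.0],
    [47.0, 90.0],
    [51.0, 120.0],
    [62.0, 150.0],
])

mu = X.mean(axis=0)    # per-feature mean
sigma = X.std(axis=0)  # per-feature standard deviation (population form, ddof=0)

Z = (X - mu) / sigma   # z-score every column

print(Z.mean(axis=0))  # approximately 0 for each feature
print(Z.std(axis=0))   # approximately 1 for each feature
```

Whether to divide by the population (`ddof=0`) or sample (`ddof=1`) standard deviation is a convention; either works for PCA as long as it is applied consistently.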

#### Step 2: Compute the Covariance Matrix
The covariance matrix records how much each pair of features varies together. It's the mathematical foundation for finding principal components. Large off-diagonal entries (in absolute value) point to features that carry redundant information.
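For centered data $Z$ with $n$ rows, the sample covariance matrix can be written as $C = \frac{1}{n-1} Z^T Z$. The sketch below computes it by hand and compares it with `np.cov`. The synthetic data (where the third feature is deliberately built as a noisy copy of the first) and all variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 samples, 3 features.
# The third feature is nearly a copy of the first, so the two are redundant.
a = rng.normal(size=200)
b = rng.normal(size=200)
X = np.column_stack([a, b, a + 0.1 * rng.normal(size=200)])

# Standardize using the sample standard deviation (ddof=1).
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

n = Z.shape[0]
C = Z.T @ Z / (n - 1)  # covariance matrix of the standardized data

# np.cov treats rows as variables by default, so pass rowvar=False.
assert np.allclose(C, np.cov(Z, rowvar=False))

print(np.round(C, 2))  # the (0, 2) entry is close to 1, flagging the redundancy
```

Because the data was standardized first, this covariance matrix is also the correlation matrix of the original features, so its diagonal entries are 1.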



#### Step 3: Calculate Eigenvectors and Eigenvalues
* **Eigenvectors** of the covariance matrix show the directions (principal axes) where variance is maximized (think of finding new axes to “see” most spread).
* **Eigenvalues** show how much variance (information) each direction captures. The higher the value, the more important that direction.
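To make eigenvectors and eigenvalues concrete, here is a small sketch using `np.linalg.eigh`, which is intended for symmetric matrices such as a covariance matrix. The 2×2 matrix is a made-up example of two standardized features with correlation 0.8.

```python
import numpy as np

# Hypothetical covariance matrix of two standardized, positively correlated features.
C = np.array([
    [1.0, 0.8],
    [0.8, 1.0],
])

# eigh returns eigenvalues in ascending order and the matching
# unit-length eigenvectors as the columns of the second result.
eigenvalues, eigenvectors = np.linalg.eigh(C)

for i in range(len(eigenvalues)):
    v = eigenvectors[:, i]
    # Defining property of an eigenpair: C v = lambda v
    assert np.allclose(C @ v, eigenvalues[i] * v)

print(eigenvalues)   # approximately [0.2, 1.8]
print(eigenvectors)  # columns are (1, -1)/sqrt(2) and (1, 1)/sqrt(2), up to sign
```

Here the direction $(1, 1)/\sqrt{2}$ carries a variance of 1.8 out of a total of 2, so a single component would retain 90% of the variance. The sign of an eigenvector is arbitrary, so different libraries may return flipped directions.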



#### Step 4: Sort and Select Top Components
1.  Sort the eigenvectors (principal components) by their eigenvalues in descending order.
2.  Decide how many components to keep (the first 2, 3, etc.), based on how much total variance you want to retain.
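One common way to choose the number of components is to keep the smallest $k$ whose cumulative share of variance reaches a target. The sketch below illustrates this with made-up eigenvalues. The 85% threshold, the placeholder eigenvectors, and the variable names are all assumptions for illustration.

```python
import numpy as np

# Hypothetical eigenvalues (deliberately unsorted) with matching eigenvectors as columns.
eigenvalues = np.array([0.3, 2.5, 0.2, 1.0])
eigenvectors = np.eye(4)  # placeholder directions, for illustration only

# 1. Sort by eigenvalue, largest first, and reorder the eigenvector columns to match.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# 2. Share of total variance captured by each component, and the running total.
explained_ratio = eigenvalues / eigenvalues.sum()
cumulative = np.cumsum(explained_ratio)

target = 0.85  # assumed threshold: retain at least 85% of the variance
k = int(np.searchsorted(cumulative, target)) + 1
k = min(k, len(eigenvalues))  # guard against rounding when target is 1.0

components = eigenvectors[:, :k]

print(cumulative)  # approximately [0.625, 0.875, 0.95, 1.0]
print(k)           # 2
```

A plot of `cumulative` against the number of components (often called a scree or cumulative-variance plot) is another way to make the same decision by eye.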



#### Step 5: Project Data onto Principal Components
Your data is now represented in fewer dimensions by projecting it onto the new axes (principal components) you selected.
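Projection is a single matrix product: if the selected eigenvectors are the columns of a matrix $W$, the reduced data is $ZW$. The sketch below runs the full sequence on synthetic data and checks that the projected columns are uncorrelated. The data-generating recipe (two hidden factors mixed into three features) and the variable names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical data: 300 samples, 3 correlated features driven by 2 hidden factors.
latent = rng.normal(size=(300, 2))
mixing = np.array([[1.0, 0.5, 0.2],
                   [0.0, 1.0, 0.7]])
X = latent @ mixing + 0.05 * rng.normal(size=(300, 3))

# Standardize, then build the covariance matrix and its eigendecomposition.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
C = np.cov(Z, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(C)

# Keep the two directions with the largest eigenvalues.
order = np.argsort(eigenvalues)[::-1]
W = eigenvectors[:, order[:2]]  # shape (3, 2): components as columns

scores = Z @ W                  # shape (300, 2): data in the reduced space

# The new columns are uncorrelated and their variances equal the kept eigenvalues.
score_cov = np.cov(scores, rowvar=False)
print(np.round(score_cov, 4))
print(np.round(eigenvalues[order[:2]], 4))
```

The off-diagonal entries of `score_cov` come out as zero up to floating-point error, which is the sense in which principal components are uncorrelated.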



Mnemonic: **“Standardize → Covariance → Eigenvectors/values → Sort → Project”**

## Singular Value Decomposition (SVD)

SVD is a mathematical method to break a data matrix $A$ into three matrices:

$$A = U \Sigma V^T$$

* $U$: Describes relationships among data points (rows)
* $\Sigma$: Diagonal matrix with singular values (importance)
* $V^T$: Describes relationships among features (columns)
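The factorization can be checked numerically with `np.linalg.svd`. The sketch below decomposes a small made-up matrix, rebuilds it from the three factors, and then keeps only the largest singular value to form a rank-1 approximation. The matrix values and variable names are assumptions for illustration.

```python
import numpy as np

# Hypothetical 2x3 data matrix.
A = np.array([
    [3.0, 1.0, 1.0],
    [-1.0, 3.0, 1.0],
])

# full_matrices=False returns the compact form: U is 2x2, s has 2 entries, Vt is 2x3.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

Sigma = np.diag(s)           # singular values placed on a diagonal
A_rebuilt = U @ Sigma @ Vt   # multiplying the factors recovers A
assert np.allclose(A_rebuilt, A)

print(s)  # approximately [3.464, 3.162], i.e. sqrt(12) and sqrt(10), largest first

# Rank-1 approximation: keep only the largest singular value and its vectors.
A_rank1 = s[0] * np.outer(U[:, 0], Vt[0])
error = np.linalg.norm(A - A_rank1)  # Frobenius norm of what was discarded
print(error)  # equals the dropped singular value, approximately 3.162
```

Dropping small singular values in this way is the basic idea behind using SVD for compression and noise reduction.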

SVD is used for data compression and noise reduction, and it is a powerful way to implement PCA mathematically. In fact, most PCA implementations use SVD "under the hood."

SVD can be seen as the machinery enabling PCA. The principal components that PCA discovers are the right singular vectors (the rows of $V^T$) of your centered, and typically standardized, data matrix.

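### When to Use PCA or SVD Directly?
* For numerical stability, PCA is usually computed via SVD in practice, which avoids forming the covariance matrix explicitly.
* If you center the data (subtract each feature's mean) before SVD, the right singular vectors match the eigenvectors of the covariance matrix (up to sign), and each eigenvalue equals the squared singular value divided by $n - 1$, where $n$ is the number of samples.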
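The agreement between the two routes can be verified directly. The sketch below computes PCA once through the covariance matrix and once through SVD of the centered data, then compares the results. The synthetic data and variable names are assumptions for illustration, and the features are given clearly different scales so the eigenvalues are well separated.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 100 samples, 4 features with different spreads.
X = rng.normal(size=(100, 4)) * np.array([4.0, 2.0, 1.0, 0.5])
X[:, 1] += 0.5 * X[:, 0]  # introduce some correlation between two features

Xc = X - X.mean(axis=0)   # center each feature
n = Xc.shape[0]

# Route 1: eigendecomposition of the covariance matrix.
C = Xc.T @ Xc / (n - 1)
eigvals, eigvecs = np.linalg.eigh(C)
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]  # reorder to descending

# Route 2: SVD of the centered data (singular values already come in descending order).
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Eigenvalues equal squared singular values divided by (n - 1).
assert np.allclose(s**2 / (n - 1), eigvals)

# Directions agree up to sign: each |dot product| between matching vectors is 1.
alignment = np.abs(np.sum(Vt.T * eigvecs, axis=0))
assert np.allclose(alignment, 1.0)
print(alignment)
```

The comparison uses absolute dot products because either method may return a direction or its negative. Both describe the same axis.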

### Geometric and Practical Intuition
* Visualize PCA as finding new, rotated axes that capture the largest spread (variation) of your data.
* The first principal component is the line where projections of points are most spread out (direction of greatest variance).
* Each subsequent component is perpendicular (orthogonal) to all previous ones and captures the next largest share of the variance.
* Dimensionality reduction means keeping only the top $k$ principal components.

#### Example: Why Standardization and PCA Matter
Suppose you have people’s age (0–100) and income (in thousands, 0–200). If you skipped standardization, income’s wider numeric range (and therefore larger variance) would tend to dominate the principal components, hiding important age patterns. After scaling, both features contribute fairly to PCA.
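The age and income example can be simulated to see the effect. The sketch below compares the first principal component computed on raw data with the one computed on standardized data. The simulated relationship between age and income, the helper `first_component`, and all numbers are hypothetical and chosen only for illustration.

```python
import numpy as np

def first_component(data):
    """Return the unit eigenvector of the covariance matrix with the largest eigenvalue."""
    C = np.cov(data, rowvar=False)  # np.cov centers the data internally
    eigenvalues, eigenvectors = np.linalg.eigh(C)
    return eigenvectors[:, -1]      # eigh sorts ascending, so the last column is the top one

rng = np.random.default_rng(7)

# Hypothetical population: age in years, income in thousands, loosely related to age.
age = rng.uniform(18, 80, size=500)
income = 20 + 1.2 * age + rng.normal(0, 25, size=500)
X = np.column_stack([age, income])

raw_pc1 = first_component(X)

Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
std_pc1 = first_component(Z)

print("raw loadings (age, income):         ", np.round(np.abs(raw_pc1), 3))
print("standardized loadings (age, income):", np.round(np.abs(std_pc1), 3))
```

On the raw data the income loading is noticeably larger than the age loading, simply because income has the larger variance. After standardization the two loadings have equal magnitude, which is always the case for two standardized, correlated features.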

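### Curse of Dimensionality
The curse of dimensionality is a key challenge in machine learning and data analysis that arises when working with datasets containing a large number of features (dimensions). As the number of dimensions grows, the data becomes increasingly sparse, distances between points become less informative, and models need far more samples to generalize well.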

The curse of dimensionality is why PCA and other dimensionality reduction approaches are so important: they help simplify complex data and make analysis and modeling more practical and accurate.
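One frequently cited symptom of the curse of dimensionality can be shown with a short simulation: for random points, the gap between the nearest and farthest neighbor shrinks relative to the nearest distance as dimensions are added. The helper `relative_contrast`, the uniform random data, and the chosen dimensions are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def relative_contrast(dim, n_points=1000):
    """(farthest - nearest) / nearest distance from a random query to random points."""
    points = rng.uniform(size=(n_points, dim))
    query = rng.uniform(size=dim)
    distances = np.linalg.norm(points - query, axis=1)
    return (distances.max() - distances.min()) / distances.min()

contrasts = {dim: relative_contrast(dim) for dim in (2, 10, 100, 1000)}

for dim, value in contrasts.items():
    print(f"{dim:>5} dimensions: relative contrast = {value:.2f}")
```

In this simulation the contrast is large in 2 dimensions and falls well below 1 in 1000 dimensions, meaning the nearest and farthest points are almost equally far away. Methods that rely on distances, such as nearest-neighbor search and clustering, tend to suffer as a result.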

### Key Takeaways
* PCA is a method to reduce variables and find patterns in complex data.
* It works by discovering new axes (principal components) aligned to the strongest patterns in the data.
* Standardization is often required to ensure each feature contributes fairly.
* SVD is the main math engine for PCA in real-world applications.
* Choosing how many components to keep is a balance between simplicity and information retention.

