<a href="https://colab.research.google.com/github/Rishabh9559/Data_science/blob/main/Phase%202%3A%20Machine%20Learning%20for%20Data%20Science/Dimensionally_reduction_techniques/Dimensionally_reduction_techniques.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Dimensionality Reduction Techniques

A **dimension** here means a feature, i.e. one column of the dataset.



### **What is Dimensionality Reduction?**

In Machine Learning and Data Science, **dimensionality reduction** means reducing the number of input variables (features) in a dataset while keeping as much useful information as possible.

* High-dimensional data (many features) → harder to analyze, visualize, and train models on.
* Dimensionality reduction helps simplify the dataset, improve performance, and reduce noise.



### **Why is it needed?**

* To remove irrelevant/noisy features.
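* To **avoid the "curse of dimensionality"**: as the number of features grows, the data becomes sparse in feature space, so models need far more samples to generalize and distance-based methods become less reliable.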
* To **visualize data** in 2D or 3D.
* To **reduce computation cost** for ML algorithms.



### **Types of Dimensionality Reduction Techniques**

1. **Feature Selection (choose important features)**

   * Instead of transforming the features into new ones, we just **pick the best subset of the original features**.
   * Examples:

     * Filter methods (Correlation, Chi-square test, ANOVA)
     * Wrapper methods (Forward selection, Backward elimination)
     * Embedded methods (LASSO, Decision Trees feature importance)



2. **Feature Extraction (transform data into new dimensions)**

   * We create **new features** by combining or transforming original features.
   * Techniques:

   **a) Principal Component Analysis (PCA)**

   * Converts correlated features into fewer **uncorrelated principal components**.
   * Keeps maximum variance (information).
   * Widely used for compression & visualization.

   **b) Linear Discriminant Analysis (LDA)**

   * Supervised technique.
   * Reduces dimensions by maximizing class separability.
   * Used in classification tasks.

   **c) t-SNE (t-Distributed Stochastic Neighbor Embedding)**

   * Non-linear technique for **visualizing high-dimensional data in 2D/3D**.
   * Preserves local structure (good for clustering visualization).

   **d) Autoencoders (Deep Learning)**

   * Neural networks that learn compressed (lower-dimensional) representations.
   * Useful for very high-dimensional data like images.

   **e) UMAP (Uniform Manifold Approximation and Projection)**

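   * Non-linear technique with the same purpose as t-SNE; typically faster on large datasets and often preserves more of the global structure while still keeping local neighborhoods.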
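To make the difference between the two families concrete, here is a minimal sketch using scikit-learn on the built-in Iris dataset. The dataset choice, the ANOVA F-score filter, and the variable names are assumptions made for illustration, not part of the original notes.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.preprocessing import StandardScaler

# Iris: 150 samples, 4 features, 3 classes (chosen only as a small example).
X, y = load_iris(return_X_y=True)
X_std = StandardScaler().fit_transform(X)

# Feature selection (filter method): keep the 2 ORIGINAL columns with the
# highest ANOVA F-score against the class labels.
selector = SelectKBest(score_func=f_classif, k=2)
X_selected = selector.fit_transform(X_std, y)
kept_columns = selector.get_support(indices=True)

# Feature extraction, unsupervised: PCA builds 2 NEW features, each a
# linear combination of all 4 original columns. Labels are not used.
X_pca = PCA(n_components=2).fit_transform(X_std)

# Feature extraction, supervised: LDA uses the labels and can return at
# most (n_classes - 1) = 2 components for this dataset.
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X_std, y)

print("selected original columns:", kept_columns)
print("shapes:", X_selected.shape, X_pca.shape, X_lda.shape)
```

All three results have two columns, but only the selected ones are still original measurements; the PCA and LDA columns are derived features.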



### **Quick Example**

Imagine you have a dataset of **100 features** describing patients’ health.

* Many features are correlated (e.g., weight, BMI, waist size).
* Instead of training an ML model on all 100 features, we can apply PCA to reduce them to a much smaller set, perhaps **10 principal components**, that still capture **90–95% of the variance**. The exact number depends on how strongly the features are correlated.
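A rough sketch of this scenario with scikit-learn follows. The data is synthetic: 100 features generated from 10 hidden factors plus noise, which is an assumption standing in for the patient dataset. Passing a float between 0 and 1 as `n_components` asks `PCA` to keep just enough components to reach that share of the variance.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_samples, n_features, n_latent = 500, 100, 10

# Synthetic stand-in for the patient data: 100 correlated features that
# are all noisy mixtures of 10 hidden factors (an assumption).
latent = rng.normal(size=(n_samples, n_latent))
mixing = rng.normal(size=(n_latent, n_features))
X = latent @ mixing + 0.1 * rng.normal(size=(n_samples, n_features))

X_std = StandardScaler().fit_transform(X)

# Keep as many components as needed to explain at least 95% of the variance.
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X_std)

print("original shape:", X_std.shape)
print("reduced shape: ", X_reduced.shape)
print("variance kept: ", pca.explained_variance_ratio_.sum())
```

On real data the number of components kept can be much larger or smaller; it is worth inspecting `explained_variance_ratio_` before fixing a value.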






## **Principal Component Analysis (PCA)**
**In short:**
PCA = A way to reduce many correlated features into fewer uncorrelated features (principal components), keeping as much variance (information) as possible.

### **What is PCA?**

* PCA is a **dimensionality reduction technique**.
* It transforms high-dimensional data into a **smaller set of uncorrelated variables** called **principal components**.
* These components capture the **maximum variance (information)** from the original data.
* PCA is an **unsupervised** learning technique: it does not use class labels.



### **Key Idea**

* In a dataset, many features are often **correlated**.
* PCA finds new axes (directions) in such a way that:

  1. The **first principal component (PC1)** captures the maximum variance.
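  2. The **second principal component (PC2)** is orthogonal to PC1 (so the two are uncorrelated) and captures the next highest variance.
  3. This continues, with each new component orthogonal to all previous ones, until all variance is explained (at most as many components as original features).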

So instead of working with 100 correlated features, PCA may reduce them to, say, **10 uncorrelated principal components**.
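The three properties above can be checked numerically. This is a small sketch on made-up data (three correlated features built from two random signals); the data and variable names are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)

# Hypothetical data: the middle feature is mostly a mix of the other two.
signals = rng.normal(size=(300, 2))
noise = 0.1 * rng.normal(size=300)
X = np.column_stack([
    signals[:, 0],
    signals[:, 0] + 0.3 * signals[:, 1] + noise,
    signals[:, 1],
])

pca = PCA()                      # keep every component
scores = pca.fit_transform(X)    # data expressed in the new axes

# 1. Variance per component is sorted from largest to smallest.
variances = pca.explained_variance_

# 2. The component directions are orthonormal: rows of components_
#    multiplied by their transpose give the identity matrix.
gram = pca.components_ @ pca.components_.T

# 3. The projected data is uncorrelated: its covariance matrix has
#    (numerically) zero off-diagonal entries.
score_cov = np.cov(scores, rowvar=False)
off_diagonal = score_cov - np.diag(np.diag(score_cov))

print("variance per component:", np.round(variances, 3))
print("directions orthonormal:", np.allclose(gram, np.eye(3)))
print("scores uncorrelated:   ", np.allclose(off_diagonal, 0.0, atol=1e-8))
```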



### **Steps in PCA**

1. **Standardize the data** (make features have mean = 0, variance = 1).
2. **Compute covariance matrix** (relationship between features).
3. **Find eigenvalues & eigenvectors** of covariance matrix.

   * Eigenvectors = directions of principal components.
   * Eigenvalues = amount of variance captured.
4. **Sort eigenvalues** (biggest first) → select top *k* components.
5. **Project data** onto the new *k* dimensions.
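The five steps map almost line for line onto NumPy. The function below is a minimal sketch for illustration, assuming a small dense dataset with no constant columns; `pca_from_scratch` and the random test data are hypothetical names and values, and in practice `sklearn.decomposition.PCA` is the usual choice.

```python
import numpy as np

def pca_from_scratch(X, k):
    """Project X (n_samples x n_features) onto its top-k principal components."""
    # Step 1: standardize each feature to mean 0 and variance 1.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

    # Step 2: covariance matrix of the standardized features.
    cov = np.cov(Z, rowvar=False)

    # Step 3: eigen-decomposition. eigh is meant for symmetric matrices
    # and returns eigenvalues in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)

    # Step 4: sort from largest to smallest eigenvalue and keep the top k.
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    components = eigenvectors[:, order][:, :k]

    # Step 5: project the standardized data onto the k directions.
    projected = Z @ components
    explained_ratio = eigenvalues[:k] / eigenvalues.sum()
    return projected, explained_ratio

# Hypothetical data: 5 features, the second strongly correlated with the first.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
X[:, 1] = 2.0 * X[:, 0] + 0.1 * rng.normal(size=200)

projected, explained_ratio = pca_from_scratch(X, k=2)
print("projected shape:", projected.shape)
print("variance share of each kept component:", np.round(explained_ratio, 3))
```

Library implementations usually compute the same result through a singular value decomposition of the centered data, which is numerically more stable than forming the covariance matrix explicitly.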



### **Example**

Suppose you have data with two features:

* Height (cm)
* Weight (kg)

They are correlated (taller people usually weigh more).

* PCA finds a **new axis (PC1)** along the direction of maximum variance, which is a weighted combination of height and weight.
* Projecting each person onto PC1 alone reduces the data from **2D to 1D** while keeping most of the variance.
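Here is a small sketch of the height and weight example. The numbers are synthetic and chosen only to produce a positive correlation; they are not real measurements.

```python
import numpy as np

rng = np.random.default_rng(7)

# Made-up data: weight rises with height, plus individual variation.
height_cm = rng.normal(170, 10, size=300)
weight_kg = 70 + 0.9 * (height_cm - 170) + rng.normal(0, 5, size=300)
X = np.column_stack([height_cm, weight_kg])

# Standardize so that centimetres and kilograms are comparable.
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# Eigen-decomposition of the covariance matrix (eigenvalues ascending).
eigenvalues, eigenvectors = np.linalg.eigh(np.cov(Z, rowvar=False))

pc1 = eigenvectors[:, -1]                       # direction of maximum variance
share_pc1 = eigenvalues[-1] / eigenvalues.sum() # fraction of variance on PC1
one_d = Z @ pc1                                 # each person as a single number

print("PC1 direction (height, weight):", np.round(pc1, 3))
print("share of variance on PC1:", round(float(share_pc1), 3))
print("reduced data shape:", one_d.shape)
```

For two standardized, positively correlated features, PC1 weights both equally (up to an overall sign), so it behaves like a general "body size" axis.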



### **Applications of PCA**

* **Data compression** (reduce storage and computation).
* **Noise reduction** (remove less informative dimensions).
* **Visualization** (reduce data to 2D or 3D for plotting).
* **Preprocessing** before machine learning to improve performance.



### **Advantages**

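* Reduces dimensionality, often with little loss of information when the features are strongly correlated.
* Removes linear correlation (components are uncorrelated, though not necessarily statistically independent).
* Can make models faster to train and, with fewer inputs, less prone to overfitting.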

### **Limitations**

* PCA is **linear** (may not capture complex patterns).
* Harder to interpret principal components (they are combinations, not original features).
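* Sensitive to feature scale: features with larger numeric ranges dominate the components, so standardize the data before PCA unless all features already share comparable units.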
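The scaling limitation is easy to see on a toy example. The sketch below uses two made-up, independent features on very different scales (`income` and `age` are hypothetical names and values) and compares the first principal component with and without standardization.

```python
import numpy as np

def first_pc(X):
    """Return the direction of maximum variance and its share of total variance."""
    eigenvalues, eigenvectors = np.linalg.eigh(np.cov(X, rowvar=False))
    return eigenvectors[:, -1], eigenvalues[-1] / eigenvalues.sum()

rng = np.random.default_rng(3)

# Hypothetical, independent features on very different numeric scales.
income = rng.normal(50_000, 15_000, size=500)
age = rng.normal(40, 12, size=500)
X = np.column_stack([income, age])

# Without scaling: income has by far the larger variance, so PC1 is
# essentially the income axis and appears to explain almost everything.
pc_raw, share_raw = first_pc(X)

# With standardization: both features contribute on equal footing.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
pc_std, share_std = first_pc(Z)

print("raw PC1:         ", np.round(pc_raw, 4), "share:", round(float(share_raw), 4))
print("standardized PC1:", np.round(pc_std, 4), "share:", round(float(share_std), 4))
```

On the raw data, PCA mostly reports which column has the biggest numbers; after standardization the result reflects the correlation structure instead.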
