<div align="justify">

When working with machine learning models, datasets with too many features can cause issues like slow computation and overfitting. Dimensionality reduction helps to reduce the number of features while retaining key information. Techniques like __principal component analysis (PCA)__, __singular value decomposition (SVD)__ and __linear discriminant analysis (LDA)__ convert data into a lower-dimensional space while preserving important details.

__Example:__ Suppose you are building a model to predict house prices from features such as the number of bedrooms, square footage and location. If you keep adding features such as room condition or flooring type, the dataset becomes large and complex, which makes the model slower to train and more prone to overfitting.

</div>

<div align="center">

![](../images/image1.png)

![](../images/image2.png)

![](../images/image3.png)

</div>

### __How Does Dimensionality Reduction Work?__

<div align="justify">

Let's understand how dimensionality reduction works with the help of an example. Imagine a dataset in which each data point lies in a 3D space defined by the axes X, Y and Z. If most of the variance in the data occurs along X and Y, then the Z dimension may contribute very little to understanding the structure of the data.

</div>

<div align="center">

![](../images/how-dimensionality-reduction-works.jpg)

</div>

<div align="justify">

- __Before reduction:__ the data exists in 3D (X, Y, Z). It has high redundancy, and Z contributes little meaningful information.
- __After reduction:__ on the right, the data is represented in __lower-dimensional spaces__. The top plot (X-Y) preserves the meaningful structure, while the bottom plot (Z-Y) shows that the Z dimension contributes little useful information.

This process makes data analysis more efficient: it improves computation speed and visualization while minimizing redundancy.

</div>
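
<div align="justify">

The 3D example above can be sketched in a few lines of Python. This is a minimal illustration, not a full technique: it assumes a synthetic dataset in which Z barely varies, and it simply drops any axis whose share of the total variance falls below a threshold. The dataset, the names `points` and `kept_axes`, and the 1% threshold are all assumptions made for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical 3D dataset: X and Y vary widely, Z barely moves.
points = [
    (random.gauss(0, 5.0), random.gauss(0, 3.0), random.gauss(0, 0.1))
    for _ in range(500)
]

# Sample variance along each axis (zip(*points) yields one column per axis).
axis_names = ("X", "Y", "Z")
variances = {
    name: statistics.variance(column)
    for name, column in zip(axis_names, zip(*points))
}

# Share of the total variance carried by each axis.
total_variance = sum(variances.values())
shares = {name: var / total_variance for name, var in variances.items()}

# Assumed cut-off: keep only axes that carry at least 1% of the total variance.
VARIANCE_SHARE_THRESHOLD = 0.01
kept_axes = [name for name in axis_names if shares[name] >= VARIANCE_SHARE_THRESHOLD]

# Build the reduced dataset from the kept axes only.
kept_indices = [axis_names.index(name) for name in kept_axes]
reduced_points = [tuple(point[i] for i in kept_indices) for point in points]

for name in axis_names:
    print(f"{name}: variance={variances[name]:.3f}, share={shares[name]:.2%}")
print("Kept axes:", kept_axes)
```

Dropping a low-variance axis only works when the uninformative direction lines up with an original axis, as it does in this example. When it does not, a method such as PCA first rotates the data to find the directions of greatest variance.

</div>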

### __Dimensionality Reduction Techniques__

<div align="justify">

Dimensionality reduction techniques can be broadly divided into two categories:

</div>

#### __1. Feature Selection__

<div align="justify">

Feature selection chooses the most relevant features from the dataset without altering them. It helps remove redundant or irrelevant features, improving model efficiency. Some common methods are:

- __Filter methods__ rank the features based on their relevance to the target variable.
- __Wrapper methods__ use the model's performance as the criterion for selecting features.
- __Embedded methods__ combine feature selection with the model training process.

</div>
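
<div align="justify">

As an illustration of a filter method, the sketch below ranks features by the absolute value of their Pearson correlation with the target and keeps the top `k`. It assumes NumPy is available and uses a synthetic house-price dataset; the feature names, the coefficients used to generate `price`, and the helper `rank_by_correlation` are hypothetical. Correlation is only one of several possible relevance scores.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 500

# Hypothetical features for a house-price dataset.
sqft = rng.normal(1500, 400, n_samples)
bedrooms = rng.integers(1, 6, n_samples).astype(float)  # values 1..5
noise_feature = rng.normal(0, 1, n_samples)             # unrelated to the target

# Synthetic target: depends on sqft and bedrooms, not on noise_feature.
price = 200 * sqft + 30_000 * bedrooms + rng.normal(0, 20_000, n_samples)

X = np.column_stack([sqft, bedrooms, noise_feature])
feature_names = ["sqft", "bedrooms", "noise_feature"]


def rank_by_correlation(X, y, names):
    """Return (name, |Pearson r|) pairs sorted from most to least relevant."""
    scores = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
    return sorted(zip(names, scores), key=lambda pair: pair[1], reverse=True)


ranking = rank_by_correlation(X, price, feature_names)

# Keep the top-k features; the features themselves are left unchanged.
top_k = 2
selected = [name for name, _ in ranking[:top_k]]

for name, score in ranking:
    print(f"{name}: |r| = {score:.3f}")
print("Selected features:", selected)
```

A filter method like this scores each feature independently of any model, which makes it cheap to run. Wrapper and embedded methods instead involve training a model, so they cost more but can account for interactions between features.

</div>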

#### __2. Feature Extraction__

<div align="justify">

Feature extraction involves creating new features by combining or transforming the original features. These new features retain most of the dataset’s important information in fewer dimensions. The list below covers commonly used dimensionality reduction methods. Items 1, 6 and 7 are feature extraction methods, while items 2 to 5 choose among the original features and are, strictly speaking, feature selection methods:

1. __Principal Component Analysis (PCA):__ Converts correlated variables into uncorrelated "principal components", reducing dimensionality while retaining as much variance as possible.
2. __Missing Value Ratio:__ Removes variables whose share of missing values exceeds a set threshold, improving dataset reliability.
3. __Backward Feature Elimination:__ Starts with all features and removes the least significant one in each iteration. The process continues until only the most impactful features remain.
4. __Forward Feature Selection:__ Begins with one feature, adds others incrementally and keeps those that improve model performance.
5. __Random Forest:__ Uses an ensemble of decision trees to estimate feature importance, so that features with low importance can be dropped.
6. __Factor Analysis:__ Groups correlated variables and represents each group by an underlying latent factor, so that a few factors stand in for many variables.
7. __Independent Component Analysis (ICA):__ Separates the data into statistically independent components, which suits applications such as blind source separation where correlation-based methods fall short.

</div>
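
<div align="justify">

To make the PCA entry in the list above concrete, here is a minimal sketch that computes principal components through the singular value decomposition of the centered data, which is one common way to implement PCA. It assumes NumPy is available; the function name `pca` and the synthetic dataset, whose third feature is almost the sum of the first two, are hypothetical.

```python
import numpy as np


def pca(X, n_components):
    """Project X (n_samples x n_features) onto its top principal components.

    Returns the projected data, the principal directions and the share of
    total variance explained by each kept component.
    """
    # PCA is defined on mean-centered data.
    X_centered = X - X.mean(axis=0)

    # Rows of Vt are the principal directions, ordered by singular value.
    _, singular_values, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]

    # Squared singular values are proportional to the variance per component.
    explained_variance = singular_values**2 / (X.shape[0] - 1)
    explained_ratio = explained_variance / explained_variance.sum()

    X_reduced = X_centered @ components.T
    return X_reduced, components, explained_ratio[:n_components]


rng = np.random.default_rng(0)

# Hypothetical data: the third feature is nearly redundant (about a + b).
a = rng.normal(0, 1, 300)
b = rng.normal(0, 1, 300)
X = np.column_stack([a, b, a + b + rng.normal(0, 0.05, 300)])

X_reduced, components, ratio = pca(X, n_components=2)

print("Reduced shape:", X_reduced.shape)
print("Variance explained by 2 components:", ratio.sum())
# The new features are uncorrelated: off-diagonal covariance is ~0.
print("Covariance of reduced features:\n", np.cov(X_reduced, rowvar=False))
```

Because the third feature carries almost no information beyond the first two, two components capture nearly all of the variance here. On real data the number of components is usually chosen by inspecting the explained-variance ratios.

</div>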