In [1]:
import numpy as np 
import pandas as pd 
import seaborn as sns 
from matplotlib import pyplot as plt 

# ------------------------ ***Curse Of Dimensionality***---------------

<h3 style="color: red;">Definition:</h3>
<p>
  The curse of dimensionality refers to the various problems that arise when analyzing
  and organizing data in high-dimensional spaces that do not occur in low-dimensional settings.
</p>

<p>In simple terms, as the number of features (dimensions) increases:</p>
<ul>
  <li>Data becomes sparse.</li>
  <li>Distance measures become less meaningful.</li>
  <li>Models overfit more easily.</li>
  <li>Computation becomes more expensive.</li>
</ul>
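
<p>
  A rough way to see the sparsity effect is a small calculation. The sketch below assumes data spread uniformly over a unit hypercube and asks how long the edge of a sub-cube must be to cover 10% of the volume. The helper name <code>edge_length_for_fraction</code> and the 10% figure are illustrative choices, not part of the original notes.
</p>

```python
def edge_length_for_fraction(fraction, n_dims):
    """Edge length of a sub-cube covering `fraction` of the unit hypercube's volume.

    Assumes uniformly distributed data, so volume fraction equals data fraction.
    """
    return fraction ** (1.0 / n_dims)


# Illustrative dimensions only; any positive integers work.
for d in (1, 2, 10, 100):
    edge = edge_length_for_fraction(0.10, d)
    print(f"d={d:>3}: edge length needed to cover 10% of the data = {edge:.3f}")
```

<p>
  Under this assumption, the required edge length approaches 1 as the number of dimensions grows, so a "local" neighborhood ends up spanning almost the full range of every feature.
</p>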

<h3 style="color: blue;">Why is it a problem?</h3>
<ul>
  <li>
    <strong>Sparsity:</strong> In higher dimensions, data points are spread thinly.
    The amount of data needed to maintain the same density grows exponentially with the number of dimensions.
  </li>
  <li>
    <strong>Distance Metrics Fail:</strong> Many ML algorithms (e.g., k-NN, clustering) rely on distance.
    In high dimensions, the distances from a point to its nearest and farthest neighbors tend to become nearly equal, so "nearest" carries little information.
  </li>
  <li>
    <strong>Overfitting:</strong> More features can lead to models fitting noise rather than signal, especially when there are few samples relative to the number of features.
  </li>
  <li>
    <strong>Increased Computational Cost:</strong> Algorithms take longer and require more memory.
  </li>
</ul>
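
<p>
  The point about distances can be checked with a small experiment. This is a minimal sketch, assuming points drawn uniformly at random from a unit hypercube and Euclidean distance; the function name <code>relative_contrast</code>, the sample size, and the chosen dimensions are illustrative assumptions.
</p>

```python
import numpy as np


def relative_contrast(n_points, n_dims, rng):
    """Return (max - min) / min over distances from one random query point
    to `n_points` uniform random points in the unit hypercube."""
    points = rng.random((n_points, n_dims))
    query = rng.random(n_dims)
    dists = np.linalg.norm(points - query, axis=1)
    return (dists.max() - dists.min()) / dists.min()


rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

# Smaller contrast means nearest and farthest neighbors are harder to tell apart.
contrast = {d: relative_contrast(1000, d, rng) for d in (2, 10, 100, 1000)}
for d, c in contrast.items():
    print(f"d={d:>4}: relative contrast = {c:.2f}")
```

<p>
  On this synthetic data the contrast shrinks sharply as the dimension grows, which is why distance-based methods such as k-NN tend to degrade on high-dimensional inputs.
</p>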


<h3 style="color: red;">✂️ Dimensionality Reduction</h3>

<h4 style="color: darkorange;">Definition:</h4>
<p>
  Dimensionality reduction is the process of reducing the number of features (variables) under consideration,
  either by selecting a subset of the original features or by deriving a smaller set of new features that retain most of the relevant information.
</p>

<h4 style="color: blue;">🧰 Why do we need it?</h4>
<ul>
  <li>Mitigate the curse of dimensionality</li>
  <li>Reduce overfitting</li>
  <li>Improve model performance</li>
  <li>Decrease training time</li>
  <li>Make the data possible to visualize in 2D or 3D</li>
</ul>

<h4 style="color: green;">✅ Types of Dimensionality Reduction:</h4>

<strong>1. Feature Selection (choose a subset of existing features):</strong>
<ul>
  <li>Filter methods (e.g., correlation, mutual information)</li>
  <li>Wrapper methods (e.g., recursive feature elimination)</li>
  <li>Embedded methods (e.g., LASSO)</li>
</ul>
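
<p>
  As an illustration of a filter method, the sketch below ranks features by their absolute Pearson correlation with the target and keeps the top <code>k</code>. The data is synthetic, and the target formula (only features 0 and 3 carry signal) is an assumption made up for this example. Note that correlation only captures linear relationships.
</p>

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 500

# Synthetic data: 5 independent features.
X = rng.normal(size=(n_samples, 5))
# Hypothetical target: depends only on features 0 and 3, plus noise.
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + rng.normal(scale=0.5, size=n_samples)

# Filter score: absolute Pearson correlation between each feature and the target.
scores = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])

# Keep the k highest-scoring features (indices sorted for readability).
k = 2
selected = np.sort(np.argsort(scores)[-k:])
X_selected = X[:, selected]

print("scores:", np.round(scores, 2))
print("selected feature indices:", selected)
```

<p>
  Because the selected columns are original features, the reduced dataset stays directly interpretable, which is the main practical difference from feature extraction.
</p>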

<strong>2. Feature Extraction (create new features):</strong>
<ul>
  <li>PCA (Principal Component Analysis) – projects data onto orthogonal directions of maximum variance.</li>
  <li>t-SNE – a nonlinear method that preserves local neighborhoods; mainly used for visualization in 2D/3D.</li>
  <li>Autoencoders – neural networks that compress data into a low-dimensional code and reconstruct it.</li>
  <li>LDA (Linear Discriminant Analysis) – a supervised method that projects data to maximize class separability.</li>
</ul>

<h4 style="color: purple;">📊 Example of PCA:</h4>
<p>
  Suppose you have 4 features: height, weight, age, income.<br>
  Age and income might be strongly correlated, so they carry overlapping information.<br>
  PCA can combine them into 1 principal component that explains most of their joint variance.<br>
  So, instead of training on 4 features, you train on 2 or 3 principal components that carry most of the information.
</p>
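
<p>
  The example above can be sketched with NumPy alone. Everything about the data here is an assumption for illustration: the four columns are synthetic, and income is generated as a noisy linear function of age so that the two are strongly correlated. The features are standardized first because they are on very different scales, and PCA is computed through the singular value decomposition.
</p>

```python
import numpy as np

rng = np.random.default_rng(0)
n = 300

# Synthetic, made-up data for the four features in the example.
height = rng.normal(170, 10, n)                # cm
weight = rng.normal(70, 12, n)                 # kg
age = rng.uniform(20, 65, n)                   # years
income = 1000 * age + rng.normal(0, 5000, n)   # assumed to track age closely

X = np.column_stack([height, weight, age, income])

# Standardize each column so no feature dominates purely because of its units.
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# PCA via SVD: rows of Vt are the principal directions, ordered by variance.
U, S, Vt = np.linalg.svd(Z, full_matrices=False)
explained_ratio = S**2 / np.sum(S**2)

# Keep the first k components and project the data onto them.
k = 3
X_reduced = Z @ Vt[:k].T

print("explained variance ratio:", np.round(explained_ratio, 3))
print("reduced shape:", X_reduced.shape)
```

<p>
  With this synthetic data, the last component explains very little variance because age and income are nearly redundant, so dropping it loses little information.
</p>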

<h4 style="color: teal;">🔍 Real-world Analogy:</h4>
<p>
  Imagine you are viewing a 3D object (e.g., a sculpture), but you're only allowed to take a photo from one angle.
  You choose the best angle that captures the most important features of the sculpture.<br>
  That’s what dimensionality reduction does: it finds the best "view" of your data in fewer dimensions.
</p>
