<a id="table-of-contents"></a>
# 📖 Table of Contents

[📉 Dimensionality Reduction Overview](#dimensionality-reduction)  
- [🚧 Key Challenges in High Dimensions](#key-challenges)

[🗂️ Data Setup](#data-setup)  
- [🧾 Sample Data](#sample-data)

[📊 Principal Component Analysis (PCA)](#pca)  
- [⚙️ How PCA Works](#how-pca-works)  
- [📈 Scree Plot / Explained Variance](#explained-variance)  
- [🧭 Business Use Cases](#pca-business-use)

[🌌 t-SNE (t-Distributed Stochastic Neighbor Embedding)](#tsne)  
- [🔍 Intuition](#tsne-intuition)  
- [⚙️ How It Works](#how-tsne-works)  
- [🚫 Limitations](#tsne-limitations)  
- [🎯 When to Use](#when-to-use-tsne)  
- [📊 Visualizations](#tsne-visuals)

[🌐 UMAP (Uniform Manifold Approximation & Projection)](#umap)  
- [🔬 Intuition vs t-SNE](#umap-intuition)  
- [📈 Plot Interpretation](#umap-visuals)

[📐 Linear Discriminant Analysis (LDA)](#lda)  
- [🔢 How LDA Works](#how-lda-works)  
- [🧮 Step-by-Step Breakdown](#lda-steps)  
- [📉 Dimensionality Constraint](#lda-constraint)

<!-- [📌 Final Summary](#final-summary) -->

<!-- [❓ FAQ / Notes](#faq)  
- [🧠 When to Use What](#when-to-use-what)  
- [📏 Unsupervised vs Supervised Methods](#unsup-vs-sup)
 -->
<!-- <hr style="border: none; height: 1px; background-color: #ddd;" /> -->
[Back to the top](#table-of-contents)
___



<a id="dimensionality-reduction"></a>
# 📉 Dimensionality Reduction Overview

<details><summary><strong>📖 Explanation (Click to Expand) </strong></summary>
Dimensionality reduction refers to techniques that transform high-dimensional data into a lower-dimensional space — while preserving as much **useful structure or signal** as possible.

These methods are valuable across both modeling and business contexts:

- 🔄 **Simplifying data** for faster computation and easier storage  
- 📉 **Reducing overfitting** by eliminating noise or redundant features  
- 👀 **Visualizing hidden structure** in 2D or 3D  
- 📊 **Improving model interpretability** by focusing on key components

This notebook covers several popular approaches — PCA, t-SNE, UMAP, and LDA — each with distinct goals, assumptions, and business use cases.
</details>

<a id="key-challenges"></a>
#### 🚧 Key Challenges in High Dimensions

<details><summary><strong>📖 Explanation (Click to Expand) </strong></summary>
High-dimensional datasets (e.g., 100+ features) often seem rich, but pose several practical issues:

##### ❌ Curse of Dimensionality
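- Distance metrics (like Euclidean) lose contrast: a point's nearest and farthest neighbors end up almost equally far away
- Feature space becomes sparse, so a fixed number of samples covers it ever more thinly and is harder to model
- The amount of data needed to cover the space at a fixed density grows exponentially with the number of dimensions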

##### 💻 Computational Overhead
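- More dimensions mean longer training times and higher memory use
- Distance-based methods such as clustering or k-NN are especially resource-intensive, because every pairwise distance costs more to compute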

##### 👁️ Visualization Limitations
- Human intuition maxes out at 3D — we need projection techniques to reveal structure

Dimensionality reduction helps address these problems by transforming the data into **lower-dimensional, information-rich representations** that support both insight and performance.
</details>
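The loss of distance contrast described above can be checked with a small simulation. This is a minimal sketch, assuming uniformly random points in a unit hypercube; the helper `relative_contrast` and the chosen dimensions are illustrative and not part of the notebook's dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

def relative_contrast(n_dims, n_points=500):
    """(farthest - nearest) / nearest distance from one random query point."""
    points = rng.uniform(size=(n_points, n_dims))
    query = rng.uniform(size=n_dims)
    dists = np.linalg.norm(points - query, axis=1)
    return (dists.max() - dists.min()) / dists.min()

# Hypothetical dimensions chosen only to show the trend
contrasts = {d: relative_contrast(d) for d in (2, 10, 100, 1000)}
for d, c in contrasts.items():
    print(f"{d:>5} dims: relative contrast = {c:.2f}")
```

As the number of dimensions grows, the relative contrast shrinks toward zero, which is why nearest-neighbor style reasoning becomes unreliable in high-dimensional spaces.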


[Back to the top](#table-of-contents)
___



<a id="data-setup"></a>
# 🗂️ Data Setup

<a id="sample-data"></a>
#### 🧾 Sample Data
<details><summary><strong>📖 Explanation (Click to Expand) </strong></summary>

We'll simulate a retail scenario where each row represents a customer, and each column captures a behavioral signal — such as purchase frequency, monetary value, or product interaction features.

The goal is to apply dimensionality reduction techniques to:
- Identify underlying **customer personas**
- Compress features for **modeling efficiency**
- Enable **2D/3D visualization** of customer segments or churn patterns
</details>


In [42]:
from sklearn.datasets import load_digits
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns


In [53]:
import numpy as np
import pandas as pd

np.random.seed(42)

# Number of synthetic customers
n_customers = 3000

# Base RFM-like features
recency = np.random.exponential(scale=10, size=n_customers)
frequency = np.random.poisson(lam=3, size=n_customers)
monetary = np.random.gamma(shape=2, scale=100, size=n_customers)

# Additional customer behavior features
cart_additions = np.random.poisson(5, n_customers)
wishlist_items = np.random.poisson(2, n_customers)
page_views = np.random.normal(loc=10, scale=3, size=n_customers).clip(min=1)
returns = np.random.binomial(n=1, p=0.2, size=n_customers)
promo_clicks = np.random.poisson(2, n_customers)
avg_discount = np.random.uniform(0, 0.5, n_customers)
days_since_last_visit = np.random.exponential(scale=15, size=n_customers)

# Create DataFrame
df = pd.DataFrame({
    'recency': recency,
    'frequency': frequency,
    'monetary': monetary,
    'cart_additions': cart_additions,
    'wishlist_items': wishlist_items,
    'page_views': page_views,
    'returns': returns,
    'promo_clicks': promo_clicks,
    'avg_discount': avg_discount,
    'days_since_last_visit': days_since_last_visit
})

# Optional: simulate segment labels (for LDA later)
df['segment'] = np.select(
    [
        (df['monetary'] > 200) & (df['frequency'] > 4),
        (df['monetary'] < 80) & (df['recency'] > 15),
    ],
    ['High-Value', 'Churn-Risk'],
    default='Mid-Tier'
)

# Summary
print(f"Shape of dataset: {df.shape}")
df.head()


Shape of dataset: (3000, 11)


| | recency | frequency | monetary | cart_additions | wishlist_items | page_views | returns | promo_clicks | avg_discount | days_since_last_visit | segment |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 4.692681 | 4 | 79.537238 | 8 | 1 | 4.777712 | 0 | 2 | 0.318244 | 1.624589 | Mid-Tier |
| 1 | 30.101214 | 2 | 508.446431 | 7 | 1 | 14.631486 | 0 | 3 | 0.479756 | 38.088446 | Mid-Tier |
| 2 | 13.167457 | 1 | 124.469273 | 3 | 1 | 13.715482 | 0 | 4 | 0.324193 | 0.520162 | Mid-Tier |
| 3 | 9.129426 | 2 | 48.842151 | 2 | 3 | 7.340231 | 1 | 2 | 0.478197 | 54.146812 | Mid-Tier |
| 4 | 1.696249 | 4 | 244.763028 | 5 | 1 | 7.365324 | 0 | 3 | 0.15229 | 5.704107 | Mid-Tier |


[Back to the top](#table-of-contents)
___


<a id="pca"></a>
# 📊 Principal Component Analysis (PCA)
<details><summary><strong>📖 Explanation (Click to Expand) </strong></summary>
Principal Component Analysis (PCA) is a linear dimensionality reduction technique that transforms high-dimensional data into a smaller set of uncorrelated variables called **principal components** — while retaining as much variability (information) as possible.

It is widely used for:
- 📉 **Feature compression** with minimal loss of signal
- 📊 **Data visualization** in 2D/3D
- 🧹 **Noise reduction** and decorrelation
- 📦 **Preprocessing step** for clustering or modeling

</details>


<a id="how-pca-works"></a>
#### ⚙️ How PCA Works

<details><summary><strong>📖 Explanation (Click to Expand) </strong></summary>
##### Step-by-step breakdown:
1. **Standardize the data**  
   Features are centered and scaled to unit variance so that no variable dominates because of its unit or magnitude.

2. **Compute the covariance matrix**  
   This captures how features vary together.

3. **Extract eigenvectors and eigenvalues**  
   Eigenvectors define new axes (principal components).  
   Eigenvalues tell how much variance each axis explains.

4. **Rank and select top components**  
   Keep the leading components that together explain a target share of the variance, e.g., 95%.

5. **Project data**  
   The original data is rotated and projected onto this new set of axes.

PCA finds **directions of maximum variance**, not necessarily the most “important” features. It’s unsupervised and doesn’t consider class labels or outcomes.
</details>
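The five steps above can be sketched directly in NumPy. This is a minimal illustration, not the notebook's own pipeline: the toy matrix `X` (three features, two of them correlated) and the choice `k = 2` are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical toy data: 200 rows, 3 features; x2 largely follows x1
x1 = rng.normal(size=200)
x2 = 0.8 * x1 + rng.normal(scale=0.3, size=200)
x3 = rng.normal(size=200)
X = np.column_stack([x1, x2, x3])

# 1. Standardize: zero mean, unit variance per feature
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# 2. Covariance matrix of the standardized features
cov = np.cov(X_std, rowvar=False)

# 3. Eigen-decomposition (eigh is suited to symmetric matrices)
eigvals, eigvecs = np.linalg.eigh(cov)

# 4. Rank components by eigenvalue, largest first, and keep the top k
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
explained_ratio = eigvals / eigvals.sum()
k = 2  # assumed for illustration

# 5. Project the standardized data onto the top-k principal axes
X_proj = X_std @ eigvecs[:, :k]

print("Explained variance ratio:", np.round(explained_ratio, 3))
print("Projected shape:", X_proj.shape)
```

In practice `sklearn.decomposition.PCA` performs the same projection (via SVD rather than an explicit covariance matrix); the manual version is only meant to make each step visible.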



<a id="explained-variance"></a>
#### 📈 Scree Plot / Explained Variance

<details><summary><strong>📖 Explanation (Click to Expand) </strong></summary>

A key part of PCA interpretation is deciding **how many components to keep**. This is usually done using the **explained variance ratio**:

- When features are correlated, the **first few components** typically explain most of the variability.
- A **scree plot** shows the contribution of each component.
- A **cumulative plot** helps identify the point of diminishing returns.

Business intuition:
> “How many latent customer behaviors do we need to explain 90–95% of the variation we see?”

We’ll visualize this using the scree and cumulative variance plots.

</details>
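A scree plot and a cumulative variance plot can be produced along these lines. This is a sketch under stated assumptions: the matrix `X` is hypothetical stand-in data (8 features driven by 2 latent factors plus noise), not the customer DataFrame, and the 95% threshold is just the example figure used above.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# Hypothetical stand-in data: 8 features driven by 2 latent factors plus noise
latent = rng.normal(size=(500, 2))
loadings = rng.normal(size=(2, 8))
X = latent @ loadings + rng.normal(scale=0.5, size=(500, 8))

# Standardize, then fit PCA with all components retained
X_scaled = StandardScaler().fit_transform(X)
pca = PCA().fit(X_scaled)

ratios = pca.explained_variance_ratio_
cumulative = np.cumsum(ratios)

# Smallest number of components whose cumulative share reaches 95%
n_keep = int(np.searchsorted(cumulative, 0.95) + 1)
print(f"Components needed for 95% of the variance: {n_keep}")

components = np.arange(1, len(ratios) + 1)
fig, (ax_scree, ax_cum) = plt.subplots(1, 2, figsize=(10, 4))

# Scree plot: variance share of each individual component
ax_scree.bar(components, ratios)
ax_scree.set(title="Scree plot", xlabel="Principal component",
             ylabel="Explained variance ratio")

# Cumulative plot with the 95% threshold marked
ax_cum.plot(components, cumulative, marker="o")
ax_cum.axhline(0.95, linestyle="--", color="gray")
ax_cum.set(title="Cumulative explained variance", xlabel="Number of components",
           ylabel="Cumulative ratio")

fig.tight_layout()
```

In a notebook the figure renders at the end of the cell; in a plain script, add `plt.show()`. To apply this to the customer data, the numeric feature columns (excluding `segment`) would take the place of `X`.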


<a id="pca-business-use"></a>
#### 🧭 Business Use Cases


[Back to the top](#table-of-contents)
___



<a id="tsne"></a>
# 🌌 t-SNE (t-Distributed Stochastic Neighbor Embedding)



<a id="tsne-intuition"></a>
#### 🔍 Intuition



<a id="how-tsne-works"></a>
#### ⚙️ How It Works



<a id="tsne-limitations"></a>
#### 🚫 Limitations



<a id="when-to-use-tsne"></a>
#### 🎯 When to Use



<a id="tsne-visuals"></a>
#### 📊 Visualizations



[Back to the top](#table-of-contents)
___



<a id="umap"></a>
# 🌐 UMAP (Uniform Manifold Approximation & Projection)



<a id="umap-intuition"></a>
#### 🔬 Intuition vs t-SNE



<a id="umap-visuals"></a>
#### 📈 Plot Interpretation



[Back to the top](#table-of-contents)
___



<a id="lda"></a>
# 📐 Linear Discriminant Analysis (LDA)



<a id="how-lda-works"></a>
#### 🔢 How LDA Works



<a id="lda-steps"></a>
#### 🧮 Step-by-Step Breakdown



<a id="lda-constraint"></a>
#### 📉 Dimensionality Constraint



[Back to the top](#table-of-contents)
___



<!-- <a id="final-summary"></a>
# 📌 Final Summary

 -->

<!-- [Back to the top](#table-of-contents)
___

 -->

[🔗 Canonical Correlation Analysis (CCA)](#cca)  
- [🧬 When It’s Useful](#when-to-use-cca)  
- [🔄 Intuition: PCA for Two Views](#cca-intuition)  
- [📉 Dimensionality Reduction via Correlation](#cca-dim-reduction)


<!-- <a id="faq"></a>
# ❓ FAQ / Notes

 -->

<!-- <a id="when-to-use-what"></a>
#### 🧠 When to Use What

 -->

<!-- <a id="unsup-vs-sup"></a>
#### 📏 Unsupervised vs Supervised Methods
 -->

<!-- [Back to the top](#table-of-contents)
___

 -->