![image.png](attachment:b3f45de1-84da-436b-b915-1cb1fa79a770.png)

Good \[morning/afternoon], everyone.

Welcome to today's presentation on **UMAP**, which stands for **Uniform Manifold Approximation and Projection**. This topic is part of the **ML2: AI Concepts and Algorithms** course for the Summer Semester 2025, offered by the **Faculty of Computer Science and Applied Mathematics** at the **University of Applied Sciences Technikum Wien**.

This presentation was prepared by:

* **B. Knapp**,
* **M. Blaickner**,
* **S. Rezagholi**, and
* **R.O. Gomes**,
  under the guidance of our lecturer, **Rosana de Oliveira Gomes**.

Let’s now begin exploring the foundations, motivations, and applications of UMAP in the context of machine learning and dimensionality reduction.



![image.png](attachment:f7c9146e-ce6d-4433-8899-54ae6677c6ac.png)

Let’s begin by placing UMAP within the broader context of Artificial Intelligence.

This slide shows an overview of **key areas within AI**, divided into categories like:

* **Supervised Learning**, which includes:

  * **Regression** (e.g., Linear Regression, Ridge/Lasso, Neural Networks)
  * **Classification** (e.g., Logistic Regression, SVM, Random Forest, Boosting)

* **Reinforcement Learning**, which is covered separately.

* **Data Handling**: Tasks like exploratory data analysis (EDA), cleaning, feature selection, and class balancing.

Now let’s focus on the **Non-Supervised Learning** branch, which is marked in red.

It includes two main areas:

1. **Clustering** (e.g., K-means, Hierarchical Clustering, DBSCAN)
2. **Dimensionality Reduction** – and that’s where **UMAP** comes in.

UMAP is listed alongside **PCA**, **SVD**, and **t-SNE** as a technique used to **reduce the dimensionality** of complex datasets. These methods help us visualize and analyze high-dimensional data by projecting it into a lower-dimensional space while preserving meaningful structure.

So in this presentation, we’ll explore how UMAP achieves this goal—and why it is often preferred over other methods like PCA or t-SNE.

Let’s move on to the next slide.


![image.png](attachment:dca58402-f5f7-49ea-976a-41b869c54572.png)

Before we dive deeper into UMAP, let’s quickly recap what **Dimensionality Reduction** is all about.

---

### 🔺 The Problem: The Curse of Dimensionality

As we work with datasets that have **many features or variables**, we run into what’s called the **curse of dimensionality**.
This refers to the **increasing complexity and sparsity** of data as the number of dimensions grows—making it harder for machine learning algorithms to operate effectively.
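
To make the sparsity effect tangible, here is a minimal sketch of how distances "concentrate" as dimensions are added. It assumes points drawn uniformly from the unit hypercube, and `relative_contrast` is a hypothetical helper name introduced only for this illustration.

```python
import math
import random

def relative_contrast(dim, n_points=200, seed=0):
    """(farthest - nearest) / nearest distance from one random query point."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    return (max(dists) - min(dists)) / min(dists)

# The contrast between near and far neighbours shrinks as dim grows.
for dim in (2, 10, 100, 500):
    print(dim, round(relative_contrast(dim), 2))
```

When the nearest and farthest neighbours are almost equally far away, distance-based methods have much less signal to work with, which is one reason to reduce the dimension first.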

---

### 🔬 High-Dimensional Data Examples:

Some real-world systems that involve high-dimensional data include:

* **Genomics** (thousands of genes),
* **Environmental science** (multiple sensor readings),
* **Natural Language Processing** (word embeddings in hundreds of dimensions),
* **Customer segmentation** (many demographic and behavioral features).

---

### 🧠 What Is Dimensionality Reduction?

Dimensionality reduction is the process of:

> 🔽 *Transforming high-dimensional data into a low-dimensional space, while trying to **preserve the original structure and relationships** of the data as much as possible.*

This is especially useful for:

* **Visualization** (e.g., plotting in 2D/3D),
* **Noise reduction**,
* **Speeding up computations**,
* **Improving generalization**.

---

### ⚙️ Common Algorithms:

* **PCA (Principal Component Analysis)** – a linear method that projects data along directions of maximum variance.
* **t-SNE** – a non-linear method that focuses on preserving local structure.
* **UMAP** – the focus of today’s talk, designed to preserve both local and global structure more efficiently than t-SNE.

---

🟠 Now that we’ve framed the problem, let’s recap **PCA** and **t-SNE** before taking a closer look at **how UMAP works**.


![image.png](attachment:3a5cfa93-d60c-4c5b-bb5a-1d52c612decb.png)

Before we fully explore UMAP, let’s briefly recap **Principal Component Analysis (PCA)**, one of the most well-established dimensionality reduction techniques.

---

### 📌 What is PCA?

PCA **captures the essence of the data** by projecting it onto a smaller set of **principal components (PCs)**, often just **2 to 5** when the goal is visualization or a compact summary.

These components are:

* **Linear combinations** of the original features,
* **Orthogonal** (mutually uncorrelated),
* And **ranked** by the amount of variance they explain in the data.

---

### 🧩 What Are Principal Components?

* **Summarize patterns**: Each PC captures a direction of maximal variance in the data.
* **Uncorrelated**: PCs are constructed to be orthogonal, so their scores are uncorrelated (a weaker property than statistical independence).
* **Constructed from original variables**: Each PC is a weighted sum of the original features (as seen in the bottom-right table).
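
These properties can be checked on a toy example. The sketch below is a hand-rolled PCA for two features only, using the closed-form angle of maximal variance; the data is synthetic and `pca_2d` is a hypothetical helper, not the code behind the slide. In practice a library routine would be used instead.

```python
import math
import random

def pca_2d(points):
    """Return (pc1, pc2, var1, var2) for a list of (x, y) samples."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Entries of the 2x2 sample covariance matrix.
    sxx = sum(x * x for x, _ in centered) / (n - 1)
    syy = sum(y * y for _, y in centered) / (n - 1)
    sxy = sum(x * y for x, y in centered) / (n - 1)

    # Angle of the direction of maximal variance (closed form in 2D).
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    pc1 = (math.cos(theta), math.sin(theta))
    pc2 = (-math.sin(theta), math.cos(theta))  # orthogonal to pc1

    def projected_variance(pc):
        # Variance of the data after projecting onto one component.
        scores = [x * pc[0] + y * pc[1] for x, y in centered]
        return sum(s * s for s in scores) / (n - 1)

    return pc1, pc2, projected_variance(pc1), projected_variance(pc2)

rng = random.Random(0)
# Synthetic, strongly correlated 2D data (made up for illustration).
data = []
for _ in range(500):
    t = rng.gauss(0, 2)
    data.append((t + rng.gauss(0, 0.3), 0.5 * t + rng.gauss(0, 0.3)))

pc1, pc2, var1, var2 = pca_2d(data)
explained = var1 / (var1 + var2)
print("PC1 weights:", pc1)
print("Share of variance explained by PC1:", round(explained, 3))
```

The two numbers in `pc1` are the weights of the original features, which is exactly the "weighted sum" idea from the list above.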

---

### 📊 On the Right Side:

* The bar chart shows that **PC1 explains the most variance**, followed by PC2, and so on. When the original features are strongly correlated, the first 2–3 PCs often capture most of the variance, although this is not guaranteed for every dataset.
* The tables demonstrate how **original features** are transformed into PCs (loadings) and how **individual samples** are then described using those PCs.

---

PCA is very efficient, fast, and useful when the data relationships are **linear**.
But when we deal with **non-linear structures**, PCA falls short.

This limitation leads us to more powerful, **non-linear** methods—such as **t-SNE** and especially **UMAP**, which we’ll be exploring next.

Let’s continue with the next slide.


![image.png](attachment:3f900292-ef0b-447e-9efe-a9c36e63678d.png)

Continuing with our recap of **Principal Component Analysis**, this slide gives us a **visual intuition** behind how PCA works.

---

### 🎯 What Are Loadings?

The **contribution of each variable** to the principal components is indicated by their **loadings**.

* Think of loadings as **directional arrows** that show how much a variable influences a particular principal component.
* Variables that point in the **same direction** tend to be **positively correlated**.
* Those pointing in **opposite directions** are **negatively correlated**.

---

### 🖼 Interpreting the Left Plot:

* This is a **loading plot** that visualizes the contributions of variables to PC1 and PC2.
* For example, variables like **Obesity**, **BMI**, and **Greasy diet** are clustered together in the upper-right quadrant, suggesting they all load heavily on PC1 and are correlated with each other.
* On the other hand, **Resting heart rate** and **Frequent exercise** are in the bottom-left quadrant, potentially anti-correlated with the first group.

---

### 📉 Interpreting the Right Plot:

* Here we see how PCA projects the original high-dimensional data onto a **2D plane** spanned by PC1 and PC2.
* Each dot represents a sample (or observation).
* The **dashed ellipse** shows the spread of the data along the directions of greatest variance—PC1 and PC2.

---

So PCA not only helps reduce dimensions, but also helps **reveal structure** in the data—by identifying patterns and correlations among variables.

That said, PCA assumes **linear relationships**. If the structure in your data is non-linear, PCA may miss it.
This is where non-linear methods like **UMAP** truly shine.

Let’s proceed to the next slide for one last look at PCA before moving on.


![image.png](attachment:d677f45f-4668-4c78-8bed-f797d2423460.png)

This final PCA recap slide helps illustrate the **practical outcome** of a PCA transformation:

---

### 🧭 What Does the Plot Show?

* Each **dot** represents one observation (or data point).
* The two axes, **PC1** and **PC2**, are the first two principal components, capturing the most variance in the dataset.
* Points that are **close together** in this 2D space have **similar profiles** across the original high-dimensional features.

---

### 🟢 Cluster Interpretation:

* PCA is often used **before clustering** (e.g., K-means, DBSCAN), or **to visualize clusters** in a simpler space.
* Here, we see three natural groupings—marked by color—that show **how PCA can reveal structure** in the data.
* This type of plot is often used in **bioinformatics**, **customer segmentation**, or **market analysis** to visually identify **subgroups** in a population.

---

### 📌 Summary of PCA So Far:

* PCA is a **linear**, fast, and powerful tool.
* It can capture **global patterns** and provide **interpretable visualizations**.
* But… it assumes **linear relationships**, and can **miss non-linear manifolds** or **local structures** in the data.

That’s exactly where **UMAP** comes in—offering **non-linear, topology-preserving dimensionality reduction**.

Before turning to UMAP, the next slide recaps one more non-linear alternative: t-SNE.


![image.png](attachment:697a5cec-77fa-4fe2-93ea-77c9bad1622a.png)

Before we introduce UMAP, let’s quickly recap **t-SNE** — one of the most popular non-linear dimensionality reduction techniques.

---

### 🔍 What is t-SNE?

**t-SNE** stands for **t-distributed Stochastic Neighbor Embedding**.
It was introduced by **van der Maaten and Hinton (2008)** and is designed to project high-dimensional data into a **2D or 3D space** for visualization, especially when the data has **non-linear structure**.

---

### 🔄 How Does It Work?

1. **Measure pairwise similarities** between all points in high-dimensional space.
2. Convert those similarities into **probabilities**: how likely are two points to be neighbors?
3. Project the data to lower dimensions (e.g., 2D).
4. Use **gradient descent** to ensure the **low-dimensional similarities** match the **original high-dimensional ones**.

---

### 🧩 Key Parameter: Perplexity

* Think of **perplexity** as a knob that controls how many neighbors each point should consider.
* **Low perplexity** emphasizes **local structure** (tight clusters).
* **High perplexity** gives a more **global view** but may blend clusters.

This is visualized in the plots on the left:

* The left panel (low perplexity) shows tight clusters but possibly overemphasized local separation.
* The right panel (high perplexity) offers a more blended distribution.
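
To connect perplexity to the "probabilities" from the previous section, here is a minimal sketch for a single point. The distances are hypothetical, and the sketch varies the Gaussian width `sigma` directly; t-SNE works the other way round and searches for the `sigma` that yields the perplexity you asked for.

```python
import math

def neighbor_probabilities(distances, sigma):
    """Gaussian similarities to the other points, normalised to sum to 1."""
    weights = [math.exp(-(d ** 2) / (2 * sigma ** 2)) for d in distances]
    total = sum(weights)
    return [w / total for w in weights]

def perplexity(probabilities):
    """2 ** (Shannon entropy in bits): the effective number of neighbours."""
    entropy = -sum(p * math.log2(p) for p in probabilities if p > 0)
    return 2 ** entropy

# Hypothetical distances from one point to five other points.
distances = [0.5, 0.7, 2.0, 2.4, 10.0]

# A wider kernel spreads probability over more neighbours.
for sigma in (0.3, 1.0, 5.0):
    probs = neighbor_probabilities(distances, sigma)
    print(sigma, round(perplexity(probs), 2))
```

A narrow kernel puts nearly all probability on the closest one or two points (low perplexity), while a wide kernel treats almost every point as a plausible neighbour (high perplexity).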

---

### 🤔 Pros and Cons

✅ *Strengths*:

* Captures complex, non-linear structures.
* Excellent for **visualizing clusters**.

❌ *Limitations*:

* Doesn’t preserve global distances well.
* Results can **vary** between runs unless the random seed is fixed.
* **Computationally expensive** on large datasets.

---

This sets the stage for **UMAP**, which retains the strengths of t-SNE but aims to **address these limitations**: it is typically faster and tends to retain more of the **global** structure alongside the **local** one.

First, though, the next slide compares PCA and t-SNE side by side.


![image.png](attachment:a86d8704-2412-4ee7-9c1a-eea30a204ace.png)

This side-by-side comparison shows the **visual impact of PCA vs. t-SNE** on the **Fashion MNIST dataset**, a classic benchmark in machine learning that includes grayscale images of clothing items across 10 categories.

---

### 📊 PCA Plot (Left):

* The **PCA projection** is smooth and continuous.
* Data points are spread along major directions of **global variance**.
* However, **clusters are not well separated**—many categories blend together.
* It reflects **linear structure** and is good for **quick overview** or **preprocessing**, but lacks fine detail.

---

### 🎨 t-SNE Plot (Right):

* The **t-SNE projection** shows **clearly defined clusters**, each representing a different clothing class (e.g., T-shirt, Pullover, Sandal).
* t-SNE emphasizes **local neighborhood structure**, grouping similar observations closely.
* However, the **distances between clusters** don’t always reflect true global relationships.

---

### 🧠 Key Insight:

* PCA is better at preserving **global geometry**, but **fails to separate non-linear clusters**.
* t-SNE excels at **cluster separation** and local relationships, making it ideal for **exploratory visualization**.

---

This comparison illustrates why we sometimes need **non-linear methods**. But even t-SNE has downsides:

* It’s slow on large datasets.
* Its results vary across runs unless the random seed is fixed.
* It doesn’t preserve global structure well.

That’s why we now turn to **UMAP**, a newer method that offers the **best of both worlds**:
→ speed, reproducibility (with a fixed seed), local + global structure.

Let’s move to the next slide to explore UMAP.


![image.png](attachment:c3d578f0-45f2-4e74-b0a0-c4b5e4addf71.png)

We now arrive at the centerpiece of this presentation: **UMAP**, which stands for **Uniform Manifold Approximation and Projection**.

---

### 🧪 What Is UMAP?

UMAP is a **non-linear dimensionality reduction technique** introduced by **Leland McInnes, John Healy, and James Melville** in 2018.

Its **core motivation** is:

> To enable the **visualization** of high-dimensional data in 2D or 3D
> while preserving both **local clusters** and **global structure**,
> using concepts from **topology and manifold theory**.

---

### 🌐 The Main Idea

UMAP assumes that:

1. High-dimensional data **lies on a lower-dimensional manifold**—a curved surface embedded in higher space.
2. The data is **uniformly distributed** on this manifold with respect to a **Riemannian metric**, which is assumed to be approximately constant within each local neighborhood.

These assumptions allow UMAP to build a **graph-based representation** of the data that captures **topological relationships**, and then **optimize** a lower-dimensional layout that preserves those relationships.

---

### 📊 Visual Example

* On the **left**, we have points A–F in high-dimensional space.
* On the **right**, UMAP rearranges them into a 1D embedding.
* The goal is to **preserve cluster integrity** (e.g., a/b/c grouped, d/e/f grouped) and also reflect their **relative distances**.

---

### 🌀 The Manifold Perspective

* The **orange torus** shown at the bottom right represents a **manifold**—a curved but mathematically structured surface.
* UMAP’s core idea is that data often lives on such surfaces, and if we can “unwrap” them carefully, we can represent the data faithfully in fewer dimensions.

---

### 🚀 Why UMAP?

Compared to t-SNE, UMAP:

* Preserves **both local and global structure** better,
* Is **much faster**,
* Scales to **large datasets**,
* And offers **reproducible** results (if the seed is fixed).

Let’s now dive into **how** UMAP achieves this—on the next slide.


![image.png](attachment:5ec5e572-0409-4331-8fdd-84cc182c9797.png)

Now let’s break down the **UMAP algorithm step by step**.

---

### 🧱 Step 1: Construct a High-Dimensional Graph

UMAP begins by building a **graph in high-dimensional space**.
Each node is a data point, and edges connect **nearest neighbors**, weighted by **distance-based similarity**.

For example:

* Looking at **point A**, we compute distances to its neighbors: B (0.5), C (2.4), D (10.0), etc.
* The closer the neighbor, the stronger the edge in the graph.

---

### 📌 How Are Weights Assigned?

As in t-SNE, distances are turned into similarity weights, with a scale that adapts to each point’s neighborhood:

* It calculates **local distances and weights** for each data point to its k-nearest neighbors.
* These are used to build a **fuzzy simplicial complex**—essentially, a soft-edged graph capturing **local topology**.

This process is shown in the center image at the top:

* Nodes are connected based on how close they are.
* Overlapping neighborhoods indicate **local structure**.
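
As a rough sketch of this weighting, the code below applies the form given in the UMAP paper, `w = exp(-max(0, d - rho) / sigma)`, to point A’s three listed distances. It assumes these three are all of A’s neighbors (k = 3); `smooth_knn_weights` is a hypothetical name, and real implementations differ in details.

```python
import math

def smooth_knn_weights(distances, n_iter=64):
    """Directed membership strengths from one point to its k nearest neighbours.

    rho is the distance to the nearest neighbour; sigma is chosen so that
    the weights sum to log2(k), as described in the UMAP paper.
    """
    k = len(distances)
    rho = min(distances)
    target = math.log2(k)

    def total(sigma):
        return sum(math.exp(-max(0.0, d - rho) / sigma) for d in distances)

    # total(sigma) grows with sigma, so a bisection search is enough.
    lo, hi = 1e-6, 1e6
    for _ in range(n_iter):
        mid = (lo + hi) / 2
        if total(mid) < target:
            lo = mid
        else:
            hi = mid
    sigma = (lo + hi) / 2
    weights = [math.exp(-max(0.0, d - rho) / sigma) for d in distances]
    return rho, sigma, weights

# Distances from point A to B, C and D, taken from the example above.
rho, sigma, weights = smooth_knn_weights([0.5, 2.4, 10.0])
print("rho:", rho, "sigma:", round(sigma, 3))
print("weights to B, C, D:", [round(w, 3) for w in weights])
```

Subtracting `rho` guarantees that every point is connected to its nearest neighbor with full strength, no matter how sparse its surroundings are.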

---

### 📉 Step 2: Low-Dimensional Embedding

UMAP then **projects this high-dimensional graph into 2D or 3D**, trying to:

> **Preserve local structure** — that is, keep close neighbors in the high-dimensional space close in the lower-dimensional one.

As shown in the image on the bottom right:

* A remains close to B (0.5),
* But distant points like E and F get pushed far away.

This step involves:

* A **cost function** that encourages **similar points to stay close**,
* And **dissimilar points to repel**, much like t-SNE—but using **cross-entropy loss** instead of KL divergence.

---

### 🛠 Hyperparameters Matter

* The shape of the final curve and neighborhood structure depends on hyperparameters like:

  * `n_neighbors` (how many neighbors to consider),
  * `min_dist` (how tightly points are packed),
  * and others that we’ll discuss shortly.

---

### 🔁 In Summary

UMAP:

1. Learns the data’s shape by constructing a **topological graph**,
2. Optimizes a **low-dimensional embedding** that preserves that shape.

It’s **fast**, **scalable**, and, with a fixed random seed, **reproducible**, which makes it well suited to exploratory data analysis and embedding tasks.

Let’s continue to the next slide to see how this optimization works in detail.


![image.png](attachment:9df6864d-904f-46cc-b23f-6a82b15b27a6.png)

Let’s now look at the **final step of the UMAP algorithm**—how it performs **optimization** to align high-dimensional and low-dimensional structures.

---

### 🔗 Goal Recap:

UMAP has now constructed:

* A **graph in high-dimensional space**, using similarity scores `fᵢⱼ`,
* And a **graph in low-dimensional space**, using similarity scores `gᵢⱼ`.

The key goal:

> **Make these graphs match**:
> Points that are close in high-dim space should stay close in low-dim space.

---

### ⚙️ Step 3: Optimization

UMAP minimizes a **cross-entropy loss function** between the two graphs.

#### 🧾 The Loss Function:

$$
\text{Loss} = \sum_{i \ne j} \left[ f_{ij} \log\left(\frac{f_{ij}}{g_{ij}}\right) + (1 - f_{ij}) \log\left(\frac{1 - f_{ij}}{1 - g_{ij}}\right) \right]
$$

* $f_{ij}$: similarity between point *i* and *j* in the **high-dimensional** space
* $g_{ij}$: similarity between point *i* and *j* in the **low-dimensional** space

This formula ensures:

* If two points are **similar in high dimensions**, they are **pulled closer** in the embedding.
* If they are **not similar**, they are **pushed apart**.
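
To see the formula in action, here is a minimal sketch that evaluates the loss for a handful of edges. The edge strengths are hypothetical, and the `eps` clipping is an implementation convenience to avoid `log(0)`, not part of the formula.

```python
import math

def fuzzy_cross_entropy(f, g, eps=1e-12):
    """Cross-entropy between two lists of edge strengths in (0, 1)."""
    loss = 0.0
    for f_ij, g_ij in zip(f, g):
        # Clip to keep the logarithms finite.
        f_ij = min(max(f_ij, eps), 1 - eps)
        g_ij = min(max(g_ij, eps), 1 - eps)
        loss += f_ij * math.log(f_ij / g_ij)                    # attraction term
        loss += (1 - f_ij) * math.log((1 - f_ij) / (1 - g_ij))  # repulsion term
    return loss

# Hypothetical edge strengths for three pairs of points.
f = [0.9, 0.8, 0.05]        # high-dimensional graph
g_good = [0.9, 0.8, 0.05]   # embedding that matches it exactly
g_bad = [0.1, 0.2, 0.9]     # embedding that gets the neighbours wrong

print("matching embedding:", fuzzy_cross_entropy(f, g_good))
print("mismatched embedding:", fuzzy_cross_entropy(f, g_bad))
```

The first term penalizes true neighbors that end up far apart, and the second penalizes non-neighbors that end up close together.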

---

### 🎯 Why Cross-Entropy?

* Cross-entropy is widely used to compare predicted probabilities with target probabilities.
* UMAP treats each edge weight as a membership probability (the two graphs are fuzzy sets of edges) and minimizes the cross-entropy between them, edge by edge.

---

### 💡 Summary

* **Step 1**: Build a graph based on neighborhood similarities in high dimensions.
* **Step 2**: Initialize a layout in low dimensions (the `umap-learn` package uses a spectral initialization by default).
* **Step 3**: Optimize the layout to minimize loss using **stochastic gradient descent**, aligning the two graphs.

With this, UMAP produces **a compact, faithful, and topology-preserving low-dimensional embedding**.

Let’s move to the next slide, which takes a closer look at how the high-dimensional graph from Step 1 is built.


![image.png](attachment:24bccb7f-c986-4b14-8648-8526aa1d3a84.png)

To understand how UMAP builds its internal representation, this slide explains how it constructs a **fuzzy graph in high-dimensional space**—a crucial first step in the algorithm.

---

### 🧩 Step-by-Step: High-Dimensional Fuzzy Graph Construction

---

#### **1. Position the Data Points**

The input data already exists in a **multi-dimensional space** (e.g., 50D from word embeddings, or 784D from flattened images).

Each data point is initially just a vector in this space.

---

#### **2. Define Radii Around Each Point**

UMAP draws a **radius** around each point—this defines a **local neighborhood**.

* But unlike fixed-radius methods, UMAP adapts the radius **per point**, depending on **local density**.
* This makes UMAP more robust to **non-uniformly distributed data**.

Two points are connected **if their radii overlap**, forming the **edges of the graph**.

🔴 **Note**: Radii are **not fixed**; they vary to capture **adaptive neighborhood sizes**.

---

#### **3. Build the Fuzzy Graph**

This graph is “fuzzy” because:

* Edges are **weighted with a probability** (not binary connections).
* That weight reflects how **strong the relationship** is between two points.
* As the distance between points increases, the **probability of connection decreases**.

In the lower image:

* The **pink point** has stronger probability connections with **orange** points (closer),
* And weaker connections to **blue** or **green** points (further away).

This step results in a **fuzzy simplicial complex**, capturing the local topological structure of the data.
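
One detail behind this step: each point assigns its own connection strengths, so the strength from A to B may differ from the one from B to A. The UMAP paper merges the two with a probabilistic OR, `a + b - a*b`. The sketch below shows that rule on hypothetical strengths; `fuzzy_union` is an illustrative name.

```python
def fuzzy_union(directed):
    """Symmetrise directed edge strengths with a probabilistic OR.

    directed[(i, j)] is how strongly point i considers j a neighbour.
    """
    pairs = {tuple(sorted(edge)) for edge in directed}
    symmetric = {}
    for i, j in pairs:
        a = directed.get((i, j), 0.0)
        b = directed.get((j, i), 0.0)
        symmetric[(i, j)] = a + b - a * b
    return symmetric

# Hypothetical directed strengths between three points.
directed = {
    ("A", "B"): 1.0,
    ("B", "A"): 0.6,
    ("A", "C"): 0.3,
    ("C", "B"): 0.8,
}
print(fuzzy_union(directed))
```

An edge survives if at least one of the two points considers the other a neighbor, which keeps points in sparse regions connected to the rest of the graph.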

---

### 🧠 Why This Matters

This fuzzy graph is the **foundation** that UMAP then projects into lower dimensions.
Preserving the **structure and probabilities** of this graph is the goal of the embedding optimization.

Next, we’ll explore the **hyperparameters** that let us control this process. Please continue with the next slide.


![image.png](attachment:a0864e29-5dc0-48f1-91f9-27ba9ae33042.png)

Now let’s look at the **first key hyperparameter** in UMAP: `n_neighbors`.

---

### 🔧 `n_neighbors`: Controls Local Connectivity

This parameter determines **how many neighbors** each point considers when building the high-dimensional graph.

#### 📐 Effects on Radius and Graph:

* A **larger `n_neighbors`** leads to **larger radii** around each point (shown on the right).
* More neighbors means **more connections** in the graph, capturing **more global structure**.
* A **smaller value** (left side) keeps neighborhoods **tight and local**, preserving fine structure and dense clusters.

---

### 🖼 Visual Explanation

* With `n_neighbors = 2`, the graph is **sparser**—points are only connected to a few others. Clusters are more **segmented**.
* With `n_neighbors = 4`, the **radii are larger**, and there’s more **overlap** between local neighborhoods. This blends local structures with more **global continuity**.
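
The link between `n_neighbors` and radius can be reproduced with a brute-force sketch. It uses random 2D points as stand-in data and, as a simplifying assumption, counts only *other* points as neighbors; `knn_radii` is a hypothetical helper.

```python
import math
import random

def knn_radii(points, k):
    """Distance from each point to its k-th nearest other point."""
    radii = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        radii.append(dists[k - 1])
    return radii

rng = random.Random(1)
points = [(rng.random(), rng.random()) for _ in range(50)]

# A larger k forces every point to reach further for its neighbours.
for k in (2, 4):
    radii = knn_radii(points, k)
    print(k, "neighbours -> mean radius", round(sum(radii) / len(radii), 3))
```

Because the radius is "whatever it takes to reach k neighbors", it is automatically small in dense regions and large in sparse ones.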

---

### 📊 Practical Guidelines

* **Smaller values** (e.g., 5–15): good for **fine cluster detail** (e.g., cell types in biology).
* **Larger values** (e.g., 30–100): good for **capturing macro patterns** (e.g., broad topic clusters in text).

---

### 🧠 Summary

* `n_neighbors` shapes the **locality vs. globality** trade-off in UMAP.
* It's directly linked to the **graph construction step**—the very first phase of the algorithm.
* It also affects **runtime and output structure**, so tuning it carefully is important for good embeddings.

Let’s now move to the next slide for a visual comparison of low and high `n_neighbors` values.


![image.png](attachment:3e820d6e-17cf-461e-9533-36a3f4fd4f81.png)

This slide gives a **visual comparison** of how different values for `n_neighbors` affect the **structure of the UMAP embedding**.

---

### 📌 Left: **Low `n_neighbors`**

* Focus is on **local relationships**.
* Small, **tight clusters** are formed.
* Data points group closely based on **fine-grained similarities**.
* This setting is ideal when you want to detect **micro-structures** or subpopulations.

---

### 📌 Right: **High `n_neighbors`**

* Focus shifts to **global relationships**.
* Clusters may be more **spread out** and **globally ordered**.
* It better captures **macro patterns** and **overall geometry** of the dataset.
* Useful when you want a **holistic view** of how groups relate to each other.

---

### 🧠 Key Insight

> **`n_neighbors` acts as a scale controller**:
>
> * **Low values** → local fidelity, more granular
> * **High values** → global structure, less cluster separation

And remember:

* The point itself is **included** in its neighborhood count.
* The **default** in many UMAP implementations is `n_neighbors = 15`.

---

With this, we’ve completed our deep dive on the `n_neighbors` hyperparameter.

Next, we’ll look at another powerful hyperparameter in UMAP: `min_dist`, which controls how **tightly points are packed** in the embedding.

Please continue with the next slide.


![image.png](attachment:00b2ea2a-5432-4c6b-950d-787b2ad24df2.png)

Let’s now focus on another **crucial UMAP hyperparameter**: `min_dist`.

---

### 🔧 What is `min_dist`?

`min_dist` controls the **minimum spacing allowed** between points in the **low-dimensional space** (e.g., the 2D embedding).

Think of it as a **compression limit**:

* **Low `min_dist`** → points can pack tightly together
* **High `min_dist`** → points are kept more spread apart

---

### 📊 Why is it important?

This parameter directly affects the **visual appearance of clusters** in your embedding:

* A **lower `min_dist`** emphasizes **local density** and makes **tight, well-separated clusters**.
* A **higher `min_dist`** creates more **space between points**, helping to highlight **larger global trends** or relationships.
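
One way to picture this is through the similarity curve the embedding aims for. To my understanding, `umap-learn` fits its low-dimensional similarity function to a piecewise target like the one below, governed by `min_dist` and a `spread` parameter; treat this as a sketch of the idea, not the library’s code. `target_similarity` is a hypothetical name.

```python
import math

def target_similarity(distance, min_dist, spread=1.0):
    """Idealised low-dimensional similarity shaped by min_dist.

    Points closer than min_dist already count as fully similar;
    beyond that, similarity decays exponentially.
    """
    if distance <= min_dist:
        return 1.0
    return math.exp(-(distance - min_dist) / spread)

for min_dist in (0.1, 0.8):
    curve = [round(target_similarity(d, min_dist), 2) for d in (0.05, 0.5, 1.0, 2.0)]
    print("min_dist =", min_dist, "->", curve)
```

With a large `min_dist`, neighbors gain nothing by moving closer than that distance, so clusters stay looser; with a small value, the optimizer keeps pulling neighbors together into tight clumps.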

---

### 🖼 Illustration

In this slide:

* On the **left**, with **low `min_dist`**, points are **clustered tightly**, emphasizing **fine detail**.
* On the **right**, with **higher `min_dist`**, points are **more spread out**, showing a **broader structure**.

This step is part of the **projection & optimization phase** (highlighted top-right in red), where UMAP places points in the 2D plane.

---

### ⚙️ Summary

* `min_dist` affects the **final layout**, not the neighborhood graph.
* Use **low values (e.g., 0.1)** when you want **compact clusters**.
* Use **higher values (e.g., 0.5 or 0.8)** when you want to **preserve inter-cluster geometry**.

---

Together with `n_neighbors`, `min_dist` gives you powerful control over how **local vs. global** and **dense vs. sparse** your final UMAP plot will look.

Let’s move on to the next slide to see how the two hyperparameters interact.


![image.png](attachment:a9c0632c-fdfb-4a83-a052-79aa479dacb4.png)

This slide summarizes the **combined impact** of UMAP’s two most important hyperparameters:
👉 `n_neighbors` and `min_dist`.

---

### 🧪 Left Side: Fashion MNIST Visualization

Here we see a side-by-side comparison of:

* **UMAP** (left) and
* **t-SNE** (right)

Both were applied to the **Fashion MNIST** dataset (images of clothing).
UMAP clearly preserves both **clusters and relative positions**, while t-SNE provides a **tight clustering** but less interpretability in global structure.

---

### 🧩 Right Side: Hyperparameter Grid (PenDigits dataset)

This matrix shows how UMAP embeddings **change across different parameter settings**.

#### 📐 `n_neighbors`: Left → Right

* Increases from **5** to **320**
* Low: fine-grained clusters
* High: smoother, more blended structure

#### 📏 `min_dist`: Top → Bottom

* Increases from **0.0125** to **0.8**
* Low: tight clusters (points packed)
* High: loose clusters (points spread out)

---

### 🔎 Observations

* **Top-left (low `n_neighbors`, low `min_dist`)**:
  Very tight clusters, highly local detail.

* **Bottom-right (high `n_neighbors`, high `min_dist`)**:
  Very global structure, but detail is smoothed out.

* **Middle zone (`n_neighbors` ≈ 20–80, `min_dist` ≈ 0.05–0.2)**:
  Often gives a **good trade-off** between preserving detail and seeing global structure.

---

### 🔧 Practical Tip

When using UMAP:

* Tune `n_neighbors` to control the **scale of relationships** (local vs. global).
* Adjust `min_dist` to control the **tightness of clusters**.

There’s no one-size-fits-all setting—UMAP gives you flexibility depending on your **data** and **analysis goal**.
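
As a starting point, the two goals can be written down as parameter presets. The keyword names (`n_neighbors`, `min_dist`, `n_components`, `random_state`) are those of `umap-learn`’s `umap.UMAP`; the preset names and the exact values are illustrative picks from the ranges mentioned in this talk, not library recommendations.

```python
# Illustrative presets; names and values are assumptions for this sketch.
PRESETS = {
    "local_detail": {"n_neighbors": 10, "min_dist": 0.1},
    "global_view": {"n_neighbors": 50, "min_dist": 0.5},
}

def umap_kwargs(preset, n_components=2, random_state=42):
    """Merge a preset with the output dimension and a fixed seed."""
    params = dict(PRESETS[preset])
    params.update(n_components=n_components, random_state=random_state)
    return params

print(umap_kwargs("local_detail"))
print(umap_kwargs("global_view"))

# With umap-learn installed, these would be used roughly as:
#   import umap
#   embedding = umap.UMAP(**umap_kwargs("local_detail")).fit_transform(X)
```

Fixing `random_state` makes repeated runs comparable, which matters when you are judging the effect of a parameter change.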

---

We’ve now covered the **theory, mechanics, and tuning** of UMAP.


![image.png](attachment:43c05cf1-4870-4327-84b3-73900d9aef8c.png)

Let’s wrap up our technical overview by summarizing the **key properties of UMAP**.

---

### 🔍 Interpretability

* UMAP is a **nonlinear dimensionality reduction method**, so unlike PCA, it **lacks straightforward interpretability**.
* In PCA, each principal component is a linear combination of features—but UMAP embeddings **do not carry such semantic meaning**.

---

### 🔗 Local and Global Structure

* A major strength of UMAP is its **ability to preserve both local and global structure**:

  * **Local**: Points that are close in high-dimensional space remain close.
  * **Global**: The overall geometry and relationships between clusters are reasonably well maintained.

---

### 🧠 Local Geometry Focus

* Similar to t-SNE, UMAP emphasizes **local geometry** over large-scale geometry.
* This means it’s particularly good at:

  * Identifying **clusters**
  * Revealing **fine structure**
  * Organizing data based on neighborhood graphs

---

### 📌 Driven by k-Nearest Neighbors

* UMAP’s behavior is **strongly influenced by the k-nearest-neighbor graph** built during the initial phase.
* This makes it sensitive to the choice of `n_neighbors`.

---

### ⚠️ Limitations

* **Not robust on small datasets**.

  * It performs best when there is enough data to build meaningful neighborhood relationships.
  * For very small datasets, the graph construction may not be reliable, and results can be noisy or unstable.

---

### ⚡ Performance Comparison

The **table** highlights one of UMAP’s biggest practical advantages:

| Dataset       | t-SNE Time | UMAP Time  |
| ------------- | ---------- | ---------- |
| COIL20        | 20 seconds | 7 seconds  |
| MNIST         | 22 minutes | 98 seconds |
| Fashion MNIST | 15 minutes | 78 seconds |
| GoogleNews    | 4.5 hours  | 14 minutes |

🟥 **UMAP is dramatically faster than t-SNE**, especially on large datasets.
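
To put numbers on "dramatically faster", the short sketch below converts the table’s runtimes to seconds and computes the speedup factors. The figures are copied from the table above; nothing is re-measured.

```python
# Runtimes from the table above as (t-SNE, UMAP), converted to seconds.
runtimes = {
    "COIL20": (20, 7),
    "MNIST": (22 * 60, 98),
    "Fashion MNIST": (15 * 60, 78),
    "GoogleNews": (4.5 * 3600, 14 * 60),
}

speedups = {name: tsne / umap for name, (tsne, umap) in runtimes.items()}
for name, factor in speedups.items():
    print(f"{name}: UMAP is about {factor:.1f}x faster")
```

In this table, the gap between the two methods is smallest for COIL20 and largest for GoogleNews.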

---

So in conclusion:

* UMAP is a **powerful and efficient tool** for high-dimensional data visualization and analysis.
* While it’s not perfect for small datasets or direct feature interpretation, its **speed**, **scalability**, and **topology-preserving embeddings** make it a preferred choice for many modern machine learning tasks.



![image.png](attachment:1dff4364-271d-4043-af39-77f9576d729a.png)

Let’s conclude with this clear **summary table**, comparing the three main dimensionality reduction methods: **PCA**, **t-SNE**, and **UMAP**.

---

### 🧮 Method Overview

| **Property**         | **PCA**                    | **t-SNE**                      | **UMAP**                                    |
| -------------------- | -------------------------- | ------------------------------ | ------------------------------------------- |
| **Type**             | Linear                     | Non-linear                     | Non-linear                                  |
| **Focus**            | Maximize variance (global) | Preserve **local** structure   | Preserve **local and global** relationships |
| **Preserves**        | Spread of data (variance)  | Neighborhood similarities      | Cluster shape + manifold geometry           |
| **Speed**            | Very fast                  | Slow (esp. for large datasets) | Much faster than t-SNE                      |
| **Dimensionality**   | Any                        | Usually 2–3                    | Any                                         |
| **Stochasticity**    | Deterministic              | Stochastic                     | Stochastic                                  |
| **Interpretability** | Easy to interpret          | Hard to interpret              | Mid: better than t-SNE, worse than PCA      |
| **Outputs**          | Orthogonal axes            | Non-linear clusters            | Dense manifold with clusters and structure  |

---

### 🧠 Key Takeaways

* **PCA** is ideal for **interpretability** and speed, but only captures **linear structure**.
* **t-SNE** is great for visualizing **tight clusters**, but is **slow**, **non-deterministic**, and doesn’t preserve global structure.
* **UMAP** offers a **balanced, fast, and expressive** alternative that works well for **visualization, clustering, and exploration**—especially on large, complex datasets.

---

Thank you for your attention!


![image.png](attachment:1fe16936-9937-40c3-99b7-e24cf5098b4b.png)

This final slide provides **practical guidance** on **when to use PCA, t-SNE, or UMAP**, depending on your data and goals.

---

### 🧮 **PCA**

* Best for **linear** relationships and **variance preservation**.
* Very fast and easy to interpret.
* **Example**: Stock price analysis, where features often exhibit **linear dependencies**.

---

### 🔬 **t-SNE**

* Ideal for **fine-grained cluster separation**—especially when visualizing distinct **subpopulations**.
* Great for **medical diagnostics**, such as identifying cancer subtypes, where local structure reveals **biological differences**.
* Works well with **small to medium** datasets.

---

### 🌐 **UMAP**

* Designed for **large, high-dimensional, non-linear datasets**.
* Excels when:

  * Working with **embeddings** (e.g., NLP),
  * Or analyzing **biological data** (e.g., single-cell RNA-seq).
* Handles **millions of data points**, offering **speed**, **scalability**, and **balanced structure preservation**.

---

### ⚖️ Final Thought

> Always **consider the trade-off** between:

* 🔄 **Speed**
* 🔍 **Interpretability**
* 🧱 **Structure preservation**

You may even combine methods:

* Use PCA to reduce from 1000D to 50D,
* Then apply UMAP or t-SNE for 2D visualization.
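
The two-step combination above could be sketched as follows. This is a hedged example, not the lecturers' implementation: it assumes scikit-learn is installed, uses random data as a stand-in for a real 1000-dimensional dataset, and the helper name `reduce_then_embed` is hypothetical. It uses UMAP when `umap-learn` is available and falls back to t-SNE otherwise.

```python
import numpy as np
from sklearn.decomposition import PCA


def reduce_then_embed(X, n_pca=50, random_state=0):
    """PCA down to n_pca dimensions, then a non-linear 2D embedding."""
    # Step 1: linear reduction (e.g. 1000D -> 50D) to cut noise and cost.
    X_pca = PCA(n_components=n_pca, random_state=random_state).fit_transform(X)

    # Step 2: non-linear 2D embedding; prefer UMAP, fall back to t-SNE.
    try:
        import umap  # provided by the umap-learn package
        embedder = umap.UMAP(n_components=2, random_state=random_state)
    except ImportError:
        from sklearn.manifold import TSNE
        embedder = TSNE(n_components=2, random_state=random_state)
    return embedder.fit_transform(X_pca)


# Placeholder data: 300 samples with 1000 features (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 1000))

embedding = reduce_then_embed(X)
print(embedding.shape)
```

The PCA step mainly reduces the cost of the neighbor search in the second step; the choice of 50 components follows the slide and is a common starting point rather than a fixed rule.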

---



![image.png](attachment:0812d592-d847-437a-ad5b-7e2edf91372a.png)

Let’s close with the key **takeaways** from this session on **dimensionality reduction**:

---

### 🧠 PCA – Principal Component Analysis

* **Strength**: Captures **linear** relationships and **maximizes variance**.
* **Use it when** you need speed, interpretability, and your data is **linearly structured**.
* 💡 *Think stock market data or financial indicators.*

---

### 🔬 t-SNE – t-Distributed Stochastic Neighbor Embedding

* **Strength**: Reveals **local structure** and forms **clear, detailed clusters**.
* **Use it when** you want high-resolution **cluster visualization**.
* ⚠️ Computationally heavy and hard to interpret globally.
* 💡 *Think cancer cell subtype analysis or digit recognition.*

---

### 🌐 UMAP – Uniform Manifold Approximation and Projection

* **Strength**: Balances **local and global** structure.
* **Use it when** you're working with **large, nonlinear datasets** (tens of thousands to millions of points).
* 🚀 *Faster than t-SNE, retains more global continuity, very flexible.*
* 💡 *Think gene expression, NLP embeddings, recommender systems.*

---

### 🎯 Final Thought

> Choosing between PCA, t-SNE, and UMAP is a **trade-off** between speed, interpretability, and structure preservation.

---

### ▶️ *Next Topic*: **MDS (Multidimensional Scaling)**

In the next session, we’ll explore **MDS**, an older but still useful technique for preserving **pairwise distances**.

Thank you for your attention!


![image.png](attachment:77aa9d8a-b1d3-4aac-a7ca-ff090e25be17.png)

This final slide provides useful **references and resources** for further exploration and application of UMAP.

---

### 📘 Core Paper

**UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction**
📎 McInnes, Healy, and Melville (2018): [arXiv:1802.03426](https://arxiv.org/abs/1802.03426)
👉 Also available via **Moodle**.

This paper introduces the theory behind UMAP, including its topological foundations, fuzzy simplicial sets, and optimization framework.

---

### 🎥 Video Resource

**BioTuring Webinar**:
*A Practical Guide to UMAP*
🧠 By co-author **John Healy**, this is an excellent hands-on explanation of the algorithm, use cases, and tuning parameters.
📎 [Watch the webinar](https://www.youtube.com/watch?v=9HomdnM12oI)

---

### 💾 Installation Instructions

To install the official `umap-learn` package:

```bash
pip install umap-learn
# or
conda install -c conda-forge umap-learn
```
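
After installing, a quick check like the one below can confirm that the package works. It is a minimal sketch: the helper name `umap_smoke_test` is hypothetical, the random input data is a placeholder, and the parameter values shown are the library defaults written out for clarity. Note that the package is installed as `umap-learn` but imported as `umap`.

```python
import numpy as np


def umap_smoke_test(n_samples=200, n_features=10, seed=42):
    """Return the embedding shape, or None if umap-learn is not installed."""
    try:
        import umap  # import name is `umap`; package name is `umap-learn`
    except ImportError:
        return None

    # Placeholder data: random points, only meant to exercise the API.
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_samples, n_features))

    reducer = umap.UMAP(n_neighbors=15, min_dist=0.1, n_components=2,
                        random_state=seed)
    return reducer.fit_transform(X).shape


result = umap_smoke_test()
if result is None:
    print("umap-learn is not installed")
else:
    print("embedding shape:", result)
```

The first call can take noticeably longer than later ones because `umap-learn` compiles parts of its code on first use.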

---

### 🔍 Documentation and Examples

Official docs:
🌐 [https://umap-learn.readthedocs.io](https://umap-learn.readthedocs.io)
Includes:

* Example notebooks,
* Parameter explanations,
* FAQs and common applications.

---

