# Day 35 — "Geometry of Embeddings & Feature Spaces: Distance, Similarity & Collapse"

Embedding spaces are geometric objects. Learning reshapes distances and angles to make classes separable.


In [1]:
# Ensure repo root is on sys.path for local imports
import sys
from pathlib import Path

repo_root = Path.cwd()
if not (repo_root / "days").exists():
    for parent in Path.cwd().resolve().parents:
        if (parent / "days").exists():
            repo_root = parent
            break

sys.path.insert(0, str(repo_root))
print(f"Using repo root: {repo_root}")


Using repo root: /media/abdul-aziz/sdb7/masters_research/math_course_dlcv


## 1. Core Intuition

Learning is sculpting geometry: similar inputs move closer, dissimilar inputs move apart, and irrelevant variation is flattened.


## 2. Distance vs Angle

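- L2 distance measures absolute separation, so it is sensitive to vector magnitude.
- Cosine similarity measures alignment (the angle between vectors), ignoring magnitude.
- Normalized embeddings live on a unit hypersphere, where L2 distance and cosine similarity rank pairs identically.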
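
To make the three points above concrete, here is a minimal NumPy sketch. The helpers `l2`, `cosine`, and `unit` are local illustrations written for this note, not the functions from `embedding_geometry.py`. It compares a rescaled vector with an orthogonal one, then checks the identity $\|a - b\|^2 = 2 - 2\cos(a, b)$ that holds for unit-length vectors.

```python
import numpy as np

def l2(a, b):
    """Euclidean distance between two vectors."""
    return float(np.linalg.norm(a - b))

def cosine(a, b):
    """Cosine of the angle between two non-zero vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def unit(a):
    """Rescale a non-zero vector to length 1."""
    return a / np.linalg.norm(a)

x = np.array([1.0, 2.0])
y = np.array([2.0, 4.0])    # same direction as x, twice the length
z = np.array([2.0, -1.0])   # orthogonal to x, same length as x

# y is far from x in L2 terms but perfectly aligned with it;
# z has the same length as x but points in an unrelated direction.
print("x vs y:", l2(x, y), cosine(x, y))
print("x vs z:", l2(x, z), cosine(x, z))

# On the unit hypersphere, squared L2 distance is determined by the cosine.
xn, zn = unit(x), unit(z)
lhs = l2(xn, zn) ** 2
rhs = 2.0 - 2.0 * cosine(xn, zn)
print("squared L2 on sphere:", lhs, " 2 - 2cos:", rhs)
```

Because of this identity, nearest-neighbor rankings under L2 and under cosine agree once embeddings are normalized; they can disagree only when magnitudes differ.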


## 3. Feature Collapse

Collapse happens when all embeddings map to nearly the same point. Pairwise distances vanish, the covariance matrix loses rank, and the embeddings can no longer distinguish inputs.
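
One way to check for collapse numerically is to look at the eigenvalues of the feature covariance matrix. The sketch below is an illustration under stated assumptions, not the course's `detect_collapse`: the helper names `covariance_spectrum` and `participation_ratio` are hypothetical, the data is synthetic, and the participation ratio $(\sum_i \lambda_i)^2 / \sum_i \lambda_i^2$ is used as one common effective-rank measure. It contrasts a healthy embedding with a point collapse (everything near one point) and a rank-one variant (all variation along a single direction).

```python
import numpy as np

def covariance_spectrum(emb):
    """Eigenvalues of the feature covariance matrix, largest first."""
    centered = emb - emb.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (emb.shape[0] - 1)
    eigvals = np.linalg.eigvalsh(cov)[::-1]   # eigvalsh returns ascending order
    return np.clip(eigvals, 0.0, None)        # remove tiny negative rounding errors

def participation_ratio(eigvals):
    """Effective number of directions that carry variance."""
    total = eigvals.sum()
    if total == 0.0:
        return 0.0
    return float(total ** 2 / (eigvals ** 2).sum())

rng = np.random.default_rng(0)
n, d = 200, 16

embeddings = {
    # Variance spread over all 16 directions.
    "healthy": rng.normal(0.0, 1.0, size=(n, d)),
    # Every row is the same point plus tiny noise: distances vanish.
    "point_collapse": np.ones((n, d)) + 1e-4 * rng.normal(size=(n, d)),
    # All variation lies along one direction: the rank drops to 1.
    "rank_one": rng.normal(size=(n, 1)) * rng.normal(size=d),
}

report = {}
for name, emb in embeddings.items():
    eigvals = covariance_spectrum(emb)
    report[name] = (float(eigvals.sum()), participation_ratio(eigvals))
    print(f"{name:15s} total variance={report[name][0]:.3e} "
          f"effective rank={report[name][1]:.2f}")
```

The two failure modes show up differently: point collapse is visible as near-zero total variance, while the rank-one case keeps a healthy total variance but has an effective rank close to 1. Checking both quantities is therefore more informative than checking either alone.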


## 4. Python — Distance & Collapse Checks

`days/day35/code/embedding_geometry.py` computes L2 distance, cosine similarity, and an eigenvalue spectrum used to detect collapse.


In [2]:
from days.day35.code.embedding_geometry import l2_distance, cosine_similarity, detect_collapse
import numpy as np

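# y = 2x: same direction as x, different magnitude
x = np.array([1.0, 2.0])
y = np.array([2.0, 4.0])
print("L2:", l2_distance(x, y))
print("Cosine:", cosine_similarity(x, y))

# Random Gaussian embeddings (100 samples, 64 dimensions) as a non-collapsed baseline
rng = np.random.default_rng(0)
emb = rng.normal(0, 1, size=(100, 64))
# Eigenvalue spectrum of the embeddings; values near zero would signal collapse
eigvals = detect_collapse(emb)
print("Smallest eigenvalues:", eigvals[:5])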


L2: 2.23606797749979
Cosine: 0.9999999000000099
Smallest eigenvalues: [0.06168722 0.07375072 0.08623057 0.09016501 0.09637953]
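
The cosine printed above is slightly below 1 even though `y = 2x` is exactly parallel to `x`. One plausible explanation, which is an assumption about `embedding_geometry.py` and not something these notes confirm, is a small stabilizing constant added to the denominator to avoid division by zero. The hypothetical helper `cosine_with_eps` below sketches that pattern; the name and the value `eps=1e-6` are illustrative guesses.

```python
import numpy as np

def cosine_with_eps(a, b, eps=1e-6):
    """Cosine similarity with a small constant in the denominator.

    eps guards against division by zero for zero-length vectors, at the
    cost of a tiny downward bias for everything else (assumed pattern).
    """
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

x = np.array([1.0, 2.0])
y = np.array([2.0, 4.0])

stabilized = cosine_with_eps(x, y)       # slightly below 1
exact = cosine_with_eps(x, y, eps=0.0)   # 1 up to floating-point rounding
zero_case = cosine_with_eps(np.zeros(2), y)  # finite instead of NaN

print("with eps:", stabilized)
print("without eps:", exact)
print("zero vector:", zero_case)
```

If this is what the library does, the deviation from 1 is a numerical safeguard and does not indicate that the vectors are misaligned.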


## 5. Visualization — Distances & Collapse Spectrum

`days/day35/code/visualizations.py` plots L2 vs cosine and covariance eigenvalues.


In [3]:
from days.day35.code.visualizations import plot_distance_comparison, plot_collapse_spectrum

RUN_FIGURES = False

if RUN_FIGURES:
    plot_distance_comparison()
    plot_collapse_spectrum()
else:
    print("Set RUN_FIGURES = True to regenerate Day 35 figures inside days/day35/outputs/.")


Set RUN_FIGURES = True to regenerate Day 35 figures inside days/day35/outputs/.


## 6. Geometry Across Layers

In a typical trained classifier, early layers preserve many directions of variation, middle layers compress them, and late layers separate classes.


## 7. Mini Exercises

1. Normalize embeddings to unit length and compare cosine similarity with L2 distance.
2. Compute the covariance spectrum to estimate intrinsic dimension.
3. Visualize embeddings with PCA.
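
As a possible starting point for exercises 2 and 3, here is a minimal sketch that uses NumPy's SVD for PCA. The helpers `pca_project` and `dims_for_variance` are hypothetical names, the synthetic data (3 latent directions embedded in 32 dimensions plus small noise) is made up for illustration, and the 99% variance threshold is only one of several reasonable ways to define intrinsic dimension.

```python
import numpy as np

def pca_project(emb, k=2):
    """Project embeddings onto their top-k principal directions.

    Returns the (n, k) projection and the full covariance spectrum,
    sorted from largest to smallest eigenvalue.
    """
    centered = emb - emb.mean(axis=0, keepdims=True)
    # Rows of vt are principal directions; s**2 / (n - 1) are the
    # eigenvalues of the sample covariance matrix.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s ** 2 / (emb.shape[0] - 1)
    return centered @ vt[:k].T, explained

def dims_for_variance(explained, threshold=0.99):
    """Smallest number of components whose variance share reaches threshold."""
    ratio = np.cumsum(explained) / explained.sum()
    count = int(np.searchsorted(ratio, threshold)) + 1
    return min(count, len(explained))

rng = np.random.default_rng(0)
latent = rng.normal(size=(300, 3))            # 3 true degrees of freedom
mixing = rng.normal(size=(3, 32))             # embed them in 32 dimensions
emb = latent @ mixing + 0.01 * rng.normal(size=(300, 32))

proj, explained = pca_project(emb, k=2)
n_dims = dims_for_variance(explained, threshold=0.99)

print("projection shape:", proj.shape)
print("largest eigenvalues:", explained[:5])
print("components needed for 99% variance:", n_dims)
```

The two columns of `proj` can be passed directly to a scatter plot, for example with matplotlib's `plt.scatter(proj[:, 0], proj[:, 1])`, to complete exercise 3.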


## 8. Key Takeaways

- Embeddings are geometric objects.
- Distance and angle encode similarity.
- Collapse is a geometric failure: distances vanish and rank drops.
- Good embeddings cluster similar inputs and separate dissimilar ones.
