In [51]:
if True:
    from julia.api import Julia
    jl = Julia(compiled_modules=False)

#import julia; julia.install(quiet=True)
from julia import Main

import numpy     as np
from scipy.linalg import svd, qr
from scipy.spatial.distance import squareform, pdist
from sklearn.cluster import SpectralClustering
from sklearn.decomposition import PCA
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import confusion_matrix, adjusted_rand_score

import itertools

import param
import colorcet as cc

import panel     as pn; pn.extension()
import holoviews as hv; hv.extension( "bokeh", logo=False)

In [2]:
%load_ext julia.magic

Initializing Julia interpreter. This may take some time...




In [3]:
%%julia
using Pkg; Pkg.activate("../GenLinAlgProblems")
using GenLinAlgProblems, LinearAlgebra, RowEchelon, Printf, Latexify, LaTeXStrings, Random, SymPy

  Activating project at `~/elementary-linear-algebra/GenLinAlgProblems`


In [41]:
import warnings
from sklearn.exceptions import ConvergenceWarning
warnings.filterwarnings("ignore", message="k >= N for N * N square matrix")

<div style="height:2cm;">
<div style="float:center;width:100%;text-align:center;"><strong style="height:100px;color:darkred;font-size:40px;">Applications of Subspace Distances</strong>
</div></div>

# 1. Introduction

## 1.1. Objective

The previous notebooks introduced the Grassmannian manifold $\mathrm{Gr}(k, n)$,<br>
the space of all $k$-dimensional subspaces of $\mathbb{R}^n$.<br>
We developed the necessary tools: **principal angles, Grassmannian distances, geodesics**,<br>
and **optimization techniques** constrained to the manifold.

This notebook considers applications in which subspaces are the primary data objects.<br>
These include problems where information is distributed across sets of vectors,<br>
or where system states evolve within a low-dimensional linear structure.

Relevant contexts include:
- **Computer vision:** modeling image sets by low-rank approximations
- **Signal processing:** channel representations in MIMO systems
- **Model reduction:** subspace dynamics in large-scale simulations
- **Machine learning:** classification and clustering in subspace-based settings

In such cases, the Grassmannian framework supports tasks like interpolation, averaging, and optimization over subspaces.

We assume familiarity with the material developed in:
- [**PrincipalAngles.ipynb**](PrincipalAngles.ipynb)
- [**GrassmannianIntro.ipynb**](GrassmannianIntro.ipynb)
- [**GrassmannianGeodesicsOptimization.ipynb**](GrassmannianGeodesicsOptimization.ipynb)

The present notebook focuses on how the Grassmannian arises in practice, and how its geometry can be applied in computation.

We begin by reviewing why subspace-based models are natural in many applications.

## 1.2. Subspaces as Fundamental Representations

Many data- and model-driven problems lead not to individual vectors, but to sets of vectors that share a linear structure.<br>
It is often more appropriate to describe such a set by the subspace it spans,<br>
rather than by any particular basis or representative.

* In image recognition, for example, a set of images of a single subject under varying lighting<br>
can be approximated by a low-dimensional subspace.<br>
* In signal processing, time-varying observations may evolve within a subspace.
* In model reduction, dominant modes are extracted from snapshots of a high-dimensional system<br> and used to define a reduced basis.

In each case, the subspace — not the data points — becomes the object of interest.<br>
Comparisons, interpolations, and optimizations must be carried out over subspaces.<br>
This naturally leads to viewing subspaces as points on the Grassmann manifold $\mathrm{Gr}(k, n)$.

We will now recall the basic geometry of $\mathrm{Gr}(k, n)$ needed for applications.

____

The Grassmannian $\mathrm{Gr}(k, n)$ is a smooth manifold whose points correspond to $k$-dimensional linear subspaces of $\mathbb{R}^n$.<br>
$\qquad$ For computational purposes, a point in $\mathrm{Gr}(k, n)$ is typically represented<br>
$\qquad$ by an $n \times k$ matrix with orthonormal columns.<br>
$\qquad$ Two such matrices $U$ and $UQ$ (with $Q \in O(k)$) represent the same subspace.

Distances between subspaces are computed via the principal angles $\theta_1, \dots, \theta_k$ between them.<br>
$\qquad$ These are obtained from the singular values of $U^\top V$ for two orthonormal bases $U, V \in \mathbb{R}^{n \times k}$.
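
As a minimal sketch of this computation (the helper name `principal_angles` and the random test subspaces are illustrative assumptions, not code from the earlier notebooks), the angles follow from the singular values of $U^\top V$, and they do not change when $U$ is replaced by $UQ$ with $Q \in O(k)$:

```python
import numpy as np

def principal_angles(U, V):
    """Principal angles (radians, ascending) between span(U) and span(V).

    Assumes U and V have orthonormal columns and the same shape.
    """
    s = np.linalg.svd(U.T @ V, compute_uv=False)
    # Rounding can push singular values slightly outside [0, 1]
    return np.arccos(np.clip(s, 0.0, 1.0))

rng = np.random.default_rng(0)
U, _ = np.linalg.qr(rng.standard_normal((6, 2)))
V, _ = np.linalg.qr(rng.standard_normal((6, 2)))
Q, _ = np.linalg.qr(rng.standard_normal((2, 2)))  # a random element of O(2)

theta = principal_angles(U, V)
theta_rotated = principal_angles(U @ Q, V)  # same subspace, different basis
print(theta, theta_rotated)
```

Both calls should print the same angles up to rounding, which is the basis invariance stated above.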

Standard metrics include:
- **Geodesic (arc length) distance:**
  $\quad d_g(U, V) = \left( \sum_{i=1}^k \theta_i^2 \right)^{1/2} $
- **Chordal distance**:
  $\qquad\qquad\qquad\quad d_c(U, V) = \left( \sum_{i=1}^k \sin^2 \theta_i \right)^{1/2} $
- **Projection (Frobenius) distance:**
  $\quad d_p(U, V) = \Vert U U^\top - V V^\top \Vert_F $

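All three depend only on the principal angles, so they are independent of the bases chosen to represent the two subspaces.<br>
$\qquad$ The chordal and projection distances differ only by a constant factor, $d_p = \sqrt{2}\, d_c$, whereas the geodesic distance measures arc length on the manifold.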
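
The three distances can be evaluated side by side. The sketch below is illustrative only (the function name `subspace_distances` is an assumption), and the final line checks numerically that the projection distance is $\sqrt{2}$ times the chordal distance:

```python
import numpy as np

def subspace_distances(U, V):
    """Return (geodesic, chordal, projection) distances.

    Assumes U and V are n x k matrices with orthonormal columns.
    """
    s = np.clip(np.linalg.svd(U.T @ V, compute_uv=False), 0.0, 1.0)
    theta = np.arccos(s)
    d_g = np.linalg.norm(theta)                           # arc length
    d_c = np.linalg.norm(np.sin(theta))                   # chordal
    d_p = np.linalg.norm(U @ U.T - V @ V.T, ord="fro")    # projector difference
    return d_g, d_c, d_p

rng = np.random.default_rng(1)
U, _ = np.linalg.qr(rng.standard_normal((8, 3)))
V, _ = np.linalg.qr(rng.standard_normal((8, 3)))

d_g, d_c, d_p = subspace_distances(U, V)
print(d_g, d_c, d_p, d_p / d_c)
```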

Geodesics on $\mathrm{Gr}(k, n)$ can be computed explicitly.<br>
$\qquad$ Given two points $U$ and $V$ in $\mathrm{Gr}(k, n)$, a shortest path connecting them<br>
$\qquad$ can be parameterized using the principal vectors and angles.<br>
$\qquad$ This structure supports interpolation, averaging, and optimization directly on the manifold.<br>
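
One standard closed form for such a path is sketched below. It assumes that $U^\top V$ is invertible (all principal angles strictly below $\pi/2$), and the function name `grassmann_geodesic` is ours rather than taken from the earlier notebooks:

```python
import numpy as np

def grassmann_geodesic(U, V, t):
    """Orthonormal basis of the point at parameter t on a geodesic
    from span(U) (t = 0) to span(V) (t = 1).

    Assumes orthonormal columns and an invertible U.T @ V.
    """
    # Component of V orthogonal to span(U), normalized by U.T @ V
    M = (V - U @ (U.T @ V)) @ np.linalg.inv(U.T @ V)
    W, sigma, Zt = np.linalg.svd(M, full_matrices=False)
    theta = np.arctan(sigma)  # principal angles between the subspaces
    # Column-wise scaling by cos(t*theta) and sin(t*theta)
    return ((U @ Zt.T) * np.cos(t * theta) + W * np.sin(t * theta)) @ Zt

rng = np.random.default_rng(0)
U, _ = np.linalg.qr(rng.standard_normal((6, 2)))
V, _ = np.linalg.qr(rng.standard_normal((6, 2)))

midpoint = grassmann_geodesic(U, V, 0.5)
endpoint = grassmann_geodesic(U, V, 1.0)
```

The endpoint spans the same subspace as $V$ (its projector equals $VV^\top$), although its columns generally differ from those of $V$ by an orthogonal transformation.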

The tools developed in earlier notebooks allow us to compute these distances and geodesics numerically.<br>
$\qquad$ What follows is a series of applications where these structures are central.


# 2. Subspace Clustering

**Many high-dimensional datasets are composed of samples that approximately lie on a union of low-dimensional subspaces.**<br>
In such cases, classical clustering techniques like $k$-means, which assume isotropic point clouds,<br>
often fail to capture the underlying structure.

The goal of **subspace clustering** is to **partition the data into groups, each associated with a low-dimensional subspace.**<br>
Once the subspaces are estimated, each group can be represented as a point on $\mathrm{Gr}(k, n)$,<br>
and the clustering task becomes one of grouping points on the Grassmannian.

Typical applications include:
- Face recognition under varying illumination
- Motion segmentation in videos
- Multivariate time series with regime switching

This section demonstrates how Grassmannian distances can be used to build an affinity matrix<br>
and perform clustering based on subspace geometry.

____

#### Problem Statement

Let $X = \{x_i\}_{i=1}^{L M} \subset \mathbb{R}^N$ be a dataset consisting of $L$ clusters of $M$ points each, in ambient dimension $N$.<br>$\qquad$
Each cluster lies near an unknown $K$-dimensional linear subspace

$\qquad
X = \bigcup_{\ell=1}^L \mathcal{S}_\ell + \text{noise}, \quad \mathcal{S}_\ell \in \mathrm{Gr}(K, N)
$

The task is to assign each $x_i$ to its generating subspace $\mathcal{S}_\ell$, without access to the subspaces or ground-truth labels.

When $K \ll N$, Euclidean distance is ineffective: points from the same subspace may be far apart,<br>$\qquad$
and subspaces may be nearly orthogonal. Subspace-based clustering overcomes this by reasoning geometrically in $\mathrm{Gr}(K, N)$.
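
To make the setup concrete, the following sketch draws synthetic data from a union of random subspaces and compares Euclidean distances within and between clusters. The parameter values are arbitrary choices for illustration. With Gaussian coefficients, the two averages tend to be of similar size, which is why Euclidean distance alone gives little information about cluster membership here:

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, L, M, noise = 20, 3, 3, 50, 0.05

# One random orthonormal basis per cluster
bases = [np.linalg.qr(rng.standard_normal((N, K)))[0] for _ in range(L)]

# Columns of X are the data points; cluster l occupies columns l*M ... (l+1)*M - 1
X = np.hstack([B @ rng.standard_normal((K, M)) + noise * rng.standard_normal((N, M))
               for B in bases])
labels = np.repeat(np.arange(L), M)

# Pairwise Euclidean distances between all points
dist = np.linalg.norm(X[:, :, None] - X[:, None, :], axis=0)
same = labels[:, None] == labels[None, :]
off_diag = ~np.eye(L * M, dtype=bool)

within = dist[same & off_diag].mean()
between = dist[~same].mean()
print(f"mean within-cluster distance:  {within:.2f}")
print(f"mean between-cluster distance: {between:.2f}")
```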

The method proceeds as follows:
1. Partition $X$ into (possibly overlapping) local groups of $m \geq K$ points each.
2. For each group, collect its points as the columns of a matrix $X_g \in \mathbb{R}^{N \times m}$ and compute the rank-$K$ truncated SVD:<br>
   $\qquad
   X_g \approx U \Sigma V^\top, \quad U \in \mathbb{R}^{N \times K}
   $<br>
   The estimated subspace is $\mathscr{C}(U)$.
3. Represent each subspace by its orthonormal basis $U$.
4. Compute pairwise distances between subspaces using principal angles<br>
   $\qquad
   d(U_i, U_j) = \left\Vert \theta(U_i, U_j) \right\Vert_2, \quad \theta = \text{principal angles}
   $
5. Construct a symmetric affinity matrix $A \in \mathbb{R}^{n \times n}$, where $n$ is the number of estimated subspaces, using a Gaussian kernel applied to the Grassmannian distances<br>
   $\qquad
   A_{ij} = \exp\left( -\frac{d(U_i, U_j)^2}{2\sigma^2} \right)
   $<br>
   where $\sigma$ controls the sensitivity to distance.
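
Steps 2 through 5 can be sketched as follows. The helper names (`estimate_basis`, `geodesic_distance`, `affinity_matrix`) and the small test configuration are illustrative assumptions. The groups here are formed directly from known subspaces, which sidesteps the grouping in step 1:

```python
import numpy as np

def estimate_basis(X_group, K):
    """Steps 2-3: K leading left singular vectors of an N x m data block."""
    U, _, _ = np.linalg.svd(X_group, full_matrices=False)
    return U[:, :K]

def geodesic_distance(U, V):
    """Step 4: 2-norm of the vector of principal angles."""
    s = np.clip(np.linalg.svd(U.T @ V, compute_uv=False), 0.0, 1.0)
    return np.linalg.norm(np.arccos(s))

def affinity_matrix(bases, sigma=1.0):
    """Step 5: Gaussian kernel on pairwise Grassmannian distances."""
    n = len(bases)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = geodesic_distance(bases[i], bases[j])
    return np.exp(-D**2 / (2 * sigma**2))

rng = np.random.default_rng(1)
N, K, m = 20, 3, 8
true_bases = [np.linalg.qr(rng.standard_normal((N, K)))[0] for _ in range(2)]

# Two noisy groups per true subspace, ordered [S0, S0, S1, S1]
groups = [B @ rng.standard_normal((K, m)) + 0.01 * rng.standard_normal((N, m))
          for B in true_bases for _ in range(2)]
bases = [estimate_basis(G, K) for G in groups]
A = affinity_matrix(bases)
print(np.round(A, 2))
```

Entries of $A$ linking two estimates of the same subspace should be close to 1, while entries linking different subspaces should be much smaller.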

Define the (unnormalized) graph Laplacian $\mathcal{L} = D - A$, where $D$ is the diagonal matrix with $D_{ii} = \sum_j A_{ij}$.<br>
$\qquad$ Spectral clustering embeds the estimated subspaces using the eigenvectors of $\mathcal{L}$ associated with its $L$ smallest eigenvalues, followed by $k$-means on the rows of the embedding.
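
A minimal sketch of this embedding step is given below, using the unnormalized Laplacian $D - A$ exactly as defined here. The function name `spectral_labels` and the toy affinity matrix are assumptions for illustration. The demo further down relies on scikit-learn's `SpectralClustering` instead, which to our knowledge works with a normalized Laplacian, so its results may differ in detail:

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def spectral_labels(A, n_clusters):
    """Cluster the nodes of a graph with symmetric affinity matrix A."""
    D = np.diag(A.sum(axis=1))
    laplacian = D - A
    # eigh returns eigenvalues in ascending order
    _, eigvecs = np.linalg.eigh(laplacian)
    embedding = eigvecs[:, :n_clusters]  # one row per node
    _, labels = kmeans2(embedding, n_clusters, minit="++")
    return labels

# Toy affinity: two groups of three nodes, strong within, weak between
A_demo = np.full((6, 6), 0.05)
A_demo[:3, :3] = 0.9
A_demo[3:, 3:] = 0.9
np.fill_diagonal(A_demo, 1.0)

labels = spectral_labels(A_demo, 2)
print(labels)
```

The label values themselves are arbitrary; only the grouping of nodes is meaningful.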

This algorithm uses the geometry of $\mathrm{Gr}(K, N)$ to recover clustering structure that is not visible in $\mathbb{R}^N$.

When subspaces are well-separated ($K \ll N$) and noise is small, clustering accuracy is high.<br>
$\qquad$ In the demo below, correct assignments appear as colored circles; misclassified subspaces are marked with crosses.<br>
$\qquad$ You can vary $N$, $K$, $L$, $M$, and noise level to explore different regimes.


In [57]:
def permute_labels(true_labels, pred_labels):
    C = confusion_matrix(true_labels, pred_labels)
    row_ind, col_ind = linear_sum_assignment(-C)
    mapping = dict(zip(col_ind, row_ind))
    return np.array([mapping.get(p, -1) for p in pred_labels])

class SubspaceClusteringDemo(pn.viewable.Viewer):
    n = param.Integer(20, bounds=(3, 50), label="Ambient Dimension (N)")
    k = param.Integer(3, bounds=(1, 20), label = "Subspace Dimension (K)")
    clusters = param.Integer(3, bounds=(2, 10), label="Clusters (L)")
    points_per_cluster = param.Integer(20, bounds=(5, 100), label="Points per Cluster (M)")
    groups_per_cluster = param.Integer(3, bounds=(1, 10), label="Groups per cluster")
    noise = param.Number(0.05, bounds=(0, 0.5), step=0.01)

    def __init__(self, **params):
        super().__init__(**params)
        self._enforce_valid_k()

    @param.depends('n', 'k', watch=True)
    def _enforce_valid_k(self):
        if self.k > self.n:
            self.k = self.n

    def random_subspace(self, n, k):
        Q, _ = qr(np.random.randn(n, k), mode='economic')
        return Q

    def principal_angles(self, U, V):
        if U.shape != V.shape:
            raise ValueError("Subspace shapes are incompatible.")
        _, s, _ = svd(U.T @ V)
        s = np.clip(s, -1.0, 1.0)
        return np.arccos(s)

    def grassmannian_distance(self, U, V):
        try:
            theta = self.principal_angles(U, V)
            return np.linalg.norm(theta)
        except Exception:
            return np.inf

    def permute_labels(self, true_labels, pred_labels):
        C = confusion_matrix(true_labels, pred_labels)
        row_ind, col_ind = linear_sum_assignment(-C)
        mapping = dict(zip(col_ind, row_ind))
        return np.array([mapping.get(p, -1) for p in pred_labels])

    @param.depends('n', 'k', 'clusters', 'points_per_cluster', 'groups_per_cluster', 'noise')
    def view(self):
        n, k, L, M = self.n, self.k, self.clusters, self.points_per_cluster
        G = self.groups_per_cluster
        noise = self.noise

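        rng = np.random.default_rng(42)
        # Draw the true bases from the seeded generator so the demo is reproducible
        true_bases = [qr(rng.standard_normal((n, k)), mode='economic')[0] for _ in range(L)]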

        # Generate full dataset
        data = []
        labels = []
        for i, B in enumerate(true_bases):
            coeffs = rng.standard_normal((k, M))
            X = B @ coeffs + noise * rng.standard_normal((n, M))
            data.append(X.T)
            labels.extend([i] * M)

        data = np.vstack(data)
        labels = np.array(labels)

        # Estimate subspaces
        est_bases = []
        est_labels = []

        for i in range(L):
            start = i * M
            end = (i + 1) * M
            X_cluster = data[start:end]
            cluster_label = i
            indices = np.array_split(np.arange(M), G)
            for idx in indices:
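                # Centering below removes one degree of freedom,
                # so a rank-k estimate needs at least k + 1 points.
                if len(idx) <= k:
                    continue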
                block = X_cluster[idx]
                X_block = block.T - block.T.mean(axis=1, keepdims=True)
                U, S, _ = svd(X_block, full_matrices=False)
                if S[k-1] > 1e-10:
                    est_bases.append(U[:, :k])
                    est_labels.append(cluster_label)

        if len(est_bases) < L:
            return hv.Points([]).opts(
                title="Too few estimated subspaces for clustering",
                height=400, width=500
            )

        # Pairwise distances
        m = len(est_bases)
        D = np.zeros((m, m))
        for i in range(m):
            for j in range(i + 1, m):
                d = self.grassmannian_distance(est_bases[i], est_bases[j])
                D[i, j] = D[j, i] = d

        if np.isinf(D).any():
            return hv.Points([]).opts(
                title="Distance computation failed",
                height=400, width=500
            )

        # Clustering
        sigma = 1.0
        A = np.exp(-D**2 / (2 * sigma**2))
        try:
            with warnings.catch_warnings():
                warnings.simplefilter("ignore", ConvergenceWarning)
                pred_labels = SpectralClustering(
                    n_clusters=L,
                    affinity='precomputed',
                    random_state=0
                ).fit_predict(A)
        except Exception:
            return hv.Points([]).opts(
                title="Clustering failed",
                height=400, width=500
            )

        # Label alignment + ARI
        aligned_pred = self.permute_labels(est_labels, pred_labels)
        ari = adjusted_rand_score(est_labels, aligned_pred)
        correct = (aligned_pred == np.array(est_labels))

        # Visualization
        X_flat = np.vstack([B[:, 0] for B in est_bases])
        X_proj = PCA(n_components=2).fit_transform(X_flat)
        coords = X_proj[:, 0], X_proj[:, 1]
        palette = cc.glasbey[:L]

        correct_pts = hv.Points(
            (coords[0][correct], coords[1][correct], aligned_pred[correct]),
            ['x', 'y'], 'Cluster'
        ).opts(marker='circle', color='Cluster', cmap=palette, size=6, tools=['hover'])

        incorrect_pts = hv.Points(
            (coords[0][~correct], coords[1][~correct], aligned_pred[~correct]),
            ['x', 'y'], 'Cluster'
        ).opts(marker='x', color='Cluster', cmap=palette, size=8, tools=['hover'])

        return (correct_pts * incorrect_pts).opts(
            width=500, height=400,
            title=f"Subspace clustering (ARI = {ari:.2f})"
        )

    def __panel__(self):
        widgets = pn.Param(
            self,
            parameters=['n', 'k', 'clusters', 'points_per_cluster', 'groups_per_cluster', 'noise'],
            show_name=False, width=300
        )
        caption = pn.pane.Markdown(
            "Each point represents an estimated subspace,<br> visualized by its first basis vector (via PCA).<br>"
            "`circle` = correctly clustered, `x` = misclassified (after optimal label alignment).<br>"
            "**ARI** measures clustering accuracy up to label permutation; 1.0 is perfect.",
            width=500, height=50
        )
        return pn.Column(
            pn.Row(pn.Column(widgets, caption), self.view),
        )

SubspaceClusteringDemo().servable()

#### Subspace Clustering Demo
Samples from low-dimensional subspaces are clustered using Grassmannian distances.<br>
Each point represents one estimated subspace, visualized by the 2D PCA projection of its first basis vector.<br>
`circle` = correctly clustered, `x` = misclassified relative to the true subspace label.<br><br>
Vary the dimensions, number of clusters, and noise level to assess clustering stability.

**Remarks:**
* When the subspaces are sufficiently separated (small $K$, large $N$), and noise is low,<br>
the clustering will recover the true assignment up to permutation.<br>
In the 2D projection, correctly classified subspaces appear as distinct clusters; misclassified points are marked with crosses.
* As $K$ increases or $N$ decreases, subspaces begin to overlap and principal angles shrink.<br>
Noise further degrades separation. Under these conditions, clustering performance degrades.<br><br>
Choose $K \ll N$, with at least $K+1$ points per group, and moderate noise to observe successful recovery.
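
The effect of the dimensions on separation can be checked directly. In the sketch below (the function name and the chosen dimensions are illustrative), two random subspaces in a large ambient space are well separated, whereas for $2K > N$ any two $K$-dimensional subspaces must share a nontrivial intersection, so the smallest principal angle is numerically zero:

```python
import numpy as np

def smallest_principal_angle(n, k, rng):
    """Smallest principal angle between two random k-dim subspaces of R^n."""
    U, _ = np.linalg.qr(rng.standard_normal((n, k)))
    V, _ = np.linalg.qr(rng.standard_normal((n, k)))
    s = np.clip(np.linalg.svd(U.T @ V, compute_uv=False), 0.0, 1.0)
    return np.arccos(s).min()

rng = np.random.default_rng(0)
angle_separated = smallest_principal_angle(50, 3, rng)  # K much smaller than N
angle_overlap = smallest_principal_angle(5, 3, rng)     # 2K > N: forced intersection
print(angle_separated, angle_overlap)
```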

**Further Reading:**
* [**"A Tutorial on Subspace Clustering", René Vidal**](https://www.cis.jhu.edu/~rvidal/publications/SPM-Tutorial-Final.pdf)
* [**"Subspace Clustering", Madalina Ciortan**](https://medium.com/data-science/subspace-clustering-7b884e8fff73)

# OLD

Subspace distances appear in a wide range of applications.

Here we apply the tools from previous notebooks to PCA drift detection, anomaly detection, and subspace clustering.


## Applications of Subspace Distances

Principal angle–based distances appear in many tasks  
where data or signals lie in low-dimensional subspaces.

We apply the metrics introduced earlier to three settings:  
PCA drift detection, anomaly detection, and subspace clustering.

---

### PCA Drift Detection

Given a streaming data source, monitor the leading principal subspace.  
Compare $U_t$ and $U_{t-1}$ using:

- Spectral distance → detects large changes  
- Frobenius distance → smooth sensitivity to gradual drift

Visualize angle trajectory and metric response over time.
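
A rough sketch of this monitoring loop on synthetic data follows. Since the text does not define the two distances precisely, the sketch assumes that "spectral" means the operator 2-norm and "Frobenius" the Frobenius norm of the projector difference $U_t U_t^\top - U_{t-1} U_{t-1}^\top$. The window size, dimensions, and noise level are arbitrary:

```python
import numpy as np

def leading_subspace(X, k):
    """Top-k principal directions of a (samples x features) window."""
    Xc = X - X.mean(axis=0, keepdims=True)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:k].T  # features x k, orthonormal columns

def projector_distances(U, V):
    """(spectral, Frobenius) norms of the projector difference."""
    P = U @ U.T - V @ V.T
    return np.linalg.norm(P, 2), np.linalg.norm(P, "fro")

rng = np.random.default_rng(0)
n, k, window = 10, 2, 200
B0 = np.linalg.qr(rng.standard_normal((n, k)))[0]
B1 = np.linalg.qr(rng.standard_normal((n, k)))[0]

# Windows 0-2 are generated from B0, windows 3-5 from B1
windows = [(B @ rng.standard_normal((k, window))).T
           + 0.05 * rng.standard_normal((window, n))
           for B in [B0] * 3 + [B1] * 3]
subspaces = [leading_subspace(W, k) for W in windows]

# drift[i] compares window i + 1 with window i
drift = np.array([projector_distances(subspaces[t - 1], subspaces[t])
                  for t in range(1, len(subspaces))])
spectral, frobenius = drift[:, 0], drift[:, 1]
print(np.round(spectral, 3), np.round(frobenius, 3))
```

Both sequences should stay near zero while the generating subspace is fixed and spike at the transition between windows 2 and 3.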

---

### Anomaly Detection via Subspace Change

Anomalies can shift the data subspace.  
Track deviation of current subspace $U_t$ from reference $U_0$.

Use:

- Projection distance → sensitive to operator changes  
- Chordal distance → robust to small fluctuations

---

### Subspace Clustering

Data points lie near multiple subspaces.  
Cluster them by pairwise subspace distance.

- Build affinity matrix from projection or geodesic distances  
- Apply spectral clustering  
- Compare results across distance metrics

---

### Metric Comparison

Each distance captures a different notion of subspace difference:

- Spectral: peak deviation  
- Frobenius: total energy  
- Projection: matrix/operator change  
- Geodesic: intrinsic geometry

Choose the metric that matches what matters in your task.
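
The following sketch illustrates how the choice of metric can change which pair of subspaces counts as "farther apart". It assumes the spectral and Frobenius metrics are the 2-norm and Frobenius norm of the projector difference, and the geodesic metric is the 2-norm of the principal angles. The helper `basis_with_angles` is a hypothetical construction that produces a subspace with prescribed principal angles relative to the first $k$ coordinate axes:

```python
import numpy as np

def basis_with_angles(thetas, n):
    """Orthonormal n x k basis whose principal angles to span(e_1..e_k) are thetas."""
    k = len(thetas)
    V = np.zeros((n, k))
    V[np.arange(k), np.arange(k)] = np.cos(thetas)
    V[k + np.arange(k), np.arange(k)] = np.sin(thetas)
    return V

def metrics(U, V):
    P = U @ U.T - V @ V.T
    s = np.clip(np.linalg.svd(U.T @ V, compute_uv=False), 0.0, 1.0)
    return {
        "spectral": np.linalg.norm(P, 2),       # driven by the largest angle
        "frobenius": np.linalg.norm(P, "fro"),  # aggregates all angles
        "geodesic": np.linalg.norm(np.arccos(s)),
    }

n, k = 6, 3
U = np.eye(n)[:, :k]
one_large = metrics(U, basis_with_angles(np.array([0.6, 0.0, 0.0]), n))
many_small = metrics(U, basis_with_angles(np.array([0.35, 0.35, 0.35]), n))
print(one_large)
print(many_small)
```

Under these assumptions, the pair with a single large angle is farther apart in the spectral metric, while the pair with several moderate angles is farther apart in the Frobenius metric.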

---

### Summary

Subspace distances are not interchangeable.  
Different applications favor different metrics — depending on  
sensitivity to local variation, noise, or geometric structure.

