#### Load Libraries and Define Functions

In [None]:
if True:
    from julia.api import Julia
    jl = Julia(compiled_modules=False)

#import julia; julia.install(quiet=True)
from julia import Main

import numpy     as np
from scipy.linalg import svd, qr

import panel     as pn; pn.extension()
import holoviews as hv; hv.extension( "bokeh", logo=False)

In [None]:
%load_ext julia.magic

In [None]:
%%julia
using Pkg; Pkg.activate("../GenLinAlgProblems")
using GenLinAlgProblems, LinearAlgebra, RowEchelon, Printf, Latexify, LaTeXStrings, Random, SymPy

In [None]:
%%julia
function ex()
    function f(U)
        P_U = U * U'
        P_T = T * T'
        return norm(P_U - P_T)^2
    end

    function grad_f(U)
        # Euclidean gradient of norm(U*U' - T*T')^2; the factor is 4 because U appears twice in U*U'
        return 4 * (U * U' - T * T') * U
    end

    # Starting subspace U₀ ∈ Gr(2,3)
    U₀ = [1.0 0.0;
          0.0 1.0;
          0.0 0.0]

    # Target subspace T: rotated 2-plane
    θ = π / 6
    T = [1.0 0.0;
         0.0 cos(θ);
         0.0 sin(θ)]
    T = Matrix(qr(T).Q)  # Ensure T is orthonormal

    # Run optimization
    U_final, losses = grassmann_optimize(f, grad_f, U₀, t=0.1, max_iter=100)

    # Report the initial and final loss
    py_show("Initial loss: ", round(losses[1], digits=4))
    py_show("Final loss: ", round(losses[end], digits=4))
    return losses
end;

#### 
<div style="height:2cm;">
<div style="float:center;width:100%;text-align:center;"><strong style="height:100px;color:darkred;font-size:40px;">Geodesics and Optimization of the Grassmannian</strong>
</div></div>

# 1. Motivation: Why Optimize on the Grassmannian?

We now shift from exploring the geometry of subspaces to using it:<br>
applying the Grassmannian structure to design **optimization algorithms that stay on the manifold.**

In many applications, the object we’re trying to learn is not a vector, but a subspace.
- In **PCA**, we learn a low-dimensional subspace that captures most of the variance.
- In **subspace tracking**, we follow how that subspace evolves over time.
- In **compressed sensing** and **dimension reduction**, we work with structured low-rank approximations.

These subspaces are elements of the **Grassmannian manifold**, $ \mathrm{Gr}(k, n),$ the space of all $k$-dimensional subspaces of $ \mathbb{R}^n $.

---

#### Euclidean Updates Don't Work

Suppose we represent a subspace $ \mathcal{U} \subset \mathbb{R}^n $ by a matrix $ U \in \mathbb{R}^{n \times k} $ with orthonormal columns, $ U^T U = I $.<br>
If we try to update it using a standard gradient step:

$\qquad
U_{\text{new}} = U - \eta\ \nabla f(U),
$

the new matrix $ U_{\text{new}} $ will in general **no longer have orthonormal columns**, and it may even lose rank, in which case its columns do not span a $k$-dimensional subspace.
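
The failure is easy to observe numerically. The following is a minimal NumPy sketch rather than code from this notebook; the matrix `G` and the step size `eta` are arbitrary values assumed purely for illustration.

```python
import numpy as np

# Orthonormal basis of the xy-plane in R^3 (a point on Gr(2, 3)).
U = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])

# Hypothetical Euclidean gradient, chosen only for illustration.
G = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

eta = 0.1              # assumed step size
U_new = U - eta * G    # plain Euclidean gradient step

# The Gram matrix equals the identity exactly when the columns are orthonormal.
gram_error = np.linalg.norm(U_new.T @ U_new - np.eye(2))
print("||U_new^T U_new - I||_F =", gram_error)
```

A nonzero `gram_error` means the columns of `U_new` are no longer orthonormal, so `U_new` is not a valid representative in this parameterization.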

**Remark:** Matrix derivatives are treated in [**this notebook**](MatrixDerivatives.ipynb).

---

#### Grassmannian Optimization

To fix this, we need to:
1. Understand the **tangent space** at $U$, where we can take meaningful steps.
2. Use the **Riemannian gradient** — a projection of the Euclidean gradient.
3. Step along **geodesics** to stay on the manifold.

This is the core idea of **optimization on the Grassmannian**:<br>
Move along directions that respect the geometry, keeping subspaces valid at every step.

# 2. Riemannian Optimization on $\mathrm{Gr}(k, n)$

## 2.1 Tangents and Gradients

### 2.1.1 Tangent Space Matrix

<div style="background-color:#F2F5A9;color:black;padding-bottom:0cm;">

**Def:** Let $U \in \mathbb{R}^{n \times k}$ be a matrix with orthonormal columns, representing a point on the Grassmannian $\mathrm{Gr}(k, n)$.<br>
$\qquad$ The **tangent space** to the Grassmannian at $U$, denoted $T_U \mathrm{Gr}(k, n)$, is the set of all matrices $Z \in \mathbb{R}^{n \times k}$ such that

$\qquad\qquad
U^T Z = 0.
$
</div>

The definition ensures that each column of $Z$ is orthogonal to the columns of $U$.<br>
$\qquad$ That is, the columns of a tangent matrix lie in the **orthogonal complement of the column space of $U$**,<br>
$\qquad$ because the null space of $U^T$ is exactly that orthogonal complement (the **Fundamental Theorem of Linear Algebra**).

**Remarks:**
* **$\mathbf{T_U \mathrm{Gr}(k, n)}$ is standard notation** and reads
"*the tangent space to the Grassmannian $\mathrm{Gr}(k, n)$ at the point $U$.*"
* Although the tangent space has dimension $k(n - k)$, we represent its elements as $n \times k$ matrices satisfying $U^T Z = 0$.<br>
This keeps the representation consistent with the shape of $U$ and simplifies computations, especially in optimization contexts.

---

#### Example: Tangent Space in $ \mathrm{Gr}(2,3) $

Consider $\;
U = \begin{pmatrix}
1 & 0 \\
0 & 1 \\
0 & 0
\end{pmatrix},\;
$
which represents the standard 2-plane spanned by the $x$- and $y$-axes in $\mathbb{R}^3$.

A matrix $Z \in T_U \mathrm{Gr}(2,3)$ must satisfy $U^T Z = 0$, i.e., its columns must be in $\mathscr{N}(U^T) = \mathrm{span}\left\{ \begin{pmatrix}0\\0\\1\end{pmatrix}  \right\}$.

So any **tangent matrix** $Z$ must have the form $\;
Z = \begin{pmatrix}
0 & 0 \\
0 & 0 \\
z_{1} & z_{2}
\end{pmatrix}.
$

That is, the tangent space consists of all $3 \times 2$ matrices with nonzero entries only in the third row, i.e.,<br>
directions pointing "upward" out of the $xy$-plane into $z$.
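
The tangency condition can be tested directly. In this sketch the helper `is_tangent` and the matrices `Z` and `W` are names and values introduced here for illustration; they do not come from the notebook.

```python
import numpy as np

U = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])
n, k = U.shape

def is_tangent(U, Z, tol=1e-12):
    """True if Z satisfies the tangency condition U^T Z = 0 (up to tol)."""
    return bool(np.linalg.norm(U.T @ Z) <= tol)

# Nonzero entries only in the third row: tangent at U.
Z = np.array([[0.0, 0.0],
              [0.0, 0.0],
              [0.7, -1.3]])

# Has a component inside the column space of U: not tangent.
W = np.array([[1.0, 0.0],
              [0.0, 0.0],
              [0.0, 1.0]])

print("Z tangent:", is_tangent(U, Z))
print("W tangent:", is_tangent(U, W))
print("dimension k(n - k):", k * (n - k))
```

The two free parameters $z_1, z_2$ match the dimension $k(n-k) = 2$ of $\mathrm{Gr}(2,3)$.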

### 2.1.2 Projection onto the Tangent Space

Suppose we have a **Euclidean gradient** $G = \nabla f(U)$.<br>
The **Riemannian gradient** is its projection onto the tangent space:

$\qquad
\mathrm{grad}_U f = (I - U U^T) G.
$

This removes the component of $G$ that would push $U$ off the manifold.


**Remark:** The columns of $\mathrm{grad}_U f$ lie in the orthogonal complement of the column space of $U$, which has dimension $n - k$,<br>
$\qquad$ so its rank is at most $\min(k, n - k)$. It is therefore necessarily rank-deficient as an $n \times k$ matrix whenever $n - k < k$, as in $\mathrm{Gr}(2,3)$.

---

**Key Property:**<br>
$\qquad$ This projected gradient tells us how to move **within** the Grassmannian, rather than stepping off the manifold.<br>
$\qquad$ It plays the same role as a regular gradient, just **respecting the constraint**<br>
$\qquad$ that $U$ must remain an orthonormal basis for a $k$-dimensional subspace.

#### Example: Riemannian Gradient in $\mathrm{Gr}(2,3)$

Suppose we are minimizing a function $f(U)$ and we compute the Euclidean gradient as
$\;
G = \begin{pmatrix}
1 & 2 \\
3 & 4 \\
5 & 6
\end{pmatrix}.
$

To obtain the **Riemannian gradient**, we project $G$ onto the tangent space using:

$\qquad
\mathrm{grad}_U f = (I - U U^T) G.
$

Taking $U$ as $\;\;
U = \begin{pmatrix}
1 & 0 \\
0 & 1 \\
0 & 0
\end{pmatrix},\;\; \text{then}\;\;
U U^T = \begin{pmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 0
\end{pmatrix},
\quad
I - U U^T = \begin{pmatrix}
0 & 0 & 0 \\
0 & 0 & 0 \\
0 & 0 & 1
\end{pmatrix},
$

we obtain
$\;
\mathrm{grad}_U f =
\begin{pmatrix}
0 & 0 & 0 \\
0 & 0 & 0 \\
0 & 0 & 1
\end{pmatrix}
\begin{pmatrix}
1 & 2 \\
3 & 4 \\
5 & 6
\end{pmatrix}
=
\begin{pmatrix}
0 & 0 \\
0 & 0 \\
5 & 6
\end{pmatrix}.
$

So the Riemannian gradient only has nonzero entries in the third row — just like any valid tangent matrix at $U$.
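
The worked example can be reproduced in a few lines of NumPy. The function name `riemannian_gradient` is an assumption made for this sketch; it computes $G - U(U^T G)$, which equals $(I - UU^T)G$ without forming the $n \times n$ projector.

```python
import numpy as np

def riemannian_gradient(U, G):
    """Project the Euclidean gradient G onto the tangent space at U."""
    return G - U @ (U.T @ G)

U = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])
G = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

grad = riemannian_gradient(U, G)
print(grad)

# The result is a valid tangent matrix: U^T grad = 0.
print("tangent:", np.allclose(U.T @ grad, 0.0))
```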

### 2.1.3  Why Use $n \times k$ Matrices to Represent Tangents and Gradients?

Even though the Grassmannian has dimension $k(n - k)$, we represent tangent vectors and Riemannian gradients as $n \times k$ matrices<br>
$\qquad$ with orthogonality constraints (i.e., $U^T Z = 0$).

This representation lives in the same ambient space as $U$ itself, making it easy to perform computations<br>
$\qquad$ like projecting gradients or stepping along geodesics — using standard matrix operations.

**Remark:** These definitions do not depend on the choice of basis for the subspace:<br>
replacing $U$ with $U Q$ for any $Q \in \mathrm{O}(k)$ leaves the subspace and the projector $I - U U^T$ unchanged, and a tangent matrix $Z$ is simply replaced by $Z Q$.

By using $n \times k$ matrices, we preserve both **geometric clarity** and **computational convenience**.
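
The basis independence can be checked numerically. This is a sketch under assumed values: the sizes `n`, `k`, the random seed, and the rotation angle `phi` are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 5, 2

# Random point on Gr(k, n): orthonormal factor of a random n x k matrix.
U, _ = np.linalg.qr(rng.standard_normal((n, k)))

# A tangent matrix at U, obtained by projecting a random matrix.
G = rng.standard_normal((n, k))
Z = G - U @ (U.T @ G)

# Change of basis inside the subspace by a rotation Q in O(2).
phi = 0.4
Q = np.array([[np.cos(phi), -np.sin(phi)],
              [np.sin(phi),  np.cos(phi)]])
U2 = U @ Q

same_projector = np.allclose(U @ U.T, U2 @ U2.T)   # same subspace
still_tangent = np.allclose(U2.T @ (Z @ Q), 0.0)   # Z Q is tangent at U Q

print("same projector:", same_projector)
print("Z Q tangent at U Q:", still_tangent)
```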

## 2.2 Geodesic Updates and Optimization

### 2.2.1 Geodesic Updates from a Tangent Vector

In the [**previous notebook**](GrassmannianIntro.ipynb), we introduced geodesics as smooth curves connecting subspaces,<br>
parameterized by principal vectors and principal angles between known points on the Grassmannian.

Here, we take a **local** perspective.<br>
Given a point $U \in \mathrm{Gr}(k, n)$ and a tangent vector $Z \in T_U \mathrm{Gr}(k,n)$,<br>
we want to move along the geodesic $U(t)$ starting at $U$ in the direction $Z$.

When $Z = A \Theta B^T$ is the compact SVD of the tangent matrix,<br>
the geodesic is given by:

$\qquad
U(t) = U B \cos(\Theta t) B^T + A \sin(\Theta t) B^T.
$

This curve:
- Starts at $U$ with initial velocity $Z$,
- Preserves orthonormality of $U(t)$ for all $t$,
- Stays on the Grassmannian $\mathrm{Gr}(k,n)$.
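
The three properties listed above can be verified numerically. The following sketch implements the formula with `numpy.linalg.svd`; the function name `grassmann_geodesic` and the test values for `Z` and `t` are assumptions introduced here, not part of the notebook.

```python
import numpy as np

def grassmann_geodesic(U, Z, t):
    """Point U(t) on the geodesic from U with initial velocity Z (requires U^T Z = 0)."""
    # Compact SVD of the tangent matrix: Z = A @ diag(theta) @ Bt
    A, theta, Bt = np.linalg.svd(Z, full_matrices=False)
    B = Bt.T
    cos_t = np.diag(np.cos(theta * t))
    sin_t = np.diag(np.sin(theta * t))
    return U @ B @ cos_t @ Bt + A @ sin_t @ Bt

U = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])
Z = np.array([[0.0, 0.0],
              [0.0, 0.0],
              [0.5, -0.2]])   # tangent at U: only the third row is nonzero

U_t = grassmann_geodesic(U, Z, 0.7)

# Central finite difference approximates the velocity at t = 0.
h = 1e-5
velocity = (grassmann_geodesic(U, Z, h) - grassmann_geodesic(U, Z, -h)) / (2 * h)

print("orthonormal:", np.allclose(U_t.T @ U_t, np.eye(2)))
print("starts at U:", np.allclose(grassmann_geodesic(U, Z, 0.0), U))
print("initial velocity is Z:", np.allclose(velocity, Z, atol=1e-6))
```

Columns of `A` that belong to zero singular values are multiplied by $\sin(0) = 0$, so their arbitrary choice in the SVD does not affect the result.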

In optimization, we typically approximate the geodesic near $t = 0$ with a first-order step:

$\qquad
U_{\text{trial}} = U + t Z.
$

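This update does not preserve orthonormality: since $U^T Z = 0$, we have $U_{\text{trial}}^T U_{\text{trial}} = I + t^2 Z^T Z$, which differs from $I$ whenever $tZ \neq 0$.<br>
The same identity shows that $U_{\text{trial}}$ always has full column rank and therefore spans a $k$-dimensional subspace,<br>
but this subspace generally differs from the geodesic point $U(t)$; for small $t$ the two agree to first order.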

To restore an orthonormal basis for the updated subspace, we apply a **QR retraction**:

$\qquad
U_{\text{new}} = \mathrm{qf}(U + t Z),
$

where $\mathrm{qf}(\cdot)$ extracts the orthonormal factor from a QR decomposition.

<details>
<summary>Why apply the QR retraction?</summary>

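Because $U^T Z = 0$, the matrix $U + tZ$ has full column rank for every $t$, so its orthonormal QR factor spans exactly the same subspace.<br>
The QR retraction
1. Replaces $U + tZ$ by an orthonormal basis of its column space, giving a valid representative on the Grassmannian.
2. Keeps the result close to the true geodesic for small $t$.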
</details>
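
A QR retraction might be sketched as follows. The function name `qr_retraction` and the sign normalization (flipping columns so that the diagonal of $R$ is positive) are choices made for this illustration; the normalization makes the orthonormal factor unique and ensures that a zero step returns $U$ itself.

```python
import numpy as np

def qr_retraction(U, Z, t):
    """Orthonormal factor qf(U + t Z) of a thin QR decomposition."""
    Q, R = np.linalg.qr(U + t * Z)
    # Flip column signs so that diag(R) > 0 (assumed normalization).
    signs = np.sign(np.diag(R))
    signs[signs == 0] = 1.0
    return Q * signs   # broadcasting scales column j by signs[j]

U = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])
Z = np.array([[0.0, 0.0],
              [0.0, 0.0],
              [0.5, -0.2]])
t = 0.3

M = U + t * Z
U_new = qr_retraction(U, Z, t)

# Projector onto the column space of M, for comparison with U_new U_new^T.
P_M = M @ np.linalg.solve(M.T @ M, M.T)

print("orthonormal:", np.allclose(U_new.T @ U_new, np.eye(2)))
print("same subspace as U + tZ:", np.allclose(U_new @ U_new.T, P_M))
print("smallest eigenvalue of Gram matrix:", np.linalg.eigvalsh(M.T @ M).min())
```

The Gram matrix $M^T M = I + t^2 Z^T Z$ has all eigenvalues at least 1, which is why the linear solve above is always well posed.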

### 2.2.2 Example: Minimizing a Subspace Loss

We now apply the QR retraction in a simple optimization task on the Grassmannian $\mathrm{Gr}(2,3)$.

We define the following loss function:

$\qquad
f(U) = \| P_u - P_t \|_F^2,
$

where $P_u = U U^T$ is the projection matrix onto the subspace spanned by $U$,<br>
and $T$ is a fixed $3 \times 2$ orthonormal matrix defining a target subspace with projection matrix $P_t$.

As a function of the projection matrix $P_u$ this loss is convex (it is a squared Frobenius distance), but it is not convex as a function of $U$, so the optimization is nontrivial.
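
For reference, differentiating the loss with respect to $U$ (using the symmetry of $P_u - P_t$) gives the Euclidean gradient $\nabla f(U) = 4\,(U U^T - P_t)\,U$; a different constant factor only rescales the step size. The sketch below compares this formula against central finite differences. The function names, the random test point, and the step `h` are assumptions for illustration; the target uses the same angle $\theta = \pi/6$ as the example.

```python
import numpy as np

def loss(U, T):
    """f(U) = ||U U^T - T T^T||_F^2."""
    return np.linalg.norm(U @ U.T - T @ T.T, "fro") ** 2

def euclidean_gradient(U, T):
    """Analytic gradient 4 (U U^T - T T^T) U."""
    return 4.0 * (U @ U.T - T @ T.T) @ U

def numerical_gradient(f, U, h=1e-6):
    """Entrywise central finite differences of a scalar function f."""
    G = np.zeros_like(U)
    for i in range(U.shape[0]):
        for j in range(U.shape[1]):
            E = np.zeros_like(U)
            E[i, j] = h
            G[i, j] = (f(U + E) - f(U - E)) / (2 * h)
    return G

theta = np.pi / 6
T = np.array([[1.0, 0.0],
              [0.0, np.cos(theta)],
              [0.0, np.sin(theta)]])   # orthonormal columns
U0 = np.array([[1.0, 0.0],
               [0.0, 1.0],
               [0.0, 0.0]])

# The gradient formula holds for any 3 x 2 matrix, so test at a random point.
rng = np.random.default_rng(1)
V = rng.standard_normal((3, 2))

analytic = euclidean_gradient(V, T)
numeric = numerical_gradient(lambda W: loss(W, T), V)

print("max abs difference:", np.abs(analytic - numeric).max())
print("loss at U0:", loss(U0, T))
```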

---

**What We are Trying to Show**

We already know the optimal subspace $T$, so this example is not about *finding* a solution.<br>
Instead, it demonstrates that
- The Riemannian gradient defines a descent direction within the tangent space.
- A retraction (via QR) maps the updated point back to the Grassmannian.
- The loss $f(U)$ decreases over successive steps.

This confirms that **Riemannian gradient descent with retraction** respects the manifold structure and behaves as expected<br>
even when applied to a curved, non-Euclidean space of subspaces.
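
To make the procedure concrete, here is a minimal NumPy sketch of Riemannian gradient descent with QR retraction on this problem. It is a stand-in written for illustration, not the implementation of `grassmann_optimize` (which is not shown in this notebook); all function names below are assumptions, and the step size and iteration count simply mirror the values passed in the Julia cell.

```python
import numpy as np

def loss(U, T):
    return np.linalg.norm(U @ U.T - T @ T.T, "fro") ** 2

def riemannian_gradient(U, T):
    G = 4.0 * (U @ U.T - T @ T.T) @ U   # Euclidean gradient
    return G - U @ (U.T @ G)            # projection onto the tangent space

def qf(M):
    """Orthonormal QR factor, with signs fixed so that diag(R) > 0."""
    Q, R = np.linalg.qr(M)
    signs = np.sign(np.diag(R))
    signs[signs == 0] = 1.0
    return Q * signs

def riemannian_gradient_descent(U0, T, step=0.1, max_iter=100):
    U = U0.copy()
    losses = [loss(U, T)]
    for _ in range(max_iter):
        Z = -riemannian_gradient(U, T)   # descent direction in the tangent space
        U = qf(U + step * Z)             # retract back onto the manifold
        losses.append(loss(U, T))
    return U, losses

theta = np.pi / 6
U0 = np.array([[1.0, 0.0],
               [0.0, 1.0],
               [0.0, 0.0]])
T = np.array([[1.0, 0.0],
              [0.0, np.cos(theta)],
              [0.0, np.sin(theta)]])

U_final, losses = riemannian_gradient_descent(U0, T)
print("initial loss:", round(losses[0], 4))
print("final loss:", losses[-1])
print("orthonormal:", np.allclose(U_final.T @ U_final, np.eye(2)))
```

The comparison is made between projectors, not between `U_final` and `T` themselves, because two different orthonormal bases can represent the same subspace.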

#### Step 2: Compute the Riemannian Gradient

In [None]:
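# `Main.losses` must already exist in Julia's Main (for example, after `losses = ex()` has been run in a `%%julia` cell).
pn.Column(
    "## Convergence of Riemannian Gradient Descent",
    hv.Curve(Main.losses, "Iteration", "Loss").opts(show_grid=True, tools=['hover'])
)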

**Riemannian gradient descent with retraction**<br>
Loss values over successive iterations of gradient descent on the Grassmannian $\mathrm{Gr}(2,3)$,<br>
using a projected Riemannian gradient and QR retraction to preserve orthonormality.

# 3. Take Away

This notebook reframes the geometry of the Grassmannian in computational terms:

- A **tangent vector** $Z \in T_U \mathrm{Gr}(k, n)$ defines a local direction of motion.
- A **first-order step** $U + tZ$ must be corrected to stay on the manifold.
- The **QR retraction** maps the updated point back to $\mathrm{Gr}(k, n)$ by restoring an orthonormal basis of the updated subspace.
- In the $\mathrm{Gr}(2,3)$ example, gradient descent with this scheme reduces the subspace loss over successive iterations.

These ingredients — tangent vectors, gradient projection, and retraction — form the core of Riemannian optimization.

They also serve as the foundation for algorithms in the next notebook,<br>
where we apply these techniques to interpolation, regression, and clustering of subspaces.