In [None]:
import numpy as np
from scipy.linalg import eigh

import holoviews as hv
import panel as pn
hv.extension('bokeh', logo=False)
pn.extension('katex')

<div style="height:2cm;">
<div style="float:center;width:100%;text-align:center;"><strong style="height:100px;color:darkred;font-size:40px;">The Generalized SVD</strong>
</div></div>

# 1. Preliminaries and Notation

Let $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{p \times n}$, with $\operatorname{rank}(A) = r_A$ and $\operatorname{rank}(B) = r_B$.

**Consider the minimization problem**

$\qquad
\operatorname{argmin}_x \Vert A x - b\Vert^2 + \lambda \Vert B x\Vert^2
$

$\qquad$ where $\lambda \geq 0$ is a real parameter and $\Vert \cdot \Vert$ is the Euclidean norm.

$\qquad$ The second term penalizes certain directions in $\mathbb{R}^n$ via the matrix $B$.

**Example:**<br>
$\qquad$ Let $n = 2$, $B = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}.\;\;$ Then
$\;\;
\Vert B x\Vert^2 = x_1^2.
$

$\qquad$ The penalty affects $x_1$ but not $x_2$. The balance between fitting $A x \approx b$ and penalizing $x_1$ depends on the structure of $A$ relative to $B$.
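
The effect of this penalty can be sketched numerically. The data `A` and `b` below are hypothetical values chosen only for illustration, and the helper `penalized_lstsq` is not part of the notebook. It rewrites the objective as an ordinary least-squares problem for the stacked matrix $\begin{pmatrix} A \\ \sqrt{\lambda} B \end{pmatrix}$ with right-hand side $\begin{pmatrix} b \\ 0 \end{pmatrix}$, which has the same minimizer.

```python
import numpy as np

# Hypothetical data for illustration (not taken from the text)
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0])
B = np.array([[1.0, 0.0],
              [0.0, 0.0]])   # penalizes x_1 only

def penalized_lstsq(A, B, b, lam):
    """Minimize ||A x - b||^2 + lam * ||B x||^2 via a stacked least-squares problem."""
    stacked = np.vstack([A, np.sqrt(lam) * B])
    rhs = np.concatenate([b, np.zeros(B.shape[0])])
    x, *_ = np.linalg.lstsq(stacked, rhs, rcond=None)
    return x

for lam in [0.0, 1.0, 100.0, 1e6]:
    x = penalized_lstsq(A, B, b, lam)
    print(f"lambda = {lam:>9.1f}:  x_1 = {x[0]: .6f},  x_2 = {x[1]: .6f}")
```

As $\lambda$ grows, $x_1$ is driven towards zero, while $x_2$ is left free to fit the data as well as it can.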

In general, **the interaction between $A$ and $B$ determines the behavior of the solution:**<br>
$\qquad$ The minimizer depends on how $A$ and $B$ act jointly on each direction in $\mathbb{R}^n$, not on either matrix alone.<br>
$\qquad$ Understanding this interaction requires analyzing the matrix pair $(A, B)$.
___

#### Normal Equation

The objective function for the minimization problem is given by

$\qquad
\begin{aligned}
f(x) &= \Vert A x - b \Vert^2 + \lambda \Vert B x \Vert^2\\
&= (A x - b)^T (A x - b) + \lambda (B x)^T (B x) \\
&= x^T A^T A x - 2 b^T A x + b^T b + \lambda x^T B^T B x
\end{aligned}$

Taking the gradient with respect to $x$ yields

$\qquad
\nabla_x f(x) = 2 A^T A x - 2 A^T b + 2 \lambda B^T B x.
$

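Since $f$ is convex (its Hessian $2 (A^T A + \lambda B^T B)$ is positive semidefinite), its minimizers are exactly the points where the gradient vanishes.<br>
Setting $\nabla_x f(x) = 0$ yields the normal equation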

$\qquad
(A^T A + \lambda B^T B)\ x = A^T b.
$
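
A minimal numerical check of this derivation, assuming random test data (the sizes, the seed and the value of `lam` are arbitrary choices, not from the text): solve the normal equation, then confirm that the gradient vanishes at the solution and that nearby points do not give a smaller objective value.

```python
import numpy as np

rng = np.random.default_rng(0)        # arbitrary seed, illustrative data only
m, p, n = 6, 4, 3
A = rng.standard_normal((m, n))
B = rng.standard_normal((p, n))
b = rng.standard_normal(m)
lam = 0.5

# Solve the normal equation (A^T A + lam B^T B) x = A^T b
M = A.T @ A + lam * (B.T @ B)
x_star = np.linalg.solve(M, A.T @ b)

def f(x):
    """Objective ||A x - b||^2 + lam ||B x||^2."""
    return np.sum((A @ x - b) ** 2) + lam * np.sum((B @ x) ** 2)

def grad_f(x):
    """Gradient 2 A^T (A x - b) + 2 lam B^T B x."""
    return 2 * A.T @ (A @ x - b) + 2 * lam * B.T @ (B @ x)

grad_norm = np.linalg.norm(grad_f(x_star))

# Objective values at randomly perturbed points around x_star
perturbed = [f(x_star + 1e-2 * rng.standard_normal(n)) for _ in range(100)]

print("gradient norm at x*:      ", grad_norm)
print("f(x*):                    ", f(x_star))
print("smallest perturbed value: ", min(perturbed))
```

For larger or ill-conditioned problems, forming $A^T A$ explicitly squares the condition number, so a stacked least-squares solve is usually the safer route. The normal equation is used here only because it mirrors the derivation.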

# 2. Variational Formulation

The matrix $A^T A + \lambda B^T B$ governs the solution of the minimization problem.<br>
The relative influence of $A^T A$ and $B^T B$ depends on the parameter $\lambda$.

To understand this interaction, consider the generalized eigenvalue problem

$\qquad
A^T A x = \mu B^T B\ x \;\; \Leftrightarrow \;\; (A^T A - \mu B^T B)\ x = 0.
$

The pair $(A^T A, B^T B)$ defines a **matrix pencil**
$\;\;
A^T A - \mu B^T B.
$
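
As a sketch of how such a pencil can be handled numerically: `scipy.linalg.eigh(a, b)` solves $a z = \mu\, b z$ for symmetric $a$ and symmetric positive definite $b$. The $2 \times 2$ pair below is an illustrative choice, and the approach assumes $B$ has full column rank so that $B^T B$ is positive definite.

```python
import numpy as np
from scipy.linalg import eigh

# Illustrative 2x2 pair (B has full column rank, so B^T B is positive definite)
A = np.array([[3.0, 2.0], [1.0, 4.0]])
B = np.array([[2.0, 0.5], [0.1, 1.5]])
AtA, BtB = A.T @ A, B.T @ B

# Generalized eigenpairs of the pencil A^T A - mu B^T B (ascending eigenvalues)
mu, Z = eigh(AtA, BtB)

residuals = []
for mu_i, z_i in zip(mu, Z.T):
    # Residual of (A^T A - mu_i B^T B) z_i = 0
    res = np.linalg.norm(AtA @ z_i - mu_i * (BtB @ z_i))
    residuals.append(res)
    print(f"mu = {mu_i:.4f},  residual = {res:.2e}")

# eigh normalizes the eigenvectors so that Z^T (B^T B) Z = I
print(np.round(Z.T @ BtB @ Z, 12))
```

The eigenvectors returned this way are orthonormal with respect to $B^T B$, not with respect to the Euclidean inner product.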

#### Observations

- The solution to the normal equation depends on the interaction between $A^T A$ and $B^T B$.
- The matrix $B^T B$ introduces directional penalties that affect the solution in ways not visible from $A$ alone.
- To understand how these two quadratic forms compete across directions in $\mathbb{R}^n$,<br> it is useful to study their relative action on vectors.
- This leads naturally to considering the generalized Rayleigh quotient.

____
The **generalized Rayleigh quotient** is

$\qquad
\rho(x) = \Large{\frac{ \Vert A x \Vert^2 }{ \Vert B x \Vert^2 } }= \Large{\frac{ x^T A^T A\ x }{ x^T B^T B\ x }}.
$

This quotient is invariant under scaling of $x$.
We can therefore impose a normalization constraint and consider<br>

$\qquad
\max_{x} \; x^T A^T A x \quad \text{subject to} \quad x^T B^T B\ x = 1.
$

The Lagrangian for this problem is

$\qquad
L(x, \mu) = x^T A^T A x - \mu (x^T B^T B\ x - 1).
$

Setting $\nabla_x L = 0$ yields

$\qquad
A^T A x = \mu B^T B x.
$

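Multiplying this equation from the left by $x^T$ and using the constraint gives $\mu = x^T A^T A x = \rho(x)$.<br>
**The generalized eigenvectors are thus the stationary points of $\rho(x)$, and the generalized eigenvalues $\mu$ are the corresponding stationary values.**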
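
This correspondence can be checked numerically. The sketch below uses random matrices (sizes and seed are arbitrary assumptions) and verifies three things: $\rho$ is scale invariant, $\rho$ evaluated at each generalized eigenvector equals the matching eigenvalue, and $\rho$ at randomly sampled directions stays between the smallest and largest eigenvalue.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(1)        # arbitrary seed, illustrative data only
A = rng.standard_normal((5, 3))
B = rng.standard_normal((4, 3))       # full column rank with probability one
AtA, BtB = A.T @ A, B.T @ B

def rho(x):
    """Generalized Rayleigh quotient ||A x||^2 / ||B x||^2."""
    return (x @ AtA @ x) / (x @ BtB @ x)

mu, Z = eigh(AtA, BtB)                # ascending generalized eigenvalues

# rho at the generalized eigenvectors reproduces the eigenvalues
rho_at_eigvecs = np.array([rho(z) for z in Z.T])

# rho at random directions lies in [mu_min, mu_max]
samples = rng.standard_normal((2000, 3))
rho_samples = np.array([rho(x) for x in samples])

print("mu:              ", mu)
print("rho(z_i):        ", rho_at_eigvecs)
print("sampled range:   ", rho_samples.min(), rho_samples.max())
```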

#### Example

Consider the generalized Rayleigh quotient
$\qquad
\rho(x) = \Large{\frac{ \Vert A x \Vert^2 }{ \Vert B x \Vert^2 }},\;\;
$ with $A = \begin{pmatrix} 3 & 2\\ 1 & 4\end{pmatrix}\;$ and $\;B = \begin{pmatrix} 2.0 & 0.5 \\ 0.1 & 1.5 \end{pmatrix}$

$\qquad$ for vectors $x$ on the unit circle in $\mathbb{R}^2$, i.e.,
$\;\;x(\theta) = \begin{pmatrix} \cos \theta \\ \sin \theta \end{pmatrix}$.

Since $\rho(x)$ depends only on the direction of $x$, and since $\rho(x(\theta)) = \rho(x(\theta + \pi))$, it suffices to plot $\rho(x)$ over $\theta \in [0, \pi]$.

The stationary values of $\rho(x)$ correspond to the generalized eigenvalues $\mu$ of the pencil
$A^T A - \mu B^T B$.<br>
$\qquad$ These values are marked on the plot to illustrate the connection<br>
$\qquad$  between the variational formulation and the generalized eigenvalue problem.

In [None]:
class GSVDQuotientViewer(pn.viewable.Viewer):
    def __init__(self, A, B, **params):
        super().__init__(**params)
        self.A, self.B = A, B
        theta = np.linspace(0, np.pi, 500)  # theta in [0, pi]
        self.theta = theta

        # Evaluate rho(x) for x(theta) = [cos θ, sin θ]^T
        rho_vals = []
        for t in theta:
            x = np.array([np.cos(t), np.sin(t)])
            num = x @ (A.T @ A) @ x
            den = x @ (B.T @ B) @ x
            rho = num / den if den > 1e-12 else np.nan
            rho_vals.append(rho)

        self.rho = np.array(rho_vals)

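        # Generalized eigenpairs of the symmetric pencil (A^T A, B^T B).
        # eigh avoids forming an explicit inverse and returns real eigenvalues
        # in ascending order; it requires B^T B to be positive definite.
        eigvals, eigvecs = eigh(A.T @ A, B.T @ B)
        eigvecs = eigvecs / np.linalg.norm(eigvecs, axis=0)  # unit-norm columns
        self.mu = eigvals
        self.xs = eigvecs.T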

    def _get_plot(self):
        curve = hv.Curve((self.theta, self.rho), 'θ', 'ρ(θ)')

        pts = []
        vlines = []
        for x, mu in zip(self.xs, self.mu):
            t = np.arctan2(x[1], x[0])
            t = t % np.pi
            pts.append((t, mu))
            vlines.append(hv.VLine(t).opts(color='red', line_width=1.5))

        markers = hv.Scatter(pts, 'θ', 'ρ').opts(color='red', size=10)

        plot = curve * markers
        for vline in vlines:
            plot *= vline

        return plot.opts(width=550, height=300, title='Generalized Rayleigh Quotient and Stationary Directions', show_grid=True)

    def _get_markdown(self):
        """Return a Column with one Markdown + LaTeX entry per pair (mu_i, x_i)."""
        items = []

        for i, (x, mu) in enumerate(zip(self.xs, self.mu), start=1):
            t = np.arctan2(x[1], x[0]) % np.pi
            text_md = f'**μ_{i}** = {mu:.4f},  θ = {t:.4f} rad'
            vec_latex = (
                r'$$x_' + f'{i} = \\begin{{pmatrix}} ' +
                f'{x[0]:.4f} \\\\ {x[1]:.4f}' +
                r'\end{pmatrix}$$'
            )
            items.append(
                pn.Column(
                    pn.pane.Markdown(text_md),
                    pn.pane.LaTeX(vec_latex)
                )
            )

        return pn.Column(*items, width=300)

    def __panel__(self):
        return pn.Row(
            pn.pane.HoloViews(self._get_plot()),
            pn.Spacer(width=20),
            pn.Column( pn.Spacer(height=20),
                      '### Generalized eigenvalues and eigenvectors',
                       pn.Row(  pn.Spacer(width=23),self._get_markdown()))
        )

# Example usage with asymmetric A, B
GSVDQuotientViewer( np.array([[3.0, 2.0], [1.0, 4.0]]),
                    np.array([[2.0, 0.5], [0.1, 1.5]])).servable()

**Plot of the generalized Rayleigh quotient:**<br>
$\qquad$ Stationary values correspond to generalized eigenvalues $\mu_i$ of the pencil $A^T A - \mu B^T B$.<br>
$\qquad$ Vertical lines indicate directions $x_i$ associated with $\mu_i$.

#### Interpretation

- The generalized Rayleigh quotient $\rho(x)$ identifies dominant directions balancing the actions of $A$ and $B$.
- The generalized eigenproblem yields these extremal directions, but does not provide a full basis for analyzing the joint action of $A$ and $B$.
- A complete understanding of the problem requires knowing how $A$ and $B$ act on all directions in $\mathbb{R}^n$.
- The generalized singular value decomposition (GSVD) will provide a basis (not orthogonal in general) revealing this structure,<br> allowing both $A$ and $B$ to be represented in simple, compatible forms.

# 3. Towards the Generalized SVD

The generalized eigenproblem

$\qquad
A^T A x = \mu B^T B x
$

identifies extremal directions $x$, but does not provide a complete basis for understanding the joint action of $A$ and $B$.

In the normal equation

$\qquad
(A^T A + \lambda B^T B) x = A^T b,
$

the solution depends on how the two quadratic forms act across **all directions** in $\mathbb{R}^n$.

To make this interaction transparent, it is desirable to find a basis of $\mathbb{R}^n$ in which both $A$ and $B$<br>
$\qquad$ are represented in simple form.<br>
$\qquad$ This requires a single change of basis matrix $X$ such that, when expressing vectors in the $X$-basis,<br>
$\qquad$ both $A$ and $B$ take diagonal (or nearly diagonal) forms.
____

This leads to the **generalized singular value decomposition (GSVD)**, which produces matrices

- $U \in \mathbb{R}^{m \times m}$ orthogonal,
- $V \in \mathbb{R}^{p \times p}$ orthogonal,
- $X \in \mathbb{R}^{n \times n}$ invertible (not orthogonal in general!),

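and diagonal matrices (stated here for the square case $m = p = n$ used in the examples; in general $C \in \mathbb{R}^{m \times n}$ and $S \in \mathbb{R}^{p \times n}$ are rectangular)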

- $C = \operatorname{diag}(c_1, \dots, c_n)$,
- $S = \operatorname{diag}(s_1, \dots, s_n)$,

such that

$\qquad
A = U C X^T, \quad B = V S X^T.
$

We will further see that the diagonal entries of $C$ and $S$ are related by

$\qquad
c_i^2 + s_i^2 = 1 \quad \text{for all} \ i.
$

**The columns of $X$ define a basis in which the relative action of $A$ and $B$ is fully revealed.**
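
For a square, nonsingular pair the factors can be assembled directly from the generalized eigenvectors. The following is a minimal sketch under that assumption, not the general algorithm (which is the subject of the derivation that follows in the next notebook). The eigenvector matrix is called `Z` here to keep it distinct from the GSVD factor $X$; the two are related by $X = Z^{-T}$ once the columns of `Z` are suitably scaled.

```python
import numpy as np
from scipy.linalg import eigh

A = np.array([[3.0, 2.0], [1.0, 4.0]])
B = np.array([[2.0, 0.5], [0.1, 1.5]])

# Generalized eigenvectors: Z^T (A^T A) Z = diag(mu), Z^T (B^T B) Z = I
mu, Z = eigh(A.T @ A, B.T @ B)

# Rescale each column so that ||A z_i||^2 + ||B z_i||^2 = 1
# (before scaling, ||A z_i||^2 = mu_i and ||B z_i||^2 = 1)
Z = Z / np.sqrt(1.0 + mu)

c = np.linalg.norm(A @ Z, axis=0)     # c_i = ||A z_i||
s = np.linalg.norm(B @ Z, axis=0)     # s_i = ||B z_i||
U = (A @ Z) / c                       # orthonormal columns
V = (B @ Z) / s                       # orthonormal columns
C, S = np.diag(c), np.diag(s)

# A Z = U C  implies  A = U C Z^{-1} = U C X^T  with  X = Z^{-T}
X = np.linalg.inv(Z).T

print("c^2 + s^2      =", c**2 + s**2)
print("c^2 / s^2      =", c**2 / s**2, " (generalized eigenvalues:", mu, ")")
print("||A - U C X^T|| =", np.linalg.norm(A - U @ C @ X.T))
print("||B - V S X^T|| =", np.linalg.norm(B - V @ S @ X.T))
```

The ratios $c_i^2 / s_i^2$ reproduce the generalized eigenvalues $\mu_i$, which ties the decomposition back to the Rayleigh quotient.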

The next notebook derives this decomposition.
____

#### Example

We now return to the matrix pair used in the previous Rayleigh quotient example:

$\qquad
A = \begin{pmatrix} 3.0 & 2.0 \\ 1.0 & 4.0 \end{pmatrix}, \quad
B = \begin{pmatrix} 2.0 & 0.5 \\ 0.1 & 1.5 \end{pmatrix}.
$

The stationary values of the generalized Rayleigh quotient

$\qquad
\rho(x) = \frac{ \Vert A x \Vert^2 }{ \Vert B x \Vert^2 }
$

yield two extremal directions $x_i$, i.e., the generalized eigenvalue-eigenvector pairs $(\mu_i, x_i)$ of the generalized eigenvalue problem $A^T A x = \mu B^T B x$:

$\qquad
\begin{aligned}
\mu_1 &\approx 1.4344, \quad x_1 \approx \left(\begin{array}{r} -0.8913 \\ 0.4534 \end{array}\right), \\
\mu_2 &\approx 8.0112, \quad x_2 \approx \left(\begin{array}{r} -0.0356 \\ -0.9994 \end{array}\right).
\end{aligned}
$

However, in the normal equation

$\qquad
(A^T A + \lambda B^T B) x = A^T b,
$

the solution depends on the behavior of both $A^T A$ and $B^T B$ across **all directions** in $\mathbb{R}^2$.<br>
A basis that diagonalizes $A^T A$ does not in general diagonalize $B^T B$.
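
A basis that diagonalizes both quadratic forms at once does decouple the normal equation. As a sketch, take the generalized eigenvectors $Z$ normalized so that $Z^T A^T A Z = \operatorname{diag}(\mu_i)$ and $Z^T B^T B Z = I$. Then $A^T A + \lambda B^T B = Z^{-T} (\operatorname{diag}(\mu_i) + \lambda I) Z^{-1}$, and the solution is $x = Z\, \operatorname{diag}\!\big(1/(\mu_i + \lambda)\big)\, Z^T A^T b$. The right-hand side `b` and the helper `solve_in_eigenbasis` are hypothetical, introduced only for this check.

```python
import numpy as np
from scipy.linalg import eigh

A = np.array([[3.0, 2.0], [1.0, 4.0]])
B = np.array([[2.0, 0.5], [0.1, 1.5]])
b = np.array([1.0, 2.0])              # hypothetical right-hand side

# Z^T (A^T A) Z = diag(mu) and Z^T (B^T B) Z = I
mu, Z = eigh(A.T @ A, B.T @ B)

def solve_in_eigenbasis(lam):
    """Solve (A^T A + lam B^T B) x = A^T b using the decoupled form."""
    coeffs = (Z.T @ (A.T @ b)) / (mu + lam)   # one scalar equation per direction
    return Z @ coeffs

errors = []
for lam in [0.0, 0.1, 1.0, 10.0]:
    x_basis = solve_in_eigenbasis(lam)
    x_direct = np.linalg.solve(A.T @ A + lam * (B.T @ B), A.T @ b)
    errors.append(np.linalg.norm(x_basis - x_direct))
    print(f"lambda = {lam:5.1f}:  x = {x_basis},  difference = {errors[-1]:.2e}")
```

Each direction is damped by its own factor $1/(\mu_i + \lambda)$, so directions with small $\mu_i$ (where $B$ dominates $A$) are affected by the penalty first.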

The GSVD provides a common basis $X$ where both $A$ and $B$ take simple forms:

$\qquad
A = U C X^T, \quad B = V S X^T.
$

This decomposition reveals how $A$ and $B$ interact in each direction defined by the columns of $X$.

To visualize this, we plot the images of the unit circle under $A X^T$ and under $B X^T$.

In [None]:
class GSVD_EllipseViewer(pn.viewable.Viewer):
    def __init__(self, A, B, **params):
        super().__init__(**params)
        self.A = A
        self.B = B

        # Step 1: Compute A^T A and B^T B
        AtA = A.T @ A
        BtB = B.T @ B

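        # Step 2: Solve generalized eigenproblem A^T A x = mu B^T B x.
        # The columns of X are the generalized eigenvectors; relative to the
        # factorization A = U C X^T they correspond (up to scaling) to columns of X^{-T}.
        mu, X = eigh(AtA, BtB)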

        # Step 3: Normalize columns of X (unit norm)
        X = X / np.linalg.norm(X, axis=0)

        # Store results
        self.mu = mu
        self.X = X

        # Step 4: Compute A X and B X
        CX = A @ X
        SX = B @ X

        # Step 5: Compute C and S diagonal entries
        c = np.linalg.norm(CX, axis=0)
        s = np.linalg.norm(SX, axis=0)

        # Normalize to c_i^2 + s_i^2 = 1
        norm_factors = np.sqrt(c**2 + s**2)
        self.c = c / norm_factors
        self.s = s / norm_factors

        # Orthonormal factors: columns of A X and B X scaled to unit length
        # (assumes c_i > 0 and s_i > 0, i.e. A and B have full column rank)
        self.U = CX / c
        self.V = SX / s

    def _plot_ellipses(self):
        theta = np.linspace(0, 2*np.pi, 300)
        unit_circle = np.vstack([np.cos(theta), np.sin(theta)])
        numer = np.sum((self.A @ unit_circle)**2, axis=0)
        denom = np.sum((self.B @ unit_circle)**2, axis=0)
        rho = numer / denom

        # Rayleigh curve points
        rayleigh_curve = np.sqrt(rho) * unit_circle

        # Express in X basis
        circle_in_X_basis = self.X.T @ unit_circle

        # Map through A X^T and B X^T
        A_image = self.A @ self.X @ circle_in_X_basis
        B_image = self.B @ self.X @ circle_in_X_basis

        # Build Holoviews curves
        curve_A = hv.Curve((A_image[0], A_image[1]), 'x', 'y', label='A').opts(
            color='blue', line_width=2, muted_alpha=0)
        curve_B = hv.Curve((B_image[0], B_image[1]), 'x', 'y', label='B').opts(
            color='green', line_width=2, muted_alpha=0)
        unit_circle_curve = hv.Curve((unit_circle[0], unit_circle[1]), 'x', 'y', label='Unit').opts(
            color='gray', line_dash='dashed', line_width=1, muted_alpha=0)

        # Add generalized eigenvector directions + markers at sqrt(mu_i)
        L = 3.0
        lines = []
        markers = []
        for x, mu in zip(self.X.T, self.mu):
            x = x / np.linalg.norm(x)
            # Line from -L * x to +L * x
            xs = [-L * x[0], L * x[0]]
            ys = [-L * x[1], L * x[1]]
            line = hv.Curve((xs, ys), 'x', 'y', label='Evec').opts(color='red', line_width=1.5, muted_alpha=0)

            # Marker at sqrt(mu_i) * x
            r = np.sqrt(mu)
            marker_x = r * x[0]
            marker_y = r * x[1]
            marker = hv.Scatter(([marker_x], [marker_y]), label="Evec").opts(color='red', size=8, muted_alpha=0)

            lines.append(line)
            markers.append(marker)

        rayleigh_curve_plot = hv.Curve((rayleigh_curve[0], rayleigh_curve[1]), kdims=['x'], vdims=['y'], label='Rayleigh').opts(
            color='brown', line_width=2, muted_alpha=0)#line_dash='dotted')

        plot = (curve_A * curve_B * unit_circle_curve *rayleigh_curve_plot * hv.Overlay(lines) * hv.Overlay(markers)).opts(
            width=400, height=400, show_grid=True, aspect='equal',
            shared_axes=False, legend_position='bottom',
            title='GSVD: Action of A and B on Unit Circle')
        return plot

    def _plot_ellipses_in_X_basis(self):
        theta = np.linspace(0, 2*np.pi, 300)
        u_circle = np.vstack([np.cos(theta), np.sin(theta)])

        C_u = np.diag(self.c) @ u_circle
        S_u = np.diag(self.s) @ u_circle

        curve_C = hv.Curve((C_u[0], C_u[1]), "x̃", "ỹ", label='A').opts(
            color='blue', line_width=2, muted_alpha=0)
        curve_S = hv.Curve((S_u[0], S_u[1]), "x̃", "ỹ", label='B').opts(
            color='green', line_width=2, muted_alpha=0)
        unit_circle_curve = hv.Curve((u_circle[0], u_circle[1]), "x̃", "ỹ", label='Unit').opts(
            color='gray', line_dash='dashed', line_width=1, muted_alpha=0)

        # Add generalized eigenvectors in X basis + markers
        L = 1.5
        lines = []
        markers = []
        for i in range(2):
            # Plot along x and y axes
            xs = [-L, L] if i == 0 else [0, 0]
            ys = [0, 0] if i == 0 else [-L, L]
            line = hv.Curve((xs, ys), "x̃", "ỹ", label='Evec').opts(color='red', line_width=1.5)

            # Markers at (c_i, 0) and (0, s_i)
            marker_c = hv.Scatter(([self.c[i]], [0]), label="Evec").opts(color='red', size=8, muted_alpha=0)
            marker_s = hv.Scatter(([0], [self.s[i]]), label="Evec").opts(color='red', size=8, muted_alpha=0)

            lines.append(line)
            markers.append(marker_c)
            markers.append(marker_s)

        plot = (curve_C * curve_S * unit_circle_curve * hv.Overlay(lines) * hv.Overlay(markers)).opts(
            width=400, height=400, show_grid=True, aspect='equal',
            shared_axes=False, legend_position='bottom',
            title='GSVD: Action of A and B in X basis (Axis-aligned)')
        return plot

    def __panel__(self):
        return pn.Row(
                pn.pane.HoloViews(self._plot_ellipses()),
                pn.pane.HoloViews(self._plot_ellipses_in_X_basis())
        )

GSVD_EllipseViewer( np.array([[3.0, 2.0], [1.0, 4.0]]),
                    np.array([[2.0, 0.5], [0.1, 1.5]])).servable()

**GSVD Ellipse Plots**<br>
<div style="padding-left:1cm;">

The *left plot* shows the images of the unit circle under $A X^T$ and $B X^T$ in the original coordinate system.<br>
The red lines indicate the directions of the generalized eigenvectors $x_i$, along which $\rho(x)$ is stationary.<br>
The red markers on these lines are placed at $\sqrt{\mu_i} x_i$, where $\mu_i$ are the generalized eigenvalues.<br>
The brown Rayleigh curve shows the points $\sqrt{\rho(x)}\, x$ for unit vectors $x$, visualizing the ratio of the actions of $A$ and $B$ in all directions.<br>
The generalized eigenvector directions correspond to stationary points of this curve.<br>
<br>
The *right plot* shows the same transformations in the $X$-basis. In this basis, $A$ and $B$ act as diagonal operators<br>
with scalings $c_i$ and $s_i$ along the respective coordinate axes.<br>
Red lines indicate the coordinate axes (generalized eigenvectors),<br>
and red markers are placed at $(c_i, 0)$ and $(0, s_i)$, showing the action of $A$ and $B$ in each eigendirection.

Observe that the stationary directions of $\rho(x)$ in the left plot correspond to the coordinate axes in the $X$-basis (right plot).<br>
The magnitudes $c_i$ and $s_i$ indicate the relative action of $A$ and $B$ in each eigendirection.<br>
The GSVD reveals this structure explicitly.
</div>

# 4. Take Away

- The solution to the **normal equation** $(A^T A + \lambda B^T B) x = A^T b$<br>
depends on the interaction between $A^T A$ and $B^T B$ across all directions.
- The **generalized Rayleigh quotient** $\rho(x)$ identifies directions where the ratio of the actions of $A$ and $B$ is extremal.
- The **generalized eigenproblem** $A^T A x = \mu B^T B x$ reveals these directions and associated generalized eigenvalues $\mu_i$.
- The **GSVD provides a basis** in which both $A$ and $B$ act as diagonal operators, fully exposing their relative action.
- The **stationary directions of $\rho(x)$ correspond to the coordinate axes in this basis.**<br>
The scalings $c_i$ and $s_i$ quantify the strength of $A$ and $B$ in each eigendirection.
- The GSVD makes the structure of the coupled quadratic forms $(A^T A, B^T B)$ fully transparent.