In [1]:
import warnings
warnings.filterwarnings('ignore')

import numpy as np
import pandas as pd
import scipy.linalg as linalg
import math

import panel as pn; pn.extension()
import holoviews as hv; hv.extension('bokeh', logo=None);

<div style="float:center;width:100%;text-align: center;"><strong style="height:100px;color:darkred;font-size:40px;">Rayleigh Quotients</strong></div>

# 1. The Normalized Quadratic Form

## 1.1 Power Iteration

Upon convergence, the power method for [finding a dominant eigenvalue](IterativeMethods_python.ipynb)<br>$\qquad$
leaves us with the problem of estimating the eigenvalue from the square matrix $A$<br>
$\qquad$ and an approximate eigenvector $x_{\text{approx}}$.

Since $A x = \lambda x$, we might substitute $x = x_{\text{approx}}$ to get one equation for $\lambda$ per component, i.e., $\; \lambda\ x_{\text{approx}} \approx A x_{\text{approx}}$.

For example,
$$
x_{\text{approx}} =
 \begin{pmatrix} 1.10 \\ 1.98 \\ 2.00 \end{pmatrix}, \quad A x_{\text{approx}} =
 \begin{pmatrix} 2.10 \\ 4.00 \\ 3.90 \end{pmatrix} \quad \Rightarrow \left\{\begin{align}
    1.10 \lambda & = 2.10 \\
    1.98 \lambda & = 4.00 \\
    2.00 \lambda & = 3.90 \\
 \end{align}\right.
$$

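The three equations are inconsistent: they give $\lambda \approx 1.91,\ 2.02$ and $1.95$ respectively. A reliable approach is to multiply $A x = \lambda x$ through by $x^t$ from the left, which combines them into a single equation: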

$\begin{align}
A x = \lambda x \quad & \Rightarrow \quad     & x^t A x & = & \lambda x^t x & \\
                      & \Leftrightarrow \quad & \lambda & = & \frac{ x^t A x }{x^t x} & \quad \text{ since } \lVert x \rVert \ne 0.
\end{align}$

For our example, this yields the estimate
$\lambda \approx \frac{x_{\text{approx}} \cdot \left( A x_{\text{approx}} \right) }
                      {\Vert x_{\text{approx}} \Vert^2 } \approx 1.97 $
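As a quick numerical check of this estimate, the sketch below compares the three componentwise ratios with the quotient $x \cdot (Ax) / \Vert x \Vert^2$. It uses only the two vectors from the example; the variable names are illustrative. The last line also confirms that this quotient is the least-squares solution of the overdetermined system $x_{\text{approx}}\, \lambda \approx A x_{\text{approx}}$.

```python
import numpy as np

x_approx  = np.array([1.10, 1.98, 2.00])   # approximate eigenvector from the example
Ax_approx = np.array([2.10, 4.00, 3.90])   # A @ x_approx as given in the example

# one estimate of lambda per equation: they disagree
componentwise = Ax_approx / x_approx

# the normalized quadratic form combines all three equations (approximately 1.97)
rayleigh_estimate = (x_approx @ Ax_approx) / (x_approx @ x_approx)

# the same number is the least-squares solution of  x_approx * lambda ≈ Ax_approx
lstsq_estimate = np.linalg.lstsq(x_approx.reshape(-1, 1), Ax_approx, rcond=None)[0][0]

print(componentwise, rayleigh_estimate, lstsq_estimate)
```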

**Definition:** Let $R(x) = \frac{x^t A x}{x^t x}, x \ne 0.$

**Remarks:**
* $R(x) = \frac{x^t A x}{\Vert x \Vert^2} = \hat{x}^t A \hat{x}, x \ne 0,\;\;$ where $\hat{x} = \frac{1}{\Vert x \Vert} x.$
* $R(\alpha x) = R(x)$ for all $\alpha \ne 0.$
* $\therefore$ to investigate the range of $y=R(x),$ it is sufficient to consider the unit sphere $\Vert x \Vert =1$.
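The scale invariance in these remarks is easy to confirm numerically. The following is a minimal sketch; the helper `rayleigh_quotient` and the random test matrix are assumptions made for illustration and are not part of the original notebook.

```python
import numpy as np

def rayleigh_quotient(A, x):
    """Normalized quadratic form R(x) = x^t A x / x^t x for a nonzero vector x."""
    x = np.asarray(x, dtype=float)
    return (x @ A @ x) / (x @ x)

rng   = np.random.default_rng(0)
A     = rng.standard_normal((3, 3))      # any square matrix works for this check
x     = rng.standard_normal(3)
x_hat = x / np.linalg.norm(x)            # unit vector in the direction of x

# R is unchanged by any nonzero rescaling of x ...
values = [rayleigh_quotient(A, alpha * x) for alpha in (-2.5, 0.1, 1.0, 40.0)]
# ... and equals the plain quadratic form evaluated at the unit vector
unit_value = x_hat @ A @ x_hat

print(values, unit_value)
```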

## 1.2. The Reduced SVD $\mathbf{A = U_r \Sigma_r V_r^t}$

<div style="float:left;width:37%;padding:5pt;">

The SVD shows that applying a matrix $A$<br>$\quad$ to the unit sphere $\left\{ x \;\mid\; \Vert x \Vert = 1 \right\}$<br>
$\quad$ converts the sphere into an **ellipsoid**,<br>
$\quad$ with **semi-axes of length** $\mathbf{\sigma_i}$.

**Remark:** this is the theorem we used for<br>$\quad$ the [Principal Component Analysis](PCA_and_SVD.ipynb) <br>$\qquad$ seen previously.
</div>
<div style="float:left;width:59%;padding:5pt;">
<div style="float:left;padding-left:5pt;width:100%;background-color:#F2F5A9;color:black;">

**Theorem:** Let $A \in \mathbb{R}^{M \times N}$ have the reduced SVD $A = U_r \Sigma_r V^t_r$. Then<br>

$\quad$ $\sigma_1 = \quad \underset{\lVert q \rVert = 1}{\operatorname{max}}\;\; \lVert A\ q \rVert$, 
$\quad$ $\ v_1 = \; \underset{\lVert q \rVert = 1}{\operatorname{argmax}}\ \lVert A\ q \rVert$,<br>
<!-- -->
$\quad$ $\sigma_2 = \underset{\lVert q \rVert = 1, q \perp v_1}{\operatorname{max}}\ \lVert A\ q \rVert$, 
$\quad$ $v_2 = \underset{\lVert q \rVert = 1, q \perp v_1}{\operatorname{argmax}}\ \lVert A\ q \rVert$,<br>
$\quad \dots,\quad$ and further<br>
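$\quad$ $\sigma_r = \quad \underset{\lVert q \rVert = 1,\ q \in \operatorname{span}(v_1,\dots,v_r)}{\operatorname{min}}\;\; \lVert A\ q \rVert$, 
$\quad$ $\ v_r = \; \underset{\lVert q \rVert = 1,\ q \in \operatorname{span}(v_1,\dots,v_r)}{\operatorname{argmin}}\ \lVert A\ q \rVert$<br>
$\quad$ (for $r = N$ the span is all of $\mathbb{R}^N$, so the constraint reduces to $\lVert q \rVert = 1$).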
</div>

Let's illustrate the theorem with an example for a matrix of size $2 \times 2$

In [2]:
def apply_covariance_matrix_to_data(covx = np.array([[1,0.8],[0.8,2]]), u = np.random.uniform(-1, 1, (2, 500))  ):
    N    = u.shape[1]
    rng  = dict(x=(u[0,:].min(), u[0,:].max()), y=(u[1,:].min(), u[1,:].max()))
    
    y = covx @ u                                # apply the covariance matrix as a linear transform
    p = [np.stack([u[:,i],y[:,i]]) for i in range(0,y.shape[1],10)] # set up the paths connecting points to their transforms

    # compute the eigenvectors of the symmetric covariance matrix
    #    (they coincide with its singular vectors up to sign) and find the axes
    origin = y.mean(axis=1)
    e1, v1 = np.linalg.eig(covx)
    v1[:,0] *= 500*e1[0]; v1[:,1] *= 500*e1[1]
    a = [np.stack([v1[:,i], origin]) for i in range(2)]

    return (hv.Path(p)*hv.Path(a, group='AXIS')*hv.Scatter(u.T, group='A')*hv.Scatter(y.T, group='B'))\
           .opts(frame_height=300).redim.range(**rng)

theta = np.linspace(0, 2*np.pi, 500)
h=\
apply_covariance_matrix_to_data(u = np.stack( [np.cos(theta), np.sin(theta)]))

pn.Row( h.opts(
    hv.opts.Path (alpha=.5, color='black', line_dash='dotted'),
    hv.opts.Path ('AXIS', apply_ranges=False, color='darkgreen', alpha=1, line_width=2, line_dash='solid'),
    hv.opts.Scatter(size=5, alpha=.3),
    hv.opts.Scatter('A', color='blue'),
    hv.opts.Scatter('B', color='indianred', alpha=.6),
    hv.opts.Overlay( aspect='equal', title='Apply A to All Points on a Circle')
).redim.range(x=(-2.5,2.5), y=(-2.5,2.5)),
pn.pane.Markdown("""
<br><br>

* The **axes of the ellipse** lie along the singular vectors **v₁** and **v₂**
* the **major and minor semi-axes** of the ellipse have lengths **σ₁** and **σ₂**
* the mapping of a point on the circle to a point on the ellipse
is shown by a dotted line
* all points on the ellipse are at a **distance d from the center,** where<br>
 **σ₂ ≤ d ≤ σ₁**
""")
)

We obtained the singular values $\sigma$ from the eigenproblem $A^t A x = \lambda x$.<br>
$\qquad$ Multiplying through by $x^t$ and solving for $\lambda = \sigma^2$, we again see the normalized quadratic form<br>

$
\qquad \sigma^2 = \frac{ x^t A^t A x }{ x^t x } = \frac{ \Vert A x \Vert^2}{ \Vert x \Vert^2}, \;\; x \ne 0.
$

The above theorem states in particular that<br>

$
\qquad  \sigma_1^2 = \max_{x \ne 0} \frac{ x^t A^t A x }{ x^t x },\quad
$ and that this maximum is achieved at $x = v_1$.

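Further,<br>

$
\qquad  \sigma_r^2 = \min_{x \ne 0} \frac{ x^t A^t A x }{ x^t x },\quad
$ and this minimum is achieved at $x = v_r$, where $r = \operatorname{rank}(A) = 2$ in our example (the minimum over all $x \ne 0$ equals $\sigma_r^2$ because $A$ has full column rank).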
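These two statements can be checked numerically for the $2 \times 2$ matrix used in the plot above. The sketch below samples unit vectors on the circle and compares the largest and smallest stretch $\Vert A x \Vert$ with the singular values returned by `np.linalg.svd`; the sampling density is an arbitrary choice.

```python
import numpy as np

A     = np.array([[1.0, 0.8], [0.8, 2.0]])        # matrix used in the plot above
sigma = np.linalg.svd(A, compute_uv=False)        # singular values, largest first

theta   = np.linspace(0.0, 2.0 * np.pi, 2001)
circle  = np.stack([np.cos(theta), np.sin(theta)])  # unit vectors as columns
stretch = np.linalg.norm(A @ circle, axis=0)        # ||A x|| for every sampled unit x

# the extreme stretches over the circle approximate sigma_1 and sigma_2
print(sigma, stretch.max(), stretch.min())
```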

# 2. Rayleigh Quotient Definition and Theorem

## 2.1 Definition

**Definition:** The Rayleigh Quotient of a **symmetric** matrix $S$ of size $N \times N$ is the normalized quadratic form<br><br>$\qquad$
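$R(x) = \frac{x^t S x}{x^t x}, x\ne 0,\quad$  a function $R : \mathbb{R}^N \setminus \{0\} \rightarrow \mathbb{R}$ (its values are negative wherever $x^t S x < 0$, which can happen when $S$ has negative eigenvalues).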

**Remarks:**
* Any symmetric matrix $S$ has an orthogonal eigendecomposition $S = Q \Lambda Q^t$.
* Setting $\frac{x}{\Vert x \Vert} = Q \tilde{x}$, we get $R(x) = \tilde{x}^t \Lambda \tilde{x} = \sum_i \lambda_i \tilde{x}_i^2$.<br>
$\qquad$ Since the eigenvalues are real, we can rank-order them: $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_N$.
* To compute the SVD of $S$, we require an orthogonal eigendecomposition of<br>
$\qquad$ $S^t S = Q \Lambda Q^t Q \Lambda Q^t = Q \Lambda^2 Q^t$.<br><br>
Therefore, if $(\lambda, v)$ is an eigenpair of $S$ with $\lambda \ne 0$, then $( \vert \lambda \vert, v)$ is a singular pair of $S$.
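The relation between eigenpairs and singular pairs of a symmetric matrix can be checked with a few lines. This sketch uses the symmetric $2 \times 2$ matrix from the example in Section 2.2, whose eigenvalues are $-1$ and $3$; the variable names are illustrative.

```python
import numpy as np

S = np.array([[39.0, -48.0], [-48.0, 11.0]]) / 25.0   # symmetric, eigenvalues -1 and 3

evals, evecs = np.linalg.eigh(S)             # eigenvalues in ascending order
svals        = np.linalg.svd(S, compute_uv=False)   # singular values, descending

# singular values are the absolute values of the eigenvalues
abs_evals_desc = np.sort(np.abs(evals))[::-1]

# each unit eigenvector v is stretched by exactly |lambda|
stretch = np.linalg.norm(S @ evecs, axis=0)

print(evals, svals, stretch)
```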

This allows us to rewrite the previous theorem in terms of the Rayleigh Quotient:

<div style="float:left;padding-left:5pt;width:98%;background-color:#F2F5A9;color:black;">

**Theorem:** Let $S$ be a symmetric matrix with ordered eigenvalues $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_N$ and corresponding orthonormal eigenvectors $v_1, v_2, \dots, v_N$. Then

<div style="float:left;padding-left:5pt;width:46%;background-color:#F2F5A9;color:black;">

$\quad$ $\lambda_1 = \quad \underset{x \ne 0}{\operatorname{max}}\;\; R(x)$, 
$\quad$ $\ v_1 = \; \underset{x \ne 0}{\operatorname{argmax}}\ R(x)$,<br>
<!-- -->
$\quad$ $\lambda_2 = \underset{x \ne 0,\ x \perp v_1 }{\operatorname{max}}\ R(x)$, 
$\quad$ $v_2 = \underset{x \ne 0,\ x \perp v_1}{\operatorname{argmax}}\ R(x)$,<br>
$\quad \dots$
</div>
<div style="float:left;border-left:2px solid black;padding-left:0.5cm;width:42%;background-color:#F2F5A9;color:black;">
    The smallest eigenvalue of $S$ satisfies
    
$\quad$ $\lambda_N = \quad \underset{x \ne 0}{\operatorname{min}}\;\; R(x)$, 
$\quad$ $\ v_N = \; \underset{x \ne 0}{\operatorname{argmin}}\ R(x)$,<br>
</div>
</div>

This result is **easily verified:** $S$ has an orthogonal decomposition $S = Q \Lambda Q^t$.<br>
$\qquad$ Substituting $x = Q \tilde{x}$ in $R(x) = x^t S x, \;\Vert x \Vert = 1,$ we obtain<br>
$\qquad\qquad R(x) = \lambda_1 \tilde{x}_1^2 + \lambda_2 \tilde{x}_2^2 + \dots + \lambda_N \tilde{x}_N^2$,<br>
$\qquad$ where the $\tilde{x}_i$ are the entries in $\tilde{x}$, and $\sum_i \tilde{x}_i^2 = \Vert \tilde{x} \Vert^2 = \Vert x \Vert^2 = 1$ since $Q$ is orthogonal.

$\qquad$ Replacing every $\lambda_i$ with either $\lambda_1$ or $\lambda_N$ yields the bounds $\lambda_N \le R(x) \le \lambda_1.$<br>
$\qquad\qquad$ Note that the lower and upper bounds are achieved at $x = q_N$ and $x = q_1$ respectively, where $q_i$ is the $i$-th column of $Q$.

$\qquad$The remaining statements follow in the same way by considering $\tilde{x}$ and successively setting $\tilde{x}_1 =0, \tilde{x}_2 =0, \dots$
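The bounds and the constrained maximum can be checked by random sampling. The sketch below uses the symmetric $3 \times 3$ matrix from the example in Section 2.4; the sample size and seed are arbitrary choices.

```python
import numpy as np

S = np.array([[417.0, 240.0,    0.0],
              [240.0, 739.0,    0.0],
              [  0.0,   0.0, -289.0]]) / 289.0

evals, Q = np.linalg.eigh(S)            # ascending order: evals[0] = lambda_N, evals[-1] = lambda_1
lam_N, lam_2, lam_1 = evals[0], evals[-2], evals[-1]
q_N, q_1 = Q[:, 0], Q[:, -1]

def rayleigh_columns(S, X):
    """R(x) for every column x of X."""
    return np.einsum('ij,ij->j', X, S @ X) / np.einsum('ij,ij->j', X, X)

rng = np.random.default_rng(1)
X   = rng.standard_normal((3, 10_000))  # random nonzero test vectors as columns
R   = rayleigh_columns(S, X)

# remove the q_1 component so that every test vector is orthogonal to q_1
X_perp = X - np.outer(q_1, q_1 @ X)
R_perp = rayleigh_columns(S, X_perp)

print(lam_N, R.min(), R.max(), lam_1)   # sampled values stay inside [lambda_N, lambda_1]
print(R_perp.max(), lam_2)              # constrained values stay below lambda_2
```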


**Eigenvalues occur at the critical points of $R(x)$**:

Computing the gradient of $R(x)$ for a symmetric matrix $S$ of size $N \times N$, we find
$$\nabla R(x) = \frac{2}{ \lVert x \rVert^2 } \left( S - R(x)\ I \right) x$$

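Setting the gradient equal to zero to find the critical points, we obtain
$$
S x = R(x)\, x,
$$
i.e., every critical point $x$ is an eigenvector of $S$, and $R(x)$ is the corresponding eigenvalue.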
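The gradient formula can be compared against central finite differences. This is a minimal sketch: the helper names, the test point and the step size `h` are assumptions chosen for illustration, and the matrix is the symmetric $2 \times 2$ example from Section 2.2.

```python
import numpy as np

def rayleigh_quotient(S, x):
    return (x @ S @ x) / (x @ x)

def rayleigh_gradient(S, x):
    """Analytic gradient 2/||x||^2 (S - R(x) I) x, valid for symmetric S."""
    return 2.0 / (x @ x) * (S @ x - rayleigh_quotient(S, x) * x)

def numerical_gradient(f, x, h=1e-6):
    """Central finite-difference approximation of the gradient of f at x."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        grad[i] = (f(x + e) - f(x - e)) / (2.0 * h)
    return grad

S = np.array([[39.0, -48.0], [-48.0, 11.0]]) / 25.0
x = np.array([0.3, -1.2])                       # arbitrary nonzero test point

analytic = rayleigh_gradient(S, x)
numeric  = numerical_gradient(lambda v: rayleigh_quotient(S, v), x)

# at the eigenvectors the gradient vanishes
evals, evecs = np.linalg.eigh(S)
grad_at_eigvecs = [rayleigh_gradient(S, evecs[:, i]) for i in range(2)]

print(analytic, numeric, grad_at_eigvecs)
```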

**In summary,** we have<br>

$\qquad
\lambda_N \le R(x) \le \lambda_1,
$
where the maximum and minimum values of $R(x)$ are achieved<br>$\qquad$ at the eigenvectors $v_1$ and $v_N$ respectively.

$\qquad$ All other eigenpairs correspond to saddle points of $R(x)$: the eigenvector is the critical point, and the eigenvalue is the value of $R$ there.
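The saddle-point behaviour at an interior eigenvector can be seen by stepping away from it in two different directions. The sketch below uses the symmetric $3 \times 3$ matrix from Section 2.4 (eigenvalues $-1, 1, 3$); the step size `eps` is an arbitrary small number.

```python
import numpy as np

S = np.array([[417.0, 240.0,    0.0],
              [240.0, 739.0,    0.0],
              [  0.0,   0.0, -289.0]]) / 289.0

evals, Q = np.linalg.eigh(S)                   # ascending eigenvalues: -1, 1, 3
v_min, v_mid, v_max = Q[:, 0], Q[:, 1], Q[:, 2]

def R(x):
    return (x @ S @ x) / (x @ x)

eps = 1e-2
at_mid     = R(v_mid)                  # equals the middle eigenvalue
toward_max = R(v_mid + eps * v_max)    # stepping toward v_max increases R
toward_min = R(v_mid + eps * v_min)    # stepping toward v_min decreases R

print(toward_min, at_mid, toward_max)
```

Since $R$ goes up in one direction and down in another, the middle eigenvector is neither a maximum nor a minimum.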

## 2.2. A 2x2 Example, Symmetric Matrix

In [3]:
def show_2x2_rayleigh_quotient(A, N=100, symmetric=True):
    def angle( v ):
        theta = math.atan2(v[1],v[0])*180/np.pi
        return theta + 180 if theta < 0 else theta
    def R_phi(phi):
        x = np.array( [np.cos(phi), np.sin(phi)] )
        return x.dot( A @ x )

    # get the eigenvalues and eigenvectors
    if symmetric:
        evals, evecs = np.linalg.eigh(A)
    else:
        evals, evecs = np.linalg.eig(A)
    # R(x) takes the eigenvalue itself (sign included) at an eigenvector,
    #    so mark evals on the curve rather than their absolute values
    evec_angles  = [angle(evecs[:,0]),angle(evecs[:,1])]

    # graph it
    d_phi = 180. / (N+1)
    phi   = np.arange(0, 180 + d_phi, d_phi)
    R     = [R_phi(np.pi/180*p) for p in phi ]

    h_l   = [ hv.Curve( (phi,R), "angle", "R(x)" ).opts( width=400),
              hv.HLine(evals[0]), hv.HLine(evals[1]),
              hv.VLine( evec_angles[0]),hv.VLine( evec_angles[1]),
              hv.Scatter( (evec_angles, evals) ).opts(size=4, color="red")
            ]
    return hv.Overlay( h_l ).opts( hv.opts.HLine(line_width=0.7, color='black'),
                                   hv.opts.VLine(line_width=0.7, color='black'))

In [4]:
A = np.array([ [ 39., -48], [ -48, 11]])/25
h = show_2x2_rayleigh_quotient(A)
pn.Row(h.opts(title="Rayleigh Quotient versus Eigenvector Angle", show_grid=True, width=500))

## 2.3. A 2x2 Example, Non-Symmetric Matrix

**Remark:** The theorem does not hold if the matrix is not symmetric:

$\qquad$ the real eigenvalues are still values of $R(x)$, attained at the eigenvectors, but in general they are not at its critical points.
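One way to see where the critical points go: since $x^t A x = x^t \tfrac{1}{2}(A + A^t)\, x$, the Rayleigh quotient of $A$ coincides with that of its symmetric part, so its extrema are the eigenvalues of $\tfrac{1}{2}(A + A^t)$ rather than those of $A$. The sketch below checks this for the non-symmetric matrix of this section; the sampling grid is an arbitrary choice.

```python
import numpy as np

A     = np.array([[1.0, 0.0], [1.0, 2.0]])     # non-symmetric example matrix
A_sym = 0.5 * (A + A.T)                        # symmetric part of A

eig_A   = np.sort(np.linalg.eigvals(A).real)   # eigenvalues of A (real for this matrix)
eig_sym = np.linalg.eigvalsh(A_sym)            # eigenvalues of the symmetric part, ascending

# sample R(x) = x^t A x on unit vectors x = (cos phi, sin phi)
phi = np.linspace(0.0, np.pi, 2001)
X   = np.stack([np.cos(phi), np.sin(phi)])
R   = np.einsum('ij,ij->j', X, A @ X)

# the extrema of R match the symmetric part; the eigenvalues of A lie strictly inside
print(eig_A, eig_sym, R.min(), R.max())
```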

In [5]:
A=np.array([[ 1, 0 ], [1, 2]])
h = show_2x2_rayleigh_quotient(A, N=100, symmetric=False)
pn.Row(h.opts(title="R(x) versus Angle, A ≠ Aᵗ", show_grid=True, width=500))

## 2.4 A 3x3 Example

In [6]:
hv.extension('plotly', logo=None)

In [7]:
def show_3x3_rayleigh_quotient( A, N =1):
    print("Using Spherical Coordinates for the unit vectors")
    phi,theta = np.mgrid[0:360*N, 40:220*N] / N
    c = np.pi/180
    sp = np.sin(c*phi[  :,0]); cp = np.cos(c*phi[  :,0])
    st = np.sin(c*theta[0,:]); ct = np.cos(c*theta[0,:])
    R  = np.empty(phi.shape)
    for i in range(len(sp)):
        for j in range(len(st)):
            x = np.array( [st[j] *cp[i], st[j] * sp[i], ct[j]])
            R[i,j] = x.dot( A @ x )
    return [hv.TriSurface((phi.flat,theta.flat,R.flat), ["ϕ", "θ", "R(x)"]).opts(title="Eigenvalues -1,1,3"),
            hv.Raster(R, ["θ", "ϕ"]).opts(title="Projection into [ϕ,θ] Colored by Value")]

A = np.array([
[ 417., 240, 0],
[ 240, 739, 0],
[ 0, 0, -289],
])/289.
pn.Column(*show_3x3_rayleigh_quotient( A))

Using Spherical Coordinates for the unit vectors


# 3. Minimax Theorem

The Rayleigh quotient is a building block for a great deal of theory.

An important consequence of the previous theorem for numerical calculations is that
* the maximum eigenvalue of a symmetric matrix $S$ can be found by solving a maximization problem
* the minimum eigenvalue can be found by solving a minimization problem.
* by restricting $x$ to the orthogonal complement of the eigenvectors already found, we can find all the eigenvalues, one by one.
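The "one by one" idea can be sketched with power iteration combined with deflation. This is only one possible way to carry out the maximization, not necessarily the method intended here: the function name `eigenpairs_by_deflation`, the shift and the iteration count are assumptions. The shift makes the matrix positive definite, so that power iteration converges to the eigenvector with the largest remaining eigenvalue, and the projection step keeps the iterate orthogonal to the eigenvectors already found.

```python
import numpy as np

def eigenpairs_by_deflation(S, iters=500, seed=0):
    """Find the eigenpairs of a symmetric S from largest to smallest eigenvalue."""
    rng   = np.random.default_rng(seed)
    n     = S.shape[0]
    shift = np.linalg.norm(S) + 1.0          # Frobenius norm bounds |lambda|, so S + shift*I is positive definite
    B     = S + shift * np.eye(n)            # same eigenvectors as S, eigenvalues moved up by shift
    values, vectors = [], []
    for _ in range(n):
        x = rng.standard_normal(n)
        for _ in range(iters):
            for v in vectors:                # stay in the complement of the found eigenvectors
                x = x - (v @ x) * v
            x = B @ x                        # power iteration step
            x = x / np.linalg.norm(x)
        for v in vectors:                    # final clean-up projection
            x = x - (v @ x) * v
        x = x / np.linalg.norm(x)
        vectors.append(x)
        values.append(x @ S @ x)             # Rayleigh quotient of the unit vector x
    return np.array(values), np.column_stack(vectors)

S = np.array([[417.0, 240.0,    0.0],
              [240.0, 739.0,    0.0],
              [  0.0,   0.0, -289.0]]) / 289.0
vals, vecs = eigenpairs_by_deflation(S)
print(vals)
```

Convergence of each stage depends on the gap between the remaining eigenvalues, so the fixed iteration count is adequate for this small example but not in general.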

The rephrased theorem is as follows:

<div style="float:left;padding-left:5pt;width:98%;background-color:#F2F5A9;color:black;">

**Theorem:** Let $S$ be a symmetric matrix with eigenvalues numbered in a decreasing sequence<br>
$\qquad\qquad
\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n
$<br>
$\qquad$ with corresponding eigenvectors<br>
$\qquad\qquad v_1, \dots, v_n$.

Then its Rayleigh quotient $R(x) = \frac{x^t S x}{x^t x}, \;\; x\ne 0$ satisfies

$\qquad\qquad
\begin{align}
\max_{ x \ne 0} R(x) &= \lambda_1 \\
\min_{z\ne 0}\ \max_{x \ne 0,\ x^t z = 0} R(x) &= \lambda_2 \\
\min_{z_1,z_2 \ne 0}\ \max_{x \ne 0,\ x^t z_1 = 0,\ x^t z_2 = 0} R(x) &= \lambda_3 \\
&\;\;\vdots
\end{align}$
</div>
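The second line of the theorem can be explored numerically. For a fixed $z$, the inner maximum of $R(x)$ over $x \perp z$ is the largest eigenvalue of $S$ restricted to the orthogonal complement of $z$. The sketch below computes this for random $z$ and for $z = v_1$; the helper `max_rayleigh_on_complement` is hypothetical, and the matrix is the $3 \times 3$ example from Section 2.4.

```python
import numpy as np

def max_rayleigh_on_complement(S, z):
    """Largest value of R(x) over nonzero x with x^t z = 0, for symmetric S."""
    Q_full, _ = np.linalg.qr(z.reshape(-1, 1), mode='complete')
    P = Q_full[:, 1:]                          # orthonormal basis of the complement of z
    return np.linalg.eigvalsh(P.T @ S @ P)[-1] # largest eigenvalue of the restricted matrix

S = np.array([[417.0, 240.0,    0.0],
              [240.0, 739.0,    0.0],
              [  0.0,   0.0, -289.0]]) / 289.0

evals, V = np.linalg.eigh(S)                   # ascending eigenvalues
lam_2, v_1 = evals[-2], V[:, -1]

rng = np.random.default_rng(2)
inner_max = [max_rayleigh_on_complement(S, rng.standard_normal(3)) for _ in range(200)]
at_v1     = max_rayleigh_on_complement(S, v_1)

# every z gives an inner maximum of at least lambda_2; z = v_1 attains it
print(min(inner_max), at_v1, lam_2)
```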