# Overview of Power Method

#### Acknowledgement

All content in this document, and in the rest of this repository, is a summary of the lectures and course materials of Bumhee Cho. You can find the lecture here:

[Linear Algebra with Python - Using NumPy and SciPy](https://www.inflearn.com/en/course/%EC%84%A0%ED%98%95%EB%8C%80%EC%88%98%ED%95%99?attributionToken=kwHwkgoLCL-Is7wGEM-t4ysQARokNjc5MGFiYjItMDAwMC0yODM0LTk4N2UtMjQwNTg4ODE0OTkwKgYxNDgzNDcyOJzWty3Fy_MXjr6dFdSynRXC8J4Vo4CXIra3jC2o5aotjpHJMOGr6zDkq-swmu7GMJ_Wty2Q97IwOg5kZWZhdWx0X3NlYXJjaEgBWAFgAWgBegJzaQ) 

## General Idea of Power Method

The power method is one way to obtain **the eigenvalue of largest absolute value and the corresponding eigenvector of a diagonalizable matrix**.


Let $A$ be an $n \times n$ diagonalizable matrix with eigenvalues $\lambda_i$ and corresponding eigenvectors $\mathbf{v}_i$ ($i=1,2,\dots,n$),  
arranged so that

$$
|\lambda_1| > |\lambda_2| \geq \cdots \geq |\lambda_n|.
$$

The strict inequality $|\lambda_1| > |\lambda_2|$ (a single dominant eigenvalue) is what the method needs in order to converge.

Using $\{\mathbf{v}_i\}$ as basis vectors, any vector $\mathbf{x}_0 \in \mathbb{R}^n$ can be expressed as

$$
\mathbf{x}_0 = c_1 \mathbf{v}_1 + c_2 \mathbf{v}_2 + \cdots + c_n \mathbf{v}_n, 
\qquad (c_1, \dots, c_n \text{ are coordinates}).
$$

Now let $\mathbf{x}_k = A^k \mathbf{x}_0$, then

$$
\mathbf{x}_1 = A \mathbf{x}_0 
= c_1 A \mathbf{v}_1 + c_2 A \mathbf{v}_2 + \cdots + c_n A \mathbf{v}_n
= c_1 \lambda_1 \mathbf{v}_1 + c_2 \lambda_2 \mathbf{v}_2 + \cdots + c_n \lambda_n \mathbf{v}_n
$$

$$
\mathbf{x}_2 = A \mathbf{x}_1 
= c_1 A^2 \mathbf{v}_1 + c_2 A^2 \mathbf{v}_2 + \cdots + c_n A^2 \mathbf{v}_n
= c_1 \lambda_1^2 \mathbf{v}_1 + c_2 \lambda_2^2 \mathbf{v}_2 + \cdots + c_n \lambda_n^2 \mathbf{v}_n
$$

$$
\vdots
$$

$$
\mathbf{x}_k = A \mathbf{x}_{k-1} 
= c_1 A^k \mathbf{v}_1 + c_2 A^k \mathbf{v}_2 + \cdots + c_n A^k \mathbf{v}_n
= c_1 \lambda_1^k \mathbf{v}_1 + c_2 \lambda_2^k \mathbf{v}_2 + \cdots + c_n \lambda_n^k \mathbf{v}_n
$$

Divide $\mathbf{x}_k$ by $\lambda_1^k$:

$$
\frac{1}{\lambda_1^k} \mathbf{x}_k 
= c_1 \mathbf{v}_1 + c_2 \Bigl(\tfrac{\lambda_2}{\lambda_1}\Bigr)^k \mathbf{v}_2 + \cdots + c_n \Bigl(\tfrac{\lambda_n}{\lambda_1}\Bigr)^k \mathbf{v}_n.
$$

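Provided $|\lambda_1| > |\lambda_2|$, every ratio $|\lambda_i/\lambda_1|$ with $i \geq 2$ is smaller than 1, so all terms except the first vanish as $k \to \infty$:

$$
\frac{1}{\lambda_1^k} \mathbf{x}_k 
\;\longrightarrow\; c_1 \mathbf{v}_1
$$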
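
As a quick numerical check of this limit, here is a minimal sketch. The $2 \times 2$ matrix is an assumption chosen only for illustration (it is not from the lecture); its eigenpairs are known in closed form, so $c_1 \mathbf{v}_1$ can be written down by hand.

```python
import numpy as np

# Illustrative matrix (an assumption for this sketch):
# eigenvalues 3 and 1, eigenvectors [1, 1]/sqrt(2) and [1, -1]/sqrt(2).
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
lam1 = 3.0

# x0 = [1, 0] = (1/sqrt(2)) v1 + (1/sqrt(2)) v2, so c1 * v1 = [0.5, 0.5].
x0 = np.array([1.0, 0.0])

k = 40
x_k = np.linalg.matrix_power(A, k) @ x0   # x_k = A^k x0
scaled = x_k / lam1**k                    # (1 / lambda_1^k) x_k

print(scaled)
```

The printed vector should agree with $c_1 \mathbf{v}_1 = [0.5, 0.5]$ up to rounding, because the $\mathbf{v}_2$ component has been damped by $(1/3)^{40}$.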

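So for large $k$ (and provided $c_1 \neq 0$), $\mathbf{x}_k$ itself is approximately an eigenvector corresponding to $\lambda_1$. Accordingly, $\lambda_1$ can be estimated from $\mathbf{x}_k$: left-multiply the approximate eigenvalue equation by $\mathbf{x}_k^\top$ and divide by $\mathbf{x}_k^\top \mathbf{x}_k$.

$$
A \mathbf{x}_k \approx \lambda_1 \mathbf{x}_k
$$

$$
\lambda_1 \approx \frac{\mathbf{x}_k^\top A \mathbf{x}_k}{\mathbf{x}_k^\top \mathbf{x}_k} 
= \frac{\mathbf{x}_k^\top (A \mathbf{x}_k)}{\mathbf{x}_k^\top \mathbf{x}_k}
$$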

That is the whole idea.

**Power Method summary:**

Take an arbitrary $\mathbf{x}_0$ and multiply it by $A$ repeatedly:

$$
\mathbf{x}_k = A^k \mathbf{x}_0, \quad k \text{ is large}.
$$

Then,
- Dominant eigenvector (up to a scalar factor) $\;\; \mathbf{x}_k \approx \mathbf{v}_1$,  
- Dominant eigenvalue $\;\; \lambda_1 \approx \dfrac{\mathbf{x}_k^\top (A \mathbf{x}_k)}{\mathbf{x}_k^\top \mathbf{x}_k}$.

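How fast does $\mathbf{x}_k$ converge?

- Convergence ratio $\;\; \left|\frac{\lambda_2}{\lambda_1}\right|$: the error in the direction of $\mathbf{x}_k$ shrinks by about this factor per iteration, so the closer the ratio is to 1, the slower the convergence.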
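
The convergence ratio can be observed numerically. The sketch below is a minimal illustration under stated assumptions: the symmetric $2 \times 2$ matrix is picked for this example only (eigenvalues 3 and 1, so $|\lambda_2/\lambda_1| = 1/3$), and the exact eigenvector $\mathbf{v}_1$ is used to measure the error, which is not available in a real problem.

```python
import numpy as np

# Illustrative matrix (an assumption): eigenvalues 3 and 1, so |lambda_2 / lambda_1| = 1/3.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
v1 = np.array([1.0, 1.0]) / np.sqrt(2.0)   # exact unit eigenvector for lambda_1 = 3

x = np.array([1.0, 0.0])
errors = []
for _ in range(10):
    x = A @ x
    x = x / np.linalg.norm(x)              # keep the iterate at unit length
    errors.append(np.linalg.norm(x - v1))  # distance to the true eigenvector

# Ratio of successive errors; it should settle near |lambda_2 / lambda_1|.
ratios = [errors[i + 1] / errors[i] for i in range(len(errors) - 1)]
print(ratios[-1])
```

After a few iterations the printed ratio should sit very close to $1/3$.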

**One Additional Note - Normalization**

For numerical stability, (1) pick a unit vector for $\mathbf{x}_0$ and (2) normalize $\mathbf{x}_k$ at each step.

$$
\mathbf{x}_k \; \leftarrow \; \frac{\mathbf{x}_k}{\|\mathbf{x}_k\|}
$$

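Otherwise, because

$$
\mathbf{x}_k = c_1 \lambda_1^k \mathbf{v}_1 + c_2 \lambda_2^k \mathbf{v}_2 + \cdots + c_n \lambda_n^k \mathbf{v}_n,
$$

the norm scales like $|\lambda_1|^k$:

$$
\|\mathbf{x}_k\| \; \rightarrow \; \text{can overflow if } |\lambda_1| > 1 \text{ (or underflow if } |\lambda_1| < 1\text{)}
$$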
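
To make the overflow concrete, the following sketch runs 400 iterations with and without normalization. The matrix is an illustrative assumption with dominant eigenvalue 7, so the unnormalized iterate grows like $7^k$ and exceeds the float64 range well before the loop ends.

```python
import numpy as np

# Illustrative matrix (an assumption): eigenvalues 7 and 1.
A = np.array([[6.0, 5.0],
              [1.0, 2.0]])

x_raw = np.array([1.0, 0.0])    # never normalized
x_unit = np.array([1.0, 0.0])   # normalized at every step

# Silence the expected floating-point overflow warnings.
with np.errstate(over="ignore", invalid="ignore"):
    for _ in range(400):
        x_raw = A @ x_raw                           # grows like 7^k
        x_unit = A @ x_unit
        x_unit = x_unit / np.linalg.norm(x_unit)    # rescaled to unit length

print(np.isfinite(x_raw).all())   # False once the entries have overflowed
print(np.linalg.norm(x_unit))     # stays at 1 up to rounding
```

The normalized iterate still points along $\mathbf{v}_1$, while the raw iterate no longer carries any usable information.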

## Power Method Algorithm

1. Initialization - choose $\mathbf{x}_0$ s.t. $\|\mathbf{x}_0\| = 1$
2. Iterate until convergence
    - $\mathbf{x}_k = A\mathbf{x}_{k-1}/\|A\mathbf{x}_{k-1}\|$
    - $\mu_k = (\mathbf{x}_k \cdot A\mathbf{x}_{k})/(\mathbf{x}_k \cdot \mathbf{x}_k)$

where

$$
\mathbf{x}_k \approx \mathbf{v}_1
$$

$$
\mu_k \approx \lambda_1
$$

* Convergence criteria (typical tolerance: $10^{-7}$ -- $10^{-12}$)
    * eigvec relative error = $\|\mathbf{x}_k - \mathbf{x}_{k-1}\|/\|\mathbf{x}_k\|$
    * eigval relative error = $|(\mu_k - \mu_{k-1})/\mu_k|$
    * Theoretically, the eigenvalue converges faster (for symmetric $A$, the Rayleigh quotient error shrinks like $|\lambda_2/\lambda_1|^{2k}$)
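
The algorithm and its stopping tests can be wrapped in a small function. This is a minimal sketch, not a definitive implementation: the name `power_method`, its parameters `tol` and `max_iter`, and the choice to require both tests to pass before stopping are all assumptions made for this example. It also assumes a positive dominant eigenvalue, since with a negative one the iterate flips sign every step and the eigenvector test would never pass.

```python
import numpy as np

def power_method(A, x0, tol=1e-10, max_iter=10_000):
    """Hypothetical helper: returns (mu, x, iterations used)."""
    x = x0 / np.linalg.norm(x0)          # start from a unit vector
    mu = np.vdot(x, A @ x)               # Rayleigh quotient of the start vector
    for k in range(1, max_iter + 1):
        y = A @ x
        x_new = y / np.linalg.norm(y)    # x_k = A x_{k-1} / ||A x_{k-1}||
        mu_new = np.vdot(x_new, A @ x_new)   # denominator is 1 for a unit vector

        # Relative changes used as convergence criteria.
        vec_err = np.linalg.norm(x_new - x) / np.linalg.norm(x_new)
        val_err = abs((mu_new - mu) / mu_new)

        x, mu = x_new, mu_new
        if vec_err < tol and val_err < tol:
            return mu, x, k
    return mu, x, max_iter               # not converged within max_iter

# Illustrative matrix (an assumption): eigenvalues 7 and 1.
A_demo = np.array([[6.0, 5.0],
                   [1.0, 2.0]])
mu_demo, x_demo, n_iter = power_method(A_demo, np.array([1.0, 0.0]))
print(mu_demo, x_demo, n_iter)
```

The `max_iter` cap guards against an endless loop when $|\lambda_2/\lambda_1|$ is close to 1 or the tolerance is too tight for floating-point arithmetic.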

### Example 1.

$$
A=
\begin{bmatrix}
6 & 5 \\
1 & 2
\end{bmatrix}, \; 
\mathbf{x}_0 = 
\begin{bmatrix}
1 \\
0
\end{bmatrix}
$$

$$
\lambda_1 = 7, \; \lambda_2 = 1, \; \mathbf{v}_1 = 
\begin{bmatrix}
1/\sqrt{1.04} \\
0.2/\sqrt{1.04}
\end{bmatrix} = 
\begin{bmatrix}
0.980580 \cdots \\
0.196116 \cdots
\end{bmatrix}
$$

First, let's use `linalg.eig` to see the results.

In [1]:
import numpy as np
from scipy import linalg

In [2]:
A = np.array([
    [6, 5],
    [1, 2]
], dtype = np.float64)

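eigvals, eigvecs = linalg.eig(A)

# linalg.eig does not sort its output, so pick the eigenvalue of largest magnitude
idx = np.argmax(np.abs(eigvals))
print(rf'$\lambda_1$ is {np.real_if_close(eigvals[idx])}')
print(f'eigenvector is {eigvecs[:, idx]}')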

$\lambda_1$ is 7.0
eigenvector is [0.98058068 0.19611614]


Now, let's implement power method.

In [3]:
x0 = np.array([1, 0], dtype=np.float64)

x_prev = x0
mu_prev = 0.0

threshold = 10**(-9)
max_iter = 10_000  # safety cap so the loop cannot run forever

for _ in range(max_iter):

    # eigvec: x_k = A x_{k-1}, then normalize to unit length
    x_new = A @ x_prev
    x_new = x_new / linalg.norm(x_new)

    # eigval: Rayleigh quotient of x_k
    mu_new = np.vdot(x_new, A @ x_new) / np.vdot(x_new, x_new)

    # convergence test: stop as soon as either relative error is small enough
    vec_error = linalg.norm(x_new - x_prev) / linalg.norm(x_new)
    val_error = abs((mu_new - mu_prev) / mu_new)
    if vec_error < threshold or val_error < threshold:
        break

    # update for next iter
    x_prev = x_new
    mu_prev = mu_new

In [4]:
print(f'eigval from power method: {mu_new}')
print()
print(f'eigvec from power method: {x_new}')

eigval from power method: 6.999999999533171

eigvec from power method: [0.98058068 0.19611614]


### Example 2.

$$
A =
\begin{bmatrix}
2 & 1 &  &  &  \\
1 & \ddots & \ddots &  &  \\
    & \ddots & 2 & 1 &  \\
   &  & 1 & 2 & 1 \\
   &  &  & 1 & 2 \\
\end{bmatrix}_{20 \times 20}
$$

In [5]:
diag = 2*np.ones((20,))
off_diag = np.ones((19,))

A = np.diag(diag, k=0) + np.diag(off_diag, k=1) + np.diag(off_diag, k=-1)

eigvals, eigvecs = np.linalg.eigh(A)

# eigh returns eigenvalues in ascending order -> the largest is the last one
print(rf'$\lambda_1$ is {np.real_if_close(eigvals[-1]): .6f}')
print(f'eigenvector is {eigvecs[:, -1]}')

$\lambda_1$ is  3.977662
eigenvector is [0.04599544 0.09096342 0.13389943 0.17384434 0.20990586 0.24127843
 0.26726124 0.28727388 0.30086929 0.30774377 0.30774377 0.30086929
 0.28727388 0.26726124 0.24127843 0.20990586 0.17384434 0.13389943
 0.09096342 0.04599544]


In [6]:
x0 = np.zeros((20,), dtype=np.float64)
x0[0] = 1

x_prev = x0
mu_prev = 0.0

threshold = 10**(-12)
max_iter = 10_000  # safety cap so the loop cannot run forever

for _ in range(max_iter):

    # eigvec: x_k = A x_{k-1}, then normalize to unit length
    x_new = A @ x_prev
    x_new = x_new / linalg.norm(x_new)

    # eigval: Rayleigh quotient of x_k
    mu_new = np.vdot(x_new, A @ x_new) / np.vdot(x_new, x_new)

    # convergence test: stop as soon as either relative error is small enough
    vec_error = linalg.norm(x_new - x_prev) / linalg.norm(x_new)
    val_error = abs((mu_new - mu_prev) / mu_new)
    if vec_error < threshold or val_error < threshold:
        break

    # update for next iter
    x_prev = x_new
    mu_prev = mu_new

In [7]:
print(f'eigval from power method: {mu_new}')
print()
print(f'eigvec from power method: {x_new}')

eigval from power method: 3.9776616523366655

eigvec from power method: [0.0459992  0.09097061 0.1339094  0.17385622 0.20991858 0.24129087
 0.26727229 0.28728255 0.30087482 0.30774567 0.30774187 0.30086375
 0.2872652  0.2672502  0.241266   0.20989315 0.17383247 0.13388946
 0.09095624 0.04599168]


## Inverse Power Method

We can tweak the power method to get the eigenvalue of smallest absolute value and the corresponding eigenvector, assuming $A$ is invertible.

$$
A \mathbf{v} = \lambda \mathbf{v} \iff A^{-1}\mathbf{v} = \frac{1}{\lambda}\mathbf{v}
$$
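
This reciprocal relationship is easy to confirm numerically. The sketch below uses an illustrative matrix (an assumption, with eigenvalues 7 and 1) and forms $A^{-1}$ explicitly only for the purpose of the check.

```python
import numpy as np

# Illustrative matrix (an assumption): eigenvalues 7 and 1.
A = np.array([[6.0, 5.0],
              [1.0, 2.0]])

eigvals_A = np.linalg.eigvals(A)
eigvals_A_inv = np.linalg.eigvals(np.linalg.inv(A))

# The eigenvalues of A^{-1} should be the reciprocals of those of A.
recip_sorted = np.sort(1.0 / eigvals_A)
inv_sorted = np.sort(eigvals_A_inv)
print(recip_sorted)
print(inv_sorted)
```

Both printed arrays should match up to rounding, and the largest entry corresponds to the smallest eigenvalue of $A$.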

Power Method:

$$
\mathbf{x}_k = \frac{A \mathbf{x}_{k-1}}{\|A \mathbf{x}_{k-1}\|}, \; \mu_k = \frac{\mathbf{x}_k \cdot (A \mathbf{x}_{k})}{\mathbf{x}_k \cdot \mathbf{x}_k}
$$

Inverse Power Method:

$$
\mathbf{x}_k = \frac{A^{-1} \mathbf{x}_{k-1}}{\|A^{-1} \mathbf{x}_{k-1}\|}, \; \mu_k = \frac{\mathbf{x}_k \cdot (A^{-1} \mathbf{x}_{k})}{\mathbf{x}_k \cdot \mathbf{x}_k}
$$

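Apply the power method to $A^{-1}$, obtain its dominant eigenvalue, and take the reciprocal $\Rightarrow$ the eigenvalue of $A$ with the smallest absolute value.

But computing the inverse matrix explicitly is computationally inefficient and usually not recommended.

Instead, obtain $\mathbf{y} = A^{-1}\mathbf{x}_{k-1}$ by solving the linear system $A\mathbf{y} = \mathbf{x}_{k-1}$, then normalize: $\mathbf{x}_k = \mathbf{y}/\|\mathbf{y}\|$. Because $A$ is the same in every iteration, an LU factorization computed once can be reused for every solve.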
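
A minimal sketch of this idea with SciPy's `lu_factor` and `lu_solve` follows. The function name `inverse_power_method`, its parameters, and the test matrix are assumptions made for illustration. As a simplification, the eigenvalue is estimated with the Rayleigh quotient of $A$ itself rather than by inverting the Rayleigh quotient of $A^{-1}$, which avoids a second solve per iteration.

```python
import numpy as np
from scipy import linalg

def inverse_power_method(A, x0, tol=1e-12, max_iter=10_000):
    """Hypothetical helper: returns (mu, x) for the eigenvalue of smallest magnitude."""
    lu_piv = linalg.lu_factor(A)             # factor A once and reuse it
    x = x0 / linalg.norm(x0)
    mu = 0.0
    for _ in range(max_iter):
        y = linalg.lu_solve(lu_piv, x)       # solves A y = x, i.e. y = A^{-1} x
        x_new = y / linalg.norm(y)
        mu_new = np.vdot(x_new, A @ x_new)   # Rayleigh quotient with A (x_new is a unit vector)
        converged = abs(mu_new - mu) <= tol * abs(mu_new)
        x, mu = x_new, mu_new
        if converged:
            break
    return mu, x

# Illustrative matrix (an assumption): eigenvalues 7 and 1.
A = np.array([[6.0, 5.0],
              [1.0, 2.0]])
mu, x = inverse_power_method(A, np.array([1.0, 0.0]))
print(mu, x)
```

For this matrix the estimate should approach $\lambda = 1$, with an eigenvector proportional to $[1, -1]^\top$. Each iteration costs only two triangular solves after the one-time factorization.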