The Chebyshev Iteration
===
Let $A$ be SPD, and $C$ an SPD preconditioner. If we perform $n$ steps of the Richardson iteration with damping parameter $\tau$, the error $e^n$ follows from the initial error $e^0$ via 

$$
e^n = (I - \tau C^{-1} A)^n e^0.
$$

Now we allow a different damping parameter $\tau_k$ in every iteration. Then

$$
e^n = (I - \tau_n C^{-1} A) \cdots (I - \tau_2 C^{-1} A )(I - \tau_1 C^{-1} A) e^0.
$$

With the polynomial

$$
p(\lambda) = \prod_{i=1}^n (1 - \tau_i \lambda)
$$

we can write 

$$
e^n = p(C^{-1} A) e^0
$$
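
The identity can be checked numerically. The following is a minimal sketch, assuming a small random SPD matrix `A`, a Jacobi preconditioner `C`, and arbitrarily chosen damping parameters `taus`; none of these choices come from the text, they only serve as an illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 6
# Hypothetical test data (assumption): random SPD matrix and its diagonal as preconditioner
M = rng.standard_normal((N, N))
A = M @ M.T + N * np.eye(N)
C = np.diag(np.diag(A))

b = rng.standard_normal(N)
x_exact = np.linalg.solve(A, b)
taus = np.array([0.3, 0.7, 0.5, 0.9])   # arbitrary damping parameters

# Richardson iteration with a different tau in every step
x = np.zeros(N)
e0 = x - x_exact
for tau in taus:
    x = x + tau * np.linalg.solve(C, b - A @ x)
en_iter = x - x_exact

# The same error obtained by applying p(C^{-1} A) to the initial error
CinvA = np.linalg.solve(C, A)
P = np.eye(N)
for tau in taus:
    P = (np.eye(N) - tau * CinvA) @ P
en_poly = P @ e0

print(np.allclose(en_iter, en_poly))
```

Since all factors are polynomials in the same matrix $C^{-1} A$, they commute, so the order in which the $\tau_k$ are applied does not change the final error in exact arithmetic.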

We observe that $p(\cdot)$ is a polynomial of degree $n$ with $p(0) = 1$. Let $(\lambda_i, z^i)$ be the eigen-system of $C^{-1} A$, with eigenvectors normalized to be $C$-orthonormal, and expand the errors with respect to the eigen-basis, $e^n = \sum_{i=1}^N e^n_i z^i$:

$$
\| e^n \|_C^2 = \sum_{i=1}^N {e^n_i}^2 = \sum_{i=1}^N \big( p(\lambda_i) {e^0_i} \big)^2 \leq \sup_{\lambda \in \sigma(C^{-1} A)} p(\lambda)^2 \; \| e^0 \|_C^2
$$
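
The estimate relies on the eigenvectors being $C$-orthonormal. A minimal sketch of how this can be checked, assuming a small random SPD test matrix and a Jacobi preconditioner (both hypothetical choices), computes such a basis from the symmetric matrix $L^{-1} A L^{-T}$ with the Cholesky factor $C = L L^T$:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 8
# Hypothetical test data (assumption): random SPD matrix, Jacobi preconditioner
M = rng.standard_normal((N, N))
A = M @ M.T + N * np.eye(N)
C = np.diag(np.diag(A))

# C-orthonormal eigenvectors of C^{-1} A:  A z = lambda C z,  Z^T C Z = I
L = np.linalg.cholesky(C)
Linv = np.linalg.inv(L)
lam, Y = np.linalg.eigh(Linv @ A @ Linv.T)
Z = Linv.T @ Y

taus = [0.4, 0.8, 0.6]                      # arbitrary damping parameters

def p(t):
    """p(t) = prod_i (1 - tau_i t), evaluated entrywise."""
    return np.prod([1 - tau * t for tau in taus], axis=0)

def norm_C_squared(v):
    return v @ C @ v

e0 = rng.standard_normal(N)
c0 = Z.T @ C @ e0                           # coefficients e^0_i
cn = p(lam) * c0                            # coefficients e^n_i
en = Z @ cn

# Cross-check against applying the factors (I - tau C^{-1} A) directly
CinvA = np.linalg.solve(C, A)
en_direct = e0.copy()
for tau in taus:
    en_direct = en_direct - tau * (CinvA @ en_direct)

lhs = norm_C_squared(en)
rhs = np.max(p(lam) ** 2) * norm_C_squared(e0)
print(np.allclose(en, en_direct), lhs <= rhs)
```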


The goal is now to find damping parameters $\tau_1, \ldots, \tau_n$ such that

$$
\max_{\lambda_i \in \sigma(C^{-1} A)} | p(\lambda_i) |
$$

is minimal. It is rarely feasible to work with the exact spectrum, but often we have bounds such that $\sigma(C^{-1} A) \subset [\gamma_1, \gamma_2]$ with $0 < \gamma_1 < \gamma_2$. Then we can relax the problem to a min-max problem over the whole interval:

$$
\min_{p \in \text{Pol}(n) \atop p(0) = 1} \max_{\lambda \in [\gamma_1, \gamma_2]} | p(\lambda) |
$$

The solution to this min-max problem is given by Chebyshev polynomials.

Chebyshev polynomials
---

Chebyshev polynomials (of the first kind) are defined via the three-term recurrence relation 

\begin{eqnarray*}
T_0(x) & = & 1 \\
T_1(x) & = & x \\
T_{n+1}(x) & = & 2 x T_n(x) - T_{n-1}(x) \qquad n \geq 1
\end{eqnarray*}

Using induction and the trigonometric addition formulas one easily 
shows that

$$
T_n(x) = \cos( n \arccos (x)) \qquad \text{for} \; x \in [-1,1]
$$

and thus 

$$
\sup_{x \in [-1,1]} | T_n(x) | = 1
$$
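
Both statements are easy to check numerically. The sketch below uses its own helper `cheby_recurrence` (a name introduced here for illustration) and compares the recurrence with the cosine formula on a grid in $[-1,1]$:

```python
import numpy as np

def cheby_recurrence(n, x):
    """T_n(x) evaluated with the three-term recurrence."""
    T_prev, T_cur = np.ones_like(x), x          # T_0 and T_1
    for _ in range(n):
        T_prev, T_cur = T_cur, 2 * x * T_cur - T_prev
    return T_prev

x = np.linspace(-1.0, 1.0, 201)
for n in [0, 1, 2, 5, 20]:
    closed_form = np.cos(n * np.arccos(x))
    # the recurrence agrees with cos(n arccos x) on [-1, 1]
    assert np.allclose(cheby_recurrence(n, x), closed_form)
    # and the values stay bounded by 1 in absolute value (up to rounding)
    assert np.max(np.abs(cheby_recurrence(n, x))) <= 1 + 1e-10
```

Outside of $[-1,1]$ the cosine formula does not apply, while the recurrence remains valid and the polynomials grow rapidly.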

In [None]:
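def Cheby(n,x):
    """Evaluate T_n(x) via the three-term recurrence."""
    T,Told = x, 1+0*x            # T_1 and T_0 (1+0*x keeps the shape of x)
    for i in range(n):
        T,Told = 2*x*T-Told, T   # advance the recurrence by one degree
    return Told                  # after n steps Told holds T_n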

In [None]:
%matplotlib widget

import ipywidgets as widgets
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots(figsize=(6, 4))
fig.canvas.toolbar_visible = True
fig.canvas.header_visible = False
ax.set_ylim([-3, 3])
ax.grid(True)

x = np.linspace(-1.2, 1.2, 500)
for k in [1, 2, 5, 20]:
    ax.plot(x, Cheby(k,x), label='T'+str(k))
ax.legend()

We rescale 
* the argument such that $\lambda = \gamma_1$ is mapped to $-1$ and $\lambda = \gamma_2$ is mapped to $+1$,
* and scale the range such that $\widetilde T_n(0) = 1$:

$$
\widetilde T_n(\lambda) = \frac{1}{T_n \left(\tfrac{-\gamma_1-\gamma_2}{\gamma_2-\gamma_1}\right)} 
T_n \left(\tfrac{2 \lambda -\gamma_1-\gamma_2}{\gamma_2-\gamma_1}\right)
$$
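
The rescaled polynomial also determines the damping parameters. As a sketch under one additional assumption not stated above, namely the standard fact that the roots of $T_n$ are $\cos\big(\tfrac{(2k-1)\pi}{2n}\big)$ for $k = 1, \ldots, n$: mapping these roots back to $[\gamma_1, \gamma_2]$ gives the roots $\lambda_k$ of $\widetilde T_n$, and since $\widetilde T_n(0) = 1$ we get $\widetilde T_n(\lambda) = \prod_k (1 - \lambda / \lambda_k)$, i.e. $\tau_k = 1/\lambda_k$. The helper names and the values of `n`, `gamma1`, `gamma2` are chosen for illustration only.

```python
import numpy as np

def cheby(n, x):
    """T_n(x) via the three-term recurrence (scalars and arrays)."""
    x = np.asarray(x, dtype=float)
    T_prev, T_cur = np.ones_like(x), x
    for _ in range(n):
        T_prev, T_cur = T_cur, 2 * x * T_cur - T_prev
    return T_prev

def scaled_cheby(n, lam, gamma1, gamma2):
    """Rescaled Chebyshev polynomial with value 1 at lam = 0."""
    x = (2 * lam - gamma1 - gamma2) / (gamma2 - gamma1)
    x0 = (-gamma1 - gamma2) / (gamma2 - gamma1)
    return cheby(n, x) / cheby(n, x0)

n, gamma1, gamma2 = 5, 0.1, 1.0     # example values (assumption)

# Roots of T_n in [-1, 1], mapped back to [gamma1, gamma2]
k = np.arange(1, n + 1)
roots = 0.5 * (gamma1 + gamma2) \
      + 0.5 * (gamma2 - gamma1) * np.cos((2 * k - 1) * np.pi / (2 * n))
taus = 1.0 / roots                  # damping parameters tau_k = 1 / lambda_k

# p(lam) = prod_k (1 - tau_k lam) coincides with the rescaled polynomial
lam = np.linspace(0.0, gamma2, 300)
p = np.prod(1.0 - np.outer(taus, lam), axis=0)
print(np.allclose(p, scaled_cheby(n, lam, gamma1, gamma2)))
```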

In [None]:
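def ScaledCheby(n,lam,gamma1, gamma2):
    # affine map of [gamma1, gamma2] onto [-1, 1]
    x = -(gamma2+gamma1)/(gamma2-gamma1)+2/(gamma2-gamma1)*lam
    # normalization such that the polynomial equals 1 at lam = 0
    fac = 1/Cheby(n, -(gamma2+gamma1)/(gamma2-gamma1))
    return fac*Cheby(n,x)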

We compare the scaled Chebyshev polynomial to the error reduction of $n$ steps of the Richardson iteration for the error component in the eigen-space corresponding to an eigen-value $\lambda$:
$$
(1 - \tau_{\text{opt}} \lambda)^n \qquad \text{with} \qquad \tau_\text{opt} = \frac{2}{\gamma_1 + \gamma_2}
$$

In [None]:
%matplotlib widget

import ipywidgets as widgets
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots(figsize=(6, 4))
fig.canvas.toolbar_visible = True
fig.canvas.header_visible = False

ax.set_ylim([-1.5, 1.5])
ax.grid(True)
# ax.set_title("")
 
gamma2 = 1

@widgets.interact(n=(0, 50, 1), gamma1=(0.01, 0.99, 0.01))  # keep gamma1 < gamma2 to avoid division by zero
def update(n = 5, gamma1=0.1):
    s1 = np.linspace(0, gamma1, 500)
    s2 = np.linspace(gamma1, gamma2, 500)
    for l in list(ax.lines): l.remove()
    ax.plot(s1, ScaledCheby(n,s1,gamma1, gamma2), color='r', linestyle='dashed')
    ax.plot(s2, ScaledCheby(n,s2,gamma1, gamma2), color='r', label='Cheby')
    tauopt = 2/(gamma1+gamma2)
    ax.plot(s1, (1-tauopt*s1)**n, color='b', linestyle='dashed')
    ax.plot(s2, (1-tauopt*s2)**n, color='b', label='Richardson')
    
ax.legend()

Since $|T_n|$ attains its maximum value $1$ on $[-1,1]$, the maximum of $|\widetilde T_n|$ on the interval $[\gamma_1, \gamma_2]$ is
$$
\frac{1}{\big| T_n \big(\tfrac{-\gamma_1-\gamma_2}{\gamma_2-\gamma_1}\big) \big|}
$$
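
To get a feeling for the numbers, this bound can be compared with the corresponding bound for $n$ Richardson steps with $\tau_\text{opt}$, which is $\big(\tfrac{\gamma_2-\gamma_1}{\gamma_2+\gamma_1}\big)^n$ (attained at $\lambda = \gamma_1$ and, up to sign, at $\lambda = \gamma_2$). The following sketch assumes the example bounds $\gamma_1 = 0.1$, $\gamma_2 = 1$; the helper `cheby` is defined locally for self-containedness.

```python
import numpy as np

def cheby(n, x):
    """T_n(x) via the three-term recurrence (scalars and arrays)."""
    x = np.asarray(x, dtype=float)
    T_prev, T_cur = np.ones_like(x), x
    for _ in range(n):
        T_prev, T_cur = T_cur, 2 * x * T_cur - T_prev
    return T_prev

gamma1, gamma2 = 0.1, 1.0            # example spectral bounds (assumption)
x0 = (-gamma1 - gamma2) / (gamma2 - gamma1)

steps = [1, 5, 10, 20]
cheb_bounds, rich_bounds = [], []
for n in steps:
    cheb = 1.0 / float(abs(cheby(n, x0)))                  # max of |scaled T_n| on [gamma1, gamma2]
    rich = ((gamma2 - gamma1) / (gamma2 + gamma1)) ** n    # max of |1 - tau_opt lam|^n
    cheb_bounds.append(cheb)
    rich_bounds.append(rich)
    print(f"n={n:2d}  Chebyshev: {cheb:.3e}   Richardson: {rich:.3e}")
```

For $n = 1$ both bounds coincide, since $\widetilde T_1(\lambda) = 1 - \tau_\text{opt} \lambda$; for larger $n$ the Chebyshev bound is smaller.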
