### Conjugate Gradient Method ###

* Solve $Qx = b$ for $x$
* A middle ground between the method of steepest descent (first order, uses only the gradient) and Newton's method (second order, uses the Hessian)
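To make the "middle ground" idea concrete, here is a small sketch comparing the three methods on a made-up test problem ($Q = \mathrm{diag}(1, \dots, 10)$, $b$ all ones, chosen only for illustration). `steepest_descent` and `cg_solve` are hypothetical helper names, not from these notes. Newton uses the Hessian $Q$ and lands on the minimizer of a quadratic in one step. Steepest descent uses only gradients and needs many iterations. CG also uses only gradients and products with $Q$, but should finish in at most $n = 10$ iterations, up to rounding.

```python
import numpy as np

def steepest_descent(Q, b, tol=1e-8, max_iter=10_000):
    """Steepest descent with exact line search on f(x) = 0.5 x^T Q x - b^T x."""
    x = np.zeros_like(b)
    for k in range(max_iter):
        g = Q @ x - b                     # gradient of f at x
        if np.linalg.norm(g) < tol:
            return x, k
        alpha = (g @ g) / (g @ Q @ g)     # exact minimizer of f along -g
        x = x - alpha * g
    return x, max_iter

def cg_solve(Q, b, tol=1e-8, max_iter=10_000):
    """Conjugate gradient in residual form (r = b - Qx = -g), counting iterations."""
    x = np.zeros_like(b)
    r = b - Q @ x
    d = r.copy()
    for k in range(max_iter):
        if np.linalg.norm(r) < tol:
            return x, k
        Qd = Q @ d
        alpha = (r @ r) / (d @ Qd)
        x = x + alpha * d
        r_new = r - alpha * Qd
        d = r_new + (r_new @ r_new) / (r @ r) * d
        r = r_new
    return x, max_iter

Q = np.diag(np.arange(1.0, 11.0))        # SPD, eigenvalues 1, 2, ..., 10
b = np.ones(10)
x0 = np.zeros(10)

# Newton: on a quadratic, one step x1 = x0 - Q^{-1} g(x0) reaches the minimizer
x_newton = x0 - np.linalg.solve(Q, Q @ x0 - b)
x_sd, sd_iters = steepest_descent(Q, b)
x_cg, cg_iters = cg_solve(Q, b)
print("steepest descent iterations:", sd_iters)
print("conjugate gradient iterations:", cg_iters)
```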

#### Conditions ####

* $Q$ is symmetric positive-definite
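One way to check this condition numerically, sketched under the assumption that a dense NumPy array is available: test symmetry directly, then attempt a Cholesky factorization, which `np.linalg.cholesky` only completes for positive-definite input (it raises `np.linalg.LinAlgError` otherwise). The helper name `is_spd` and its tolerance are illustrative choices.

```python
import numpy as np

def is_spd(Q, sym_tol=1e-12):
    """Return True if Q is symmetric and positive-definite."""
    if not np.allclose(Q, Q.T, atol=sym_tol):
        return False          # not symmetric
    try:
        np.linalg.cholesky(Q)  # fails unless Q is positive-definite
        return True
    except np.linalg.LinAlgError:
        return False

print(is_spd(np.array([[2.0, -1.0], [-1.0, 2.0]])))  # True  (eigenvalues 1 and 3)
print(is_spd(np.array([[1.0, 2.0], [2.0, 1.0]])))    # False (eigenvalues 3 and -1)
```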

#### Conjugation ####

* A set of nonzero vectors $\{d_1, d_2, \dots, d_k\}$ is conjugate (also called $Q$-orthogonal) with respect to $Q$ if

$$
d_i^{T} Q d_j = 0 \quad \forall\, i \ne j
$$

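* If a set of nonzero vectors is $Q$-orthogonal, the vectors are also linearly independent: multiplying $\sum_j c_j d_j = 0$ by $d_i^T Q$ leaves $c_i \, d_i^T Q d_i = 0$, and $d_i^T Q d_i > 0$, so every $c_i = 0$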
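A quick numerical illustration, assuming a random SPD test matrix: Gram-Schmidt carried out in the $Q$ inner product $\langle u, v \rangle_Q = u^T Q v$ turns any linearly independent vectors into a $Q$-conjugate set. The helper `q_conjugate_directions` is a hypothetical name for this sketch; the check confirms that $d_i^T Q d_j \approx 0$ off the diagonal and that the directions have full rank.

```python
import numpy as np

def q_conjugate_directions(Q, V):
    """Make the linearly independent columns of V Q-conjugate
    via (modified) Gram-Schmidt in the Q inner product u^T Q v."""
    D = []
    for v in V.T:
        d = v.astype(float).copy()
        for p in D:
            d -= (p @ Q @ d) / (p @ Q @ p) * p   # remove the Q-projection onto p
        D.append(d)
    return np.column_stack(D)

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
Q = M @ M.T + 4 * np.eye(4)          # SPD by construction
D = q_conjugate_directions(Q, np.eye(4))

G = D.T @ Q @ D                      # G[i, j] = d_i^T Q d_j
print(np.round(G, 10))               # off-diagonal entries are ~0
print(np.linalg.matrix_rank(D))      # expected: 4, the directions are independent
```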

#### Optimization Problem ####

Goal: $\min_{x \in \mathbb{R}^n} \frac{1}{2} x^T Q x - b^T x$

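Because $Q$ is symmetric positive-definite, the objective is strictly convex with gradient $Qx - b$. Setting the gradient to zero shows that the unique minimizer is also the unique solution to $Qx = b$.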
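A small numerical check of this claim, on an assumed random SPD test problem: at $x^* = Q^{-1}b$ the gradient vanishes, and for any perturbation $h$ the objective increases by exactly $\frac{1}{2} h^T Q h > 0$ (expand $f(x^* + h)$ and use $Qx^* = b$).

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((5, 5))
Q = M @ M.T + 5 * np.eye(5)        # SPD test matrix
b = rng.standard_normal(5)

def f(x):
    """Quadratic objective 0.5 x^T Q x - b^T x."""
    return 0.5 * x @ Q @ x - b @ x

x_star = np.linalg.solve(Q, b)     # solution of Qx = b
grad = Q @ x_star - b              # gradient of f at x_star (should be ~0)

# f(x_star + h) - f(x_star) = 0.5 h^T Q h, positive for every h != 0
perturbations = rng.standard_normal((100, 5))
increases = [f(x_star + h) - f(x_star) for h in perturbations]
print(np.linalg.norm(grad), min(increases))
```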

Let $x^{*}$ denote the solution, and let $\{d_0, d_1, \dots, d_{n-1}\}$ be $Q$-conjugate. These $n$ vectors are linearly independent, so they form a basis of $\mathbb{R}^n$ and

$x^{*} = \alpha_{0} d_{0} + ... + \alpha_{n-1} d_{n-1}$

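Multiplying both sides by $d_{i}^T Q$, conjugacy removes every term except the $i$-th:

$d_{i}^T Q x^{*} = d_{i}^T Q(\alpha_{0} d_{0} + \dots + \alpha_{n-1} d_{n-1}) = \alpha_{i} d_{i}^T Q d_{i}$

Since $Q x^{*} = b$,

$\alpha_{i} = \dfrac{d_{i}^T Q x^{*}}{d_{i}^T Q d_{i}} = \dfrac{d_{i}^T b}{d_{i}^T Q d_{i}}$, so

$x^{*} = \sum_{i=0}^{n-1} \dfrac{d_{i}^T b}{d_{i}^T Q d_{i}} d_i$

This shows that $x^{*}$ can be computed without inverting $Q$: only products with $Q$ and inner products are needed.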
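The expansion can be checked directly. This sketch assumes one convenient $Q$-conjugate set: the eigenvectors of a symmetric $Q$ (from `np.linalg.eigh`) are orthogonal, so $d_i^T Q d_j = \lambda_j d_i^T d_j = 0$ for $i \ne j$. Any other conjugate set would work the same way.

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((6, 6))
Q = M @ M.T + 6 * np.eye(6)        # SPD test matrix
b = rng.standard_normal(6)

# Columns of D are orthonormal eigenvectors of Q, hence Q-conjugate
_, D = np.linalg.eigh(Q)

# x* = sum_i (d_i^T b / d_i^T Q d_i) d_i, with no inverse of Q formed
x_star = sum((d @ b) / (d @ Q @ d) * d for d in D.T)
print(np.linalg.norm(Q @ x_star - b))   # residual, should be ~0
```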

#### Conjugate Direction Theorem ####

Let $\{d_0, d_1, \dots, d_{n-1}\}$ be $Q$-conjugate and let $x_{0}$ be an arbitrary starting point.

The update rule is

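$x_{k+1} = x_{k} + \alpha_{k} d_{k}$, where

* $g_{k} = Qx_{k} - b$ is the gradient at $x_{k}$, and
* $\alpha_{k} = - \dfrac{g_{k}^T d_{k}}{d_{k}^T Q d_{k}} = - \dfrac{(Qx_{k} - b)^T d_{k}}{d_{k}^T Q d_{k}}$ is the exact minimizing step along $d_{k}$.

Then, for any $x_{0}$, the iterates reach the solution in at most $n$ steps: $x_{n} = x^{*}$.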
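A minimal sketch of the theorem, assuming a random SPD $Q$ and conjugate directions built by Gram-Schmidt in the $Q$ inner product from the standard basis. `conjugate_direction_method` is a hypothetical helper name. Starting from a random $x_0$, exactly $n$ updates reproduce the solution of $Qx = b$.

```python
import numpy as np

def conjugate_direction_method(Q, b, D, x0):
    """Apply x_{k+1} = x_k + alpha_k d_k once for each Q-conjugate column of D."""
    x = x0.astype(float).copy()
    for d in D.T:
        g = Q @ x - b                    # gradient g_k
        alpha = -(g @ d) / (d @ Q @ d)   # exact step along d_k
        x = x + alpha * d
    return x

rng = np.random.default_rng(3)
n = 5
M = rng.standard_normal((n, n))
Q = M @ M.T + n * np.eye(n)            # SPD test matrix
b = rng.standard_normal(n)

# Q-conjugate directions from the standard basis (Gram-Schmidt in the Q inner product)
D = np.eye(n)
for i in range(n):
    for j in range(i):
        D[:, i] -= (D[:, j] @ Q @ D[:, i]) / (D[:, j] @ Q @ D[:, j]) * D[:, j]

x0 = rng.standard_normal(n)            # arbitrary starting point
x_n = conjugate_direction_method(Q, b, D, x0)
print(np.linalg.norm(x_n - np.linalg.solve(Q, b)))   # should be ~0
```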

#### Conjugate Gradient Method ####

The conjugate direction theorem gives the step size $\alpha_{k} = - \dfrac{g_{k}^T d_{k}}{d_{k}^T Q d_{k}} = - \dfrac{(Qx_{k} - b)^T d_{k}}{d_{k}^T Q d_{k}}$, but how should we choose the directions $d_0, \dots, d_{n-1}$?

In the conjugate gradient method they are chosen on the fly: each new direction is built from the current gradient and the previous direction.

Let $x_{0} \in \mathbb{R}^n$ be arbitrary. Set $g_{0} = Qx_{0} - b$ and $d_{0} = -g_{0} = b - Q x_{0}$, then for $k = 0, 1, \dots, n-1$:

* $\alpha_{k} = - \dfrac{g_{k}^T d_{k}}{d_{k}^T Q d_{k}}$
* $x_{k+1} = x_{k} + \alpha_{k} d_{k}$
* $g_{k+1} = Q x_{k+1} - b$
* $\beta_{k} = \dfrac{g_{k+1}^T Q d_{k}}{d_{k}^T Q d_{k}}$
* $d_{k+1} = - g_{k+1} + \beta_{k} d_{k}$

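$\beta_{k}$ is chosen so that $d_{k+1}$ is $Q$-conjugate to $d_{k}$: requiring $d_{k+1}^T Q d_{k} = -g_{k+1}^T Q d_{k} + \beta_{k} d_{k}^T Q d_{k} = 0$ and solving for $\beta_{k}$ gives the formula above. A standard result is that this single condition also makes $d_{k+1}$ conjugate to all earlier directions $d_0, \dots, d_{k-1}$, so the conjugate direction theorem applies and $x_{n} = x^{*}$.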
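A direct transcription of the $\alpha_k$ / $\beta_k$ recursion listed above, run on an assumed random SPD test problem. `conjugate_gradient_q_form` is a hypothetical name for this sketch, and it assumes no direction becomes zero before step $n$ (true almost surely for random data). Besides checking $x_n \approx x^*$, it collects the directions so the matrix $D^T Q D$ can be inspected: if every pair is $Q$-conjugate, it is diagonal.

```python
import numpy as np

def conjugate_gradient_q_form(Q, b, x0):
    """n steps of CG with beta_k = g_{k+1}^T Q d_k / d_k^T Q d_k."""
    n = b.shape[0]
    x = x0.astype(float).copy()
    g = Q @ x - b                          # g_0
    d = -g                                 # d_0 = -g_0
    directions = []
    for _ in range(n):
        Qd = Q @ d
        alpha = -(g @ d) / (d @ Qd)        # alpha_k
        x = x + alpha * d                  # x_{k+1}
        g = Q @ x - b                      # g_{k+1}
        directions.append(d)
        beta = (g @ Qd) / (d @ Qd)         # beta_k, makes d_{k+1} Q-conjugate to d_k
        d = -g + beta * d                  # d_{k+1}
    return x, np.column_stack(directions)

rng = np.random.default_rng(4)
n = 6
M = rng.standard_normal((n, n))
Q = M @ M.T + n * np.eye(n)                # SPD test matrix
b = rng.standard_normal(n)

x_n, D = conjugate_gradient_q_form(Q, b, np.zeros(n))
G = D.T @ Q @ D                            # ~diagonal if all directions are Q-conjugate
print(np.round(G, 8))
```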

#### Resources ####

1. http://www.cs.cmu.edu/~aarti/Class/10725_Fall17/Lecture_Slides/conjugate_direction_methods.pdf

```python
import numpy as np

tol = 1e-3
```

```python
def conjugate_gradient(A, b, tol, x0=None, max_iter=None):
    """Solve Ax = b for symmetric positive-definite A (residual form of CG)."""
    n = b.shape[0]
    x = np.zeros(n) if x0 is None else np.asarray(x0, dtype=float).copy()
    if max_iter is None:
        max_iter = 10 * n          # safety cap on the number of iterations
    r = b - A @ x                  # residual r = b - Ax = -g
    if np.linalg.norm(r) < tol:
        return x
    p = r.copy()                   # first direction: steepest descent
    for k in range(max_iter):
        Ap = A @ p
        # Parentheses matter: `/` and `@` have equal precedence and
        # associate left to right.
        alpha = (r @ r) / (p @ Ap)
        x = x + alpha * p
        r_old = r
        r = r_old - alpha * Ap
        if np.linalg.norm(r) < tol:
            return x
        beta = (r @ r) / (r_old @ r_old)
        p = r + beta * p
    return x
```

```python
matrixSize = 10
B = np.random.rand(matrixSize, matrixSize)
A = np.dot(B, B.T)   # B B^T is symmetric positive-definite when B is nonsingular

x = np.random.rand(matrixSize)
b = A @ x
```

```python
print("A: ", A)
print("x: ", x)
print("b: ", b)
```

```python
x_sol = conjugate_gradient(A, b, tol)
print("residual norm:", np.linalg.norm(b - A @ x_sol))
```
