In [1]:
import numpy as np

### Conjugate Gradient Method ###

* solve $Qx = b$ for $x$
* Medium ground between method of steepest descent (1st order, gradient) and Newton's method (2nd order, uses Hessian)

#### Conditions ####

* $Q$ is symmetric positive-definite

#### Conjugation ####

* The set of nonzero vectors $\{d_1, d_2,..., d_k\}$ are conjugate (also Q-orthogonal) with respect to $Q$ if

$\begin{equation}
d_i^{T} Q d_j = 0 \forall i \ne j
\end{equation}$

* If the set of vectors are Q-orthogonal, they are also linearly independent

#### Optimization Problem ####

Goal: $\min_{x \in \mathbb{R}^n} \frac{1}{2} x^T Q x - b^T x$

the unique solution to this problem is also the unique solution to $Qx = b$

Let $x^{*}$ denote the solution. Let $\{d_0, d_2,..., d_{n-1}\}$ be $Q$-conjugate. They are therefore a basis of the space, so

$x^{*} = \alpha_{0} d_{0} + ... + \alpha_{n-1} d_{n-1}$

and

$d_{i}^T Q x^{*} = d_{i}^T Q(\alpha_{0} d_{0} + ... + \alpha_{n-1} d_{n-1}) = \alpha_{i} d_{i}^TQd_{i}^T$

and

$\alpha_{i} = \dfrac{d_{i}^T Q x^{*}}{d_{i}^TQd_{i}^T}$, so

$x^{*} = \sum_{i=0}^{n-1} \dfrac{d_{i}^T b}{d_{i}^TQd_{i}^T} d_i$

Showing that we don't need to matrix invert $Q$ to solve for $x$.

#### Conjugate Direction Theorem ####

Let $\{d_0, d_2,..., d_{n-1}\}$ be $Q$-conjugate and $x_{0}$ an arbitrary starting point.

The update rule is

$x_{k+1} = x_{k} + \alpha_{k} d_{k}$ where
$g_{k} = Qx_{k} - b$ (gradient), and
$a_{k} = - \dfrac{g_{k}^T d_{k}}{d_{k}^T Q d_{k}} = - \dfrac{(Qx_{k} - b)^T d_{k}}{d_{k}^T Q d_{k}}$

After $n$ steps, $x_{n} = x^{*}$

#### Conjugate Gradient Method ####

We have the update rule $a_{k} = - \dfrac{g_{k}^T d_{k}}{d_{k}^T Q d_{k}} = - \dfrac{(Qx_{k} - b)^T d_{k}}{d_{k}^T Q d_{k}}$, but how should we choose the vectors $d_0,...,d_{n-1}$?

They are chosen on-the-fly, at each step of the algorithm.

Let $x_{i} \in \mathbb{R}^n$ be arbitrary.

$d_{0} = -g_{0} = b - Q x_{0}$
$\alpha_{k} = - \dfrac{g_{k}^T d_{k}}{d_{k}^T Q d_{k}}$
$x_{k+1} = x_{k} + \alpha_{k}d_{k}$
$g_{k} = Qx_{k} - b$
$d_{k+1} = - g_{k+1} + \Beta_{k} d_{k}$
$\Beta_{k} = \dfrac{g_{k+1}^T Q d_{k}}{d_{k}^T Q d_{k}}$

What's $\Beta_{}$

#### Resources ####

1. http://www.cs.cmu.edu/~aarti/Class/10725_Fall17/Lecture_Slides/conjugate_direction_methods.pdf

In [2]:
tol = 1e-3

In [3]:
def conjugate_gradient(A,b,tol):
    x = np.random.rand(b.shape[0]) # unif in [0,1)
    r = b - A @x    # calculate residual
    if np.linalg.norm(r) < tol:
        return x
    p = r
    k = 0
    
    while(True):
        print("k: ", k)
        alpha = np.dot(r,r) / p.T @ A @ p
        x = x + alpha * p
        r_old = r
        r = r_old - alpha * A @ p
        print("r: ", r)
        if np.linalg.norm(r) < tol:
            return x
        Beta = r.T @ r / r_old.T @ r_old
        p = r + Beta * p
        k = k + 1
        if k > 10 * b.shape[0]:
            return x

    

In [4]:
# Debugging on this

A = np.asarray([[1,0],[0,1]]) # A is 2x2 identity
b = np.asarray([4,4])
x_sol = conjugate_gradient(A,b,tol)
# x should equal 4

# why are we getting an exploding residual?

k:  0
r:  [-140.80219343 -165.18722061]
k:  1
r:  [-2.76593028e+10 -3.24495183e+10]
k:  2
r:  [-3.88088431e+48 -4.55300075e+48]
k:  3
r:  [-5.46908175e+222 -6.41625241e+222]
k:  4
r:  [nan nan]
k:  5
r:  [nan nan]
k:  6
r:  [nan nan]
k:  7
r:  [nan nan]
k:  8
r:  [nan nan]
k:  9
r:  [nan nan]
k:  10
r:  [nan nan]
k:  11
r:  [nan nan]
k:  12
r:  [nan nan]
k:  13
r:  [nan nan]
k:  14
r:  [nan nan]
k:  15
r:  [nan nan]
k:  16
r:  [nan nan]
k:  17
r:  [nan nan]
k:  18
r:  [nan nan]
k:  19
r:  [nan nan]
k:  20
r:  [nan nan]


  # This is added back by InteractiveShellApp.init_path()


In [5]:
import numpy as np
matrixSize = 10 
B = np.random.rand(matrixSize, matrixSize)
A = np.dot(B, B.T)

x = np.random.rand(matrixSize)
b = A @ x

In [6]:
print("A: ", A)
print("x: ", x)
print("b: ", b)


A:  [[2.92923027 1.79942045 1.7990432  2.21082261 1.81318821 3.00253346
  1.73840723 2.68846298 2.4763917  1.99620996]
 [1.79942045 2.14068765 1.68653824 2.03639392 2.09705433 2.59188882
  1.60035568 2.19718161 1.36557642 1.39967597]
 [1.7990432  1.68653824 1.95342304 2.07749373 1.83606085 2.29824527
  1.32283344 1.9218278  1.44961448 1.449828  ]
 [2.21082261 2.03639392 2.07749373 3.02463456 2.67421759 3.36292907
  1.87365092 2.54837524 1.74108754 1.6291054 ]
 [1.81318821 2.09705433 1.83606085 2.67421759 3.40025996 3.55411746
  2.02692052 2.66021019 1.98071239 2.25335086]
 [3.00253346 2.59188882 2.29824527 3.36292907 3.55411746 4.49662137
  2.70590331 3.6225019  2.94692459 2.68421054]
 [1.73840723 1.60035568 1.32283344 1.87365092 2.02692052 2.70590331
  2.22406841 2.00394681 1.97147774 1.62767979]
 [2.68846298 2.19718161 1.9218278  2.54837524 2.66021019 3.6225019
  2.00394681 3.721016   2.59484615 2.29282893]
 [2.4763917  1.36557642 1.44961448 1.74108754 1.98071239 2.94692459
  1.97147

In [7]:
x_sol = conjugate_gradient(A,b,tol)

k:  0
r:  [-5647.99922834 -5435.1845743  -5009.73298217 -7106.31258846
 -7048.7528301  -8678.16310124 -4977.49532107 -6992.17626586
 -4852.4773191  -4720.86679655]
k:  1
r:  [-2.31369039e+21 -2.22653824e+21 -2.05225071e+21 -2.91148052e+21
 -2.88768400e+21 -3.55519876e+21 -2.03896874e+21 -2.86438524e+21
 -1.98760990e+21 -1.93367468e+21]
k:  2
r:  [-2.48614966e+101 -2.39250130e+101 -2.20522263e+101 -3.12849824e+101
 -3.10292796e+101 -3.82019833e+101 -2.19095064e+101 -3.07789254e+101
 -2.13576358e+101 -2.07780810e+101]
k:  3
r:  [nan nan nan nan nan nan nan nan nan nan]
k:  4
r:  [nan nan nan nan nan nan nan nan nan nan]
k:  5
r:  [nan nan nan nan nan nan nan nan nan nan]
k:  6
r:  [nan nan nan nan nan nan nan nan nan nan]
k:  7
r:  [nan nan nan nan nan nan nan nan nan nan]
k:  8
r:  [nan nan nan nan nan nan nan nan nan nan]
k:  9
r:  [nan nan nan nan nan nan nan nan nan nan]
k:  10
r:  [nan nan nan nan nan nan nan nan nan nan]
k:  11
r:  [nan nan nan nan nan nan nan nan nan nan]
k:  12
r

  if sys.path[0] == '':
  
  
