# Gradient Descent for Linear Systems

## Gradient Descent for a Linear System

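Here we minimize the mean squared error of $Ax=b$. That is, we seek the $x$ that minimizes
$$ MSE = \frac{1}{n} \sum_{i=1}^n \left ( b_i - A_i x \right )^2 $$
where $A_i$ is the $i$-th row of $A$. The gradient of $MSE$ can be written
$$ \nabla MSE = -\frac{2}{n} A'(b-Ax)$$
so $A'(b-Ax)$ points in the direction of steepest descent. Absorbing the constant $2/n$ into the step size $\alpha$, each iteration of gradient descent takes the current iterate $x^c$ to the next iterate $x^+$:
$$x^+ = x^c + \alpha A'(b-Ax^c)$$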
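
As a sketch of where the gradient comes from, write the residual as $r = b - Ax$ (a shorthand introduced here, not used elsewhere in the notebook) and expand the sum in matrix form:
$$ MSE = \frac{1}{n} r'r = \frac{1}{n}\left( b'b - 2x'A'b + x'A'Ax \right) $$
$$ \nabla MSE = \frac{1}{n}\left( -2A'b + 2A'Ax \right) = -\frac{2}{n} A'(b-Ax) $$
The sign is the part that matters for the iteration: a descent step adds a positive multiple of $A'(b-Ax)$ to the current iterate.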

In [27]:
# One gradient descent step for Ax = b: move from xc along A'(b - A*xc) with step size alpha
function grlinearupdate(xc, A, b, alpha)
    x = xc + alpha*A'*(b-A*xc)
    return x
end

grlinearupdate (generic function with 4 methods)
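
The direction used in the update can be sanity-checked numerically. The following is a minimal sketch, not part of the original notebook: `mse` and `fd_gradient` are hypothetical helper names, and the small matrix, right-hand side, and test point are made-up illustration values. It compares a central finite-difference gradient of the mean squared error with the closed form $-\frac{2}{n}A'(b-Ax)$.

```julia
using LinearAlgebra

# Mean squared error of the residual b - A*x (hypothetical helper)
mse(A, b, x) = sum(abs2, b - A*x) / length(b)

# Central finite-difference approximation of the gradient of f at x
# (hypothetical helper; h is the perturbation size)
function fd_gradient(f, x; h=1e-6)
    g = zeros(length(x))
    for i in eachindex(x)
        e = zeros(length(x))
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2h)
    end
    return g
end

# Illustrative data, chosen only for this check
A = [1.0 2.0; 3.0 4.0; 5.0 6.0]
b = [1.0, 0.0, -1.0]
x = [0.5, -0.25]
n = length(b)

g_fd = fd_gradient(z -> mse(A, b, z), x)   # numerical gradient
g_formula = -(2 / n) * A' * (b - A*x)      # closed-form gradient

println(isapprox(g_fd, g_formula; rtol=1e-6))
```

Because the objective is quadratic, the central difference should agree with the closed form up to rounding error, so the two vectors are expected to match closely.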

In [69]:
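function grl_fixed(A, b, x0; alpha=0.01, eps=10e-8, maxiter=25)
    delta = 1
    xc = copy(x0)
    check = norm(A*xc-b)          # residual norm at the starting point
    iter = 0
    # Stop once the residual norm decreases by less than eps, or after maxiter steps
    while delta>eps && iter<maxiter
        xc = grlinearupdate(xc, A, b, alpha)
#        println(xc)
        checkp = norm(A*xc-b)
        delta = check - checkp
        check = checkp
        iter += 1
    end
    # xc holds the final iterate (a name first assigned inside the loop would not be visible here)
    return xc, check, iter
end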

grl_fixed (generic function with 4 methods)

In [61]:
x = [0,0.0]   # initial guess
alpha = 0.01    # fixed step size

0.01

In [62]:
include("../jl/gen_eigm.jl")
using LinearAlgebra

A = gen_eigm([1,2])

2×2 Array{Float64,2}:
  1.05392   -0.225857
 -0.225857   1.94608 

In [63]:
b = rand(2)
norm(A*x-b)

0.5620989829778662

In [64]:
xp = grlinearupdate(x, A, b, alpha)

2-element Array{Float64,1}:
 0.0024929809633145745
 0.008052144176404423 

In [71]:
grl_fixed(A,b,x; alpha=0.5, maxiter=100)

([0.00249298, 0.00805214], 0.36265896079057564, 12)
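
In the run above, the loop stops after 12 iterations with a residual norm of about 0.36, far from zero. One likely explanation: for the update $x^+ = x^c + \alpha A'(b-Ax^c)$, the error $e = x - x^*$ evolves as $e^+ = (I - \alpha A'A)e$, which shrinks in every direction only when $0 < \alpha < 2/\lambda_{max}(A'A)$. The sketch below estimates that bound. It assumes the rounded entries of `A` printed above, so the result is only approximate.

```julia
using LinearAlgebra

# Rounded entries of A as printed in the notebook output (assumption: close
# enough to the actual matrix for an estimate)
A = [1.05392 -0.225857; -0.225857 1.94608]

# opnorm(A) is the largest singular value of A, so its square is the
# largest eigenvalue of A'A
lambda_max = opnorm(A)^2

# Fixed step sizes below this value make every error component shrink
alpha_max = 2 / lambda_max

println(alpha_max)
```

For this matrix the bound comes out very close to 0.5, so `alpha=0.5` sits right at the edge: the error component along the dominant eigenvector is multiplied by roughly $-1$ at each step and never decays, which is consistent with the stalled residual.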

## Pick a better alpha

In [72]:
alpha = norm(A'*(A*x-b))^2/norm(A*A'*(A*x-b))^2

0.31043632066076726
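
The expression for `alpha` above matches the exact line-search step for the squared residual norm. As a sketch, with $r = Ax - b$ and $g = A'r$ as shorthand (symbols introduced here for the derivation), the update $x^+ = x - \alpha g$ gives
$$ \| A x^+ - b \|^2 = \| r - \alpha A g \|^2 = \|r\|^2 - 2\alpha\, g'g + \alpha^2 \|Ag\|^2 $$
using $(Ag)'r = g'A'r = g'g$. Setting the derivative with respect to $\alpha$ to zero yields
$$ \alpha = \frac{\|g\|^2}{\|Ag\|^2} = \frac{\|A'(Ax-b)\|^2}{\|AA'(Ax-b)\|^2} $$
Note that in this cell the value is computed once, at the initial guess, and then reused as a fixed step size.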

In [73]:
grl_fixed(A,b,x; alpha=alpha, maxiter=100)

([0.00249298, 0.00805214], 2.1743074721612347e-7, 39)

In [81]:
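function grl_1(A, b, x0; eps=10e-8, maxiter=25)
    delta = 1
    xc = copy(x0)
    check = norm(A*xc-b)          # residual norm at the starting point
    iter = 0
    while delta>eps && iter<maxiter
        rc = A*xc-b
        # Step size recomputed from the current residual at every iteration
        alpha = norm(A'*rc)^2/norm(A*A'*rc)^2
        xc = grlinearupdate(xc, A, b, alpha)
#        println(xc)
        checkp = norm(A*xc-b)
        delta = check - checkp
        check = checkp
        iter += 1
    end
    # xc holds the final iterate (a name first assigned inside the loop would not be visible here)
    return xc, check, iter
end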

grl_1 (generic function with 1 method)

In [82]:
x = [0,0]
grl_1(A,b,x; maxiter=100)

([0.00249298, 0.00805214], 9.72453152041342e-8, 26)

In [83]:
x = [1,1]
grl_1(A,b,x; maxiter=100)

([0.00249298, 0.00805214], 4.180010425352371e-8, 20)
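
One way to check an iterate returned by a routine like `grl_1` is to compare it with a direct solve. The sketch below is self-contained and makes several assumptions: `steepest_descent_ls` is a hypothetical stand-in for `grl_1`, it stops on the gradient norm rather than on the decrease in residual, the matrix uses the rounded entries printed earlier, and the right-hand side is an illustrative value rather than the random `b` from the notebook.

```julia
using LinearAlgebra

# Hypothetical helper: gradient descent on ||A*x - b||^2 with the exact
# line-search step recomputed at every iteration.
function steepest_descent_ls(A, b, x0; tol=1e-10, maxiter=1000)
    x = float.(x0)                  # floating-point copy of the initial guess
    iter = 0
    while iter < maxiter
        g = A' * (A*x - b)          # the step moves along -g
        norm(g) <= tol && break     # stop at a stationary point
        Ag = A * g
        alpha = dot(g, g) / dot(Ag, Ag)
        x -= alpha * g
        iter += 1
    end
    return x, norm(A*x - b), iter
end

A = [1.05392 -0.225857; -0.225857 1.94608]   # rounded entries printed above
b = [1.0, 2.0]                               # illustrative right-hand side

x_gd, res, iters = steepest_descent_ls(A, b, [0, 0])
x_direct = A \ b                             # direct solve for comparison

println(isapprox(x_gd, x_direct; atol=1e-8))
```

If the iteration has converged, the iterate and the direct solution should agree to within the tolerance, and the residual norm should be close to zero.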