In [17]:
using PyPlot

Qt: Untested Windows version 10.0 detected!


# Problem 1
Develop a Newton's method to approximate $e^{-1}$ that does not involve division. Start at $x_0 = 0.3$ and run this thing.

## Solution: 
Using $f(x) = \log(\frac{x}{d})$, we obtain the iterative method: 

$$x^{k+1} = x^k - (\log(x^k) - \log(d)) x^k$$

Now, we run it.

I don't see any significant differences between the two $x_0$ runs. One takes longer than the other?

In [22]:
x0 = 0.3
x = Float64[]
push!(x,x0)
push!(x, x[end] - x[end]*(log(x[end]) - log(d)))
d = e
eps = 1.0e-8
its = 0

while abs(x[end] - x[end-1]) > eps
    push!(x, x[end] - x[end]*(log(x[end]) - log(d)))
    its += 1
end

@printf "The approximation: %10f\n" x[end]
@printf "The error: %10f\n" abs(x[end] - e)
@printf "Iterations: %5d\n" its

The approximation:   2.718282
The error:   0.000000
Iterations:     6


In [23]:
x0 = 1.0
x = Float64[]
push!(x,x0)
push!(x, x[end] - x[end]*(log(x[end]) - log(d)))
d = e
eps = 1.0e-8
its = 0

while abs(x[end] - x[end-1]) > eps
    push!(x, x[end] - x[end]*(log(x[end]) - log(d)))
    its += 1
end

@printf "The approximation: %10f\n" x[end]
@printf "The error: %10f\n" abs(x[end] - e)
@printf "Iterations: %5d\n" its

The approximation:   2.718282
The error:   0.000000
Iterations:     5


# Problem 2
a) Newton's method on $f(x) = x^q$. What's the convergence ratio there?

b) Apply Newton's method to minimize $f(x) = \| x \|_2^\beta$. For what $x_0$ and $\beta$ does it converge? What happens when $\beta \leq 1$?

c) Repeat part b, but with Armijo linesearch. 

## Solution
a) If we apply the straight root-finding method, then we have this iterative method:

$$x^{k+1} = x^k - \frac{f(x)}{f'(x)} = x^k - \frac{(x^k)^q}{q(x^k)^{q-1}}$$

$$= x^k - \frac{1}{q}x^k = x^k (1 - \frac{1}{q})$$

Therefore the method is Q-linear, with $\mu = 1 - \frac{1}{q}$. 

If we apply the method instead to the derivative, then through the same procedure we obtain $\mu = 1 - \frac{1}{q-1}$.


b) The method is implemented in the next code box. The following values of $\beta$ don't converge: 2,4. It's not entirely clear to me which x-values converged and which didn't.

In [56]:
function f(x,b)
    return norm(x,2)^b
end

function g(x,b)
    return x*b*norm(x,2)^(b-2)
end

function H(x,b)
    return b*((b-2)*norm(x,2)^(b-4)*x*x' + norm(x,2)^(b-2)*ones(size(x))*ones(size(x))')
end

H (generic function with 1 method)

In [142]:
# Loop to test different \beta values
for b = 2:8
    # Loop to try different x_0 points
    for j = 1:10
        x0 = 10*randn(5,);
        x = x0;

        # Method
        for i = 1:20
            # Modify the Hessian to be PD
            A = H(x,b);
            E = eigfact(A);
            V = E[:vectors];
            lambda = diagm(max(real(E[:values]),1e-2));
            d = -V*inv(lambda)*V'*g(x,b)
            
            # Iterate
            x = x + d;
        end
        
        @printf "Iteration: %2d | Beta: %2.2f | Starting norm: %4.4f | x-norm: %4.4f\n\n" j b norm(x0,2) norm(x,2)
    end
end

Iteration:  1 | Beta: 2.00 | Starting norm: 23.6898 | x-norm: 216490987487800151536161307619514856200384544768.0000

Iteration:  2 | Beta: 2.00 | Starting norm: 27.4912 | x-norm: 247190634509125503673258390417833169621587329024.0000

Iteration:  3 | Beta: 2.00 | Starting norm: 16.9152 | x-norm: 134675330029214103550768570323400256974175076352.0000

Iteration:  4 | Beta: 2.00 | Starting norm: 30.9686 | x-norm: 292439090084129146604804161847544770923002331136.0000

Iteration:  5 | Beta: 2.00 | Starting norm: 29.1641 | x-norm: 265845187141827010973203145132181853286059474944.0000

Iteration:  6 | Beta: 2.00 | Starting norm: 9.7269 | x-norm: 87300760469436022203221086922375698265118081024.0000

Iteration:  7 | Beta: 2.00 | Starting norm: 25.0673 | x-norm: 235666402632398449102159876263025313607613153280.0000

Iteration:  8 | Beta: 2.00 | Starting norm: 7.9419 | x-norm: 72229727949269831570033102139016722091570364416.0000

Iteration:  9 | Beta: 2.00 | Starting norm: 16.3832 | x-norm: 138925

LoadError: LoadError: ArgumentError: invalid argument #5 to LAPACK call
while loading In[142], in expression starting on line 1

 | Beta: 6.00 | Starting norm: 19.7790 | x-norm: 31598884892383528374234970587136.0000

Iteration:  1 | Beta: 7.00 | Starting norm: 22.6225 | x-norm: 0.6045



c) The Armijo linesearch is implemented in the following code box. I randomly sampled points in a radius-10 ball from the origin and tried all values of $\beta$ from 2 to 8. They all reduced the norm of the point! This is a significant improvement from part b.

In [146]:
# Loop to test different \beta values
for b = 2:8
    # Loop to try different x_0 points
    for j = 1:10
        x0 = 10*randn(5,);
        x = x0;

        # Method
        for i = 1:60
            # Modify the Hessian to be PD
            A = H(x,b);
            E = eigfact(A);
            V = real(E[:vectors]);
            lambda = diagm(max(real(E[:values]),1e-2));
            d = -V*inv(lambda)*V'*g(x,b)
            
            #Backtracking linesearch
            alpha = 1;
            mu = 10^-2.0;
            newf = f(x + alpha*d,b);
            while newf > f(x,b) + (alpha*mu)*(dot(g(x,b),d))
                newf = f(x + alpha*d,b);
                alpha = alpha/2;
            end
            
            # Iterate
            x = x + alpha * d;
        end
        
        @printf "Iteration: %2d | Beta: %2.2f | Starting norm: %4.4f | x-norm: %4.4f\n\n" j b norm(x0,2) norm(x,2)
    end
end

Iteration:  1 | Beta: 2.00 | Starting norm: 16.5399 | x-norm: 9.4952

Iteration:  2 | Beta: 2.00 | Starting norm: 17.3253 | x-norm: 5.2006

Iteration:  3 | Beta: 2.00 | Starting norm: 15.8226 | x-norm: 3.2345

Iteration:  4 | Beta: 2.00 | Starting norm: 11.2295 | x-norm: 0.1337

Iteration:  5 | Beta: 2.00 | Starting norm: 19.3075 | x-norm: 4.8249

Iteration:  6 | Beta: 2.00 | Starting norm: 27.2600 | x-norm: 5.6989

Iteration:  7 | Beta: 2.00 | Starting norm: 18.1808 | x-norm: 3.7263

Iteration:  8 | Beta: 2.00 | Starting norm: 25.6922 | x-norm: 7.0310

Iteration:  9 | Beta: 2.00 | Starting norm: 18.0183 | x-norm: 12.7897

Iteration: 10 | Beta: 2.00 | Starting norm: 14.0150 | x-norm: 0.0927

Iteration:  1 | Beta: 3.00 | Starting norm: 14.9877 | x-norm: 0.2317

Iteration:  2 | Beta: 3.00 | Starting norm: 29.2475 | x-norm: 10.5763

Iteration:  3 | Beta: 3.00 | Starting norm: 12.6303 | x-norm: 0.7693

Iteration:  4 | Beta: 3.00 | Starting norm: 26.7673 | x-norm: 14.3380

Iteration:  5 | B

# Problem 3
Suppose $D^0$ is symmetric and that $D^k$ is updated according to the formula 

$$D^{k+1} = D^k + \frac{y^k y^{k'}}{q^{k'} y^{k}}$$

where $y^k = p^k - D^k q^k$. Show that we have

$$D^{k+1} q^i = p^i$$ 

for all $k$ and $i \leq k$. 

Conclude that for a positive definite quadratic problem, after $n$ steps for which $n$ linearly independent increments $q^0, ..., q^{n-1}$ are obtained, $D^n$ is equal to the inverse Hessian of the cost function.

## Solution
Don't have this one yet. An update is coming!

# Problem 4
Add BFGS quasi-Newton option to the previous Newton method and compare it with the regular Newton on Toms566 problems.

## Solution
Don't have one yet. Will do one soon.