
Computing the inverse of Hessian in Newton's method #832

Closed

qzhu2017 opened this issue Jul 10, 2020 · 4 comments

Comments
@qzhu2017

I am new to Julia. Recently, I tried to use the Optim package to minimize a Rosenbrock-type function with Newton's method. The code is as follows:

using Optim

f_rb(x) = (1 - x[1])^2 + 100*(4x[2] - x[1]^2)^2

function g!(storage, x)
    storage[1] = (2*(200x[1]^3 - 800x[1]*x[2] + x[1] - 1))
    storage[2] = (-800*(x[1]^2 - 4x[2]))
end

function h!(storage, x)
    storage[1, 1] = 1200x[1]^2 - 1600x[2] + 2
    storage[1, 2] = -1600x[1]
    storage[2, 1] = -1600x[1]
    storage[2, 2] = 3200
end

x0 = [-1.0, 1.5]
res = optimize(f_rb, g!, h!, x0, Newton(),
               Optim.Options(iterations = 30,
                             store_trace = true,
                             extended_trace = true))

However, the path from Optim (green) looks different from the one I obtained with my own code (orange).
[figure: optimization paths on f_rb, Optim's Newton (green) vs. hand-written Newton (orange)]

Clearly, the Newton solver from Optim chose a different direction at the first step. By checking the code, I realized that it uses a Cholesky factorization by default to compute the inverse Hessian, i.e. the Newton step:

if typeof(NLSolversBase.hessian(d)) <: AbstractSparseMatrix
    state.s .= -NLSolversBase.hessian(d)\convert(Vector{T}, gradient(d))
else
    state.F = cholesky!(Positive, NLSolversBase.hessian(d))
    if typeof(gradient(d)) <: Array
        # is this actually StridedArray?
        ldiv!(state.s, state.F, -gradient(d))
    else
        # not Array, we can't do inplace ldiv
        gv = Vector{T}(undef, length(gradient(d)))
        copyto!(gv, -gradient(d))
        copyto!(state.s, state.F\gv)
    end
end

For a simple problem like a 2×2 Hessian, do we really need the Cholesky factorization? Perhaps it would be better to give the user the option to compute H^-1 directly when H is small.
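
A minimal sketch of the difference (my own snippet, reusing the g! and h! above and the PositiveFactorizations package that Optim uses for the modification):

using LinearAlgebra, PositiveFactorizations

x0 = [-1.0, 1.5]
g = zeros(2);    g!(g, x0)       # gradient at the starting point
H = zeros(2, 2); h!(H, x0)       # Hessian at the starting point

s_raw  = -(H \ g)                # unmodified Newton step
F      = cholesky(Positive, H)   # positive-definite modification of H
s_safe = -(F \ g)                # the step Optim takes for dense H

The two steps differ exactly when H is not positive-definite, which is what happens at this starting point.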

@timholy
Contributor

timholy commented Jul 11, 2020

Is H positive-definite at the starting point? Newton's method is not guaranteed to converge unless you modify the Hessian to make it positive-definite. That's what cholesky!(Positive, H) does.

That said, not very many people have looked at how it works, and there is surely room for improvement.
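
A quick way to check (my own snippet, reusing the h! from the original post):

using LinearAlgebra

H = zeros(2, 2)
h!(H, [-1.0, 1.5])
isposdef(H)    # false
eigvals(H)     # one negative and one positive eigenvalue, so H is indefinite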

@qzhu2017
Author

@timholy You are right. H is not positive-definite at the starting point in my previous post.
If I change the starting point, the result matches my hand-written code.

[figure: optimization paths from the new starting point; Optim and the hand-written Newton now agree]

But it seems that, in this case, skipping the Hessian modification leads to quicker convergence. I am not sure whether this means the current code has room for improvement.

@timholy
Contributor

timholy commented Jul 12, 2020

Speed of convergence is often problem-dependent, so your results here may not reflect general trends. The bottom line is that Optim can't ignore this issue: without the safeguard you can find situations where it would converge to a saddle point rather than a minimum. For example, try an unsafeguarded algorithm on "eigenvalue minimization"

f(x) = (x'*(A*x))/(x'*x)

for some matrix A with negative eigenvalues, e.g., Diagonal([-3, -2, -1, 1, 2, 3]).
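
Here is a sketch of that example (the starting point and the autodiff keyword are my own choices, not from the thread). Near the eigenvector for the eigenvalue 1.0 the Hessian of f is indefinite, which is exactly the situation the modification guards against:

using Optim, LinearAlgebra, ForwardDiff

A = Diagonal([-3.0, -2.0, -1.0, 1.0, 2.0, 3.0])
f(x) = (x' * (A * x)) / (x' * x)

x0 = [0.01, 0.01, 0.01, 1.0, 0.01, 0.01]    # near the saddle at eigenvalue 1.0
eigvals(ForwardDiff.hessian(f, x0))         # mixed signs: indefinite Hessian

# With the positive-definite modification, Newton should still head toward
# the minimum f(x*) = -3, the smallest eigenvalue of A.
res = optimize(f, x0, Newton(); autodiff = :forward)
minimum(res)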

There's a long literature on Hessian modification but none of it is terribly satisfying. This is discussed a bit in section 3.4 of Nocedal & Wright. The steps taken by Optim, particularly the second one, do seem a bit unfortunate. That's what I mean by "potential room for improvement." You could analyze the algorithm in PositiveFactorizations for opportunities that make this less bad.

@pkofod
Member

pkofod commented Jul 30, 2020

Let me add another comment, although Tim already answered above. Before we added these corrections, the typical complaint was: why is my Newton run failing? We need a safeguarded method as the default choice. I can't remember if we allow for a linsolve choice, but if we don't, it's in the making. Thanks for the question though!

@pkofod pkofod closed this as completed Jul 30, 2020