
Computing the inverse of Hessian in Newton's method #832

Closed

qzhu2017 opened this issue Jul 10, 2020 · 4 comments

Comments
@qzhu2017

I am new to Julia. Recently, I tried to use the Optim package to minimize a Rosenbrock-type function with Newton's method. The code is as follows:

using Optim

f_rb(x) = (1 - x[1])^2 + 100*(4x[2] - x[1]^2)^2

function g!(storage, x)
    storage[1] = (2*(200x[1]^3 - 800x[1]*x[2] + x[1] - 1))
    storage[2] = (-800*(x[1]^2 - 4x[2]))
end

function h!(storage, x)
    storage[1, 1] = 1200x[1]^2 - 1600x[2] + 2
    storage[1, 2] = -1600x[1]
    storage[2, 1] = -1600x[1]
    storage[2, 2] = 3200
end

x0 = [-1.0, 1.5]
res = optimize(f_rb, g!, h!, x0, Newton(),
               Optim.Options(iterations = 30,
                             store_trace = true,
                             extended_trace = true))

However, the path from Optim (green) looks different from the one I obtained with my own code (orange).
[figure: optimization paths on f_rb, Optim's Newton (green) vs. hand-written Newton (orange)]

Clearly, the Newton solver from Optim chose a different direction at the first step. By checking the code, I realized that it uses a Cholesky factorization by default to compute the inverse Hessian, i.e. the Newton step:

if typeof(NLSolversBase.hessian(d)) <: AbstractSparseMatrix
    state.s .= -NLSolversBase.hessian(d)\convert(Vector{T}, gradient(d))
else
    state.F = cholesky!(Positive, NLSolversBase.hessian(d))
    if typeof(gradient(d)) <: Array
        # is this actually StridedArray?
        ldiv!(state.s, state.F, -gradient(d))
    else
        # not Array, we can't do inplace ldiv
        gv = Vector{T}(undef, length(gradient(d)))
        copyto!(gv, -gradient(d))
        copyto!(state.s, state.F\gv)
    end
end

For a simple problem like a 2×2 Hessian, do we really need the Cholesky factorization? Perhaps it would be better to give the user the option to compute H^-1 directly when H is small.
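
A minimal sketch of the difference (my own snippet, reusing the g! and h! above and the PositiveFactorizations package that Optim uses for the modification):

using LinearAlgebra, PositiveFactorizations

x0 = [-1.0, 1.5]
g = zeros(2);    g!(g, x0)       # gradient at the starting point
H = zeros(2, 2); h!(H, x0)       # Hessian at the starting point

s_raw  = -(H \ g)                # unmodified Newton step
F      = cholesky(Positive, H)   # positive-definite modification of H
s_safe = -(F \ g)                # the step Optim takes for dense H

The two steps differ exactly when H is not positive-definite, which is what happens at this starting point.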

@timholy
Contributor

timholy commented Jul 11, 2020

Is H positive-definite at the starting point? Newton's method is not guaranteed to converge unless you modify the Hessian to make it positive-definite. That's what cholesky!(Positive, H) does.

That said, not very many people have looked at how it works, and there is surely room for improvement.
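
A quick way to check (my own snippet, reusing the h! from the original post):

using LinearAlgebra

H = zeros(2, 2)
h!(H, [-1.0, 1.5])
isposdef(H)    # false
eigvals(H)     # one negative and one positive eigenvalue, so H is indefinite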

@qzhu2017
Author

@timholy You are right. H is not positive-definite at the starting point in my previous post.
If I change the starting point, the result matches my hand-written code.

[figure: optimization paths from the new starting point; Optim and the hand-written Newton now agree]

But it seems that, in this case, skipping the Hessian modification leads to quicker convergence. I am not sure whether this means the current code has room for improvement.

@timholy
Contributor

timholy commented Jul 12, 2020

Speed of convergence is often problem-dependent, so your results here may not reflect general trends. The bottom line is that Optim can't ignore this issue: without the safeguard you can find situations where it would converge to a saddle point rather than a minimum. For example, try an unsafeguarded algorithm on "eigenvalue minimization"

f(x) = (x'*(A*x))/(x'*x)

for some matrix A with negative eigenvalues, e.g., Diagonal([-3, -2, -1, 1, 2, 3]).
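
Here is a sketch of that example (the starting point and the autodiff keyword are my own choices, not from the thread). Near the eigenvector for the eigenvalue 1.0 the Hessian of f is indefinite, which is exactly the situation the modification guards against:

using Optim, LinearAlgebra, ForwardDiff

A = Diagonal([-3.0, -2.0, -1.0, 1.0, 2.0, 3.0])
f(x) = (x' * (A * x)) / (x' * x)

x0 = [0.01, 0.01, 0.01, 1.0, 0.01, 0.01]    # near the saddle at eigenvalue 1.0
eigvals(ForwardDiff.hessian(f, x0))         # mixed signs: indefinite Hessian

# With the positive-definite modification, Newton should still head toward
# the minimum f(x*) = -3, the smallest eigenvalue of A.
res = optimize(f, x0, Newton(); autodiff = :forward)
minimum(res)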

There's a long literature on Hessian modification but none of it is terribly satisfying. This is discussed a bit in section 3.4 of Nocedal & Wright. The steps taken by Optim, particularly the second one, do seem a bit unfortunate. That's what I mean by "potential room for improvement." You could analyze the algorithm in PositiveFactorizations for opportunities that make this less bad.

@pkofod
Member

pkofod commented Jul 30, 2020

Let me add another comment, although Tim already answered above. Before we added these corrections, the typical complaint was: why is my Newton run failing? We need a safeguarded method as the default choice. I can't remember if we allow for a linsolve choice, but if we don't, it's in the making. Thanks for the question though!

@pkofod pkofod closed this as completed Jul 30, 2020