added optional initial bfgs inverse hessian #63
Conversation
@@ -471,3 +481,40 @@ function optimize(f::Function,
        throw(ArgumentError("Unknown method $method"))
    end
end

function optimize{T <: Real}(f::Function,
What's this section about?
I'm not sure - I think this is a result of me branching from code that is ~1 month old and has subsequently been deleted? Honestly, I'm still a git/GitHub beginner, so it's possible that I made some mistake somewhere else.
Let's get rid of it if not intended. You can do another `git commit`, then do a `git push` to update your fork. The pull request will automatically track the status of your fork.
This looks promising. I want to read through the rest of the codebase to see if this seems like the most general interface, but it's definitely a good start. Thanks for working on it.
@@ -8,7 +8,8 @@ function optimize(d::TwiceDifferentiableFunction,
                   store_trace::Bool = false,
                   show_trace::Bool = false,
                   extended_trace::Bool = false,
-                  linesearch!::Function = hz_linesearch!)
+                  linesearch!::Function = hz_linesearch!,
+                  bfgs_initial_invH::Matrix = eye(length(initial_x)))
This allocates a dense `n x n` matrix whenever someone calls `optimize` for any reason. It should only be allocated when actually needed.
Eeep. Thanks for catching that, Miles.
I've got a better idea. BFGS seems to want its pseudo-Hessian to always be PSD, even though the true Hessian in a full Newton-style method need not be (and indeed won't be until a locally optimal point is found). Since the whole point of this exercise isn't to get a true inverse Hessian, just a better-conditioned initial pseudo-Hessian, I'm going to respecify this to accept just a vector to go on the diagonal of the initial inverse pseudo-Hessian, as opposed to the whole matrix.
I'll update this pull request after I get some other GitHub issues resolved on my machine...
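The diagonal-only variant described above could be sketched roughly like this (a hedged sketch; the function name and the positivity check are my assumptions, not code from this PR):

```julia
using LinearAlgebra  # needed for diagm on modern Julia; in Base in the Julia of this era

# Hypothetical sketch: build the initial inverse pseudo-Hessian from a
# user-supplied diagonal, keeping it positive definite as BFGS expects.
function initial_invH_from_diag(d::Vector)
    @assert all(d .> 0)  # strictly positive entries keep it PD and invertible
    diagm(d)             # dense n-by-n matrix with d on the diagonal
end
```

This keeps the user-facing argument to a length-`n` vector while still handing `bfgs` a full matrix internally.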
That seems to unnecessarily lose generality and I'd still say that the allocation should be avoided if not using BFGS, even if it's just a vector.
Ok I'll avoid the allocation, but what generality am I losing? Would a more general (and mathematically precise) way to go forward be to require that the initial inverse Hessian be PSD? In practice, I'm not sure how useful that would be, since the true dense Hessian at an arbitrary starting value is not going to be PSD.
Since this is an advanced feature, I think it's okay to require the provided Hessian to be PSD. One can always add a perturbation on the diagonal to force PSD. At some later point we could try to do this automatically.
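The diagonal perturbation mentioned above could look something like this (my own sketch and naming, not part of the PR): shift the diagonal just enough that the smallest eigenvalue becomes positive.

```julia
using LinearAlgebra  # for eigvals/Symmetric on modern Julia; in Base in this era

# Hypothetical sketch: force a symmetric matrix to be positive definite
# by adding a multiple of the identity to its diagonal.
function force_psd(H::Matrix, epsilon = 1e-8)
    lambda_min = minimum(eigvals(Symmetric(H)))  # smallest eigenvalue
    # already PD: return unchanged; otherwise shift the whole diagonal
    lambda_min > 0 ? H : H + (epsilon - lambda_min) * one(H)
end
```

Something along these lines could later let `optimize` repair a user-supplied initial inverse Hessian automatically.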
Ok, understood. However, now I am confused about how to do this without allocating when it's not needed. Already, if you do a call to bfgs() with no pre-specified initial_invH, bfgs will default to generating a dense identity matrix of the appropriate size. Is the concern that now a call to optimize() for a non-bfgs optimization method allocates a needless identity matrix? Is there a way to get around this? I'm still pretty new to Julia, so it's not obvious to me.
Yes, it's fine to allocate this when calling `bfgs`, but it's not okay to make this allocation when the user wants to use another algorithm. For `optimize`, you can say `bfgs_initial_invH = nothing` and then just do something like:

```julia
if bfgs_initial_invH == nothing
    bfgs_initial_invH = eye(length(initial_x))
end
```

right before calling `bfgs`. There could be slightly cleaner ways, but this should work.
I've fixed the issues described above.
@@ -106,6 +106,7 @@ function bfgs{T}(d::Union(DifferentiableFunction,

    # Refresh the line search cache
    dphi0 = dot(gr, s)

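For context, the `dphi0 = dot(gr, s)` line above recomputes the directional derivative of `phi(alpha) = f(x + alpha*s)` at `alpha = 0`, which the line search uses to check descent and the Wolfe conditions. A small self-contained illustration (my own toy objective, not code from the PR):

```julia
using LinearAlgebra  # for dot on modern Julia; in Base in the Julia of this era

f(x) = dot(x, x)      # toy quadratic objective
g(x) = 2x             # its gradient
x = [1.0, 2.0]
gr = g(x)
s = -gr               # steepest-descent direction
dphi0 = dot(gr, s)    # slope of phi(alpha) = f(x + alpha*s) at alpha = 0
# dphi0 < 0 confirms s is a descent direction for the line search
```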
Extra whitespace here
This looks better. What do you think, @johnmyleswhite? There are still some git hiccups with the PR; it's not mergeable. I'd suggest making a new branch off of the most recent master and applying your changes there.
…ed whitespace in bfgs.jl
I'll look at this after I get home from work. Hard to think clearly about Julia midday amid other responsibilities.
@mlubin I'm happy to branch off of the most recent master if everything else looks fine. Thanks for helping me better understand Julia best practices and for teaching me more about git!
Closing this as I understand everything was covered by the other PR.
This is my first attempt at a pull request - a really simple additional option for the BFGS method to accept an initial inverse Hessian. I need this feature for a problem in which the standard identity matrix generates errors in the line search routine.