added optional initial bfgs inverse hessian #63
Conversation
@@ -471,3 +481,40 @@ function optimize(f::Function,
        throw(ArgumentError("Unknown method $method"))
    end
end

function optimize{T <: Real}(f::Function,
What's this section about?
I'm not sure - I think this is a result of me branching from code that is ~1 month old and has subsequently been deleted? Honestly, I'm still a git/GitHub beginner, so it's possible that I made some mistake somewhere else.
Let's get rid of it if not intended. You can do another `git commit`, then do a `git push` to update your fork. The pull request will automatically track the status of your fork.
This looks promising. I want to read through the rest of the codebase to see if this seems like the most general interface, but it's definitely a good start. Thanks for working on it.
@@ -8,7 +8,8 @@ function optimize(d::TwiceDifferentiableFunction,
                   store_trace::Bool = false,
                   show_trace::Bool = false,
                   extended_trace::Bool = false,
-                  linesearch!::Function = hz_linesearch!)
+                  linesearch!::Function = hz_linesearch!,
+                  bfgs_initial_invH::Matrix = eye(length(initial_x)))
This allocates a dense `n x n` matrix whenever someone calls `optimize` for any reason. It should only be allocated when actually needed.
Eeep. Thanks for catching that, Miles.
I've got a better idea. BFGS seems to want its pseudo-Hessian to always be PSD, even though the true Hessian in a full Newton-style method need not be (and indeed won't be until a locally optimal point is found). Since the whole point of this exercise isn't to get a true inverse Hessian, just a better-conditioned initial pseudo-Hessian, I'm going to respecify this to accept just a vector to go on the diagonal of the initial inverse pseudo-Hessian, as opposed to the whole matrix.
I'll update this pull request after I get some other GitHub issues resolved on my machine...
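The diagonal-only variant described above could be sketched roughly like this (a hedged sketch; the function name and the positivity check are my assumptions, not code from this PR):

```julia
using LinearAlgebra  # needed for diagm on modern Julia; in Base in the Julia of this era

# Hypothetical sketch: build the initial inverse pseudo-Hessian from a
# user-supplied diagonal, keeping it positive definite as BFGS expects.
function initial_invH_from_diag(d::Vector)
    @assert all(d .> 0)  # strictly positive entries keep it PD and invertible
    diagm(d)             # dense n-by-n matrix with d on the diagonal
end
```

This keeps the user-facing argument to a length-`n` vector while still handing `bfgs` a full matrix internally.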
That seems to unnecessarily lose generality and I'd still say that the allocation should be avoided if not using BFGS, even if it's just a vector.
Ok I'll avoid the allocation, but what generality am I losing? Would a more general (and mathematically precise) way to go forward be to require that the initial inverse Hessian be PSD? In practice, I'm not sure how useful that would be, since the true dense Hessian at an arbitrary starting value is not going to be PSD.
Since this is an advanced feature, I think it's okay to require the provided Hessian to be PSD. One can always add a perturbation on the diagonal to force PSD. At some later point we could try to do this automatically.
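The diagonal perturbation mentioned above could look something like this (my own sketch and naming, not part of the PR): shift the diagonal just enough that the smallest eigenvalue becomes positive.

```julia
using LinearAlgebra  # for eigvals/Symmetric on modern Julia; in Base in this era

# Hypothetical sketch: force a symmetric matrix to be positive definite
# by adding a multiple of the identity to its diagonal.
function force_psd(H::Matrix, epsilon = 1e-8)
    lambda_min = minimum(eigvals(Symmetric(H)))  # smallest eigenvalue
    # already PD: return unchanged; otherwise shift the whole diagonal
    lambda_min > 0 ? H : H + (epsilon - lambda_min) * one(H)
end
```

Something along these lines could later let `optimize` repair a user-supplied initial inverse Hessian automatically.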
Ok, understood. However, now I am confused about how to do this without allocating when it's not needed. Already, if you do a call to bfgs() with no pre-specified initial_invH, bfgs will default to generating a dense identity matrix of the appropriate size. Is the concern that now a call to optimize() for a non-bfgs optimization method allocates a needless identity matrix? Is there a way to get around this? I'm still pretty new to Julia, so it's not obvious to me.
Yes, it's fine to allocate this when calling `bfgs`, but it's not okay to make this allocation when the user wants to use another algorithm. For `optimize`, you can say `bfgs_initial_invH = nothing` and then just do something like:

```julia
if bfgs_initial_invH == nothing
    bfgs_initial_invH = eye(length(initial_x))
end
```

right before calling `bfgs`. There could be slightly cleaner ways, but this should work.
I've fixed the issues described above.
@@ -106,6 +106,7 @@ function bfgs{T}(d::Union(DifferentiableFunction,

    # Refresh the line search cache
    dphi0 = dot(gr, s)

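For context, the `dphi0 = dot(gr, s)` line above recomputes the directional derivative of `phi(alpha) = f(x + alpha*s)` at `alpha = 0`, which the line search uses to check descent and the Wolfe conditions. A small self-contained illustration (my own toy objective, not code from the PR):

```julia
using LinearAlgebra  # for dot on modern Julia; in Base in the Julia of this era

f(x) = dot(x, x)      # toy quadratic objective
g(x) = 2x             # its gradient
x = [1.0, 2.0]
gr = g(x)
s = -gr               # steepest-descent direction
dphi0 = dot(gr, s)    # slope of phi(alpha) = f(x + alpha*s) at alpha = 0
# dphi0 < 0 confirms s is a descent direction for the line search
```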
Extra whitespace here
This looks better. What do you think, @johnmyleswhite? There are still some git hiccups with the PR; it's not mergeable. I'd suggest making a new branch off of the most recent master and applying your changes there.
…ed whitespace in bfgs.jl
I'll look at this after I get home from work. Hard to think clearly about Julia midday amid other responsibilities.
@mlubin I'm happy to branch off of the most recent master if everything else looks fine. Thanks for helping me better understand Julia best practices and for teaching me more about git!
Closing this as I understand everything was covered by the other PR.
This is my first attempt at a pull request - a really simple additional option for the BFGS method to accept an initial inverse Hessian. I need this feature for a problem in which the standard identity matrix generates errors in the line search routine.