added optional initial bfgs inverse hessian #63

Closed
tcovert wants to merge 5 commits

Conversation

@tcovert
Contributor

tcovert commented Jun 24, 2014

This is my first attempt at a pull request: a really simple additional option for the BFGS method to accept an initial inverse Hessian. I need this feature for a problem in which the standard identity matrix generates errors in the line search routine.
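
For context, a rough usage sketch of the new option (the objective, starting point, and scaling values below are invented for illustration; only the bfgs_initial_invH keyword comes from this PR's diff):

    using Optim

    # A badly scaled objective for which an identity initial inverse
    # Hessian can send the first line search step much too far.
    f(x) = 1e6 * x[1]^2 + 1e-2 * x[2]^2

    # Hand-picked diagonal scaling, roughly the inverse curvature.
    invH0 = diagm([1e-6, 1e2])

    results = optimize(f, [1.0, 1.0],
                       method = :bfgs,
                       bfgs_initial_invH = invH0)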

@@ -471,3 +481,40 @@ function optimize(f::Function,
         throw(ArgumentError("Unknown method $method"))
     end
 end
+
+function optimize{T <: Real}(f::Function,
Contributor

What's this section about?

Contributor Author

I'm not sure - I think this is a result of me branching from code that is about a month old and has subsequently been deleted? Honestly, I'm still a git/GitHub beginner, so it's possible that I made some mistake somewhere else.

Contributor

Let's get rid of it if not intended. You can do another git commit, then do a git push to update your fork. The pull request will automatically track the status of your fork.
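
For concreteness, that workflow might look like this (the file path and commit message here are hypothetical):

    git add src/optimize.jl        # hypothetical path of the corrected file
    git commit -m "remove stray hunk"
    git push                       # the open pull request picks up the new commit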

@johnmyleswhite
Contributor

This looks promising. I want to read through the rest of the codebase to see if this seems like the most general interface, but it's definitely a good start. Thanks for working on it.

@@ -8,7 +8,8 @@ function optimize(d::TwiceDifferentiableFunction,
                   store_trace::Bool = false,
                   show_trace::Bool = false,
                   extended_trace::Bool = false,
-                  linesearch!::Function = hz_linesearch!)
+                  linesearch!::Function = hz_linesearch!,
+                  bfgs_initial_invH::Matrix = eye(length(initial_x)))
Contributor

This allocates a dense n x n matrix whenever someone calls optimize for any reason. It should only be allocated when actually needed.

Contributor

Eeep. Thanks for catching that, Miles.

Contributor Author

I've got a better idea. BFGS seems to want its pseudo-Hessian to be PSD always, even though the true Hessian in a full Newton-style method need not be (and indeed won't be until a locally optimal point is found). Since the whole point of this exercise isn't to get a true inverse Hessian, just a better-conditioned initial pseudo-Hessian, I'm going to respecify this to accept just a vector to go on the diagonal of the initial inverse pseudo-Hessian, as opposed to the whole matrix.

I'll update this pull request after I get some other GitHub issues resolved on my machine...
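
A minimal sketch of that respecification (the variable names and scaling values are made up; diagm builds the dense diagonal matrix from a vector):

    # The caller passes only the diagonal of the initial inverse pseudo-Hessian:
    initial_invH_diag = [1e-6, 1e2]    # invented scaling values

    # bfgs would expand it to the dense matrix it iterates on; the result is
    # PSD whenever every entry is nonnegative.
    invH = diagm(initial_invH_diag)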

Contributor

That seems to unnecessarily lose generality and I'd still say that the allocation should be avoided if not using BFGS, even if it's just a vector.

Contributor Author

OK, I'll avoid the allocation, but what generality am I losing? Would a more general (and mathematically precise) way forward be to require that the initial inverse Hessian be PSD? In practice, I'm not sure how useful that would be, since the true dense Hessian at an arbitrary starting value is not going to be PSD.

Contributor

Since this is an advanced feature, I think it's okay to require the provided Hessian to be PSD. One can always add a perturbation on the diagonal to force PSD. At some later point we could try to do this automatically.
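
One possible sketch of that diagonal perturbation (the function name and tolerance are invented; this is a common heuristic, not part of this PR):

    # Shift the diagonal just past the most negative eigenvalue so that the
    # matrix becomes (numerically) positive definite.
    function make_psd(H::Matrix, delta::Real = 1e-8)
        lambda_min = minimum(eigvals((H + H') / 2))    # symmetrize first
        lambda_min >= delta ? H : H + (delta - lambda_min) * eye(size(H, 1))
    end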

Contributor Author

OK, understood. However, now I'm confused about how to do this without allocating when it's not needed. Already, if you call bfgs() with no pre-specified initial_invH, bfgs defaults to generating a dense identity matrix of the appropriate size. Is the concern that a call to optimize() for a non-BFGS optimization method now allocates a needless identity matrix? Is there a way to get around this? I'm still pretty new to Julia, so it's not obvious to me.

Contributor

Yes, it's fine to allocate this when calling bfgs, but it's not okay to make this allocation when the user wants to use another algorithm. For optimize, you can default to bfgs_initial_invH = nothing and then just do something like:

    # Only allocate the dense identity fallback when it is actually used.
    if bfgs_initial_invH === nothing
        bfgs_initial_invH = eye(length(initial_x))
    end

right before calling bfgs. There could be slightly cleaner ways, but this should work.

Contributor Author

I've fixed the issues described above.

@@ -106,6 +106,7 @@ function bfgs{T}(d::Union(DifferentiableFunction,
 
     # Refresh the line search cache
     dphi0 = dot(gr, s)
+
Contributor

Extra whitespace here

@mlubin
Contributor

mlubin commented Jun 24, 2014

This looks better. What do you think, @johnmyleswhite?

There are still some git hiccups with the PR; it's not mergeable. I'd suggest making a new branch off of the most recent master and applying your changes there.
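
One way that might look (the remote and branch names here are assumptions, not from this thread):

    git fetch upstream                      # assumes the main repo remote is named "upstream"
    git checkout -b initial-invh upstream/master
    git cherry-pick <your-pr-commits>       # re-apply the five commits from this PR
    git push origin initial-invh            # then open a fresh PR from that branch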

@johnmyleswhite
Contributor

I'll look at this after I get home from work. Hard to think clearly about Julia midday amid other responsibilities.

@tcovert
Contributor Author

tcovert commented Jun 24, 2014

@mlubin I'm happy to branch off of the most recent master if everything else looks fine. Thanks for helping me better understand Julia best practices and for teaching me more about git!

@johnmyleswhite
Contributor

Closing this as I understand everything was covered by the other PR.
