
[RFC/WIP] Rework *DifferentiableFunction #337

Merged: 15 commits merged into master, Mar 11, 2017
Conversation

@pkofod (Member) commented Jan 8, 2017

The main point is to create dispatch-based objective evaluations and use them in a smarter way. For example, f_calls += 1 now follows a value(df, x) call; this actually caught a few places where we forgot to increment the counters even though a call was made. It's nowhere near done, but I'm putting it up here for people to see. I didn't tag it [RFC] because I don't consider it close to done, so I don't expect people to spend time reviewing it. However, I do welcome comments and discussion at this point if people want to dive in.
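A minimal sketch of the dispatch-based evaluation described above; the field names (f, f_x, f_calls) follow the diffs later in this thread, but the exact layout is illustrative, not the merged code:

function value(df, x)
    df.f_calls[1] += 1    # f_calls is a one-element array so it can be mutated in place
    return df.f(x)
end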

  • figure out how to get proper calls when doing finite differences (I could add a type parameter to the types, but that seems a bit excessive to count the number of calls... maybe a mutable instead of a number (1 element array)) ref Finite differences: shouldn't we count f_calls? #219
  • take advantage of the finite difference code for the hessian if the gradient is available (lowers the number of f calls)
  • allow for other "automatic" differentiation mechanisms through the keyword (maybe go back to symbols)
  • avoid too many calculations by checking input
  • exploit last_x_h
  • loosen gradient and hessian fields slightly such that users can specify their own gradient and hessian types (need to deal with g_previous etc)
  • Document changes
  • restrict differentiable constructor with no gradient to vector
  • go over type annotations. Some were left open as an experiment, some will have to be parametrized and so on. Remember, there's a WIP tag on there :)
  • find out what is wrong with "Large Polynomial" (which should have a diagonal hessian) and finite differences... -> there must be a bug in Calculus
  • better name than "method" for keyword
    Checking input x before calculating
    I now have a way to avoid recalculating f for the same x. I'm not sure if this is too much, or if it addresses the issues relevant to LineSearches.jl. This PR introduces quite a lot of machinery, and I'm not entirely sure it isn't a bit too much... but let's see where it takes us!

fixes/closes: #329 #306 #305 #287 #241 #219 #163

@codecov-io commented Jan 9, 2017

Codecov Report

Merging #337 into master will decrease coverage by 1%.
The diff coverage is 75.37%.

@@            Coverage Diff             @@
##           master     #337      +/-   ##
==========================================
- Coverage   88.54%   87.54%   -1.01%     
==========================================
  Files          29       28       -1     
  Lines        1633     1574      -59     
==========================================
- Hits         1446     1378      -68     
- Misses        187      196       +9
Impacted Files Coverage Δ
src/utilities/generic.jl 100% <ø> (ø)
src/bfgs.jl 97.05% <100%> (-0.38%)
src/accelerated_gradient_descent.jl 100% <100%> (ø)
src/gradient_descent.jl 100% <100%> (ø)
src/particle_swarm.jl 88.15% <100%> (+0.55%)
src/simulated_annealing.jl 96.42% <100%> (-0.55%)
src/utilities/assess_convergence.jl 93.1% <100%> (ø)
src/fminbox.jl 84.92% <100%> (-0.24%)
src/momentum_gradient_descent.jl 100% <100%> (ø)
src/l_bfgs.jl 98.36% <100%> (-0.06%)
... and 20 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 250b50a...3528c01. Read the comment docs.

n_x = length(x_seed)
f_calls = [0]
g_calls = [0]
function g!(x::Array, storage::Array)
Contributor:

Why not something like:

function g!(x::Array, storage::Array)
    calls::Int = 0   # local counter; the type annotation helps avoid a Core.Box
    Calculus.finite_difference!(z -> (calls += 1; f(z)), x, storage, g_method.method)
    f_calls[1] += calls   # add however many evaluations the estimator actually made
    return
end

to avoid having to predict how many times the gradient estimator evaluates the function. For automatic differentiation this is not always so obvious so this suggestion would remove that difficulty.
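The same closure trick should carry over to automatic differentiation, where the number of primal evaluations is even harder to predict. A hedged sketch, assuming ForwardDiff and the same enclosing f and one-element f_calls array as above:

using ForwardDiff

function g!(x::Array, storage::Array)
    calls = 0                                          # local evaluation counter
    ForwardDiff.gradient!(storage, z -> (calls += 1; f(z)), x)
    f_calls[1] += calls                                # report however many evaluations AD actually made
    return
end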

Member Author:

I will certainly do this! Thanks for that comment :)

@pkofod (Member Author) commented Jan 9, 2017

In this PR you have to give the *Differentiable constructor an x seed, as also mentioned elsewhere in an issue. I can personally live with that, and my advice to users would simply be to use optimize(f, g!, ...) and not bother constructing the *Differentiable types themselves unless they want a special g or H storage.
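A hedged sketch of the two routes being described; the objective, gradient, and seed are illustrative, and the Differentiable(f, g!, x_seed) constructor signature is an assumption based on this PR:

f(x) = sum(abs2, x)
g!(x, storage) = (storage[:] = 2 * x)   # gradient written into storage, x-first convention as in this PR

# Recommended route: pass f and g! and let Optim build the objective internally.
res = optimize(f, g!, zeros(2), BFGS())

# Only construct the *Differentiable type directly if you need special g or H storage;
# note the x seed argument this PR introduces.
d = Differentiable(f, g!, zeros(2))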

@anriseth (Contributor):

Should this be part of Optim, or some separate package?
This could be useful for LineSearches (and I guess NLsolve?) as well.

@pkofod (Member Author) commented Jan 10, 2017

I think I'll put it in here first, but if it fits with NLsolve and LineSearches we can take it out and have it as a dependency.

@anriseth (Contributor):

I think LineSearches needs this, or will have to replicate the value and gradient functions in order to work with Optim.

@pkofod (Member Author) commented Jan 10, 2017

I'm not removing or renaming the old fields, so I think it still works? Certainly the line searches still work in the tests here.

@anriseth (Contributor):

Okay, as long as we can still communicate number of f- and g-calls back to Optim, it should be fine.

@KristofferC (Contributor):

I think Optim.jl will keep track of that itself.

@pkofod (Member Author) commented Jan 10, 2017

Currently, LineSearches reports the additional calls to f and g, and then we increment the counters in Optim. I would love for LineSearches to use the value and value_grad methods, so I am for the idea put forth; I just want this PR to be a bit more complete before doing it. It does require NLsolve to agree, though. If not, we can just keep it as it is.

@anriseth (Contributor):

Ah, I see do_linesearch still keeps track of function and gradient evaluations.

Will this PR make it easier for us to prevent multiple evaluations of functions or gradients at the same points? (As discussed in JuliaNLSolvers/LineSearches.jl#10 and #288)

@pkofod (Member Author) commented Jan 10, 2017

Of course, there's the simpler way where we just use the anonymous-function trick from the finite differences constructor on all incoming functions, and then LineSearches doesn't have to worry about calls at all. Then we're back to d.f(x) and family again.

@pkofod (Member Author) commented Jan 10, 2017

Yes, I will get to that soon. It will check internally if the last x is input again.

@@ -2,7 +2,7 @@
# Test Optim.nelder_mead for all functions except Large Polynomials in Optim.UnconstrainedProblems.examples
for (name, prob) in Optim.UnconstrainedProblems.examples
f_prob = prob.f
res = Optim.optimize(f_prob, prob.initial_x, NelderMead(), Optim.Options(iterations = 10000))
@show res = Optim.optimize(f_prob, prob.initial_x, NelderMead(), Optim.Options(iterations = 10000))
Contributor:
Did you intend to commit the @show?

Member Author:

Oops, thanks.

g!(x_seed, g_x)
Differentiable(f, g!, fg!, f(x_seed), g_x, copy(x_seed), [1], [1])
end
function Differentiable{T}(f::Function, x_seed::Array{T}; method = :finitediff)
Contributor:

Is it important that typeof(f) <: Function? Would the code work if the type of f simply implements (f::MyType)(x) = ...? If so, we could consider not being overly restrictive here.

Member Author:

Not at all, and it will be removed. I know @ChrisRackauckas would appreciate them gone as well :)
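A sketch of what dropping the ::Function restriction allows: any callable object can serve as the objective. The constructor call assumes the x-seed form discussed in this PR:

immutable Rosenbrock end                # a callable type that is not a subtype of Function
(::Rosenbrock)(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2

# With the annotation loosened, this should work just like a plain function:
d = Differentiable(Rosenbrock(), zeros(2))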

@@ -39,7 +39,7 @@ using ForwardDiff
plap(U; n = length(U)) = (n-1)*sum((0.1 + diff(U).^2).^2 ) - sum(U) / (n-1)
plap1 = ForwardDiff.gradient(plap)
precond(n) = spdiagm((-ones(n-1), 2*ones(n), -ones(n-1)), (-1,0,1), n, n)*(n+1)
df = DifferentiableFunction(x -> plap([0; X; 0]),
df = Differentiable(x -> plap([0; X; 0]),
Contributor:

Does this also need the x_seed now?

Member Author:

Yes, that's why the docs box is not ticked yet :) but thanks for looking through the changes. This was just a search and replace :)

@@ -81,7 +81,7 @@ using Optim
initial_x = ...
buffer = Array(...) # Preallocate an appropriate buffer
last_x = similar(initial_x)
df = TwiceDifferentiableFunction(x -> f(x, buffer, initial_x),
df = TwiceDifferentiable(x -> f(x, buffer, initial_x),
Contributor:

x_seed?

Member Author:

note to self: maybe we should start adding some actual doctests

@pkofod (Member Author) commented Jan 14, 2017

Is this "check last_x" along the lines of what you had in mind, @anriseth?

@anriseth (Contributor):

Is this "check last_x" along the lines of what you had in mind, @anriseth?

Thank you, it looks like a good way to deal with the double evaluation issues as far as I'm concerned. Are there situations where people would want to define such objective objects without an initial evaluation at x_seed? (E.g. in NLsolve or JuliaML, or if we use this in an OptimTests.jl, @KristofferC @Evizero)

@pkofod (Member Author) commented Jan 15, 2017

Well, it depends on the general interface. Generally, I'm thinking that people using Optim provide f, g, and h, for example, and then this object is used internally. Then there is automatically a seed: the first x used for evaluation.

@anriseth (Contributor) left a review comment:

Do these changes handle cases where the previous evaluation was only of the objective, or only of the gradient, but not both?

If not, maybe we would have to store last_x_f, last_x_g and last_x_h for the evaluation of f,g and h respectively.
Then for value_grad, there are different options of how to handle cases where last_x_g != last_x_f.

g_x, Array{T}(n_x, n_x), copy(x_seed), [1], [1], [0])
end

function value(obj, x)
Contributor:

Are there situations where one would want to call f(x) without updating last_x?

Member Author:

If not, maybe we would have to store last_x_f, last_x_g and last_x_h for the evaluation of f,g and h respectively.

I certainly need to handle this!
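A minimal sketch of a per-field caching value method along the lines of the last_x_f suggestion quoted above; it extends the simple counting sketch near the top of the thread (field names are assumptions, not the merged code):

function value(obj, x)
    if x != obj.last_x_f              # re-evaluate only if x changed since the last f call
        copy!(obj.last_x_f, x)
        obj.f_x = obj.f(x)
        obj.f_calls[1] += 1
    end
    return obj.f_x
end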

@pkofod pkofod force-pushed the pkm/df branch 3 times, most recently from a5c025a to 67e3bb3 Compare January 21, 2017 22:23
src/bfgs.jl Outdated
linesearch!::L
initial_invH::H
resetalpha::Bool
immutable BFGS <: Optimizer


Just curious, why was this changed from being parameterized, to leaving them all as type Any?
(I usually recommend to always use concrete types for type/immutable fields, parameterized if necessary)

Contributor:

Probably because they don't want to recompile for each new function? But is the linesearch! function dependent on the user's function? If not, and it is usually the same function (some default), then it should be more strictly typed. Anyway, resetalpha should still be typed as a Bool.
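For reference, a sketch of the parameterized layout being suggested, using the fields visible in the diff above (illustrative, not necessarily what was merged):

immutable BFGS{L, H} <: Optimizer
    linesearch!::L        # concrete through the type parameter instead of Any
    initial_invH::H
    resetalpha::Bool      # a plain Bool, as suggested
end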

Contributor:

I'd like to stop this comment thread now because @ScottPJones should probably be banned from the JuliaOpt org's repos in the same way that he is currently banned from the JuliaLang org's repos. For me (as the original creator of this specific project), it is very important to respect the Julia community stewards' decision to ban Scott.

h!
f_x
g_x
h_storage


You have this parameterized in the constructor, maybe it should also be parameterized here, for better performance?

@pkofod (Member Author) commented Jan 23, 2017

Just let the function ones be free so that all callable types work?
I guess you mean:

type MyObjective{T}
    f::T
end

Sure

@anriseth (Contributor) commented Mar 8, 2017

Sure, but maybe in a separate PR. Then I can merge this one this week, create a new PR for the gradient position change (to avoid confusion), merge, and then tag. Sounds good?

Makes sense.
Are you going to separate out the *Function changes in this PR to NLSolversBase.jl, or merge directly to Optim?

@pkofod (Member Author) commented Mar 8, 2017

Makes sense.
Are you going to separate out the *Function changes in this PR to NLSolversBase.jl, or merge directly to Optim?

I was just going to merge this actually, and then copy+paste the code to NLSolversBase.jl, just to have the change in the git history (so it is easier to follow what happened prior to moving code to another package).

@pkofod (Member Author) commented Mar 8, 2017

I'm pushing some bug fixes tonight, and adding x_calls_limit for f, g, and h. Now is the time to speak up if you don't like these changes, as I'll merge soon.

n_x = length(x_seed)
f_calls = [1]
g_calls = [1]
if method == :finitediff
Contributor:

If we add these methods here, maybe we can call *Differentiable(f,x_init;method) directly from the optimize functions instead of having separate code for generating g!, fg! (and h!) there? I'm referring to e.g.

Member Author:

good catch, I forgot that was also done there.
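A sketch of the suggestion: build the objective in one place inside optimize instead of duplicating the g!/fg! generation there. The keyword name is still under discussion in the checklist above, so treat everything here as a placeholder:

function optimize(f::Function, initial_x::Array; method = :finitediff, kwargs...)
    d = Differentiable(f, initial_x; method = method)   # finite differences or autodiff chosen here
    return optimize(d, initial_x; kwargs...)
end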

@@ -2,18 +2,21 @@
for use_autodiff in (false, true)
Contributor:

Is use_autodiff used anywhere now?

@pkofod pkofod force-pushed the pkm/df branch 8 times, most recently from 1a399fd to c075c97 Compare March 10, 2017 09:42
@pkofod (Member Author) commented Mar 10, 2017

Right, so just to update everyone: this is approaching its end. With this PR I am no longer targeting v0.4 (yet to be changed), and I won't carry the deprecations forward either; I removed all of them. This means that this PR is heavily breaking, so tagging will have to be done with care (limits in METADATA).

v0.4 was so long ago, v0.6 is coming soon, and shortly after we have v1.0. Post-v1.0 I will be very happy to be very careful about backwards compatibility, but as far as I'm concerned we're approaching a sprint for v1.0 of JuliaLang, and some things will have to be done a little faster. The main reason is that we don't have 1-2 people sitting here full time making sure that everything works across v0.2-v0.6, Mac, Linux, Windows, Amiga, and Raspberry Pi.

@pkofod pkofod merged commit 7e1ef14 into master Mar 11, 2017
@pkofod pkofod deleted the pkm/df branch April 8, 2017 18:08
Successfully merging this pull request may close these issues:

Finite differences and autodiff - bake into *DifferentiableFunction?