Conversation

@timholy commented Jan 24, 2013

I made some improvements to the finite differencing. Here are some brief explanations:

  • Note that the new formula, 2 * sqrt(1e-12) * (1 + norm(x)), is approximately 2e-6 for abs(x) < 1. The formula I used previously, eps(max(1.0, abs(x)))^(1/3), equals roughly 6e-6 for abs(x) < 1, which is not very different. So, contrary to your in-code comment, the small-x behavior of these two actually can't be very different (see the quick check after this list).
  • 1e-12 implicitly assumes Float64, whereas using the type-dispatched version of eps does not. I definitely run optimization with Float32s sometimes (when using huge datasets where all the computations are Float32).
  • The proper power scaling depends on the order of the derivative and on whether centered or forward differencing is used. You'll see that the power in the revised version is 1/2, 1/3, or 1/4 depending on the circumstance; a sketch of the idea follows this list. This is particularly complex for the Hessian.
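
To make the first and third bullets concrete, here is a quick numeric check plus a minimal sketch of a type-aware step-size helper. The name fdstep and its exact form are illustrative assumptions, not the code in this PR:

2 * sqrt(1e-12)    # ≈ 2.0e-6  (the new formula's small-x value)
eps(1.0)^(1/3)     # ≈ 6.06e-6 (the old formula's small-x value)

# eps(T) adapts automatically to Float32 vs. Float64. The exponent p would be
# 1/2 for forward differences, 1/3 for central differences, and 1/4 for the
# second differences used in direct Hessian evaluation.
fdstep(x::T, p) where {T<:AbstractFloat} = oftype(x, eps(T)^p * max(one(T), abs(x)))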

I also removed the dtype parameter from finite_difference_hessian(f, x) because you basically have to use centered differencing for direct Hessian evaluation.
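
For reference, the scalar analogue of the central second difference that direct Hessian evaluation relies on looks like this (a sketch, not the package's exact implementation):

# O(h^2)-accurate central second difference; forward differencing offers no
# comparably accurate direct formula, which is why the dtype option went away.
d2_central(f, x, h) = (f(x + h) - 2 * f(x) + f(x - h)) / h^2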

For your cases in test/finite_difference.jl, I think you'll mostly see improved accuracy. For central differencing the change is quite modest (usually less than one order of magnitude), but for forward differencing and the Hessian it can be dramatic. For example, with the old version

julia> norm(Calculus.finite_difference_hessian(fx, gx, [0.0, 0.0], :central) - [-sin(0.0) 0.0; 0.0 -cos(0.0)])
6.938866148331613e-6

and with the new

julia> norm(Calculus.finite_difference_hessian(fx, [0.0, 0.0]) - [-sin(0.0) 0.0; 0.0 -cos(0.0)])
7.450580596923828e-9

Presumably it should also be more robust on real-world problems, although that remains to be seen in practice.

@timholy commented Jan 24, 2013

I should have added one more point: despite their mathematical equivalence, there can be real-world differences between (f(x+epsilon)-f(x))/epsilon and xp = x+epsilon; (f(xp)-f(x))/(xp-x). The latter uses numbers that are exactly represented by the machine, whereas in the former the gap in the numerator might not equal the gap in the denominator. In particular, this can be measurably important when doing computations in Float32.
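
In code, the distinction looks like this (the function names are mine, purely for illustration):

# Naive: h in the denominator may differ from the gap actually realized
# in the numerator after x + h is rounded.
fd_naive(f, x, h) = (f(x + h) - f(x)) / h

# Exact-step: xp and x are both machine numbers, and xp - x is computed
# exactly, so numerator and denominator see the same gap.
function fd_exact(f, x, h)
    xp = x + h
    (f(xp) - f(x)) / (xp - x)
end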

One concern, however, is whether the compiler might optimize this fine distinction away. CCing @JeffBezanson and @StefanKarpinski for guidance on whether this is a likely problem and, if so, whether there are tricks we can use to keep it from happening.

@JeffBezanson

My guess is it will be fine; we won't optimize that away, and LLVM tends to respect float arithmetic by default.

@timholy commented Jan 24, 2013

> I would recommend sqrt and cbrt over ^ here if that works.

Good catch, thanks!
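
Concretely, that suggestion amounts to, e.g.:

sqrt(eps(Float64))    # rather than eps(Float64)^(1/2)
cbrt(eps(Float64))    # rather than eps(Float64)^(1/3)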

johnmyleswhite added a commit that referenced this pull request Jan 25, 2013
Improved finite differencing
@johnmyleswhite johnmyleswhite merged commit e66cee4 into JuliaMath:master Jan 25, 2013
@johnmyleswhite

Thanks for these patches, Tim. I'm going to merge them now because it's fairly clear that you're more knowledgeable on this topic than I am. At some point, I'll try to circle back and see whether the example I previously used to select between values of epsilon was flawed in some way.

Sorry for the delay. I'm desperately trying to finish my dissertation in the next month, so I'll be slow with things.

@timholy commented Jan 25, 2013

Wow, good luck with the dissertation!

@johnmyleswhite

Thanks. I'll need it!
