
Conversation

ararslan
Member

This introduces a type-based approach that works slightly differently from the current API. Upon construction of an FDMethod subtype, the order of the method and the order of the derivative are checked for agreement, and the grid and coefficients are computed and stored. Instances of the types are callable, and calling one produces the value of the derivative. The bound is computed adaptively and recursively, which results in a better step size and thus a (hopefully) more robust estimate of the derivative.

"Why types?" you may be asking. It just seemed like an efficient way of expressing things and storing intermediate values.

"Where did this adaptive thing come from?" From none other than Wessel himself: https://github.com/wesselb/fdm.

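To make the construct-then-call pattern described above concrete, here is a minimal, self-contained sketch. ToyCentral, its fields, and the default step are illustrative assumptions, not the implementation in this PR:

using LinearAlgebra

# Hypothetical sketch, not the PR's actual code: construction validates the
# orders and precomputes the grid and coefficients; calling the instance
# evaluates the derivative.
struct ToyCentral
    p::Int                  # order of the method = number of grid points (odd here)
    q::Int                  # order of the derivative
    grid::Vector{Int}       # symmetric grid, e.g. -2:2 for p = 5
    coefs::Vector{Float64}  # finite-difference coefficients on that grid
end

function ToyCentral(p::Integer, q::Integer)
    isodd(p) && p > q || throw(ArgumentError("need odd p with p > q"))
    grid = collect(-(p ÷ 2):(p ÷ 2))
    # Solve the Vandermonde system: sum_j c_j * g_j^i = q! * δ_{i,q} for i = 0:p-1.
    V = [g^i for i in 0:p-1, g in grid]
    rhs = zeros(p); rhs[q + 1] = factorial(q)
    return ToyCentral(p, q, grid, V \ rhs)
end

# Calling the instance estimates the qth derivative of f at x with step h.
(m::ToyCentral)(f, x; h=cbrt(eps())) =
    sum(c * f(x + g * h) for (c, g) in zip(m.coefs, m.grid)) / h^m.q

ToyCentral(5, 1)(sin, 1.0)  # ≈ cos(1.0)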
@ararslan ararslan requested review from wesselb and willtebbutt May 16, 2019 23:14
@codecov

codecov bot commented May 16, 2019

Codecov Report

Merging #17 into master will increase coverage by 2.5%.
The diff coverage is 100%.


@@          Coverage Diff           @@
##           master   #17     +/-   ##
======================================
+ Coverage    92.5%   95%   +2.5%     
======================================
  Files           4     4             
  Lines          80   120     +40     
======================================
+ Hits           74   114     +40     
  Misses          6     6
Impacted Files Coverage Δ
src/methods.jl 100% <100%> (ø) ⬆️


@ararslan
Member Author

I've marked this as a draft because I'd like feedback in general, but especially on:

  • The implementation. Is this the best way to go about computing the bound?
  • The API. Should this replace the current API? How could it be improved? Currently it duplicates all existing logic and lives separately from the existing API.
  • Should we rename and reuse FDMReport for this? We could make it mutable and have it update the stored values on each adaptation step.
  • What are some corner cases that could be used to test that this provides improvements over the non-adaptive approach?

src/methods.jl Outdated

# Adaptively compute the bound on the function and derivative values, if applicable.
if 0 < adapt < p
newm = (M.name.wrapper)(p + 1, q)
Member

@wesselb wesselb May 17, 2019

I think bound should be set to an upper bound on the orderth (i.e. pth) derivative, so I believe this should be newm = (M.name.wrapper)(p + 1, p). Here p + 1 comes from the fact that the order must be strictly greater than the order of the derivative to estimate (which also means that we have a choice here). Furthermore, order in our FDM setup refers to the number of grid points, which is easily interpretable but also off by one from what "order" typically means. Perhaps we should adjust this.

src/methods.jl Outdated
if 0 < adapt < p
newm = (M.name.wrapper)(p + 1, q)
dfdx, = fdm(newm, f, x; eps=eps, bound=bound, adapt=(adapt - 1))
bound = maxabs(dfdx)
Member

@wesselb wesselb May 17, 2019

We have to be careful here to ensure that bound > 0. A simple solution is to add a tiny number, e.g. 1e-20. (Note that the case bound = 0 can easily occur, e.g. when adapting a p = 2 and q = 1 method for a linear function.)
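A hedged sketch of that suggestion in isolation (the TINY name matches the constant added later in this diff; safe_bound is an illustrative helper, not the PR's code):

const TINY = 1e-20

# Adding TINY keeps the bound strictly positive even when the estimated
# derivative is exactly zero (e.g. a linear function under a p = 2, q = 1
# adaptation step).
safe_bound(estimate::Real) = abs(estimate) + TINY

safe_bound(0.0)  # 1.0e-20 rather than 0.0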

@wesselb
Member

wesselb commented May 17, 2019

First of all, this looks great!!

The implementation. Is this the best way to go about computing the bound?

I have added some comments regarding the implementation.

It might also be worthwhile to implement the condition number mechanism to make the procedure even more robust. I think robustness of the estimation is an important feature.

The API. Should this replace the current API? How could it be improved? Currently it duplicates all existing logic and lives separately from the existing API.

I think the new API with the types is pretty sensible and could replace the current API. We might still want to support the old API by redirecting to the new API to maintain compatibility.

Should we rename and reuse FDMReport for this? We could make it mutable and have it update the stored values on each adaptation step.

Yeah, I'm not sure if having a separate type (i.e. FDMReport) just to store this information is the best way to go about it. Perhaps the new types (i.e. Forward etc) can instead store this information, which would be overwritten at every computation, thus always referring to the last computation. Any ideas on this?

What are some corner cases that could be used to test that this provides improvements over the non-adaptive approach?

A nice example is the estimation of the derivative of x -> 1 / x at an x near zero:

>>> central_fdm(5, 1, adapt=0)(lambda x: 1 / x, 1e-3) + 1e6
670283.9150891255

>>> central_fdm(5, 1, adapt=1)(lambda x: 1 / x, 1e-3) + 1e6
2.133892849087715e-05

>>> central_fdm(5, 1, adapt=2)(lambda x: 1 / x, 1e-3) + 1e6
5.097826942801476e-07

In general, I have not found much use of having more than two adaptation steps, and adapt=1 should probably be the default. For difficult functions, having a high number of adaptation steps may even prove harmful, because that requires estimating high-order derivatives, which can fail catastrophically for difficult functions.

Furthermore, the cases bound = 0 and eps = 0 should be tested. Having bound > 0 and eps > 0 is absolutely critical. For example,

>>> central_fdm(3, 2, adapt=0)(lambda x: x, 1.0)
0.0

gives 0.0 exactly, and this could introduce division by zero, if we're not careful.

@wesselb
Member

wesselb commented May 19, 2019

This looks really good to me now! Let's see what others have to say.

Another example where adaptation is useful is estimating the derivative of log near 0:

>>> forward_fdm(5, 1, adapt=0)(np.log, 1e-3)
996.0418094849194

>>> forward_fdm(5, 1, adapt=1)(np.log, 1e-3)
999.9999999445137

Member

@willtebbutt willtebbutt left a comment

I really like this. It's going to make our lives much more straightforward :)

I'm happy for this to be merged as is, but it would be good to implement some tests around the log example @wesselb gave, and maybe something involving inverting a slightly awkward matrix.

We should also test that things work as expected with adaptation near the edge of domains. For example, with the log example you really do have to use forward_fdm when working close to zero. You could check that backward_fdm works with x -> log(-x) for numbers just less than zero, etc.
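A hedged sketch of what such edge-of-domain tests might look like, using the constructor syntax shown elsewhere in this thread; the Backward name, the adapt keyword, and the tolerance are assumptions rather than the PR's actual test code:

using FDM, Test

# d/dx log(x) at x = 1e-3 is 1e3; forward differences stay inside the domain.
@test FDM.Forward(5, 1, adapt=1)(log, 1e-3) ≈ 1e3 rtol=1e-6

# d/dx log(-x) at x = -1e-3 is 1/x = -1e3; backward differences stay below zero.
# (Backward is assumed to mirror Forward.)
@test FDM.Backward(5, 1, adapt=1)(x -> log(-x), -1e-3) ≈ -1e3 rtol=1e-6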

A tiny number added to some quantities to ensure that division by 0 does not occur.
"""
const TINY = 1e-20
Member
Maybe we want to replace this constant by eps()? Or have I misunderstood what this is going to be used for?

Member Author

Yeah, we could do that. I was just following what's done in the Python implementation.

Member

The tiny number here should be absolutely small, so I'm not sure eps() makes sense. What we had before, eps(0.0), would be ideal if it were not at the edge of the floating-point range. 1e-20 is a compromise: it is absolutely small compared to the numbers we usually work with, whilst not being so small that it causes issues with the numerics.

Member

@wesselb wesselb May 20, 2019

Perhaps the above comment doesn't really make sense. TINY is supposed to be small compared to any number that we typically encounter, but eps() is only small relative to numbers on the order of magnitude of 1 (or bigger). If we consider 1e-6 to be on the lower end of the sizes of numbers we typically encounter, then eps(1e-6) ≈ 2e-22 should be sufficient, which is where I originally got the 1e-20 from. Does that make more sense? 😬
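For concreteness, these magnitudes can be checked in the REPL (Float64 values):

julia> eps(), eps(1e-6), eps(0.0)
(2.220446049250313e-16, 2.1175823681357508e-22, 5.0e-324)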

eps::Real
bound::Real
step::Real
accuracy::Real
Member

Do we care that the types of eps, bound, etc. are abstract? Is this likely to have performance implications?

Member Author

It does, but this is a mutable type anyway, so it will already not be terribly performant. The problem with having them concrete is that when the History object is constructed, we don't know the types of the values it will end up holding, since this is constructed before we call the FDMethod on a function. The function could return Float32, for example, and we wouldn't necessarily want to promote to Float64. (Though a case could be made to do so.)
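One possible direction, sketched purely as an illustration (the real History type and its construction may differ): parameterize the type so the fields become concrete once the value type is known.

# Hypothetical alternative, not the PR's code: a type parameter makes the fields
# concrete, at the cost of having to pick (or widen) the element type when the
# object is built.
mutable struct ToyHistory{T<:Real}
    eps::T
    bound::T
    step::T
    accuracy::T
end

# Constructed once the element type is known, e.g. from the function's output:
ToyHistory{Float32}(0f0, 0f0, 0f0, 0f0)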

@ararslan
Member Author

Another thing I haven't yet done here is have the current API simply call the new API; the code for both exists and is largely identical.

@ararslan
Member Author

For example,

>>> central_fdm(3, 2, adapt=0)(lambda x: x, 1.0)
0.0

gives 0.0 exactly, and this could introduce division by zero, if we're not careful.

julia> FDM.Central(3, 2, adapt=0)(identity, 1.0)
0.0

julia> central_fdm(3, 2)(identity, 1.0)
7.277897151650305e-7

Hmmmmm.

@wesselb
Member

wesselb commented May 21, 2019

Hmm, interesting. The Python implementation gives me a different result:

>>> central_fdm(3, 2, adapt=0)(lambda x: x, 1.0)
0.0

>>> central_fdm(3, 2, adapt=1)(lambda x: x, 1.0)
0.0

I'll have a closer look at the code.

@ararslan
Member Author

The behavior implemented here matches Python, but what's odd is that the original implementation does not.

@wesselb
Member

wesselb commented May 21, 2019

Oh, I totally missed that the second call uses the old API; didn't have my morning coffee yet...

I think it's due to the condition number:

julia> central_fdm(3, 2; M=1)(x -> x, 1.0)
7.277897151650305e-7

julia> central_fdm(3, 2; M=100)(x -> x, 1.0)
0.0

This change constitutes a near rewrite of method handling.
@ararslan
Member Author

Okay, assuming CI agrees, I think things should be in good shape now. Unfortunately, method handling was almost entirely rewritten. 😐 Also note that I've added a Nonstandard method, which represents a custom grid; otherwise we would lose that functionality with this change. However, we can't do adaptive bound computation on a custom grid.
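As a purely illustrative aside (the actual Nonstandard constructor may look different), the coefficients for an arbitrary user-supplied grid come from the same linear system as for the standard grids, which is why derivative evaluation still works even though adaptation does not:

using LinearAlgebra

# Hypothetical custom grid and derivative order; not the PR's Nonstandard type.
grid = [-0.5, 0.0, 1.5]
q = 1

# Same Vandermonde construction as for a standard grid.
V = [g^i for i in 0:length(grid)-1, g in grid]
rhs = zeros(length(grid)); rhs[q + 1] = factorial(q)
coefs = V \ rhs

h = 1e-4
sum(c * sin(0.0 + g * h) for (c, g) in zip(coefs, grid)) / h^q  # ≈ cos(0.0) = 1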

@ararslan ararslan marked this pull request as ready for review May 27, 2019 21:30
@ararslan ararslan changed the title WIP/RFC: Add adaptive function bound computation Add adaptive function bound computation May 27, 2019
@ararslan
Member Author

UGH. Adaptively computing the bounds actually results in more test failures in ChainRules. I was hoping it would make it more robust to changes in the random seed.

ararslan added 2 commits May 28, 2019 09:55
If it's too big, Bad Things can happen. See e.g. the added test.
@ararslan
Member Author

This no longer destroys Nabla's tests once very minor adjustments are made in Nabla to facilitate this change.

@wesselb
Member

wesselb commented May 29, 2019

This looks very good to me! Happy to merge, if everyone else is so as well.

@wesselb wesselb merged commit e40ed35 into master May 29, 2019
@ararslan ararslan deleted the aa/adapt branch May 29, 2019 18:41