Add adaptive function bound computation #17

Conversation
This introduces a type-based approach that works slightly differently than the current API. Upon construction of a `FDMethod` subtype, the order of the method and the order of the derivative are checked for agreement, and the grid and coefficients are computed and stored. Instances of the types are callable, and doing so produces the value of the derivative. It works recursively when adaptively computing the bound, which results in a better step size and thus a (hopefully) more robust estimate of the derivative.
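The construction-time grid/coefficient computation described above can be sketched in pure Python. This is a hedged illustration, not the PR's Julia code: `fd_coefficients` is a hypothetical helper that solves the Taylor-expansion linear system in exact arithmetic.

```python
from fractions import Fraction
from math import factorial

def fd_coefficients(grid, deriv):
    """Coefficients c_j such that sum_j c_j * f(x + g_j * h) / h**deriv
    approximates the deriv-th derivative of f at x on the given grid."""
    n = len(grid)
    # The order of the method (number of grid points) must exceed the
    # order of the derivative -- the agreement check mentioned above.
    assert n > deriv, "method order must exceed derivative order"
    # Taylor expansion yields the system: sum_j c_j * g_j**i / i! = [i == deriv].
    aug = [[Fraction(g) ** i / factorial(i) for g in grid]
           + [Fraction(1 if i == deriv else 0)] for i in range(n)]
    # Gauss-Jordan elimination in exact rational arithmetic.
    for col in range(n):
        piv = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

# 3-point central method for the second derivative: the classic [1, -2, 1].
coeffs = fd_coefficients([-1, 0, 1], 2)
```

Storing these coefficients at construction, as the PR does, means each call of a method instance only needs the function evaluations and a step size.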
Codecov Report

@@            Coverage Diff            @@
##           master     #17     +/-   ##
========================================
+ Coverage    92.5%     95%    +2.5%
========================================
  Files           4       4
  Lines          80     120      +40
========================================
+ Hits           74     114      +40
  Misses          6       6

Continue to review full report at Codecov.
I've marked this as a draft because I'd like feedback in general, but especially on:
src/methods.jl (Outdated)

# Adaptively compute the bound on the function and derivative values, if applicable.
if 0 < adapt < p
    newm = (M.name.wrapper)(p + 1, q)
I think `bound` should be set to an upper bound on the `order`th (i.e. `p`th) derivative, so I believe this should be `newm = (M.name.wrapper)(p + 1, p)`. Here `p + 1` comes from the fact that the order must be strictly greater than the derivative to estimate (which also means that we have a choice here). Furthermore, `order` in our FDM setup refers to the number of grid points, which is easily interpretable but also one off from what "order" typically means. Perhaps we should adjust this.
src/methods.jl (Outdated)

if 0 < adapt < p
    newm = (M.name.wrapper)(p + 1, q)
    dfdx, = fdm(newm, f, x; eps=eps, bound=bound, adapt=(adapt - 1))
    bound = maxabs(dfdx)
We have to be careful here to ensure that `bound > 0`. A simple solution is to add a tiny number, e.g. `1e-20`. (Note that the case `bound = 0` can easily occur, e.g. when adapting a `p = 2`, `q = 1` method for a linear function.)
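A minimal Python sketch of how the zero bound arises; the helper and the constant here mirror the discussion but are illustrative, not the package's code.

```python
TINY = 1e-20  # absolute floor suggested in the review

def second_derivative_estimate(f, x, h=1e-4):
    # 3-point central estimate of f''(x)
    return (f(x + h) - 2 * f(x) + f(x - h)) / h ** 2

# Adapting a (p = 2, q = 1) method bounds the step using max |f''| near x,
# which vanishes (up to rounding) for a linear function:
bound = abs(second_derivative_estimate(lambda x: 3 * x + 2, 1.0))
safe_bound = bound + TINY  # guaranteed strictly positive
```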
First of all, this looks great!!
I have added some comments regarding the implementation. It might also be worthwhile to implement the condition number mechanism to make the procedure even more robust. I think robustness of the estimation is an important feature.
I think the new API with the types is pretty sensible and could replace the current API. We might still want to support the old API by redirecting to the new API to maintain compatibility.
Yeah, I'm not sure if having a separate type (i.e.
A nice example is the estimation of the derivative of `1 / x` at `x = 1e-3`:

>>> central_fdm(5, 1, adapt=0)(lambda x: 1 / x, 1e-3) + 1e6
670283.9150891255
>>> central_fdm(5, 1, adapt=1)(lambda x: 1 / x, 1e-3) + 1e6
2.133892849087715e-05
>>> central_fdm(5, 1, adapt=2)(lambda x: 1 / x, 1e-3) + 1e6
5.097826942801476e-07

In general, I have not found much use for more than two adaptation steps. Furthermore,

>>> central_fdm(3, 2, adapt=0)(lambda x: x, 1.0)
0.0

gives an exact zero.
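A rough sketch of why the adaptive bound matters for `1 / x` near `x = 1e-3`, using an illustrative rule-of-thumb step size for a 3-point central difference (not the package's actual step-size rule):

```python
def central_diff(f, x, h):
    # 3-point central difference for the first derivative
    return (f(x + h) - f(x - h)) / (2 * h)

f = lambda x: 1 / x
x, eps = 1e-3, 2.0 ** -52
exact = -1 / x ** 2  # f'(x) = -1/x**2 = -1e6

# Balance truncation (grows with a bound M on |f'''|) against roundoff
# (~ eps * |f(x)| / h): roughly h ~ (eps * |f(x)| / M) ** (1/3).
h_default = (eps * abs(f(x)) / 1.0) ** (1 / 3)           # pretends M ~ 1
h_adapted = (eps * abs(f(x)) / (6 / x ** 4)) ** (1 / 3)  # M = |f'''(x)| = 6/x**4

err_default = abs(central_diff(f, x, h_default) - exact)
err_adapted = abs(central_diff(f, x, h_adapted) - exact)
# err_adapted beats err_default by several orders of magnitude
```

The adaptation step estimates exactly such a bound on a higher derivative, which is why it shrinks the step so effectively here.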
This looks really good to me now! Let's see what others have to say.

Another example where adaptation is useful is estimating the derivative of `np.log` at a small argument:

>>> forward_fdm(5, 1, adapt=0)(np.log, 1e-3)
996.0418094849194
>>> forward_fdm(5, 1, adapt=1)(np.log, 1e-3)
999.9999999445137
I really like this. It's going to make our lives much more straightforward :)

I'm happy for this to be merged as is, but it would be good to implement some tests around the `log` example @wesselb gave, and maybe something involving inverting a slightly awkward matrix.

We should also test that things work as expected with adaptation near the edge of domains. E.g. with the `log` example, you really do have to use `forward_fdm` when working close to zero. You could check that `backward_fdm` works with `-log` for numbers just less than zero etc.
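The domain-edge point can be illustrated with hand-written difference quotients in Python (a sketch, not the package's `forward_fdm`/`central_fdm`):

```python
import math

def forward_diff(f, x, h):
    # one-sided quotient: only evaluates f at x and x + h
    return (f(x + h) - f(x)) / h

def central_diff(f, x, h):
    # two-sided quotient: needs f on both sides of x
    return (f(x + h) - f(x - h)) / (2 * h)

x, h = 1e-3, 2e-3
fwd = forward_diff(math.log, x, h)  # stays inside (0, inf)
try:
    central_diff(math.log, x, h)    # evaluates log(1e-3 - 2e-3) = log(-1e-3)
except ValueError:
    print("central step left the domain of log")
```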
"""
A tiny number added to some quantities to ensure that division by 0 does not occur.
"""
const TINY = 1e-20
Maybe we want to replace this constant by `eps()`? Or have I misunderstood what this is going to be used for?
Yeah, we could do that. I was just following what's done in the Python implementation.
The tiny number here should be absolutely small, so I'm not sure `eps()` makes sense. What we had before, `eps(0.0)`, would be ideal if it were not at the edge of the floating-point range. `1e-20` is a compromise: something absolutely small compared to what we usually work with, while not so small that it causes issues with the numerics.
Perhaps the above comment doesn't really make sense. `TINY` is supposed to be small compared to any number that we typically encounter, but `eps()` is only small relative to numbers on the order of magnitude of `1` (or bigger). If we consider `1e-6` the lower end of the sizes of numbers we typically encounter, then `eps(1e-6) = 2e-22` should be sufficient, which is where I originally got the `1e-20` from. Does that make more sense? 😬
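For reference, Python's `math.ulp` (3.9+) computes the same float spacing as Julia's `eps`, which makes the magnitudes in this discussion easy to check:

```python
import math  # math.ulp requires Python 3.9+

spacing_at_one = math.ulp(1.0)  # ~2.2e-16, Julia's eps()
spacing_small = math.ulp(1e-6)  # ~2.1e-22, Julia's eps(1e-6), below TINY = 1e-20
```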
eps::Real
bound::Real
step::Real
accuracy::Real
Do we care that the types of `eps`, `bound`, etc. are abstract? Does this likely have performance implications?
It does, but this is a mutable type anyway, so it will already not be terribly performant. The problem with making them concrete is that when the `History` object is constructed, we don't know the types of the values it will end up holding, since it is constructed before we call the `FDMethod` on a function. The function could return `Float32`, for example, and we wouldn't necessarily want to promote to `Float64`. (Though a case could be made to do so.)
Another thing I haven't yet done here is have the current API simply call the new API; the code for both exists and is largely identical.
julia> FDM.Central(3, 2, adapt=0)(identity, 1.0)
0.0

julia> central_fdm(3, 2)(identity, 1.0)
7.277897151650305e-7

Hmmmmm.
Hmm, interesting. The Python implementation gives me a different result:

>>> central_fdm(3, 2, adapt=0)(lambda x: x, 1.0)
0.0
>>> central_fdm(3, 2, adapt=1)(lambda x: x, 1.0)
0.0

I'll have a closer look at the code.
The behavior implemented here matches Python, but what's odd is that the original implementation does not.
Oh, I totally missed that the second call uses the old API; didn't have my morning coffee yet... I think it's due to the condition number:

julia> central_fdm(3, 2; M=1)(x -> x, 1.0)
7.277897151650305e-7

julia> central_fdm(3, 2; M=100)(x -> x, 1.0)
0.0
This change constitutes a near rewrite of method handling.
Okay, assuming CI agrees, I think things should be in good shape now. Unfortunately, method handling was almost entirely rewritten. 😐 Also note that I've added a
UGH. Adaptively computing the bounds actually results in more test failures in ChainRules. I was hoping it would make it more robust to changes in the random seed.
If it's too big, Bad Things can happen. See e.g. the added test.
This no longer destroys Nabla's tests once very minor adjustments are made in Nabla to facilitate this change.
This looks very good to me! Happy to merge, if everyone else is as well.
"Why types?" you may be asking. It just seemed like an efficient way of expressing things and storing intermediate values.
"Where did this adaptive thing come from?" From none other than Wessel himself: https://github.com/wesselb/fdm.