Basic finite differences #666
Conversation
Codecov Report

@@             Coverage Diff             @@
##           develop     #666       +/-   ##
============================================
- Coverage    87.71%   34.83%   -52.88%
============================================
  Files           95       96        +1
  Lines         6136     6350      +214
============================================
- Hits          5382     2212     -3170
- Misses         754     4138     +3384

Continue to review the full report at Codecov.
Co-authored-by: Paul Jonas Jost <70631928+PaulJonasJost@users.noreply.github.com>
I currently don't see an example; might be nice to have something like that? :)
One could make one, but I think it would be rather artificial at the moment and would be refactored anyway once we decide how to structure the pyPESTO tutorial, so I would rather wait until then with an example?
Approved, looks good
return request.param


def test_fds(fd_method):
Nice test, does exactly what it should!
delta_fun:
    FD step sizes for gradient and Hessian.
    Can be either a float, or a :class:`np.ndarray` of shape (n_par,)
    for different step sizes for different coordinates.
delta_res:
    FD step sizes for residual sensitivities.
    Similar to `delta_fun`.
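A hedged sketch of what these options might look like in use; the wrapper location `pypesto.objective.FD` and its constructor signature are assumed from this docstring, not verified here:

import numpy as np
import pypesto

# hypothetical inner objective; FD signature assumed from the docstring above
obj = pypesto.Objective(fun=lambda x: np.sum(x**2))

# one scalar step size for all coordinates
fd_scalar = pypesto.objective.FD(obj, delta_fun=1e-6)

# per-coordinate step sizes, shape (n_par,), here for a 2-parameter problem
fd_vector = pypesto.objective.FD(
    obj, delta_fun=np.array([1e-6, 1e-4]), delta_res=1e-6,
)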
Why do we have a distinct step size for sres, but not for the Hessian? If we already open the box of allowing different step sizes for different derivatives, then also provide one for "Hessian via gradients"... right?
agreed, adding one
    FD step sizes for residual sensitivities.
    Similar to `delta_fun`.
method:
    Method to calculate FDs. Currently, only "center" is supported.
Overall for good reasons... I would even omit this option entirely. Central FDs are typically (and in theory) much more accurate... I don't see a good reason for allowing only forward or backward ones...
hess[ix1, ix2] = hess[ix2, ix1] = \
    (fpp - fpm - fmp + fmm) / (delta1_val * delta2_val)
There's also a particular form of the 2nd order FD for the diagonal entries of the Hessian, which should however be equivalent to this here... Probably the particular form is just a simplification that comes out after plugging the values into this general form. Still, it might make sense to double-check. At least, I would say this here is correct...
note the diagonal is computed a few lines above
ah, indeed... Sorry, seems like I missed that one...
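For reference, a sketch of the reduced diagonal stencil being referred to (variable names assumed): when both half-steps of the general four-point formula above point along the same coordinate with the same size, `fpm` and `fmp` both become `fval`, and the formula collapses to the standard central second difference:

# central second difference for a diagonal Hessian entry (sketch)
hess[ix, ix] = (f_fval(x + delta) - 2 * fval + f_fval(x - delta)) / delta_val**2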
Would also agree to add an example. However, also fine with adding one in the tutorial. In this case, please open an issue so that this won't be forgotten! :)
Added an example in the docstring now, at least ...
Looks fine, only minor comments :) (I am a bit curious about the "inconsistency" in the Hessian between mixed and non-mixed 2nd derivatives...)
    or 1 in sensi_orders and not self.has_sres
)
Suggested change:
-     or 1 in sensi_orders and not self.has_sres
- )
+     or 1 in sensi_orders and not self.has_sres
+     or 2 in sensi_orders
+ )
What if Hessian information is asked for in MODE_RES? (Even though current residual-based optimizers might not need that info.)
Not aware of any optimizer that does, as most stick to the linear structure ... but it could of course be added then.
if isinstance(delta_fun, str) or isinstance(delta_res, str):
    raise NotImplementedError(
        "Adaptive FD step sizes are not implemented yet.",
I don't know why it has recently become sexy to put a comma at the end of everything (my taste remains old-school there... :D), but if we do that, then I would do it consistently (e.g. two lines below as well...)
I mean in f"Method must be one of {methods}"
Haha, it is recommended because it allows modifying fewer lines when e.g. adding additional arguments. True, adding one below.
    Derivative method for the gradient.
hess:
    Derivative method for the Hessian.
sres:
    Derivative method for the residual sensitivities.
tbh I do not completely get from that what e.g. `grad=True` or `grad=None` would mean. Where would I get the gradient from in these cases? The objective or FD? And also, why is the "Derivative method" a boolean value? :D
`{True, False, None}` here is a ternary variable: `None` means whatever is available, `True` means always FD, `False` never. Documented that a few lines above, will add hints.
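A sketch of how such a ternary flag could be resolved at evaluation time; the method name `use_fd_for_grad` and the attribute `self.obj.has_grad` are made up for illustration, not taken from the PR:

def use_fd_for_grad(self) -> bool:
    # None: use the inner objective's gradient if it has one, else fall back to FD
    if self.grad is None:
        return not self.obj.has_grad
    # True: always approximate via FD; False: never provide a gradient via FD
    return self.grad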
# This is the main method to overwrite from the base class; it handles
# and delegates the actual objective evaluation.
-> docstring?
Would rather not use a docstring here, as the docstring should be the one from the base class; this is just a hint to the programmer.
elif self.method == FD.FORWARD:
    f2p = f_fval(x + 2 * delta1)
    fc = f_fval(x + delta1)
    f2m = fval
elif self.method == FD.BACKWARD:
    f2p = fval
    fc = f_fval(x - delta1)
    f2m = f_fval(x - 2 * delta1)
I am not so sure about that; doesn't that essentially approximate the 2nd derivative at x + delta? But I guess that is what the forward/backward mode does for the 1st derivative as well, so it should be fine...
It is (f'(x) - f'(x-d)) / d \approx ((f(x) - f(x-d)) / d - (f(x-d) - f(x-2d)) / d) / d = (f(x) - 2 f(x-d) + f(x-2d)) / d^2, so yes, should be correct.
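A quick numerical sanity check of that backward stencil; this is a standalone sketch, not part of the PR:

import numpy as np

# backward second difference: (f(x) - 2 f(x-d) + f(x-2d)) / d**2 ≈ f''(x)
f = np.cos
x, d = 0.3, 1e-4
approx = (f(x) - 2 * f(x - d) + f(x - 2 * d)) / d**2
exact = -np.cos(x)  # f''(x) for f = cos
print(approx, exact)  # agree up to roughly O(d)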
if self.method == FD.CENTRAL:
    fpp = f_fval(x + delta1 / 2 + delta2 / 2)
    fpm = f_fval(x + delta1 / 2 - delta2 / 2)
    fmp = f_fval(x - delta1 / 2 + delta2 / 2)
    fmm = f_fval(x - delta1 / 2 - delta2 / 2)
elif self.method == FD.FORWARD:
    fpp = f_fval(x + delta1 + delta2)
    fpm = f_fval(x + delta1 + 0)
    fmp = f_fval(x + 0 + delta2)
    fmm = fval
elif self.method == FD.BACKWARD:
    fpp = fval
    fpm = f_fval(x + 0 - delta2)
    fmp = f_fval(x - delta1 + 0)
    fmm = f_fval(x - delta1 - delta2)
I mean, there is the possibility to take `f_fval(x + delta1 + delta2)` here and elsewhere (so drop the `/2` everywhere) and instead go for `(fpp - fpm - fmp + fmm) / (4 * delta1_val * delta2_val)`. This would be consistent with the diagonal elements...?
Even though now I am confused why there is no `4 *` in the diagonals...?
The step size should always be delta, i.e. from x - delta/2 to x + delta/2 (central), or from x to x + delta (forward). I think that is consistent ...
I think the MATLAB implementation used a step size of delta for the off-diagonals and delta/2 on the diagonals, which may not be the most consistent ... even though one would save a few evaluations when simultaneously computing gradients.
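To illustrate the equivalence being discussed, a small standalone sketch (the test function and names are made up): the half-step stencil with denominator delta1*delta2, as in this PR, and the full-step stencil with denominator 4*delta1*delta2 approximate the same mixed partial.

import numpy as np

def f(x):
    return np.sin(x[0]) * np.cos(x[1])

x = np.array([0.4, 0.7])
d1 = np.array([1e-3, 0.0])   # step along coordinate 0
d2 = np.array([0.0, 1e-3])   # step along coordinate 1
d1_val = d2_val = 1e-3

# half steps, denominator d1*d2 (as in this PR)
half = (f(x + d1/2 + d2/2) - f(x + d1/2 - d2/2)
        - f(x - d1/2 + d2/2) + f(x - d1/2 - d2/2)) / (d1_val * d2_val)

# full steps, denominator 4*d1*d2 (the alternative suggested above)
full = (f(x + d1 + d2) - f(x + d1 - d2)
        - f(x - d1 + d2) + f(x - d1 - d2)) / (4 * d1_val * d2_val)

exact = -np.cos(x[0]) * np.sin(x[1])  # analytic d^2 f / (dx0 dx1)
print(half, full, exact)  # all three agree closely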
    Boolean indicating whether mode is supported
    """
    raise NotImplementedError()
if (
I was wondering about that in another PR, but shouldn't we implement something that checks that the `sensi_orders` are even valid? Right now, if we would call `check_sensi_orders(sensi_orders=(3,))`, it would return `True`. How about adding an `if max(sensi_orders) > 2: raise ValueError`?
would make sense
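A sketch of that suggested check; the method signature is assumed from the discussion, and the error message is illustrative:

def check_sensi_orders(self, sensi_orders, mode) -> bool:
    # reject sensitivity orders beyond the Hessian up front
    if max(sensi_orders) > 2:
        raise ValueError(
            f"Sensitivity orders {sensi_orders} are not supported.",
        )
    ...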
Implementation of central finite differences as an objective class. It can be wrapped around existing objectives to provide finite difference approximations to derivatives (grad, hess, sres).
Not implemented yet:
Choice of method (forward, backward; only centered so far). Closes #18.
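A hedged usage sketch of the wrapper described above; the class location `pypesto.objective.FD` and the call convention are assumptions based on this thread, not verified against the merged code:

import numpy as np
import pypesto

# plain objective without analytic derivatives
obj = pypesto.Objective(fun=lambda x: np.sum(x**2))

# wrap it so the gradient comes from central finite differences
fd_obj = pypesto.objective.FD(obj, delta_fun=1e-6)

fval, grad = fd_obj(np.ones(3), sensi_orders=(0, 1))
print(fval, grad)  # grad should be close to 2 * x = [2, 2, 2]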