further fix for Float32 bisect infinite loop #10
Conversation
Codecov Report
```
@@           Coverage Diff           @@
##           master      #10   +/-   ##
=======================================
  Coverage   83.59%   83.59%
=======================================
  Files           5        5
  Lines         445      445
=======================================
  Hits          372      372
  Misses         73       73
```
The linesearch is not supposed to keep reducing the width of the interval; it should stop as soon as a point satisfies …
By which I wanted to say, yes …
My case is reproducible but would be kind of non-trivial to get you set up running it. The gradient is correct but slightly inaccurate. It's based on a target function which involves solving an ODE, with gradients of this function obtained by solving the "gradient ODE" rather than taking e.g. an AD gradient through the original ODE solver (similar to what arises in neural ordinary differential equations), so it's only as accurate as these ODE solves, which are of course only accurate to some tolerance. Here's a trace of the debug statement there for this run; not sure if there's anything in it helpful for you to say anything beyond what you've already said (which makes sense): …
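(A toy sketch of the "gradient ODE" idea, under assumed names and a made-up scalar model, not the commenter's actual setup: for dy/dt = f(y, θ), the sensitivity s = ∂y/∂θ satisfies ds/dt = (∂f/∂y)·s + ∂f/∂θ and is integrated alongside y, so the resulting gradient is only as accurate as the ODE solve itself.)

```julia
# Toy model (assumed, for illustration): dy/dt = -θ*y, so the sensitivity
# s = ∂y/∂θ satisfies ds/dt = -θ*s - y, with y(0) = y0 and s(0) = 0.
# A fixed-step RK4 stands in for a tolerance-controlled ODE solver.
function rk4(f, u0, tspan, nsteps)
    t, u = first(tspan), u0
    h = (last(tspan) - first(tspan)) / nsteps
    for _ in 1:nsteps
        k1 = f(u, t)
        k2 = f(u .+ (h / 2) .* k1, t + h / 2)
        k3 = f(u .+ (h / 2) .* k2, t + h / 2)
        k4 = f(u .+ h .* k3, t + h)
        u = u .+ (h / 6) .* (k1 .+ 2 .* k2 .+ 2 .* k3 .+ k4)
        t += h
    end
    return u
end

θ, y0, T = 0.8, 1.0, 2.0
aug(u, t) = [-θ * u[1], -θ * u[2] - u[1]]   # (dy/dt, ds/dt) stacked together
y, s = rk4(aug, [y0, 0.0], (0.0, T), 100)
@show 2y * s                                 # d/dθ of the loss L = y(T)^2
@show -2T * y0^2 * exp(-2θ * T)              # analytic value, for comparison
```

Coarsening the step count (or, in a real solver, loosening the tolerance) makes the two printed values drift apart, which is exactly the kind of gradient noise being discussed here.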
src/linesearches.jl (outdated diff)
```diff
@@ -102,7 +102,7 @@ function bisect(iter::HagerZhangLineSearchIterator, a::LineSearchPoint, b::LineSearchPoint)
     fmax = p₀.f + ϵ
     numfg = 0
     while true
-        if (b.α - a.α) <= eps(typeof(b.α))
+        if (b.α - a.α) <= eps(b.α)
```
I think this is even better, because if `b.α > 1` then `eps(b.α) >= eps(Float32)`. (I did run into a case where I needed this; otherwise I still got stuck in an infinite loop.)
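(For illustration only, not from the thread: `eps` scales with the magnitude of its argument, so the per-value tolerance is looser above 1 and much tighter below it.)

```julia
julia> eps(Float32)     # eps of the type == eps at 1.0f0
1.1920929f-7

julia> eps(2.5f0)       # one ULP at 2.5: twice as large
2.3841858f-7

julia> eps(1.0f-3)      # one ULP at 1e-3: three orders of magnitude smaller
1.1641532f-10
```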
Yes, but it also works the other way around: if `b.α = 1e-3`, then `eps(b.α)` is much smaller. Not sure if that would be a problem.
It didn't appear to be a problem in either of my original cases, where it caught the infinite loop. I think it makes some sense, since `(b.α - a.α) <= eps(b.α)` basically tells you there's no way to make `b` any closer to `a` given floating-point precision and where `b` currently is, so stop trying.
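(Again purely illustrative, not from the thread: once the bracket is at most one ULP of `b.α` wide, the endpoints are adjacent floats and the bisection midpoint rounds back onto one of them.)

```julia
a = 2.0f0
b = nextfloat(a)       # adjacent Float32 values: nothing representable in between
@show b - a            # 2.3841858f-7, i.e. eps(b)
@show b - a <= eps(b)  # true: the proposed check fires
@show (a + b) / 2 == a # true: the midpoint rounds back onto an endpoint
```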
Yes, but if `b.α` is already smaller than 1, it is unlikely that differences in the value of `α` smaller than `eps(typeof(b.α))` will still be accurately reflected in the differences between the function value and gradient computed at those two points. So maybe something like `eps(max(b.α, one(b.α)))` is the best choice.
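(Illustrative comparison, not from the thread: the clamped form falls back to the type's `eps` below 1 and tracks the per-value `eps` above it.)

```julia
for α in (1.0f-3, 0.5f0, 2.5f0, 100.0f0)
    @show α eps(typeof(α)) eps(α) eps(max(α, one(α)))
end
```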
That seems like it would catch everything I ran into as well, and in fact break even earlier, so I'm happy to update the PR.
Actually, let me explicitly check that this is the case for the errors I had before, which I can do tomorrow.
So I assume that this is certainly due to inaccuracies, not because you have a highly irregular function and the peak is actually there. The values seem very large; is there a way to more properly normalize your function? There is one parameter in the linesearch, …
Thanks, very helpful description!
That's interesting, I hadn't appreciated that normalization here could matter. My function is basically a χ² in fairly high dimensions, hence the large values. I will play around with this.
Great, but the best solution would be to make sure that the gradients and function values are sufficiently accurate that things like the above cannot happen. To make it more explicit: the current values are saying that the directional derivative along the line is -1825008.250000 at point a and -1810365.500000 at point b, whereas a finite difference tells you that the derivative along this line is approximately …
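(A sketch of that kind of consistency check, with hypothetical helper names that are not part of the package: compare the analytic directional derivative `⟨∇f(x + α·d), d⟩` against a central finite difference of `f` along the line.)

```julia
using LinearAlgebra: dot

# Hypothetical sketch; `fg` is assumed to return (f(x), ∇f(x)), and `x`, `d`
# are the current point and search direction. Names are illustrative only.
function check_directional_derivative(fg, x, d, α; h = sqrt(eps(typeof(α))))
    _, g = fg(x .+ α .* d)
    analytic = dot(g, d)          # ⟨∇f(x + α d), d⟩
    fp, _ = fg(x .+ (α + h) .* d)
    fm, _ = fg(x .+ (α - h) .* d)
    finite = (fp - fm) / (2h)     # central difference along the line
    return analytic, finite       # should agree up to O(h²) plus gradient noise
end
```

If the two returned values disagree by more than the expected truncation error, the gradient is too inaccurate for the linesearch to make sense of widths anywhere near machine precision.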
Ok, I went with your suggestion and confirmed it works on my original problem. Ready from my end. You're definitely right about the gradient accuracy; I did crank up my ODE accuracy and it helped. In any case, this fix was useful to have anyway: especially when I was trying to nail down the extremum as much as possible with lots of iterations, it would infinite-loop near the end.
Ok thanks. I'll merge. Just one final warning that relying on this branch to terminate your linesearch is rather fragile and indicative of an underlying problem!
I just ran into a case that was infinite looping because `(b.α - a.α)` exactly equaled `eps(Float32)` and each successive iteration didn't change `b.α` or `a.α`. This seems to fix it.

As an aside, out of curiosity: is there one of the `HagerZhangLineSearch` parameters I can change to make the linesearch not try so hard to get a step size to machine precision (which I doubt I really need)? I'm not really familiar with the algorithm, so I don't know what its various parameters do.
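(A minimal standalone illustration of this kind of stall, not the package's actual `bisect` code: around `α ≈ 2`, a one-ULP bracket is already twice `eps(Float32)` wide, so a type-based width check never fires while the midpoint keeps rounding back onto an endpoint.)

```julia
let a = 2.0f0, b = nextfloat(a)   # one-ULP bracket: b - a == 2 * eps(Float32)
    for i in 1:5
        (b - a) <= eps(Float32) && break   # old type-based check: never true here
        mid = (a + b) / 2                  # rounds back onto a
        # in the real bisect, function-value tests pick which endpoint moves;
        # when mid coincides with an endpoint, the update changes nothing
        a = mid
        @show i, b - a                     # width stuck at 2.3841858f-7 forever
    end
end
```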