further fix for Float32 bisect infinite loop #10

Merged
merged 3 commits into master, Nov 25, 2020

Conversation

marius311
Contributor

I just ran into a case that was looping infinitely because (b.α - a.α) exactly equaled eps(Float32) and successive iterations didn't change b.α or a.α. This seems to fix it.
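
For reference, a minimal Float32 sketch (with an arbitrarily chosen α near the values in the trace further down) of why a strict < test against eps can never fire once a and b are adjacent floats:

```julia
# Minimal sketch with hypothetical values: once a and b are adjacent Float32 numbers,
# their gap equals eps and the bisection midpoint rounds back onto an endpoint.
a = 1.222663f0
b = nextfloat(a)              # b - a == eps(Float32) at this magnitude

(b - a) <  eps(Float32)       # false -> a strict `<` termination test never fires
(b - a) == eps(Float32)       # true  -> a `<=` test does fire
c = (a + b) / 2               # the midpoint of two adjacent floats
c == a || c == b              # true  -> the interval can never shrink further
```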

As an aside, out of curiosity: is there a HagerZhangLineSearch parameter I can change so the linesearch doesn't try so hard to resolve the step size to machine precision (which I doubt I really need)? I'm not familiar with the algorithm, so I don't know what its various parameters do.

@codecov-io

codecov-io commented Nov 23, 2020

Codecov Report

Merging #10 (71dbe75) into master (e6e0a56) will not change coverage.
The diff coverage is 100.00%.


@@           Coverage Diff           @@
##           master      #10   +/-   ##
=======================================
  Coverage   83.59%   83.59%           
=======================================
  Files           5        5           
  Lines         445      445           
=======================================
  Hits          372      372           
  Misses         73       73           
Impacted Files          Coverage Δ
src/linesearches.jl     86.62% <100.00%> (ø)


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@Jutho
Owner

Jutho commented Nov 23, 2020

The linesearch is not supposed to keep reducing the width of the interval once it satisfies b.α - a.α <= eps(typeof(b.α)). The fact that this case just returns instead of erroring is indeed because I also see it happen sometimes; however, that is during development, and the cause is almost always that my gradient is not correct or accurate. If you have a reproducible test case where the gradient is correct and this still happens, I would be more than happy to take a look.

@Jutho
Owner

Jutho commented Nov 23, 2020

By which I wanted to say: yes, <= is better than <; it's just that this case should not be encountered in a typical application of these methods.

@marius311
Contributor Author

My case is reproducible, but it would be non-trivial to get you set up to run it. The gradient is correct but slightly inaccurate. It's based on a target function that involves solving an ODE, with gradients obtained by solving the "gradient ODE" rather than, e.g., taking an AD gradient through the original ODE solver (similar to what arises in neural ordinary differential equations), so it's only as accurate as those ODE solves, which are of course only accurate to some tolerance.

Here's a trace of the debug statement for this run; I'm not sure there's anything in it that would let you say more than you already have (which makes sense):

Linesearch bisect: [a, b] = [1.000000, 3.000000], b-a = 2.000000, dϕᵃ = -318039.656250, dϕᵇ = -7621403213824.000000, (ϕᵇ - ϕᵃ) = 993761.000000
↪︎ c = 2.000000, dϕᶜ = -697125760.000000, ϕᶜ - ϕᵃ = 4349.000000, wolfe = 0, approxwolfe = 0
Linesearch bisect: [a, b] = [1.000000, 2.000000], b-a = 1.000000, dϕᵃ = -318039.656250, dϕᵇ = -697125760.000000, (ϕᵇ - ϕᵃ) = 4349.000000
↪︎ c = 1.500000, dϕᶜ = -15467180.000000, ϕᶜ - ϕᵃ = 530.750000, wolfe = 0, approxwolfe = 0
Linesearch bisect: [a, b] = [1.000000, 1.500000], b-a = 0.500000, dϕᵃ = -318039.656250, dϕᵇ = -15467180.000000, (ϕᵇ - ϕᵃ) = 530.750000
↪︎ c = 1.250000, dϕᶜ = -2234933.500000, ϕᶜ - ϕᵃ = 214.500000, wolfe = 0, approxwolfe = 0
Linesearch bisect: [a, b] = [1.000000, 1.250000], b-a = 0.250000, dϕᵃ = -318039.656250, dϕᵇ = -2234933.500000, (ϕᵇ - ϕᵃ) = 214.500000
↪︎ c = 1.125000, dϕᶜ = -846204.187500, ϕᶜ - ϕᵃ = 131.250000, wolfe = 0, approxwolfe = 0
Linesearch bisect: [a, b] = [1.125000, 1.250000], b-a = 0.125000, dϕᵃ = -846204.187500, dϕᵇ = -2234933.500000, (ϕᵇ - ϕᵃ) = 83.250000
↪︎ c = 1.187500, dϕᶜ = -1388984.125000, ϕᶜ - ϕᵃ = 5.250000, wolfe = 0, approxwolfe = 0
Linesearch bisect: [a, b] = [1.187500, 1.250000], b-a = 0.062500, dϕᵃ = -1388984.125000, dϕᵇ = -2234933.500000, (ϕᵇ - ϕᵃ) = 78.000000
↪︎ c = 1.218750, dϕᶜ = -1759182.875000, ϕᶜ - ϕᵃ = 2.500000, wolfe = 0, approxwolfe = 0
Linesearch bisect: [a, b] = [1.218750, 1.250000], b-a = 0.031250, dϕᵃ = -1759182.875000, dϕᵇ = -2234933.500000, (ϕᵇ - ϕᵃ) = 75.500000
↪︎ c = 1.234375, dϕᶜ = -1985041.875000, ϕᶜ - ϕᵃ = 64.250000, wolfe = 0, approxwolfe = 0
Linesearch bisect: [a, b] = [1.218750, 1.234375], b-a = 0.015625, dϕᵃ = -1759182.875000, dϕᵇ = -1985041.875000, (ϕᵇ - ϕᵃ) = 64.250000
↪︎ c = 1.226563, dϕᶜ = -1867974.000000, ϕᶜ - ϕᵃ = 48.250000, wolfe = 0, approxwolfe = 0
Linesearch bisect: [a, b] = [1.218750, 1.226563], b-a = 0.007813, dϕᵃ = -1759182.875000, dϕᵇ = -1867974.000000, (ϕᵇ - ϕᵃ) = 48.250000
↪︎ c = 1.222656, dϕᶜ = -1817276.875000, ϕᶜ - ϕᵃ = -16.750000, wolfe = 0, approxwolfe = 0
Linesearch bisect: [a, b] = [1.222656, 1.226563], b-a = 0.003906, dϕᵃ = -1817276.875000, dϕᵇ = -1867974.000000, (ϕᵇ - ϕᵃ) = 65.000000
↪︎ c = 1.224609, dϕᶜ = -1844881.375000, ϕᶜ - ϕᵃ = 49.250000, wolfe = 0, approxwolfe = 0
Linesearch bisect: [a, b] = [1.222656, 1.224609], b-a = 0.001953, dϕᵃ = -1817276.875000, dϕᵇ = -1844881.375000, (ϕᵇ - ϕᵃ) = 49.250000
↪︎ c = 1.223633, dϕᶜ = -1847311.750000, ϕᶜ - ϕᵃ = 59.250000, wolfe = 0, approxwolfe = 0
Linesearch bisect: [a, b] = [1.222656, 1.223633], b-a = 0.000977, dϕᵃ = -1817276.875000, dϕᵇ = -1847311.750000, (ϕᵇ - ϕᵃ) = 59.250000
↪︎ c = 1.223145, dϕᶜ = -1829651.000000, ϕᶜ - ϕᵃ = 104.750000, wolfe = 0, approxwolfe = 0
Linesearch bisect: [a, b] = [1.222656, 1.223145], b-a = 0.000488, dϕᵃ = -1817276.875000, dϕᵇ = -1829651.000000, (ϕᵇ - ϕᵃ) = 104.750000
↪︎ c = 1.222900, dϕᶜ = -1809481.125000, ϕᶜ - ϕᵃ = 23.250000, wolfe = 0, approxwolfe = 0
Linesearch bisect: [a, b] = [1.222656, 1.222900], b-a = 0.000244, dϕᵃ = -1817276.875000, dϕᵇ = -1809481.125000, (ϕᵇ - ϕᵃ) = 23.250000
↪︎ c = 1.222778, dϕᶜ = -1818077.500000, ϕᶜ - ϕᵃ = 34.250000, wolfe = 0, approxwolfe = 0
Linesearch bisect: [a, b] = [1.222656, 1.222778], b-a = 0.000122, dϕᵃ = -1817276.875000, dϕᵇ = -1818077.500000, (ϕᵇ - ϕᵃ) = 34.250000
↪︎ c = 1.222717, dϕᶜ = -1821230.000000, ϕᶜ - ϕᵃ = 55.500000, wolfe = 0, approxwolfe = 0
Linesearch bisect: [a, b] = [1.222656, 1.222717], b-a = 0.000061, dϕᵃ = -1817276.875000, dϕᵇ = -1821230.000000, (ϕᵇ - ϕᵃ) = 55.500000
↪︎ c = 1.222687, dϕᶜ = -1828160.500000, ϕᶜ - ϕᵃ = 70.500000, wolfe = 0, approxwolfe = 0
Linesearch bisect: [a, b] = [1.222656, 1.222687], b-a = 0.000031, dϕᵃ = -1817276.875000, dϕᵇ = -1828160.500000, (ϕᵇ - ϕᵃ) = 70.500000
↪︎ c = 1.222672, dϕᶜ = -1824028.000000, ϕᶜ - ϕᵃ = 20.250000, wolfe = 0, approxwolfe = 0
Linesearch bisect: [a, b] = [1.222656, 1.222672], b-a = 0.000015, dϕᵃ = -1817276.875000, dϕᵇ = -1824028.000000, (ϕᵇ - ϕᵃ) = 20.250000
↪︎ c = 1.222664, dϕᶜ = -1810365.500000, ϕᶜ - ϕᵃ = 21.500000, wolfe = 0, approxwolfe = 0
Linesearch bisect: [a, b] = [1.222656, 1.222664], b-a = 0.000008, dϕᵃ = -1817276.875000, dϕᵇ = -1810365.500000, (ϕᵇ - ϕᵃ) = 21.500000
↪︎ c = 1.222660, dϕᶜ = -1825008.250000, ϕᶜ - ϕᵃ = -24.000000, wolfe = 0, approxwolfe = 0
Linesearch bisect: [a, b] = [1.222660, 1.222664], b-a = 0.000004, dϕᵃ = -1825008.250000, dϕᵇ = -1810365.500000, (ϕᵇ - ϕᵃ) = 45.500000
↪︎ c = 1.222662, dϕᶜ = -1809843.125000, ϕᶜ - ϕᵃ = -91.500000, wolfe = 0, approxwolfe = 0
Linesearch bisect: [a, b] = [1.222662, 1.222664], b-a = 0.000002, dϕᵃ = -1809843.125000, dϕᵇ = -1810365.500000, (ϕᵇ - ϕᵃ) = 137.000000
↪︎ c = 1.222663, dϕᶜ = -1806411.125000, ϕᶜ - ϕᵃ = 17.750000, wolfe = 0, approxwolfe = 0
Linesearch bisect: [a, b] = [1.222663, 1.222664], b-a = 0.000001, dϕᵃ = -1806411.125000, dϕᵇ = -1810365.500000, (ϕᵇ - ϕᵃ) = 119.250000
↪︎ c = 1.222663, dϕᶜ = -1821687.250000, ϕᶜ - ϕᵃ = -18.750000, wolfe = 0, approxwolfe = 0
Linesearch bisect: [a, b] = [1.222663, 1.222664], b-a = 0.000000, dϕᵃ = -1821687.250000, dϕᵇ = -1810365.500000, (ϕᵇ - ϕᵃ) = 138.000000
↪︎ c = 1.222664, dϕᶜ = -1817648.125000, ϕᶜ - ϕᵃ = 113.250000, wolfe = 0, approxwolfe = 0
Linesearch bisect: [a, b] = [1.222664, 1.222664], b-a = 0.000000, dϕᵃ = -1817648.125000, dϕᵇ = -1810365.500000, (ϕᵇ - ϕᵃ) = 24.750000
↪︎ c = 1.222664, dϕᶜ = -1816927.875000, ϕᶜ - ϕᵃ = -1.000000, wolfe = 0, approxwolfe = 0
Linesearch bisect: [a, b] = [1.222664, 1.222664], b-a = 0.000000, dϕᵃ = -1816927.875000, dϕᵇ = -1810365.500000, (ϕᵇ - ϕᵃ) = 25.750000
↪︎ c = 1.222664, dϕᶜ = -1810365.500000, ϕᶜ - ϕᵃ = 25.750000, wolfe = 0, approxwolfe = 0
Linesearch bisect: [a, b] = [1.222664, 1.222664], b-a = 0.000000, dϕᵃ = -1816927.875000, dϕᵇ = -1810365.500000, (ϕᵇ - ϕᵃ) = 25.750000
(repeats forever...)

@@ -102,7 +102,7 @@ function bisect(iter::HagerZhangLineSearchIterator, a::LineSearchPoint, b::LineS
     fmax = p₀.f + ϵ
     numfg = 0
     while true
-        if (b.α - a.α) <= eps(typeof(b.α))
+        if (b.α - a.α) <= eps(b.α)
Contributor Author


I think this is even better, because if b.α > 1 then eps(b.α) ≥ eps(Float32), and strictly greater once b.α ≥ 2 (I did run into a case where I needed this; otherwise I still got stuck in an infinite loop).

Owner


Yes, but it also works the other way around: if b.α = 1e-3, then eps(b.α) is much smaller. Not sure if that would be a problem.

Contributor Author


It didn't appear to be a problem in either of my original cases, where it caught the infinite loop. I think it makes some sense, since (b.α - a.α) <= eps(b.α) basically tells you there's no way to move b any closer to a, given floating-point precision and where b currently is, so stop trying.

Owner


Yes, but if b.α is already smaller than 1, it is unlikely that differences in α smaller than, say, eps(typeof(b.α)) will still be accurately reflected in the differences between the function values and gradients computed at those two points. So maybe something like eps(max(b.α, one(b.α))) is the best choice.
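
For illustration, a small sketch (with hypothetical α values) of how the three candidate tolerances compare in Float32:

```julia
eps(Float32)                    # 2^-23 ≈ 1.19f-7 : old check, independent of α
eps(3.0f0)                      # 2^-22 ≈ 2.38f-7 : grows with α, so the check fires sooner for large α
eps(1.0f-3)                     # 2^-33 ≈ 1.16f-10: but shrinks again for α < 1
eps(max(1.0f-3, one(1.0f-3)))   # 2^-23           : the suggested compromise never drops below eps(one(α))
```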

Contributor Author


That seems like it would catch everything I ran into as well, and break even earlier in fact, so I'm happy to update the PR.

Contributor Author


Actually, let me explicitly check that this is the case for the errors I had before, which I can do tomorrow.

@Jutho
Owner

Jutho commented Nov 24, 2020

Linesearch bisect: [a, b] = [1.222660, 1.222664], b-a = 0.000004, dϕᵃ = -1825008.250000, dϕᵇ = -1810365.500000, (ϕᵇ - ϕᵃ) = 45.500000 ↪︎ c = 1.222662, dϕᶜ = -1809843.125000, ϕᶜ - ϕᵃ = -91.500000, wolfe = 0, approxwolfe = 0
So what these things tell you is:
The local derivative along the line being searched is -1825008.250000 at point a and -1810365.500000 at point b, so it's still strongly descending, but levelling off slightly (i.e. convex). However, despite the points being close together and both on a descending slope, the difference in function values is (ϕᵇ - ϕᵃ) = 45.500000, i.e. b has a higher function value than a. If these values are correct and accurate, this can only happen if there is another steep ascending slope in between a and b. The linesearch continues as it tries to resolve this additional peak.

So I assume this is almost certainly due to inaccuracies, not because you have a highly irregular function and the peak is actually there. The values seem very large; is there a way to normalize your function more properly? There is one parameter in the linesearch, ϵ, which determines that small increases in the function value can be accepted, and here "small" is of course an absolute measure, since the function value of an optimization problem has no absolute meaning in itself (it can be positive or negative; you can add a constant to it). This means you should probably set this parameter differently if you indeed expect huge function values.
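
As a hedged illustration (the exact keyword names and defaults should be checked against the OptimKit docstrings), loosening that absolute tolerance might look something like:

```julia
using OptimKit

# Sketch only: `ϵ` is the tolerance on small function-value increases discussed above;
# the keyword name and the value are assumptions, the value is purely illustrative.
ls = HagerZhangLineSearch(; ϵ = 1e-2)

# The line search would then be passed to whichever OptimKit algorithm is used,
# e.g. (assumed keyword) optimize(fg, x₀, LBFGS(; linesearch = ls))
```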

@marius311
Contributor Author

Thanks, very helpful description!

The values seem very large; is there a way to more properly normalize your function?

That's interesting, I hadn't appreciated that normalization here could matter. My function is basically a χ² in fairly high dimensions, hence the large values. I will play around with this ϵ parameter and/or renormalizing before feeding into OptimKit.

@Jutho
Owner

Jutho commented Nov 24, 2020

Great, but the best solution would be to make sure that the gradients and function values are sufficiently accurate that things like the above cannot happen. To make it more explicit: the current values say that the directional derivative along the line is -1825008.250000 at point a and -1810365.500000 at point b, whereas a finite difference tells you that the derivative along this line is approximately
45.500000 / 0.000004 = +1.1375e7.
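
A minimal sketch of such a consistency check (the function and argument names here are illustrative, not part of OptimKit; `fg` is assumed to return the value and gradient as a tuple, with array-like parameters so that `dot` and `+` apply):

```julia
using LinearAlgebra

# Compare the directional derivative implied by the gradient with a finite
# difference of function values along a direction d.  Large disagreement (as in
# the trace above, roughly -1.8e6 vs +1.1e7) points to an inaccurate gradient
# or function value at that scale.
function check_directional_derivative(fg, x, d; h = 1e-4)
    f₀, g₀ = fg(x)
    f₁, _  = fg(x + h * d)
    analytic = dot(g₀, d)        # ⟨∇f(x), d⟩ from the user-supplied gradient
    fdiff    = (f₁ - f₀) / h     # finite-difference estimate of the same quantity
    return analytic, fdiff
end
```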

@marius311
Contributor Author

marius311 commented Nov 25, 2020

Ok, I went with your suggestion, and confirmed it works on my original problem. Ready from my end.

You're definitely right about the gradient accuracy; I cranked up my ODE accuracy and it helped. In any case, this fix was still useful to have: especially when I was trying to nail down the extremum as precisely as possible with lots of iterations, it would loop infinitely near the end.

@Jutho
Owner

Jutho commented Nov 25, 2020

Ok, thanks. I'll merge. Just one final warning: relying on this branch to terminate your linesearch is rather fragile and indicative of an underlying problem!

@Jutho Jutho merged commit 753f69d into Jutho:master Nov 25, 2020