further fix for Float32 bisect infinite loop #10

Merged
merged 3 commits into master, Nov 25, 2020

Conversation

marius311
Contributor

I just ran into a case that was looping infinitely because (b.α - a.α) exactly equaled eps(Float32) and successive iterations didn't change b.α or a.α. This seems to fix it.
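
For reference, a minimal Float32 sketch (with an arbitrarily chosen α near the values in the trace further down) of why a strict < test against eps can never fire once a and b are adjacent floats:

```julia
# Minimal sketch with hypothetical values: once a and b are adjacent Float32 numbers,
# their gap equals eps and the bisection midpoint rounds back onto an endpoint.
a = 1.222663f0
b = nextfloat(a)              # b - a == eps(Float32) at this magnitude

(b - a) <  eps(Float32)       # false -> a strict `<` termination test never fires
(b - a) == eps(Float32)       # true  -> a `<=` test does fire
c = (a + b) / 2               # the midpoint of two adjacent floats
c == a || c == b              # true  -> the interval can never shrink further
```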

As an aside, out of curiosity: is there a HagerZhangLineSearch parameter I can change so the linesearch doesn't try so hard to resolve the step size to machine precision (which I doubt I really need)? I'm not familiar with the algorithm, so I don't know what its various parameters do.

@codecov-io

codecov-io commented Nov 23, 2020

Codecov Report

Merging #10 (71dbe75) into master (e6e0a56) will not change coverage.
The diff coverage is 100.00%.


@@           Coverage Diff           @@
##           master      #10   +/-   ##
=======================================
  Coverage   83.59%   83.59%           
=======================================
  Files           5        5           
  Lines         445      445           
=======================================
  Hits          372      372           
  Misses         73       73           
Impacted Files          Coverage Δ
src/linesearches.jl     86.62% <100.00%> (ø)


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@Jutho
Owner

Jutho commented Nov 23, 2020

The linesearch is not supposed to keep reducing the width of the interval once it satisfies b.α - a.α <= eps(typeof(b.α)). The fact that this case just returns instead of erroring is indeed because I also see it happen sometimes; however, that is during development, and the cause is almost always that my gradient is not correct or accurate. If you have a reproducible test case where the gradient is correct and this still happens, I would be more than happy to take a look.

@Jutho
Owner

Jutho commented Nov 23, 2020

By which I wanted to say: yes, <= is better than <; it's just that this case should not be encountered in a typical application of these methods.

@marius311
Contributor Author

My case is reproducible, but it would be non-trivial to get you set up to run it. The gradient is correct but slightly inaccurate. It's based on a target function that involves solving an ODE, with gradients obtained by solving the "gradient ODE" rather than, e.g., taking an AD gradient through the original ODE solver (similar to what arises in neural ordinary differential equations), so it's only as accurate as those ODE solves, which are of course only accurate to some tolerance.

Here's a trace of the debug statement for this run; I'm not sure there's anything in it that would let you say more than you already have (which makes sense):

Linesearch bisect: [a, b] = [1.000000, 3.000000], b-a = 2.000000, dϕᵃ = -318039.656250, dϕᵇ = -7621403213824.000000, (ϕᵇ - ϕᵃ) = 993761.000000
↪︎ c = 2.000000, dϕᶜ = -697125760.000000, ϕᶜ - ϕᵃ = 4349.000000, wolfe = 0, approxwolfe = 0
Linesearch bisect: [a, b] = [1.000000, 2.000000], b-a = 1.000000, dϕᵃ = -318039.656250, dϕᵇ = -697125760.000000, (ϕᵇ - ϕᵃ) = 4349.000000
↪︎ c = 1.500000, dϕᶜ = -15467180.000000, ϕᶜ - ϕᵃ = 530.750000, wolfe = 0, approxwolfe = 0
Linesearch bisect: [a, b] = [1.000000, 1.500000], b-a = 0.500000, dϕᵃ = -318039.656250, dϕᵇ = -15467180.000000, (ϕᵇ - ϕᵃ) = 530.750000
↪︎ c = 1.250000, dϕᶜ = -2234933.500000, ϕᶜ - ϕᵃ = 214.500000, wolfe = 0, approxwolfe = 0
Linesearch bisect: [a, b] = [1.000000, 1.250000], b-a = 0.250000, dϕᵃ = -318039.656250, dϕᵇ = -2234933.500000, (ϕᵇ - ϕᵃ) = 214.500000
↪︎ c = 1.125000, dϕᶜ = -846204.187500, ϕᶜ - ϕᵃ = 131.250000, wolfe = 0, approxwolfe = 0
Linesearch bisect: [a, b] = [1.125000, 1.250000], b-a = 0.125000, dϕᵃ = -846204.187500, dϕᵇ = -2234933.500000, (ϕᵇ - ϕᵃ) = 83.250000
↪︎ c = 1.187500, dϕᶜ = -1388984.125000, ϕᶜ - ϕᵃ = 5.250000, wolfe = 0, approxwolfe = 0
Linesearch bisect: [a, b] = [1.187500, 1.250000], b-a = 0.062500, dϕᵃ = -1388984.125000, dϕᵇ = -2234933.500000, (ϕᵇ - ϕᵃ) = 78.000000
↪︎ c = 1.218750, dϕᶜ = -1759182.875000, ϕᶜ - ϕᵃ = 2.500000, wolfe = 0, approxwolfe = 0
Linesearch bisect: [a, b] = [1.218750, 1.250000], b-a = 0.031250, dϕᵃ = -1759182.875000, dϕᵇ = -2234933.500000, (ϕᵇ - ϕᵃ) = 75.500000
↪︎ c = 1.234375, dϕᶜ = -1985041.875000, ϕᶜ - ϕᵃ = 64.250000, wolfe = 0, approxwolfe = 0
Linesearch bisect: [a, b] = [1.218750, 1.234375], b-a = 0.015625, dϕᵃ = -1759182.875000, dϕᵇ = -1985041.875000, (ϕᵇ - ϕᵃ) = 64.250000
↪︎ c = 1.226563, dϕᶜ = -1867974.000000, ϕᶜ - ϕᵃ = 48.250000, wolfe = 0, approxwolfe = 0
Linesearch bisect: [a, b] = [1.218750, 1.226563], b-a = 0.007813, dϕᵃ = -1759182.875000, dϕᵇ = -1867974.000000, (ϕᵇ - ϕᵃ) = 48.250000
↪︎ c = 1.222656, dϕᶜ = -1817276.875000, ϕᶜ - ϕᵃ = -16.750000, wolfe = 0, approxwolfe = 0
Linesearch bisect: [a, b] = [1.222656, 1.226563], b-a = 0.003906, dϕᵃ = -1817276.875000, dϕᵇ = -1867974.000000, (ϕᵇ - ϕᵃ) = 65.000000
↪︎ c = 1.224609, dϕᶜ = -1844881.375000, ϕᶜ - ϕᵃ = 49.250000, wolfe = 0, approxwolfe = 0
Linesearch bisect: [a, b] = [1.222656, 1.224609], b-a = 0.001953, dϕᵃ = -1817276.875000, dϕᵇ = -1844881.375000, (ϕᵇ - ϕᵃ) = 49.250000
↪︎ c = 1.223633, dϕᶜ = -1847311.750000, ϕᶜ - ϕᵃ = 59.250000, wolfe = 0, approxwolfe = 0
Linesearch bisect: [a, b] = [1.222656, 1.223633], b-a = 0.000977, dϕᵃ = -1817276.875000, dϕᵇ = -1847311.750000, (ϕᵇ - ϕᵃ) = 59.250000
↪︎ c = 1.223145, dϕᶜ = -1829651.000000, ϕᶜ - ϕᵃ = 104.750000, wolfe = 0, approxwolfe = 0
Linesearch bisect: [a, b] = [1.222656, 1.223145], b-a = 0.000488, dϕᵃ = -1817276.875000, dϕᵇ = -1829651.000000, (ϕᵇ - ϕᵃ) = 104.750000
↪︎ c = 1.222900, dϕᶜ = -1809481.125000, ϕᶜ - ϕᵃ = 23.250000, wolfe = 0, approxwolfe = 0
Linesearch bisect: [a, b] = [1.222656, 1.222900], b-a = 0.000244, dϕᵃ = -1817276.875000, dϕᵇ = -1809481.125000, (ϕᵇ - ϕᵃ) = 23.250000
↪︎ c = 1.222778, dϕᶜ = -1818077.500000, ϕᶜ - ϕᵃ = 34.250000, wolfe = 0, approxwolfe = 0
Linesearch bisect: [a, b] = [1.222656, 1.222778], b-a = 0.000122, dϕᵃ = -1817276.875000, dϕᵇ = -1818077.500000, (ϕᵇ - ϕᵃ) = 34.250000
↪︎ c = 1.222717, dϕᶜ = -1821230.000000, ϕᶜ - ϕᵃ = 55.500000, wolfe = 0, approxwolfe = 0
Linesearch bisect: [a, b] = [1.222656, 1.222717], b-a = 0.000061, dϕᵃ = -1817276.875000, dϕᵇ = -1821230.000000, (ϕᵇ - ϕᵃ) = 55.500000
↪︎ c = 1.222687, dϕᶜ = -1828160.500000, ϕᶜ - ϕᵃ = 70.500000, wolfe = 0, approxwolfe = 0
Linesearch bisect: [a, b] = [1.222656, 1.222687], b-a = 0.000031, dϕᵃ = -1817276.875000, dϕᵇ = -1828160.500000, (ϕᵇ - ϕᵃ) = 70.500000
↪︎ c = 1.222672, dϕᶜ = -1824028.000000, ϕᶜ - ϕᵃ = 20.250000, wolfe = 0, approxwolfe = 0
Linesearch bisect: [a, b] = [1.222656, 1.222672], b-a = 0.000015, dϕᵃ = -1817276.875000, dϕᵇ = -1824028.000000, (ϕᵇ - ϕᵃ) = 20.250000
↪︎ c = 1.222664, dϕᶜ = -1810365.500000, ϕᶜ - ϕᵃ = 21.500000, wolfe = 0, approxwolfe = 0
Linesearch bisect: [a, b] = [1.222656, 1.222664], b-a = 0.000008, dϕᵃ = -1817276.875000, dϕᵇ = -1810365.500000, (ϕᵇ - ϕᵃ) = 21.500000
↪︎ c = 1.222660, dϕᶜ = -1825008.250000, ϕᶜ - ϕᵃ = -24.000000, wolfe = 0, approxwolfe = 0
Linesearch bisect: [a, b] = [1.222660, 1.222664], b-a = 0.000004, dϕᵃ = -1825008.250000, dϕᵇ = -1810365.500000, (ϕᵇ - ϕᵃ) = 45.500000
↪︎ c = 1.222662, dϕᶜ = -1809843.125000, ϕᶜ - ϕᵃ = -91.500000, wolfe = 0, approxwolfe = 0
Linesearch bisect: [a, b] = [1.222662, 1.222664], b-a = 0.000002, dϕᵃ = -1809843.125000, dϕᵇ = -1810365.500000, (ϕᵇ - ϕᵃ) = 137.000000
↪︎ c = 1.222663, dϕᶜ = -1806411.125000, ϕᶜ - ϕᵃ = 17.750000, wolfe = 0, approxwolfe = 0
Linesearch bisect: [a, b] = [1.222663, 1.222664], b-a = 0.000001, dϕᵃ = -1806411.125000, dϕᵇ = -1810365.500000, (ϕᵇ - ϕᵃ) = 119.250000
↪︎ c = 1.222663, dϕᶜ = -1821687.250000, ϕᶜ - ϕᵃ = -18.750000, wolfe = 0, approxwolfe = 0
Linesearch bisect: [a, b] = [1.222663, 1.222664], b-a = 0.000000, dϕᵃ = -1821687.250000, dϕᵇ = -1810365.500000, (ϕᵇ - ϕᵃ) = 138.000000
↪︎ c = 1.222664, dϕᶜ = -1817648.125000, ϕᶜ - ϕᵃ = 113.250000, wolfe = 0, approxwolfe = 0
Linesearch bisect: [a, b] = [1.222664, 1.222664], b-a = 0.000000, dϕᵃ = -1817648.125000, dϕᵇ = -1810365.500000, (ϕᵇ - ϕᵃ) = 24.750000
↪︎ c = 1.222664, dϕᶜ = -1816927.875000, ϕᶜ - ϕᵃ = -1.000000, wolfe = 0, approxwolfe = 0
Linesearch bisect: [a, b] = [1.222664, 1.222664], b-a = 0.000000, dϕᵃ = -1816927.875000, dϕᵇ = -1810365.500000, (ϕᵇ - ϕᵃ) = 25.750000
↪︎ c = 1.222664, dϕᶜ = -1810365.500000, ϕᶜ - ϕᵃ = 25.750000, wolfe = 0, approxwolfe = 0
Linesearch bisect: [a, b] = [1.222664, 1.222664], b-a = 0.000000, dϕᵃ = -1816927.875000, dϕᵇ = -1810365.500000, (ϕᵇ - ϕᵃ) = 25.750000
(repeats forever...)

@@ -102,7 +102,7 @@ function bisect(iter::HagerZhangLineSearchIterator, a::LineSearchPoint, b::LineS
     fmax = p₀.f + ϵ
     numfg = 0
     while true
-        if (b.α - a.α) <= eps(typeof(b.α))
+        if (b.α - a.α) <= eps(b.α)
Contributor Author


I think this is even better, because if b.α > 1 then eps(b.α) ≥ eps(Float32), and strictly greater once b.α ≥ 2 (I did run into a case where I needed this; otherwise I still got stuck in an infinite loop).

Owner


Yes, but it also works the other way around: if b.α = 1e-3, then eps(b.α) is much smaller. Not sure if that would be a problem.

Contributor Author


It didn't appear to be a problem in either of my original cases, where it caught the infinite loop. I think it makes some sense, since (b.α - a.α) <= eps(b.α) basically tells you there's no way to move b any closer to a, given floating-point precision and where b currently is, so stop trying.

Owner


Yes, but if b.α is already smaller than 1, it is unlikely that differences in α smaller than, say, eps(typeof(b.α)) will still be accurately reflected in the differences between the function values and gradients computed at those two points. So maybe something like eps(max(b.α, one(b.α))) is the best choice.
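
For illustration, a small sketch (with hypothetical α values) of how the three candidate tolerances compare in Float32:

```julia
eps(Float32)                    # 2^-23 ≈ 1.19f-7 : old check, independent of α
eps(3.0f0)                      # 2^-22 ≈ 2.38f-7 : grows with α, so the check fires sooner for large α
eps(1.0f-3)                     # 2^-33 ≈ 1.16f-10: but shrinks again for α < 1
eps(max(1.0f-3, one(1.0f-3)))   # 2^-23           : the suggested compromise never drops below eps(one(α))
```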

Contributor Author


That seems like it would catch everything I ran into as well, and break even earlier in fact, so I'm happy to update the PR.

Contributor Author


Actually, let me explicitly check that this is the case for the errors I had before, which I can do tomorrow.

@Jutho
Owner

Jutho commented Nov 24, 2020

Linesearch bisect: [a, b] = [1.222660, 1.222664], b-a = 0.000004, dϕᵃ = -1825008.250000, dϕᵇ = -1810365.500000, (ϕᵇ - ϕᵃ) = 45.500000 ↪︎ c = 1.222662, dϕᶜ = -1809843.125000, ϕᶜ - ϕᵃ = -91.500000, wolfe = 0, approxwolfe = 0
So what these things tell you is:
The local derivative along the line being searched is -1825008.250000 at point a and -1810365.500000 at point b, so it's still strongly descending, but levelling off slightly (i.e. convex). However, despite the points being close together and both on a descending slope, the difference in function values is (ϕᵇ - ϕᵃ) = 45.500000, i.e. b has a higher function value than a. If these values are correct and accurate, this can only happen if there is another steep ascending slope in between a and b. The linesearch continues as it tries to resolve this additional peak.

So I assume this is almost certainly due to inaccuracies, not because you have a highly irregular function and the peak is actually there. The values seem very large; is there a way to normalize your function more properly? There is one parameter in the linesearch, ϵ, which determines that small increases in the function value can be accepted, and here "small" is of course an absolute measure, since the function value of an optimization problem has no absolute meaning in itself (it can be positive or negative; you can add a constant to it). This means you should probably set this parameter differently if you indeed expect huge function values.
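
As a hedged illustration (the exact keyword names and defaults should be checked against the OptimKit docstrings), loosening that absolute tolerance might look something like:

```julia
using OptimKit

# Sketch only: `ϵ` is the tolerance on small function-value increases discussed above;
# the keyword name and the value are assumptions, the value is purely illustrative.
ls = HagerZhangLineSearch(; ϵ = 1e-2)

# The line search would then be passed to whichever OptimKit algorithm is used,
# e.g. (assumed keyword) optimize(fg, x₀, LBFGS(; linesearch = ls))
```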

@marius311
Contributor Author

Thanks, very helpful description!

The values seem very large; is there a way to more properly normalize your function?

That's interesting, I hadn't appreciated that normalization here could matter. My function is basically a χ² in fairly high dimensions, hence the large values. I will play around with this ϵ parameter and/or renormalizing before feeding into OptimKit.

@Jutho
Owner

Jutho commented Nov 24, 2020

Great, but the best solution would be to make sure that the gradients and function values are sufficiently accurate that things like the above cannot happen. To make it more explicit: the current values say that the directional derivative along the line is -1825008.250000 at point a and -1810365.500000 at point b, whereas a finite difference tells you that the derivative along this line is approximately
45.500000 / 0.000004 = +1.1375e7.
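
A minimal sketch of such a consistency check (the function and argument names here are illustrative, not part of OptimKit; `fg` is assumed to return the value and gradient as a tuple, with array-like parameters so that `dot` and `+` apply):

```julia
using LinearAlgebra

# Compare the directional derivative implied by the gradient with a finite
# difference of function values along a direction d.  Large disagreement (as in
# the trace above, roughly -1.8e6 vs +1.1e7) points to an inaccurate gradient
# or function value at that scale.
function check_directional_derivative(fg, x, d; h = 1e-4)
    f₀, g₀ = fg(x)
    f₁, _  = fg(x + h * d)
    analytic = dot(g₀, d)        # ⟨∇f(x), d⟩ from the user-supplied gradient
    fdiff    = (f₁ - f₀) / h     # finite-difference estimate of the same quantity
    return analytic, fdiff
end
```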

@marius311
Contributor Author

marius311 commented Nov 25, 2020

Ok, I went with your suggestion, and confirmed it works on my original problem. Ready from my end.

You're definitely right about the gradient accuracy; I cranked up my ODE accuracy and it helped. In any case, this fix was still useful to have: especially when I was trying to nail down the extremum as precisely as possible with lots of iterations, it would loop infinitely near the end.

@Jutho
Owner

Jutho commented Nov 25, 2020

Ok, thanks. I'll merge. Just one final warning: relying on this branch to terminate your linesearch is rather fragile and indicative of an underlying problem!

@Jutho Jutho merged commit 753f69d into Jutho:master Nov 25, 2020