
[WIP] reuse option in Newton Raphson method #206

Closed
wants to merge 5 commits into from

Conversation

frankschae
Copy link
Member

TODOs:

  • add a check for convergence, and automatically compute a new Jacobian if convergence slows or starts to diverge. Do we have to find a good heuristic ourselves or is there a paper that has some details about it?

@codecov
Copy link

codecov bot commented Sep 13, 2023

Codecov Report

Merging #206 (093490a) into master (c6c875f) will decrease coverage by 0.36%.
The diff coverage is 50.00%.

@@            Coverage Diff             @@
##           master     #206      +/-   ##
==========================================
- Coverage   48.45%   48.10%   -0.36%     
==========================================
  Files          19       19              
  Lines        1814     1846      +32     
==========================================
+ Hits          879      888       +9     
- Misses        935      958      +23     
Files Coverage Δ
src/raphson.jl 57.89% <50.00%> (-15.13%) ⬇️


@frankschae
Copy link
Member Author

I had a discussion with @FHoltorf regarding an effective heuristic for updating the Jacobian:

  • Reusing the Jacobian becomes advantageous when taking relatively small Newton steps. If the function is smooth, the Jacobian shouldn't exhibit significant changes during these steps.
  • The proposition is to track the step size between Jacobian updates. When the norm of this size exceeds a predefined hyperparameter (HP1), we trigger a Jacobian update.
  • I will also check if we need additional checks, e.g. with respect to the residual.
  • I haven't mapped out the performance as a function of HP1 yet (I saw for a few examples roughly what we would expect, more function evaluations but less Jacobian evaluations).
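The step-size heuristic described above could be sketched roughly as follows (all names here, including `newton_reuse_stepsize` and the `HP1` keyword, are illustrative and not the PR's actual API): accumulate the norm of the Newton steps taken since the last Jacobian evaluation, and refactorize once the accumulated step size exceeds the threshold HP1.

```julia
using LinearAlgebra

# Hypothetical sketch of the step-size heuristic (not the PR's API):
# accumulate the norm of the Newton steps since the last Jacobian
# evaluation; once it exceeds HP1, recompute and refactorize the Jacobian.
function newton_reuse_stepsize(f, jac, x; HP1=1e-8, abstol=1e-10, maxiters=100)
    F = lu(jac(x))           # factorize once, reuse until the budget is spent
    njacs = 1
    accumulated = 0.0
    for _ in 1:maxiters
        r = f(x)
        norm(r) ≤ abstol && break
        if accumulated > HP1
            F = lu(jac(x))   # step budget exceeded: fresh Jacobian
            njacs += 1
            accumulated = 0.0
        end
        Δx = F \ r
        x = x - Δx
        accumulated += norm(Δx)
    end
    return x, njacs
end

# toy system: componentwise x^3 - 1, root at [1, 1]
f(x) = x .^ 3 .- 1
jac(x) = Matrix(Diagonal(3 .* x .^ 2))
x, njacs = newton_reuse_stepsize(f, jac, [2.0, 0.5])
```

With HP1 near zero this degenerates to ordinary Newton (a fresh Jacobian nearly every step); larger HP1 trades Jacobian evaluations for extra function evaluations, which is the tradeoff the plots below measure.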

@ChrisRackauckas
Copy link
Member

That is precisely the thinking of Hairer II's approach. However, with a lot of benchmarking we have found that the optimal HP1 tends to be ... zero a lot of the time. So basically, just take new Jacobians when you start diverging instead of converging. That is a pretty hard heuristic to beat. In theory you could take a new Jacobian earlier if convergence slows, but we haven't found a great scheme for that. So for the first version I'd keep it simple: just base it off a convergence heuristic with the threshold set to zero.
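The "recompute on divergence" rule can be sketched as follows (illustrative names, not OrdinaryDiffEq.jl's implementation): reuse the factorized Jacobian while the residual norm keeps decreasing, and refresh it, retrying the step, the moment the residual grows.

```julia
using LinearAlgebra

# Sketch of the HP1 = 0 rule: keep the old factorization while ‖f‖ is
# decreasing; as soon as a step would increase the residual norm, recompute
# the Jacobian at the current point and retry the step.
function newton_reuse_on_divergence(f, jac, x; abstol=1e-10, maxiters=200)
    F = lu(jac(x))
    njacs = 1
    r = f(x)
    for _ in 1:maxiters
        norm(r) ≤ abstol && break
        xnew = x - (F \ r)
        rnew = f(xnew)
        if norm(rnew) > norm(r)      # diverging: fresh Jacobian at x, retry
            F = lu(jac(x))
            njacs += 1
            xnew = x - (F \ r)
            rnew = f(xnew)
        end
        x, r = xnew, rnew
    end
    return x, njacs
end

# toy system: componentwise x^3 - 1, root at [1, 1]
f(x) = x .^ 3 .- 1
jac(x) = Matrix(Diagonal(3 .* x .^ 2))
x, njacs = newton_reuse_on_divergence(f, jac, [2.0, 0.5])
```

Between refreshes this is the chord method, which converges only linearly, so it pays off exactly when function evaluations are cheap relative to Jacobian evaluations and factorizations.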

@ChrisRackauckas
Copy link
Member

You may want to base it off of the reuse heuristics of OrdinaryDiffEq.jl. Take a look at its nonlinear solvers, which document this.

@frankschae
Copy link
Member Author

That is precisely the thinking of Hairer II's approach. However, with a lot of benchmarking we have found that the optimal HP1 tends to be ... zero a lot of the time.

Yeah, that seems to be the case for the 23 test problems as well (red line below = speed without reuse). I am wondering, though, why the plots all look so similar.

using NonlinearSolve, LinearAlgebra, LinearSolve, NonlinearProblemLibrary, Test, BenchmarkTools, Plots

problems = NonlinearProblemLibrary.problems
dicts = NonlinearProblemLibrary.dicts

function test_on_library(problems, dicts, ϵ=1e-5)

    samples = 100
    secs = 5

    HP = [1e-14, 1e-13, 1e-12, 1e-11, 1e-10, 1e-9, 1e-8, 1e-7, 1e-6, 1e-5]

    # store a list of all Plots
    plts = []

    alg = NewtonRaphson(; reuse=false)
    broken_tests = (1,6) # test problems where method does not converge to small residual

    for (idx, (problem, dict)) in enumerate(zip(problems, dicts))
        title = dict["title"]
        @testset "$title" begin
            # Newton without reuse
            x = dict["start"]
            res = similar(x)
            nlprob = NonlinearProblem(problem, x)
            sol = solve(nlprob, alg, abstol = 1e-18, reltol = 1e-18)
            problem(res, sol.u, nothing)
            broken = idx in broken_tests
            @test norm(res) ≤ ϵ broken=broken

            balg = @benchmarkable solve(nlprob, $alg, abstol=1e-18, reltol=1e-18)
            talg = run(balg, samples=samples, seconds=secs)

            # Newton with reuse
            ts = []
            for reusetol in HP
                @show reusetol
                sol = solve(nlprob, NewtonRaphson(; reuse=true, reusetol=reusetol), abstol=1e-18, reltol=1e-18)
                problem(res, sol.u, nothing)
                broken = idx in broken_tests
                @test norm(res) ≤ ϵ broken=broken

                balg2 = @benchmarkable solve(nlprob, NewtonRaphson(; reuse=true, reusetol=$reusetol), abstol=1e-18, reltol=1e-18)
                talg2 = run(balg2, samples=samples, seconds=secs)

                push!(ts, mean(talg2.times) / mean(talg.times))
            end

            pl = scatter(HP, ts, xaxis=:log, xticks=HP, label=false, xlabel="HP", ylabel="mean(time w/ reuse)/mean(time wo/ reuse)", title=title)
            hline!([1.0], color = "red", label = false)
            display(pl)
            savefig(pl, "Reuse-Newton-" * title * ".png")
            push!(plts, pl)
        end
    end
    return plts
end

# NewtonRaphson
plts = test_on_library(problems, dicts)

# Merge the plots into a single plot
merged_plot = plot(plts..., layout=(5, 5), size=(2500, 2500), margin=10*Plots.mm)
savefig(merged_plot, "Reuse-Newton.png")

Reuse-Newton

So basically, just take new Jacobians when you start diverging instead of converging. That is a pretty hard heuristic to beat.

So only recomputing when the residual increases?

You may want to base it off of the reuse heuristics of OrdinaryDiffEq.jl. Take a look at its nonlinear solvers which documents this.

I only scanned quickly through
https://github.com/SciML/OrdinaryDiffEq.jl/blob/master/src/nlsolve/newton.jl
but didn't immediately see where the reuse happens. Is it in there?

@ChrisRackauckas
Copy link
Member

So only recomputing when the residual increases?

Yup.

but didn't immediately see where the reuse happens. Is it in there?

https://github.com/SciML/OrdinaryDiffEq.jl/blob/master/src/nlsolve/nlsolve.jl#L63-L96

@FHoltorf
Copy link
Contributor

One heuristic (or part of one) that might be interesting to also try and incorporate is to recompute Jacobians when the Newton direction associated with the old Jacobian stops being a descent direction for the residual norm. If for nothing else but the fact that it would leverage the AD system nicely.

One would need to compute the gradient of the residual norm with reverse-mode AD, but, even though that is more expensive than just checking the residual, it would make things play nicely with linesearch. (If you don't make any such checks, I am guessing you could run into trouble when combining Jacobian reuse with linesearch?)
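A minimal sketch of that descent-direction check (names hypothetical, not the PR's code): the reused direction d = -J_old \ f(x) is a descent direction for φ(x) = ‖f(x)‖² iff dot(∇φ(x), d) < 0, where ∇φ(x) = 2 J(x)ᵀ f(x). In practice ∇φ would come from reverse-mode AD of the residual norm; here it is formed from the true Jacobian directly just to keep the sketch self-contained.

```julia
using LinearAlgebra

# Check whether the Newton direction from a (possibly stale) Jacobian Jold
# is still a descent direction for φ(x) = ‖f(x)‖². With a fresh Jacobian
# dot(∇φ, d) = -2‖f(x)‖² < 0 always, so the check only ever fails for a
# stale Jold; in real code ∇φ would be obtained via reverse-mode AD.
function is_descent_direction(f, jac_true, Jold, x)
    r = f(x)
    d = -(Jold \ r)                 # reused Newton direction
    g = 2 .* (jac_true(x)' * r)     # ∇φ(x) = 2 J(x)' f(x)
    return dot(g, d) < 0
end

f(x) = x .^ 3 .- 1
jac(x) = Matrix(Diagonal(3 .* x .^ 2))
x = [1.5, 0.8]
is_descent_direction(f, jac, jac(x), x)    # → true  (fresh Jacobian)
is_descent_direction(f, jac, -jac(x), x)   # → false (badly stale Jacobian)
```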

@ChrisRackauckas
Copy link
Member

Ahh yes that would be an interesting approach to try.

@frankschae
Copy link
Member Author

For the gradient of the residual norm, I think we would need to change the default norm.

https://github.com/SciML/DiffEqBase.jl/blob/master/src/common_defaults.jl#L26C1-L33C4 drops the gradient

function res_norm1(u)
    b = f(u, p)
    sqrt(sum(abs2, b) / length(u))
end

function res_norm2(u)
    b = f(u, p)
    DiffEqBase.ODE_DEFAULT_NORM(b, nothing)
end

ForwardDiff.gradient(res_norm2, u0) # => zero(u0)

(Why is it actually sqrt(sum(abs2, b) / length(u)) and not sqrt(sum(abs2, b)) / length(u)?)

@ChrisRackauckas
Copy link
Member

(Why is it actually sqrt(sum(abs2, b) / length(u)) and not sqrt(sum(abs2, b)) / length(u)?)

That's just matching Hairer, I don't think it really matters which one it is.
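For concreteness, the two scalings differ in how they behave with problem size n: sqrt(sum(abs2, b) / length(b)) is the RMS norm, which stays on the order of the componentwise error regardless of n, while sqrt(sum(abs2, b)) / length(b) shrinks like 1/√n for the same componentwise error. A quick numeric check:

```julia
# b = ones(4): both components of size 1, sum(abs2, b) = 4
b = ones(4)
rms   = sqrt(sum(abs2, b) / length(b))   # sqrt(4/4) = 1.0
other = sqrt(sum(abs2, b)) / length(b)   # 2/4 = 0.5
```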

https://github.com/SciML/DiffEqBase.jl/blob/master/src/common_defaults.jl#L26C1-L33C4 drops the gradient

We just shouldn't be differentiating the solvers here at all

@ChrisRackauckas
Copy link
Member

how did the AD thing come up?

@frankschae
Copy link
Member Author

One heuristic (or part of one) that might be interesting to also try and incorporate is to recompute Jacobians when the Newton direction associated with the old Jacobian stops being a descent direction for the residual norm. If for nothing else but the fact that it would leverage the AD system nicely.

One would need to compute the gradient of the residual norm with reverse-mode AD, but, even though that is more expensive than just checking the residual, it would make things play nicely with linesearch. (If you don't make any such checks, I am guessing you could run into trouble when combining Jacobian reuse with linesearch?)

Only for testing this idea, which might be necessary for Jacobian reuse if $\Delta x$ is not a descent direction. So I thought we'd need to compare $\Delta x$ (with the reused Jacobian) to $\frac{d \|f(x,p)\|}{dx}$.

@ChrisRackauckas
Copy link
Member

ahh yeah, the ODE one purposely changes away from differentiating the norm, which isn't what you want here.

@frankschae
Copy link
Member Author

@ChrisRackauckas: @yonatanwesen, @FHoltorf, and I were talking again about a good example today. Was there any specific ODE system for which the reuse strategy in https://github.com/SciML/OrdinaryDiffEq.jl/blob/master/src/nlsolve/nlsolve.jl#L63-L96 was particularly good? (To have another starting point.)

@ChrisRackauckas
Copy link
Member

Anything sufficiently large. Bruss with N=32 should do.

@avik-pal
Copy link
Member

SciML/SciMLBenchmarks.jl#796 has a benchmarking script. Drop the sundials and minpack versions and you should be able to test quite rapidly

@avik-pal
Copy link
Member

@avik-pal
Copy link
Member

though their scheme might not be ideal, given how long it takes for bruss

@yonatanwesen
Copy link
Contributor

Benchmarking what we currently have on the 2D Brusselator problem doesn't show any significant improvement from the Jacobian reuse.
brusselator_scaling

@ChrisRackauckas
Copy link
Member

what's the cost breakdown like?

@yonatanwesen
Copy link
Contributor

what's the cost breakdown like?

Do you mean the cost when I profile it?

@ChrisRackauckas
Copy link
Member

Yes, share some flamegraphs: https://github.com/tkluck/StatProfilerHTML.jl

@yonatanwesen
Copy link
Contributor

@avik-pal
Copy link
Member

Check the stats and trace to see how many times the factorization is reused

@avik-pal
Copy link
Member

#345 supersedes this

@avik-pal avik-pal closed this Jan 15, 2024