Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Very long compilation time for solve() #117

Closed
cpf-work opened this issue Oct 9, 2023 · 20 comments
Closed

Very long compilation time for solve() #117

cpf-work opened this issue Oct 9, 2023 · 20 comments

Comments

@cpf-work
Copy link

cpf-work commented Oct 9, 2023

I've upgraded to v5.1 today and found very long compilation times when trying the simple pendulum example.

My base environment:
[6e4b80f9] BenchmarkTools v1.3.2
[7073ff75] IJulia v1.24.2
[295af30f] Revise v3.5.6
[0f1e0344] WebIO v0.8.21
[de0858da] Printf

I created a minimal environment for testing:
[764a87c0] BoundaryValueDiffEq v5.1.0
[91a5bcdd] Plots v1.39.0

I'm using Julia 1.9.3.

I run "Example 1: Simple pendulum" from the docs and find

@time sol1 = solve(bvp1, MIRK4(), dt = 0.05)
225.577779 seconds (289.71 M allocations: 60.160 GiB, 6.70% gc time, 99.98% compilation time)

Second execution is fast:
@time sol1 = solve(bvp1, MIRK4(), dt = 0.05)
0.001343 seconds (6.28 k allocations: 879.165 KiB)

I'm quite confident this did not happend when I played around 1-2 weeks ago with an earlier version. I tried reverting to v5.0.0, but I'm not experienced in doing stuff like this and obtained an error message upon solve(), so I can't tell whether this also happens for me with v5.0.0 now.

I managed to revert to v4.0.1
⌃ [764a87c0] BoundaryValueDiffEq v4.0.1

And I get more reasonable compilation times:
@time sol1 = solve(bvp1, MIRK4(), dt = 0.05)
28.803127 seconds (42.95 M allocations: 6.039 GiB, 6.34% gc time, 99.76% compilation time)

@mateuszbaran
Copy link

👍

It's even worse in Manifolds.jl (JuliaManifolds/Manifolds.jl#657):

13210.001646 seconds (24.48 G allocations: 7.394 TiB, 9.48% gc time)

vs BoundaryValueDiffEq v4.0.1:

142.380662 seconds (234.62 M allocations: 17.268 GiB, 4.31% gc time, 82.72% compilation time: 1% of which was recompilation)

@ChrisRackauckas
Copy link
Member

Does it go away with Julia v1.10?

@cpf-work
Copy link
Author

Yes. With Julia 1.10.0-beta3 and base environment
[6e4b80f9] BenchmarkTools v1.3.2
[764a87c0] BoundaryValueDiffEq v5.1.0

@time sol1 = solve(bvp1, MIRK4(), dt = 0.05)
11.107603 seconds (12.80 M allocations: 860.423 MiB, 3.11% gc time, 99.92% compilation time)

@mateuszbaran
Copy link

Yes, kind of -- after just 22s I get that failure: #118 .

@mateuszbaran
Copy link

I've rerun CI and now it doesn't even finish within 6 hours.

@ChrisRackauckas
Copy link
Member

Check if SciML/NonlinearSolve.jl#261 made a difference.

If not, can you share an invalidation report? Someone make a sample code just like https://sciml.ai/news/2022/09/21/compile_time/#profiling_and_fixing_sources_of_invalidations and share the plot and the major invalidators. That would highlight what needs to be fixed.

@mateuszbaran
Copy link

I've made a (somewhat) minimal example:

using Manifolds
using BoundaryValueDiffEq
M = Manifolds.EmbeddedTorus(3, 2)
A = Manifolds.DefaultTorusAtlas()
p0x = [0.5, -1.2]
a2 = [-0.5, 0.3]
sol_log = Manifolds.solve_chart_log_bvp(M, p0x, a2, A, (0, 0))

It didn't finish computing within 30 minutes. It feels more like Julia compiler can't handle the code rather than just invalidations. Anyway, I will try leaving it running for longer.

@cpf-work
Copy link
Author

Does the following help you? I used exactly the code from @ChrisRackauckas link and put the simple pendulum example in it. I don't really know what it's doing, but hopefully that's what was asked for :)

[764a87c0] BoundaryValueDiffEq v5.2.0
[91a5bcdd] Plots v1.39.0
[aa65fe97] SnoopCompile v2.10.8

using SnoopCompile

invalidations = @snoopr begin
    using BoundaryValueDiffEq

    const g = 9.81
    const L = 1.0
    tspan = (0.0, pi / 2)
    function simplependulum!(du, u, p, t)
        θ = u[1]
        dθ = u[2]
        du[1] = dθ
        du[2] = -(g / L) * sin(θ)
    end
    
    function bc1!(residual, u, p, t)
        residual[1] = u[end ÷ 2][1] + pi / 2
        residual[2] = u[end][1] - pi / 2
    end
    
    bvp1 = BVProblem(simplependulum!, bc1!, [pi / 2, pi / 2], tspan)
    sol1 = solve(bvp1, MIRK4(), dt = 0.05)
end;

trees = SnoopCompile.invalidation_trees(invalidations);

@show length(SnoopCompile.uinvalidated(invalidations)) # show total invalidations

show(trees[end]) # show the most invalidated method

# Count number of children (number of invalidations per invalidated method)
n_invalidations = map(trees) do methinvs
    SnoopCompile.countchildren(methinvs)
end

gives the following output

length(SnoopCompile.uinvalidated(invalidations)) = 1386
inserting unwrapcontext(io::IJulia.IJuliaStdio) @ IJulia C:\Users\m129319\.julia\packages\IJulia\Vo51o\src\stdio.jl:23 invalidated:
   backedges: 1: superseding unwrapcontext(io::IO) @ Base show.jl:298 with MethodInstance for Base.unwrapcontext(::IO) (190 children)
              2: superseding unwrapcontext(io::IO) @ Base show.jl:298 with MethodInstance for Base.unwrapcontext(::IO) (62 children)
              3: superseding unwrapcontext(io::IO) @ Base show.jl:298 with MethodInstance for Base.unwrapcontext(::IO) (1 children)
              4: superseding unwrapcontext(io::IO) @ Base show.jl:298 with MethodInstance for Base.unwrapcontext(::IO) (1 children)
              5: superseding unwrapcontext(io::IO) @ Base show.jl:298 with MethodInstance for Base.unwrapcontext(::IO) (6 children)
              6: superseding unwrapcontext(io::IO) @ Base show.jl:298 with MethodInstance for Base.unwrapcontext(::IO) (2 children)
              7: superseding unwrapcontext(io::IO) @ Base show.jl:298 with MethodInstance for Base.unwrapcontext(::IO) (7 children)
              8: superseding unwrapcontext(io::IO) @ Base show.jl:298 with MethodInstance for Base.unwrapcontext(::IO) (3 children)
              9: superseding unwrapcontext(io::IO) @ Base show.jl:298 with MethodInstance for Base.unwrapcontext(::IO) (1 children)
113-element Vector{Int64}:
   0
   0
   0
   0
   0
   0
   0
   0
   0
   0
   0
   1
   1
   ⋮
  85
  85
  85
  85
  85
  85
  85
  85
  85
  85
 128
 273

and the plot:

import Plots
Plots.plot(
    1:length(trees),
    n_invalidations;
    markershape=:circle,
    xlabel="i-th method invalidation",
    label="Number of children per method invalidations"
)

image

@mateuszbaran
Copy link

Here is a reproducer without external dependencies:

using SnoopCompile

function affine_connection(a, Xc, Yc)
    MR = 3.0
    Mr = 2.0
    Zc = similar(Xc)
    θ = a[1]
    sinθ, cosθ = sincos(θ)
    Γ¹₂₂ = (MR + Mr * cosθ) * sinθ / Mr
    Γ²₁₂ = -Mr * sinθ / (MR + Mr * cosθ)

    Zc[1] = Xc[2] * Γ¹₂₂ * Yc[2]
    Zc[2] = Γ²₁₂ * (Xc[1] * Yc[2] + Xc[2] * Yc[1])
    return Zc
end

function chart_log_problem!(du, u, params, t)
    mid = div(length(u), 2)
    a = u[1:mid]
    dx = u[(mid + 1):end]
    ddx = -affine_connection(a, dx, dx)
    du[1:mid] .= dx
    du[(mid + 1):end] .= ddx
    return du
end

using BoundaryValueDiffEq
a1 = [0.5, -1.2]
a2 = [-0.5, 0.3]

dt = 0.05
solver = MIRK4()

tspan = (0.0, 1.0)
function bc1!(residual, u, p, t)
    mid = div(length(u[1]), 2)
    residual[1:mid] = u[1][1:mid] - a1
    return residual[(mid + 1):end] = u[end][1:mid] - a2
end
bvp1 = BVProblem(
    chart_log_problem!,
    bc1!,
    (p, t) -> vcat(t * a1 + (1 - t) * a2, zero(a1)),
    tspan,
)
sol1 = solve(bvp1, solver, dt=dt)

It works fine with BoundaryValueDiffEq v4. I'm running invalidation checking locally and it's still running after about 2 hours.

@avik-pal
Copy link
Member

@mateuszbaran Are this with the latest release of NonlinearSolve or the master?

@mateuszbaran
Copy link

This is with master branch of NonlinearSolve.

@avik-pal
Copy link
Member

I can reproduce it as well. @ChrisRackauckas does the heavy use of closures here affect the timings? I can lift them out if needed

@avik-pal
Copy link
Member

bvp1 = BVProblem(
    chart_log_problem!,
    bc1!,
    (p, t) -> vcat(t * a1 + (1 - t) * a2, zero(a1)),
    tspan,
)

u0 = (p, t) -> vcat(t * a1 + (1 - t) * a2, zero(a1)) is not supported

@mateuszbaran
Copy link

What should I use instead then?

@avik-pal
Copy link
Member

pass in a vector of arrays with a uniform discretization

@mateuszbaran
Copy link

Thanks, that works. From my POV the issue is resolved then 👍 .

@mateuszbaran
Copy link

I'd just suggest updating the docs which imply that this kind of u0 is supported: https://docs.sciml.ai/DiffEqDocs/stable/types/bvp_types/#SciMLBase.BVProblem (look for initialGuess(t))

@ChrisRackauckas
Copy link
Member

It's supposed to be supported. The docs are the correct one here. Did we update it to include parameters with the breaking? @avik-pal can we make sure it's supported in the mirks?

@ChrisRackauckas
Copy link
Member

Should be fixed by #127

@avik-pal
Copy link
Member

avik-pal commented Nov 4, 2023

It's supposed to be supported. The docs are the correct one here. Did we update it to include parameters with the breaking? @avik-pal can we make sure it's supported in the mirks?

#133 will allow u0 as a function. I realized even currently we are able to handle it since it turns the initial guess function into u0(p, t0) which is not ideal. After that PR is merged, we can initialize the complete grid with this.

Did we update it to include parameters with the breaking?

No we did not. But I am supporting it via static_hasmethod. We can drop the only t version in the next breaking release i guess.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants