Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Large allocation count from VariableRateJump #969

Closed
meson800 opened this issue Jul 17, 2023 · 8 comments · Fixed by SciML/JumpProcesses.jl#334
Closed

Large allocation count from VariableRateJump #969

meson800 opened this issue Jul 17, 2023 · 8 comments · Fixed by SciML/JumpProcesses.jl#334

Comments

@meson800
Copy link

meson800 commented Jul 17, 2023

I have a system that I would like to simulate, that contains both a core ODE model, combined with discrete stochastic equations driven by VariableRateJump's, in addition to a few ContinuousCallbacks. I've been observing large amounts of allocations even when all saving is turned off (around 1M allocations) despite optimization, and doing weird stuff like reducing the root finder tolerance reduces the allocations can cause the allocation count to drop.

I've reduced this system down to the following minimum working (failing?) example which doesn't save anything and has callbacks and jumps that never actually occur (but should still trigger the callback checks):

using DifferentialEquations
using Profile

function mwe_derivatives!(du, u, p, t)
    du[1] = -u[1]
end

mwe_problem = ODEProblem{true,SciMLBase.FullSpecialize}(
    mwe_derivatives!,
    [10.0],
    [0.0, 1000.0],
);

mwe_callback_problem = ODEProblem{true,SciMLBase.FullSpecialize}(
    mwe_derivatives!,
    [10.0],
    [0.0, 1000.0],
    callback=ContinuousCallback((_,_,_)->1.0, (integrator)->nothing)
);

mwe_jump_problem = JumpProblem(mwe_problem, Direct(), ConstantRateJump(
    (_,_,_)->0.0,
    (integrator) -> nothing
    ), save_everystep=false, save_positions=(false,false),
);

mwe_variable_jump_problem = JumpProblem(mwe_problem, Direct(), VariableRateJump(
    (_,_,_)->0.0,
    (integrator) -> nothing
    ), save_everystep=false, save_positions=(false,false),
);

print("ODE: ")
solve(mwe_problem, BS3(), dtmax=0.1, save_on=false, save_start=false, save_end=false);
@time solve(mwe_problem, BS3(), dtmax=0.1, save_on=false, save_start=false, save_end=false);

print("ODE + callback: ")
solve(mwe_callback_problem, BS3(), dtmax=0.1, save_on=false, save_start=false, save_end=false);
@time solve(mwe_callback_problem, BS3(), dtmax=0.1, save_on=false, save_start=false, save_end=false);

print("ODE + ConstantRateJump: ")
solve(mwe_jump_problem, BS3(), dtmax=0.1, save_on=false, save_start=false, save_end=false);
@time solve(mwe_jump_problem, BS3(), dtmax=0.1, save_on=false, save_start=false, save_end=false);

print("ODE + VariableRateJump: ")
solve(mwe_variable_jump_problem, BS3(), dtmax=0.1, save_on=false, save_start=false, save_end=false);
@time solve(mwe_variable_jump_problem, BS3(), dtmax=0.1, save_on=false, save_start=false, save_end=false);

Profile.clear()
Profile.init(delay=0.00001)
@profile solve(mwe_variable_jump_problem, BS3(), dtmax=0.1, save_on=false, save_start=false, save_end=false);
open("profile.txt", "w") do s
    Profile.print(IOContext(s, :displaysize => (24, 15000)), mincount=100)
end

I expect that the ContinuousCallback should be useless (because it never crosses zero), and similarly expect the JumpProblem jumps to be useless, since their rate is uniformly zero. However, these four problems show a dramatic difference in the number of allocations (and also differ by an order of magnitude in terms of runtime). In my real system, I'm seeing a solution take ~40 seconds vs ~4 seconds if I try to minimally remove VariableRateJumps (though this makes it a different system than I'd like to actually simulate).

ODE:   0.002365 seconds (33 allocations: 4.062 KiB)
ODE + callback:   0.011512 seconds (35 allocations: 4.422 KiB)
ODE + ConstantRateJump:   0.003341 seconds (63 allocations: 6.016 KiB)
ODE + VariableRateJump:   0.017667 seconds (40.12 k allocations: 636.422 KiB)

Things I have tried

  • This is not related to the ODE solver; both Tsit5 or BS3 have the same allocation behavior. I tuned down the dtmax just to get enough iterations to show a difference in the MWE.
  • My actual system does need VariableRateJumps, I can't get away with just ConstantRateJumps.
  • I tried to take a memory profile with both --track-allocations and Profile.Allocs.@profile. The first doesn't show anything particularly interesting in the user code, and the new allocation profiler just returns Profile.Allocs.UnknownType as > 98% of the allocations, with a stacktrace that I can't make heads of tails of.
  • Using the stacktrace profile, it seems like profiling the case with the allocations is spending most of its time in DiffEqBase/src/internal_falsi.jl and OrdinaryDiffEq/src/dense/generic_dense.jl. I've attached the profile generated by the above MWE here (profile.txt). I looked at the relevant lines, and these functions seem to be non-allocating.

Relevant environment

  • Julia 1.9.2
  • DifferentialEquations 7.8.0
  • DiffEqBase 6.126.0
  • JumpProcesses 9.6.4
  • DiffEqCallbacks 2.26.1
@isaacsas
Copy link
Member

Did you try save_positions = (false,false) to JumpProblem? That is what controls saving before/after a jump. save_everystep doesn't do anything to my knowledge.

@meson800
Copy link
Author

meson800 commented Jul 17, 2023

Good thought, I added that/edited the above, and the allocations still occur unfortunately.

In the MWE above, I was already doing save_on=false, which should disable all saving since there is an early-return out of _savevalues!. Also, I've set it up so that the callbacks or jumps never actually occur, so there shouldn't have been any saving anyway.

In my actual system I need saving of course, but it's weird that the allocations still happen with all saving disabled.

@meson800
Copy link
Author

meson800 commented Jul 17, 2023

My current best guess from the profile trace is that the allocations are coming from this stacktrace in the profile:

  218  @OrdinaryDiffEq/src/dense/generic_dense.jl:166; current_interpolant!
   187  @OrdinaryDiffEq/src/dense/generic_dense.jl:123; ode_interpolant!
    187  @OrdinaryDiffEq/src/dense/generic_dense.jl:577; ode_interpolant!
     187  @OrdinaryDiffEq/src/dense/generic_dense.jl:589; _ode_interpolant!
      181  @OrdinaryDiffEq/src/dense/generic_dense.jl:630; hermite_interpolant!(out::ExtendedJumpArray{Float64, 1, Vector{Float64}, Vector{Float64}}, Θ::Float64, dt::Float64, y₀::ExtendedJumpArray{Float64, 1, Vector{Float64}, Vector{Float64}}, y₁::ExtendedJumpArray{Float64, 1, Vector{Float64}, Vector{Float64}}, k::Vector{ExtendedJumpArray{Float64, 1, Vector{Float64}, Vector{Float64}}}, idxs::Nothing, T::Type{Val{0}})

but this isn't allocating when we're using normal Array{Float64,1} (e.g. the cases without a VariableRateJump). It's only something about the ExtendedJumpArray that causes thousands of ~16B allocations.

I'll try to investigate more.

@isaacsas
Copy link
Member

ExtendedJumpArrays have other known issues with broadcasting so perhaps this is related. They're missing a bunch of broadcast overloads. (Though I don't know if simply missing an overload could cause a type unstable fallback, which it seems like you may be hitting here somewhere.)

@meson800
Copy link
Author

meson800 commented Jul 18, 2023

I located where the allocations are coming from. Pkg.develop is really cool.

It seems like the 16B allocations are coming from calls that look like this, in any of the ODE solvers:

integrator.EEst = integrator.opts.internalnorm(atmp, t)

When the problem uses an ExtendedJumpArray, this allocates memory. When it doesn't (when VariableRateJumps) aren't used, then it no longer allocates memory.

Going one step deeper, it seems like we are falling back to this definition:

@inline function ODE_DEFAULT_NORM(u::AbstractArray, t)
    Base.FastMath.sqrt_fast(UNITLESS_ABS2(u) / max(recursive_length(u), 1))
end

e.g. if I drop a @time here, I see 3 out of the four allocations per loop.

Doing a nice code_warntype has three red Any's hanging out; I assume the fourth allocation per loop is because integrator.EEst is getting set equal to an Any :

julia> jump_array = ExtendedJumpArray{Float64, 1, Vector{Float64}, Vector{Float64}}([0.0, 0.0], [0.0])
julia> @code_warntype DiffEqBase.ODE_DEFAULT_NORM(jump_array, 0.0)

-- snip --
│    %15 = DiffEqBase.UNITLESS_ABS2(u)::Any
│    %16 = DiffEqBase.recursive_length(u)::Int64
│    %17 = DiffEqBase.max(%16, 1)::Int64
│    %18 = (%15 / %17)::Any
-- snip --
│    %51 = val::Any
└───       return %51

Is this something that could be fixed with more broadcast overloads? I'll see if I can figure this out and PR but would appreciate suggestions.

@isaacsas
Copy link
Member

You could probably just define a dispatch of it in JumpProcesses src/extended_jump_array.jl file for ExtendedJumpArrays.

@inline function DiffEqBase.ODE_DEFAULT_NORM(u::ExtendedJumpArray, t)
...
end

@isaacsas
Copy link
Member

If you want to make a JumpProcesses PR with a custom dispatch that would be great (please add some tests too in the appropriate test file!).

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants