
Solver arrays not being cleared from memory. #680

Closed
JcbMussi opened this issue Sep 30, 2020 · 7 comments

@JcbMussi

I have a recurring issue: because I'm running a simulation of a PDE over a wide range of initial conditions, large amounts of memory remain allocated after each solve() call finishes and refuse to be garbage collected.

function domainSetup(;XLIM=200,N=1000)
    #This only needs to be run once
    global xlim=XLIM
    global n=N
    global S=Fourier(-xlim..xlim)
    global x=points(S,n)
    global T=ApproxFun.plan_transform(S,Complex{Float64},n)
    global Ti=ApproxFun.plan_itransform(S,Complex{Float64},n)
    global D2=Derivative(S,2)
    global L=-1im*D2[1:n,1:n]
    global A =DiffEqArrayOperator(Diagonal(L))
    return nothing
end
function SNL(du,u,tmp,t)
    # This is the nonlinear operator solved during that phase of the split step.
    mul!(tmp,Ti,u)
    @. tmp = -1im*tmp*(1-exp(-tmp*conj(tmp)))  # @. already broadcasts the assignment
    mul!(du,T,tmp)
end
domainSetup()

I'll run a block of code like this several times with different initial conditions.

u0=sech.(.34*x);
fu0=T*u0;
prob=SplitODEProblem(A, SNL, fu0, (0.0,400.0), similar(fu0));
sol=solve(prob, ETDRK4(),dt=0.01);

Running additional solves keeps compounding memory usage. It's not an issue at this scale, but I need to be able to run solutions for 2 to 3 times longer in some cases, and running batches of those regularly crashes my kernel.

I'm still very green when it comes to Julia and numerical methods in general so there's probably a big issue with the way I've laid out the problem but I can't seem to figure it out.

Any commentary would be appreciated.

@ChrisRackauckas
Member

If you do sol=nothing and then call GC.gc(), does it clear the memory? It should, and if it doesn't, there's a weird error to investigate.

Another thing that's helpful is to use things like saveat, save_idxs, and SavingCallback to reduce what's saved. All are in the documentation.
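A minimal sketch combining both suggestions, reusing `prob` from the problem setup above (the `saveat` value is an illustrative assumption, not from this thread):

```julia
# Save only every 1.0 time units instead of storing every dt=0.01 step.
sol = solve(prob, ETDRK4(), dt=0.01, saveat=1.0)

# ... use sol ...

# Drop the only reference, then force a collection before the next run.
sol = nothing
GC.gc()
```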

@JcbMussi
Author

Thanks.

Just tried it out.
sol=nothing plus a garbage collection does clear everything. I'm a bit confused, though: wouldn't reassigning sol on every trial replace any data related to sol? Why was the memory allocation growing with each run of solve()?

@ChrisRackauckas
Member

My guess is that it doesn't clear the memory for sol until after the computation is done, since in theory it could be modified in the function?

@helengracehuang

helengracehuang commented Dec 12, 2022

Is there anything the developers of the package can do about this issue? I recently ran into this problem when I parallelized solving many large ODE models with multithreading. It kept crashing (out of memory on 370 GB RAM servers), and I spent a long time troubleshooting this memory leak until finding this post. I tried the solution Chris mentioned and it finally worked. But isn't it a little ridiculous that the solver arrays are not collected by the GC, especially if they have already been reassigned or have gone out of scope?
Julia version: 1.8.3 (latest stable version)

@ChrisRackauckas
Member

parallelized solving many large ODE models with multithreading. It kept crashing (out of memory for 370GB RAM servers) and I spent a long time troubleshooting this memory leak until finding this post

That's not a memory leak. If you are solving many large ODE models simultaneously with multithreading, then they all have to be held in memory. Multithreading is by definition shared memory multithreading, and the objects have to live in the memory pool. You have not described anything that resembles a memory leak, so please do not use that term. A memory leak is a case where memory that can be freed is not freed. But if you are multithreading, then you cannot safely remove the objects because they are still being computed on. Freeing those objects will result in incorrect computations and segfaults. This is true in any language: you cannot just free memory of things that are currently being computed!

If what you need is to reduce memory of an ensemble, a good way to do this is to not save any solutions and instead write them to disk. In terms of the ensemble interface https://docs.sciml.ai/DiffEqDocs/stable/features/ensemble/#Building-a-Problem, you can do something like define a reduction function that, instead of saving a solution in memory, writes that solution out to a .jld file. If you do something where you have an array of solutions, of course every solution has to be in memory and it would be incorrect (or segfault) if those values were changed (GC'd).
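A hedged sketch of that pattern, using a toy ODE and assuming the JLD2.jl package for the on-disk format (the model, file names, and `jldsave` call are illustrative, not from this thread):

```julia
using DifferentialEquations, JLD2

f(du, u, p, t) = (du .= -u)               # toy linear-decay stand-in model
prob = ODEProblem(f, ones(4), (0.0, 1.0))

# output_func writes each trajectory to disk and returns `nothing`,
# so no solution arrays accumulate in memory; `false` means "don't rerun".
output_func(sol, i) = (jldsave("traj_$(i).jld2"; t = sol.t, u = sol.u); (nothing, false))

eprob = EnsembleProblem(prob; output_func = output_func)
esol = solve(eprob, Tsit5(), EnsembleThreads(); trajectories = 8)
```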

But isn't it a little ridiculous that the solver arrays are not collected by the GC?

No. If it's in the REPL, you can in theory still access it, so it would be incorrect to GC it while it is reachable; otherwise you could access a value that had already been GC'd and get incorrect junk. If you want something to be GC'd, you should remove all references to it. This is true for every GC'd language, and if it weren't, that would be a non-deterministic correctness bug in the language!

But this is out of the purview of this package. DifferentialEquations.jl doesn't do anything special with memory handling, it's just using the GC and getting standard GC behavior. If you think that there is an issue with the GC, please report it with a reproducible case to https://github.com/JuliaLang/julia.

But I want to stress again: if you're running out of memory saving huge arrays, then don't save huge arrays. saveat, save_idxs, SavingCallback, etc. are all documented features that allow for saving in ways that have a much smaller footprint and thus use a lot less memory. If the GC is working correctly and you cannot free anything earlier because you still need the references, then save less.

Let me know if you have any questions. But please, if you believe there is a memory leak somewhere, share an example and report it.

@helengracehuang

I had to use the huge arrays because I was simulating a nonstiff model first, followed by a stiff one that depends on the solution (at each time step) of the nonstiff model, if that makes sense. I was fine with the objects living in memory. The issue was that each one should have been deleted when I finished that iteration and moved on to simulating a new ODE. Instead, it didn't get cleared and just piled up over the course of the whole simulation. For example, I expected each iteration of ODE solving to take 1 GB of memory. With 64 CPU cores, I should have had a steady 64 GB of memory usage throughout. Instead, it was 64 GB at the start and slowly grew to the total RAM (370 GB) of my server and crashed.

Anyways, my problem was fully solved (steady 64GB) by doing this ⬇️

If you do sol=nothing and then call GC.gc(), does it clear the memory? It should, and if it doesn't, there's a weird error to investigate.

And thanks for the suggestion about the ensemble!

If what you need is to reduce memory of an ensemble, a good way to do this is to not save any solutions and instead write them to disk. In terms of the ensemble interface https://docs.sciml.ai/DiffEqDocs/stable/features/ensemble/#Building-a-Problem, you can do something like define a reduction function that, instead of saving a solution in memory, writes that solution out to a .jld file. If you do something where you have an array of solutions, of course every solution has to be in memory and it would be incorrect (or segfault) if those values were changed (GC'd).

@ChrisRackauckas
Member

Maybe there's some confusion about what's going on here that I should describe in a bit more detail. This is something you can see without any ODE solvers involved. Say you have a machine with 16 GB of RAM and you're making 12 GB arrays. If you do:

x = ... # 12 GB array
x = ... # 12 GB array

you will OOM your machine, because at peak this holds both arrays, 24 GB, at once. You might think it should only require 12 GB total, since the first array could be GC'd while defining the second, but in general it cannot. For example:

x = ... # 12 GB array
x = [x[1], ...]

the new array could have dependencies on the original. If it did, then deleting the original array before creating the next one would be dangerous: the memory backing x[1] could already have been reused, giving undefined behavior. You could say: check whether x shows up in the new array's definition, but then:

x = ... # 12 GB array
y = @view x[1:1]
x = [y[1], ...]

would have the same issue through an "arbitrarily" different array, so you would need advanced aliasing analysis, which is difficult to get correct in the most general of cases. So the first array is not GC-eligible until after the second is defined. In effect, Julia first creates the new array, then binds it to the name x; only once the second array is bound to x is the first array unbound from any name and therefore GC-eligible. That makes the whole system follow one rule: arrays get GC'd after they no longer have a valid reference. But as a consequence this can use a bit more memory than you may expect.

In order to prevent this, you just do:

x = ... # 12 GB array
x = nothing
x = ... # 12 GB array

because in the x = nothing step, the first array is unbound from any reference and thus can be GC'd before the second allocation, so memory never peaks above 12 GB.
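The rebinding rule can be observed directly with a finalizer; this is a hypothetical pure-Julia demonstration (the `Tracked` wrapper and `freed` flag are made up for illustration):

```julia
# A mutable wrapper so we can attach a finalizer to the array's owner.
mutable struct Tracked
    data::Vector{Float64}
end

freed = Ref(false)
x = Tracked(zeros(10^6))
finalizer(t -> (freed[] = true), x)   # runs when the object is collected

x = nothing   # unbind the only reference; the object is now GC-eligible
GC.gc()       # typically runs the finalizer, flipping freed[] to true
```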

So then:

Instead, it didn't get cleared and just piled up over the course of the whole simulation. For example, I expected each iteration of ODE solving to take 1 GB of memory. With 64 CPU cores, I should have had a steady 64 GB of memory usage throughout. Instead, it was 64 GB at the start and slowly grew to the total RAM (370 GB) of my server and crashed.

My best guess is that this is related to GC behavior under multithreading, likely late GC-ing because the mark-and-sweep passes are not themselves multithreaded. This is why Julia's GC currently has subpar performance in multithreaded contexts (though that is being worked on). By doing sol = nothing, you effectively give it a much earlier marking point at which to GC the old arrays, which keeps memory from stacking up. Maybe it should kick in earlier, but it doesn't.

But anyways, that would be some Julia Base issue with multithreading in GC contexts, and not something we would solve in the ODE solver.
