-
-
Notifications
You must be signed in to change notification settings - Fork 222
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
"Interrupted. Larger maxiters is needed." warning when garbage collection is added #1001
Comments
Please share the MWE. Are you appropriately setting the heap size when moving it to the HPC setting? And BTW, did you try other BVP solvers like the MIRK ones? Generally those will scale better. |
Dear Dr. Rackauckas, I'll develop an MWE. I can't share the code publicly, but I strongly suspect I can replicate it with a generic stiff problem. The heap size is appropriate (I've tried 1G, 10G, and 50G. The total memory available to the program is 100G. They all behave the same) and I've shown that the problem is with garbage collection, not with the high performance computing environment: I can induce the same issue on my laptop if I add GC.gc() calls to the code. It only showed up on the HPC because garbage collection had to be called because of the limited heap size. Regarding BVP solvers, this system is highly unstable, and will often have a lot of +infinity/-infinity values around the correct local minimum, so, for example, the Regula Falsi method doesn't find the local minimum. I'll look into the BVP solvers more, but I'm skeptical because its so unstable. |
I haven't quite replicated the issue, but I'm seeing a situation where garbage collection causes a change that seems related. The two attached codes are the same except that one runs garbage collection and the other doesn't. The one without garbage collection gives a lot of "Warning: dt() <= dtmin() at t =0.0" The one with garbage collection gives far fewer warnings like this, and none after garbage collection runs the first time. in this case that doesn't stop it from pushing through (if anything it improves its performance) but its not clear to me why the appearance of these warnings depends on a garbage collection call later in the code. |
I'm not sure I understand the MWE:
My immediate guess just by looking at it is that there must be some uninitialized memory that is being used in the RadauIIA5 Jacobain, because it's an odd one. So two questions:
|
"tol_solve_ivp = 10^(-30) that's impossible to achieve with Float64 if relative values are around 10^4 for some things." That's odd to me because I've had it successfully converge with Radau. I used Matlab's symbolic math toolbox to get my Jacobian. Where's the error? |
Adding jac_mat .= 0 to the start of the Jacobians in my original code fixed the problem, at least on my PC, alongside fixing a typo (in one of my Jacobians I had a mistake of the form |
Okay awesome, this narrowed it down. Basically, for one of the Jacobians in RadauIIA5 there was uninitialized memory being used. In almost any normal case, someone would fill the
In theory it's possible, though Float64 can only guarantee a relative accuracy of around 1e-16, i.e.
The file that you sent had: function Jacobian_func(jac_mat, u, p, t)
p1, p2, p3, p4 = p
end Not sure if that was intentional but it did expose the fact that the
Yup so it was using the uninitialized memory, awesome this should have at least deterministic behavior now of treating the Jacobian as zero (and we should find a better way to error in these cases). |
I am solving a boundary value problem using a shooting method, by repeatedly calling an ODE solver from within an outer ODE solver. This code runs on my PC, but when I moved it to an HPC environment that uses slurm, it crashed.
I discovered (https://discourse.julialang.org/t/is-there-a-way-to-limit-memory-as-reported-by-sys-total-memory/93909) that activating Garbage Collection periodically prevents the crash by stopping the code from exceeding the amount of memory slurm allocates to it.
However, when I add Garbage Collection I see a type of warning I haven't seen before: similar to this issue #739 I see messages saying "Interrupted. Larger maxiters is needed. If you are using an integrator for non-stiff ODEs or an automatic switching algorithm (the default), you may want to consider using a method for stiff equations. See the solver pages for more details". I'm using RadauIIA5() so stiffness isn't the issue and I don't get this error message on my PC (or on the login node of the HPC system).
This doesn't seem to fix the problem: #680
The text was updated successfully, but these errors were encountered: