Frequently Asked Questions

This page is a compilation of frequently asked questions and answers.

Performance

Do you support GPUs? Multithreading? Distributed computation?

Yes. The *DiffEq.jl libraries (OrdinaryDiffEq.jl, StochasticDiffEq.jl, and DelayDiffEq.jl) are all written to be generic to the array and number types. This means they will adopt the implementation that is given by the array type. The in-place algorithms internally utilize Julia's broadcast (with some exceptions due to a Julia bug for now, see this issue) and Julia's A_mul_B! in-place matrix multiplication function. The out-of-place algorithms utilize standard arithmetic functions. Both additionally utilize the user's norm specified via the common interface options and, if a stiff solver, ForwardDiff/DiffEqDiffTools for the Jacobian calculation, and Base linear factorizations for the linear solve. For your type, you will likely need to provide a better form of the norm, Jacobian, or linear solve calculations to fully utilize parallelism.

GPUArrays.jl, ArrayFire.jl, and DistributedArrays.jl have been tested and work in various forms, though the last is not yet recommended for general use.

The next question is whether it matters. Generally, your system has to be large for parallelism to matter. In our experience, a multithreaded array for broadcast starts to help around N>1000, though the Sundials manual says N>100,000. For high order Runge-Kutta methods the threshold is likely lower than the Sundials estimate because more operations are packed into each internal step, but as always more benchmarks are needed to be precise, and it will depend on the problem being solved. GPUs generally require some intensive parallel operation in the user's f function to be viable, for example a matrix multiplication for a stencil computation in a PDE. If you're simply solving some ODE element-wise on a big array, it likely won't help much, or it may even slow things down due to how GPUs work. DistributedArrays require parallel linear solves to really matter, and thus are only recommended when you have a problem that cannot fit into memory or are using a stiff solver with a Krylov method for the linear solves.

My ODE is solving really slow... what do I do?

First, check for bugs. These solvers go through a ton of convergence tests, so if there's a solver issue, it's either inherent to how numerical methods work or it's a user error (generally the latter, though check the later part of the FAQ on normal numerical errors). User errors in the f function that cause the solution to diverge are the most common reason for reported slow codes.

If you have no bugs, great! The standard tricks for optimizing Julia code then apply. What you want to do first is make sure your function does not allocate. If your system is small (<=100 ODEs/SDEs/DDEs/DAEs?), then you should set your system up to use StaticArrays.jl. This is demonstrated in the ODE tutorial with static matrices. Static vectors/arrays are stack-allocated, and thus creating new arrays is free and the compiler doesn't have to heap-allocate any of the temporaries (that's the expensive part!). These have specialized super fast dispatches for arithmetic operations and extra things like QR-factorizations, and thus they are preferred when possible. However, they lose efficiency if they grow too large.
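
As a minimal sketch (the Lorenz system here is only an illustrative example), an out-of-place definition with static vectors looks like:

using StaticArrays, OrdinaryDiffEq

# Out-of-place form: return a new SVector instead of mutating du
function lorenz_static(u,p,t)
  dx = 10.0*(u[2]-u[1])
  dy = u[1]*(28.0-u[3]) - u[2]
  dz = u[1]*u[2] - (8/3)*u[3]
  SVector(dx,dy,dz)
end

u0 = SVector(1.0,0.0,0.0)                 # stack-allocated initial condition
prob = ODEProblem(lorenz_static,u0,(0.0,100.0))
sol = solve(prob,Tsit5())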

For anything larger, you should use the in-place syntax f(du,u,p,t) and make sure that your function doesn't allocate. Assuming you know of a u0, you should be able to do:

du = similar(u0)
@time f(du,u0,p,t)

and see close to zero allocations and close to zero memory allocated. If you see more, then you might have a type-instability or be creating temporary arrays. To find type-instabilities, you should do:

@code_warntype f(du,u,p,t)

and read the printout to see if there are any types that the compiler cannot infer, and fix them. If you have any global variables, you should make them const. As for allocations, some common things that allocate are listed below (a short sketch putting the fixes together follows the list):

  • Array slicing, like u[1:5]. Instead, use @view u[1:5]
  • Matrix multiplication with *. Instead of A*b, use A_mul_B!(c,A,b) for some pre-allocated cache vector c.
  • Non-broadcasted expressions. Every expression on arrays should .= into another array, or it should be re-written to loop and do computations with scalar (or static array) values.
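
Putting these together, here is a minimal sketch of an allocation-free in-place f; the matrix A and the cache vector c are illustrative and must be pre-allocated (e.g. as const globals) outside the function:

const A = rand(100,100)            # problem data, allocated once
const c = zeros(100)               # pre-allocated cache vector for the matrix multiplication

function f(du,u,p,t)
  A_mul_B!(c,A,u)                  # in-place matrix multiply instead of allocating A*u
  @. du = c - 0.5*u                # fully broadcasted update written directly into du
end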

For an example of optimizing a function resulting from a PDE discretization, see this blog post.

The stiff solver takes forever to take steps for my PDE discretization! Why?

Stiff solvers have to solve a nonlinear equation at each step, which requires a few Newton iterations. By default, these methods assume that the Jacobian is dense, automatically calculate the Jacobian for you, and do a dense factorization. However, in many cases you may want to use alternatives that are more tuned to your problem.

First of all, when available, it's recommended that you pass a function for computing your Jacobian. This is discussed in the performance overloads section. Jacobians are especially helpful for Rosenbrock methods.
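
For example, in recent versions of the common interface this can be done by wrapping your function in an ODEFunction with a jac keyword; check the performance overloads documentation for the exact form used by your version. A minimal sketch:

using OrdinaryDiffEq

function f(du,u,p,t)
  du[1] = -2.0*u[1] + u[2]
  du[2] =  u[1] - 2.0*u[2]
end

function jac(J,u,p,t)              # analytic Jacobian of f
  J[1,1] = -2.0; J[1,2] =  1.0
  J[2,1] =  1.0; J[2,2] = -2.0
end

ff = ODEFunction(f, jac=jac)       # attach the Jacobian (newer DiffEqBase interface)
prob = ODEProblem(ff, [1.0,1.0], (0.0,1.0))
sol = solve(prob, Rodas4())        # Rosenbrock method, benefits from the analytic Jacobian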

Secondly, if your Jacobian isn't dense, you shouldn't use a dense Jacobian! In the Sundials algorithm you can set linear_solver=:Band for banded Jacobians for example. More support is coming for this soon.

But lastly, you shouldn't use a dense factorization for large sparse matrices. Instead, if you're using a *DiffEq library you should specify a linear solver. For Sundials.jl, you should change the linear_solver option. See the ODE solve Sundials portion for details on that. Right now, Sundials.jl is the recommended method for stiff problems with large sparse Jacobians. linear_solver=:Band should be used if your Jacobian is banded and you can specify the band sizes. If you only know the Jacobian is sparse, linear_solver=:GMRES is a good option. Once again, a good reference for how to handle PDE discretizations can be found at this blog post.
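
Assuming prob is an already-defined (large, stiff) ODEProblem, a minimal sketch of these Sundials options is:

using Sundials

# Krylov (GMRES) linear solver: only requires Jacobian-vector products
sol = solve(prob, CVODE_BDF(linear_solver=:GMRES))

# Banded Jacobian: give the upper and lower band widths
sol = solve(prob, CVODE_BDF(linear_solver=:Band, jac_upper=1, jac_lower=1))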

Complicated Models

Can I switch my ODE function in the middle of integration?

There are a few ways to do this. The simplest way is to just have a parameter to switch between the two. For example:

function f(du,u,p,t)
  if p == 0
    du[1] = 2u[1]
  else
    du[1] = -2u[1]
  end
  du[2] = -u[2]
end

Then in a callback you can make the affect! function modify integrator.p. For example, we can make it change when u[2]<0.5 via:

condition(u,t,integrator) = u[2] - 0.5
affect!(integrator) = integrator.p = 1
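
For completeness, a minimal sketch of wiring this up (the initial condition and timespan are illustrative):

using OrdinaryDiffEq

cb = ContinuousCallback(condition, affect!)
prob = ODEProblem(f, [1.0,1.0], (0.0,10.0), 0)   # p = 0 selects the first branch initially
sol = solve(prob, Tsit5(), callback=cb)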

Then it will change between the two ODE choices for du[1] at that moment. Another way to do this is to make the ODE functions all be the same type via FunctionWrappers.jl, but that is unnecessary. With the way that modern processors work, branch prediction makes the execution of a conditional essentially free when it's predictable which branch will be taken. In this case, almost every call to f takes the p==0 route until the callback, after which it almost always takes the else route. The processor will therefore effectively eliminate the cost of the conditional, so you're likely over-optimizing if you go further (unless this change happens every step, but even then this is probably the cheapest part of the computation...).

Numerical Error

The solver doesn't obey physical law X (e.g. conservation of energy)

Yes, this is because the numerical solution of the ODE is not the exact solution. There are a few ways that you can handle this problem. One way is to get a more exact solution. Thus instead of

sol = solve(prob,alg)

use

sol = solve(prob,alg,abstol=1e-10,reltol=1e-10)

Of course, there's always a tradeoff between accuracy and efficiency, so play around to find out what's right for your problem.

Another thing you can do is use a callback. There are some premade callbacks in the callback library which handle these sorts of things like projecting to manifolds and preserving positivity.

The symplectic integrator doesn't conserve energy?

Yes, symplectic integrators do not exactly conserve energy. It is a common misconception that they do. What symplectic integrators actually do is solve for a trajectory which rests on a symplectic manifold that is perturbed from the true solution's manifold by the truncation error. This means that symplectic integrators do not experience (very much) long time drift, but their orbit is not exactly the same as the true solution in phase space and thus you will see differences in energy that tend to look periodic. There is a small drift which grows linearly and is related to floating point error, but this drift is much less than standard methods. This is why symplectic methods are recommended for long time integration.

For conserving energy, there are a few things you can do. First of all, the energy error is related to the integration error, so simply solving with higher accuracy will reduce the error. The results in the DiffEqBenchmarks show that using a DPRKN method with low tolerance can be a great choice. Another thing you can do is use the ManifoldProjection callback from the callback library.
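
As a hedged sketch of the latter (the residual-function signature follows recent DiffEqCallbacks documentation and may differ in older versions):

using OrdinaryDiffEq, DiffEqCallbacks

# Harmonic oscillator: u[1]^2 + u[2]^2 is the conserved quantity
function ho(du,u,p,t)
  du[1] =  u[2]
  du[2] = -u[1]
end

# Residual of the conservation law; the callback projects the solution so this stays zero
function g(resid,u,p,t)
  resid[1] = u[1]^2 + u[2]^2 - 2
  resid[2] = 0
end

prob = ODEProblem(ho,[1.0,1.0],(0.0,100.0))
sol = solve(prob,Tsit5(),callback=ManifoldProjection(g))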

How do I get to zero error?

You can't. For floating point numbers, you shouldn't go below abstol=1e-14 and reltol=1e-14. If you need lower than that, use arbitrary precision numbers like BigFloats or ArbFloats.jl.
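
Since the native solvers are generic in the number type, using BigFloats is just a matter of the initial condition and timespan; a minimal sketch with an illustrative scalar ODE:

using OrdinaryDiffEq

f(u,p,t) = -u                                    # illustrative scalar ODE
prob = ODEProblem(f, big(1.0), (big(0.0), big(1.0)))
sol = solve(prob, Vern9(), abstol=1e-30, reltol=1e-30)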

Autodifferentiation and Dual Numbers

Are the native Julia solvers compatible with autodifferentiation?

Yes! However, you should first look into sensitivity analysis as a possibly (now or in the future) more efficient method for calculating derivatives of the solution and of functionals of the solution.

But if you still want to do it the naive way, here's how. If the algorithm does not use adaptive time stepping, then you simply need to make the initial condition have elements of Dual numbers. If the algorithm does use adaptive time stepping, you also need to make sure that time is given by Dual numbers. The reason is that changing the value of the initial condition changes the error of the steps, which causes different steps to be taken and therefore different time values.

To show this in action, let's say we want to find the Jacobian of solution of the Lotka-Volterra equation at t=10 with respect to the parameters.

function func(du,u,p,t)
  du[1] = p[1] * u[1] - p[2] * u[1]*u[2]
  du[2] = -3 * u[2] + u[1]*u[2]
end
function f(p)
  prob = ODEProblem(func,eltype(p).([1.0,1.0]),eltype(p).((0.0,10.0)),p)
  solve(prob,Tsit5(),save_everystep=false)[end]
end

This function takes in new parameters and spits out the solution at the end. We make the initial condition eltype(p).([1.0,1.0]) so that it's typed to be Dual numbers whenever p is an array of Dual numbers, and we do the same for the timespan. Then we can take the Jacobian via ForwardDiff.jl:

using ForwardDiff
ForwardDiff.jacobian(f,[1.5,1.0])

2×2 Array{Float64,2}:
  2.20214   0.189782
 -6.2273   -0.700188

and compare it to Calculus.jl:

using Calculus
Calculus.jacobian(f,[1.5,1.0],:central)

2×2 Array{Float64,2}:
  2.20214   0.189782
 -6.2273   -0.700188

I get Dual number errors when I solve my ODE with Rosenbrock or SDIRK methods...?

This is because you're using a cache which is not compatible with autodifferentiation via ForwardDiff.jl. For example, consider the ODE function:

const tmp = zeros(4)     # pre-allocated Float64 cache
const A = rand(4,4)
function f(du,u,p,t)
  A_mul_B!(tmp,A,u)      # writes A*u into the Float64 cache
  du .= tmp .+ u
end

Here we use a cached temporary array in order to avoid the allocations of matrix multiplication. When autodifferentiation occurs, the element type of u is a Dual number, so the matrix multiplication produces Dual numbers, and the error arises when the code tries to write them into the Float64 array tmp. There are two ways to avoid this. The first, easy way is to just turn off autodifferentiation with the autodiff=false option in the solver. Every solver which uses autodifferentiation has this option. Thus we'd solve this with:

prob = ODEProblem(f,rand(4),(0.0,1.0))
sol = solve(prob,Rosenbrock23(autodiff=false))

and it will use a numerical differentiation fallback (DiffEqDiffTools.jl) to calculate Jacobians.

**Warning: Advanced**

The more difficult way is to create a Dual cache which dispatches for the cache choice based on the element type of u. This is done by the following:

using ForwardDiff

struct MyTag end                    # tag type used for the cached Dual numbers

struct DiffCache{T<:AbstractArray, S<:AbstractArray}
    du::T
    dual_du::S
end

function DiffCache(T, size, ::Type{Val{chunk_size}}) where chunk_size
    DiffCache(zeros(T, size...), zeros(ForwardDiff.Dual{MyTag,T,chunk_size}, size...))
end

DiffCache(u::AbstractArray) = DiffCache(eltype(u),size(u),Val{ForwardDiff.pickchunksize(length(u))})

get_tmp(dc::DiffCache, ::Type{T}) where {T<:ForwardDiff.Dual} = dc.dual_du
get_tmp(dc::DiffCache, T) = dc.du

Now we can get a cache that, by dispatch, gives either a cache array of Dual numbers or one of floating point numbers:

const dual_cache = DiffCache(rand(4)) # Build the cache; the size must match your initial condition
du = get_tmp(dual_cache,typeof(rand(4))) # Gives an Array{Float64}
dual_du = get_tmp(dual_cache,typeof(ForwardDiff.Dual(0.2,3.0))) # Gives an Array of Dual numbers

Note that you have to make sure that your chunk size matches the choice in the ODE solver (by default, it uses ForwardDiff.pickchunksize(length(u)) as well, so you only need to change this if you explicitly set chunksize = ...). Now we can set up and solve our ODE using this cache:

function f(du,u,p,t)
  # Get du from cache
  tmp = get_tmp(dual_cache,eltype(u))
  # Fix tag
  _tmp = reinterpret(eltype(u),tmp)
  A_mul_B!(_tmp,A,u)
  du .= _tmp .+ u
end
prob = ODEProblem(f,rand(4),(0.0,1.0))
sol = solve(prob,Rosenbrock23())

A small explanation is in order. tmp = get_tmp(dual_cache,eltype(u)) makes tmp match u in terms of being Dual-valued or not, but it doesn't necessarily match the tag. So we then reinterpret our Dual array to put the right tag on it. Note that this simply changes the type information and thus does not create any temporary arrays. Once we do that, our cached array is correctly typed to hold the result of A_mul_B!.