# Multithreaded Runge-Kutta Methods

## The Goal

To achieve its mission as an efficient package for HPC and research purposes, DifferentialEquations.jl aims to provide multithreaded versions of the most common algorithms. In this notebook we will look at some results for the `:DP5` solver. As shown in the other benchmarks, DifferentialEquations.jl's `:DP5` is already more efficient than the classic Hairer `:dopri5`, and vastly outperforms ODE.jl's `:ode45`. This multithreading is meant to increase the performance gap even further for larger problems.

## The Problem

For a simple test problem, we take the linear ODE du/dt = 1.01u on square matrices. The sections below vary the matrix size from 15x15 up to 300x300, with 200x200 as the reference case:

In [26]:
using DifferentialEquations

# 2D Linear ODE
f = (t,u,du) -> begin
  Threads.@threads for i in 1:length(u)
    du[i] = 1.01*u[i]
  end
end
analytic = (t,u₀) -> u₀*exp(1.01*t)

tspan = [0,10];
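
As a quick sanity check that the threaded right-hand side and the analytic solution agree, here is a minimal sketch that integrates the same problem with a hand-written fixed-step RK4. The helper `rk4_solve` is hypothetical and written only for this illustration; it is not part of DifferentialEquations.jl and is not the `:DP5` method benchmarked here. It assumes a recent Julia version.

```julia
# Threaded in-place right-hand side, same form as the notebook's `f`.
f = (t, u, du) -> begin
    Threads.@threads for i in 1:length(u)
        du[i] = 1.01 * u[i]
    end
end
analytic = (t, u₀) -> u₀ * exp(1.01 * t)

# Hypothetical helper: classic fixed-step RK4, used only as a sanity check.
function rk4_solve(f, u₀, t₀, t₁, n)
    u = copy(u₀)
    k1, k2, k3, k4 = similar(u), similar(u), similar(u), similar(u)
    dt = (t₁ - t₀) / n
    t = t₀
    for _ in 1:n
        f(t, u, k1)
        f(t + dt / 2, u .+ (dt / 2) .* k1, k2)
        f(t + dt / 2, u .+ (dt / 2) .* k2, k3)
        f(t + dt, u .+ dt .* k3, k4)
        u .+= (dt / 6) .* (k1 .+ 2 .* k2 .+ 2 .* k3 .+ k4)
        t += dt
    end
    return u
end

u₀ = rand(20, 20)
u_num = rk4_solve(f, u₀, 0.0, 1.0, 1000)
u_exact = analytic(1.0, u₀)
maxerr = maximum(abs.(u_num .- u_exact))
println("max abs error at t = 1: ", maxerr)
```

Because every entry evolves independently, the loop over `i` has no data dependencies, which is what makes it safe to split across threads.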

For reference, we will use 16 threads on a machine with 2x Intel Xeon E5-2667 V3 processors (3.2GHz, eight cores, 20MB cache, 135W each):

In [2]:
Threads.nthreads()

16

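We will test against the various Dormand-Prince 4/5 solvers from the wild. Note that the notebook cells were executed out of order: the `setups` and `names` variables used throughout are defined in the 100x100 cell (`In [6]`), which must be run before the smaller sizes.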

## Effect of Problem Size

Multithreading makes the most difference at medium problem sizes. This is because for large problems, less of the function time is in the calculation of `f`, and thus the speed of the method's calculations makes more of a difference. For small problems, however, the gains from parallelism do not outweigh its overhead. These results show that multithreading within the method begins to give reliable gains at problem sizes of about 50x50, performs best in the 100x100 to 200x200 range, and then trails off.
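
To make the trend easier to see, the following sketch tabulates the ratio of the `:DP5` time to the `:DP5Threaded` time, using the first two entries of the `Times` output printed in each section below. The numbers are copied from this notebook's output; only the variable names are introduced here.

```julia
# Problem sizes (N for an N×N matrix) and timings copied from the
# `Times` output of each section below: (DP5, DP5Threaded).
sizes = [15, 20, 25, 50, 75, 100, 200, 300]
t_serial   = [0.0027758, 0.00420233, 0.00376508, 0.0133777,
              0.0611684, 0.0878561, 0.557148, 1.29587]
t_threaded = [0.00576523, 0.00406532, 0.00452582, 0.0124535,
              0.0504188, 0.0669256, 0.476223, 1.20903]

# Ratio > 1 means the threaded version was faster.
speedups = t_serial ./ t_threaded
for (n, s) in zip(sizes, speedups)
    println(n, "x", n, ": ", round(s, digits=2), "x")
end

best = sizes[argmax(speedups)]
println("largest speedup at ", best, "x", best)
```

Since both methods report identical errors, these ratios match the `EffRatios` entries for the non-threaded `:DP5` in the sections where the threaded version wins.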

With fewer threads, these thresholds likely shift downwards, but the gains would also be smaller. The effect is probably larger for higher-order methods, since there will be more "method calculations" per step.
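
To illustrate what a "method calculation" looks like and why its threading overhead depends on problem size, here is a minimal sketch of one Runge-Kutta stage update in serial and threaded form. The function names and coefficients are hypothetical placeholders; this is not the `:DP5Threaded` implementation or the Dormand-Prince tableau, and the timings will vary by machine and thread count.

```julia
# Hypothetical stage update of an explicit Runge-Kutta step:
#   utmp = u + dt*(a1*k1 + a2*k2)
# The coefficients are placeholders, not the Dormand-Prince tableau.
function stage_serial!(utmp, u, k1, k2, dt, a1, a2)
    for i in eachindex(u)
        @inbounds utmp[i] = u[i] + dt * (a1 * k1[i] + a2 * k2[i])
    end
    return utmp
end

# Same update with the loop split across threads.
function stage_threaded!(utmp, u, k1, k2, dt, a1, a2)
    Threads.@threads for i in 1:length(u)
        @inbounds utmp[i] = u[i] + dt * (a1 * k1[i] + a2 * k2[i])
    end
    return utmp
end

for n in (15, 100, 300)
    u, k1, k2 = rand(n, n), rand(n, n), rand(n, n)
    out_s, out_t = similar(u), similar(u)
    # Warm-up calls so compilation time is not measured.
    stage_serial!(out_s, u, k1, k2, 0.1, 0.5, 0.5)
    stage_threaded!(out_t, u, k1, k2, 0.1, 0.5, 0.5)
    ts = @elapsed for _ in 1:100
        stage_serial!(out_s, u, k1, k2, 0.1, 0.5, 0.5)
    end
    tt = @elapsed for _ in 1:100
        stage_threaded!(out_t, u, k1, k2, 0.1, 0.5, 0.5)
    end
    println(n, "x", n, ": serial ", ts, " s, threaded ", tt, " s")
end
```

A higher-order method performs more updates of this kind per step, each over the full array, which is why the threaded version would be expected to matter more there.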

### 15x15

In [25]:
prob = ODEProblem(f,rand(15,15),analytic=analytic)
shoot = ode_shootout(prob,tspan,setups;Δt=1/2^(10),names=names,numruns=1000)
println(shoot)
plot(shoot)

Names: AbstractString["DifferentialEquations","DEThreaded","ODE","ODEInterface"], Winner: DifferentialEquations
Efficiencies: [20286.1,9767.21,619.558,1175.16]
EffRatios: [1.0,2.07696,32.7429,17.2624]
Times: [0.0027758,0.00576523,0.00329817,0.00143547]
Errors: [0.0177588,0.0177588,0.489379,0.592803]
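
To help read these outputs: the printed values are consistent with an efficiency of 1/(time × error) and an efficiency ratio of the best efficiency divided by each solver's efficiency. This relationship is inferred from the numbers above, not taken from the package source, so treat it as an assumption. A short check using the 15x15 values:

```julia
# Values copied from the 15x15 output above.
times  = [0.0027758, 0.00576523, 0.00329817, 0.00143547]
errors = [0.0177588, 0.0177588, 0.489379, 0.592803]

# Assumed definitions, inferred from the printed numbers.
efficiencies = 1 ./ (times .* errors)
effratios = maximum(efficiencies) ./ efficiencies

println("Efficiencies: ", efficiencies)
println("EffRatios: ", effratios)
```

Under this reading, the winner is the solver with the highest efficiency, so a solver with a fast time but a large error (such as `:dopri5` here) can still rank well below `:DP5`.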



### 20x20

In [24]:
prob = ODEProblem(f,rand(20,20),analytic=analytic)
shoot = ode_shootout(prob,tspan,setups;Δt=1/2^(10),names=names,numruns=500)
println(shoot)
plot(shoot)

Names: AbstractString["DifferentialEquations","DEThreaded","ODE","ODEInterface"], Winner: DEThreaded
Efficiencies: [16298.1,16847.4,474.522,1268.18]
EffRatios: [1.0337,1.0,35.504,13.2847]
Times: [0.00420233,0.00406532,0.00418012,0.00129387]
Errors: [0.0146006,0.0146006,0.504144,0.609436]



### 25x25

In [23]:
prob = ODEProblem(f,rand(25,25),analytic=analytic)
shoot = ode_shootout(prob,tspan,setups;Δt=1/2^(10),names=names,numruns=500)
println(shoot)
plot(shoot)

Names: AbstractString["DifferentialEquations","DEThreaded","ODE","ODEInterface"], Winner: DifferentialEquations
Efficiencies: [22769.0,18941.7,363.859,1054.65]
EffRatios: [1.0,1.20205,62.5763,21.5892]
Times: [0.00376508,0.00452582,0.00564267,0.00160806]
Errors: [0.011665,0.011665,0.487059,0.589645]



### 50x50

In [20]:
prob = ODEProblem(f,rand(50,50),analytic=analytic)
shoot = ode_shootout(prob,tspan,setups;Δt=1/2^(10),names=names,numruns=100)
println(shoot)
plot(shoot)

Names: AbstractString["DifferentialEquations","DEThreaded","ODE","ODEInterface"], Winner: DEThreaded
Efficiencies: [10934.3,11745.7,90.8588,386.966]
EffRatios: [1.07421,1.0,129.275,30.3534]
Times: [0.0133777,0.0124535,0.0220197,0.00426488]
Errors: [0.0068364,0.0068364,0.49983,0.605927]



### 75x75

In [19]:
prob = ODEProblem(f,rand(75,75),analytic=analytic)
shoot = ode_shootout(prob,tspan,setups;Δt=1/2^(10),names=names,numruns=100)
println(shoot)
plot(shoot)

Names: AbstractString["DifferentialEquations","DEThreaded","ODE","ODEInterface"], Winner: DEThreaded
Efficiencies: [3378.56,4098.89,29.7724,182.401]
EffRatios: [1.21321,1.0,137.674,22.4719]
Times: [0.0611684,0.0504188,0.0667198,0.00897493]
Errors: [0.00483884,0.00483884,0.503421,0.610861]



### 100x100

In [6]:
setups = [Dict(:alg=>:DP5)
          Dict(:alg=>:DP5Threaded)
          Dict(:abstol=>1e-3,:reltol=>1e-6,:alg=>:ode45) # Fix ODE to be normal
          Dict(:alg=>:dopri5)]
names = ["DifferentialEquations";"DEThreaded";"ODE";"ODEInterface"]
prob = ODEProblem(f,rand(100,100),analytic=analytic)
shoot = ode_shootout(prob,tspan,setups;Δt=1/2^(10),names=names,numruns=100)
println(shoot)
plot(shoot)

Names: AbstractString["DifferentialEquations","DEThreaded","ODE","ODEInterface"], Winner: DEThreaded
Efficiencies: [3071.59,4032.21,20.0129,121.29]
EffRatios: [1.31274,1.0,201.481,33.2444]
Times: [0.0878561,0.0669256,0.100509,0.0136708]
Errors: [0.00370565,0.00370565,0.497149,0.603091]



### 200x200

In [27]:
setups = [Dict(:alg=>:DP5)
          Dict(:alg=>:DP5Threaded)
          Dict(:abstol=>1e-3,:reltol=>1e-6,:alg=>:ode45) # Fix ODE to be normal
          Dict(:alg=>:dopri5)]
prob = ODEProblem(f,rand(200,200),analytic=analytic)
names = ["DifferentialEquations";"DEThreaded";"ODE";"ODEInterface"]
shoot = ode_shootout(prob,tspan,setups;Δt=1/2^(10),names=names)
println(shoot)
plot(shoot)

Names: AbstractString["DifferentialEquations","DEThreaded","ODE","ODEInterface"], Winner: DEThreaded
Efficiencies: [889.369,1040.5,2.93414,15.4634]
EffRatios: [1.16993,1.0,354.618,67.288]
Times: [0.557148,0.476223,0.679618,0.106341]
Errors: [0.00201812,0.00201812,0.501481,0.608127]



### 300x300

In [7]:
prob = ODEProblem(f,rand(300,300),analytic=analytic)
shoot = ode_shootout(prob,tspan,setups;Δt=1/2^(10),names=names)
println(shoot)
plot(shoot)

Names: AbstractString["DifferentialEquations","DEThreaded","ODE","ODEInterface"], Winner: DEThreaded
Efficiencies: [548.843,588.264,1.72093,7.56518]
EffRatios: [1.07183,1.0,341.828,77.7595]
Times: [1.29587,1.20903,1.16263,0.218089]
Errors: [0.00140601,0.00140601,0.499796,0.606104]



## Conclusion

This is only a very early form of the within-method multithreaded versions, and it already shows promising results for large problems. There are still some major problems with Julia's threading that prevent it from reaching maximum performance. Hopefully these issues will get worked out soon.