Use Automatic Differentiation #2

Closed
dpsanders opened this issue Jun 1, 2017 · 19 comments

dpsanders commented Jun 1, 2017
For the moment, I would suggest concentrating on automatic differentiation, rather than hand-coded or symbolic Jacobians.

Do you have examples where you have checked that hand-coded Jacobians are actually much faster than ForwardDiff.jl?

cc @jrevels

Datseris (Member) commented Jun 1, 2017

I've set up this example for the Lorenz system:

using ForwardDiff, BenchmarkTools

function eom!(du, u)
    du[1] = 10.0(u[2] - u[1])
    du[2] = u[1]*(28.0 - u[3]) - u[2]
    du[3] = u[1]*u[2] - (8/3)*u[3]
end

# should give exponents [0.9056, 0, -14.5723]
function jacob!(J, u)
    i = one(eltype(J))
    J[1,1] = -10i;       J[1,2] = 10i;  J[1,3] = 0i
    J[2,1] = 28i - u[3]; J[2,2] = -i;   J[2,3] = -u[1]
    J[3,1] = u[2];       J[3,2] = u[1]; J[3,3] = -8i/3
    return J
end

jacob_fd! = (J, u) -> ForwardDiff.jacobian!(J, eom!, zeros(u), u)

println("Benchmarking hand-coded vs. ForwardDiff")
function bench(N)
  println("for N = $N")
  J = rand(3,3); u0 = rand(3)
  a = @benchmark for i in 1:$N
    jacob!($J, $u0)
  end
  b = @benchmark for i in 1:$N
    jacob_fd!($J, $u0)
  end
  println("Coded: $(median(a))")
  println("ForwardDiff: $(median(b))")
end

bench(100)
bench(10000)
bench(1000000)

Which prints:

Benchmarking hand-coded vs. ForwardDiff
for N = 100
Coded: TrialEstimate(576.026 ns)
ForwardDiff: TrialEstimate(529.606 μs)
for N = 10000
Coded: TrialEstimate(56.656 μs)
ForwardDiff: TrialEstimate(60.783 ms)
for N = 1000000
Coded: TrialEstimate(5.879 ms)
ForwardDiff: TrialEstimate(5.686 s)

Now, I am aware that the line jacob_fd! = (J,u) -> ForwardDiff.jacobian!(J, eom!, zeros(u), u)
is problematic in the sense that it creates a zeros(u) every time, but I've benchmarked with a pre-allocated array in its place and it does not make much of a difference.


jrevels commented Jun 1, 2017

If you're trying to get optimal performance, you probably want to preallocate the JacobianConfig as well.

Datseris (Member) commented Jun 1, 2017

@jrevels I appreciate you joining the conversation. Replacing the line
jacob_fd! = (J,u) -> ForwardDiff.jacobian!(J, eom!, zeros(u), u)
with these two

jac_config = ForwardDiff.JacobianConfig(eom!, zeros(3), zeros(3))
jacob_fd! = (J,u) -> ForwardDiff.jacobian!(J, eom!, zeros(u), u, jac_config)

makes it go much faster (I wonder why? I simply pre-built the same JacobianConfig that jacobian! constructs by default). The results now are:

Benchmarking hand-coded vs. ForwardDiff
for N = 100
Coded: TrialEstimate(563.454 ns)
ForwardDiff: TrialEstimate(14.779 μs)
for N = 10000
Coded: TrialEstimate(55.014 μs)
ForwardDiff: TrialEstimate(1.502 ms)
for N = 1000000
Coded: TrialEstimate(5.788 ms)
ForwardDiff: TrialEstimate(155.175 ms)

They are still orders of magnitude slower, though. For me this seems reasonable (I would not expect automatic derivatives to be as fast as plain old hand-coded ones). But just to be sure, am I doing something wrong?

P.S. I have absolutely no clue what the Chunk thing is.
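For reference, a minimal sketch of what the chunk size controls, assuming ForwardDiff's Chunk-based config API (v0.5-era and later); the name jac_config_chunked is illustrative:

# The chunk size N is how many partial derivatives each dual number carries
# per pass through eom!. For a 3-dimensional system, Chunk{3} computes the
# whole Jacobian in a single call of eom!.
jac_config_chunked = ForwardDiff.JacobianConfig(eom!, zeros(3), zeros(3), ForwardDiff.Chunk{3}())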

dpsanders commented Jun 1, 2017

OK, I stand corrected. Maybe try ReverseDiff.jl?

Datseris (Member) commented Jun 1, 2017

Oooouf, it took me all this time to make it work! Using:

#reverse diff:
using ReverseDiff
const f_tape = ReverseDiff.JacobianTape(eom!, rand(3), rand(3))
const compiled_f_tape = ReverseDiff.compile(f_tape)
jacob_rd! = (J, u) -> ReverseDiff.jacobian!(J, compiled_f_tape, u)


println("Benchmarking hand-coded vs. ForwardDiff")
function bench(N)
  println("for N = $N")
  J = rand(3,3); u0 = rand(3)
  a = @benchmark for i in 1:$N
    jacob!($J, $u0)
  end
  b = @benchmark for i in 1:$N
    jacob_rd!($J, $u0)
  end
  println("Coded: $(median(a))")
  println("ForwardDiff: $(median(b))")
end

bench(10000)

Outputs:

Benchmarking hand-coded vs. ReverseDiff
for N = 10000
Coded: TrialEstimate(56.655 μs)
ReverseDiff: TrialEstimate(11.750 ms)

Which is not an improvement!
I am really glad for all the help and discussions by everybody! I have to go to bed now otherwise my supervisor won't be happy tomorrow when he will be waiting for me at our meeting :D

I will catch up to the discussion in about 12 hours from now!


jrevels commented Jun 2, 2017

It's expected that ReverseDiff will be quite a bit worse for Jacobians of this dimensionality - ForwardDiff is algorithmically the better choice here.

I rewrote your benchmark script to

  • Force inlining on all the functions to make sure we're on a level playing field
  • Test out StaticArrays as well as Base Arrays
  • Produce more representative timings. Manually looping over the kernel inside of @benchmark isn't likely to give reasonable results if you're trying to measure per-call kernel performance. BenchmarkTools uses heuristics that determine the best number of kernel evaluations per sample for you, and can report a few estimators for many samples.

Here's the new script:

using ForwardDiff, BenchmarkTools, StaticArrays

# Using StaticArrays #
#--------------------#

@inline eom(u) = @SVector [10.0(u[2]-u[1]), u[1]*(28.0-u[3]) - u[2], u[1]*u[2] - (8/3)*u[3]]

@inline function manual_jacobian(u)
    i = one(eltype(u))
    return @SMatrix [       -10i    10i     0i;
                     (28i - u[3])    -i  -u[1];
                             u[2]  u[1]  -8i/3]
end

@inline forward_jacobian(u) = ForwardDiff.jacobian(eom, u)

u = SVector{3}(rand(3))

# sanity check
@assert manual_jacobian(u) == forward_jacobian(u)

println("---- eom(::SVector) benchmark results ----")
display(@benchmark eom($u))

println("---- manual_jacobian(::SVector) benchmark results ----")
display(@benchmark manual_jacobian($u))

println("---- forward_jacobian(::SVector) benchmark results ----")
display(@benchmark forward_jacobian($u))

# Using Base Arrays #
#-------------------#

@inline function eom!(du, u)
    du[1] = 10.0(u[2]-u[1])
    du[2] = u[1]*(28.0-u[3]) - u[2]
    du[3] = u[1]*u[2] - (8/3)*u[3]
end

@inline function manual_jacobian!(J, u)
    i = one(eltype(J))
    J[1,1] = -10i
    J[1,2] = 10i
    J[1,3] = 0i
    J[2,1] = 28i - u[3]
    J[2,2] = -i
    J[2,3] = -u[1]
    J[3,1] = u[2]
    J[3,2] = u[1]
    J[3,3] = -8i/3
    return J
end

@inline forward_jacobian!(J, du, u, cfg) = ForwardDiff.jacobian!(J, eom!, du, u, cfg)

u = rand(3)
du = zeros(3)
J = zeros(3, 3)
cfg = ForwardDiff.JacobianConfig(eom!, du, u)

# sanity check
@assert manual_jacobian!(J, u) == forward_jacobian!(J, du, u, cfg)

println("---- eom!(::Vector, ::Vector) benchmark results ----")
display(@benchmark eom!($du, $u))

println("---- manual_jacobian!(::Matrix, ::Vector) benchmark results ----")
display(@benchmark manual_jacobian!($J, $u))

println("---- forward_jacobian!(::Matrix, ::Vector, ::Vector, ::JacobianConfig) benchmark results ----")
display(@benchmark forward_jacobian!($J, $du, $u, $cfg))

...and here are the results on my machine (using ForwardDiff v0.5 and Julia v0.6):

---- eom(::SVector) benchmark results ----
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     2.943 ns (0.00% GC)
  median time:      2.952 ns (0.00% GC)
  mean time:        2.955 ns (0.00% GC)
  maximum time:     11.160 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000
---- manual_jacobian(::SVector) benchmark results ----
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     3.904 ns (0.00% GC)
  median time:      3.917 ns (0.00% GC)
  mean time:        3.926 ns (0.00% GC)
  maximum time:     12.365 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000
---- forward_jacobian(::SVector) benchmark results ----
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     4.551 ns (0.00% GC)
  median time:      4.561 ns (0.00% GC)
  mean time:        4.572 ns (0.00% GC)
  maximum time:     13.199 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000
---- eom!(::Vector, ::Vector) benchmark results ----
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     4.234 ns (0.00% GC)
  median time:      4.242 ns (0.00% GC)
  mean time:        4.262 ns (0.00% GC)
  maximum time:     15.073 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000
---- manual_jacobian!(::Matrix, ::Vector) benchmark results ----
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     5.849 ns (0.00% GC)
  median time:      5.860 ns (0.00% GC)
  mean time:        5.877 ns (0.00% GC)
  maximum time:     8.275 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000
---- forward_jacobian!(::Matrix, ::Vector, ::Vector, ::JacobianConfig) benchmark results ----
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     90.940 ns (0.00% GC)
  median time:      91.572 ns (0.00% GC)
  mean time:        91.629 ns (0.00% GC)
  maximum time:     254.181 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     956

So we see there's only a small overhead for StaticArrays, and a bit larger overhead for Base Arrays.

Datseris (Member) commented Jun 2, 2017

@jrevels Thank you so much for all the help and input! This is beyond amazing.
Since you are here, I am going to take the opportunity to ask you two questions that will help me understand this stuff much better:

  1. How can it possibly be that using ForwardDiff with an SVector is almost as fast as the hand-coded Jacobian? I mean, how can automatic differentiation get so much faster simply through the use of SVector? (With Base Arrays it is slower, which seems reasonable to someone like me.)
  2. In the StaticArrays case, I noticed that you did not define/preallocate the JacobianConfig structure. Why is that?


jrevels commented Jun 2, 2017

  1. Stack-only, fully inlined operations can be heavily optimized by Julia/LLVM. ForwardDiff's dual numbers are totally stack-allocated, and can track multiple partial derivatives at once (a toy illustration follows below). If you're working in the low-dimensional regime of SArrays, this means you only have to call your target function once to get its full Jacobian, and the target function's execution can be optimized aggressively by the compiler.

  2. ForwardDiff's differentiation API has specialized methods for SArrays that try to avoid the heap for the reasons addressed in 1. The main point of JacobianConfig is to preallocate necessary work buffers on the heap, but since these heap-allocated work buffers aren't used for the special SArray methods, we don't need the JacobianConfig at all.
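To make point 1 concrete, here is a toy sketch of the dual-number idea (an illustration only, not ForwardDiff's actual Dual implementation; all names here are made up):

# A toy forward-mode dual number carrying a value plus a tuple of partial
# derivatives; being an immutable isbits struct, it can live on the stack.
struct ToyDual{N,T<:Real} <: Real
    val::T                  # primal value
    partials::NTuple{N,T}   # one slot per input dimension
end

# Propagate the partials through + and * with the usual calculus rules.
Base.:+(a::ToyDual{N}, b::ToyDual{N}) where {N} =
    ToyDual(a.val + b.val, ntuple(i -> a.partials[i] + b.partials[i], N))
Base.:*(a::ToyDual{N}, b::ToyDual{N}) where {N} =
    ToyDual(a.val * b.val, ntuple(i -> a.partials[i]*b.val + a.val*b.partials[i], N))

# Seed x = 2.0 and y = 3.0 with unit partials, then evaluate f(x, y) = x*y + x:
# the result's partials are exactly the gradient (y + 1, x) = (4.0, 2.0),
# obtained in a single evaluation of f.
x = ToyDual(2.0, (1.0, 0.0))
y = ToyDual(3.0, (0.0, 1.0))
f = x*y + x
@show f.val f.partials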

Datseris (Member) commented Jun 2, 2017

But how can it be that

---- eom(::SVector) benchmark results ----
BenchmarkTools.Trial:
  memory estimate:  0 bytes

??? An SVector instance is created on every call of eom(u), right? And it is still faster than eom!(::Vector, ::Vector), which writes everything in place. I am so confused right now; I feel like I really don't know anything.


jrevels commented Jun 2, 2017

Those metrics mean "Julia's GC is reporting 0 heap allocations were performed and 0 bytes of heap memory were used." It's worth doing some Googling to learn about the performance characteristics of stack allocation/access vs. heap allocation/access (in general, not just in Julia).
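For instance, a quick sketch of how this shows up in practice (@allocated reports heap bytes only, so a stack-allocated SVector registers as zero):

using StaticArrays

make_svector() = @SVector [1.0, 2.0, 3.0]   # immutable, can stay on the stack
make_vector()  = [1.0, 2.0, 3.0]            # mutable, lives on the heap

make_svector(); make_vector()  # warm up so we don't measure compilation
@allocated make_svector()      # 0 bytes: never touches the heap
@allocated make_vector()       # nonzero: one heap allocation per call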

ChrisRackauckas (Member) commented Jun 2, 2017

For the moment, I would suggest concentrating on automatic differentiation, rather than hand-coded or symbolic Jacobians.
Do you have examples where you have checked that hand-coded Jacobians are actually much faster than ForwardDiff.jl?

The benchmarks @jrevels gives are about the same as what I've seen. In general, hand-coded Jacobians eke out a tad more performance, but ForwardDiff does really well on these kinds of systems if the Jacobian is quite dense. The difference grows if you get to something like 15 ODEs with sparse interactions, but even then it's not that big. Allowing the user to give a Jacobian is nice, but it's only actually necessary when ForwardDiff doesn't apply to the equation, which shouldn't happen in this case (all of the dynamical systems here are "just math", not using any interop).

Symbolic differentiation then has about the same speedup in this case as analytical, since you solve for a symbolic Jacobian once and then re-use it every iteration. So it's really not that big of a deal if you're only getting the Jacobian. However, the reason to use ParameterizedFunctions.jl is that it gives a lot more. The stiff diffeq "master equation" is solving the linear system (I - gamma*J)x = b repeatedly, i.e. inverting the Rosenbrock-W matrix I - gamma*J, where J is the Jacobian. So you get huge speedups using symbolic differentiation with a stiff solver if what you're getting symbolically is inv(I - gamma*J) and using that directly instead of solving a linear system every iteration (which is what ParameterizedFunctions.jl does). But if the dynamical systems you're looking at are nonstiff and you're only using the Jacobian for stability analysis, you don't get this extra speedup, and so the difference isn't major.

The downside to symbolic differentiation is that you need the expression the user defined the equations with, or some other DSL. That's why ParameterizedFunctions.jl has to be used through the macro @ode_def to do symbolic calculations. But if that's done, there's a lot to be gained when the equation is stiff. Even if it is stiff, you might be solving a chaotic system with an RK method with small timesteps to get extremely low error anyway (the problems become error-bound instead of stiffness-bound, even for explicit methods on stiff equations in this case). If that's what's happening, then the advantages of symbolic are very diminished.
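To make the linear-algebra point concrete, here is a toy sketch of the two ways to handle the (I - gamma*J)x = b solve (made-up 3×3 numbers, not ParameterizedFunctions.jl internals):

using LinearAlgebra   # `I` lived in Base back in Julia 0.6

γ = 0.01
J = [ -10.0  10.0   0.0;       # some fixed Jacobian, for illustration
       28.0  -1.0  -0.5;
        0.5   0.5  -8/3 ]
W = I - γ*J                    # the Rosenbrock-W matrix

b = rand(3)
x1 = W \ b                     # generic path: factorize and solve each iteration

Winv = inv(W)                  # symbolic path: invert once up front...
x2 = Winv * b                  # ...then each iteration is a matrix-vector product

@assert x1 ≈ x2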

dpsanders commented Jun 2, 2017

Thanks @jrevels, very nice results!

Datseris (Member) commented Jun 2, 2017

Yes, they were quite good! However, I have spent the last 2-3 hours re-implementing these benchmarks in a more "realistic setting", and a lot of things change. Because my benchmark codes are around 300 lines, I will link them instead. I have also changed the system, since before half the elements of the Jacobian were simple numbers without any operations happening.

Basically what I did was benchmark 5 different ways one could set up the equations of motion of a system:

println("\nVersion 1: StaticArray returned: eom_1(u) -> un::SVector")
println("\nVersion 2: Base Array returned eom_2(x) -> xn::Vector ")
println("\nVersion 3: MVector returned eom_3(x) -> xn::MVector ")
println("\nVersion 4: In place with 2 arguments: eom_towel!(xn, x) (changes xn)")
println("\nVersion 5: In place with 2 arguments & using MVector")

I used these e.o.m. to benchmark two different things. First, evolving the system AND updating the state, which is different from simply calling eom(x)! E.g.:

mutable struct system_1
  u::SVector{3, <:Real}
end
evolve1(s::system_1) = (un = eom_1(s.u); s.u = un)

This was a crucial thing missing from @jrevels' benchmarks. Of course this makes the use of SVector less favourable: in summary, the e.o.m. benchmarks found here show that the e.o.m. that use SVector are not favoured (since one also has to update a field of the dynamical system). It is much better to use an in-place equation of motion with MVector, and have the system's field be of the same type. Results:

comparison of evolve call:
v2 versus v1
BenchmarkTools.TrialJudgement: 
  time:   -38.01% => improvement (5.00% tolerance)
  memory: -57.75% => improvement (1.00% tolerance)
v3 versus v1
BenchmarkTools.TrialJudgement: 
  time:   -88.58% => improvement (5.00% tolerance)
  memory: -73.94% => improvement (1.00% tolerance)
v4 versus v1
BenchmarkTools.TrialJudgement: 
  time:   -87.33% => improvement (5.00% tolerance)
  memory: -71.83% => improvement (1.00% tolerance)
v5 versus v1
BenchmarkTools.TrialJudgement: 
  time:   -84.89% => improvement (5.00% tolerance)
  memory: -71.83% => improvement (1.00% tolerance)

As far as the Jacobian calculation is concerned, the results are here. I used the JacobianConfig construct for versions 2 to 4, but not for version 1 (which uses SVector).

In summary, the method with SVectors is the fastest. However, it has to be said that the method with the in-place e.o.m. is very close, within 20-40% of its speed.

It is also worth noting that (for the Jacobian case) the in-place e.o.m. that uses plain Vectors is actually faster than the one that uses MVectors. Results:

comparison of in-place jacobian call:
v2 versus v1
BenchmarkTools.TrialJudgement: 
  time:   +6951.54% => regression (5.00% tolerance)
  memory: +950.00% => regression (1.00% tolerance)
v3 versus v1
BenchmarkTools.TrialJudgement: 
  time:   +38.53% => regression (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)
v4 versus v1
BenchmarkTools.TrialJudgement: 
  time:   +35.02% => regression (5.00% tolerance)
  memory: -100.00% => improvement (1.00% tolerance)
v5 versus v1
BenchmarkTools.TrialJudgement: 
  time:   +36.49% => regression (5.00% tolerance)
  memory: -50.00% => improvement (1.00% tolerance)

I think @dpsanders and @jrevels might want to check out the actual benchmarking code, since, as you already know very well, I am not a benchmarking expert, nor a StaticArrays expert, nor a Julia expert! :D


jrevels commented Jun 3, 2017

I didn't read too closely, but there seem to be a couple of issues with your code.

I'm assuming your examples are representative of the actual dimensionality you care about. For dimensions this small, there's no reason not to use StaticArrays almost entirely, and you should avoid manipulating heap-allocated mutable objects. Instead, you should be operating on immutable values: constructing new objects on the stack (even incrementally) can often be faster than mutating heap-allocated objects when the dimensions involved are this low. Besides being more performant, it'll also make the code cleaner.

If for some reason you had to have a mutable system object (as opposed to an immutable one), and you want a field's loads/stores to be performant, you should fully expose that field's type. For example, instead of this:

# Operations involving `u` will be slow because Julia can't specialize 
# on the element type of `u` when compiling functions for `System`
mutable struct System
    u::SVector{3,<:Real}
end

...you should write this:

mutable struct System{T<:Real}
    u::SVector{3,T}
end

Datseris (Member) commented Jun 3, 2017

Something crazy happened for me: using your suggestion and making the system type parametric changes the benchmarks significantly! Besides everything being much faster, the comparisons between the different benchmarks also change!
Now, the evolve call benchmarks become:
(evolve is simply a function that calls eom and updates the system's field system.u)

comparison of evolve call:
v2 versus v1
BenchmarkTools.TrialJudgement: 
  time:   +105.88% => regression (5.00% tolerance)
  memory: +250.00% => regression (1.00% tolerance)
v3 versus v1
BenchmarkTools.TrialJudgement: 
  time:   -10.62% => improvement (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)
v4 versus v1
BenchmarkTools.TrialJudgement: 
  time:   +51.80% => regression (5.00% tolerance)
  memory: +250.00% => regression (1.00% tolerance)
v5 versus v1
BenchmarkTools.TrialJudgement: 
  time:   -59.71% => improvement (5.00% tolerance)
  memory: -100.00% => improvement (1.00% tolerance)

@jrevels The results state that using mutable StaticArrays and operating on them in place is the best approach. Combining that with the fact that the Jacobian calculation for mutable and immutable static arrays is not very far off performance-wise, I think using mutable static arrays may be the best approach. Would you agree, or would you still suggest using exclusively static arrays, even with these new benchmarks? There is no reason for the system's field to specifically be a mutable type, as long as it is efficient to update it (since this will happen all the time). For my package the systems will have small dimensionality; I would guess all usage will be limited to D < 30.

For dimensions this small, there's no reason not to use StaticArrays almost entirely,

Well, the simple reason not to use (immutable) StaticArrays is that evolve is simply much faster for mutable StaticArrays.

P.S.: EDIT: Initially I could not understand why there should be a difference between the two ways to write the struct, but now I understand it! I guess the first way I wrote it was simply noobish!


jrevels commented Jun 3, 2017

making the system type parametric changes the benchmarks significantly!

Yup, giving the compiler "full" type information for performance hotspots can make a big difference in Julia!

Would you agree, or would you still suggest using exclusively static arrays, even with these new benchmarks? There is no reason for the system's field to specifically be a mutable type, as long as it is efficient to update it (since this will happen all the time).
...
Well, the simple reason not to use (immutable) StaticArrays is that evolve is simply much faster for mutable StaticArrays.

Your benchmarks aren't testing the design I'm suggesting, which is to make the system type itself immutable. For example, evolve(s::SystemState) would return a new SystemState instance, where the new instance contains the appropriately "incremented" fields:

using StaticArrays, BenchmarkTools

# I'm adding the `i` field just for the sake of example, to
# motivate even defining a system type at all. Assuming 
# your benchmarks are representative of your actual use
# case, defining a new type here would only be useful if 
# you have multiple fields to keep track of. Otherwise, it 
# would be cleaner to just write all of your operations on 
# generic arrays.
struct SystemState{T<:Real}
    u::SVector{3,T}
    i::Int
end

@inline SystemState(u::SVector) = SystemState(u, 1)

@inline evolve(s::SystemState) = SystemState(eom(s.u), s.i + 1)

@inline function eom(x)
    x1, x2, x3 = x[1], x[2], x[3]
    return SVector(3.8*x1*(1-x1)-0.05*(x2+0.35)*(1-2*x3),
                   0.1*((x2+0.35)*(1-2*x3)-1 )*(1-1.9*x1),
                   3.78*x3*(1-x3)+0.2*x2)
end

@benchmark evolve(s) setup=(s = SystemState(SVector{3}(rand(3))))

On my machine, that last line produces:

BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     6.165 ns (0.00% GC)
  median time:      6.174 ns (0.00% GC)
  mean time:        6.182 ns (0.00% GC)
  maximum time:     14.547 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

Compare that to the output of @benchmark evolve1($s1):

BenchmarkTools.Trial:
  memory estimate:  32 bytes
  allocs estimate:  1
  --------------
  minimum time:     28.358 ns (0.00% GC)
  median time:      28.886 ns (0.00% GC)
  mean time:        31.701 ns (7.57% GC)
  maximum time:     2.038 μs (92.91% GC)
  --------------
  samples:          10000
  evals/sample:     995

Datseris (Member) commented Jun 3, 2017

Hey @jrevels, thanks so much for all the input. (Seriously, I appreciate it a lot. I would never have been able to explore all these possibilities by myself.)

Now I understand your approach! However, I need to define my system in a way that contains all the information of the system, in the scientific sense; that is, the current state and the equations of motion. Therefore, my type has to be something like:

mutable struct system_1{T<:Real}
  u::SVector{3, T}
  eom::Function
end

or

mutable struct system_4{T<:Real}
  u::MVector{3, T}
  eom::Function
end

It is really important to have the equations of motion inside the system type, and I am really sorry I was not clear about how I wanted to define the system.

Having the eom in the type is good not only because this is what makes sense (in terms of what a dynamical system is), but also because it makes the user interface beautiful. I consider the last part really important. For example, the user could then do:

λ = lyapunov(s::DiscreteDynamicalSystem)
or
λ = lyapunov(s::ContinuousDynamicalSystem)

and it would work without wonky calls that need to pass 2, 3, or even 4 unnecessary arguments to the function.
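Following the "fully expose the field's type" advice from earlier, one hypothetical sketch (illustrative names, not necessarily what the package should do) is to also parameterize on the e.o.m.'s concrete type instead of the abstract Function:

using StaticArrays

# Every Julia function has its own concrete type, so parameterizing on F lets
# the compiler specialize calls to s.eom, which a field typed as the abstract
# `Function` prevents.
struct ToySystem{T<:Real, F}
    u::SVector{3,T}
    eom::F
end

evolve(s::ToySystem) = ToySystem(s.eom(s.u), s.eom)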

I have written this benchmark file that creates the systems as I would need them in the real setting. (I am not pasting it since it is ~150 lines.) Version 3 is your suggestion.
Here are the system types:

println("\nVersion 1: e.o.m. return static array: eom_1(u) -> un::SVector")
println("and system.u is also an SVector")

println("\nVersion 2: e.o.m. are in-place with Base Arrays")
println("and system.u is also a Base Array")

println("\nVersion 3: system is an immutable struct with SVector")

println("\nVersion 4: e.o.m. are in-place with Mutable StaticArrays")
println("and system.u is also a MutableStaticArray")

(Note: all of the systems have a field .eom holding the equations of motion.) The results of the benchmark comparisons are (EDITED):

-----Comparison of evolve call:-----
v2 versus v1
BenchmarkTools.TrialJudgement: 
  time:   +279.30% => regression (5.00% tolerance)
  memory: +Inf% => regression (1.00% tolerance)
v3 versus v1
BenchmarkTools.TrialJudgement: 
  time:   -55.17% => improvement (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)
v4 versus v1
BenchmarkTools.TrialJudgement: 
  time:   -20.69% => improvement (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)
-----Comparison of in-place jacobian call:-----
v2 versus v1
BenchmarkTools.TrialJudgement: 
  time:   +94.45% => regression (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)
v3 versus v1
BenchmarkTools.TrialJudgement: 
  time:   -5.55% => improvement (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)
v4 versus v1
BenchmarkTools.TrialJudgement: 
  time:   +134.33% => regression (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)
Conclusions:
Minimum time of evolve call for method v3
Minimum time of jacob call for method v3

Sooo I guess you are right? That makes me extremely happy!!!!!!! Because I can finally forget all that stuff, since the decision is clear, and start coding the actual implementation!

Datseris (Member) commented Jun 3, 2017

Oh damn. This is killing my brain... I now think that if I use that approach, there is no reason to have the system state as part of the system. Since it won't be updated in place, what is the reason to keep it there?
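One way that thought could play out, as a hypothetical sketch (names made up): keep only the e.o.m. in the system and thread the state through the loop as a plain value:

# The system holds just the equations of motion (parameterized on their
# concrete type); the trajectory loop rebinds the state instead of mutating it.
struct StatelessSystem{F}
    eom::F
end

function trajectory(s::StatelessSystem, u0, n::Int)
    u = u0
    for _ in 1:n
        u = s.eom(u)   # rebinding a stack value; no in-place mutation needed
    end
    return u
end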

Datseris (Member) commented:

Thanks everybody for the really helpful discussions! Now everything in the package uses automatic derivatives from ForwardDiff when a user-provided Jacobian is not supplied, even for the continuous systems!

I want to especially thank @jrevels for helping me so much with the benchmarks and opening my eyes to how fast StaticArrays truly are!

To my astonishment, even the integration with the DifferentialEquations package is faster using an e.o.m. that returns a static array!

Datseris pushed a commit that referenced this issue Oct 9, 2017
Update to the latest version