
Updated @muladd and broadcasts #104

Merged (28 commits, Aug 7, 2017)

Conversation

devmotion (Member)

This PR improves the application of @muladd and broadcasts across all algorithms and in the calculation of initial time steps. As a test, I additionally reverted methods that fuse 12+ broadcasts to the so-called "mixed" implementation - it was only necessary for a few methods such as Feagin, Vern, BS5, and Tsit5, and it also works with StaticArrays. The preferred broadcast version is added as a comment. Moreover, I tried to clean up the code a bit by removing unneeded unpacking and packing of parameters and, e.g., merging duplicate initializations of the symplectic caches.

Locally all tests passed.
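For readers unfamiliar with the macro, here is a minimal illustration of what @muladd does to a scalar multiply-add expression (a sketch assuming the macro is in scope; in current versions it is provided by MuladdMacro.jl, and the exact printing may differ by Julia version):

julia> @macroexpand( @muladd a*b + c )
:(muladd(a, b, c))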

@ChrisRackauckas (Member)

Spring cleaning, haha. Thanks for tackling this. This looks really nice.

if typeof(u) <: AbstractArray

Instead of using that in the mixed implementation, can you make it send StaticArrays to the other one?

if typeof(u) <: AbstractArray && !(typeof(u) <: SArray)

The reason is that similar(u) is an MVector, which will actually allocate on the heap and not the stack. If everything is fully optimized, then this will slow down static arrays, since they normally allocate on the stack and are thus "free".
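A minimal illustration of the allocation concern, assuming StaticArrays.jl (not part of this PR):

julia> using StaticArrays

julia> u = SVector(1.0, 2.0, 3.0);

julia> similar(u) isa MVector    # mutable counterpart; may be allocated on the heap
true

julia> u isa SArray              # true, so the guard above sends u down the other path
true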

@ChrisRackauckas (Member)

so-called "mixed" implementation - it was only necessary for a few methods like Feagin, Vern, BS5, and Tsit5

Only necessary there because those were the only ones which hit the broadcast limit?

@devmotion (Member Author)

I changed it according to your suggestion (with some minor additional changes such that residuals can be calculated for StaticArrays). Yes, only those hit the limit.

rtmp = integrator.fsalfirst
A = f[1]
u = expm(dt*A)*(@. uprev + dt*rtmp)
@muladd u = expm(dt*A)*(@. uprev + dt*rtmp)
Member

does @muladd work with matrix multiplication?

Member Author

@muladd does not change or interfere with matrix multiplications. This generates the following expression:

julia> @macroexpand( @muladd u = expm(dt*A)*(@. uprev + dt*rtmp) )
:(u = expm(dt * A) * (muladd).(dt, rtmp, uprev))

So this should work as expected, shouldn't it? By the way, because of matrix multiplications I thought it would be better to apply muladd as a dot call only if both operations are dotted; i.e. expressions like @muladd A*x .+ b stay untouched, whereas @muladd a.*b.+c leads to (muladd).(a,b,c).
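A sketch of the two cases via @macroexpand, using hypothetical variables (exact printing may vary by version):

julia> @macroexpand( @muladd a.*b .+ c )    # both operations dotted
:((muladd).(a, b, c))

julia> @macroexpand( @muladd A*x .+ b )     # mixed dotted/undotted, left untouched
:(A * x .+ b)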

Member

Oh yes, now I remember you made it handle that. Cool. Just making sure it would stay correct 👍

@@ -42,7 +42,7 @@ end
if typeof(cache) <: IIF1ConstantCache
tmp = expm(A*dt)*(uprev)
elseif typeof(cache) <: IIF2ConstantCache
tmp = expm(A*dt)*(@. uprev + 0.5dt*uhold[1]) # This uhold only works for non-adaptive
@muladd tmp = expm(A*dt)*(@. uprev + 0.5dt*uhold[1]) # This uhold only works for non-adaptive
Member

another @muladd on matrix exponential

@@ -224,15 +223,15 @@ end
W = I - dto2*J
else
J = ForwardDiff.derivative(uf,uprev)
W = 1 - dto2*J
W = @. 1 - dto2*J
Member

This shouldn't use @. because it's Number-only. If it weren't a Number but an AbstractArray, this would actually be incorrect, since 1 would have to be the identity matrix. But that's the first case.
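A quick illustration of the difference with a hypothetical J (on current Julia, I comes from LinearAlgebra):

julia> J = [0.5 0.25; 1.5 2.5];

julia> 1 .- J    # broadcasts the scalar 1 over every entry
2×2 Array{Float64,2}:
  0.5   0.75
 -0.5  -1.5

julia> I - J     # identity matrix minus J: only the diagonal gets the 1
2×2 Array{Float64,2}:
  0.5  -0.25
 -1.5  -1.5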

Member Author

Sure, I'll change it 👍

Member Author

Just to be sure: doesn't the same problem occur in https://github.com/JuliaDiffEq/OrdinaryDiffEq.jl/blob/master/src/integrators/rosenbrock_integrators.jl#L246-L252? Shouldn't we use the identity matrix instead of 1 in line 252 if J is an AbstractArray?

Member

Julia has been throwing a warning since v0.5 when using I like a number there. It sucks, so this is just a workaround to keep it from throwing the warning. The out-of-place Rosenbrock methods should be fixed to use the I route.

Member Author

So I'll change it there as well, such that W is already calculated inside the if-else expression - once with the identity matrix I and once just as a number?

Member

That sounds good.
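A sketch of what that restructuring might look like for the out-of-place path, based on the lines quoted above (hypothetical; the Jacobian call in the array branch is assumed, not taken from this diff):

if typeof(uprev) <: AbstractArray
    J = ForwardDiff.jacobian(uf,uprev)
    W = I - dto2*J    # UniformScaling I plays the role of the identity matrix
else
    J = ForwardDiff.derivative(uf,uprev)
    W = 1 - dto2*J    # plain scalar arithmetic for Number u
end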

@@ -459,7 +456,7 @@ end
W = I - d*dt*J
else
J = ForwardDiff.derivative(uf,uprev)
W = 1 - d*dt*J
W = @. 1 - d*dt*J
Member

Same as above.

@ChrisRackauckas (Member)

Alright cool. Looks great! How do the benchmarks look?

Can we list out all of the methods which are not fully broadcasting? I think it's all of the RK methods with order >4 and that's it, right? That includes Nystrom5. We might want to document that in an issue on broadcasting, since it means that arrays which broadcast but have no indices (I'm looking at you, ArrayPartition of GPUArrays) cannot be used there, and it's something we will want to fix in the future once Julia is fixed.

@devmotion (Member Author)

Which benchmarks? What exactly do you want to compare? I did some benchmarks of the residual calculation, BS3, and BS5, which I put up here. For the residuals and BS3, the results favoured the broadcast implementation in higher-dimensional spaces and were almost identical to the current implementation in one-dimensional spaces; however, since we hit the broadcast limit in BS5, the broadcast implementation leads to a huge regression there. The mixed implementation, on the other hand, clearly outperformed even the current implementation for BS5. Hence I assumed the benchmarks would show similar results for the other methods I haven't tested so far - as long as we don't hit the broadcast limit, performance is fine, and otherwise we get a huge regression which can be circumvented by the mixed implementation.

@ChrisRackauckas (Member)

Could you quickly run something like the first few plots of https://github.com/JuliaDiffEq/DiffEqBenchmarks.jl/blob/master/StiffODE/ROBER.ipynb to make sure that TRBDF2 and Rosenbrock methods don't slow down? That'll make me finally believe in broadcast.

Then other than the two @. 1 - d*dt*J changes, this is ready to merge when tests pass.


@devmotion (Member Author)

Just broadcasting leads to the warning

WARNING: J::UniformScaling - x::Number is deprecated, use J.λ - x instead.

and, even worse, sometimes to incorrect calculations:

julia> @. I - 3*[0.5  0.25; 1.5  2.5]
WARNING: J::UniformScaling - x::Number is deprecated, use J.λ - x instead.
Stacktrace:
 [1] depwarn(::String, ::Symbol) at ./deprecated.jl:70
 [2] -(::UniformScaling{Int64}, ::Float64) at ./deprecated.jl:57
 [3] macro expansion at ./broadcast.jl:153 [inlined]
 [4] macro expansion at ./simdloop.jl:73 [inlined]
 [5] macro expansion at ./broadcast.jl:147 [inlined]
 [6] _broadcast!(::##7#8, ::Array{Float64,2}, ::Tuple{Tuple{},Tuple{Bool,Bool}}, ::Tuple{Tuple{},Tuple{Int64,Int64}}, ::UniformScaling{Int64}, ::Tuple{Array{Float64,2}}, ::Type{Val{1}}, ::CartesianRange{CartesianIndex{2}}) at ./broadcast.jl:139
 [7] broadcast_t(::Function, ::Type{T} where T, ::Tuple{Base.OneTo{Int64},Base.OneTo{Int64}}, ::CartesianRange{CartesianIndex{2}}, ::UniformScaling{Int64}, ::Array{Float64,2}) at ./broadcast.jl:268
 [8] broadcast_c at ./broadcast.jl:314 [inlined]
 [9] broadcast(::Function, ::UniformScaling{Int64}, ::Array{Float64,2}) at ./broadcast.jl:434
 [10] eval(::Module, ::Any) at ./boot.jl:235
 [11] eval_user_input(::Any, ::Base.REPL.REPLBackend) at ./REPL.jl:66
 [12] macro expansion at ./REPL.jl:97 [inlined]
 [13] (::Base.REPL.##1#2{Base.REPL.REPLBackend})() at ./event.jl:73
while loading no file, in expression starting on line 0
2×2 Array{Float64,2}:
 -0.5   0.25
 -3.5  -6.5

Thus the convergence tests also fail after changing these lines to broadcasts. I'm not sure what the best way to deal with this is - my idea would be something like

if typeof(mass_matrix) <: UniformScaling
    W .= mass_matrix - dt*J
else
    @. W = mass_matrix - dt*J
end
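For reference, the plain (non-broadcast) subtraction treats I as the identity matrix and gives the intended result for the example above (display may differ slightly by version):

julia> I - 3*[0.5  0.25; 1.5  2.5]
2×2 Array{Float64,2}:
 -0.5  -0.75
 -4.5  -6.5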

@ChrisRackauckas (Member)

Open a separate issue for that and we'll do that in another PR.

@devmotion (Member Author)

OK, I'll open a separate PR for that issue. I redid the plots of https://github.com/JuliaDiffEq/DiffEqBenchmarks.jl/blob/master/StiffODE/ROBER.ipynb and got:

[Attached: benchmark plots figure_1 through figure_5]

devmotion mentioned this pull request on Aug 4, 2017.
T0 = typeof(d₀)
T1 = typeof(d₁)
if d₀ < T0(1//10^(5)) || d₁ < T1(1//10^(5))
if d₀ < 1//10^(5) || d₁ < 1//10^(5)
Member

does this obey units? Did you run units_tests.jl?

Member Author

Yes, I did run all long tests as well, including units_tests.jl. They all passed.

tmp is unitless, and hence both d₀ and d₁ are unitless as well.
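As a small illustration of why the comparison is unit-safe (a sketch assuming Unitful.jl; not part of units_tests.jl):

julia> using Unitful

julia> u0 = 2.0u"m"; sk = 0.5u"m";

julia> d₀ = u0/sk    # identical units cancel, leaving an ordinary number
4.0

julia> d₀ < 1//10^(5)
false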

src/initdt.jl Outdated
sk = @. abstol+abs(u0)*reltol
d₀ = internalnorm(u0./sk)
d₀ = internalnorm(@. u0/sk)
Member

same with the units here.

Member Author

Same here. u0 and sk both have the same unit, hence u0./sk and thus d₀ are unitless.

@devmotion (Member Author)

It seems the initdt fix introduces a test error in data_array_test.jl...

@ChrisRackauckas (Member)

Does broadcasting a DataArray return a DataArray? I think in the long run we should do something like

https://github.com/JuliaDiffEq/MultiScaleArrays.jl/blob/master/src/math.jl#L9

but we cannot expect most user-defined types to broadcast correctly. I would revert that part and change back to pre-allocating and using the in-place update because that's type-safe.
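A sketch of the "pre-allocate and update in place" pattern being suggested, using the initdt variables from the diff above (hypothetical; the reverted code may differ in detail):

sk = similar(u0)                   # container type is fixed up front by similar
@. sk = abstol + abs(u0)*reltol    # in-place broadcast fills the existing array

The type of sk is then whatever similar returns for u0, independently of what u0's out-of-place broadcast would construct.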

ChrisRackauckas merged commit 5fd16ad into SciML:master on Aug 7, 2017
@ChrisRackauckas (Member)

I reverted to using similar again in initdt and then merged. All of the performance regressions seem to have been weeded out, but the tests take noticeably longer. This may be due in large part to compilation-time increases, which happen when inlining is overly aggressive and there's a lot of broadcasting. That makes #120 important.
